natural hazards earthquakes
Reading
This dataset contains 3,742 records of significant earthquakes, with numeric measurements (magnitude, depth, latitude, longitude), location text fields, and a date column. Magnitude is tightly clustered: half of all values fall between 4.6 and 5.1 (median 4.8, minimum 4.5), but the maximum reaches 8.2, producing 184 high-end outliers worth a closer look. Depth is highly skewed (skew 3.07), with a median of 10 km, a maximum of 248.7 km, and 314 outliers, suggesting a mix of shallow and deep events. Geographically, the data is heavily concentrated around Alaska: 'alaska' appears in 1,991 place names, with Canada and Mexico trailing far behind, and longitudes sit firmly in the western hemisphere (median -144.2). Note that 'category' is a single constant value and 'earthquake_type' is 99.9% 'earthquake', so neither adds analytic signal.
citing: row_count · column_count · depth_km · magnitude · latitude · longitude · place · earthquake_type · category
Charts the summary said to look at first
magnitude (histogram)
| bin | count |
|---|---|
| 4.5 – 4.593 | 752 |
| 4.593 – 4.685 | 601 |
| 4.685 – 4.777 | 445 |
| 4.777 – 4.87 | 340 |
| 4.87 – 4.963 | 286 |
| 4.963 – 5.055 | 254 |
| 5.055 – 5.147 | 218 |
| 5.147 – 5.24 | 177 |
| 5.24 – 5.332 | 137 |
| 5.332 – 5.425 | 104 |
| 5.425 – 5.518 | 85 |
| 5.518 – 5.61 | 66 |
| 5.61 – 5.702 | 53 |
| 5.702 – 5.795 | 3 |
| 5.795 – 5.887 | 38 |
| 5.887 – 5.98 | 46 |
| 5.98 – 6.072 | 25 |
| 6.072 – 6.165 | 17 |
| 6.165 – 6.257 | 16 |
| 6.257 – 6.35 | 11 |
| 6.35 – 6.442 | 15 |
| 6.442 – 6.535 | 11 |
| 6.535 – 6.627 | 10 |
| 6.627 – 6.72 | 4 |
| 6.72 – 6.812 | 6 |
| 6.812 – 6.905 | 5 |
| 6.905 – 6.997 | 0 |
| 6.997 – 7.09 | 3 |
| 7.09 – 7.182 | 3 |
| 7.182 – 7.275 | 3 |
| 7.275 – 7.367 | 1 |
| 7.367 – 7.46 | 0 |
| 7.46 – 7.552 | 1 |
| 7.552 – 7.645 | 1 |
| 7.645 – 7.737 | 0 |
| 7.737 – 7.83 | 2 |
| 7.83 – 7.922 | 2 |
| 7.922 – 8.015 | 0 |
| 8.015 – 8.107 | 0 |
| 8.107 – 8.2 | 1 |
depth_km (histogram)
| bin | count |
|---|---|
| -2.261 – 4.013 | 219 |
| 4.013 – 10.29 | 1730 |
| 10.29 – 16.56 | 370 |
| 16.56 – 22.84 | 258 |
| 22.84 – 29.11 | 230 |
| 29.11 – 35.38 | 250 |
| 35.38 – 41.66 | 167 |
| 41.66 – 47.93 | 129 |
| 47.93 – 54.21 | 56 |
| 54.21 – 60.48 | 31 |
| 60.48 – 66.75 | 43 |
| 66.75 – 73.03 | 27 |
| 73.03 – 79.3 | 19 |
| 79.3 – 85.58 | 29 |
| 85.58 – 91.85 | 21 |
| 91.85 – 98.12 | 19 |
| 98.12 – 104.4 | 24 |
| 104.4 – 110.7 | 12 |
| 110.7 – 116.9 | 9 |
| 116.9 – 123.2 | 14 |
| 123.2 – 129.5 | 13 |
| 129.5 – 135.8 | 19 |
| 135.8 – 142 | 14 |
| 142 – 148.3 | 5 |
| 148.3 – 154.6 | 6 |
| 154.6 – 160.9 | 0 |
| 160.9 – 167.1 | 7 |
| 167.1 – 173.4 | 4 |
| 173.4 – 179.7 | 0 |
| 179.7 – 186 | 1 |
| 186 – 192.2 | 5 |
| 192.2 – 198.5 | 1 |
| 198.5 – 204.8 | 3 |
| 204.8 – 211.1 | 2 |
| 211.1 – 217.3 | 3 |
| 217.3 – 223.6 | 1 |
| 223.6 – 229.9 | 0 |
| 229.9 – 236.2 | 0 |
| 236.2 – 242.4 | 0 |
| 242.4 – 248.7 | 1 |
latitude (histogram)
| bin | count |
|---|---|
| 20.02 – 21.26 | 35 |
| 21.26 – 22.51 | 28 |
| 22.51 – 23.75 | 42 |
| 23.75 – 25 | 88 |
| 25 – 26.24 | 91 |
| 26.24 – 27.49 | 35 |
| 27.49 – 28.73 | 43 |
| 28.73 – 29.98 | 39 |
| 29.98 – 31.22 | 52 |
| 31.22 – 32.46 | 83 |
| 32.46 – 33.71 | 52 |
| 33.71 – 34.95 | 26 |
| 34.95 – 36.2 | 67 |
| 36.2 – 37.44 | 43 |
| 37.44 – 38.69 | 54 |
| 38.69 – 39.93 | 22 |
| 39.93 – 41.18 | 131 |
| 41.18 – 42.42 | 52 |
| 42.42 – 43.66 | 96 |
| 43.66 – 44.91 | 177 |
| 44.91 – 46.15 | 4 |
| 46.15 – 47.4 | 7 |
| 47.4 – 48.64 | 34 |
| 48.64 – 49.89 | 117 |
| 49.89 – 51.13 | 151 |
| 51.13 – 52.38 | 288 |
| 52.38 – 53.62 | 423 |
| 53.62 – 54.86 | 349 |
| 54.86 – 56.11 | 223 |
| 56.11 – 57.35 | 201 |
| 57.35 – 58.6 | 75 |
| 58.6 – 59.84 | 104 |
| 59.84 – 61.09 | 116 |
| 61.09 – 62.33 | 117 |
| 62.33 – 63.58 | 120 |
| 63.58 – 64.82 | 33 |
| 64.82 – 66.06 | 35 |
| 66.06 – 67.31 | 25 |
| 67.31 – 68.55 | 29 |
| 68.55 – 69.8 | 35 |
place: string length in characters (histogram)
| chars | count |
|---|---|
| 4 – 5 | 1 |
| 5 – 7 | 0 |
| 7 – 8 | 1 |
| 8 – 10 | 0 |
| 10 – 11 | 0 |
| 11 – 12 | 0 |
| 12 – 14 | 2 |
| 14 – 15 | 11 |
| 15 – 16 | 40 |
| 16 – 18 | 1 |
| 18 – 19 | 20 |
| 19 – 20 | 19 |
| 20 – 22 | 8 |
| 22 – 23 | 219 |
| 23 – 25 | 60 |
| 25 – 26 | 122 |
| 26 – 27 | 543 |
| 27 – 29 | 499 |
| 29 – 30 | 823 |
| 30 – 32 | 325 |
| 32 – 33 | 362 |
| 33 – 34 | 378 |
| 34 – 36 | 105 |
| 36 – 37 | 40 |
| 37 – 38 | 37 |
| 38 – 40 | 25 |
| 40 – 41 | 34 |
| 41 – 42 | 3 |
| 42 – 44 | 0 |
| 44 – 45 | 15 |
| 45 – 47 | 22 |
| 47 – 48 | 14 |
| 48 – 49 | 3 |
| 49 – 51 | 1 |
| 51 – 52 | 2 |
| 52 – 54 | 5 |
| 54 – 55 | 0 |
| 55 – 56 | 0 |
| 56 – 58 | 0 |
| 58 – 59 | 2 |
earthquake_type (value counts)
| value | count | share |
|---|---|---|
| earthquake | 3739 | 99.9% |
| explosion | 2 | 0.1% |
| landslide | 1 | 0.0% |
Schema
11 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| latitude | numeric | 0.0% | 3,627 | |
| longitude | numeric | 0.0% | 3,668 | |
| name | text | 0.0% | 3,002 | multilingual |
| description | text | 0.0% | 3,591 | near_unique |
| category | categorical | 0.0% | 1 | imbalance |
| date | text | 0.0% | 3,741 | near_unique, one_word, allcaps, short_text |
| country | unknown | 0.0% | — | skipped |
| magnitude | numeric | 0.0% | 123 | |
| depth_km | numeric | 0.0% | 1,505 | high_skew, outliers |
| place | text | 0.0% | 3,002 | multilingual |
| earthquake_type | categorical | 0.0% | 3 | imbalance |
latitude
numeric feature
Geographic latitude coordinates ranging from 20.02 to 69.80, with a mean of 48.53 and median of 52.40. The distribution is moderately left-skewed (-0.76) and concentrated in northern mid-to-high latitudes (Q1=41.34, Q3=55.90). Combined with the all-negative longitudes, this points to a North America-heavy sample with no southern hemisphere points. Near-unique values (3,627 of 3,742) and zero nulls indicate clean, granular location data.
Treatment: Pair with longitude for geospatial features; consider binning or projecting rather than using the raw value in linear models.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,627
- min: 20.02
- max: 69.8
- mean: 48.53
- median: 52.4
- std: 11.58
- q1: 41.34
- q3: 55.9
- iqr: 14.56
- skew: -0.7591
- kurtosis: -0.2887
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
longitude
numeric feature
Geographic longitude coordinates, all negative and ranging from -169.997 to -65.039, placing every record in the western hemisphere. The distribution is wide (IQR about 34.8°) with a slight right skew (0.449) and only 26 mild outliers (0.69%). Near-uniqueness (3,668 unique of 3,742) suggests point-level locations rather than a small set of sites.
Treatment: Pair with latitude as a 2-D geospatial feature; avoid treating it as a standalone scalar in models.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,668
- min: -170
- max: -65.04
- mean: -140.1
- median: -144.2
- std: 21.81
- q1: -159.9
- q3: -125.1
- iqr: 34.84
- skew: 0.4489
- kurtosis: -0.4302
- n_outliers: 26
- outlier_rate: 0.006948
- zero_rate: 0
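The latitude and longitude treatments both suggest using the two coordinates together and binning them. A minimal sketch of one way to do that follows, assuming a fixed-size degree grid; the function name `grid_cell` and the 5° cell size are illustrative choices, not something the profile prescribes.

```python
import math
from typing import Tuple


def grid_cell(lat: float, lon: float, cell_deg: float = 5.0) -> Tuple[int, int]:
    """Map a (lat, lon) pair to integer grid indices on a cell_deg-degree grid.

    Hypothetical helper: floor division keeps negative longitudes in the
    correct cell (e.g. -144.2 falls in the cell that starts at -145).
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# The reported medians (52.4, -144.2) used as a single illustrative point.
cell = grid_cell(52.4, -144.2)
print(cell)  # (10, -29): the cell spanning 50..55 N and -145..-140 E
```

The resulting tuple can be used as a categorical key, which treats the pair as one 2-D feature instead of two unrelated scalars. A degree grid distorts area at high latitudes, so a projected or equal-area grid may be preferable for this Alaska-heavy sample.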
name
text metadata multilingual
This column holds short geographic descriptions of event locations, almost certainly earthquake place names (e.g. '104 km SSW of Nikolski, Alaska', 'off the coast of Oregon'), with a mean length of 29 characters and 6 words. Of 3,742 rows, only 3,002 are unique and 740 are duplicates (19.8%), with 'off the coast of Oregon' alone repeating 151 times. The language detector flags a multilingual mix, but the column is overwhelmingly English (3,719 rows); the trace de/es/ja/ceb counts are likely false positives on place tokens. Its statistics are identical to those of 'place', so the two columns are likely duplicates of each other.
Treatment: Parse into structured fields (distance, bearing, place) or geocode rather than using the raw string as a feature.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,002
- len_min: 4
- len_max: 59
- len_mean: 29.47
- len_median: 29
- len_p95: 36
- word_mean: 6.293
- word_median: 6
- n_empty: 0
- n_duplicates: 740
- duplicate_rate: 0.1978
- vocab_size: 1,036
- readability_flesch_mean: 69.91
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.0005345
- allcaps_rate: 0
- boilerplate_rate: 0
description
text free_text near_unique
This is a templated text description of seismic events, with every row mentioning 'earthquake', 'magnitude', and 'depth:' and lengths ranging from 45 to 100 characters (median 72). Despite the formulaic structure, 3,591 of 3,742 values are unique because the embedded magnitude and depth numbers vary; still, 151 exact duplicates (4.0%) exist. Alaska appears in 1,973 rows and a 10 km depth in 1,216, suggesting the corpus is dominated by shallow Alaskan quakes.
Treatment: Parse out magnitude, depth, and region with regex into structured features rather than embedding the raw string.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,591
- len_min: 45
- len_max: 100
- len_mean: 71.71
- len_median: 72
- len_p95: 79
- word_mean: 12.29
- word_median: 12
- n_empty: 0
- n_duplicates: 151
- duplicate_rate: 0.04035
- vocab_size: 2,674
- readability_flesch_mean: 63.23
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
category
categorical metadata imbalance
This column is a constant tag labeling every row as "significant_earthquakes" across all 3,742 records, with a cardinality of 1 and an entropy of 0. It carries no information for modelling or analysis, since the top_rate is 1.0 with no nulls.
Treatment: Drop; constant column with a single value.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 1
- top_value: significant_earthquakes
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
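The "drop constant column" treatment can be checked mechanically. The sketch below flags columns whose most frequent value covers at least a chosen share of rows; the helper name `near_constant_columns` and the 0.99 threshold are assumptions for illustration, and the toy columns are rebuilt only from the counts reported in this profile.

```python
from collections import Counter
from typing import Dict, List, Sequence


def near_constant_columns(
    columns: Dict[str, Sequence[object]], threshold: float = 0.99
) -> List[str]:
    """Return names of columns whose top value share is >= threshold."""
    flagged = []
    for name, values in columns.items():
        if not values:
            continue  # an empty column has no top value to measure
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= threshold:
            flagged.append(name)
    return flagged


# Toy columns built from the reported value counts; 'magnitude' is a
# three-value placeholder, not real data.
columns = {
    "category": ["significant_earthquakes"] * 3742,
    "earthquake_type": ["earthquake"] * 3739 + ["explosion"] * 2 + ["landslide"],
    "magnitude": [4.5, 4.8, 5.1],
}
print(near_constant_columns(columns))  # ['category', 'earthquake_type']
```

A threshold of exactly 1.0 would flag only truly constant columns such as 'category'; a slightly lower value also catches near-constant ones.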
date
text identifier near_unique one_word allcaps short_text
Despite its name, this column holds malformed date-like strings rather than parseable timestamps: every value follows a 'NNNNNNNNNNNNN-01-01' pattern, a digit prefix (likely a millisecond epoch) glued to a literal '-01-01' suffix. Values are 18-19 characters long, which implies a 12- or 13-digit prefix, and each is a single token. 3,741 of 3,742 values are unique, with only one duplicate ('1614452365296-01-01' appears twice). The column is stored as text, not a date type, so no temporal ordering or range is usable as-is.
Treatment: Parse the leading digit run as milliseconds since the epoch into a real timestamp, or treat the column as a near-unique id and drop it.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,741
- len_min: 18
- len_max: 19
- len_mean: 18.95
- len_median: 19
- len_p95: 19
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 1
- duplicate_rate: 0.0002672
- vocab_size: 3,741
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
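The parsing treatment for this column could look roughly like the sketch below. It assumes the digit prefix really is milliseconds since the Unix epoch in UTC, which the profile only describes as likely; `parse_epoch_ms` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed shape: 12 or 13 digits (18- or 19-character values) followed by
# the literal "-01-01" suffix described in the profile.
_EPOCH_MS = re.compile(r"^(\d{12,13})-01-01$")
_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_ms(raw: str) -> Optional[datetime]:
    """Return a UTC datetime for strings like '1614452365296-01-01', else None."""
    match = _EPOCH_MS.match(raw.strip())
    if match is None:
        return None
    # Integer milliseconds via timedelta avoids float rounding.
    return _UNIX_EPOCH + timedelta(milliseconds=int(match.group(1)))


ts = parse_epoch_ms("1614452365296-01-01")
print(ts.isoformat())  # 2021-02-27T18:59:25.296000+00:00
```

Returning `None` for non-matching strings makes it easy to count how many rows fail the assumed pattern before committing to the conversion.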
country
unknown feature skipped
The column is named "country" and has 3,742 non-null entries (null rate 0.0), but saturn skipped detailed profiling (kind is "unknown"), so cardinality and value distribution are not available. Without n_unique or category stats, we cannot tell whether this is a clean ISO code field, free-text country names, or a near-constant. The lack of any descriptive statistics is itself the main signal.
Treatment: Re-profile or manually inspect to determine cardinality before deciding whether to one-hot encode or normalize to ISO codes.
- n: 3,742
- nulls: 0 (0.0%)
- unique: —
magnitude
numeric feature
This is an earthquake magnitude reading, bounded between 4.5 and 8.2 with a mean of 4.92 and median of 4.8. The distribution is heavily right-skewed (skew 1.97, kurtosis 5.58) with a tight IQR of 0.5 and 184 outliers (4.9%), consistent with a catalog truncated at magnitude 4.5 in which large events are rare but extreme.
Treatment: Treat as a right-skewed numeric feature; consider modelling tail events separately rather than transforming, since magnitude is already on a log scale.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 123
- min: 4.5
- max: 8.2
- mean: 4.917
- median: 4.8
- std: 0.462
- q1: 4.6
- q3: 5.1
- iqr: 0.5
- skew: 1.97
- kurtosis: 5.583
- n_outliers: 184
- outlier_rate: 0.04917
- zero_rate: 0
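The profile does not say which rule produced the outlier count. A common choice is Tukey's 1.5 × IQR fences, and the sketch below applies that rule to the reported quartiles as an assumption, not as the profiler's documented method; `tukey_fences` and `is_outlier` are illustrative names.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences at q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, fences: Tuple[float, float]) -> bool:
    """True when value lies strictly outside the fences."""
    low, high = fences
    return value < low or value > high


# Quartiles reported for magnitude: q1 = 4.6, q3 = 5.1.
fences = tukey_fences(4.6, 5.1)
print(fences)                    # roughly (3.85, 5.85)
print(is_outlier(5.9, fences))   # True
print(is_outlier(5.8, fences))   # False
```

Because the catalog minimum is 4.5, the lower fence near 3.85 can never trigger, so under this rule every flagged magnitude is a large event of roughly 5.9 and above.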
depth_km
numeric feature high_skew outliers
This is almost certainly the focal depth in kilometres for seismic events, with values ranging from -2.261 to 248.7 and a median of 10.0. The distribution is heavily right-skewed (skew 3.07, kurtosis 11.6) with 314 outliers (8.39% of rows) and a small number of negative depths that warrant inspection. Half the records sit between q1=10.0 and q3=29.1015, suggesting a strong concentration of shallow events with a long deep tail.
Treatment: Clip or investigate negative values, then log-transform (e.g., log(depth+c)) before modelling.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 1,505
- min: -2.261
- max: 248.7
- mean: 23.71
- median: 10
- std: 28.79
- q1: 10
- q3: 29.1
- iqr: 19.1
- skew: 3.072
- kurtosis: 11.61
- n_outliers: 314
- outlier_rate: 0.08391
- zero_rate: 0.002672
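The clip-then-log treatment can be sketched as below. The floor of 0 km and the shift constant c = 1 are assumptions chosen for illustration; whether negative depths should be clipped, dropped, or kept is a judgment call the profile leaves open. `log_depth` is a hypothetical helper name.

```python
import math


def log_depth(depth_km: float, floor_km: float = 0.0, shift: float = 1.0) -> float:
    """Clip depth at floor_km, then return log(depth + shift).

    With the default shift of 1.0, a depth of 0 km maps to 0.0, which keeps
    the zero-depth rows finite.
    """
    clipped = max(depth_km, floor_km)
    return math.log(clipped + shift)


# Reported minimum, median, and maximum of depth_km.
for depth in (-2.261, 10.0, 248.7):
    print(depth, round(log_depth(depth), 4))
```

The transform is monotonic, so the ordering of events by depth is preserved while the long deep tail is compressed.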
place
text metadata multilingual
Short geographic descriptors of earthquake locations, typically formatted as 'N km DIR of PLACE, REGION' (e.g., '104 km SSW of Nikolski, Alaska'), with a mean length of 29 characters and a median of 6 words. Alaska dominates (1,991 of 3,742 rows mention it), and a single string, 'off the coast of Oregon', repeats 151 times, contributing to a 19.8% duplicate rate across 3,002 unique values. The 'multilingual' alert is driven by a tiny non-English fringe (16 es, 3 de, 1 ja, 1 ceb) against 3,719 English rows, so the column is effectively monolingual.
Treatment: Parse into structured fields (distance_km, bearing, place_name, region) rather than using the raw string as a categorical.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,002
- len_min: 4
- len_max: 59
- len_mean: 29.47
- len_median: 29
- len_p95: 36
- word_mean: 6.293
- word_median: 6
- n_empty: 0
- n_duplicates: 740
- duplicate_rate: 0.1978
- vocab_size: 1,036
- readability_flesch_mean: 69.91
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.0005345
- allcaps_rate: 0
- boilerplate_rate: 0
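A rough sketch of the structured parse named in the treatment follows. The regular expression is an assumption inferred from the single example '104 km SSW of Nikolski, Alaska'; strings that do not fit it, such as 'off the coast of Oregon', return `None` and would need separate handling. `parse_place` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional, Union

# Assumed pattern: "<distance> km <bearing> of <place_name>[, <region>]".
_PLACE = re.compile(
    r"^(?P<distance_km>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<bearing>[NSEW]{1,3})\s+of\s+"
    r"(?P<place_name>.+?)(?:,\s*(?P<region>[^,]+))?$"
)


def parse_place(raw: str) -> Optional[Dict[str, Union[str, float, None]]]:
    """Split a place string into distance_km, bearing, place_name, region."""
    match = _PLACE.match(raw.strip())
    if match is None:
        return None
    fields: Dict[str, Union[str, float, None]] = dict(match.groupdict())
    fields["distance_km"] = float(match.group("distance_km"))
    return fields


print(parse_place("104 km SSW of Nikolski, Alaska"))
print(parse_place("off the coast of Oregon"))  # None: no distance/bearing prefix
```

The region group takes the text after the last comma, so a place name that itself contains a comma stays intact in `place_name`. Counting the `None` results on the real column would show how much of it the assumed pattern actually covers.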
earthquake_type
categorical label imbalance
This is a categorical event-type label with only 3 distinct values across 3,742 rows and no nulls. The distribution is essentially degenerate: 'earthquake' covers 99.92% of rows, leaving just 2 'explosion' and 1 'landslide' records, yielding an entropy ratio of 0.006. The column carries almost no information as-is.
Treatment: Drop or collapse to a binary rare-event flag; near-constant for modelling.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3
- top_value: earthquake
- top_rate: 0.9992
- cardinality: 3
- entropy: 0.01014
- entropy_ratio: 0.006396
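The entropy figures above can be reproduced from the value counts. The sketch below assumes Shannon entropy in bits (log base 2) and a ratio normalised by log2 of the cardinality; the profile does not state its formula, but these definitions appear to match the reported 0.01014 and 0.006396. The function names are illustrative.

```python
import math
from typing import Dict


def entropy_bits(counts: Dict[str, int]) -> float:
    """Shannon entropy of a value-count table, in bits."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def entropy_ratio(counts: Dict[str, int]) -> float:
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(counts)
    if k < 2:
        return 0.0  # a constant column has zero entropy by definition
    return entropy_bits(counts) / math.log2(k)


# Value counts reported for earthquake_type.
counts = {"earthquake": 3739, "explosion": 2, "landslide": 1}
print(round(entropy_bits(counts), 5))   # about 0.01014
print(round(entropy_ratio(counts), 6))  # about 0.006396
```

A ratio this close to 0 means the column is almost as uninformative as the constant 'category' column, which supports the drop-or-collapse treatment.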