usgs significant earthquakes
Reading
This dataset contains 3,742 records of significant earthquakes from USGS, with 11 columns covering location (latitude, longitude, place/name), magnitude, depth, and event metadata. Magnitude is tightly clustered between 4.5 and 5.1 (median 4.8) but has a long right tail reaching 8.2, with 184 outliers worth examining for the rare large events. `depth_km` is highly skewed (skew 3.07) with a median of 10 km but a max of 248.7 km and 314 outliers, suggesting a mix of shallow and intermediate-depth quakes. Geographically, the data is heavily concentrated around Alaska: 'alaska' appears in 1,991 place names, and 'off the coast of Oregon' alone accounts for 151 records, so this is effectively a North Pacific / Alaska-dominated sample rather than a global one. Note that the `category` column is constant ('significant_earthquakes') and `earthquake_type` is 99.9% 'earthquake', so neither will be useful for segmentation.
citing: row_count · column_count · depth_km · magnitude · latitude · longitude · name · place · earthquake_type · category
Charts the summary said to look at first
Histogram of magnitude
| bin | count |
|---|---|
| 4.5 – 4.593 | 752 |
| 4.593 – 4.685 | 601 |
| 4.685 – 4.777 | 445 |
| 4.777 – 4.87 | 340 |
| 4.87 – 4.963 | 286 |
| 4.963 – 5.055 | 254 |
| 5.055 – 5.147 | 218 |
| 5.147 – 5.24 | 177 |
| 5.24 – 5.332 | 137 |
| 5.332 – 5.425 | 104 |
| 5.425 – 5.518 | 85 |
| 5.518 – 5.61 | 66 |
| 5.61 – 5.702 | 53 |
| 5.702 – 5.795 | 3 |
| 5.795 – 5.887 | 38 |
| 5.887 – 5.98 | 46 |
| 5.98 – 6.072 | 25 |
| 6.072 – 6.165 | 17 |
| 6.165 – 6.257 | 16 |
| 6.257 – 6.35 | 11 |
| 6.35 – 6.442 | 15 |
| 6.442 – 6.535 | 11 |
| 6.535 – 6.627 | 10 |
| 6.627 – 6.72 | 4 |
| 6.72 – 6.812 | 6 |
| 6.812 – 6.905 | 5 |
| 6.905 – 6.997 | 0 |
| 6.997 – 7.09 | 3 |
| 7.09 – 7.182 | 3 |
| 7.182 – 7.275 | 3 |
| 7.275 – 7.367 | 1 |
| 7.367 – 7.46 | 0 |
| 7.46 – 7.552 | 1 |
| 7.552 – 7.645 | 1 |
| 7.645 – 7.737 | 0 |
| 7.737 – 7.83 | 2 |
| 7.83 – 7.922 | 2 |
| 7.922 – 8.015 | 0 |
| 8.015 – 8.107 | 0 |
| 8.107 – 8.2 | 1 |
Histogram of depth_km
| bin | count |
|---|---|
| -2.261 – 4.013 | 219 |
| 4.013 – 10.29 | 1730 |
| 10.29 – 16.56 | 370 |
| 16.56 – 22.84 | 258 |
| 22.84 – 29.11 | 230 |
| 29.11 – 35.38 | 250 |
| 35.38 – 41.66 | 167 |
| 41.66 – 47.93 | 129 |
| 47.93 – 54.21 | 56 |
| 54.21 – 60.48 | 31 |
| 60.48 – 66.75 | 43 |
| 66.75 – 73.03 | 27 |
| 73.03 – 79.3 | 19 |
| 79.3 – 85.58 | 29 |
| 85.58 – 91.85 | 21 |
| 91.85 – 98.12 | 19 |
| 98.12 – 104.4 | 24 |
| 104.4 – 110.7 | 12 |
| 110.7 – 116.9 | 9 |
| 116.9 – 123.2 | 14 |
| 123.2 – 129.5 | 13 |
| 129.5 – 135.8 | 19 |
| 135.8 – 142 | 14 |
| 142 – 148.3 | 5 |
| 148.3 – 154.6 | 6 |
| 154.6 – 160.9 | 0 |
| 160.9 – 167.1 | 7 |
| 167.1 – 173.4 | 4 |
| 173.4 – 179.7 | 0 |
| 179.7 – 186 | 1 |
| 186 – 192.2 | 5 |
| 192.2 – 198.5 | 1 |
| 198.5 – 204.8 | 3 |
| 204.8 – 211.1 | 2 |
| 211.1 – 217.3 | 3 |
| 217.3 – 223.6 | 1 |
| 223.6 – 229.9 | 0 |
| 229.9 – 236.2 | 0 |
| 236.2 – 242.4 | 0 |
| 242.4 – 248.7 | 1 |
Histogram of latitude
| bin | count |
|---|---|
| 20.02 – 21.26 | 35 |
| 21.26 – 22.51 | 28 |
| 22.51 – 23.75 | 42 |
| 23.75 – 25 | 88 |
| 25 – 26.24 | 91 |
| 26.24 – 27.49 | 35 |
| 27.49 – 28.73 | 43 |
| 28.73 – 29.98 | 39 |
| 29.98 – 31.22 | 52 |
| 31.22 – 32.46 | 83 |
| 32.46 – 33.71 | 52 |
| 33.71 – 34.95 | 26 |
| 34.95 – 36.2 | 67 |
| 36.2 – 37.44 | 43 |
| 37.44 – 38.69 | 54 |
| 38.69 – 39.93 | 22 |
| 39.93 – 41.18 | 131 |
| 41.18 – 42.42 | 52 |
| 42.42 – 43.66 | 96 |
| 43.66 – 44.91 | 177 |
| 44.91 – 46.15 | 4 |
| 46.15 – 47.4 | 7 |
| 47.4 – 48.64 | 34 |
| 48.64 – 49.89 | 117 |
| 49.89 – 51.13 | 151 |
| 51.13 – 52.38 | 288 |
| 52.38 – 53.62 | 423 |
| 53.62 – 54.86 | 349 |
| 54.86 – 56.11 | 223 |
| 56.11 – 57.35 | 201 |
| 57.35 – 58.6 | 75 |
| 58.6 – 59.84 | 104 |
| 59.84 – 61.09 | 116 |
| 61.09 – 62.33 | 117 |
| 62.33 – 63.58 | 120 |
| 63.58 – 64.82 | 33 |
| 64.82 – 66.06 | 35 |
| 66.06 – 67.31 | 25 |
| 67.31 – 68.55 | 29 |
| 68.55 – 69.8 | 35 |
Histogram of text length in characters (name or place)
| chars | count |
|---|---|
| 4 – 5 | 1 |
| 5 – 7 | 0 |
| 7 – 8 | 1 |
| 8 – 10 | 0 |
| 10 – 11 | 0 |
| 11 – 12 | 0 |
| 12 – 14 | 2 |
| 14 – 15 | 11 |
| 15 – 16 | 40 |
| 16 – 18 | 1 |
| 18 – 19 | 20 |
| 19 – 20 | 19 |
| 20 – 22 | 8 |
| 22 – 23 | 219 |
| 23 – 25 | 60 |
| 25 – 26 | 122 |
| 26 – 27 | 543 |
| 27 – 29 | 499 |
| 29 – 30 | 823 |
| 30 – 32 | 325 |
| 32 – 33 | 362 |
| 33 – 34 | 378 |
| 34 – 36 | 105 |
| 36 – 37 | 40 |
| 37 – 38 | 37 |
| 38 – 40 | 25 |
| 40 – 41 | 34 |
| 41 – 42 | 3 |
| 42 – 44 | 0 |
| 44 – 45 | 15 |
| 45 – 47 | 22 |
| 47 – 48 | 14 |
| 48 – 49 | 3 |
| 49 – 51 | 1 |
| 51 – 52 | 2 |
| 52 – 54 | 5 |
| 54 – 55 | 0 |
| 55 – 56 | 0 |
| 56 – 58 | 0 |
| 58 – 59 | 2 |
Value counts for earthquake_type
| value | count | share |
|---|---|---|
| earthquake | 3739 | 99.9% |
| explosion | 2 | 0.1% |
| landslide | 1 | 0.0% |
Schema
11 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| latitude | numeric | 0.0% | 3,627 | |
| longitude | numeric | 0.0% | 3,668 | |
| name | text | 0.0% | 3,002 | multilingual |
| description | text | 0.0% | 3,591 | near_unique |
| category | categorical | 0.0% | 1 | imbalance |
| date | text | 0.0% | 3,741 | near_unique, one_word, allcaps, short_text |
| country | unknown | 0.0% | — | skipped |
| magnitude | numeric | 0.0% | 123 | |
| depth_km | numeric | 0.0% | 1,505 | high_skew, outliers |
| place | text | 0.0% | 3,002 | multilingual |
| earthquake_type | categorical | 0.0% | 3 | imbalance |
latitude
numeric feature
Geographic latitude coordinate, with values ranging from 20.02 to 69.7975 and a median of 52.40, spanning roughly the northern edge of the tropics to just beyond the Arctic Circle. The distribution is left-skewed (skew -0.76) and concentrated in northern mid-to-high latitudes (Q1-Q3: 41.34-55.90); because every longitude in this dataset is in the western hemisphere, this points to a North America / North Pacific bias rather than a European one. No nulls or outliers, and 3627 unique values across 3742 rows indicate near-row-level granularity.
Treatment: Pair with longitude for geospatial features; consider binning or clustering rather than treating as a plain scalar.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,627
- min: 20.02
- max: 69.8
- mean: 48.53
- median: 52.4
- std: 11.58
- q1: 41.34
- q3: 55.9
- iqr: 14.56
- skew: -0.7591
- kurtosis: -0.2887
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
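The binning suggested in the latitude treatment could be sketched as follows. This is a minimal illustration, not the profiler's method: the helper name `grid_cell` and the 5-degree cell size are assumptions chosen for the example, and the sample coordinates are simply the reported medians and extremes.

```python
import math


def grid_cell(lat: float, lon: float, cell_deg: float = 5.0) -> tuple[float, float]:
    """Return the south-west corner of the cell_deg x cell_deg cell containing (lat, lon).

    The cell size is an illustrative assumption, not a value from the profile.
    """
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


# Median latitude and longitude reported in the profile.
print(grid_cell(52.4, -144.2))  # (50.0, -145.0)
```

Using the cell as a categorical key keeps latitude and longitude together, which is the point of the treatment note; a clustering method would be the alternative when cell boundaries are too arbitrary.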
longitude
numeric feature
Geographic longitude coordinates, all in the western hemisphere with values ranging from -169.997 to -65.039 (mean -140.10, median -144.22). The distribution is mildly right-skewed (0.45) with 3668 unique values across 3742 rows, suggesting near-distinct location points. The 26 outliers (0.69%) likely correspond to easternmost points well outside the dense western cluster bounded by Q1=-159.93 and Q3=-125.10.
Treatment: Pair with latitude for geospatial features (e.g., binning, clustering, or distance computation) rather than treating as a standalone scalar.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,668
- min: -170
- max: -65.04
- mean: -140.1
- median: -144.2
- std: 21.81
- q1: -159.9
- q3: -125.1
- iqr: 34.84
- skew: 0.4489
- kurtosis: -0.4302
- n_outliers: 26
- outlier_rate: 0.006948
- zero_rate: 0
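To see why the longitude outliers can only sit on the eastern side, the quartiles can be turned into outlier fences. The sketch below assumes the common Tukey rule (1.5 × IQR beyond the quartiles); the profile does not state which rule it uses, so treat the multiplier as an assumption.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles quoted in the longitude description.
lower, upper = tukey_fences(-159.93, -125.10)
print(round(lower, 1), round(upper, 1))  # -212.2 -72.9
```

Under that assumption the lower fence falls below -180, so no valid longitude can be a western outlier, while the reported maximum of -65.04 lies east of the upper fence.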
name
text metadata multilingual
This 'name' column reads as human-readable seismic event location descriptions (e.g., '104 km SSW of Nikolski, Alaska'), dominated by distance-and-bearing phrases: 'km', 'of', and cardinal directions like 'SSW'/'SSE' top the word list. Of 3742 rows there are only 3002 uniques and a 19.8% duplicate rate, with 'off the coast of Oregon' appearing 151 times, suggesting many events share the same place label. Despite a 'multilingual' flag, 3719 of 3742 strings are detected as English with only a handful tagged de/es/ja/ceb, likely false positives on short toponyms rather than real language mixing.
Treatment: Parse into structured fields (distance_km, bearing, place) rather than embedding the raw string.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,002
- len_min: 4
- len_max: 59
- len_mean: 29.47
- len_median: 29
- len_p95: 36
- word_mean: 6.293
- word_median: 6
- n_empty: 0
- n_duplicates: 740
- duplicate_rate: 0.1978
- vocab_size: 1,036
- readability_flesch_mean: 69.91
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.0005345
- allcaps_rate: 0
- boilerplate_rate: 0
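The parsing suggested in the treatment might look like the sketch below. The function name `parse_name` and the regular expression are assumptions based only on the two example strings quoted in the profile; strings that do not follow the "distance km bearing of place" shape fall back to the raw text.

```python
import re

# Matches strings such as "104 km SSW of Nikolski, Alaska".
_NAME_PATTERN = re.compile(
    r"^\s*(?P<dist>\d+(?:\.\d+)?)\s*km\s+(?P<bearing>[NSEW]{1,3})\s+of\s+(?P<place>.+?)\s*$"
)


def parse_name(text: str) -> dict:
    """Split a location string into distance_km, bearing, and place."""
    match = _NAME_PATTERN.match(text)
    if match is None:
        # No distance/bearing prefix: keep the whole string as the place.
        return {"distance_km": None, "bearing": None, "place": text.strip()}
    return {
        "distance_km": float(match["dist"]),
        "bearing": match["bearing"],
        "place": match["place"],
    }


print(parse_name("104 km SSW of Nikolski, Alaska"))
# {'distance_km': 104.0, 'bearing': 'SSW', 'place': 'Nikolski, Alaska'}
print(parse_name("off the coast of Oregon"))
# {'distance_km': None, 'bearing': None, 'place': 'off the coast of Oregon'}
```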
description
text free_text near_unique
Templated earthquake event descriptions, e.g. magnitude/depth strings ending in locations like Alaska. Lengths are tightly clustered (min 45, max 100, p95 79) and every row contains 'magnitude', '-', and 'depth:', confirming a fixed format. Despite that template, 3591 of 3742 values are unique with only 151 duplicates (4.0%), so the free-text portion (location, magnitude, depth) carries the signal.
Treatment: Parse out magnitude, depth, and location with a regex rather than embedding the templated string.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,591
- len_min: 45
- len_max: 100
- len_mean: 71.71
- len_median: 72
- len_p95: 79
- word_mean: 12.29
- word_median: 12
- n_empty: 0
- n_duplicates: 151
- duplicate_rate: 0.04035
- vocab_size: 2,674
- readability_flesch_mean: 63.23
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
category
categorical metadata imbalance
This column tags every one of the 3,742 rows with the single value "significant_earthquakes", giving cardinality 1 and entropy 0. It carries no information for modelling and most likely encodes the dataset's source or slice rather than a per-row attribute.
Treatment: Drop before modelling; retain only as a dataset-level provenance tag.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 1
- top_value: significant_earthquakes
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
date
text identifier near_unique one_word allcaps short_text
Despite the name, this column holds 18-19 character single-token strings rather than parseable dates: every value ends in '-01-01', and the 12-13 character prefix appears to be an epoch-style integer (13 digits in most rows, given the mean length of 18.95). With 3741 unique values across 3742 rows (one duplicate) and a 100% one-word, all-caps rate, it behaves as a near-unique identifier, not a temporal feature.
Treatment: Drop as-is or parse the leading numeric prefix into a real timestamp before any time-based use.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,741
- len_min: 18
- len_max: 19
- len_mean: 18.95
- len_median: 19
- len_p95: 19
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 1
- duplicate_rate: 0.0002672
- vocab_size: 3,741
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
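If the numeric prefix is to be recovered as a timestamp, something along these lines could work. It assumes the prefix is milliseconds since the Unix epoch, which the profile suggests ("epoch-style") but does not confirm; the helper name and the sample value are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_epoch_prefix(raw: str) -> Optional[datetime]:
    """Parse the digits before the first '-' as epoch milliseconds (assumed unit)."""
    prefix, sep, _ = raw.partition("-")
    if not sep or not prefix.isdigit():
        return None  # not in the "<digits>-01-01" shape
    return datetime.fromtimestamp(int(prefix) / 1000, tz=timezone.utc)


# Hypothetical value in the described shape: 13 digits followed by "-01-01".
print(parse_epoch_prefix("1700000000000-01-01"))  # 2023-11-14 22:13:20+00:00
```

Checking that the parsed timestamps fall in a plausible range for the catalogue is a quick way to confirm or reject the milliseconds assumption before relying on the result.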
country
unknown metadata skipped
This column is labelled "country" and has 3742 fully populated rows with no nulls, but saturn skipped detailed profiling so kind, uniqueness, and value distribution are unknown. Without cardinality or sample values, it is impossible to confirm whether entries are ISO codes, full names, or mixed representations. Treat the absence of stats as the main signal here.
Treatment: Re-profile with type detection enabled to confirm cardinality and standardise country representation before use.
- n: 3,742
- nulls: 0 (0.0%)
- unique: —
magnitude
numeric feature
This is a numeric magnitude field, almost certainly earthquake magnitudes given the floor at 4.5 and ceiling at 8.2. Values cluster tightly (median 4.8, IQR 0.5) but the distribution is heavily right-skewed (skew 1.97, kurtosis 5.58) with 184 outliers (4.9%) in the upper tail. Only 123 unique values across 3742 rows indicates coarse reporting precision: one-decimal steps between 4.5 and 8.2 allow at most 38 distinct values, so most values are likely rounded to one decimal with a minority reported to two.
Treatment: Log-transform or bucket into magnitude bands before modelling to tame the right skew.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 123
- min: 4.5
- max: 8.2
- mean: 4.917
- median: 4.8
- std: 0.462
- q1: 4.6
- q3: 5.1
- iqr: 0.5
- skew: 1.97
- kurtosis: 5.583
- n_outliers: 184
- outlier_rate: 0.04917
- zero_rate: 0
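Bucketing into magnitude bands, as the treatment suggests, could be sketched like this. The band edges and labels follow a commonly cited magnitude classification (light, moderate, strong, major, great) and are an assumption here; the profile itself does not prescribe any particular bands.

```python
from bisect import bisect_right

# Assumed band edges: each label covers [edge, next edge).
BAND_EDGES = [5.0, 6.0, 7.0, 8.0]
BAND_LABELS = ["light", "moderate", "strong", "major", "great"]


def magnitude_band(magnitude: float) -> str:
    """Map a magnitude to its band label; values below 5.0 are 'light'."""
    return BAND_LABELS[bisect_right(BAND_EDGES, magnitude)]


# Median, an edge value, and the maximum reported in the profile.
print([magnitude_band(m) for m in (4.8, 5.0, 8.2)])  # ['light', 'moderate', 'great']
```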
depth_km
numeric feature high_skew outliers
Numeric depth measurements in kilometers, almost certainly earthquake hypocenter depths given the median of 10.0 and range from -2.261 to 248.7. The distribution is heavily right-skewed (skew 3.07, kurtosis 11.6) with 314 outliers (8.4%) and a Q1 equal to the median at 10.0, suggesting many events are pinned to a default 10 km depth. Negative minimum (-2.261) likely reflects events above sea level or reference datum.
Treatment: Log-transform (after shifting for negatives) and flag the 10 km default-depth pile-up before modelling.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 1,505
- min: -2.261
- max: 248.7
- mean: 23.71
- median: 10
- std: 28.79
- q1: 10
- q3: 29.1
- iqr: 19.1
- skew: 3.072
- kurtosis: 11.61
- n_outliers: 314
- outlier_rate: 0.08391
- zero_rate: 0.002672
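The depth treatment (shift, log-transform, flag the default depth) can be sketched as below. The function name is hypothetical, the shift uses the reported minimum of -2.261 km, and treating exactly 10.0 km as the default-depth marker is an assumption drawn from the pile-up noted above.

```python
import math


def transform_depth(
    depth_km: float, min_depth: float = -2.261, default_depth: float = 10.0
) -> tuple[float, bool]:
    """Return (log1p of the shifted depth, flag for the assumed 10 km default)."""
    shifted = depth_km - min_depth  # non-negative for every depth >= min_depth
    is_default = math.isclose(depth_km, default_depth, abs_tol=1e-9)
    return math.log1p(shifted), is_default


print(transform_depth(-2.261))  # (0.0, False)
print(transform_depth(10.0)[1])  # True
```

Keeping the flag as a separate feature lets a model distinguish measured depths near 10 km from depths that were likely fixed at the default.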
place
text metadata multilingual
Free-text place descriptions for seismic events, typically formatted as distance + direction + landmark (e.g. '104 km SSW of Nikolski, Alaska'), averaging 29 characters and 6 words. Alaska dominates with 1991 mentions, followed by Canada (472) and Mexico (412), and 'off the coast of Oregon' alone repeats 151 times, driving a 19.8% duplicate rate across 3002 unique values. The multilingual flag is essentially noise: 3719 of 3742 rows are English, with only a handful of de/es/ja/ceb tags likely from place names misread by the detector.
Treatment: Parse into structured fields (distance_km, bearing, reference_place, region) rather than embedding the raw string.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3,002
- len_min: 4
- len_max: 59
- len_mean: 29.47
- len_median: 29
- len_p95: 36
- word_mean: 6.293
- word_median: 6
- n_empty: 0
- n_duplicates: 740
- duplicate_rate: 0.1978
- vocab_size: 1,036
- readability_flesch_mean: 69.91
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.0005345
- allcaps_rate: 0
- boilerplate_rate: 0
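A rough way to derive the `region` field and tally it is sketched below. It assumes the region is whatever follows the last comma, which holds for '104 km SSW of Nikolski, Alaska' but not for comma-free labels such as 'off the coast of Oregon'; the third sample string is hypothetical.

```python
from collections import Counter


def region_of(place: str) -> str:
    """Return the text after the last comma, or the whole string if there is none."""
    return place.rsplit(",", 1)[-1].strip()


places = [
    "104 km SSW of Nikolski, Alaska",  # example quoted in the profile
    "off the coast of Oregon",         # example quoted in the profile
    "12 km N of Sampletown, Alaska",   # hypothetical, for illustration only
]
region_counts = Counter(region_of(p) for p in places)
print(region_counts.most_common())
# [('Alaska', 2), ('off the coast of Oregon', 1)]
```

Comma-free labels would need their own rule (for example, matching a known list of region names) before the counts are comparable to the mention totals above.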
earthquake_type
categorical feature imbalance
Classifies seismic events into earthquake, explosion, or landslide, but is effectively a constant: 3,739 of 3,742 rows (top_rate 0.9992) are 'earthquake', with only 2 explosions and 1 landslide. Entropy of 0.0101 (entropy_ratio 0.0064) confirms there is virtually no information content here.
Treatment: Drop as a predictor, or isolate the 3 non-earthquake rows as anomalies.
- n: 3,742
- nulls: 0 (0.0%)
- unique: 3
- top_value: earthquake
- top_rate: 0.9992
- cardinality: 3
- entropy: 0.01014
- entropy_ratio: 0.006396
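The entropy figures can be reproduced from the three value counts. The sketch below assumes Shannon entropy in bits (log base 2) and an entropy ratio defined as entropy divided by log2 of the cardinality; the profile does not state these definitions, but they match the reported numbers.

```python
import math


def shannon_entropy(counts: list[int]) -> float:
    """Shannon entropy in bits of a discrete distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


counts = [3739, 2, 1]  # earthquake, explosion, landslide
entropy = shannon_entropy(counts)
entropy_ratio = entropy / math.log2(len(counts))  # normalised by the maximum entropy
print(f"{entropy:.4f} {entropy_ratio:.4f}")  # 0.0101 0.0064
```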