usgs significant earthquakes usgs significant earthquakes

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/usgs-significant-earthquakes/usgs_significant_earthquakes.json

Saturn profiled 3,742 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

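[2]:
# Install the profiler (IPython shell escape; valid only inside a notebook cell).
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source file; --llm opts in to the language-model prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/usgs-significant-earthquakes/usgs_significant_earthquakes.json",
    "--findings", "usgs-significant-earthquakes-usgs_significant_earthquakes.json",
    "--llm", "anthropic:claude-opus-4-7",
])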

Summary confidence: high

This dataset contains 3,742 records of significant earthquakes from USGS, with 11 columns covering location (latitude, longitude, place/name), magnitude, depth, and event metadata. Magnitude is tightly clustered between 4.5 and 5.1 (median 4.8) but has a long right tail reaching 8.2, with 184 outliers worth examining for the rare large events. The `depth_km` column is highly skewed (skew 3.07) with a median of 10 km but a max of 248.7 km and 314 outliers, suggesting a mix of shallow and intermediate-depth quakes (nothing here reaches the roughly 300 km usually taken as the start of deep-focus events). Geographically, the data is heavily concentrated around Alaska: 'alaska' appears in 1,991 place names and 'off the coast of Oregon' alone accounts for 151 records, so this is effectively an Alaska-dominated North American sample rather than a global one. Note that the `category` column is constant ('significant_earthquakes') and `earthquake_type` is 99.9% 'earthquake', so neither will be useful for segmentation.

citing: row_count · column_count · depth_km · magnitude · latitude · longitude · name · place · earthquake_type · category

Out[4]:

saturn.schema() · 11 columns

| column | kind | n | null% | unique | alerts |
|---|---|---|---|---|---|
| latitude | numeric | 3,742 | 0.0% | 3,627 | |
| longitude | numeric | 3,742 | 0.0% | 3,668 | |
| name | text | 3,742 | 0.0% | 3,002 | multilingual |
| description | text | 3,742 | 0.0% | 3,591 | near_unique |
| category | categorical | 3,742 | 0.0% | 1 | imbalance |
| date | text | 3,742 | 0.0% | 3,741 | near_unique, one_word, allcaps, short_text |
| country | unknown | 3,742 | 0.0% | | skipped |
| magnitude | numeric | 3,742 | 0.0% | 123 | |
| depth_km | numeric | 3,742 | 0.0% | 1,505 | high_skew, outliers |
| place | text | 3,742 | 0.0% | 3,002 | multilingual |
| earthquake_type | categorical | 3,742 | 0.0% | 3 | imbalance |

Fig 1.
magnitude · Most events fall between 4.5 and 5.1; watch the long right tail toward magnitude 8.2.
Fig 2.
depth_km · Highly skewed — most quakes are shallow near 10 km, but 314 outliers extend to 248 km deep.
Fig 3.
latitude · Latitudes cluster between 41° and 56° N, confirming a northern-hemisphere, Alaska-heavy sample.
Fig 4.
place · Top places are dominated by Alaska and Oregon coast locations, indicating regional concentration.
Fig 5.
earthquake_type · Nearly all records (99.9%) are 'earthquake', with only two explosions and one landslide.
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.

| column | kind | null % |
|---|---|---|
| latitude | numeric | 0.0% |
| longitude | numeric | 0.0% |
| name | text | 0.0% |
| description | text | 0.0% |
| category | categorical | 0.0% |
| date | text | 0.0% |
| country | unknown | 0.0% |
| magnitude | numeric | 0.0% |
| depth_km | numeric | 0.0% |
| place | text | 0.0% |
| earthquake_type | categorical | 0.0% |

Fig 7.
Language mix across all text columns (per-string detection, sampled).
Per-language counts (total 7,480 detected strings).

| lang | count | share |
|---|---|---|
| en | 7438 | 99.4% |
| es | 32 | 0.4% |
| de | 6 | 0.1% |
| ja | 2 | 0.0% |
| ceb | 2 | 0.0% |

Fig 8.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).

| | latitude | longitude | magnitude | depth_km |
|---|---|---|---|---|
| latitude | +1.00 | -0.72 | -0.12 | +0.31 |
| longitude | -0.72 | +1.00 | +0.07 | -0.37 |
| magnitude | -0.12 | +0.07 | +1.00 | -0.03 |
| depth_km | +0.31 | -0.37 | -0.03 | +1.00 |
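
For readers who want to reproduce a coefficient like the ones in the correlation table, here is a minimal standard-library sketch of Pearson's r. The function name `pearson` and the four toy coordinate pairs are illustrative assumptions; they are not Saturn's implementation and not rows from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (NOT rows from the dataset): latitude rises
# as longitude becomes more negative, so r comes out negative.
toy_lat = [20.0, 35.0, 50.0, 65.0]
toy_lon = [-70.0, -110.0, -130.0, -165.0]
r = pearson(toy_lat, toy_lon)
```

A negative `r` on data shaped like this mirrors the sign of the latitude/longitude entry (-0.72) in the table: more northerly events tend to sit further west.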

latitude numeric feature

Geographic latitude coordinate, with values ranging from 20.02 to 69.7975 and a median of 52.40, consistent with locations spanning roughly the tropics to just north of the Arctic Circle. The distribution is left-skewed (skew -0.76), concentrated in northern mid-to-high latitudes (Q1-Q3: 41.34-55.90). Taken alone this range would fit either Europe or North America, but every longitude in the dataset is negative (about -170 to -65), so the concentration is North American. No nulls or outliers, and 3627 unique values across 3742 rows indicate near-row-level granularity.

Treatment: Pair with longitude for geospatial features; consider binning or clustering rather than treating as a plain scalar.

anthropic:claude-opus-4-7 · confidence high
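
The binning suggested in the treatment can be sketched as follows: snap each (latitude, longitude) pair to a fixed-size grid cell and count events per cell. The 5-degree cell size, the helper name `grid_cell`, and the sample points are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=5.0):
    """South-west corner (lat, lon) of the grid cell that contains the point."""
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


# Hypothetical points inside the dataset's reported coordinate ranges.
points = [(52.4, -169.9), (53.1, -168.2), (44.3, -129.0), (20.02, -65.04)]

# Events per grid cell; the first two points share the (50.0, -170.0) cell.
cell_counts = Counter(grid_cell(lat, lon) for lat, lon in points)
```

Fixed-degree cells are simple but shrink in east-west extent at high latitudes, so an equal-area scheme may be preferable for an Alaska-heavy sample like this one.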
Out[14]:

saturn.columns["latitude"].stats

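| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3,627 |
| min | 20.02 |
| max | 69.8 |
| mean | 48.53 |
| median | 52.4 |
| std | 11.58 |
| q1 | 41.34 |
| q3 | 55.9 |
| iqr | 14.56 |
| skew | -0.7591 |
| kurtosis | -0.2887 |
| n_outliers | 0 |
| outlier_rate | 0 |
| zero_rate | 0 |
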
Fig 9.
Distribution of latitude. Vertical dash marks the median.
Histogram bins for latitude (median: 52.3957).

| bin | count |
|---|---|
| 20.02 – 21.26 | 35 |
| 21.26 – 22.51 | 28 |
| 22.51 – 23.75 | 42 |
| 23.75 – 25 | 88 |
| 25 – 26.24 | 91 |
| 26.24 – 27.49 | 35 |
| 27.49 – 28.73 | 43 |
| 28.73 – 29.98 | 39 |
| 29.98 – 31.22 | 52 |
| 31.22 – 32.46 | 83 |
| 32.46 – 33.71 | 52 |
| 33.71 – 34.95 | 26 |
| 34.95 – 36.2 | 67 |
| 36.2 – 37.44 | 43 |
| 37.44 – 38.69 | 54 |
| 38.69 – 39.93 | 22 |
| 39.93 – 41.18 | 131 |
| 41.18 – 42.42 | 52 |
| 42.42 – 43.66 | 96 |
| 43.66 – 44.91 | 177 |
| 44.91 – 46.15 | 4 |
| 46.15 – 47.4 | 7 |
| 47.4 – 48.64 | 34 |
| 48.64 – 49.89 | 117 |
| 49.89 – 51.13 | 151 |
| 51.13 – 52.38 | 288 |
| 52.38 – 53.62 | 423 |
| 53.62 – 54.86 | 349 |
| 54.86 – 56.11 | 223 |
| 56.11 – 57.35 | 201 |
| 57.35 – 58.6 | 75 |
| 58.6 – 59.84 | 104 |
| 59.84 – 61.09 | 116 |
| 61.09 – 62.33 | 117 |
| 62.33 – 63.58 | 120 |
| 63.58 – 64.82 | 33 |
| 64.82 – 66.06 | 35 |
| 66.06 – 67.31 | 25 |
| 67.31 – 68.55 | 29 |
| 68.55 – 69.8 | 35 |

longitude numeric feature

Geographic longitude coordinates, all in the western hemisphere, with values ranging from -169.997 to -65.039 (mean -140.10, median -144.22). The distribution is mildly right-skewed (skew 0.45) with 3668 unique values across 3742 rows, suggesting near-distinct location points. The 26 outliers (0.69%) are the easternmost points. With Q1 = -159.93 and Q3 = -125.10 (IQR 34.84), and assuming the same 1.5×IQR rule named in the depth_km alert, the upper fence sits near -72.8, which matches the 7 + 5 + 14 = 26 rows in the three easternmost histogram bins (-72.91 to -65.04).

Treatment: Pair with latitude for geospatial features (e.g., binning, clustering, or distance computation) rather than treating as a standalone scalar.

anthropic:claude-opus-4-7 · confidence high
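
For the distance computation mentioned in the treatment, a common choice is the haversine great-circle formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; `haversine_km` is an illustrative helper name, not part of Saturn.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
```

Distances like this could feed nearest-neighbour features or a clustering step, which treats the latitude/longitude pair as a single location rather than two unrelated scalars.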
Out[17]:

saturn.columns["longitude"].stats

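| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3,668 |
| min | -170 |
| max | -65.04 |
| mean | -140.1 |
| median | -144.2 |
| std | 21.81 |
| q1 | -159.9 |
| q3 | -125.1 |
| iqr | 34.84 |
| skew | 0.4489 |
| kurtosis | -0.4302 |
| n_outliers | 26 |
| outlier_rate | 0.006948 |
| zero_rate | 0 |
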
Fig 10.
Distribution of longitude. Vertical dash marks the median.
Histogram bins for longitude (median: -144.21715).

| bin | count |
|---|---|
| -170 – -167.4 | 368 |
| -167.4 – -164.7 | 161 |
| -164.7 – -162.1 | 243 |
| -162.1 – -159.5 | 241 |
| -159.5 – -156.9 | 113 |
| -156.9 – -154.3 | 129 |
| -154.3 – -151.6 | 241 |
| -151.6 – -149 | 220 |
| -149 – -146.4 | 77 |
| -146.4 – -143.8 | 83 |
| -143.8 – -141.1 | 29 |
| -141.1 – -138.5 | 48 |
| -138.5 – -135.9 | 41 |
| -135.9 – -133.3 | 34 |
| -133.3 – -130.6 | 129 |
| -130.6 – -128 | 428 |
| -128 – -125.4 | 200 |
| -125.4 – -122.8 | 89 |
| -122.8 – -120.1 | 44 |
| -120.1 – -117.5 | 130 |
| -117.5 – -114.9 | 149 |
| -114.9 – -112.3 | 113 |
| -112.3 – -109.6 | 163 |
| -109.6 – -107 | 134 |
| -107 – -104.4 | 33 |
| -104.4 – -101.8 | 12 |
| -101.8 – -99.15 | 10 |
| -99.15 – -96.53 | 20 |
| -96.53 – -93.9 | 4 |
| -93.9 – -91.28 | 1 |
| -91.28 – -88.65 | 2 |
| -88.65 – -86.03 | 6 |
| -86.03 – -83.41 | 1 |
| -83.41 – -80.78 | 2 |
| -80.78 – -78.16 | 3 |
| -78.16 – -75.53 | 7 |
| -75.53 – -72.91 | 8 |
| -72.91 – -70.29 | 7 |
| -70.29 – -67.66 | 5 |
| -67.66 – -65.04 | 14 |

name text metadata

This 'name' column reads as human-readable seismic event location descriptions (e.g., '104 km SSW of Nikolski, Alaska'), dominated by distance-and-bearing phrases — 'km', 'of', and cardinal directions like 'SSW'/'SSE' top the word list. Of 3742 rows there are only 3002 uniques and a 19.8% duplicate rate, with 'off the coast of Oregon' appearing 151 times, suggesting many events share the same place label. Despite a 'multilingual' flag, 3719 of 3742 strings are detected as English with only a handful tagged de/es/ja/ceb — likely false positives on short toponyms rather than real language mixing.

Treatment: Parse into structured fields (distance_km, bearing, place) rather than embedding the raw string.

anthropic:claude-opus-4-7 · confidence high
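
The parsing step in the treatment can be sketched with a single regular expression. The pattern below is an assumption based only on the quoted example ('104 km SSW of Nikolski, Alaska'); `parse_name` and its field names are illustrative, and strings that do not follow the distance-and-bearing shape fall back to a place-only record.

```python
import re

# Assumed shape: "<distance> km <bearing> of <place>", e.g. "104 km SSW of Nikolski, Alaska".
NAME_RE = re.compile(
    r"^(?P<distance_km>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<bearing>[NSEW]{1,3})\s+of\s+"
    r"(?P<place>.+)$"
)


def parse_name(text):
    """Split a name string into distance_km, bearing and place (None where absent)."""
    cleaned = text.strip()
    match = NAME_RE.match(cleaned)
    if match is None:
        # Labels such as "off the coast of Oregon" carry no distance or bearing.
        return {"distance_km": None, "bearing": None, "place": cleaned}
    return {
        "distance_km": float(match.group("distance_km")),
        "bearing": match.group("bearing"),
        "place": match.group("place"),
    }


parsed = parse_name("104 km SSW of Nikolski, Alaska")
fallback = parse_name("off the coast of Oregon")
```

Keeping the unmatched strings instead of discarding them matters here, because the 151 'off the coast of Oregon' rows would otherwise vanish from the parsed output.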
Out[20]:

saturn.columns["name"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3,002 |
| len_min | 4 |
| len_max | 59 |
| len_mean | 29.47 |
| len_median | 29 |
| len_p95 | 36 |
| word_mean | 6.293 |
| word_median | 6 |
| n_empty | 0 |
| n_duplicates | 740 |
| duplicate_rate | 0.1978 |
| vocab_size | 1,036 |
| readability_flesch_mean | 69.91 |
| emoji_rate | 0 |
| url_rate | 0 |
| one_word_rate | 0.0005345 |
| allcaps_rate | 0 |
| boilerplate_rate | 0 |

alert: multilingual · 6 languages detected in sample

Fig 11.
Character-length distribution for name.
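Character-length distribution for name (mean: 29.47).

| chars | count |
|---|---|
| 4 – 5 | 1 |
| 5 – 7 | 0 |
| 7 – 8 | 1 |
| 8 – 10 | 0 |
| 10 – 11 | 0 |
| 11 – 12 | 0 |
| 12 – 14 | 2 |
| 14 – 15 | 11 |
| 15 – 16 | 40 |
| 16 – 18 | 1 |
| 18 – 19 | 20 |
| 19 – 20 | 19 |
| 20 – 22 | 8 |
| 22 – 23 | 219 |
| 23 – 25 | 60 |
| 25 – 26 | 122 |
| 26 – 27 | 543 |
| 27 – 29 | 499 |
| 29 – 30 | 823 |
| 30 – 32 | 325 |
| 32 – 33 | 362 |
| 33 – 34 | 378 |
| 34 – 36 | 105 |
| 36 – 37 | 40 |
| 37 – 38 | 37 |
| 38 – 40 | 25 |
| 40 – 41 | 34 |
| 41 – 42 | 3 |
| 42 – 44 | 0 |
| 44 – 45 | 15 |
| 45 – 47 | 22 |
| 47 – 48 | 14 |
| 48 – 49 | 3 |
| 49 – 51 | 1 |
| 51 – 52 | 2 |
| 52 – 54 | 5 |
| 54 – 55 | 0 |
| 55 – 56 | 0 |
| 56 – 58 | 0 |
| 58 – 59 | 2 |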

description text free_text

Templated earthquake event descriptions, e.g. magnitude/depth strings ending in locations like Alaska. Lengths are tightly clustered (min 45, max 100, p95 79) and every row contains 'magnitude', '-', and 'depth:', confirming a fixed format. Despite that template, 3591 of 3742 values are unique with only 151 duplicates (4.0%), so the free-text portion (location, magnitude, depth) carries the signal.

Treatment: Parse out magnitude, depth, and location with a regex rather than embedding the templated string.

anthropic:claude-opus-4-7 · confidence high
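
A hedged sketch of the regex extraction suggested in the treatment. The report only states that every row contains 'magnitude', '-', and 'depth:', so the exact template is unknown; the sample string below is invented, and the patterns assume nothing more than a number following each keyword. Location is left out because its position in the real template is not shown.

```python
import re

# Assumption: a number follows the word "magnitude" and the token "depth:".
MAGNITUDE_RE = re.compile(r"magnitude\s+(-?\d+(?:\.\d+)?)", re.IGNORECASE)
DEPTH_RE = re.compile(r"depth:\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_description(text):
    """Pull magnitude and depth (km) out of a templated description, if present."""
    magnitude = MAGNITUDE_RE.search(text)
    depth = DEPTH_RE.search(text)
    return {
        "magnitude": float(magnitude.group(1)) if magnitude else None,
        "depth_km": float(depth.group(1)) if depth else None,
    }


# Hypothetical description; the real template may differ.
sample = "Magnitude 5.1 earthquake - 104 km SSW of Nikolski, Alaska (depth: 10.0 km)"
fields = parse_description(sample)
```

Since `magnitude` and `depth_km` already exist as numeric columns, the parsed values are mostly useful as a consistency check against those columns rather than as new features.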
Out[23]:

saturn.columns["description"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3,591 |
| len_min | 45 |
| len_max | 100 |
| len_mean | 71.71 |
| len_median | 72 |
| len_p95 | 79 |
| word_mean | 12.29 |
| word_median | 12 |
| n_empty | 0 |
| n_duplicates | 151 |
| duplicate_rate | 0.04035 |
| vocab_size | 2,674 |
| readability_flesch_mean | 63.23 |
| emoji_rate | 0 |
| url_rate | 0 |
| one_word_rate | 0 |
| allcaps_rate | 0 |
| boilerplate_rate | 0 |

alert: near_unique · 96.0% of rows are unique strings

Fig 12.
Character-length distribution for description.
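Character-length distribution for description (mean: 71.71).

| chars | count |
|---|---|
| 45 – 46 | 1 |
| 46 – 48 | 1 |
| 48 – 49 | 0 |
| 49 – 50 | 0 |
| 50 – 52 | 0 |
| 52 – 53 | 0 |
| 53 – 55 | 4 |
| 55 – 56 | 4 |
| 56 – 57 | 19 |
| 57 – 59 | 15 |
| 59 – 60 | 20 |
| 60 – 62 | 9 |
| 62 – 63 | 17 |
| 63 – 64 | 158 |
| 64 – 66 | 45 |
| 66 – 67 | 73 |
| 67 – 68 | 380 |
| 68 – 70 | 350 |
| 70 – 71 | 726 |
| 71 – 72 | 389 |
| 72 – 74 | 367 |
| 74 – 75 | 605 |
| 75 – 77 | 204 |
| 77 – 78 | 91 |
| 78 – 79 | 90 |
| 79 – 81 | 21 |
| 81 – 82 | 33 |
| 82 – 84 | 8 |
| 84 – 85 | 25 |
| 85 – 86 | 31 |
| 86 – 88 | 24 |
| 88 – 89 | 10 |
| 89 – 90 | 7 |
| 90 – 92 | 1 |
| 92 – 93 | 3 |
| 93 – 94 | 5 |
| 94 – 96 | 1 |
| 96 – 97 | 2 |
| 97 – 99 | 1 |
| 99 – 100 | 2 |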

category categorical metadata

This column tags every one of the 3,742 rows with the single value "significant_earthquakes", giving cardinality 1 and entropy 0. It carries no information for modelling and most likely encodes the dataset's source or slice rather than a per-row attribute.

Treatment: Drop before modelling; retain only as a dataset-level provenance tag.

anthropic:claude-opus-4-7 · confidence high
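
The drop suggested in the treatment can be generalised into a small check for constant columns. This is a sketch over a list of row dictionaries; the helper names and the two sample rows are hypothetical, and values are assumed to be hashable.

```python
def constant_columns(rows):
    """Names of columns that hold a single distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row.get(col) for row in rows}) == 1]


def drop_columns(rows, to_drop):
    """Copy of rows without the named columns."""
    to_drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# Hypothetical rows using column names from the schema above.
rows = [
    {"category": "significant_earthquakes", "magnitude": 4.8, "earthquake_type": "earthquake"},
    {"category": "significant_earthquakes", "magnitude": 5.1, "earthquake_type": "explosion"},
]
constants = constant_columns(rows)
cleaned = drop_columns(rows, constants)
```

The constant value itself is worth recording once at dataset level before dropping, which keeps the provenance tag the treatment asks for.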
Out[26]:

saturn.columns["category"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 1 |
| top_value | significant_earthquakes |
| top_rate | 1 |
| cardinality | 1 |
| entropy | 0 |
| entropy_ratio | 0 |

alert: imbalance · top value is 100.0% of rows

Fig 13.
Top values for category.
Top values for category (1 unique shown, of 1 total).

| value | count | share |
|---|---|---|
| significant_earthquakes | 3742 | 100.0% |

date text identifier

Despite the name, this column holds 18-19 character single-token strings rather than parseable dates: every value ends in '-01-01', and the prefix appears to be an epoch-style integer (12 or 13 digits, given the 6-character suffix and the 18-19 character lengths). With 3741 unique values across 3742 rows (one duplicate) and a 100% one-word rate, it behaves as a near-unique identifier, not a temporal feature. The 100% all-caps rate is presumably a side effect of the strings containing no lowercase letters.

Treatment: Drop as-is or parse the leading numeric prefix into a real timestamp before any time-based use.

anthropic:claude-opus-4-7 · confidence high
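
If the numeric prefix really is a Unix timestamp in milliseconds (an assumption; the report only calls it "epoch-style"), the parsing step in the treatment could look like the sketch below. The helper name and the example value are invented for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout (hypothetical): "<epoch milliseconds>-01-01".
DATE_RE = re.compile(r"(\d+)-01-01")
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_prefix(value):
    """Return a UTC datetime from the leading millisecond count, or None if the shape differs."""
    match = DATE_RE.fullmatch(value.strip())
    if match is None:
        return None
    # Integer timedelta arithmetic avoids float rounding on large millisecond counts.
    return UNIX_EPOCH + timedelta(milliseconds=int(match.group(1)))


# Made-up value with a 13-digit prefix, not taken from the dataset.
example = parse_epoch_prefix("1577836800000-01-01")
```

Under the millisecond assumption, the 180 values with 18 characters (12-digit prefixes) would all fall before September 2001, which is a quick plausibility check to run against the real data.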
Out[29]:

saturn.columns["date"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3,741 |
| len_min | 18 |
| len_max | 19 |
| len_mean | 18.95 |
| len_median | 19 |
| len_p95 | 19 |
| word_mean | 1 |
| word_median | 1 |
| n_empty | 0 |
| n_duplicates | 1 |
| duplicate_rate | 0.0002672 |
| vocab_size | 3,741 |
| readability_flesch_mean | 121.2 |
| emoji_rate | 0 |
| url_rate | 0 |
| one_word_rate | 1 |
| allcaps_rate | 1 |
| boilerplate_rate | 0 |

alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 100.0% rows are a single word
alert: allcaps · 100.0% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars

Fig 14.
Character-length distribution for date.
Character-length distribution for date (mean: 18.95). The 40 histogram bins span lengths 18 to 19; only the first and last bins are non-empty.

| chars | count |
|---|---|
| 18 | 180 |
| 19 | 3562 |

country unknown metadata

This column is labelled "country" and has 3742 fully populated rows with no nulls, but saturn skipped detailed profiling so kind, uniqueness, and value distribution are unknown. Without cardinality or sample values, it is impossible to confirm whether entries are ISO codes, full names, or mixed representations. Treat the absence of stats as the main signal here.

Treatment: Re-profile with type detection enabled to confirm cardinality and standardise country representation before use.

anthropic:claude-opus-4-7 · confidence low
Out[32]:

saturn.columns["country"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | |

alert: skipped · no profiler for kind=unknown

magnitude numeric feature

This is a numeric magnitude field, almost certainly earthquake magnitudes given the floor at 4.5 and ceiling at 8.2. Values cluster tightly (median 4.8, IQR 0.5) but the distribution is heavily right-skewed (skew 1.97, kurtosis 5.58) with 184 outliers (4.9%) in the upper tail. Only 123 unique values across 3742 rows points to coarse reporting precision; a strict one-decimal grid from 4.5 to 8.2 would allow at most 38 distinct values, so some magnitudes must carry two or more decimals.

Treatment: Log-transform or bucket into magnitude bands before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
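
Two pieces of the reasoning above can be sketched in code: the outlier fences and the banding treatment. The fences assume Saturn applies the same 1.5×IQR rule named in the depth_km alert. The band labels and cut points follow commonly used magnitude classes and are illustrative choices, not something the report specifies.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def magnitude_band(magnitude):
    """Map a magnitude to an illustrative band label (assumed cut points)."""
    if magnitude < 5.0:
        return "light"
    if magnitude < 6.0:
        return "moderate"
    if magnitude < 7.0:
        return "strong"
    if magnitude < 8.0:
        return "major"
    return "great"


# Quartiles reported for magnitude: q1 = 4.6, q3 = 5.1.
low_fence, high_fence = iqr_fences(4.6, 5.1)  # roughly 3.85 and 5.85

# Median and maximum from the stats table.
bands = [magnitude_band(m) for m in (4.8, 8.2)]
```

With an upper fence near 5.85 and a floor of 4.5, every flagged outlier sits in the upper tail, which agrees with the prose. Magnitude is already a logarithmic scale, so banding may be the gentler of the two suggested treatments.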
Out[34]:

saturn.columns["magnitude"].stats

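| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 123 |
| min | 4.5 |
| max | 8.2 |
| mean | 4.917 |
| median | 4.8 |
| std | 0.462 |
| q1 | 4.6 |
| q3 | 5.1 |
| iqr | 0.5 |
| skew | 1.97 |
| kurtosis | 5.583 |
| n_outliers | 184 |
| outlier_rate | 0.04917 |
| zero_rate | 0 |
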
Fig 15.
Distribution of magnitude. Vertical dash marks the median.
Histogram bins for magnitude (median: 4.8).

| bin | count |
|---|---|
| 4.5 – 4.593 | 752 |
| 4.593 – 4.685 | 601 |
| 4.685 – 4.777 | 445 |
| 4.777 – 4.87 | 340 |
| 4.87 – 4.963 | 286 |
| 4.963 – 5.055 | 254 |
| 5.055 – 5.147 | 218 |
| 5.147 – 5.24 | 177 |
| 5.24 – 5.332 | 137 |
| 5.332 – 5.425 | 104 |
| 5.425 – 5.518 | 85 |
| 5.518 – 5.61 | 66 |
| 5.61 – 5.702 | 53 |
| 5.702 – 5.795 | 3 |
| 5.795 – 5.887 | 38 |
| 5.887 – 5.98 | 46 |
| 5.98 – 6.072 | 25 |
| 6.072 – 6.165 | 17 |
| 6.165 – 6.257 | 16 |
| 6.257 – 6.35 | 11 |
| 6.35 – 6.442 | 15 |
| 6.442 – 6.535 | 11 |
| 6.535 – 6.627 | 10 |
| 6.627 – 6.72 | 4 |
| 6.72 – 6.812 | 6 |
| 6.812 – 6.905 | 5 |
| 6.905 – 6.997 | 0 |
| 6.997 – 7.09 | 3 |
| 7.09 – 7.182 | 3 |
| 7.182 – 7.275 | 3 |
| 7.275 – 7.367 | 1 |
| 7.367 – 7.46 | 0 |
| 7.46 – 7.552 | 1 |
| 7.552 – 7.645 | 1 |
| 7.645 – 7.737 | 0 |
| 7.737 – 7.83 | 2 |
| 7.83 – 7.922 | 2 |
| 7.922 – 8.015 | 0 |
| 8.015 – 8.107 | 0 |
| 8.107 – 8.2 | 1 |

depth_km numeric feature

Numeric depth measurements in kilometers, almost certainly earthquake hypocenter depths given the median of 10.0 and range from -2.261 to 248.7. The distribution is heavily right-skewed (skew 3.07, kurtosis 11.6) with 314 outliers (8.4%) and a Q1 equal to the median at 10.0, suggesting many events are pinned to a default 10 km depth. The negative minimum (-2.261) likely reflects an event located above sea level or above the reference datum.

Treatment: Log-transform (after shifting for negatives) and flag the 10 km default-depth pile-up before modelling.

anthropic:claude-opus-4-7 · confidence high
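
A minimal sketch of the treatment: shift the depths so the minimum becomes zero, apply `log1p`, and flag values sitting exactly on the suspected 10 km default. The function names, the tolerance, and the sample depths are assumptions; the 10 km default itself is only inferred in the prose above.

```python
import math


def shifted_log(depths):
    """log1p of depths after shifting so that the smallest value maps to zero."""
    shift = max(0.0, -min(depths))  # only shift when negatives are present
    return [math.log1p(d + shift) for d in depths]


def is_default_depth(depth_km, default=10.0, tol=1e-9):
    """True when a depth equals the suspected fixed default (assumed to be 10 km)."""
    return abs(depth_km - default) <= tol


# Illustrative depths: the reported min, zero, the median, and the reported max.
depths = [-2.261, 0.0, 10.0, 248.7]
transformed = shifted_log(depths)
default_flags = [is_default_depth(d) for d in depths]
```

Keeping the flag as its own feature lets a model separate genuinely shallow events from ones whose depth was possibly fixed rather than measured.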
Out[37]:

saturn.columns["depth_km"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 1,505 |
| min | -2.261 |
| max | 248.7 |
| mean | 23.71 |
| median | 10 |
| std | 28.79 |
| q1 | 10 |
| q3 | 29.1 |
| iqr | 19.1 |
| skew | 3.072 |
| kurtosis | 11.61 |
| n_outliers | 314 |
| outlier_rate | 0.08391 |
| zero_rate | 0.002672 |

alert: high_skew · skew=+3.07
alert: outliers · 8.4% rows beyond 1.5 IQR

Fig 16.
Distribution of depth_km. Vertical dash marks the median.
Histogram bins for depth_km (median: 10.0).

| bin | count |
|---|---|
| -2.261 – 4.013 | 219 |
| 4.013 – 10.29 | 1730 |
| 10.29 – 16.56 | 370 |
| 16.56 – 22.84 | 258 |
| 22.84 – 29.11 | 230 |
| 29.11 – 35.38 | 250 |
| 35.38 – 41.66 | 167 |
| 41.66 – 47.93 | 129 |
| 47.93 – 54.21 | 56 |
| 54.21 – 60.48 | 31 |
| 60.48 – 66.75 | 43 |
| 66.75 – 73.03 | 27 |
| 73.03 – 79.3 | 19 |
| 79.3 – 85.58 | 29 |
| 85.58 – 91.85 | 21 |
| 91.85 – 98.12 | 19 |
| 98.12 – 104.4 | 24 |
| 104.4 – 110.7 | 12 |
| 110.7 – 116.9 | 9 |
| 116.9 – 123.2 | 14 |
| 123.2 – 129.5 | 13 |
| 129.5 – 135.8 | 19 |
| 135.8 – 142 | 14 |
| 142 – 148.3 | 5 |
| 148.3 – 154.6 | 6 |
| 154.6 – 160.9 | 0 |
| 160.9 – 167.1 | 7 |
| 167.1 – 173.4 | 4 |
| 173.4 – 179.7 | 0 |
| 179.7 – 186 | 1 |
| 186 – 192.2 | 5 |
| 192.2 – 198.5 | 1 |
| 198.5 – 204.8 | 3 |
| 204.8 – 211.1 | 2 |
| 211.1 – 217.3 | 3 |
| 217.3 – 223.6 | 1 |
| 223.6 – 229.9 | 0 |
| 229.9 – 236.2 | 0 |
| 236.2 – 242.4 | 0 |
| 242.4 – 248.7 | 1 |

place text metadata

Free-text place descriptions for seismic events, typically formatted as distance + direction + landmark (e.g. '104 km SSW of Nikolski, Alaska'), averaging 29 characters and 6 words. Alaska dominates with 1991 mentions, followed by Canada (472) and Mexico (412), and 'off the coast of Oregon' alone repeats 151 times, driving a 19.8% duplicate rate across 3002 unique values. The multilingual flag is essentially noise: 3719 of 3742 rows are English, with only a handful of de/es/ja/ceb tags likely from place names misread by the detector.

Treatment: Parse into structured fields (distance_km, bearing, reference_place, region) rather than embedding the raw string.

anthropic:claude-opus-4-7 · confidence high
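
The `region` field named in the treatment can be approximated by taking the text after the last comma, which is how counts like the Alaska, Canada and Mexico mentions above could be tallied. This is a sketch under that assumption; `region_of` is an illustrative helper, and the third sample string is invented.

```python
from collections import Counter


def region_of(place):
    """Text after the last comma, or the whole string when there is no comma."""
    return place.rsplit(",", 1)[-1].strip()


places = [
    "104 km SSW of Nikolski, Alaska",   # example quoted in the report
    "off the coast of Oregon",          # repeated label quoted in the report
    "12 km N of Anchorage, Alaska",     # hypothetical value for illustration
]
region_counts = Counter(region_of(p) for p in places)
```

Comma-free labels such as 'off the coast of Oregon' come back unchanged, so they still need a separate rule (for example, matching on a list of state or country names) before they roll up into a region.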
Out[40]:

saturn.columns["place"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3,002 |
| len_min | 4 |
| len_max | 59 |
| len_mean | 29.47 |
| len_median | 29 |
| len_p95 | 36 |
| word_mean | 6.293 |
| word_median | 6 |
| n_empty | 0 |
| n_duplicates | 740 |
| duplicate_rate | 0.1978 |
| vocab_size | 1,036 |
| readability_flesch_mean | 69.91 |
| emoji_rate | 0 |
| url_rate | 0 |
| one_word_rate | 0.0005345 |
| allcaps_rate | 0 |
| boilerplate_rate | 0 |

alert: multilingual · 6 languages detected in sample

Fig 17.
Character-length distribution for place.
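Character-length distribution for place (mean: 29.47).

| chars | count |
|---|---|
| 4 – 5 | 1 |
| 5 – 7 | 0 |
| 7 – 8 | 1 |
| 8 – 10 | 0 |
| 10 – 11 | 0 |
| 11 – 12 | 0 |
| 12 – 14 | 2 |
| 14 – 15 | 11 |
| 15 – 16 | 40 |
| 16 – 18 | 1 |
| 18 – 19 | 20 |
| 19 – 20 | 19 |
| 20 – 22 | 8 |
| 22 – 23 | 219 |
| 23 – 25 | 60 |
| 25 – 26 | 122 |
| 26 – 27 | 543 |
| 27 – 29 | 499 |
| 29 – 30 | 823 |
| 30 – 32 | 325 |
| 32 – 33 | 362 |
| 33 – 34 | 378 |
| 34 – 36 | 105 |
| 36 – 37 | 40 |
| 37 – 38 | 37 |
| 38 – 40 | 25 |
| 40 – 41 | 34 |
| 41 – 42 | 3 |
| 42 – 44 | 0 |
| 44 – 45 | 15 |
| 45 – 47 | 22 |
| 47 – 48 | 14 |
| 48 – 49 | 3 |
| 49 – 51 | 1 |
| 51 – 52 | 2 |
| 52 – 54 | 5 |
| 54 – 55 | 0 |
| 55 – 56 | 0 |
| 56 – 58 | 0 |
| 58 – 59 | 2 |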

earthquake_type categorical feature

Classifies seismic events into earthquake, explosion, or landslide, but is effectively a constant: 3,739 of 3,742 rows (top_rate 0.9992) are 'earthquake', with only 2 explosions and 1 landslide. Entropy of 0.0101 (entropy_ratio 0.0064) confirms there is virtually no information content here.

Treatment: Drop as a predictor, or isolate the 3 non-earthquake rows as anomalies.

anthropic:claude-opus-4-7 · confidence high
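
The entropy figures quoted above can be reproduced from the three value counts. The sketch assumes entropy is measured in bits and that entropy_ratio divides by log2 of the cardinality; both assumptions are consistent with the reported 0.01014 and 0.006396, but the report does not state the formula.

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Counts from the top-values table: earthquake, explosion, landslide.
type_counts = [3739, 2, 1]
entropy = shannon_entropy_bits(type_counts)

# Assumed normalisation: divide by the maximum entropy for 3 categories.
entropy_ratio = entropy / math.log2(len(type_counts))
```

A ratio this close to zero means the column is far from the uniform case (ratio 1), which is why the treatment recommends dropping it or handling the three non-earthquake rows separately.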
Out[43]:

saturn.columns["earthquake_type"].stats

| stat | value |
|---|---|
| n | 3,742 |
| nulls | 0 (0.0%) |
| unique | 3 |
| top_value | earthquake |
| top_rate | 0.9992 |
| cardinality | 3 |
| entropy | 0.01014 |
| entropy_ratio | 0.006396 |

alert: imbalance · top value is 99.9% of rows

Fig 18.
Top values for earthquake_type.
Top values for earthquake_type (3 unique shown, of 3 total).

| value | count | share |
|---|---|---|
| earthquake | 3739 | 99.9% |
| explosion | 2 | 0.1% |
| landslide | 1 | 0.0% |

How to cite

BibTeX
@misc{saturn-usgs-significant-earthquakes-usgs-significant-earthquakes-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: usgs significant earthquakes usgs significant earthquakes},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/usgs-significant-earthquakes-usgs_significant_earthquakes}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: usgs significant earthquakes usgs significant earthquakes. Source: /home/coolhand/datasets/usgs-significant-earthquakes/usgs_significant_earthquakes.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/usgs-significant-earthquakes-usgs_significant_earthquakes