saturn

/home/coolhand/html/datavis/data_trove/data/wild/usgs_significant_earthquakes.json · 3,742 rows · sample n=3,742 · seed 42 · 2026-06-22T00:24:00+00:00

Overview

Source            /home/coolhand/html/datavis/data_trove/data/wild/usgs_significant_earthquakes.json
Total rows        3,742
Profiled sample   3,742
Columns           11
Generated         2026-06-22T00:24:00+00:00
Per-column null rate across the corpus.
column            kind          null %
latitude          numeric       0.0%
longitude         numeric       0.0%
name              text          0.0%
description       text          0.0%
category          categorical   0.0%
date              text          0.0%
country           unknown       0.0%
magnitude         numeric       0.0%
depth_km          numeric       0.0%
place             text          0.0%
earthquake_type   categorical   0.0%
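The report does not say how saturn counts nulls. A minimal sketch, assuming the JSON file has been loaded (for example with `json.load`) into a list of flat objects, and that a missing key or a JSON `null` both count as null; `null_rates` is a hypothetical helper, not part of saturn:

```python
from typing import Any


def null_rates(records: list[dict[str, Any]]) -> dict[str, float]:
    """Share of records in which each column is missing or None (assumed definition)."""
    columns: set[str] = set()
    for rec in records:
        columns.update(rec)  # collect every key seen in any record
    n = len(records)
    rates: dict[str, float] = {}
    for col in sorted(columns):
        missing = sum(1 for rec in records if rec.get(col) is None)
        rates[col] = missing / n if n else 0.0
    return rates


rows = [
    {"magnitude": 5.8, "depth_km": 10.0},
    {"magnitude": 4.6, "depth_km": None},
]
print(null_rates(rows))  # {'depth_km': 0.5, 'magnitude': 0.0}
```

A 0.0% rate for every column, as reported above, means every key is present and non-null in all 3,742 records under this definition.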

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset contains 3,742 records of significant earthquakes catalogued by the USGS, each describing a seismic event with location, magnitude, depth, and type. The vast majority (99.9%) are classified as earthquakes, with just 2 explosions and 1 landslide, so event type is not a useful differentiator. Two things stand out for closer inspection. First, depth_km is heavily right-skewed (median 10 km, mean 23.7 km, max 248.7 km), with 314 values beyond 1.5 × IQR, suggesting a small but important subset of unusually deep earthquakes worth isolating. Second, the events are geographically concentrated: Alaska dominates the place names (appearing in roughly 1,991 records) and 'off the coast of Oregon' is the single most repeated location (151 times), pointing to a strong Alaskan and Pacific Northwest bias in this 'significant' events catalog. Magnitude ranges from 4.5 to 8.2 with a median of 4.8 and a long upper tail, so truly destructive events are rare outliers worth flagging.

date high anthropic:default

This column is named 'date' but contains no valid calendar dates. Each value is an integer that looks like a Unix timestamp in milliseconds (e.g., '1614452365296') with '-01-01' appended, as if the timestamp had been written into the year field of a date string, so this is a malformed datetime field. Most values have 13 digits (19 characters in total); 180 have 12 digits (18 characters), which as epoch milliseconds would fall before 2001-09-09. With 3,741 unique values across 3,742 rows (a single duplicate), the column functions almost as a row identifier. The one duplicate ('1614452365296-01-01' appearing twice) is the only anomaly in an otherwise fully unique set.
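If the leading digits really are epoch milliseconds (an assumption drawn from their magnitude, not from any USGS documentation), the original timestamps can be recovered by dropping the '-01-01' suffix. A minimal sketch; `parse_epoch_ms_date` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_ms_date(raw: str) -> datetime:
    """Recover a UTC datetime from a value such as '1614452365296-01-01'.

    Assumes the leading digits are milliseconds since the Unix epoch and the
    trailing '-01-01' is a formatting artifact that carries no information.
    """
    digits, sep, suffix = raw.partition("-")
    if not sep or suffix != "01-01" or not digits.isdigit():
        raise ValueError(f"unexpected date format: {raw!r}")
    # timedelta arithmetic keeps the millisecond part exact (no float rounding).
    return EPOCH + timedelta(milliseconds=int(digits))


print(parse_epoch_ms_date("1614452365296-01-01").isoformat())
# 2021-02-27T18:59:25.296000+00:00
print(parse_epoch_ms_date("977445639208-01-01").isoformat())
# 2000-12-22T00:40:39.208000+00:00
```

Values that do not match the pattern raise `ValueError` instead of being silently coerced, so any row that breaks the assumed format surfaces immediately.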

depth_km high anthropic:default

This column records the depth of each seismic event in kilometres, as expected for an earthquake catalogue. The distribution is severely right-skewed (skew 3.07, kurtosis 11.61). The median is just 10.0 km, a conventional default depth assigned when depth is poorly constrained, and Q1 is also exactly 10.0 km, suggesting that a large fraction of records are pinned to that default value. The tail extends to 248.7 km, with 314 values (8.4%) beyond 1.5 × IQR, and a small negative minimum (−2.261 km) indicates events located above sea level or an instrument offset that may need review.
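The 8.4% figure counts values beyond 1.5 × IQR (Tukey fences). With the reported quartiles, the upper fence for depth_km is 29.102 + 1.5 × 19.102 ≈ 57.76 km, and the lower fence (about −18.65 km) lies below the minimum, so the 314 outliers are all events deeper than roughly 58 km. A sketch that reproduces the fences and counts outliers and default-depth rows; it assumes Python's `statistics.quantiles` with the inclusive method, and since saturn's quantile method is not documented, counts on the raw data may differ slightly:

```python
import statistics


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values: list[float], k: float = 1.5) -> int:
    """Count values outside the Tukey fences computed from the data itself."""
    # quantiles(n=4) returns [Q1, median, Q3]; "inclusive" is an assumed method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < low or v > high)


def share_at(values: list[float], target: float) -> float:
    """Fraction of values exactly equal to target, e.g. the 10 km default depth."""
    return sum(1 for v in values if v == target) / len(values)


low, high = tukey_fences(10.0, 29.102)  # quartiles reported for depth_km
print(round(low, 3), round(high, 3))  # -18.653 57.755
```

Running `share_at(depths, 10.0)` on the full column would show how much of the distribution sits on the default value, which is what pushes Q1 and the median to the same number.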

category high anthropic:default

This column is a dataset-level category tag indicating the source or filter applied to all 3,742 rows — every single record carries the value 'significant_earthquakes' with a top_rate of 1.0 and entropy of 0.0. It is a constant column with zero discriminative power. The imbalance alert confirms it: cardinality is 1, meaning this field adds no within-dataset information whatsoever.
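Constant columns like this one are usually dropped before analysis. A minimal sketch of detecting them, assuming the records are a list of flat dicts; `constant_columns` is a hypothetical helper:

```python
from typing import Any


def constant_columns(records: list[dict[str, Any]]) -> list[str]:
    """Names of columns holding a single distinct value across all records."""
    if not records:
        return []
    columns = sorted(set().union(*records))  # iterating a dict yields its keys
    result = []
    for col in columns:
        # repr() lets unhashable values (lists, dicts) go into the set;
        # a key missing from some records is treated as None.
        distinct = {repr(rec.get(col)) for rec in records}
        if len(distinct) == 1:
            result.append(col)
    return result


rows = [
    {"category": "significant_earthquakes", "magnitude": 5.8},
    {"category": "significant_earthquakes", "magnitude": 4.6},
]
print(constant_columns(rows))  # ['category']
```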

earthquake_type high anthropic:default

This column classifies seismic event types, with three observed values: 'earthquake', 'explosion', and 'landslide'. It is severely imbalanced: 'earthquake' accounts for 3,739 of 3,742 records (99.92%), leaving only 2 explosions and 1 landslide. The near-zero entropy (0.0101) and entropy_ratio (0.0064) confirm that the column carries almost no information, which makes it nearly useless as a predictive feature without special handling.
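The reported figures are consistent with Shannon entropy in bits, normalised by log2 of the number of categories: counts of 3,739 / 2 / 1 give about 0.0101 bits, and 0.0101 / log2(3) ≈ 0.0064. That definition is inferred from the numbers rather than documented by saturn. The sketch also shows why the category column's entropy is printed as -0.000: negating a sum of zeros yields a floating-point negative zero.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy (base 2) of a categorical distribution given its counts."""
    total = sum(counts)
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h + 0.0  # adding 0.0 turns -0.0 (single category) into 0.0


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2 of the number of observed categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


type_counts = [3739, 2, 1]  # earthquake, explosion, landslide
print(round(entropy_bits(type_counts), 4))   # 0.0101
print(round(entropy_ratio(type_counts), 4))  # 0.0064
print(entropy_bits([3742]))                  # 0.0 (single-valued column)
```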

description high anthropic:default

This column contains templated descriptions of seismic events: auto-generated strings of the form 'Magnitude X earthquake - <place>. Depth: Ykm.'. The top words confirm every row references 'earthquake', 'magnitude', 'depth:', and 'km', making this a templated field rather than free prose (mean length 71.7 chars, mean 12.3 words, Flesch readability 63.2). One surprising signal: 151 of the 3,742 rows (4.0%) repeat a description that already appears elsewhere, meaning distinct events share identical text, likely collisions on rounded magnitude, depth, and location values. The vocabulary of only 2,674 unique words across 3,742 rows further confirms the highly templated, low-diversity nature of the text.
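Because the text is templated, it can be parsed back into fields. A sketch assuming every row follows the pattern visible in the ten sample values (`Magnitude X <type> - <place>. Depth: Ykm.`); the regex and `parse_description` are illustrative, and rows for explosions or landslides may be worded differently:

```python
import re
from typing import NamedTuple, Optional

# Template inferred from the sample values; not a documented USGS format.
DESCRIPTION_RE = re.compile(
    r"^Magnitude (?P<mag>\d+(?:\.\d+)?) (?P<kind>\w+) - (?P<place>.+)\. "
    r"Depth: (?P<depth>-?\d+(?:\.\d+)?)km\.$"
)


class ParsedDescription(NamedTuple):
    magnitude: float
    kind: str
    place: str
    depth_km: float


def parse_description(text: str) -> Optional[ParsedDescription]:
    """Split a description into its parts, or return None if it breaks the template."""
    m = DESCRIPTION_RE.match(text)
    if m is None:
        return None
    return ParsedDescription(float(m["mag"]), m["kind"], m["place"], float(m["depth"]))


print(parse_description(
    "Magnitude 4.77 earthquake - 8 km NNW of Tahoe Vista, California. Depth: -2.261km."
))
```

Comparing the parsed magnitude and depth against the magnitude and depth_km columns is a cheap consistency check, and rows that return None reveal where the template assumption fails.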

name high anthropic:default

This column contains geographic location descriptions for seismic events, formatted as a distance and bearing from a known landmark (e.g., '104 km SSW of Nikolski, Alaska'). The dominant region is Alaska, appearing in 1,991 of the 3,742 entries, with Canada and Mexico also frequent. Only 3,002 of the 3,742 values are unique, leaving 740 duplicates (19.8% duplicate rate), driven by repeated location labels such as 'off the coast of Oregon' (151 occurrences); many events evidently cluster in the same geographic zones. A multilingual alert fires, but non-English entries are negligible (21 rows across de/es/ja/ceb vs. 3,719 en), so this is not a practical concern.
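Region counts like the Alaska figure can be approximated by taking the text after the last comma. A sketch on the ten sample values; `region_of` is hypothetical, and the report does not say how the 1,991 figure was computed (a substring search for 'Alaska' would also count 'Gulf of Alaska', which this suffix rule keeps separate):

```python
from collections import Counter


def region_of(place: str) -> str:
    """Text after the last comma, e.g. 'Alaska' in '88 km N of Yakutat, Alaska'.

    Strings without a comma (such as 'Gulf of Alaska') are returned unchanged.
    """
    return place.rsplit(",", 1)[-1].strip()


samples = [
    "88 km N of Yakutat, Alaska",
    "100 km SW of Topolobampo, Mexico",
    "Gulf of Alaska",
    "8 km NNW of Tahoe Vista, California",
    "88 km SSW of Nikolski, Alaska",
    "33 km NE of Chignik, Alaska",
    "204 km W of Port McNeill, Canada",
    "69 km S of Nikolski, Alaska",
    "181 km W of Ferndale, California",
    "47 km NW of Ninilchik, Alaska",
]
counts = Counter(region_of(p) for p in samples)
print(counts.most_common(2))  # [('Alaska', 5), ('California', 2)]
```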

place high anthropic:default

This column contains human-readable location descriptions for seismic events, predominantly structured as distance-and-bearing strings (e.g., '104 km SSW of Nikolski, Alaska'). Nearly all 3,742 entries are in English (3,719); the handful of Spanish, German, Cebuano, and Japanese detections is enough to trigger a multilingual alert. The dominant location is 'off the coast of Oregon', appearing 151 times, far more than any other value, and the 740 duplicate entries (19.8% duplicate rate) suggest event clustering at recurring geographic hotspots. Directional abbreviations (SSW, SSE, SE, W) and 'km' (3,220 occurrences) dominate the vocabulary, consistent with a standardized naming convention stored as free text.
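The name and place column profiles in this report show identical statistics (3,002 unique values, 740 duplicates, the same length distribution) and the same first ten sample values, which suggests, but does not prove, that one column is a copy of the other. A minimal row-by-row check, assuming the records are flat dicts; `count_mismatches` is hypothetical:

```python
from typing import Any


def count_mismatches(records: list[dict[str, Any]], a: str, b: str) -> int:
    """Number of records where columns a and b differ; 0 means exact duplicates."""
    return sum(1 for rec in records if rec.get(a) != rec.get(b))


rows = [
    {"name": "Gulf of Alaska", "place": "Gulf of Alaska"},
    {"name": "88 km N of Yakutat, Alaska", "place": "88 km N of Yakutat, Alaska"},
]
print(count_mismatches(rows, "name", "place"))  # 0
```

A result of 0 over all 3,742 records would justify dropping one of the two columns.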

country low anthropic:default

This column holds 3,742 values with no nulls, but the profiler has no handler for its inferred kind ('no profiler for kind=unknown'), so its cardinality, value distribution, and encoding (ISO codes vs. full names) are unknown. Without those stats, skew, dominant categories, and anomalies cannot be assessed.
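The report does not say why this column was classified as kind=unknown; one plausible cause (an assumption) is mixed or non-scalar values. A minimal stand-in profile that reports cardinality, the Python types present, and the top values; `quick_profile` is hypothetical:

```python
from collections import Counter
from typing import Any


def quick_profile(values: list[Any], top: int = 5) -> dict[str, Any]:
    """Minimal stand-in profile for a column the profiler could not classify."""
    present = [v for v in values if v is not None]
    # repr() makes unhashable values (lists, dicts) countable.
    keys = [v if isinstance(v, (str, int, float, bool)) else repr(v) for v in present]
    counts = Counter(keys)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "cardinality": len(counts),
        "python_types": sorted({type(v).__name__ for v in present}),
        "top_values": counts.most_common(top),
    }


print(quick_profile(["US", "US", None, "CA", ["US", "CA"]]))
```

If `python_types` lists more than one type, that would explain why a kind-based profiler skipped the column.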

latitude high anthropic:default

This column contains geographic latitude coordinates ranging from 20.02° (subtropical) to 69.797° (Arctic), all in the Northern Hemisphere. Because every longitude in the dataset is negative (about −170° to −65°), these points lie in North America and the adjacent North Pacific, not in Europe or Asia. The median of 52.4° and mean of 48.5°, together with the histogram peak between roughly 50° and 56°, are consistent with the Alaskan and Aleutian records that dominate the place names. The moderate left skew (−0.76) reflects a longer tail toward lower latitudes, and the slightly negative kurtosis (−0.29) indicates a broad distribution with no IQR outliers. High cardinality (3,627 unique values out of 3,742 rows) confirms these are precise geospatial coordinates rather than bucketed regions.

longitude high anthropic:default

This column contains geographic longitude values, all negative, placing every event in the Western Hemisphere. The range spans from approximately −170° (the Aleutian Islands, near the International Date Line) to −65° (eastern US/Caribbean region), with a mean near −140° and median near −144°, reflecting a heavy concentration of events in Alaska and the North Pacific. The distribution is mildly right-skewed (skew 0.45) and mildly platykurtic (kurtosis −0.43), but the histogram is far from uniform: the busiest bins are −170° to −167.4° (368 rows) and −130.6° to −128° (428 rows), with a secondary band between about −120° and −107°. Only 26 values (0.69%) fall outside the 1.5 × IQR fences; since the lower fence lies below −180°, all of them are on the eastern side, east of about −72.8°. With 3,668 unique values out of 3,742 rows, coordinates appear precise and largely non-duplicated.

magnitude high anthropic:default

This column represents seismic event magnitude, almost certainly on the Richter or moment magnitude scale, given the range of 4.5–8.2 and mean of about 4.92. Despite 3,742 records, only 123 unique values appear. Values seem to be reported mostly to one decimal place, but one-decimal precision alone would allow at most 38 distinct values between 4.5 and 8.2, so some magnitudes carry finer precision (the sample descriptions include 4.77). The distribution is strongly right-skewed (skew 1.97) with high kurtosis (5.58): half of all events fall within the 4.6–5.1 IQR, while 184 outliers (≈4.9% of records), all above the upper fence of 5.85, stretch the tail to the 8.2 maximum. This pattern is consistent with the Gutenberg-Richter frequency-magnitude relationship.
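A quick way to find magnitudes reported at finer than one-decimal precision is to count the digits after the decimal point in each float's shortest representation. A sketch on the magnitudes from the ten sample descriptions; `decimal_places` is hypothetical, and it assumes the values were stored as ordinary decimal literals:

```python
from decimal import Decimal


def decimal_places(value: float) -> int:
    """Digits after the decimal point in the shortest round-trip repr of value."""
    exponent = Decimal(repr(value)).as_tuple().exponent
    return max(0, -exponent)


# Magnitudes taken from the ten sample descriptions in this report.
sample_mags = [5.8, 6.1, 4.6, 4.77, 4.6, 5.5, 5.1, 5.3, 4.7, 4.5]
finer = [m for m in sample_mags if decimal_places(m) > 1]
print(finer)  # [4.77]
```

Applied to the whole column, the share of values with more than one decimal place would show how much of the 123-value cardinality comes from finer-grained reports.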

Numeric correlation

Pearson correlation across 4 numeric columns (values shown to 2 decimal places).
              latitude  longitude  magnitude   depth_km
latitude         +1.00      -0.72      -0.12      +0.31
longitude        -0.72      +1.00      +0.07      -0.37
magnitude        -0.12      +0.07      +1.00      -0.03
depth_km         +0.31      -0.37      -0.03      +1.00

Languages detected

Per-string language detection across text columns (sampled).

Per-language counts (total 7,480 detected strings).
lang   count   share
en     7438    99.4%
es     32      0.4%
de     6       0.1%
ja     2       0.0%
ceb    2       0.0%

latitude numeric

rows          3,742
null          0 (0.0%)
unique        3,627
min           20.020
max           69.797
mean          48.529
median        52.396
std           11.577
q1            41.336
q3            55.899
iqr           14.563
skew          -0.759
kurtosis      -0.289
n_outliers    0
outlier_rate  0.000
zero_rate     0.000
Histogram bins for latitude (median: 52.3957).
bin               count
20.02 – 21.26     35
21.26 – 22.51     28
22.51 – 23.75     42
23.75 – 25        88
25 – 26.24        91
26.24 – 27.49     35
27.49 – 28.73     43
28.73 – 29.98     39
29.98 – 31.22     52
31.22 – 32.46     83
32.46 – 33.71     52
33.71 – 34.95     26
34.95 – 36.2      67
36.2 – 37.44      43
37.44 – 38.69     54
38.69 – 39.93     22
39.93 – 41.18     131
41.18 – 42.42     52
42.42 – 43.66     96
43.66 – 44.91     177
44.91 – 46.15     4
46.15 – 47.4      7
47.4 – 48.64      34
48.64 – 49.89     117
49.89 – 51.13     151
51.13 – 52.38     288
52.38 – 53.62     423
53.62 – 54.86     349
54.86 – 56.11     223
56.11 – 57.35     201
57.35 – 58.6      75
58.6 – 59.84      104
59.84 – 61.09     116
61.09 – 62.33     117
62.33 – 63.58     120
63.58 – 64.82     33
64.82 – 66.06     35
66.06 – 67.31     25
67.31 – 68.55     29
68.55 – 69.8      35
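The printed bin edges are consistent with 40 equal-width bins between each column's minimum and maximum: for latitude the width is (69.797 − 20.02) / 40 ≈ 1.244, so the fifth edge is 20.02 + 4 × 1.244 ≈ 25.0, printed as '25'. A sketch under that assumption; how saturn assigns values that fall exactly on an interior edge is not documented:

```python
def equal_width_edges(lo: float, hi: float, bins: int = 40) -> list[float]:
    """bins + 1 edges splitting [lo, hi] into equal-width intervals."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [hi]


def bin_counts(values: list[float], bins: int = 40) -> list[int]:
    """Counts per equal-width bin between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Degenerate range -> everything in bin 0; the maximum goes in the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


edges = equal_width_edges(20.020, 69.797)  # latitude min and max
print(round(edges[4], 2), round(edges[13], 2))  # 25.0 36.2
```

Under the same assumption, a column whose values are only 18 or 19 characters long (like date) gets 38 empty bins between its two populated ones.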

longitude numeric

rows          3,742
null          0 (0.0%)
unique        3,668
min           -169.997
max           -65.039
mean          -140.101
median        -144.217
std           21.809
q1            -159.933
q3            -125.097
iqr           34.836
skew          0.449
kurtosis      -0.430
n_outliers    26
outlier_rate  6.95e-03
zero_rate     0.000
Histogram bins for longitude (median: -144.21715).
bin               count
-170 – -167.4     368
-167.4 – -164.7   161
-164.7 – -162.1   243
-162.1 – -159.5   241
-159.5 – -156.9   113
-156.9 – -154.3   129
-154.3 – -151.6   241
-151.6 – -149     220
-149 – -146.4     77
-146.4 – -143.8   83
-143.8 – -141.1   29
-141.1 – -138.5   48
-138.5 – -135.9   41
-135.9 – -133.3   34
-133.3 – -130.6   129
-130.6 – -128     428
-128 – -125.4     200
-125.4 – -122.8   89
-122.8 – -120.1   44
-120.1 – -117.5   130
-117.5 – -114.9   149
-114.9 – -112.3   113
-112.3 – -109.6   163
-109.6 – -107     134
-107 – -104.4     33
-104.4 – -101.8   12
-101.8 – -99.15   10
-99.15 – -96.53   20
-96.53 – -93.9    4
-93.9 – -91.28    1
-91.28 – -88.65   2
-88.65 – -86.03   6
-86.03 – -83.41   1
-83.41 – -80.78   2
-80.78 – -78.16   3
-78.16 – -75.53   7
-75.53 – -72.91   8
-72.91 – -70.29   7
-70.29 – -67.66   5
-67.66 – -65.04   14

name text

6 languages detected in sample
rows                     3,742
null                     0 (0.0%)
unique                   3,002
len_min                  4
len_max                  59
len_mean                 29.466
len_median               29.000
len_p95                  36.000
word_mean                6.293
word_median              6.000
n_empty                  0
n_duplicates             740
duplicate_rate           0.198
vocab_size               1,036
readability_flesch_mean  69.914
emoji_rate               0.000
url_rate                 0.000
one_word_rate            5.34e-04
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for name (mean: 29.466).
chars       count
4 – 5       1
5 – 7       0
7 – 8       1
8 – 10      0
10 – 11     0
11 – 12     0
12 – 14     2
14 – 15     11
15 – 16     40
16 – 18     1
18 – 19     20
19 – 20     19
20 – 22     8
22 – 23     219
23 – 25     60
25 – 26     122
26 – 27     543
27 – 29     499
29 – 30     823
30 – 32     325
32 – 33     362
33 – 34     378
34 – 36     105
36 – 37     40
37 – 38     37
38 – 40     25
40 – 41     34
41 – 42     3
42 – 44     0
44 – 45     15
45 – 47     22
47 – 48     14
48 – 49     3
49 – 51     1
51 – 52     2
52 – 54     5
54 – 55     0
55 – 56     0
56 – 58     0
58 – 59     2
Sample values (first 10)
  1. 88 km N of Yakutat, Alaska
  2. 100 km SW of Topolobampo, Mexico
  3. Gulf of Alaska
  4. 8 km NNW of Tahoe Vista, California
  5. 88 km SSW of Nikolski, Alaska
  6. 33 km NE of Chignik, Alaska
  7. 204 km W of Port McNeill, Canada
  8. 69 km S of Nikolski, Alaska
  9. 181 km W of Ferndale, California
  10. 47 km NW of Ninilchik, Alaska

description text

96.0% of rows are unique strings
rows                     3,742
null                     0 (0.0%)
unique                   3,591
len_min                  45
len_max                  100
len_mean                 71.712
len_median               72.000
len_p95                  79.000
word_mean                12.293
word_median              12.000
n_empty                  0
n_duplicates             151
duplicate_rate           0.040
vocab_size               2,674
readability_flesch_mean  63.233
emoji_rate               0.000
url_rate                 0.000
one_word_rate            0.000
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for description (mean: 71.712).
chars       count
45 – 46     1
46 – 48     1
48 – 49     0
49 – 50     0
50 – 52     0
52 – 53     0
53 – 55     4
55 – 56     4
56 – 57     19
57 – 59     15
59 – 60     20
60 – 62     9
62 – 63     17
63 – 64     158
64 – 66     45
66 – 67     73
67 – 68     380
68 – 70     350
70 – 71     726
71 – 72     389
72 – 74     367
74 – 75     605
75 – 77     204
77 – 78     91
78 – 79     90
79 – 81     21
81 – 82     33
82 – 84     8
84 – 85     25
85 – 86     31
86 – 88     24
88 – 89     10
89 – 90     7
90 – 92     1
92 – 93     3
93 – 94     5
94 – 96     1
96 – 97     2
97 – 99     1
99 – 100    2
Sample values (first 10)
  1. Magnitude 5.8 earthquake - 88 km N of Yakutat, Alaska. Depth: 10km.
  2. Magnitude 6.1 earthquake - 100 km SW of Topolobampo, Mexico. Depth: 9km.
  3. Magnitude 4.6 earthquake - Gulf of Alaska. Depth: 11.9km.
  4. Magnitude 4.77 earthquake - 8 km NNW of Tahoe Vista, California. Depth: -2.261km.
  5. Magnitude 4.6 earthquake - 88 km SSW of Nikolski, Alaska. Depth: 11.7km.
  6. Magnitude 5.5 earthquake - 33 km NE of Chignik, Alaska. Depth: 91.5km.
  7. Magnitude 5.1 earthquake - 204 km W of Port McNeill, Canada. Depth: 8.65km.
  8. Magnitude 5.3 earthquake - 69 km S of Nikolski, Alaska. Depth: 33km.
  9. Magnitude 4.7 earthquake - 181 km W of Ferndale, California. Depth: 10km.
  10. Magnitude 4.5 earthquake - 47 km NW of Ninilchik, Alaska. Depth: 83.8km.

category categorical

top value is 100.0% of rows
rows           3,742
null           0 (0.0%)
unique         1
top_value      significant_earthquakes
top_rate       1.000
cardinality    1
entropy        -0.000
entropy_ratio  0.000
Top values for category (1 unique shown, of 1 total).
value                     count   share
significant_earthquakes   3742    100.0%
Top values (rank 1–20)
  1. significant_earthquakes — 3,742

date text

100.0% of rows are unique strings · 100.0% of rows are a single word · 100.0% of rows are all-caps · 95th-percentile length under 20 chars
rows                     3,742
null                     0 (0.0%)
unique                   3,741
len_min                  18
len_max                  19
len_mean                 18.952
len_median               19.000
len_p95                  19.000
word_mean                1.000
word_median              1.000
n_empty                  0
n_duplicates             1
duplicate_rate           2.67e-04
vocab_size               3,741
readability_flesch_mean  121.220
emoji_rate               0.000
url_rate                 0.000
one_word_rate            1.000
allcaps_rate             1.000
boilerplate_rate         0.000
Character-length distribution for date (mean: 18.952).
chars   count
18      180
19      3562
(The 38 bins between 18 and 19 characters are empty.)
Sample values (first 10)
  1. 1765155410417-01-01
  2. 1188674062640-01-01
  3. 1079568936020-01-01
  4. 1119811557790-01-01
  5. 1437994466000-01-01
  6. 977445639208-01-01
  7. 1425458107120-01-01
  8. 1041407421320-01-01
  9. 1118819652400-01-01
  10. 1734667402093-01-01

country unknown

no profiler for kind=unknown
rows   3,742
null   0 (0.0%)

magnitude numeric

rows          3,742
null          0 (0.0%)
unique        123
min           4.500
max           8.200
mean          4.917
median        4.800
std           0.462
q1            4.600
q3            5.100
iqr           0.500
skew          1.970
kurtosis      5.583
n_outliers    184
outlier_rate  0.049
zero_rate     0.000
Histogram bins for magnitude (median: 4.8).
bin             count
4.5 – 4.593     752
4.593 – 4.685   601
4.685 – 4.777   445
4.777 – 4.87    340
4.87 – 4.963    286
4.963 – 5.055   254
5.055 – 5.147   218
5.147 – 5.24    177
5.24 – 5.332    137
5.332 – 5.425   104
5.425 – 5.518   85
5.518 – 5.61    66
5.61 – 5.702    53
5.702 – 5.795   3
5.795 – 5.887   38
5.887 – 5.98    46
5.98 – 6.072    25
6.072 – 6.165   17
6.165 – 6.257   16
6.257 – 6.35    11
6.35 – 6.442    15
6.442 – 6.535   11
6.535 – 6.627   10
6.627 – 6.72    4
6.72 – 6.812    6
6.812 – 6.905   5
6.905 – 6.997   0
6.997 – 7.09    3
7.09 – 7.182    3
7.182 – 7.275   3
7.275 – 7.367   1
7.367 – 7.46    0
7.46 – 7.552    1
7.552 – 7.645   1
7.645 – 7.737   0
7.737 – 7.83    2
7.83 – 7.922    2
7.922 – 8.015   0
8.015 – 8.107   0
8.107 – 8.2     1

depth_km numeric

skew = +3.07 · 8.4% of rows beyond 1.5 × IQR
rows          3,742
null          0 (0.0%)
unique        1,505
min           -2.261
max           248.700
mean          23.712
median        10.000
std           28.790
q1            10.000
q3            29.102
iqr           19.102
skew          3.072
kurtosis      11.611
n_outliers    314
outlier_rate  0.084
zero_rate     2.67e-03
Histogram bins for depth_km (median: 10.0).
bin               count
-2.261 – 4.013    219
4.013 – 10.29     1730
10.29 – 16.56     370
16.56 – 22.84     258
22.84 – 29.11     230
29.11 – 35.38     250
35.38 – 41.66     167
41.66 – 47.93     129
47.93 – 54.21     56
54.21 – 60.48     31
60.48 – 66.75     43
66.75 – 73.03     27
73.03 – 79.3      19
79.3 – 85.58      29
85.58 – 91.85     21
91.85 – 98.12     19
98.12 – 104.4     24
104.4 – 110.7     12
110.7 – 116.9     9
116.9 – 123.2     14
123.2 – 129.5     13
129.5 – 135.8     19
135.8 – 142       14
142 – 148.3       5
148.3 – 154.6     6
154.6 – 160.9     0
160.9 – 167.1     7
167.1 – 173.4     4
173.4 – 179.7     0
179.7 – 186       1
186 – 192.2       5
192.2 – 198.5     1
198.5 – 204.8     3
204.8 – 211.1     2
211.1 – 217.3     3
217.3 – 223.6     1
223.6 – 229.9     0
229.9 – 236.2     0
236.2 – 242.4     0
242.4 – 248.7     1

place text

6 languages detected in sample
rows                     3,742
null                     0 (0.0%)
unique                   3,002
len_min                  4
len_max                  59
len_mean                 29.466
len_median               29.000
len_p95                  36.000
word_mean                6.293
word_median              6.000
n_empty                  0
n_duplicates             740
duplicate_rate           0.198
vocab_size               1,036
readability_flesch_mean  69.914
emoji_rate               0.000
url_rate                 0.000
one_word_rate            5.34e-04
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for place (mean: 29.466).
chars       count
4 – 5       1
5 – 7       0
7 – 8       1
8 – 10      0
10 – 11     0
11 – 12     0
12 – 14     2
14 – 15     11
15 – 16     40
16 – 18     1
18 – 19     20
19 – 20     19
20 – 22     8
22 – 23     219
23 – 25     60
25 – 26     122
26 – 27     543
27 – 29     499
29 – 30     823
30 – 32     325
32 – 33     362
33 – 34     378
34 – 36     105
36 – 37     40
37 – 38     37
38 – 40     25
40 – 41     34
41 – 42     3
42 – 44     0
44 – 45     15
45 – 47     22
47 – 48     14
48 – 49     3
49 – 51     1
51 – 52     2
52 – 54     5
54 – 55     0
55 – 56     0
56 – 58     0
58 – 59     2
Sample values (first 10)
  1. 88 km N of Yakutat, Alaska
  2. 100 km SW of Topolobampo, Mexico
  3. Gulf of Alaska
  4. 8 km NNW of Tahoe Vista, California
  5. 88 km SSW of Nikolski, Alaska
  6. 33 km NE of Chignik, Alaska
  7. 204 km W of Port McNeill, Canada
  8. 69 km S of Nikolski, Alaska
  9. 181 km W of Ferndale, California
  10. 47 km NW of Ninilchik, Alaska

earthquake_type categorical

top value is 99.9% of rows
rows           3,742
null           0 (0.0%)
unique         3
top_value      earthquake
top_rate       0.999
cardinality    3
entropy        0.010
entropy_ratio  6.40e-03
Top values for earthquake_type (3 unique shown, of 3 total).
value        count   share
earthquake   3739    99.9%
explosion    2       0.1%
landslide    1       0.0%
Top values (rank 1–20)
  1. earthquake — 3,739
  2. explosion — 2
  3. landslide — 1