saturn·

natural hazards earthquakes

source /home/coolhand/html/datavis/data_trove/data/natural_hazards/earthquakes.json · 3,742 rows · 11 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,742 records of significant earthquakes, with numeric measurements (magnitude, depth, latitude, longitude), location text fields, and a date column. Magnitude is tightly clustered: the middle half of values falls between 4.6 and 5.1 (median 4.8) above a floor of 4.5, but the maximum is 8.2, producing 184 high-end outliers worth a closer look. Depth is highly skewed (skew 3.07) with a median of 10 km, a maximum of 248.7 km, and 314 outliers, so most events are shallow with a long tail of deeper ones. Geographically, the data is heavily concentrated around Alaska: 'alaska' appears in 1,991 place names, with Canada and Mexico trailing far behind, and every longitude lies in the western hemisphere (median -144.2). Note that 'category' is a single constant value and 'earthquake_type' is 99.9% 'earthquake', so neither adds analytic signal.

citing: row_count · column_count · depth_km · magnitude · latitude · longitude · place · earthquake_type · category

Schema

11 columns
Per-column summary.
column · kind · nulls · unique · alerts
latitude · numeric · 0.0% · 3,627 · none
longitude · numeric · 0.0% · 3,668 · none
name · text · 0.0% · 3,002 · multilingual
description · text · 0.0% · 3,591 · near_unique
category · categorical · 0.0% · 1 · imbalance
date · text · 0.0% · 3,741 · near_unique, one_word, allcaps, short_text
country · unknown · 0.0% · n/a · skipped
magnitude · numeric · 0.0% · 123 · none
depth_km · numeric · 0.0% · 1,505 · high_skew, outliers
place · text · 0.0% · 3,002 · multilingual
earthquake_type · categorical · 0.0% · 3 · imbalance

latitude

numeric feature
Geographic latitude coordinates ranging from 20.02 to 69.80, with a mean of 48.53 and median of 52.40. The distribution is moderately left-skewed (-0.76) and concentrated in northern mid-to-high latitudes (Q1=41.34, Q3=55.90), with no southern hemisphere points. Because every longitude in this dataset is negative (-170 to -65), the sample is North America-heavy rather than European. Near-unique values (3,627 of 3,742) and zero nulls indicate clean, granular location data. Treatment: Pair with longitude for geospatial features; consider binning or projecting rather than using the raw value in linear models. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3,627
min
20.02
max
69.8
mean
48.53
median
52.4
std
11.58
q1
41.34
q3
55.9
iqr
14.56
skew
-0.7591
kurtosis
-0.2887
n_outliers
0
outlier_rate
0
zero_rate
0

longitude

numeric feature
Geographic longitude coordinates, all negative and ranging from -169.997 to -65.039, placing every record in the western hemisphere. The distribution is wide (IQR ~34.8°) with a slight right skew (0.449) and only 26 mild outliers (0.69%); near-uniqueness (3668 unique of 3742) suggests point-level locations rather than a small set of sites. Treatment: Pair with latitude as a 2-D geospatial feature; avoid treating as a standalone scalar in models. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3,668
min
-170
max
-65.04
mean
-140.1
median
-144.2
std
21.81
q1
-159.9
q3
-125.1
iqr
34.84
skew
0.4489
kurtosis
-0.4302
n_outliers
26
outlier_rate
0.006948
zero_rate
0
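
The advice to pair latitude with longitude can be sketched as follows. This is a minimal illustration rather than anything saturn produces: the function names `to_unit_vector` and `haversine_km` and the 6371 km mean Earth radius are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def to_unit_vector(lat_deg, lon_deg):
    """Project a latitude/longitude pair (degrees) onto the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Median coordinates from the report, used only as a sample point.
x, y, z = to_unit_vector(52.4, -144.2)
```

The three-component vector avoids the artificial discontinuity at ±180° longitude, which matters here because the data runs close to -170°.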

name

text metadata multilingual
This column holds short geographic descriptions of event locations, almost certainly earthquake place names (e.g. '104 km SSW of Nikolski, Alaska', 'off the coast of Oregon'), with mean length 29 chars and 6 words. Of 3,742 rows, only 3,002 values are unique and 740 are duplicates (19.8%), with 'off the coast of Oregon' alone repeating 151 times. Its statistics match those of the place column exactly (unique count, duplicate rate, length and word counts), so it is likely a copy of place. The language detector flags a multilingual mix, but the column is overwhelmingly English (3,719 rows); the trace de/es/ja/ceb counts are likely false positives on place tokens. Treatment: Parse into structured fields (distance, bearing, place) or geocode rather than using the raw string as a feature. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3,002
len_min
4
len_max
59
len_mean
29.47
len_median
29
len_p95
36
word_mean
6.293
word_median
6
n_empty
0
n_duplicates
740
duplicate_rate
0.1978
vocab_size
1,036
readability_flesch_mean
69.91
emoji_rate
0
url_rate
0
one_word_rate
0.0005345
allcaps_rate
0
boilerplate_rate
0

description

text free_text near_unique
This is a templated text description of seismic events, with every row mentioning 'earthquake', 'magnitude', and 'depth:' and lengths tightly clustered between 45 and 100 characters (median 72). Despite the formulaic structure, 3591 of 3742 values are unique because the embedded magnitude and depth numbers vary; still, 151 exact duplicates (4.0%) exist. Alaska appears in 1973 rows and a 10km depth in 1216, suggesting the corpus is dominated by shallow Alaskan quakes. Treatment: Parse out magnitude, depth, and region with regex into structured features rather than embedding the raw string. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3,591
len_min
45
len_max
100
len_mean
71.71
len_median
72
len_p95
79
word_mean
12.29
word_median
12
n_empty
0
n_duplicates
151
duplicate_rate
0.04035
vocab_size
2,674
readability_flesch_mean
63.23
emoji_rate
0
url_rate
0
one_word_rate
0
allcaps_rate
0
boilerplate_rate
0
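
A regex-based extraction along the lines of the treatment above might look like this. The exact template of the description strings is not shown in this report, so both patterns and the sample row are assumptions; the only facts taken from the profile are that every row mentions 'magnitude' and 'depth:'.

```python
import re

# Assumed patterns: the report only says each row mentions 'magnitude' and 'depth:'.
MAGNITUDE_RE = re.compile(r"magnitude\s*:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
DEPTH_RE = re.compile(r"depth:\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_description(text):
    """Return magnitude and depth_km as floats, or None where no match is found."""
    mag = MAGNITUDE_RE.search(text)
    depth = DEPTH_RE.search(text)
    return {
        "magnitude": float(mag.group(1)) if mag else None,
        "depth_km": float(depth.group(1)) if depth else None,
    }


# Hypothetical row; the real template may differ.
sample = "Magnitude 4.8 earthquake near Nikolski, Alaska. Depth: 10.0 km"
parsed = parse_description(sample)
```

Rows that return None for either field are worth inspecting by hand before adjusting the patterns.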

category

categorical metadata imbalance
This column is a constant tag labeling every row as "significant_earthquakes" across all 3742 records, with cardinality of 1 and entropy of 0. It carries no information for modelling or analysis since the top_rate is 1.0 with no nulls. Treatment: Drop; constant column with a single value. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
1
top_value
significant_earthquakes
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

date

text identifier near_unique one_word allcaps short_text
Despite its name, this column holds malformed date-like strings rather than parseable timestamps: every value is a run of digits (likely a millisecond epoch) glued to a literal '-01-01' suffix. Values are 18-19 characters long, so the digit prefix has 12 or 13 digits; each value is a single token, and 3,741 of 3,742 are unique with only one duplicate ('1614452365296-01-01' appears twice). The one_word and allcaps alerts are likely side effects of the digit-only content. Stored as text, not a date type, so no temporal ordering or range is usable as-is. Treatment: Parse the leading digits as milliseconds since the Unix epoch into a real timestamp, or treat as a near-unique id and drop. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3,741
len_min
18
len_max
19
len_mean
18.95
len_median
19
len_p95
19
word_mean
1
word_median
1
n_empty
0
n_duplicates
1
duplicate_rate
0.0002672
vocab_size
3,741
readability_flesch_mean
121.2
emoji_rate
0
url_rate
0
one_word_rate
1
allcaps_rate
1
boilerplate_rate
0
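
The epoch-parsing treatment can be sketched as below. It assumes the digit prefix really is milliseconds since the Unix epoch in UTC, which the report suggests but does not confirm; `parse_epoch_date` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# 12 or 13 digits followed by the literal '-01-01' suffix described above.
DATE_RE = re.compile(r"^(\d{12,13})-01-01$")
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_date(raw: str) -> Optional[datetime]:
    """Convert 'NNNNNNNNNNNNN-01-01' to a UTC datetime, or None if malformed."""
    match = DATE_RE.match(raw.strip())
    if match is None:
        return None
    millis = int(match.group(1))
    # Integer milliseconds via timedelta avoid float rounding in the timestamp.
    return UNIX_EPOCH + timedelta(milliseconds=millis)


# The one duplicated value quoted in the report.
parsed = parse_epoch_date("1614452365296-01-01")
print(parsed.isoformat())
```

Under that assumption the duplicated value corresponds to 27 February 2021, 18:59:25 UTC.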

country

unknown feature skipped
The column is named "country" and has 3742 non-null entries with a null_rate of 0.0, but saturn skipped detailed profiling (kind is "unknown") so cardinality and value distribution are not available. Without n_unique or category stats, we cannot tell whether this is a clean ISO code field, free-text country names, or a near-constant. The lack of any descriptive statistics is itself the main signal. Treatment: Re-profile or manually inspect to determine cardinality before deciding whether to one-hot encode or normalize to ISO codes. low · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
not available (profiling skipped)

magnitude

numeric feature
Earthquake magnitude, bounded between 4.5 and 8.2 with a mean of 4.92 and median of 4.8. The distribution is heavily right-skewed (skew 1.97, kurtosis 5.58) with a tight IQR of 0.5 and 184 outliers (4.9%), consistent with a catalog truncated at magnitude 4.5 in which large events are rare but extreme. Treatment: Treat as a right-skewed numeric feature; consider modelling tail events separately rather than transforming, since magnitude is already on a log scale. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
123
min
4.5
max
8.2
mean
4.917
median
4.8
std
0.462
q1
4.6
q3
5.1
iqr
0.5
skew
1.97
kurtosis
5.583
n_outliers
184
outlier_rate
0.04917
zero_rate
0
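
The report does not state how saturn defines an outlier. The sketch below assumes the common 1.5 × IQR (Tukey) rule, which is at least consistent with latitude having zero outliers (its fences of roughly 19.50 and 77.74 enclose the full 20.02 to 69.8 range). The helper name `tukey_outliers` and the five-value sample are made up for illustration.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Upper fence implied by the reported magnitude quartiles (q1=4.6, q3=5.1).
reported_upper = 5.1 + 1.5 * (5.1 - 4.6)  # about 5.85

# Small made-up sample to show the function in action.
lower, upper, outliers = tukey_outliers([4.5, 4.6, 4.8, 5.1, 8.2])
```

With the catalog floor at 4.5, the lower fence (about 3.85) can never be crossed, so under this assumption every magnitude outlier is a high-end event above roughly 5.85.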

depth_km

numeric feature high_skew outliers
This is almost certainly the focal depth in kilometres for seismic events, with values ranging from -2.261 to 248.7 and a median of 10.0. The distribution is heavily right-skewed (skew 3.07, kurtosis 11.6) with 314 outliers (8.39% of rows) and a small number of negative depths that warrant inspection. Half the records sit between q1=10.0 and q3=29.1; because q1 equals the median, a large share of rows carry exactly 10 km (the description column mentions a 10 km depth 1,216 times), which may be a default value rather than a measurement. Overall this suggests a strong concentration of shallow events with a long deep tail. Treatment: Clip or investigate negative values, then log-transform (e.g., log(depth+c) with c > 0) before modelling. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
1,505
min
-2.261
max
248.7
mean
23.71
median
10
std
28.79
q1
10
q3
29.1
iqr
19.1
skew
3.072
kurtosis
11.61
n_outliers
314
outlier_rate
0.08391
zero_rate
0.002672
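
One way to apply the clip-then-log treatment is sketched below. Clipping at zero and using c = 1 (via `log1p`) are both assumptions; negative depths may be legitimate for events located above sea level, so they should be inspected before being clipped. `transform_depth` is a hypothetical helper name.

```python
import math


def transform_depth(depth_km, floor_km=0.0):
    """Clip depth at floor_km, then apply log(1 + depth)."""
    clipped = max(depth_km, floor_km)
    return math.log1p(clipped)


# Reported min, median, and max depth from the statistics above.
transformed = [transform_depth(d) for d in (-2.261, 10.0, 248.7)]
```

Using `log1p` keeps the zero-depth rows (zero_rate 0.002672) finite, which a plain logarithm would not.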

place

text metadata multilingual
Short geographic descriptors of earthquake locations, typically formatted as '[distance] km [bearing] of [place], [region]' (e.g., '104 km SSW of Nikolski, Alaska'), with mean length 29 characters and median 6 words. Alaska dominates (1,991 of 3,742 rows mention it), and a single string 'off the coast of Oregon' repeats 151 times, contributing to a 19.8% duplicate rate across 3,002 unique values. The 'multilingual' alert is driven by a tiny non-English fringe (16 es, 3 de, 1 ja, 1 ceb) against 3,719 English rows, so it is effectively monolingual. Treatment: Parse into structured fields (distance_km, bearing, place_name, region) rather than using the raw string as a categorical. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3,002
len_min
4
len_max
59
len_mean
29.47
len_median
29
len_p95
36
word_mean
6.293
word_median
6
n_empty
0
n_duplicates
740
duplicate_rate
0.1978
vocab_size
1,036
readability_flesch_mean
69.91
emoji_rate
0
url_rate
0
one_word_rate
0.0005345
allcaps_rate
0
boilerplate_rate
0
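
The structured-field parsing suggested for place could be sketched as follows. The pattern is an assumption built from the two example strings quoted in this report; other formats in the column may need extra cases. Field names follow the treatment note, and `parse_place` is a hypothetical helper.

```python
import re

# Matches strings such as '104 km SSW of Nikolski, Alaska'.
PLACE_RE = re.compile(
    r"^(?P<distance_km>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<bearing>[NSEW]{1,3})\s+of\s+"
    r"(?P<place_name>.+?)"
    r"(?:,\s*(?P<region>[^,]+))?$"
)


def parse_place(text):
    """Split a place string into distance_km, bearing, place_name, and region."""
    cleaned = text.strip()
    match = PLACE_RE.match(cleaned)
    if match is None:
        # Free-form strings keep the whole text as the place name.
        return {"distance_km": None, "bearing": None, "place_name": cleaned, "region": None}
    fields = match.groupdict()
    fields["distance_km"] = float(fields["distance_km"])
    return fields


structured = parse_place("104 km SSW of Nikolski, Alaska")
fallback = parse_place("off the coast of Oregon")
```

The fallback branch matters here because the most frequent value, 'off the coast of Oregon', has no distance or bearing at all.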

earthquake_type

categorical label imbalance
This is a categorical event-type label with only 3 distinct values across 3742 rows and no nulls. The distribution is essentially degenerate: 'earthquake' covers 99.92% of rows, leaving just 2 'explosion' and 1 'landslide' records, yielding an entropy ratio of 0.006. The column carries almost no information as-is. Treatment: Drop or collapse to a binary rare-event flag; near-constant for modelling. high · anthropic:claude-opus-4-7
n
3,742
nulls
0 (0.0%)
unique
3
top_value
earthquake
top_rate
0.9992
cardinality
3
entropy
0.01014
entropy_ratio
0.006396
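
The reported entropy figures can be reproduced from the class counts, assuming saturn uses Shannon entropy in bits and divides by log2 of the cardinality for the ratio. That definition is inferred from the numbers, not documented in this report. The sketch also shows the binary rare-event flag from the treatment note; `is_rare_event` is a hypothetical helper.

```python
import math

# Counts taken from the report: 3,742 rows, 2 explosions, 1 landslide.
counts = {"earthquake": 3739, "explosion": 2, "landslide": 1}


def shannon_entropy_bits(counts):
    """Shannon entropy in bits for a mapping of label -> count."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


entropy = shannon_entropy_bits(counts)
# Normalise by the maximum entropy for this many classes.
entropy_ratio = entropy / math.log2(len(counts))


def is_rare_event(label, common="earthquake"):
    """Collapse the column to a binary flag: True for any non-earthquake row."""
    return label != common


print(round(entropy, 5), round(entropy_ratio, 6))
```

With only 3 positive rows out of 3,742, the binary flag is mainly useful for filtering those events out, not as a model feature.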