saturn·

natural hazards storms

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/natural_hazards/storms.json

Saturn profiled 14,770 rows across 14 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
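# Install the profiler (notebook shell escape).
!pip install saturn-dissect

import subprocess

# Profile the source file; the flags name the findings file and the opt-in LLM used for the prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/natural_hazards/storms.json",
    "--findings", "natural_hazards-storms.json",
    "--llm", "anthropic:claude-opus-4-7",
])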

Summary confidence: high

This dataset contains 14,770 records of significant U.S. storm events sourced entirely from the NOAA Storm Events Database, with each row describing a weather incident's location, type, magnitude, and damages. Tornadoes dominate the event mix at roughly 43% of records, followed by Flash Floods and Thunderstorm Winds, so the event_type distribution is the first thing to inspect. Geographically the data skews toward Texas (1,450 events) and other tornado-belt states like Missouri, Arkansas, and Mississippi, which is worth confirming on the latitude/longitude spread. Two caveats deserve attention: the magnitude field is missing for 51.8% of rows, and category/country/source are constants (single value) so they carry no analytical signal. Fatalities and injuries are heavily zero-inflated (about 69% and 68% zeros respectively), meaning summary stats will be driven by a small tail of severe events.

citing: row_count · column_count · columns.event_type.top_values · columns.event_type.stats.top_rate · columns.state.top_values · columns.magnitude.null_rate · columns.fatalities.stats.top_rate · columns.injuries.stats.top_rate · columns.country.stats.top_value · columns.source.stats.top_value · columns.latitude.stats · columns.longitude.stats
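
The citing line lists dotted paths into the findings file. A minimal sketch of resolving such paths, assuming the findings JSON is a nested object whose keys match the path segments. The report does not show the file's actual layout, so the `findings` dict and the `resolve` helper below are hypothetical stand-ins filled with numbers quoted in this report.

```python
from functools import reduce
from typing import Any

def resolve(findings: dict, path: str) -> Any:
    """Follow a dotted path such as 'columns.magnitude.null_rate' through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), findings)

# Hypothetical stand-in for the findings file; the values are the ones quoted in this report.
findings = {
    "row_count": 14770,
    "column_count": 14,
    "columns": {
        "magnitude": {"null_rate": 0.518},
        "event_type": {"stats": {"top_rate": 0.4288}},
    },
}

for path in ("row_count", "columns.magnitude.null_rate", "columns.event_type.stats.top_rate"):
    print(path, "=", resolve(findings, path))
```

A missing segment raises `KeyError`, which is usually what you want when checking that a cited statistic exists.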

Out[4]:

saturn.schema() · 14 columns

column           kind         n       null%  unique  alerts
latitude         numeric      14,770  0.0%   7,810
longitude        numeric      14,770  0.0%   8,828
name             text         14,770  0.0%   6,660   multilingual, duplicates
description      text         14,770  0.0%   5,796   multilingual, duplicates
category         categorical  14,770  0.0%   1       imbalance
date             text         14,770  0.0%   5,058   one_word, allcaps, short_text, duplicates
country          categorical  14,770  0.0%   1       imbalance
event_type       categorical  14,770  0.0%   17
state            categorical  14,770  0.0%   65
magnitude        categorical  14,770  51.8%  170     null_rate
injuries         categorical  14,770  0.0%   178
fatalities       categorical  14,770  0.0%   49
damage_property  text         14,770  0.0%   1,014   one_word, allcaps, short_text, duplicates
source           categorical  14,770  0.0%   1       imbalance
Fig 1.
event_type · Tornadoes account for ~43% of events; check how steeply the distribution drops to Flash Flood and Thunderstorm Wind.
Top values for event_type (17 unique shown, of 17 total).
value                     count  share
Tornado                    6334  42.9%
Flash Flood                2358  16.0%
Thunderstorm Wind          2257  15.3%
Flood                      1777  12.0%
Hail                       1246   8.4%
Lightning                   574   3.9%
Heavy Rain                   99   0.7%
Marine Strong Wind           43   0.3%
Debris Flow                  43   0.3%
Marine Thunderstorm Wind     25   0.2%
Marine High Wind              5   0.0%
Dust Devil                    3   0.0%
Waterspout                    2   0.0%
Tropical Storm                1   0.0%
High Wind                     1   0.0%
Heat                          1   0.0%
Marine Lightning              1   0.0%
Fig 2.
state · Texas leads at nearly 10% of records — see how concentrated events are in the southern/midwestern tornado belt.
Top values for state (20 unique shown, of 65 total).
value           count  share
TEXAS            1450   9.8%
MISSOURI          648   4.4%
ARKANSAS          602   4.1%
MISSISSIPPI       570   3.9%
GEORGIA           562   3.8%
ILLINOIS          560   3.8%
IOWA              527   3.6%
LOUISIANA         507   3.4%
TENNESSEE         499   3.4%
FLORIDA           498   3.4%
OKLAHOMA          490   3.3%
NEBRASKA          486   3.3%
ALABAMA           469   3.2%
WISCONSIN         463   3.1%
OHIO              441   3.0%
MICHIGAN          426   2.9%
NORTH CAROLINA    422   2.9%
KANSAS            418   2.8%
INDIANA           408   2.8%
KENTUCKY          383   2.6%
Fig 3.
fatalities · Heavy zero-inflation (69% are 0 fatalities) means the tail of deadly events is what matters analytically.
Top values for fatalities (20 unique shown, of 49 total).
value  count  share
0      10209  69.1%
1       3208  21.7%
2        649   4.4%
3        222   1.5%
4        112   0.8%
5         74   0.5%
6         66   0.4%
7         38   0.3%
9         25   0.2%
10        24   0.2%
8         21   0.1%
11        20   0.1%
13        11   0.1%
16        10   0.1%
12         9   0.1%
14         8   0.1%
17         6   0.0%
20         6   0.0%
25         4   0.0%
23         3   0.0%
Fig 4.
damage_property · Damage values cluster on a few round figures like 2.5M and 1.00M, hinting at coded or estimated reporting rather than precise amounts.
Character-length distribution for damage_property (mean: 4.381). Empty histogram bins between whole-number lengths are omitted.
chars  count
0        368
1        264
2       1252
3       1172
4       3414
5       6075
6       1450
7        514
8        261
Fig 5.
latitude · The middle half of events falls between about 33.6° and 41.1° N (q1 to q3), with a few outliers extending toward Alaska and the tropics.
Histogram bins for latitude (median: 37.12).
bin               count
-14.32 – -12.21       3
-12.21 – -10.1        0
-10.1 – -7.99         0
-7.99 – -5.879        0
-5.879 – -3.767       0
-3.767 – -1.656       0
-1.656 – 0.4552       0
0.4552 – 2.566        0
2.566 – 4.678         0
4.678 – 6.789         0
6.789 – 8.9           2
8.9 – 11.01           0
11.01 – 13.12         0
13.12 – 15.23         2
15.23 – 17.35         0
17.35 – 19.46        75
19.46 – 21.57        19
21.57 – 23.68        10
23.68 – 25.79        22
25.79 – 27.9        270
27.9 – 30.01        522
30.01 – 32.12      1240
32.12 – 34.24      2165
34.24 – 36.35      2333
36.35 – 38.46      1803
38.46 – 40.57      1901
40.57 – 42.68      2226
42.68 – 44.79      1382
44.79 – 46.9        515
46.9 – 49.01        232
49.01 – 51.13         0
51.13 – 53.24         0
53.24 – 55.35         0
55.35 – 57.46         5
57.46 – 59.57         6
59.57 – 61.68        15
61.68 – 63.79        11
63.79 – 65.9          8
65.9 – 68.02          2
68.02 – 70.13         1
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column           kind         null %
latitude         numeric        0.0%
longitude        numeric        0.0%
name             text           0.0%
description      text           0.0%
category         categorical    0.0%
date             text           0.0%
country          categorical    0.0%
event_type       categorical    0.0%
state            categorical    0.0%
magnitude        categorical   51.8%
injuries         categorical    0.0%
fatalities       categorical    0.0%
damage_property  text           0.0%
source           categorical    0.0%
Fig 7.
Language mix across all text columns (per-string detection, sampled).
Per-language counts (total 10,000 detected strings).
lang  count  share
en     9780  97.8%
es      134   1.3%
de       25   0.2%
ja       23   0.2%
no       10   0.1%
id        6   0.1%
fr        6   0.1%
it        5   0.1%
pt        4   0.0%
sr        2   0.0%
ru        2   0.0%
eu        2   0.0%
zh        1   0.0%
Fig 8.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
           latitude  longitude
latitude      +1.00      -0.31
longitude     -0.31      +1.00

latitude numeric feature

This column holds geographic latitudes in decimal degrees, ranging from -14.3236 to 70.1269 with a mean of 37.28 and a median of 37.12. The distribution is tightly clustered (IQR 7.50, std 5.25): the middle half of rows lies between 33.63° and 41.13° N, in the northern temperate zone. Skew is mild (-0.18), and the kurtosis of 3.34 plus 159 outliers (1.08%) reflect thin tails on both sides: a southern tail that reaches just past the equator (three rows near 14° S) and a thinner northern tail extending to about 70° N.

Treatment: Pair with longitude as a geospatial feature; consider binning or projecting rather than treating as a plain scalar.

anthropic:claude-opus-4-7 · confidence high
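
The report does not say which rule produced the 159 outliers. Assuming the common 1.5 × IQR (Tukey) fences, a short sketch using q1 and q3 from the latitude stats gives bounds near 22.4° and 52.4°. The function name `tukey_fences` and the multiplier `k` are illustrative choices, not part of saturn.

```python
from typing import Tuple

def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences; values outside them count as outliers under this rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 as reported for latitude.
low, high = tukey_fences(33.63, 41.13)
print(f"lower fence {low:.2f}, upper fence {high:.2f}")
```

Under this assumption, the histogram bins below roughly 22.4° and the bins above 55° all fall outside the fences, which is consistent with the reported outlier count.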
Out[14]:

saturn.columns["latitude"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 7,810
min -14.32
max 70.13
mean 37.28
median 37.12
std 5.247
q1 33.63
q3 41.13
iqr 7.499
skew -0.1787
kurtosis 3.341
n_outliers 159
outlier_rate 0.01077
zero_rate 0
Fig 9.
Distribution of latitude. Vertical dash marks the median.
Histogram bins for latitude (median: 37.12).
bin               count
-14.32 – -12.21       3
-12.21 – -10.1        0
-10.1 – -7.99         0
-7.99 – -5.879        0
-5.879 – -3.767       0
-3.767 – -1.656       0
-1.656 – 0.4552       0
0.4552 – 2.566        0
2.566 – 4.678         0
4.678 – 6.789         0
6.789 – 8.9           2
8.9 – 11.01           0
11.01 – 13.12         0
13.12 – 15.23         2
15.23 – 17.35         0
17.35 – 19.46        75
19.46 – 21.57        19
21.57 – 23.68        10
23.68 – 25.79        22
25.79 – 27.9        270
27.9 – 30.01        522
30.01 – 32.12      1240
32.12 – 34.24      2165
34.24 – 36.35      2333
36.35 – 38.46      1803
38.46 – 40.57      1901
40.57 – 42.68      2226
42.68 – 44.79      1382
44.79 – 46.9        515
46.9 – 49.01        232
49.01 – 51.13         0
51.13 – 53.24         0
53.24 – 55.35         0
55.35 – 57.46         5
57.46 – 59.57         6
59.57 – 61.68        15
61.68 – 63.79        11
63.79 – 65.9          8
65.9 – 68.02          2
68.02 – 70.13         1

longitude numeric feature

Geographic longitude in decimal degrees, with 8,828 unique values across 14,770 rows and no nulls. Values span -170.73 to 171.47, nearly the full ±180° range, but the distribution is tightly concentrated around a median of -90.22 with an IQR of just 12.17, so most records sit at central and eastern U.S. longitudes. Heavy kurtosis (55.6) and 623 outliers (4.2%) reflect a small set of points far from this core: mostly far-western longitudes, plus four rows at positive (eastern-hemisphere) longitudes between about 137° and 171.5°.

Treatment: Pair with latitude for geospatial features; do not standardize as a plain scalar.

anthropic:claude-opus-4-7 · confidence high
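
One way to follow the treatment above is to project each latitude/longitude pair onto the unit sphere, so that points on either side of the antimeridian (this column spans -170.7 to 171.5) end up close together instead of some 340° apart. This is a sketch of one common option, not something saturn does; `to_unit_vector` and the two sample points are hypothetical.

```python
import math
from typing import Tuple

def to_unit_vector(lat_deg: float, lon_deg: float) -> Tuple[float, float, float]:
    """Convert degrees of latitude/longitude to (x, y, z) on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )

# Two made-up points one degree apart across the antimeridian.
west = to_unit_vector(52.0, -179.5)
east = to_unit_vector(52.0, 179.5)

print("raw longitude gap:", abs(-179.5 - 179.5))         # large as plain scalars
print("chord distance on sphere:", math.dist(west, east))  # small on the sphere
```

The three components can then be fed to a model or a clustering step as ordinary numeric features.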
Out[17]:

saturn.columns["longitude"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 8,828
min -170.7
max 171.5
mean -90.94
median -90.22
std 11.7
q1 -96.4
q3 -84.23
iqr 12.17
skew 1.286
kurtosis 55.61
n_outliers 623
outlier_rate 0.04218
zero_rate 0
Fig 10.
Distribution of longitude. Vertical dash marks the median.
Histogram bins for longitude (median: -90.22).
bin                count
-170.7 – -162.2        4
-162.2 – -153.6       33
-153.6 – -145.1       28
-145.1 – -136.5        4
-136.5 – -128         11
-128 – -119.4        305
-119.4 – -110.8      466
-110.8 – -102.3      544
-102.3 – -93.74     3693
-93.74 – -85.18     5377
-85.18 – -76.63     3281
-76.63 – -68.07      941
-68.07 – -59.52       79
-59.52 – -50.96        0
-50.96 – -42.41        0
-42.41 – -33.85        0
-33.85 – -25.3         0
-25.3 – -16.74         0
-16.74 – -8.186        0
-8.186 – 0.3687        0
0.3687 – 8.924         0
8.924 – 17.48          0
17.48 – 26.03          0
26.03 – 34.59          0
34.59 – 43.14          0
43.14 – 51.7           0
51.7 – 60.25           0
60.25 – 68.81          0
68.81 – 77.36          0
77.36 – 85.92          0
85.92 – 94.47          0
94.47 – 103            0
103 – 111.6            0
111.6 – 120.1          0
120.1 – 128.7          0
128.7 – 137.2          0
137.2 – 145.8          2
145.8 – 154.4          1
154.4 – 162.9          0
162.9 – 171.5          1

name text label

This column holds short structured event labels of the form '<event type> in <STATE>, <COUNTY>' (placeholders inferred from values such as 'Hail in TEXAS, TARRANT') describing US severe weather incidents (tornadoes, floods, hail, thunderstorm winds). It is highly repetitive: 8,110 of 14,770 rows are duplicates (54.9% duplicate_rate) with only 6,660 unique values, and 'Hail in TEXAS, TARRANT' alone appears 59 times. The 'multilingual' alert is largely a false positive driven by place names: in the language-detection sample, 4,796 strings classify as English versus tiny counts like 134 Spanish and 25 German.

Treatment: Parse into structured fields (event_type, state, county) rather than treating as free text.

anthropic:claude-opus-4-7 · confidence high
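
The parsing suggested in the treatment can be sketched with one regular expression. It assumes every label follows the '<event type> in <STATE>, <COUNTY>' layout seen in 'Hail in TEXAS, TARRANT'; the report does not confirm that all 6,660 distinct labels do, so anything that fails to match is returned as `None` for manual review. `NAME_RE` and `parse_name` are illustrative names.

```python
import re
from typing import Dict, Optional

# Event type: shortest text before the first " in "; state: everything up to the comma.
NAME_RE = re.compile(r"^(?P<event_type>.+?) in (?P<state>[^,]+), (?P<county>.+)$")

def parse_name(name: str) -> Optional[Dict[str, str]]:
    """Split a label into event_type, state and county, or return None if it does not fit."""
    match = NAME_RE.match(name)
    return match.groupdict() if match else None

print(parse_name("Hail in TEXAS, TARRANT"))
# {'event_type': 'Hail', 'state': 'TEXAS', 'county': 'TARRANT'}
```

Because event_type and state already exist as columns, comparing the parsed fields against them is a cheap consistency check before trusting the county part.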
Out[20]:

saturn.columns["name"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 6,660
len_min 17
len_max 134
len_mean 30.22
len_median 29
len_p95 41
word_mean 4.588
word_median 4
n_empty 0
n_duplicates 8,110
duplicate_rate 0.5491
vocab_size 1,980
readability_flesch_mean 31.16
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: multilingual · 13 languages detected in sample
alert: duplicates · 54.9% duplicate strings
Fig 11.
Character-length distribution for name.
Character-length distribution for name (mean: 30.22).
chars      count
17 – 20       50
20 – 23      793
23 – 26     2165
26 – 29     3842
29 – 32     2969
32 – 35     1813
35 – 37     1442
37 – 40      915
40 – 43      493
43 – 46      153
46 – 49       45
49 – 52        4
52 – 55        3
55 – 58        4
58 – 61        6
61 – 64        9
64 – 67        5
67 – 70        7
70 – 73        3
73 – 76       10
76 – 78        7
78 – 81        5
81 – 84        6
84 – 87        3
87 – 90        3
90 – 93        3
93 – 96        1
96 – 99        5
99 – 102       3
102 – 105      0
105 – 108      0
108 – 111      0
111 – 114      0
114 – 116      0
116 – 119      1
119 – 122      0
122 – 125      0
125 – 128      0
128 – 131      1
131 – 134      1

description text metadata

This is a templated event-summary field describing storm or disaster impacts (magnitude, injuries, fatalities, property damage in dollars), not free-form prose. With 14,770 rows but only 5,796 unique values and a 60.8% duplicate rate, the text is highly formulaic — the single string 'Magnitude 0; $2.5M property damage' alone appears 1,055 times. Language detection flags 5 French, 1 Japanese, and 10 Norwegian rows against 4,984 English, almost certainly false positives from numeric/currency tokens rather than real multilingual content; mean Flesch of 29.9 reflects the terse template, not difficult prose.

Treatment: Parse with regex into structured fields (magnitude, injuries, fatalities, damage_usd) rather than treating as free text.

anthropic:claude-opus-4-7 · confidence high
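
A cautious sketch of the regex parsing suggested above. The only template fragment the report quotes verbatim is 'Magnitude 0; $2.5M property damage', so the code recognises just those two fragment shapes and collects everything else under `unparsed` instead of guessing how injuries or fatalities are worded. All names here are hypothetical.

```python
import re
from typing import Any, Dict

MAGNITUDE_RE = re.compile(r"^Magnitude (?P<magnitude>\S+)$")
DAMAGE_RE = re.compile(r"^\$(?P<damage>\S+) property damage$")

def parse_description(text: str) -> Dict[str, Any]:
    """Split on ';' and pull out the two fragment shapes confirmed by the report."""
    parsed: Dict[str, Any] = {"magnitude": None, "damage_property": None, "unparsed": []}
    for fragment in (part.strip() for part in text.split(";")):
        if not fragment:
            continue
        magnitude = MAGNITUDE_RE.match(fragment)
        damage = DAMAGE_RE.match(fragment)
        if magnitude:
            parsed["magnitude"] = magnitude.group("magnitude")
        elif damage:
            parsed["damage_property"] = damage.group("damage")
        else:
            parsed["unparsed"].append(fragment)  # unknown wording: keep for inspection
    return parsed

print(parse_description("Magnitude 0; $2.5M property damage"))
```

Counting the most frequent `unparsed` fragments on the real data would show which further patterns are worth adding.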
Out[23]:

saturn.columns["description"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 5,796
len_min 3
len_max 259
len_mean 50.09
len_median 36
len_p95 166
word_mean 7.393
word_median 5
n_empty 0
n_duplicates 8,974
duplicate_rate 0.6076
vocab_size 4,289
readability_flesch_mean 29.86
emoji_rate 0
url_rate 0
one_word_rate 0.0002708
allcaps_rate 0.0002708
boilerplate_rate 0
alert: multilingual · 5 languages detected in sample
alert: duplicates · 60.8% duplicate strings
Fig 12.
Character-length distribution for description.
Character-length distribution for description (mean: 50.09).
chars      count
3 – 9          4
9 – 16        13
16 – 22     3101
22 – 29     1689
29 – 35     2230
35 – 41     1186
41 – 48      572
48 – 54     1589
54 – 61     2168
61 – 67      902
67 – 73        7
73 – 80        6
80 – 86       14
86 – 93       10
93 – 99       15
99 – 105      25
105 – 112     30
112 – 118     30
118 – 125     34
125 – 131     47
131 – 137     71
137 – 144     64
144 – 150     52
150 – 157     77
157 – 163     60
163 – 169     74
169 – 176     66
176 – 182     65
182 – 189     54
189 – 195     69
195 – 201     82
201 – 208     70
208 – 214     72
214 – 221     78
221 – 227     64
227 – 233     29
233 – 240     20
240 – 246     11
246 – 253     12
253 – 259      8

category categorical metadata

This column is a constant categorical tag identifying the dataset source: every one of the 14770 rows holds the single value "significant_us_storms". With cardinality 1, entropy 0, and top_rate 1.0, it carries no information for modelling and only serves as a provenance marker.

Treatment: Drop before modelling; retain only as a source tag if merging with other datasets.

anthropic:claude-opus-4-7 · confidence high
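
Constant columns like this one can be found mechanically before modelling. The sketch below assumes the source JSON loads as a list of flat records with hashable values, which the report does not state; the two sample rows and the `constant_columns` helper are made up for illustration.

```python
from typing import Dict, List

def constant_columns(rows: List[Dict[str, object]]) -> List[str]:
    """Return the columns that hold exactly one distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row.get(col) for row in rows}) == 1]

# Made-up sample records using column names and values from this report.
rows = [
    {"category": "significant_us_storms", "country": "USA", "event_type": "Tornado"},
    {"category": "significant_us_storms", "country": "USA", "event_type": "Hail"},
]
print(constant_columns(rows))  # ['category', 'country']
```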
Out[26]:

saturn.columns["category"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 1
top_value significant_us_storms
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 13.
Top values for category.
Top values for category (1 unique shown, of 1 total).
value                  count  share
significant_us_storms  14770  100.0%

date text timestamp

This is a date column stored as ISO-formatted text (YYYY-MM-DD), with every one of 14770 rows exactly 10 characters and a single token. Values span at least 1965 to 2021, and duplicates are heavy: 9712 rows (65.8%) repeat, with 1974-04-03 appearing 126 times and 2011-04-27 appearing 105 times, suggesting clustering around specific event dates rather than a uniform timeline. The 'allcaps' flag is a false positive since digits and hyphens trigger it.

Treatment: Parse to a date dtype and use for temporal grouping or joins.

anthropic:claude-opus-4-7 · confidence high
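
Since every value is a 10-character ISO date, the standard library can parse it directly. A small sketch of the temporal grouping mentioned in the treatment; the three sample strings reuse dates quoted above, and `events_per_year` is an illustrative helper.

```python
from collections import Counter
from datetime import date
from typing import Iterable

def events_per_year(date_strings: Iterable[str]) -> Counter:
    """Parse YYYY-MM-DD strings and count events per calendar year."""
    return Counter(date.fromisoformat(s).year for s in date_strings)

sample = ["1974-04-03", "1974-04-03", "2011-04-27"]
print(events_per_year(sample))  # Counter({1974: 2, 2011: 1})
```

`date.fromisoformat` raises `ValueError` on malformed input, so a failure here would flag rows that break the reported format.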
Out[29]:

saturn.columns["date"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 5,058
len_min 10
len_max 10
len_mean 10
len_median 10
len_p95 10
word_mean 1
word_median 1
n_empty 0
n_duplicates 9,712
duplicate_rate 0.6575
vocab_size 5,058
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: one_word · 100.0% rows are a single word
alert: allcaps · 100.0% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
alert: duplicates · 65.8% duplicate strings
Fig 14.
Character-length distribution for date.
Character-length distribution for date (mean: 10.0): all 14,770 values are exactly 10 characters long.

country categorical metadata

This column records the country of origin for each record, but every one of the 14770 rows holds the value "USA". Cardinality is 1 and entropy is 0, so the field carries no information.

Treatment: Drop; constant column with zero variance.

anthropic:claude-opus-4-7 · confidence high
Out[32]:

saturn.columns["country"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 1
top_value USA
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 15.
Top values for country.
Top values for country (1 unique shown, of 1 total).
value  count  share
USA    14770  100.0%

event_type categorical label

Categorical label of severe-weather event types across 14,770 rows with no nulls and 17 distinct categories. Tornado dominates at 42.9% (6,334 rows), followed by Flash Flood, Thunderstorm Wind, Flood, and Hail; entropy ratio of 0.57 confirms the distribution is concentrated in a few classes. The long tail (Heavy Rain, Marine Strong Wind, Debris Flow, Marine Thunderstorm Wind) has very thin support, which will hurt per-class modelling.

Treatment: Use as categorical target or feature; consider grouping rare classes given the heavy Tornado skew.

anthropic:claude-opus-4-7 · confidence high
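
Grouping rare classes, as the treatment suggests, can be sketched from the published counts alone. The threshold of 100 rows and the label "Other" are arbitrary illustrative choices, not recommendations from the report.

```python
from collections import Counter
from typing import Dict

# Counts as listed for event_type in this report.
EVENT_COUNTS = {
    "Tornado": 6334, "Flash Flood": 2358, "Thunderstorm Wind": 2257, "Flood": 1777,
    "Hail": 1246, "Lightning": 574, "Heavy Rain": 99, "Marine Strong Wind": 43,
    "Debris Flow": 43, "Marine Thunderstorm Wind": 25, "Marine High Wind": 5,
    "Dust Devil": 3, "Waterspout": 2, "Tropical Storm": 1, "High Wind": 1,
    "Heat": 1, "Marine Lightning": 1,
}

def group_rare(counts: Dict[str, int], min_count: int, other_label: str = "Other") -> Counter:
    """Fold every class with fewer than min_count rows into a single bucket."""
    grouped: Counter = Counter()
    for label, n in counts.items():
        grouped[label if n >= min_count else other_label] += n
    return grouped

grouped = group_rare(EVENT_COUNTS, min_count=100)
print(len(grouped), "classes; Other =", grouped["Other"])  # 7 classes; Other = 224
```

With this threshold the eleven thinnest classes collapse into one bucket of 224 rows (about 1.5% of the data), leaving six well-supported classes.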
Out[35]:

saturn.columns["event_type"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 17
top_value Tornado
top_rate 0.4288
cardinality 17
entropy 2.336
entropy_ratio 0.5715
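
The report does not define entropy or entropy_ratio. Assuming Shannon entropy in bits, normalised by log2 of the cardinality, the two rows above can be recomputed from the 17 event_type counts listed in this report; under that assumption the results agree with the reported 2.336 and 0.5715 to within rounding.

```python
import math
from typing import Sequence

# The 17 event_type counts from this report, largest first.
EVENT_TYPE_COUNTS = [6334, 2358, 2257, 1777, 1246, 574, 99, 43, 43, 25, 5, 3, 2, 1, 1, 1, 1]

def shannon_entropy_bits(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of the distribution implied by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

entropy = shannon_entropy_bits(EVENT_TYPE_COUNTS)
entropy_ratio = entropy / math.log2(len(EVENT_TYPE_COUNTS))  # 1.0 would mean a uniform spread
print(round(entropy, 3), round(entropy_ratio, 4))
```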
Fig 16.
Top values for event_type.
Top values for event_type (17 unique shown, of 17 total).
value                     count  share
Tornado                    6334  42.9%
Flash Flood                2358  16.0%
Thunderstorm Wind          2257  15.3%
Flood                      1777  12.0%
Hail                       1246   8.4%
Lightning                   574   3.9%
Heavy Rain                   99   0.7%
Marine Strong Wind           43   0.3%
Debris Flow                  43   0.3%
Marine Thunderstorm Wind     25   0.2%
Marine High Wind              5   0.0%
Dust Devil                    3   0.0%
Waterspout                    2   0.0%
Tropical Storm                1   0.0%
High Wind                     1   0.0%
Heat                          1   0.0%
Marine Lightning              1   0.0%

state categorical feature

US state names in uppercase, with 65 distinct values across 14,770 complete rows. That is more than the 50 states, suggesting that territories, marine zones (several event types are Marine categories), military codes, or 'UNKNOWN'-style entries are mixed in. The distribution is broad (entropy ratio 0.86), with Texas leading at 9.8% (1,450 rows), followed by Missouri, Arkansas, Mississippi, and Georgia, indicating a southern/central US tilt.

Treatment: Normalize to standard state codes and one-hot or target-encode; investigate the 15 or more categories beyond the 50 states.

anthropic:claude-opus-4-7 · confidence high
Out[38]:

saturn.columns["state"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 65
top_value TEXAS
top_rate 0.09817
cardinality 65
entropy 5.182
entropy_ratio 0.8605
Fig 17.
Top values for state.
Top values for state (20 unique shown, of 65 total).
value           count  share
TEXAS            1450   9.8%
MISSOURI          648   4.4%
ARKANSAS          602   4.1%
MISSISSIPPI       570   3.9%
GEORGIA           562   3.8%
ILLINOIS          560   3.8%
IOWA              527   3.6%
LOUISIANA         507   3.4%
TENNESSEE         499   3.4%
FLORIDA           498   3.4%
OKLAHOMA          490   3.3%
NEBRASKA          486   3.3%
ALABAMA           469   3.2%
WISCONSIN         463   3.1%
OHIO              441   3.0%
MICHIGAN          426   2.9%
NORTH CAROLINA    422   2.9%
KANSAS            418   2.8%
INDIANA           408   2.8%
KENTUCKY          383   2.6%

magnitude categorical feature

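Numeric magnitudes stored as strings, with 170 distinct values ranging from '0' to two-decimal figures like '70.00'. Formatting is inconsistent: '70' and '70.00', '50' and '50.00', '60' and '60.00', and '61' and '61.00' all appear among the top values, so the number of distinct numeric magnitudes is lower than 170. Over half the rows (51.78%) are null, and of the non-nulls the value '0' dominates at 54.24% (3,863 of 7,122), leaving real magnitudes in a minority of records (3,259 rows, about 22%). The top values also split into a small-number group (1.50 to 3.00) and a large-number group (50 to 87), which suggests different units by event type, plausibly hail size in inches versus wind speed in knots, so magnitudes should only be compared within one event type. Entropy ratio of 0.48 confirms most signal is concentrated in a few buckets.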

Treatment: Cast to numeric, treat '0' and nulls as likely missing/sentinel, and consider a presence flag before modelling.

anthropic:claude-opus-4-7 · confidence high
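
A minimal sketch of the cast-plus-presence-flag treatment. Treating '0' as a sentinel is the report's suggestion, not an established fact about the data, so it is exposed as a switch; `parse_magnitude` and `zero_is_missing` are hypothetical names.

```python
from typing import Optional, Tuple

def parse_magnitude(raw: Optional[str], zero_is_missing: bool = True) -> Tuple[Optional[float], bool]:
    """Return (value, present). Nulls, blanks and, optionally, zero count as not present."""
    if raw is None or not raw.strip():
        return None, False
    value = float(raw)  # '70' and '70.00' collapse to the same number here
    if zero_is_missing and value == 0:
        return None, False
    return value, True

for raw in (None, "0", "1.75", "70", "70.00"):
    print(repr(raw), "->", parse_magnitude(raw))
```

Keeping the boolean alongside the numeric value lets a model use "magnitude was recorded" as a feature even when the value itself is imputed.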
Out[41]:

saturn.columns["magnitude"].stats

stat value
n 14,770
nulls 7,648 (51.8%)
unique 170
top_value 0
top_rate 0.5424
cardinality 170
entropy 3.586
entropy_ratio 0.484
alert: null_rate · 51.8% null
Fig 18.
Top values for magnitude.
Top values for magnitude (20 unique shown, of 170 total). Shares are of all 14,770 rows, nulls included.
value  count  share
0       3863  26.2%
1.75     383   2.6%
2.75     220   1.5%
70.00    162   1.1%
50.00    151   1.0%
2.00     150   1.0%
2.50     123   0.8%
61.00    122   0.8%
65.00    104   0.7%
52.00     95   0.6%
78.00     80   0.5%
70        79   0.5%
3.00      77   0.5%
56.00     76   0.5%
87.00     65   0.4%
60.00     63   0.4%
50        59   0.4%
60        54   0.4%
1.50      50   0.3%
61        47   0.3%

injuries categorical feature

This is an injury count per record, stored as strings but numeric in nature (the most frequent values are small integers such as '0', '1', and '2'). The distribution is dominated by zeros: 68.1% of 14,770 rows report '0' injuries, with 893 at '1' and a long tail spanning 178 distinct values. Entropy ratio of 0.33 confirms heavy concentration at the low end.

Treatment: Cast to integer and consider log1p or zero-inflated modelling given the heavy zero mass.

anthropic:claude-opus-4-7 · confidence high
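
The log1p option from the treatment, sketched with the standard library. It assumes every value parses as a non-negative integer, which the top values suggest but the report does not guarantee for all 178 distinct strings; `injuries_log1p` is an illustrative helper.

```python
import math

def injuries_log1p(raw: str) -> float:
    """Cast an injury count string to int and apply log(1 + x), which maps 0 to 0."""
    return math.log1p(int(raw))

for raw in ("0", "1", "12", "30"):
    print(raw, "->", round(injuries_log1p(raw), 3))
```

The transform compresses the long tail while leaving the large block of zeros at exactly zero, which keeps a separate "any injury" indicator easy to derive.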
Out[44]:

saturn.columns["injuries"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 178
top_value 0
top_rate 0.6814
cardinality 178
entropy 2.468
entropy_ratio 0.3301
Fig 19.
Top values for injuries.
Top values for injuries (20 unique shown, of 178 total).
value  count  share
0      10064  68.1%
1        893   6.0%
2        552   3.7%
3        343   2.3%
4        236   1.6%
5        234   1.6%
10       219   1.5%
6        196   1.3%
12       158   1.1%
7        134   0.9%
8        121   0.8%
20       114   0.8%
15       111   0.8%
11        90   0.6%
9         85   0.6%
13        70   0.5%
14        69   0.5%
30        68   0.5%
25        56   0.4%
16        48   0.3%

fatalities categorical numeric_target

Counts of fatalities per event, stored as strings with 49 distinct values across 14,770 rows and no nulls. The distribution is heavily zero-inflated: 69.1% of records are "0" and the next bucket "1" covers 3,208 rows, leaving a long thin tail (e.g., 25 rows at "9", 24 at "10"). The low entropy ratio (0.25) reflects that most of the mass sits in the "0" and "1" buckets (90.8% combined).

Treatment: Cast to integer and consider modelling as a count (Poisson/negative binomial) or binarise to any-fatality given the zero inflation.

anthropic:claude-opus-4-7 · confidence high
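
The binarise option from the treatment is a one-liner, and the share of positive rows follows from the counts already in this report. A short sketch with hypothetical names:

```python
def any_fatality(raw: str) -> int:
    """Map a fatality count string to 1 if at least one death was recorded, else 0."""
    return int(int(raw) > 0)

# From this report: 10,209 of 14,770 rows have the value "0".
zero_rows, total_rows = 10209, 14770
positive_share = 1 - zero_rows / total_rows
print(f"{positive_share:.3f}")  # 0.309
```

Roughly 31% positives is a moderate class imbalance, so a binary any-fatality target is workable without heavy resampling.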
Out[47]:

saturn.columns["fatalities"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 49
top_value 0
top_rate 0.6912
cardinality 49
entropy 1.423
entropy_ratio 0.2535
Fig 20.
Top values for fatalities.
Top values for fatalities (20 unique shown, of 49 total).
value  count  share
0      10209  69.1%
1       3208  21.7%
2        649   4.4%
3        222   1.5%
4        112   0.8%
5         74   0.5%
6         66   0.4%
7         38   0.3%
9         25   0.2%
10        24   0.2%
8         21   0.1%
11        20   0.1%
13        11   0.1%
16        10   0.1%
12         9   0.1%
14         8   0.1%
17         6   0.0%
20         6   0.0%
25         4   0.0%
23         3   0.0%

damage_property text feature

This column encodes property damage as short magnitude strings with a K/M suffix (e.g., '2.5M', '250K', '0.00K'), not free text: every value is one word and the maximum length is 8. Formatting is inconsistent: '1M' and '1.00M' coexist, and '0.00K' appears 1,229 times alongside 368 empty entries, so true zeros need to be kept apart from missing values. With a 93.1% duplicate rate across only 1,014 unique values (vocab_size reports 1,013, presumably because the empty string is not counted as a token), this is a coarse categorical-looking encoding of a numeric quantity.

Treatment: Parse the K/M suffix into a numeric dollar amount and treat empty strings as missing (distinct from 0.00K).

anthropic:claude-opus-4-7 · confidence high
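
The suffix parsing described in the treatment could look like the sketch below. The report shows only K and M suffixes; the B (billions) entry, the case-insensitive handling, and the fallback for values with no suffix are assumptions added for robustness, and `parse_damage` is a hypothetical helper.

```python
from typing import Optional

# K and M appear in the report; B is an assumption in case larger amounts exist.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_damage(raw: str) -> Optional[float]:
    """Convert '2.5M' style strings to dollars; empty strings become None (missing)."""
    text = raw.strip().upper()
    if not text:
        return None  # missing, deliberately distinct from an explicit '0.00K'
    if text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)  # no suffix: assume the value is already in dollars

for raw in ("2.5M", "250K", "0.00K", "1M", "1.00M", ""):
    print(repr(raw), "->", parse_damage(raw))
```

After parsing, '1M' and '1.00M' map to the same number, which removes the formatting inconsistency noted above. A string that is only a suffix, or otherwise not numeric, raises `ValueError` and should be inspected instead of silently coerced.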
Out[50]:

saturn.columns["damage_property"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 1,014
len_min 0
len_max 8
len_mean 4.381
len_median 5
len_p95 7
word_mean 1
word_median 1
n_empty 368
n_duplicates 13,756
duplicate_rate 0.9313
vocab_size 1,013
readability_flesch_mean 117
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 0.8724
boilerplate_rate 0
alert: one_word · 100.0% rows are a single word
alert: allcaps · 87.2% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
alert: duplicates · 93.1% duplicate strings
Fig 21.
Character-length distribution for damage_property.
Character-length distribution for damage_property (mean: 4.381). Empty histogram bins between whole-number lengths are omitted.
chars  count
0        368
1        264
2       1252
3       1172
4       3414
5       6075
6       1450
7        514
8        261

source categorical metadata

This column records the data provenance, holding the constant string "NOAA Storm Events Database" for all 14770 rows. With cardinality of 1 and entropy of 0.0, it carries no information for modelling and only serves as a dataset-level annotation.

Treatment: Drop before modelling; retain in documentation as a provenance tag.

anthropic:claude-opus-4-7 · confidence high
Out[53]:

saturn.columns["source"].stats

stat value
n 14,770
nulls 0 (0.0%)
unique 1
top_value NOAA Storm Events Database
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 22.
Top values for source.
Top values for source (1 unique shown, of 1 total).
value                       count  share
NOAA Storm Events Database  14770  100.0%

How to cite


BibTeX
@misc{saturn-natural-hazards-storms-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: natural hazards storms},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/natural_hazards-storms}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: natural hazards storms. Source: /home/coolhand/html/datavis/data_trove/data/natural_hazards/storms.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/natural_hazards-storms