saturn

/home/coolhand/html/datavis/data_trove/data/quirky/tornadoes.json 70,022 rows sample n=70,022 seed 42 2026-05-01T23:12:11+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/tornadoes.json
Total rows: 70,022
Profiled sample: 70,022
Columns: 13
Generated: 2026-05-01T23:12:11+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This is a tornado event log with 70,022 rows and 13 columns covering dates, times, start/end coordinates, magnitudes, lengths, widths, losses, fatalities, injuries, and U.S. state. Geographically it is a U.S.-centered dataset: starting longitudes average around -92.7 and latitudes around 37.1, with Texas (13.3% of records), Kansas, and Oklahoma leading the state counts. The severity fields are highly imbalanced (fatalities are 0 in 97.7% of events and injuries are 0 in 88.8%), so any analysis of harm should focus on the rare non-zero tail. Magnitude (mag) is a more usable categorical signal with 7 levels, dominated by 0 (46%) and 1 (34%). Note that the end-coordinate columns (elat, elon) are null in ~37.6% of rows, which matters if you plan to draw tornado tracks rather than just start points.

date high anthropic:claude-opus-4-7

This is a date column stored as ISO-formatted text (YYYY-MM-DD), with every value exactly 10 characters long and a single token. Across 70,022 rows there are only 12,639 unique dates and an 81.9% duplicate rate, so many records share dates; the top value, 2011-04-27, appears 207 times. Values run from at least 1950-05-16 (the earliest of the sample values listed in this report) to 2023-03-31. The uniform format suggests the column was misclassified as text and should be parsed as a date type.
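A minimal sketch of the suggested parse, using only the Python standard library. The function name `parse_iso_dates` is made up for illustration; the input is the ten sample values this report lists for the column, not the full data.

```python
from datetime import date

def parse_iso_dates(values):
    """Convert YYYY-MM-DD strings to datetime.date objects.

    date.fromisoformat raises ValueError on anything malformed, which
    is a cheap way to confirm the column really is uniform ISO text.
    """
    return [date.fromisoformat(v) for v in values]

# The first ten sample values shown for the date column in this report.
sample_dates = [
    "1950-05-16", "2009-06-27", "2016-05-26", "2013-06-21", "1990-09-09",
    "2022-04-05", "1991-08-15", "2018-04-15", "2013-07-01", "1956-04-03",
]

parsed_dates = parse_iso_dates(sample_dates)
earliest, latest = min(parsed_dates), max(parsed_dates)
print(earliest, latest)
```

Once parsed, min/max and year or month grouping work directly, without string comparison.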

time high anthropic:claude-opus-4-7

This column holds clock times stored as 8-character HH:MM:SS strings, with all 70,022 rows non-null and uniform length (len_min=len_max=8). Only 1,438 distinct values appear and 97.95% are duplicates, with afternoon/evening slots like 16:00:00 (978), 17:00:00 (971) and 15:00:00 (959) dominating — consistent with event start times rather than free text. It's mistyped as text: parse to a proper time type before use.
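As a sketch of what parsing enables, the snippet below converts HH:MM:SS strings to `datetime.time` and tallies events by hour. The helper name `hour_histogram` is hypothetical, and the input is only the ten sample values listed in this report, so the counts do not reproduce the full-column figures quoted above.

```python
from collections import Counter
from datetime import time

def hour_histogram(values):
    """Count HH:MM:SS strings by their hour component."""
    return Counter(time.fromisoformat(v).hour for v in values)

# The first ten sample values shown for the time column in this report.
sample_times = [
    "18:00:00", "13:55:00", "19:01:00", "12:58:00", "14:45:00",
    "07:14:00", "15:35:00", "16:56:00", "18:41:00", "19:45:00",
]

by_hour = hour_histogram(sample_times)
for hour, n in sorted(by_hour.items()):
    print(f"{hour:02d}:00  {n}")
```

The report does not say which time zone these values use, so hour-of-day comparisons across states should be treated with care.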

state high anthropic:claude-opus-4-7

This is a US state code column with 53 distinct values, slightly more than the 50 states (likely including DC and territories like PR). TX dominates at 13.3% of 70,022 rows, followed by KS (4,474) and OK (4,221), giving a clear southern/plains skew rather than a uniform national distribution. Entropy ratio of 0.847 confirms the distribution is fairly spread but not flat, and there are no nulls.
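The entropy figures in this report appear consistent with Shannon entropy in bits divided by log2 of the cardinality; for state, 4.851 / log2(53) ≈ 0.847. The sketch below assumes that definition (the report does not state saturn's formula) and checks it against the mag column, the only categorical column for which the report lists every value count. The function names are illustrative.

```python
import math

def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k categories."""
    k = len(counts)
    if k <= 1:
        return 0.0  # a single category carries no information
    return entropy_bits(counts) / math.log2(k)

# All seven mag counts as listed in this report (0, 1, 2, 3, -9, 4, 5).
mag_counts = [32218, 23782, 9767, 2585, 1024, 587, 59]

mag_entropy = entropy_bits(mag_counts)
mag_ratio = entropy_ratio(mag_counts)
print(round(mag_entropy, 3), round(mag_ratio, 3))
```

A ratio near 1 means the categories are close to uniformly used; a ratio near 0 means one value dominates.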

mag high anthropic:claude-opus-4-7

`mag` is a low-cardinality categorical with 7 distinct values dominated by '0' (46% of 70,022 rows) and decreasing counts through '1', '2', '3', '4', '5'. The ordered integer levels suggest a magnitude or severity code rather than a free category. The standout signal is '-9' (1,024 rows), almost certainly a sentinel for missing/unknown that is not being counted as null.
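A small sketch of treating the sentinel as missing before any numeric use. That '-9' means "unknown" is the narrative's inference, not something the data confirms, so it is kept as a named constant; `parse_mag` is a hypothetical helper.

```python
MISSING_MAG = "-9"  # assumption: '-9' encodes missing/unknown magnitude

def parse_mag(token):
    """Return the magnitude as an int, or None for the assumed sentinel."""
    token = token.strip()
    if token == MISSING_MAG:
        return None
    return int(token)

# Value counts for mag as listed in this report.
mag_value_counts = {
    "0": 32218, "1": 23782, "2": 9767, "3": 2585,
    "-9": 1024, "4": 587, "5": 59,
}

missing_rate = mag_value_counts[MISSING_MAG] / sum(mag_value_counts.values())
print(f"effective null rate: {missing_rate:.2%}")
```

Under that assumption the column's effective null rate is about 1.5% rather than the 0.0% shown in its stats.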

injuries high anthropic:claude-opus-4-7

Counts of injuries stored as strings, with 209 distinct values across 70,022 rows and no nulls. The distribution is severely zero-inflated: '0' accounts for 88.8% of records, and the entropy ratio is just 0.123. Tail values like '1' through '10' decay quickly, but 209 unique tokens suggest a very long tail or non-integer entries worth inspecting.
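One way to run that inspection is to list the distinct tokens that are not plain non-negative integers before casting. This is a sketch: `non_integer_tokens` is a hypothetical helper and the example values are invented for illustration, not taken from the dataset.

```python
def non_integer_tokens(values):
    """Return the distinct tokens that are not plain ASCII digit strings."""
    suspicious = set()
    for v in values:
        token = v.strip()
        # isascii() guards against Unicode digits that int() would reject.
        if not (token.isascii() and token.isdigit()):
            suspicious.add(token)
    return sorted(suspicious)

# Invented example values; the real column has 209 distinct tokens.
example_injuries = ["0", "1", "12", "3.0", "", "-9", "0"]

flagged = non_integer_tokens(example_injuries)
print(flagged)
```

An empty result on the real column would mean the 209 distinct values are simply a long integer tail, and a direct `int` cast is safe.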

fatalities high anthropic:claude-opus-4-7

This is a fatality count per event, stored as a categorical/string field with 50 distinct integer values and no nulls across 70,022 rows. The distribution is extremely imbalanced: '0' accounts for 97.72% of records, giving an entropy ratio of just 0.039, with '1' at 830 rows and a long thin tail (2 → 277, 3 → 134, down to single-digit counts at higher values). Despite being typed as categorical, the values are numeric and ordered, so the current encoding is likely a load artifact.

loss high anthropic:claude-opus-4-7

Numeric loss values stored as text strings, with all 70,022 entries being single tokens (one_word_rate 1.0) and lengths of 1-10 characters. The distribution is heavily concentrated: '0.0' alone accounts for 22,764 rows and the duplicate_rate is 0.985 across only 1,019 unique values. Mixed formatting is a hazard — '0' and '0.0' appear as separate tokens, so a naive cast will collapse them but string-based grouping won't.
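The formatting hazard can be seen on the ten sample values this report lists for loss. The sketch below groups them once as strings and once after a float cast; the variable names are illustrative.

```python
from collections import Counter

# The first ten sample values shown for the loss column in this report.
sample_loss = ["3.0", "0.0", "0", "0.0", "3.0", "0", "5.0", "100000", "0.0", "4.0"]

# Grouping on the raw strings keeps '0' and '0.0' apart.
by_string = Counter(sample_loss)

# Casting first merges them into a single numeric zero.
by_number = Counter(float(v) for v in sample_loss)

print(by_string["0"], by_string["0.0"])  # separate string buckets
print(by_number[0.0])                    # one merged numeric bucket
```

Which behaviour is right depends on whether the two spellings carry meaning (for example, different source eras or units); the report does not say, so it is worth checking before casting.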

slat high anthropic:claude-opus-4-7

Values range from 17.72 to 61.02 with a mean of 37.14 and median of 37.03, consistent with a starting latitude (slat) field in decimal degrees. The bulk of the values sit at contiguous-US latitudes (q1 33.19, q3 40.93), but the extremes extend beyond them: the minimum is near Caribbean latitudes and the maximum near Alaskan ones. The distribution is essentially symmetric (skew 0.04, kurtosis -0.58) with a tight IQR of 7.74 and only 70 outliers (0.10%). No nulls and no zeros, and 16,016 unique values across 70,022 rows suggest repeated coordinates rather than free-form noise.
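The report does not say how saturn flags outliers. A common choice is Tukey's rule (values more than 1.5 × IQR beyond the quartiles), and the sketch below applies it to the slat quartiles from this report purely as an assumption, to show roughly where such fences would sit.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences under Tukey's k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, low, high):
    """True if the value lies outside the fences."""
    return value < low or value > high

# slat quartiles as reported: q1 = 33.190, q3 = 40.930.
low, high = tukey_fences(33.190, 40.930)
print(round(low, 2), round(high, 2))

# The reported min and max both fall outside these fences.
print(is_outlier(17.721, low, high), is_outlier(61.020, low, high))
```

Under this assumed rule the fences are about 21.58 and 52.54 degrees, so the flagged rows would be the far-southern and far-northern events.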

slon high anthropic:claude-opus-4-7

Values are negative decimal degrees ranging from -163.53 to -64.72 with a median of -93.5, consistent with western-hemisphere longitudes (the 'slon' name suggests starting longitude). The distribution is mildly left-skewed (-0.32), with an IQR of ~11.7 degrees spanning the central US longitude band. Only 951 values (1.36%) are flagged as outliers, and there are no nulls or zeros.

elat high anthropic:claude-opus-4-7

Almost certainly the ending latitude in decimal degrees (the counterpart of slat), with values spanning 17.72 to 61.02, consistent with locations across North America. The distribution is roughly symmetric (skew 0.03, kurtosis -0.41) and centered near 37.26, with only 78 outliers. The standout concern is that 37.65% of rows are null, so coverage is partial.

elon high anthropic:claude-opus-4-7

This is almost certainly the ending longitude (the counterpart of slon), with values bounded between -163.53 and -64.72 and a median of -92.47, consistent with points across North America. The distribution is moderately left-skewed (-0.60) with kurtosis 3.77 and an IQR of 11.26. Notably, 37.65% of rows are null, which will materially shrink any geo-based analysis.
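For track drawing, only rows with both end coordinates are usable. The sketch below assumes each record is a dict with keys `slat`, `slon`, `elat`, `elon` and that JSON nulls load as `None`; the example rows and the `complete_tracks` helper are invented for illustration.

```python
def complete_tracks(rows):
    """Keep events with both end coordinates as ((slat, slon), (elat, elon))."""
    tracks = []
    for r in rows:
        if r.get("elat") is None or r.get("elon") is None:
            continue  # start point only; no track can be drawn
        tracks.append(((r["slat"], r["slon"]), (r["elat"], r["elon"])))
    return tracks

# Invented rows: one full track and one start-only event.
example_rows = [
    {"slat": 35.2, "slon": -97.4, "elat": 35.3, "elon": -97.2},
    {"slat": 38.9, "slon": -94.6, "elat": None, "elon": None},
]

tracks = complete_tracks(example_rows)
print(len(tracks))

# From the report: 70,022 rows with 26,363 null end coordinates.
usable_rows = 70022 - 26363
print(usable_rows)
```

With the reported null counts, at most 43,659 rows can contribute a track, assuming elat and elon are null on the same rows.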

len high anthropic:claude-opus-4-7

Despite being typed as text, `len` holds short numeric tokens (length 3-8, one word each) like '0.1', '0.5', '1.0' — almost certainly a length or size measurement stored as strings. Values are highly concentrated: '0.1' alone covers 15,456 of 70,022 rows and the duplicate rate is 94.8% across only 3,663 unique tokens. The allcaps and one_word alerts are artefacts of numeric strings rather than real text signal.

wid high anthropic:claude-opus-4-7

wid is a categorical column with 419 distinct values, all numeric-looking strings like '10', '50', '100', '30'; it is almost certainly a width measurement stored as text rather than a true category. The distribution is heavily concentrated on round numbers: '10' alone covers 20.6% of 70,022 rows and the top ten values are all multiples of 5, giving an entropy ratio of 0.51. There are no nulls, but the string encoding of clearly numeric quantities is the surprise.

Numeric correlation

date text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars; 81.9% duplicate strings
rows: 70,022
null: 0 (0.0%)
unique: 12,639
len_min: 10
len_max: 10
len_mean: 10.000
len_median: 10.000
len_p95: 10.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 57,383
duplicate_rate: 0.819
vocab_size: 7,831
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 1950-05-16
  2. 2009-06-27
  3. 2016-05-26
  4. 2013-06-21
  5. 1990-09-09
  6. 2022-04-05
  7. 1991-08-15
  8. 2018-04-15
  9. 2013-07-01
  10. 1956-04-03

time text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars; 97.9% duplicate strings
rows: 70,022
null: 0 (0.0%)
unique: 1,438
len_min: 8
len_max: 8
len_mean: 8.000
len_median: 8.000
len_p95: 8.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 68,584
duplicate_rate: 0.979
vocab_size: 1,352
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 18:00:00
  2. 13:55:00
  3. 19:01:00
  4. 12:58:00
  5. 14:45:00
  6. 07:14:00
  7. 15:35:00
  8. 16:56:00
  9. 18:41:00
  10. 19:45:00

state categorical

rows: 70,022
null: 0 (0.0%)
unique: 53
top_value: TX
top_rate: 0.133
cardinality: 53
entropy: 4.851
entropy_ratio: 0.847
Top values (rank 1–20)
  1. TX — 9,345
  2. KS — 4,474
  3. OK — 4,221
  4. FL — 3,620
  5. NE — 3,056
  6. IA — 2,887
  7. IL — 2,835
  8. MS — 2,657
  9. AL — 2,529
  10. MO — 2,462
  11. CO — 2,425
  12. LA — 2,305
  13. MN — 2,118
  14. AR — 1,981
  15. SD — 1,917
  16. GA — 1,898
  17. ND — 1,640
  18. IN — 1,610
  19. WI — 1,515
  20. NC — 1,472

mag categorical

rows: 70,022
null: 0 (0.0%)
unique: 7
top_value: 0
top_rate: 0.460
cardinality: 7
entropy: 1.772
entropy_ratio: 0.631
Top values (rank 1–20)
  1. 0 — 32,218
  2. 1 — 23,782
  3. 2 — 9,767
  4. 3 — 2,585
  5. -9 — 1,024
  6. 4 — 587
  7. 5 — 59

injuries categorical

rows: 70,022
null: 0 (0.0%)
unique: 209
top_value: 0
top_rate: 0.888
cardinality: 209
entropy: 0.945
entropy_ratio: 0.123
Top values (rank 1–20)
  1. 0 — 62,177
  2. 1 — 2,480
  3. 2 — 1,388
  4. 3 — 770
  5. 4 — 484
  6. 5 — 385
  7. 6 — 300
  8. 7 — 194
  9. 8 — 171
  10. 10 — 141
  11. 12 — 120
  12. 9 — 117
  13. 11 — 80
  14. 20 — 71
  15. 15 — 69
  16. 13 — 66
  17. 14 — 55
  18. 30 — 46
  19. 25 — 44
  20. 16 — 43

fatalities categorical

top value is 97.7% of rows
rows: 70,022
null: 0 (0.0%)
unique: 50
top_value: 0
top_rate: 0.977
cardinality: 50
entropy: 0.217
entropy_ratio: 0.039
Top values (rank 1–20)
  1. 0 — 68,423
  2. 1 — 830
  3. 2 — 277
  4. 3 — 134
  5. 4 — 77
  6. 5 — 46
  7. 6 — 45
  8. 7 — 32
  9. 9 — 15
  10. 10 — 15
  11. 11 — 14
  12. 8 — 13
  13. 16 — 12
  14. 13 — 8
  15. 17 — 7
  16. 18 — 6
  17. 21 — 6
  18. 12 — 6
  19. 22 — 5
  20. 25 — 4

loss text

100.0% rows are a single word; 92.5% rows are all-caps; 95th-percentile length under 20 chars; 98.5% duplicate strings
rows: 70,022
null: 0 (0.0%)
unique: 1,019
len_min: 1
len_max: 10
len_mean: 3.181
len_median: 3.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 69,003
duplicate_rate: 0.985
vocab_size: 503
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.925
boilerplate_rate: 0.000
Sample values (first 10)
  1. 3.0
  2. 0.0
  3. 0
  4. 0.0
  5. 3.0
  6. 0
  7. 5.0
  8. 100000
  9. 0.0
  10. 4.0

slat numeric

rows: 70,022
null: 0 (0.0%)
unique: 16,016
min: 17.721
max: 61.020
mean: 37.137
median: 37.025
std: 5.090
q1: 33.190
q3: 40.930
iqr: 7.740
skew: 0.038
kurtosis: -0.582
n_outliers: 70
outlier_rate: 1.00e-03
zero_rate: 0.000

slon numeric

rows: 70,022
null: 0 (0.0%)
unique: 17,912
min: -163.530
max: -64.715
mean: -92.738
median: -93.500
std: 8.677
q1: -98.400
q3: -86.691
iqr: 11.709
skew: -0.323
kurtosis: 2.156
n_outliers: 951
outlier_rate: 0.014
zero_rate: 0.000

elat numeric

37.6% null
rows: 70,022
null: 26,363 (37.6%)
unique: 16,965
min: 17.721
max: 61.020
mean: 37.262
median: 37.131
std: 4.942
q1: 33.490
q3: 40.910
iqr: 7.420
skew: 0.034
kurtosis: -0.409
n_outliers: 78
outlier_rate: 1.79e-03
zero_rate: 0.000

elon numeric

37.6% null
rows: 70,022
null: 26,363 (37.6%)
unique: 18,586
min: -163.530
max: -64.715
mean: -92.193
median: -92.470
std: 8.545
q1: -97.730
q3: -86.470
iqr: 11.260
skew: -0.595
kurtosis: 3.766
n_outliers: 647
outlier_rate: 0.015
zero_rate: 0.000

len text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars; 94.8% duplicate strings
rows: 70,022
null: 0 (0.0%)
unique: 3,663
len_min: 3
len_max: 8
len_mean: 3.626
len_median: 3.000
len_p95: 6.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 66,359
duplicate_rate: 0.948
vocab_size: 2,204
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 0.2
  2. 0.22
  3. 5.4400
  4. 0.21
  5. 0.3
  6. 2.2100
  7. 0.8
  8. 1.7000
  9. 0.8
  10. 0.2

wid categorical

rows: 70,022
null: 0 (0.0%)
unique: 419
top_value: 10
top_rate: 0.206
cardinality: 419
entropy: 4.463
entropy_ratio: 0.512
Top values (rank 1–20)
  1. 10 — 14,417
  2. 50 — 10,366
  3. 100 — 7,067
  4. 30 — 4,772
  5. 20 — 4,368
  6. 200 — 2,946
  7. 25 — 2,452
  8. 150 — 2,101
  9. 40 — 1,967
  10. 75 — 1,906
  11. 300 — 1,430
  12. 33 — 1,160
  13. 17 — 1,037
  14. 400 — 944
  15. 23 — 812
  16. 250 — 765
  17. 60 — 737
  18. 440 — 677
  19. 500 — 636
  20. 80 — 573