natural hazards meteorites

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/natural_hazards/meteorites.json

Saturn profiled 1,097 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

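[2]:
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source file, with the flags used for this report.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/natural_hazards/meteorites.json",
    "--findings", "natural_hazards-meteorites.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if the command exits non-zero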

Summary confidence: high

This is a 1,097-row catalogue of witnessed meteorite falls, with each record carrying a name, description, date, lat/long coordinates and a meteorite class. Two columns (category and fall_type) are constant — every record is a 'witnessed_meteorite_falls' event with fall_type 'Fell' — so the analytic interest sits elsewhere. Meteorite class is the most informative categorical: 125 distinct classes but heavily concentrated, with L6 alone accounting for ~24% of falls and H5 the next largest at ~15%. Latitude is skewed toward the northern hemisphere (median 36.1, mean 30.0) with ~8% flagged as outliers, while longitude spreads broadly across the globe (-157.9 to 174.4). Start with meteorite_class to understand the dominant compositions, then look at the lat/long pair to see geographic coverage.

citing: row_count · column_count · category.top_value · fall_type.top_value · meteorite_class.n_unique · meteorite_class.top_values · meteorite_class.top_rate · latitude.mean · latitude.median · latitude.outlier_rate · longitude.min · longitude.max · date.n_unique · date.top_values

Out[4]:

saturn.schema() · 10 columns

column           kind         n      null%  unique  alerts
latitude         numeric      1,097  0.0%   958     outliers
longitude        numeric      1,097  0.0%   1,030
name             text         1,097  0.0%   1,097   near_unique, one_word, short_text
description      text         1,097  0.0%   1,097   near_unique
category         categorical  1,097  0.0%   1       imbalance
date             categorical  1,097  1.7%   231
country          unknown      1,097  0.0%           skipped
mass_g           unknown      1,097  0.0%           skipped
meteorite_class  categorical  1,097  0.0%   125
fall_type        categorical  1,097  0.0%   1       imbalance
Fig 1.
meteorite_class · L6 dominates at ~24% of falls, with H5 and H6 the next most common — note the long tail of 125 classes.
Top values for meteorite_class (20 unique shown, of 125 total).
value          count  share
L6             260    23.7%
H5             163    14.9%
H6             91     8.3%
L5             76     6.9%
H4             50     4.6%
LL6            41     3.7%
Stone-uncl     39     3.6%
OC             24     2.2%
LL5            19     1.7%
Eucrite-mmict  18     1.6%
L4             18     1.6%
Howardite      16     1.5%
CM2            15     1.4%
H              13     1.2%
L              10     0.9%
Iron, IIIAB    10     0.9%
Aubrite        9      0.8%
Diogenite      8      0.7%
EL6            8      0.7%
CV3            7      0.6%
Fig 2.
latitude · Distribution leans toward northern mid-latitudes (median 36.1) with a left tail of southern-hemisphere outliers.
Histogram bins for latitude (median: 36.1).
bin                 count
-44.12 – -40.77     2
-40.77 – -37.42     1
-37.42 – -34.07     3
-34.07 – -30.73     31
-30.73 – -27.38     14
-27.38 – -24.03     14
-24.03 – -20.68     9
-20.68 – -17.34     11
-17.34 – -13.99     6
-13.99 – -10.64     4
-10.64 – -7.295     11
-7.295 – -3.948     18
-3.948 – -0.6002    9
-0.6002 – 2.747     10
2.747 – 6.095       6
6.095 – 9.442       13
9.442 – 12.79       34
12.79 – 16.14       32
16.14 – 19.48       20
19.48 – 22.83       34
22.83 – 26.18       57
26.18 – 29.53       59
29.53 – 32.87       61
32.87 – 36.22       96
36.22 – 39.57       79
39.57 – 42.92       78
42.92 – 46.26       119
46.26 – 49.61       81
49.61 – 52.96       92
52.96 – 56.31       53
56.31 – 59.65       22
59.65 – 63          14
63 – 66.35          4
Fig 3.
longitude · Longitude spans the full globe (-158 to 174); look for clustering around populated landmasses where falls get reported.
Histogram bins for longitude (median: 18.71667).
bin                count
-157.9 – -147.8    2
-147.8 – -137.7    0
-137.7 – -127.7    1
-127.7 – -117.6    7
-117.6 – -107.5    11
-107.5 – -97.45    44
-97.45 – -87.39    49
-87.39 – -77.32    51
-77.32 – -67.25    26
-67.25 – -57.18    21
-57.18 – -47.11    14
-47.11 – -37.04    7
-37.04 – -26.97    2
-26.97 – -16.91    0
-16.91 – -6.836    23
-6.836 – 3.232     102
3.232 – 13.3       135
13.3 – 23.37       88
23.37 – 33.44      91
33.44 – 43.51      59
43.51 – 53.58      25
53.58 – 63.64      11
63.64 – 73.71      29
73.71 – 83.78      104
83.78 – 93.85      31
93.85 – 103.9      13
103.9 – 114        40
114 – 124.1        38
124.1 – 134.1      28
134.1 – 144.2      33
144.2 – 154.3      9
154.3 – 164.3      1
164.3 – 174.4      2
Fig 4.
date · 231 distinct fall dates spread fairly evenly; 1933 is the single busiest year with 17 recorded falls.
Top values for date (20 unique shown, of 231 total).
value       count  share
1933-01-01  17     1.5%
1949-01-01  13     1.2%
1950-01-01  12     1.1%
1976-01-01  11     1.0%
1930-01-01  11     1.0%
1938-01-01  11     1.0%
1910-01-01  11     1.0%
1868-01-01  11     1.0%
1977-01-01  10     0.9%
1939-01-01  10     0.9%
1984-01-01  10     0.9%
1934-01-01  10     0.9%
1916-01-01  10     0.9%
1924-01-01  10     0.9%
1917-01-01  10     0.9%
2008-01-01  9      0.8%
2003-01-01  9      0.8%
1998-01-01  9      0.8%
1890-01-01  9      0.8%
1986-01-01  9      0.8%
Fig 5.
description · Descriptions are uniformly templated (46-72 chars); useful as a sanity check that no records are truncated or malformed.
Character-length distribution for description (mean: 54.30811303555151).
chars    count
46 – 47  1
47 – 47  5
47 – 48  0
48 – 49  29
49 – 49  79
49 – 50  0
50 – 51  118
51 – 51  137
51 – 52  0
52 – 52  129
52 – 53  110
53 – 54  0
54 – 54  76
54 – 55  68
55 – 56  0
56 – 56  58
56 – 57  54
57 – 58  0
58 – 58  34
58 – 59  0
59 – 60  40
60 – 60  22
60 – 61  0
61 – 62  26
62 – 62  21
62 – 63  0
63 – 64  20
64 – 64  20
64 – 65  0
65 – 66  14
66 – 66  9
66 – 67  0
67 – 67  11
67 – 68  4
68 – 69  0
69 – 69  3
69 – 70  5
70 – 71  0
71 – 71  1
71 – 72  3
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column           kind         null %
latitude         numeric      0.0%
longitude        numeric      0.0%
name             text         0.0%
description      text         0.0%
category         categorical  0.0%
date             categorical  1.7%
country          unknown      0.0%
mass_g           unknown      0.0%
meteorite_class  categorical  0.0%
fall_type        categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
           latitude  longitude
latitude   +1.00     -0.09
longitude  -0.09     +1.00
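
For readers who want to reproduce the -0.09 figure from their own copy of the data, the following is a minimal sketch of the Pearson coefficient in plain Python. It is not saturn's implementation (the report only says the computation is "sampled, bounded"), and the toy coordinates are invented for illustration, not taken from meteorites.json.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))   # unnormalised covariance
    var_x = sum(a * a for a in dx)             # unnormalised variances
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy coordinates (hypothetical), not rows from the profiled file.
lats = [10.0, 20.0, 30.0, 40.0]
lons = [5.0, 3.0, 4.0, 1.0]
r = pearson_r(lats, lons)
print(f"{r:+.2f}")
```

A value near zero, as reported here, only rules out a linear relationship between the two coordinates; it says nothing about spatial clustering.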

latitude numeric feature

Geographic latitude coordinates spanning -44.12 to 66.35 degrees, covering most inhabited latitudes from southern Australia to northern Scandinavia. The distribution is left-skewed (skew -1.28), with the median of 36.1 sitting well above the mean of 30.04: the data lean toward the northern hemisphere, and a tail of southern-hemisphere observations is flagged as outliers (90 rows, 8.2%). High uniqueness (958 distinct values across 1,097 rows) suggests most rows mark a distinct location.

Treatment: Pair with longitude for spatial features; the southern-hemisphere outliers are likely legitimate, not errors.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["latitude"].stats

stat          value
n             1,097
nulls         0 (0.0%)
unique        958
min           -44.12
max           66.35
mean          30.04
median        36.1
std           23.13
q1            21.87
q3            46.07
iqr           24.2
skew          -1.276
kurtosis      1.01
n_outliers    90
outlier_rate  0.08204
zero_rate     0.001823
alert: outliers · 8.2% rows beyond 1.5 IQR
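
The "beyond 1.5 IQR" alert reads like the usual Tukey fence rule. Assuming that is what saturn applies (the report does not spell it out), the cut-offs can be recomputed from the quartiles in the table above. The function names are illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) cut-offs at k * IQR outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True when value falls outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return value < low or value > high

# Quartiles as reported (rounded) in the latitude stats table.
low, high = tukey_fences(21.87, 46.07)
print(round(low, 2), round(high, 2))
```

With the rounded quartiles, the lower fence sits at about -14.43 and the upper fence at about 82.37. The column maximum is 66.35, so under this assumption every flagged row lies on the southern side, which matches the left tail described above.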
Fig 8.
Distribution of latitude. Vertical dash marks the median.
Histogram bins for latitude (median: 36.1).
bin                 count
-44.12 – -40.77     2
-40.77 – -37.42     1
-37.42 – -34.07     3
-34.07 – -30.73     31
-30.73 – -27.38     14
-27.38 – -24.03     14
-24.03 – -20.68     9
-20.68 – -17.34     11
-17.34 – -13.99     6
-13.99 – -10.64     4
-10.64 – -7.295     11
-7.295 – -3.948     18
-3.948 – -0.6002    9
-0.6002 – 2.747     10
2.747 – 6.095       6
6.095 – 9.442       13
9.442 – 12.79       34
12.79 – 16.14       32
16.14 – 19.48       20
19.48 – 22.83       34
22.83 – 26.18       57
26.18 – 29.53       59
29.53 – 32.87       61
32.87 – 36.22       96
36.22 – 39.57       79
39.57 – 42.92       78
42.92 – 46.26       119
46.26 – 49.61       81
49.61 – 52.96       92
52.96 – 56.31       53
56.31 – 59.65       22
59.65 – 63          14
63 – 66.35          4

longitude numeric feature

Geographic longitude in decimal degrees, with values spanning -157.87 to 174.4 — essentially the full -180/180 range. The distribution is roughly symmetric (skew -0.23, kurtosis -0.62) and centered near 18.72, suggesting a slight Eurasian/African concentration but broad global coverage. With 1030 unique values across 1097 rows and no nulls, points are nearly all distinct.

Treatment: Pair with latitude as a geospatial coordinate; avoid scaling as a plain numeric feature.

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["longitude"].stats

stat          value
n             1,097
nulls         0 (0.0%)
unique        1,030
min           -157.9
max           174.4
mean          20.13
median        18.72
std           68.87
q1            -4.233
q3            76.27
iqr           80.5
skew          -0.2257
kurtosis      -0.6185
n_outliers    3
outlier_rate  0.002735
zero_rate     0.0009116
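
The treatment note advises against scaling longitude as a plain number. One common alternative, sketched here as an assumption rather than anything saturn prescribes, is to map each latitude/longitude pair to a point on the unit sphere, so that locations on either side of the ±180 degree line end up close together.

```python
import math

def latlon_to_unit_vector(lat_deg, lon_deg):
    """Convert degrees of latitude/longitude to (x, y, z) on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z

# Two equatorial points 0.2 degrees apart across the antimeridian.
east = latlon_to_unit_vector(0.0, 179.9)
west = latlon_to_unit_vector(0.0, -179.9)
gap = math.dist(east, west)   # straight-line (chord) distance between them
print(gap)
```

As raw numbers the two longitudes differ by 359.8, yet the chord distance between the points is tiny, which is the behaviour a spatial feature should have.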
Fig 9.
Distribution of longitude. Vertical dash marks the median.
Histogram bins for longitude (median: 18.71667).
bin                count
-157.9 – -147.8    2
-147.8 – -137.7    0
-137.7 – -127.7    1
-127.7 – -117.6    7
-117.6 – -107.5    11
-107.5 – -97.45    44
-97.45 – -87.39    49
-87.39 – -77.32    51
-77.32 – -67.25    26
-67.25 – -57.18    21
-57.18 – -47.11    14
-47.11 – -37.04    7
-37.04 – -26.97    2
-26.97 – -16.91    0
-16.91 – -6.836    23
-6.836 – 3.232     102
3.232 – 13.3       135
13.3 – 23.37       88
23.37 – 33.44      91
33.44 – 43.51      59
43.51 – 53.58      25
53.58 – 63.64      11
63.64 – 73.71      29
73.71 – 83.78      104
83.78 – 93.85      31
93.85 – 103.9      13
103.9 – 114        40
114 – 124.1        38
124.1 – 134.1      28
134.1 – 144.2      33
144.2 – 154.3      9
154.3 – 164.3      1
164.3 – 174.4      2

name text identifier

This is a `name` column with 1097 fully unique short strings (n_unique == n, duplicate_rate 0.0), averaging 8.56 characters and 1.21 words, with 82.95% being a single word. Top words like `st.`, `county`, `san`, `santa`, `creek`, `la`, `el`, `de` strongly suggest these are place or geographic entity names rather than person names, with a Spanish/English mix. No nulls, no URLs, no emoji — clean but effectively an identifier-grade label.

Treatment: Treat as a unique label/key; drop from modelling features or use only via geographic enrichment lookup.

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["name"].stats

stat                     value
n                        1,097
nulls                    0 (0.0%)
unique                   1,097
len_min                  2
len_max                  28
len_mean                 8.557
len_median               8
len_p95                  15
word_mean                1.209
word_median              1
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,238
readability_flesch_mean  40.67
emoji_rate               0
url_rate                 0
one_word_rate            0.8295
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 83.0% rows are a single word
alert: short_text · 95th-percentile length under 20 chars
Fig 10.
Character-length distribution for name.
Character-length distribution for name (mean: 8.55697356426618).
chars    count
2 – 3    1
3 – 3    7
3 – 4    0
4 – 5    40
5 – 5    104
5 – 6    0
6 – 7    174
7 – 7    170
7 – 8    0
8 – 8    144
8 – 9    136
9 – 10   0
10 – 10  78
10 – 11  62
11 – 12  0
12 – 12  51
12 – 13  44
13 – 14  0
14 – 14  20
14 – 15  0
15 – 16  17
16 – 16  11
16 – 17  0
17 – 18  15
18 – 18  4
18 – 19  0
19 – 20  9
20 – 20  3
20 – 21  0
21 – 22  2
22 – 22  3
22 – 23  0
23 – 23  1
23 – 24  0
24 – 25  0
25 – 25  0
25 – 26  0
26 – 27  0
27 – 27  0
27 – 28  1

description text metadata

This appears to be a templated, machine-generated description string for meteorite records: the tokens 'meteorite', 'mass:', 'found:', and 'fell.' each appear exactly 1097 times, matching the row count. Every value is unique (n_unique=1097, duplicate_rate=0.0) yet length is tightly bounded (46-72 chars, median 53), consistent with a fixed template in which only embedded fields such as the classification (l6. appears 260 times, h5. 163) and numeric values vary. The 'unknown.' token appears 1099 times, more than once per row on average, which signals that missing sub-fields are routinely packed into the template.

Treatment: Parse the template to extract structured fields (class, mass, found/fell) rather than embedding the raw string.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["description"].stats

stat                     value
n                        1,097
nulls                    0 (0.0%)
unique                   1,097
len_min                  46
len_max                  72
len_mean                 54.31
len_median               53
len_p95                  64
word_mean                8.254
word_median              8
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,372
readability_flesch_mean  52.62
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
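
The treatment note suggests parsing the template into structured fields. The report never shows a raw description, so the sample string and the "Key: value." layout below are hypothetical, chosen only to be compatible with the tokens listed above. Treat this as a starting sketch to adapt once the real template has been inspected.

```python
import re

# Assumed layout: "Key: value." segments, each ending in a period that is
# followed by whitespace or the end of the string.
FIELD_RE = re.compile(r"(\w+):\s*(.+?)\.(?=\s|$)")

def parse_description(text):
    """Extract 'Key: value.' pairs; map the literal 'unknown' to None."""
    fields = {}
    for key, value in FIELD_RE.findall(text):
        value = value.strip()
        fields[key.lower()] = None if value.lower() == "unknown" else value
    return fields

# Hypothetical sample, not a row from meteorites.json.
sample = "Example meteorite, class L6. Mass: unknown. Found: Fell."
parsed = parse_description(sample)
print(parsed)
```

The lookahead keeps decimal points inside values intact, so a segment such as "Mass: 21.5 g." would yield "21.5 g" rather than stopping at the first period.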
Fig 11.
Character-length distribution for description.
Character-length distribution for description (mean: 54.30811303555151).
chars    count
46 – 47  1
47 – 47  5
47 – 48  0
48 – 49  29
49 – 49  79
49 – 50  0
50 – 51  118
51 – 51  137
51 – 52  0
52 – 52  129
52 – 53  110
53 – 54  0
54 – 54  76
54 – 55  68
55 – 56  0
56 – 56  58
56 – 57  54
57 – 58  0
58 – 58  34
58 – 59  0
59 – 60  40
60 – 60  22
60 – 61  0
61 – 62  26
62 – 62  21
62 – 63  0
63 – 64  20
64 – 64  20
64 – 65  0
65 – 66  14
66 – 66  9
66 – 67  0
67 – 67  11
67 – 68  4
68 – 69  0
69 – 69  3
69 – 70  5
70 – 71  0
71 – 71  1
71 – 72  3

category categorical metadata

This column is a constant categorical tag, holding the single value "witnessed_meteorite_falls" across all 1097 rows. With cardinality of 1 and entropy of 0, it carries no information and likely identifies the dataset's provenance or scope rather than describing individual records.

Treatment: Drop before modelling; retain only as a dataset-level label.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["category"].stats

stat           value
n              1,097
nulls          0 (0.0%)
unique         1
top_value      witnessed_meteorite_falls
top_rate       1
cardinality    1
entropy        0
entropy_ratio  0
alert: imbalance · top value is 100.0% of rows
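
Dropping constant columns such as category and fall_type can be done mechanically. The sketch below assumes the data is loaded as a list of flat dicts with hashable values and that every row carries every column; the record layout of meteorites.json is not shown in this report, and the `name` values are placeholders.

```python
def constant_columns(records):
    """Names of columns that hold exactly one distinct value across all rows."""
    seen = {}
    for row in records:
        for column, value in row.items():
            seen.setdefault(column, set()).add(value)
    return sorted(col for col, values in seen.items() if len(values) == 1)

def drop_columns(records, columns):
    """Return copies of the rows without the given columns."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in records]

# Toy rows: the constant values come from the report, the names are placeholders.
rows = [
    {"name": "A", "category": "witnessed_meteorite_falls", "fall_type": "Fell"},
    {"name": "B", "category": "witnessed_meteorite_falls", "fall_type": "Fell"},
]
constant = constant_columns(rows)
trimmed = drop_columns(rows, constant)
print(constant, trimmed)
```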
Fig 12.
Top values for category.
Top values for category (1 unique shown, of 1 total).
value                      count  share
witnessed_meteorite_falls  1097   100.0%

date categorical timestamp

Date values stored as ISO strings, all snapped to January 1st of the year, so this is effectively a year-granularity field masquerading as a full date. Across 1097 rows there are 231 distinct years with a very high entropy ratio (0.967), and the most common year (1933-01-01, 17 rows) accounts for only 1.6% of the 1,078 non-null rows (1.5% of all rows). Null rate is 1.73% (19 rows).

Treatment: Parse to datetime and extract the year as an integer feature; the month/day component carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["date"].stats

stat           value
n              1,097
nulls          19 (1.7%)
unique         231
top_value      1933-01-01
top_rate       0.01577
cardinality    231
entropy        7.593
entropy_ratio  0.967
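
Extracting the year, as the treatment note recommends, takes only the standard library. How the 19 nulls are encoded in the source file is not stated, so treating `None` and the empty string as missing is an assumption here.

```python
from datetime import date

def year_from_iso(value):
    """Return the year of an ISO 'YYYY-MM-DD' string, or None when missing."""
    if value is None or value == "":
        return None
    return date.fromisoformat(value).year

# Two values from the top-values table plus one missing entry.
dates = ["1933-01-01", None, "1868-01-01"]
years = [year_from_iso(d) for d in dates]
print(years)
```

Using `date.fromisoformat` rather than slicing the first four characters means a malformed string raises a `ValueError` instead of passing through silently.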
Fig 13.
Top values for date.
Top values for date (20 unique shown, of 231 total).
value       count  share
1933-01-01  17     1.5%
1949-01-01  13     1.2%
1950-01-01  12     1.1%
1976-01-01  11     1.0%
1930-01-01  11     1.0%
1938-01-01  11     1.0%
1910-01-01  11     1.0%
1868-01-01  11     1.0%
1977-01-01  10     0.9%
1939-01-01  10     0.9%
1984-01-01  10     0.9%
1934-01-01  10     0.9%
1916-01-01  10     0.9%
1924-01-01  10     0.9%
1917-01-01  10     0.9%
2008-01-01  9      0.8%
2003-01-01  9      0.8%
1998-01-01  9      0.8%
1890-01-01  9      0.8%
1986-01-01  9      0.8%

country unknown feature

This column is labeled 'country' and likely holds country names or codes, but saturn skipped detailed profiling so kind, uniqueness, and value distribution are unknown. The only confirmed signals are 1097 rows with a 0.0 null rate. No further evidence is available to characterize cardinality, format, or dominant values.

Treatment: Re-profile to determine cardinality and format, then encode as a categorical.

anthropic:claude-opus-4-7 · confidence low
Out[31]:

saturn.columns["country"].stats

stat    value
n       1,097
nulls   0 (0.0%)
unique
alert: skipped · no profiler for kind=unknown

mass_g unknown other

The column is named mass_g, suggesting a mass measurement in grams across 1,097 rows with no nulls. However, saturn skipped profiling this column and reported no kind, uniqueness, or distribution stats, so its actual content and shape cannot be confirmed from the evidence.

Treatment: Re-profile or manually inspect this column before use; current evidence is insufficient to decide handling.

anthropic:claude-opus-4-7 · confidence low
Out[33]:

saturn.columns["mass_g"].stats

stat    value
n       1,097
nulls   0 (0.0%)
unique
alert: skipped · no profiler for kind=unknown
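
A quick manual inspection of a skipped column can start with a tally of the value types it contains, since mixed types are one plausible reason a profiler falls back to kind=unknown. That cause is a guess, not something the report confirms. The sketch assumes the file loads as a list of dicts, and the stand-in rows are hypothetical.

```python
import json
from collections import Counter

def value_type_counts(records, column):
    """Tally the Python type names found in one column of a list of dicts."""
    return Counter(type(row.get(column)).__name__ for row in records)

# Hypothetical stand-in rows; in practice, replace with json.load(...) on the source file.
records = json.loads('[{"mass_g": 21}, {"mass_g": 107.5}, {"mass_g": "unknown"}]')
counts = value_type_counts(records, "mass_g")
print(counts)
```

If the tally shows a single numeric type, the column can be re-profiled as numeric; a mix of numbers and strings would need cleaning first.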

meteorite_class categorical label

This column records meteorite classification codes (L6, H5, H6, etc.), the standard taxonomy for ordinary chondrites and other meteorite types. Cardinality is high at 125 distinct classes across 1,097 rows with no nulls, but the distribution is concentrated: L6 alone covers 23.7% and the top four classes (L6, H5, H6, L5) account for over half the data. Entropy ratio of 0.67 confirms a long tail of rare classes that will be sparsely represented.

Treatment: Group rare classes into an 'other' bucket or use target encoding before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[35]:

saturn.columns["meteorite_class"].stats

stat           value
n              1,097
nulls          0 (0.0%)
unique         125
top_value      L6
top_rate       0.237
cardinality    125
entropy        4.639
entropy_ratio  0.666
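
The reported entropy_ratio is numerically consistent with Shannon entropy in bits divided by log2 of the cardinality: 4.639 / log2(125) is about 0.666, and for the date column 7.593 / log2(231) is about 0.967. That this is saturn's exact definition is an inference from those numbers, not something the report states. A minimal sketch:

```python
import math

def shannon_entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of observed categories)."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0   # a constant column carries no information
    return shannon_entropy_bits(counts) / math.log2(k)

# Check the reported figures against the assumed definition.
reported_ratio = 4.639 / math.log2(125)
print(round(reported_ratio, 3))
```

A ratio of 1 means all categories are equally common and 0 means a single category, so 0.666 reflects a few dominant classes over a long tail.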
Fig 14.
Top values for meteorite_class.
Top values for meteorite_class (20 unique shown, of 125 total).
value          count  share
L6             260    23.7%
H5             163    14.9%
H6             91     8.3%
L5             76     6.9%
H4             50     4.6%
LL6            41     3.7%
Stone-uncl     39     3.6%
OC             24     2.2%
LL5            19     1.7%
Eucrite-mmict  18     1.6%
L4             18     1.6%
Howardite      16     1.5%
CM2            15     1.4%
H              13     1.2%
L              10     0.9%
Iron, IIIAB    10     0.9%
Aubrite        9      0.8%
Diogenite      8      0.7%
EL6            8      0.7%
CV3            7      0.6%
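
Grouping the long tail into an 'other' bucket, as the treatment note suggests, could look like the sketch below. The `min_count` threshold is a hypothetical choice to tune per task, and the toy list is illustrative rather than drawn from the data.

```python
from collections import Counter

def group_rare(values, min_count=10, other_label="other"):
    """Replace values seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Toy class list: two common classes and one rare one.
classes = ["L6"] * 5 + ["H5"] * 3 + ["CV3"]
grouped = group_rare(classes, min_count=3)
print(Counter(grouped))
```

When the grouped column feeds a model, the counts should be computed on training data only, so that the set of kept classes does not depend on held-out rows.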

fall_type categorical metadata

This column records a fall classification but contains the single value "Fell" across all 1097 rows, with no nulls. Entropy is 0 and cardinality is 1, so it carries no information for any downstream model or segmentation.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[38]:

saturn.columns["fall_type"].stats

stat           value
n              1,097
nulls          0 (0.0%)
unique         1
top_value      Fell
top_rate       1
cardinality    1
entropy        0
entropy_ratio  0
alert: imbalance · top value is 100.0% of rows
Fig 15.
Top values for fall_type.
Top values for fall_type (1 unique shown, of 1 total).
value  count  share
Fell   1097   100.0%

How to cite

BibTeX
@misc{saturn-natural-hazards-meteorites-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: natural hazards meteorites},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/natural_hazards-meteorites}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: natural hazards meteorites. Source: /home/coolhand/html/datavis/data_trove/data/natural_hazards/meteorites.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/natural_hazards-meteorites