saturn·

witnessed meteorite falls witnessed meteorite falls

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/witnessed-meteorite-falls/witnessed_meteorite_falls.json

Saturn profiled 1,097 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
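!pip install saturn-dissect
import subprocess

# check=True raises CalledProcessError if the saturn CLI exits with an error,
# instead of letting the notebook continue silently.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/witnessed-meteorite-falls/witnessed_meteorite_falls.json",
    "--findings", "witnessed-meteorite-falls-witnessed_meteorite_falls.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)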

Summary confidence: high

This dataset catalogs 1,097 witnessed meteorite falls. Each row is identified by a unique name and described by a date, geographic coordinates, a meteorite class, and a short description. Two columns (category and fall_type) are constants ('witnessed_meteorite_falls' and 'Fell') and offer no analytical value. The most informative dimensions are the following:

- meteorite_class is led by L6 (260 falls, ~24%), H5 (163), and H6 (91), which together account for about 47% of rows.
- In the latitude/longitude pair, latitude skews north (median 36.1) with about 8% outliers, while longitude spans nearly the full globe.

The date column holds 231 distinct values, with 1933 the most frequent (17 falls), which leaves room for a time-trend analysis.

citing: row_count · column_count · columns.category.stats.top_value · columns.fall_type.stats.top_value · columns.meteorite_class.top_values · columns.meteorite_class.stats.cardinality · columns.latitude.stats · columns.longitude.stats · columns.date.top_values · columns.date.stats.cardinality

Out[4]:

saturn.schema() · 10 columns

column           kind         n      null%  unique  alerts
latitude         numeric      1,097  0.0%   958     outliers
longitude        numeric      1,097  0.0%   1,030
name             text         1,097  0.0%   1,097   near_unique, one_word, short_text
description      text         1,097  0.0%   1,097   near_unique
category         categorical  1,097  0.0%   1       imbalance
date             categorical  1,097  1.7%   231
country          unknown      1,097  0.0%   n/a     skipped
mass_g           unknown      1,097  0.0%   n/a     skipped
meteorite_class  categorical  1,097  0.0%   125
fall_type        categorical  1,097  0.0%   1       imbalance
Fig 1.
meteorite_class · L6, H5, and H6 together account for about 47% of rows; the other 122 classes share the remainder.
Top values for meteorite_class (20 unique shown, of 125 total).
value          count  share
L6             260    23.7%
H5             163    14.9%
H6             91     8.3%
L5             76     6.9%
H4             50     4.6%
LL6            41     3.7%
Stone-uncl     39     3.6%
OC             24     2.2%
LL5            19     1.7%
Eucrite-mmict  18     1.6%
L4             18     1.6%
Howardite      16     1.5%
CM2            15     1.4%
H              13     1.2%
L              10     0.9%
Iron, IIIAB    10     0.9%
Aubrite        9      0.8%
Diogenite      8      0.7%
EL6            8      0.7%
CV3            7      0.6%
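Each row of the top-values table pairs a count with a share of rows. The sketch below reproduces that computation with `collections.Counter`. The `top_values` helper and the toy column are hypothetical. Computing the share against all rows, nulls included, is an assumption, but it fits the date table: there, 17 of 1,097 rows is reported as 1.5%, whereas 17 of the 1,078 non-null rows would round to 1.6%.

```python
from collections import Counter


def top_values(values, k=20):
    """Return (value, count, share) for the k most frequent non-null values.

    Assumption: share is taken over all rows, nulls included.
    """
    total = len(values)
    counts = Counter(v for v in values if v is not None)
    return [(value, n, n / total) for value, n in counts.most_common(k)]


# Toy column, not the real data.
classes = ["L6", "L6", "H5", "L6", "H6", None]
for value, n, share in top_values(classes, k=3):
    print(f"{value:<6}{n:>4}{share:>8.1%}")
```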
Fig 2.
latitude · Latitude skews toward the northern hemisphere (median 36.1) with ~8% outliers worth inspecting.
Histogram bins for latitude (median: 36.1).
bin                 count
-44.12 – -40.77     2
-40.77 – -37.42     1
-37.42 – -34.07     3
-34.07 – -30.73     31
-30.73 – -27.38     14
-27.38 – -24.03     14
-24.03 – -20.68     9
-20.68 – -17.34     11
-17.34 – -13.99     6
-13.99 – -10.64     4
-10.64 – -7.295     11
-7.295 – -3.948     18
-3.948 – -0.6002    9
-0.6002 – 2.747     10
2.747 – 6.095       6
6.095 – 9.442       13
9.442 – 12.79       34
12.79 – 16.14       32
16.14 – 19.48       20
19.48 – 22.83       34
22.83 – 26.18       57
26.18 – 29.53       59
29.53 – 32.87       61
32.87 – 36.22       96
36.22 – 39.57       79
39.57 – 42.92       78
42.92 – 46.26       119
46.26 – 49.61       81
49.61 – 52.96       92
52.96 – 56.31       53
56.31 – 59.65       22
59.65 – 63          14
63 – 66.35          4
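The latitude bins look equal-width: 33 bins across the 110.47° range from -44.12 to 66.35 give a width of about 3.35°, which matches the edges listed above. Below is a minimal equal-width binning sketch. How saturn handles values that land exactly on an edge is not documented here. The half-open convention, with the maximum folded into the last bin, is therefore an assumption, and the input values are toy data.

```python
def equal_width_histogram(values, n_bins):
    """Count non-empty `values` into n_bins equal-width bins over [min, max].

    Bins are half-open [lo, hi) except the last, which also includes max.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [lo, hi], [len(values)]  # all values identical: one bin
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == hi lands in the final bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Toy latitudes, not the real column.
edges, counts = equal_width_histogram([-44.12, 0.0, 36.1, 40.0, 66.35], n_bins=4)
print([round(e, 2) for e in edges], counts)
```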
Fig 3.
longitude · Longitude spans the globe with a wide IQR of 80.5 — useful for spotting regional clustering of fall sites.
Histogram bins for longitude (median: 18.71667).
bin                 count
-157.9 – -147.8     2
-147.8 – -137.7     0
-137.7 – -127.7     1
-127.7 – -117.6     7
-117.6 – -107.5     11
-107.5 – -97.45     44
-97.45 – -87.39     49
-87.39 – -77.32     51
-77.32 – -67.25     26
-67.25 – -57.18     21
-57.18 – -47.11     14
-47.11 – -37.04     7
-37.04 – -26.97     2
-26.97 – -16.91     0
-16.91 – -6.836     23
-6.836 – 3.232      102
3.232 – 13.3        135
13.3 – 23.37        88
23.37 – 33.44       91
33.44 – 43.51       59
43.51 – 53.58       25
53.58 – 63.64       11
63.64 – 73.71       29
73.71 – 83.78       104
83.78 – 93.85       31
93.85 – 103.9       13
103.9 – 114         40
114 – 124.1         38
124.1 – 134.1       28
134.1 – 144.2       33
144.2 – 154.3       9
154.3 – 164.3       1
164.3 – 174.4       2
Fig 4.
date · Top years like 1933 and 1949 hint at temporal patterns; look for eras with elevated witnessed-fall counts.
Top values for date (20 unique shown, of 231 total).
value       count  share
1933-01-01  17     1.5%
1949-01-01  13     1.2%
1950-01-01  12     1.1%
1976-01-01  11     1.0%
1930-01-01  11     1.0%
1938-01-01  11     1.0%
1910-01-01  11     1.0%
1868-01-01  11     1.0%
1977-01-01  10     0.9%
1939-01-01  10     0.9%
1984-01-01  10     0.9%
1934-01-01  10     0.9%
1916-01-01  10     0.9%
1924-01-01  10     0.9%
1917-01-01  10     0.9%
2008-01-01  9      0.8%
2003-01-01  9      0.8%
1998-01-01  9      0.8%
1890-01-01  9      0.8%
1986-01-01  9      0.8%
Fig 5.
description · Descriptions are tightly templated (46–72 chars); length variation reflects which fields are populated.
Character-length distribution for description (mean: 54.31).
chars     count
46 – 47   1
47 – 47   5
47 – 48   0
48 – 49   29
49 – 49   79
49 – 50   0
50 – 51   118
51 – 51   137
51 – 52   0
52 – 52   129
52 – 53   110
53 – 54   0
54 – 54   76
54 – 55   68
55 – 56   0
56 – 56   58
56 – 57   54
57 – 58   0
58 – 58   34
58 – 59   0
59 – 60   40
60 – 60   22
60 – 61   0
61 – 62   26
62 – 62   21
62 – 63   0
63 – 64   20
64 – 64   20
64 – 65   0
65 – 66   14
66 – 66   9
66 – 67   0
67 – 67   11
67 – 68   4
68 – 69   0
69 – 69   3
69 – 70   5
70 – 71   0
71 – 71   1
71 – 72   3
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column           kind         null %
latitude         numeric      0.0%
longitude        numeric      0.0%
name             text         0.0%
description      text         0.0%
category         categorical  0.0%
date             categorical  1.7%
country          unknown      0.0%
mass_g           unknown      0.0%
meteorite_class  categorical  0.0%
fall_type        categorical  0.0%
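Null rate is the fraction of rows in which a field is missing. The minimal sketch below rests on three assumptions:

- The JSON file loads as a list of row objects, for example via `json.load`.
- Only absent keys and JSON `null` count as missing. Whether saturn also treats empty strings as null is not stated here.
- The `null_rates` name and the toy rows are hypothetical.

```python
def null_rates(rows, columns):
    """Fraction of rows in which each column is absent or None (JSON null)."""
    n = len(rows)
    return {col: sum(1 for row in rows if row.get(col) is None) / n for col in columns}


rows = [
    {"name": "A", "date": "1933-01-01"},
    {"name": "B", "date": None},
    {"name": "C"},  # a missing key also counts as null
    {"name": "D", "date": "1949-01-01"},
]
print(null_rates(rows, ["name", "date"]))  # {'name': 0.0, 'date': 0.5}

# The date column's reported rate: 19 nulls out of 1,097 rows.
print(f"{19 / 1097:.1%}")  # 1.7%
```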
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
             latitude  longitude
latitude     +1.00     -0.09
longitude    -0.09     +1.00
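A coefficient of -0.09 means latitude and longitude have almost no linear relationship across fall sites. For reference, a from-scratch sketch of the Pearson coefficient follows; Python 3.10+ also ships `statistics.correlation`. The caption says saturn samples the data before correlating, whereas this sketch uses every value it is given. The toy inputs are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either input is constant
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```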

latitude numeric feature

Geographic latitude in decimal degrees, spanning -44.12° to 66.35° and covering most of the inhabited globe. The distribution is left-skewed (skew -1.28): the median (36.1°) sits above the mean (30.04°), reflecting a Northern Hemisphere concentration. About 8.2% of values (90 rows) are flagged as outliers. The upper 1.5 × IQR fence (about 82.4°) exceeds the maximum, so all of them fall below the lower fence of about -14.4°, that is, in the Southern Hemisphere.

Treatment: Pair with longitude for geospatial features; keep outliers as legitimate Southern Hemisphere observations rather than trimming.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["latitude"].stats

stat             value
n                1,097
nulls            0 (0.0%)
unique           958
min              -44.12
max              66.35
mean             30.04
median           36.1
std              23.13
q1               21.87
q3               46.07
iqr              24.2
skew             -1.276
kurtosis         1.01
n_outliers       90
outlier_rate     0.08204
zero_rate        0.001823
alert: outliers  8.2% of rows beyond 1.5 IQR
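The outliers alert uses a 1.5 IQR rule, which points to the standard Tukey fences. The sketch below plugs in the reported quartiles. Those quartiles are rounded and saturn's quartile method is not stated, so the fences are approximate.

```python
def iqr_fences(q1, q3, k=1.5):
    """Tukey fences: values below q1 - k*IQR or above q3 + k*IQR are outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Rounded quartiles from the latitude stats table above.
lower, upper = iqr_fences(21.87, 46.07)
print(f"lower fence {lower:.2f}, upper fence {upper:.2f}")
# lower fence -14.43, upper fence 82.37
```

The maximum latitude (66.35) sits well below the upper fence, so every one of the 90 flagged rows must lie below roughly -14.4°. The histogram agrees: 85 rows fall below -17.34°, and the next bin (-17.34° to -13.99°) holds 6 more.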
Fig 8.
Distribution of latitude. Vertical dash marks the median.
Histogram bins for latitude (median: 36.1).
bin                 count
-44.12 – -40.77     2
-40.77 – -37.42     1
-37.42 – -34.07     3
-34.07 – -30.73     31
-30.73 – -27.38     14
-27.38 – -24.03     14
-24.03 – -20.68     9
-20.68 – -17.34     11
-17.34 – -13.99     6
-13.99 – -10.64     4
-10.64 – -7.295     11
-7.295 – -3.948     18
-3.948 – -0.6002    9
-0.6002 – 2.747     10
2.747 – 6.095       6
6.095 – 9.442       13
9.442 – 12.79       34
12.79 – 16.14       32
16.14 – 19.48       20
19.48 – 22.83       34
22.83 – 26.18       57
26.18 – 29.53       59
29.53 – 32.87       61
32.87 – 36.22       96
36.22 – 39.57       79
39.57 – 42.92       78
42.92 – 46.26       119
46.26 – 49.61       81
49.61 – 52.96       92
52.96 – 56.31       53
56.31 – 59.65       22
59.65 – 63          14
63 – 66.35          4

longitude numeric feature

Geographic longitude in decimal degrees, spanning -157.87 to 174.4 and so covering most of the -180 to 180 range. The distribution is broad (std 68.87, IQR 80.5), only mildly left-skewed (-0.23), and light-tailed (excess kurtosis -0.62), indicating worldwide coverage rather than a single region. The 1,030 unique values across 1,097 rows suggest distinct point locations with little repetition. There are no nulls and only 3 outliers.

Treatment: Pair with latitude as a geospatial coordinate; avoid treating as a standalone scalar feature.

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["longitude"].stats

stat             value
n                1,097
nulls            0 (0.0%)
unique           1,030
min              -157.9
max              174.4
mean             20.13
median           18.72
std              68.87
q1               -4.233
q3               76.27
iqr              80.5
skew             -0.2257
kurtosis         -0.6185
n_outliers       3
outlier_rate     0.002735
zero_rate        0.0009116
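The skew and kurtosis figures summarize the shape of the distribution. The sketch below uses the plain population-moment definitions and reports kurtosis as excess over a normal distribution. A negative kurtosis can only be an excess measure, because raw kurtosis is never below 1. saturn's exact estimator, such as whether it applies a small-sample bias correction, is an assumption here, so results may differ slightly in the last digits.

```python
def shape_moments(values):
    """Population skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


print(shape_moments([1, 2, 3, 4, 5]))  # symmetric and flatter than a normal curve
```

For scale, a uniform distribution has excess kurtosis -1.2. Longitude's -0.62 therefore sits between a normal curve and a flat spread.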
Fig 9.
Distribution of longitude. Vertical dash marks the median.
Histogram bins for longitude (median: 18.71667).
bin                 count
-157.9 – -147.8     2
-147.8 – -137.7     0
-137.7 – -127.7     1
-127.7 – -117.6     7
-117.6 – -107.5     11
-107.5 – -97.45     44
-97.45 – -87.39     49
-87.39 – -77.32     51
-77.32 – -67.25     26
-67.25 – -57.18     21
-57.18 – -47.11     14
-47.11 – -37.04     7
-37.04 – -26.97     2
-26.97 – -16.91     0
-16.91 – -6.836     23
-6.836 – 3.232      102
3.232 – 13.3        135
13.3 – 23.37        88
23.37 – 33.44       91
33.44 – 43.51       59
43.51 – 53.58       25
53.58 – 63.64       11
63.64 – 73.71       29
73.71 – 83.78       104
83.78 – 93.85       31
93.85 – 103.9       13
103.9 – 114         40
114 – 124.1         38
124.1 – 134.1       28
134.1 – 144.2       33
144.2 – 154.3       9
154.3 – 164.3       1
164.3 – 174.4       2

name text identifier

The `name` column holds 1,097 short strings, all unique (n_unique equals n, duplicate_rate 0.0). They average 8.56 characters and 1.21 words, and 82.95% are single words. Frequent tokens such as `st.`, `county`, `san`, `santa`, and `creek`, plus the Spanish function words `de`, `la`, and `el`, point to place names rather than person names. This is consistent with the convention of naming meteorites after the locality where they fell. Because every value is distinct, the column works as an identifier-like label rather than a learnable feature.

Treatment: Treat as a unique label/key; drop from modelling features or use only for joins and display.

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["name"].stats

stat                     value
n                        1,097
nulls                    0 (0.0%)
unique                   1,097
len_min                  2
len_max                  28
len_mean                 8.557
len_median               8
len_p95                  15
word_mean                1.209
word_median              1
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,238
readability_flesch_mean  40.67
emoji_rate               0
url_rate                 0
one_word_rate            0.8295
allcaps_rate             0
boilerplate_rate         0
alert: near_unique       100.0% of rows are unique strings
alert: one_word          83.0% of rows are a single word
alert: short_text        95th-percentile length under 20 chars
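The `one_word` and `near_unique` alerts rest on simple counts. The sketch below makes two assumptions: words are whitespace-separated tokens (saturn's tokenizer is not documented here), and `n_duplicates` is taken as rows minus distinct values. The helper name and sample strings are hypothetical.

```python
def text_shape(values):
    """Single-word share, duplicate count, and duplicate rate for a text column."""
    n = len(values)
    one_word_rate = sum(1 for v in values if len(v.split()) == 1) / n
    n_duplicates = n - len(set(values))  # rows beyond the first copy of each value
    return {
        "one_word_rate": one_word_rate,
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n,
    }


print(text_shape(["Alpha", "Beta Creek", "Gamma", "Alpha"]))
```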
Fig 10.
Character-length distribution for name.
Character-length distribution for name (mean: 8.557).
chars     count
2 – 3     1
3 – 3     7
3 – 4     0
4 – 5     40
5 – 5     104
5 – 6     0
6 – 7     174
7 – 7     170
7 – 8     0
8 – 8     144
8 – 9     136
9 – 10    0
10 – 10   78
10 – 11   62
11 – 12   0
12 – 12   51
12 – 13   44
13 – 14   0
14 – 14   20
14 – 15   0
15 – 16   17
16 – 16   11
16 – 17   0
17 – 18   15
18 – 18   4
18 – 19   0
19 – 20   9
20 – 20   3
20 – 21   0
21 – 22   2
22 – 22   3
22 – 23   0
23 – 23   1
23 – 24   0
24 – 25   0
25 – 25   0
25 – 26   0
26 – 27   0
27 – 27   0
27 – 28   1

description text free_text

Short, templated descriptions of the meteorite records. Every one of the 1,097 rows contains the tokens 'meteorite', 'mass:', 'found:', and 'fell.', which points to a generated sentence rather than free prose. Lengths are tight (46–72 chars, mean 54.3, about 8 words). Each row is unique (n_unique=1097, duplicate_rate=0), but because the text is filled in from a template, it likely restates the structured columns rather than adding new signal. Class codes such as 'l6.' (260), 'h5.' (163), 'h6.' (91), and 'l5.' (76) match the meteorite_class counts exactly, so the classification leaks into the text.

Treatment: Drop or parse into structured fields (mass, found, class) rather than embedding — it is a template over existing columns.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["description"].stats

stat                     value
n                        1,097
nulls                    0 (0.0%)
unique                   1,097
len_min                  46
len_max                  72
len_mean                 54.31
len_median               53
len_p95                  64
word_mean                8.254
word_median              8
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,372
readability_flesch_mean  52.62
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: near_unique       100.0% of rows are unique strings
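One way to confirm the template claim is to measure how many descriptions contain each marker token. This sketch uses a case-insensitive substring test, which only approximates whatever tokenization saturn applies. The two sample strings are invented and do not reproduce the real template.

```python
def token_coverage(texts, tokens):
    """Fraction of texts containing each token (case-insensitive substring match)."""
    lowered = [t.lower() for t in texts]
    return {tok: sum(tok in t for t in lowered) / len(lowered) for tok in tokens}


markers = ["meteorite", "mass:", "found:", "fell."]
sample = ["A meteorite; Mass: 1 g; Found: x; Fell.", "free-form text"]
print(token_coverage(sample, markers))
```

If the profile is right, every marker should score 1.0 on the real column.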
Fig 11.
Character-length distribution for description.
Character-length distribution for description (mean: 54.31).
chars     count
46 – 47   1
47 – 47   5
47 – 48   0
48 – 49   29
49 – 49   79
49 – 50   0
50 – 51   118
51 – 51   137
51 – 52   0
52 – 52   129
52 – 53   110
53 – 54   0
54 – 54   76
54 – 55   68
55 – 56   0
56 – 56   58
56 – 57   54
57 – 58   0
58 – 58   34
58 – 59   0
59 – 60   40
60 – 60   22
60 – 61   0
61 – 62   26
62 – 62   21
62 – 63   0
63 – 64   20
64 – 64   20
64 – 65   0
65 – 66   14
66 – 66   9
66 – 67   0
67 – 67   11
67 – 68   4
68 – 69   0
69 – 69   3
69 – 70   5
70 – 71   0
71 – 71   1
71 – 72   3

category categorical metadata

This column is a single-valued categorical tag, with all 1097 rows labeled "witnessed_meteorite_falls". Cardinality is 1 and entropy is 0, so it carries no information for modelling and merely records the dataset's provenance or scope.

Treatment: Drop before modelling; retain only as a dataset-level annotation.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["category"].stats

stat               value
n                  1,097
nulls              0 (0.0%)
unique             1
top_value          witnessed_meteorite_falls
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 12.
Top values for category.
Top values for category (1 unique shown, of 1 total).
value                      count  share
witnessed_meteorite_falls  1097   100.0%
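Both `category` and `fall_type` have cardinality 1. Constant columns like these can be found and dropped in one pass. The sketch assumes rows are dicts with hashable scalar values, and the helpers and toy rows are hypothetical.

```python
def constant_columns(rows):
    """Columns whose non-null values are all identical (cardinality 1)."""
    columns = sorted({key for row in rows for key in row})
    constant = []
    for col in columns:
        distinct = {row[col] for row in rows if row.get(col) is not None}
        if len(distinct) == 1:
            constant.append(col)
    return constant


def drop_columns(rows, names):
    """Return copies of the rows without the named columns."""
    drop = set(names)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


rows = [
    {"name": "A", "category": "witnessed_meteorite_falls", "fall_type": "Fell"},
    {"name": "B", "category": "witnessed_meteorite_falls", "fall_type": "Fell"},
]
constant = constant_columns(rows)
print(constant)                      # ['category', 'fall_type']
print(drop_columns(rows, constant))  # [{'name': 'A'}, {'name': 'B'}]
```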

date categorical timestamp

This column holds dates stored as strings. All of the 20 most frequent values fall on January 1st, which suggests year-only granularity disguised as full dates. Across 1,097 rows there are 231 distinct values with a very high entropy ratio (0.967), and no single value exceeds 1.6% of rows. Falls are therefore spread broadly across years, ranging at least from 1868 to 2008. The null rate is low at 1.73% (19 rows).

Treatment: Parse to datetime and extract the year as the working feature, since month and day appear to be constant (January 1st).

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["date"].stats

stat               value
n                  1,097
nulls              19 (1.7%)
unique             231
top_value          1933-01-01
top_rate           0.01577
cardinality        231
entropy            7.593
entropy_ratio      0.967
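To act on the treatment, the year can be pulled out of each string. The January 1st pattern can also be checked across all non-null rows rather than only the top 20. The sketch below assumes values look like 'YYYY-MM-DD' and that missing dates arrive as None or an empty string. The helper name and toy values are hypothetical.

```python
def split_iso_date(value):
    """Split a 'YYYY-MM-DD' string into (year, month, day); missing -> None."""
    if not value:
        return None
    year, month, day = (int(part) for part in value.split("-"))
    return year, month, day


raw = ["1933-01-01", "1868-01-01", None, "2008-01-01"]  # toy values
parts = [split_iso_date(v) for v in raw]
years = [p[0] if p else None for p in parts]
# Year-only granularity holds only if every non-null date is January 1st.
present = [p for p in parts if p is not None]
year_only = all(month == 1 and day == 1 for _, month, day in present)
print(years, year_only)  # [1933, 1868, None, 2008] True
```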
Fig 13.
Top values for date.
Top values for date (20 unique shown, of 231 total).
value       count  share
1933-01-01  17     1.5%
1949-01-01  13     1.2%
1950-01-01  12     1.1%
1976-01-01  11     1.0%
1930-01-01  11     1.0%
1938-01-01  11     1.0%
1910-01-01  11     1.0%
1868-01-01  11     1.0%
1977-01-01  10     0.9%
1939-01-01  10     0.9%
1984-01-01  10     0.9%
1934-01-01  10     0.9%
1916-01-01  10     0.9%
1924-01-01  10     0.9%
1917-01-01  10     0.9%
2008-01-01  9      0.8%
2003-01-01  9      0.8%
1998-01-01  9      0.8%
1890-01-01  9      0.8%
1986-01-01  9      0.8%

country unknown metadata

This column is labeled "country" and contains 1097 non-null values, but saturn skipped detailed profiling so neither the cardinality nor value distribution is available. Without unique counts or sample values, I cannot confirm whether it holds country names, ISO codes, or something else. The only firm signals are full population (null_rate 0.0) and the skipped alert.

Treatment: Re-profile with categorical stats enabled, then standardize to ISO codes before use.

anthropic:claude-opus-4-7 · confidence low
Out[31]:

saturn.columns["country"].stats

stat               value
n                  1,097
nulls              0 (0.0%)
unique             n/a
alert: skipped     no profiler for kind=unknown
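Since saturn could not assign a kind, a quick first step is to check which Python types the column actually contains after `json.load`. Mixed or nested values are one plausible reason a profiler would skip a column, but that is a hypothesis to check, not a finding. The sample values are invented.

```python
from collections import Counter


def type_census(values):
    """Count how many values of each Python type a column contains."""
    return Counter(type(v).__name__ for v in values)


print(type_census(["France", "FR", None, {"name": "France"}]))
```

If the census shows only strings, `len(set(values))` recovers the missing cardinality.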

mass_g unknown other

Column `mass_g` was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. The only confirmed signals are 1097 rows with a 0.0 null rate; uniqueness, distribution, and type are all missing. The name suggests a numeric mass measurement in grams, but this cannot be verified from the evidence.

Treatment: Re-run profiling on this column to recover type and distribution before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[33]:

saturn.columns["mass_g"].stats

stat               value
n                  1,097
nulls              0 (0.0%)
unique             n/a
alert: skipped     no profiler for kind=unknown
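Because the name suggests grams, a useful check before re-profiling is whether every `mass_g` value parses as a number. The sketch collects the values that do not. Strings with thousands separators, such as '1,200', also land in the failures list. The helper name and sample values are hypothetical.

```python
def coerce_floats(values):
    """Parse values as floats; return (parsed, failures)."""
    parsed, failures = [], []
    for v in values:
        try:
            parsed.append(float(v))
        except (TypeError, ValueError):
            failures.append(v)
    return parsed, failures


parsed, failures = coerce_floats([1200, "35.5", "1,200", None])
print(parsed)    # [1200.0, 35.5]
print(failures)  # ['1,200', None]
```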

meteorite_class categorical label

This column holds the meteorite classification, with 125 distinct classes across 1,097 records and no nulls. For ordinary chondrites a class combines a chemical group and a petrologic type, as in L6. The distribution is dominated by ordinary chondrites: L6 alone covers 23.7% of rows, followed by H5 (163) and H6 (91). The 105 classes outside the top 20 each have 7 or fewer rows. This combination of a concentrated head and a long tail yields a moderate entropy ratio of 0.67. Analysts should note the heavy concentration in a handful of chondrite groups alongside niche entries like Eucrite-mmict (18).

Treatment: Group rare classes into an 'other' bucket before encoding for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[35]:

saturn.columns["meteorite_class"].stats

stat               value
n                  1,097
nulls              0 (0.0%)
unique             125
top_value          L6
top_rate           0.237
cardinality        125
entropy            4.639
entropy_ratio      0.666
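The reported entropy appears to be Shannon entropy in bits, and entropy_ratio appears to be that entropy divided by log2 of the cardinality. Here 4.639 / log2(125) ≈ 0.666, and for `date`, 7.593 / log2(231) ≈ 0.967. The sketch below implements that reading, which is inferred from these numbers rather than from saturn's documentation.

```python
import math
from collections import Counter


def entropy_ratio(values):
    """Shannon entropy (bits) and its ratio to the maximum, log2(cardinality)."""
    counts = Counter(values)
    n = sum(counts.values())
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    k = len(counts)
    return h, (h / math.log2(k) if k > 1 else 0.0)


h, r = entropy_ratio(["L6", "L6", "H5", "H6"])
print(h, round(r, 3))  # 1.5 0.946
# Cross-check the reported figures for meteorite_class and date.
print(round(4.639 / math.log2(125), 3), round(7.593 / math.log2(231), 3))  # 0.666 0.967
```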
Fig 14.
Top values for meteorite_class.
Top values for meteorite_class (20 unique shown, of 125 total).
value          count  share
L6             260    23.7%
H5             163    14.9%
H6             91     8.3%
L5             76     6.9%
H4             50     4.6%
LL6            41     3.7%
Stone-uncl     39     3.6%
OC             24     2.2%
LL5            19     1.7%
Eucrite-mmict  18     1.6%
L4             18     1.6%
Howardite      16     1.5%
CM2            15     1.4%
H              13     1.2%
L              10     0.9%
Iron, IIIAB    10     0.9%
Aubrite        9      0.8%
Diogenite      8      0.7%
EL6            8      0.7%
CV3            7      0.6%
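The treatment suggests collapsing rare classes into an 'other' bucket. The minimal sketch below uses a hypothetical cutoff of 10 rows. By the table above, that cutoff would keep the 16 classes from L6 through Iron, IIIAB (863 rows) and fold the other 109 classes into 'other' (234 rows, about 21%). The cutoff is illustrative; the right value depends on the downstream model.

```python
from collections import Counter


def bucket_rare(values, min_count=10, other="other"):
    """Replace labels seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


labels = ["L6", "L6", "H5", "L6", "CV3"]  # toy labels
print(bucket_rare(labels, min_count=2))  # ['L6', 'L6', 'other', 'L6', 'other']
```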

fall_type categorical metadata

This column records the type of fall event but contains the single value "Fell" across all 1097 rows, with zero nulls. Entropy is 0.0 and top_rate is 1.0, so it carries no information for any downstream model.

Treatment: Drop; constant column with a single value.

anthropic:claude-opus-4-7 · confidence high
Out[38]:

saturn.columns["fall_type"].stats

statvalue
n1,097
nulls0 (0.0%)
unique1
top_value Fell
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalancetop value is 100.0% of rows
Fig 15.
Top values for fall_type.
Top values for fall_type (1 unique shown, of 1 total).
value  count  share
Fell   1097   100.0%

How to cite


BibTeX
@misc{saturn-witnessed-meteorite-falls-witnessed-meteorite-falls-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: witnessed meteorite falls witnessed meteorite falls},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/witnessed-meteorite-falls-witnessed_meteorite_falls}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: witnessed meteorite falls witnessed meteorite falls. Source: /home/coolhand/datasets/witnessed-meteorite-falls/witnessed_meteorite_falls.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/witnessed-meteorite-falls-witnessed_meteorite_falls