saturn·

wild nasa meteorites

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/wild/nasa_meteorites.csv

Saturn profiled 45,716 rows across 20 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/wild/nasa_meteorites.csv",
    "--findings", "wild-nasa_meteorites.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This is a NASA meteorites dataset with 45,716 records and 20 columns covering each meteorite's name, classification, mass, fall type, year, and geographic coordinates. The most interesting signals are physical and categorical: mass (g) is extremely skewed (mean ~13,278g vs median 32.6g, max 60,000,000g) with ~15.5% flagged as outliers, and recclass is dominated by ordinary chondrites (L6 at 18.1%, followed by H5, L5, H6, H4). The fall column is heavily imbalanced (97.6% 'Found' vs 2.4% 'Fell'), and year shows a clear concentration in recent decades, peaking at 2003 (3,323 records). Note that Counties and States are 96% null, several columns (created_at, updated_at, position, meta) are constant and can be ignored, and GeoLocation has 55% duplicate values, driven in part by the placeholder coordinate (0.0, 0.0), which appears 6,214 times, alongside other repeated coordinates.

citing: mass (g) · recclass · fall · year · nametype · Counties · States · GeoLocation · created_at · position · meta

Out[4]:

saturn.schema() · 20 columns

column        kind         n       null%  unique  alerts
sid           text         45,716  0.0%   45,716  near_unique one_word short_text
id            text         45,716  0.0%   45,716  near_unique one_word allcaps
position      numeric      45,716  0.0%   1       constant
created_at    numeric      45,716  0.0%   1       constant
created_meta  unknown      45,716  0.0%           skipped
updated_at    numeric      45,716  0.0%   1       constant
updated_meta  unknown      45,716  0.0%           skipped
meta          categorical  45,716  0.0%   1       imbalance
name          text         45,716  0.0%   45,716  near_unique
id_1          numeric      45,716  0.0%   45,716
nametype      categorical  45,716  0.0%   2       imbalance
recclass      categorical  45,716  0.0%   466
mass (g)      numeric      45,716  0.3%   12,576  high_skew outliers
fall          categorical  45,716  0.0%   2       imbalance
year          categorical  45,716  0.6%   266
reclat        numeric      45,716  16.0%  12,738
reclong       numeric      45,716  16.0%  14,640
GeoLocation   text         45,716  16.0%  17,100  duplicates
States        numeric      45,716  96.4%  45      null_rate
Counties      numeric      45,716  96.4%  662     null_rate
Fig 1.
mass (g) · Extreme right skew — most meteorites are tiny (median 32.6g) but a long tail reaches 60 million grams.
Histogram bins for mass (g) (median: 32.6).
bin  count
0 – 1.5e+06  45544
1.5e+06 – 3e+06  16
3e+06 – 4.5e+06  8
4.5e+06 – 6e+06  1
6e+06 – 7.5e+06  1
7.5e+06 – 9e+06  1
9e+06 – 1.05e+07  2
1.05e+07 – 1.2e+07  0
1.2e+07 – 1.35e+07  0
1.35e+07 – 1.5e+07  0
1.5e+07 – 1.65e+07  2
1.65e+07 – 1.8e+07  0
1.8e+07 – 1.95e+07  0
1.95e+07 – 2.1e+07  0
2.1e+07 – 2.25e+07  1
2.25e+07 – 2.4e+07  1
2.4e+07 – 2.55e+07  2
2.55e+07 – 2.7e+07  1
2.7e+07 – 2.85e+07  1
2.85e+07 – 3e+07  0
3e+07 – 3.15e+07  1
3.15e+07 – 3.3e+07  0
3.3e+07 – 3.45e+07  0
3.45e+07 – 3.6e+07  0
3.6e+07 – 3.75e+07  0
3.75e+07 – 3.9e+07  0
3.9e+07 – 4.05e+07  0
4.05e+07 – 4.2e+07  0
4.2e+07 – 4.35e+07  0
4.35e+07 – 4.5e+07  0
4.5e+07 – 4.65e+07  0
4.65e+07 – 4.8e+07  0
4.8e+07 – 4.95e+07  0
4.95e+07 – 5.1e+07  1
5.1e+07 – 5.25e+07  0
5.25e+07 – 5.4e+07  0
5.4e+07 – 5.55e+07  0
5.55e+07 – 5.7e+07  0
5.7e+07 – 5.85e+07  1
5.85e+07 – 6e+07  1
Fig 2.
recclass · Top classifications are dominated by ordinary chondrites L6 (18%) and H5, with a long tail of 466 classes.
Top values for recclass (20 unique shown, of 466 total).
value  count  share
L6  8285  18.1%
H5  7142  15.6%
L5  4796  10.5%
H6  4528  9.9%
H4  4211  9.2%
LL5  2766  6.1%
LL6  2043  4.5%
L4  1253  2.7%
H4/5  428  0.9%
CM2  416  0.9%
H3  386  0.8%
L3  365  0.8%
CO3  335  0.7%
Ureilite  300  0.7%
Iron, IIIAB  285  0.6%
LL4  268  0.6%
CV3  256  0.6%
Diogenite  241  0.5%
Howardite  240  0.5%
LL  225  0.5%
Fig 3.
fall · Heavy imbalance: 97.6% were 'Found' versus only 2.4% observed 'Fell'.
Top values for fall (2 unique shown, of 2 total).
value  count  share
Found  44609  97.6%
Fell  1107  2.4%
Fig 4.
year · Year of record is concentrated in recent decades, peaking around 2003, 1979, and 1998.
Top values for year (20 unique shown, of 266 total).
value  count  share
2003-01-01T00:00:00  3323  7.3%
1979-01-01T00:00:00  3046  6.7%
1998-01-01T00:00:00  2697  5.9%
2006-01-01T00:00:00  2456  5.4%
1988-01-01T00:00:00  2296  5.0%
2002-01-01T00:00:00  2078  4.5%
2004-01-01T00:00:00  1940  4.2%
2000-01-01T00:00:00  1792  3.9%
1997-01-01T00:00:00  1696  3.7%
1999-01-01T00:00:00  1691  3.7%
2001-01-01T00:00:00  1650  3.6%
1990-01-01T00:00:00  1518  3.3%
2009-01-01T00:00:00  1497  3.3%
1986-01-01T00:00:00  1375  3.0%
2007-01-01T00:00:00  1189  2.6%
2010-01-01T00:00:00  1005  2.2%
1993-01-01T00:00:00  979  2.1%
2008-01-01T00:00:00  957  2.1%
1987-01-01T00:00:00  916  2.0%
1991-01-01T00:00:00  877  1.9%
Fig 5.
nametype · Nearly all entries are 'Valid' (99.8%); only 75 'Relict' meteorites stand out.
Top values for nametype (2 unique shown, of 2 total).
value  count  share
Valid  45641  99.8%
Relict  75  0.2%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column  kind  null %
sid  text  0.0%
id  text  0.0%
position  numeric  0.0%
created_at  numeric  0.0%
created_meta  unknown  0.0%
updated_at  numeric  0.0%
updated_meta  unknown  0.0%
meta  categorical  0.0%
name  text  0.0%
id_1  numeric  0.0%
nametype  categorical  0.0%
recclass  categorical  0.0%
mass (g)  numeric  0.3%
fall  categorical  0.0%
year  categorical  0.6%
reclat  numeric  16.0%
reclong  numeric  16.0%
GeoLocation  text  16.0%
States  numeric  96.4%
Counties  numeric  96.4%
Fig 7.
Language mix across all text columns (per-string detection, sampled).
Per-language counts (total 4,211 detected strings).
lang  count  share
en  4209  100.0%
sh  2  0.0%
Fig 8.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 9 numeric columns (values clipped to 2 decimals).
            position  created_at  updated_at  id_1   mass (g)  reclat  reclong  States  Counties
position    +nan      +nan        +nan        +nan   +nan      +nan    +nan     +nan    +nan
created_at  +nan      +nan        +nan        +nan   +nan      +nan    +nan     +nan    +nan
updated_at  +nan      +nan        +nan        +nan   +nan      +nan    +nan     +nan    +nan
id_1        +nan      +nan        +nan        +1.00  -0.04     +0.09   -0.18    -0.06   -0.09
mass (g)    +nan      +nan        +nan        -0.04  +1.00     +0.06   +0.02    +0.09   -0.01
reclat      +nan      +nan        +nan        +0.09  +0.06     +1.00   -0.56    +0.07   +0.02
reclong     +nan      +nan        +nan        -0.18  +0.02     -0.56   +1.00    -0.03   -0.08
States      +nan      +nan        +nan        -0.06  +0.09     +0.07   -0.03    +1.00   +0.15
Counties    +nan      +nan        +nan        -0.09  -0.01     +0.02   -0.08    +0.15   +1.00
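
The +nan entries in the correlation table line up with the three constant columns (position, created_at, updated_at). A minimal sketch of why, assuming the textbook Pearson formula (Saturn's own implementation is not shown in this report): a column with zero variance puts a zero in the denominator, so the coefficient is undefined. The `pearson` helper below is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined: a constant column has no spread to correlate
    return cov / math.sqrt(var_x * var_y)

constant = [0, 0, 0, 0]            # toy stand-in for `position`, which is always 0
varying = [1.0, 2.0, 3.0, 4.0]     # toy stand-in for any non-constant column
print(pearson(constant, varying))  # nan
print(pearson(varying, [2.0, 4.0, 6.0, 8.0]))
```

Dropping the constant columns before computing correlations would remove the nan rows and columns without changing any of the remaining coefficients.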

sid text identifier

This is a synthetic row identifier: every one of the 45716 values is unique, exactly 18 characters long, single-token, and follows a 'row-xxxx-xxxx-xxxx' pattern. There are no nulls, duplicates, or empties, confirming it functions as a primary key rather than a feature.

Treatment: Drop from modelling; retain only as a join key or row index.

anthropic:claude-opus-4-7 · confidence high
Out[14]:

saturn.columns["sid"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 45,716
len_min 18
len_max 18
len_mean 18
len_median 18
len_p95 18
word_mean 1
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 20,000
readability_flesch_mean -5.68
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 0
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
alert: one_word 100.0% rows are a single word
alert: short_text 95th-percentile length under 20 chars
Fig 9.
Character-length distribution for sid.
Character-length distribution for sid (mean: 18.0).
chars  count
18 – 18  45716
All other bins are empty.

id text identifier

This column is a row identifier holding 36-character UUID-style strings, all uppercase and one token wide. Every one of the 45,716 values is unique with zero nulls or duplicates, and length is fixed at exactly 36 characters across min, median, and max. The shared `00000000-0000-0000-` prefix on all sampled values is notable — only the latter half of each UUID varies, suggesting a namespaced or truncated-entropy ID scheme rather than fully random v4 UUIDs.

Treatment: Drop from modelling features; retain only as a join key or row reference.

anthropic:claude-opus-4-7 · confidence high
Out[17]:

saturn.columns["id"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 45,716
len_min 36
len_max 36
len_mean 36
len_median 36
len_p95 36
word_mean 1
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 20,000
readability_flesch_mean 65.38
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
alert: one_word 100.0% rows are a single word
alert: allcaps 100.0% rows are all-caps
Fig 10.
Character-length distribution for id.
Character-length distribution for id (mean: 36.0).
chars  count
36 – 36  45716
All other bins are empty.

position numeric other

The column 'position' is numeric but holds a single value across all 45716 rows: every entry is 0, giving a zero_rate of 1.0 and n_unique of 1. With zero variance (std 0.0, iqr 0.0), it carries no information for any downstream task.

Treatment: Drop, constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[20]:

saturn.columns["position"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 1
min 0
max 0
mean 0
median 0
std 0
q1 0
q3 0
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 1
alert: constant only one distinct value
Fig 11.
Distribution of position. Vertical dash marks the median.
Histogram bins for position (median: 0.0).
bin  count
0 – 0.025  45716
All other bins between -0.5 and 0.5 are empty.

created_at numeric timestamp

This column appears to be a Unix epoch creation timestamp (1446143734 corresponds to a single moment in late 2015), stored as a numeric value. Across all 45716 rows it holds exactly one value, with std 0.0 and n_unique 1, so it carries no information to differentiate records. The 'constant' alert confirms there is no variation to model or filter on.

Treatment: Drop; constant column adds no signal.

anthropic:claude-opus-4-7 · confidence high
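
To check the "late 2015" reading, the value can be decoded directly. This is a small sketch that assumes the number is a Unix epoch in seconds and interprets it in UTC; the report only says it "appears to be" an epoch timestamp.

```python
from datetime import datetime, timezone

created_at = 1446143734  # the single value reported for created_at

# Assumption: seconds since 1970-01-01 UTC.
stamp = datetime.fromtimestamp(created_at, tz=timezone.utc)
print(stamp.isoformat())  # 2015-10-29T18:35:34+00:00
```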
Out[23]:

saturn.columns["created_at"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 1
min 1.446e+09
max 1.446e+09
mean 1.446e+09
median 1.446e+09
std 0
q1 1.446e+09
q3 1.446e+09
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant only one distinct value
Fig 12.
Distribution of created_at. Vertical dash marks the median.
Histogram bins for created_at (median: 1446143734.0).
bin  count
1.446e+09 – 1.446e+09  45716
All other bins are empty.

created_meta unknown metadata

The column `created_meta` was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 45716 and a null_rate of 0.0. The name suggests it carries creation-time metadata (e.g., a user id or system tag attached to record creation), but this cannot be confirmed from the evidence. No further signal is present to assess distribution, uniqueness, or drift.

Treatment: Re-profile with parsing enabled before deciding; otherwise drop until contents are characterised.

anthropic:claude-opus-4-7 · confidence low
Out[26]:

saturn.columns["created_meta"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

updated_at numeric timestamp

This column is almost certainly a Unix epoch timestamp recording a row update time, with the single value 1446143734 (late 2015) repeated across all 45716 rows. With n_unique=1, std=0, and identical min/median/max, it carries no information—every record was stamped at the same instant, suggesting a bulk export or a field that was never actually updated per-row.

Treatment: Drop; constant column provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["updated_at"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 1
min 1.446e+09
max 1.446e+09
mean 1.446e+09
median 1.446e+09
std 0
q1 1.446e+09
q3 1.446e+09
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant only one distinct value
Fig 13.
Distribution of updated_at. Vertical dash marks the median.
Histogram bins for updated_at (median: 1446143734.0).
bin  count
1.446e+09 – 1.446e+09  45716
All other bins are empty.

updated_meta unknown metadata

The column `updated_meta` was skipped by the profiler, so no type inference, uniqueness count, or value statistics are available. The only confirmed signals are 45716 rows with a null rate of 0.0, but the actual content and structure remain uncharacterised. The name suggests it may hold update-related metadata (e.g., a timestamp, user, or nested struct), yet this is not supported by evidence.

Treatment: Re-profile with an appropriate parser before deciding; do not feed into modelling until its type is known.

anthropic:claude-opus-4-7 · confidence low
Out[31]:

saturn.columns["updated_meta"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

meta categorical metadata

This 'meta' column is a constant placeholder: every one of the 45,716 rows holds the same '{ }' value, giving a cardinality of 1 and entropy of 0. There is no information to extract here, likely a vestigial JSON metadata field that was never populated.

Treatment: Drop; the column is constant and carries zero signal.

anthropic:claude-opus-4-7 · confidence high
Out[33]:

saturn.columns["meta"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 1
top_value { }
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 14.
Top values for meta.
Top values for meta (1 unique shown, of 1 total).
value  count  share
{ }  45716  100.0%

name text identifier

This is a short text column of meteorite names, which read as place or feature names: every one of 45,716 rows is unique with zero nulls, averaging 17.8 characters and 2.8 words. Top tokens like 'yamato', 'range', 'northwest', 'hills', 'mountains', 'queen alexandra', and 'grove' suggest geographic/toponymic entries (mountain ranges, hills, regions), consistent with meteorites being named after the place where they were recovered. With n_unique equal to n, it functions as an identifier rather than a categorical feature.

Treatment: Drop for modelling; retain as a label/key for joins or display.

anthropic:claude-opus-4-7 · confidence high
Out[36]:

saturn.columns["name"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 45,716
len_min 2
len_max 28
len_mean 17.78
len_median 19
len_p95 27
word_mean 2.772
word_median 3
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 17,917
readability_flesch_mean 63.74
emoji_rate 0
url_rate 0
one_word_rate 0.04749
allcaps_rate 0
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
Fig 15.
Character-length distribution for name.
Character-length distribution for name (mean: 17.78).
chars  count
2 – 3  2
3 – 3  16
3 – 4  0
4 – 5  97
5 – 5  224
5 – 6  0
6 – 7  420
7 – 7  430
7 – 8  0
8 – 8  449
8 – 9  958
9 – 10  0
10 – 10  1392
10 – 11  1235
11 – 12  0
12 – 12  3961
12 – 13  5478
13 – 14  0
14 – 14  297
14 – 15  0
15 – 16  1217
16 – 16  237
16 – 17  0
17 – 18  2772
18 – 18  2194
18 – 19  0
19 – 20  1826
20 – 20  3427
20 – 21  0
21 – 22  8562
22 – 22  5145
22 – 23  0
23 – 23  1466
23 – 24  33
24 – 25  0
25 – 25  405
25 – 26  21
26 – 27  0
27 – 27  3398
27 – 28  54

id_1 numeric identifier

id_1 is almost certainly a row identifier: 45716 unique values across 45716 rows, no nulls, ranging from 1 to 57458 with a near-uniform spread (kurtosis -1.16, mild skew 0.27). The fact that the max (57458) exceeds the row count suggests gaps in the sequence, consistent with a primary key carried over from a larger source table.

Treatment: drop from modelling; retain only for joins or row tracing.

anthropic:claude-opus-4-7 · confidence high
Out[39]:

saturn.columns["id_1"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 45,716
min 1
max 57,458
mean 2.689e+04
median 2.426e+04
std 1.686e+04
q1 1.269e+04
q3 4.066e+04
iqr 27,968
skew 0.2665
kurtosis -1.16
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 16.
Distribution of id_1. Vertical dash marks the median.
Histogram bins for id_1 (median: 24261.5).
bin  count
1 – 1437  1354
1437 – 2874  1151
2874 – 4310  814
4310 – 5747  1270
5747 – 7183  1416
7183 – 8620  1428
8620 – 1.006e+04  1433
1.006e+04 – 1.149e+04  1404
1.149e+04 – 1.293e+04  1394
1.293e+04 – 1.437e+04  1437
1.437e+04 – 1.58e+04  1415
1.58e+04 – 1.724e+04  1414
1.724e+04 – 1.867e+04  1420
1.867e+04 – 2.011e+04  1423
2.011e+04 – 2.155e+04  1437
2.155e+04 – 2.298e+04  1432
2.298e+04 – 2.442e+04  1368
2.442e+04 – 2.586e+04  1296
2.586e+04 – 2.729e+04  1368
2.729e+04 – 2.873e+04  900
2.873e+04 – 3.017e+04  1368
3.017e+04 – 3.16e+04  1078
3.16e+04 – 3.304e+04  529
3.304e+04 – 3.448e+04  763
3.448e+04 – 3.591e+04  1205
3.591e+04 – 3.735e+04  1049
3.735e+04 – 3.878e+04  582
3.878e+04 – 4.022e+04  810
4.022e+04 – 4.166e+04  392
4.166e+04 – 4.309e+04  0
4.309e+04 – 4.453e+04  186
4.453e+04 – 4.597e+04  1300
4.597e+04 – 4.74e+04  1281
4.74e+04 – 4.884e+04  1413
4.884e+04 – 5.028e+04  1129
5.028e+04 – 5.171e+04  1274
5.171e+04 – 5.315e+04  1022
5.315e+04 – 5.459e+04  1219
5.459e+04 – 5.602e+04  1275
5.602e+04 – 5.746e+04  1267

nametype categorical feature

This is a binary categorical flag distinguishing meteorite name types, with values 'Valid' and 'Relict'. The distribution is extremely lopsided: 45,641 of 45,716 rows (99.84%) are 'Valid' and only 75 are 'Relict', yielding an entropy ratio of just 0.018. With effectively no variation, this column carries almost no information for modelling.

Treatment: Drop or retain only as a rare-event indicator; near-constant for modelling.

anthropic:claude-opus-4-7 · confidence high
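
The entropy ratio quoted above can be reproduced from the two counts. The sketch below assumes the ratio is Shannon entropy in bits divided by log2 of the number of categories; that definition is an inference from the reported figures, not something the report states, and `entropy_ratio` is a hypothetical helper.

```python
import math

def entropy_ratio(counts):
    """Shannon entropy in bits divided by log2(number of categories)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    # A single-category column has max_entropy 0; report 0 rather than divide.
    return entropy / max_entropy if max_entropy > 0 else 0.0

nametype_counts = [45641, 75]  # Valid, Relict
fall_counts = [44609, 1107]    # Found, Fell
print(round(entropy_ratio(nametype_counts), 5))
print(round(entropy_ratio(fall_counts), 4))
```

Under that assumption the results land close to the reported 0.01754 for nametype and 0.1645 for fall.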
Out[42]:

saturn.columns["nametype"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 2
top_value Valid
top_rate 0.9984
cardinality 2
entropy 0.01754
entropy_ratio 0.01754
alert: imbalance top value is 99.8% of rows
Fig 17.
Top values for nametype.
Top values for nametype (2 unique shown, of 2 total).
value  count  share
Valid  45641  99.8%
Relict  75  0.2%

recclass categorical label

This column holds meteorite classification codes (recclass), with 466 distinct classes across 45,716 records and no nulls. The distribution is dominated by ordinary chondrites: L6 (18.1%), H5, L5, H6, and H4 together account for about 63% of records, while the long tail (entropy ratio 0.51) thins quickly, with the tenth most common class, CM2, at only 416 entries (0.9%). High cardinality combined with concentrated top categories suggests a classic taxonomic hierarchy (H/L/LL groups with petrologic types).

Treatment: Group rare classes into an 'other' bucket or roll up to parent groups (H/L/LL/C) before modelling.

anthropic:claude-opus-4-7 · confidence high
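
One way to apply the 'other' bucket suggested in the treatment is a share threshold. This is a sketch only: the `bucket_rare` helper, the 5% threshold, and the toy labels are all illustrative assumptions, not part of Saturn or the dataset.

```python
from collections import Counter

def bucket_rare(labels, min_share=0.01, other="other"):
    """Replace labels whose share of rows is below min_share with `other`."""
    counts = Counter(labels)
    total = len(labels)
    keep = {label for label, c in counts.items() if c / total >= min_share}
    return [label if label in keep else other for label in labels]

# Toy sample (hypothetical proportions, not the real column).
sample = ["L6"] * 50 + ["H5"] * 45 + ["CM2"] * 4 + ["Ureilite"] * 1
bucketed = bucket_rare(sample, min_share=0.05)
print(Counter(bucketed))
```

Rolling up to parent groups (H/L/LL/C) instead would need a mapping from each class code to its group, which this report does not provide.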
Out[45]:

saturn.columns["recclass"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 466
top_value L6
top_rate 0.1812
cardinality 466
entropy 4.548
entropy_ratio 0.5131
Fig 18.
Top values for recclass.
Top values for recclass (20 unique shown, of 466 total).
value  count  share
L6  8285  18.1%
H5  7142  15.6%
L5  4796  10.5%
H6  4528  9.9%
H4  4211  9.2%
LL5  2766  6.1%
LL6  2043  4.5%
L4  1253  2.7%
H4/5  428  0.9%
CM2  416  0.9%
H3  386  0.8%
L3  365  0.8%
CO3  335  0.7%
Ureilite  300  0.7%
Iron, IIIAB  285  0.6%
LL4  268  0.6%
CV3  256  0.6%
Diogenite  241  0.5%
Howardite  240  0.5%
LL  225  0.5%

mass (g) numeric feature

Numeric mass measurements in grams across 45,716 rows, with a median of just 32.6g but a maximum of 60,000,000g — a 6-order-of-magnitude span. The distribution is extremely heavy-tailed (skew 76.9, kurtosis ~6796) and 15.5% of values flag as outliers, while the std (574,988) dwarfs the IQR (195.4). Nulls (0.29%) and zeros (0.04%) are negligible.

Treatment: log-transform before any modelling or distance-based analysis.

anthropic:claude-opus-4-7 · confidence high
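
Because the column contains zeros (zero_rate 0.0004) and nulls, a plain logarithm would fail on some rows. A minimal sketch of one common workaround, assuming a log10(1 + x) transform is acceptable for the downstream use; the `log_mass` name and the choice of base are illustrative.

```python
import math

def log_mass(mass_g):
    """Return log10(1 + mass) in grams, or None for a missing value."""
    if mass_g is None:
        return None
    return math.log10(1.0 + mass_g)

# Min, median, q3 and max reported for mass (g), plus a missing value.
masses = [0.0, 32.6, 202.6, 60_000_000.0, None]
print([None if m is None else round(log_mass(m), 2) for m in masses])
```

On this scale the reported median and maximum sit roughly six units apart instead of six orders of magnitude.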
Out[48]:

saturn.columns["mass (g)"].stats

stat value
n 45,716
nulls 131 (0.3%)
unique 12,576
min 0
max 6e+07
mean 1.328e+04
median 32.6
std 5.75e+05
q1 7.2
q3 202.6
iqr 195.4
skew 76.91
kurtosis 6796
n_outliers 7,086
outlier_rate 0.1554
zero_rate 0.0004168
alert: high_skew skew=+76.91
alert: outliers 15.5% rows beyond 1.5 IQR
Fig 19.
Distribution of mass (g). Vertical dash marks the median.
Histogram bins for mass (g) (median: 32.6).
bin  count
0 – 1.5e+06  45544
1.5e+06 – 3e+06  16
3e+06 – 4.5e+06  8
4.5e+06 – 6e+06  1
6e+06 – 7.5e+06  1
7.5e+06 – 9e+06  1
9e+06 – 1.05e+07  2
1.05e+07 – 1.2e+07  0
1.2e+07 – 1.35e+07  0
1.35e+07 – 1.5e+07  0
1.5e+07 – 1.65e+07  2
1.65e+07 – 1.8e+07  0
1.8e+07 – 1.95e+07  0
1.95e+07 – 2.1e+07  0
2.1e+07 – 2.25e+07  1
2.25e+07 – 2.4e+07  1
2.4e+07 – 2.55e+07  2
2.55e+07 – 2.7e+07  1
2.7e+07 – 2.85e+07  1
2.85e+07 – 3e+07  0
3e+07 – 3.15e+07  1
3.15e+07 – 3.3e+07  0
3.3e+07 – 3.45e+07  0
3.45e+07 – 3.6e+07  0
3.6e+07 – 3.75e+07  0
3.75e+07 – 3.9e+07  0
3.9e+07 – 4.05e+07  0
4.05e+07 – 4.2e+07  0
4.2e+07 – 4.35e+07  0
4.35e+07 – 4.5e+07  0
4.5e+07 – 4.65e+07  0
4.65e+07 – 4.8e+07  0
4.8e+07 – 4.95e+07  0
4.95e+07 – 5.1e+07  1
5.1e+07 – 5.25e+07  0
5.25e+07 – 5.4e+07  0
5.4e+07 – 5.55e+07  0
5.55e+07 – 5.7e+07  0
5.7e+07 – 5.85e+07  1
5.85e+07 – 6e+07  1
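
The "beyond 1.5 IQR" alert for mass (g) can be made concrete with the reported quartiles. This sketch assumes the standard Tukey fences (q1 - 1.5·IQR and q3 + 1.5·IQR); the report names the 1.5 IQR rule but does not spell out the exact formula Saturn uses.

```python
q1, q3 = 7.2, 202.6   # quartiles reported for mass (g)
iqr = q3 - q1         # matches the reported iqr of 195.4

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(round(lower_fence, 1), round(upper_fence, 1))  # -285.9 495.7
```

Since mass cannot be negative, only the upper fence matters here: under this assumption, any meteorite heavier than roughly 496 g would count toward the 15.5% outlier rate.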

fall categorical feature

Binary categorical flag distinguishing meteorites that were observed falling versus those found later, with only two values: "Found" and "Fell". The split is severely imbalanced — "Found" accounts for 44609 of 45716 rows (top_rate 0.9758) while "Fell" has just 1107, yielding an entropy_ratio of 0.164. No nulls are present.

Treatment: Encode as binary; stratify or rebalance before modelling given the 40:1 class skew.

anthropic:claude-opus-4-7 · confidence high
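
The treatment above (binary encoding plus stratification) can be sketched with the standard library alone. The helper names, the 20% test share, and the toy data are assumptions for illustration; a real pipeline would more likely use an established library's splitter.

```python
import random

def encode_fall(value):
    """Map 'Fell' to 1 and anything else ('Found') to 0."""
    return 1 if value == "Fell" else 0

def stratified_split(labels, test_share=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_share)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

fall = ["Found"] * 40 + ["Fell"] * 5   # toy data, not the real column
y = [encode_fall(v) for v in fall]
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx), sum(y[i] for i in test_idx))  # 36 9 1
```

Splitting per class keeps the rare 'Fell' rows represented in both parts, which a plain random split does not guarantee at a 40:1 skew.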
Out[51]:

saturn.columns["fall"].stats

stat value
n 45,716
nulls 0 (0.0%)
unique 2
top_value Found
top_rate 0.9758
cardinality 2
entropy 0.1645
entropy_ratio 0.1645
alert: imbalance top value is 97.6% of rows
Fig 20.
Top values for fall.
Top values for fall (2 unique shown, of 2 total).
value  count  share
Found  44609  97.6%
Fell  1107  2.4%

year categorical timestamp

Stored as January-1 timestamps, this column encodes a year-of-record across 45,716 rows with 266 distinct values and a 0.64% null rate. Despite being labeled 'year', the values are full datetimes pinned to YYYY-01-01, which will surprise anyone expecting integer years. The distribution is moderately spread (entropy ratio 0.66) with 2003 the modal year at 7.3% of rows, followed by 1979 and 1998.

Treatment: Cast to integer year (or proper date) before using as a temporal feature.

anthropic:claude-opus-4-7 · confidence high
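
A minimal sketch of the integer-year cast, assuming every non-null value follows the 'YYYY-01-01T00:00:00' shape seen in the top values. Slicing the string avoids relying on any datetime parser's supported date range; `year_from_timestamp` is a hypothetical helper.

```python
def year_from_timestamp(value):
    """Extract the integer year from a 'YYYY-01-01T00:00:00' string."""
    if value is None or value == "":
        return None  # keep the 0.6% of missing years as missing
    return int(value[:4])

print(year_from_timestamp("2003-01-01T00:00:00"))  # 2003
print(year_from_timestamp(None))                   # None
```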
Out[54]:

saturn.columns["year"].stats

stat value
n 45,716
nulls 291 (0.6%)
unique 266
top_value 2003-01-01T00:00:00
top_rate 0.07315
cardinality 266
entropy 5.299
entropy_ratio 0.6578
Fig 21.
Top values for year.
Top values for year (20 unique shown, of 266 total).
value  count  share
2003-01-01T00:00:00  3323  7.3%
1979-01-01T00:00:00  3046  6.7%
1998-01-01T00:00:00  2697  5.9%
2006-01-01T00:00:00  2456  5.4%
1988-01-01T00:00:00  2296  5.0%
2002-01-01T00:00:00  2078  4.5%
2004-01-01T00:00:00  1940  4.2%
2000-01-01T00:00:00  1792  3.9%
1997-01-01T00:00:00  1696  3.7%
1999-01-01T00:00:00  1691  3.7%
2001-01-01T00:00:00  1650  3.6%
1990-01-01T00:00:00  1518  3.3%
2009-01-01T00:00:00  1497  3.3%
1986-01-01T00:00:00  1375  3.0%
2007-01-01T00:00:00  1189  2.6%
2010-01-01T00:00:00  1005  2.2%
1993-01-01T00:00:00  979  2.1%
2008-01-01T00:00:00  957  2.1%
1987-01-01T00:00:00  916  2.0%
1991-01-01T00:00:00  877  1.9%

reclat numeric feature

This is the recorded latitude of each meteorite in decimal degrees, ranging from -87.37 to 81.17. The distribution leans heavily toward the southern hemisphere with a median of -71.5 and a Q3 of exactly 0.0, and 16.8% of non-null values are exactly zero, which likely marks placeholder/unknown coordinates rather than the equator. About 16% of rows are null, and the bimodal-feeling shape (kurtosis -1.48) suggests clusters in Antarctica and elsewhere.

Treatment: Treat exact zeros as missing and pair with reclong for geospatial use.

anthropic:claude-opus-4-7 · confidence high
Out[57]:

saturn.columns["reclat"].stats

stat value
n 45,716
nulls 7,315 (16.0%)
unique 12,738
min -87.37
max 81.17
mean -39.12
median -71.5
std 46.38
q1 -76.71
q3 0
iqr 76.71
skew 0.4916
kurtosis -1.477
n_outliers 0
outlier_rate 0
zero_rate 0.1677
Fig 22.
Distribution of reclat. Vertical dash marks the median.
Histogram bins for reclat (median: -71.5).
bin  count
-87.37 – -83.15  7090
-83.15 – -78.94  1218
-78.94 – -74.73  4083
-74.73 – -70.51  9707
-70.51 – -66.3  1
-66.3 – -62.09  0
-62.09 – -57.87  0
-57.87 – -53.66  1
-53.66 – -49.45  0
-49.45 – -45.23  3
-45.23 – -41.02  11
-41.02 – -36.81  27
-36.81 – -32.59  91
-32.59 – -28.38  550
-28.38 – -24.17  436
-24.17 – -19.95  93
-19.95 – -15.74  35
-15.74 – -11.53  18
-11.53 – -7.313  19
-7.313 – -3.1  24
-3.1 – 1.113  6448
1.113 – 5.327  15
5.327 – 9.54  19
9.54 – 13.75  55
13.75 – 17.97  40
17.97 – 22.18  3197
22.18 – 26.39  315
26.39 – 30.61  2239
30.61 – 34.82  859
34.82 – 39.03  649
39.03 – 43.25  403
43.25 – 47.46  230
47.46 – 51.67  196
51.67 – 55.89  155
55.89 – 60.1  119
60.1 – 64.31  30
64.31 – 68.53  17
68.53 – 72.74  4
72.74 – 76.95  3
76.95 – 81.17  1

reclong numeric feature

Longitude coordinate for meteorite recovery sites, ranging from -165.43 to 354.47 with median 35.67. The maximum exceeding 180 is anomalous for standard longitude and suggests un-normalized or erroneous values; the histogram shows a single value above 185.5. Separately, 16.2% of non-null values are exactly zero, close to reclat's zero rate and in addition to the 16% of rows that are null, hinting that some missing coordinates were coded as 0 rather than left empty.

Treatment: Normalize longitudes to [-180,180], treat 0/0 pairs as missing, then use as a geospatial feature.

anthropic:claude-opus-4-7 · confidence high
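
The two clean-up steps in the treatment can be sketched as follows. Both helpers are hypothetical, and two assumptions are baked in: that out-of-range longitudes are 0 to 360 values that should wrap (the one value near 354.5 could equally be a data-entry error), and that an exact (0.0, 0.0) pair is a placeholder rather than a real location.

```python
def normalize_longitude(lon):
    """Wrap a longitude in degrees into the half-open interval [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0

def clean_coordinates(lat, lon):
    """Return (lat, lon) with lon wrapped, or None when the pair is unusable."""
    if lat is None or lon is None:
        return None
    if lat == 0.0 and lon == 0.0:
        return None  # assumed placeholder, treated as missing
    return lat, normalize_longitude(lon)

print(clean_coordinates(0.0, 0.0))      # None
print(clean_coordinates(-71.5, 35.67))
print(round(normalize_longitude(354.47), 2))  # -5.53
```

Note that this wrap maps exactly 180 to -180, a small departure from the closed interval [-180, 180] named in the treatment.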
Out[60]:

saturn.columns["reclong"].stats

stat value
n 45,716
nulls 7,315 (16.0%)
unique 14,640
min -165.4
max 354.5
mean 61.07
median 35.67
std 80.65
q1 0
q3 157.2
iqr 157.2
skew -0.1745
kurtosis -0.7312
n_outliers 0
outlier_rate 0
zero_rate 0.1618
Fig 23.
Distribution of reclong. Vertical dash marks the median.
Histogram bins for reclong (median: 35.66667).
bin  count
-165.4 – -152.4  30
-152.4 – -139.4  228
-139.4 – -126.4  6
-126.4 – -113.4  444
-113.4 – -100.4  795
-100.4 – -87.45  462
-87.45 – -74.45  214
-74.45 – -61.45  1386
-61.45 – -48.45  57
-48.45 – -35.46  33
-35.46 – -22.46  2
-22.46 – -9.461  76
-9.461 – 3.536  6696
3.536 – 16.53  2208
16.53 – 29.53  1782
29.53 – 42.53  5243
42.53 – 55.53  1818
55.53 – 68.52  1420
68.52 – 81.52  2616
81.52 – 94.52  78
94.52 – 107.5  45
107.5 – 120.5  131
120.5 – 133.5  483
133.5 – 146.5  178
146.5 – 159.5  4052
159.5 – 172.5  7724
172.5 – 185.5  193
185.5 – 198.5  0
198.5 – 211.5  0
211.5 – 224.5  0
224.5 – 237.5  0
237.5 – 250.5  0
250.5 – 263.5  0
263.5 – 276.5  0
276.5 – 289.5  0
289.5 – 302.5  0
302.5 – 315.5  0
315.5 – 328.5  0
328.5 – 341.5  0
341.5 – 354.5  1

GeoLocation text feature

Serialised Python list literals encoding geolocation tuples of the form [None, lat, lon, None, False], with 45,716 rows, 16% nulls and only 17,100 distinct values. Duplication is severe (duplicate_rate 0.55, 21,301 duplicates), and the top value '[None, 0.0, 0.0, None, False]' appears 6,214 times, suggesting placeholder coordinates. Lengths are tightly bounded (min 33, max 47), consistent with a fixed serialisation rather than free text.

Treatment: Parse the list literal into separate latitude and longitude numeric columns and treat 0.0/0.0 as missing.

anthropic:claude-opus-4-7 · confidence high
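
A sketch of the parsing step, assuming each non-null string is a valid Python list literal with latitude at index 1 and longitude at index 2, as the described form suggests. Whether the numbers are stored quoted or bare is not shown in the report, so the hypothetical `parse_geolocation` helper passes both through float(); the second sample string is made up for illustration.

```python
import ast

def parse_geolocation(text):
    """Parse '[None, lat, lon, None, False]' into (lat, lon), or None if unusable."""
    if text is None:
        return None
    parts = ast.literal_eval(text)   # safe evaluation of literals only
    lat, lon = float(parts[1]), float(parts[2])
    if lat == 0.0 and lon == 0.0:
        return None  # assumed placeholder coordinates
    return lat, lon

print(parse_geolocation("[None, 0.0, 0.0, None, False]"))  # None
print(parse_geolocation("[None, '50.775', '6.08333', None, False]"))  # (50.775, 6.08333)
```

Since reclat and reclong already exist as numeric columns, the parsed values are mainly useful as a cross-check against them.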
Out[63]:

saturn.columns["GeoLocation"].stats

stat value
n 45,716
nulls 7,315 (16.0%)
unique 17,100
len_min 33
len_max 47
len_mean 40.3
len_median 41
len_p95 45
word_mean 5
word_median 5
n_empty 0
n_duplicates 21,301
duplicate_rate 0.5547
vocab_size 15,461
readability_flesch_mean 117.2
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: duplicates 55.5% duplicate strings
Fig 24.
Character-length distribution for GeoLocation.
Character-length distribution for GeoLocation (mean: 40.30).
chars  count
33 – 33  6214
33 – 34  0
34 – 34  27
34 – 34  0
34 – 35  0
35 – 35  139
35 – 35  0
35 – 36  0
36 – 36  1693
36 – 36  0
36 – 37  0
37 – 37  3283
37 – 38  0
38 – 38  0
38 – 38  527
38 – 39  0
39 – 39  0
39 – 39  488
39 – 40  0
40 – 40  0
40 – 40  5488
40 – 41  0
41 – 41  2086
41 – 41  0
41 – 42  0
42 – 42  3512
42 – 42  0
42 – 43  0
43 – 43  3760
43 – 44  0
44 – 44  0
44 – 44  2811
44 – 45  0
45 – 45  0
45 – 45  7678
45 – 46  0
46 – 46  0
46 – 46  672
46 – 47  0
47 – 47  23

States numeric feature

Numeric column 'States' takes 45 distinct integer values between 1 and 51 with a median of 15, strongly suggesting encoded US state identifiers rather than a true quantity. The column is 96.37% null, so it carries information for fewer than 4% of rows, and the right skew (1.11) reflects uneven coverage across the encoded states. Treating the mean of 17.3 as meaningful would be a mistake given the categorical nature.

Treatment: Cast to categorical state codes and impute or flag the 96% missing before modelling.

anthropic:claude-opus-4-7 · confidence high
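
A minimal sketch of the categorical cast with a missingness flag. It assumes the values are integer codes stored as numbers and that nulls arrive as None; what the codes stand for is not confirmed by the report, and `encode_state` is a hypothetical helper.

```python
def encode_state(value):
    """Return (state_label, is_missing), keeping the code as a string label."""
    if value is None:
        return "missing", 1
    return str(int(value)), 0

print(encode_state(15.0))  # ('15', 0)
print(encode_state(None))  # ('missing', 1)
```

Keeping the code as a string stops downstream tools from averaging it, which is the mistake the interpretation above warns about.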
Out[66]:

saturn.columns["States"].stats

stat value
n 45,716
nulls 44,057 (96.4%)
unique 45
min 1
max 51
mean 17.34
median 15
std 10.41
q1 9
q3 23
iqr 14
skew 1.115
kurtosis 0.6891
n_outliers 40
outlier_rate 0.02411
zero_rate 0
alert: null_rate 96.4% null
Fig 25.
Distribution of States. Vertical dash marks the median.
Histogram bins for States (median: 15.0).
bin  count
1 – 2.25  13
2.25 – 3.5  9
3.5 – 4.75  2
4.75 – 6  6
6 – 7.25  125
7.25 – 8.5  224
8.5 – 9.75  87
9.75 – 11  95
11 – 12.25  229
12.25 – 13.5  20
13.5 – 14.75  14
14.75 – 16  15
16 – 17.25  146
17.25 – 18.5  23
18.5 – 19.75  49
19.75 – 21  40
21 – 22.25  19
22.25 – 23.5  297
23.5 – 24.75  4
24.75 – 26  1
26 – 27.25  0
27.25 – 28.5  0
28.5 – 29.75  17
29.75 – 31  5
31 – 32.25  29
32.25 – 33.5  6
33.5 – 34.75  10
34.75 – 36  12
36 – 37.25  55
37.25 – 38.5  12
38.5 – 39.75  23
39.75 – 41  14
41 – 42.25  18
42.25 – 43.5  0
43.5 – 44.75  0
44.75 – 46  3
46 – 47.25  11
47.25 – 48.5  8
48.5 – 49.75  5
49.75 – 51  13

Counties numeric feature

Numeric column 'Counties' is populated for only 3.6% of the 45,716 rows (null_rate 0.9637), with 662 unique values ranging from 5 to 3210 and a roughly symmetric distribution (skew 0.24, kurtosis -1.19, mean 1353 vs median 1195). The values look like county counts or county FIPS-style codes rather than a continuous measurement, and the overwhelming sparsity is the headline issue. No outliers or zeros are flagged.

Treatment: Impute or add a missingness indicator; given 96% nulls, consider dropping unless the populated subset is analytically meaningful.

anthropic:claude-opus-4-7 · confidence medium
Out[69]:

saturn.columns["Counties"].stats

stat value
n 45,716
nulls 44,057 (96.4%)
unique 662
min 5
max 3,210
mean 1353
median 1,195
std 994.1
q1 482
q3 2,113
iqr 1,631
skew 0.2374
kurtosis -1.19
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 96.4% null
Fig 26.
Distribution of Counties. Vertical dash marks the median.
Histogram bins for Counties (median: 1195.0).
bin  count
5 – 85.12  303
85.12 – 165.2  8
165.2 – 245.4  17
245.4 – 325.5  26
325.5 – 405.6  20
405.6 – 485.8  64
485.8 – 565.9  8
565.9 – 646  34
646 – 726.1  20
726.1 – 806.2  113
806.2 – 886.4  59
886.4 – 966.5  46
966.5 – 1047  49
1047 – 1127  25
1127 – 1207  40
1207 – 1287  57
1287 – 1367  28
1367 – 1447  25
1447 – 1527  16
1527 – 1608  10
1608 – 1688  13
1688 – 1768  11
1768 – 1848  8
1848 – 1928  13
1928 – 2008  198
2008 – 2088  25
2088 – 2168  21
2168 – 2248  28
2248 – 2329  28
2329 – 2409  69
2409 – 2489  18
2489 – 2569  34
2569 – 2649  11
2649 – 2729  34
2729 – 2809  19
2809 – 2890  18
2890 – 2970  28
2970 – 3050  19
3050 – 3130  24
3130 – 3210  72

How to cite

BibTeX
@misc{saturn-wild-nasa-meteorites-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: wild nasa meteorites},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/wild-nasa_meteorites}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: wild nasa meteorites. Source: /home/coolhand/html/datavis/data_trove/data/wild/nasa_meteorites.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/wild-nasa_meteorites