saturn

/home/coolhand/html/datavis/data_trove/data/wild/nasa_meteorites.csv 45,716 rows sample n=45,716 seed 42 2026-05-01T23:01:37+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/data/wild/nasa_meteorites.csv
Total rows	45,716
Profiled sample	45,716
Columns	20
Generated	2026-05-01T23:01:37+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This is a NASA meteorites dataset with 45,716 records and 20 columns covering each meteorite's name, classification, mass, fall type, year, and geographic coordinates. The most interesting signals are physical and categorical: mass (g) is extremely skewed (mean ~13,278 g vs median 32.6 g, max 60,000,000 g) with ~15.5% flagged as outliers, and recclass is dominated by ordinary chondrites (L6 at 18.1%, followed by H5, L5, H6, H4). The fall column is heavily imbalanced (97.6% 'Found' vs 2.4% 'Fell'), and year is concentrated in recent decades, with 2003 the most common value (3,323 records). Note that Counties and States are 96.4% null, several columns (position, created_at, updated_at, meta) are constant and can be ignored, and GeoLocation has a 55.5% duplicate rate, driven largely by the placeholder coordinate pair (0.0, 0.0) and a few repeated Antarctic coordinates such as (-71.5, 35.66667).

sid high anthropic:claude-opus-4-7

This is a synthetic row identifier: every one of the 45,716 values is unique, exactly 18 characters long, single-token, and follows a 'row-xxxx?xxxx?xxxx' pattern in which each '?' separator is one of '-', '_', '~', or '.' in the sampled values. There are no nulls, duplicates, or empties, consistent with a primary key rather than a feature.
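
The sample values listed for sid further down use more than one separator character between the four-character groups, so a format check needs a character class rather than a literal hyphen. A minimal sketch of such a check, assuming the ten sampled values are representative (the pattern and the name `SID_PATTERN` are illustrative, not something saturn emits):

```python
import re

# Assumed format: "row-" then three groups of four lowercase
# alphanumerics, separated by one of "-", "_", "~" or ".".
SID_PATTERN = re.compile(r"row-[a-z0-9]{4}[-_~.][a-z0-9]{4}[-_~.][a-z0-9]{4}")

# The ten sample values shown in the sid section of this report.
samples = [
    "row-r47i_enas-8d5d",
    "row-2gxt~mwbj.kygv",
    "row-97xa-5e59-vagr",
    "row-7cp3~x6x4.vm6k",
    "row-3wek_jm8i.pni8",
    "row-y6uh~zk3x.wra8",
    "row-t5mj_vvcr~wn2s",
    "row-ki6c~wwdn-e92u",
    "row-8f4u.pck5.95b7",
    "row-7i8i-ffdi~r4gj",
]

all_match = all(SID_PATTERN.fullmatch(s) is not None for s in samples)
hyphen_only = re.compile(r"row-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}")
n_hyphen_only = sum(1 for s in samples if hyphen_only.fullmatch(s))
print(all_match, n_hyphen_only)
```

Only one of the ten samples uses hyphens throughout, which is why a strict 'row-xxxx-xxxx-xxxx' check would reject most rows.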

id high anthropic:claude-opus-4-7

This column is a row identifier holding 36-character UUID-style strings, all uppercase and one token wide. Every one of the 45,716 values is unique with zero nulls or duplicates, and length is fixed at exactly 36 characters across min, median, and max. The shared `00000000-0000-0000-` prefix on all sampled values is notable — only the latter half of each UUID varies, suggesting a namespaced or truncated-entropy ID scheme rather than fully random v4 UUIDs.

position high anthropic:claude-opus-4-7

The column 'position' is numeric but holds a single value across all 45716 rows: every entry is 0, giving a zero_rate of 1.0 and n_unique of 1. With zero variance (std 0.0, iqr 0.0), it carries no information for any downstream task.

created_at high anthropic:claude-opus-4-7

This column appears to be a Unix epoch creation timestamp stored as a numeric value; read as seconds since 1970-01-01 UTC, 1446143734 is 2015-10-29 18:35:34 UTC. Across all 45,716 rows it holds exactly one value, with std 0.0 and n_unique 1, so it carries no information to differentiate records. The 'constant' alert confirms there is no variation to model or filter on.
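
To see which moment the constant value refers to, the integer can be decoded with the standard library. This sketch assumes the value is seconds since the Unix epoch in UTC, which the report does not state explicitly:

```python
from datetime import datetime, timezone

CREATED_AT = 1446143734  # the single value reported for created_at

# Assumption: the integer counts seconds since 1970-01-01T00:00:00 UTC.
stamp = datetime.fromtimestamp(CREATED_AT, tz=timezone.utc)
iso = stamp.isoformat()
print(iso)
```

Under that assumption the value decodes to 2015-10-29T18:35:34+00:00, and the same holds for updated_at, which carries the identical integer.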

created_meta low anthropic:claude-opus-4-7

The column `created_meta` was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 45716 and a null_rate of 0.0. The name suggests it carries creation-time metadata (e.g., a user id or system tag attached to record creation), but this cannot be confirmed from the evidence. No further signal is present to assess distribution, uniqueness, or drift.

updated_at high anthropic:claude-opus-4-7

This column is almost certainly a Unix epoch timestamp recording a row update time, with the single value 1446143734 (late 2015) repeated across all 45716 rows. With n_unique=1, std=0, and identical min/median/max, it carries no information—every record was stamped at the same instant, suggesting a bulk export or a field that was never actually updated per-row.

updated_meta low anthropic:claude-opus-4-7

The column `updated_meta` was skipped by the profiler, so no type inference, uniqueness count, or value statistics are available. The only confirmed signals are 45716 rows with a null rate of 0.0, but the actual content and structure remain uncharacterised. The name suggests it may hold update-related metadata (e.g., a timestamp, user, or nested struct), yet this is not supported by evidence.

meta high anthropic:claude-opus-4-7

This 'meta' column is a constant placeholder: every one of the 45,716 rows holds the same '{ }' value, giving a cardinality of 1 and entropy of 0. There is no information to extract here, likely a vestigial JSON metadata field that was never populated.
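
Constant columns such as position, created_at, updated_at, and meta can be found mechanically before modelling. A minimal sketch, assuming the file has been read into a list of dictionaries (for example with `csv.DictReader`); the helper name `constant_columns` and the two hand-made rows are hypothetical:

```python
from typing import Iterable, Mapping


def constant_columns(rows: Iterable[Mapping[str, object]]) -> list:
    """Return the names of columns that hold exactly one distinct value."""
    seen = {}
    for row in rows:
        for col, value in row.items():
            seen.setdefault(col, set()).add(value)
    return [col for col, values in seen.items() if len(values) == 1]


# Hand-made rows shaped like the report's columns (not real records).
rows = [
    {"position": 0, "created_at": 1446143734, "meta": "{ }", "fall": "Found"},
    {"position": 0, "created_at": 1446143734, "meta": "{ }", "fall": "Fell"},
]
constant = constant_columns(rows)
print(constant)
```

The helper keeps every distinct value in memory, which is fine for a file of this size but would need a cap for very wide or very high-cardinality data.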

name high anthropic:claude-opus-4-7

This is a short text column of meteorite names: every one of 45,716 rows is unique with zero nulls, averaging 17.8 characters and 2.8 words. Top tokens like 'yamato', 'range', 'northwest', 'hills', 'mountains', 'queen alexandra', and 'grove' are toponyms, and the sample values (for example 'Yamato 790306' and 'Lewis Cliff 86041') pair a find locality with a catalogue number. With n_unique equal to n, it functions as an identifier rather than a categorical feature.

id_1 high anthropic:claude-opus-4-7

id_1 is almost certainly a row identifier: 45716 unique values across 45716 rows, no nulls, ranging from 1 to 57458 with a near-uniform spread (kurtosis -1.16, mild skew 0.27). The fact that the max (57458) exceeds the row count suggests gaps in the sequence, consistent with a primary key carried over from a larger source table.

nametype high anthropic:claude-opus-4-7

This is a binary categorical flag distinguishing meteorite name types, with values 'Valid' and 'Relict'. The distribution is extremely lopsided: 45,641 of 45,716 rows (99.84%) are 'Valid' and only 75 are 'Relict', yielding an entropy ratio of just 0.018. With effectively no variation, this column carries almost no information for modelling.

recclass high anthropic:claude-opus-4-7

This column holds meteorite classification codes (recclass), with 466 distinct classes across 45,716 records and no nulls. The distribution is dominated by ordinary chondrites: L6 (18.1%), H5, L5, H6, and H4 together account for about 63% of records, and counts fall off quickly after that (entropy ratio 0.51): CM2, the tenth most common class, has only 416 entries. High cardinality combined with concentrated top categories suggests a classic taxonomic hierarchy (H/L/LL groups with petrologic types).
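
The concentration in the top classes can be checked directly from the counts in the recclass top-values list. A small sketch of the cumulative share (variable names are illustrative):

```python
# Top-five class counts as listed in the recclass "Top values" section.
top_counts = {"L6": 8285, "H5": 7142, "L5": 4796, "H6": 4528, "H4": 4211}
total_rows = 45716

running = 0
cumulative_share = {}
for name, count in top_counts.items():
    running += count
    cumulative_share[name] = running / total_rows

top5_share = cumulative_share["H4"]
print(f"top five classes cover {top5_share:.1%} of rows")
```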

mass (g) high anthropic:claude-opus-4-7

Numeric mass measurements in grams across 45,716 rows, with a median of just 32.6 g but a maximum of 60,000,000 g, roughly six orders of magnitude above the median. The distribution is extremely heavy-tailed (skew 76.9, kurtosis ~6,796) and 15.5% of values flag as outliers, while the std (574,989) dwarfs the IQR (195.4). Nulls (0.29%) and zeros (0.04%) are negligible.
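
The '1.5 IQR' outlier flag can be reproduced from the reported quartiles. The sketch below assumes the conventional Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR), which the report does not spell out, and adds a log10(1 + x) transform as one common way to tame a tail like this; both helper names are hypothetical:

```python
import math

q1, q3 = 7.2, 202.6  # quartiles reported for mass (g)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(mass_g: float) -> bool:
    """True when a mass lies outside the assumed Tukey fences."""
    return mass_g < lower_fence or mass_g > upper_fence


def log_mass(mass_g: float) -> float:
    """log10(1 + mass), so that zero-mass rows stay defined."""
    return math.log10(1.0 + mass_g)


print(round(upper_fence, 1), is_outlier(60_000_000), round(log_mass(60_000_000), 2))
```

Because the lower fence is negative and masses cannot be, every flagged row under this rule sits in the upper tail, above roughly 495.7 g.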

fall high anthropic:claude-opus-4-7

Binary categorical flag distinguishing meteorites that were observed falling versus those found later, with only two values: "Found" and "Fell". The split is severely imbalanced — "Found" accounts for 44609 of 45716 rows (top_rate 0.9758) while "Fell" has just 1107, yielding an entropy_ratio of 0.164. No nulls are present.
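
The entropy figures can be reproduced from the two counts. This sketch assumes Shannon entropy in bits and an entropy ratio defined as entropy divided by log2(cardinality); that definition is inferred from the reported numbers rather than documented by saturn:

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of categories)."""
    k = len(counts)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


fall_ratio = entropy_ratio([44609, 1107])      # Found, Fell
nametype_ratio = entropy_ratio([45641, 75])    # Valid, Relict
print(round(fall_ratio, 3), round(nametype_ratio, 3))
```

For a two-value column the ratio equals the entropy itself, since log2(2) is 1, which matches the identical entropy and entropy_ratio values reported for fall and nametype.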

year high anthropic:claude-opus-4-7

Stored as January-1 timestamps, this column encodes a year-of-record across 45,716 rows with 266 distinct values and a 0.64% null rate. Despite being labeled 'year', the values are full datetimes pinned to YYYY-01-01, which will surprise anyone expecting integer years. The distribution is moderately spread (entropy ratio 0.66) with 2003 the modal year at 7.3% of rows, followed by 1979 and 1998.
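
If integer years are wanted, the January-1 timestamps can be reduced with the standard library. A minimal sketch, assuming every non-null value follows the `YYYY-01-01T00:00:00` form seen in the top-values list (the helper name is hypothetical):

```python
from datetime import datetime


def year_from_timestamp(value):
    """Return the integer year, or None for a missing value."""
    if value is None or value == "":
        return None
    return datetime.fromisoformat(value).year


raw = ["2003-01-01T00:00:00", "1979-01-01T00:00:00", None]
years = [year_from_timestamp(v) for v in raw]
print(years)
```

The standard-library `datetime` is used here because it accepts any year from 1 to 9999, so old records would not be a problem if the column contains them.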

reclat high anthropic:claude-opus-4-7

This is the meteorite's latitude in decimal degrees, ranging from -87.37 to 81.17. The distribution leans heavily toward the southern hemisphere with a median of -71.5 and a Q3 of exactly 0.0, and 16.8% of non-null values are exactly zero, which more likely marks placeholder or unknown coordinates than finds on the equator. About 16% of rows are null, and the flat, two-cluster shape (kurtosis -1.48) suggests one cluster in Antarctica and another elsewhere.

reclong high anthropic:claude-opus-4-7

Longitude coordinate for meteorite recovery sites, ranging from -165.43 to 354.47 with median 35.67. The maximum exceeding 180 is anomalous for standard longitude and suggests un-normalized or erroneous values. Separately, 16.2% of the non-null values are exactly zero, about the same share as in reclat, which hints that unknown coordinates were often coded as 0 rather than left null; the 16% null rate counts different rows and should not be read as the same group.
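
If the values above 180 turn out to come from a 0 to 360 degree convention (an assumption; they could equally be entry errors), they can be wrapped into the usual range with modular arithmetic. A sketch with a hypothetical helper:

```python
def wrap_longitude(lon_deg: float) -> float:
    """Map a longitude in degrees onto the half-open range [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0


wrapped_max = wrap_longitude(354.473)    # reported maximum of reclong
wrapped_min = wrap_longitude(-165.433)   # reported minimum, already in range
print(round(wrapped_max, 3), round(wrapped_min, 3))
```

Values already inside the range pass through unchanged, so the function can be applied to the whole column without first filtering.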

GeoLocation high anthropic:claude-opus-4-7

Serialised Python list literals encoding geolocation records of the form [None, 'lat', 'lon', None, False], with latitude and longitude stored as quoted strings; 45,716 rows, 16% nulls and only 17,100 distinct values. Duplication is severe (duplicate_rate 0.555, 21,301 duplicates), and the top value "[None, '0.0', '0.0', None, False]" appears 6,214 times, suggesting placeholder coordinates. Lengths are tightly bounded (min 33, max 47), consistent with a fixed serialisation rather than free text.
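
Because the strings are Python list literals, `ast.literal_eval` can parse them without executing arbitrary code. The sketch below assumes the five-element layout seen in the sample values and treats the (0.0, 0.0) pair as missing, which is a judgment call rather than something the data confirms; the helper name is hypothetical:

```python
import ast


def parse_geolocation(raw):
    """Return (lat, lon) as floats, or None when missing or placeholder."""
    if raw is None:
        return None
    fields = ast.literal_eval(raw)  # parses literals only, never runs code
    lat, lon = float(fields[1]), float(fields[2])
    if lat == 0.0 and lon == 0.0:   # assumption: (0, 0) marks "unknown"
        return None
    return lat, lon


first = parse_geolocation("[None, '38.5', '-94.3', None, False]")
placeholder = parse_geolocation("[None, '0.0', '0.0', None, False]")
print(first, placeholder)
```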

States high anthropic:claude-opus-4-7

Numeric column 'States' takes 45 distinct integer values between 1 and 51 with a median of 15, strongly suggesting encoded US state identifiers rather than a true quantity. The column is 96.37% null, so it carries information for fewer than 4% of rows, and the right skew (1.11) reflects uneven coverage across the encoded states. Treating the mean of 17.3 as meaningful would be a mistake given the categorical nature.

Counties medium anthropic:claude-opus-4-7

Numeric column 'Counties' is populated for only 3.6% of the 45,716 rows (null_rate 0.9637), with 662 unique values ranging from 5 to 3210 and a roughly symmetric distribution (skew 0.24, kurtosis -1.19, mean 1353 vs median 1195). The values look like county counts or county FIPS-style codes rather than a continuous measurement, and the overwhelming sparsity is the headline issue. No outliers or zeros are flagged.

Numeric correlation

Languages detected

Per-string language detection across text columns (sampled).

sid text

100.0% of rows are unique strings; 100.0% rows are a single word; 95th-percentile length under 20 chars

rows	45,716
null	0 (0.0%)
unique	45,716
len_min	18
len_max	18
len_mean	18.000
len_median	18.000
len_p95	18.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	20,000
readability_flesch_mean	-5.680
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

row-r47i_enas-8d5d
row-2gxt~mwbj.kygv
row-97xa-5e59-vagr
row-7cp3~x6x4.vm6k
row-3wek_jm8i.pni8
row-y6uh~zk3x.wra8
row-t5mj_vvcr~wn2s
row-ki6c~wwdn-e92u
row-8f4u.pck5.95b7
row-7i8i-ffdi~r4gj

id text

100.0% of rows are unique strings; 100.0% rows are a single word; 100.0% rows are all-caps

rows	45,716
null	0 (0.0%)
unique	45,716
len_min	36
len_max	36
len_mean	36.000
len_median	36.000
len_p95	36.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	20,000
readability_flesch_mean	65.384
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	1.000
boilerplate_rate	0.000

Sample values (first 10)

00000000-0000-0000-160C-D58AE0A2ECD9
00000000-0000-0000-D601-94E78E43026D
00000000-0000-0000-5666-66D2DD76BE81
00000000-0000-0000-FAAF-5F9E2B6945C2
00000000-0000-0000-7DCE-D65B4702C6F1
00000000-0000-0000-A883-F7AB796E948C
00000000-0000-0000-8F12-152021CC7C30
00000000-0000-0000-B720-5ED557EDEF92
00000000-0000-0000-57BA-C8C8DC37D4C8
00000000-0000-0000-7DF4-0F5269CA84E7

position numeric

only one distinct value

rows	45,716
null	0 (0.0%)
unique	1
min	0.000
max	0.000
mean	0.000
median	0.000
std	0.000
q1	0.000
q3	0.000
iqr	0.000
skew	0.000
kurtosis	0.000
n_outliers	0
outlier_rate	0.000
zero_rate	1.000

created_at numeric

only one distinct value

rows	45,716
null	0 (0.0%)
unique	1
min	1,446,143,734
max	1,446,143,734
mean	1,446,143,734
median	1,446,143,734
std	0.000
q1	1,446,143,734
q3	1,446,143,734
iqr	0.000
skew	0.000
kurtosis	0.000
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

created_meta unknown

no profiler for kind=unknown

rows	45,716
null	0 (0.0%)

updated_at numeric

only one distinct value

rows	45,716
null	0 (0.0%)
unique	1
min	1,446,143,734
max	1,446,143,734
mean	1,446,143,734
median	1,446,143,734
std	0.000
q1	1,446,143,734
q3	1,446,143,734
iqr	0.000
skew	0.000
kurtosis	0.000
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

updated_meta unknown

no profiler for kind=unknown

rows	45,716
null	0 (0.0%)

meta categorical

top value is 100.0% of rows

rows	45,716
null	0 (0.0%)
unique	1
top_value	{ }
top_rate	1.000
cardinality	1
entropy	-0.000
entropy_ratio	0.000

Top values (rank 1–20)

{ } — 45,716

name text

100.0% of rows are unique strings

rows	45,716
null	0 (0.0%)
unique	45,716
len_min	2
len_max	28
len_mean	17.785
len_median	19.000
len_p95	27.000
word_mean	2.772
word_median	3.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	17,917
readability_flesch_mean	63.744
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.047
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

Atoka
Queen Alexandra Range 97947
Yamato 790306
Superior Valley 005
Larkman Nunatak 06417
Yamato 982185
Lewis Cliff 86041
Yamato 791912
Tanezrouft 033
Al Huwaysah 005

id_1 numeric

rows	45,716
null	0 (0.0%)
unique	45,716
min	1.000
max	57,458
mean	26,890
median	24,262
std	16,861
q1	12,689
q3	40,657
iqr	27,968
skew	0.267
kurtosis	-1.160
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

nametype categorical

top value is 99.8% of rows

rows	45,716
null	0 (0.0%)
unique	2
top_value	Valid
top_rate	0.998
cardinality	2
entropy	0.018
entropy_ratio	0.018

Top values (rank 1–20)

Valid — 45,641
Relict — 75

recclass categorical

rows	45,716
null	0 (0.0%)
unique	466
top_value	L6
top_rate	0.181
cardinality	466
entropy	4.548
entropy_ratio	0.513

Top values (rank 1–20)

L6 — 8,285
H5 — 7,142
L5 — 4,796
H6 — 4,528
H4 — 4,211
LL5 — 2,766
LL6 — 2,043
L4 — 1,253
H4/5 — 428
CM2 — 416
H3 — 386
L3 — 365
CO3 — 335
Ureilite — 300
Iron, IIIAB — 285
LL4 — 268
CV3 — 256
Diogenite — 241
Howardite — 240
LL — 225

mass (g) numeric

skew=+76.91; 15.5% rows beyond 1.5 IQR

rows	45,716
null	131 (0.3%)
unique	12,576
min	0.000
max	60,000,000
mean	13,278
median	32.600
std	574,989
q1	7.200
q3	202.600
iqr	195.400
skew	76.908
kurtosis	6,796
n_outliers	7,086
outlier_rate	0.155
zero_rate	4.17e-04

fall categorical

top value is 97.6% of rows

rows	45,716
null	0 (0.0%)
unique	2
top_value	Found
top_rate	0.976
cardinality	2
entropy	0.164
entropy_ratio	0.164

Top values (rank 1–20)

Found — 44,609
Fell — 1,107

year categorical

rows	45,716
null	291 (0.6%)
unique	266
top_value	2003-01-01T00:00:00
top_rate	0.073
cardinality	266
entropy	5.299
entropy_ratio	0.658

Top values (rank 1–20)

2003-01-01T00:00:00 — 3,323
1979-01-01T00:00:00 — 3,046
1998-01-01T00:00:00 — 2,697
2006-01-01T00:00:00 — 2,456
1988-01-01T00:00:00 — 2,296
2002-01-01T00:00:00 — 2,078
2004-01-01T00:00:00 — 1,940
2000-01-01T00:00:00 — 1,792
1997-01-01T00:00:00 — 1,696
1999-01-01T00:00:00 — 1,691
2001-01-01T00:00:00 — 1,650
1990-01-01T00:00:00 — 1,518
2009-01-01T00:00:00 — 1,497
1986-01-01T00:00:00 — 1,375
2007-01-01T00:00:00 — 1,189
2010-01-01T00:00:00 — 1,005
1993-01-01T00:00:00 — 979
2008-01-01T00:00:00 — 957
1987-01-01T00:00:00 — 916
1991-01-01T00:00:00 — 877

reclat numeric

rows	45,716
null	7,315 (16.0%)
unique	12,738
min	-87.367
max	81.167
mean	-39.123
median	-71.500
std	46.379
q1	-76.714
q3	0.000
iqr	76.714
skew	0.492
kurtosis	-1.477
n_outliers	0
outlier_rate	0.000
zero_rate	0.168

reclong numeric

rows	45,716
null	7,315 (16.0%)
unique	14,640
min	-165.433
max	354.473
mean	61.074
median	35.667
std	80.647
q1	0.000
q3	157.167
iqr	157.167
skew	-0.174
kurtosis	-0.731
n_outliers	0
outlier_rate	0.000
zero_rate	0.162

GeoLocation text

55.5% duplicate strings

rows	45,716
null	7,315 (16.0%)
unique	17,100
len_min	33
len_max	47
len_mean	40.305
len_median	41.000
len_p95	45.000
word_mean	5.000
word_median	5.000
n_empty	0
n_duplicates	21,301
duplicate_rate	0.555
vocab_size	15,461
readability_flesch_mean	117.160
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.000
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

[None, '38.5', '-94.3', None, False]
[None, '-84.0', '168.0', None, False]
[None, '-71.5', '35.66667', None, False]
[None, '-71.5', '35.66667', None, False]
[None, '19.72767', '55.72677', None, False]
[None, '0.0', '0.0', None, False]
[None, '50.375', '21.73333', None, False]
[None, '-71.5', '35.66667', None, False]
[None, '-71.5', '35.66667', None, False]
[None, '27.63556', '4.02528', None, False]

States numeric

96.4% null

rows	45,716
null	44,057 (96.4%)
unique	45
min	1.000
max	51.000
mean	17.338
median	15.000
std	10.411
q1	9.000
q3	23.000
iqr	14.000
skew	1.115
kurtosis	0.689
n_outliers	40
outlier_rate	0.024
zero_rate	0.000

Counties numeric

96.4% null

rows	45,716
null	44,057 (96.4%)
unique	662
min	5.000
max	3,210
mean	1,353
median	1,195
std	994.089
q1	482.000
q3	2,113
iqr	1,631
skew	0.237
kurtosis	-1.190
n_outliers	0
outlier_rate	0.000
zero_rate	0.000