saturn·

natural hazards meteorites

source /home/coolhand/html/datavis/data_trove/data/natural_hazards/meteorites.json · 1,097 rows · 10 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This is a 1,097-row catalogue of witnessed meteorite falls. Each record carries a name, description, date, latitude/longitude coordinates and a meteorite class. Two columns (category and fall_type) are constant: every record is a 'witnessed_meteorite_falls' event with fall_type 'Fell', so the analytic interest sits elsewhere. Two further columns, country and mass_g, were skipped by the profiler and are not characterized here. Meteorite class is the most informative categorical, with 125 distinct classes but a heavily concentrated distribution: L6 alone accounts for ~24% of falls and H5 is the next largest at ~15%. Latitude is skewed toward the northern hemisphere (median 36.1, mean 30.0) with ~8% of values flagged as outliers, while longitude spreads broadly across the globe (-157.9 to 174.4). Start with meteorite_class to understand the dominant compositions, then look at the lat/long pair to see geographic coverage.

citing: row_count · column_count · category.top_value · fall_type.top_value · meteorite_class.n_unique · meteorite_class.top_values · meteorite_class.top_rate · latitude.mean · latitude.median · latitude.outlier_rate · longitude.min · longitude.max · date.n_unique · date.top_values

Schema

10 columns
Per-column summary.
latitude · numeric · nulls 0.0% · unique 958 · alerts: outliers
longitude · numeric · nulls 0.0% · unique 1,030
name · text · nulls 0.0% · unique 1,097 · alerts: near_unique, one_word, short_text
description · text · nulls 0.0% · unique 1,097 · alerts: near_unique
category · categorical · nulls 0.0% · unique 1 · alerts: imbalance
date · categorical · nulls 1.7% · unique 231
country · unknown · nulls 0.0% · alerts: skipped
mass_g · unknown · nulls 0.0% · alerts: skipped
meteorite_class · categorical · nulls 0.0% · unique 125
fall_type · categorical · nulls 0.0% · unique 1 · alerts: imbalance

latitude

numeric feature outliers
Geographic latitude in degrees, spanning -44.12 to 66.35: from the southern mid-latitudes to just short of the Arctic Circle, which covers most inhabited latitudes. The distribution is left-skewed (skew -1.28), with the median of 36.1 sitting well above the mean of 30.04. This indicates a northern-hemisphere bias with a tail of southern-hemisphere observations; 90 values (8.2%) are flagged as outliers. There are 958 distinct values across 1097 rows (about 87%), so most rows have a distinct latitude but some values repeat. Treatment: Pair with longitude for spatial features; the southern-hemisphere outliers are likely legitimate, not errors. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 958
min: -44.12
max: 66.35
mean: 30.04
median: 36.1
std: 23.13
q1: 21.87
q3: 46.07
iqr: 24.2
skew: -1.276
kurtosis: 1.01
n_outliers: 90
outlier_rate: 0.08204
zero_rate: 0.001823
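
The report does not say which rule saturn uses to flag outliers. As a rough check, and assuming the common Tukey 1.5 x IQR rule (an assumption, not something the report confirms), the fences implied by the latitude quartiles can be computed as in this sketch. `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the Tukey k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, min and max copied from the latitude stats in this report.
LAT_Q1, LAT_Q3 = 21.87, 46.07
LAT_MIN, LAT_MAX = -44.12, 66.35

# ASSUMPTION: k = 1.5 is the conventional Tukey multiplier, not a documented saturn setting.
lower, upper = tukey_fences(LAT_Q1, LAT_Q3)

# If the observed maximum sits inside the upper fence and the minimum sits
# below the lower fence, every flagged value must be on the low (southern) side.
outliers_only_on_low_side = LAT_MAX <= upper and LAT_MIN < lower

print(f"lower fence: {lower:.2f}, upper fence: {upper:.2f}")
print(f"outliers only on the southern side: {outliers_only_on_low_side}")
```

Under that assumption the fences land at about -14.43 and 82.37, so only latitudes south of roughly 14.4°S could be flagged. That is consistent with the profile's reading that the outliers are southern-hemisphere falls rather than data errors.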

longitude

numeric feature
Geographic longitude in decimal degrees, with values spanning -157.87 to 174.4, which is most of the -180/180 range. The distribution is roughly symmetric (skew -0.23, kurtosis -0.62) with a median of 18.72 and a mean of 20.13, suggesting a slight Eurasian/African concentration but broad global coverage. With 1030 unique values across 1097 rows and no nulls, points are nearly all distinct. Treatment: Pair with latitude as a geospatial coordinate; avoid scaling it as a plain numeric feature, because longitude wraps around at ±180 degrees. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1,030
min: -157.9
max: 174.4
mean: 20.13
median: 18.72
std: 68.87
q1: -4.233
q3: 76.27
iqr: 80.5
skew: -0.2257
kurtosis: -0.6185
n_outliers: 3
outlier_rate: 0.002735
zero_rate: 0.0009116
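
One common way to avoid treating longitude as a plain number is to map each latitude/longitude pair to a point on the unit sphere, so that longitudes near -180 and 180 end up close together. The sketch below shows that encoding. It is one option among several rather than something the report prescribes, and `latlon_to_unit_vector` and `chord_distance` are hypothetical helper names.

```python
import math


def latlon_to_unit_vector(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Map latitude/longitude in degrees to (x, y, z) on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z


def chord_distance(a, b) -> float:
    """Straight-line (chord) distance between two points on the unit sphere."""
    return math.dist(a, b)


# Two illustrative points on the equator, either side of the antimeridian:
# 358 degrees apart as raw longitudes, but neighbours on the sphere.
west = latlon_to_unit_vector(0.0, -179.0)
east = latlon_to_unit_vector(0.0, 179.0)
print(chord_distance(west, east))
```

The three resulting components can be used as numeric features directly, since each is bounded in [-1, 1] and has no wrap-around discontinuity.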

name

text identifier near_unique one_word short_text
This is a `name` column with 1097 fully unique short strings (n_unique == n, duplicate_rate 0.0), averaging 8.56 characters and 1.21 words; 82.95% are a single word. Top words like `st.`, `county`, `san`, `santa`, `creek`, `la`, `el`, `de` strongly suggest these are place or geographic entity names rather than person names, with a Spanish/English mix. There are no nulls, URLs, or emoji, so the column is clean but effectively an identifier-grade label. Treatment: Treat as a unique label/key; drop from modelling features or use only via geographic enrichment lookup. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1,097
len_min: 2
len_max: 28
len_mean: 8.557
len_median: 8
len_p95: 15
word_mean: 1.209
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,238
readability_flesch_mean: 40.67
emoji_rate: 0
url_rate: 0
one_word_rate: 0.8295
allcaps_rate: 0
boilerplate_rate: 0

description

text metadata near_unique
This appears to be a templated, machine-generated description string for meteorite records: each of the tokens 'meteorite', 'mass:', 'found:', and 'fell.' occurs exactly 1097 times, that is, once per row. Every value is unique (n_unique=1097, duplicate_rate=0.0) yet length is tightly bounded (46-72 chars, median 53), which is consistent with a fixed template in which only embedded fields such as the classification (l6. appears 260 times, h5. 163) and numeric values vary. The 'unknown.' token appears 1099 times across 1097 rows, about once per row on average, which signals that missing sub-fields are frequently packed into the template. Treatment: Parse the template to extract structured fields (class, mass, found/fell) rather than embedding the raw string. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1,097
len_min: 46
len_max: 72
len_mean: 54.31
len_median: 53
len_p95: 64
word_mean: 8.254
word_median: 8
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,372
readability_flesch_mean: 52.62
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0

category

categorical metadata imbalance
This column is a constant categorical tag, holding the single value "witnessed_meteorite_falls" across all 1097 rows. With cardinality of 1 and entropy of 0, it carries no information and likely identifies the dataset's provenance or scope rather than describing individual records. Treatment: Drop before modelling; retain only as a dataset-level label. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1
top_value: witnessed_meteorite_falls
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
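
For reference, the `entropy` and `entropy_ratio` fields are consistent with Shannon entropy in bits and its ratio to the maximum possible entropy for the observed cardinality, log2(cardinality). This reading is an assumption inferred from the reported numbers (for example, 4.639 / log2(125) ≈ 0.666 for meteorite_class), not a documented saturn definition, and the function names below are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def entropy_ratio(values) -> float:
    """Entropy divided by log2(cardinality); defined here as 0.0 for a constant column."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # log2(1) == 0, so the ratio would otherwise be 0/0
    return shannon_entropy_bits(values) / math.log2(k)


# A constant column like `category`: one value repeated 1,097 times.
constant = ["witnessed_meteorite_falls"] * 1097
print(shannon_entropy_bits(constant), entropy_ratio(constant))
```

A ratio near 1 means the values are spread almost evenly over the distinct categories, while a ratio of 0 means a single value dominates completely, as it does here.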

date

categorical timestamp
Date values are stored as ISO strings, all snapped to January 1st of the year, so this is effectively a year-granularity field masquerading as a full date. Across 1097 rows there are 231 distinct years with a very high entropy ratio (0.967), and the most common value (1933-01-01) accounts for only 1.6% of rows. 19 rows (1.73%) are null. Treatment: Parse to datetime and extract the year as an integer feature; the month/day component carries no signal. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 19 (1.7%)
unique: 231
top_value: 1933-01-01
top_rate: 0.01577
cardinality: 231
entropy: 7.593
entropy_ratio: 0.967
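
A minimal sketch of the year extraction, using only the Python standard library. It assumes the values are `YYYY-MM-DD` strings or null, as the profile describes, and `year_from_iso` is a hypothetical helper. The standard library's `date` type covers years 1 through 9999, which matters if the catalogue contains falls from before 1677, since pandas' default nanosecond timestamps cannot represent dates that early.

```python
from datetime import date
from typing import Optional


def year_from_iso(value: Optional[str]) -> Optional[int]:
    """Return the year of a 'YYYY-MM-DD' string, or None for null, blank, or invalid input."""
    if value is None or not value.strip():
        return None
    try:
        return date.fromisoformat(value.strip()).year
    except ValueError:
        return None


# Illustrative inputs only; 1933-01-01 is the top value reported above,
# the other entries are made up to exercise the null and error paths.
samples = ["1933-01-01", None, "1492-01-01", "not a date"]
print([year_from_iso(s) for s in samples])
```

Returning `None` for the 19 null rows keeps them distinguishable from real years, so the choice of imputation or filtering can be made later.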

country

unknown feature skipped
This column is labeled 'country' and likely holds country names or codes, but saturn skipped detailed profiling so kind, uniqueness, and value distribution are unknown. The only confirmed signals are 1097 rows with a 0.0 null rate. No further evidence is available to characterize cardinality, format, or dominant values. Treatment: Re-profile to determine cardinality and format, then encode as a categorical. low · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: not reported (profiling skipped)

mass_g

unknown other skipped
The column is named mass_g, suggesting a mass measurement in grams across 1,097 rows with no nulls. However, saturn skipped profiling this column and reported no kind, uniqueness, or distribution stats, so its actual content and shape cannot be confirmed from the evidence. Treatment: Re-profile or manually inspect this column before use; current evidence is insufficient to decide handling. low · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: not reported (profiling skipped)

meteorite_class

categorical label
This column records meteorite classification codes (L6, H5, H6, etc.), the standard taxonomy for ordinary chondrites and other meteorite types. Cardinality is high at 125 distinct classes across 1,097 rows with no nulls, but the distribution is concentrated: L6 alone covers 23.7% and the top four classes (L6, H5, H6, L5) account for over half the data. Entropy ratio of 0.67 confirms a long tail of rare classes that will be sparsely represented. Treatment: Group rare classes into an 'other' bucket or use target encoding before modelling. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 125
top_value: L6
top_rate: 0.237
cardinality: 125
entropy: 4.639
entropy_ratio: 0.666
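
The 'other' bucket can be sketched as below. The share threshold and the helper name `bucket_rare` are illustrative assumptions, since the report does not recommend a specific cutoff, and the sample list is made up for the demonstration rather than drawn from the dataset.

```python
from collections import Counter


def bucket_rare(labels, min_share: float = 0.01, other: str = "other"):
    """Replace labels whose share of rows is below `min_share` with `other`."""
    counts = Counter(labels)
    n = len(labels)
    keep = {label for label, c in counts.items() if c / n >= min_share}
    return [label if label in keep else other for label in labels]


# Toy data (not from the dataset): two frequent classes and two singletons.
toy = ["L6"] * 60 + ["H5"] * 38 + ["Eucrite-mmict", "Iron, IIAB"]
bucketed = bucket_rare(toy, min_share=0.05)
print(Counter(bucketed))
```

With 125 classes over 1,097 rows, the threshold is worth tuning against the downstream task: a higher cutoff gives a more compact encoding but folds more of the long tail into a single bucket.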

fall_type

categorical metadata imbalance
This column records a fall classification but contains the single value "Fell" across all 1097 rows, with no nulls. Entropy is 0 and cardinality is 1, so it carries no information for any downstream model or segmentation. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1
top_value: Fell
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0