saturn·

witnessed meteorite falls

source /home/coolhand/datasets/witnessed-meteorite-falls/witnessed_meteorite_falls.json · 1,097 rows · 10 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset catalogs 1,097 witnessed meteorite falls. Each row is identified by a unique name and described by date, geographic coordinates, meteorite class, and a short description. Two columns (category and fall_type) are constants ('witnessed_meteorite_falls' and 'Fell') and offer no analytical value, and two more (country and mass_g) were skipped by the profiler. The most informative dimensions are meteorite_class, which is heavily dominated by L6 (260 falls, ~24%) followed by H5 (163) and H6 (91), and the latitude/longitude pair: latitude skews north (median 36.1) with about 8% outliers, and longitude spans nearly the full globe. The date column covers 231 distinct years, with 1933 the most frequent (17 falls), which leaves room for a time-trend exploration.

citing: row_count · column_count · columns.category.stats.top_value · columns.fall_type.stats.top_value · columns.meteorite_class.top_values · columns.meteorite_class.stats.cardinality · columns.latitude.stats · columns.longitude.stats · columns.date.top_values · columns.date.stats.cardinality

Schema

10 columns
Per-column summary.

column           kind         nulls  unique  alerts
latitude         numeric      0.0%   958     outliers
longitude        numeric      0.0%   1,030
name             text         0.0%   1,097   near_unique, one_word, short_text
description      text         0.0%   1,097   near_unique
category         categorical  0.0%   1       imbalance
date             categorical  1.7%   231
country          unknown      0.0%   -       skipped
mass_g           unknown      0.0%   -       skipped
meteorite_class  categorical  0.0%   125
fall_type        categorical  0.0%   1       imbalance

latitude

numeric feature outliers
Geographic latitude coordinates spanning -44.12 to 66.35 degrees, covering most of the inhabited globe. The distribution is left-skewed (skew -1.28): the median of 36.1° sits above the mean of 30.04°, which reflects a Northern Hemisphere concentration with a tail toward southern latitudes. Roughly 8.2% of values (90 rows) are flagged as outliers. These are likely far-southern points, since the minimum lies far below Q1 (21.87°) while the maximum is much closer to Q3 (46.07°). Treatment: Pair with longitude for geospatial features; keep the outliers as legitimate Southern Hemisphere observations rather than trimming them. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 958
min: -44.12
max: 66.35
mean: 30.04
median: 36.1
std: 23.13
q1: 21.87
q3: 46.07
iqr: 24.2
skew: -1.276
kurtosis: 1.01
n_outliers: 90
outlier_rate: 0.08204
zero_rate: 0.001823
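
The direction of the flagged latitude outliers can be checked from the reported quartiles alone. The following is a minimal sketch, assuming the profiler uses the conventional 1.5 × IQR (Tukey) rule, which the report does not state; `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes as reported in the latitude stats.
LAT_Q1, LAT_Q3 = 21.87, 46.07
LAT_MIN, LAT_MAX = -44.12, 66.35

lower, upper = tukey_fences(LAT_Q1, LAT_Q3)

# Under the assumed rule, a value is an outlier only if it falls
# outside [lower, upper]; compare the reported extremes to the fences.
has_high_outliers = LAT_MAX > upper
has_low_outliers = LAT_MIN < lower
print(f"fences: [{lower:.2f}, {upper:.2f}]")
print(f"high outliers possible: {has_high_outliers}, low: {has_low_outliers}")
```

Under that assumption the fences come out at about -14.43° and 82.37°, so the maximum of 66.35° is inside the upper fence and any flagged row would lie south of roughly 14.4° S.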

longitude

numeric feature
Geographic longitude in decimal degrees, with values spanning -157.87 to 174.4, close to the full -180/180 range. The distribution is broad (std 68.87, IQR 80.5), only mildly left-skewed (-0.23), and light-tailed (kurtosis -0.62), indicating worldwide coverage rather than a single region. The 1,030 unique values across 1,097 rows suggest distinct point locations with little repetition; there are no nulls and only 3 outliers. Treatment: Pair with latitude as a geospatial coordinate; avoid treating as a standalone scalar feature. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1,030
min: -157.9
max: 174.4
mean: 20.13
median: 18.72
std: 68.87
q1: -4.233
q3: 76.27
iqr: 80.5
skew: -0.2257
kurtosis: -0.6185
n_outliers: 3
outlier_rate: 0.002735
zero_rate: 0.0009116
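
One reason not to use longitude as a standalone scalar is the wraparound at ±180°: values that are numerically far apart can be geographically close. A minimal sketch of one common workaround, mapping each latitude/longitude pair to a point on the unit sphere; `to_unit_vector` is a hypothetical helper, and the latitude of 0° is an arbitrary illustrative choice.

```python
import math


def to_unit_vector(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Map a latitude/longitude pair (degrees) to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


# The extreme longitudes reported for this column, placed on the equator.
west = to_unit_vector(0.0, -157.87)
east = to_unit_vector(0.0, 174.4)

# As raw scalars the two longitudes differ by 332.27 degrees; the angle
# between the corresponding unit vectors measures their true separation.
dot = sum(a * b for a, b in zip(west, east))
angle = math.degrees(math.acos(dot))
print(round(angle, 2))
```

The three vector components can be used as model features in place of the raw pair, at the cost of one extra column.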

name

text identifier near_unique one_word short_text
The `name` column holds 1,097 fully unique short strings (n_unique equals n, duplicate_rate 0.0), averaging 8.56 characters and 1.21 words; 82.95% are single-word entries. Top tokens like `st.`, `county`, `san`, `santa`, `creek`, plus Spanish articles `de`, `la`, `el`, strongly suggest place names (likely US/Latin-influenced toponyms) rather than person names. Every row is distinct, so the column functions as an identifier-like label rather than a learnable feature. Treatment: Treat as a unique label/key; drop from modelling features or use only for joins and display. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1,097
len_min: 2
len_max: 28
len_mean: 8.557
len_median: 8
len_p95: 15
word_mean: 1.209
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,238
readability_flesch_mean: 40.67
emoji_rate: 0
url_rate: 0
one_word_rate: 0.8295
allcaps_rate: 0
boilerplate_rate: 0
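
Before relying on `name` as a join key, it is worth re-checking uniqueness on the data actually loaded. A minimal sketch with hypothetical helper names and made-up sample values; how the profiler itself defines `one_word_rate` is an assumption here (whitespace-separated tokens).

```python
from collections import Counter


def duplicate_keys(values: list[str]) -> list[str]:
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, count in counts.items() if count > 1]


def one_word_rate(values: list[str]) -> float:
    """Fraction of values made of exactly one whitespace-separated token."""
    if not values:
        return 0.0
    return sum(len(value.split()) == 1 for value in values) / len(values)


# Hypothetical sample names, not rows from the dataset.
names = ["Alpha", "Beta Creek", "Gamma", "St. Delta", "Epsilon"]

# An empty result means the column is safe to use as a join key.
duplicates = duplicate_keys(names)
rate = one_word_rate(names)
print(duplicates, rate)
```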

description

text free_text near_unique
Short, templated descriptions of meteorite records: every one of the 1,097 rows contains the tokens 'meteorite', 'mass:', 'found:', and 'fell.', which points to a generated sentence rather than free prose. Lengths are tight (46 to 72 chars, mean 54.3, ~8 words) and each row is unique (n_unique=1097, duplicate_rate=0), so the field carries the same signal as the underlying structured columns. Class codes like 'l6.' (260), 'h5.' (163), 'h6.' (91), 'l5.' (76) leak the meteorite classification into the text. Treatment: Drop the column, or parse it into structured fields (mass, found, class) rather than embedding it, since it is a template over existing columns. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1,097
len_min: 46
len_max: 72
len_mean: 54.31
len_median: 53
len_p95: 64
word_mean: 8.254
word_median: 8
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,372
readability_flesch_mean: 52.62
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
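
The templating claim rests on a set of tokens appearing in every row, which is straightforward to verify. A minimal sketch, assuming lowercased whitespace tokenization; the sample sentences are hypothetical imitations of a template, since the report does not show the dataset's exact wording.

```python
def tokens_in_every_row(texts: list[str]) -> set[str]:
    """Return the lowercased whitespace tokens present in every text."""
    if not texts:
        return set()
    token_sets = [set(text.lower().split()) for text in texts]
    return set.intersection(*token_sets)


# Hypothetical descriptions, not rows from the dataset.
samples = [
    "Alpha meteorite class L6. Mass: 120 g. Found: Fell.",
    "Beta Creek meteorite class H5. Mass: 4500 g. Found: Fell.",
    "Gamma meteorite class L6. Mass: 32 g. Found: Fell.",
]

shared = tokens_in_every_row(samples)
print(sorted(shared))
```

Tokens that survive the intersection are the fixed parts of the template; everything else (name, class code, mass) is the variable payload that could be parsed into columns.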

category

categorical metadata imbalance
This column is a single-valued categorical tag, with all 1097 rows labeled "witnessed_meteorite_falls". Cardinality is 1 and entropy is 0, so it carries no information for modelling and merely records the dataset's provenance or scope. Treatment: Drop before modelling; retain only as a dataset-level annotation. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1
top_value: witnessed_meteorite_falls
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
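
The entropy figures in these categorical summaries can be reproduced with a few lines. A minimal sketch, assuming entropy is Shannon entropy in bits and that `entropy_ratio` is entropy divided by log2(cardinality); the report does not define either, although the reported values for the other categorical columns are consistent with this reading.

```python
import math
from collections import Counter


def entropy_bits(values: list[str]) -> float:
    """Shannon entropy of the value distribution, in bits."""
    if not values:
        return 0.0
    total = len(values)
    probabilities = [count / total for count in Counter(values).values()]
    return sum(p * math.log2(1 / p) for p in probabilities)


def entropy_ratio(values: list[str]) -> float:
    """Entropy divided by its maximum, log2(cardinality); 0.0 for constants."""
    cardinality = len(set(values))
    if cardinality <= 1:
        return 0.0
    return entropy_bits(values) / math.log2(cardinality)


# A constant column like this one carries zero bits of information.
constant = ["witnessed_meteorite_falls"] * 1097
print(entropy_bits(constant), entropy_ratio(constant))
```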

date

categorical timestamp
This column holds dates stored as strings, all snapped to January 1st of the year, which suggests year-only granularity stored as full dates. Across 1,097 rows there are 231 distinct values with a very high entropy ratio (0.967, i.e. 7.593 bits against a maximum of log2(231) ≈ 7.85), and the most common year accounts for under 1.6% of non-null rows, so falls are spread broadly across years. The cited span of 1868 to 1977 is only a lower bound on coverage: 231 distinct years cannot fit inside that 110-year window, and the profile reports no minimum or maximum. Null rate is low at 1.73% (19 rows). Treatment: Parse to datetime and extract year as the working feature, since month/day are constant. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 19 (1.7%)
unique: 231
top_value: 1933-01-01
top_rate: 0.01577
cardinality: 231
entropy: 7.593
entropy_ratio: 0.967
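
The year-extraction treatment can be sketched with the standard library. This assumes the strings are ISO `YYYY-MM-DD`, as the top value `1933-01-01` suggests, and that missing values arrive as `None` or empty strings; `parse_year` is a hypothetical helper name.

```python
from datetime import date
from typing import Optional


def parse_year(value: Optional[str]) -> Optional[int]:
    """Return the year of an ISO 'YYYY-MM-DD' string, or None if missing or invalid."""
    if not value:
        return None
    try:
        return date.fromisoformat(value).year
    except ValueError:
        return None


# Illustrative inputs: two valid dates, one null, one malformed string.
raw = ["1933-01-01", None, "1868-01-01", "not a date"]
years = [parse_year(v) for v in raw]
print(years)
```

`datetime.date` accepts years 1 through 9999, so early falls parse without special handling. Returning `None` for bad input keeps the 19 nulls distinguishable from real years.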

country

unknown metadata skipped
This column is labeled "country" and contains 1097 non-null values, but saturn skipped detailed profiling so neither the cardinality nor value distribution is available. Without unique counts or sample values, I cannot confirm whether it holds country names, ISO codes, or something else. The only firm signals are full population (null_rate 0.0) and the skipped alert. Treatment: Re-profile with categorical stats enabled, then standardize to ISO codes before use. low · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: (not reported)

mass_g

unknown other skipped
Column `mass_g` was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. The only confirmed signals are 1097 rows with a 0.0 null rate; uniqueness, distribution, and type are all missing. The name suggests a numeric mass measurement in grams, but this cannot be verified from the evidence. Treatment: Re-run profiling on this column to recover type and distribution before any downstream use. low · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: (not reported)

meteorite_class

categorical label
This column captures the petrologic classification of meteorites, with 125 distinct classes across 1,097 records and no nulls. The distribution is dominated by ordinary chondrite types: L6 alone covers 23.7% of rows, followed by H5 (163) and H6 (91), while a long tail of 115+ rare classes keeps the entropy ratio at a moderate 0.67 (4.639 bits against a maximum of log2(125) ≈ 6.97). Analysts should note the heavy concentration in a handful of chondrite groups alongside niche entries like Eucrite-mmict (18). Treatment: Group rare classes into an 'other' bucket before encoding for modelling. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 125
top_value: L6
top_rate: 0.237
cardinality: 125
entropy: 4.639
entropy_ratio: 0.666
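
The rare-class bucketing can be sketched as follows. The helper name `group_rare`, the sample values, and the threshold are all illustrative assumptions; the report does not recommend a specific cutoff.

```python
from collections import Counter


def group_rare(values: list[str], min_count: int, other: str = "other") -> list[str]:
    """Replace values seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [value if counts[value] >= min_count else other for value in values]


# Hypothetical sample; real class frequencies come from the full column.
classes = ["L6", "L6", "L6", "H5", "H5", "Eucrite-mmict", "LL4"]
grouped = group_rare(classes, min_count=2)
print(Counter(grouped))
```

When the data is split for modelling, the counts should come from the training portion only, so that the set of retained classes does not depend on held-out rows.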

fall_type

categorical metadata imbalance
This column records the type of fall event but contains the single value "Fell" across all 1097 rows, with zero nulls. Entropy is 0.0 and top_rate is 1.0, so it carries no information for any downstream model. Treatment: Drop; constant column with a single value. high · anthropic:claude-opus-4-7
n: 1,097
nulls: 0 (0.0%)
unique: 1
top_value: Fell
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
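
Constant columns such as `fall_type` and `category` can be detected and dropped programmatically rather than by name. A minimal sketch over a list of dict records, which is an assumed in-memory shape; the layout of the raw JSON file is not shown in this report, and both helper names are hypothetical.

```python
def constant_columns(rows: list[dict]) -> list[str]:
    """Return the keys whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    return [
        key for key in first
        if all(row.get(key) == first[key] for row in rows)
    ]


def drop_columns(rows: list[dict], columns: list[str]) -> list[dict]:
    """Return copies of the rows without the named keys."""
    excluded = set(columns)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Hypothetical records shaped like the columns described in this report.
rows = [
    {"name": "Alpha", "category": "witnessed_meteorite_falls", "fall_type": "Fell"},
    {"name": "Beta", "category": "witnessed_meteorite_falls", "fall_type": "Fell"},
]
to_drop = constant_columns(rows)
cleaned = drop_columns(rows, to_drop)
print(to_drop, cleaned)
```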