natural hazards meteorites

source /home/coolhand/html/datavis/data_trove/data/natural_hazards/meteorites.json · 1,097 rows · 10 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This is a 1,097-row catalogue of witnessed meteorite falls, with each record carrying a name, description, date, lat/long coordinates and a meteorite class. Two columns (category and fall_type) are constant — every record is a 'witnessed_meteorite_falls' event with fall_type 'Fell' — so the analytic interest sits elsewhere. Meteorite class is the most informative categorical: 125 distinct classes but heavily concentrated, with L6 alone accounting for ~24% of falls and H5 the next largest at ~15%. Latitude is skewed toward the northern hemisphere (median 36.1, mean 30.0) with ~8% flagged as outliers, while longitude spreads broadly across the globe (-157.9 to 174.4). Start with meteorite_class to understand the dominant compositions, then look at the lat/long pair to see geographic coverage.

citing: row_count · column_count · category.top_value · fall_type.top_value · meteorite_class.n_unique · meteorite_class.top_values · meteorite_class.top_rate · latitude.mean · latitude.median · latitude.outlier_rate · longitude.min · longitude.max · date.n_unique · date.top_values

Charts the summary said to look at first

meteorite_class · L6 dominates at ~24% of falls, with H5 and H6 the next most common — note the long tail of 125 classes.

Top values for meteorite_class (20 unique shown, of 125 total).
value	count	share
L6	260	23.7%
H5	163	14.9%
H6	91	8.3%
L5	76	6.9%
H4	50	4.6%
LL6	41	3.7%
Stone-uncl	39	3.6%
OC	24	2.2%
LL5	19	1.7%
Eucrite-mmict	18	1.6%
L4	18	1.6%
Howardite	16	1.5%
CM2	15	1.4%
H	13	1.2%
L	10	0.9%
Iron, IIIAB	10	0.9%
Aubrite	9	0.8%
Diogenite	8	0.7%
EL6	8	0.7%
CV3	7	0.6%

latitude · Distribution leans toward northern mid-latitudes (median 36.1) with a left tail of southern-hemisphere outliers.

Histogram bins for latitude (median: 36.1).
bin	count
-44.12 – -40.77	2
-40.77 – -37.42	1
-37.42 – -34.07	3
-34.07 – -30.73	31
-30.73 – -27.38	14
-27.38 – -24.03	14
-24.03 – -20.68	9
-20.68 – -17.34	11
-17.34 – -13.99	6
-13.99 – -10.64	4
-10.64 – -7.295	11
-7.295 – -3.948	18
-3.948 – -0.6002	9
-0.6002 – 2.747	10
2.747 – 6.095	6
6.095 – 9.442	13
9.442 – 12.79	34
12.79 – 16.14	32
16.14 – 19.48	20
19.48 – 22.83	34
22.83 – 26.18	57
26.18 – 29.53	59
29.53 – 32.87	61
32.87 – 36.22	96
36.22 – 39.57	79
39.57 – 42.92	78
42.92 – 46.26	119
46.26 – 49.61	81
49.61 – 52.96	92
52.96 – 56.31	53
56.31 – 59.65	22
59.65 – 63	14
63 – 66.35	4

longitude · Longitude spans the full globe (-158 to 174); look for clustering around populated landmasses where falls get reported.

Histogram bins for longitude (median: 18.72).
bin	count
-157.9 – -147.8	2
-147.8 – -137.7	0
-137.7 – -127.7	1
-127.7 – -117.6	7
-117.6 – -107.5	11
-107.5 – -97.45	44
-97.45 – -87.39	49
-87.39 – -77.32	51
-77.32 – -67.25	26
-67.25 – -57.18	21
-57.18 – -47.11	14
-47.11 – -37.04	7
-37.04 – -26.97	2
-26.97 – -16.91	0
-16.91 – -6.836	23
-6.836 – 3.232	102
3.232 – 13.3	135
13.3 – 23.37	88
23.37 – 33.44	91
33.44 – 43.51	59
43.51 – 53.58	25
53.58 – 63.64	11
63.64 – 73.71	29
73.71 – 83.78	104
83.78 – 93.85	31
93.85 – 103.9	13
103.9 – 114	40
114 – 124.1	38
124.1 – 134.1	28
134.1 – 144.2	33
144.2 – 154.3	9
154.3 – 164.3	1
164.3 – 174.4	2

date · 231 distinct fall years (every date is snapped to January 1) spread fairly evenly; 1933 is the single busiest year with 17 recorded falls.

Top values for date (20 unique shown, of 231 total).
value	count	share
1933-01-01	17	1.5%
1949-01-01	13	1.2%
1950-01-01	12	1.1%
1976-01-01	11	1.0%
1930-01-01	11	1.0%
1938-01-01	11	1.0%
1910-01-01	11	1.0%
1868-01-01	11	1.0%
1977-01-01	10	0.9%
1939-01-01	10	0.9%
1984-01-01	10	0.9%
1934-01-01	10	0.9%
1916-01-01	10	0.9%
1924-01-01	10	0.9%
1917-01-01	10	0.9%
2008-01-01	9	0.8%
2003-01-01	9	0.8%
1998-01-01	9	0.8%
1890-01-01	9	0.8%
1986-01-01	9	0.8%

description · Descriptions are uniformly templated (46-72 chars); useful as a sanity check that no records are truncated or malformed.

Character-length distribution for description (mean: 54.31).
chars	count
46 – 47	1
47 – 47	5
47 – 48	0
48 – 49	29
49 – 49	79
49 – 50	0
50 – 51	118
51 – 51	137
51 – 52	0
52 – 52	129
52 – 53	110
53 – 54	0
54 – 54	76
54 – 55	68
55 – 56	0
56 – 56	58
56 – 57	54
57 – 58	0
58 – 58	34
58 – 59	0
59 – 60	40
60 – 60	22
60 – 61	0
61 – 62	26
62 – 62	21
62 – 63	0
63 – 64	20
64 – 64	20
64 – 65	0
65 – 66	14
66 – 66	9
66 – 67	0
67 – 67	11
67 – 68	4
68 – 69	0
69 – 69	3
69 – 70	5
70 – 71	0
71 – 71	1
71 – 72	3

Schema

10 columns

Per-column summary.
column	type	nulls	unique	alerts
latitude	numeric	0.0%	958	outliers
longitude	numeric	0.0%	1,030
name	text	0.0%	1,097	near_unique one_word short_text
description	text	0.0%	1,097	near_unique
category	categorical	0.0%	1	imbalance
date	categorical	1.7%	231
country	unknown	0.0%	—	skipped
mass_g	unknown	0.0%	—	skipped
meteorite_class	categorical	0.0%	125
fall_type	categorical	0.0%	1	imbalance
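
The null-rate and unique-count columns in this table can be recomputed independently as a cross-check. The following is a minimal sketch, assuming the JSON file loads into a list of flat record dicts (the report does not confirm the file layout); `summarize_columns` and the two sample records are hypothetical and only illustrate the calculation.

```python
import json
from typing import Any


def summarize_columns(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return the null rate and distinct non-null count for every column."""
    # Collect column names in first-seen order.
    columns: list[str] = []
    for row in records:
        for key in row:
            if key not in columns:
                columns.append(key)

    n = len(records)
    summary: dict[str, dict[str, Any]] = {}
    for col in columns:
        non_null = [row.get(col) for row in records if row.get(col) is not None]
        # json.dumps yields a hashable key even if a value is a nested list or dict.
        distinct = {json.dumps(v, sort_keys=True) for v in non_null}
        summary[col] = {
            "null_rate": (n - len(non_null)) / n if n else 0.0,
            "unique": len(distinct),
        }
    return summary


# Invented sample records, not taken from the dataset.
sample = [
    {"name": "A", "fall_type": "Fell", "date": "1933-01-01"},
    {"name": "B", "fall_type": "Fell", "date": None},
]
result = summarize_columns(sample)
```

A column whose `unique` count is 1, like `fall_type` in the sample, is constant and matches the imbalance alert shown in the table.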

latitude

numeric feature outliers

Geographic latitude in decimal degrees, spanning -44.12 to 66.35, from the southern mid-latitudes to just short of the Arctic Circle. The distribution is left-skewed (skew -1.28), with the median of 36.1 sitting well above the mean of 30.04, indicating a northern-hemisphere bias with a tail of southern-hemisphere observations; 90 rows (8.2%) are flagged as outliers. There are 958 distinct values across 1,097 rows, so most rows sit at a distinct latitude, though some values repeat. Treatment: Pair with longitude for spatial features; the southern-hemisphere outliers are likely legitimate, not errors. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: 958
min: -44.12
max: 66.35
mean: 30.04
median: 36.1
std: 23.13
q1: 21.87
q3: 46.07
iqr: 24.2
skew: -1.276
kurtosis: 1.01
n_outliers: 90
outlier_rate: 0.08204
zero_rate: 0.001823
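
The outlier count can be sanity-checked from the quartiles alone. This sketch assumes the profiler uses the common 1.5 × IQR (Tukey) rule, which the report does not state; `tukey_fences` is a hypothetical helper, and the inputs are the q1, q3, min and max values listed above.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences for a given k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for latitude.
lower, upper = tukey_fences(21.87, 46.07)
# lower is about -14.43 and upper is about 82.37.

# Reported extremes for latitude.
lat_min, lat_max = -44.12, 66.35
has_low_outliers = lat_min < lower    # True: values south of about 14.4°S
has_high_outliers = lat_max > upper   # False: the maximum is inside the fence
```

Under that assumption every flagged value lies south of roughly 14.4°S. The latitude histogram bins below -13.99 hold 91 rows, which is close to the reported 90 outliers, so the assumed rule is at least consistent with the figures here.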

longitude

numeric feature

Geographic longitude in decimal degrees, with values spanning -157.87 to 174.4 — essentially the full -180/180 range. The distribution is roughly symmetric (skew -0.23, kurtosis -0.62) and centered near 18.72, suggesting a slight Eurasian/African concentration but broad global coverage. With 1030 unique values across 1097 rows and no nulls, points are nearly all distinct. Treatment: Pair with latitude as a geospatial coordinate; avoid scaling as a plain numeric feature. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: 1,030
min: -157.9
max: 174.4
mean: 20.13
median: 18.72
std: 68.87
q1: -4.233
q3: 76.27
iqr: 80.5
skew: -0.2257
kurtosis: -0.6185
n_outliers: 3
outlier_rate: 0.002735
zero_rate: 0.0009116
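
One reason to avoid scaling longitude as a plain number is that it wraps around: 179° and -179° are numerically far apart but physically adjacent. The sketch below shows two common ways to handle the pair jointly, a unit-sphere embedding and a great-circle distance. Both function names are hypothetical, and the mean Earth radius of 6371 km is an assumed constant, not something taken from the report.

```python
import math


def to_unit_vector(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Map a latitude/longitude pair to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# Two equatorial points on either side of the antimeridian.
wrap_distance = haversine_km(0.0, 179.0, 0.0, -179.0)
vector = to_unit_vector(36.1, 18.72)  # the reported medians, used only as an example point
```

The two antimeridian points are 358 apart as raw numbers but only 2 degrees apart on the sphere, which is the distortion a plain numeric scaling would introduce.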

name

text identifier near_unique one_word short_text

This is a `name` column with 1097 fully unique short strings (n_unique == n, duplicate_rate 0.0), averaging 8.56 characters and 1.21 words, with 82.95% being a single word. Top words like `st.`, `county`, `san`, `santa`, `creek`, `la`, `el`, `de` strongly suggest these are place or geographic entity names rather than person names, with a Spanish/English mix. No nulls, no URLs, no emoji — clean but effectively an identifier-grade label. Treatment: Treat as a unique label/key; drop from modelling features or use only via geographic enrichment lookup. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: 1,097
len_min: 2
len_max: 28
len_mean: 8.557
len_median: 8
len_p95: 15
word_mean: 1.209
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,238
readability_flesch_mean: 40.67
emoji_rate: 0
url_rate: 0
one_word_rate: 0.8295
allcaps_rate: 0
boilerplate_rate: 0

description

text metadata near_unique

This appears to be a templated, machine-generated description string for meteorite records: the tokens 'meteorite', 'mass:', 'found:', and 'fell.' each appear exactly 1,097 times, matching the row count. Every value is unique (n_unique=1097, duplicate_rate=0.0) yet length is tightly bounded (46-72 chars, median 53), consistent with a fixed template where only embedded fields like classification (l6. appears 260 times, h5. 163) and numeric values vary. The 'unknown.' token appears 1,099 times, slightly more than the row count, which signals that missing sub-fields are common and that at least one row carries more than one. Treatment: Parse the template to extract structured fields (class, mass, found/fell) rather than embedding the raw string. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: 1,097
len_min: 46
len_max: 72
len_mean: 54.31
len_median: 53
len_p95: 64
word_mean: 8.254
word_median: 8
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,372
readability_flesch_mean: 52.62
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
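
The parsing treatment could be sketched as below. The report confirms only which tokens occur, not their order or the exact template, so the sample string is invented and the "Label: value." layout is an assumption; `FIELD` and `parse_description` are hypothetical names. Check the pattern against real rows before relying on it.

```python
import re
from typing import Optional

# Matches "Label: value." segments. The value runs up to a period that is
# followed by whitespace or the end of the string, so decimals such as
# "21.5" stay intact.
FIELD = re.compile(r"(\w+):\s*(.+?)\.(?=\s|$)")


def parse_description(text: str) -> dict[str, Optional[str]]:
    """Extract labelled fields, mapping the literal 'unknown' to None."""
    fields = {label.lower(): value.strip() for label, value in FIELD.findall(text)}
    return {
        key: (None if value.lower() == "unknown" else value)
        for key, value in fields.items()
    }


# Invented example string; the real template may differ.
example = "Meteorite class: L6. Mass: 21.5 g. Found: unknown. Fell."
parsed = parse_description(example)
# parsed == {"class": "L6", "mass": "21.5 g", "found": None}
```

Mapping 'unknown' to `None` keeps the missing sub-fields visible as nulls instead of leaving them as a string category.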

date

categorical timestamp

Date values stored as ISO strings, all snapped to January 1st of the year, so this is effectively a year-granularity field masquerading as a full date. Across 1097 rows there are 231 distinct years with a very high entropy ratio (0.967), and the most common year (1933-01-01, 17 rows) accounts for only 1.6% of non-null values (1.5% of all rows). Null rate is 1.73% (19 rows). Treatment: Parse to datetime and extract the year as an integer feature; the month/day component carries no signal. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 19 (1.7%)
unique: 231
top_value: 1933-01-01
top_rate: 0.01577
cardinality: 231
entropy: 7.593
entropy_ratio: 0.967
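
The year extraction could look like the following minimal sketch. It assumes nulls arrive as `None` or an empty string and that every non-null value is a valid `YYYY-MM-DD` string; `fall_year` is a hypothetical helper. The last lines also reproduce the two ways of expressing the share of the top year, from the counts reported above.

```python
from datetime import date
from typing import Optional


def fall_year(value: Optional[str]) -> Optional[int]:
    """Return the year of an ISO date string, or None for a missing value."""
    if not value:
        return None
    return date.fromisoformat(value).year


years = [fall_year(v) for v in ["1933-01-01", None, "2008-01-01"]]

# Share of the top year (17 rows) among non-null values and among all rows.
n_rows, n_nulls, top_count = 1097, 19, 17
share_non_null = top_count / (n_rows - n_nulls)   # matches top_rate 0.01577
share_all = top_count / n_rows                    # about 0.0155, shown as 1.5%
```

Keeping missing dates as `None` rather than a sentinel year avoids introducing a fake value that would distort any per-year counts.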

country

unknown feature skipped

This column is labeled 'country' and likely holds country names or codes, but saturn skipped detailed profiling so kind, uniqueness, and value distribution are unknown. The only confirmed signals are 1097 rows with a 0.0 null rate. No further evidence is available to characterize cardinality, format, or dominant values. Treatment: Re-profile to determine cardinality and format, then encode as a categorical. low · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: —

mass_g

unknown other skipped

The column is named mass_g, suggesting a mass measurement in grams across 1,097 rows with no nulls. However, saturn skipped profiling this column and reported no kind, uniqueness, or distribution stats, so its actual content and shape cannot be confirmed from the evidence. Treatment: Re-profile or manually inspect this column before use; current evidence is insufficient to decide handling. low · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: —
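
A first step in manual inspection is to see which Python types the column actually holds after loading. This sketch assumes the file loads into a list of record dicts; `value_type_counts` is a hypothetical helper and the sample values are invented, since the real contents of `mass_g` and `country` are not known from this report.

```python
from collections import Counter
from typing import Any


def value_type_counts(records: list[dict[str, Any]], column: str) -> Counter:
    """Count the Python type names found in one column."""
    return Counter(type(row.get(column)).__name__ for row in records)


# Invented sample, mixing the kinds of values a mass column might contain.
sample = [
    {"mass_g": 21.0},
    {"mass_g": "unknown"},
    {"mass_g": None},
    {"mass_g": 720},
]
counts = value_type_counts(sample, "mass_g")
```

A mix of numeric and string types, or nested lists and dicts, would point to cleaning that is needed before the column can be profiled as numeric or categorical.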

meteorite_class

categorical label

This column records meteorite classification codes (L6, H5, H6, etc.), the standard taxonomy for ordinary chondrites and other meteorite types. Cardinality is high at 125 distinct classes across 1,097 rows with no nulls, but the distribution is concentrated: L6 alone covers 23.7% and the top four classes (L6, H5, H6, L5) account for over half the data. Entropy ratio of 0.67 confirms a long tail of rare classes that will be sparsely represented. Treatment: Group rare classes into an 'other' bucket or use target encoding before modelling. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: 125
top_value: L6
top_rate: 0.237
cardinality: 125
entropy: 4.639
entropy_ratio: 0.666
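
Grouping rare classes could be sketched as follows. The `min_count` threshold and the "other" label are arbitrary choices for illustration, `group_rare` is a hypothetical helper, and the label list is a toy sample, not the dataset.

```python
from collections import Counter


def group_rare(labels: list[str], min_count: int = 10, other: str = "other") -> list[str]:
    """Replace every label that occurs fewer than min_count times with `other`."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other for label in labels]


# Toy sample with two frequent classes and two singletons.
labels = ["L6"] * 5 + ["H5"] * 3 + ["CV3", "Aubrite"]
grouped = Counter(group_rare(labels, min_count=3))
```

Applied to this dataset with a threshold of 10, the top-values table above implies 16 classes would be kept (L6 down to Iron, IIIAB) and the remaining 109 would fall into the catch-all bucket.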

fall_type

categorical metadata imbalance

This column records a fall classification but contains the single value "Fell" across all 1097 rows, with no nulls. Entropy is 0 and cardinality is 1, so it carries no information for any downstream model or segmentation. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7

n: 1,097
nulls: 0 (0.0%)
unique: 1
top_value: Fell
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
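
The entropy figures in this report can be reproduced with a short calculation. The sketch below assumes the profiler reports Shannon entropy in bits and normalizes it by log2 of the cardinality; that definition is not stated in the report, but it is consistent with the values listed for `meteorite_class` and `date`. Both function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values: list[str]) -> float:
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(values: list[str]) -> float:
    """Entropy divided by its maximum, log2(cardinality); 0.0 for constant columns."""
    k = len(set(values))
    if k <= 1:
        return 0.0
    return entropy_bits(values) / math.log2(k)


constant_ratio = entropy_ratio(["Fell"] * 5)   # constant column, like fall_type
uniform_ratio = entropy_ratio(["a", "b"])      # two equally likely values

# Cross-check against the reported entropy and cardinality figures.
class_ratio = 4.639 / math.log2(125)   # meteorite_class
date_ratio = 7.593 / math.log2(231)    # date
```

A ratio near 1 means the values are spread almost evenly, as with `date`, while a ratio of 0 marks a constant column that can be dropped.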