natural hazards meteorites
Reading
This is a 1,097-row catalogue of witnessed meteorite falls; each record carries a name, description, date, latitude/longitude coordinates and a meteorite class. Two columns, category and fall_type, are constant (every record is a 'witnessed_meteorite_falls' event with fall_type 'Fell'), so the analytic interest lies elsewhere. Meteorite class is the most informative categorical: it has 125 distinct classes but is heavily concentrated, with L6 alone accounting for ~24% of falls and H5 the next largest at ~15%. Latitude is concentrated in the northern hemisphere (median 36.1, mean 30.0), and ~8% of rows, all in the southern tail, are flagged as outliers, while longitude spreads broadly across the globe (-157.9 to 174.4). Start with meteorite_class to understand the dominant compositions, then look at the latitude/longitude pair to see geographic coverage.
citing: row_count · column_count · category.top_value · fall_type.top_value · meteorite_class.n_unique · meteorite_class.top_values · meteorite_class.top_rate · latitude.mean · latitude.median · latitude.outlier_rate · longitude.min · longitude.max · date.n_unique · date.top_values
Charts the summary said to look at first
meteorite_class: top 20 of 125 values (share of all 1,097 rows)
| value | count | share |
|---|---|---|
| L6 | 260 | 23.7% |
| H5 | 163 | 14.9% |
| H6 | 91 | 8.3% |
| L5 | 76 | 6.9% |
| H4 | 50 | 4.6% |
| LL6 | 41 | 3.7% |
| Stone-uncl | 39 | 3.6% |
| OC | 24 | 2.2% |
| LL5 | 19 | 1.7% |
| Eucrite-mmict | 18 | 1.6% |
| L4 | 18 | 1.6% |
| Howardite | 16 | 1.5% |
| CM2 | 15 | 1.4% |
| H | 13 | 1.2% |
| L | 10 | 0.9% |
| Iron, IIIAB | 10 | 0.9% |
| Aubrite | 9 | 0.8% |
| Diogenite | 8 | 0.7% |
| EL6 | 8 | 0.7% |
| CV3 | 7 | 0.6% |
latitude: histogram (degrees, equal-width bins)
| latitude bin (°) | count |
|---|---|
| -44.12 – -40.77 | 2 |
| -40.77 – -37.42 | 1 |
| -37.42 – -34.07 | 3 |
| -34.07 – -30.73 | 31 |
| -30.73 – -27.38 | 14 |
| -27.38 – -24.03 | 14 |
| -24.03 – -20.68 | 9 |
| -20.68 – -17.34 | 11 |
| -17.34 – -13.99 | 6 |
| -13.99 – -10.64 | 4 |
| -10.64 – -7.295 | 11 |
| -7.295 – -3.948 | 18 |
| -3.948 – -0.6002 | 9 |
| -0.6002 – 2.747 | 10 |
| 2.747 – 6.095 | 6 |
| 6.095 – 9.442 | 13 |
| 9.442 – 12.79 | 34 |
| 12.79 – 16.14 | 32 |
| 16.14 – 19.48 | 20 |
| 19.48 – 22.83 | 34 |
| 22.83 – 26.18 | 57 |
| 26.18 – 29.53 | 59 |
| 29.53 – 32.87 | 61 |
| 32.87 – 36.22 | 96 |
| 36.22 – 39.57 | 79 |
| 39.57 – 42.92 | 78 |
| 42.92 – 46.26 | 119 |
| 46.26 – 49.61 | 81 |
| 49.61 – 52.96 | 92 |
| 52.96 – 56.31 | 53 |
| 56.31 – 59.65 | 22 |
| 59.65 – 63 | 14 |
| 63 – 66.35 | 4 |
longitude: histogram (degrees, equal-width bins)
| longitude bin (°) | count |
|---|---|
| -157.9 – -147.8 | 2 |
| -147.8 – -137.7 | 0 |
| -137.7 – -127.7 | 1 |
| -127.7 – -117.6 | 7 |
| -117.6 – -107.5 | 11 |
| -107.5 – -97.45 | 44 |
| -97.45 – -87.39 | 49 |
| -87.39 – -77.32 | 51 |
| -77.32 – -67.25 | 26 |
| -67.25 – -57.18 | 21 |
| -57.18 – -47.11 | 14 |
| -47.11 – -37.04 | 7 |
| -37.04 – -26.97 | 2 |
| -26.97 – -16.91 | 0 |
| -16.91 – -6.836 | 23 |
| -6.836 – 3.232 | 102 |
| 3.232 – 13.3 | 135 |
| 13.3 – 23.37 | 88 |
| 23.37 – 33.44 | 91 |
| 33.44 – 43.51 | 59 |
| 43.51 – 53.58 | 25 |
| 53.58 – 63.64 | 11 |
| 63.64 – 73.71 | 29 |
| 73.71 – 83.78 | 104 |
| 83.78 – 93.85 | 31 |
| 93.85 – 103.9 | 13 |
| 103.9 – 114 | 40 |
| 114 – 124.1 | 38 |
| 124.1 – 134.1 | 28 |
| 134.1 – 144.2 | 33 |
| 144.2 – 154.3 | 9 |
| 154.3 – 164.3 | 1 |
| 164.3 – 174.4 | 2 |
date: top 20 values (share of all 1,097 rows, including the 19 nulls)
| value | count | share |
|---|---|---|
| 1933-01-01 | 17 | 1.5% |
| 1949-01-01 | 13 | 1.2% |
| 1950-01-01 | 12 | 1.1% |
| 1976-01-01 | 11 | 1.0% |
| 1930-01-01 | 11 | 1.0% |
| 1938-01-01 | 11 | 1.0% |
| 1910-01-01 | 11 | 1.0% |
| 1868-01-01 | 11 | 1.0% |
| 1977-01-01 | 10 | 0.9% |
| 1939-01-01 | 10 | 0.9% |
| 1984-01-01 | 10 | 0.9% |
| 1934-01-01 | 10 | 0.9% |
| 1916-01-01 | 10 | 0.9% |
| 1924-01-01 | 10 | 0.9% |
| 1917-01-01 | 10 | 0.9% |
| 2008-01-01 | 9 | 0.8% |
| 2003-01-01 | 9 | 0.8% |
| 1998-01-01 | 9 | 0.8% |
| 1890-01-01 | 9 | 0.8% |
| 1986-01-01 | 9 | 0.8% |
description: length in characters (one row per observed length)
| length (chars) | count |
|---|---|
| 46 | 1 |
| 47 | 5 |
| 48 | 29 |
| 49 | 79 |
| 50 | 118 |
| 51 | 137 |
| 52 | 129 |
| 53 | 110 |
| 54 | 76 |
| 55 | 68 |
| 56 | 58 |
| 57 | 54 |
| 58 | 34 |
| 59 | 40 |
| 60 | 22 |
| 61 | 26 |
| 62 | 21 |
| 63 | 20 |
| 64 | 20 |
| 65 | 14 |
| 66 | 9 |
| 67 | 11 |
| 68 | 4 |
| 69 | 3 |
| 70 | 5 |
| 71 | 1 |
| 72 | 3 |
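Equal-width binning of integer-valued lengths produces empty bins whenever the bin width drops below one character: 40 bins over lengths 46 to 72 are only 0.65 characters wide, so 13 of them can never receive a value and appear as zero-count rows. The sketch below is a minimal pure-Python illustration of that effect and of the simpler alternative, counting each exact length; the helper name `equal_width_counts` is hypothetical and the input is one illustrative value per length, not the real column.

```python
from collections import Counter

def equal_width_counts(values, n_bins):
    """Count values into n_bins equal-width bins spanning [min, max].

    The last bin is closed on the right so the maximum is counted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, n_bins - 1)] += 1  # clamp the maximum into the last bin
    return counts

# One value for each integer length from 46 to 72 (illustrative input).
lengths = list(range(46, 73))

binned = equal_width_counts(lengths, 40)  # bin width 26 / 40 = 0.65 chars
empty_bins = sum(1 for c in binned if c == 0)
print(f"{empty_bins} of {len(binned)} bins are empty")

# Counting each exact length avoids the artefact entirely.
exact = Counter(lengths)
print(sorted(exact.items())[:3])
```

Because each bin is narrower than one character, no bin can hold two distinct integers, so the number of non-empty bins equals the number of distinct lengths.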
Schema
| column (10 total) | type | nulls | unique | alerts |
|---|---|---|---|---|
| latitude | numeric | 0.0% | 958 | outliers |
| longitude | numeric | 0.0% | 1,030 | |
| name | text | 0.0% | 1,097 | near_unique, one_word, short_text |
| description | text | 0.0% | 1,097 | near_unique |
| category | categorical | 0.0% | 1 | imbalance |
| date | categorical | 1.7% | 231 | |
| country | unknown | 0.0% | not profiled | skipped |
| mass_g | unknown | 0.0% | not profiled | skipped |
| meteorite_class | categorical | 0.0% | 125 | |
| fall_type | categorical | 0.0% | 1 | imbalance |
latitude
numeric · feature · outliers
Geographic latitude in decimal degrees, spanning -44.12 to 66.35, i.e. from about 44°S to just south of the Arctic Circle. The distribution is left-skewed (skew -1.28): the median of 36.1 sits well above the mean of 30.04, reflecting a concentration of falls in the northern mid-latitudes with a tail of southern-hemisphere records. 90 rows (8.2%) in that southern tail are flagged as outliers. With 958 distinct values across 1,097 rows, most rows, but not all, have a distinct latitude. Treatment: Pair with longitude for spatial features; the southern-hemisphere outliers are likely legitimate locations, not errors.
- n: 1,097
- nulls: 0 (0.0%)
- unique: 958
- min: -44.12
- max: 66.35
- mean: 30.04
- median: 36.1
- std: 23.13
- q1: 21.87
- q3: 46.07
- iqr: 24.2
- skew: -1.276
- kurtosis: 1.01
- n_outliers: 90
- outlier_rate: 0.08204
- zero_rate: 0.001823
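The profile does not say how outliers were flagged. Assuming Tukey's 1.5×IQR fences (an assumption, not something the report states), the reported quartiles give the bounds computed below. For latitude the lower fence lands near -14.4 and the upper fence (about 82.4) lies above the maximum of 66.35, so every flagged row would sit in the southern tail. For longitude the lower fence is about -125, consistent with the 3 flagged rows west of -127.7 in the histogram. The helper name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return Tukey-style (lower, upper) outlier fences from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for latitude and longitude in this profile.
lat_lo, lat_hi = iqr_fences(21.87, 46.07)
lon_lo, lon_hi = iqr_fences(-4.233, 76.27)
print(f"latitude fences:  {lat_lo:.2f} to {lat_hi:.2f}")
print(f"longitude fences: {lon_lo:.2f} to {lon_hi:.2f}")
```

A row counts as an outlier under this rule if it falls strictly outside the two fences; matching counts would need the raw column, which the profile does not include.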
longitude
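numeric · feature
Geographic longitude in decimal degrees, spanning -157.87 to 174.4 and so covering most of the -180 to 180 range. Skew is small (-0.23) and kurtosis negative (-0.62), but the histogram is multimodal rather than centred on a single peak: the largest clusters fall roughly between -107 and -77 (the Americas), -7 and 33 (Europe and Africa), and 74 and 84 (South Asia). The median of 18.72 reflects the weight of the Europe/Africa cluster. With 1,030 unique values across 1,097 rows and no nulls, nearly all points are distinct. Treatment: Pair with latitude as a geospatial coordinate; because longitude wraps at ±180°, avoid treating it as a plain linear numeric feature (for example, by standard-scaling it on its own).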
- n: 1,097
- nulls: 0 (0.0%)
- unique: 1,030
- min: -157.9
- max: 174.4
- mean: 20.13
- median: 18.72
- std: 68.87
- q1: -4.233
- q3: 76.27
- iqr: 80.5
- skew: -0.2257
- kurtosis: -0.6185
- n_outliers: 3
- outlier_rate: 0.002735
- zero_rate: 0.0009116
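One common way to respect the ±180° wrap is to map each (latitude, longitude) pair onto the unit sphere and to measure separation with the haversine great-circle distance. The sketch below is a minimal standard-library version; the function names are hypothetical, and the 6,371 km radius treats the Earth as a sphere, which is an approximation.

```python
import math

def latlon_to_unit_vector(lat_deg, lon_deg):
    """Map a (latitude, longitude) pair in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km, assuming a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Two points on either side of the antimeridian.
east = latlon_to_unit_vector(0.0, 179.9)
west = latlon_to_unit_vector(0.0, -179.9)
print(math.dist(east, west))                     # small: the points are neighbours
print(abs(179.9 - (-179.9)))                     # raw difference of 359.8 degrees
print(haversine_km(0.0, 179.9, 0.0, -179.9))     # roughly 22 km
```

The three vector components can be used directly as model features; they stay continuous across the antimeridian, unlike raw longitude.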
name
text · identifier · near_unique · one_word · short_text
The `name` column holds 1,097 fully unique short strings (n_unique == n, duplicate_rate 0.0), averaging 8.56 characters and 1.21 words; 82.95% are a single word. Frequent words such as `st.`, `county`, `san`, `santa`, `creek`, `la`, `el` and `de` indicate place names in a mix of Spanish and English rather than person names, consistent with the convention of naming meteorites after the locality where they fell. There are no nulls, URLs or emoji, so the column is clean but works as an identifier rather than a feature. Treatment: Treat as a unique label/key; drop from modelling features, or use only via a geographic enrichment lookup.
- n: 1,097
- nulls: 0 (0.0%)
- unique: 1,097
- len_min: 2
- len_max: 28
- len_mean: 8.557
- len_median: 8
- len_p95: 15
- word_mean: 1.209
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,238
- readability_flesch_mean: 40.67
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.8295
- allcaps_rate: 0
- boilerplate_rate: 0
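If `name` is used as a key, it is worth failing loudly on duplicates rather than silently overwriting rows. The sketch below assumes records arrive as dicts with a "name" field; the helper names are hypothetical, and splitting on whitespace for the one-word check is an assumption about how the profiler counts words.

```python
from collections import Counter

def index_by_name(records):
    """Build a name -> record lookup, refusing duplicate names.

    `records` is assumed to be an iterable of dicts that each carry a "name" key.
    """
    records = list(records)  # allow generators: we iterate twice below
    counts = Counter(r["name"] for r in records)
    dupes = sorted(n for n, c in counts.items() if c > 1)
    if dupes:
        raise ValueError(f"duplicate names: {dupes}")
    return {r["name"]: r for r in records}

def one_word_rate(names):
    """Share of names that consist of a single whitespace-delimited token."""
    names = list(names)
    return sum(1 for n in names if len(n.split()) == 1) / len(names)

# Placeholder rows, not taken from the profiled file.
rows = [{"name": "Alpha"}, {"name": "Beta Creek"}, {"name": "Gamma"}]
lookup = index_by_name(rows)
print(sorted(lookup))
print(one_word_rate(r["name"] for r in rows))
```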
description
text · metadata · near_unique
A templated, machine-generated description string for each meteorite record. The tokens 'meteorite', 'mass:', 'found:' and 'fell.' each occur exactly 1,097 times, consistent with one occurrence per row from a shared template. Every value is unique (n_unique=1097, duplicate_rate=0.0), yet length is tightly bounded (46-72 characters, median 53), so only the embedded fields vary: the class tokens track the meteorite_class counts (l6. appears 260 times, h5. 163 times), and numeric values differ between rows. The token 'unknown.' occurs 1,099 times, more often than there are rows, so missing sub-fields inside the template are common. Treatment: Parse the template to extract structured fields (class, mass, found/fell) rather than embedding the raw string.
- n: 1,097
- nulls: 0 (0.0%)
- unique: 1,097
- len_min: 46
- len_max: 72
- len_mean: 54.31
- len_median: 53
- len_p95: 64
- word_mean: 8.254
- word_median: 8
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,372
- readability_flesch_mean: 52.62
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
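The profile shows the labelled tokens 'mass:' and 'found:' but not the full template, so any parser is a guess until checked against real rows. The sketch below assumes each labelled field is written as `<label>: <value>.` and maps the literal value 'unknown' to `None`; the regex, the helper name and the sample string are all hypothetical.

```python
import re

# Hypothetical pattern: "<label>: <value>." where the value ends at a period
# followed by whitespace or end of string (so "12.5 g" is not cut at "12").
FIELD_RE = re.compile(r"\b(mass|found):\s*(.+?)\.(?=\s|$)", re.IGNORECASE)

def parse_description(text):
    """Pull labelled fields out of a templated description string.

    Values spelled 'unknown' become None so they are not mistaken for data.
    """
    fields = {}
    for label, value in FIELD_RE.findall(text):
        value = value.strip()
        fields[label.lower()] = None if value.lower() == "unknown" else value
    return fields

# Illustrative string only; the real template was not shown in the profile.
sample = "Stony meteorite, class L6. Mass: 12.5 g. Found: unknown. Fell."
print(parse_description(sample))
```

Extracting the class token as well would need the exact wording around it, which should be read off a few real rows first.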
category
categorical · metadata · imbalance
A constant categorical tag: every one of the 1,097 rows holds "witnessed_meteorite_falls". With cardinality 1 and entropy 0 it carries no information about individual records and most likely identifies the dataset's provenance or scope. Treatment: Drop before modelling; retain only as a dataset-level label.
- n: 1,097
- nulls: 0 (0.0%)
- unique: 1
- top_value: witnessed_meteorite_falls
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
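The reported entropy ratios are consistent with Shannon entropy in bits divided by log2 of the cardinality: 4.639 / log2(125) ≈ 0.666 for meteorite_class and 7.593 / log2(231) ≈ 0.967 for date. The sketch below assumes that definition (the profile does not state it) and returns 0 for a constant column such as this one, where log2(1) = 0 would otherwise divide by zero. The helper name is hypothetical.

```python
import math
from collections import Counter

def entropy_ratio(values):
    """Shannon entropy in bits divided by log2(cardinality).

    Returns 0.0 for a constant column, where log2(1) == 0 would otherwise
    cause a division by zero.
    """
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(k)

# Cross-check the ratios reported elsewhere in this profile.
print(round(4.639 / math.log2(125), 3), round(7.593 / math.log2(231), 3))
print(entropy_ratio(["witnessed_meteorite_falls"] * 5))
```

A ratio of 1 means every value is equally common; values near 0 mean one value dominates.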
date
categorical · timestamp
Dates are stored as ISO strings that all fall on January 1st, so this is effectively a year-granularity field presented as a full date. The 1,078 non-null values span 231 distinct years with a high entropy ratio (0.967); the most common year (1933-01-01) accounts for only 1.6% of non-null rows (17 of 1,078). The null rate is 1.7% (19 rows). Treatment: Parse to datetime and extract the year as an integer feature; the month/day component carries no signal.
- n: 1,097
- nulls: 19 (1.7%)
- unique: 231
- top_value: 1933-01-01
- top_rate: 0.01577
- cardinality: 231
- entropy: 7.593
- entropy_ratio: 0.967
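The treatment above can be sketched with the standard library alone: parse each ISO string, keep the year, and confirm that the month/day part really is always 1 January before discarding it. The helper names are hypothetical, and representing the 19 nulls as `None` or empty strings is an assumption about how the file encodes them.

```python
from datetime import date

def to_year(value):
    """Turn an ISO 'YYYY-MM-DD' string into an int year; missing values stay None."""
    if value is None or value == "":
        return None
    return date.fromisoformat(value).year

def all_january_first(values):
    """True if every non-missing date falls on 1 January."""
    parsed = [date.fromisoformat(v) for v in values if v]
    return all(d.month == 1 and d.day == 1 for d in parsed)

sample = ["1933-01-01", None, "2008-01-01"]  # illustrative values
print([to_year(v) for v in sample])
print(all_january_first(sample))
```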
country
unknown · feature · skipped
Labelled 'country', this column likely holds country names or codes, but saturn skipped detailed profiling, so its kind, uniqueness and value distribution are unknown. The only confirmed facts are 1,097 rows and a 0.0% null rate; cardinality, format and dominant values remain uncharacterised. Treatment: Re-profile to determine cardinality and format, then encode as a categorical.
- n: 1,097
- nulls: 0 (0.0%)
- unique: not profiled
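Until the column is re-profiled properly, a few summary numbers are enough to tell names from codes and to size a categorical encoding. The sketch below is a minimal stand-in, not saturn's own profiling; the helper name and the sample values are hypothetical. A small `max_len` (2 or 3) would suggest ISO-style codes, while longer values suggest full names.

```python
from collections import Counter

def quick_profile(values, top=5):
    """Minimal stand-in for a skipped profile: nulls, cardinality, top values."""
    values = list(values)
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "n": len(values),
        "nulls": len(values) - len(present),
        "unique": len(counts),
        "top_values": counts.most_common(top),
        "max_len": max((len(str(v)) for v in present), default=0),
    }

profile = quick_profile(["France", "India", "India", None, "USA"])  # illustrative
print(profile)
```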
mass_g
unknown · other · skipped
The name mass_g suggests a mass measurement in grams, with 1,097 rows and no nulls. The profiler (saturn) skipped this column and reported no kind, uniqueness or distribution statistics, so its actual content and shape cannot be confirmed from this evidence. Treatment: Re-profile or inspect the column manually before use; current evidence is insufficient to decide on handling.
- n: 1,097
- nulls: 0 (0.0%)
- unique: not profiled
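A first manual check for mass_g is whether its values parse as finite, non-negative numbers at all. The sketch below assumes grams only because of the column name; the helper name and the sample values are hypothetical. Strings with thousands separators such as "1,200" would land in `rejected`, which is itself useful to know before converting the column.

```python
import math

def numeric_parse_report(values):
    """Summarise how many values parse as finite, non-negative numbers."""
    parsed, rejected = [], []
    for v in values:
        try:
            x = float(v)
        except (TypeError, ValueError):
            rejected.append(v)  # None, words such as "unknown", "1,200", ...
            continue
        if math.isfinite(x) and x >= 0:
            parsed.append(x)
        else:
            rejected.append(v)  # NaN, infinities and negative masses
    return {
        "parsed": len(parsed),
        "rejected": rejected,
        "min": min(parsed) if parsed else None,
        "max": max(parsed) if parsed else None,
    }

report = numeric_parse_report(["1200", "12.5", "unknown", None, "-3"])  # illustrative
print(report)
```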
meteorite_class
categorical · label
Meteorite classification codes (L6, H5, H6 and so on). The most frequent codes are ordinary chondrites, written as a chemical group (H, L or LL) followed by a petrologic type number. Cardinality is high, with 125 distinct classes across 1,097 rows and no nulls, but the distribution is concentrated: L6 alone covers 23.7%, and the top four classes (L6, H5, H6, L5) account for 53.8% of rows. The entropy ratio of 0.67 reflects this shape: a few dominant classes followed by a long tail of rare ones that will be sparsely represented. Treatment: Group rare classes into an 'other' bucket or use target encoding before modelling.
- n: 1,097
- nulls: 0 (0.0%)
- unique: 125
- top_value: L6
- top_rate: 0.237
- cardinality: 125
- entropy: 4.639
- entropy_ratio: 0.666
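Grouping rare classes can be as simple as a frequency threshold. The threshold of 10 below is a hypothetical choice: applied to the top-20 table, it would keep 16 classes (L6 through Iron, IIIAB) covering 863 of 1,097 rows (78.7%) and send the remaining 234 rows into 'other'. The helper name and the toy input are hypothetical.

```python
from collections import Counter

def group_rare(labels, min_count=10, other="other"):
    """Replace labels that occur fewer than `min_count` times with `other`."""
    labels = list(labels)
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]

# Toy input: two frequent classes and one rare one.
toy = ["L6"] * 12 + ["H5"] * 10 + ["CV3"] * 2
grouped = group_rare(toy, min_count=10)
print(Counter(grouped))
```

In a train/test setting the counts should come from the training split only, so that rare-class decisions do not leak information from held-out rows.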
fall_type
categorical · metadata · imbalance
This column records the fall classification, but every one of the 1,097 rows holds "Fell" and there are no nulls. With entropy 0 and cardinality 1 it carries no information for any downstream model or segmentation. Treatment: Drop; constant column with zero entropy.
- n: 1,097
- nulls: 0 (0.0%)
- unique: 1
- top_value: Fell
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
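Both fall_type and category are constant, so they can be detected and dropped together. The sketch below assumes the data is held as a dict mapping column name to a list of values; the helper names and the three-row table are hypothetical.

```python
def constant_columns(table):
    """Names of columns holding at most one distinct value.

    `table` is assumed to be a dict mapping column name -> list of values.
    """
    return [name for name, values in table.items() if len(set(values)) <= 1]

def drop_columns(table, names):
    """Return a copy of `table` without the given columns."""
    drop = set(names)
    return {name: values for name, values in table.items() if name not in drop}

# Tiny illustrative table shaped like the profiled one.
table = {
    "category": ["witnessed_meteorite_falls"] * 3,
    "fall_type": ["Fell"] * 3,
    "meteorite_class": ["L6", "H5", "L6"],
}
const = constant_columns(table)
print(const)
print(list(drop_columns(table, const)))
```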