saturn

/home/coolhand/html/datavis/data_trove/data/wild/disasters/disasters_mashup.json | 54,575 rows | sample n=54,575 | seed 42 | 2026-06-22T01:03:33+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/wild/disasters/disasters_mashup.json
Total rows: 54,575
Profiled sample: 54,575
Columns: 16
Generated: 2026-06-22T01:03:33+00:00
Per-column null rate across the corpus.
column         kind         null %
category       categorical  0.0%
latitude       numeric      0.0%
longitude      numeric      0.0%
name           text         0.0%
date           text         5.7%
subcategory    categorical  0.0%
magnitude      categorical  80.1%
fatalities     categorical  72.9%
injuries       categorical  72.9%
damage         text         72.9%
state          categorical  72.9%
aircraft_type  text         40.6%
event_id       text         40.6%
vessel_type    categorical  93.3%
cargo          categorical  93.3%
depth_km       unknown      0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset medium anthropic:default

This dataset is a multi-hazard disaster event mashup of 54,575 records spanning aviation accidents, storms, earthquakes, and shipwrecks, each geolocated with latitude and longitude. Aviation accidents dominate at 59.4% of all records, and Cessna models are the most frequently involved aircraft; it is worth examining whether this reflects true prevalence or a reporting/sourcing bias. A second area of interest is the severity data: fatalities, injuries, and damage all carry a 72.9% null rate, and their 14,770 non-null rows equal the storm record count, which suggests (but does not prove) that consequence analysis is limited to storm events. Within that subset, zero-casualty events dominate. The storm subcategory breakdown (Tornado, Flash Flood, Thunderstorm Wind) also deserves a closer look for geographic and seasonal clustering given the strong US state representation.

damage high anthropic:default

This column contains abbreviated monetary damage estimates (e.g., '2.5M', '250K', '0.00K') stored as free-form text, most likely representing financial loss or property damage figures from incident or insurance records. The null rate is extremely high at 72.94%, meaning nearly three-quarters of rows carry no damage value, and a further 368 non-null values are empty strings. The all-caps rate of 87.2% and one-word rate of 100% confirm a consistent but non-numeric encoding. The 1,014 unique values across 14,770 non-null rows give a duplicate rate of 93.1%, so many records share the same rounded figure. Analysts should note that string suffixes (K vs M) encode magnitude, that number formatting varies ('250K' vs '20.00K'), and that values must be parsed before any quantitative use.
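As a minimal sketch of that parsing step, assuming the only suffixes in the data are the K and M seen in the sample values, the strings could be converted as follows. The name `parse_damage` and the `B` entry are illustrative additions, not part of saturn or the source data, and the currency unit is not stated in the report.

```python
from typing import Optional

# Multipliers for the suffixes seen in the sample values ('K', 'M').
# 'B' is an assumption added for completeness; it does not appear in the samples.
_SUFFIX = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_damage(raw: Optional[str]) -> Optional[float]:
    """Convert strings such as '2.5M' or '250K' to a plain float.

    Returns None for nulls, empty strings, and anything that does not parse.
    """
    if raw is None:
        return None
    text = raw.strip().upper()
    if not text:
        return None
    multiplier = 1
    if text[-1] in _SUFFIX:
        multiplier = _SUFFIX[text[-1]]
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return None
```

Returning None for empty strings keeps the 368 empty values distinguishable from a genuine '0.00K' entry, which parses to 0.0.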

date high anthropic:default

This column contains date strings stored as text rather than a native date type. Most values are 10 characters long and follow ISO-8601 (YYYY-MM-DD), but the format is not uniform: the length distribution shows 193 values shorter than 10 characters (151 of them 4 characters long, possibly bare years), and the sample includes '1278558949', a 10-digit number that looks like a Unix epoch timestamp. Nearly all top values fall on January 1st of their respective years (2002–2012), suggesting many dates are truncated or snapped to year-start rather than recorded as raw event timestamps. The duplicate rate is high at 81.99% (42,177 duplicates among 51,441 non-null rows), consistent with that coarse granularity, while the 9,264 unique values show that finer dates also exist. The null rate is low at 5.74%.
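Because the sample values include both 'YYYY-MM-DD' strings and a 10-digit number ('1278558949'), a loader probably needs to handle more than one format. The sketch below is one hedged way to do that; treating 10-digit values as Unix epoch seconds in UTC is an assumption, and `parse_event_date` is a hypothetical helper name.

```python
import re
from datetime import date, datetime, timezone
from typing import Optional

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
_EPOCH_SECONDS = re.compile(r"\d{10}")


def parse_event_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date or (assumed) epoch-seconds string; None otherwise."""
    if raw is None:
        return None
    text = raw.strip()
    if _ISO_DATE.fullmatch(text):
        try:
            return date.fromisoformat(text)
        except ValueError:  # e.g. month 13
            return None
    if _EPOCH_SECONDS.fullmatch(text):
        # Assumption: 10-digit values are seconds since 1970-01-01 UTC.
        return datetime.fromtimestamp(int(text), tz=timezone.utc).date()
    return None  # shorter strings (e.g. bare years) are left for manual review
```

Values that match neither pattern return None, so the short strings seen in the length distribution can be counted and inspected separately.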

event_id high anthropic:default

This column is an aviation or safety incident event identifier. The 14-character format (e.g., '20010519X00967') encodes a date prefix followed by an alphanumeric case code, consistent with NTSB accident/incident IDs. Two signals stand out. First, the null rate is 40.61% (22,165 rows), about two-fifths of the data; the 32,410 populated rows equal the aviation_accident count, so the nulls are probably structural rather than missing at random. Second, the duplicate rate of 18.46% (5,983 duplicates across 26,427 unique values) indicates that some events appear on more than one row, implying a one-to-many relationship between events and records. All values are exactly 14 characters and fully uppercase, confirming a tightly controlled format with no malformed entries.
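The identifier layout described above can be checked with a small parser. This is a sketch under assumptions: the pattern (eight digits, one uppercase letter, five digits) is inferred only from the ten sample values shown in this report, the meaning of the date prefix is not documented here, and `parse_event_id` and `EventId` are hypothetical names.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Pattern inferred from samples such as '20010627X01274' (an assumption).
_EVENT_ID = re.compile(r"(\d{4})(\d{2})(\d{2})([A-Z]\d{5})")


class EventId(NamedTuple):
    prefix_date: date  # date encoded in the first eight characters
    code: str          # remaining six characters, e.g. 'X01274'


def parse_event_id(raw: str) -> Optional[EventId]:
    """Split an event ID into its date prefix and case code, or return None."""
    match = _EVENT_ID.fullmatch(raw)
    if match is None:
        return None
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        prefix = date(year, month, day)
    except ValueError:  # prefix is not a valid calendar date
        return None
    return EventId(prefix, match.group(4))
```

Counting the None results over the full column would show whether the format is as uniform as the length statistics suggest.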

aircraft_type high anthropic:default

This column contains aircraft make-and-model designations (e.g., 'Cessna 172', 'Piper PA-28-140') from what appears to be an aviation incident or registration dataset. Two points matter. First, 40.6% of rows are null; the 32,410 populated rows equal the aviation_accident count, so the column likely applies only to aviation records. Second, case inconsistency is severe: 'CESSNA 172' (360 occurrences) and 'Cessna 172' (189 occurrences) are counted as distinct values despite being the same aircraft, and ~49.5% of values are in all-caps. This inflates n_unique (9,478) and correspondingly understates the duplicate rate (70.8%). Separator usage also varies in the samples ('PIPER PA_22' vs 'Piper PA-28-140'). The top words confirm a dataset heavy in general aviation (GA), dominated by Cessna, Piper, and Beech.
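A light normalization pass would merge the case variants before counting. The sketch below upper-cases values and treats underscores as spaces; the underscore rule is an assumption based on samples such as 'McDonnell_Douglas MD_83', and `normalize_aircraft` is a hypothetical helper.

```python
import re
from collections import Counter


def normalize_aircraft(raw: str) -> str:
    """Upper-case the value, turn underscores into spaces, collapse whitespace."""
    text = raw.replace("_", " ")  # assumption: '_' stands in for a space
    text = re.sub(r"\s+", " ", text).strip()
    return text.upper()


# Illustrative values taken from the narrative and sample list above.
values = ["CESSNA 172", "Cessna 172", "PIPER PA_22", "Piper PA-28-140"]
counts = Counter(normalize_aircraft(v) for v in values)
```

Here the two Cessna spellings collapse into one key. Hyphen variants (for example 'PA 22' vs 'PA-22') would still need a model-specific rule, which this sketch does not attempt.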

cargo high anthropic:default

This column records the type of cargo carried by vessels or vehicles, with 17 distinct values (one of them the empty string) including 'human', 'timber', 'coal', 'fertilizer', and 'fish'. It is overwhelmingly sparse: 93.31% of rows are null, and among the 3,653 non-null rows the top value is an empty string (3,632 occurrences), leaving only 21 genuinely populated rows, with no value occurring more than 4 times. The entropy ratio of 0.018 confirms near-total concentration on the empty string. The German-language entry 'Fischkutter (Stahl)' ("fishing cutter (steel)") describes a vessel rather than a cargo, which signals both a language mix and some field misuse in the rare populated records.

name high anthropic:default

This column contains descriptive incident or event names, predominantly aviation accidents and natural disaster events (floods, tornadoes). The duplicate rate is strikingly high at 62.3% (33,988 duplicates, with only 20,587 unique values among 54,575 rows), largely driven by generic labels like 'Unnamed Wreck' (2,184 occurrences) and repeated aircraft model patterns (e.g., 'Aviation Accident - CESSNA 172' variants). English dominates the sampled strings (4,726 of the 4,964 strings with a detected language, or 95.2%), but 16 other languages appear in the language table (French: 60, German: 58, Spanish: 46, Japanese: 32), indicating a multilingual dataset that may require language-aware processing.
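The duplicate figures reported for the text columns are consistent with a simple definition: duplicates are non-null rows minus unique values, and the rate is that count divided by non-null rows. This definition is inferred from the reported numbers, not taken from saturn's documentation, and `duplicate_stats` is a hypothetical helper.

```python
from typing import Tuple


def duplicate_stats(rows: int, nulls: int, unique: int) -> Tuple[int, float]:
    """Return (n_duplicates, duplicate_rate) under the inferred definition."""
    non_null = rows - nulls
    n_duplicates = non_null - unique
    rate = n_duplicates / non_null if non_null else 0.0
    return n_duplicates, rate


# Counts reported for the name and damage columns.
name_dupes, name_rate = duplicate_stats(54_575, 0, 20_587)
damage_dupes, damage_rate = duplicate_stats(54_575, 39_805, 1_014)
```

For name this reproduces the 33,988 duplicates and the 0.623 rate; for damage it gives 13,756 and 0.931, which shows the rate is computed over non-null rows.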

latitude high anthropic:default

This column contains geographic latitude values ranging from -77.42 to 82.17, consistent with global coordinate data. The median of 38.38 and IQR of 9.12 show that the bulk of records cluster around mid-latitude Northern Hemisphere locations (roughly US/Europe), while the negative minimum indicates some Southern Hemisphere entries. The negative skew of -2.51, the high kurtosis of 15.97, and the 4,302 outliers (7.88% of rows) point to heavy tails on both sides of that cluster: with 1.5 × IQR fences at about 19.97 and 56.45, the flagged rows include tropical and Southern Hemisphere points as well as high-latitude points (the histogram bins from 58.23 to 66.21 alone hold 2,458 rows). These are plausible coordinates for a global dataset, so geographic subsetting is more appropriate than treating them as errors.

longitude high anthropic:default

This column contains geographic longitude values, spanning from -179.28° to +178.83°, consistent with worldwide coordinates. The mean (-92.97°) and median (-92.81°) are tightly clustered in the central United States, suggesting the bulk of records are North American, yet 4,320 outliers (7.9% of rows) and an extreme kurtosis of 15.13 indicate a heavy-tailed distribution with a substantial minority of globally dispersed points. The positive skew of 2.84 confirms an asymmetric pull toward higher (less-negative or positive) longitude values, i.e., non-US locations.
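The outlier counts for latitude and longitude come from a "beyond 1.5 IQR" rule. Assuming this means the usual Tukey fences computed from Q1 and Q3 (the report does not spell out the formula), the thresholds can be reproduced from the reported quartiles. `tukey_fences` is a hypothetical helper.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported in the latitude and longitude sections.
lat_low, lat_high = tukey_fences(33.654, 42.774)
lon_low, lon_high = tukey_fences(-112.042, -82.177)
```

With these inputs the latitude fences fall at about 19.97 and 56.45, and the longitude fences at about -156.84 and -37.38, so most flagged rows are simply locations outside the contiguous US band.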

vessel_type high anthropic:default

This column categorizes the type of vessel involved in an incident or record, with 23 distinct values including 'ship', 'submarine', 'aircraft', and oddly 'car'. Two major data quality issues stand out. The null rate is extreme at 93.31%, meaning only 3,653 of 54,575 rows carry any value. The top recorded value is an empty string (3,311 occurrences, the source of the 90.6% top_rate), so only 342 rows hold a real label and the true fill rate is far lower than the null rate implies. The long-tail alert is consistent with rare values like 'schooner' (2), 'sailboat' (2), and 'steamer' (1); labels also overlap ('aircraft', 'plane', 'airplane', 'Airplane'), and 'car' appearing as a vessel type signals potential data entry errors or schema misuse.
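The gap between the null rate and the real fill rate of a sparse column such as vessel_type can be checked by counting empty strings as missing. A minimal sketch follows; `fill_rate` is a hypothetical helper, not a saturn function.

```python
from typing import Iterable, Optional


def fill_rate(values: Iterable[Optional[str]]) -> float:
    """Share of values that are neither None nor blank."""
    items = list(values)
    if not items:
        return 0.0
    filled = sum(1 for v in items if v is not None and v.strip() != "")
    return filled / len(items)


# Same arithmetic using the counts reported for vessel_type.
rows, nulls, empty = 54_575, 50_922, 3_311
populated = rows - nulls - empty
effective_fill = populated / rows
```

With the reported counts this gives 342 populated rows, about 0.6% of the dataset, compared with the 6.7% implied by the null rate alone.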

fatalities high anthropic:default

This column represents a fatality count per incident, stored as a categorical/string type despite being numeric in nature. The null rate is severe at 72.94%, meaning nearly three-quarters of records have no value recorded; this is the primary alert. Among the 14,770 non-null rows, the distribution is heavily right-skewed: '0' accounts for 69.1% and '1' for a further 21.7%, and counts drop sharply across the 49 distinct values, indicating that rare but high-fatality events exist in the tail.

injuries high anthropic:default

This column represents an injury count per incident, stored as a categorical type despite containing integer values (0, 1, 2, …). The dominant concern is an extreme null rate of 72.94%, meaning nearly three-quarters of rows carry no injury data at all. Among non-null rows, the value '0' accounts for 68.14%, indicating that most recorded incidents involved no injuries. The remaining counts spread over a long tail of 178 distinct values, which suggests occasional high-casualty outliers.

state high anthropic:default

This column contains US state names (full uppercase spellings), acting as a geographic feature for records in the dataset. The critical issue is a 72.94% null rate, meaning nearly three-quarters of all 54,575 rows carry no state value — this is a severe missingness alert. Among non-null values, cardinality is 65 (slightly above 50 US states, suggesting territories or data anomalies), and distribution is moderately spread (entropy ratio 0.86) with Texas as the dominant value at 9.82% of non-null records.

magnitude high anthropic:default

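This column is labelled as a magnitude and is stored as a categorical/string type despite being a numeric measurement with 291 distinct values (e.g., 4.5, 4.6, 4.7). Three signals demand attention. First, the null rate is extremely high at 80.09%, meaning only 10,864 of 54,575 rows carry a value. Second, the dominant value '0' accounts for 35.56% of non-null records (3,863 occurrences), which is likely a sentinel or placeholder rather than a true zero magnitude. Third, the column does not appear to hold earthquake magnitudes alone: there are more non-null values than the 3,742 earthquake records, and frequent values such as '70.00', '61.00', and '1.75' fall outside the range in which the seismic values cluster (4.5–5.1). These may be storm measures such as wind speed or hail size, which is an inference from the values and not something the profiler measured. Number formatting is also mixed ('5' vs '2.00').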
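One way to test whether the magnitude column mixes scales is to cast it to floats and compare the value range per subcategory. The sketch below assumes each record is a mapping with 'subcategory' and 'magnitude' keys, as in the column list of this report. The toy rows pair values with subcategories purely for illustration; the report does not show which subcategory each magnitude belongs to.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple


def magnitude_ranges(
    records: Iterable[Mapping[str, Optional[str]]]
) -> Dict[str, Tuple[float, float]]:
    """Return {subcategory: (min, max)} over parseable magnitude strings."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for rec in records:
        raw = rec.get("magnitude")
        if raw is None or raw.strip() == "":
            continue  # nulls and blanks carry no magnitude
        try:
            value = float(raw)
        except ValueError:
            continue  # skip anything that is not a number
        key = rec.get("subcategory") or "unknown"
        low, high = ranges.get(key, (value, value))
        ranges[key] = (min(low, value), max(high, value))
    return ranges


# Hypothetical rows: the value-to-subcategory pairing is made up for the example.
toy_rows = [
    {"subcategory": "seismic", "magnitude": "4.5"},
    {"subcategory": "seismic", "magnitude": "5.1"},
    {"subcategory": "Hail", "magnitude": "1.75"},
    {"subcategory": "Thunderstorm Wind", "magnitude": "61.00"},
    {"subcategory": "aviation", "magnitude": None},
]
toy_ranges = magnitude_ranges(toy_rows)
```

If the real data shows clearly separated ranges per subcategory, the column should be split or given a unit field before any numeric analysis.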

depth_km low anthropic:default

This column represents earthquake or geological event depth in kilometres, a continuous numeric feature. The profiler skipped analysis entirely, so no distribution statistics, uniqueness counts, or range information are available. With 54,575 rows and a null rate of 0.0, the data is fully populated, but nothing can be said about skew, outliers, or value range from this evidence alone. An analyst should inspect the column directly before modelling.

category high anthropic:default

This column is a disaster/incident type label with exactly 4 categories: aviation_accident, storm, earthquake, and shipwreck. The distribution is notably skewed: aviation_accident dominates at 59.4% of all 54,575 rows (32,410 records), storm follows at 27.1% (14,770), and earthquake and shipwreck are underrepresented at 6.9% and 6.7% respectively. The entropy ratio of 0.74 confirms meaningful but unbalanced spread across classes, which could bias classifiers trained on this target without resampling.
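The entropy figures quoted for categorical columns can be reproduced from the value counts. The sketch assumes Shannon entropy in bits, with the ratio taken against log2 of the number of distinct values; this matches the reported numbers but is not confirmed by saturn's documentation. `entropy_ratio` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def entropy_ratio(counts: Sequence[int]) -> Tuple[float, float]:
    """Return (entropy in bits, entropy / log2(number of non-empty classes))."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(probs)) if len(probs) > 1 else 0.0
    ratio = entropy / max_entropy if max_entropy else 0.0
    return entropy, ratio


# Counts reported for category: aviation_accident, storm, earthquake, shipwreck.
category_entropy, category_ratio = entropy_ratio([32_410, 14_770, 3_742, 3_653])
```

For the category counts this agrees with the reported entropy of 1.483 and ratio of 0.741 to within rounding.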

subcategory high anthropic:default

This column is a categorical event subcategory, most likely classifying incident or hazard reports across domains such as aviation, geophysical (seismic), meteorological (Tornado, Flash Flood, Thunderstorm Wind), and maritime events. 'aviation' dominates heavily at 59.4% of all 54,575 rows, creating pronounced class imbalance. A subtle data quality issue is present: some values use title case ('Tornado', 'Flash Flood', 'Thunderstorm Wind', 'Hail') while others are fully lowercase ('aviation', 'seismic', 'maritime'), suggesting records were ingested from at least two inconsistently formatted sources. Entropy ratio of 0.49 confirms the distribution is far from uniform.

Numeric correlation

Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
           latitude  longitude
latitude   +1.00     -0.45
longitude  -0.45     +1.00

Languages detected

Per-string language detection across text columns (sampled).

Per-language counts (total 4,964 detected strings).
lang  count  share
en    4726   95.2%
fr    60     1.2%
de    58     1.2%
es    46     0.9%
ja    32     0.6%
it    13     0.3%
ru    7      0.1%
zh    6      0.1%
eu    3      0.1%
pt    3      0.1%
id    3      0.1%
pl    2      0.0%
sr    1      0.0%
sv    1      0.0%
ht    1      0.0%
uk    1      0.0%
lv    1      0.0%

category categorical

rows: 54,575
null: 0 (0.0%)
unique: 4
top_value: aviation_accident
top_rate: 0.594
cardinality: 4
entropy: 1.483
entropy_ratio: 0.741
Top values for category (4 unique shown, of 4 total).
value              count  share
aviation_accident  32410  59.4%
storm              14770  27.1%
earthquake         3742   6.9%
shipwreck          3653   6.7%
Top values (rank 1–20)
  1. aviation_accident — 32,410
  2. storm — 14,770
  3. earthquake — 3,742
  4. shipwreck — 3,653

latitude numeric

skew=-2.51 | 7.9% rows beyond 1.5 IQR
rows: 54,575
null: 0 (0.0%)
unique: 32,209
min: -77.425
max: 82.171
mean: 38.159
median: 38.377
std: 11.958
q1: 33.654
q3: 42.774
iqr: 9.120
skew: -2.510
kurtosis: 15.966
n_outliers: 4,302
outlier_rate: 0.079
zero_rate: 0.000
Histogram bins for latitude (median: 38.376667).
bin              count
-77.42 – -73.44  1
-73.44 – -69.45  0
-69.45 – -65.46  0
-65.46 – -61.47  1
-61.47 – -57.48  1
-57.48 – -53.49  5
-53.49 – -49.5   30
-49.5 – -45.51   22
-45.51 – -41.52  37
-41.52 – -37.53  79
-37.53 – -33.54  176
-33.54 – -29.55  103
-29.55 – -25.56  35
-25.56 – -21.57  108
-21.57 – -17.58  56
-17.58 – -13.59  20
-13.59 – -9.597  23
-9.597 – -5.607  41
-5.607 – -1.617  66
-1.617 – 2.373   64
2.373 – 6.363    28
6.363 – 10.35    171
10.35 – 14.34    70
14.34 – 18.33    112
18.33 – 22.32    346
22.32 – 26.31    895
26.31 – 30.3     4049
30.3 – 34.29     9547
34.29 – 38.28    10928
38.28 – 42.27    12642
42.27 – 46.26    7278
46.26 – 50.25    2535
50.25 – 54.24    1308
54.24 – 58.23    1095
58.23 – 62.22    1761
62.22 – 66.21    697
66.21 – 70.2     220
70.2 – 74.19     22
74.19 – 78.18    2
78.18 – 82.17    1

longitude numeric

skew=+2.84 | 7.9% rows beyond 1.5 IQR
rows: 54,575
null: 0 (0.0%)
unique: 34,804
min: -179.283
max: 178.828
mean: -92.973
median: -92.813
std: 39.505
q1: -112.042
q3: -82.177
iqr: 29.865
skew: 2.843
kurtosis: 15.128
n_outliers: 4,320
outlier_rate: 0.079
zero_rate: 0.000
Histogram bins for longitude (median: -92.8126).
bin               count
-179.3 – -170.3   49
-170.3 – -161.4   1005
-161.4 – -152.4   1182
-152.4 – -143.5   1679
-143.5 – -134.5   289
-134.5 – -125.6   833
-125.6 – -116.6   6128
-116.6 – -107.7   4912
-107.7 – -98.71   3964
-98.71 – -89.76   10929
-89.76 – -80.8    12439
-80.8 – -71.85    7045
-71.85 – -62.9    1013
-62.9 – -53.94    178
-53.94 – -44.99   139
-44.99 – -36.04   143
-36.04 – -27.09   40
-27.09 – -18.13   15
-18.13 – -9.181   39
-9.181 – -0.2277  348
-0.2277 – 8.725   275
8.725 – 17.68     834
17.68 – 26.63     267
26.63 – 35.58     121
35.58 – 44.54     36
44.54 – 53.49     80
53.49 – 62.44     53
62.44 – 71.39     2
71.39 – 80.35     20
80.35 – 89.3      2
89.3 – 98.25      4
98.25 – 107.2     19
107.2 – 116.2     26
116.2 – 125.1     37
125.1 – 134.1     18
134.1 – 143       41
143 – 152         59
152 – 160.9       22
160.9 – 169.9     140
169.9 – 178.8     150

name text

18 languages detected in sample | 62.3% duplicate strings
rows: 54,575
null: 0 (0.0%)
unique: 20,587
len_min: 2
len_max: 153
len_mean: 32.381
len_median: 31.000
len_p95: 48.000
word_mean: 4.782
word_median: 5.000
n_empty: 0
n_duplicates: 33,988
duplicate_rate: 0.623
vocab_size: 8,062
readability_flesch_mean: 8.744
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 7.97e-03
allcaps_rate: 1.26e-03
boilerplate_rate: 0.000
Character-length distribution for name (mean: 32.38147503435639).
chars      count
2 – 6      129
6 – 10     357
10 – 13    2564
13 – 17    297
17 – 21    296
21 – 25    2319
25 – 28    6611
28 – 32    20097
32 – 36    7724
36 – 40    5480
40 – 44    3498
44 – 47    2132
47 – 51    1396
51 – 55    827
55 – 59    490
59 – 62    191
62 – 66    82
66 – 70    19
70 – 74    19
74 – 78    14
78 – 81    5
81 – 85    8
85 – 89    2
89 – 93    5
93 – 96    1
96 – 100   8
100 – 104  0
104 – 108  0
108 – 111  0
111 – 115  0
115 – 119  1
119 – 123  0
123 – 127  0
127 – 130  0
130 – 134  2
134 – 138  0
138 – 142  0
142 – 145  0
145 – 149  0
149 – 153  1
Sample values (first 10)
  1. Aviation Accident - Bombardier,_Inc. DHC-8-103
  2. Thunderstorm Wind in TEXAS, LUBBOCK
  3. Flash Flood in SOUTH CAROLINA, HORRY
  4. Flash Flood in ILLINOIS, JO DAVIESS
  5. Aviation Accident - Cessna 172
  6. 107 km NNE of Los Barriles, Mexico
  7. Aviation Accident - McDonnell_Douglas MD_83
  8. Thunderstorm Wind in PENNSYLVANIA, ELK
  9. Flood in VERMONT, CHITTENDEN
  10. Aviation Accident - SCHWEIZER 269C-1

date text

99.9% rows are a single word | 99.8% rows are all-caps | 95th-percentile length under 20 chars | 82.0% duplicate strings
rows: 54,575
null: 3,134 (5.7%)
unique: 9,264
len_min: 2
len_max: 10
len_mean: 9.980
len_median: 10.000
len_p95: 10.000
word_mean: 1.002
word_median: 1.000
n_empty: 0
n_duplicates: 42,177
duplicate_rate: 0.820
vocab_size: 4,710
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.999
allcaps_rate: 0.998
boilerplate_rate: 0.000
Character-length distribution for date (mean: 9.979529946929492).
chars  count  (empty bins omitted)
2      1
4      151
5      13
6      1
7      19
8      5
9      3
10     51248
Sample values (first 10)
  1. 2017-01-01
  2. 2005-04-02
  3. 2017-04-29
  4. 2012-06-29
  5. 2015-01-01
  6. 1278558949
  7. 2011-01-01
  8. 2021-12-10
  9. 2012-03-02
  10. 2011-01-01

subcategory categorical

rows: 54,575
null: 0 (0.0%)
unique: 20
top_value: aviation
top_rate: 0.594
cardinality: 20
entropy: 2.115
entropy_ratio: 0.489
Top values for subcategory (20 unique shown, of 20 total).
value                     count  share
aviation                  32410  59.4%
Tornado                   6334   11.6%
seismic                   3742   6.9%
maritime                  3653   6.7%
Flash Flood               2358   4.3%
Thunderstorm Wind         2257   4.1%
Flood                     1777   3.3%
Hail                      1246   2.3%
Lightning                 574    1.1%
Heavy Rain                99     0.2%
Marine Strong Wind        43     0.1%
Debris Flow               43     0.1%
Marine Thunderstorm Wind  25     0.0%
Marine High Wind          5      0.0%
Dust Devil                3      0.0%
Waterspout                2      0.0%
Tropical Storm            1      0.0%
High Wind                 1      0.0%
Heat                      1      0.0%
Marine Lightning          1      0.0%
Top values (rank 1–20)
  1. aviation — 32,410
  2. Tornado — 6,334
  3. seismic — 3,742
  4. maritime — 3,653
  5. Flash Flood — 2,358
  6. Thunderstorm Wind — 2,257
  7. Flood — 1,777
  8. Hail — 1,246
  9. Lightning — 574
  10. Heavy Rain — 99
  11. Marine Strong Wind — 43
  12. Debris Flow — 43
  13. Marine Thunderstorm Wind — 25
  14. Marine High Wind — 5
  15. Dust Devil — 3
  16. Waterspout — 2
  17. Tropical Storm — 1
  18. High Wind — 1
  19. Heat — 1
  20. Marine Lightning — 1

magnitude categorical

80.1% null
rows: 54,575
null: 43,711 (80.1%)
unique: 291
top_value: 0
top_rate: 0.356
cardinality: 291
entropy: 4.732
entropy_ratio: 0.578
Top values for magnitude (20 unique shown, of 291 total).
value  count  share
0      3863   7.1%
4.5    686    1.3%
4.6    558    1.0%
4.7    415    0.8%
1.75   383    0.7%
4.8    317    0.6%
4.9    261    0.5%
5      238    0.4%
2.75   220    0.4%
5.1    202    0.4%
5.2    167    0.3%
70.00  162    0.3%
50.00  151    0.3%
2.00   150    0.3%
5.3    126    0.2%
2.50   123    0.2%
61.00  122    0.2%
65.00  104    0.2%
52.00  95     0.2%
5.4    95     0.2%
Top values (rank 1–20)
  1. 0 — 3,863
  2. 4.5 — 686
  3. 4.6 — 558
  4. 4.7 — 415
  5. 1.75 — 383
  6. 4.8 — 317
  7. 4.9 — 261
  8. 5 — 238
  9. 2.75 — 220
  10. 5.1 — 202
  11. 5.2 — 167
  12. 70.00 — 162
  13. 50.00 — 151
  14. 2.00 — 150
  15. 5.3 — 126
  16. 2.50 — 123
  17. 61.00 — 122
  18. 65.00 — 104
  19. 52.00 — 95
  20. 5.4 — 95

fatalities categorical

72.9% null
rows: 54,575
null: 39,805 (72.9%)
unique: 49
top_value: 0
top_rate: 0.691
cardinality: 49
entropy: 1.423
entropy_ratio: 0.254
Top values for fatalities (20 unique shown, of 49 total).
value  count  share
0      10209  18.7%
1      3208   5.9%
2      649    1.2%
3      222    0.4%
4      112    0.2%
5      74     0.1%
6      66     0.1%
7      38     0.1%
9      25     0.0%
10     24     0.0%
8      21     0.0%
11     20     0.0%
13     11     0.0%
16     10     0.0%
12     9      0.0%
14     8      0.0%
17     6      0.0%
20     6      0.0%
25     4      0.0%
23     3      0.0%
Top values (rank 1–20)
  1. 0 — 10,209
  2. 1 — 3,208
  3. 2 — 649
  4. 3 — 222
  5. 4 — 112
  6. 5 — 74
  7. 6 — 66
  8. 7 — 38
  9. 9 — 25
  10. 10 — 24
  11. 8 — 21
  12. 11 — 20
  13. 13 — 11
  14. 16 — 10
  15. 12 — 9
  16. 14 — 8
  17. 17 — 6
  18. 20 — 6
  19. 25 — 4
  20. 23 — 3

injuries categorical

72.9% null
rows: 54,575
null: 39,805 (72.9%)
unique: 178
top_value: 0
top_rate: 0.681
cardinality: 178
entropy: 2.468
entropy_ratio: 0.330
Top values for injuries (20 unique shown, of 178 total).
value  count  share
0      10064  18.4%
1      893    1.6%
2      552    1.0%
3      343    0.6%
4      236    0.4%
5      234    0.4%
10     219    0.4%
6      196    0.4%
12     158    0.3%
7      134    0.2%
8      121    0.2%
20     114    0.2%
15     111    0.2%
11     90     0.2%
9      85     0.2%
13     70     0.1%
14     69     0.1%
30     68     0.1%
25     56     0.1%
16     48     0.1%
Top values (rank 1–20)
  1. 0 — 10,064
  2. 1 — 893
  3. 2 — 552
  4. 3 — 343
  5. 4 — 236
  6. 5 — 234
  7. 10 — 219
  8. 6 — 196
  9. 12 — 158
  10. 7 — 134
  11. 8 — 121
  12. 20 — 114
  13. 15 — 111
  14. 11 — 90
  15. 9 — 85
  16. 13 — 70
  17. 14 — 69
  18. 30 — 68
  19. 25 — 56
  20. 16 — 48

damage text

100.0% rows are a single word | 87.2% rows are all-caps | 72.9% null | 95th-percentile length under 20 chars | 93.1% duplicate strings
rows: 54,575
null: 39,805 (72.9%)
unique: 1,014
len_min: 0
len_max: 8
len_mean: 4.381
len_median: 5.000
len_p95: 7.000
word_mean: 1.000
word_median: 1.000
n_empty: 368
n_duplicates: 13,756
duplicate_rate: 0.931
vocab_size: 1,013
readability_flesch_mean: 116.977
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.872
boilerplate_rate: 0.000
Character-length distribution for damage (mean: 4.380568720379147).
chars  count  (empty bins omitted)
0      368
1      264
2      1252
3      1172
4      3414
5      6075
6      1450
7      514
8      261
Sample values (first 10)
  1. 250K
  2. 40.00M
  3. 3.00M
  4. 20.00K
  5. 3.17M
  6. 15.00M
  7. 1.00M
  8. 0.00K
  9. 0.00K
  10. 25K

state categorical

72.9% null
rows: 54,575
null: 39,805 (72.9%)
unique: 65
top_value: TEXAS
top_rate: 0.098
cardinality: 65
entropy: 5.182
entropy_ratio: 0.861
Top values for state (20 unique shown, of 65 total).
value           count  share
TEXAS           1450   2.7%
MISSOURI        648    1.2%
ARKANSAS        602    1.1%
MISSISSIPPI     570    1.0%
GEORGIA         562    1.0%
ILLINOIS        560    1.0%
IOWA            527    1.0%
LOUISIANA       507    0.9%
TENNESSEE       499    0.9%
FLORIDA         498    0.9%
OKLAHOMA        490    0.9%
NEBRASKA        486    0.9%
ALABAMA         469    0.9%
WISCONSIN       463    0.8%
OHIO            441    0.8%
MICHIGAN        426    0.8%
NORTH CAROLINA  422    0.8%
KANSAS          418    0.8%
INDIANA         408    0.7%
KENTUCKY        383    0.7%
Top values (rank 1–20)
  1. TEXAS — 1,450
  2. MISSOURI — 648
  3. ARKANSAS — 602
  4. MISSISSIPPI — 570
  5. GEORGIA — 562
  6. ILLINOIS — 560
  7. IOWA — 527
  8. LOUISIANA — 507
  9. TENNESSEE — 499
  10. FLORIDA — 498
  11. OKLAHOMA — 490
  12. NEBRASKA — 486
  13. ALABAMA — 469
  14. WISCONSIN — 463
  15. OHIO — 441
  16. MICHIGAN — 426
  17. NORTH CAROLINA — 422
  18. KANSAS — 418
  19. INDIANA — 408
  20. KENTUCKY — 383

aircraft_type text

49.5% rows are all-caps | 40.6% null | 70.8% duplicate strings
rows: 54,575
null: 22,165 (40.6%)
unique: 9,478
len_min: 7
len_max: 50
len_mean: 15.851
len_median: 13.000
len_p95: 31.000
word_mean: 2.000
word_median: 2.000
n_empty: 0
n_duplicates: 22,932
duplicate_rate: 0.708
vocab_size: 7,261
readability_flesch_mean: 45.123
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.495
boilerplate_rate: 0.000
Character-length distribution for aircraft_type (mean: 15.850848503548288).
chars    count
7 – 8    592
8 – 9    1240
9 – 10   3852
10 – 11  7354
11 – 12  2473
12 – 13  1129
13 – 15  1380
15 – 16  3032
16 – 17  1131
17 – 18  853
18 – 19  1024
19 – 20  779
20 – 21  670
21 – 22  1117
22 – 23  947
23 – 24  528
24 – 25  507
25 – 26  403
26 – 27  455
27 – 28  447
28 – 30  312
30 – 31  352
31 – 32  258
32 – 33  282
33 – 34  261
34 – 35  276
35 – 36  283
36 – 37  145
37 – 38  56
38 – 39  46
39 – 40  35
40 – 41  48
41 – 42  51
42 – 44  38
44 – 45  19
45 – 46  5
46 – 47  7
47 – 48  6
48 – 49  2
49 – 50  15
Sample values (first 10)
  1. Piper PA-28-140
  2. Robinson R-44
  3. Cessna 180H
  4. WACO QCF-2
  5. PIPER PA_22
  6. PIPER PA_18-150
  7. Piper PA-38-112
  8. Cessna 182K
  9. Rotary_Air_Force_Marketing RAF_2000
  10. Bell UH-1B

event_id text

100.0% rows are a single word | 100.0% rows are all-caps | 40.6% null | 95th-percentile length under 20 chars
rows: 54,575
null: 22,165 (40.6%)
unique: 26,427
len_min: 14
len_max: 14
len_mean: 14.000
len_median: 14.000
len_p95: 14.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 5,983
duplicate_rate: 0.185
vocab_size: 17,535
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Character-length distribution for event_id (mean: 14.0).
chars  count  (empty bins omitted)
14     32410
Sample values (first 10)
  1. 20010627X01274
  2. 20060814X01163
  3. 20040930X01545
  4. 20091006X62448
  5. 20160516X70808
  6. 20150608X04410
  7. 20001212X20849
  8. 20050216X00203
  9. 20070322X00321
  10. 20020717X01150

vessel_type categorical

14 singleton categories | 93.3% null
rows: 54,575
null: 50,922 (93.3%)
unique: 23
top_value: (empty string)
top_rate: 0.906
cardinality: 23
entropy: 0.576
entropy_ratio: 0.127
Top values for vessel_type (20 unique shown, of 23 total).
value           count  share
(empty string)  3311   6.1%
ship            275    0.5%
submarine       18     0.0%
aircraft        16     0.0%
plane           10     0.0%
boat            3      0.0%
schooner        2      0.0%
car             2      0.0%
sailboat        2      0.0%
steamer         1      0.0%
airplane        1      0.0%
freightcar      1      0.0%
train           1      0.0%
paddle steamer  1      0.0%
vehicle         1      0.0%
motorbike       1      0.0%
helicopter      1      0.0%
Steam hoist     1      0.0%
tractor         1      0.0%
Airplane        1      0.0%
Top values (rank 1–20)
  1. — 3,311
  2. ship — 275
  3. submarine — 18
  4. aircraft — 16
  5. plane — 10
  6. boat — 3
  7. schooner — 2
  8. car — 2
  9. sailboat — 2
  10. steamer — 1
  11. airplane — 1
  12. freightcar — 1
  13. train — 1
  14. paddle steamer — 1
  15. vehicle — 1
  16. motorbike — 1
  17. helicopter — 1
  18. Steam hoist — 1
  19. tractor — 1
  20. Airplane — 1

cargo categorical

13 singleton categories | 93.3% null | top value is 99.4% of non-null rows
rows: 54,575
null: 50,922 (93.3%)
unique: 17
top_value: (empty string)
top_rate: 0.994
cardinality: 17
entropy: 0.073
entropy_ratio: 0.018
Top values for cargo (17 unique shown, of 17 total).
value                                                              count  share
(empty string)                                                     3632   6.7%
human                                                              4      0.0%
timber                                                             2      0.0%
coal                                                               2      0.0%
fertilizer                                                         1      0.0%
ore pellets                                                        1      0.0%
Fischkutter (Stahl)                                                1      0.0%
seafood                                                            1      0.0%
fish                                                               1      0.0%
passengers                                                         1      0.0%
mexican army supposed drugs, but the crew and cargo was not found  1      0.0%
iron ore                                                           1      0.0%
pulp                                                               1      0.0%
18 mines, 6 torpedos                                               1      0.0%
sugar                                                              1      0.0%
containers;vehicles                                                1      0.0%
container;oil                                                      1      0.0%
Top values (rank 1–20)
  1. — 3,632
  2. human — 4
  3. timber — 2
  4. coal — 2
  5. fertilizer — 1
  6. ore pellets — 1
  7. Fischkutter (Stahl) — 1
  8. seafood — 1
  9. fish — 1
  10. passengers — 1
  11. mexican army supposed drugs, but the crew and cargo was not found — 1
  12. iron ore — 1
  13. pulp — 1
  14. 18 mines, 6 torpedos — 1
  15. sugar — 1
  16. containers;vehicles — 1
  17. container;oil — 1

depth_km unknown

no profiler for kind=unknown
rows: 54,575
null: 0 (0.0%)