saturn

/home/coolhand/html/datavis/data_trove/data/wild/disasters/airplane_crashes.csv 5,268 rows sample n=5,268 seed 42 2026-06-21T23:24:17+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/wild/disasters/airplane_crashes.csv
Total rows: 5,268
Profiled sample: 5,268
Columns: 13
Generated: 2026-06-21T23:24:17+00:00
Per-column null rate across the corpus.
column        kind         null %
Date          text         0.0%
Time          text         42.1%
Location      text         0.4%
Operator      text         0.3%
Flight #      categorical  79.7%
Route         text         32.4%
Type          text         0.5%
Registration  text         6.4%
cn/In         text         23.3%
Aboard        numeric      0.4%
Fatalities    numeric      0.2%
Ground        numeric      0.4%
Summary       text         7.4%
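
The null percentages above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not saturn's implementation: the helper name `null_rates`, the rule that blank or whitespace-only cells count as null, and the inline sample rows are all assumptions made for illustration.

```python
import csv
import io


def null_rates(rows, columns):
    """Return {column: percentage of rows whose cell is missing or blank}.

    Assumption: empty and whitespace-only cells are treated as null.
    """
    rows = list(rows)
    rates = {}
    for col in columns:
        missing = sum(1 for r in rows if not (r.get(col) or "").strip())
        rates[col] = 100.0 * missing / len(rows) if rows else 0.0
    return rates


# Tiny made-up sample, not rows from the profiled file.
SAMPLE = (
    "Date,Time,Flight #\n"
    "01/01/2000,10:30,\n"
    "01/02/2000,,\n"
    "01/03/2000,,101\n"
    "01/04/2000,09:15,\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE))
rates = null_rates(reader, ["Date", "Time", "Flight #"])
print(rates)
```

On the real file the same helper could be fed `csv.DictReader(open(path, newline=""))`; whether saturn also treats placeholder values such as '-' as null is not stated in the report.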

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset catalogues 5,268 aviation accidents spanning roughly a century (the sampled dates alone run from 1915 to 2006), recording date, operator, aircraft type, location, people aboard, fatalities, and ground casualties. Two numeric columns stand out immediately: Fatalities (mean 20, median 9, max 583) and Aboard (mean 28, median 13, max 644) are both strongly right-skewed, so a small number of mass-casualty events dominates the tail. In the Operator column, Aeroflot (179 incidents) and the U.S. Air Force (176) are the most frequent values; together they account for about 6.7% of rows, which is worth examining for era-specific clustering. Ground fatalities are exactly zero in 95.8% of cases but spike in rare events (max 2,750), likely reflecting crashes into populated areas.

Time high anthropic:default

This column contains clock times, mostly in HH:MM format (median length 5; lengths range from 4 to 7 characters, so a few values deviate from the five-character pattern), most plausibly the recorded time of each accident. Two signals warrant attention: the null rate is high at 42.12% (2,219 of 5,268 rows carry no time value), and the duplicate rate is 67.04%, which is expected for a time-of-day field with only 1,005 distinct values across 3,049 non-null rows. The 'allcaps' alert is a false positive: digit-and-colon strings contain no lowercase letters, so saturn counts them as all-caps.
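
Because the lengths range from 4 to 7 characters, some values cannot be plain five-character HH:MM strings. A tolerant parser is one way to handle that. The sketch below is an assumption-laden illustration: the function name `parse_time` is hypothetical, and the non-standard inputs in the usage lines ('9:40', 'c 14:30') are invented examples, not values observed in the file.

```python
import re

# First H:MM or HH:MM pattern anywhere in the string.
_TIME = re.compile(r"(\d{1,2}):(\d{2})")


def parse_time(value):
    """Return (hour, minute) for the first clock time found, else None."""
    if not value:
        return None
    match = _TIME.search(value)
    if not match:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    # Reject values that look like times but are out of range.
    if hour > 23 or minute > 59:
        return None
    return hour, minute


print(parse_time("10:30"))    # standard form from the sample values
print(parse_time("9:40"))     # hypothetical 4-character form
print(parse_time("c 14:30"))  # hypothetical 7-character form
```

Values that return `None` are the ones worth listing and inspecting by hand before any time-of-day analysis.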

cn/In medium anthropic:default

This column ('cn/In') holds short identifiers: 98.4% of values are a single word, 96.6% are flagged all-caps, the median length is 5, and the maximum is 20. In an aviation accident dataset the name most plausibly abbreviates the aircraft's construction (serial) number and line number ('c/n / l/n'), although nothing in the profile itself confirms this. The top word '/' appears 49 times, consistent with compound values such as '24805/1878', while most top values are pure numeric strings ('178', '19', '229', etc.). Two signals warrant attention: the null rate is high at 23.3%, and the 4,040 non-null rows hold 3,707 unique values and 333 duplicates, so the field is not a strict unique identifier.
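
Slash-delimited values such as '24805/1878' can be separated before comparing identifiers. The helper below is a minimal sketch; the name `split_compound` is hypothetical, and reading the two halves as a construction number and a line number is an interpretation, not something saturn reports.

```python
def split_compound(value):
    """Split 'A/B' into ('A', 'B'); return (value, None) when there is no slash."""
    head, sep, tail = value.strip().partition("/")
    if not sep:
        return head, None
    # An empty second half (e.g. 'A/') is reported as None.
    return head.strip(), tail.strip() or None


for raw in ["24805/1878", "10670", "31-033B"]:
    print(raw, "->", split_compound(raw))
```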

Registration high anthropic:default

This column contains aircraft registration codes: short, almost entirely uppercase alphanumeric identifiers (allcaps_rate 99.2%, median length 6 characters) of the kind used as tail numbers. With 4,905 unique values among 4,933 non-null rows and only 28 duplicates, it behaves as a near-unique identifier, though the 6.36% null rate and occasional slash-containing entries (top word '/' appears 36 times) suggest some composite or malformed registrations worth inspecting. Tokens such as 'HK-' (the Colombian registration prefix) and 'NC10809' point to international aircraft registrations rather than road vehicle plates.

Date high anthropic:default

This column contains dates stored as text strings in MM/DD/YYYY format, with every value exactly 10 characters long and zero nulls across 5,268 rows. The duplicate rate of about 9.8% (515 duplicates against 4,753 unique values) simply means that several accidents share a calendar date: the most frequent dates appear up to 4 times and include 09/11/2001 and 06/06/1944 (the September 11 attacks and D-Day), days on which multiple aircraft were lost. The 'allcaps' alert is a false positive, because the values contain no letters at all.
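
Since the dates are stored as text, they need parsing before any chronological analysis, and counting parsed dates shows which days carry several records. The following is a minimal sketch using only the standard library; the function names are hypothetical and the four-item sample list is made up for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_date(value):
    """Parse an MM/DD/YYYY string into a datetime.date."""
    return datetime.strptime(value, "%m/%d/%Y").date()


def busiest_dates(values, top=3):
    """Return the `top` most frequent dates as (date, count) pairs."""
    return Counter(parse_date(v) for v in values).most_common(top)


# Illustrative input only; the counts here are not those of the real file.
sample = ["09/03/1915", "09/11/2001", "09/11/2001", "06/29/1929"]
top = busiest_dates(sample, top=1)
print(top[0][0].isoformat(), top[0][1])
```

`strptime` raises `ValueError` on a malformed string, which doubles as a format check for the whole column.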

Operator high anthropic:default

This column contains the name of the airline or military branch operating the aircraft, making it a categorical label field. With 2,476 unique values across 5,250 non-null rows, the duplicate rate of 52.8% is expected for a label of this type, since operators recur across incidents. The multilingual alert is a natural artifact of international airline names (German, French, Italian, Spanish, and Russian operators are all present) rather than a data quality issue, though variant spellings of the same operator may inflate cardinality. The top values (Aeroflot at 179, U.S. Air Force at 176) show a mix of commercial and military operators.
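
Variant spellings and category prefixes can be collapsed before counting operators. The sketch below is one possible approach under stated assumptions: the prefix list is taken from the sample values shown in this report ('Military - ...', 'Air Taxi - ...') and may be incomplete, and both function names are hypothetical.

```python
# Assumption: these are the only category prefixes; extend after inspecting the data.
KNOWN_PREFIXES = {"Military", "Air Taxi"}


def split_operator(value):
    """Return (category, name), e.g. ('Military', 'German Navy')."""
    prefix, sep, rest = value.partition(" - ")
    if sep and prefix.strip() in KNOWN_PREFIXES:
        return prefix.strip(), rest.strip()
    return None, value.strip()


def operator_key(value):
    """Build a comparison key: lowercase, no periods, single spaces."""
    return " ".join(value.casefold().replace(".", "").split())


print(split_operator("Military - German Navy"))
print(split_operator("Eagle Air"))
print(operator_key("China  Air Lines."))
```

Grouping by `operator_key` rather than the raw string gives a lower bound on how much of the 2,476-value cardinality is due to formatting noise.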

Route high anthropic:default

This column holds route descriptions, covering both origin-destination pairs (e.g., 'Saigon - Paris', 'Bogota - Barranquilla') and flight-purpose labels (e.g., 'Training', 'Sightseeing', 'Test flight'). The null rate of 32.38% is a significant concern: roughly one-third of records lack route information. The multilingual alert is expected because routes consist largely of international place names; English dominates at 2,567 detections, with Spanish (237), Portuguese (100), German (88), and French (64) also represented. The high unique count (3,244 of 3,562 non-null values) and the low duplicate rate of 8.93% (318 duplicates) confirm a descriptive field with many distinct routes and a few recurring purpose labels.
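
Origin-destination strings can be separated from purpose labels by splitting on the ' - ' delimiter visible in the sample values. This is a minimal sketch; the function name `split_route` is hypothetical, and it assumes that ' - ' (with surrounding spaces) never occurs inside a place name.

```python
def split_route(value):
    """Return the list of stops for 'A - B - C', or None for non-route labels."""
    stops = [part.strip() for part in value.split(" - ")]
    stops = [part for part in stops if part]
    # Fewer than two stops means a purpose label such as 'Training'.
    if len(stops) < 2:
        return None
    return stops


print(split_route("Mexico City - Reynosa - Matamoros"))
print(split_route("Lympne, England - Rotterdam, The Netherlands"))
print(split_route("Training"))
```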

Fatalities high anthropic:default

This column records the number of fatalities per accident. The distribution is strongly right-skewed (skew = 4.95, kurtosis = 42.79): the median is only 9 fatalities while the mean is 20.07 and the maximum reaches 583, indicating a long tail of mass-casualty events. 444 rows (8.4%) are flagged as outliers, and the IQR of 20 (Q1 = 3, Q3 = 23) against a standard deviation of 33.2 confirms that most accidents are low-fatality while a meaningful minority are catastrophic.

Aboard high anthropic:default

This column records the number of people aboard the aircraft at the time of the accident. The distribution is strongly right-skewed (skew = 4.25, kurtosis = 28.41): the median is only 13 people while the mean is 27.6 and the maximum reaches 644, consistent with a few large commercial aircraft disasters pulling the tail far right. Roughly 10% of rows (529) are flagged as outliers, and the interquartile range spans just 5 to 30, meaning half of all accidents involved between 5 and 30 people aboard.
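
The outlier counts for the numeric columns are reported as "rows beyond 1.5 IQR", which matches the usual Tukey fence rule. The sketch below computes those fences from the quartiles in this report; the function names are hypothetical, and whether saturn treats values exactly on a fence as outliers is an assumption (here they are not).

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count values strictly outside the fences."""
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < low or v > high)


print(tukey_fences(5, 30))  # Aboard: Q1 = 5, Q3 = 30
print(tukey_fences(3, 23))  # Fatalities: Q1 = 3, Q3 = 23
print(tukey_fences(0, 0))   # Ground: Q1 = Q3 = 0
```

Under this rule any accident with more than 67.5 people aboard, or more than 53 fatalities, is flagged; for Ground, where both quartiles are zero, every non-zero value is flagged.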

Flight # high anthropic:default

This column holds the flight number for each accident, where one was recorded. Two major issues stand out: 79.71% of values are null, leaving the column largely unpopulated, and the most frequent non-null value is a placeholder dash ('-') appearing 67 times, which suggests a second, inconsistent encoding of missing data. With 724 unique values across only 1,069 non-null rows, 543 of them singletons, and an entropy ratio of 0.953, the distribution is close to uniform: no flight number other than the placeholder accounts for more than 10 rows.
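
The entropy ratio can be read as Shannon entropy divided by its maximum for the observed cardinality. The reported figures are consistent with that reading (9.058 bits over log2 of 724 categories gives about 0.953), but the normalization used by saturn is an assumption here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by log2(number of distinct values); 1.0 means uniform."""
    distinct = len(set(values))
    if distinct < 2:
        return 0.0
    return entropy_bits(values) / math.log2(distinct)


# Check against the figures reported for Flight #.
print(round(9.058 / math.log2(724), 3))
# A perfectly uniform toy column has ratio 1.0.
print(entropy_ratio(["a", "b", "c", "d"]))
```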

Type high anthropic:default

This column captures aircraft model designations (e.g., 'Douglas DC-3', 'de Havilland Canada DHC-6 Twin Otter 300'). The duplicate rate of 53.3% (2,795 of 5,241 non-null rows) is expected for a categorical-like field in which many accidents share an aircraft type; 'Douglas DC-3' alone appears 334 times. There are 2,446 unique values against a vocabulary of 2,534 words, which points to many near-duplicate variant spellings and sub-model suffixes (e.g., 'Douglas C-47', 'Douglas C-47A', and 'Douglas C-47B' are counted separately), so grouping by type requires normalization first. The null rate is negligible at 0.51%.
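
The duplicate figures in this report follow from three numbers per column: total rows, nulls, and unique values. The sketch below makes that arithmetic explicit. The relationship (duplicates = non-null rows minus unique values, rate taken over non-null rows) is inferred from the reported statistics rather than from saturn's documentation, and the function name is hypothetical.

```python
def duplicate_stats(rows, nulls, unique):
    """Return (n_duplicates, duplicate_rate) computed over non-null rows."""
    non_null = rows - nulls
    n_duplicates = non_null - unique
    return n_duplicates, n_duplicates / non_null


# Figures taken from the per-column statistics in this report.
for name, rows, nulls, unique in [
    ("Type", 5268, 27, 2446),
    ("Time", 5268, 2219, 1005),
    ("Route", 5268, 1706, 3244),
    ("Registration", 5268, 335, 4905),
]:
    n_dup, rate = duplicate_stats(rows, nulls, unique)
    print(f"{name}: {n_dup} duplicates, rate {rate:.3f}")
```

The results match the reported n_duplicates and duplicate_rate values, which is why the denominators quoted for duplicate rates should be the non-null row counts rather than 5,268.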

Summary high anthropic:default

This column contains free-text narrative summaries of each accident, as shown by the dominant terms 'crashed', 'into', and 'aircraft' appearing thousands of times across the 4,878 non-null records. Text length varies widely (min 6, median 136, max 1,954 characters), so entries range from brief one-liners to detailed multi-sentence accounts. A duplicate rate of 4.2% (205 duplicates) is mildly surprising for free text and may indicate templated or copy-pasted entries. A mean Flesch reading-ease score of 61.7 indicates moderately accessible prose, consistent with factual incident reporting.

Ground medium anthropic:default

This column most likely records the number of people killed on the ground in each accident, consistent with the ground casualties mentioned in the dataset-level summary. The distribution is extreme: 95.8% of values are exactly zero, yet the maximum reaches 2,750 with a skew of 50.34 and kurtosis of 2,558.60, so a tiny fraction of records carry very large values. Only 50 unique values exist across 5,268 rows, and 219 observations (4.17%) are flagged as outliers. Because Q1 = Q3 = 0, the IQR is zero and every non-zero value falls beyond the 1.5 IQR fences, so the outlier count essentially equals the number of rows with a non-zero value.

Location high anthropic:default

This column contains free-text geographic descriptions, most commonly in 'City, Country/State' format (mean about 2.9 words, median length 19 characters), recording where each accident occurred. The high frequency of the word 'near' (1,272 occurrences across 5,248 non-null rows) shows that a substantial share of entries are approximate locations rather than precise place names, which could complicate geocoding. The duplicate rate of 18% (945 duplicates against 4,303 unique values) is expected for a location field, but the long tail of near-unique entries (vocab size 4,541) suggests significant free-text variation in how locations are recorded.
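
Approximate-location prefixes can be stripped and recorded as a flag before geocoding. This is a minimal sketch: the prefix list ('Near', 'Off') comes from the sample values shown in this report and may be incomplete, and the function name `split_location` is hypothetical.

```python
# Assumption: prefixes observed in the sample values; extend as needed.
APPROX_PREFIXES = ("near ", "off ")


def split_location(value):
    """Return (place, is_approximate) with any approximate prefix removed."""
    text = value.strip()
    lowered = text.casefold()
    for prefix in APPROX_PREFIXES:
        if lowered.startswith(prefix):
            return text[len(prefix):].strip(), True
    return text, False


print(split_location("Near Bradford, Pennsylvania"))
print(split_location("Off Cuxhaven, Germany"))
print(split_location("Kathmandu, Nepal"))
```

Keeping the flag alongside the cleaned place name lets a later geocoding step attach a wider uncertainty radius to approximate entries.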

Numeric correlation

Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
            Aboard  Fatalities  Ground
Aboard       +1.00       +0.04   +0.06
Fatalities   +0.04       +1.00   +0.05
Ground       +0.06       +0.05   +1.00
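
The coefficients above can be rechecked independently with a direct implementation of the Pearson formula. The sketch below uses only the standard library; the function name is hypothetical, and dropping any row where either value is missing (pairwise deletion) is an assumption, since the report does not say how saturn handles nulls. Because fatalities are normally bounded by the number aboard, a coefficient of +0.04 between those two columns is lower than one might expect, so an independent recomputation may be worthwhile.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Toy inputs, not values from the profiled file.
print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3, None], [3, 2, 1, 5]))
```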

Languages detected

Per-string language detection across text columns (sampled).

Per-language counts (total 7,904 detected strings).
lang  count  share
en     5907  74.7%
es      461   5.8%
it      366   4.6%
de      290   3.7%
fr      247   3.1%
pt      155   2.0%
id       93   1.2%
nl       73   0.9%
sv       51   0.6%
ca       39   0.5%
pl       31   0.4%
ru       27   0.3%
no       22   0.3%
sl       20   0.3%
tr       18   0.2%
ceb      14   0.2%
hr       14   0.2%
cs       11   0.1%
eo        7   0.1%
uk        6   0.1%
hu        6   0.1%
fi        6   0.1%
ms        6   0.1%
ro        6   0.1%
da        5   0.1%
bs        3   0.0%
vi        3   0.0%
sh        3   0.0%
et        3   0.0%
gl        2   0.0%
lt        2   0.0%
la        2   0.0%
eu        1   0.0%
ku        1   0.0%
te        1   0.0%
gd        1   0.0%
ja        1   0.0%

Date text

100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars
rows: 5,268
null: 0 (0.0%)
unique: 4,753
len_min: 10
len_max: 10
len_mean: 10.000
len_median: 10.000
len_p95: 10.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 515
duplicate_rate: 0.098
vocab_size: 4,753
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Character-length distribution for Date (mean: 10.0).
chars    count
10 – 10  5268
(All 40 histogram bins span 10 – 10; the other 39 are empty.)
Sample values (first 10)
  1. 09/03/1915
  2. 07/02/1990
  3. 12/05/1997
  4. 01/14/1995
  5. 03/09/1968
  6. 03/07/2006
  7. 12/31/1968
  8. 03/17/2000
  9. 04/23/1995
  10. 06/29/1929

Time text

99.9% of rows are a single word; 99.7% of rows are all-caps; 42.1% null; 95th-percentile length under 20 chars; 67.0% duplicate strings
rows: 5,268
null: 2,219 (42.1%)
unique: 1,005
len_min: 4
len_max: 7
len_mean: 5.003
len_median: 5.000
len_p95: 5.000
word_mean: 1.001
word_median: 1.000
n_empty: 0
n_duplicates: 2,044
duplicate_rate: 0.670
vocab_size: 1,004
readability_flesch_mean: 121.215
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.999
allcaps_rate: 0.997
boilerplate_rate: 0.000
Character-length distribution for Time (mean: 5.003).
chars  count
4 – 4  7
5 – 5  3033
6 – 6  3
7 – 7  6
(Empty bins omitted; the 40 histogram bins collapse onto these four lengths.)
Sample values (first 10)
  1. 10:30
  2. 22:00
  3. 18:21
  4. 09:40
  5. 12:45
  6. 05:08
  7. 07:34
  8. 09:32
  9. 17:54
  10. 16:35

Location text

rows: 5,268
null: 20 (0.4%)
unique: 4,303
len_min: 5
len_max: 60
len_mean: 20.380
len_median: 19.000
len_p95: 31.000
word_mean: 2.866
word_median: 3.000
n_empty: 0
n_duplicates: 945
duplicate_rate: 0.180
vocab_size: 4,541
readability_flesch_mean: 24.031
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.011
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for Location (mean: 20.380).
chars    count
5 – 6    21
6 – 8    8
8 – 9    22
9 – 10   16
10 – 12  61
12 – 13  326
13 – 15  280
15 – 16  289
16 – 17  753
17 – 19  413
19 – 20  828
20 – 22  383
22 – 23  313
23 – 24  479
24 – 26  189
26 – 27  170
27 – 28  244
28 – 30  85
30 – 31  114
31 – 32  36
32 – 34  28
34 – 35  52
35 – 37  25
37 – 38  23
38 – 39  26
39 – 41  11
41 – 42  18
42 – 44  3
44 – 45  13
45 – 46  3
46 – 48  1
48 – 49  5
49 – 50  2
50 – 52  1
52 – 53  3
53 – 54  1
54 – 56  0
56 – 57  0
57 – 59  1
59 – 60  2
Sample values (first 10)
  1. Off Cuxhaven, Germany
  2. Near Port Morseby, New Guinea
  3. Little Grand Rapids, Canada
  4. Kathmandu, Nepal
  5. St. Louis, Missouri
  6. Labiano, Spain
  7. Near Bradford, Pennsylvania
  8. Ennadai Lake, Canada
  9. Near Palaly AFB, Sri Lanka
  10. Lake Constance, Switzerland

Operator text

31 languages detected in sample; 52.8% duplicate strings
rows: 5,268
null: 18 (0.3%)
unique: 2,476
len_min: 3
len_max: 65
len_mean: 19.494
len_median: 19.000
len_p95: 35.000
word_mean: 3.047
word_median: 3.000
n_empty: 0
n_duplicates: 2,774
duplicate_rate: 0.528
vocab_size: 2,370
readability_flesch_mean: 19.611
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.165
allcaps_rate: 0.037
boilerplate_rate: 0.000
Character-length distribution for Operator (mean: 19.494).
chars    count
3 – 5    96
5 – 6    233
6 – 8    140
8 – 9    462
9 – 11   169
11 – 12  184
12 – 14  128
14 – 15  395
15 – 17  270
17 – 18  447
18 – 20  407
20 – 22  143
22 – 23  340
23 – 25  205
25 – 26  542
26 – 28  166
28 – 29  229
29 – 31  102
31 – 32  194
32 – 34  54
34 – 36  127
36 – 37  62
37 – 39  35
39 – 40  30
40 – 42  13
42 – 43  25
43 – 45  11
45 – 46  5
46 – 48  7
48 – 50  7
50 – 51  7
51 – 53  0
53 – 54  9
54 – 56  1
56 – 57  3
57 – 59  0
59 – 60  1
60 – 62  0
62 – 63  0
63 – 65  1
Sample values (first 10)
  1. Military - German Navy
  2. Eagle Air
  3. Military - Russian Air Force
  4. Air Taxi - Wolfe Air Aviation Ltd.
  5. Military - Russian Air Force
  6. TriCoastal Air
  7. China Air Lines
  8. Aeroperlas
  9. Bristow Helicopters
  10. Deutsche Lufthansa

Flight # categorical

543 singleton categories; 79.7% null
rows: 5,268
null: 4,199 (79.7%)
unique: 724
top_value: -
top_rate: 0.063
cardinality: 724
entropy: 9.058
entropy_ratio: 0.953
Top values for Flight # (20 unique shown, of 724 total).
value  count  share
-         67   1.3%
1         10   0.2%
4          7   0.1%
6          6   0.1%
21         6   0.1%
101        6   0.1%
901        6   0.1%
7          5   0.1%
201        5   0.1%
701        5   0.1%
706        5   0.1%
703        5   0.1%
2          4   0.1%
203        4   0.1%
304        4   0.1%
601        4   0.1%
514        4   0.1%
11         4   0.1%
217        4   0.1%
114        4   0.1%
Top values (rank 1–20)
  1. - — 67
  2. 1 — 10
  3. 4 — 7
  4. 6 — 6
  5. 21 — 6
  6. 101 — 6
  7. 901 — 6
  8. 7 — 5
  9. 201 — 5
  10. 701 — 5
  11. 706 — 5
  12. 703 — 5
  13. 2 — 4
  14. 203 — 4
  15. 304 — 4
  16. 601 — 4
  17. 514 — 4
  18. 11 — 4
  19. 217 — 4
  20. 114 — 4

Route text

31 languages detected in sample; 32.4% null
rows: 5,268
null: 1,706 (32.4%)
unique: 3,244
len_min: 4
len_max: 59
len_mean: 22.088
len_median: 20.000
len_p95: 37.000
word_mean: 4.065
word_median: 4.000
n_empty: 0
n_duplicates: 318
duplicate_rate: 0.089
vocab_size: 3,647
readability_flesch_mean: 27.155
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.041
allcaps_rate: 2.81e-04
boilerplate_rate: 0.000
Character-length distribution for Route (mean: 22.088).
chars    count
4 – 5    8
5 – 7    4
7 – 8    93
8 – 10   6
10 – 11  5
11 – 12  100
12 – 14  99
14 – 15  155
15 – 16  452
16 – 18  247
18 – 19  443
19 – 20  170
20 – 22  179
22 – 23  286
23 – 25  155
25 – 26  135
26 – 27  245
27 – 29  94
29 – 30  213
30 – 32  71
32 – 33  49
33 – 34  74
34 – 36  39
36 – 37  40
37 – 38  51
38 – 40  20
40 – 41  27
41 – 42  12
42 – 44  10
44 – 45  19
45 – 47  6
47 – 48  4
48 – 49  17
49 – 51  9
51 – 52  6
52 – 54  3
54 – 55  4
55 – 56  8
56 – 58  2
58 – 59  2
Sample values (first 10)
  1. Lympne, England - Rotterdam, The Netherlands
  2. Isfahan - Terhan
  3. Mexico City - Reynosa - Matamoros
  4. Anchorage, AK - Hoholitna River, AK
  5. Panchkhal - Tribuvan
  6. Kongolo - Goma
  7. Honolulu - Lihue
  8. Jomsom - Pokhara
  9. Jaffna - Colombo
  10. El Paso, TX - Pueblo, CO

Type text

53.3% duplicate strings
rows: 5,268
null: 27 (0.5%)
unique: 2,446
len_min: 4
len_max: 40
len_mean: 18.326
len_median: 16.000
len_p95: 34.000
word_mean: 2.718
word_median: 2.000
n_empty: 0
n_duplicates: 2,795
duplicate_rate: 0.533
vocab_size: 2,534
readability_flesch_mean: 69.259
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 7.44e-03
allcaps_rate: 9.54e-03
boilerplate_rate: 0.000
Character-length distribution for Type (mean: 18.326).
chars    count
4 – 5    6
5 – 6    5
6 – 7    6
7 – 8    19
8 – 8    32
8 – 9    57
9 – 10   178
10 – 11  255
11 – 12  685
12 – 13  0
13 – 14  522
14 – 15  331
15 – 16  441
16 – 17  369
17 – 18  208
18 – 18  158
18 – 19  154
19 – 20  166
20 – 21  154
21 – 22  0
22 – 23  109
23 – 24  120
24 – 25  158
25 – 26  188
26 – 26  174
26 – 27  107
27 – 28  73
28 – 29  85
29 – 30  39
30 – 31  0
31 – 32  66
32 – 33  58
33 – 34  55
34 – 35  43
35 – 36  16
36 – 36  25
36 – 37  16
37 – 38  9
38 – 39  21
39 – 40  133
Sample values (first 10)
  1. Zeppelin L-10 (airship)
  2. Beech King Air B90
  3. Swearingen SA-226T Metro II
  4. de Havilland Canada DHC-6 Twin Otter 300
  5. Bell UH-1H / Bell UH-1H (helicopter)
  6. Swearingen SA.226TC Metro II
  7. Handley Page Dart Herald 201
  8. de Havilland Canada DHC-6 Twin Otter 300
  9. Hawker Siddeley HS-748-357/2B SCD
  10. Lockheed Vega

Registration text

99.4% of rows are unique strings; 99.0% of rows are a single word; 99.2% of rows are all-caps; 95th-percentile length under 20 chars
rows: 5,268
null: 335 (6.4%)
unique: 4,905
len_min: 1
len_max: 15
len_mean: 6.394
len_median: 6.000
len_p95: 10.000
word_mean: 1.018
word_median: 1.000
n_empty: 0
n_duplicates: 28
duplicate_rate: 5.68e-03
vocab_size: 4,948
readability_flesch_mean: 103.026
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.990
allcaps_rate: 0.992
boilerplate_rate: 0.000
Character-length distribution for Registration (mean: 6.394).
chars    count
1 – 1    1
2 – 2    36
3 – 3    64
4 – 4    69
5 – 5    398
6 – 6    3228
7 – 7    512
8 – 8    267
9 – 9    42
10 – 10  206
11 – 11  10
12 – 12  12
13 – 13  41
14 – 14  8
15 – 15  39
(Empty bins omitted; the 40 histogram bins collapse onto these 15 lengths.)
Sample values (first 10)
  1. 77
  2. FAC-1150
  3. HP-986PS
  4. 4R-HVA
  5. PP-SAD
  6. P4-AOD
  7. PI-C1131
  8. LV-ZSR
  9. RA-65617
  10. P-BALSA

cn/In text

98.4% of rows are a single word; 96.6% of rows are all-caps; 23.3% null; 95th-percentile length under 20 chars
rows: 5,268
null: 1,228 (23.3%)
unique: 3,707
len_min: 1
len_max: 20
len_mean: 5.645
len_median: 5.000
len_p95: 10.000
word_mean: 1.026
word_median: 1.000
n_empty: 0
n_duplicates: 333
duplicate_rate: 0.082
vocab_size: 3,739
readability_flesch_mean: 121.205
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.984
allcaps_rate: 0.966
boilerplate_rate: 0.000
Character-length distribution for cn/In (mean: 5.645).
chars    count
1 – 1    23
2 – 2    113
3 – 3    604
4 – 4    866
5 – 5    895
6 – 6    268
7 – 7    269
8 – 8    281
9 – 9    457
10 – 10  125
11 – 11  92
12 – 12  14
13 – 13  9
14 – 14  2
15 – 15  5
16 – 16  4
17 – 17  5
18 – 18  2
19 – 19  2
20 – 20  4
(Empty bins omitted; the 40 histogram bins collapse onto these 20 lengths.)
Sample values (first 10)
  1. HP-25
  2. 24805/1878
  3. 12
  4. 10670
  5. 20436/788
  6. 742
  7. 3817
  8. 45108
  9. 31-033B
  10. 1957

Aboard numeric

skew=+4.25; 10.1% of rows beyond 1.5 IQR
rows: 5,268
null: 22 (0.4%)
unique: 239
min: 0.000
max: 644.000
mean: 27.555
median: 13.000
std: 43.077
q1: 5.000
q3: 30.000
iqr: 25.000
skew: 4.247
kurtosis: 28.414
n_outliers: 529
outlier_rate: 0.101
zero_rate: 3.81e-04
Histogram bins for Aboard (median: 13.0).
bin            count
0 – 16.1       2978
16.1 – 32.2    1055
32.2 – 48.3    430
48.3 – 64.4    230
64.4 – 80.5    129
80.5 – 96.6    105
96.6 – 112.7   75
112.7 – 128.8  56
128.8 – 144.9  46
144.9 – 161    35
161 – 177.1    27
177.1 – 193.2  16
193.2 – 209.3  8
209.3 – 225.4  7
225.4 – 241.5  9
241.5 – 257.6  4
257.6 – 273.7  9
273.7 – 289.8  3
289.8 – 305.9  9
305.9 – 322    3
322 – 338.1    2
338.1 – 354.2  3
354.2 – 370.3  1
370.3 – 386.4  1
386.4 – 402.5  2
402.5 – 418.6  0
418.6 – 434.7  0
434.7 – 450.8  0
450.8 – 466.9  0
466.9 – 483    0
483 – 499.1    0
499.1 – 515.2  0
515.2 – 531.3  2
531.3 – 547.4  0
547.4 – 563.5  0
563.5 – 579.6  0
579.6 – 595.7  0
595.7 – 611.8  0
611.8 – 627.9  0
627.9 – 644    1

Fatalities numeric

skew=+4.95; 8.4% of rows beyond 1.5 IQR
rows: 5,268
null: 12 (0.2%)
unique: 191
min: 0.000
max: 583.000
mean: 20.068
median: 9.000
std: 33.200
q1: 3.000
q3: 23.000
iqr: 20.000
skew: 4.948
kurtosis: 42.791
n_outliers: 444
outlier_rate: 0.084
zero_rate: 0.011
Histogram bins for Fatalities (median: 9.0).
bin            count
0 – 14.57      3314
14.57 – 29.15  980
29.15 – 43.72  343
43.72 – 58.3   215
58.3 – 72.88   96
72.88 – 87.45  90
87.45 – 102    51
102 – 116.6    42
116.6 – 131.2  39
131.2 – 145.8  19
145.8 – 160.3  18
160.3 – 174.9  9
174.9 – 189.5  11
189.5 – 204    3
204 – 218.6    2
218.6 – 233.2  6
233.2 – 247.8  2
247.8 – 262.3  5
262.3 – 276.9  4
276.9 – 291.5  1
291.5 – 306.1  1
306.1 – 320.6  0
320.6 – 335.2  1
335.2 – 349.8  2
349.8 – 364.4  0
364.4 – 378.9  0
378.9 – 393.5  0
393.5 – 408.1  0
408.1 – 422.7  0
422.7 – 437.2  0
437.2 – 451.8  0
451.8 – 466.4  0
466.4 – 481    0
481 – 495.5    0
495.5 – 510.1  0
510.1 – 524.7  1
524.7 – 539.3  0
539.3 – 553.9  0
553.9 – 568.4  0
568.4 – 583    1

Ground numeric

skew=+50.34
rows: 5,268
null: 22 (0.4%)
unique: 50
min: 0.000
max: 2,750
mean: 1.609
median: 0.000
std: 53.988
q1: 0.000
q3: 0.000
iqr: 0.000
skew: 50.336
kurtosis: 2,559
n_outliers: 219
outlier_rate: 0.042
zero_rate: 0.958
Histogram bins for Ground (median: 0.0).
bin            count
0 – 68.75      5235
68.75 – 137.5  8
137.5 – 206.2  0
206.2 – 275    1
275 – 343.8    0
343.8 – 412.5  0
412.5 – 481.2  0
481.2 – 550    0
550 – 618.8    0
618.8 – 687.5  0
687.5 – 756.2  0
756.2 – 825    0
825 – 893.8    0
893.8 – 962.5  0
962.5 – 1031   0
1031 – 1100    0
1100 – 1169    0
1169 – 1238    0
1238 – 1306    0
1306 – 1375    0
1375 – 1444    0
1444 – 1512    0
1512 – 1581    0
1581 – 1650    0
1650 – 1719    0
1719 – 1788    0
1788 – 1856    0
1856 – 1925    0
1925 – 1994    0
1994 – 2062    0
2062 – 2131    0
2131 – 2200    0
2200 – 2269    0
2269 – 2338    0
2338 – 2406    0
2406 – 2475    0
2475 – 2544    0
2544 – 2612    0
2612 – 2681    0
2681 – 2750    2

Summary text

95.8% of rows are unique strings
rows: 5,268
null: 390 (7.4%)
unique: 4,673
len_min: 6
len_max: 1,954
len_mean: 200.736
len_median: 136.000
len_p95: 584.000
word_mean: 33.240
word_median: 23.000
n_empty: 0
n_duplicates: 205
duplicate_rate: 0.042
vocab_size: 12,513
readability_flesch_mean: 61.678
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 4.10e-04
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for Summary (mean: 200.736).
chars        count
6 – 55       822
55 – 103     1039
103 – 152    800
152 – 201    547
201 – 250    364
250 – 298    280
298 – 347    231
347 – 396    172
396 – 444    123
444 – 493    128
493 – 542    86
542 – 590    50
590 – 639    57
639 – 688    33
688 – 736    37
736 – 785    19
785 – 834    15
834 – 883    16
883 – 931    11
931 – 980    10
980 – 1029   5
1029 – 1077  1
1077 – 1126  6
1126 – 1175  3
1175 – 1224  4
1224 – 1272  3
1272 – 1321  2
1321 – 1370  1
1370 – 1418  1
1418 – 1467  3
1467 – 1516  1
1516 – 1564  2
1564 – 1613  1
1613 – 1662  4
1662 – 1710  0
1710 – 1759  0
1759 – 1808  0
1808 – 1857  0
1857 – 1905  0
1905 – 1954  1
Sample values (first 10)
  1. Crashed into trees while attempting to land after being shot down by British and French aircraft.
  2. Flew into a box canyon and crashed at an elevation of 4,000 ft. VFR flight by the pilot into instrument meteorological conditions, and the pilot's failure to maintain sufficient altitude and/or clearance from mountainous terrain. Factors related to the accident were: the adverse…
  3. Midair collision. The Beechcraft was on a flight from Lyon to Lorient, approaching Lorient, when it requested permission to fly over the ocean liner Norway. While circling the Norway, it collided with the Cessna. One killed aboard the Cessna, 14 aboard the Beechcraft. Failure of…
  4. The aircraft crashed into a 8,000 ft. mountain in the Sierra Grande range while climbing en route from Comodoro Rivadavia to Cordoba in heavy rain and strong turbulence. The passengers included military personnel and their dependents.
  5. The helicopter collided with trees after experiencing engine failure. Pilot overshot two suitable landing areas.
  6. The jetliner crashed into the Black Sea and broke up in driving rain and low visibility after making a second attempt to land. The plane disappeared from radar screens just under four miles from shore and crashed after making a turn and heading toward Adler airport for a landing.…
  7. Due to heavy traffic, the flight was diverted from the planned route. The aircraft failed to follow the assigned airway and crashed into a cloud obscured Montseny Mountain while on approach. The deviation from the assigned airway may have been caused by malfunctioning equipment. …
  8. The aircraft crashed into the Persian Gulf and exploded in flames while attempting to land at Bahrain International Airport. The crew decided to perform a missed approach after it was determined the aircraft was coming in too high and fast. Instructions were given for a 180 degre…
  9. Diverted from Madang to Bagasin, overran the runway and crashed.
  10. Crashed into a radio antenna tower and tore off a wing in dense fog.