saturn

/home/coolhand/html/datavis/data_trove/data/quirky/strange_places_v5.2.json · 354,770 rows · sample n=354,770 · seed 42 · 2026-06-22T01:08:51+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/strange_places_v5.2.json
Total rows: 354,770
Profiled sample: 354,770
Columns: 48
Generated: 2026-06-22T01:08:51+00:00
Per-column null rate across the corpus.
column (kind): null %
latitude (numeric): 0.0%
longitude (numeric): 0.0%
name (text): 0.0%
description (text): 0.0%
category (categorical): 0.0%
date (text): 41.9%
country (categorical): 55.3%
city (text): 82.9%
state (categorical): 58.5%
shape (categorical): 82.9%
duration_seconds (numeric): 82.9%
mass_g (unknown): 0.0%
meteorite_class (categorical): 90.9%
fall_type (categorical): 90.9%
magnitude (categorical): 76.7%
depth_km (numeric): 98.9%
place (text): 98.9%
earthquake_type (categorical): 98.9%
volcano_type (categorical): 100.0%
elevation_m (unknown): 0.0%
status (categorical): 100.0%
last_eruption (categorical): 100.0%
injuries (categorical): 75.6%
fatalities (categorical): 75.6%
length_miles (text): 79.8%
width_yards (categorical): 79.8%
type (categorical): 98.6%
temperature (categorical): 98.6%
source (categorical): 51.6%
vessel_type (categorical): 99.0%
cargo (categorical): 99.0%
peak_brightness_altitude_km (categorical): 99.8%
velocity_km_s (categorical): 99.9%
energy_joules (categorical): 99.8%
event_type (categorical): 95.8%
damage_property (text): 95.8%
cave_type (categorical): 100.0%
cave_length_m (categorical): 99.8%
cave_depth_m (categorical): 99.9%
access (categorical): 98.0%
cave_ref (text): 97.9%
osm_id (numeric): 75.1%
osm_type (categorical): 75.1%
place_type (categorical): 94.9%
abandoned_year (categorical): 99.7%
abandoned_reason (unknown): 0.0%
former_population (categorical): 99.3%
heritage (categorical): 100.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This is a 354,770-row mashup of 14 heterogeneous 'strange places' datasets (tornadoes, UFO sightings, cave entrances, megalithic sites, meteorites, ghost towns, earthquakes, shipwrecks, and more) unified under a single 'category' column. The first thing to examine is the category distribution: no single source dominates, but tornadoes (~71.8K), caves (~70.2K), UFO sightings (~60.6K), and megalithic sites (~60.0K) each make up roughly 17–20% of records. A second key signal is pervasive sparsity: most domain-specific columns (depth_km, duration_seconds, shape, damage_property) carry null rates of 80–99%, so each column is meaningful only for the rows of its originating dataset. UFO sighting durations are extremely right-skewed (median 180 s, max ~66 million s), and earthquake depths are moderately right-skewed (skew 3.07); both are worth closer inspection within their respective subsets.
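Because most of this sparsity follows source boundaries, a per-category fill rate is more informative than a corpus-wide null rate. The sketch below is illustrative only: it assumes the JSON file has already been decoded into a list of dicts keyed by column name (the file's layout is not described in this report), and `non_null_rate_by_category` is a hypothetical helper, not part of saturn. It treats empty strings as missing because the date and country profiles show empty strings that the null counter did not catch.

```python
from collections import defaultdict


def non_null_rate_by_category(rows, column, category_key="category"):
    """Return {category: share of rows with a non-empty value in `column`}.

    None and "" both count as missing (hypothetical helper, not saturn's API).
    """
    totals = defaultdict(int)
    filled = defaultdict(int)
    for row in rows:
        cat = row.get(category_key)
        totals[cat] += 1
        value = row.get(column)
        if value is not None and value != "":
            filled[cat] += 1
    return {cat: filled[cat] / totals[cat] for cat in totals}


# Made-up rows standing in for the decoded JSON records.
rows = [
    {"category": "ufo_sightings", "shape": "light", "depth_km": None},
    {"category": "ufo_sightings", "shape": "", "depth_km": None},
    {"category": "usgs_earthquakes", "shape": None, "depth_km": 10.0},
]
print(non_null_rate_by_category(rows, "shape"))
# {'ufo_sightings': 0.5, 'usgs_earthquakes': 0.0}
```

On the real data, a column that belongs to a single source (shape appears to be UFO-only, given its 60,632 non-null values) should show a rate near 1.0 for that category and 0.0 elsewhere.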

date high anthropic:default

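This column holds date strings stored as text rather than a proper date type. Most values are ISO dates (YYYY-MM-DD, 10 characters), but the length histogram also shows 4,425 values of about 19 characters (possibly date-times) and 151 of 4 characters (possibly bare years), so formats are mixed. Some sampled values fall on January 1st (1932-01-01, 2009-01-01), which may mark year-only precision for some sources, while others (2015-05-08, 1971-02-21) carry full day precision. Two data quality issues stand out: a 41.88% null rate, plus 17,854 empty strings that the null counter did not catch (about 46.9% missing in total), and an 88.6% duplicate rate among the 206,200 non-null values, which contain only 23,500 unique values. The 'allcaps' alert is a false positive from the saturn parser: ISO date strings trigger it because they contain no lowercase letters.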
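One way to check the mixed-format hypothesis is to bucket each value by pattern. This is a minimal sketch: the 4-character (bare year) and 19-character (date-time) patterns are guesses read off the length histogram, not formats confirmed by the profile, and the sample values '1999' and '2010-07-28 21:30:00' are hypothetical.

```python
import re
from collections import Counter

# Candidate formats; the "year" and "iso_datetime" patterns are guesses
# based on the 4- and 19-character peaks in the length histogram.
PATTERNS = [
    ("year", re.compile(r"\d{4}")),
    ("iso_date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("iso_datetime", re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")),
]


def classify_date(value):
    """Label a raw date value by format; empty strings are kept separate from nulls."""
    if value is None:
        return "null"
    if value == "":
        return "empty"
    for label, pattern in PATTERNS:
        if pattern.fullmatch(value):
            # January 1st dates may encode year-only precision.
            if label == "iso_date" and value.endswith("-01-01"):
                return "iso_date_jan1"
            return label
    return "other"


samples = ["1932-01-01", "2015-05-08", "", None, "1999", "2010-07-28 21:30:00"]
counts = Counter(classify_date(v) for v in samples)
print(counts)
```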

length_miles high anthropic:default

This column stores distance measurements in miles encoded as text strings: every value is a single token such as '0.1', '0.5', or '1.0', with a mean length of 3.69 characters and a maximum of 8. Two signals demand attention. The null rate is 79.76%, so roughly four in five rows carry no value, and among the roughly 71,800 non-null values the duplicate rate is 94.72%, reflecting a coarse, rounded measurement scale (only 3,795 unique values). The top value '0.1' alone appears 15,456 times, indicating heavy concentration at short distances.

city high anthropic:default

This column contains US city names, confirmed by top values (Seattle, Phoenix, Las Vegas, Portland, Los Angeles) and top words ('beach', 'san', 'lake', 'springs'); the sample values are stored in lowercase ('anchorage', 'baton rouge'). The most striking issue is the 82.91% null rate: only about 1 in 6 rows (60,632) has a city value, the same count as the ufo_sightings category, which suggests the field comes from that source alone. Among non-null values the duplicate rate is 84.91%, so populated rows cluster around a relatively small set of repeated cities (9,149 unique values, 4,862 distinct words). The word 'city' appearing 531 times in top_words most likely reflects genuine names such as 'Kansas City' or 'Oklahoma City' rather than placeholder text or data quality noise.

duration_seconds high anthropic:default

This column records UFO sighting durations in seconds (its 60,632 non-null values match the ufo_sightings row count), ranging from 0.01 s to 66,276,000 s (~767 days). The most striking issue is that 82.91% of rows are null, so duration exists for only about 1 in 6 records. Among non-null values the distribution is extremely right-skewed (skew = 135.86, kurtosis = 19,379.84): the median is just 180 s while the mean is inflated to 5,410 s, and 7,753 rows (12.79% of non-null) fall beyond the 1.5×IQR fence. The maximum of 66,276,000 s is almost certainly a data-entry or parsing error rather than a real observation.
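The outlier count comes from the 1.5×IQR rule named in the profile badge. Assuming saturn applies the usual Tukey fences to the reported quartiles (its exact quantile method is not documented here), the arithmetic below shows the cutoffs, along with the conversion of the maximum to days.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


SECONDS_PER_DAY = 86_400

# q1 and q3 as reported in the duration_seconds profile.
lower, upper = tukey_fences(30.0, 600.0)
print(lower, upper)                   # -825.0 1455.0
print(66_276_000 / SECONDS_PER_DAY)   # about 767.08 days for the maximum
```

Under that assumption any sighting longer than 1,455 s (about 24 minutes) counts as an outlier, which explains how 12.8% of durations are flagged even though many of them are plausible; the lower fence is negative and therefore never triggers.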

osm_type high anthropic:default

This column stores OpenStreetMap element types and takes only three values: 'node', 'way', and 'relation'. Two signals demand attention. First, 75.08% of the 354,770 rows are null, so OSM type is recorded for only about a quarter of records (88,396, the same as the osm_caves and osm_ghost_towns rows combined). Second, the non-null values are severely imbalanced: 'node' accounts for 96.39% (85,204) versus 2,560 'way' and just 632 'relation'. The low entropy ratio (0.158) confirms the column carries little discriminative information as-is.

fall_type high anthropic:default

This column records how each meteorite was recovered: 'Found' (discovered without an observed fall) or 'Fell' (seen falling). The null rate is 90.93%, so only 32,186 of 354,770 records (exactly the nasa_meteorites rows) have a value. Among those, 'Found' accounts for 96.6% (31,090) versus 3.4% (1,096) for 'Fell'. This matches real-world meteorite collections, but it is a severe class imbalance for any model that uses the field as a target.

description high anthropic:default

This column contains free-text descriptions of places and events: cave entrances, former hamlets, hot springs, shipwrecks, megalithic sites, and tornado tracks (e.g. 'F0, 0.1mi long, 10yd wide') dominate the top values, consistent with a points-of-interest gazetteer. The duplicate rate is high at 38.3% (136,053 repeated values out of 354,770 rows), driven largely by templated entries such as 'Cave entrance' (52,067 occurrences) and storm-track boilerplate. Text is overwhelmingly English (4,893 sampled strings detected as English), but 22 languages are detected in total, including German (28), Bashkir (13), Russian (9), and Belarusian (9), a multilingual minority that may need separate handling. The gap between the median length (40 chars) and the mean (114 chars), with a p95 of 491, indicates a heavily right-skewed length distribution. The 19,628 values in the 488–500 character bin, against a maximum of exactly 500, together with sample values that end in '…', suggest descriptions were truncated at 500 characters.
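The storm-track boilerplate is regular enough to parse back into fields, which also gives a cross-check against length_miles and width_yards. The pattern below is a sketch fitted to the sample values shown in this report ('F0, 0.39mi long, 50yd wide'); the optional 'E' (EF ratings) and negative sign (a -9 sentinel) are assumptions about formats that the samples do not show.

```python
import re

# Fitted to samples like "F0, 0.39mi long, 50yd wide". The optional "E"
# and "-" are assumptions (EF ratings, a -9 sentinel), not observed formats.
TRACK = re.compile(
    r"(?P<rating>E?F-?\d+), (?P<miles>\d+(?:\.\d+)?)mi long, (?P<yards>\d+)yd wide"
)


def parse_track(description):
    """Return (rating, length_miles, width_yards), or None for non-track text."""
    match = TRACK.fullmatch(description)
    if match is None:
        return None
    return match["rating"], float(match["miles"]), int(match["yards"])


print(parse_track("F0, 0.39mi long, 50yd wide"))  # ('F0', 0.39, 50)
print(parse_track("Cave entrance"))               # None
```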

name high anthropic:default

This column contains the name or title of each record across the dataset's domains, including natural features (caves), weather events (tornadoes labeled by US state), meteorites, megalithic sites, and UFO sightings. The duplicate rate is high at 46.5%, driven largely by templated strings such as 'Unnamed Cave' (19,962 occurrences) and repeated 'Tornado in <state>, <number>' patterns. Language detection (3,363 values) skews English, but 31 languages are detected in the sample, including French (279), Italian (236), German (230), Spanish (156), and Russian (102), reflecting internationally sourced place names. Because nearly half of the values are non-unique, this column cannot serve as a reliable identifier.

latitude high anthropic:default

This column contains geographic latitudes from -87.37° to 88.5°, consistent with global coordinates. The distribution is left-skewed (skew = -2.84) with high kurtosis (7.30): the bulk of records sit in the northern mid-latitudes (median ~40.6°N), but a cluster of about 22,100 rows between -87.4° and -69.8° pulls the mean down to 32.7°. That cluster is plausibly Antarctic meteorite finds (the sample name 'Grove Mountains 021502' is an Antarctic locality), which is worth confirming by category. About 9.4% of rows (33,355) are flagged as outliers; with q1 = 33.689 and IQR = 12.846, the 1.5×IQR fences sit near 14.4° and 65.8°, so the flag sweeps in the entire southern hemisphere and tropics, not just the polar records. The zero_rate of 0.06% (about 226 rows at exactly 0°) is small but worth checking for missing coordinates encoded as 0.

place_type high anthropic:default

This column captures an OpenStreetMap-style settlement classification, with values such as 'hamlet', 'isolated_dwelling', 'village', and 'town'. The null rate is 94.88%, so only about 18,150 of 354,770 rows carry a value (close to the 18,154 osm_ghost_towns rows). Among populated rows, 'hamlet' dominates at 66.57% of non-null values, and a raw 'yes' tag (131 occurrences) indicates uncleaned OSM data that needs remediation.

osm_id high anthropic:default

This column contains OpenStreetMap (OSM) numeric identifiers referencing nodes, ways, or relations. The null rate is 75.08%, so only about a quarter of records (roughly 88,400, in line with the 88,396 osm_type values) carry an OSM linkage. With 88,395 unique values the column is nearly unique, and the platykurtic distribution (kurtosis ≈ -1.23) and wide range (min ~1.3M, max ~13.5B) are consistent with IDs drawn broadly from OSM's ID space; no outliers are detected.
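With 88,395 unique values against roughly 88,396 non-null rows (the osm_type count), about one ID may repeat. OSM element IDs are unique only within an element type, so a node and a way can legitimately share a number. The sketch below, with a hypothetical helper name and made-up rows, lists repeated IDs together with the element types they occur under.

```python
from collections import Counter


def duplicate_ids(rows):
    """Return {osm_id: sorted element types} for IDs that occur more than once."""
    counts = Counter()
    types_by_id = {}
    for row in rows:
        osm_id = row.get("osm_id")
        if osm_id is None:
            continue
        counts[osm_id] += 1
        # Record the element type so node/way/relation collisions are visible.
        types_by_id.setdefault(osm_id, set()).add(row.get("osm_type") or "missing")
    return {i: sorted(types_by_id[i]) for i, n in counts.items() if n > 1}


rows = [
    {"osm_id": 123, "osm_type": "node"},
    {"osm_id": 123, "osm_type": "way"},
    {"osm_id": 456, "osm_type": "node"},
    {"osm_id": None, "osm_type": None},
]
print(duplicate_ids(rows))  # {123: ['node', 'way']}
```

If a repeat shows two different types, the pair (osm_type, osm_id) is the proper key; if it shows the same type twice, the row is a genuine duplicate.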

source high anthropic:default

This column records the data provider or attribution for each row, with only 4 distinct values (OpenStreetMap contributors, The Megalithic Portal, NOAA Storm Events Database, OpenStreetMap). The null rate is 51.56%, so over half of the 354,770 rows carry no source attribution; for provenance, the fully populated category column is the more reliable field. The top value 'OpenStreetMap contributors' accounts for 51.44% of non-null rows (88,396 records, the same count as the OSM cave and ghost-town rows), while the closely related 'OpenStreetMap' (8,656 records) points to inconsistent attribution for the same upstream source.

country high anthropic:default

This column records the country of each place, using ISO 3166-1 alpha-2 codes (RU, BY, KZ, and others) alongside the three-letter 'USA'. The null rate is 55.29%, so over half of the 354,770 rows carry no country value. 'USA' and 'US' denote the same country but are stored as two distinct values (86,583 and 60,634); together they account for about 92.8% of the 158,616 non-null records ('USA' alone is the 54.6% top_rate), so the inconsistency inflates apparent cardinality. A further 9,497 empty strings escaped null detection, and the distribution is heavily US-dominated, with 28 unique values and low entropy (1.34).
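A minimal normalization sketch, assuming the only alias to fold is 'USA' to 'US' (the 20 values shown do not reveal the remaining 8, which may need their own entries). `normalize_country` and `COUNTRY_ALIASES` are hypothetical names.

```python
# Hypothetical alias table: folds the two United States spellings seen in the
# profile into the ISO 3166-1 alpha-2 code. Extend it after inspecting all 28 values.
COUNTRY_ALIASES = {"USA": "US"}


def normalize_country(value):
    """Map blanks to None and collapse known aliases; other codes pass through."""
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    return COUNTRY_ALIASES.get(value, value)


print([normalize_country(v) for v in ["USA", "US", "", " RU ", None]])
# ['US', 'US', None, 'RU', None]
```

Applied to the counts in this profile, this would merge 86,583 'USA' and 60,634 'US' rows into a single 'US' value of 147,217 and move the 9,497 empty strings into the null count.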

state high anthropic:default

This column contains US state abbreviations, but its 118 unique values, more than twice the 50 states, suggest that territories, foreign region codes, or dirty values are mixed in. The null rate is 58.5%, so over half of the 354,770 rows have no state; given the mixed sources this is largely structural, but it still limits state-level analysis. The top value is 'TX' at 8.6% of the 147,215 non-null rows, followed by CA and FL; the excess cardinality is worth auditing.
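To audit the excess cardinality, one can list codes outside the expected set. This sketch assumes the valid set is the 50 state abbreviations plus DC and the five inhabited territories (PR, VI, GU, AS, MP); the extra codes in the real data are not shown in this report, so 'ON' and 'XX' below are hypothetical examples.

```python
from collections import Counter

# The 50 state abbreviations, plus DC and the five inhabited territories.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)
US_OTHER = {"DC", "PR", "VI", "GU", "AS", "MP"}


def unexpected_states(values):
    """Count non-empty codes that are not a state, DC, or territory abbreviation."""
    allowed = US_STATES | US_OTHER
    return Counter(v for v in values if v and v.strip().upper() not in allowed)


print(unexpected_states(["TX", "CA", "ON", "tx", "", None, "XX"]))
# Counter({'ON': 1, 'XX': 1})
```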

fatalities high anthropic:default

This column counts fatalities per incident and is stored as a categorical type despite being numeric. The null rate is 75.59%, so only about 86,600 of 354,770 rows have a value, in line with the 86,583 noaa_tornadoes and noaa_storm_events rows combined. Among non-null rows, 92.86% record zero fatalities, with a long tail reaching at least 10; the low entropy ratio (0.088) confirms extreme concentration at '0'.

injuries medium anthropic:default

This column records injury counts per incident, stored as a categorical type despite being numeric: values are integer strings ('0', '1', '2', …) with 233 distinct values. The null rate is 75.59%, so only about 86,600 of 354,770 rows have a value. Among non-null rows, 85.4% report zero injuries, giving a heavily right-skewed distribution with low entropy (1.23, entropy ratio 0.157). The 233 distinct values may simply reflect the long tail of large injury counts from severe events, but they could also hide ranges, annotations, or entry errors; checking that every value parses as an integer would settle this.
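Before treating injuries and fatalities as numbers, it is worth confirming that every value parses as an integer. A minimal sketch with a hypothetical helper; the input list, including the range-like '2-5', is invented for illustration.

```python
def parse_counts(values):
    """Split count strings into parsed ints and values that fail to parse.

    None and "" are skipped as missing. int() rejects decimals such as "2.0"
    and thousands separators such as "1,500", so those land in `rejects`.
    """
    parsed, rejects = [], []
    for value in values:
        if value is None or value == "":
            continue
        try:
            parsed.append(int(value))
        except ValueError:
            rejects.append(value)
    return parsed, rejects


parsed, rejects = parse_counts(["0", "0", "3", "1500", "2-5", "", None])
print(parsed, rejects)                            # [0, 0, 3, 1500] ['2-5']
print(sum(n == 0 for n in parsed) / len(parsed))  # 0.5
```

Anything that lands in `rejects` on the real data would support the annotation or range hypothesis; an empty list would mean the 233 distinct values are simply the long tail of genuine counts.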

magnitude high anthropic:default

This column stores magnitudes as a categorical type even though the values are numeric: integers (0, 1, 2, 3, 4) and decimals (4.5, 4.6, 4.7, 1.75). It appears to mix several scales from different sources: 0–5 integers consistent with tornado F/EF ratings, 4.5–5.3 values consistent with earthquake magnitudes, and values such as 1.75, 2.75, 50.00, and 70.00 that may be hail sizes or wind speeds from storm events (an inference from the value ranges, not confirmed by the profile). The null rate of 76.7% triggered an alert, meaning over three-quarters of the 354,770 rows carry no value. The value '-9' appears 1,278 times and is almost certainly a sentinel for an unknown rating rather than a true measurement. The top value '0' makes up 44.4% of non-null observations, and the entropy ratio of 0.31 confirms a skewed, low-diversity distribution; the 294 unique values are partly an artifact of string formatting, since '2' and '2.00' are counted separately.
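A sketch of converting magnitude strings to floats. It assumes -9 is a missing-value sentinel (plausible, but not confirmed by the profile) and shows that '2' and '2.00' collapse to a single value once parsed. Because the column mixes scales, results should be summarized per category rather than across the whole column.

```python
from collections import Counter

# Assumption: -9 marks an unknown rating rather than a measurement.
SENTINELS = {-9.0}


def parse_magnitude(value):
    """Convert a magnitude string to float; blanks and sentinels become None.

    Non-numeric text raises ValueError, which is left unhandled on purpose
    so that unexpected values surface instead of disappearing.
    """
    if value is None or value.strip() == "":
        return None
    number = float(value)
    return None if number in SENTINELS else number


raw = ["0", "2", "2.00", "-9", "4.5", ""]
counts = Counter(parse_magnitude(v) for v in raw)
print(counts)  # '2' and '2.00' are now one key
```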

width_yards medium anthropic:default

This column gives tornado path width in yards, stored as a categorical type despite being numeric (its null rate matches length_miles, and the description template 'F0, 0.39mi long, 50yd wide' uses the same units). Nearly 80% of values are null (null_rate = 0.7976), so missingness is the dominant signal. Among the roughly 71,800 non-null records, values are round numbers (10, 50, 100, 30, 20, 200, …), suggesting estimates rather than precise measurements. The top value '10' accounts for 20.2% of non-null rows, and with 437 unique values and an entropy ratio of 0.51 the distribution is moderately concentrated.

shape high anthropic:default

This column captures the reported shape of UFO sightings, with 28 categories such as 'light', 'triangle', 'circle', and 'fireball'. The null rate is 82.91%, so only 60,632 records (the ufo_sightings rows) have a shape value. Among them, 'light' dominates at 21.27%, and catch-all categories such as 'unknown' (4,359) and 'other' (4,209) further dilute the informativeness of the non-missing data.

meteorite_class high anthropic:default

This column contains meteorite classification codes (e.g. 'L6', 'H5', 'CM2'), the standard designations that combine chemical group and petrologic type for chondrites, alongside other classes such as 'Iron, IIIAB' and 'Ureilite'. The null rate is 90.93%, so only 32,186 of 354,770 rows carry a classification. Among classified records the distribution is moderately concentrated: 'L6' alone accounts for 20.3% of non-null values, with 395 distinct classes and an entropy ratio of ~0.51.

abandoned_reason low anthropic:default

This column most likely records why a ghost town or settlement was abandoned, consistent with the osm_ghost_towns source and the companion abandoned_year column (99.7% null). The profiler emitted a 'skipped' alert with no statistics or uniqueness counts because the column's kind could not be resolved, so no frequency analysis was performed. The reported null rate of exactly 0.0 should not be read as full population: every kind=unknown column reports 0.0% null, which more likely means nulls were not counted than that all 354,770 rows hold a value. Its true cardinality, distribution, and content are unknown from this evidence.

elevation_m low anthropic:default

This column records elevation in metres. The profiler emitted a 'skipped' alert and computed no statistics, so distribution shape, range, skew, and uniqueness are unknown. The reported 0.0% null rate across 354,770 rows is doubtful for the same reason as for the other kind=unknown columns, which appear not to be null-counted at all. The name implies a continuous numeric feature, but nothing further can be said without re-running profiling, ideally after coercing the column to a numeric type.

mass_g low anthropic:default

The column 'mass_g' likely holds meteorite masses in grams. The profiler skipped it (no profiler for kind=unknown), so skew, range, outliers, and uniqueness cannot be assessed, and the reported zero nulls should not be taken as complete coverage: only the 32,186 nasa_meteorites rows would be expected to carry a mass, and description samples such as 'Mass: Unknowng.' hint that some values are non-numeric strings.
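For the three kind=unknown columns (mass_g, elevation_m, abandoned_reason), a quick type census usually shows why a profiler gives up. This sketch assumes rows are JSON-decoded dicts; it separates absent keys from explicit nulls, since either could explain the reported 0.0% null rate. The four example rows are made up, and the 'Unknown' string is only a hypothesis suggested by the 'Mass: Unknowng.' description samples.

```python
import json
from collections import Counter


def value_types(rows, column):
    """Count Python types in one column, keeping absent keys distinct from nulls."""
    return Counter(
        type(row[column]).__name__ if column in row else "missing" for row in rows
    )


# Made-up records; the "Unknown" string is a hypothesis, not observed data.
rows = json.loads('[{"mass_g": 720.0}, {"mass_g": "Unknown"}, {"mass_g": null}, {}]')
print(value_types(rows, "mass_g"))
```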

longitude high anthropic:default

This column contains geographic longitudes for 354,770 records, spanning nearly the full valid range (-179.28° to 180°). The distribution is moderately right-skewed (skew = 0.755), with a mean of -31.75° and a median of -42.66°, reflecting a concentration of records in the Western Hemisphere: about 167,000 rows (47%) fall between -125.4° and -71.5°, roughly the span of the contiguous United States. The IQR of 104.81° is very wide, consistent with genuinely global coverage, and only 827 values (0.23%) are flagged as outliers.

category high anthropic:default

This column labels the source dataset or event type, with 14 distinct categories across 354,770 rows and zero nulls. The categories span scientific and archival sources (NOAA tornadoes, storm events, thermal springs, and shipwrecks; NASA meteorites and fireballs; USGS earthquakes and volcanoes; OSM caves and ghost towns; the Megalithic Portal) and anomalous-phenomena reports (UFO sightings, Bigfoot sightings, haunted places), consistent with a multi-source 'strange places' aggregation. The distribution is moderately uneven: the top value 'noaa_tornadoes' holds 20.2% of rows (71,813), while 'bigfoot_sightings' has only 3,797 and 'usgs_volcanoes' just 170. Still, an entropy of 2.99 (entropy ratio 0.78) indicates reasonable spread across classes. With no nulls and clean cardinality, this is an immediately usable stratification variable.
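The entropy figures appear to be Shannon entropy in bits, with entropy_ratio equal to that entropy divided by log2 of the cardinality: 2.985 / log2(14) ≈ 0.784 for category, and the same relation holds for country (1.341 / log2(28) ≈ 0.279). That is an inference from the reported numbers, not documented saturn behavior. The sketch recomputes both values from the category counts in this report.

```python
import math


def entropy_bits(counts):
    """Return (Shannon entropy in bits, entropy / log2(number of non-zero classes))."""
    total = sum(counts)
    nonzero = [c for c in counts if c > 0]
    h = -sum((c / total) * math.log2(c / total) for c in nonzero)
    ratio = h / math.log2(len(nonzero)) if len(nonzero) > 1 else 0.0
    return h, ratio


# Counts from the category table in this report.
category_counts = [71813, 70242, 60632, 60028, 32186, 18154, 14770,
                   9717, 5003, 3797, 3742, 3653, 863, 170]
h, ratio = entropy_bits(category_counts)
print(f"entropy={h:.3f} ratio={ratio:.3f}")
```

It should return values close to the reported 2.985 and 0.784; the same function applied to the fall_type counts (31,090 and 1,096) lands near the reported 0.214. A noticeably different result would point to another logarithm base or normalization.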

Numeric correlation

Pearson correlation across 5 numeric columns (values shown to 2 decimals).
                    latitude  longitude  duration_seconds  depth_km  osm_id
latitude               +1.00      -0.39             +0.01     +0.01   +0.34
longitude              -0.39      +1.00             -0.05     -0.05   +0.17
duration_seconds       +0.01      -0.05             +1.00     -0.03   +0.01
depth_km               +0.01      -0.05             -0.03     +1.00   -0.09
osm_id                 +0.34      +0.17             +0.01     -0.09   +1.00
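A correlation is only meaningful over rows where both columns are present. Here, duration_seconds has exactly 60,632 non-null values (the ufo_sightings count) and depth_km exactly 3,742 (the usgs_earthquakes count); if each column comes only from that source, the two never co-occur, and the same holds for osm_id against either of them. How saturn produced coefficients for those pairs is not stated, so counting co-observed rows first is a cheap check. A minimal sketch with made-up rows and a hypothetical helper name:

```python
from itertools import combinations


def pairwise_overlap(rows, columns):
    """For each column pair, count rows where both values are non-null."""
    return {
        (a, b): sum(1 for row in rows if row.get(a) is not None and row.get(b) is not None)
        for a, b in combinations(columns, 2)
    }


# Made-up rows: one UFO-like record, one earthquake-like record.
rows = [
    {"latitude": 61.2, "duration_seconds": 300.0, "depth_km": None},
    {"latitude": 35.1, "duration_seconds": None, "depth_km": 10.0},
]
print(pairwise_overlap(rows, ["latitude", "duration_seconds", "depth_km"]))
```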

Languages detected

Per-string language detection across text columns (sampled).

Per-language counts (total 9,808 detected strings).
lang: count (share)
en: 8310 (84.7%)
fr: 285 (2.9%)
de: 258 (2.6%)
it: 240 (2.4%)
es: 158 (1.6%)
ru: 111 (1.1%)
ca: 55 (0.6%)
da: 38 (0.4%)
pt: 36 (0.4%)
nl: 30 (0.3%)
pl: 30 (0.3%)
eu: 29 (0.3%)
hu: 24 (0.2%)
be: 23 (0.2%)
id: 18 (0.2%)
sv: 17 (0.2%)
ar: 15 (0.2%)
cs: 14 (0.1%)
uk: 13 (0.1%)
no: 13 (0.1%)
ba: 13 (0.1%)
sk: 10 (0.1%)
ro: 9 (0.1%)
cy: 8 (0.1%)
el: 8 (0.1%)
tr: 7 (0.1%)
fi: 7 (0.1%)
ja: 7 (0.1%)
az: 6 (0.1%)
hr: 5 (0.1%)
ceb: 5 (0.1%)
sl: 2 (0.0%)
la: 1 (0.0%)
tt: 1 (0.0%)
ko: 1 (0.0%)
als: 1 (0.0%)

latitude numeric

skew=-2.84 · 9.4% rows beyond 1.5 IQR
rows: 354,770
null: 0 (0.0%)
unique: 215,964
min: -87.367
max: 88.500
mean: 32.661
median: 40.598
std: 31.011
q1: 33.689
q3: 46.535
iqr: 12.846
skew: -2.840
kurtosis: 7.302
n_outliers: 33,355
outlier_rate: 0.094
zero_rate: 6.37e-04
Histogram bins for latitude (median: 40.5983333).
bin: count
-87.37 – -82.97: 7090
-82.97 – -78.57: 1218
-78.57 – -74.18: 4088
-74.18 – -69.78: 9707
-69.78 – -65.38: 4
-65.38 – -60.99: 3
-60.99 – -56.59: 1
-56.59 – -52.19: 20
-52.19 – -47.8: 72
-47.8 – -43.4: 110
-43.4 – -39: 287
-39 – -34.61: 726
-34.61 – -30.21: 991
-30.21 – -25.81: 714
-25.81 – -21.42: 902
-21.42 – -17.02: 581
-17.02 – -12.62: 297
-12.62 – -8.227: 341
-8.227 – -3.83: 603
-3.83 – 0.5667: 690
0.5667 – 4.963: 435
4.963 – 9.36: 1927
9.36 – 13.76: 1608
13.76 – 18.15: 1738
18.15 – 22.55: 6214
22.55 – 26.95: 5180
26.95 – 31.34: 20824
31.34 – 35.74: 47917
35.74 – 40.14: 55501
40.14 – 44.53: 77688
44.53 – 48.93: 45229
48.93 – 53.33: 28613
53.33 – 57.72: 24054
57.72 – 62.12: 7237
62.12 – 66.52: 1480
66.52 – 70.91: 490
70.91 – 75.31: 92
75.31 – 79.71: 86
79.71 – 84.1: 9
84.1 – 88.5: 3
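The bin labels are consistent with 40 equal-width bins between the column's min and max, printed to four significant digits; this is inferred from the numbers in the table rather than documented. A sketch that reproduces the first latitude edges under that assumption:

```python
def equal_width_edges(lo, hi, bins=40):
    """Return the bins + 1 edges of equal-width histogram bins over [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [hi]


# min and max from the latitude profile.
edges = equal_width_edges(-87.367, 88.5)
print([f"{e:.4g}" for e in edges[:4]])  # ['-87.37', '-82.97', '-78.57', '-74.18']
```

The same formatting explains the shorter labels further down the table: an edge near -47.797 prints as '-47.8' and one near -39.004 as '-39'.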

longitude numeric

rows: 354,770
null: 0 (0.0%)
unique: 223,129
min: -179.283
max: 180.000
mean: -31.753
median: -42.657
std: 72.107
q1: -92.080
q3: 12.734
iqr: 104.814
skew: 0.755
kurtosis: 0.116
n_outliers: 827
outlier_rate: 2.33e-03
zero_rate: 0.000
Histogram bins for longitude (median: -42.65733525).
bin: count
-179.3 – -170.3: 90
-170.3 – -161.3: 969
-161.3 – -152.3: 1120
-152.3 – -143.4: 814
-143.4 – -134.4: 413
-134.4 – -125.4: 839
-125.4 – -116.4: 18752
-116.4 – -107.4: 8803
-107.4 – -98.44: 21820
-98.44 – -89.46: 47491
-89.46 – -80.48: 45346
-80.48 – -71.5: 24836
-71.5 – -62.52: 4804
-62.52 – -53.53: 485
-53.53 – -44.55: 608
-44.55 – -35.57: 503
-35.57 – -26.59: 76
-26.59 – -17.61: 376
-17.61 – -8.624: 2511
-8.624 – 0.3583: 40337
0.3583 – 9.34: 29096
9.34 – 18.32: 36101
18.32 – 27.3: 12918
27.3 – 36.29: 13238
36.29 – 45.27: 6091
45.27 – 54.25: 5213
54.25 – 63.23: 4220
63.23 – 72.22: 604
72.22 – 81.2: 3575
81.2 – 90.18: 777
90.18 – 99.16: 745
99.16 – 108.1: 1540
108.1 – 117.1: 1434
117.1 – 126.1: 1238
126.1 – 135.1: 1806
135.1 – 144.1: 1475
144.1 – 153.1: 730
153.1 – 162: 8482
162 – 171: 3693
171 – 180: 801

name text

31 languages detected in sample · 46.5% duplicate strings
rows: 354,770
null: 0 (0.0%)
unique: 189,861
len_min: 1
len_max: 235
len_mean: 19.997
len_median: 17.000
len_p95: 32.000
word_mean: 3.564
word_median: 4.000
n_empty: 0
n_duplicates: 164,909
duplicate_rate: 0.465
vocab_size: 15,811
readability_flesch_mean: 64.794
emoji_rate: 2.82e-06
url_rate: 0.000
one_word_rate: 0.094
allcaps_rate: 0.013
boilerplate_rate: 0.000
Character-length distribution for name (mean: 19.997).
chars: count
1 – 7: 8769
7 – 13: 61484
13 – 19: 132248
19 – 24: 49793
24 – 30: 74096
30 – 36: 19324
36 – 42: 3592
42 – 48: 1239
48 – 54: 408
54 – 60: 415
60 – 65: 461
65 – 71: 495
71 – 77: 537
77 – 83: 395
83 – 89: 431
89 – 95: 348
95 – 100: 245
100 – 106: 167
106 – 112: 117
112 – 118: 69
118 – 124: 51
124 – 130: 26
130 – 136: 22
136 – 141: 14
141 – 147: 5
147 – 153: 6
153 – 159: 1
159 – 165: 3
165 – 171: 4
171 – 176: 0
176 – 182: 2
182 – 188: 0
188 – 194: 0
194 – 200: 1
200 – 206: 0
206 – 212: 0
212 – 217: 0
217 – 223: 0
223 – 229: 0
229 – 235: 2
Sample values (first 10)
  1. Kainsaz
  2. El Chaparral
  3. Cleaven Dyke
  4. Dolmen de Chams
  5. Tornado in KS, 20
  6. The Harrison Cemetery
  7. Tornado in NC, 37
  8. Mousseau-les-Bray 2
  9. Dolmen de La Griutera
  10. Grove Mountains 021502

description text

22 languages detected in sample · 38.3% duplicate strings
rows: 354,770
null: 0 (0.0%)
unique: 218,717
len_min: 1
len_max: 500
len_mean: 114.050
len_median: 40.000
len_p95: 491.000
word_mean: 24.071
word_median: 7.000
n_empty: 0
n_duplicates: 136,053
duplicate_rate: 0.383
vocab_size: 38,639
readability_flesch_mean: 66.652
emoji_rate: 0.000
url_rate: 8.15e-03
one_word_rate: 0.010
allcaps_rate: 4.26e-03
boilerplate_rate: 2.51e-04
Character-length distribution for description (mean: 114.050).
chars: count
1 – 13: 71484
13 – 26: 53646
26 – 38: 49789
38 – 51: 14135
51 – 63: 30283
63 – 76: 28147
76 – 88: 8588
88 – 101: 6163
101 – 113: 5647
113 – 126: 4082
126 – 138: 9912
138 – 151: 3581
151 – 163: 678
163 – 176: 439
176 – 188: 426
188 – 201: 525
201 – 213: 479
213 – 226: 430
226 – 238: 412
238 – 250: 576
250 – 263: 1185
263 – 275: 1361
275 – 288: 1320
288 – 300: 5787
300 – 313: 878
313 – 325: 1496
325 – 338: 1410
338 – 350: 1914
350 – 363: 1940
363 – 375: 1927
375 – 388: 1960
388 – 400: 2295
400 – 413: 2190
413 – 425: 2487
425 – 438: 2493
438 – 450: 3014
450 – 463: 2997
463 – 475: 3598
475 – 488: 5468
488 – 500: 19628
Sample values (first 10)
  1. Meteorite Kainsaz - CO3.2. Mass: Unknowng. Found: Fell.
  2. Former isolated_dwelling; Former population: 2024
  3. Type: Cursus County/Region: Perth and Kinross Alt Name: Blairgowrie Road Pos Accuracy: 4 Condition: 3 (5 is best) Ambience: 3 Access: 3 (5 is best) Lat: 56.548916969168    Long: -3.362524704168 Cursus in Perth and Kinro…
  4. Type: Burial Chamber or Dolmen County/Region: Languedoc:Gard (30) Alt Name: Bordezac dolmen Pos Accuracy: 5 Condition: 3 (5 is best) Ambience: 4 Access: 4 (5 is best) Lat: 44.3113    Long: 4.06723 Dolmen de Chams (or Bo…
  5. F0, 0.39mi long, 50yd wide
  6. the site in question is the "glowing tombstone" , at a distance you can see one tombstone glow in the dark. The tombstone glows until you get to the edge of the cemetery, then it goes dark. You cant really pinpoint the actual tomb and there's no lights around to illuminate it eit…
  7. F0, 0.83mi long, 30yd wide
  8. Type: Passage Grave County/Region: Ile-de-France:Seine-et-Marne 77 Pos Accuracy: 2 (5 is best) Lat: 48.41    Long: 3.229 Hypogee in Ile-de-France:Seine-et-Marne 77. Mousseau-les-Bray 2 is a Hypogee in the community of Mouss…
  9. Type: Burial Chamber or Dolmen County/Region: Cataluña Pos Accuracy: 5 (5 is best) Lat: 41.813443    Long: 2.1982 Burial Chamber (Dolmen) in Cataluña. .... (c) Meg. Portal contributors. Link To More Information
  10. Meteorite Grove Mountains 021502 - L6. Mass: Unknowng. Found: Found.

category categorical

rows: 354,770
null: 0 (0.0%)
unique: 14
top_value: noaa_tornadoes
top_rate: 0.202
cardinality: 14
entropy: 2.985
entropy_ratio: 0.784
Top values for category (14 unique shown, of 14 total).
value: count (share)
noaa_tornadoes: 71813 (20.2%)
osm_caves: 70242 (19.8%)
ufo_sightings: 60632 (17.1%)
megalithic_portal: 60028 (16.9%)
nasa_meteorites: 32186 (9.1%)
osm_ghost_towns: 18154 (5.1%)
noaa_storm_events: 14770 (4.2%)
haunted_places: 9717 (2.7%)
noaa_thermal_springs: 5003 (1.4%)
bigfoot_sightings: 3797 (1.1%)
usgs_earthquakes: 3742 (1.1%)
noaa_shipwrecks: 3653 (1.0%)
nasa_fireballs: 863 (0.2%)
usgs_volcanoes: 170 (0.0%)
Top values (rank 1–20)
  1. noaa_tornadoes — 71,813
  2. osm_caves — 70,242
  3. ufo_sightings — 60,632
  4. megalithic_portal — 60,028
  5. nasa_meteorites — 32,186
  6. osm_ghost_towns — 18,154
  7. noaa_storm_events — 14,770
  8. haunted_places — 9,717
  9. noaa_thermal_springs — 5,003
  10. bigfoot_sightings — 3,797
  11. usgs_earthquakes — 3,742
  12. noaa_shipwrecks — 3,653
  13. nasa_fireballs — 863
  14. usgs_volcanoes — 170

date text

99.5% rows are a single word · 91.3% rows are all-caps · 41.9% null · 95th-percentile length under 20 chars · 88.6% duplicate strings
rows: 354,770
null: 148,570 (41.9%)
unique: 23,500
len_min: 0
len_max: 30
len_mean: 9.331
len_median: 10.000
len_p95: 10.000
word_mean: 1.005
word_median: 1.000
n_empty: 17,854
n_duplicates: 182,700
duplicate_rate: 0.886
vocab_size: 8,565
readability_flesch_mean: 112.113
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.995
allcaps_rate: 0.913
boilerplate_rate: 0.000
Character-length distribution for date (mean: 9.331).
chars: count
0 – 1: 17854
1 – 2: 0
2 – 2: 1
2 – 3: 0
3 – 4: 0
4 – 4: 151
4 – 5: 13
5 – 6: 0
6 – 7: 1
7 – 8: 21
8 – 8: 5
8 – 9: 0
9 – 10: 3
10 – 10: 183475
10 – 11: 17
11 – 12: 0
12 – 13: 6
13 – 14: 8
14 – 14: 10
14 – 15: 0
15 – 16: 8
16 – 16: 7
16 – 17: 12
17 – 18: 0
18 – 19: 180
19 – 20: 4425
20 – 20: 1
20 – 21: 0
21 – 22: 1
22 – 22: 0
22 – 23: 0
23 – 24: 0
24 – 25: 0
25 – 26: 0
26 – 26: 0
26 – 27: 0
27 – 28: 0
28 – 28: 0
28 – 29: 0
29 – 30: 1
Sample values (first 10)
  1. 1932-01-01
  2. 2015-05-08
  3. 1971-02-21
  4. 2010-07-28
  5. 2005-06-19
  6. 2009-03-26
  7. 2009-01-01

country categorical

55.3% null
rows: 354,770
null: 196,154 (55.3%)
unique: 28
top_value: USA
top_rate: 0.546
cardinality: 28
entropy: 1.341
entropy_ratio: 0.279
Top values for country (20 unique shown, of 28 total).
value: count (share of all rows)
USA: 86583 (24.4%)
US: 60634 (17.1%)
(empty string): 9497 (2.7%)
RU: 1481 (0.4%)
BY: 205 (0.1%)
KZ: 156 (0.0%)
HT: 13 (0.0%)
KY: 9 (0.0%)
AU: 6 (0.0%)
DE: 5 (0.0%)
GB: 5 (0.0%)
IQ: 3 (0.0%)
RO: 2 (0.0%)
EC: 2 (0.0%)
IT: 2 (0.0%)
TW: 1 (0.0%)
MX: 1 (0.0%)
CW: 1 (0.0%)
BS: 1 (0.0%)
MT: 1 (0.0%)
Top values (rank 1–20)
  1. USA — 86,583
  2. US — 60,634
  3. — 9,497
  4. RU — 1,481
  5. BY — 205
  6. KZ — 156
  7. HT — 13
  8. KY — 9
  9. AU — 6
  10. DE — 5
  11. GB — 5
  12. IQ — 3
  13. RO — 2
  14. EC — 2
  15. IT — 2
  16. TW — 1
  17. MX — 1
  18. CW — 1
  19. BS — 1
  20. MT — 1

city text

72.9% rows are a single word · 82.9% null · 95th-percentile length under 20 chars · 84.9% duplicate strings
rows: 354,770
null: 294,138 (82.9%)
unique: 9,149
len_min: 3
len_max: 23
len_mean: 8.829
len_median: 9.000
len_p95: 14.000
word_mean: 1.288
word_median: 1.000
n_empty: 0
n_duplicates: 51,483
duplicate_rate: 0.849
vocab_size: 4,862
readability_flesch_mean: 21.737
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.729
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for city (mean: 8.829).
chars: count
3 – 4: 74
4 – 4: 0
4 – 4: 1497
4 – 5: 0
5 – 6: 3313
6 – 6: 0
6 – 6: 7780
6 – 7: 0
7 – 8: 9464
8 – 8: 0
8 – 8: 7830
8 – 9: 0
9 – 10: 8339
10 – 10: 0
10 – 10: 7192
10 – 11: 0
11 – 12: 5509
12 – 12: 0
12 – 12: 3694
12 – 13: 0
13 – 14: 2353
14 – 14: 0
14 – 14: 1537
14 – 15: 0
15 – 16: 800
16 – 16: 0
16 – 16: 787
16 – 17: 0
17 – 18: 203
18 – 18: 0
18 – 18: 155
18 – 19: 0
19 – 20: 44
20 – 20: 0
20 – 20: 25
20 – 21: 0
21 – 22: 13
22 – 22: 0
22 – 22: 21
22 – 23: 2
Sample values (first 10)
  1. anchorage
  2. duncannon
  3. pflugerville
  4. whitesburg
  5. baton rouge
  6. boscobel
  7. hanover
  8. manassas
  9. amarillo
  10. litchfield park

state categorical

58.5% null
rows: 354,770
null: 207,555 (58.5%)
unique: 118
top_value: TX
top_rate: 0.086
cardinality: 118
entropy: 5.668
entropy_ratio: 0.824
Top values for state (20 unique shown, of 118 total).
value: count (share of all rows)
TX: 12727 (3.6%)
CA: 8791 (2.5%)
FL: 7372 (2.1%)
IL: 5329 (1.5%)
KS: 5127 (1.4%)
OK: 5056 (1.4%)
MO: 3905 (1.1%)
CO: 3766 (1.1%)
WA: 3648 (1.0%)
IA: 3647 (1.0%)
OH: 3521 (1.0%)
NE: 3506 (1.0%)
AL: 3196 (0.9%)
PA: 3193 (0.9%)
NC: 3186 (0.9%)
GA: 3112 (0.9%)
MN: 3110 (0.9%)
MS: 3086 (0.9%)
NY: 2951 (0.8%)
LA: 2927 (0.8%)
Top values (rank 1–20)
  1. TX — 12,727
  2. CA — 8,791
  3. FL — 7,372
  4. IL — 5,329
  5. KS — 5,127
  6. OK — 5,056
  7. MO — 3,905
  8. CO — 3,766
  9. WA — 3,648
  10. IA — 3,647
  11. OH — 3,521
  12. NE — 3,506
  13. AL — 3,196
  14. PA — 3,193
  15. NC — 3,186
  16. GA — 3,112
  17. MN — 3,110
  18. MS — 3,086
  19. NY — 2,951
  20. LA — 2,927

shape categorical

82.9% null
rows: 354,770
null: 294,138 (82.9%)
unique: 28
top_value: light
top_rate: 0.213
cardinality: 28
entropy: 3.774
entropy_ratio: 0.785
Top values for shape (20 unique shown, of 28 total).
value: count (share of all rows)
light: 12895 (3.6%)
triangle: 6268 (1.8%)
circle: 5890 (1.7%)
fireball: 4939 (1.4%)
unknown: 4359 (1.2%)
other: 4209 (1.2%)
sphere: 4134 (1.2%)
disk: 3853 (1.1%)
oval: 2881 (0.8%)
formation: 1908 (0.5%)
cigar: 1569 (0.4%)
changing: 1517 (0.4%)
flash: 1025 (0.3%)
rectangle: 1010 (0.3%)
cylinder: 977 (0.3%)
diamond: 884 (0.2%)
chevron: 774 (0.2%)
teardrop: 560 (0.2%)
egg: 555 (0.2%)
cone: 235 (0.1%)
Top values (rank 1–20)
  1. light — 12,895
  2. triangle — 6,268
  3. circle — 5,890
  4. fireball — 4,939
  5. unknown — 4,359
  6. other — 4,209
  7. sphere — 4,134
  8. disk — 3,853
  9. oval — 2,881
  10. formation — 1,908
  11. cigar — 1,569
  12. changing — 1,517
  13. flash — 1,025
  14. rectangle — 1,010
  15. cylinder — 977
  16. diamond — 884
  17. chevron — 774
  18. teardrop — 560
  19. egg — 555
  20. cone — 235

duration_seconds numeric

82.9% null · skew=+135.86 · 12.8% rows beyond 1.5 IQR
rows: 354,770
null: 294,138 (82.9%)
unique: 444
min: 0.010
max: 66,276,000
mean: 5,410
median: 180.000
std: 414,387
q1: 30.000
q3: 600.000
iqr: 570.000
skew: 135.861
kurtosis: 19,380
n_outliers: 7,753
outlier_rate: 0.128
zero_rate: 0.000
Histogram bins for duration_seconds (median: 180.0).
bin: count
0.01 – 1.657e+06: 60612
1.657e+06 – 3.314e+06: 11
3.314e+06 – 4.971e+06: 0
4.971e+06 – 6.628e+06: 3
6.628e+06 – 8.285e+06: 1
8.285e+06 – 9.941e+06: 0
9.941e+06 – 1.16e+07: 2
1.16e+07 – 1.326e+07: 0
1.326e+07 – 1.491e+07: 0
1.491e+07 – 1.657e+07: 0
1.657e+07 – 1.823e+07: 0
1.823e+07 – 1.988e+07: 0
1.988e+07 – 2.154e+07: 0
2.154e+07 – 2.32e+07: 0
2.32e+07 – 2.485e+07: 0
2.485e+07 – 2.651e+07: 0
2.651e+07 – 2.817e+07: 0
2.817e+07 – 2.982e+07: 0
2.982e+07 – 3.148e+07: 0
3.148e+07 – 3.314e+07: 0
3.314e+07 – 3.479e+07: 0
3.479e+07 – 3.645e+07: 0
3.645e+07 – 3.811e+07: 0
3.811e+07 – 3.977e+07: 0
3.977e+07 – 4.142e+07: 0
4.142e+07 – 4.308e+07: 0
4.308e+07 – 4.474e+07: 0
4.474e+07 – 4.639e+07: 0
4.639e+07 – 4.805e+07: 0
4.805e+07 – 4.971e+07: 0
4.971e+07 – 5.136e+07: 0
5.136e+07 – 5.302e+07: 2
5.302e+07 – 5.468e+07: 0
5.468e+07 – 5.633e+07: 0
5.633e+07 – 5.799e+07: 0
5.799e+07 – 5.965e+07: 0
5.965e+07 – 6.131e+07: 0
6.131e+07 – 6.296e+07: 0
6.296e+07 – 6.462e+07: 0
6.462e+07 – 6.628e+07: 1

mass_g unknown

no profiler for kind=unknown
rows: 354,770
null: 0 (0.0%)

meteorite_class categorical

90.9% null
rows: 354,770
null: 322,584 (90.9%)
unique: 395
top_value: L6
top_rate: 0.203
cardinality: 395
entropy: 4.370
entropy_ratio: 0.507
Top values for meteorite_class (20 unique shown, of 395 total).
value: count (share of all rows)
L6: 6544 (1.8%)
H5: 5614 (1.6%)
H4: 3336 (0.9%)
H6: 3234 (0.9%)
L5: 2750 (0.8%)
LL5: 1899 (0.5%)
LL6: 963 (0.3%)
L4: 831 (0.2%)
H4/5: 380 (0.1%)
CM2: 281 (0.1%)
Iron, IIIAB: 272 (0.1%)
H3: 244 (0.1%)
LL: 220 (0.1%)
E3: 205 (0.1%)
L3: 176 (0.0%)
LL4: 160 (0.0%)
H5/6: 156 (0.0%)
Ureilite: 155 (0.0%)
Howardite: 127 (0.0%)
Diogenite: 125 (0.0%)
Top values (rank 1–20)
  1. L6 — 6,544
  2. H5 — 5,614
  3. H4 — 3,336
  4. H6 — 3,234
  5. L5 — 2,750
  6. LL5 — 1,899
  7. LL6 — 963
  8. L4 — 831
  9. H4/5 — 380
  10. CM2 — 281
  11. Iron, IIIAB — 272
  12. H3 — 244
  13. LL — 220
  14. E3 — 205
  15. L3 — 176
  16. LL4 — 160
  17. H5/6 — 156
  18. Ureilite — 155
  19. Howardite — 127
  20. Diogenite — 125
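Two different denominators appear in these profiles: the `share` column of each value table is a fraction of all 354,770 rows, while `top_rate` is a fraction of the non-null rows only. For L6 that is 6,544 / 354,770 ≈ 1.8% versus 6,544 / 32,186 ≈ 20.3%, where 32,186 = 354,770 - 322,584. The hypothetical helper below reproduces both figures.

```python
TOTAL_ROWS = 354_770  # row count reported for this file


def share_and_rate(count: int, null_count: int, total: int = TOTAL_ROWS) -> tuple[float, float]:
    """Return (share of all rows, share of non-null rows) for one value."""
    non_null = total - null_count
    return count / total, count / non_null


share, rate = share_and_rate(6_544, 322_584)  # meteorite_class == "L6"
print(f"{share:.1%} of all rows, {rate:.1%} of non-null rows")
# 1.8% of all rows, 20.3% of non-null rows
```

For sparse columns like this one, the non-null rate is usually the more meaningful number, since the denominator then covers only rows from the originating dataset.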

fall_type categorical

90.9% null · top value is 96.6% of non-null rows
rows: 354,770
null: 322,584 (90.9%)
unique: 2
top_value: Found
top_rate: 0.966
cardinality: 2
entropy: 0.214
entropy_ratio: 0.214
Top values for fall_type (2 unique shown, of 2 total).
value | count | share of all rows
Found | 31,090 | 8.8%
Fell | 1,096 | 0.3%
Top values (rank 1–2)
  1. Found — 31,090
  2. Fell — 1,096
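The `entropy` and `entropy_ratio` figures are consistent with Shannon entropy in bits over the non-null values, normalised by log2 of the cardinality. For fall_type, p = 31,090 / 32,186 gives about 0.214 bits, and with two categories log2(2) = 1, so the ratio is also 0.214. This is inferred from the reported numbers rather than from saturn's source; the sketch below reproduces them, including earthquake_type's 6.40e-03.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy (base 2) of a frequency table; zero counts are ignored."""
    total = sum(counts)
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    # A single category yields -0.0; adding 0.0 normalises it to 0.0.
    return h + 0.0


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


print(round(entropy_bits([31_090, 1_096]), 3))  # fall_type: 0.214
print(round(entropy_ratio([3_739, 2, 1]), 4))   # earthquake_type: 0.0064
```

The `+ 0.0` step matters for single-valued columns: without it the formula returns negative zero, which is likely the origin of the `-0.000` shown for volcano_type, status and last_eruption.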

magnitude categorical

76.7% null
rows: 354,770
null: 272,093 (76.7%)
unique: 294
top_value: 0
top_rate: 0.444
cardinality: 294
entropy: 2.514
entropy_ratio: 0.307
Top values for magnitude (20 unique shown, of 294 total).
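value | count | share of all rows
0 | 36,675 | 10.3%
1 | 24,542 | 6.9%
2 | 9,904 | 2.8%
3 | 2,630 | 0.7%
-9 | 1,278 | 0.4%
4.5 | 686 | 0.2%
4 | 591 | 0.2%
4.6 | 558 | 0.2%
4.7 | 415 | 0.1%
1.75 | 383 | 0.1%
4.8 | 317 | 0.1%
5 | 297 | 0.1%
4.9 | 261 | 0.1%
2.75 | 220 | 0.1%
5.1 | 202 | 0.1%
5.2 | 167 | 0.0%
70.00 | 162 | 0.0%
50.00 | 151 | 0.0%
2.00 | 150 | 0.0%
5.3 | 126 | 0.0%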
Top values (rank 1–20)
  1. 0 — 36,675
  2. 1 — 24,542
  3. 2 — 9,904
  4. 3 — 2,630
  5. -9 — 1,278
  6. 4.5 — 686
  7. 4 — 591
  8. 4.6 — 558
  9. 4.7 — 415
  10. 1.75 — 383
  11. 4.8 — 317
  12. 5 — 297
  13. 4.9 — 261
  14. 2.75 — 220
  15. 5.1 — 202
  16. 5.2 — 167
  17. 70.00 — 162
  18. 50.00 — 151
  19. 2.00 — 150
  20. 5.3 — 126
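`magnitude` is profiled as categorical because it stores differently formatted strings (`5`, `4.5`, `70.00`) plus 1,278 rows of `-9`. Before any numeric analysis the values need coercing, and because this file merges several source datasets, they should only be compared within a single `category`. The sketch below treats `-9` as a missing-value code; that is an assumption based on its sign and frequency, so confirm it against the originating dataset before relying on it.

```python
import math
from typing import Optional

# Assumption: -9 encodes "unknown"; add other codes if the source documents them.
MISSING_CODES = frozenset({-9.0})


def parse_magnitude(raw: Optional[str]) -> Optional[float]:
    """Coerce a magnitude string to float; None for blanks, junk or missing codes."""
    if raw is None:
        return None
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    if math.isnan(value) or value in MISSING_CODES:
        return None
    return value


print([parse_magnitude(v) for v in ["0", "4.5", "70.00", "-9", ""]])
```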

depth_km numeric

98.9% null · skew=+3.07 · 8.4% of non-null rows beyond 1.5 × IQR
rows: 354,770
null: 351,028 (98.9%)
unique: 1,505
min: -2.261
max: 248.700
mean: 23.712
median: 10.000
std: 28.790
q1: 10.000
q3: 29.102
iqr: 19.102
skew: 3.072
kurtosis: 11.611
n_outliers: 314
outlier_rate: 0.084
zero_rate: 2.67e-03
Histogram bins for depth_km (median: 10.0).
bin (km) | count
-2.261 – 4.013 | 219
4.013 – 10.29 | 1,730
10.29 – 16.56 | 370
16.56 – 22.84 | 258
22.84 – 29.11 | 230
29.11 – 35.38 | 250
35.38 – 41.66 | 167
41.66 – 47.93 | 129
47.93 – 54.21 | 56
54.21 – 60.48 | 31
60.48 – 66.75 | 43
66.75 – 73.03 | 27
73.03 – 79.3 | 19
79.3 – 85.58 | 29
85.58 – 91.85 | 21
91.85 – 98.12 | 19
98.12 – 104.4 | 24
104.4 – 110.7 | 12
110.7 – 116.9 | 9
116.9 – 123.2 | 14
123.2 – 129.5 | 13
129.5 – 135.8 | 19
135.8 – 142 | 14
142 – 148.3 | 5
148.3 – 154.6 | 6
154.6 – 160.9 | 0
160.9 – 167.1 | 7
167.1 – 173.4 | 4
173.4 – 179.7 | 0
179.7 – 186 | 1
186 – 192.2 | 5
192.2 – 198.5 | 1
198.5 – 204.8 | 3
204.8 – 211.1 | 2
211.1 – 217.3 | 3
217.3 – 223.6 | 1
223.6 – 229.9 | 0
229.9 – 236.2 | 0
236.2 – 242.4 | 0
242.4 – 248.7 | 1
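The printed bin edges are consistent with 40 equal-width bins spanning [min, max]: for depth_km the width is (248.7 - (-2.261)) / 40 ≈ 6.274 km, which reproduces edges such as 4.013, 10.29 and 16.56. A plain-Python sketch of that binning follows; `numpy.histogram(values, bins=40)` uses the same equal-width scheme when NumPy is available. That saturn bins exactly this way is an assumption drawn from the edges.

```python
def equal_width_edges(lo: float, hi: float, bins: int = 40) -> list[float]:
    """bins + 1 evenly spaced edges from lo to hi inclusive."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [hi]


def histogram(values: list[float], bins: int = 40) -> tuple[list[float], list[int]]:
    """Equal-width histogram; bins are half-open except the last, which includes max."""
    lo, hi = min(values), max(values)
    edges = equal_width_edges(lo, hi, bins)
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1  # the maximum lands in the last bin
    return edges, counts


edges = equal_width_edges(-2.261, 248.7)
print([round(e, 3) for e in edges[:4]])  # [-2.261, 4.013, 10.287, 16.561]
```

Equal-width bins over the full range are why the duration_seconds histogram puts 60,612 of 60,632 values in its first bin: a handful of extreme values set the width for everyone.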

place text

98.9% null
rows: 354,770
null: 351,028 (98.9%)
unique: 3,002
len_min: 4
len_max: 59
len_mean: 29.466
len_median: 29.000
len_p95: 36.000
word_mean: 6.293
word_median: 6.000
n_empty: 0
n_duplicates: 740
duplicate_rate: 0.198
vocab_size: 1,036
readability_flesch_mean: 69.914
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 5.34e-04
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for place (mean: 29.466).
chars | count
4 – 5 | 1
5 – 7 | 0
7 – 8 | 1
8 – 10 | 0
10 – 11 | 0
11 – 12 | 0
12 – 14 | 2
14 – 15 | 11
15 – 16 | 40
16 – 18 | 1
18 – 19 | 20
19 – 20 | 19
20 – 22 | 8
22 – 23 | 219
23 – 25 | 60
25 – 26 | 122
26 – 27 | 543
27 – 29 | 499
29 – 30 | 823
30 – 32 | 325
32 – 33 | 362
33 – 34 | 378
34 – 36 | 105
36 – 37 | 40
37 – 38 | 37
38 – 40 | 25
40 – 41 | 34
41 – 42 | 3
42 – 44 | 0
44 – 45 | 15
45 – 47 | 22
47 – 48 | 14
48 – 49 | 3
49 – 51 | 1
51 – 52 | 2
52 – 54 | 5
54 – 55 | 0
55 – 56 | 0
56 – 58 | 0
58 – 59 | 2
Sample values (first 10)
  1. 88 km N of Yakutat, Alaska
  2. 100 km SW of Topolobampo, Mexico
  3. Gulf of Alaska
  4. 8 km NNW of Tahoe Vista, California
  5. 88 km SSW of Nikolski, Alaska
  6. 33 km NE of Chignik, Alaska
  7. 204 km W of Port McNeill, Canada
  8. 69 km S of Nikolski, Alaska
  9. 181 km W of Ferndale, California
  10. 47 km NW of Ninilchik, Alaska
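For the text columns, `n_duplicates` matches non-null rows minus distinct values, and `duplicate_rate` divides that by the non-null count: for place, 3,742 - 3,002 = 740 and 740 / 3,742 ≈ 0.198 (the same arithmetic gives 68,018 for length_miles). The sketch below recomputes those fields plus two length statistics; treating `None` as null and measuring length with `len()` are assumptions about how saturn counts.

```python
import statistics
from typing import Optional


def text_profile(values: list[Optional[str]]) -> dict[str, float]:
    """Duplicate and length statistics over the non-null strings."""
    present = [v for v in values if v is not None]
    if not present:
        return {"n": 0, "unique": 0, "n_duplicates": 0, "duplicate_rate": 0.0}
    unique = len(set(present))
    n_dup = len(present) - unique  # every repeat beyond a value's first occurrence
    lengths = [len(v) for v in present]
    return {
        "n": len(present),
        "unique": unique,
        "n_duplicates": n_dup,
        "duplicate_rate": n_dup / len(present),
        "len_mean": statistics.fmean(lengths),
        "len_median": statistics.median(lengths),
    }


print(text_profile(["Gulf of Alaska", "Gulf of Alaska", "88 km N of Yakutat, Alaska", None]))
```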

earthquake_type categorical

98.9% null · top value is 99.9% of non-null rows
rows: 354,770
null: 351,028 (98.9%)
unique: 3
top_value: earthquake
top_rate: 0.999
cardinality: 3
entropy: 0.010
entropy_ratio: 6.40e-03
Top values for earthquake_type (3 unique shown, of 3 total).
value | count | share of all rows
earthquake | 3,739 | 1.1%
explosion | 2 | 0.0%
landslide | 1 | 0.0%
Top values (rank 1–3)
  1. earthquake — 3,739
  2. explosion — 2
  3. landslide — 1

volcano_type categorical

100.0% null (170 non-null rows) · top value is 100.0% of non-null rows
rows: 354,770
null: 354,600 (100.0%)
unique: 1
top_value: Unknown
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values for volcano_type (1 unique shown, of 1 total).
value | count | share of all rows
Unknown | 170 | 0.0%
Top values (rank 1)
  1. Unknown — 170

elevation_m unknown

no profiler for kind=unknown
rows: 354,770
null: 0 (0.0%)

status categorical

100.0% null (170 non-null rows) · top value is 100.0% of non-null rows
rows: 354,770
null: 354,600 (100.0%)
unique: 1
top_value: Unknown
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values for status (1 unique shown, of 1 total).
value | count | share of all rows
Unknown | 170 | 0.0%
Top values (rank 1)
  1. Unknown — 170

last_eruption categorical

100.0% null (170 non-null rows) · top value is 100.0% of non-null rows
rows: 354,770
null: 354,600 (100.0%)
unique: 1
top_value: Unknown
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values for last_eruption (1 unique shown, of 1 total).
value | count | share of all rows
Unknown | 170 | 0.0%
Top values (rank 1)
  1. Unknown — 170

injuries categorical

75.6% null
rows: 354,770
null: 268,187 (75.6%)
unique: 233
top_value: 0
top_rate: 0.854
cardinality: 233
entropy: 1.234
entropy_ratio: 0.157
Top values for injuries (20 unique shown, of 233 total).
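value | count | share of all rows
0 | 73,943 | 20.8%
1 | 3,402 | 1.0%
2 | 1,957 | 0.6%
3 | 1,118 | 0.3%
4 | 727 | 0.2%
5 | 625 | 0.2%
6 | 500 | 0.1%
10 | 362 | 0.1%
7 | 332 | 0.1%
8 | 292 | 0.1%
12 | 280 | 0.1%
9 | 202 | 0.1%
20 | 185 | 0.1%
15 | 181 | 0.1%
11 | 170 | 0.0%
13 | 136 | 0.0%
14 | 124 | 0.0%
30 | 116 | 0.0%
25 | 100 | 0.0%
16 | 91 | 0.0%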
Top values (rank 1–20)
  1. 0 — 73,943
  2. 1 — 3,402
  3. 2 — 1,957
  4. 3 — 1,118
  5. 4 — 727
  6. 5 — 625
  7. 6 — 500
  8. 10 — 362
  9. 7 — 332
  10. 8 — 292
  11. 12 — 280
  12. 9 — 202
  13. 20 — 185
  14. 15 — 181
  15. 11 — 170
  16. 13 — 136
  17. 14 — 124
  18. 30 — 116
  19. 25 — 100
  20. 16 — 91

fatalities categorical

75.6% null
rows: 354,770
null: 268,187 (75.6%)
unique: 57
top_value: 0
top_rate: 0.929
cardinality: 57
entropy: 0.513
entropy_ratio: 0.088
Top values for fatalities (20 unique shown, of 57 total).
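value | count | share of all rows
0 | 80,397 | 22.7%
1 | 4,053 | 1.1%
2 | 932 | 0.3%
3 | 357 | 0.1%
4 | 190 | 0.1%
5 | 121 | 0.0%
6 | 112 | 0.0%
7 | 71 | 0.0%
9 | 40 | 0.0%
10 | 39 | 0.0%
8 | 34 | 0.0%
11 | 34 | 0.0%
16 | 22 | 0.0%
13 | 19 | 0.0%
12 | 15 | 0.0%
17 | 13 | 0.0%
14 | 11 | 0.0%
21 | 9 | 0.0%
18 | 8 | 0.0%
20 | 8 | 0.0%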
Top values (rank 1–20)
  1. 0 — 80,397
  2. 1 — 4,053
  3. 2 — 932
  4. 3 — 357
  5. 4 — 190
  6. 5 — 121
  7. 6 — 112
  8. 7 — 71
  9. 9 — 40
  10. 10 — 39
  11. 8 — 34
  12. 11 — 34
  13. 16 — 22
  14. 13 — 19
  15. 12 — 15
  16. 17 — 13
  17. 14 — 11
  18. 21 — 9
  19. 18 — 8
  20. 20 — 8

length_miles text

100.0% of rows are a single word · 100.0% of rows are all-caps · 79.8% null · 95th-percentile length under 20 chars · 94.7% duplicate strings
rows: 354,770
null: 282,957 (79.8%)
unique: 3,795
len_min: 3
len_max: 8
len_mean: 3.688
len_median: 3.000
len_p95: 6.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 68,018
duplicate_rate: 0.947
vocab_size: 2,268
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Character-length distribution for length_miles (mean: 3.688).
chars | count
3 – 3 | 47,630
3 – 3 | 0
3 – 3 | 0
3 – 4 | 0
4 – 4 | 0
4 – 4 | 0
4 – 4 | 0
4 – 4 | 0
4 – 4 | 11,687
4 – 4 | 0
4 – 4 | 0
4 – 4 | 0
4 – 5 | 0
5 – 5 | 0
5 – 5 | 0
5 – 5 | 0
5 – 5 | 795
5 – 5 | 0
5 – 5 | 0
5 – 6 | 0
6 – 6 | 0
6 – 6 | 0
6 – 6 | 0
6 – 6 | 0
6 – 6 | 10,712
6 – 6 | 0
6 – 6 | 0
6 – 6 | 0
6 – 7 | 0
7 – 7 | 0
7 – 7 | 0
7 – 7 | 0
7 – 7 | 986
7 – 7 | 0
7 – 7 | 0
7 – 8 | 0
8 – 8 | 0
8 – 8 | 0
8 – 8 | 0
8 – 8 | 3
Sample values (first 10)
  1. 0.7
  2. 1.7
  3. 0.0200
  4. 12.3
  5. 0.2
  6. 7.6000
  7. 0.1
  8. 0.6200
  9. 1.4
  10. 0.1

width_yards categorical

79.8% null
rows: 354,770
null: 282,957 (79.8%)
unique: 437
top_value: 10
top_rate: 0.202
cardinality: 437
entropy: 4.493
entropy_ratio: 0.512
Top values for width_yards (20 unique shown, of 437 total).
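value | count | share of all rows
10 | 14,492 | 4.1%
50 | 10,603 | 3.0%
100 | 7,243 | 2.0%
30 | 4,882 | 1.4%
20 | 4,431 | 1.2%
200 | 3,046 | 0.9%
25 | 2,530 | 0.7%
150 | 2,234 | 0.6%
75 | 2,026 | 0.6%
40 | 2,006 | 0.6%
300 | 1,518 | 0.4%
33 | 1,161 | 0.3%
17 | 1,037 | 0.3%
400 | 977 | 0.3%
250 | 828 | 0.2%
23 | 812 | 0.2%
60 | 765 | 0.2%
440 | 682 | 0.2%
500 | 665 | 0.2%
80 | 605 | 0.2%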
Top values (rank 1–20)
  1. 10 — 14,492
  2. 50 — 10,603
  3. 100 — 7,243
  4. 30 — 4,882
  5. 20 — 4,431
  6. 200 — 3,046
  7. 25 — 2,530
  8. 150 — 2,234
  9. 75 — 2,026
  10. 40 — 2,006
  11. 300 — 1,518
  12. 33 — 1,161
  13. 17 — 1,037
  14. 400 — 977
  15. 250 — 828
  16. 23 — 812
  17. 60 — 765
  18. 440 — 682
  19. 500 — 665
  20. 80 — 605

type categorical

98.6% null · top value is 100.0% of non-null rows
rows: 354,770
null: 349,767 (98.6%)
unique: 1
top_value: hot_spring
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values for type (1 unique shown, of 1 total).
value | count | share of all rows
hot_spring | 5,003 | 1.4%
Top values (rank 1)
  1. hot_spring — 5,003

temperature categorical

34 singleton categories · 98.6% null · top value (empty string) is 97.4% of non-null rows
rows: 354,770
null: 349,767 (98.6%)
unique: 44
top_value: (empty string)
top_rate: 0.974
cardinality: 44
entropy: 0.257
entropy_ratio: 0.047
Top values for temperature (20 unique shown, of 44 total).
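value | count | share of all rows
(empty string) | 4,874 | 1.4%
hot | 73 | 0.0%
90 | 4 | 0.0%
100 | 4 | 0.0%
21 | 3 | 0.0%
95 | 3 | 0.0%
37 | 2 | 0.0%
43 | 2 | 0.0%
28 | 2 | 0.0%
40 | 2 | 0.0%
37-39° | 1 | 0.0%
35-37 °C | 1 | 0.0%
58°C | 1 | 0.0%
52,1 | 1 | 0.0%
25-30 | 1 | 0.0%
98°C | 1 | 0.0%
40-43° | 1 | 0.0%
77 | 1 | 0.0%
25°C | 1 | 0.0%
52 | 1 | 0.0%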
Top values (rank 1–20)
  1. (empty string) — 4,874
  2. hot — 73
  3. 90 — 4
  4. 100 — 4
  5. 21 — 3
  6. 95 — 3
  7. 37 — 2
  8. 43 — 2
  9. 28 — 2
  10. 40 — 2
  11. 37-39° — 1
  12. 35-37 °C — 1
  13. 58°C — 1
  14. 52,1 — 1
  15. 25-30 — 1
  16. 98°C — 1
  17. 40-43° — 1
  18. 77 — 1
  19. 25°C — 1
  20. 52 — 1
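`temperature` mixes blanks, words (`hot`), bare numbers, ranges (`37-39°`, `25-30`), explicit units (`58°C`) and what looks like a decimal comma (`52,1`). A possible normalisation sketch follows: it takes the midpoint of a range, reads a comma as a decimal point and returns `None` for non-numeric text. Both conventions are assumptions, and units are deliberately left unresolved because bare values such as 90 or 100 could be Fahrenheit.

```python
import re
from typing import Optional

_NUM = r"(\d+(?:[.,]\d+)?)"
_RANGE = re.compile(_NUM + r"\s*-\s*" + _NUM)
_SINGLE = re.compile(_NUM)


def _to_float(text: str) -> float:
    # Assumption: a comma is a decimal separator ("52,1" -> 52.1).
    return float(text.replace(",", "."))


def parse_temperature(raw: Optional[str]) -> Optional[float]:
    """Numeric reading of a free-text temperature; units are NOT converted."""
    if not raw:
        return None
    m = _RANGE.search(raw)
    if m:  # "37-39°" -> midpoint 38.0
        return (_to_float(m.group(1)) + _to_float(m.group(2))) / 2
    m = _SINGLE.search(raw)
    return _to_float(m.group(1)) if m else None


print([parse_temperature(v) for v in ["37-39°", "58°C", "52,1", "hot", ""]])
```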

source categorical

51.6% null
rows: 354,770
null: 182,920 (51.6%)
unique: 4
top_value: OpenStreetMap contributors
top_rate: 0.514
cardinality: 4
entropy: 1.545
entropy_ratio: 0.772
Top values for source (4 unique shown, of 4 total).
value | count | share of all rows
OpenStreetMap contributors | 88,396 | 24.9%
The Megalithic Portal | 60,028 | 16.9%
NOAA Storm Events Database | 14,770 | 4.2%
OpenStreetMap | 8,656 | 2.4%
Top values (rank 1–4)
  1. OpenStreetMap contributors — 88,396
  2. The Megalithic Portal — 60,028
  3. NOAA Storm Events Database — 14,770
  4. OpenStreetMap — 8,656

vessel_type categorical

14 singleton categories · 99.0% null
rows: 354,770
null: 351,117 (99.0%)
unique: 23
top_value: (empty string)
top_rate: 0.906
cardinality: 23
entropy: 0.576
entropy_ratio: 0.127
Top values for vessel_type (20 unique shown, of 23 total).
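value | count | share of all rows
(empty string) | 3,311 | 0.9%
ship | 275 | 0.1%
submarine | 18 | 0.0%
aircraft | 16 | 0.0%
plane | 10 | 0.0%
boat | 3 | 0.0%
schooner | 2 | 0.0%
car | 2 | 0.0%
sailboat | 2 | 0.0%
steamer | 1 | 0.0%
airplane | 1 | 0.0%
freightcar | 1 | 0.0%
train | 1 | 0.0%
paddle steamer | 1 | 0.0%
vehicle | 1 | 0.0%
motorbike | 1 | 0.0%
helicopter | 1 | 0.0%
Steam hoist | 1 | 0.0%
tractor | 1 | 0.0%
Airplane | 1 | 0.0%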
Top values (rank 1–20)
  1. (empty string) — 3,311
  2. ship — 275
  3. submarine — 18
  4. aircraft — 16
  5. plane — 10
  6. boat — 3
  7. schooner — 2
  8. car — 2
  9. sailboat — 2
  10. steamer — 1
  11. airplane — 1
  12. freightcar — 1
  13. train — 1
  14. paddle steamer — 1
  15. vehicle — 1
  16. motorbike — 1
  17. helicopter — 1
  18. Steam hoist — 1
  19. tractor — 1
  20. Airplane — 1
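Most non-null `vessel_type` values are empty strings (3,311 of 3,653), and the rest include case and synonym variants such as `Airplane`, `airplane`, `plane` and `aircraft`. A minimal normalisation sketch: trim, lowercase, treat blanks as missing, then fold synonyms. The synonym map here is hypothetical and deliberately small; extend it only after reviewing the values.

```python
from typing import Optional

# Hypothetical synonym map for illustration; not derived from the source schema.
SYNONYMS = {
    "airplane": "aircraft",
    "plane": "aircraft",
}


def normalize_vessel_type(raw: Optional[str]) -> Optional[str]:
    """Lowercase, trim, map blanks to None and fold known synonyms."""
    if raw is None:
        return None
    key = raw.strip().lower()
    if not key:
        return None  # the empty string is the most common value in this column
    return SYNONYMS.get(key, key)


print([normalize_vessel_type(v) for v in ["Airplane", "plane", "ship", ""]])
# ['aircraft', 'aircraft', 'ship', None]
```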

cargo categorical

13 singleton categories · 99.0% null · top value (empty string) is 99.4% of non-null rows
rows: 354,770
null: 351,117 (99.0%)
unique: 17
top_value: (empty string)
top_rate: 0.994
cardinality: 17
entropy: 0.073
entropy_ratio: 0.018
Top values for cargo (17 unique shown, of 17 total).
value | count | share of all rows
(empty string) | 3,632 | 1.0%
human | 4 | 0.0%
timber | 2 | 0.0%
coal | 2 | 0.0%
fertilizer | 1 | 0.0%
ore pellets | 1 | 0.0%
Fischkutter (Stahl) | 1 | 0.0%
seafood | 1 | 0.0%
fish | 1 | 0.0%
passengers | 1 | 0.0%
mexican army supposed drugs, but the crew and cargo was not found | 1 | 0.0%
iron ore | 1 | 0.0%
pulp | 1 | 0.0%
18 mines, 6 torpedos | 1 | 0.0%
sugar | 1 | 0.0%
containers;vehicles | 1 | 0.0%
container;oil | 1 | 0.0%
Top values (rank 1–17)
  1. (empty string) — 3,632
  2. human — 4
  3. timber — 2
  4. coal — 2
  5. fertilizer — 1
  6. ore pellets — 1
  7. Fischkutter (Stahl) — 1
  8. seafood — 1
  9. fish — 1
  10. passengers — 1
  11. mexican army supposed drugs, but the crew and cargo was not found — 1
  12. iron ore — 1
  13. pulp — 1
  14. 18 mines, 6 torpedos — 1
  15. sugar — 1
  16. containers;vehicles — 1
  17. container;oil — 1

peak_brightness_altitude_km categorical

99.8% null
rows: 354,770
null: 354,193 (99.8%)
unique: 224
top_value: 37.0
top_rate: 0.061
cardinality: 224
entropy: 7.187
entropy_ratio: 0.921
Top values for peak_brightness_altitude_km (20 unique shown, of 224 total).
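value | count | share of all rows
37.0 | 35 | 0.0%
31.5 | 15 | 0.0%
33.3 | 15 | 0.0%
38.0 | 11 | 0.0%
29.6 | 11 | 0.0%
35.2 | 11 | 0.0%
40.7 | 11 | 0.0%
32.0 | 10 | 0.0%
26.0 | 10 | 0.0%
32.4 | 9 | 0.0%
42.0 | 8 | 0.0%
26.5 | 8 | 0.0%
33.0 | 8 | 0.0%
25.0 | 7 | 0.0%
50.0 | 7 | 0.0%
36.0 | 7 | 0.0%
35.0 | 6 | 0.0%
39.0 | 6 | 0.0%
28.7 | 6 | 0.0%
37 | 6 | 0.0%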
Top values (rank 1–20)
  1. 37.0 — 35
  2. 31.5 — 15
  3. 33.3 — 15
  4. 38.0 — 11
  5. 29.6 — 11
  6. 35.2 — 11
  7. 40.7 — 11
  8. 32.0 — 10
  9. 26.0 — 10
  10. 32.4 — 9
  11. 42.0 — 8
  12. 26.5 — 8
  13. 33.0 — 8
  14. 25.0 — 7
  15. 50.0 — 7
  16. 36.0 — 7
  17. 35.0 — 6
  18. 39.0 — 6
  19. 28.7 — 6
  20. 37 — 6

velocity_km_s categorical

99.9% null
rows: 354,770
null: 354,421 (99.9%)
unique: 158
top_value: 13.6
top_rate: 0.017
cardinality: 158
entropy: 7.052
entropy_ratio: 0.966
Top values for velocity_km_s (20 unique shown, of 158 total).
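value | count | share of all rows
13.6 | 6 | 0.0%
15.2 | 6 | 0.0%
16.9 | 6 | 0.0%
17.8 | 5 | 0.0%
20.1 | 5 | 0.0%
17.4 | 5 | 0.0%
13.1 | 5 | 0.0%
16.2 | 5 | 0.0%
19.8 | 5 | 0.0%
16.5 | 5 | 0.0%
15.9 | 5 | 0.0%
14.1 | 5 | 0.0%
18.1 | 5 | 0.0%
14.9 | 5 | 0.0%
12.9 | 5 | 0.0%
12.2 | 5 | 0.0%
19.6 | 4 | 0.0%
17.0 | 4 | 0.0%
14.4 | 4 | 0.0%
18.3 | 4 | 0.0%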
Top values (rank 1–20)
  1. 13.6 — 6
  2. 15.2 — 6
  3. 16.9 — 6
  4. 17.8 — 5
  5. 20.1 — 5
  6. 17.4 — 5
  7. 13.1 — 5
  8. 16.2 — 5
  9. 19.8 — 5
  10. 16.5 — 5
  11. 15.9 — 5
  12. 14.1 — 5
  13. 18.1 — 5
  14. 14.9 — 5
  15. 12.9 — 5
  16. 12.2 — 5
  17. 19.6 — 4
  18. 17.0 — 4
  19. 14.4 — 4
  20. 18.3 — 4

energy_joules categorical

361 singleton categories · 99.8% null
rows: 354,770
null: 353,907 (99.8%)
unique: 518
top_value: 2.1
top_rate: 0.017
cardinality: 518
entropy: 8.634
entropy_ratio: 0.958
Top values for energy_joules (20 unique shown, of 518 total).
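value | count | share of all rows
2.1 | 15 | 0.0%
2.0 | 13 | 0.0%
3.2 | 13 | 0.0%
3.0 | 10 | 0.0%
2.8 | 8 | 0.0%
2.3 | 8 | 0.0%
3.5 | 8 | 0.0%
2.7 | 8 | 0.0%
2.2 | 8 | 0.0%
4.1 | 8 | 0.0%
3.3 | 7 | 0.0%
2.5 | 7 | 0.0%
4.0 | 6 | 0.0%
2.9 | 6 | 0.0%
3.1 | 6 | 0.0%
5.8 | 6 | 0.0%
10.4 | 6 | 0.0%
11.8 | 6 | 0.0%
3.6 | 5 | 0.0%
4.4 | 5 | 0.0%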
Top values (rank 1–20)
  1. 2.1 — 15
  2. 2.0 — 13
  3. 3.2 — 13
  4. 3.0 — 10
  5. 2.8 — 8
  6. 2.3 — 8
  7. 3.5 — 8
  8. 2.7 — 8
  9. 2.2 — 8
  10. 4.1 — 8
  11. 3.3 — 7
  12. 2.5 — 7
  13. 4.0 — 6
  14. 2.9 — 6
  15. 3.1 — 6
  16. 5.8 — 6
  17. 10.4 — 6
  18. 11.8 — 6
  19. 3.6 — 5
  20. 4.4 — 5

event_type categorical

95.8% null
rows: 354,770
null: 340,000 (95.8%)
unique: 17
top_value: Tornado
top_rate: 0.429
cardinality: 17
entropy: 2.336
entropy_ratio: 0.572
Top values for event_type (17 unique shown, of 17 total).
value | count | share of all rows
Tornado | 6,334 | 1.8%
Flash Flood | 2,358 | 0.7%
Thunderstorm Wind | 2,257 | 0.6%
Flood | 1,777 | 0.5%
Hail | 1,246 | 0.4%
Lightning | 574 | 0.2%
Heavy Rain | 99 | 0.0%
Marine Strong Wind | 43 | 0.0%
Debris Flow | 43 | 0.0%
Marine Thunderstorm Wind | 25 | 0.0%
Marine High Wind | 5 | 0.0%
Dust Devil | 3 | 0.0%
Waterspout | 2 | 0.0%
Tropical Storm | 1 | 0.0%
High Wind | 1 | 0.0%
Heat | 1 | 0.0%
Marine Lightning | 1 | 0.0%
Top values (rank 1–17)
  1. Tornado — 6,334
  2. Flash Flood — 2,358
  3. Thunderstorm Wind — 2,257
  4. Flood — 1,777
  5. Hail — 1,246
  6. Lightning — 574
  7. Heavy Rain — 99
  8. Marine Strong Wind — 43
  9. Debris Flow — 43
  10. Marine Thunderstorm Wind — 25
  11. Marine High Wind — 5
  12. Dust Devil — 3
  13. Waterspout — 2
  14. Tropical Storm — 1
  15. High Wind — 1
  16. Heat — 1
  17. Marine Lightning — 1

damage_property text

100.0% of rows are a single word · 87.2% of rows are all-caps · 95.8% null · 95th-percentile length under 20 chars · 93.1% duplicate strings
rows: 354,770
null: 340,000 (95.8%)
unique: 1,014
len_min: 0
len_max: 8
len_mean: 4.381
len_median: 5.000
len_p95: 7.000
word_mean: 1.000
word_median: 1.000
n_empty: 368
n_duplicates: 13,756
duplicate_rate: 0.931
vocab_size: 1,013
readability_flesch_mean: 116.977
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.872
boilerplate_rate: 0.000
Character-length distribution for damage_property (mean: 4.381).
chars | count
0 – 0 | 368
0 – 0 | 0
0 – 1 | 0
1 – 1 | 0
1 – 1 | 0
1 – 1 | 264
1 – 1 | 0
1 – 2 | 0
2 – 2 | 0
2 – 2 | 0
2 – 2 | 1,252
2 – 2 | 0
2 – 3 | 0
3 – 3 | 0
3 – 3 | 0
3 – 3 | 1,172
3 – 3 | 0
3 – 4 | 0
4 – 4 | 0
4 – 4 | 0
4 – 4 | 3,414
4 – 4 | 0
4 – 5 | 0
5 – 5 | 0
5 – 5 | 0
5 – 5 | 6,075
5 – 5 | 0
5 – 6 | 0
6 – 6 | 0
6 – 6 | 0
6 – 6 | 1,450
6 – 6 | 0
6 – 7 | 0
7 – 7 | 0
7 – 7 | 0
7 – 7 | 514
7 – 7 | 0
7 – 8 | 0
8 – 8 | 0
8 – 8 | 261
Sample values (first 10)
  1. 250K
  2. 40.00M
  3. 3.00M
  4. 20.00K
  5. 3.17M
  6. 15.00M
  7. 1.00M
  8. 0.00K
  9. 0.00K
  10. 25K
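`damage_property` is stored as text with magnitude suffixes (`250K`, `40.00M`), so it cannot be summed or sorted as-is. The sketch below converts it to whole dollars, assuming K, M and B mean thousands, millions and billions as in the NOAA Storm Events Database listed under `source`; verify that mapping against NOAA's own documentation. Empty strings (368 rows here) become `None` rather than 0, so a blank field is not conflated with an explicit `0.00K`.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed suffix meanings (thousands, millions, billions of dollars).
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_damage(raw: Optional[str]) -> Optional[int]:
    """'250K' -> 250000, '3.17M' -> 3170000; blank or unparseable -> None."""
    if raw is None:
        return None
    text = raw.strip().upper()
    if not text:
        return None
    if text[-1] in MULTIPLIERS:
        number, multiplier = text[:-1], MULTIPLIERS[text[-1]]
    else:
        number, multiplier = text, 1  # plain dollar amount
    try:
        # Decimal keeps the written digits exact before scaling;
        # int() truncates any fraction of a dollar.
        return int(Decimal(number) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None


print([parse_damage(v) for v in ["250K", "40.00M", "0.00K", ""]])
```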

cave_type categorical

3 singleton categories · 100.0% null (41 non-null rows)
rows: 354,770
null: 354,729 (100.0%)
unique: 5
top_value: pit
top_rate: 0.878
cardinality: 5
entropy: 0.769
entropy_ratio: 0.331
Top values for cave_type (5 unique shown, of 5 total).
value | count | share of all rows
pit | 36 | 0.0%
ponor | 2 | 0.0%
showcave | 1 | 0.0%
sinkhole | 1 | 0.0%
overhang | 1 | 0.0%
Top values (rank 1–5)
  1. pit — 36
  2. ponor — 2
  3. showcave — 1
  4. sinkhole — 1
  5. overhang — 1

cave_length_m categorical

158 singleton categories · 99.8% null
rows: 354,770
null: 354,128 (99.8%)
unique: 237
top_value: 5
top_rate: 0.050
cardinality: 237
entropy: 6.919
entropy_ratio: 0.877
Top values for cave_length_m (20 unique shown, of 237 total).
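value | count | share of all rows
5 | 32 | 0.0%
6 | 26 | 0.0%
10 | 25 | 0.0%
3 | 23 | 0.0%
4 | 23 | 0.0%
7 | 20 | 0.0%
8 | 19 | 0.0%
15 | 16 | 0.0%
20 | 14 | 0.0%
12 | 13 | 0.0%
30 | 13 | 0.0%
2 | 11 | 0.0%
11 | 9 | 0.0%
60 | 8 | 0.0%
4.5 | 8 | 0.0%
13 | 8 | 0.0%
16 | 8 | 0.0%
17 | 8 | 0.0%
25 | 8 | 0.0%
9 | 7 | 0.0%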
Top values (rank 1–20)
  1. 5 — 32
  2. 6 — 26
  3. 10 — 25
  4. 3 — 23
  5. 4 — 23
  6. 7 — 20
  7. 8 — 19
  8. 15 — 16
  9. 20 — 14
  10. 12 — 13
  11. 30 — 13
  12. 2 — 11
  13. 11 — 9
  14. 60 — 8
  15. 4.5 — 8
  16. 13 — 8
  17. 16 — 8
  18. 17 — 8
  19. 25 — 8
  20. 9 — 7

cave_depth_m categorical

88 singleton categories · 99.9% null
rows: 354,770
null: 354,472 (99.9%)
unique: 124
top_value: 0
top_rate: 0.211
cardinality: 124
entropy: 5.797
entropy_ratio: 0.834
Top values for cave_depth_m (20 unique shown, of 124 total).
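value | count | share of all rows
0 | 63 | 0.0%
10 | 13 | 0.0%
3 | 11 | 0.0%
1 | 9 | 0.0%
5 | 9 | 0.0%
4 | 8 | 0.0%
25 | 7 | 0.0%
30 | 6 | 0.0%
6 | 6 | 0.0%
2 | 6 | 0.0%
11 | 5 | 0.0%
35 | 5 | 0.0%
28 | 4 | 0.0%
14 | 4 | 0.0%
40 | 4 | 0.0%
70 | 4 | 0.0%
12 | 3 | 0.0%
8 | 3 | 0.0%
15 | 3 | 0.0%
23 | 3 | 0.0%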
Top values (rank 1–20)
  1. 0 — 63
  2. 10 — 13
  3. 3 — 11
  4. 1 — 9
  5. 5 — 9
  6. 4 — 8
  7. 25 — 7
  8. 30 — 6
  9. 6 — 6
  10. 2 — 6
  11. 11 — 5
  12. 35 — 5
  13. 28 — 4
  14. 14 — 4
  15. 40 — 4
  16. 70 — 4
  17. 12 — 3
  18. 8 — 3
  19. 15 — 3
  20. 23 — 3

access categorical

98.0% null
rows: 354,770
null: 347,515 (98.0%)
unique: 20
top_value: yes
top_rate: 0.379
cardinality: 20
entropy: 2.234
entropy_ratio: 0.517
Top values for access (20 unique shown, of 20 total).
value | count | share of all rows
yes | 2,753 | 0.8%
no | 2,279 | 0.6%
private | 830 | 0.2%
permit | 586 | 0.2%
permissive | 448 | 0.1%
customers | 273 | 0.1%
unknown | 51 | 0.0%
destination | 11 | 0.0%
restricted | 9 | 0.0%
tidal | 2 | 0.0%
request | 2 | 0.0%
key | 2 | 0.0%
discouraged | 2 | 0.0%
designated | 1 | 0.0%
official | 1 | 0.0%
forestry | 1 | 0.0%
agricultural | 1 | 0.0%
guided | 1 | 0.0%
university | 1 | 0.0%
cancello_all'ingresso | 1 | 0.0%
Top values (rank 1–20)
  1. yes — 2,753
  2. no — 2,279
  3. private — 830
  4. permit — 586
  5. permissive — 448
  6. customers — 273
  7. unknown — 51
  8. destination — 11
  9. restricted — 9
  10. tidal — 2
  11. request — 2
  12. key — 2
  13. discouraged — 2
  14. designated — 1
  15. official — 1
  16. forestry — 1
  17. agricultural — 1
  18. guided — 1
  19. university — 1
  20. cancello_all'ingresso — 1

cave_ref text

93.6% of rows are a single word · 85.6% of rows are all-caps · 97.9% null · 95th-percentile length under 20 chars
rows: 354,770
null: 347,184 (97.9%)
unique: 7,162
len_min: 1
len_max: 38
len_mean: 6.341
len_median: 7.000
len_p95: 8.000
word_mean: 1.068
word_median: 1.000
n_empty: 0
n_duplicates: 424
duplicate_rate: 0.056
vocab_size: 7,005
readability_flesch_mean: 117.765
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.936
allcaps_rate: 0.856
boilerplate_rate: 0.000
Character-length distribution for cave_ref (mean: 6.341).
chars | count
1 – 2 | 36
2 – 3 | 210
3 – 4 | 356
4 – 5 | 954
5 – 6 | 412
6 – 7 | 1,123
7 – 7 | 2,763
7 – 8 | 1,372
8 – 9 | 232
9 – 10 | 29
10 – 11 | 44
11 – 12 | 28
12 – 13 | 11
13 – 14 | 0
14 – 15 | 2
15 – 16 | 2
16 – 17 | 3
17 – 18 | 3
18 – 19 | 1
19 – 20 | 1
20 – 20 | 1
20 – 21 | 0
21 – 22 | 0
22 – 23 | 0
23 – 24 | 0
24 – 25 | 0
25 – 26 | 0
26 – 27 | 0
27 – 28 | 2
28 – 29 | 0
29 – 30 | 0
30 – 31 | 0
31 – 32 | 0
32 – 32 | 0
32 – 33 | 0
33 – 34 | 0
34 – 35 | 0
35 – 36 | 0
36 – 37 | 0
37 – 38 | 1
Sample values (first 10)
  1. C 23
  2. 3125
  3. JP-88
  4. 12
  5. 5381-116
  6. 5372-60
  7. 1862/15a
  8. 102
  9. 1008
  10. 1868/28

osm_id numeric

75.1% null
rows: 354,770
null: 266,374 (75.1%)
unique: 88,395
min: 1,334,095
max: 13,469,693,701
mean: 6,182,550,153
median: 6,047,018,322
std: 3,992,760,871
q1: 2,627,612,475
q3: 9,530,325,718
iqr: 6,902,713,242
skew: 0.132
kurtosis: -1.228
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
Histogram bins for osm_id (median: 6,047,018,322.5).
bin | count
1.334e+06 – 3.38e+08 | 3,968
3.38e+08 – 6.748e+08 | 2,767
6.748e+08 – 1.011e+09 | 3,230
1.011e+09 – 1.348e+09 | 3,501
1.348e+09 – 1.685e+09 | 3,041
1.685e+09 – 2.022e+09 | 2,397
2.022e+09 – 2.358e+09 | 1,663
2.358e+09 – 2.695e+09 | 1,794
2.695e+09 – 3.032e+09 | 2,343
3.032e+09 – 3.368e+09 | 2,457
3.368e+09 – 3.705e+09 | 2,751
3.705e+09 – 4.042e+09 | 2,046
4.042e+09 – 4.379e+09 | 1,826
4.379e+09 – 4.715e+09 | 2,556
4.715e+09 – 5.052e+09 | 2,957
5.052e+09 – 5.389e+09 | 2,055
5.389e+09 – 5.725e+09 | 1,481
5.725e+09 – 6.062e+09 | 1,420
6.062e+09 – 6.399e+09 | 2,234
6.399e+09 – 6.736e+09 | 1,597
6.736e+09 – 7.072e+09 | 2,035
7.072e+09 – 7.409e+09 | 2,535
7.409e+09 – 7.746e+09 | 2,516
7.746e+09 – 8.082e+09 | 1,909
8.082e+09 – 8.419e+09 | 2,018
8.419e+09 – 8.756e+09 | 1,425
8.756e+09 – 9.092e+09 | 2,194
9.092e+09 – 9.429e+09 | 1,852
9.429e+09 – 9.766e+09 | 3,471
9.766e+09 – 1.01e+10 | 2,391
1.01e+10 – 1.044e+10 | 1,034
1.044e+10 – 1.078e+10 | 1,192
1.078e+10 – 1.111e+10 | 1,627
1.111e+10 – 1.145e+10 | 3,575
1.145e+10 – 1.179e+10 | 1,447
1.179e+10 – 1.212e+10 | 1,836
1.212e+10 – 1.246e+10 | 1,355
1.246e+10 – 1.28e+10 | 2,246
1.28e+10 – 1.313e+10 | 1,582
1.313e+10 – 1.347e+10 | 2,072

osm_type categorical

75.1% null · top value is 96.4% of non-null rows
rows: 354,770
null: 266,374 (75.1%)
unique: 3
top_value: node
top_rate: 0.964
cardinality: 3
entropy: 0.250
entropy_ratio: 0.158
Top values for osm_type (3 unique shown, of 3 total).
value | count | share of all rows
node | 85,204 | 24.0%
way | 2,560 | 0.7%
relation | 632 | 0.2%
Top values (rank 1–3)
  1. node — 85,204
  2. way — 2,560
  3. relation — 632
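`osm_type` and `osm_id` together identify an OpenStreetMap element, which can be opened at `https://www.openstreetmap.org/<type>/<id>`. The largest id here, 13,469,693,701, exceeds the 32-bit signed maximum (2,147,483,647), so ids need 64-bit integer storage. The helper name below is hypothetical.

```python
from typing import Union

VALID_OSM_TYPES = {"node", "way", "relation"}


def osm_url(osm_type: str, osm_id: Union[int, float]) -> str:
    """Browse URL for an OpenStreetMap element."""
    if osm_type not in VALID_OSM_TYPES:
        raise ValueError(f"unexpected osm_type: {osm_type!r}")
    # JSON loaders may return floats; every id here is far below 2**53,
    # so converting to int is exact.
    return f"https://www.openstreetmap.org/{osm_type}/{int(osm_id)}"


print(osm_url("node", 1_334_095))
# https://www.openstreetmap.org/node/1334095
```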

place_type categorical

31 singleton categories · 94.9% null
rows: 354,770
null: 336,616 (94.9%)
unique: 48
top_value: hamlet
top_rate: 0.666
cardinality: 48
entropy: 1.498
entropy_ratio: 0.268
Top values for place_type (20 unique shown, of 48 total).
value | count | share of all rows
hamlet | 12,086 | 3.4%
isolated_dwelling | 2,977 | 0.8%
village | 2,388 | 0.7%
locality | 251 | 0.1%
yes | 131 | 0.0%
farm | 122 | 0.0%
neighbourhood | 73 | 0.0%
town | 38 | 0.0%
suburb | 23 | 0.0%
quarter | 7 | 0.0%
square | 7 | 0.0%
island | 4 | 0.0%
local | 4 | 0.0%
allotments | 4 | 0.0%
house | 3 | 0.0%
city | 3 | 0.0%
islet | 2 | 0.0%
county | 1 | 0.0%
bus_station | 1 | 0.0%
hamtel | 1 | 0.0%
Top values (rank 1–20)
  1. hamlet — 12,086
  2. isolated_dwelling — 2,977
  3. village — 2,388
  4. locality — 251
  5. yes — 131
  6. farm — 122
  7. neighbourhood — 73
  8. town — 38
  9. suburb — 23
  10. quarter — 7
  11. square — 7
  12. island — 4
  13. local — 4
  14. allotments — 4
  15. house — 3
  16. city — 3
  17. islet — 2
  18. county — 1
  19. bus_station — 1
  20. hamtel — 1

abandoned_year categorical

97 singleton categories · 99.7% null
rows: 354,770
null: 353,545 (99.7%)
unique: 147
top_value: yes
top_rate: 0.436
cardinality: 147
entropy: 2.939
entropy_ratio: 0.408
Top values for abandoned_year (20 unique shown, of 147 total).
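value | count | share of all rows
yes | 534 | 0.2%
village | 433 | 0.1%
2022 | 20 | 0.0%
1986 | 11 | 0.0%
hamlet | 9 | 0.0%
1978 | 6 | 0.0%
1974 | 6 | 0.0%
1987 | 5 | 0.0%
2023 | 4 | 0.0%
1950 | 4 | 0.0%
1985 | 4 | 0.0%
1983 | 3 | 0.0%
~1500 | 3 | 0.0%
2013-12-02 | 3 | 0.0%
isolated_dwelling | 3 | 0.0%
2022-12-26 | 3 | 0.0%
1946 | 3 | 0.0%
1938 | 3 | 0.0%
1955 | 3 | 0.0%
1962 | 3 | 0.0%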
Top values (rank 1–20)
  1. yes — 534
  2. village — 433
  3. 2022 — 20
  4. 1986 — 11
  5. hamlet — 9
  6. 1978 — 6
  7. 1974 — 6
  8. 1987 — 5
  9. 2023 — 4
  10. 1950 — 4
  11. 1985 — 4
  12. 1983 — 3
  13. ~1500 — 3
  14. 2013-12-02 — 3
  15. isolated_dwelling — 3
  16. 2022-12-26 — 3
  17. 1946 — 3
  18. 1938 — 3
  19. 1955 — 3
  20. 1962 — 3
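`abandoned_year` holds years (`1986`), approximate years (`~1500`), full dates (`2013-12-02`) and non-year text (`yes`, `village`, `hamlet`); `yes` and `village` alone account for 967 of the 1,225 non-null rows. A sketch that takes the first standalone 3- or 4-digit run as the year and otherwise returns `None` follows. It drops the `~` uncertainty marker, a trade-off you may not want to make.

```python
import re
from typing import Optional

# A standalone run of 3 or 4 digits, e.g. "1986", "~1500", "2013-12-02".
_YEAR = re.compile(r"(?<!\d)(\d{3,4})(?!\d)")


def extract_year(raw: Optional[str]) -> Optional[int]:
    """First plausible year in the string, or None for text like 'yes'."""
    if not raw:
        return None
    m = _YEAR.search(raw)
    return int(m.group(1)) if m else None


print([extract_year(v) for v in ["1986", "~1500", "2013-12-02", "yes"]])
```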

abandoned_reason unknown

no profiler for kind=unknown
rows: 354,770
null: 0 (0.0%)

former_population categorical

99.3% null
rows: 354,770
null: 352,243 (99.3%)
unique: 75
top_value: 0
top_rate: 0.495
cardinality: 75
entropy: 2.605
entropy_ratio: 0.418
Top values for former_population (20 unique shown, of 75 total).
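value | count | share of all rows
0 | 1,251 | 0.4%
2010 | 574 | 0.2%
2024 | 244 | 0.1%
2010-10-14 | 81 | 0.0%
2008 | 69 | 0.0%
2021-08-31 | 53 | 0.0%
1999 | 37 | 0.0%
2009 | 12 | 0.0%
1 | 11 | 0.0%
2021 | 11 | 0.0%
2021-09-01 | 10 | 0.0%
2021-10-01 | 10 | 0.0%
1989 | 9 | 0.0%
2018-01-01 | 9 | 0.0%
2005 | 8 | 0.0%
2016 | 7 | 0.0%
10 | 7 | 0.0%
3 | 6 | 0.0%
2001 | 6 | 0.0%
2 | 6 | 0.0%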
Top values (rank 1–20)
  1. 0 — 1,251
  2. 2010 — 574
  3. 2024 — 244
  4. 2010-10-14 — 81
  5. 2008 — 69
  6. 2021-08-31 — 53
  7. 1999 — 37
  8. 2009 — 12
  9. 1 — 11
  10. 2021 — 11
  11. 2021-09-01 — 10
  12. 2021-10-01 — 10
  13. 1989 — 9
  14. 2018-01-01 — 9
  15. 2005 — 8
  16. 2016 — 7
  17. 10 — 7
  18. 3 — 6
  19. 2001 — 6
  20. 2 — 6
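Several of the most frequent `former_population` values are four-digit years (`2010`, `2024`) or ISO dates (`2010-10-14`) rather than head counts, which suggests that some source rows put a date in this field. A hypothetical triage helper is sketched below. The 1800 to 2100 "year-like" window is an arbitrary assumption, and a genuine former population of 2,010 would be misflagged, so use it to route values for review rather than to drop them.

```python
import re
from typing import Optional

_ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
YEAR_WINDOW = range(1800, 2101)  # assumption: 4-digit values in this window are probably years


def classify_population(raw: Optional[str]) -> str:
    """Label a raw value as 'missing', 'date', 'year-like', 'count' or 'other'."""
    text = (raw or "").strip()
    if not text:
        return "missing"
    if _ISO_DATE.match(text):
        return "date"
    if text.isdecimal():  # isdecimal() only accepts characters int() can parse
        n = int(text)
        return "year-like" if len(text) == 4 and n in YEAR_WINDOW else "count"
    return "other"


print([classify_population(v) for v in ["0", "2010", "2010-10-14", "10", ""]])
```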

heritage categorical

5 singleton categories · 100.0% null (7 non-null rows)
rows: 354,770
null: 354,763 (100.0%)
unique: 6
top_value: 2
top_rate: 0.286
cardinality: 6
entropy: 2.522
entropy_ratio: 0.976
Top values for heritage (6 unique shown, of 6 total).
value | count | share of all rows
2 | 2 | 0.0%
8 | 1 | 0.0%
yes | 1 | 0.0%
4 | 1 | 0.0%
district | 1 | 0.0%
3 | 1 | 0.0%
Top values (rank 1–6)
  1. 2 — 2
  2. 8 — 1
  3. yes — 1
  4. 4 — 1
  5. district — 1
  6. 3 — 1