saturn

/home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json · 6,914 rows · sample n=6,914 · seed 42 · 2026-06-21T22:43:45+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json
Total rows: 6,914
Profiled sample: 6,914
Columns: 14
Generated: 2026-06-21T22:43:45+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is an OpenStreetMap-derived catalogue of 6,914 shipwrecks and related maritime hazards mapped worldwide. The first columns to explore are `type` and `seamark_type`: roughly three quarters of rows are labelled simply 'shipwreck' in `type` (73.5%) or 'wreck' in `seamark_type` (73.1% of all rows, 78.3% of non-null values), with a long tail of submarines, aircraft, barges, and other vessels worth examining. A second point of interest is the high null rate across the descriptive fields, `heritage` (99.8% null), `year_sunk` (99.5% null), and `wikipedia` (95.5% null): rich context exists for only a small fraction of wrecks, so the dataset is far more useful as a spatial inventory than as a historical record. Where the `access` column is populated (509 rows), most wrecks are tagged as open ('yes', 67%), but a meaningful share require a permit or are private (54 rows, about 11%), which could interest dive-site analysts.
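The null-rate comparison above can be reproduced directly from the source file. A minimal sketch, assuming the JSON holds a top-level array of flat objects keyed by the column names used in this report (the report does not state the file's layout) and counting both absent keys and explicit nulls as null, which is an assumption about how saturn counts them; `load_records` and `null_rates` are hypothetical helpers:

```python
import json
from collections import Counter


def load_records(path):
    """Load the profiled file, assuming it is a JSON array of flat objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def null_rates(records, columns):
    """Fraction of records per column whose value is missing or JSON null."""
    missing = Counter()
    for rec in records:
        for col in columns:
            # An absent key and an explicit null are both counted as null here.
            if rec.get(col) is None:
                missing[col] += 1
    total = len(records)
    return {col: missing[col] / total for col in columns} if total else {}


# Tiny in-memory stand-in for the real file. On the full data one would call:
# null_rates(load_records("/home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json"),
#            ["heritage", "year_sunk", "wikipedia"])
demo = [
    {"name": "Falke", "heritage": None},
    {"name": "City of Waterford"},
    {"name": "Shipwreck 1389423810", "heritage": "2"},
    {"name": "Shipwreck 9160141348", "heritage": None},
]
print(null_rates(demo, ["name", "heritage"]))  # {'name': 0.0, 'heritage': 0.75}
```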

name high anthropic:default

This column holds wreck names. The dominant top words are 'shipwreck' (5,032 occurrences across 6,914 rows), 'wreck', 'ss', 'uss', and 'hms'. With 6,841 unique values out of 6,914 rows and no nulls, it is essentially a name/label field. The 73 duplicates (1.06% duplicate rate) could be distinct wrecks that share a name, or the same wreck mapped more than once. Lengths cluster tightly (median 20, p95 21 characters) with a long tail reaching 153. The sample values suggest why: many rows appear to carry a generated label of the form 'Shipwreck <number>', which is exactly 20 or 21 characters long for a 10- or 11-digit number. The tight cluster therefore reflects placeholder labels rather than concise vessel names, while genuine names such as 'Falke' or 'City of Waterford' account for the variation in length.
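A quick way to measure how much of the name column is placeholder text is to match the 'Shipwreck <digits>' form seen in the sample values. A minimal sketch; the pattern is an assumption inferred from those samples (case-sensitive, digits only), and `is_placeholder_name` and `placeholder_share` are hypothetical helpers:

```python
import re

# Assumed placeholder form: the word "Shipwreck", one space, then only digits,
# as in the sample values "Shipwreck 1389423810" and "Shipwreck 10053641911".
PLACEHOLDER = re.compile(r"Shipwreck \d+")


def is_placeholder_name(name: str) -> bool:
    """True if the whole name matches the generated-label pattern."""
    return PLACEHOLDER.fullmatch(name) is not None


def placeholder_share(names) -> float:
    """Fraction of names that look like generated labels."""
    names = list(names)
    if not names:
        return 0.0
    return sum(is_placeholder_name(n) for n in names) / len(names)


sample = ["Falke", "Shipwreck 1389423810", "Shipwreck 10053641911", "City of Waterford"]
print(placeholder_share(sample))  # 0.5
# "Shipwreck " is 10 characters, so 10- and 11-digit numbers give lengths 20 and 21.
print(len(sample[1]), len(sample[2]))  # 20 21
```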

lat high anthropic:default

This column holds latitude, spanning -77.42° (near Antarctica) to 82.17° (high Arctic), with 6,902 unique values across 6,914 rows. The mean (33.15°) sits well below the median (43.85°), consistent with the left skew of -1.42: records cluster at northern mid-to-high latitudes, with a tail stretching into the tropics and the southern hemisphere. Roughly 12.5% of values (864 rows) are flagged as outliers by the 1.5 × IQR rule. Because the upper fence (about 94.8°) lies above the observed maximum, every flagged value must fall below the lower fence of about -14.3°, so the outliers are all southern-hemisphere locations south of roughly 14°S (the -77.42° minimum among them); none are in the Arctic.

lon high anthropic:default

This column holds longitude, spanning -179.28° to 179.45°, nearly the full valid range, and covering both hemispheres. The mean (3.07°) and median (8.32°) both lie modestly east of the Prime Meridian, hinting at a concentration of records in Europe and Africa, while the wide IQR (58.74°) and standard deviation (69.12°) confirm global scatter. A total of 806 rows (11.66%) are flagged as outliers. With quartiles of -40.75° and 17.99°, the 1.5 × IQR fences fall at about -128.9° and 106.1°, so the flagged rows lie either in the far eastern Pacific (west of about 129°W) or in East Asia, Australasia, and the western Pacific (east of about 106°E); most of the Americas sit inside the fences. These are genuine geographic extremes relative to the modal cluster, not erroneous values.
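The fence positions quoted for lat and lon can be recomputed from the reported quartiles with Tukey's 1.5 × IQR rule, which the stat blocks' "beyond 1.5 IQR" label suggests saturn applies (an assumption about its exact method). A minimal sketch; `iqr_fences` is a hypothetical helper:

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the lat and lon stat blocks of this report.
lat_lo, lat_hi = iqr_fences(26.582, 53.867)
lon_lo, lon_hi = iqr_fences(-40.753, 17.990)

# The upper lat fence exceeds the observed maximum (82.171), so every
# flagged latitude must lie below the lower fence.
print(f"lat: {lat_lo:.1f} .. {lat_hi:.1f} (max 82.171)")  # lat: -14.3 .. 94.8 (max 82.171)
print(f"lon: {lon_lo:.1f} .. {lon_hi:.1f}")  # lon: -128.9 .. 106.1
```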

year_sunk high anthropic:default

This column records the year (or date) a vessel sank, but it is almost entirely empty: 99.46% of the 6,914 rows are null, leaving only 37 non-null values. Their formats are wildly inconsistent: bare years ('1942', '1854'), full dates in several styles ('30 June 1890', 'June 7, 1928', '1937-09-02'), partial dates ('1963-02'), approximations ('~1700', '1490s'), and even a range ('1643..1663'), which makes normalisation non-trivial. The 37 populated rows hold 36 distinct values; only '1942' repeats, and it appears just twice.
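One way to bring these formats onto a common footing is to extract a single year and flag uncertain values separately. A minimal sketch under stated assumptions: the hypothetical `first_year` keeps only the first 3- or 4-digit number, so a range collapses to its start year and a decade such as '1490s' to its first year, and month and day are discarded. This is a lossy simplification, not a full date parser:

```python
import re
from typing import Optional

# A 3- or 4-digit run not embedded in a longer number; 1- and 2-digit
# days and months such as '7', '30' or '09' are skipped.
_YEAR = re.compile(r"(?<!\d)(\d{3,4})(?!\d)")


def first_year(raw: str) -> Optional[int]:
    """Best-effort year extraction: the first 3- or 4-digit number, else None."""
    match = _YEAR.search(raw)
    return int(match.group(1)) if match else None


def is_approximate(raw: str) -> bool:
    """Flag values that encode uncertainty, e.g. '~1700', '1490s', '1643..1663'."""
    s = raw.strip()
    return s.startswith("~") or ".." in s or re.fullmatch(r"\d{3,4}s", s) is not None


for raw in ["1942", "30 June 1890", "June 7, 1928", "1937-09-02",
            "1963-02", "1643..1663", "1490s", "~1700"]:
    print(f"{raw!r:>16} -> {first_year(raw)} approx={is_approximate(raw)}")
```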

type high anthropic:default

This column classifies wreck sites by vessel or object type, with 31 distinct categories across 6,914 records and no nulls. The distribution is dominated by 'shipwreck' (73.5% of records) and 'wreck' (19.5%), which together account for about 92.9% of entries; the remaining 29 categories share the other 7.1% (488 rows), of which 'ship' alone accounts for 381. Seventeen categories occur only once, which is what the long-tail alert reflects. The near-synonyms 'shipwreck', 'wreck', 'ship', 'boat', and 'barge', along with 'aircraft'/'plane'/'airplane' and 'vehicle'/'motor_vehicle'/'car', point to an inconsistent taxonomy that may need consolidation before modelling.
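A consolidation step could map the raw labels onto a handful of coarser groups before modelling. A minimal sketch; the `TYPE_GROUPS` mapping is an editorial assumption, not an OSM convention, `consolidate` is a hypothetical helper, and the counts are the top 14 values from the type stat block:

```python
from collections import Counter

# Hypothetical grouping; which labels belong together is a judgment call.
TYPE_GROUPS = {
    "shipwreck": "vessel", "wreck": "vessel", "ship": "vessel", "boat": "vessel",
    "barge": "vessel", "schooner": "vessel", "sailboat": "vessel",
    "battleship": "vessel", "steamer": "vessel", "paddle steamer": "vessel",
    "submarine": "submarine",
    "aircraft": "aircraft", "plane": "aircraft", "airplane": "aircraft",
    "vehicle": "land_vehicle", "motor_vehicle": "land_vehicle",
    "car": "land_vehicle", "motorbike": "land_vehicle",
    "train": "rail", "freightcar": "rail",
}


def consolidate(type_counts):
    """Collapse raw type counts into coarser groups; unmapped labels go to 'other'."""
    grouped = Counter()
    for raw, n in type_counts.items():
        grouped[TYPE_GROUPS.get(raw, "other")] += n
    return grouped


top = {"shipwreck": 5081, "wreck": 1345, "ship": 381, "barge": 27, "submarine": 18,
       "aircraft": 17, "plane": 10, "boat": 4, "vehicle": 3, "motor_vehicle": 3,
       "schooner": 2, "car": 2, "sailboat": 2, "battleship": 2}
print(consolidate(top))
```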

wikipedia high anthropic:default

This column stores Wikipedia article links for ships and aircraft, formatted as language-prefixed article titles (e.g., 'en:SS Edmund Fitzgerald', 'fr:Armorique (navire)'), occasionally with a section anchor ('en:Kroombit Tops National Park#Crash site'). The null rate is extremely high at 95.47%: only 313 of 6,914 rows have a Wikipedia reference. Among those, cardinality is very high (307 unique values across 313 non-null rows), and the top value appears only 4 times, indicating near-unique coverage and a long-tail distribution. Several languages are mixed (English 'en:', French 'fr:', Arabic 'ar:', Estonian 'et:'), which could complicate downstream lookups or joins.
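Turning the stored values into working links requires splitting off the language prefix and any section anchor. A minimal sketch; `wikipedia_url` is a hypothetical helper that assumes every value carries a prefix, and it builds URLs in the standard https://<lang>.wikipedia.org/wiki/<Title> form with spaces replaced by underscores and other characters percent-encoded:

```python
from urllib.parse import quote


def wikipedia_url(tag: str) -> str:
    """Turn an OSM-style 'lang:Title#Section' value into a Wikipedia URL.

    Assumes a language prefix is always present. Spaces become underscores and
    the rest is percent-encoded, so titles with parentheses or non-Latin
    scripts still produce valid URLs.
    """
    lang, _, rest = tag.partition(":")
    title, _, section = rest.partition("#")
    url = f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
    if section:
        url += "#" + quote(section.replace(" ", "_"))
    return url


print(wikipedia_url("en:SS Edmund Fitzgerald"))
# https://en.wikipedia.org/wiki/SS_Edmund_Fitzgerald
print(wikipedia_url("en:Kroombit Tops National Park#Crash site"))
# https://en.wikipedia.org/wiki/Kroombit_Tops_National_Park#Crash_site
```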

wikidata high anthropic:default

This column stores Wikidata entity identifiers (Q-codes) linking records to the Wikidata knowledge graph. The most striking signal is the 94.78% null rate: only 361 of 6,914 rows carry a Wikidata link, slightly more than the 313 with a Wikipedia link. Among the 353 unique Q-codes present, the distribution is nearly flat: the top value 'Q1286267' appears only 4 times, the entropy ratio is 0.998, and 347 codes are singletons. Each populated row therefore points to a distinct entity with little reuse.
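Because 361 rows carry a Q-code but only 313 carry a Wikipedia title, it may be worth checking how the two link columns overlap before relying on either. A minimal sketch, assuming records are dicts keyed by the column names in this report; `wikidata_url` and `link_coverage` are hypothetical helpers, and the URL follows Wikidata's standard item-page form:

```python
import re

# Wikidata item IDs: "Q" followed by a positive integer without leading zeros.
_QID = re.compile(r"Q[1-9][0-9]*")


def wikidata_url(qid: str) -> str:
    """Validate a Q-code and return the URL of its Wikidata item page."""
    if not _QID.fullmatch(qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


def link_coverage(records):
    """Count rows by which of the two link columns is populated."""
    counts = {"both": 0, "wikidata_only": 0, "wikipedia_only": 0, "neither": 0}
    for rec in records:
        has_wd = rec.get("wikidata") is not None
        has_wp = rec.get("wikipedia") is not None
        if has_wd and has_wp:
            counts["both"] += 1
        elif has_wd:
            counts["wikidata_only"] += 1
        elif has_wp:
            counts["wikipedia_only"] += 1
        else:
            counts["neither"] += 1
    return counts


print(wikidata_url("Q1286267"))  # https://www.wikidata.org/wiki/Q1286267
```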

description high anthropic:default

This column contains free-text descriptions of wrecks and related nautical features, with entries referencing WWII-era concrete barges, jetties and breakwaters, fishing boats, and abandoned craft. The most striking signal is the 94.87% null rate: only 355 rows have a description, so the column is of little use at scale. Those 355 rows hold 304 unique values with very high entropy (8.05 bits, ratio 0.976), indicating widely varied phrasing. Several languages are mixed: English alongside French ('Chaloupe abandonnée à terre', a small boat abandoned ashore), Spanish, and Polish.

heritage high anthropic:default

This column encodes a heritage designation with only 4 distinct values ('1', '2', 'no', 'yes'). The critical finding is a null rate of 99.81%: only 13 of 6,914 rows have any value, which makes the column nearly useless for modelling. Among those 13, '2' dominates (10 rows, 76.9%), while 'no', 'yes', and '1' appear once each. The mix of numeric and boolean strings may follow OpenStreetMap conventions rather than indicate an encoding error, since OSM's `heritage` key conventionally uses numeric protection levels (for example 1 for World Heritage, 2 for national protection) alongside plain yes/no; either way, the sample is negligible.

access high anthropic:default

This column encodes OpenStreetMap-style access tags, with values such as 'yes', 'no', 'permit', 'private', 'permissive', and 'customers'. The striking finding is a 92.64% null rate: only 509 of 6,914 rows carry a value. Among those, 'yes' dominates at 66.99% (341 rows), suggesting most tagged features are openly accessible, while 'no' accounts for 73 rows. Two values need care: 'unknown' (20 rows) is effectively another null, and 'foot' names a mode of travel rather than a permission level.
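Grouping the raw tags makes the open/restricted/closed split explicit. A minimal sketch; `ACCESS_GROUPS` is a hypothetical mapping (placing 'foot' under open is a judgment call), `access_shares` is a hypothetical helper, 'unknown' is dropped as missing, and the counts come from the access stat block:

```python
from collections import Counter

# Hypothetical grouping of OSM access values; 'unknown' is deliberately absent.
ACCESS_GROUPS = {
    "yes": "open", "permissive": "open", "foot": "open",
    "permit": "restricted", "private": "restricted", "customers": "restricted",
    "no": "closed",
}


def access_shares(counts):
    """Return each group's share of the rows whose access value is mapped."""
    grouped = Counter()
    for value, n in counts.items():
        group = ACCESS_GROUPS.get(value)
        if group is not None:  # skips 'unknown' and any unmapped value
            grouped[group] += n
    known = sum(grouped.values())
    return {g: n / known for g, n in grouped.items()} if known else {}


access = {"yes": 341, "no": 73, "permit": 27, "private": 27, "unknown": 20,
          "permissive": 17, "customers": 3, "foot": 1}
shares = access_shares(access)
print({g: f"{s:.1%}" for g, s in shares.items()})
```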

depth high anthropic:default

This column holds a numeric depth measurement (likely in metres, though the unit is not recorded) stored as a categorical string, with values such as '1.1', '7', and '19.2'. The most striking signal is a null rate of 77.36%: only 1,565 of 6,914 rows carry a value. It is worth investigating whether this missingness is structural (for example, depth not being applicable to certain record types) or a data-quality issue. Among populated rows, cardinality is very high (502 unique values) with an entropy ratio of 0.956, so no depth value dominates; the most common values ('12.4', '16', '18', '15.5', '19.2') are tied at only 11 occurrences each. The column should be cast to numeric before any modelling use.
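A defensive cast keeps unparseable strings visible instead of silently turning them into NaN. A minimal sketch; `parse_depth` is a hypothetical helper, and the sample list mixes real top values from the depth stat block with two invented edge cases (`None` and '5 m') to show the fallback:

```python
import math
from typing import Optional


def parse_depth(raw) -> Optional[float]:
    """Convert a depth string such as '12.4' or '16' to a float, or None.

    Assumes values are plain decimal numbers in a single, unstated unit;
    strings carrying units or ranges (e.g. '5 m', '3-4') come back as None
    so they can be reviewed separately.
    """
    if raw is None:
        return None
    try:
        value = float(str(raw).strip())
    except ValueError:
        return None
    # float() also accepts 'nan' and 'inf'; treat those as missing as well.
    return value if math.isfinite(value) else None


raw_depths = ["12.4", "16", "18", "15.5", "19.2", "1.1", None, "5 m"]
depths = [d for d in map(parse_depth, raw_depths) if d is not None]
print(depths)  # [12.4, 16.0, 18.0, 15.5, 19.2, 1.1]
```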

seamark_type high anthropic:default

This column contains a nautical seamark classification, most likely drawn from OpenStreetMap's `seamark:*` tagging (the OpenSeaMap schema). The distribution is dominated by 'wreck', which covers 78.3% of non-null values (5,055 rows, or 73.1% of all 6,914 rows); the next largest category, 'dangerous', covers 9.3% of non-null values (598 rows), giving a low entropy ratio of 0.307. Several values ('dangerous', 'non-dangerous', 'distributed_remains', 'hull_showing', 'mast_showing') read like wreck sub-categories rather than seamark types, so the column appears to flatten two levels of a hierarchy into one field. The 6.6% null rate (459 rows) warrants attention if completeness matters for navigation safety.
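The entropy figures in the stat blocks appear to be Shannon entropy in bits, and entropy_ratio appears to be that entropy divided by log2 of the number of distinct non-null values. This reading reproduces the reported numbers, but it is inferred rather than documented. A minimal sketch using counts copied from the heritage and seamark_type stat blocks; `entropy_bits` and `entropy_ratio` are hypothetical helpers:

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of a frequency table of non-null values."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum possible value, log2(number of categories)."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


heritage = [10, 1, 1, 1]  # '2', 'no', 'yes', '1'
seamark = [5055, 598, 358, 306, 56, 46, 14, 8, 7, 2, 1, 1, 1, 1, 1]
print(round(entropy_bits(heritage), 3), round(entropy_ratio(heritage), 3))  # 1.145 0.573
print(round(entropy_ratio(seamark), 3))  # 0.307
```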

osm_id high anthropic:default

This column is the OpenStreetMap (OSM) element identifier, a large integer assigned by OSM. All 6,914 rows have distinct values and none are null, so it works as a unique key in this extract; note, however, that OSM numbers nodes, ways, and relations in separate sequences, so `osm_id` is only guaranteed unique together with `osm_type`. The values range from about 13 million to 13.7 billion. Assuming the reported kurtosis (-1.45) is excess kurtosis, the distribution is flatter than a uniform one (-1.2), which hints at a two-humped shape rather than an even spread; a plausible cause is that way IDs occupy a much lower numeric range than node IDs. The median (3.1 billion) sits well below the mean (5.4 billion), in line with the mild positive skew (0.44). No outliers are flagged.
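Since OSM numbers each element type separately, a safer key is the pair (osm_type, osm_id), which also maps directly onto an openstreetmap.org browse URL. A minimal sketch; `OsmRef` and `find_cross_type_ids` are hypothetical names, and the IDs in the example are invented:

```python
from collections import defaultdict
from typing import NamedTuple


class OsmRef(NamedTuple):
    """An OSM element reference; IDs are unique only within one element type."""
    osm_type: str  # 'node', 'way' or 'relation'
    osm_id: int

    def url(self) -> str:
        return f"https://www.openstreetmap.org/{self.osm_type}/{self.osm_id}"


def find_cross_type_ids(refs):
    """Return IDs that occur under more than one element type."""
    types_by_id = defaultdict(set)
    for ref in refs:
        types_by_id[ref.osm_id].add(ref.osm_type)
    return sorted(i for i, types in types_by_id.items() if len(types) > 1)


refs = [OsmRef("node", 123), OsmRef("way", 123), OsmRef("node", 456)]
print(find_cross_type_ids(refs))  # [123]
print(refs[1].url())  # https://www.openstreetmap.org/way/123
```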

osm_type high anthropic:default

This column encodes the OpenStreetMap element type, distinguishing point features ('node') from linear or polygonal features ('way'); no relations appear. With only 2 distinct values across 6,914 rows and no nulls, it is a clean binary categorical. The split is uneven: 'node' accounts for 72.3% (5,000 rows) and 'way' for 27.7% (1,914 rows), which is consistent with OSM data where point features outnumber way geometries. The exactly round count of 5,000 nodes may be worth checking against any row limit in the extraction query.

Numeric correlation

name text

98.9% of rows are unique strings
rows: 6,914
null: 0 (0.0%)
unique: 6,841
len_min: 2
len_max: 153
len_mean: 18.354
len_median: 20.000
len_p95: 21.000
word_mean: 2.058
word_median: 2.000
n_empty: 0
n_duplicates: 73
duplicate_rate: 0.011
vocab_size: 7,602
readability_flesch_mean: 73.369
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.085
allcaps_rate: 0.014
boilerplate_rate: 0.000
Sample values (first 10)
  1. Falke
  2. Shipwreck 1389423810
  3. Shipwreck 10053641911
  4. Shipwreck 9160141348
  5. Shipwreck 9456262933
  6. Shipwreck 11826210406
  7. Shipwreck 9980843157
  8. Shipwreck 11826207717
  9. Shipwreck 9164313171
  10. City of Waterford

lat numeric

12.5% of rows beyond 1.5 × IQR
rows: 6,914
null: 0 (0.0%)
unique: 6,902
min: -77.425
max: 82.171
mean: 33.148
median: 43.852
std: 29.885
q1: 26.582
q3: 53.867
iqr: 27.285
skew: -1.417
kurtosis: 0.867
n_outliers: 864
outlier_rate: 0.125
zero_rate: 0.000

lon numeric

11.7% of rows beyond 1.5 × IQR
rows: 6,914
null: 0 (0.0%)
unique: 6,910
min: -179.283
max: 179.447
mean: 3.067
median: 8.322
std: 69.125
q1: -40.753
q3: 17.990
iqr: 58.744
skew: 0.509
kurtosis: 0.921
n_outliers: 806
outlier_rate: 0.117
zero_rate: 0.000

year_sunk categorical

35 singleton categories · 99.5% null
rows: 6,914
null: 6,877 (99.5%)
unique: 36
top_value: 1942
top_rate: 0.054
cardinality: 36
entropy: 5.155
entropy_ratio: 0.997
Top values (rank 1–20)
  1. 1942 — 2
  2. 30 June 1890 — 1
  3. 1854 — 1
  4. 1971 — 1
  5. 1937-09-02 — 1
  6. 1963-02 — 1
  7. 1643..1663 — 1
  8. 1982 — 1
  9. June 7, 1928 — 1
  10. 1435 — 1
  11. 1920-12-16 — 1
  12. 1490s — 1
  13. ~1700 — 1
  14. 20 April 1943 — 1
  15. 25 May 1963 — 1
  16. 1710 — 1
  17. 1915 — 1
  18. 1909 — 1
  19. 1951 — 1
  20. 1952 — 1

type categorical

17 singleton categories
rows: 6,914
null: 0 (0.0%)
unique: 31
top_value: shipwreck
top_rate: 0.735
cardinality: 31
entropy: 1.166
entropy_ratio: 0.235
Top values (rank 1–20)
  1. shipwreck — 5,081
  2. wreck — 1,345
  3. ship — 381
  4. barge — 27
  5. submarine — 18
  6. aircraft — 17
  7. plane — 10
  8. boat — 4
  9. vehicle — 3
  10. motor_vehicle — 3
  11. schooner — 2
  12. car — 2
  13. sailboat — 2
  14. battleship — 2
  15. steamer — 1
  16. airplane — 1
  17. freightcar — 1
  18. train — 1
  19. paddle steamer — 1
  20. motorbike — 1

wikipedia categorical

303 singleton categories · 95.5% null
rows: 6,914
null: 6,601 (95.5%)
unique: 307
top_value: en:SS Edmund Fitzgerald
top_rate: 0.013
cardinality: 307
entropy: 8.245
entropy_ratio: 0.998
Top values (rank 1–20)
  1. en:SS Edmund Fitzgerald — 4
  2. fr:Armorique (navire) — 2
  3. en:Curtiss C-46 Commando — 2
  4. en:USS Amesbury — 2
  5. en:SS America (1939) — 1
  6. ar:سفينة زيستل جورم — 1
  7. en:BOS 400 — 1
  8. en:New Carissa — 1
  9. en:MV Cita — 1
  10. en:SS Richard Montgomery — 1
  11. en:Kroombit Tops National Park#Crash site — 1
  12. en:Astron (ship) — 1
  13. en:SS Yongala — 1
  14. et:Raketa (laev, 1949) — 1
  15. en:USNS General Hoyt S. Vandenberg (T-AGM-10) — 1
  16. en:USS Oriskany (CV-34) — 1
  17. en:USS Massachusetts (BB-2) — 1
  18. en:Water Witch (schooner) — 1
  19. en:Burlington Bay Horse Ferry — 1
  20. en:Champlain II — 1

wikidata categorical

347 singleton categories · 94.8% null
rows: 6,914
null: 6,553 (94.8%)
unique: 353
top_value: Q1286267
top_rate: 0.011
cardinality: 353
entropy: 8.446
entropy_ratio: 0.998
Top values (rank 1–20)
  1. Q1286267 — 4
  2. Q959696 — 2
  3. Q215692 — 2
  4. Q2862787 — 2
  5. Q1145708 — 2
  6. Q11675753 — 2
  7. Q463091 — 1
  8. Q32276 — 1
  9. Q115709756 — 1
  10. Q14213801 — 1
  11. Q2877353 — 1
  12. Q7006376 — 1
  13. Q6719379 — 1
  14. Q41771616 — 1
  15. Q7394285 — 1
  16. Q7420193 — 1
  17. Q1359321 — 1
  18. Q4811601 — 1
  19. Q1424289 — 1
  20. Q1618842 — 1

description categorical

282 singleton categories · 94.9% null
rows: 6,914
null: 6,559 (94.9%)
unique: 304
top_value: WWII era concrete fuel barge converted into breakwater
top_rate: 0.039
cardinality: 304
entropy: 8.052
entropy_ratio: 0.976
Top values (rank 1–20)
  1. WWII era concrete fuel barge converted into breakwater — 14
  2. Wrecks — 7
  3. WWII concrete barge sunk as part of jetty, partially covered by jetty and fill — 5
  4. Location is based on divers hand drawn maps. Due to the wreak breaking up and salvage, the wreak is scattered over a large area. — 4
  5. Partially sunken ships — 4
  6. Concrete petrol barge sunk as part of breakwater — 4
  7. Wrecks of Zulu fishing boats — 3
  8. Chaloupe abandonnée à terre (English: small boat abandoned ashore) — 3
  9. WWII era concrete fuel barge sunk as part of jetty foundation — 3
  10. Armada Ship — 2
  11. remains of sunken wooden boats — 2
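  12. Hundido el 3 de julio de 1898 durante la batalla naval de Santiago de Cuba en la Guerra Hispano-Cubana-Norteamericana. (English: Sunk on 3 July 1898 during the naval Battle of Santiago de Cuba in the Spanish-Cuban-American War.) — 2
  13. 09/09/2006 : Epave en bois, longue de 20 mètres, large de 4 mètres et haute de 3 mètres. (English: wooden wreck, 20 metres long, 4 metres wide and 3 metres high.) — 2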
  14. Steamer — 2
  15. Iron-hulled barque — 2
  16. On shore wreck of a small abandoned wooden ship. — 2
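  17. Épave (English: Wreck) — 2
  18. Dojście do wraków w zasadzie wolne. Jednak mogą wystąpić sytuacje gdy będzie to utrudnione lub niemożliwe. (English: Access to the wrecks is basically free. However, there may be situations when it is difficult or impossible.) — 2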
  19. Wrecked sealing vessel — 2
  20. Staten Island boat graveyard — 2

heritage categorical

3 singleton categories · 99.8% null
rows: 6,914
null: 6,901 (99.8%)
unique: 4
top_value: 2
top_rate: 0.769
cardinality: 4
entropy: 1.145
entropy_ratio: 0.573
Top values (rank 1–20)
  1. 2 — 10
  2. no — 1
  3. yes — 1
  4. 1 — 1

access categorical

92.6% null
rows: 6,914
null: 6,405 (92.6%)
unique: 8
top_value: yes
top_rate: 0.670
cardinality: 8
entropy: 1.647
entropy_ratio: 0.549
Top values (rank 1–20)
  1. yes — 341
  2. no — 73
  3. permit — 27
  4. private — 27
  5. unknown — 20
  6. permissive — 17
  7. customers — 3
  8. foot — 1

depth categorical

77.4% null
rows: 6,914
null: 5,349 (77.4%)
unique: 502
top_value: 12.4
top_rate: 7.03e-03
cardinality: 502
entropy: 8.579
entropy_ratio: 0.956
Top values (rank 1–20)
  1. 12.4 — 11
  2. 16 — 11
  3. 18 — 11
  4. 15.5 — 11
  5. 19.2 — 11
  6. 1.1 — 10
  7. 17.4 — 10
  8. 15.6 — 10
  9. 7 — 10
  10. 14 — 10
  11. 5 — 10
  12. 15.1 — 10
  13. 6.4 — 9
  14. 9 — 9
  15. 8 — 9
  16. 19 — 9
  17. 15.2 — 9
  18. 20 — 9
  19. 16.4 — 9
  20. 18.5 — 9

seamark_type categorical

rows: 6,914
null: 459 (6.6%)
unique: 15
top_value: wreck
top_rate: 0.783
cardinality: 15
entropy: 1.200
entropy_ratio: 0.307
Top values (rank 1–20)
  1. wreck — 5,055
  2. dangerous — 598
  3. non-dangerous — 358
  4. distributed_remains — 306
  5. hulk — 56
  6. hull_showing — 46
  7. shoreline_construction — 14
  8. mast_showing — 8
  9. obstruction — 7
  10. harbour — 2
  11. restricted_area — 1
  12. plane — 1
  13. beacon_special_purpose — 1
  14. landmark — 1
  15. no — 1

osm_id numeric

rows: 6,914
null: 0 (0.0%)
unique: 6,914
min: 13,059,633
max: 13,709,704,966
mean: 5,365,308,387
median: 3,145,392,420
std: 4,463,893,576
q1: 1,347,916,861
q3: 9,788,409,785
iqr: 8,440,492,924
skew: 0.436
kurtosis: -1.453
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

osm_type categorical

rows: 6,914
null: 0 (0.0%)
unique: 2
top_value: node
top_rate: 0.723
cardinality: 2
entropy: 0.851
entropy_ratio: 0.851
Top values (rank 1–20)
  1. node — 5,000
  2. way — 1,914