This dataset is an OpenStreetMap-derived catalogue of 6,914 shipwrecks and related maritime hazards mapped globally. The most important thing to explore first is the `type` and `seamark_type` columns, which reveal that the overwhelming majority of entries carry a generic label ('shipwreck' is 73.5% of `type`, and 'wreck' is 78.3% of the populated `seamark_type` values), with a long tail of submarines, aircraft, barges, and other vessels worth examining. A secondary point of interest is the high null rates across many descriptive fields, including `heritage` (99.8% null), `year_sunk` (99.5% null), and `wikipedia` (95.5% null): rich contextual data exists for only a tiny fraction of wrecks, so the dataset is far more useful as a spatial inventory than as a historical record. The `access` column, where populated, shows that most tagged wrecks are open ('yes'), but a meaningful share are closed ('no'), require permits, or are private, which could interest dive-site analysts.
saturn
/home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json | 6,914 rows | sample n=6,914 | seed 42 | 2026-06-21T22:43:45+00:00
Overview
| Source | /home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json |
| Total rows | 6,914 |
| Profiled sample | 6,914 |
| Columns | 14 |
| Generated | 2026-06-21T22:43:45+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.
This column contains the names of individual shipwrecks; the dominant top words are 'shipwreck' (5032 occurrences across 6914 rows), 'wreck', 'ss', 'uss', and 'hms'. With 6841 unique values out of 6914 rows and a near-zero null rate, it is essentially a name/label field, but the 73 duplicates (1.06% duplicate rate) are mildly surprising and may indicate that the same wreck appears in multiple records. Lengths cluster tightly (median 20, p95 21 characters) with a long tail reaching 153. The sample values suggest why: many names are placeholders of the form 'Shipwreck' followed by a 10- or 11-digit number, which come to 20 or 21 characters, while a minority are real vessel names or extended descriptions.
This column represents geographic latitude values, spanning from -77.42° (near Antarctica) to 82.17° (high Arctic), with 6,902 unique values across 6,914 rows. The mean (33.15°) sits notably below the median (43.85°), driven by a left skew of -1.42 — indicating a cluster of records in mid-to-high northern latitudes with a pull from southern hemisphere or equatorial observations. Roughly 12.5% of values (864 rows) are flagged as outliers, likely corresponding to polar or deep southern hemisphere coordinates that deviate from the dominant northern mid-latitude band.
This column contains geographic longitude values, spanning the full valid range from -179.28° to 179.45° and covering both hemispheres. The mean (3.07°) and median (8.32°) are both modestly east of the Prime Meridian, suggesting a concentration of records in Europe/Africa, while the wide IQR of 58.74° and std of 69.12° confirm global scatter. Notably, 806 rows (11.66%) are flagged as outliers, likely corresponding to locations in the Americas or Pacific — not erroneous values, but genuine geographic extremes relative to the modal cluster.
This column records the year (or date) a vessel was sunk, but it is almost entirely empty: 99.46% of the 6,914 rows are null, leaving only about 37 non-null values. Among those, the formats are wildly inconsistent: bare years ('1942', '1854'), full dates in multiple formats ('30 June 1890', 'June 7, 1928', '1937-09-02'), partial dates ('1963-02'), approximate values ('1490s', '~1700'), and even a range ('1643..1663'), making normalisation non-trivial. With 36 unique values across ~37 populated rows the column is near-unique relative to its populated set, and the top value '1942' appears only twice.
This column classifies underwater or maritime wreck sites by vessel/object type, with 31 distinct categories across 6,914 records and no nulls. The distribution is heavily dominated by 'shipwreck' (73.5% of records) and 'wreck' (19.5%), together accounting for about 93% of all entries; the remaining 29 categories share just ~7%, confirming the long-tail alert. The near-redundancy between 'shipwreck', 'wreck', and 'ship' (plus 'boat', 'barge') suggests inconsistent taxonomy that may need consolidation before modelling.
This column stores Wikipedia article links associated with dataset entities (ships and aircraft), formatted as language-prefixed slugs (e.g., 'en:SS Edmund Fitzgerald', 'fr:Armorique (navire)'). The null rate is extremely high at 95.47%, meaning only ~313 of 6,914 rows have any Wikipedia reference. Among populated values, cardinality is very high (307 unique values across ~313 non-null rows), with the top value appearing only 4 times — indicating near-unique coverage and a long-tail distribution. A language mix is present (English 'en:', French 'fr:', Arabic 'ar:'), which could complicate any downstream lookup or joining logic.
This column stores Wikidata entity identifiers (Q-codes), linking dataset records to Wikidata knowledge graph entries. The most striking signal is the extreme null rate of 94.78%, meaning only ~360 of 6,914 rows carry a Wikidata link at all. Among the 353 unique Q-codes present, the distribution is nearly flat — the top value 'Q1286267' appears only 4 times, entropy ratio is 0.998, and the long-tail alert confirms almost no repeated values — suggesting each populated row points to a distinct entity with minimal reuse.
This column contains free-text descriptions of maritime wrecks or nautical features, with entries referencing WWII-era vessels, jetties, fishing boats, and abandoned craft. The most striking signal is the 94.87% null rate: only ~355 of 6,914 rows carry a description, making this column nearly unusable at scale. Those populated rows hold 304 unique values with very high entropy (8.05, ratio 0.976), indicating wide diversity in phrasing, and a language mix is evident (e.g., French 'Chaloupe abandonnée à terre' alongside English entries).
This column appears to encode a 'heritage' flag or classification with only 4 distinct values ('1', '2', 'no', 'yes'), suggesting a binary or ordinal attribute that may have been inconsistently encoded across sources. The critical finding is a null rate of 99.81%, meaning only 13 of 6,914 rows have any value at all — rendering this column nearly useless for modelling. Among those 13 non-null values, '2' dominates at 76.9%, while 'no', 'yes', and '1' each appear only once, indicating a mixed encoding scheme (numeric vs. boolean strings) on an already negligible sample.
This column appears to encode access permission or restriction tags for geographic features (likely OpenStreetMap-style data), with values such as 'yes', 'no', 'permit', 'private', 'permissive', and 'customers'. The striking finding is a 92.64% null rate — only 509 of 6,914 rows carry a value — meaning this attribute is almost entirely absent from the dataset. Among the non-null values, 'yes' dominates heavily at 66.99% of populated rows, suggesting most tagged features have open access.
This column represents a numeric depth measurement (likely in meters or similar units) stored as a categorical string, with values that mix decimals like '1.1' and '19.2' with integers like '16' and '20'. The most striking signal is a null rate of 77.36%, meaning only ~1,565 of 6,914 rows carry a value. Missingness this severe warrants investigation into whether it is structural (e.g., not applicable to certain record types) or a data quality issue. Among populated rows, cardinality is very high (502 unique values) with an entropy ratio of 0.956, indicating nearly uniform spread and essentially no dominant depth value: the top value '12.4' appears only 11 times. The column should be cast to numeric before any modelling use.
This column contains a nautical/maritime classification type for seamarks, most likely drawn from an OpenStreetMap or similar marine charting schema. The distribution is severely dominated by 'wreck' at 78.3% of populated rows (73.1% of all 6,914 rows), with the next largest category 'dangerous' at only 9.3% of populated rows (8.6% of all rows), giving a low entropy ratio of 0.307. The mix of subtypes (hull_showing, mast_showing, distributed_remains, hulk) suggests these are sub-classifications of wrecks that could have been normalized into a hierarchy rather than a flat taxonomy. The 6.6% null rate warrants attention if completeness matters for navigation safety contexts.
This column is an OpenStreetMap (OSM) object identifier — a large integer surrogate key assigned by the OSM platform to geographic features. Every one of the 6,914 rows has a distinct value with zero nulls, confirming it functions purely as a unique identifier. The value range (13 M to ~13.7 B) and flat distribution (kurtosis −1.45, near-uniform spread across a ~8.4 B IQR) are consistent with OSM's incrementally assigned ID space across different data vintages. No outliers are flagged and the mild positive skew (0.44) suggests a slight concentration of older, lower-numbered IDs.
This column encodes the OpenStreetMap geometry type, distinguishing between point features ('node') and linear/polygonal features ('way'). With only 2 distinct values across 6,914 rows and zero nulls, it is a clean binary categorical. The distribution is moderately skewed: 'node' accounts for 72.3% (5,000 rows) versus 'way' at 27.7% (1,914 rows), which is consistent with OSM datasets where point POIs outnumber way geometries.
Numeric correlation
name text
Sample values (first 10)
- Falke
- Shipwreck 1389423810
- Shipwreck 10053641911
- Shipwreck 9160141348
- Shipwreck 9456262933
- Shipwreck 11826210406
- Shipwreck 9980843157
- Shipwreck 11826207717
- Shipwreck 9164313171
- City of Waterford
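Eight of the ten sample names above follow the pattern 'Shipwreck' plus a long number, which looks like an auto-generated placeholder rather than a vessel name. A minimal sketch for separating the two, assuming the placeholder always takes exactly that form. The helper name `is_placeholder_name` is hypothetical, and the profile does not confirm whether the number is the record's `osm_id`.

```python
import re

# Assumed placeholder shape: the word "Shipwreck", one space, then digits only.
_PLACEHOLDER = re.compile(r"Shipwreck \d+")

def is_placeholder_name(name):
    """Return True when name looks auto-generated rather than a vessel name."""
    return isinstance(name, str) and _PLACEHOLDER.fullmatch(name.strip()) is not None

# The ten sample values listed in the profile.
sample_names = [
    "Falke",
    "Shipwreck 1389423810",
    "Shipwreck 10053641911",
    "Shipwreck 9160141348",
    "Shipwreck 9456262933",
    "Shipwreck 11826210406",
    "Shipwreck 9980843157",
    "Shipwreck 11826207717",
    "Shipwreck 9164313171",
    "City of Waterford",
]

# Keep only the names that do not match the placeholder pattern.
named = [n for n in sample_names if not is_placeholder_name(n)]
```

Running the same filter over the full column would give a direct count of how many records carry a real name, which the length statistics only hint at.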
lat numeric
lon numeric
year_sunk categorical
Top values (rank 1–20)
- 1942 — 2
- 30 June 1890 — 1
- 1854 — 1
- 1971 — 1
- 1937-09-02 — 1
- 1963-02 — 1
- 1643..1663 — 1
- 1982 — 1
- June 7, 1928 — 1
- 1435 — 1
- 1920-12-16 — 1
- 1490s — 1
- ~1700 — 1
- 20 April 1943 — 1
- 25 May 1963 — 1
- 1710 — 1
- 1915 — 1
- 1909 — 1
- 1951 — 1
- 1952 — 1
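The mixed formats listed above can often be reduced to a year by taking the first standalone four-digit run. This is a rough sketch, not a full date parser: the helper name `extract_year` is hypothetical, a range such as '1643..1663' collapses to its lower bound, and approximate values such as '1490s' or '~1700' lose their qualifier.

```python
import re

# A run of exactly four digits that is not part of a longer digit run.
_YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")

def extract_year(raw):
    """Return the first four-digit year found in raw, or None if there is none."""
    if raw is None:
        return None
    match = _YEAR.search(str(raw))
    return int(match.group(1)) if match else None

# Values taken from the top-values list of year_sunk.
samples = [
    "1942", "30 June 1890", "1937-09-02", "1963-02",
    "1643..1663", "June 7, 1928", "1490s", "~1700",
]
years = {s: extract_year(s) for s in samples}
```

Keeping the raw string alongside the extracted year is a reasonable design choice here, since the qualifier ('~', 's', a range) carries information that a single integer cannot.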
type categorical
Top values (rank 1–20)
- shipwreck — 5,081
- wreck — 1,345
- ship — 381
- barge — 27
- submarine — 18
- aircraft — 17
- plane — 10
- boat — 4
- vehicle — 3
- motor_vehicle — 3
- schooner — 2
- car — 2
- sailboat — 2
- battleship — 2
- steamer — 1
- airplane — 1
- freightcar — 1
- train — 1
- paddle steamer — 1
- motorbike — 1
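Several of the labels above look like synonyms. A minimal sketch of one way to consolidate them follows; the mapping `CANONICAL_TYPE` and the target label names are illustrative assumptions, not a taxonomy defined by the dataset or by OpenStreetMap, and only a subset of the listed counts is used.

```python
from collections import Counter

# Hypothetical mapping from raw labels to a consolidated label.
CANONICAL_TYPE = {
    "shipwreck": "generic_wreck",
    "wreck": "generic_wreck",
    "ship": "generic_wreck",
    "plane": "aircraft",
    "airplane": "aircraft",
    "motor_vehicle": "vehicle",
    "car": "vehicle",
    "motorbike": "vehicle",
}

def consolidate(counts, mapping):
    """Sum counts after mapping each label; unmapped labels are kept unchanged."""
    merged = Counter()
    for label, n in counts.items():
        merged[mapping.get(label, label)] += n
    return merged

# Counts copied from the top-values list (subset).
type_counts = {
    "shipwreck": 5081, "wreck": 1345, "ship": 381, "barge": 27,
    "submarine": 18, "aircraft": 17, "plane": 10, "vehicle": 3,
    "motor_vehicle": 3, "car": 2, "airplane": 1, "motorbike": 1,
}
merged = consolidate(type_counts, CANONICAL_TYPE)
```

Under this mapping the three generic labels merge into a single bucket of 6,807 records and the three aircraft labels into 28, which makes the long tail easier to read.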
wikipedia categorical
Top values (rank 1–20)
- en:SS Edmund Fitzgerald — 4
- fr:Armorique (navire) — 2
- en:Curtiss C-46 Commando — 2
- en:USS Amesbury — 2
- en:SS America (1939) — 1
- ar:سفينة زيستل جورم — 1
- en:BOS 400 — 1
- en:New Carissa — 1
- en:MV Cita — 1
- en:SS Richard Montgomery — 1
- en:Kroombit Tops National Park#Crash site — 1
- en:Astron (ship) — 1
- en:SS Yongala — 1
- et:Raketa (laev, 1949) — 1
- en:USNS General Hoyt S. Vandenberg (T-AGM-10) — 1
- en:USS Oriskany (CV-34) — 1
- en:USS Massachusetts (BB-2) — 1
- en:Water Witch (schooner) — 1
- en:Burlington Bay Horse Ferry — 1
- en:Champlain II — 1
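Because the values above combine a language code, an article title, and occasionally a section anchor, splitting them first makes lookups and joins easier. A minimal sketch, assuming every value uses the 'lang:Title' shape seen in the list; the helper name `split_wikipedia_tag` is hypothetical, and a value with no language prefix but a colon in its title would be misread.

```python
from collections import Counter

def split_wikipedia_tag(tag):
    """Split 'lang:Title#Section' into a (lang, title, section) tuple."""
    lang, sep, rest = tag.partition(":")
    if not sep:
        # No colon at all: treat the whole value as a title with unknown language.
        lang, rest = None, tag
    title, _, section = rest.partition("#")
    return lang, title, section or None

# Values copied from the top-values list (subset).
tags = [
    "en:SS Edmund Fitzgerald",
    "fr:Armorique (navire)",
    "en:Kroombit Tops National Park#Crash site",
    "et:Raketa (laev, 1949)",
]
parsed = [split_wikipedia_tag(t) for t in tags]

# Count how many of the sampled values fall under each language code.
by_language = Counter(lang for lang, _, _ in parsed)
```

Splitting only at the first colon keeps any later colons inside the article title intact.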
wikidata categorical
Top values (rank 1–20)
- Q1286267 — 4
- Q959696 — 2
- Q215692 — 2
- Q2862787 — 2
- Q1145708 — 2
- Q11675753 — 2
- Q463091 — 1
- Q32276 — 1
- Q115709756 — 1
- Q14213801 — 1
- Q2877353 — 1
- Q7006376 — 1
- Q6719379 — 1
- Q41771616 — 1
- Q7394285 — 1
- Q7420193 — 1
- Q1359321 — 1
- Q4811601 — 1
- Q1424289 — 1
- Q1618842 — 1
description categorical
Top values (rank 1–20)
- WWII era concrete fuel barge converted into breakwater — 14
- Wrecks — 7
- WWII concrete barge sunk as part of jetty, partially covered by jetty and fill — 5
- Location is based on divers hand drawn maps. Due to the wreak breaking up and salvage, the wreak is scattered over a large area. — 4
- Partially sunken ships — 4
- Concrete petrol barge sunk as part of breakwater — 4
- Wrecks of Zulu fishing boats — 3
- Chaloupe abandonnée à terre — 3
- WWII era concrete fuel barge sunk as part of jetty foundation — 3
- Armada Ship — 2
- remains of sunken wooden boats — 2
- Hundido el 3 de julio de 1898 durante la batalla naval de Santiago de Cuba en la Guerra Hispano-Cubana-Norteamericana. — 2
- 09/09/2006 : Epave en bois, longue de 20 mètres, large de 4 mètres et haute de 3 mètres. — 2
- Steamer — 2
- Iron-hulled barque — 2
- On shore wreck of a small abandoned wooden ship. — 2
- Épave — 2
- Dojście do wraków w zasadzie wolne. Jednak mogą wystąpić sytuacje gdy będzie to utrudnione lub niemożliwe. — 2
- Wrecked sealing vessel — 2
- Staten Island boat graveyard — 2
heritage categorical
Top values (rank 1–20)
- 2 — 10
- no — 1
- yes — 1
- 1 — 1
access categorical
Top values (rank 1–20)
- yes — 341
- no — 73
- permit — 27
- private — 27
- unknown — 20
- permissive — 17
- customers — 3
- foot — 1
depth categorical
Top values (rank 1–20)
- 12.4 — 11
- 16 — 11
- 18 — 11
- 15.5 — 11
- 19.2 — 11
- 1.1 — 10
- 17.4 — 10
- 15.6 — 10
- 7 — 10
- 14 — 10
- 5 — 10
- 15.1 — 10
- 6.4 — 9
- 9 — 9
- 8 — 9
- 19 — 9
- 15.2 — 9
- 20 — 9
- 16.4 — 9
- 18.5 — 9
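Since the depths above are stored as strings, they need converting before any numeric work. A minimal sketch of a tolerant conversion follows; the helper name `parse_depth` is hypothetical, and the handling of comma decimals is an assumption about what the unlisted values might contain, not something the profile reports.

```python
import math

def parse_depth(raw):
    """Convert a depth string to a float, or return None if it is not a finite number."""
    if raw is None:
        return None
    # Assumption: a comma, if present, is a decimal separator.
    text = str(raw).strip().replace(",", ".")
    try:
        value = float(text)
    except ValueError:
        return None
    # float() accepts "nan" and "inf"; treat those as missing.
    return value if math.isfinite(value) else None

# A few values from the top-values list, plus malformed examples for illustration.
raw_depths = ["12.4", "16", "19.2", "1.1", "12,4", "deep", None]
depths = [parse_depth(d) for d in raw_depths]
```

Returning None instead of raising keeps unparseable rows visible as missing, so they can be counted and inspected separately.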
seamark_type categorical
Top values (rank 1–20)
- wreck — 5,055
- dangerous — 598
- non-dangerous — 358
- distributed_remains — 306
- hulk — 56
- hull_showing — 46
- shoreline_construction — 14
- mast_showing — 8
- obstruction — 7
- harbour — 2
- restricted_area — 1
- plane — 1
- beacon_special_purpose — 1
- landmark — 1
- no — 1
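The shares quoted for this column depend on whether nulls are included in the denominator. A short sketch recomputing them from the counts listed above, assuming the 15 listed values are the complete set of non-null values (the list is shorter than the 20 ranks it allows for):

```python
TOTAL_ROWS = 6914

# Counts copied from the top-values list of seamark_type.
seamark_counts = {
    "wreck": 5055, "dangerous": 598, "non-dangerous": 358,
    "distributed_remains": 306, "hulk": 56, "hull_showing": 46,
    "shoreline_construction": 14, "mast_showing": 8, "obstruction": 7,
    "harbour": 2, "restricted_area": 1, "plane": 1,
    "beacon_special_purpose": 1, "landmark": 1, "no": 1,
}

def share(count, denominator):
    """Percentage of denominator, rounded to one decimal place."""
    return round(100 * count / denominator, 1)

populated = sum(seamark_counts.values())
null_rate = share(TOTAL_ROWS - populated, TOTAL_ROWS)

wreck_of_populated = share(seamark_counts["wreck"], populated)
wreck_of_all = share(seamark_counts["wreck"], TOTAL_ROWS)
dangerous_of_populated = share(seamark_counts["dangerous"], populated)
dangerous_of_all = share(seamark_counts["dangerous"], TOTAL_ROWS)
```

With these counts, 'wreck' is 78.3% of populated rows but 73.1% of all rows, and 'dangerous' is 9.3% versus 8.6%, so comparisons between categories should use one denominator consistently.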
osm_id numeric
osm_type categorical
Top values (rank 1–20)
- node — 5,000
- way — 1,914