saturn·

quirky shipwrecks

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json

Saturn profiled 5,569 rows across 14 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json",
    "--findings", "quirky-shipwrecks.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset catalogues 5,569 shipwrecks (and a handful of related features) sourced from OpenStreetMap, with 14 columns covering geography (lat/lon), OSM identifiers, type classifications, and optional metadata such as depth, year sunk, and Wikipedia links. The collection is overwhelmingly homogeneous in category: 'wreck' accounts for 98.4% of non-null seamark_type values (90.2% of all rows, since 8.2% are null) and 'shipwreck' for 91.2% of type, so the interesting variation lives elsewhere. Geographic spread is global: longitude ranges from -179.28 to 179.45 and latitude from -77.42 to 82.17, which makes the lat/lon distribution the most informative view. Descriptive fields are largely empty: heritage is 99.8% null, year_sunk 99.3% null, depth 96.3% null, and Wikipedia/Wikidata links are missing for about 94% of records, so any analysis beyond location and basic typing will be working with a small subset.

citing: row_count · column_count · seamark_type.top_rate · type.top_rate · lon.min · lon.max · lat.min · lat.max · heritage.null_rate · year_sunk.null_rate · depth.null_rate · wikipedia.null_rate · osm_type.top_rate

Out[4]:

saturn.schema() · 14 columns

column        kind         n      null%  unique  alerts
name          text         5,569  0.0%   5,497   near_unique
lat           numeric      5,569  0.0%   5,561
lon           numeric      5,569  0.0%   5,568   outliers
year_sunk     categorical  5,569  99.3%  36      long_tail, null_rate
type          categorical  5,569  0.0%   30      long_tail
wikipedia     categorical  5,569  94.4%  307     long_tail, null_rate
wikidata      categorical  5,569  93.5%  353     long_tail, null_rate
description   categorical  5,569  93.9%  291     long_tail, null_rate
heritage      categorical  5,569  99.8%  4       long_tail, null_rate
access        categorical  5,569  90.9%  8       null_rate
depth         categorical  5,569  96.3%  154     long_tail, null_rate
seamark_type  categorical  5,569  8.2%   10      imbalance
osm_id        numeric      5,569  0.0%   5,569
osm_type      categorical  5,569  0.0%   2
Fig 1.
lon · Longitude spans the full globe; look for clusters along major shipping lanes and coastlines.
Histogram bins for lon (median: 2.033333).
bin                 count
-179.3 – -170.3     54
-170.3 – -161.3     17
-161.3 – -152.4     7
-152.4 – -143.4     5
-143.4 – -134.4     4
-134.4 – -125.5     15
-125.5 – -116.5     266
-116.5 – -107.5     15
-107.5 – -98.57     3
-98.57 – -89.6      39
-89.6 – -80.63      161
-80.63 – -71.66     520
-71.66 – -62.7      163
-62.7 – -53.73      211
-53.73 – -44.76     146
-44.76 – -35.79     157
-35.79 – -26.82     39
-26.82 – -17.85     25
-17.85 – -8.886     98
-8.886 – 0.08197    586
0.08197 – 9.05      539
9.05 – 18.02        923
18.02 – 26.99       302
26.99 – 35.96       239
35.96 – 44.92       86
44.92 – 53.89       100
53.89 – 62.86       61
62.86 – 71.83       8
71.83 – 80.8        31
80.8 – 89.76        7
89.76 – 98.73       7
98.73 – 107.7       23
107.7 – 116.7       34
116.7 – 125.6       44
125.6 – 134.6       48
134.6 – 143.6       44
143.6 – 152.5       109
152.5 – 161.5       69
161.5 – 170.5       163
170.5 – 179.4       201
Fig 2.
lat · Latitude skews toward the northern hemisphere (median ~40.6°), probably reflecting denser mapping there.
Histogram bins for lat (median: 40.6434862).
bin                 count
-77.42 – -73.44     1
-73.44 – -69.45     0
-69.45 – -65.46     0
-65.46 – -61.47     1
-61.47 – -57.48     1
-57.48 – -53.49     20
-53.49 – -49.5      39
-49.5 – -45.51      41
-45.51 – -41.52     50
-41.52 – -37.53     107
-37.53 – -33.54     235
-33.54 – -29.55     110
-29.55 – -25.56     55
-25.56 – -21.57     117
-21.57 – -17.58     67
-17.58 – -13.59     28
-13.59 – -9.597     30
-9.597 – -5.607     72
-5.607 – -1.617     73
-1.617 – 2.373      67
2.373 – 6.363       39
6.363 – 10.35       180
10.35 – 14.34       104
14.34 – 18.33       80
18.33 – 22.32       106
22.32 – 26.31       84
26.31 – 30.3        74
30.3 – 34.29        140
34.29 – 38.28       470
38.28 – 42.27       734
42.27 – 46.26       578
46.26 – 50.25       466
50.25 – 54.24       668
54.24 – 58.23       332
58.23 – 62.22       185
62.22 – 66.21       77
66.21 – 70.2        98
70.2 – 74.19        34
74.19 – 78.18       5
78.18 – 82.17       1
Fig 3.
seamark_type · Almost everything is tagged 'wreck' (98% of non-null values, 90% of all rows); the small slice of hulks and other types is the only variation.
Top values for seamark_type (10 unique shown, of 10 total).
value                   count  share
wreck                   5026   90.2%
hulk                    56     1.0%
shoreline_construction  14     0.3%
obstruction             7      0.1%
harbour                 2      0.0%
restricted_area         1      0.0%
plane                   1      0.0%
beacon_special_purpose  1      0.0%
landmark                1      0.0%
no                      1      0.0%
Fig 4.
type · Confirms the shipwreck dominance, with a long tail of barges, submarines, and aircraft worth isolating.
Top values for type (20 unique shown, of 30 total).
value           count  share
shipwreck       5081   91.2%
ship            381    6.8%
barge           27     0.5%
submarine       18     0.3%
aircraft        17     0.3%
plane           10     0.2%
boat            4      0.1%
vehicle         3      0.1%
motor_vehicle   3      0.1%
schooner        2      0.0%
car             2      0.0%
sailboat        2      0.0%
battleship      2      0.0%
steamer         1      0.0%
airplane        1      0.0%
freightcar      1      0.0%
train           1      0.0%
paddle steamer  1      0.0%
motorbike       1      0.0%
helicopter      1      0.0%
Fig 5.
osm_type · Roughly two-thirds are point nodes and one-third are ways (polygons/lines), which affects how each wreck should be rendered.
Top values for osm_type (2 unique shown, of 2 total).
value  count  share
node   3656   65.6%
way    1913   34.4%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column        kind         null %
name          text         0.0%
lat           numeric      0.0%
lon           numeric      0.0%
year_sunk     categorical  99.3%
type          categorical  0.0%
wikipedia     categorical  94.4%
wikidata      categorical  93.5%
description   categorical  93.9%
heritage      categorical  99.8%
access        categorical  90.9%
depth         categorical  96.3%
seamark_type  categorical  8.2%
osm_id        numeric      0.0%
osm_type      categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
        lat    lon    osm_id
lat     +1.00  -0.26  -0.10
lon     -0.26  +1.00  +0.24
osm_id  -0.10  +0.24  +1.00

name text identifier

This column holds short text labels for individual records, almost certainly vessel or wreck names: 5,497 of 5,569 values are unique, the mean length is 17.8 characters and the median is 2 words, and the dominant token "shipwreck" appears 3,706 times alongside nautical prefixes like "ss", "uss", and "hms". Despite the near_unique alert, 72 rows (1.3%) are duplicates worth inspecting, and the recurring "shipwreck"/"(wrack)" tokens suggest that names follow a templated pattern rather than being free prose.

Treatment: Treat as a name identifier; strip boilerplate tokens like "shipwreck" before any text matching, and do not use as a model feature.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["name"].stats

stat value
n 5,569
nulls 0 (0.0%)
unique 5,497
len_min 2
len_max 153
len_mean 17.81
len_median 20
len_p95 21
word_mean 2.073
word_median 2
n_empty 0
n_duplicates 72
duplicate_rate 0.01293
vocab_size 6,255
readability_flesch_mean 71.33
emoji_rate 0
url_rate 0
one_word_rate 0.1034
allcaps_rate 0.01724
boilerplate_rate 0
alert: near_unique · 98.7% of rows are unique strings
Fig 8.
Character-length distribution for name.
Character-length distribution for name (mean: 17.812713233973785).
chars      count
2 – 6      170
6 – 10     475
10 – 13    487
13 – 17    304
17 – 21    3551
21 – 25    425
25 – 28    71
28 – 32    34
32 – 36    16
36 – 40    12
40 – 44    10
44 – 47    5
47 – 51    3
51 – 55    0
55 – 59    1
59 – 62    1
62 – 66    1
66 – 70    1
70 – 74    0
74 – 78    1
78 – 81    0
81 – 85    0
85 – 89    0
89 – 93    0
93 – 96    0
96 – 100   0
100 – 104  0
104 – 108  0
108 – 111  0
111 – 115  0
115 – 119  0
119 – 123  0
123 – 127  0
127 – 130  0
130 – 134  0
134 – 138  0
138 – 142  0
142 – 145  0
145 – 149  0
149 – 153  1
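
The treatment for name suggests stripping boilerplate tokens before any text matching. A minimal sketch of that step follows, assuming the boilerplate list consists only of the two tokens named in the interpretation ("shipwreck" and "(wrack)"). The helper name `clean_name` and the sample strings are hypothetical and should be replaced after inspecting the real values.

```python
import re

# Assumption: only these two tokens are treated as boilerplate; they are the
# ones named in the profile. Extend the pattern after checking the real data.
BOILERPLATE = re.compile(r"\(wrack\)|\bshipwreck\b", re.IGNORECASE)

def clean_name(name: str) -> str:
    """Drop boilerplate tokens and collapse the whitespace they leave behind."""
    stripped = BOILERPLATE.sub(" ", name)
    return " ".join(stripped.split())

# Hypothetical sample names, for illustration only.
samples = ["Shipwreck SS Example", "Example (Wrack)", "Shipwreck"]
cleaned = [clean_name(s) for s in samples]
```

A name that consists only of boilerplate becomes an empty string, which is worth counting separately because such rows carry no identifying text.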

lat numeric feature

This is a latitude feature spanning -77.42 to 82.17, covering nearly the full geographic range from Antarctica to the high Arctic. The distribution is left-skewed (skew -1.14) with a median of 40.64 well above the mean of 28.41, suggesting a concentration of points in northern mid-latitudes with a tail of southern hemisphere observations. With 5561 unique values across 5569 rows and no nulls, each record carries a near-distinct coordinate.

Treatment: Pair with the matching longitude column for geospatial features rather than using as a standalone scalar.

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["lat"].stats

stat value
n 5,569
nulls 0 (0.0%)
unique 5,561
min -77.42
max 82.17
mean 28.41
median 40.64
std 31.36
q1 12.54
q3 50.36
iqr 37.82
skew -1.136
kurtosis 0.08299
n_outliers 112
outlier_rate 0.02011
zero_rate 0
Fig 9.
Distribution of lat. Vertical dash marks the median.
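
One possible way to pair lat with lon, as the treatment recommends, is to map each coordinate pair onto the unit sphere so that points on either side of the ±180° meridian stay close together. This is a sketch of a common technique, not something Saturn prescribes; the function name `to_unit_vector` is hypothetical.

```python
import math

def to_unit_vector(lat_deg: float, lon_deg: float):
    """Map a latitude/longitude pair in degrees to (x, y, z) on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )

# The two longitude extremes from the profile, placed on the equator for
# illustration: far apart as raw numbers, close neighbours on the globe.
west_edge = to_unit_vector(0.0, -179.28)
east_edge = to_unit_vector(0.0, 179.45)
gap = math.dist(west_edge, east_edge)  # straight-line (chord) distance
```

The raw longitudes differ by almost 359, while the chord between the two vectors is a small fraction of the sphere's radius, which is the property a distance-based model needs.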

lon numeric feature

This column holds longitude coordinates ranging from -179.28 to 179.45, which spans the full globe, with 5568 unique values across 5569 rows. The distribution is mildly right-skewed (0.53) with a median of 2.03 sitting near the prime meridian, and the IQR of 80.73 indicates broad geographic coverage. The 542 flagged outliers (9.7%) are unlikely to be data errors: the lower fence (q1 - 1.5 × IQR ≈ -179.7) lies below the minimum, so every flagged row sits east of the upper fence (q3 + 1.5 × IQR ≈ 143.2), in the far-eastern Pacific-facing longitudes, and all of them are valid values.

Treatment: Pair with latitude as a geospatial feature; avoid treating outliers as anomalies since extremes are valid longitudes.

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["lon"].stats

stat value
n 5,569
nulls 0 (0.0%)
unique 5,568
min -179.3
max 179.4
mean 1.413
median 2.033
std 76.75
q1 -58.6
q3 22.13
iqr 80.73
skew 0.5273
kurtosis 0.2435
n_outliers 542
outlier_rate 0.09732
zero_rate 0
alert: outliers · 9.7% rows beyond 1.5 IQR
Fig 10.
Distribution of lon. Vertical dash marks the median.
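
The outlier alert for lon can be checked by hand. The sketch below assumes the "1.5 IQR" rule is measured from the quartiles (the usual Tukey fences). That is an inference rather than documented Saturn behaviour, but it reproduces the reported count. The quartiles come from the lon stats, and the four counts are the lon histogram bins that start at 143.6 or higher.

```python
from typing import Tuple

def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 are copied from saturn.columns["lon"].stats.
lower, upper = tukey_fences(q1=-58.6, q3=22.13)

# Counts of the four easternmost lon histogram bins (143.6 to 179.4).
east_tail = [109, 69, 163, 201]
n_east = sum(east_tail)
```

The lower fence comes out near -179.7, below the column minimum, and the upper fence near 143.2. The four eastern bins sum to the 542 rows reported as n_outliers, so the alert describes one geographic tail rather than scattered bad values.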

year_sunk categorical metadata

This column records the year (or fuller date) a vessel sank, but it is almost entirely empty: 99.34% null, leaving 37 populated rows with 36 distinct values. Date formats are inconsistent: bare years like '1942' and '1435', ISO strings like '1937-09-02', verbose forms like 'June 7, 1928', and even ranges like '1643..1663' coexist. Entropy ratio of 0.997 confirms the populated values are nearly all unique, with '1942' the only repeat (2 occurrences).

Treatment: Parse to a normalized year integer and treat as sparse metadata; too null-heavy to use as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["year_sunk"].stats

stat value
n 5,569
nulls 5,532 (99.3%)
unique 36
top_value 1942
top_rate 0.05405
cardinality 36
entropy 5.155
entropy_ratio 0.9972
alert: long_tail · 35 singleton categories
alert: null_rate · 99.3% null
Fig 11.
Top values for year_sunk.
Top values for year_sunk (20 unique shown, of 36 total).
value          count  share
1942           2      0.0%
30 June 1890   1      0.0%
1854           1      0.0%
1971           1      0.0%
1937-09-02     1      0.0%
1963-02        1      0.0%
1643..1663     1      0.0%
1982           1      0.0%
June 7, 1928   1      0.0%
1435           1      0.0%
1920-12-16     1      0.0%
1490s          1      0.0%
~1700          1      0.0%
20 April 1943  1      0.0%
25 May 1963    1      0.0%
1710           1      0.0%
1915           1      0.0%
1909           1      0.0%
1951           1      0.0%
1952           1      0.0%
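
The treatment for year_sunk calls for parsing to a normalized year integer. A minimal sketch, assuming it is acceptable to keep the first four-digit run in each string, follows. For ranges ('1643..1663'), decades ('1490s'), and approximations ('~1700') this keeps the first year and discards the uncertainty. The function name `parse_year` is hypothetical.

```python
import re
from typing import Optional

# A run of exactly four digits that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")

def parse_year(raw: Optional[str]) -> Optional[int]:
    """Return the first four-digit year found in raw, or None if there is none."""
    if not raw:
        return None
    match = YEAR.search(raw)
    return int(match.group(1)) if match else None

# Values taken from the year_sunk table.
observed = ["1942", "30 June 1890", "1937-09-02", "1643..1663",
            "June 7, 1928", "1490s", "~1700"]
years = [parse_year(v) for v in observed]
```

If the range or approximation matters downstream, a second column holding the raw string (or a flag for inexact values) keeps that information available.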

type categorical label

Categorical type label for each record, dominated overwhelmingly by maritime wreckage: 'shipwreck' accounts for 5081 of 5569 rows (91.2% top_rate) with 30 distinct values total. Entropy ratio of 0.115 confirms the long_tail alert: the remaining 29 categories share just 488 rows, and several near-synonyms ('ship'/'boat'/'schooner', 'aircraft'/'plane'/'airplane', 'vehicle'/'motor_vehicle') suggest inconsistent labelling that could be consolidated.

Treatment: Collapse synonymous categories and consider binarising as shipwreck-vs-other given the extreme imbalance.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["type"].stats

stat value
n 5,569
nulls 0 (0.0%)
unique 30
top_value shipwreck
top_rate 0.9124
cardinality 30
entropy 0.565
entropy_ratio 0.1151
alert: long_tail · 17 singleton categories
Fig 12.
Top values for type.
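
The treatment for type (collapse synonyms, then consider a shipwreck-vs-other flag) could look roughly like this. The synonym map is an assumption that covers only the near-synonym groups named in the interpretation; whether 'boat' or 'schooner' should really merge into 'ship' is a judgment call for whoever owns the data. The counts are copied from the type table.

```python
from collections import Counter

# Assumption: hypothetical synonym map covering only the groups named above.
SYNONYMS = {
    "boat": "ship",
    "schooner": "ship",
    "plane": "aircraft",
    "airplane": "aircraft",
    "motor_vehicle": "vehicle",
}

def collapse_type(value: str) -> str:
    """Map a raw type label to its canonical category."""
    return SYNONYMS.get(value, value)

def is_shipwreck(value: str) -> bool:
    """Binary flag for the dominant category."""
    return value == "shipwreck"

# Counts copied from the type table (affected categories only).
observed = Counter({
    "shipwreck": 5081, "ship": 381, "boat": 4, "schooner": 2,
    "aircraft": 17, "plane": 10, "airplane": 1,
    "vehicle": 3, "motor_vehicle": 3,
})

collapsed = Counter()
for value, count in observed.items():
    collapsed[collapse_type(value)] += count
```

Collapsing first and binarising second keeps the option of a three- or four-level feature (shipwreck, ship, aircraft, other) if the binary flag turns out to be too coarse.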

wikipedia categorical metadata

This column holds Wikipedia article references prefixed with a language code (e.g., 'en:SS Edmund Fitzgerald', 'fr:Armorique (navire)', 'ar:...'), likely linking each record to an encyclopedia entry about a ship, aircraft, or wreck. It is overwhelmingly sparse: 94.38% null, with 307 distinct values among the 313 populated rows. The distribution is nearly flat (entropy ratio 0.998; the top value appears just 4 times, top_rate 1.28%). The presence of multiple language prefixes (en, fr, ar, et) signals a mixed-language reference field rather than a clean categorical.

Treatment: Treat as an optional external reference link; drop for modelling or split off the language prefix if needed.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["wikipedia"].stats

stat value
n 5,569
nulls 5,256 (94.4%)
unique 307
top_value en:SS Edmund Fitzgerald
top_rate 0.01278
cardinality 307
entropy 8.245
entropy_ratio 0.998
alert: long_tail · 303 singleton categories
alert: null_rate · 94.4% null
Fig 13.
Top values for wikipedia.
Top values for wikipedia (20 unique shown, of 307 total).
value  count  share
en:SS Edmund Fitzgerald  4  0.1%
fr:Armorique (navire)  2  0.0%
en:Curtiss C-46 Commando  2  0.0%
en:USS Amesbury  2  0.0%
en:SS America (1939)  1  0.0%
ar:سفينة زيستل جورم  1  0.0%
en:BOS 400  1  0.0%
en:New Carissa  1  0.0%
en:MV Cita  1  0.0%
en:SS Richard Montgomery  1  0.0%
en:Kroombit Tops National Park#Crash site  1  0.0%
en:Astron (ship)  1  0.0%
en:SS Yongala  1  0.0%
et:Raketa (laev, 1949)  1  0.0%
en:USNS General Hoyt S. Vandenberg (T-AGM-10)  1  0.0%
en:USS Oriskany (CV-34)  1  0.0%
en:USS Massachusetts (BB-2)  1  0.0%
en:Water Witch (schooner)  1  0.0%
en:Burlington Bay Horse Ferry  1  0.0%
en:Champlain II  1  0.0%
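
Splitting off the language prefix, as the wikipedia treatment suggests, is a one-line operation if every value follows the `lang:title` shape seen in the table. That shape is an assumption about the unseen rows, so the sketch returns `None` as the language when no colon is present. The function name `split_wikipedia` is hypothetical.

```python
from typing import Optional, Tuple

def split_wikipedia(tag: str) -> Tuple[Optional[str], str]:
    """Split 'lang:title' at the first colon; language is None if no colon exists."""
    lang, sep, title = tag.partition(":")
    if not sep:
        return None, tag
    return lang, title

# Values taken from the wikipedia table.
pairs = [split_wikipedia(v) for v in (
    "en:SS Edmund Fitzgerald",
    "fr:Armorique (navire)",
    "en:Kroombit Tops National Park#Crash site",
)]
```

Splitting at the first colon only keeps titles that themselves contain a colon intact, and section anchors such as '#Crash site' stay attached to the title.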

wikidata categorical foreign_key

This column holds Wikidata Q-identifiers (e.g., Q1286267), linking rows to entities in the Wikidata knowledge graph. It is overwhelmingly sparse: 93.52% null, so only 361 of the 5569 rows are populated. Those rows hold 353 unique values, and the most common identifier appears just 4 times (top_rate 0.011). Entropy ratio of 0.998 confirms the non-null values are nearly all distinct, consistent with a foreign key rather than a categorical feature.

Treatment: Left-join on this id to enrich with Wikidata attributes; do not use as a model feature directly.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["wikidata"].stats

stat value
n 5,569
nulls 5,208 (93.5%)
unique 353
top_value Q1286267
top_rate 0.01108
cardinality 353
entropy 8.446
entropy_ratio 0.9979
alert: long_tail · 347 singleton categories
alert: null_rate · 93.5% null
Fig 14.
Top values for wikidata.
Top values for wikidata (20 unique shown, of 353 total).
value       count  share
Q1286267    4      0.1%
Q959696     2      0.0%
Q215692     2      0.0%
Q2862787    2      0.0%
Q1145708    2      0.0%
Q11675753   2      0.0%
Q463091     1      0.0%
Q32276      1      0.0%
Q115709756  1      0.0%
Q14213801   1      0.0%
Q2877353    1      0.0%
Q7006376    1      0.0%
Q6719379    1      0.0%
Q41771616   1      0.0%
Q7394285    1      0.0%
Q7420193    1      0.0%
Q1359321    1      0.0%
Q4811601    1      0.0%
Q1424289    1      0.0%
Q1618842    1      0.0%

description categorical free_text

Free-text descriptive notes about wrecks, barges, and other maritime features, populated for only ~6% of rows (null_rate 0.9386). Among the 342 non-null entries there are 291 distinct strings with entropy_ratio 0.975, so values are nearly all unique short narratives; the modal phrase 'WWII era concrete fuel barge converted into breakwater' appears just 14 times (top_rate 0.041). Mixed languages are present (e.g., French 'Chaloupe abandonnée à terre' alongside English), confirming this is curator-authored prose rather than a controlled vocabulary.

Treatment: Treat as sparse free text; tokenize/embed for search or keyword extraction rather than using as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["description"].stats

stat value
n 5,569
nulls 5,227 (93.9%)
unique 291
top_value WWII era concrete fuel barge converted into breakwater
top_rate 0.04094
cardinality 291
entropy 7.983
entropy_ratio 0.9753
alert: long_tail · 269 singleton categories
alert: null_rate · 93.9% null
Fig 15.
Top values for description.
Top values for description (20 unique shown, of 291 total).
value  count  share
WWII era concrete fuel barge converted into breakwater  14  0.3%
Wrecks  7  0.1%
WWII concrete barge sunk as part of jetty, partially covered by jetty and fill  5  0.1%
Location is based on divers hand drawn maps. Due to the wreak breaking up and salvage, the wreak is scattered over a large area.  4  0.1%
Partially sunken ships  4  0.1%
Concrete petrol barge sunk as part of breakwater  4  0.1%
Wrecks of Zulu fishing boats  3  0.1%
Chaloupe abandonnée à terre  3  0.1%
WWII era concrete fuel barge sunk as part of jetty foundation  3  0.1%
Armada Ship  2  0.0%
remains of sunken wooden boats  2  0.0%
Hundido el 3 de julio de 1898 durante la batalla naval de Santiago de Cuba en la Guerra Hispano-Cubana-Norteamericana.  2  0.0%
09/09/2006 : Epave en bois, longue de 20 mètres, large de 4 mètres et haute de 3 mètres.  2  0.0%
Steamer  2  0.0%
Iron-hulled barque  2  0.0%
On shore wreck of a small abandoned wooden ship.  2  0.0%
Épave  2  0.0%
Dojście do wraków w zasadzie wolne. Jednak mogą wystąpić sytuacje gdy będzie to utrudnione lub niemożliwe.  2  0.0%
Wrecked sealing vessel  2  0.0%
Staten Island boat graveyard  2  0.0%

heritage categorical metadata

A categorical 'heritage' field that is effectively empty: 99.77% null, with only 13 non-null values across 4 distinct levels. The observed values are inconsistent ('2', '1', 'yes', 'no'), suggesting a coding scheme that was never standardized or fully populated.

Treatment: Drop; null_rate of 0.9977 leaves too little signal and the value coding is inconsistent.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["heritage"].stats

stat value
n 5,569
nulls 5,556 (99.8%)
unique 4
top_value 2
top_rate 0.7692
cardinality 4
entropy 1.145
entropy_ratio 0.5726
alert: long_tail · 3 singleton categories
alert: null_rate · 99.8% null
Fig 16.
Top values for heritage.
Top values for heritage (4 unique shown, of 4 total).
value  count  share
2      10     0.2%
no     1      0.0%
yes    1      0.0%
1      1      0.0%
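
The entropy and entropy_ratio figures quoted throughout this report are consistent with Shannon entropy in bits over the non-null values, divided by log2 of the cardinality. That reading is inferred from the numbers, not taken from Saturn's documentation. The sketch below reproduces the heritage values from the counts in the heritage table and the osm_type values from its two counts; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

def entropy_and_ratio(counts: Sequence[int]) -> Tuple[float, float]:
    """Shannon entropy in bits, and that entropy divided by log2(n categories)."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    max_entropy = math.log2(len(counts))
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    return entropy, ratio

# heritage: value counts 10, 1, 1, 1 over the 13 non-null rows.
heritage_entropy, heritage_ratio = entropy_and_ratio([10, 1, 1, 1])

# osm_type: 3656 nodes and 1913 ways.
osm_entropy, osm_ratio = entropy_and_ratio([3656, 1913])
```

A ratio near 1 means the populated values are spread almost evenly (as with the near-unique wikidata ids), while a ratio near 0 means one value dominates (as with seamark_type).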

access categorical metadata

This is an OpenStreetMap-style 'access' tag indicating who may use a feature, with values like 'yes', 'no', 'permit', 'private', 'permissive', 'customers', and 'foot'. It is overwhelmingly null (90.88%), and among the 508 populated rows 'yes' dominates at 66.93%, leaving the other 7 categories thinly represented. Cardinality is only 8 with entropy ratio 0.55, so signal beyond presence/absence is limited.

Treatment: Collapse rare levels and encode as a low-cardinality categorical, or reduce to a populated/'yes' indicator given the 90.88% null rate.

anthropic:claude-opus-4-7 · confidence high
Out[40]:

saturn.columns["access"].stats

stat value
n 5,569
nulls 5,061 (90.9%)
unique 8
top_value yes
top_rate 0.6693
cardinality 8
entropy 1.649
entropy_ratio 0.5497
alert: null_rate · 90.9% null
Fig 17.
Top values for access.
Top values for access (8 unique shown, of 8 total).
value       count  share
yes         340    6.1%
no          73     1.3%
permit      27     0.5%
private     27     0.5%
unknown     20     0.4%
permissive  17     0.3%
customers   3      0.1%
foot        1      0.0%
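
Collapsing rare access levels, as the treatment proposes, might be sketched as follows. The threshold of 20 rows and the label "other" are arbitrary choices made for illustration, not values from the report. The counts are copied from the access table.

```python
from collections import Counter
from typing import Mapping

def collapse_rare(counts: Mapping[str, int], min_count: int,
                  other_label: str = "other") -> Counter:
    """Merge every level with fewer than min_count rows into a single bucket."""
    merged = Counter()
    for value, count in counts.items():
        merged[value if count >= min_count else other_label] += count
    return merged

access = {
    "yes": 340, "no": 73, "permit": 27, "private": 27,
    "unknown": 20, "permissive": 17, "customers": 3, "foot": 1,
}

# Assumption: 20 rows is the cut-off for a level to be kept on its own.
collapsed = collapse_rare(access, min_count=20)
```

With this cut-off, 'permissive', 'customers', and 'foot' merge into one bucket of 21 rows, and the 508 populated rows are preserved in total.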

depth categorical feature

A free-text 'depth' field, almost certainly a measurement (likely meters) but stored as strings with mixed formats — bare numbers like '7', '16', '14' coexist with unit-suffixed values like '30m', '25m', and decimals like '12.2'. It is overwhelmingly missing (null_rate 0.9627) and extremely diffuse among the 208 populated rows: 154 unique values, top value '7' covers only 2.88%, and entropy_ratio 0.975 indicates a near-uniform long tail.

Treatment: Strip unit suffixes and parse to numeric meters; given >96% nulls, treat as low-signal and consider dropping or flagging presence only.

anthropic:claude-opus-4-7 · confidence high
Out[43]:

saturn.columns["depth"].stats

stat value
n 5,569
nulls 5,361 (96.3%)
unique 154
top_value 7
top_rate 0.02885
cardinality 154
entropy 7.085
entropy_ratio 0.975
alert: long_tail · 120 singleton categories
alert: null_rate · 96.3% null
Fig 18.
Top values for depth.
Top values for depth (20 unique shown, of 154 total).
value  count  share
7      6      0.1%
30m    4      0.1%
16     4      0.1%
14     4      0.1%
8      4      0.1%
10     4      0.1%
25m    3      0.1%
12.2   3      0.1%
15.6   3      0.1%
15     3      0.1%
4      3      0.1%
5      3      0.1%
6.1    2      0.0%
-0.3   2      0.0%
97     2      0.0%
11     2      0.0%
6.4    2      0.0%
42     2      0.0%
18     2      0.0%
3      2      0.0%
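
Parsing depth to numeric meters, per the treatment, can be sketched with a strict pattern that accepts a signed decimal with an optional 'm' suffix and rejects everything else. Treating bare numbers as meters is an assumption carried over from the interpretation ("likely meters"). Values in other units are returned as `None` and not converted. The function name `parse_depth_m` is hypothetical.

```python
import re
from typing import Optional

# Signed integer or decimal, optionally followed by "m" (any case).
DEPTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*m?\s*$", re.IGNORECASE)

def parse_depth_m(raw: Optional[str]) -> Optional[float]:
    """Return the depth in meters, or None when the string does not match."""
    if raw is None:
        return None
    match = DEPTH.match(raw)
    return float(match.group(1)) if match else None

# Values taken from the depth table.
observed = ["7", "30m", "12.2", "-0.3", "25m"]
depths = [parse_depth_m(v) for v in observed]
```

Negative values such as '-0.3' are kept as they are; whether they indicate a wreck above the waterline or a sign-convention mix-up needs a look at the underlying rows.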

seamark_type categorical feature

Categorical seamark classification with 10 distinct values, almost entirely dominated by 'wreck' at 98.36% of non-null rows (5,026 of 5,110, which is 90.2% of all 5,569 rows). The remaining categories form an extreme long tail (hulk at 56, then counts of 14 and below, ending in five singletons), and 8.24% of rows are null. Entropy ratio of 0.044 confirms the column carries almost no discriminative signal.

Treatment: Drop or collapse to a binary 'is_wreck' flag; near-constant.

anthropic:claude-opus-4-7 · confidence high
Out[46]:

saturn.columns["seamark_type"].stats

stat value
n 5,569
nulls 459 (8.2%)
unique 10
top_value wreck
top_rate 0.9836
cardinality 10
entropy 0.1477
entropy_ratio 0.04447
alert: imbalance · top value is 98.4% of non-null rows
Fig 19.
Top values for seamark_type.
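
Two different shares are quoted for 'wreck' in this report (98.4% and 90.2%), and the difference is only the denominator. The short sketch below works through both from the counts in the seamark_type stats and adds a hypothetical `is_wreck` helper for the binary flag suggested in the treatment. How nulls should be encoded is left open: here a null simply yields False.

```python
from typing import Optional

n_rows = 5569   # all rows
n_null = 459    # rows with no seamark_type
n_wreck = 5026  # rows tagged 'wreck'

share_of_non_null = n_wreck / (n_rows - n_null)  # matches top_rate
share_of_all_rows = n_wreck / n_rows             # matches the table's share column

def is_wreck(value: Optional[str]) -> bool:
    """Binary flag; a missing tag counts as not-wreck in this sketch."""
    return value == "wreck"
```

If missing tags should stay distinguishable from non-wreck tags, a three-level encoding (wreck, other, missing) avoids folding 459 nulls into the negative class.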

osm_id numeric identifier

This is almost certainly the OpenStreetMap object id: every one of the 5569 rows is unique, no nulls, no zeros, and values span 13M to 13.5B which matches OSM's monotonically growing id space. The distribution is right-skewed (skew 1.07) with mean 4.03B well above the median 2.35B, reflecting OSM's accumulation of newer, higher ids over time rather than anything analytically meaningful.

Treatment: Drop from modelling; retain as a join key to OSM source data, paired with osm_type because OSM ids are unique only within an element type (node, way, relation).

anthropic:claude-opus-4-7 · confidence high
Out[49]:

saturn.columns["osm_id"].stats

stat value
n 5,569
nulls 0 (0.0%)
unique 5,569
min 1.306e+07
max 1.354e+10
mean 4.032e+09
median 2.349e+09
std 3.875e+09
q1 1.181e+09
q3 6.516e+09
iqr 5.335e+09
skew 1.071
kurtosis -0.2044
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 20.
Distribution of osm_id. Vertical dash marks the median.
Histogram bins for osm_id (median: 2348528796.0).
bin                    count
1.306e+07 – 3.511e+08  261
3.511e+08 – 6.892e+08  442
6.892e+08 – 1.027e+09  361
1.027e+09 – 1.365e+09  685
1.365e+09 – 1.703e+09  726
1.703e+09 – 2.041e+09  181
2.041e+09 – 2.38e+09   136
2.38e+09 – 2.718e+09   46
2.718e+09 – 3.056e+09  213
3.056e+09 – 3.394e+09  623
3.394e+09 – 3.732e+09  44
3.732e+09 – 4.07e+09   45
4.07e+09 – 4.408e+09   55
4.408e+09 – 4.746e+09  29
4.746e+09 – 5.084e+09  53
5.084e+09 – 5.422e+09  45
5.422e+09 – 5.76e+09   27
5.76e+09 – 6.098e+09   86
6.098e+09 – 6.436e+09  55
6.436e+09 – 6.774e+09  174
6.774e+09 – 7.112e+09  59
7.112e+09 – 7.45e+09   45
7.45e+09 – 7.789e+09   39
7.789e+09 – 8.127e+09  32
8.127e+09 – 8.465e+09  70
8.465e+09 – 8.803e+09  50
8.803e+09 – 9.141e+09  55
9.141e+09 – 9.479e+09  182
9.479e+09 – 9.817e+09  56
9.817e+09 – 1.015e+10  63
1.015e+10 – 1.049e+10  27
1.049e+10 – 1.083e+10  35
1.083e+10 – 1.117e+10  62
1.117e+10 – 1.151e+10  37
1.151e+10 – 1.185e+10  67
1.185e+10 – 1.218e+10  82
1.218e+10 – 1.252e+10  56
1.252e+10 – 1.286e+10  105
1.286e+10 – 1.32e+10   123
1.32e+10 – 1.354e+10   37

osm_type categorical feature

This column records the OpenStreetMap element type for each row, taking only two values: "node" (3656 rows, 65.6%) and "way" (1913 rows, 34.4%). With cardinality 2 and entropy ratio 0.928, the split is fairly balanced but tilted toward nodes, and there are no nulls across all 5569 rows.

Treatment: Encode as a binary indicator (node vs way) for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[52]:

saturn.columns["osm_type"].stats

stat value
n 5,569
nulls 0 (0.0%)
unique 2
top_value node
top_rate 0.6565
cardinality 2
entropy 0.9281
entropy_ratio 0.9281
Fig 21.
Top values for osm_type.

How to cite

BibTeX
@misc{saturn-quirky-shipwrecks-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky shipwrecks},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-shipwrecks}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky shipwrecks. Source: /home/coolhand/html/datavis/data_trove/data/quirky/shipwrecks.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-shipwrecks