quirky geothermal

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/geothermal.json

Saturn profiled 8,776 rows across 13 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/geothermal.json",
    "--findings", "quirky-geothermal.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset catalogs 8,776 geothermal features (hot springs and geysers) sourced from OpenStreetMap, with 13 columns covering location, type, and optional metadata like temperature and tourism use. The core signal is in the `type` and `osm_type` fields: roughly 80% are hot springs and 20% geysers, and most entries are point nodes rather than ways. Geographic coverage is global but skewed — latitude leans heavily toward the northern hemisphere with a long southern tail flagged as outliers, while longitude spans the full range. Be aware that nearly all the descriptive fields (`country`, `wikipedia`, `temperature`, `description`, `access`, `tourism`, `intermittent`) have null rates above 97%, so they're only useful for the small annotated subset. Within that subset, `tourism` is dominated by 'attraction' and `intermittent` is overwhelmingly 'no', which limits their analytic value.

citing: row_count · column_count · columns.type.top_values · columns.osm_type.top_values · columns.lat.stats · columns.lon.stats · columns.tourism.top_values · columns.country.null_rate · columns.temperature.top_values · columns.name.stats
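
The paths in the citing line read like keys into the findings file. Below is a minimal sketch of resolving them, assuming the file is nested JSON whose keys mirror those dotted paths; the layout of quirky-geothermal.json is not shown in this report, so that is a guess. `resolve` is a hypothetical helper, not part of saturn, and the `findings` dict is a hand-built stand-in filled with numbers from this page.

```python
from functools import reduce

# Hand-built stand-in for the findings file; the nesting is an assumption.
findings = {
    "row_count": 8776,
    "column_count": 13,
    "columns": {
        "country": {"null_rate": 0.9991},
        "type": {"top_values": [["hot_spring", 7082], ["geyser", 1694]]},
    },
}

def resolve(doc, dotted_path):
    """Walk a nested dict along a dotted path such as 'columns.type.top_values'."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), doc)

print(resolve(findings, "row_count"))
print(resolve(findings, "columns.country.null_rate"))
```

A missing key raises KeyError, which is probably the safer behaviour when checking that every cited stat actually exists.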

Out[4]:

saturn.schema() · 13 columns

column        kind         n      null%  unique  alerts
name          text         8,776  0.0%   8,316   allcaps
lat           numeric      8,776  0.0%   8,758   high_skew outliers
lon           numeric      8,776  0.0%   8,747
country       categorical  8,776  99.9%  6       long_tail null_rate
type          categorical  8,776  0.0%   2
temperature   categorical  8,776  98.3%  56      long_tail null_rate
wikipedia     categorical  8,776  98.5%  124     long_tail null_rate
description   categorical  8,776  98.1%  141     long_tail null_rate
intermittent  categorical  8,776  97.9%  2       null_rate
access        categorical  8,776  99.0%  5       null_rate
tourism       categorical  8,776  97.5%  10      null_rate
osm_id        numeric      8,776  0.0%   8,776
osm_type      categorical  8,776  0.0%   2
Fig 1.
type · Shows the hot spring vs geyser split — about 81% are hot springs.
Top values for type (2 unique shown, of 2 total).
value       count  share
hot_spring  7082   80.7%
geyser      1694   19.3%
Fig 2.
osm_type · Most features are mapped as nodes (76%) rather than ways, indicating point-based geometry dominates.
Top values for osm_type (2 unique shown, of 2 total).
value  count  share
node   6705   76.4%
way    2071   23.6%
Fig 3.
lat · Latitude distribution is northern-skewed with a long southern tail; watch for the 889 flagged outliers.
Histogram bins for lat (median: 40.8604357).
bin               count
-54.68 – -51.54   1
-51.54 – -48.4    0
-48.4 – -45.25    0
-45.25 – -42.11   23
-42.11 – -38.97   30
-38.97 – -35.83   132
-35.83 – -32.69   24
-32.69 – -29.55   9
-29.55 – -26.41   14
-26.41 – -23.27   10
-23.27 – -20.13   55
-20.13 – -16.99   104
-16.99 – -13.84   20
-13.84 – -10.7    13
-10.7 – -7.563    29
-7.563 – -4.422   14
-4.422 – -1.281   25
-1.281 – 1.86     35
1.86 – 5.001      32
5.001 – 8.142     35
8.142 – 11.28     63
11.28 – 14.42     88
14.42 – 17.56     276
17.56 – 20.71     82
20.71 – 23.85     134
23.85 – 26.99     100
26.99 – 30.13     298
30.13 – 33.27     997
33.27 – 36.41     1011
36.41 – 39.55     438
39.55 – 42.69     597
42.69 – 45.83     3559
45.83 – 48.97     117
48.97 – 52.12     105
52.12 – 55.26     101
55.26 – 58.4      30
58.4 – 61.54      9
61.54 – 64.68     88
64.68 – 67.82     75
67.82 – 70.96     3
Fig 4.
lon · Longitude spreads globally but is bimodal between the Americas and Eurasia/Pacific.
Histogram bins for lon (median: -21.228268900000003).
bin               count
-176.6 – -167.7   4
-167.7 – -158.8   2
-158.8 – -149.9   2
-149.9 – -141     3
-141 – -132.1     2
-132.1 – -123.2   18
-123.2 – -114.3   403
-114.3 – -105.4   3413
-105.4 – -96.53   23
-96.53 – -87.63   24
-87.63 – -78.73   58
-78.73 – -69.83   158
-69.83 – -60.93   193
-60.93 – -52.03   11
-52.03 – -43.13   5
-43.13 – -34.23   0
-34.23 – -25.33   7
-25.33 – -16.44   154
-16.44 – -7.535   122
-7.535 – 1.364    68
1.364 – 10.26     102
10.26 – 19.16     203
19.16 – 28.06     374
28.06 – 36.96     637
36.96 – 45.86     1145
45.86 – 54.76     325
54.76 – 63.66     48
63.66 – 72.56     43
72.56 – 81.46     114
81.46 – 90.36     26
90.36 – 99.26     99
99.26 – 108.2     181
108.2 – 117.1     45
117.1 – 126       190
126 – 134.9       80
134.9 – 143.8     234
143.8 – 152.7     41
152.7 – 161.6     83
161.6 – 170.5     5
170.5 – 179.4     131
Fig 5.
tourism · Among the ~2.5% of records with a tourism tag, 'attraction' dominates over hotels and campsites.
Top values for tourism (10 unique shown, of 10 total).
value             count  share
attraction        194    2.2%
hotel             9      0.1%
yes               7      0.1%
camp_site         3      0.0%
camp_site;loding  3      0.0%
information       2      0.0%
viewpoint         2      0.0%
picnic_site       1      0.0%
caravan_site      1      0.0%
guest_house       1      0.0%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column        kind         null %
name          text         0.0%
lat           numeric      0.0%
lon           numeric      0.0%
country       categorical  99.9%
type          categorical  0.0%
temperature   categorical  98.3%
wikipedia     categorical  98.5%
description   categorical  98.1%
intermittent  categorical  97.9%
access        categorical  99.0%
tourism       categorical  97.5%
osm_id        numeric      0.0%
osm_type      categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
        lat    lon    osm_id
lat     +1.00  -0.29  +0.04
lon     -0.29  +1.00  +0.16
osm_id  +0.04  +0.16  +1.00

name text label

Short free-text names for places, almost certainly hot springs and geysers given that 'hot_spring' (3393) and 'geyser' (1083) dominate top_words. The column is highly diverse (8316 unique of 8776) but multilingual — top values mix Arabic, English, Spanish, and Turkish — and 25.3% of entries are all-caps, which would break naive string matching. There are 460 duplicates (5.2%) including 40 repeats of a single Arabic name, suggesting the same feature recorded multiple times.

Treatment: Normalize case and unicode, then keep as a descriptive label; do not use as a join key given duplicates and language mix.
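
A minimal sketch of that normalization step, assuming NFKC normalization plus case folding is acceptable for matching. It discards case and compatibility distinctions, so the raw name should be kept for display. `normalize_name` is a hypothetical helper and the sample strings are illustrative, not rows from the dataset.

```python
import unicodedata

def normalize_name(raw: str) -> str:
    """Unicode-normalize, collapse whitespace, and case-fold a feature name."""
    text = unicodedata.normalize("NFKC", raw)   # unify compatibility forms
    text = " ".join(text.split())               # trim and collapse whitespace
    return text.casefold()                      # case-insensitive comparison form

# Illustrative variants of one name, including an all-caps form.
names = ["OLD FAITHFUL", "Old  Faithful", " old faithful "]
print({normalize_name(n) for n in names})
```

Case folding has no effect on scripts without case, such as Arabic or Japanese, so duplicates in those languages still depend on the Unicode normalization alone.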

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["name"].stats

stat                     value
n                        8,776
nulls                    0 (0.0%)
unique                   8,316
len_min                  1
len_max                  60
len_mean                 17.1
len_median               19
len_p95                  24
word_mean                2.16
word_median              2
n_empty                  0
n_duplicates             460
duplicate_rate           0.05242
vocab_size               9,610
readability_flesch_mean  101.6
emoji_rate               0
url_rate                 0
one_word_rate            0.1508
allcaps_rate             0.2525
boilerplate_rate         0
alert: allcaps · 25.3% rows are all-caps
Fig 8.
Character-length distribution for name.
Character-length distribution for name (mean: 17.099247948951685).
chars    count
1 – 2    38
2 – 4    93
4 – 5    586
5 – 7    173
7 – 8    320
8 – 10   194
10 – 11  613
11 – 13  273
13 – 14  395
14 – 16  199
16 – 17  411
17 – 19  1001
19 – 20  412
20 – 22  2593
22 – 23  964
23 – 25  91
25 – 26  106
26 – 28  48
28 – 29  74
29 – 30  30
30 – 32  22
32 – 33  45
33 – 35  15
35 – 36  20
36 – 38  3
38 – 39  13
39 – 41  7
41 – 42  7
42 – 44  3
44 – 45  6
45 – 47  1
47 – 48  10
48 – 50  1
50 – 51  1
51 – 53  3
53 – 54  2
54 – 56  0
56 – 57  1
57 – 59  0
59 – 60  2

lat numeric feature

This is a latitude coordinate column, with values spanning -54.68 to 70.96 and a median of 40.86 placing most records in the northern hemisphere. The distribution is heavily left-skewed (skew -2.36, kurtosis 6.28) with 889 outliers (10.1%), reflecting a long southern-hemisphere tail relative to a northern-clustered core. Near-unique values (8758/8776) indicate point-level geolocations rather than coarse bins.

Treatment: Pair with longitude for geospatial features (e.g., bin, cluster, or compute distances) rather than treating as a standalone numeric.
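
One way to compute distances from a lat/lon pair is the haversine formula. This is a sketch under the assumption of a spherical Earth with a mean radius of 6371 km; `haversine_km` is a hypothetical helper, not something saturn provides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp against rounding error before taking the arcsine.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Two points either side of the antimeridian are 2 degrees apart, not 358.
print(round(haversine_km(0.0, 179.0, 0.0, -179.0), 1))
```

Because the formula works on the sine of the longitude difference, it handles the ±180 seam without special casing.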

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["lat"].stats

stat          value
n             8,776
nulls         0 (0.0%)
unique        8,758
min           -54.68
max           70.96
mean          34.77
median        40.86
std           18.12
q1            32.31
q3            44.53
iqr           12.22
skew          -2.36
kurtosis      6.277
n_outliers    889
outlier_rate  0.1013
zero_rate     0
alert: high_skew · skew=-2.36
alert: outliers · 10.1% rows beyond 1.5 IQR
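
The outlier alert refers to a 1.5 IQR rule. A worked sketch of where the cut-offs would fall, assuming the usual Tukey fences (q1 - 1.5·IQR and q3 + 1.5·IQR); the report does not spell out the exact rule saturn applies, and `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences placed k IQRs outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the lat stats table.
lower, upper = tukey_fences(32.31, 44.53)
print(f"lower fence: {lower:.2f}, upper fence: {upper:.2f}")
```

Under this assumption the fences sit near 13.98 and 62.86 degrees, so high-latitude records north of roughly 63°N would be flagged along with the southern tail.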
Fig 9.
Distribution of lat. Vertical dash marks the median.
Histogram bins for lat (median: 40.8604357).
bin               count
-54.68 – -51.54   1
-51.54 – -48.4    0
-48.4 – -45.25    0
-45.25 – -42.11   23
-42.11 – -38.97   30
-38.97 – -35.83   132
-35.83 – -32.69   24
-32.69 – -29.55   9
-29.55 – -26.41   14
-26.41 – -23.27   10
-23.27 – -20.13   55
-20.13 – -16.99   104
-16.99 – -13.84   20
-13.84 – -10.7    13
-10.7 – -7.563    29
-7.563 – -4.422   14
-4.422 – -1.281   25
-1.281 – 1.86     35
1.86 – 5.001      32
5.001 – 8.142     35
8.142 – 11.28     63
11.28 – 14.42     88
14.42 – 17.56     276
17.56 – 20.71     82
20.71 – 23.85     134
23.85 – 26.99     100
26.99 – 30.13     298
30.13 – 33.27     997
33.27 – 36.41     1011
36.41 – 39.55     438
39.55 – 42.69     597
42.69 – 45.83     3559
45.83 – 48.97     117
48.97 – 52.12     105
52.12 – 55.26     101
55.26 – 58.4      30
58.4 – 61.54      9
61.54 – 64.68     88
64.68 – 67.82     75
67.82 – 70.96     3

lon numeric feature

This column is almost certainly geographic longitude in decimal degrees: values span -176.63 to 179.36, with mean -23.99 and median -21.23 sitting plausibly within the valid [-180, 180] range. The wide IQR of 154.96 and std of 89.39 indicate global coverage rather than a regional dataset, and near-uniqueness (8747 unique of 8776) suggests each row is a distinct location. No nulls, no zeros, and no flagged outliers.

Treatment: Pair with latitude for geospatial features; avoid treating as a plain scalar in models due to wraparound at ±180.
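
One common way to sidestep the wraparound is to encode longitude as a point on the unit circle. This is a sketch of that idea, not a method the report prescribes; `encode_lon` is a hypothetical helper.

```python
import math

def encode_lon(lon_deg):
    """Map longitude in degrees to (sin, cos) so that -180 and 180 coincide."""
    theta = math.radians(lon_deg)
    return math.sin(theta), math.cos(theta)

# As plain scalars these two values differ by 359.8; on the circle they are neighbours.
east = encode_lon(179.9)
west = encode_lon(-179.9)
print(math.hypot(east[0] - west[0], east[1] - west[1]))
```

The two encoded components should be used together; either one alone maps distinct longitudes to the same value.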

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["lon"].stats

stat          value
n             8,776
nulls         0 (0.0%)
unique        8,747
min           -176.6
max           179.4
mean          -23.99
median        -21.23
std           89.39
q1            -110.8
q3            44.15
iqr           155
skew          0.4043
kurtosis      -1.179
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 10.
Distribution of lon. Vertical dash marks the median.
Histogram bins for lon (median: -21.228268900000003).
bin               count
-176.6 – -167.7   4
-167.7 – -158.8   2
-158.8 – -149.9   2
-149.9 – -141     3
-141 – -132.1     2
-132.1 – -123.2   18
-123.2 – -114.3   403
-114.3 – -105.4   3413
-105.4 – -96.53   23
-96.53 – -87.63   24
-87.63 – -78.73   58
-78.73 – -69.83   158
-69.83 – -60.93   193
-60.93 – -52.03   11
-52.03 – -43.13   5
-43.13 – -34.23   0
-34.23 – -25.33   7
-25.33 – -16.44   154
-16.44 – -7.535   122
-7.535 – 1.364    68
1.364 – 10.26     102
10.26 – 19.16     203
19.16 – 28.06     374
28.06 – 36.96     637
36.96 – 45.86     1145
45.86 – 54.76     325
54.76 – 63.66     48
63.66 – 72.56     43
72.56 – 81.46     114
81.46 – 90.36     26
90.36 – 99.26     99
99.26 – 108.2     181
108.2 – 117.1     45
117.1 – 126       190
126 – 134.9       80
134.9 – 143.8     234
143.8 – 152.7     41
152.7 – 161.6     83
161.6 – 170.5     5
170.5 – 179.4     131

country categorical metadata

This is a country code field (ISO 3166-1 alpha-2 style values like IQ, TW, MX, DE, RU, JP) that is effectively empty: 99.91% of the 8776 rows are null, leaving only 8 observed values across 6 distinct codes. The non-null distribution is too sparse to be meaningful, though IQ appears 3 times and accounts for 37.5% of present values. With this null rate, any apparent signal is noise.

Treatment: Drop the column; null rate of 99.91% leaves nothing to model.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["country"].stats

stat           value
n              8,776
nulls          8,768 (99.9%)
unique         6
top_value      IQ
top_rate       0.375
cardinality    6
entropy        2.406
entropy_ratio  0.9306
alert: long_tail · 5 singleton categories
alert: null_rate · 99.9% null
Fig 11.
Top values for country.
Top values for country (6 unique shown, of 6 total).
value  count  share
IQ     3      0.0%
TW     1      0.0%
MX     1      0.0%
DE     1      0.0%
RU     1      0.0%
JP     1      0.0%

type categorical label

Binary categorical column distinguishing two hydrothermal feature types: hot_spring (7082 rows, ~80.7%) and geyser (1694 rows). No nulls and only 2 unique values, so the field is clean but imbalanced roughly 4:1 toward hot_spring. Entropy ratio of 0.71 reflects that skew rather than any data quality issue.

Treatment: One-hot or boolean-encode; stratify splits to preserve the ~4:1 class balance.
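
A minimal sketch of the boolean encoding and a stratified split, using only the standard library. `stratified_split` is a hypothetical helper, the 20% test fraction and the seed are arbitrary choices, and the label list is rebuilt from the class counts in this report rather than read from the data.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at the same fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class counts from the report: 7082 hot springs, 1694 geysers.
labels = ["hot_spring"] * 7082 + ["geyser"] * 1694
is_geyser = [label == "geyser" for label in labels]   # boolean encoding
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(is_geyser[i] for i in test_idx))
```

With these counts the test set holds 1,416 hot springs and 339 geysers, which keeps the geyser share at about 19% in both partitions.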

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["type"].stats

stat           value
n              8,776
nulls          0 (0.0%)
unique         2
top_value      hot_spring
top_rate       0.807
cardinality    2
entropy        0.7078
entropy_ratio  0.7078
Fig 12.
Top values for type.
Top values for type (2 unique shown, of 2 total).
value       count  share
hot_spring  7082   80.7%
geyser      1694   19.3%

temperature categorical feature

A free-text temperature field that is 98.31% null, with only 148 populated rows out of 8776. The dominant value is the descriptive string 'hot' (76 occurrences, 51.35% of populated rows), while the remaining entries are numeric strings like '90', '100', '21' — indicating a mix of qualitative and quantitative encodings with no consistent unit. Cardinality is 56 with entropy ratio 0.64, so the long tail is sparse but varied.

Treatment: Drop or treat as missing-by-default; if retained, normalize units and split numeric vs categorical encodings before use.
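
If the column is retained, the split between qualitative and numeric encodings could look like the sketch below. `parse_temperature` is a hypothetical helper written against the value shapes visible in this report ('hot', '90', '37-39°', '35-37 °C', '52,1'); treating a comma as a decimal separator is an assumption, and other shapes in the 56 distinct values may need more rules.

```python
import re

_NUM = r"\d+(?:[.,]\d+)?"
_PATTERN = re.compile(
    rf"^\s*({_NUM})(?:\s*-\s*({_NUM}))?\s*(?:°\s*([CF])?|([CF]))?\s*$",
    re.IGNORECASE,
)

def parse_temperature(raw):
    """Return {'low', 'high', 'unit'} for numeric tags, else {'label': text}.

    unit is 'C', 'F', or None when the tag does not state one.
    """
    match = _PATTERN.match(raw)
    if match is None:
        return {"label": raw.strip().lower()}
    low = float(match.group(1).replace(",", "."))   # assumes comma means decimal point
    high = float(match.group(2).replace(",", ".")) if match.group(2) else low
    unit = (match.group(3) or match.group(4) or "").upper() or None
    return {"low": low, "high": high, "unit": unit}

for raw in ["hot", "90", "37-39°", "35-37 °C", "58°C", "52,1"]:
    print(repr(raw), parse_temperature(raw))
```

Bare numbers such as '90' keep `unit` as None rather than guessing between Celsius and Fahrenheit, which leaves the unit decision explicit for whoever uses the field.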

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["temperature"].stats

stat           value
n              8,776
nulls          8,628 (98.3%)
unique         56
top_value      hot
top_rate       0.5135
cardinality    56
entropy        3.734
entropy_ratio  0.643
alert: long_tail · 44 singleton categories
alert: null_rate · 98.3% null
Fig 13.
Top values for temperature.
Top values for temperature (20 unique shown, of 56 total).
value     count  share
hot       76     0.9%
90        4      0.0%
100       4      0.0%
21        3      0.0%
95        3      0.0%
37        2      0.0%
43        2      0.0%
28        2      0.0%
40        2      0.0%
42        2      0.0%
38        2      0.0%
48        2      0.0%
37-39°    1      0.0%
35-37 °C  1      0.0%
58°C      1      0.0%
52,1      1      0.0%
25-30     1      0.0%
98°C      1      0.0%
40-43°    1      0.0%
77        1      0.0%

wikipedia categorical metadata

This appears to be a Wikipedia article reference column, with values formatted as language-prefixed page titles (e.g., 'en:Olympic Hot Springs', 'ja:鉢形駅') spanning multiple languages including English, Japanese, Russian, and Icelandic. The column is 98.52% null with only 124 unique values across 8776 rows, and entropy ratio of 0.996 indicates the few populated entries are nearly all distinct. The top value appears just 3 times (0.023 rate), confirming no meaningful concentration.

Treatment: Drop or retain only as a reference link; too sparse and high-cardinality for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["wikipedia"].stats

stat           value
n              8,776
nulls          8,646 (98.5%)
unique         124
top_value      en:Olympic Hot Springs
top_rate       0.02308
cardinality    124
entropy        6.924
entropy_ratio  0.9957
alert: long_tail · 119 singleton categories
alert: null_rate · 98.5% null
Fig 14.
Top values for wikipedia.
Top values for wikipedia (20 unique shown, of 124 total).
value                        count  share
en:Olympic Hot Springs       3      0.0%
en:Fan and Mortar Geysers    2      0.0%
ja:鉢形駅                     2      0.0%
ja:男衾駅                     2      0.0%
ru:Дачные горячие источники  2      0.0%
is:Geysir                    1      0.0%
en:Wilbur Hot Springs        1      0.0%
en:Morning Glory Pool        1      0.0%
en:Giant Geyser              1      0.0%
en:Great Fountain Geyser     1      0.0%
fr:Echinus Geyser            1      0.0%
en:Excelsior Geyser          1      0.0%
en:Turquoise Pool            1      0.0%
en:Opal Pool                 1      0.0%
en:Fishing Cone              1      0.0%
is:Strokkur (hver)           1      0.0%
de:Wallender Born            1      0.0%
en:Fountain Paint Pot        1      0.0%
de:Geysir Andernach          1      0.0%
en:Old Faithful              1      0.0%

description categorical free_text

Free-text descriptions, likely for hot spring or geothermal site entries, present on only ~1.94% of the 8776 rows (null_rate 0.9806). Among the 170 non-null entries there are 141 unique values with entropy ratio 0.966, so almost every description is bespoke; the most common string ('Mud geyser created from recent seismic activity') still only repeats 12 times (top_rate 0.0706). Languages are mixed — English, Japanese (熱海七湯), Russian, French — which will complicate any text processing.

Treatment: Treat as multilingual free text: language-detect then tokenize/embed, but expect ~98% missingness to limit usefulness as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["description"].stats

stat           value
n              8,776
nulls          8,606 (98.1%)
unique         141
top_value      Mud geyser created from recent seismic activity
top_rate       0.07059
cardinality    141
entropy        6.894
entropy_ratio  0.9656
alert: long_tail · 131 singleton categories
alert: null_rate · 98.1% null
Fig 15.
Top values for description.
Top values for description (20 unique shown, of 141 total).
value  count  share
Mud geyser created from recent seismic activity  12  0.1%
熱海七湯  5  0.1%
Free outdoor foot bath hot spring  4  0.0%
Активная группа источников. входящая в состав Дачных горячих источников. Открыта и названа Кирсановым И.Т. и Кирсановой Т.П. в 1962 г.  3  0.0%
Hot Spring  3  0.0%
Exploitation traditionnelle  3  0.0%
Горячие термальные источники  3  0.0%
For 70/100 THB, possibility to get a private room  2  0.0%
hot spring  2  0.0%
Strong sulfur smell. ≈22°C, https://www.researchgate.net/publication/302914492_Ujerat_Termale_dhe_Minerale_te_Shqiperise_Thermal_and_Mineral_Waters_of_Albania  2  0.0%
The only true geyser in the state of Colorado. Frequency of the eruptions vary, 30 to 40 minute intervals are most common. The action is slight and boils 12-15 minutes emitting carbon dioxide and hydrogen sulfide gas. Temperature is usually 82.4 F.  1  0.0%
Bad. Thapai Hot springs 300฿ tickets  1  0.0%
Heilwasser mit leicht erhöhtem Arsengehalt  1  0.0%
Вода содержит примеси кобальта  1  0.0%
Termálna voda 60 stupňov  1  0.0%
sulfur baths  1  0.0%
Two blue fibreglass pools perched on a cliff, overlooking the river valley, surrounded by a wooden deck set amongst bush and forest. Pools are feed from a naturally hot thermal spring in the surrounding geology  1  0.0%
29.8-30 °C, 30-70 l/s https://www.researchgate.net/publication/302914492_Ujerat_Termale_dhe_Minerale_te_Shqiperise_Thermal_and_Mineral_Waters_of_Albania  1  0.0%
29.7-30 °C, 5-50 l/s https://www.researchgate.net/publication/302914492_Ujerat_Termale_dhe_Minerale_te_Shqiperise_Thermal_and_Mineral_Waters_of_Albania  1  0.0%
Private hot spring pools  1  0.0%

intermittent categorical feature

Binary yes/no flag, likely the OSM `intermittent` tag marking water features that are not permanently present, but it is essentially absent: 97.89% of the 8,776 rows are null, leaving only 185 populated values. Among those few, 'no' dominates at 91.9% (170 vs 15 'yes'), so the column carries almost no usable signal.

Treatment: Drop or treat as a sparse indicator; null rate too high for direct modelling.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["intermittent"].stats

stat           value
n              8,776
nulls          8,591 (97.9%)
unique         2
top_value      no
top_rate       0.9189
cardinality    2
entropy        0.406
entropy_ratio  0.406
alert: null_rate · 97.9% null
Fig 16.
Top values for intermittent.
Top values for intermittent (2 unique shown, of 2 total).
value  count  share
no     170    1.9%
yes    15     0.2%

access categorical feature

This is a low-cardinality categorical access flag (5 distinct values: yes, customers, private, no, permissive) — likely an OSM-style access tag indicating who may use a feature. It is overwhelmingly null at 98.99%, leaving only 89 observed values across 8,776 rows. The non-null distribution is unusually flat (entropy ratio 0.945), with the modal value 'yes' accounting for just 26.97% of present values.

Treatment: Treat missing as its own category ('unspecified') before encoding, given 99% nulls.
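
A short sketch of that treatment, assuming nulls arrive as `None` and that a one-hot vector is the desired encoding. `encode_access` and the category order are hypothetical choices; the five observed values come from this report.

```python
# Five observed values plus an explicit bucket for missing tags.
ACCESS_LEVELS = ["yes", "customers", "private", "no", "permissive", "unspecified"]

def encode_access(value):
    """One-hot encode an access tag, mapping None to 'unspecified'."""
    label = "unspecified" if value is None else value
    if label not in ACCESS_LEVELS:
        raise ValueError(f"unexpected access value: {label!r}")
    return [int(label == level) for level in ACCESS_LEVELS]

print(encode_access("customers"))
print(encode_access(None))
```

Raising on unknown values is a deliberate choice here, so that new tags in a refreshed extract are noticed rather than silently encoded as all zeros.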

anthropic:claude-opus-4-7 · confidence high
Out[40]:

saturn.columns["access"].stats

stat           value
n              8,776
nulls          8,687 (99.0%)
unique         5
top_value      yes
top_rate       0.2697
cardinality    5
entropy        2.194
entropy_ratio  0.9451
alert: null_rate · 99.0% null
Fig 17.
Top values for access.
Top values for access (5 unique shown, of 5 total).
value       count  share
yes         24     0.3%
customers   21     0.2%
private     20     0.2%
no          19     0.2%
permissive  5      0.1%

tourism categorical metadata

This is an OpenStreetMap-style `tourism` tag classifying points of interest (attraction, hotel, camp_site, viewpoint, etc.). It is almost entirely empty (97.46% null across 8776 rows), and among the 223 populated entries `attraction` dominates at 87% (194 records), leaving the other 9 categories in a long tail of 9 or fewer rows each, three of them singletons. A stray `yes` value (7 rows) and the misspelled `camp_site;loding` (3 rows) suggest inconsistent tagging upstream.

Treatment: Drop or collapse to a binary is_attraction flag; too sparse and skewed for direct use as a feature.
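
Collapsing to a binary flag could be sketched as below. `is_attraction` is a hypothetical helper; splitting on semicolons is an assumption based on multi-valued entries such as `camp_site;loding` in the top values, and nulls are treated as not an attraction.

```python
def is_attraction(tourism_tag):
    """True when any semicolon-separated value of the tag equals 'attraction'."""
    if tourism_tag is None:
        return False
    return "attraction" in (part.strip() for part in tourism_tag.split(";"))

for tag in ["attraction", "hotel", "camp_site;loding", None]:
    print(repr(tag), is_attraction(tag))
```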

anthropic:claude-opus-4-7 · confidence high
Out[43]:

saturn.columns["tourism"].stats

stat           value
n              8,776
nulls          8,553 (97.5%)
unique         10
top_value      attraction
top_rate       0.87
cardinality    10
entropy        0.9127
entropy_ratio  0.2747
alert: null_rate · 97.5% null
Fig 18.
Top values for tourism.
Top values for tourism (10 unique shown, of 10 total).
value             count  share
attraction        194    2.2%
hotel             9      0.1%
yes               7      0.1%
camp_site         3      0.0%
camp_site;loding  3      0.0%
information       2      0.0%
viewpoint         2      0.0%
picnic_site       1      0.0%
caravan_site      1      0.0%
guest_house       1      0.0%

osm_id numeric identifier

This is almost certainly the OpenStreetMap object identifier: every one of the 8776 rows is unique with no nulls or zeros, and values span 27750092 to 13535658843, the range typical of OSM IDs. The distribution is broad (IQR ~9.94e9) and slightly left-skewed (-0.30) with flat kurtosis (-1.43), consistent with IDs accumulated across OSM history rather than a meaningful numeric feature. No outliers were flagged, which is expected for an identifier.

Treatment: Use as a join key to OSM; do not feed into models as a numeric feature.
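
A sketch of building that join key. In OpenStreetMap, node and way ids are numbered separately, so pairing the id with `osm_type` is the safer key even though every id is unique in this extract. `osm_key` and `osm_url` are hypothetical helpers, and pairing the minimum id from the stats with `node` is illustrative only; the report does not say which type that row has.

```python
def osm_key(osm_type, osm_id):
    """Composite key, since OSM ids are only unique within an element type."""
    return (osm_type, int(osm_id))

def osm_url(osm_type, osm_id):
    """Browse URL for an element on openstreetmap.org."""
    return f"https://www.openstreetmap.org/{osm_type}/{int(osm_id)}"

# Smallest id in the column, paired with 'node' purely for illustration.
print(osm_key("node", 27750092))
print(osm_url("node", 27750092))
```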

anthropic:claude-opus-4-7 · confidence high
Out[46]:

saturn.columns["osm_id"].stats

stat          value
n             8,776
nulls         0 (0.0%)
unique        8,776
min           2.775e+07
max           1.354e+10
mean          7.186e+09
median        8.263e+09
std           4.374e+09
q1            1.334e+09
q3            1.128e+10
iqr           9.942e+09
skew          -0.3011
kurtosis      -1.43
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 19.
Distribution of osm_id. Vertical dash marks the median.
Histogram bins for osm_id (median: 8262841267.5).
bin                    count
2.775e+07 – 3.654e+08  207
3.654e+08 – 7.031e+08  170
7.031e+08 – 1.041e+09  72
1.041e+09 – 1.379e+09  1866
1.379e+09 – 1.716e+09  85
1.716e+09 – 2.054e+09  28
2.054e+09 – 2.392e+09  18
2.392e+09 – 2.729e+09  37
2.729e+09 – 3.067e+09  44
3.067e+09 – 3.405e+09  40
3.405e+09 – 3.742e+09  61
3.742e+09 – 4.08e+09   39
4.08e+09 – 4.418e+09   140
4.418e+09 – 4.756e+09  107
4.756e+09 – 5.093e+09  68
5.093e+09 – 5.431e+09  73
5.431e+09 – 5.769e+09  66
5.769e+09 – 6.106e+09  87
6.106e+09 – 6.444e+09  112
6.444e+09 – 6.782e+09  68
6.782e+09 – 7.119e+09  147
7.119e+09 – 7.457e+09  167
7.457e+09 – 7.795e+09  394
7.795e+09 – 8.132e+09  194
8.132e+09 – 8.47e+09   403
8.47e+09 – 8.808e+09   271
8.808e+09 – 9.146e+09  323
9.146e+09 – 9.483e+09  138
9.483e+09 – 9.821e+09  192
9.821e+09 – 1.016e+10  270
1.016e+10 – 1.05e+10   79
1.05e+10 – 1.083e+10   274
1.083e+10 – 1.117e+10  215
1.117e+10 – 1.151e+10  286
1.151e+10 – 1.185e+10  115
1.185e+10 – 1.218e+10  386
1.218e+10 – 1.252e+10  1210
1.252e+10 – 1.286e+10  110
1.286e+10 – 1.32e+10   122
1.32e+10 – 1.354e+10   92

osm_type categorical feature

This column records the OpenStreetMap geometry type, taking only two values across 8,776 rows: 'node' (6,705) and 'way' (2,071). 76.4% of records are nodes, giving an entropy ratio of 0.79 — moderately imbalanced but no nulls or rare categories.

Treatment: One-hot or binary-encode before modelling.
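
The entropy ratio quoted for this column can be reproduced from the two counts. The sketch assumes Shannon entropy in bits divided by log2 of the cardinality, which is consistent with the figures in this report but is not documented here; `entropy_ratio` is a hypothetical helper.

```python
import math

def entropy_ratio(counts):
    """Return (entropy in bits, entropy / log2(number of categories))."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    if len(counts) < 2:
        return entropy, 0.0          # a single category has no spread to normalize
    return entropy, entropy / math.log2(len(counts))

# node / way counts from the osm_type top values.
entropy, ratio = entropy_ratio([6705, 2071])
print(round(entropy, 4), round(ratio, 4))
```

With two categories the maximum entropy is 1 bit, which is why `entropy` and `entropy_ratio` are identical for binary columns such as `osm_type` and `type`.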

anthropic:claude-opus-4-7 · confidence high
Out[49]:

saturn.columns["osm_type"].stats

stat           value
n              8,776
nulls          0 (0.0%)
unique         2
top_value      node
top_rate       0.764
cardinality    2
entropy        0.7883
entropy_ratio  0.7883
Fig 20.
Top values for osm_type.
Top values for osm_type (2 unique shown, of 2 total).
value  count  share
node   6705   76.4%
way    2071   23.6%

How to cite

BibTeX
@misc{saturn-quirky-geothermal-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky geothermal},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-geothermal}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky geothermal. Source: /home/coolhand/html/datavis/data_trove/data/quirky/geothermal.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-geothermal