saturn

/home/coolhand/html/datavis/data_trove/data/geographic/waterfalls/waterfalls_worldwide.json 80,678 rows sample n=80,678 seed 42 2026-05-01T23:18:56+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/geographic/waterfalls/waterfalls_worldwide.json
Total rows: 80,678
Profiled sample: 80,678
Columns: 9
Generated: 2026-05-01T23:18:56+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset catalogues 80,678 waterfalls worldwide. Every row is tagged with the source 'OpenStreetMap' and carries latitude/longitude coordinates plus minimal descriptive metadata. The descriptive fields are sparse: 'category' and 'source' are constant, 'date' is blank in every row, 'country' is blank in 80,650 of 80,678 rows, and 89.9% of 'description' entries are simply 'Waterfall'. The 'name' field is similarly thin: the placeholder 'Unnamed Waterfall' accounts for 48,168 rows, and the duplicate rate is 65.7%. The analytical signal lives in the coordinates. Latitude concentrates in the Northern Hemisphere (median 40.3) and longitude spans nearly the full -180 to 180 range, so this is primarily a spatial dataset rather than an attribute-rich one.

latitude high anthropic:claude-opus-4-7

Geographic latitude coordinates spanning -77.72 to 78.66 degrees. The distribution is left-skewed (skew -0.94): the median of 40.31 sits well above the mean of 27.15, which indicates a concentration of records in the Northern Hemisphere with a long tail reaching toward Antarctica. saturn flags 298 values (0.37%) as outliers. Near-uniqueness (80,650 distinct of 80,678) and zero nulls indicate clean, granular point data.

longitude high anthropic:claude-opus-4-7

This is a longitude coordinate column spanning the full global range from -179.99 to 179.41, with 80,650 unique values across 80,678 rows and no nulls. The distribution is broad (std 76.86, IQR 100.27) and only mildly skewed (0.29), with the median at 7.80 sitting east of the prime meridian, hinting at a Europe/Africa-leaning sample. No outliers were flagged, consistent with values bounded by valid geographic limits.

name high anthropic:claude-opus-4-7

This is the human-readable name of a waterfall, averaging 2 words and 16 characters. The column is dominated by the placeholder 'Unnamed Waterfall' (48,168 of 80,678 rows), driving a 65.7% duplicate rate; multiple languages appear in the vocabulary (cachoeira, cascada, cascata, salto, fossen) alongside English 'falls'.

description high anthropic:claude-opus-4-7

This is a categorical descriptor column, overwhelmingly dominated by the value "Waterfall" which accounts for 72,565 of 80,678 rows (top_rate 0.899). The remaining 774 categories appear to be variants annotated with heights (e.g. "Waterfall, 3m", "Waterfall, 5m"), suggesting a structured suffix pattern rather than free text. Entropy is very low (1.14, ratio 0.119) and the long_tail alert fires, so most signal collapses into one label.

category high anthropic:claude-opus-4-7

This column is a constant categorical tag, holding the literal value "usgs_waterfalls" for all 80,678 rows. With cardinality 1, entropy 0, and a top_rate of 1.0, it carries no information. The label likely records an ingestion batch or collection name rather than true provenance, since the 'source' column reads "OpenStreetMap" for every row.

date high anthropic:claude-opus-4-7

This column is named 'date' but contains a single value, the empty string, across all 80,678 rows. Cardinality is 1, top_rate is 1.0, and entropy is 0.0, so the field carries no information. It looks like a date field that was never populated.

country high anthropic:claude-opus-4-7

This is an ISO country code field that is effectively empty: 80,650 of 80,678 rows (top_rate 0.9997) hold the blank string. The remaining 28 rows carry five codes: VE (24) and one each for DE, LB, HN, and BR. Entropy is 0.0048 (entropy_ratio 0.0019), so the column carries almost no information despite having no nulls. The non-blank values look plausible but are far too sparse to support segmentation or modelling.

height high anthropic:claude-opus-4-7

A numeric height field stored as strings. 89.9% of the 80,678 rows (72,565) are empty, and the populated rows spread across 774 distinct values, giving a very low entropy ratio (0.119). The populated values are mostly small numbers (3, 2, 5, 10, ...), including decimals such as 1.5. The column itself records no unit, but its value counts match the 'm'-suffixed heights in 'description' one for one (for example, 551 rows each for '3' and 'Waterfall, 3m'), so the values are most likely metres.

source high anthropic:claude-opus-4-7

This column records the data provenance, with every one of the 80,678 rows tagged as "OpenStreetMap". Cardinality is 1 and entropy is 0, so the field carries no information for modelling or filtering.

Numeric correlation

latitude numeric

rows: 80,678
null: 0 (0.0%)
unique: 80,650
min: -77.722
max: 78.664
mean: 27.148
median: 40.312
std: 30.045
q1: 9.657
q3: 47.477
iqr: 37.820
skew: -0.936
kurtosis: -0.283
n_outliers: 298
outlier_rate: 3.69e-03
zero_rate: 0.000

longitude numeric

rows: 80,678
null: 0 (0.0%)
unique: 80,650
min: -179.991
max: 179.412
mean: 0.963
median: 7.803
std: 76.859
q1: -61.708
q3: 38.561
iqr: 100.269
skew: 0.287
kurtosis: -0.412
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
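
The outlier counts in the two numeric blocks are consistent with a conventional 1.5 x IQR (Tukey) fence, although the report does not state which rule saturn applies, so treat that as an assumption. The sketch below computes the fences from the reported quartiles. The function name `tukey_fences` and the multiplier `k` are illustrative, not part of saturn.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the latitude and longitude blocks of this report.
lat_lo, lat_hi = tukey_fences(9.657, 47.477)
lon_lo, lon_hi = tukey_fences(-61.708, 38.561)

print(f"latitude fences:  {lat_lo:.3f} .. {lat_hi:.3f}")
print(f"longitude fences: {lon_lo:.3f} .. {lon_hi:.3f}")
```

Under this assumption the upper latitude fence lies above 90, so only far-southern latitudes can be flagged. Both longitude fences fall outside the valid -180 to 180 range, so no longitude can be flagged, which agrees with n_outliers of 0.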

name text

65.7% duplicate strings
rows: 80,678
null: 0 (0.0%)
unique: 27,697
len_min: 1
len_max: 67
len_mean: 16.121
len_median: 17.000
len_p95: 21.000
word_mean: 2.091
word_median: 2.000
n_empty: 0
n_duplicates: 52,981
duplicate_rate: 0.657
vocab_size: 8,093
readability_flesch_mean: 17.609
emoji_rate: 1.24e-05
url_rate: 0.000
one_word_rate: 0.111
allcaps_rate: 0.035
boilerplate_rate: 0.000
Sample values (first 10)
  1. Strømslifossen
  2. Unnamed Waterfall
  3. Rauðfossar
  4. Unnamed Waterfall
  5. Little Niagara Falls
  6. Unnamed Waterfall
  7. Price’s Falls
  8. Unnamed Waterfall
  9. 彌東飛瀑
  10. Cascada de Arriba
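
The reported n_duplicates and duplicate_rate match a simple definition: every repeat beyond the first occurrence of a string counts as a duplicate, so n_duplicates = rows - unique. That definition is inferred from the numbers, not documented here. A minimal sketch, with `duplicate_stats` as a hypothetical helper name:

```python
def duplicate_stats(values):
    """Return (n_duplicates, duplicate_rate), counting repeats beyond each first occurrence."""
    rows = len(values)
    n_duplicates = rows - len(set(values))
    return n_duplicates, (n_duplicates / rows if rows else 0.0)


# The ten sample values listed in the name block.
sample = [
    "Strømslifossen", "Unnamed Waterfall", "Rauðfossar", "Unnamed Waterfall",
    "Little Niagara Falls", "Unnamed Waterfall", "Price’s Falls",
    "Unnamed Waterfall", "彌東飛瀑", "Cascada de Arriba",
]
sample_dupes, sample_rate = duplicate_stats(sample)

# The same arithmetic applied to the reported totals.
rows, unique = 80_678, 27_697
n_duplicates = rows - unique
duplicate_rate = n_duplicates / rows

print(sample_dupes, sample_rate)
print(n_duplicates, round(duplicate_rate, 3))
```

In the ten-row sample, the placeholder 'Unnamed Waterfall' alone produces all of the duplicates, which mirrors its effect on the full column.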

description categorical

403 singleton categories
rows: 80,678
null: 0 (0.0%)
unique: 775
top_value: Waterfall
top_rate: 0.899
cardinality: 775
entropy: 1.140
entropy_ratio: 0.119
Top values (rank 1–20)
  1. Waterfall — 72,565
  2. Waterfall, 3m — 551
  3. Waterfall, 2m — 520
  4. Waterfall, 5m — 460
  5. Waterfall, 10m — 426
  6. Waterfall, 4m — 423
  7. Waterfall, 1m — 358
  8. Waterfall, 6m — 329
  9. Waterfall, 20m — 298
  10. Waterfall, 15m — 257
  11. Waterfall, 8m — 240
  12. Waterfall, 7m — 214
  13. Waterfall, 30m — 170
  14. Waterfall, 12m — 159
  15. Waterfall, 25m — 125
  16. Waterfall, 40m — 114
  17. Waterfall, 1.5m — 103
  18. Waterfall, 50m — 79
  19. Waterfall, 9m — 79
  20. Waterfall, 60m — 74
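
The top values follow a "Waterfall, <number>m" pattern, so the height can be recovered with a small regular expression. This is a sketch that assumes the suffix is always metres and that the 755 values not shown in the top 20 follow the same shape, which the report does not confirm. The helper name `height_from_description` is hypothetical.

```python
import re
from typing import Optional

# Matches "Waterfall, 3m" or "Waterfall, 1.5m"; a bare "Waterfall" does not match.
_HEIGHT_SUFFIX = re.compile(r"^Waterfall,\s*(\d+(?:\.\d+)?)\s*m$")


def height_from_description(description: str) -> Optional[float]:
    """Return the height in metres parsed from the description, or None if absent."""
    match = _HEIGHT_SUFFIX.match(description.strip())
    return float(match.group(1)) if match else None


for text in ["Waterfall", "Waterfall, 3m", "Waterfall, 1.5m"]:
    print(repr(text), "->", height_from_description(text))
```

Returning None for anything that does not match keeps unexpected variants visible instead of silently coercing them.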

category categorical

top value is 100.0% of rows
rows: 80,678
null: 0 (0.0%)
unique: 1
top_value: usgs_waterfalls
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. usgs_waterfalls — 80,678

date categorical

top value is 100.0% of rows
rows: 80,678
null: 0 (0.0%)
unique: 1
top_value: (empty string)
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. — 80,678

country categorical

4 singleton categories
top value is 100.0% of rows
rows: 80,678
null: 0 (0.0%)
unique: 6
top_value: (empty string)
top_rate: 1.000 (80,650 of 80,678; 0.9997 before rounding)
cardinality: 6
entropy: 4.79e-03
entropy_ratio: 1.85e-03
Top values (rank 1–20)
  1. — 80,650
  2. VE — 24
  3. DE — 1
  4. LB — 1
  5. HN — 1
  6. BR — 1
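
The entropy figures for this column can be reproduced from the six counts above. The sketch assumes Shannon entropy in bits (log base 2) and an entropy_ratio defined as entropy divided by log2(cardinality). Both are inferred from the reported numbers, not taken from saturn's documentation.

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy in bits of the distribution given by raw category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Counts for the blank string, VE, DE, LB, HN and BR.
country_counts = [80_650, 24, 1, 1, 1, 1]

entropy = shannon_entropy_bits(country_counts)
# Normalise by the maximum possible entropy for this many categories.
entropy_ratio = entropy / math.log2(len(country_counts))

print(f"entropy={entropy:.2e} entropy_ratio={entropy_ratio:.2e}")
```

Both values agree with the reported 4.79e-03 and 1.85e-03 to the displayed precision, which supports the base-2 assumption.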

height categorical

403 singleton categories
rows: 80,678
null: 0 (0.0%)
unique: 775
top_value: (empty string)
top_rate: 0.899
cardinality: 775
entropy: 1.140
entropy_ratio: 0.119
Top values (rank 1–20)
  1. — 72,565
  2. 3 — 551
  3. 2 — 520
  4. 5 — 460
  5. 10 — 426
  6. 4 — 423
  7. 1 — 358
  8. 6 — 329
  9. 20 — 298
  10. 15 — 257
  11. 8 — 240
  12. 7 — 214
  13. 30 — 170
  14. 12 — 159
  15. 25 — 125
  16. 40 — 114
  17. 1.5 — 103
  18. 50 — 79
  19. 9 — 79
  20. 60 — 74
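
Because the heights are stored as strings and most are blank, the column is profiled as categorical. A minimal sketch of coercing it to numbers is shown here, treating blank or unparseable entries as missing. The helper name `parse_height` is hypothetical, and the assumption that every populated value is a plain decimal number rests only on the top 20 values shown.

```python
import math
from typing import Optional


def parse_height(raw: str) -> Optional[float]:
    """Convert a height string to a finite float, or None for blank or invalid input."""
    text = raw.strip()
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; reject those as non-measurements.
    return value if math.isfinite(value) else None


heights = [parse_height(s) for s in ["", "3", "1.5", "60", "n/a"]]
print(heights)

# Share of rows with a populated height, from the counts in this block.
populated_share = (80_678 - 72_565) / 80_678
print(round(populated_share, 3))
```

Once coerced, the populated rows (about 10% of the sample) can be summarised with the same numeric statistics used for latitude and longitude.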

source categorical

top value is 100.0% of rows
rows: 80,678
null: 0 (0.0%)
unique: 1
top_value: OpenStreetMap
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. OpenStreetMap — 80,678