data trove country centroids

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/geographic/country_centroids.json

Saturn profiled 7,124 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
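# Install the profiler (IPython shell escape, so this line only works inside a notebook).
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source file with the flags used for this report.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/geographic/country_centroids.json",
        "--findings", "data-trove-country-centroids.json",
        "--llm", "anthropic:default",
    ],
    check=True,  # raise CalledProcessError if the CLI exits with a non-zero status
)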

Summary confidence: high

This dataset contains 7,124 geographic coordinate records, likely country or administrative centroid points sourced from Natural Earth 1:10m Admin 0 Label Points. The most striking issue is that all seven descriptive columns (iso_a2, iso_a3, name, name_long, continent, region_un, and subregion) contain only empty strings, so the dataset is stripped of its descriptive metadata and only the raw coordinates and the constant source tag remain usable. Latitude ranges from -83.1 to 83.2 with a mean around 22.9°, suggesting a moderate northern hemisphere bias, while longitude spans nearly the full global range (-180 to 180) with no outliers. Before any analysis, the empty fields need to be investigated and repopulated, because the dataset in its current form cannot answer any country- or region-level question.

citing: row_count · column_count · columns[latitude].stats.min · columns[latitude].stats.max · columns[latitude].stats.mean · columns[latitude].stats.skew · columns[longitude].stats.min · columns[longitude].stats.max · columns[longitude].stats.outlier_rate · columns[source].stats.top_value · columns[name].stats.top_rate · columns[continent].stats.cardinality
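
The distinction the summary draws between empty strings and true nulls can be checked directly. The following is a minimal sketch, assuming the file loads as a list of row dictionaries (the actual layout of country_centroids.json is not shown in this report). The helper `blank_report` and the sample rows are hypothetical and only illustrate the check.

```python
def blank_report(records, columns):
    """Count None values and blank strings separately for each column."""
    report = {}
    for col in columns:
        nulls = sum(1 for row in records if row.get(col) is None)
        blanks = sum(
            1
            for row in records
            if isinstance(row.get(col), str) and row[col].strip() == ""
        )
        report[col] = {"n": len(records), "nulls": nulls, "blanks": blanks}
    return report


# Hypothetical rows shaped like the profiled columns (not real data).
sample = [
    {"name": "", "iso_a3": "", "longitude": 10.0, "latitude": 50.0},
    {"name": "", "iso_a3": "", "longitude": -70.0, "latitude": -30.0},
]

report = blank_report(sample, ["name", "iso_a3"])
for column, counts in report.items():
    print(column, counts)
```

A column with zero nulls but `blanks == n` matches the pattern reported here: a 0.0% null rate that hides a fully unpopulated field.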

Out[4]:

saturn.schema() · 10 columns

column kind n null% unique alerts
iso_a2 categorical 7,124 0.0% 1 imbalance
iso_a3 categorical 7,124 0.0% 1 imbalance
name categorical 7,124 0.0% 1 imbalance
name_long categorical 7,124 0.0% 1 imbalance
continent categorical 7,124 0.0% 1 imbalance
region_un categorical 7,124 0.0% 1 imbalance
subregion categorical 7,124 0.0% 1 imbalance
longitude numeric 7,124 0.0% 7,124
latitude numeric 7,124 0.0% 7,124
source categorical 7,124 0.0% 1 imbalance
Fig 1.
latitude · Look for the northern hemisphere skew: 5,472 of the 7,124 points (about 77%) fall in bins above 0.09°, with a thin tail toward the south pole.
Histogram bins for latitude (median: 25.195900167159152).
bin                   count
-83.05 – -78.9        19
-78.9 – -74.74        26
-74.74 – -70.58       38
-70.58 – -66.43       42
-66.43 – -62.27       51
-62.27 – -58.11       10
-58.11 – -53.95       45
-53.95 – -49.8        77
-49.8 – -45.64        72
-45.64 – -41.48       62
-41.48 – -37.32       30
-37.32 – -33.17       19
-33.17 – -29.01       10
-29.01 – -24.85       23
-24.85 – -20.7        85
-20.7 – -16.54        140
-16.54 – -12.38       169
-12.38 – -8.224       238
-8.224 – -4.067       246
-4.067 – 0.09051      250
0.09051 – 4.248       293
4.248 – 8.405         321
8.405 – 12.56         381
12.56 – 16.72         302
16.72 – 20.88         179
20.88 – 25.03         400
25.03 – 29.19         489
29.19 – 33.35         214
33.35 – 37.51         388
37.51 – 41.66         263
41.66 – 45.82         163
45.82 – 49.98         142
49.98 – 54.13         225
54.13 – 58.29         275
58.29 – 62.45         502
62.45 – 66.61         441
66.61 – 70.76         222
70.76 – 74.92         88
74.92 – 79.08         105
79.08 – 83.24         79
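
The 40 latitude bins appear to be equal-width, roughly (max − min) / 40 ≈ 4.16° each. That is an inference from the printed edges, not something the report states. A small sketch of that binning scheme, using the rounded min and max from the latitude stats; the function names are hypothetical.

```python
def equal_width_edges(lo, hi, bins):
    """Return bins + 1 edges that split [lo, hi] into equal-width intervals."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_index(x, lo, hi, bins):
    """Return the 0-based bin that x falls into; the top edge joins the last bin."""
    if not lo <= x <= hi:
        raise ValueError(f"{x} is outside [{lo}, {hi}]")
    index = int((x - lo) / (hi - lo) * bins)
    return min(index, bins - 1)


# Rounded min and max taken from the latitude stats in this report.
edges = equal_width_edges(-83.05, 83.24, 40)
median_bin = bin_index(25.195900167159152, -83.05, 83.24, 40)
print(round(edges[1], 2), median_bin)
```

With these inputs the median lands in the bin whose printed range is 25.03 – 29.19, which is also the second most populated bin in the table.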
Fig 2.
longitude · Longitude is broadly spread across the full -180 to 180 range with no outliers, reflecting global coverage.
Histogram bins for longitude (median: 23.477727818184434).
bin                   count
-180 – -171           112
-171 – -162           105
-162 – -153           87
-153 – -144           109
-144 – -135           55
-135 – -126           113
-126 – -117           95
-117 – -108           91
-108 – -98.98         36
-98.98 – -89.98       82
-89.98 – -80.98       350
-80.98 – -71.98       554
-71.98 – -62.98       253
-62.98 – -53.98       202
-53.98 – -44.99       83
-44.99 – -35.99       49
-35.99 – -26.99       30
-26.99 – -17.99       66
-17.99 – -8.989       75
-8.989 – 0.01045      126
0.01045 – 9.01        130
9.01 – 18.01          299
18.01 – 27.01         632
27.01 – 36.01         131
36.01 – 45.01         88
45.01 – 54.01         90
54.01 – 63            306
63 – 72               48
72 – 81               226
81 – 90               28
90 – 99               168
99 – 108              256
108 – 117             190
117 – 126             604
126 – 135             535
135 – 144             170
144 – 153             185
153 – 162             115
162 – 171             149
171 – 180             101
Fig 3.
source · All 7,124 rows share a single source value, confirming this is a uniform single-origin dataset with no provenance variety.
Top values for source (1 unique shown, of 1 total).
value                                       count   share
Natural Earth 1:10m Admin 0 Label Points    7124    100.0%
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.
column      kind          null %
iso_a2      categorical   0.0%
iso_a3      categorical   0.0%
name        categorical   0.0%
name_long   categorical   0.0%
continent   categorical   0.0%
region_un   categorical   0.0%
subregion   categorical   0.0%
longitude   numeric       0.0%
latitude    numeric       0.0%
source      categorical   0.0%
Fig 5.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
            longitude   latitude
longitude   +1.00       -0.10
latitude    -0.10       +1.00
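
For reference, the Pearson coefficient in this matrix is the covariance of the two columns divided by the product of their standard deviations. A minimal sketch of that formula follows; the report says saturn computes it on a sample, and this code makes no claim about how saturn implements it. The coordinate lists are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical coordinates, not rows from the profiled file.
lons = [-120.0, -60.0, 0.0, 60.0, 120.0]
lats = [40.0, 10.0, 30.0, -20.0, 5.0]
r = pearson(lons, lats)
print(round(r, 2))
```

A value near -0.10, as reported here, means longitude and latitude are close to linearly independent, which is what one would expect for points scattered over the globe.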

iso_a2 categorical other

This column is an ISO 3166-1 alpha-2 country code field, but it contains exactly one distinct value across all 7,124 rows: an empty string. Every single record has a blank code, making the column entirely uninformative. With entropy of 0.0 and top_rate of 1.0, this column carries zero signal and is effectively a dead field in this dataset.

Treatment: Drop entirely; the column is a constant empty string across all 7,124 rows and provides no analytical value.
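
The entropy and top_rate figures cited for this column can be reproduced with a few lines. This is a sketch under assumptions: entropy is taken to be Shannon entropy in bits, and entropy_ratio is assumed to be entropy divided by log2(cardinality), with 0 for a single-valued column. The report does not define either, and `categorical_stats` is a hypothetical helper.

```python
import math
from collections import Counter


def categorical_stats(values):
    """Summarise a categorical column: top value, top rate, cardinality, entropy."""
    if not values:
        raise ValueError("column is empty")
    counts = Counter(values)
    n = len(values)
    # Shannon entropy in bits: sum of p * log2(1 / p) over distinct values.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    cardinality = len(counts)
    # Assumed definition: entropy normalised by its maximum, log2(cardinality).
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    top_value, top_count = counts.most_common(1)[0]
    return {
        "top_value": top_value,
        "top_rate": top_count / n,
        "cardinality": cardinality,
        "entropy": entropy,
        "entropy_ratio": ratio,
    }


# A column of 7,124 empty strings, as described for iso_a2.
stats = categorical_stats([""] * 7124)
print(stats)
```

A single distinct value gives top_rate 1.0 and entropy 0.0 regardless of whether that value is an empty string or a real code, which is why the profile flags it as an imbalance rather than as missing data.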

anthropic:default · confidence high
Out[11]:

saturn.columns["iso_a2"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 6.
Top values for iso_a2.
Top values for iso_a2 (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

iso_a3 categorical feature

This column is intended to hold ISO 3166-1 alpha-3 country codes but contains exclusively empty strings across all 7,124 rows — cardinality of 1, top_rate of 1.0, and zero nulls. The column is entirely unpopulated (empty string rather than NULL), making it informationally void despite having no technical missing values. This is a data pipeline or extraction failure: the field exists but was never filled.

Treatment: Drop this column; it carries zero information (entropy = 0.0) and would need to be re-sourced from upstream before any use.

anthropic:default · confidence high
Out[14]:

saturn.columns["iso_a3"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 7.
Top values for iso_a3.
Top values for iso_a3 (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

name categorical other

This column, labelled 'name', is a categorical field that is entirely empty strings across all 7,124 rows — a single unique value of '' with a top_rate of 1.0 and null_rate of 0.0. No actual name data is present; the column has been populated with blank strings rather than nulls, masking what would otherwise appear as 100% missing. With entropy of 0.0 and cardinality of 1, it carries zero information.

Treatment: Drop entirely — zero variance, zero information content; investigate upstream pipeline for why nulls were coerced to empty strings.
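
If the blanks turn out to stand in for missing values, converting them to real nulls makes the null rate report the gap honestly. A minimal sketch, assuming rows are dictionaries; `blanks_to_none` and `null_rate` are hypothetical helpers and the sample rows are invented for illustration.

```python
def blanks_to_none(records, columns):
    """Return copies of the rows with blank strings in `columns` replaced by None."""
    cleaned = []
    for row in records:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            value = new_row.get(col)
            if isinstance(value, str) and value.strip() == "":
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


def null_rate(records, column):
    """Fraction of rows where `column` is None or absent."""
    return sum(1 for row in records if row.get(column) is None) / len(records)


# Hypothetical rows, not taken from the profiled file.
rows = [
    {"name": "", "latitude": 50.0},
    {"name": "  ", "latitude": -30.0},
]
before = null_rate(rows, "name")
after = null_rate(blanks_to_none(rows, ["name"]), "name")
print(before, after)
```

After the conversion the column reads as fully missing instead of fully populated, which matches what the interpretation above says the data actually contains.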

anthropic:default · confidence high
Out[17]:

saturn.columns["name"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 8.
Top values for name.
Top values for name (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

name_long categorical other

This column, ostensibly a long-form name field, contains exactly one unique value across all 7,124 rows: an empty string. With a null_rate of 0.0 and top_rate of 1.0, every single record holds an empty string rather than a true null, meaning the field was populated with blanks rather than left absent. The column carries zero informational content (entropy = 0.0) and is entirely useless for analysis in its current state.

Treatment: Drop column; it is a constant empty-string field with no variance or analytical value.

anthropic:default · confidence high
Out[20]:

saturn.columns["name_long"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 9.
Top values for name_long.
Top values for name_long (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

continent categorical label

This column is intended to represent a geographic continent label, but it contains exactly one distinct value — an empty string — across all 7,124 rows with no nulls. The column carries zero information entropy (entropy = 0.0, top_rate = 1.0), meaning every record has been filled with a blank string rather than a real value or a proper null. This is a data quality failure: the field appears to have been populated with empty strings instead of being left null or populated correctly.

Treatment: Drop or remediate before modelling — the column is a constant empty string and provides no signal; investigate ETL pipeline for the source of blank-string imputation.

anthropic:default · confidence high
Out[23]:

saturn.columns["continent"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 10.
Top values for continent.
Top values for continent (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

region_un categorical label

This column is intended to store a UN macro-region label but contains only empty strings across all 7,124 rows — cardinality is 1, top_rate is 1.0, and entropy is 0.0. It carries zero informational value in its current state. This is almost certainly an unpopulated or failed data extraction field rather than a legitimately uniform dataset.

Treatment: Drop this column entirely; it is constant-empty and contributes no signal to any downstream task.

anthropic:default · confidence high
Out[26]:

saturn.columns["region_un"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 11.
Top values for region_un.
Top values for region_un (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

subregion categorical other

This column represents a geographic subregion field, but it contains exactly one distinct value across all 7,124 rows: an empty string. With a top_rate of 1.0 and entropy of 0.0, the column is entirely unpopulated — every record is blank, not null. This is a completely degenerate column with zero informational content.

Treatment: Drop entirely — zero variance, all values are empty strings with no predictive or descriptive value.
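
The drop recommendation repeated across the empty columns can be applied in one pass by detecting every column whose value never changes. A sketch under the assumption that rows are dictionaries sharing the same keys; both helper names are hypothetical.

```python
def constant_columns(records):
    """Return the names of columns whose value is identical in every row."""
    if not records:
        return []
    first = records[0]
    return [
        col
        for col in first
        if all(row.get(col) == first[col] for row in records)
    ]


def drop_columns(records, to_drop):
    """Return copies of the rows without the named columns."""
    drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in records]


# Hypothetical rows, not taken from the profiled file.
rows = [
    {"name": "", "subregion": "", "source": "NE", "longitude": 10.0, "latitude": 50.0},
    {"name": "", "subregion": "", "source": "NE", "longitude": -70.0, "latitude": -30.0},
]
constants = constant_columns(rows)
slim = drop_columns(rows, constants)
print(constants, slim)
```

Note that this also catches a constant but meaningful column such as source, so it may be worth recording that value in dataset-level metadata before dropping it.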

anthropic:default · confidence high
Out[29]:

saturn.columns["subregion"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          "" (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 12.
Top values for subregion.
Top values for subregion (1 unique shown, of 1 total).
value               count   share
"" (empty string)   7124    100.0%

longitude numeric feature

This column contains geographic longitude coordinates spanning essentially the full valid range, from approximately -179.97° to 179.99°, indicating near-global coverage. All 7,124 values are unique and non-null, consistent with distinct point coordinates (the source column identifies them as Natural Earth label points, not GPS fixes). The IQR of 191.8° is wide (Q1 at -72.3°, Q3 at 119.5°), confirming that records are spread across both the Western and Eastern hemispheres rather than concentrated in a single region. The mild negative skew (-0.27) and platykurtic shape (kurtosis -1.13) indicate a fairly flat, broad spread of locations around the globe.

Treatment: Pair with latitude for spatial analysis; consider geohash or H3 encoding for ML features, or use directly in distance calculations.
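
For the distance calculations mentioned in the treatment, raw longitude differences are misleading because longitude wraps at ±180°. A common approach is the haversine great-circle formula, sketched here on a spherical Earth with a mean radius of 6371.0088 km. Both the formula choice and the radius are assumptions of this example, not something the report prescribes.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator on either side of the antimeridian.
across_dateline = haversine_km(0.0, 179.5, 0.0, -179.5)
print(round(across_dateline, 1))
```

The two points are one degree apart on the ground even though their longitudes differ by 359, which is the case a naive subtraction gets wrong.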

anthropic:default · confidence high
Out[32]:

saturn.columns["longitude"].stats

stat           value
n              7,124
nulls          0 (0.0%)
unique         7,124
min            -180
max            180
mean           21.9
median         23.48
std            97.72
q1             -72.33
q3             119.5
iqr            191.8
skew           -0.267
kurtosis       -1.131
n_outliers     0
outlier_rate   0
zero_rate      0
Fig 13.
Distribution of longitude. Vertical dash marks the median.
Histogram bins for longitude (median: 23.477727818184434): identical to the data table under Fig 2.

latitude numeric feature

This column contains geographic latitude values; all 7,124 rows are unique and non-null, consistent with precise coordinate data. The range spans -83.05 to 83.24 degrees, covering nearly the full global latitude range, and the mean of 22.92 and median of 25.20 point to a modest concentration in the Northern Hemisphere tropics and subtropics. The IQR of 51.96 (Q1 ≈ 1.15, Q3 ≈ 53.11) confirms a wide global spread, and the negative skew (-0.60) indicates a tail toward southern latitudes. Only 35 outliers (0.49%) were flagged; given that southern tail, they are most likely far-southern points rather than locations near both poles.

Treatment: Use as-is or pair with longitude for geospatial modelling; consider binning into latitude bands or projecting to spatial features.
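
The 35 flagged latitude outliers can be located from the quartiles alone if one assumes the flag uses the common Tukey rule (values beyond 1.5 × IQR from the quartiles). The report does not say which rule saturn applies, so treat this as a sketch; `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles taken from the latitude stats in this report.
lower, upper = tukey_fences(1.149, 53.11)
print(round(lower, 2), round(upper, 2))  # about -76.79 and 131.05
```

Under that assumption the upper fence sits above 90°, beyond any valid latitude, so every flagged point would lie south of roughly -76.8°. That is consistent with the histogram, where the southernmost bin alone holds 19 points.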

anthropic:default · confidence high
Out[35]:

saturn.columns["latitude"].stats

stat           value
n              7,124
nulls          0 (0.0%)
unique         7,124
min            -83.05
max            83.24
mean           22.92
median         25.2
std            34.23
q1             1.149
q3             53.11
iqr            51.96
skew           -0.6007
kurtosis       0.1113
n_outliers     35
outlier_rate   0.004913
zero_rate      0
Fig 14.
Distribution of latitude. Vertical dash marks the median.
Histogram bins for latitude (median: 25.195900167159152): identical to the data table under Fig 1.

source categorical metadata

This column records the data source attribution for every row, and all 7,124 records carry the identical value 'Natural Earth 1:10m Admin 0 Label Points'. With cardinality of 1, entropy of 0.0, and a top_rate of 1.0, the column carries zero information variance — it is purely a provenance/metadata tag indicating the dataset was sourced entirely from a single Natural Earth layer.

Treatment: Drop before modelling; constant column adds no predictive signal and wastes memory.

anthropic:default · confidence high
Out[38]:

saturn.columns["source"].stats

stat               value
n                  7,124
nulls              0 (0.0%)
unique             1
top_value          Natural Earth 1:10m Admin 0 Label Points
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: imbalance   top value is 100.0% of rows
Fig 15.
Top values for source.
Top values for source (1 unique shown, of 1 total).
value                                       count   share
Natural Earth 1:10m Admin 0 Label Points    7124    100.0%

How to cite

BibTeX
@misc{saturn-data-trove-country-centroids-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove country centroids},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-country-centroids}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove country centroids. Source: /home/coolhand/html/datavis/data_trove/geographic/country_centroids.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-country-centroids