
data trove wine varieties regions

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json

Saturn profiled 62 rows across 2 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json",
    "--findings", "data-trove-wine-varieties-regions.json",
    "--llm", "anthropic:default",
])

Summary confidence: medium

This dataset pairs each of 62 countries with a count, apparently wine production or a related wine metric aggregated by country. The count distribution is extremely right-skewed: the median is 2, the mean is 18.94, and the maximum reaches 476, with 10 flagged outliers (16.1% of rows). A small handful of countries therefore account for a disproportionate share of the total. The reported top_value, France, comes from the name column, where every country appears exactly once, so it does not by itself show that France has the highest count. Check France alongside the high-count outliers to see which countries drive the bulk of the totals.

citing: mean · median · max · n_outliers · outlier_rate · skew · kurtosis · top_value · row_count
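
The concentration described in the summary can be checked directly from the cited stats. This is a minimal sketch that uses only the rounded figures printed in this report (n, mean, max), so the total is approximate; the variable names are illustrative and are not part of saturn-dissect.

```python
# Reported summary statistics for the count column (rounded, as printed in this report).
n_rows = 62
mean_count = 18.94
max_count = 476

# The mean times the row count recovers the approximate column total.
approx_total = mean_count * n_rows

# Share of that total held by the single largest row.
max_share = max_count / approx_total

print(f"approximate total: {approx_total:.0f}")
print(f"share held by the max row: {max_share:.1%}")
```

Under these figures the single largest row holds roughly two fifths of the total, which is what drives the gap between the mean and the median.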

Out[4]:

saturn.schema() · 2 columns

column kind n null% unique alerts
name categorical 62 0.0% 62 long_tail
count numeric 62 0.0% 21 high_skew outliers
Fig 1.
count · Look for the extreme right tail: a tiny number of countries hold disproportionately large counts while most cluster near the minimum of 1.
Histogram bins for count (median: 2.0).
bin              count
1 – 68.86        59
68.86 – 136.7    2
136.7 – 204.6    0
204.6 – 272.4    0
272.4 – 340.3    0
340.3 – 408.1    0
408.1 – 476      1
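
The bin edges in the table above are consistent with seven equal-width bins spanning the reported min (1) and max (476). The sketch below assumes equal-width binning, which the source does not state explicitly, and reads the bin counts from the table as 59, 2, 0, 0, 0, 0, 1. `equal_width_edges` is a hypothetical helper, not part of saturn-dissect.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Reported min and max of the count column; 7 bins as shown in the table.
edges = equal_width_edges(1, 476, 7)

# Bin counts as read from the report's data table (assumed reading of the flattened table).
reported_counts = [59, 2, 0, 0, 0, 0, 1]

# Format edges to 4 significant digits, matching the report's display.
labels = [format(e, ".4g") for e in edges]

for left, right, count in zip(labels, labels[1:], reported_counts):
    print(f"{left} – {right}: {count}")

print("rows covered:", sum(reported_counts))
```

With this reading, 59 of the 62 rows fall in the first bin and the counts sum to the reported row count.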
Fig 2.
name · Top values by frequency. Every country appears exactly once, so the profile is flat; the table below does not rank countries by their count values.
Top values for name (20 unique shown, of 62 total).
value           count  share
France          1      1.6%
United Kingdom  1      1.6%
Germany         1      1.6%
Spain           1      1.6%
Belgium         1      1.6%
Italy           1      1.6%
United States   1      1.6%
Switzerland     1      1.6%
Australia       1      1.6%
Bolivia         1      1.6%
Austria         1      1.6%
Croatia         1      1.6%
Canada          1      1.6%
Portugal        1      1.6%
Poland          1      1.6%
Netherlands     1      1.6%
Ireland         1      1.6%
Romania         1      1.6%
Argentina       1      1.6%
Unknown         1      1.6%
Fig 3.
count · Shows what share of the total count the top outlier countries hold versus the long tail of smaller producers.
Histogram bins for count (median: 2.0).
bin              count
1 – 68.86        59
68.86 – 136.7    2
136.7 – 204.6    0
204.6 – 272.4    0
272.4 – 340.3    0
340.3 – 408.1    0
408.1 – 476      1
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.
column kind null %
name categorical 0.0%
count numeric 0.0%

name categorical label

This column contains country names: 62 unique values across 62 rows with a null rate of 0.0%, so every row holds a distinct country. The entropy ratio of 1.0 and top_rate of 0.016 (1/62) confirm a uniform distribution in which each country appears exactly once, making the column effectively a unique identifier rather than a grouping variable. The 'long_tail' alert fires because all 62 categories are singletons, but it is a non-issue here since all values are equally frequent.

Treatment: Use as a row label or join key; not useful as a categorical feature without additional rows per country.

anthropic:default · confidence high
Out[10]:

saturn.columns["name"].stats

stat value
n 62
nulls 0 (0.0%)
unique 62
top_value France
top_rate 0.01613
cardinality 62
entropy 5.954
entropy_ratio 1
alert: long_tail 62 singleton categories
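
The entropy figures above follow from the fact that all 62 names appear exactly once. This sketch assumes Shannon entropy in bits (base 2) and an entropy ratio defined as entropy divided by log2 of the number of categories. Both are assumptions about Saturn's definitions, though they reproduce the reported values.

```python
import math

# Reported stats for the name column.
n_rows = 62
n_unique = 62

# Every category appears once, so each has probability 1/62.
p = 1 / n_rows

# Shannon entropy in bits (assumed base; matches the reported 5.954).
entropy_bits = -sum(p * math.log2(p) for _ in range(n_unique))

# Maximum possible entropy for 62 categories, and the resulting ratio.
max_entropy_bits = math.log2(n_unique)
entropy_ratio = entropy_bits / max_entropy_bits

# Frequency of the most common value; with all singletons this is 1/62.
top_rate = 1 / n_rows

print(f"entropy: {entropy_bits:.3f} bits")
print(f"entropy_ratio: {entropy_ratio:.3f}")
print(f"top_rate: {top_rate:.5f}")
```

A ratio of 1 means the column is as spread out as 62 categories allow, which is why top_value (France) carries no ranking information.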
Fig 5.
Top values for name.
Top values for name (20 unique shown, of 62 total).
value           count  share
France          1      1.6%
United Kingdom  1      1.6%
Germany         1      1.6%
Spain           1      1.6%
Belgium         1      1.6%
Italy           1      1.6%
United States   1      1.6%
Switzerland     1      1.6%
Australia       1      1.6%
Bolivia         1      1.6%
Austria         1      1.6%
Croatia         1      1.6%
Canada          1      1.6%
Portugal        1      1.6%
Poland          1      1.6%
Netherlands     1      1.6%
Ireland         1      1.6%
Romania         1      1.6%
Argentina       1      1.6%
Unknown         1      1.6%

count numeric feature

This column appears to be a frequency or occurrence count, likely the number of times an item or event is recorded for each country. The distribution is extremely right-skewed (skew=6.26, kurtosis=41.80): the median is just 2.0, while the mean is 18.94 and the maximum reaches 476.0. With only 21 unique values across 62 rows and 10 outliers (16.1% of rows), a few high-count rows pull the mean far above the median. The IQR of 7.25 (Q1=1.0, Q3=8.25) shows that 75% of values sit at or below 8.25, so the maximum of 476 lies far outside the bulk of the data.

Treatment: Log-transform (log1p) before modelling to reduce extreme skew; investigate the 10 outliers, particularly the max of 476, for data integrity.

anthropic:default · confidence high
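
To illustrate why log1p is suggested in the treatment above, the sketch below applies it to the reported quantiles and measures how far the maximum sits from the median in IQR units, before and after. It works only from the summary stats in this report, not the raw rows. Because Q3 (8.25) is an interpolated quantile, its transformed value is an approximation of the true transformed quantile.

```python
import math

# Reported quantiles for the count column.
reported = {"q1": 1.0, "median": 2.0, "q3": 8.25, "max": 476.0}

# log1p(x) = ln(1 + x); it is monotonic, so it preserves the ordering of quantiles.
transformed = {name: math.log1p(value) for name, value in reported.items()}


def distance_in_iqrs(stats):
    """How many IQRs the maximum lies above the median."""
    iqr = stats["q3"] - stats["q1"]
    return (stats["max"] - stats["median"]) / iqr


raw_distance = distance_in_iqrs(reported)
log_distance = distance_in_iqrs(transformed)

print(f"raw scale:   max is {raw_distance:.1f} IQRs above the median")
print(f"log1p scale: max is {log_distance:.1f} IQRs above the median")
```

On the raw scale the maximum is dozens of IQRs from the median; after log1p it is only a few, which is the compression the treatment is after. The transform does not replace checking the outliers for data integrity.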
Out[13]:

saturn.columns["count"].stats

stat value
n 62
nulls 0 (0.0%)
unique 21
min 1
max 476
mean 18.94
median 2
std 63.53
q1 1
q3 8.25
iqr 7.25
skew 6.255
kurtosis 41.8
n_outliers 10
outlier_rate 0.1613
zero_rate 0
alert: high_skew skew=+6.26
alert: outliers 16.1% rows beyond 1.5 IQR
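
The outlier alert refers to rows beyond 1.5 IQR. A minimal sketch of the corresponding thresholds, assuming the standard Tukey fences (Q1 - 1.5·IQR and Q3 + 1.5·IQR); the report does not spell out the exact rule Saturn applies, so treat the fences as an assumption.

```python
# Reported quartiles and counts for the count column.
q1 = 1.0
q3 = 8.25
min_count = 1
n_rows = 62
n_outliers = 10

iqr = q3 - q1

# Tukey fences (assumed rule): values outside [lower_fence, upper_fence] are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The minimum is above the lower fence, so no row can be a low-side outlier.
low_side_possible = min_count < lower_fence

outlier_rate = n_outliers / n_rows

print(f"IQR: {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"low-side outliers possible: {low_side_possible}")
print(f"outlier rate: {outlier_rate:.4f}")
```

Under this rule every flagged outlier is a high-count row, above an upper fence of 19.125, and 10 of 62 rows reproduces the reported outlier_rate of 0.1613.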
Fig 6.
Distribution of count. Vertical dash marks the median.
Histogram bins for count (median: 2.0).
bin              count
1 – 68.86        59
68.86 – 136.7    2
136.7 – 204.6    0
204.6 – 272.4    0
272.4 – 340.3    0
340.3 – 408.1    0
408.1 – 476      1

How to cite


BibTeX
@misc{saturn-data-trove-wine-varieties-regions-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove wine varieties regions},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-wine-varieties-regions}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove wine varieties regions. Source: /home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-wine-varieties-regions