data trove wine varieties regions

saturn notebook · generated 2026-06-21 Report Notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json

Saturn profiled 62 rows across 2 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

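# Install the profiler (IPython shell escape; valid in notebook cells only).
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source file.
# check=True raises CalledProcessError if the command exits non-zero.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json",
    "--findings", "data-trove-wine-varieties-regions.json",
    "--llm", "anthropic:default",
], check=True)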

Summary confidence: medium

This dataset lists wine production (or a related wine metric) aggregated by country: 62 rows, one per country label (including one "Unknown" entry), each with an associated count. The count distribution is extremely right-skewed: the median is just 2, yet the mean is nearly 19 and the maximum reaches 476, with 10 flagged outliers (16.1% of rows). A small handful of countries therefore account for most of the total. France is reported as top_value for name, but every name occurs exactly once, so that stat reflects a tie rather than the highest count; rank the rows by count to see which countries drive the bulk of the totals.

citing: mean · median · max · n_outliers · outlier_rate · skew · kurtosis · top_value · row_count

Out[4]:

saturn.schema() · 2 columns

column	kind	n	null%	unique	alerts
name	categorical	62	0.0%	62	long_tail
count	numeric	62	0.0%	21	high_skew outliers

Fig 1.

count · Look for the extreme right tail: a tiny number of countries account for vastly disproportionate counts while most cluster near the minimum of 1.

Histogram bins for count (median: 2.0).
bin	count
1 – 68.86	59
68.86 – 136.7	2
136.7 – 204.6	0
204.6 – 272.4	0
272.4 – 340.3	0
340.3 – 408.1	0
408.1 – 476	1
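
The bin edges in this table are consistent with seven equal-width bins spanning the reported min (1) and max (476). The sketch below reconstructs them under that assumption; the bin count of 7 is inferred from the table, not from any documented Saturn setting, and the per-bin counts are copied from the table rather than recomputed from raw rows.

```python
# Reconstruct equal-width histogram bin edges from the reported min and max.
# Assumption: 7 equal-width bins over [min, max], inferred from the table.
lo, hi, n_bins = 1.0, 476.0, 7
width = (hi - lo) / n_bins
edges = [lo + i * width for i in range(n_bins + 1)]

# Per-bin row counts, copied from the report table (not recomputed).
bin_counts = [59, 2, 0, 0, 0, 0, 1]

for left, right, rows in zip(edges, edges[1:], bin_counts):
    print(f"{left:.4g} – {right:.4g}\t{rows}")

# Share of rows that fall in the first bin.
first_bin_share = bin_counts[0] / sum(bin_counts)
print(f"first-bin share: {first_bin_share:.1%}")
```

With a bin width of roughly 67.86, 59 of the 62 rows land in the first bin, which is why a linear-scale histogram shows almost no detail below 68.86.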

Fig 2.

name · Frequency of each name value. Every country appears exactly once, so this ranking is a 62-way tie; it does not show which countries have the largest count values.

Top values for name (20 unique shown, of 62 total).
value	count	share
France	1	1.6%
United Kingdom	1	1.6%
Germany	1	1.6%
Spain	1	1.6%
Belgium	1	1.6%
Italy	1	1.6%
United States	1	1.6%
Switzerland	1	1.6%
Australia	1	1.6%
Bolivia	1	1.6%
Austria	1	1.6%
Croatia	1	1.6%
Canada	1	1.6%
Portugal	1	1.6%
Poland	1	1.6%
Netherlands	1	1.6%
Ireland	1	1.6%
Romania	1	1.6%
Argentina	1	1.6%
Unknown	1	1.6%

Fig 3.

count · Shows what share of the total count the top outlier countries hold versus the long tail of smaller producers.

Histogram bins for count (median: 2.0).
bin	count
1 – 68.86	59
68.86 – 136.7	2
136.7 – 204.6	0
204.6 – 272.4	0
272.4 – 340.3	0
340.3 – 408.1	0
408.1 – 476	1
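
The histogram alone does not give shares of the total, but the summary stats reported in this notebook allow a rough estimate. The sketch below assumes the total equals mean × n and uses the rounded mean of 18.94, so the result is approximate; it covers only the single largest row, because the other outlier values are not listed in the report.

```python
# Estimate the share of the total count held by the largest row,
# using only summary stats from the report (no raw rows).
n_rows = 62
mean_count = 18.94   # rounded in the report, so the total is approximate
max_count = 476

total_est = mean_count * n_rows          # sum = mean * n
top_share = max_count / total_est

print(f"estimated total: {total_est:.0f}")
print(f"share held by the largest row: {top_share:.1%}")
```

Under these assumptions a single row accounts for roughly two fifths of the total, which matches the isolated bar in the last histogram bin.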

Fig 4.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
name	categorical	0.0%
count	numeric	0.0%

name categorical label

This column contains country names, with 62 unique values across 62 rows and a null rate of 0.0, so every row holds a distinct value. The entropy ratio of ~1.0 and top_rate of 0.016 (1/62) confirm that each country appears exactly once, making the column effectively a unique identifier rather than a grouping variable. The 'long_tail' alert is technically triggered but is a non-issue here, since all values are equally frequent.

Treatment: Use as a row label or join key; not useful as a categorical feature without additional rows per country.

anthropic:default · confidence high

Out[10]:

saturn.columns["name"].stats

stat	value
n	62
nulls	0 (0.0%)
unique	62
top_value	France
top_rate	0.01613
cardinality	62
entropy	5.954
entropy_ratio	1
alert: long_tail	62 singleton categories
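
The entropy figures above can be checked by hand for a column in which every value occurs once. The sketch below assumes Shannon entropy in bits (base 2) and an entropy ratio defined as entropy divided by log2(cardinality); those definitions reproduce the reported numbers, but Saturn's exact formulas are not stated in this report.

```python
import math

# 62 categories, each appearing exactly once: a uniform distribution.
n_categories = 62
probs = [1 / n_categories] * n_categories

# Shannon entropy in bits (assumed base 2, which matches the reported 5.954).
entropy_bits = -sum(p * math.log2(p) for p in probs)

# Assumed definition: entropy normalised by its maximum, log2(cardinality).
entropy_ratio = entropy_bits / math.log2(n_categories)

top_rate = max(probs)

print(f"entropy: {entropy_bits:.3f}")
print(f"entropy_ratio: {entropy_ratio:.3f}")
print(f"top_rate: {top_rate:.5f}")
```

A ratio of 1 is the upper bound, reached only when all categories are equally frequent, which is the case for an identifier-like column.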

Fig 5.

Top values for name.

Top values for name (20 unique shown, of 62 total).
value	count	share
France	1	1.6%
United Kingdom	1	1.6%
Germany	1	1.6%
Spain	1	1.6%
Belgium	1	1.6%
Italy	1	1.6%
United States	1	1.6%
Switzerland	1	1.6%
Australia	1	1.6%
Bolivia	1	1.6%
Austria	1	1.6%
Croatia	1	1.6%
Canada	1	1.6%
Portugal	1	1.6%
Poland	1	1.6%
Netherlands	1	1.6%
Ireland	1	1.6%
Romania	1	1.6%
Argentina	1	1.6%
Unknown	1	1.6%

count numeric feature

This column appears to be a frequency or occurrence count, likely representing how many times an event or item appears in some grouping. The distribution is extremely right-skewed (skew=6.26, kurtosis=41.80): the median is just 2.0 while the mean is 18.94 and the maximum reaches 476.0, indicating that a handful of dominant entries far outweigh the rest. The column has only 21 unique values across 62 rows, and 10 rows (16.1%) are flagged as outliers; these high-count rows pull the mean far above the median. With Q1=1.0 and Q3=8.25 (IQR 7.25), 75% of values sit at or below 8.25, making the max of 476 a stark anomaly.

Treatment: Log-transform (log1p) before modelling to reduce extreme skew; investigate the 10 outliers, particularly the max of 476, for data integrity.

anthropic:default · confidence high
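
To illustrate what the suggested log1p transform does to this column, the sketch below applies it to the summary values reported for count. It uses only Python's standard `math.log1p` on the reported stats, not on the raw rows, so it shows the change in scale rather than the transformed distribution itself.

```python
import math

# Summary values for count, taken from the report.
summary = {"min": 1, "median": 2, "q3": 8.25, "max": 476}

# log1p(x) = ln(1 + x); defined at 0, so it is safe for count data.
transformed = {label: math.log1p(value) for label, value in summary.items()}

for label, value in summary.items():
    print(f"{label}: {value} -> {transformed[label]:.3f}")

# Compare how far the max sits from the median before and after.
raw_ratio = summary["max"] / summary["median"]
log_ratio = transformed["max"] / transformed["median"]
print(f"max/median raw: {raw_ratio:.0f}, after log1p: {log_ratio:.1f}")
```

The transform compresses the gap between the maximum and the median from a factor in the hundreds to a single-digit factor, which is the main reason it is a common choice for heavily skewed counts.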

Out[13]:

saturn.columns["count"].stats

stat	value
n	62
nulls	0 (0.0%)
unique	21
min	1
max	476
mean	18.94
median	2
std	63.53
q1	1
q3	8.25
iqr	7.25
skew	6.255
kurtosis	41.8
n_outliers	10
outlier_rate	0.1613
zero_rate	0
alert: high_skew	skew=+6.26
alert: outliers	16.1% rows beyond 1.5 IQR
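
The outlier alert refers to a 1.5 × IQR rule. A minimal sketch of the fences, assuming the standard Tukey definition (Q1 − 1.5·IQR and Q3 + 1.5·IQR); the report does not state exactly how Saturn computes its fences, so treat the thresholds as illustrative.

```python
# Tukey fences from the reported quartiles (assumed standard 1.5 * IQR rule).
q1, q3 = 1.0, 8.25
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

reported_min = 1
print(f"IQR: {iqr}")
print(f"lower fence: {lower_fence}, upper fence: {upper_fence}")

# The lower fence is below the minimum, so no row can be a low outlier.
low_outliers_possible = reported_min < lower_fence
print(f"low outliers possible: {low_outliers_possible}")
```

Under this assumption every one of the 10 flagged rows is a high outlier with a count above 19.125. Since the histogram places only three rows above 68.86, at least seven of the flagged rows would lie between the upper fence and 68.86, so most "outliers" here are moderately large values rather than extreme ones.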

Fig 6.

Distribution of count. Vertical dash marks the median.

Histogram bins for count (median: 2.0).
bin	count
1 – 68.86	59
68.86 – 136.7	2
136.7 – 204.6	0
204.6 – 272.4	0
272.4 – 340.3	0
340.3 – 408.1	0
408.1 – 476	1

How to cite

BibTeX

@misc{saturn-data-trove-wine-varieties-regions-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove wine varieties regions},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-wine-varieties-regions}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}

APA

Steuber, L. (2026). Saturn reading: data trove wine varieties regions. Source: /home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-wine-varieties-regions