data trove wine varieties regions

source /home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json · 62 rows · 2 columns · profiled 2026-06-21

Reading

dataset summary · medium confidence anthropic:default

This dataset lists wine production (or a related wine metric; the file does not say which) aggregated by country, covering 62 rows, each a country name with an associated count. The count distribution is extremely right-skewed: the median is just 2, yet the mean is nearly 19 and the maximum reaches 476, with 10 flagged outliers, so a small number of countries account for most of the total. France is reported as the top value for name, but every name appears exactly once, so that label alone does not establish that France holds the maximum count. Examine the high-count outliers directly to see which countries drive the bulk of the totals.

citing: mean · median · max · n_outliers · outlier_rate · skew · kurtosis · top_value · row_count

Charts the summary said to look at first

count · Look for the extreme right tail: a tiny number of countries account for vastly disproportionate counts while most cluster near the minimum of 1.


Histogram bins for count (median: 2.0).
bin	count
1 – 68.86	59
68.86 – 136.7	2
136.7 – 204.6	0
204.6 – 272.4	0
272.4 – 340.3	0
340.3 – 408.1	0
408.1 – 476	1
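The bin edges in this table are consistent with seven equal-width bins spanning the observed range (min 1, max 476). A minimal sketch that reproduces them, assuming equal-width binning; the profiler's actual binning code is not shown in this report, and `equal_width_edges` is a hypothetical helper:

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Range and bin count taken from the table above (assumed equal-width).
edges = equal_width_edges(1, 476, 7)

# Row counts per bin, copied from the table above.
bin_counts = [59, 2, 0, 0, 0, 0, 1]

# Fraction of all rows that land in the first bin.
first_bin_share = bin_counts[0] / sum(bin_counts)
```

Under these numbers, 59 of the 62 rows (about 95%) fall in the first bin, which is why the histogram looks like a single bar with a distant straggler.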

name · Ranked by their count values, this reveals which countries dominate and how steeply the distribution falls off.


Top values for name (20 unique shown, of 62 total).
value	count	share
France	1	1.6%
United Kingdom	1	1.6%
Germany	1	1.6%
Spain	1	1.6%
Belgium	1	1.6%
Italy	1	1.6%
United States	1	1.6%
Switzerland	1	1.6%
Australia	1	1.6%
Bolivia	1	1.6%
Austria	1	1.6%
Croatia	1	1.6%
Canada	1	1.6%
Portugal	1	1.6%
Poland	1	1.6%
Netherlands	1	1.6%
Ireland	1	1.6%
Romania	1	1.6%
Argentina	1	1.6%
Unknown	1	1.6%
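Because every name occurs exactly once, the frequency table above cannot show which countries dominate; that requires ranking rows by the count column. A minimal sketch, assuming the JSON file is a list of objects with `name` and `count` keys (the file layout is not shown in this report, and `rank_by_count` is a hypothetical helper):

```python
def rank_by_count(records):
    """Sort records by count, descending, and attach each row's share of the total.

    Assumes each record is a dict with "name" and "count" keys.
    """
    total = sum(r["count"] for r in records)
    ranked = sorted(records, key=lambda r: r["count"], reverse=True)
    return [
        {"name": r["name"], "count": r["count"], "share": r["count"] / total}
        for r in ranked
    ]


# Possible usage, if the file really is a JSON array of such objects:
#   import json
#   with open("wine_by_country.json") as f:
#       ranked = rank_by_count(json.load(f))
```

Since the profile reports a minimum count of 1 and no zeros, the total cannot be zero for this dataset, so the sketch skips a division guard.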

count · Shows what share of the total count the top outlier countries hold versus the long tail of smaller producers.


Histogram bins for count (median: 2.0).
bin	count
1 – 68.86	59
68.86 – 136.7	2
136.7 – 204.6	0
204.6 – 272.4	0
272.4 – 340.3	0
340.3 – 408.1	0
408.1 – 476	1
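The profile does not list per-country counts, but the summary statistics allow a rough version of this comparison. A worked sketch, assuming the rounded mean (18.94) is precise enough to recover the integer total:

```python
# Values copied from the count column's summary statistics.
n = 62
mean = 18.94
max_count = 476

# Total count across all rows, recovered from mean * n and rounded
# because the underlying counts are assumed to be integers.
total_estimate = round(mean * n)

# Share of the total held by the single largest row.
max_share = max_count / total_estimate
```

On these figures the total is about 1,174, and the single largest row alone holds roughly 40% of it.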

Schema

2 columns

Per-column summary.
column	type	nulls	unique	alerts
name	categorical	0.0%	62	long_tail
count	numeric	0.0%	21	high_skew outliers

name

categorical label long_tail

This column contains country names, with 62 unique values across 62 rows and no nulls, so every row holds a distinct name (one of them, 'Unknown', is a placeholder rather than a country). The entropy ratio of ~1.0 and top_rate of 0.016 (1/62) confirm a uniform distribution: each name appears exactly once, making this effectively a unique identifier rather than a grouping variable. The 'long_tail' alert is technically triggered but is a non-issue here since all values are equally frequent. Treatment: Use as a row label or join key; not useful as a categorical feature without additional rows per country. high · anthropic:default

n: 62
nulls: 0 (0.0%)
unique: 62
top_value: France
top_rate: 0.01613
cardinality: 62
entropy: 5.954
entropy_ratio: 1
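The reported entropy and top_rate follow directly from every name occurring once. A minimal sketch, assuming the profiler uses base-2 Shannon entropy and defines entropy_ratio as entropy divided by log2(cardinality); both assumptions match the reported values but are not stated in this report:

```python
import math


def shannon_entropy_bits(frequencies):
    """Shannon entropy in bits for a list of value frequencies."""
    total = sum(frequencies)
    return -sum((f / total) * math.log2(f / total) for f in frequencies if f > 0)


# 62 distinct names, each appearing exactly once.
freqs = [1] * 62

entropy = shannon_entropy_bits(freqs)
max_entropy = math.log2(len(freqs))    # entropy of a uniform distribution
entropy_ratio = entropy / max_entropy  # close to 1 when all values are equally frequent
top_rate = max(freqs) / sum(freqs)     # share of rows held by the most frequent value
```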

count

numeric feature high_skew outliers

This column appears to be a frequency or occurrence count, likely representing how many times an event or item appears in some grouping. The distribution is extremely right-skewed (skew=6.26, kurtosis=41.80): the median is just 2.0 while the mean is 18.94 and the maximum reaches 476.0, indicating that a handful of dominant entries dwarf the rest. With only 21 unique values across 62 rows and 10 outliers (16.1% of records), a small number of high-count observations pull the mean far above the median. The Q3 of 8.25 (Q1=1.0, IQR=7.25) means about 75% of values sit at or below 8, making the max of 476 a stark extreme. Treatment: Log-transform (log1p) before modelling to reduce extreme skew; investigate the 10 outliers, particularly the max of 476, for data integrity. high · anthropic:default
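To illustrate how much the suggested log1p transform compresses the tail, the sketch below applies it to the reported summary values only. The per-row counts are not reproduced in this report, so this shows the effect on the scale, not the transformed column itself:

```python
import math

# Summary values copied from the count column's statistics.
summary = {"min": 1, "median": 2, "q3": 8.25, "max": 476}

# log1p(x) = ln(1 + x); applied to each reported value.
logged = {key: math.log1p(value) for key, value in summary.items()}

# How far the maximum sits from the median, before and after the transform.
raw_ratio = summary["max"] / summary["median"]
log_ratio = logged["max"] / logged["median"]
```

On the raw scale the maximum is 238 times the median; after log1p the same ratio drops below 6, which is the kind of compression the treatment note is after.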

n: 62
nulls: 0 (0.0%)
unique: 21
min: 1
max: 476
mean: 18.94
median: 2
std: 63.53
q1: 1
q3: 8.25
iqr: 7.25
skew: 6.255
kurtosis: 41.8
n_outliers: 10
outlier_rate: 0.1613
zero_rate: 0
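The report does not say which rule flags the 10 outliers. A common default is Tukey's 1.5 × IQR fences; the sketch below computes them from the reported quartiles under that assumption, which may not match the profiler's actual rule:

```python
# Quartiles copied from the statistics above.
q1, q3 = 1.0, 8.25
iqr = q3 - q1

# Tukey fences (assumed rule): values outside [lower, upper] count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Reported outlier count and row count, for the outlier rate.
n, n_outliers = 62, 10
outlier_rate = n_outliers / n
```

If this rule is the one in use, the lower fence is negative, so with a minimum of 1 every flagged row lies in the upper tail, above a count of 19.125.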