saturn·

data trove wine varieties regions

source: /home/coolhand/html/datavis/data_trove/data/quirky/wine_by_country.json · 62 rows · 2 columns · profiled 2026-06-21

Reading

dataset summary · medium confidence anthropic:default

This dataset lists a per-country wine count (production or a related wine metric), covering 62 countries with one count each. The count distribution is extremely right-skewed: the median is just 2, yet the mean is nearly 19 and the maximum reaches 476, with 10 flagged outliers (about 16% of rows). This suggests that a small handful of countries account for most of the total. France is reported as top_value for the name column; because every country appears exactly once, that alone does not establish that France has the highest count, so it is worth examining alongside the high-count outliers to understand which countries drive the bulk of the totals.

citing: mean · median · max · n_outliers · outlier_rate · skew · kurtosis · top_value · row_count
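As a rough check on how concentrated the counts are, the cited statistics can be combined into an approximate total and the share held by the single largest value. This is a back-of-the-envelope sketch that uses only the rounded figures reported here (mean 18.94, 62 rows, max 476), so the results are approximate; the variable names are illustrative.

```python
# Rounded summary statistics taken from the report above.
n_rows = 62
mean_count = 18.94   # reported mean, already rounded
max_count = 476      # reported maximum

# The mean times the row count recovers the total, up to rounding error.
approx_total = mean_count * n_rows

# Share of that approximate total held by the single largest country.
max_share = max_count / approx_total

print(f"approximate total count: {approx_total:.0f}")
print(f"share held by the maximum: {max_share:.1%}")
```

Under these rounded inputs, the largest single value accounts for roughly 40% of the total, which is consistent with the summary's reading that a few countries dominate.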

Schema

2 columns
Per-column summary.
column   type          nulls   unique   alerts
name     categorical   0.0%    62       long_tail
count    numeric       0.0%    21       high_skew, outliers

name

categorical label long_tail
This column contains country names, with 62 unique values across 62 rows and a null rate of 0.0, so every row holds a distinct country. The entropy ratio of ~1.0 and top_rate of 0.016 (1/62) confirm that each country appears exactly once, making the column effectively a unique identifier rather than a grouping variable. The 'long_tail' alert is technically triggered but is a non-issue here, since all values are equally frequent. Treatment: use as a row label or join key; it is not useful as a categorical feature without additional rows per country.
high · anthropic:default
n               62
nulls           0 (0.0%)
unique          62
top_value       France
top_rate        0.01613
cardinality     62
entropy         5.954
entropy_ratio   1
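The entropy figures above are what one would expect for 62 equally frequent labels. The sketch below assumes the profiler uses base-2 Shannon entropy and defines entropy_ratio as entropy divided by log2(cardinality); the report does not state its formulas, although log2(62) ≈ 5.954 agrees with the reported value. The helper `entropy_stats` and the placeholder labels are hypothetical and stand in for the real country names.

```python
import math
from collections import Counter


def entropy_stats(values):
    """Return (entropy in bits, entropy ratio, top rate) for a list of labels."""
    counts = Counter(values)
    n = len(values)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Maximum possible entropy for this many distinct labels (uniform case).
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 0.0
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    top_rate = max(counts.values()) / n
    return entropy, ratio, top_rate


# Placeholder labels: 62 distinct values, one row each (not the real data).
labels = [f"country_{i}" for i in range(62)]
entropy, ratio, top_rate = entropy_stats(labels)

print(f"entropy={entropy:.3f} bits, ratio={ratio:.3f}, top_rate={top_rate:.5f}")
```

Because every label occurs once, the distribution is uniform, the entropy reaches its maximum, and the ratio is 1 up to floating-point error.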

count

numeric feature high_skew outliers
This column appears to be a frequency or occurrence count, likely representing how many times an event or item appears in some grouping. The distribution is extremely right-skewed (skew=6.26, kurtosis=41.80): the median is just 2.0 while the mean is 18.94 and the maximum reaches 476.0, indicating that a handful of dominant entries far outweigh the rest. With only 21 unique values across 62 rows and 10 outliers (16.1% of records), a small number of high-count observations pull the mean well above the median. The IQR of 7.25 (Q1=1.0, Q3=8.25) shows that 75% of values sit at or below 8.25, so the max of 476 lies far outside the bulk of the data. Treatment: log-transform (log1p) before modelling to reduce the extreme skew; investigate the 10 outliers, particularly the max of 476, for data integrity.
high · anthropic:default
n              62
nulls          0 (0.0%)
unique         21
min            1
max            476
mean           18.94
median         2
std            63.53
q1             1
q3             8.25
iqr            7.25
skew           6.255
kurtosis       41.8
n_outliers     10
outlier_rate   0.1613
zero_rate      0
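To make the outlier flag and the suggested transform concrete, the sketch below computes outlier fences from the reported quartiles and applies log1p to a few of the reported values. It assumes the common Tukey rule (1.5 × IQR beyond the quartiles); the report does not say which rule produced the 10 flagged outliers, so the fences are illustrative rather than a reconstruction of the profiler's logic.

```python
import math

# Quartiles as reported for the count column.
q1, q3 = 1.0, 8.25
iqr = q3 - q1

# Tukey fences (assumed rule, not confirmed by the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(f"fences: [{lower_fence}, {upper_fence}]")

# log1p compresses the right tail; shown here on reported summary values only.
reported = {"min": 1, "median": 2, "q3": 8.25, "max": 476}
logged = {name: math.log1p(value) for name, value in reported.items()}
for name, value in logged.items():
    print(f"log1p({name}) = {value:.3f}")

# Ratio of max to median before and after the transform.
raw_ratio = reported["max"] / reported["median"]
log_ratio = logged["max"] / logged["median"]
print(f"max/median: raw={raw_ratio:.0f}, log1p={log_ratio:.1f}")
```

Under the assumed rule, any count above 19.125 would be flagged. After log1p, the gap between the maximum and the median shrinks from a factor of 238 to a factor of about 5.6, which is the kind of compression the treatment note is after.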