data trove wine varieties regions
Reading
This dataset lists wine production (or a related wine metric) aggregated by country: 62 rows, one per country label (including one 'Unknown' entry), each with an associated count. The count distribution is extremely skewed: the median is just 2, yet the mean is nearly 19 and the maximum reaches 476, with 10 flagged outliers. A small handful of countries therefore account for most of the total; the maximum alone is about 40% of the implied sum (18.94 × 62 ≈ 1,174). France is reported as the top value, but because every name appears exactly once (top_rate 0.016), that reflects listing order rather than frequency. France's count should be checked alongside the other high-count outliers to see which countries drive the bulk of the totals.
citing: mean · median · max · n_outliers · outlier_rate · skew · kurtosis · top_value · row_count
Charts the summary said to look at first
Histogram of count (7 equal-width bins):
| bin | count |
|---|---|
| 1 – 68.86 | 59 |
| 68.86 – 136.7 | 2 |
| 136.7 – 204.6 | 0 |
| 204.6 – 272.4 | 0 |
| 272.4 – 340.3 | 0 |
| 340.3 – 408.1 | 0 |
| 408.1 – 476 | 1 |
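
The bin edges in the table above are consistent with seven equal-width bins spanning the reported min (1) and max (476). The following is a minimal sketch under that assumption; the profiler's actual binning rule is not stated in the report, and the helper name `equal_width_edges` is hypothetical.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# min and max are taken from the count column's reported statistics;
# the bin count of 7 is read off the table (assumed to be the tool's choice).
edges = equal_width_edges(1, 476, 7)

# Round to 4 significant digits, which matches how the table displays edges.
rounded = [float(f"{e:.4g}") for e in edges]
print(rounded)
```

With 59 of 62 rows landing in the first bin, equal-width bins hide almost all of the structure below 69; a log-scaled axis would likely be more informative for this column.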

Value frequencies for name (20 of 62 values shown):
| value | count | share |
|---|---|---|
| France | 1 | 1.6% |
| United Kingdom | 1 | 1.6% |
| Germany | 1 | 1.6% |
| Spain | 1 | 1.6% |
| Belgium | 1 | 1.6% |
| Italy | 1 | 1.6% |
| United States | 1 | 1.6% |
| Switzerland | 1 | 1.6% |
| Australia | 1 | 1.6% |
| Bolivia | 1 | 1.6% |
| Austria | 1 | 1.6% |
| Croatia | 1 | 1.6% |
| Canada | 1 | 1.6% |
| Portugal | 1 | 1.6% |
| Poland | 1 | 1.6% |
| Netherlands | 1 | 1.6% |
| Ireland | 1 | 1.6% |
| Romania | 1 | 1.6% |
| Argentina | 1 | 1.6% |
| Unknown | 1 | 1.6% |
Schema
2 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| name | categorical | 0.0% | 62 | long_tail |
| count | numeric | 0.0% | 21 | high_skew, outliers |
name
categorical · label · long_tail
This column contains country names (plus one 'Unknown' entry): 62 unique values across 62 rows with no nulls, so every row holds a distinct value. The entropy ratio of 1.0 and top_rate of 0.016 (1/62) confirm that each value appears exactly once, which makes the column effectively a unique identifier rather than a grouping variable. The long_tail alert is triggered but is a non-issue here, since all values are equally frequent. Treatment: use as a row label or join key; it is not useful as a categorical feature without additional rows per country.
- n: 62
- nulls: 0 (0.0%)
- unique: 62
- top_value: France
- top_rate: 0.01613
- cardinality: 62
- entropy: 5.954
- entropy_ratio: 1
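
The entropy figures above can be reproduced from the frequencies alone. The sketch below assumes the profiler reports Shannon entropy in bits (base 2) and defines entropy_ratio as entropy divided by log2 of the cardinality; both are assumptions, although they agree with the reported 5.954 and 1. The function name is hypothetical.

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a list of category frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# 62 distinct names, each appearing exactly once (from the report).
freqs = [1] * 62

entropy = shannon_entropy_bits(freqs)
max_entropy = math.log2(len(freqs))      # upper bound for 62 categories
entropy_ratio = entropy / max_entropy    # assumed definition of the ratio
top_rate = max(freqs) / sum(freqs)       # share of the most frequent value

print(round(entropy, 3), round(entropy_ratio, 3), round(top_rate, 5))
```

A ratio of 1 means the distribution is uniform, which is why the long_tail alert carries no information for this column.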
count
numeric · feature · high_skew · outliers
This column appears to be a frequency count, most likely the number of records attributed to each country. The distribution is extremely right-skewed (skew = 6.26, kurtosis = 41.80): the median is 2 while the mean is 18.94 and the maximum is 476, so a handful of high-count entries dominate the total. There are only 21 unique values across 62 rows, and 10 rows (16.1%) are flagged as outliers. With Q1 = 1 and Q3 = 8.25 (IQR = 7.25), 75% of values are at or below 8, which makes the maximum of 476 a stark anomaly. Treatment: apply a log transform (log1p) before modelling to reduce the skew, and check the 10 outliers, particularly the maximum of 476, for data integrity.
- n: 62
- nulls: 0 (0.0%)
- unique: 21
- min: 1
- max: 476
- mean: 18.94
- median: 2
- std: 63.53
- q1: 1
- q3: 8.25
- iqr: 7.25
- skew: 6.255
- kurtosis: 41.8
- n_outliers: 10
- outlier_rate: 0.1613
- zero_rate: 0
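
To make the outlier flag and the suggested log1p treatment concrete, here is a minimal sketch built from the quartiles reported above. It assumes the common Tukey rule (flag values beyond 1.5 × IQR from the quartiles); the report does not say which rule the profiler uses, so the fences are illustrative only. The function name `tukey_fences` is hypothetical.

```python
import math

# Quartiles as reported for the count column.
Q1, Q3 = 1.0, 8.25


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(Q1, Q3)
print(f"lower fence = {lower}, upper fence = {upper}")

# log1p(x) = ln(1 + x) compresses the long right tail while staying
# well-defined at small counts; shown for a few reported statistics.
for label, value in [("min", 1), ("median", 2), ("q3", 8.25), ("max", 476)]:
    print(label, value, round(math.log1p(value), 3))
```

Under this assumed rule the lower fence is negative, so with a minimum of 1 only high counts can be flagged. After log1p the maximum sits a little above 6 rather than at 476, which brings it much closer to the rest of the values.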