olympics olympic medals data
Reading
This dataset contains 1,433 rows of Olympic medal counts by country and year, spanning 1896 to 2024, with 165 distinct country names (159 distinct country codes). The medal columns (gold, silver, bronze, total) are heavily right-skewed, with high kurtosis and many outliers: a small number of dominant nations pull the means well above the medians (e.g., total has a median of 5 but a maximum of 234). Zero rates are notable too. Every row records at least one medal (`total` has no zeros), yet 33.9% of rows have zero gold medals and 25.3% zero silver, reflecting how often medal-winning countries leave a Games without a medal of a given colour. Country frequencies are fairly even at the top, with France and Great Britain tied as the most frequent entries (30 appearances each, matching the 30 distinct years). Start by examining the shapes of the `total` and `gold` distributions and the `year` coverage to understand era effects.
citing: row_count · column_count · columns.total.stats · columns.gold.stats · columns.silver.stats · columns.bronze.stats · columns.country.top_values · columns.year.stats
Charts the summary said to look at first
Histogram of `total`
| bin | count |
|---|---|
| 1 – 7.297 | 907 |
| 7.297 – 13.59 | 176 |
| 13.59 – 19.89 | 99 |
| 19.89 – 26.19 | 77 |
| 26.19 – 32.49 | 42 |
| 32.49 – 38.78 | 28 |
| 38.78 – 45.08 | 20 |
| 45.08 – 51.38 | 11 |
| 51.38 – 57.68 | 7 |
| 57.68 – 63.97 | 7 |
| 63.97 – 70.27 | 12 |
| 70.27 – 76.57 | 5 |
| 76.57 – 82.86 | 1 |
| 82.86 – 89.16 | 3 |
| 89.16 – 95.46 | 11 |
| 95.46 – 101.8 | 8 |
| 101.8 – 108.1 | 5 |
| 108.1 – 114.4 | 5 |
| 114.4 – 120.6 | 0 |
| 120.6 – 126.9 | 4 |
| 126.9 – 133.2 | 1 |
| 133.2 – 139.5 | 0 |
| 139.5 – 145.8 | 0 |
| 145.8 – 152.1 | 1 |
| 152.1 – 158.4 | 0 |
| 158.4 – 164.7 | 0 |
| 164.7 – 171 | 0 |
| 171 – 177.3 | 1 |
| 177.3 – 183.6 | 0 |
| 183.6 – 189.9 | 0 |
| 189.9 – 196.2 | 1 |
| 196.2 – 202.5 | 0 |
| 202.5 – 208.8 | 0 |
| 208.8 – 215.1 | 0 |
| 215.1 – 221.4 | 0 |
| 221.4 – 227.7 | 0 |
| 227.7 – 234 | 1 |
Histogram of `gold`
| bin | count |
|---|---|
| 0 – 2.243 | 948 |
| 2.243 – 4.486 | 175 |
| 4.486 – 6.73 | 76 |
| 6.73 – 8.973 | 61 |
| 8.973 – 11.22 | 52 |
| 11.22 – 13.46 | 25 |
| 13.46 – 15.7 | 11 |
| 15.7 – 17.95 | 13 |
| 17.95 – 20.19 | 12 |
| 20.19 – 22.43 | 3 |
| 22.43 – 24.68 | 4 |
| 24.68 – 26.92 | 3 |
| 26.92 – 29.16 | 6 |
| 29.16 – 31.41 | 2 |
| 31.41 – 33.65 | 5 |
| 33.65 – 35.89 | 2 |
| 35.89 – 38.14 | 11 |
| 38.14 – 40.38 | 6 |
| 40.38 – 42.62 | 1 |
| 42.62 – 44.86 | 3 |
| 44.86 – 47.11 | 5 |
| 47.11 – 49.35 | 3 |
| 49.35 – 51.59 | 1 |
| 51.59 – 53.84 | 0 |
| 53.84 – 56.08 | 2 |
| 56.08 – 58.32 | 0 |
| 58.32 – 60.57 | 0 |
| 60.57 – 62.81 | 0 |
| 62.81 – 65.05 | 0 |
| 65.05 – 67.3 | 0 |
| 67.3 – 69.54 | 0 |
| 69.54 – 71.78 | 0 |
| 71.78 – 74.03 | 0 |
| 74.03 – 76.27 | 0 |
| 76.27 – 78.51 | 1 |
| 78.51 – 80.76 | 1 |
| 80.76 – 83 | 1 |
Histogram of `year`
| bin | count |
|---|---|
| 1896 – 1899 | 10 |
| 1899 – 1903 | 18 |
| 1903 – 1906 | 12 |
| 1906 – 1910 | 19 |
| 1910 – 1913 | 19 |
| 1913 – 1917 | 0 |
| 1917 – 1920 | 22 |
| 1920 – 1924 | 0 |
| 1924 – 1927 | 27 |
| 1927 – 1931 | 33 |
| 1931 – 1934 | 28 |
| 1934 – 1938 | 32 |
| 1938 – 1941 | 0 |
| 1941 – 1944 | 0 |
| 1944 – 1948 | 0 |
| 1948 – 1951 | 38 |
| 1951 – 1955 | 43 |
| 1955 – 1958 | 38 |
| 1958 – 1962 | 44 |
| 1962 – 1965 | 41 |
| 1965 – 1969 | 44 |
| 1969 – 1972 | 48 |
| 1972 – 1976 | 0 |
| 1976 – 1979 | 41 |
| 1979 – 1982 | 36 |
| 1982 – 1986 | 47 |
| 1986 – 1989 | 52 |
| 1989 – 1993 | 64 |
| 1993 – 1996 | 79 |
| 1996 – 2000 | 0 |
| 2000 – 2003 | 80 |
| 2003 – 2007 | 74 |
| 2007 – 2010 | 87 |
| 2010 – 2014 | 86 |
| 2014 – 2017 | 86 |
| 2017 – 2021 | 93 |
| 2021 – 2024 | 92 |
Top 20 values of `country`
| value | count | share |
|---|---|---|
| FRA | 30 | 2.1% |
| GBR | 30 | 2.1% |
| USA | 29 | 2.0% |
| DEN | 29 | 2.0% |
| SUI | 29 | 2.0% |
| HUN | 28 | 2.0% |
| AUS | 28 | 2.0% |
| BEL | 28 | 2.0% |
| ITA | 28 | 2.0% |
| SWE | 28 | 2.0% |
| AUT | 27 | 1.9% |
| NED | 27 | 1.9% |
| CAN | 27 | 1.9% |
| NOR | 26 | 1.8% |
| FIN | 26 | 1.8% |
| JPN | 23 | 1.6% |
| NZL | 23 | 1.6% |
| POL | 23 | 1.6% |
| MEX | 22 | 1.5% |
| GRE | 21 | 1.5% |
Histogram of `rank_total`
| bin | count |
|---|---|
| 1 – 3.486 | 90 |
| 3.486 – 5.973 | 60 |
| 5.973 – 8.459 | 90 |
| 8.459 – 10.95 | 60 |
| 10.95 – 13.43 | 86 |
| 13.43 – 15.92 | 56 |
| 15.92 – 18.41 | 84 |
| 18.41 – 20.89 | 52 |
| 20.89 – 23.38 | 74 |
| 23.38 – 25.86 | 48 |
| 25.86 – 28.35 | 71 |
| 28.35 – 30.84 | 44 |
| 30.84 – 33.32 | 65 |
| 33.32 – 35.81 | 40 |
| 35.81 – 38.3 | 58 |
| 38.3 – 40.78 | 34 |
| 40.78 – 43.27 | 47 |
| 43.27 – 45.76 | 26 |
| 45.76 – 48.24 | 35 |
| 48.24 – 50.73 | 20 |
| 50.73 – 53.22 | 29 |
| 53.22 – 55.7 | 18 |
| 55.7 – 58.19 | 27 |
| 58.19 – 60.68 | 18 |
| 60.68 – 63.16 | 27 |
| 63.16 – 65.65 | 17 |
| 65.65 – 68.14 | 24 |
| 68.14 – 70.62 | 16 |
| 70.62 – 73.11 | 24 |
| 73.11 – 75.59 | 15 |
| 75.59 – 78.08 | 21 |
| 78.08 – 80.57 | 13 |
| 80.57 – 83.05 | 15 |
| 83.05 – 85.54 | 10 |
| 85.54 – 88.03 | 10 |
| 88.03 – 90.51 | 4 |
| 90.51 – 93 | 5 |
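Each histogram table above has 37 rows running from the column's minimum to its maximum, which suggests 37 equal-width bins (an inference; the report does not state its binning rule). Under that assumption, two visible patterns are binning artefacts rather than properties of the data. In the year table (bins from 1896 to 2024), the bin width of 128 / 37 ≈ 3.46 years is shorter than the four-year Games cycle, so some bins fall between Games and are empty; only the gaps at 1916, 1940 and 1944 correspond to cancelled Games. In the rank table (bins from 1 to 93), the width of 92 / 37 ≈ 2.49 means bins alternately hold three and two integer ranks, which explains the 90/60 alternation near the top (roughly 30 rows per rank, about one per Games). The sketch below checks both patterns with the standard library; the list of Games years is an assumption (every fourth year from 1896 to 2024, minus 1916, 1940 and 1944), chosen because it yields exactly the 30 distinct years in the profile.

```python
# Sketch: reproduce the equal-width binning the chart tables appear to use.
# Assumptions (not stated in the report): 37 equal-width bins spanning min..max,
# and the 30 distinct `year` values are the quadrennial Games years 1896-2024
# minus the cancelled 1916, 1940 and 1944 Games.

def equal_width_counts(values, n_bins):
    """Count values into n_bins equal-width bins spanning min(values)..max(values)."""
    values = list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum lands in the last bin instead of one past the end.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

N_BINS = 37

# Year table: count distinct Games years per bin.
cancelled = {1916, 1940, 1944}
games_years = [y for y in range(1896, 2025, 4) if y not in cancelled]
year_counts = equal_width_counts(games_years, N_BINS)
empty_year_bins = [i for i, c in enumerate(year_counts) if c == 0]

# Rank table: how many distinct integer ranks 1..93 fall into each bin.
ranks_per_bin = equal_width_counts(range(1, 94), N_BINS)

print(len(games_years))   # 30
print(empty_year_bins)    # [5, 7, 12, 13, 14, 22, 29]
print(ranks_per_bin[:6])  # [3, 2, 3, 2, 3, 2]
```

Bin index i is the (i + 1)-th row of the corresponding table, so the empty indices line up with the zero rows of the year table (1913 – 1917, 1920 – 1924, 1938 – 1948, 1972 – 1976 and 1996 – 2000).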
Schema
8 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| year | numeric | 0.0% | 30 | |
| country | categorical | 0.0% | 159 | |
| country_name | categorical | 0.0% | 165 | |
| gold | numeric | 0.0% | 52 | high_skew, outliers |
| silver | numeric | 0.0% | 45 | high_skew, outliers |
| bronze | numeric | 0.0% | 44 | high_skew, outliers |
| total | numeric | 0.0% | 97 | high_skew, outliers |
| rank_total | numeric | 0.0% | 93 | |
year
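numeric timestamp

Four-digit calendar years from 1896 to 2024, with 30 distinct values across 1,433 rows and no nulls. The distribution is left-skewed (skew -0.76): the median is 1992 and the IQR runs from 1960 to 2008, so most rows come from recent decades. Each row appears to be one country's medal tally at one Games, so the thinner early coverage most likely reflects fewer medal-winning nations per Games rather than missing data (10 rows in the 1896 bin versus 74 to 93 per bin from 2000 onward). No outliers were flagged. Treatment: treat as a temporal feature; bucket by decade or use directly without scaling.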
- n: 1,433
- nulls: 0 (0.0%)
- unique: 30
- min: 1896
- max: 2024
- mean: 1982
- median: 1992
- std: 33.95
- q1: 1960
- q3: 2008
- iqr: 48
- skew: -0.7568
- kurtosis: -0.408
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
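The decade bucketing suggested in the treatment note needs only integer division. This is a minimal sketch; the function name `decade_of` and the sample years are illustrative. Because Games are held every four years, a decade holds two or three Games years (fewer where Games were cancelled), so bucket sizes will vary.

```python
from collections import Counter

def decade_of(year: int) -> int:
    """Map a four-digit year to the first year of its decade (e.g. 1896 -> 1890)."""
    return (year // 10) * 10

# Illustrative years only; in practice, pass the values of the `year` column.
sample_years = [1896, 1900, 1904, 1908, 1992, 1996, 2020, 2024]
decade_counts = Counter(decade_of(y) for y in sample_years)
print(sorted(decade_counts.items()))  # [(1890, 1), (1900, 3), (1990, 2), (2020, 2)]
```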
country
categorical feature

Three-letter IOC-style country codes (e.g., FRA, GBR, USA, DEN, SUI), with 159 distinct codes across 1,433 rows and no nulls. The distribution is flat: the top value, FRA, accounts for only 2.1% of rows, and the entropy ratio is 0.92, so no country dominates. The top counts cluster between 28 and 30, and since `year` has only 30 distinct values, these are countries that won at least one medal at nearly every Games; the flatness comes from the one-row-per-country-per-Games structure rather than from a sampling design. Treatment: group rare codes or target-encode before modelling; one-hot encoding would create 159 sparse columns.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 159
- top_value: FRA
- top_rate: 0.02094
- cardinality: 159
- entropy: 6.695
- entropy_ratio: 0.9155
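The rare-code grouping from the treatment note can be sketched as follows: codes seen fewer than `min_count` times collapse into one shared label before encoding. The threshold, the `OTHER` label, the function name and the example codes are illustrative choices, not part of the report.

```python
from collections import Counter

def group_rare(codes, min_count, other="OTHER"):
    """Replace codes that occur fewer than min_count times with a shared label."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other for c in codes]

# Made-up example: FRA appears three times, the placeholder codes once each.
codes = ["FRA", "FRA", "FRA", "AAA", "BBB"]
print(group_rare(codes, min_count=2))  # ['FRA', 'FRA', 'FRA', 'OTHER', 'OTHER']
```

When modelling, compute the counts on the training split only, so that the grouping does not leak information from evaluation data.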
country_name
categorical feature

Country names with 165 distinct values across 1,433 rows and no nulls. The distribution is flat: the top value, 'France', covers only 2.09% of rows, the top ten countries each appear 28 to 30 times, and the entropy ratio is 0.91. There are 165 names but only 159 codes in `country`, so at least one code must appear under more than one name (for example, a country whose name changed over time); check this before treating the two columns as interchangeable. Treatment: use as a grouping key or target-encode/one-hot for modelling. For joins, note that `country` holds IOC-style codes, which differ from ISO 3166 alpha-3 codes for some nations (e.g., SUI versus CHE), so map names or codes explicitly.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 165
- top_value: France
- top_rate: 0.02094
- cardinality: 165
- entropy: 6.715
- entropy_ratio: 0.9116
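With 165 distinct names but only 159 codes, some codes must map to several names. A minimal sketch for listing them, assuming the data is available as (code, name) pairs; the function name and the example rows are hypothetical.

```python
from collections import defaultdict

def codes_with_multiple_names(pairs):
    """Return {code: sorted names} for codes that appear with more than one name."""
    names_by_code = defaultdict(set)
    for code, name in pairs:
        names_by_code[code].add(name)
    return {code: sorted(names) for code, names in names_by_code.items() if len(names) > 1}

# Hypothetical (code, name) rows; replace with the (country, country_name) pairs from the data.
rows = [("AAA", "Alpha"), ("AAA", "Alpha Republic"), ("BBB", "Beta"), ("BBB", "Beta")]
print(codes_with_multiple_names(rows))  # {'AAA': ['Alpha', 'Alpha Republic']}
```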
gold
numeric feature · high_skew · outliers

Count of gold medals, ranging from 0 to 83 with a median of 1 and a mean of 4.06: most rows sit near zero (33.9% are exactly zero) while a long tail pulls the mean up. The distribution is severely right-skewed (skew 4.26, kurtosis 23.14), with 134 outliers (9.35% of rows), which under the standard 1.5 × IQR rule lie above an upper fence of 10 (q3 + 1.5 × IQR = 4 + 1.5 × 4), not merely above q3. Only 52 unique values across 1,433 rows is consistent with a discrete tally rather than a continuous measurement. Treatment: apply a log1p transform before modelling to tame the heavy right tail.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 52
- min: 0
- max: 83
- mean: 4.059
- median: 1
- std: 8.419
- q1: 0
- q3: 4
- iqr: 4
- skew: 4.259
- kurtosis: 23.14
- n_outliers: 134
- outlier_rate: 0.09351
- zero_rate: 0.3391
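The outlier count and the log1p treatment can be sketched with the standard library. This assumes the profiler uses the common 1.5 × IQR rule with linearly interpolated quartiles; `statistics.quantiles(..., method="inclusive")` interpolates the same way as NumPy's default percentile method. With q1 = 0 and q3 = 4 for `gold`, the upper fence is 10. The sample values are made up so that the quartiles land on the same numbers.

```python
import math
import statistics

def iqr_upper_fence(values, k=1.5):
    """Upper Tukey fence q3 + k * IQR, using linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

# Made-up medal counts, chosen so that q1 = 0 and q3 = 4, as for `gold`.
gold_sample = [0, 0, 0, 1, 1, 2, 4, 7, 40]
fence = iqr_upper_fence(gold_sample)
outliers = [v for v in gold_sample if v > fence]

# log1p maps 0 to 0 and compresses the long right tail.
transformed = [math.log1p(v) for v in gold_sample]

print(fence)     # 10.0
print(outliers)  # [40]
```

log1p is preferred over a plain log here because a third of the `gold` values are zero, and log(0) is undefined.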
silver
numeric feature · high_skew · outliers

Count of silver medals: a non-negative integer with 45 distinct values from 0 to 79 and a median of 2. The distribution is heavily right-skewed (skew 4.03, kurtosis 23.21), with 25.3% zeros and 140 rows (9.8%) flagged as outliers, so a small set of large counts pulls the mean (4.04) well above the median. Treatment: apply a log1p transform before modelling to tame the skew and heavy tail.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 45
- min: 0
- max: 79
- mean: 4.038
- median: 2
- std: 7.121
- q1: 0
- q3: 4
- iqr: 4
- skew: 4.026
- kurtosis: 23.21
- n_outliers: 140
- outlier_rate: 0.0977
- zero_rate: 0.2526
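The skew and kurtosis figures quoted for the medal columns can be approximated with plain moment estimators. This sketch uses the population (biased) Fisher-Pearson skewness and excess kurtosis; the report does not say which estimator it uses, and bias-corrected versions give slightly different values. Excess kurtosis is assumed because `year` reports a negative value (-0.408), which plain kurtosis can never take.

```python
def skew_and_excess_kurtosis(values):
    """Population skewness g1 and excess kurtosis g2 from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Symmetric sample: skew 0 and excess kurtosis of about -1.3.
print(skew_and_excess_kurtosis([1, 2, 3, 4, 5]))
# Made-up right-skewed counts: positive skew, as in the medal columns.
print(skew_and_excess_kurtosis([0, 0, 0, 1, 2, 3, 40]))
```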
bronze
numeric feature · high_skew · outliers

Count of bronze medals per row, with 1,433 rows, 44 distinct integer values from 0 to 78, and no nulls. The distribution is heavily right-skewed (skew 3.37, kurtosis 16.94): the median is 2 and q3 is 5, yet the maximum reaches 78, producing 150 outliers (10.5%). About 19.8% of rows are zero; since every row records at least one medal, these are countries that medalled at that Games but won no bronze. Treatment: apply a log1p transform before modelling to tame the heavy right tail.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 44
- min: 0
- max: 78
- mean: 4.398
- median: 2
- std: 6.853
- q1: 1
- q3: 5
- iqr: 4
- skew: 3.37
- kurtosis: 16.94
- n_outliers: 150
- outlier_rate: 0.1047
- zero_rate: 0.1982
total
numeric feature · high_skew · outliers

Total medal count per row, heavily right-skewed: the median is 5 while the mean is 12.5 and the maximum reaches 234. A skew of 3.92 and kurtosis of 20.8 confirm a long tail, with 151 values (10.5%) flagged as outliers. There are no nulls and no zeros (the minimum is 1), so every row records at least one medal, and the 97 unique values across 1,433 rows are consistent with a discrete count. Treatment: log1p-transform before modelling to tame the heavy right tail.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 97
- min: 1
- max: 234
- mean: 12.5
- median: 5
- std: 21.66
- q1: 2
- q3: 13
- iqr: 11
- skew: 3.922
- kurtosis: 20.8
- n_outliers: 151
- outlier_rate: 0.1054
- zero_rate: 0
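The column means are consistent with `total` being the sum of the three medal columns: 4.059 + 4.038 + 4.398 = 12.495, which matches the reported mean of 12.5 up to rounding. Matching means do not prove the identity row by row, so a direct check is worth running. The sketch assumes each row is a dict keyed by the column names in the schema; the function name and example rows are made up.

```python
def rows_where_total_mismatches(rows):
    """Return rows whose total differs from gold + silver + bronze."""
    return [r for r in rows if r["total"] != r["gold"] + r["silver"] + r["bronze"]]

# Made-up rows; the second one is deliberately inconsistent.
rows = [
    {"gold": 2, "silver": 1, "bronze": 0, "total": 3},
    {"gold": 0, "silver": 0, "bronze": 1, "total": 2},
]
bad = rows_where_total_mismatches(rows)
print(len(bad))  # 1
```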
rank_total
numeric feature

Integer ranking field spanning 1 to 93, with 93 unique values across 1,433 rows. It appears to be each country's rank by total medals within a given Games: the maximum of 93 matches the largest per-Games row count in the `year` histogram (93 rows in the 2017 – 2021 bin). The distribution is right-skewed (skew 0.74), with a median of 26 below the mean of 31.06: small (better) ranks occur at every Games, while large ranks are only possible at recent Games with many medal-winning nations. No nulls, no zeros, and no outliers were flagged. Treatment: use as an ordinal feature; consider inverting it so that higher means better (e.g., 94 - rank) before modelling.
- n: 1,433
- nulls: 0 (0.0%)
- unique: 93
- min: 1
- max: 93
- mean: 31.06
- median: 26
- std: 22.7
- q1: 13
- q3: 45
- iqr: 32
- skew: 0.739
- kurtosis: -0.3677
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
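Because the number of ranked countries grows across Games, a fixed inversion such as 94 - rank scores rank 10 in 1896 the same as rank 10 in 2024. One alternative is to normalise within each year: rank r becomes (max_rank - r) / (max_rank - 1), so 1.0 is the best rank and 0.0 the worst that year. This is a sketch under stated assumptions: the `(year, rank)` tuple layout and the function name are illustrative, the per-year maximum is taken as the largest observed rank, ties are left as they are, and the example rows are made up.

```python
from collections import defaultdict

def normalised_rank_score(rows):
    """Map (year, rank) pairs to a 0..1 score per year, where 1.0 is rank 1."""
    max_rank = defaultdict(int)
    for year, rank in rows:
        max_rank[year] = max(max_rank[year], rank)
    scores = []
    for year, rank in rows:
        worst = max_rank[year]
        # A year with a single ranked country gets 1.0 instead of dividing by zero.
        scores.append(1.0 if worst == 1 else (worst - rank) / (worst - 1))
    return scores

rows = [(1896, 1), (1896, 3), (1896, 5), (2024, 1), (2024, 93)]
print(normalised_rank_score(rows))  # [1.0, 0.5, 0.0, 1.0, 0.0]
```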