olympics olympic medals data

source: /home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json · 1,433 rows · 8 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 1,433 rows of Olympic medal counts by country and year, spanning 1896 to 2024 (30 distinct Games years) and covering 165 distinct country names under 159 country codes. Medal columns (gold, silver, bronze, total) are heavily right-skewed with high kurtosis and many outliers: a small number of dominant nations pull the means well above the medians (e.g. total has a median of 5, a mean of 12.5, and a max of 234). Zero rates are notable too: 33.9% of rows have zero gold medals and 25.3% zero silver, reflecting how often a country finishes a Games without a medal of that type. Every row has a total of at least 1, so only medal-winning entries appear. Appearance counts are tightly clustered at the top, with France and Great Britain tied as the most frequent entries (30 appearances each). Start by examining the shape of the `total` and `gold` distributions and the `year` coverage to understand era effects.

citing: row_count · column_count · columns.total.stats · columns.gold.stats · columns.silver.stats · columns.bronze.stats · columns.country.top_values · columns.year.stats
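The checks named in the summary (mean versus median, maximum, zero rate) can be reproduced with a short standard-library sketch. It assumes the JSON file is a top-level list of row objects keyed by the eight column names; the report does not confirm that layout, and `load_rows` and `describe` are hypothetical helper names.

```python
import json
from statistics import mean, median


def load_rows(path):
    """Load the dataset, assuming a top-level JSON list of row dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def describe(rows, column):
    """Return mean, median, max and zero rate for one numeric column."""
    values = [row[column] for row in rows]
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "zero_rate": sum(v == 0 for v in values) / len(values),
    }


# Toy rows for illustration only; they are not taken from the dataset.
toy_rows = [
    {"gold": 0, "total": 1},
    {"gold": 0, "total": 2},
    {"gold": 1, "total": 5},
    {"gold": 40, "total": 110},
]
gold_summary = describe(toy_rows, "gold")
```

With the real file, `describe(load_rows(path), "total")` should land close to the reported median of 5 and maximum of 234 if the layout assumption holds.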

Charts the summary said to look at first

total · Highly skewed total medal counts — most rows are small while a few mega-hauls reach 234.


Histogram bins for total (median: 5.0).
bin	count
1 – 7.297	907
7.297 – 13.59	176
13.59 – 19.89	99
19.89 – 26.19	77
26.19 – 32.49	42
32.49 – 38.78	28
38.78 – 45.08	20
45.08 – 51.38	11
51.38 – 57.68	7
57.68 – 63.97	7
63.97 – 70.27	12
70.27 – 76.57	5
76.57 – 82.86	1
82.86 – 89.16	3
89.16 – 95.46	11
95.46 – 101.8	8
101.8 – 108.1	5
108.1 – 114.4	5
114.4 – 120.6	0
120.6 – 126.9	4
126.9 – 133.2	1
133.2 – 139.5	0
139.5 – 145.8	0
145.8 – 152.1	1
152.1 – 158.4	0
158.4 – 164.7	0
164.7 – 171	0
171 – 177.3	1
177.3 – 183.6	0
183.6 – 189.9	0
189.9 – 196.2	1
196.2 – 202.5	0
202.5 – 208.8	0
208.8 – 215.1	0
215.1 – 221.4	0
221.4 – 227.7	0
227.7 – 234	1
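The bin labels above are consistent with 37 equal-width bins between the column minimum and maximum, with edges shown to four significant figures. That bin count is inferred from the table, not stated by the profiler, so the following is a sketch under that assumption; `bin_edges` is a hypothetical helper.

```python
def bin_edges(lo, hi, n_bins=37):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# total ranges from 1 to 234, gold from 0 to 83 (from the column stats).
total_edges = bin_edges(1, 234)
gold_edges = bin_edges(0, 83)
```

The first interior edge for `total` is 1 + 233/37, which rounds to the 7.297 shown in the first row of the table.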

gold · Gold medals are zero for ~34% of rows; check the long right tail of dominant performances.


Histogram bins for gold (median: 1.0).
bin	count
0 – 2.243	948
2.243 – 4.486	175
4.486 – 6.73	76
6.73 – 8.973	61
8.973 – 11.22	52
11.22 – 13.46	25
13.46 – 15.7	11
15.7 – 17.95	13
17.95 – 20.19	12
20.19 – 22.43	3
22.43 – 24.68	4
24.68 – 26.92	3
26.92 – 29.16	6
29.16 – 31.41	2
31.41 – 33.65	5
33.65 – 35.89	2
35.89 – 38.14	11
38.14 – 40.38	6
40.38 – 42.62	1
42.62 – 44.86	3
44.86 – 47.11	5
47.11 – 49.35	3
49.35 – 51.59	1
51.59 – 53.84	0
53.84 – 56.08	2
56.08 – 58.32	0
58.32 – 60.57	0
60.57 – 62.81	0
62.81 – 65.05	0
65.05 – 67.3	0
67.3 – 69.54	0
69.54 – 71.78	0
71.78 – 74.03	0
74.03 – 76.27	0
76.27 – 78.51	1
78.51 – 80.76	1
80.76 – 83	1

year · Coverage spans 1896–2024 with a left skew — more recent Games contribute more rows.


Histogram bins for year (median: 1992.0).
bin	count
1896 – 1899	10
1899 – 1903	18
1903 – 1906	12
1906 – 1910	19
1910 – 1913	19
1913 – 1917	0
1917 – 1920	22
1920 – 1924	0
1924 – 1927	27
1927 – 1931	33
1931 – 1934	28
1934 – 1938	32
1938 – 1941	0
1941 – 1944	0
1944 – 1948	0
1948 – 1951	38
1951 – 1955	43
1955 – 1958	38
1958 – 1962	44
1962 – 1965	41
1965 – 1969	44
1969 – 1972	48
1972 – 1976	0
1976 – 1979	41
1979 – 1982	36
1982 – 1986	47
1986 – 1989	52
1989 – 1993	64
1993 – 1996	79
1996 – 2000	0
2000 – 2003	80
2003 – 2007	74
2007 – 2010	87
2010 – 2014	86
2014 – 2017	86
2017 – 2021	93
2021 – 2024	92

country · Top countries by appearances are tightly clustered, with France and Great Britain leading at 30 each.


Top values for country (20 unique shown, of 159 total).
value	count	share
FRA	30	2.1%
GBR	30	2.1%
USA	29	2.0%
DEN	29	2.0%
SUI	29	2.0%
HUN	28	2.0%
AUS	28	2.0%
BEL	28	2.0%
ITA	28	2.0%
SWE	28	2.0%
AUT	27	1.9%
NED	27	1.9%
CAN	27	1.9%
NOR	26	1.8%
FIN	26	1.8%
JPN	23	1.6%
NZL	23	1.6%
POL	23	1.6%
MEX	22	1.5%
GRE	21	1.5%

rank_total · Ranks run from 1 to 93, with row counts tapering steadily toward the higher rank numbers (skew 0.74); useful context for interpreting medal totals.


Histogram bins for rank_total (median: 26.0).
bin	count
1 – 3.486	90
3.486 – 5.973	60
5.973 – 8.459	90
8.459 – 10.95	60
10.95 – 13.43	86
13.43 – 15.92	56
15.92 – 18.41	84
18.41 – 20.89	52
20.89 – 23.38	74
23.38 – 25.86	48
25.86 – 28.35	71
28.35 – 30.84	44
30.84 – 33.32	65
33.32 – 35.81	40
35.81 – 38.3	58
38.3 – 40.78	34
40.78 – 43.27	47
43.27 – 45.76	26
45.76 – 48.24	35
48.24 – 50.73	20
50.73 – 53.22	29
53.22 – 55.7	18
55.7 – 58.19	27
58.19 – 60.68	18
60.68 – 63.16	27
63.16 – 65.65	17
65.65 – 68.14	24
68.14 – 70.62	16
70.62 – 73.11	24
73.11 – 75.59	15
75.59 – 78.08	21
78.08 – 80.57	13
80.57 – 83.05	15
83.05 – 85.54	10
85.54 – 88.03	10
88.03 – 90.51	4
90.51 – 93	5
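The sawtooth in the counts above (90, 60, 90, 60, ...) is probably a binning effect, not a property of the data. If the histogram uses 37 equal-width bins over the integers 1 to 93 (an inference from the labels), each bin is about 2.49 wide and therefore holds either two or three integer ranks. The sketch below counts how many integers fall in each bin; `integers_per_bin` is a hypothetical helper.

```python
def integers_per_bin(lo, hi, n_bins=37):
    """Count how many integers in [lo, hi] fall into each equal-width bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in range(lo, hi + 1):
        # The last bin is closed on the right, so clamp the index.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


ranks_per_bin = integers_per_bin(1, 93)
```

The first bins hold 3, 2, 3 and 2 ranks, which lines up with the 90, 60, 90, 60 pattern if each of the top ranks occurs once in each of the 30 Games. Comparing counts per individual rank avoids the artifact.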

Schema

8 columns

Per-column summary.
column	type	nulls	unique	alerts
year	numeric	0.0%	30
country	categorical	0.0%	159
country_name	categorical	0.0%	165
gold	numeric	0.0%	52	high_skew outliers
silver	numeric	0.0%	45	high_skew outliers
bronze	numeric	0.0%	44	high_skew outliers
total	numeric	0.0%	97	high_skew outliers
rank_total	numeric	0.0%	93

year

numeric timestamp

Four-digit calendar years spanning 1896 to 2024 with 30 distinct values across 1,433 rows and no nulls. The distribution is left-skewed (skew -0.76) toward recent decades, with a median of 1992 and IQR from 1960 to 2008, suggesting coverage is sparser in the early 20th century. No outliers were flagged. Treatment: Treat as a temporal feature; bucket by decade or use directly without scaling. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 30
min: 1896
max: 2024
mean: 1982
median: 1992
std: 33.95
q1: 1960
q3: 2008
iqr: 48
skew: -0.7568
kurtosis: -0.408
n_outliers: 0
outlier_rate: 0
zero_rate: 0
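The decade bucketing suggested in the treatment note can be sketched as follows. The helper names are hypothetical, and the example years are illustrative values inside the reported 1896 to 2024 range, not rows read from the file.

```python
from collections import Counter


def decade(year):
    """Map a four-digit year to the first year of its decade."""
    return year // 10 * 10


def rows_per_decade(years):
    """Count rows per decade, given an iterable of year values."""
    return Counter(decade(y) for y in years)


decade_counts = rows_per_decade([1896, 1900, 1904, 2020, 2024])
```

Decade counts also sidestep the empty bins in the histogram above, which arise because Games years are spaced several years apart while the bins are only about 3.5 years wide.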

country

categorical feature

Three-letter country codes (e.g., FRA, GBR, USA, DEN, SUI) with 159 distinct values across 1,433 rows and no nulls. The distribution is remarkably flat: the top value FRA accounts for only 2.1% of rows and the entropy ratio is 0.92, so no country dominates. The ten most frequent codes appear 28 to 30 times each, close to the ceiling of 30 implied by the number of distinct Games years (assuming one row per country per Games), so these counts appear to reflect how often a country reached the medal table, not any sampling weight. Treatment: Group rare codes or target-encode before modelling; one-hot would create 159 sparse columns. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 159
top_value: FRA
top_rate: 0.02094
cardinality: 159
entropy: 6.695
entropy_ratio: 0.9155
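The reported figures are consistent with Shannon entropy measured in bits and an entropy ratio equal to the entropy divided by log2 of the cardinality: 6.695 / log2(159) is approximately 0.9155. The profiler's exact formula is not shown in the report, so treat this as a sketch that happens to reproduce the numbers; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the empirical distribution, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2 of the number of distinct values."""
    k = len(set(values))
    if k < 2:
        return 0.0
    return entropy_bits(values) / math.log2(k)


# Recompute the reported ratio from the reported entropy and cardinality.
reported_ratio = 6.695 / math.log2(159)
```

A ratio of 1.0 means every code is equally frequent; values near 0 mean one code dominates.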

country_name

categorical feature

Country names with 165 distinct values across 1,433 rows and no nulls. The distribution is remarkably flat: the top value 'France' covers only 2.09% of rows, and the top ten countries each appear 28 to 30 times, giving an entropy ratio of 0.91 (near-uniform). There are six more distinct names than distinct codes in `country` (165 versus 159), so at least one code is paired with more than one name; reconcile these before grouping by name. Treatment: Use as a grouping key or target-encode/one-hot for modelling; consider mapping to ISO codes for joins. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 165
top_value: France
top_rate: 0.02094
cardinality: 165
entropy: 6.715
entropy_ratio: 0.9116
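Because `country_name` has 165 distinct values while `country` has 159, at least one code must be paired with more than one name. A minimal sketch for listing those codes follows; it assumes rows are dicts with `country` and `country_name` keys, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def codes_with_multiple_names(rows):
    """Return {code: sorted names} for codes that map to more than one name."""
    names_by_code = defaultdict(set)
    for row in rows:
        names_by_code[row["country"]].add(row["country_name"])
    return {
        code: sorted(names)
        for code, names in names_by_code.items()
        if len(names) > 1
    }


# Made-up rows; the codes and names here are placeholders, not real entries.
name_rows = [
    {"country": "AAA", "country_name": "Alpha"},
    {"country": "AAA", "country_name": "Alpha Republic"},
    {"country": "BBB", "country_name": "Beta"},
]
ambiguous_codes = codes_with_multiple_names(name_rows)
```

Running this on the full dataset would show which name variants need to be merged before grouping by `country_name`.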

gold

numeric feature high_skew outliers

Count of gold medals per country per Games, ranging from 0 to 83 with median 1 and mean 4.06, so most rows sit near zero (zero_rate 0.339) while a long tail pulls the average up. The distribution is severely right-skewed (skew 4.26, kurtosis 23.14), with 134 rows (9.35%) flagged as outliers in the upper tail, well beyond the q3 of 4. Only 52 unique values across 1,433 rows confirms a discrete tally, not a continuous measurement. Treatment: Apply a log1p transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 52
min: 0
max: 83
mean: 4.059
median: 1
std: 8.419
q1: 0
q3: 4
iqr: 4
skew: 4.259
kurtosis: 23.14
n_outliers: 134
outlier_rate: 0.09351
zero_rate: 0.3391
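The report does not state which rule produced `n_outliers`. A common choice is the Tukey fence, which flags values above q3 + 1.5 × iqr, and the histogram counts are at least compatible with it. The sketch below applies that rule to the reported quartiles as an assumption, not as the profiler's confirmed method; the helper names are hypothetical.

```python
def upper_fence(q1, q3, k=1.5):
    """Tukey upper fence: q3 plus k times the interquartile range."""
    return q3 + k * (q3 - q1)


def count_above(values, fence):
    """Count values strictly greater than the fence."""
    return sum(v > fence for v in values)


# Quartiles taken from the gold and total stat blocks in this report.
gold_fence = upper_fence(q1=0, q3=4)
total_fence = upper_fence(q1=2, q3=13)
```

Under this rule a row counts as a gold outlier from 11 medals upward and as a total outlier from 30 upward. The matching lower fences are negative, so only the upper tail can be flagged.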

silver

numeric feature high_skew outliers

Non-negative integer count of silver medals per country per Games, with 45 distinct values ranging from 0 to 79 and a median of 2. The distribution is heavily right-skewed (skew 4.03, kurtosis 23.2) with 25.3% zeros and 9.8% of rows flagged as outliers, so a small set of large counts pulls the mean (4.04) to about twice the median. Treatment: Apply a log1p transform before modelling to tame the skew and heavy tail. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 45
min: 0
max: 79
mean: 4.038
median: 2
std: 7.121
q1: 0
q3: 4
iqr: 4
skew: 4.026
kurtosis: 23.21
n_outliers: 140
outlier_rate: 0.0977
zero_rate: 0.2526
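The log1p treatment recommended for the medal columns computes log(1 + x), which is defined at zero and so handles the roughly 25% of rows with no silver medals. A minimal standard-library sketch, using the reported min, median, q3 and max of `silver` as example inputs (the helper names are hypothetical):

```python
import math


def log1p_column(values):
    """Apply log(1 + x) to each non-negative count."""
    return [math.log1p(v) for v in values]


def inverse_log1p(values):
    """Undo the transform with exp(x) - 1."""
    return [math.expm1(v) for v in values]


# min, median, q3 and max of silver, from the stat block above.
silver_log = log1p_column([0, 2, 4, 79])
```

The transform compresses the gap between the maximum and the median from 79 versus 2 on the raw scale to roughly 4.4 versus 1.1 on the log scale, and `inverse_log1p` maps model outputs back to counts.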

bronze

numeric feature high_skew outliers

Count of bronze medals per country per Games, with 1,433 rows, 44 distinct integer values from 0 to 78, and no nulls. The distribution is heavily right-skewed (skew 3.37, kurtosis 16.94): the median is 2 and q3 is 5, yet the max reaches 78, producing 150 outliers (10.5%). Roughly 19.8% of rows are zero, meaning the country won no bronze at that Games. Treatment: Apply a log1p transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 44
min: 0
max: 78
mean: 4.398
median: 2
std: 6.853
q1: 1
q3: 5
iqr: 4
skew: 3.37
kurtosis: 16.94
n_outliers: 150
outlier_rate: 0.1047
zero_rate: 0.1982

total

numeric feature high_skew outliers

Total medals per country per Games, heavily right-skewed: the median is 5 while the mean is 12.5 and the max reaches 234. Skew of 3.92 and kurtosis of 20.8 confirm a long tail, with 151 values (10.5%) flagged as outliers. There are no nulls or zeros (the minimum is 1, so only medal-winning entries are present), and only 97 unique values across 1,433 rows, as expected for a discrete count. Treatment: log1p-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 97
min: 1
max: 234
mean: 12.5
median: 5
std: 21.66
q1: 2
q3: 13
iqr: 11
skew: 3.922
kurtosis: 20.8
n_outliers: 151
outlier_rate: 0.1054
zero_rate: 0
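The reported means suggest that `total` is the sum of the three medal columns: 4.059 + 4.038 + 4.398 = 12.495, which matches the reported mean of 12.5 after rounding. Matching means do not prove the identity holds row by row, so a per-row check is worth running. The sketch below assumes rows are dicts with the four medal keys; the sample rows are made up.

```python
def inconsistent_rows(rows):
    """Return rows where gold + silver + bronze does not equal total."""
    return [
        row
        for row in rows
        if row["gold"] + row["silver"] + row["bronze"] != row["total"]
    ]


# Sum of the reported column means, for comparison with the mean of total.
mean_sum = 4.059 + 4.038 + 4.398

# Made-up rows: the second one is deliberately inconsistent.
medal_rows = [
    {"gold": 1, "silver": 2, "bronze": 2, "total": 5},
    {"gold": 0, "silver": 1, "bronze": 0, "total": 2},
]
bad_rows = inconsistent_rows(medal_rows)
```

An empty result on the full dataset would mean `total` carries no information beyond the three component columns, which matters when choosing model features.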

rank_total

numeric feature

Integer ranking field spanning 1 to 93 with 93 unique values across 1,433 rows, consistent with a medal-table rank (presumably by total) assigned separately within each Games. The distribution is right-skewed (skew 0.74) with median 26 below mean 31.06, so low rank numbers (better placings) dominate while a tail extends toward 93. No nulls, no zeros, and no outliers flagged given the bounded range. Treatment: Use as an ordinal feature; consider inverting (e.g., 94 - rank) so higher means better before modelling. high · anthropic:claude-opus-4-7

n: 1,433
nulls: 0 (0.0%)
unique: 93
min: 1
max: 93
mean: 31.06
median: 26
std: 22.7
q1: 13
q3: 45
iqr: 32
skew: 0.739
kurtosis: -0.3677
n_outliers: 0
outlier_rate: 0
zero_rate: 0
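The inversion suggested in the treatment note (94 - rank) can be written as a small helper. This is a sketch with a hypothetical function name, and it takes the global maximum of 93 from the stats above as its default.

```python
def invert_rank(rank, max_rank=93):
    """Flip a rank so that larger values mean better placings."""
    if not 1 <= rank <= max_rank:
        raise ValueError(f"rank {rank} is outside 1..{max_rank}")
    return max_rank + 1 - rank


best = invert_rank(1)
worst = invert_rank(93)
```

Since early Games appear to have far fewer ranked countries than recent ones, passing the largest rank observed in the same Games as `max_rank` may give a fairer comparison across eras than the global value of 93.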