
olympics olympic medals data

source /home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json · 1,433 rows · 8 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 1,433 rows of Olympic medal counts by country and year, spanning 1896 to 2024 and covering 165 distinct country names (159 distinct country codes). Medal columns (gold, silver, bronze, total) are heavily right-skewed with high kurtosis and many outliers: a small number of dominant nations pull the means well above the medians (e.g. total has a median of 5, a mean of 12.5, and a max of 234). Zero-rates are notable too: 33.9% of rows have zero gold medals and 25.3% zero silver, reflecting how often a country finishes a Games without a medal of that colour. The most frequent countries appear at similar rates, with France and Great Britain tied as most-frequent entries (30 appearances each). Start by examining the shape of the `total` and `gold` distributions and the `year` coverage to understand era effects.

citing: row_count · column_count · columns.total.stats · columns.gold.stats · columns.silver.stats · columns.bronze.stats · columns.country.top_values · columns.year.stats
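A minimal sketch of that first look, using only the standard library. It assumes the source file is a top-level JSON array of row objects keyed by the eight column names; the report does not show the file layout, so `load_rows` may need adjusting. The `sample` rows are made up for illustration and are not taken from the dataset.

```python
import json
from statistics import mean, median


def load_rows(path):
    """Load rows, assuming a top-level JSON array of objects (an assumption)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(rows, column):
    """Return the basic shape statistics the report lists for a numeric column."""
    values = [row[column] for row in rows]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "zero_rate": sum(v == 0 for v in values) / len(values),
    }


# Made-up rows, for illustration only.
sample = [
    {"year": 2020, "country": "AAA", "gold": 0, "total": 1},
    {"year": 2020, "country": "BBB", "gold": 3, "total": 9},
    {"year": 2024, "country": "AAA", "gold": 1, "total": 2},
    {"year": 2024, "country": "BBB", "gold": 40, "total": 120},
]
gold_summary = summarize(sample, "gold")
```

On the real file, `summarize(load_rows(path), "total")` should show the same gap between mean and median that the summary describes, if the layout assumption holds.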

Schema

8 columns
column        type         nulls  unique  alerts
year          numeric      0.0%   30
country       categorical  0.0%   159
country_name  categorical  0.0%   165
gold          numeric      0.0%   52      high_skew, outliers
silver        numeric      0.0%   45      high_skew, outliers
bronze        numeric      0.0%   44      high_skew, outliers
total         numeric      0.0%   97      high_skew, outliers
rank_total    numeric      0.0%   93

year

numeric timestamp
Four-digit calendar years spanning 1896 to 2024, with 30 distinct values across 1,433 rows and no nulls. The distribution is left-skewed (skew -0.76): rows concentrate in recent decades, with a median of 1992 and an IQR from 1960 to 2008, so earlier Games contribute fewer rows. No outliers were flagged. Treatment: Treat as a temporal feature; bucket by decade or use directly without scaling. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
30
min
1896
max
2024
mean
1982
median
1992
std
33.95
q1
1960
q3
2008
iqr
48
skew
-0.7568
kurtosis
-0.408
n_outliers
0
outlier_rate
0
zero_rate
0
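The decade bucketing suggested in the treatment can be sketched as follows. The helper names and the example years are illustrative, not part of the report.

```python
from collections import Counter


def decade(year):
    """Map a calendar year to the first year of its decade, e.g. 1996 -> 1990."""
    return (year // 10) * 10


def rows_per_decade(years):
    """Count rows per decade, ordered by decade."""
    return dict(sorted(Counter(decade(y) for y in years).items()))


# Illustrative years only; the real column has 30 distinct values.
example_years = [1896, 1900, 1908, 1992, 1996, 2000, 2024]
counts = rows_per_decade(example_years)
```

Counting rows per decade on the full column is a quick way to see how thin the early coverage is before choosing bucket widths.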

country

categorical feature
Three-letter country codes (e.g., FRA, GBR, USA, DEN, SUI) covering 159 distinct values across 1,433 rows with no nulls. The distribution is flat: the top value FRA accounts for only 2.1% of rows (about 30 rows) and the entropy ratio is 0.92, so no country dominates. Top counts cluster tightly between 28 and 30. Assuming one row per country per Games, 30 is also the ceiling set by the 30 distinct years, so the most frequent countries appear at nearly every Games; this reflects consistent participation rather than a uniform sampling design. Treatment: Group rare codes or target-encode before modelling; one-hot would create 159 sparse columns. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
159
top_value
FRA
top_rate
0.02094
cardinality
159
entropy
6.695
entropy_ratio
0.9155
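The reported figures are consistent with Shannon entropy in bits divided by its maximum, log2 of the cardinality: 6.695 / log2(159) is about 0.9155. The sketch below assumes that is the definition the profiler uses; the report itself does not state it.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the empirical distribution, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(set(values))
    if k < 2:
        return 0.0  # a constant column has no spread to measure
    return entropy_bits(values) / math.log2(k)


# Check against the reported stats for `country` (entropy 6.695, 159 values).
reported_ratio = 6.695 / math.log2(159)
```

A ratio of 1.0 means every value is equally frequent, so 0.92 indicates a fairly even spread with a tail of less frequent codes.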

country_name

categorical feature
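Categorical country labels with 165 distinct values across 1,433 rows and no nulls. The 165 names exceed the 159 codes in `country`, so at least some codes map to more than one name. The top of the distribution is flat: the top value 'France' covers only 2.09% of rows, and the top ten countries each appear 28–30 times, giving an entropy ratio of 0.91. The average is only about 8.7 rows per name (1,433 / 165), so most countries appear far less often than the top group; this is an unbalanced panel rather than one where every country contributes a similar number of observations. Treatment: Use as a grouping key or target-encode/one-hot for modelling; consider mapping to ISO codes for joins (codes such as SUI and DEN in `country` follow the IOC convention, not ISO 3166). high · anthropic:claude-opus-4-7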
n
1,433
nulls
0 (0.0%)
unique
165
top_value
France
top_rate
0.02094
cardinality
165
entropy
6.715
entropy_ratio
0.9116
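Because `country_name` has more distinct values (165) than `country` (159), it can be worth listing the codes that carry more than one name before using either column as a join key. A minimal sketch, with a hypothetical helper and made-up rows:

```python
from collections import defaultdict


def codes_with_multiple_names(rows):
    """Return {code: sorted names} for codes that appear with more than one name."""
    names_by_code = defaultdict(set)
    for row in rows:
        names_by_code[row["country"]].add(row["country_name"])
    return {
        code: sorted(names)
        for code, names in names_by_code.items()
        if len(names) > 1
    }


# Made-up rows, for illustration only.
sample = [
    {"country": "AAA", "country_name": "Alphaland"},
    {"country": "AAA", "country_name": "Republic of Alphaland"},
    {"country": "BBB", "country_name": "Betaland"},
]
ambiguous = codes_with_multiple_names(sample)
```

The same loop with the two columns swapped would show whether any name is attached to more than one code.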

gold

numeric feature high_skew outliers
Count of gold medals per row, ranging from 0 to 83 with median 1 and mean 4.06, so most rows sit near zero (zero_rate 0.339) while a long tail pulls the average up. The distribution is severely right-skewed (skew 4.26, kurtosis 23.14) with 134 outliers (9.35% of rows) in the upper tail. With q3 = 4 and IQR = 4, the common 1.5 * IQR rule would put the upper fence at 10, although the report does not state which rule it used. Only 52 unique values across 1,433 rows is consistent with a discrete tally rather than a continuous measurement. Treatment: Apply a log1p transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
52
min
0
max
83
mean
4.059
median
1
std
8.419
q1
0
q3
4
iqr
4
skew
4.259
kurtosis
23.14
n_outliers
134
outlier_rate
0.09351
zero_rate
0.3391
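The log1p treatment computes log(1 + x), which is defined at zero and therefore suits a count column where about a third of the rows are 0. A small sketch using the reported range of `gold` (0 to 83); the function names are illustrative.

```python
import math


def log1p_transform(counts):
    """Apply log(1 + x) to each count; 0 maps to 0.0."""
    return [math.log1p(c) for c in counts]


def inverse_log1p(values):
    """Undo the transform with exp(x) - 1."""
    return [math.expm1(v) for v in values]


# Endpoints and quartiles reported for `gold`: min 0, median 1, q3 4, max 83.
transformed = log1p_transform([0, 1, 4, 83])
restored = inverse_log1p(transformed)
```

After the transform the maximum of 83 becomes log(84), roughly 4.43, so the gap between typical rows and the dominant nations shrinks from tens of medals to a few units.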

silver

numeric feature high_skew outliers
A non-negative integer count of silver medals per row, with 45 distinct values ranging from 0 to 79 and a median of 2. The distribution is heavily right-skewed (skew 4.03, kurtosis 23.2) with 25.3% zeros and 9.8% of rows flagged as outliers, so a small set of large counts pulls the mean (4.04) well above the median. Treatment: Apply a log1p transform before modelling to tame the skew and heavy tail. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
45
min
0
max
79
mean
4.038
median
2
std
7.121
q1
0
q3
4
iqr
4
skew
4.026
kurtosis
23.21
n_outliers
140
outlier_rate
0.0977
zero_rate
0.2526
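To check how much a transform reduces skew, one option is the moment-based coefficient m3 / m2^1.5. This is a sketch under an assumption: the report does not say which skewness estimator it uses, so values computed this way may differ slightly from the reported 4.026. The example counts are made up.

```python
import math


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (population form, no bias correction)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant data has no skew
    return m3 / m2 ** 1.5


# Made-up right-skewed counts, for illustration only.
raw = [0, 0, 1, 2, 40]
raw_skew = skewness(raw)
log_skew = skewness([math.log1p(v) for v in raw])
```

Comparing `raw_skew` with `log_skew` on the real column shows whether log1p is enough or whether a stronger treatment is needed.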

bronze

numeric feature high_skew outliers
This is a count of bronze medals per row, with 1,433 rows, 44 distinct integer values from 0 to 78, and no nulls. The distribution is heavily right-skewed (skew 3.37, kurtosis 16.94): the median is 2 and Q3 is 5, yet the max reaches 78, producing 150 outliers (10.5%). Roughly 19.8% of rows are zero, meaning that in about one in five rows the country won no bronze at that Games. Treatment: Apply a log1p transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
44
min
0
max
78
mean
4.398
median
2
std
6.853
q1
1
q3
5
iqr
4
skew
3.37
kurtosis
16.94
n_outliers
150
outlier_rate
0.1047
zero_rate
0.1982
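The outlier counts are plausibly based on Tukey's 1.5 * IQR fences, though the report does not name its rule, so treat this as an assumption. With the reported q1 = 1 and q3 = 5 for `bronze`, the fences work out as below; the flagged values are made up for illustration.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall strictly outside the fences."""
    low, high = tukey_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


# Reported quartiles for `bronze`: q1 = 1, q3 = 5.
bronze_low, bronze_high = tukey_fences(1, 5)  # (-5.0, 11.0)

# Made-up counts: only values above 11 are flagged.
flagged = flag_outliers([0, 2, 11, 12, 78], q1=1, q3=5)
```

Since counts cannot be negative, only the upper fence matters here: under this rule any row with more than 11 bronze medals counts as an outlier.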

total

numeric feature high_skew outliers
Total medal count per row, heavily right-skewed: the median is 5 while the mean is 12.5 and the max reaches 234. Skew of 3.92 and kurtosis of 20.8 confirm a long tail, with 151 values (10.5%) flagged as outliers. There are no nulls or zeros (the minimum is 1, so every row records at least one medal), and only 97 unique values across 1,433 rows, consistent with a discrete count. Treatment: log1p-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
97
min
1
max
234
mean
12.5
median
5
std
21.66
q1
2
q3
13
iqr
11
skew
3.922
kurtosis
20.8
n_outliers
151
outlier_rate
0.1054
zero_rate
0
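The reported means suggest that `total` is the sum of the three medal columns: 4.059 + 4.038 + 4.398 = 12.495, which matches the reported mean of 12.5 after rounding. Equal means do not prove row-level equality, so a per-row check is a reasonable next step. The helper and rows below are illustrative.

```python
def inconsistent_totals(rows):
    """Return rows where gold + silver + bronze does not equal total."""
    return [
        row
        for row in rows
        if row["gold"] + row["silver"] + row["bronze"] != row["total"]
    ]


# Sum of the reported column means for gold, silver and bronze.
reported_mean_sum = 4.059 + 4.038 + 4.398

# Made-up rows: the second one has a deliberately wrong total.
sample = [
    {"gold": 1, "silver": 2, "bronze": 3, "total": 6},
    {"gold": 0, "silver": 1, "bronze": 0, "total": 2},
]
bad_rows = inconsistent_totals(sample)
```

If the check returns no rows on the real data, `total` is redundant with the other three columns and should not be used alongside them as an independent feature.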

rank_total

numeric feature
Integer-valued ranking field spanning 1 to 93 with 93 unique values across 1,433 rows, which suggests a rank table recomputed for each group (most plausibly each Games). The distribution is right-skewed (skew 0.74) with median 26 below mean 31.06, so low rank numbers (better placings) dominate while a tail extends toward 93. No nulls, no zeros, and no outliers flagged given the bounded range. Treatment: Use as an ordinal feature; consider inverting (e.g., 94 - rank) so higher means better before modelling. high · anthropic:claude-opus-4-7
n
1,433
nulls
0 (0.0%)
unique
93
min
1
max
93
mean
31.06
median
26
std
22.7
q1
13
q3
45
iqr
32
skew
0.739
kurtosis
-0.3677
n_outliers
0
outlier_rate
0
zero_rate
0
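The inversion in the treatment, plus a per-Games rescaling, can be sketched as follows. The rescaling is an addition of this sketch, not something the report proposes: it assumes ranks are assigned within each `year`, which the report suggests but does not confirm. The rows are made up.

```python
from collections import defaultdict


def invert_rank(rank, max_rank=93):
    """Flip a rank so higher means better: 1 -> 93, 93 -> 1 (94 - rank)."""
    return (max_rank + 1) - rank


def relative_strength(rows):
    """Scale rank within each year to [0, 1], where 1.0 is the best placing."""
    worst_by_year = defaultdict(int)
    for row in rows:
        year = row["year"]
        worst_by_year[year] = max(worst_by_year[year], row["rank_total"])
    scores = []
    for row in rows:
        worst = worst_by_year[row["year"]]
        if worst == 1:
            scores.append(1.0)  # a single ranked entry that year
        else:
            scores.append((worst - row["rank_total"]) / (worst - 1))
    return scores


# Made-up rows, for illustration only.
sample = [
    {"year": 2020, "rank_total": 1},
    {"year": 2020, "rank_total": 2},
    {"year": 2020, "rank_total": 3},
    {"year": 2024, "rank_total": 1},
    {"year": 2024, "rank_total": 5},
]
scores = relative_strength(sample)
```

The fixed `94 - rank` inversion keeps ranks comparable only if every Games has a similar number of ranked countries; the per-year scaling avoids that assumption at the cost of losing the absolute rank.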