saturn·

olympics olympic medals data

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json

Saturn profiled 1,433 rows across 8 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json",
    "--findings", "olympics-olympic_medals_data.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset contains 1,433 rows of Olympic medal counts by country and year, spanning 1896 to 2024 across 165 country names (159 country codes). The medal columns (gold, silver, bronze, total) are heavily right-skewed with high kurtosis, and 9 to 11% of rows in each are flagged as outliers: a small number of dominant nations pull the means well above the medians (e.g. total has a median of 5, a mean of 12.5 and a max of 234). Zero rates are notable too: 33.9% of rows have zero gold medals and 25.3% zero silver, reflecting how often a country leaves a Games empty-handed in a category. At the top, participation is fairly even, with France and Great Britain tied as the most frequent entries (30 appearances each). Start by examining the shape of the `total` and `gold` distributions and the `year` coverage to understand era effects.

citing: row_count · column_count · columns.total.stats · columns.gold.stats · columns.silver.stats · columns.bronze.stats · columns.country.top_values · columns.year.stats
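
The gap between mean and median described in the summary can be checked by hand. The sketch below computes moment-based skewness and excess kurtosis for a small made-up sample of medal totals. The sample values and the choice of the population (biased) estimator are assumptions for illustration; Saturn may use a different estimator.

```python
from statistics import mean, median


def skew_and_excess_kurtosis(values):
    """Moment-based (population) skewness and excess kurtosis."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # variance
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical sample: many small hauls and one dominant nation.
totals = [1, 1, 2, 2, 3, 4, 5, 5, 8, 13, 120]
skew, kurt = skew_and_excess_kurtosis(totals)
print(f"mean={mean(totals):.1f} median={median(totals)} "
      f"skew={skew:.2f} kurtosis={kurt:.2f}")
```

A single large value is enough to push the mean well above the median and to make both statistics strongly positive, which is the pattern reported for the four medal columns.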

Out[4]:

saturn.schema() · 8 columns

column        kind         n      null%  unique  alerts
year          numeric      1,433  0.0%   30
country       categorical  1,433  0.0%   159
country_name  categorical  1,433  0.0%   165
gold          numeric      1,433  0.0%   52      high_skew, outliers
silver        numeric      1,433  0.0%   45      high_skew, outliers
bronze        numeric      1,433  0.0%   44      high_skew, outliers
total         numeric      1,433  0.0%   97      high_skew, outliers
rank_total    numeric      1,433  0.0%   93
Fig 1.
total · Highly skewed total medal counts — most rows are small while a few mega-hauls reach 234.
Histogram bins for total (median: 5.0).
bin               count
1 – 7.297         907
7.297 – 13.59     176
13.59 – 19.89     99
19.89 – 26.19     77
26.19 – 32.49     42
32.49 – 38.78     28
38.78 – 45.08     20
45.08 – 51.38     11
51.38 – 57.68     7
57.68 – 63.97     7
63.97 – 70.27     12
70.27 – 76.57     5
76.57 – 82.86     1
82.86 – 89.16     3
89.16 – 95.46     11
95.46 – 101.8     8
101.8 – 108.1     5
108.1 – 114.4     5
114.4 – 120.6     0
120.6 – 126.9     4
126.9 – 133.2     1
133.2 – 139.5     0
139.5 – 145.8     0
145.8 – 152.1     1
152.1 – 158.4     0
158.4 – 164.7     0
164.7 – 171       0
171 – 177.3       1
177.3 – 183.6     0
183.6 – 189.9     0
189.9 – 196.2     1
196.2 – 202.5     0
202.5 – 208.8     0
208.8 – 215.1     0
215.1 – 221.4     0
221.4 – 227.7     0
227.7 – 234       1
Fig 2.
gold · Gold medals are zero for ~34% of rows; check the long right tail of dominant performances.
Histogram bins for gold (median: 1.0).
bin               count
0 – 2.243         948
2.243 – 4.486     175
4.486 – 6.73      76
6.73 – 8.973      61
8.973 – 11.22     52
11.22 – 13.46     25
13.46 – 15.7      11
15.7 – 17.95      13
17.95 – 20.19     12
20.19 – 22.43     3
22.43 – 24.68     4
24.68 – 26.92     3
26.92 – 29.16     6
29.16 – 31.41     2
31.41 – 33.65     5
33.65 – 35.89     2
35.89 – 38.14     11
38.14 – 40.38     6
40.38 – 42.62     1
42.62 – 44.86     3
44.86 – 47.11     5
47.11 – 49.35     3
49.35 – 51.59     1
51.59 – 53.84     0
53.84 – 56.08     2
56.08 – 58.32     0
58.32 – 60.57     0
60.57 – 62.81     0
62.81 – 65.05     0
65.05 – 67.3      0
67.3 – 69.54      0
69.54 – 71.78     0
71.78 – 74.03     0
74.03 – 76.27     0
76.27 – 78.51     1
78.51 – 80.76     1
80.76 – 83        1
Fig 3.
year · Coverage spans 1896–2024 with a left skew — more recent Games contribute more rows.
Histogram bins for year (median: 1992.0).
bin               count
1896 – 1899       10
1899 – 1903       18
1903 – 1906       12
1906 – 1910       19
1910 – 1913       19
1913 – 1917       0
1917 – 1920       22
1920 – 1924       0
1924 – 1927       27
1927 – 1931       33
1931 – 1934       28
1934 – 1938       32
1938 – 1941       0
1941 – 1944       0
1944 – 1948       0
1948 – 1951       38
1951 – 1955       43
1955 – 1958       38
1958 – 1962       44
1962 – 1965       41
1965 – 1969       44
1969 – 1972       48
1972 – 1976       0
1976 – 1979       41
1979 – 1982       36
1982 – 1986       47
1986 – 1989       52
1989 – 1993       64
1993 – 1996       79
1996 – 2000       0
2000 – 2003       80
2003 – 2007       74
2007 – 2010       87
2010 – 2014       86
2014 – 2017       86
2017 – 2021       93
2021 – 2024       92
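
Several zero-count rows in this table are likely binning artefacts rather than missing Games. The sketch below is a minimal illustration under two assumptions that the report does not state: that the 30 distinct `year` values are the Summer Games years, and that the histogram uses 37 equal-width, left-inclusive bins between the minimum and maximum.

```python
# Assumption: the 30 distinct `year` values are the Summer Games years
# (every four years from 1896, minus the cancelled 1916, 1940 and 1944).
games = [y for y in range(1896, 2025, 4) if y not in (1916, 1940, 1944)]


def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin holding `value`; the max lands in the last bin."""
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)


n_bins = 37  # number of rows in the histogram table
lo, hi = min(games), max(games)
occupied = {bin_index(y, lo, hi, n_bins) for y in games}
empty_bins = [i for i in range(n_bins) if i not in occupied]
print(empty_bins)
```

Under these assumptions the empty bins are indices 5, 7, 12, 13, 14, 22 and 29 (counting from 0), which are exactly the zero-count rows in the table. Indices 5 and 12 to 14 fall in the gaps left by the cancelled Games; indices 7, 22 and 29 are empty only because the bin width (about 3.46 years) is narrower than the four-year spacing.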
Fig 4.
country · Top countries by appearances are tightly clustered, with France and Great Britain leading at 30 each.
Top values for country (20 unique shown, of 159 total).
value  count  share
FRA    30     2.1%
GBR    30     2.1%
USA    29     2.0%
DEN    29     2.0%
SUI    29     2.0%
HUN    28     2.0%
AUS    28     2.0%
BEL    28     2.0%
ITA    28     2.0%
SWE    28     2.0%
AUT    27     1.9%
NED    27     1.9%
CAN    27     1.9%
NOR    26     1.8%
FIN    26     1.8%
JPN    23     1.6%
NZL    23     1.6%
POL    23     1.6%
MEX    22     1.5%
GRE    21     1.5%
Fig 5.
rank_total · Ranks span 1 to 93, with counts thinning toward the high end (median 26); useful context for interpreting medal totals.
Histogram bins for rank_total (median: 26.0).
bin               count
1 – 3.486         90
3.486 – 5.973     60
5.973 – 8.459     90
8.459 – 10.95     60
10.95 – 13.43     86
13.43 – 15.92     56
15.92 – 18.41     84
18.41 – 20.89     52
20.89 – 23.38     74
23.38 – 25.86     48
25.86 – 28.35     71
28.35 – 30.84     44
30.84 – 33.32     65
33.32 – 35.81     40
35.81 – 38.3      58
38.3 – 40.78      34
40.78 – 43.27     47
43.27 – 45.76     26
45.76 – 48.24     35
48.24 – 50.73     20
50.73 – 53.22     29
53.22 – 55.7      18
55.7 – 58.19      27
58.19 – 60.68     18
60.68 – 63.16     27
63.16 – 65.65     17
65.65 – 68.14     24
68.14 – 70.62     16
70.62 – 73.11     24
73.11 – 75.59     15
75.59 – 78.08     21
78.08 – 80.57     13
80.57 – 83.05     15
83.05 – 85.54     10
85.54 – 88.03     10
88.03 – 90.51     4
90.51 – 93        5
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column        kind         null %
year          numeric      0.0%
country       categorical  0.0%
country_name  categorical  0.0%
gold          numeric      0.0%
silver        numeric      0.0%
bronze        numeric      0.0%
total         numeric      0.0%
rank_total    numeric      0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
            year   gold   silver  bronze  total  rank_total
year        +1.00  -0.10  -0.10   -0.07   -0.09  +0.53
gold        -0.10  +1.00  +0.94   +0.87   +0.97  -0.43
silver      -0.10  +0.94  +1.00   +0.91   +0.98  -0.46
bronze      -0.07  +0.87  +0.91   +1.00   +0.95  -0.50
total       -0.09  +0.97  +0.98   +0.95   +1.00  -0.48
rank_total  +0.53  -0.43  -0.46   -0.50   -0.48  +1.00
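
For readers who want to reproduce a cell of this matrix, the following is a minimal pure-Python sketch of the Pearson coefficient. The three short lists are hypothetical values, not rows from the dataset, and Saturn's sampling and bounding steps are not modelled.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical rows: more gold goes with a larger total and a better (lower) rank.
gold = [0, 1, 2, 5, 10, 40]
total = [1, 3, 6, 14, 30, 110]
rank_total = [60, 45, 30, 15, 6, 1]
print(round(pearson(gold, total), 2), round(pearson(gold, rank_total), 2))
```

The signs match the matrix: the medal columns move together, while `rank_total` correlates negatively with them because rank 1 is the best placement.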

year numeric timestamp

Four-digit calendar years spanning 1896 to 2024 with 30 distinct values across 1,433 rows and no nulls. The distribution is left-skewed (skew -0.76) toward recent decades, with a median of 1992 and IQR from 1960 to 2008, suggesting coverage is sparser in the early 20th century. No outliers were flagged.

Treatment: Treat as a temporal feature; bucket by decade or use directly without scaling.
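
Decade bucketing, as suggested in the treatment, can be sketched with integer division. The helper name `decade` is illustrative and not part of Saturn.

```python
def decade(year: int) -> int:
    """Map a four-digit year to the first year of its decade."""
    return year // 10 * 10


# A few years from the reported range (1896 to 2024).
years = [1896, 1900, 1936, 1992, 2024]
buckets = {y: decade(y) for y in years}
print(buckets)
```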

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["year"].stats

stat          value
n             1,433
nulls         0 (0.0%)
unique        30
min           1896
max           2024
mean          1982
median        1992
std           33.95
q1            1960
q3            2008
iqr           48
skew          -0.7568
kurtosis      -0.408
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 8.
Distribution of year. Vertical dash marks the median.
Histogram bins for year (median: 1992.0).
bin               count
1896 – 1899       10
1899 – 1903       18
1903 – 1906       12
1906 – 1910       19
1910 – 1913       19
1913 – 1917       0
1917 – 1920       22
1920 – 1924       0
1924 – 1927       27
1927 – 1931       33
1931 – 1934       28
1934 – 1938       32
1938 – 1941       0
1941 – 1944       0
1944 – 1948       0
1948 – 1951       38
1951 – 1955       43
1955 – 1958       38
1958 – 1962       44
1962 – 1965       41
1965 – 1969       44
1969 – 1972       48
1972 – 1976       0
1976 – 1979       41
1979 – 1982       36
1982 – 1986       47
1986 – 1989       52
1989 – 1993       64
1993 – 1996       79
1996 – 2000       0
2000 – 2003       80
2003 – 2007       74
2007 – 2010       87
2010 – 2014       86
2014 – 2017       86
2017 – 2021       93
2021 – 2024       92

country categorical feature

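Three-letter country codes (e.g., FRA, GBR, USA, DEN, SUI; DEN and SUI are IOC-style rather than ISO codes) with 159 distinct values across 1,433 rows and no nulls. No single code dominates: the top value FRA accounts for only 2.1% of rows and the entropy ratio is 0.92. The top counts cluster between 28 and 30 because the data covers only 30 distinct Games years, which caps appearances at 30 if each country has one row per Games. The 20 codes shown account for 532 rows, so the remaining 139 codes average about 6.5 rows each; the distribution is flat at the top rather than uniform overall.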

Treatment: Group rare codes or target-encode before modelling; one-hot would create 159 sparse columns.
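
One way to group rare codes before encoding is to fold everything below a count threshold into a single bucket. This is a sketch only: the threshold, the `OTHER` label, and the placeholder codes `XXA` and `XXB` are all hypothetical.

```python
from collections import Counter


def group_rare(codes, min_count, other="OTHER"):
    """Replace codes that occur fewer than `min_count` times with `other`."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other for c in codes]


# Hypothetical column: three frequent codes and two one-off placeholders.
codes = ["FRA"] * 3 + ["GBR"] * 3 + ["USA"] * 2 + ["XXA", "XXB"]
grouped = group_rare(codes, min_count=2)
print(Counter(grouped))
```

With 159 codes and a long tail of infrequent ones, a threshold like this trades some detail for far fewer sparse columns than plain one-hot encoding.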

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["country"].stats

stat           value
n              1,433
nulls          0 (0.0%)
unique         159
top_value      FRA
top_rate       0.02094
cardinality    159
entropy        6.695
entropy_ratio  0.9155
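
The reported numbers are consistent with entropy measured in bits and normalised by the logarithm of the cardinality, since 6.695 / log2(159) ≈ 0.9155. That reading is an inference from the figures, not a documented Saturn definition. A minimal sketch under that assumption, using made-up labels:

```python
from collections import Counter
from math import log2


def entropy_bits(values):
    """Shannon entropy of the empirical distribution, in bits."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(set(values))
    return entropy_bits(values) / log2(k) if k > 1 else 0.0


# Hypothetical columns: one perfectly even, one dominated by a single value.
uniform = ["A", "B", "C", "D"] * 5
lopsided = ["A"] * 17 + ["B", "C", "D"]
print(round(entropy_ratio(uniform), 3), round(entropy_ratio(lopsided), 3))

# Cross-check of the reported figures for `country`.
reported_ratio = 6.695 / log2(159)
print(round(reported_ratio, 4))
```

A ratio of 1.0 means every value is equally frequent, so 0.92 indicates a fairly even spread without being uniform.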
Fig 9.
Top values for country.
Top values for country (20 unique shown, of 159 total).
value  count  share
FRA    30     2.1%
GBR    30     2.1%
USA    29     2.0%
DEN    29     2.0%
SUI    29     2.0%
HUN    28     2.0%
AUS    28     2.0%
BEL    28     2.0%
ITA    28     2.0%
SWE    28     2.0%
AUT    27     1.9%
NED    27     1.9%
CAN    27     1.9%
NOR    26     1.8%
FIN    26     1.8%
JPN    23     1.6%
NZL    23     1.6%
POL    23     1.6%
MEX    22     1.5%
GRE    21     1.5%

country_name categorical feature

Categorical country labels with 165 distinct values across 1,433 rows and no nulls. The top value 'France' covers only 2.09% of rows, the top ten countries each appear 28 to 30 times, and the entropy ratio is 0.91, so no single country dominates. There are six more names than codes (165 here vs 159 in `country`), so at least one code is paired with more than one name. The top counts are capped by the 30 distinct Games years, so the flat top reflects long-running participants rather than a uniform sample.

Treatment: Use as a grouping key or target-encode/one-hot for modelling; consider mapping to ISO codes for joins.
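
Because `country_name` has 165 distinct values and `country` only 159, it is worth listing the codes that carry more than one name before using either column as a join key. The sketch below assumes each record is a dict with `country` and `country_name` keys (the column names come from the schema; the record layout is an assumption), and the `XXA` rows are placeholders.

```python
from collections import defaultdict


def names_per_code(rows):
    """Return the codes that map to more than one name, with their names sorted."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row["country"]].add(row["country_name"])
    return {code: sorted(names) for code, names in mapping.items() if len(names) > 1}


# Hypothetical rows: one consistent code and one code with two spellings.
rows = [
    {"country": "FRA", "country_name": "France"},
    {"country": "FRA", "country_name": "France"},
    {"country": "XXA", "country_name": "Example Land"},
    {"country": "XXA", "country_name": "Republic of Example Land"},
]
print(names_per_code(rows))
```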

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["country_name"].stats

stat           value
n              1,433
nulls          0 (0.0%)
unique         165
top_value      France
top_rate       0.02094
cardinality    165
entropy        6.715
entropy_ratio  0.9116
Fig 10.
Top values for country_name.
Top values for country_name (20 unique shown, of 165 total).
value          count  share
France         30     2.1%
Great Britain  30     2.1%
United States  29     2.0%
Denmark        29     2.0%
Switzerland    29     2.0%
Hungary        28     2.0%
Australia      28     2.0%
Belgium        28     2.0%
Italy          28     2.0%
Sweden         28     2.0%
Austria        27     1.9%
Netherlands    27     1.9%
Canada         27     1.9%
Norway         26     1.8%
Finland        26     1.8%
Japan          23     1.6%
New Zealand    23     1.6%
Poland         23     1.6%
Mexico         22     1.5%
Greece         21     1.5%

gold numeric feature

Numeric count feature 'gold' ranging from 0 to 83 with median 1 and mean 4.06, so most rows sit near zero (zero_rate 0.339) while a long tail pulls the average up. The distribution is severely right-skewed (skew 4.26, kurtosis 23.14), with 134 outliers (9.35% of rows) lying more than 1.5 IQR above the q3 of 4, which puts the upper fence at 10 (4 + 1.5 × 4). Only 52 unique values across 1,433 rows is consistent with a discrete tally rather than a continuous measurement.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.
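
The log1p transform is log(1 + x), which stays defined at zero; that matters here because about a third of the rows have no gold medals. A minimal sketch with hypothetical values spanning the reported range (0 to 83):

```python
import math

# Hypothetical gold counts covering the reported min (0) and max (83).
gold = [0, 0, 1, 1, 4, 10, 83]

transformed = [math.log1p(g) for g in gold]             # log(1 + x), safe at 0
restored = [round(math.expm1(t)) for t in transformed]  # inverse transform

print([round(t, 3) for t in transformed])
```

The transform compresses the tail (83 maps to about 4.43) while leaving zeros at zero, and `math.expm1` recovers the original counts.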

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["gold"].stats

stat          value
n             1,433
nulls         0 (0.0%)
unique        52
min           0
max           83
mean          4.059
median        1
std           8.419
q1            0
q3            4
iqr           4
skew          4.259
kurtosis      23.14
n_outliers    134
outlier_rate  0.09351
zero_rate     0.3391
alert: high_skew  skew=+4.26
alert: outliers   9.4% rows beyond 1.5 IQR
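
The "beyond 1.5 IQR" wording matches Tukey's fences, where a value is flagged if it falls below q1 - 1.5·IQR or above q3 + 1.5·IQR. Assuming that is the rule Saturn applies, the fences follow directly from the reported quartiles:

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# (q1, q3) as reported in the per-column stats tables.
quartiles = {"gold": (0, 4), "silver": (0, 4), "bronze": (1, 5), "total": (2, 13)}
fences = {name: tukey_fences(q1, q3) for name, (q1, q3) in quartiles.items()}
for name, (low, high) in fences.items():
    print(f"{name}: flagged below {low} or above {high}")
```

Since the counts cannot be negative, only the upper fence matters: gold counts above 10 are flagged, and totals above 29.5.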
Fig 11.
Distribution of gold. Vertical dash marks the median.
Histogram bins for gold (median: 1.0).
bin               count
0 – 2.243         948
2.243 – 4.486     175
4.486 – 6.73      76
6.73 – 8.973      61
8.973 – 11.22     52
11.22 – 13.46     25
13.46 – 15.7      11
15.7 – 17.95      13
17.95 – 20.19     12
20.19 – 22.43     3
22.43 – 24.68     4
24.68 – 26.92     3
26.92 – 29.16     6
29.16 – 31.41     2
31.41 – 33.65     5
33.65 – 35.89     2
35.89 – 38.14     11
38.14 – 40.38     6
40.38 – 42.62     1
42.62 – 44.86     3
44.86 – 47.11     5
47.11 – 49.35     3
49.35 – 51.59     1
51.59 – 53.84     0
53.84 – 56.08     2
56.08 – 58.32     0
58.32 – 60.57     0
60.57 – 62.81     0
62.81 – 65.05     0
65.05 – 67.3      0
67.3 – 69.54      0
69.54 – 71.78     0
71.78 – 74.03     0
74.03 – 76.27     0
76.27 – 78.51     1
78.51 – 80.76     1
80.76 – 83        1

silver numeric feature

A non-negative integer count of silver medals, with 45 distinct values ranging from 0 to 79 and a median of 2. The distribution is heavily right-skewed (skew 4.03, kurtosis 23.2), with 25.3% zeros and 9.8% of rows flagged as outliers, so a small set of large counts pulls the mean (4.04) to roughly twice the median.

Treatment: Apply a log1p transform before modelling to tame the skew and heavy tail.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["silver"].stats

stat          value
n             1,433
nulls         0 (0.0%)
unique        45
min           0
max           79
mean          4.038
median        2
std           7.121
q1            0
q3            4
iqr           4
skew          4.026
kurtosis      23.21
n_outliers    140
outlier_rate  0.0977
zero_rate     0.2526
alert: high_skew  skew=+4.03
alert: outliers   9.8% rows beyond 1.5 IQR
Fig 12.
Distribution of silver. Vertical dash marks the median.
Histogram bins for silver (median: 2.0).
bin               count
0 – 2.135         894
2.135 – 4.27      187
4.27 – 6.405      111
6.405 – 8.541     59
8.541 – 10.68     42
10.68 – 12.81     32
12.81 – 14.95     18
14.95 – 17.08     16
17.08 – 19.22     13
19.22 – 21.35     8
21.35 – 23.49     7
23.49 – 25.62     5
25.62 – 27.76     9
27.76 – 29.89     5
29.89 – 32.03     10
32.03 – 34.16     1
34.16 – 36.3      3
36.3 – 38.43      3
38.43 – 40.57     2
40.57 – 42.7      3
42.7 – 44.84      1
44.84 – 46.97     0
46.97 – 49.11     0
49.11 – 51.24     1
51.24 – 53.38     0
53.38 – 55.51     0
55.51 – 57.65     0
57.65 – 59.78     0
59.78 – 61.92     1
61.92 – 64.05     0
64.05 – 66.19     0
66.19 – 68.32     0
68.32 – 70.46     1
70.46 – 72.59     0
72.59 – 74.73     0
74.73 – 76.86     0
76.86 – 79        1

bronze numeric feature

This is a count of bronze medals per record, with 1,433 rows, 44 distinct integer values from 0 to 78, and no nulls. The distribution is heavily right-skewed (skew 3.37, kurtosis 16.94): the median is 2 and Q3 is 5, yet the max reaches 78, producing 150 outliers (10.5%). Roughly 19.8% of rows are zero, meaning that in about one row in five the country won no bronze at that Games.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["bronze"].stats

stat          value
n             1,433
nulls         0 (0.0%)
unique        44
min           0
max           78
mean          4.398
median        2
std           6.853
q1            1
q3            5
iqr           4
skew          3.37
kurtosis      16.94
n_outliers    150
outlier_rate  0.1047
zero_rate     0.1982
alert: high_skew  skew=+3.37
alert: outliers   10.5% rows beyond 1.5 IQR
Fig 13.
Distribution of bronze. Vertical dash marks the median.
Histogram bins for bronze (median: 2.0).
bin               count
0 – 2.108         841
2.108 – 4.216     205
4.216 – 6.324     105
6.324 – 8.432     53
8.432 – 10.54     60
10.54 – 12.65     41
12.65 – 14.76     23
14.76 – 16.86     22
16.86 – 18.97     17
18.97 – 21.08     12
21.08 – 23.19     10
23.19 – 25.3      5
25.3 – 27.41      8
27.41 – 29.51     5
29.51 – 31.62     8
31.62 – 33.73     4
33.73 – 35.84     3
35.84 – 37.95     3
37.95 – 40.05     3
40.05 – 42.16     2
42.16 – 44.27     0
44.27 – 46.38     2
46.38 – 48.49     0
48.49 – 50.59     0
50.59 – 52.7      0
52.7 – 54.81      0
54.81 – 56.92     0
56.92 – 59.03     0
59.03 – 61.14     0
61.14 – 63.24     0
63.24 – 65.35     0
65.35 – 67.46     0
67.46 – 69.57     0
69.57 – 71.68     0
71.68 – 73.78     0
73.78 – 75.89     0
75.89 – 78        1

total numeric feature

This appears to be a count-style numeric feature (total), heavily right-skewed: the median is 5 while the mean is 12.5 and the max reaches 234. Skew of 3.92 and kurtosis of 20.8 confirm a long tail, with 151 values (10.5%) flagged as outliers. No nulls or zeros, and only 97 unique values across 1,433 rows, suggesting a discrete count with a small repeating vocabulary.

Treatment: log1p-transform before modelling to tame the heavy right tail.
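
The reported means suggest that `total` is the sum of the three medal columns, since 4.059 + 4.038 + 4.398 = 12.495, which rounds to the reported 12.5. The report does not state this relationship, so it is worth checking row by row before treating `total` as redundant. A sketch with hypothetical rows:

```python
def inconsistent_rows(rows):
    """Return rows whose medal columns do not add up to `total`."""
    return [r for r in rows if r["gold"] + r["silver"] + r["bronze"] != r["total"]]


# Hypothetical rows; the last one is deliberately wrong.
rows = [
    {"gold": 1, "silver": 0, "bronze": 2, "total": 3},
    {"gold": 0, "silver": 0, "bronze": 1, "total": 1},
    {"gold": 2, "silver": 2, "bronze": 2, "total": 5},
]
bad = inconsistent_rows(rows)
print(len(bad), "inconsistent row(s)")

# Gap between the sum of the reported column means and the reported mean of total.
mean_gap = abs((4.059 + 4.038 + 4.398) - 12.5)
```

If the check passes on the real data, the correlations of 0.95 to 0.98 between `total` and the individual medal columns are expected, and a model should use either `total` or its components, not both.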

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["total"].stats

stat          value
n             1,433
nulls         0 (0.0%)
unique        97
min           1
max           234
mean          12.5
median        5
std           21.66
q1            2
q3            13
iqr           11
skew          3.922
kurtosis      20.8
n_outliers    151
outlier_rate  0.1054
zero_rate     0
alert: high_skew  skew=+3.92
alert: outliers   10.5% rows beyond 1.5 IQR
Fig 14.
Distribution of total. Vertical dash marks the median.
Histogram bins for total (median: 5.0).
bin               count
1 – 7.297         907
7.297 – 13.59     176
13.59 – 19.89     99
19.89 – 26.19     77
26.19 – 32.49     42
32.49 – 38.78     28
38.78 – 45.08     20
45.08 – 51.38     11
51.38 – 57.68     7
57.68 – 63.97     7
63.97 – 70.27     12
70.27 – 76.57     5
76.57 – 82.86     1
82.86 – 89.16     3
89.16 – 95.46     11
95.46 – 101.8     8
101.8 – 108.1     5
108.1 – 114.4     5
114.4 – 120.6     0
120.6 – 126.9     4
126.9 – 133.2     1
133.2 – 139.5     0
139.5 – 145.8     0
145.8 – 152.1     1
152.1 – 158.4     0
158.4 – 164.7     0
164.7 – 171       0
171 – 177.3       1
177.3 – 183.6     0
183.6 – 189.9     0
189.9 – 196.2     1
196.2 – 202.5     0
202.5 – 208.8     0
208.8 – 215.1     0
215.1 – 221.4     0
221.4 – 227.7     0
227.7 – 234       1

rank_total numeric feature

Integer-valued ranking field spanning 1 to 93 with 93 unique values across 1,433 rows, consistent with a rank table that restarts for each group (e.g., each Games). The distribution is right-skewed (skew 0.74) with median 26 below mean 31.06: numerically low ranks (the best placements) occur most often, and counts thin out toward 93. No nulls, no zeros, and no outliers flagged given the bounded range.

Treatment: Use as an ordinal feature; consider inverting (e.g., 94 - rank) so higher means better before modelling.
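
The inversion suggested in the treatment is a one-line mapping. The helper name `invert_rank` is illustrative, and the default of 93 is the reported maximum of `rank_total`.

```python
def invert_rank(rank: int, max_rank: int = 93) -> int:
    """Flip a rank so that larger values mean better placements."""
    return max_rank + 1 - rank


# Best, median and worst reported ranks.
ranks = [1, 26, 93]
inverted = [invert_rank(r) for r in ranks]
print(inverted)
```

After inversion the feature points the same way as the medal counts, so its correlations with them change sign from negative to positive.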

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["rank_total"].stats

stat          value
n             1,433
nulls         0 (0.0%)
unique        93
min           1
max           93
mean          31.06
median        26
std           22.7
q1            13
q3            45
iqr           32
skew          0.739
kurtosis      -0.3677
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 15.
Distribution of rank_total. Vertical dash marks the median.
Histogram bins for rank_total (median: 26.0).
bin               count
1 – 3.486         90
3.486 – 5.973     60
5.973 – 8.459     90
8.459 – 10.95     60
10.95 – 13.43     86
13.43 – 15.92     56
15.92 – 18.41     84
18.41 – 20.89     52
20.89 – 23.38     74
23.38 – 25.86     48
25.86 – 28.35     71
28.35 – 30.84     44
30.84 – 33.32     65
33.32 – 35.81     40
35.81 – 38.3      58
38.3 – 40.78      34
40.78 – 43.27     47
43.27 – 45.76     26
45.76 – 48.24     35
48.24 – 50.73     20
50.73 – 53.22     29
53.22 – 55.7      18
55.7 – 58.19      27
58.19 – 60.68     18
60.68 – 63.16     27
63.16 – 65.65     17
65.65 – 68.14     24
68.14 – 70.62     16
70.62 – 73.11     24
73.11 – 75.59     15
75.59 – 78.08     21
78.08 – 80.57     13
80.57 – 83.05     15
83.05 – 85.54     10
85.54 – 88.03     10
88.03 – 90.51     4
90.51 – 93        5

How to cite

BibTeX
@misc{saturn-olympics-olympic-medals-data-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: olympics olympic medals data},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/olympics-olympic_medals_data}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: olympics olympic medals data. Source: /home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/olympics-olympic_medals_data