olympics-olympic_medals_data · saturn notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json

Saturn profiled 1,433 rows across 8 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/cultural/olympics/olympic_medals_data.json",
    "--findings", "olympics-olympic_medals_data.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset contains 1,433 rows of Olympic medal counts by country and year, spanning 1896 to 2024 (30 distinct years) and covering 159 country codes (165 distinct country names). Medal columns (gold, silver, bronze, total) are heavily right-skewed with high kurtosis and roughly 9 to 11% of rows flagged as outliers: a small number of dominant nations pull the means well above the medians (e.g. total has a median of 5, a mean of 12.5, and a max of 234). Zero rates are notable too: 33.9% of rows have zero gold medals, 25.3% zero silver, and 19.8% zero bronze, reflecting how often a medal-winning country leaves a Games without a medal of a given colour. Appearance counts are tightly clustered at the top, with France and Great Britain tied as the most frequent entries (30 appearances each). Start by examining the shape of the `total` and `gold` distributions and the `year` coverage to understand era effects.

citing: row_count · column_count · columns.total.stats · columns.gold.stats · columns.silver.stats · columns.bronze.stats · columns.country.top_values · columns.year.stats

Out[4]:

saturn.schema() · 8 columns

column	kind	n	null%	unique	alerts
year	numeric	1,433	0.0%	30
country	categorical	1,433	0.0%	159
country_name	categorical	1,433	0.0%	165
gold	numeric	1,433	0.0%	52	high_skew outliers
silver	numeric	1,433	0.0%	45	high_skew outliers
bronze	numeric	1,433	0.0%	44	high_skew outliers
total	numeric	1,433	0.0%	97	high_skew outliers
rank_total	numeric	1,433	0.0%	93

Fig 1.

total · Highly skewed total medal counts — most rows are small while a few mega-hauls reach 234.

Histogram bins for total (median: 5.0).
bin	count
1 – 7.297	907
7.297 – 13.59	176
13.59 – 19.89	99
19.89 – 26.19	77
26.19 – 32.49	42
32.49 – 38.78	28
38.78 – 45.08	20
45.08 – 51.38	11
51.38 – 57.68	7
57.68 – 63.97	7
63.97 – 70.27	12
70.27 – 76.57	5
76.57 – 82.86	1
82.86 – 89.16	3
89.16 – 95.46	11
95.46 – 101.8	8
101.8 – 108.1	5
108.1 – 114.4	5
114.4 – 120.6	0
120.6 – 126.9	4
126.9 – 133.2	1
133.2 – 139.5	0
139.5 – 145.8	0
145.8 – 152.1	1
152.1 – 158.4	0
158.4 – 164.7	0
164.7 – 171	0
171 – 177.3	1
177.3 – 183.6	0
183.6 – 189.9	0
189.9 – 196.2	1
196.2 – 202.5	0
202.5 – 208.8	0
208.8 – 215.1	0
215.1 – 221.4	0
221.4 – 227.7	0
227.7 – 234	1

Fig 2.

gold · Gold medals are zero for ~34% of rows; check the long right tail of dominant performances.

Histogram bins for gold (median: 1.0).
bin	count
0 – 2.243	948
2.243 – 4.486	175
4.486 – 6.73	76
6.73 – 8.973	61
8.973 – 11.22	52
11.22 – 13.46	25
13.46 – 15.7	11
15.7 – 17.95	13
17.95 – 20.19	12
20.19 – 22.43	3
22.43 – 24.68	4
24.68 – 26.92	3
26.92 – 29.16	6
29.16 – 31.41	2
31.41 – 33.65	5
33.65 – 35.89	2
35.89 – 38.14	11
38.14 – 40.38	6
40.38 – 42.62	1
42.62 – 44.86	3
44.86 – 47.11	5
47.11 – 49.35	3
49.35 – 51.59	1
51.59 – 53.84	0
53.84 – 56.08	2
56.08 – 58.32	0
58.32 – 60.57	0
60.57 – 62.81	0
62.81 – 65.05	0
65.05 – 67.3	0
67.3 – 69.54	0
69.54 – 71.78	0
71.78 – 74.03	0
74.03 – 76.27	0
76.27 – 78.51	1
78.51 – 80.76	1
80.76 – 83	1

Fig 3.

year · Coverage spans 1896–2024 with a left skew — more recent Games contribute more rows.

Histogram bins for year (median: 1992.0).
bin	count
1896 – 1899	10
1899 – 1903	18
1903 – 1906	12
1906 – 1910	19
1910 – 1913	19
1913 – 1917	0
1917 – 1920	22
1920 – 1924	0
1924 – 1927	27
1927 – 1931	33
1931 – 1934	28
1934 – 1938	32
1938 – 1941	0
1941 – 1944	0
1944 – 1948	0
1948 – 1951	38
1951 – 1955	43
1955 – 1958	38
1958 – 1962	44
1962 – 1965	41
1965 – 1969	44
1969 – 1972	48
1972 – 1976	0
1976 – 1979	41
1979 – 1982	36
1982 – 1986	47
1986 – 1989	52
1989 – 1993	64
1993 – 1996	79
1996 – 2000	0
2000 – 2003	80
2003 – 2007	74
2007 – 2010	87
2010 – 2014	86
2014 – 2017	86
2017 – 2021	93
2021 – 2024	92

Fig 4.

country · Top countries by appearances are tightly clustered, with France and Great Britain leading at 30 each.

Top values for country (20 unique shown, of 159 total).
value	count	share
FRA	30	2.1%
GBR	30	2.1%
USA	29	2.0%
DEN	29	2.0%
SUI	29	2.0%
HUN	28	2.0%
AUS	28	2.0%
BEL	28	2.0%
ITA	28	2.0%
SWE	28	2.0%
AUT	27	1.9%
NED	27	1.9%
CAN	27	1.9%
NOR	26	1.8%
FIN	26	1.8%
JPN	23	1.6%
NZL	23	1.6%
POL	23	1.6%
MEX	22	1.5%
GRE	21	1.5%

Fig 5.

rank_total · Ranks span 1 to 93 with counts tapering toward the high end (median 26); useful context for interpreting medal totals.

Histogram bins for rank_total (median: 26.0).
bin	count
1 – 3.486	90
3.486 – 5.973	60
5.973 – 8.459	90
8.459 – 10.95	60
10.95 – 13.43	86
13.43 – 15.92	56
15.92 – 18.41	84
18.41 – 20.89	52
20.89 – 23.38	74
23.38 – 25.86	48
25.86 – 28.35	71
28.35 – 30.84	44
30.84 – 33.32	65
33.32 – 35.81	40
35.81 – 38.3	58
38.3 – 40.78	34
40.78 – 43.27	47
43.27 – 45.76	26
45.76 – 48.24	35
48.24 – 50.73	20
50.73 – 53.22	29
53.22 – 55.7	18
55.7 – 58.19	27
58.19 – 60.68	18
60.68 – 63.16	27
63.16 – 65.65	17
65.65 – 68.14	24
68.14 – 70.62	16
70.62 – 73.11	24
73.11 – 75.59	15
75.59 – 78.08	21
78.08 – 80.57	13
80.57 – 83.05	15
83.05 – 85.54	10
85.54 – 88.03	10
88.03 – 90.51	4
90.51 – 93	5

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
year	numeric	0.0%
country	categorical	0.0%
country_name	categorical	0.0%
gold	numeric	0.0%
silver	numeric	0.0%
bronze	numeric	0.0%
total	numeric	0.0%
rank_total	numeric	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
	year	gold	silver	bronze	total	rank_total
year	+1.00	-0.10	-0.10	-0.07	-0.09	+0.53
gold	-0.10	+1.00	+0.94	+0.87	+0.97	-0.43
silver	-0.10	+0.94	+1.00	+0.91	+0.98	-0.46
bronze	-0.07	+0.87	+0.91	+1.00	+0.95	-0.50
total	-0.09	+0.97	+0.98	+0.95	+1.00	-0.48
rank_total	+0.53	-0.43	-0.46	-0.50	-0.48	+1.00
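
For readers who want to recompute a cell of this matrix, the Pearson coefficient can be sketched in plain Python as below. This is only an illustration of the statistic, not Saturn's implementation; the function name `pearson` and the toy rows are assumptions and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy rows, not taken from the dataset.
gold = [0, 1, 2, 5, 10]
total = [1, 4, 7, 16, 31]  # exactly 3 * gold + 1, so the correlation is 1
rank = [40, 30, 20, 10, 1]  # better (lower) rank as gold rises

r_gold_total = pearson(gold, total)
r_gold_rank = pearson(gold, rank)
print(f"gold vs total: {r_gold_total:+.2f}")
print(f"gold vs rank:  {r_gold_rank:+.2f}")
```

The signs mirror the matrix above: medal columns move together, while rank falls as medal counts rise. A constant column would make the denominator zero, which this sketch does not guard against.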

year numeric timestamp

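Four-digit calendar years spanning 1896 to 2024 with 30 distinct values across 1,433 rows and no nulls. The distribution is left-skewed (skew -0.76): rows concentrate in recent decades and the tail runs back toward 1896, with a median of 1992 and an IQR from 1960 to 2008. Early Games therefore contribute far fewer rows than recent ones. Empty histogram bins do not indicate missing data: bins are about 3.5 years wide while Games are four years apart, and no Games took place in 1916, 1940, or 1944. No outliers were flagged.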

Treatment: Treat as a temporal feature; bucket by decade or use directly without scaling.

anthropic:claude-opus-4-7 · confidence high
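
The decade bucketing suggested in the treatment can be sketched in a few lines of plain Python. This is a minimal illustration rather than part of Saturn's output; the helper name `decade` is an assumption, and the sample years are the min, q1, median, q3, and max reported for `year`.

```python
def decade(year: int) -> int:
    """Return the first year of the decade that contains `year`."""
    return year - year % 10


# min, q1, median, q3, max of `year` as reported in the stats table.
sample_years = [1896, 1960, 1992, 2008, 2024]
decades = [decade(y) for y in sample_years]
print(decades)  # [1890, 1960, 1990, 2000, 2020]
```

Bucketing this way collapses the 30 distinct Games years into 14 decade labels at most, which may be easier to use as a grouping key than the raw year.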

Out[13]:

saturn.columns["year"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	30
min	1896
max	2024
mean	1982
median	1992
std	33.95
q1	1960
q3	2008
iqr	48
skew	-0.7568
kurtosis	-0.408
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 8.

Distribution of year. Vertical dash marks the median.

Histogram bins for year (median: 1992.0).
bin	count
1896 – 1899	10
1899 – 1903	18
1903 – 1906	12
1906 – 1910	19
1910 – 1913	19
1913 – 1917	0
1917 – 1920	22
1920 – 1924	0
1924 – 1927	27
1927 – 1931	33
1931 – 1934	28
1934 – 1938	32
1938 – 1941	0
1941 – 1944	0
1944 – 1948	0
1948 – 1951	38
1951 – 1955	43
1955 – 1958	38
1958 – 1962	44
1962 – 1965	41
1965 – 1969	44
1969 – 1972	48
1972 – 1976	0
1976 – 1979	41
1979 – 1982	36
1982 – 1986	47
1986 – 1989	52
1989 – 1993	64
1993 – 1996	79
1996 – 2000	0
2000 – 2003	80
2003 – 2007	74
2007 – 2010	87
2010 – 2014	86
2014 – 2017	86
2017 – 2021	93
2021 – 2024	92

country categorical feature

Three-letter country codes (e.g., FRA, GBR, USA, DEN, SUI) covering 159 distinct codes across 1,433 rows with no nulls. The distribution is flat: the top value FRA accounts for only 2.1% of rows and the entropy ratio is 0.92, so no country dominates. The top ten codes each appear 28 to 30 times, which is close to the ceiling of 30 distinct Games years if each country has at most one row per Games; the flatness therefore points to long-standing participants rather than to a sampling design.

Treatment: Group rare codes or target-encode before modelling; one-hot would create 159 sparse columns.

anthropic:claude-opus-4-7 · confidence high
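
One way to group rare codes, as the treatment suggests, is to fold every code seen fewer than some threshold number of times into a single bucket. The sketch below assumes a threshold of 5 and a bucket label of `OTHER`; both are arbitrary illustrative choices, and the function name `group_rare` is hypothetical.

```python
from collections import Counter


def group_rare(codes, min_count=5, other="OTHER"):
    """Replace codes seen fewer than `min_count` times with `other`."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other for c in codes]


# FRA and GRE counts come from the top-values table; XXA and XXB are
# made-up placeholder codes standing in for rarely seen countries.
codes = ["FRA"] * 30 + ["GRE"] * 21 + ["XXA"] * 2 + ["XXB"]
grouped = Counter(group_rare(codes))
print(grouped)
```

In this toy input the three placeholder rows end up under `OTHER`, while FRA and GRE keep their own categories.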

Out[16]:

saturn.columns["country"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	159
top_value	FRA
top_rate	0.02094
cardinality	159
entropy	6.695
entropy_ratio	0.9155
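
The reported `entropy_ratio` is consistent with Shannon entropy in bits divided by its maximum possible value, log2 of the cardinality: 6.695 / log2(159) ≈ 0.9155. The sketch below reproduces that arithmetic. The normalisation is inferred from the numbers above rather than from Saturn's documentation, and the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_ratio(values):
    """Shannon entropy (bits) of `values` divided by log2(number of distinct values)."""
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 0.0  # a single category carries no information
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(k)


# Check against the reported stats: entropy 6.695 bits, cardinality 159.
reported_ratio = 6.695 / math.log2(159)
print(round(reported_ratio, 4))  # 0.9155

# A perfectly uniform column has ratio 1.0.
uniform_ratio = entropy_ratio(["a", "b", "c", "d"])
```

A ratio near 1 means the rows are spread almost evenly over the categories; a ratio near 0 means a few categories dominate.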

Fig 9.

Top values for country.

Top values for country (20 unique shown, of 159 total).
value	count	share
FRA	30	2.1%
GBR	30	2.1%
USA	29	2.0%
DEN	29	2.0%
SUI	29	2.0%
HUN	28	2.0%
AUS	28	2.0%
BEL	28	2.0%
ITA	28	2.0%
SWE	28	2.0%
AUT	27	1.9%
NED	27	1.9%
CAN	27	1.9%
NOR	26	1.8%
FIN	26	1.8%
JPN	23	1.6%
NZL	23	1.6%
POL	23	1.6%
MEX	22	1.5%
GRE	21	1.5%

country_name categorical feature

Country names with 165 distinct values across 1,433 rows and no nulls. The distribution is flat: the top value 'France' covers only 2.09% of rows, and the top ten countries each appear 28 to 30 times, giving an entropy ratio of 0.91 (near-uniform). There are six more distinct names than codes in `country` (165 vs 159), so some codes map to more than one name. The data reads as a panel with at most one observation per country per Games rather than a skewed real-world sample.

Treatment: Use as a grouping key or target-encode/one-hot for modelling; consider mapping to ISO codes for joins.

anthropic:claude-opus-4-7 · confidence high
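
Because the schema reports 165 distinct names against 159 distinct codes, it may be worth listing the codes that carry more than one name before using either column as a join key. The following is a minimal sketch of that check; the function name and the placeholder rows are assumptions, not values from the dataset.

```python
from collections import defaultdict


def codes_with_multiple_names(pairs):
    """Collect the names seen for each code and keep codes with more than one."""
    names_by_code = defaultdict(set)
    for code, name in pairs:
        names_by_code[code].add(name)
    return {
        code: sorted(names)
        for code, names in names_by_code.items()
        if len(names) > 1
    }


# FRA and GBR pairs match the top-values tables; the XXX rows are
# made-up placeholders showing one code carrying two names.
pairs = [
    ("FRA", "France"),
    ("GBR", "Great Britain"),
    ("XXX", "Placeholder Name A"),
    ("XXX", "Placeholder Name B"),
]
conflicts = codes_with_multiple_names(pairs)
print(conflicts)
# {'XXX': ['Placeholder Name A', 'Placeholder Name B']}
```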

Out[19]:

saturn.columns["country_name"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	165
top_value	France
top_rate	0.02094
cardinality	165
entropy	6.715
entropy_ratio	0.9116

Fig 10.

Top values for country_name.

Top values for country_name (20 unique shown, of 165 total).
value	count	share
France	30	2.1%
Great Britain	30	2.1%
United States	29	2.0%
Denmark	29	2.0%
Switzerland	29	2.0%
Hungary	28	2.0%
Australia	28	2.0%
Belgium	28	2.0%
Italy	28	2.0%
Sweden	28	2.0%
Austria	27	1.9%
Netherlands	27	1.9%
Canada	27	1.9%
Norway	26	1.8%
Finland	26	1.8%
Japan	23	1.6%
New Zealand	23	1.6%
Poland	23	1.6%
Mexico	22	1.5%
Greece	21	1.5%

gold numeric feature

Count of gold medals per row, ranging from 0 to 83 with median 1 and mean 4.06, so most rows sit near zero (zero_rate 0.339) while a long tail pulls the average up. The distribution is severely right-skewed (skew 4.26, kurtosis 23.14), with 134 outliers (9.35% of rows) beyond 1.5 IQR, i.e. above 10 if the fence is taken as q3 + 1.5 × IQR = 4 + 6. Only 52 unique values across 1,433 rows is consistent with a discrete tally rather than a continuous measurement.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
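
To see what the suggested log1p transform does to this column, the sketch below applies it to four summary values of `gold` from the stats table. It uses only the standard library; the dictionary names are illustrative.

```python
import math

# Summary values of `gold` taken from the stats table.
gold_summary = {"min": 0, "median": 1, "q3": 4, "max": 83}

# log1p(x) = log(1 + x), so zero counts stay at zero instead of becoming -inf.
transformed = {name: math.log1p(value) for name, value in gold_summary.items()}
for name, value in transformed.items():
    print(f"{name}: {value:.3f}")

# math.expm1 inverts the transform, e.g. to map predictions back to counts.
restored = {name: round(math.expm1(value)) for name, value in transformed.items()}
```

The raw range of 0 to 83 compresses to roughly 0 to 4.43, which pulls the extreme hauls much closer to the bulk of the rows.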

Out[22]:

saturn.columns["gold"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	52
min	0
max	83
mean	4.059
median	1
std	8.419
q1	0
q3	4
iqr	4
skew	4.259
kurtosis	23.14
n_outliers	134
outlier_rate	0.09351
zero_rate	0.3391
alert: high_skew	skew=+4.26
alert: outliers	9.4% rows beyond 1.5 IQR
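
The outlier alert says "beyond 1.5 IQR". Assuming this means the usual Tukey fences (q1 - 1.5 × IQR and q3 + 1.5 × IQR), the cut-offs for the four medal columns can be worked out from the quartiles in their stats tables. The function name below is hypothetical, and Saturn's exact rule is not confirmed by this report.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# (q1, q3) pairs copied from the per-column stats tables.
quartiles = {"gold": (0, 4), "silver": (0, 4), "bronze": (1, 5), "total": (2, 13)}
fences = {col: tukey_fences(q1, q3) for col, (q1, q3) in quartiles.items()}
for col, (lower, upper) in fences.items():
    print(f"{col}: outlier if value < {lower} or value > {upper}")
```

Since every medal count is non-negative, the lower fences are never reached, and all flagged rows sit above the upper fence (10 for gold and silver, 11 for bronze, 29.5 for total).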

Fig 11.

Distribution of gold. Vertical dash marks the median.

Histogram bins for gold (median: 1.0).
bin	count
0 – 2.243	948
2.243 – 4.486	175
4.486 – 6.73	76
6.73 – 8.973	61
8.973 – 11.22	52
11.22 – 13.46	25
13.46 – 15.7	11
15.7 – 17.95	13
17.95 – 20.19	12
20.19 – 22.43	3
22.43 – 24.68	4
24.68 – 26.92	3
26.92 – 29.16	6
29.16 – 31.41	2
31.41 – 33.65	5
33.65 – 35.89	2
35.89 – 38.14	11
38.14 – 40.38	6
40.38 – 42.62	1
42.62 – 44.86	3
44.86 – 47.11	5
47.11 – 49.35	3
49.35 – 51.59	1
51.59 – 53.84	0
53.84 – 56.08	2
56.08 – 58.32	0
58.32 – 60.57	0
60.57 – 62.81	0
62.81 – 65.05	0
65.05 – 67.3	0
67.3 – 69.54	0
69.54 – 71.78	0
71.78 – 74.03	0
74.03 – 76.27	0
76.27 – 78.51	1
78.51 – 80.76	1
80.76 – 83	1

silver numeric feature

A non-negative integer count of silver medals, with 45 distinct values ranging from 0 to 79 and a median of 2. The distribution is heavily right-skewed (skew 4.03, kurtosis 23.2) with 25.3% zeros and 9.8% of rows flagged as outliers, so a small set of large counts pulls the mean (4.04) to about twice the median.

Treatment: Apply a log1p transform before modelling to tame the skew and heavy tail.

anthropic:claude-opus-4-7 · confidence high

Out[25]:

saturn.columns["silver"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	45
min	0
max	79
mean	4.038
median	2
std	7.121
q1	0
q3	4
iqr	4
skew	4.026
kurtosis	23.21
n_outliers	140
outlier_rate	0.0977
zero_rate	0.2526
alert: high_skew	skew=+4.03
alert: outliers	9.8% rows beyond 1.5 IQR

Fig 12.

Distribution of silver. Vertical dash marks the median.

Histogram bins for silver (median: 2.0).
bin	count
0 – 2.135	894
2.135 – 4.27	187
4.27 – 6.405	111
6.405 – 8.541	59
8.541 – 10.68	42
10.68 – 12.81	32
12.81 – 14.95	18
14.95 – 17.08	16
17.08 – 19.22	13
19.22 – 21.35	8
21.35 – 23.49	7
23.49 – 25.62	5
25.62 – 27.76	9
27.76 – 29.89	5
29.89 – 32.03	10
32.03 – 34.16	1
34.16 – 36.3	3
36.3 – 38.43	3
38.43 – 40.57	2
40.57 – 42.7	3
42.7 – 44.84	1
44.84 – 46.97	0
46.97 – 49.11	0
49.11 – 51.24	1
51.24 – 53.38	0
53.38 – 55.51	0
55.51 – 57.65	0
57.65 – 59.78	0
59.78 – 61.92	1
61.92 – 64.05	0
64.05 – 66.19	0
66.19 – 68.32	0
68.32 – 70.46	1
70.46 – 72.59	0
72.59 – 74.73	0
74.73 – 76.86	0
76.86 – 79	1

bronze numeric feature

A count of bronze medals per row, with 1,433 rows, 44 distinct integer values from 0 to 78, and no nulls. The distribution is heavily right-skewed (skew 3.37, kurtosis 16.94): the median is 2 and q3 is 5, yet the max reaches 78, producing 150 outliers (10.5%). Roughly 19.8% of rows are zero, so in about one in five rows the country won no bronze at that Games.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[28]:

saturn.columns["bronze"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	44
min	0
max	78
mean	4.398
median	2
std	6.853
q1	1
q3	5
iqr	4
skew	3.37
kurtosis	16.94
n_outliers	150
outlier_rate	0.1047
zero_rate	0.1982
alert: high_skew	skew=+3.37
alert: outliers	10.5% rows beyond 1.5 IQR

Fig 13.

Distribution of bronze. Vertical dash marks the median.

Histogram bins for bronze (median: 2.0).
bin	count
0 – 2.108	841
2.108 – 4.216	205
4.216 – 6.324	105
6.324 – 8.432	53
8.432 – 10.54	60
10.54 – 12.65	41
12.65 – 14.76	23
14.76 – 16.86	22
16.86 – 18.97	17
18.97 – 21.08	12
21.08 – 23.19	10
23.19 – 25.3	5
25.3 – 27.41	8
27.41 – 29.51	5
29.51 – 31.62	8
31.62 – 33.73	4
33.73 – 35.84	3
35.84 – 37.95	3
37.95 – 40.05	3
40.05 – 42.16	2
42.16 – 44.27	0
44.27 – 46.38	2
46.38 – 48.49	0
48.49 – 50.59	0
50.59 – 52.7	0
52.7 – 54.81	0
54.81 – 56.92	0
56.92 – 59.03	0
59.03 – 61.14	0
61.14 – 63.24	0
63.24 – 65.35	0
65.35 – 67.46	0
67.46 – 69.57	0
69.57 – 71.68	0
71.68 – 73.78	0
73.78 – 75.89	0
75.89 – 78	1

total numeric feature

Total medal count per row, heavily right-skewed: the median is 5 while the mean is 12.5 and the max reaches 234. Skew of 3.92 and kurtosis of 20.8 confirm a long tail, with 151 values (10.5%) flagged as outliers. The mean matches the sum of the gold, silver, and bronze means (4.06 + 4.04 + 4.40 ≈ 12.5), consistent with total being their sum. There are no nulls and no zeros (min is 1), so only medal-winning entries appear, and the 97 unique values across 1,433 rows are what a discrete count would produce.

Treatment: log1p-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[31]:

saturn.columns["total"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	97
min	1
max	234
mean	12.5
median	5
std	21.66
q1	2
q3	13
iqr	11
skew	3.922
kurtosis	20.8
n_outliers	151
outlier_rate	0.1054
zero_rate	0
alert: high_skew	skew=+3.92
alert: outliers	10.5% rows beyond 1.5 IQR

Fig 14.

Distribution of total. Vertical dash marks the median.

Histogram bins for total (median: 5.0).
bin	count
1 – 7.297	907
7.297 – 13.59	176
13.59 – 19.89	99
19.89 – 26.19	77
26.19 – 32.49	42
32.49 – 38.78	28
38.78 – 45.08	20
45.08 – 51.38	11
51.38 – 57.68	7
57.68 – 63.97	7
63.97 – 70.27	12
70.27 – 76.57	5
76.57 – 82.86	1
82.86 – 89.16	3
89.16 – 95.46	11
95.46 – 101.8	8
101.8 – 108.1	5
108.1 – 114.4	5
114.4 – 120.6	0
120.6 – 126.9	4
126.9 – 133.2	1
133.2 – 139.5	0
139.5 – 145.8	0
145.8 – 152.1	1
152.1 – 158.4	0
158.4 – 164.7	0
164.7 – 171	0
171 – 177.3	1
177.3 – 183.6	0
183.6 – 189.9	0
189.9 – 196.2	1
196.2 – 202.5	0
202.5 – 208.8	0
208.8 – 215.1	0
215.1 – 221.4	0
221.4 – 227.7	0
227.7 – 234	1

rank_total numeric feature

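Integer-valued ranking field spanning 1 to 93 with 93 unique values across 1,433 rows, consistent with a rank table that restarts for each Games. The distribution is right-skewed (skew 0.74) with median 26 below mean 31.06: the best placements (numerically low ranks) occur at every Games, while ranks toward 93 presumably occur only at Games with many medal-winning countries. The alternating counts in the histogram (90, 60, 90, 60, ...) are a binning artefact, since bins about 2.49 wide alternately cover three and two integer ranks. No nulls, no zeros, and no outliers were flagged.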

Treatment: Use as an ordinal feature; consider inverting (e.g., 94 - rank) so higher means better before modelling.

anthropic:claude-opus-4-7 · confidence high
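
The inversion suggested in the treatment (94 - rank) can be written as a small helper. This is a sketch under the assumption that 93 is used as a global maximum rank; the function name is hypothetical.

```python
def invert_rank(rank: int, max_rank: int = 93) -> int:
    """Flip a 1-based rank so that larger values mean better placements."""
    if not 1 <= rank <= max_rank:
        raise ValueError(f"rank {rank} outside 1..{max_rank}")
    return max_rank + 1 - rank


# min, q1, median, q3, max of `rank_total` from the stats table.
ranks = [1, 13, 26, 45, 93]
inverted = [invert_rank(r) for r in ranks]
print(inverted)  # [93, 81, 68, 49, 1]
```

If the number of ranked countries differs between Games, passing a per-Games `max_rank` instead of the global 93 may give a fairer comparison across eras.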

Out[34]:

saturn.columns["rank_total"].stats

stat	value
n	1,433
nulls	0 (0.0%)
unique	93
min	1
max	93
mean	31.06
median	26
std	22.7
q1	13
q3	45
iqr	32
skew	0.739
kurtosis	-0.3677
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 15.

Distribution of rank_total. Vertical dash marks the median.

Histogram bins for rank_total (median: 26.0).
bin	count
1 – 3.486	90
3.486 – 5.973	60
5.973 – 8.459	90
8.459 – 10.95	60
10.95 – 13.43	86
13.43 – 15.92	56
15.92 – 18.41	84
18.41 – 20.89	52
20.89 – 23.38	74
23.38 – 25.86	48
25.86 – 28.35	71
28.35 – 30.84	44
30.84 – 33.32	65
33.32 – 35.81	40
35.81 – 38.3	58
38.3 – 40.78	34
40.78 – 43.27	47
43.27 – 45.76	26
45.76 – 48.24	35
48.24 – 50.73	20
50.73 – 53.22	29
53.22 – 55.7	18
55.7 – 58.19	27
58.19 – 60.68	18
60.68 – 63.16	27
63.16 – 65.65	17
65.65 – 68.14	24
68.14 – 70.62	16
70.62 – 73.11	24
73.11 – 75.59	15
75.59 – 78.08	21
78.08 – 80.57	13
80.57 – 83.05	15
83.05 – 85.54	10
85.54 – 88.03	10
88.03 – 90.51	4
90.51 – 93	5
