quirky-witch_trials · saturn notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json

Saturn profiled 10,940 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

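# Install the profiler (IPython shell escape; run inside the notebook).
!pip install saturn-dissect

import subprocess

# Run the saturn CLI on the source file with the flags shown.
# check=True raises CalledProcessError if the command exits non-zero.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json",
    "--findings", "quirky-witch_trials.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)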

Summary confidence: high

This dataset catalogs 10,940 historical witch trial records across 6 columns, covering when and where trials occurred, how many people were tried, and how many died. Trials span 1300 to 1850, with the bulk concentrated between the 1590s and the 1660s (median year 1630), and they are dominated by the United Kingdom (3,750 records) and Germany (3,417), which together account for 65.5% of the data. The 'deaths' and 'tried' columns are extremely skewed: 75% of records report zero deaths, yet a handful of outlier events reach 500, so any aggregate analysis should treat these tails carefully. Also worth flagging: the 'city' field is 47.7% null and spans 906 unique values, so geographic analysis below the country level will be patchy.

citing: row_count · column_count · deaths.stats · tried.stats · year.stats · decade.stats · country.top_values · city.null_rate · city.n_unique

Out[4]:

saturn.schema() · 6 columns

column	kind	n	null%	unique	alerts
year	numeric	10,940	8.5%	430	outliers
decade	numeric	10,940	0.0%	53	outliers
city	categorical	10,940	47.7%	906	null_rate
country	categorical	10,940	0.0%	19
tried	numeric	10,940	0.0%	111	high_skew outliers
deaths	numeric	10,940	0.0%	74	high_skew outliers

Fig 1.

country · Shows how heavily the dataset is concentrated in the UK and Germany versus the long tail of other countries.

Top values for country (19 unique shown, of 19 total).
value	count	share
United Kingdom	3750	34.3%
Germany	3417	31.2%
Switzerland	1272	11.6%
France	807	7.4%
Belgium	671	6.1%
Sweden	353	3.2%
Netherlands	314	2.9%
Italy	107	1.0%
Denmark	90	0.8%
Spain	29	0.3%
Hungary	26	0.2%
Norway	20	0.2%
Luxembourg	20	0.2%
Estonia	17	0.2%
Finland	17	0.2%
Austria	16	0.1%
Poland	9	0.1%
Ireland	4	0.0%
Czech Republic	1	0.0%

Fig 2.

year · Reveals when trials clustered in time: counts jump around 1590 and stay high until about 1670.

Histogram bins for year (median: 1630.0).
bin	count
1300 – 1314	13
1314 – 1328	39
1328 – 1341	29
1341 – 1355	8
1355 – 1369	2
1369 – 1382	16
1382 – 1396	29
1396 – 1410	36
1410 – 1424	31
1424 – 1438	49
1438 – 1451	78
1451 – 1465	85
1465 – 1479	86
1479 – 1492	120
1492 – 1506	68
1506 – 1520	40
1520 – 1534	77
1534 – 1548	101
1548 – 1561	145
1561 – 1575	338
1575 – 1589	405
1589 – 1602	1075
1602 – 1616	971
1616 – 1630	1042
1630 – 1644	936
1644 – 1658	1425
1658 – 1671	1372
1671 – 1685	415
1685 – 1699	307
1699 – 1712	276
1712 – 1726	139
1726 – 1740	90
1740 – 1754	128
1754 – 1768	21
1768 – 1781	9
1781 – 1795	5
1795 – 1809	0
1809 – 1822	0
1822 – 1836	2
1836 – 1850	1

Fig 3.

deaths · Highlights the extreme skew: most trials record zero deaths, but a few reach into the hundreds.

Histogram bins for deaths (median: 0.0).
bin	count
0 – 12.5	10727
12.5 – 25	100
25 – 37.5	43
37.5 – 50	20
50 – 62.5	13
62.5 – 75	15
75 – 87.5	2
87.5 – 100	1
100 – 112.5	4
112.5 – 125	2
125 – 137.5	0
137.5 – 150	0
150 – 162.5	3
162.5 – 175	1
175 – 187.5	0
187.5 – 200	0
200 – 212.5	1
212.5 – 225	1
225 – 237.5	0
237.5 – 250	1
250 – 262.5	0
262.5 – 275	0
275 – 287.5	0
287.5 – 300	0
300 – 312.5	1
312.5 – 325	0
325 – 337.5	0
337.5 – 350	0
350 – 362.5	0
362.5 – 375	0
375 – 387.5	0
387.5 – 400	0
400 – 412.5	0
412.5 – 425	0
425 – 437.5	0
437.5 – 450	0
450 – 462.5	0
462.5 – 475	0
475 – 487.5	0
487.5 – 500	5
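
The bin edges in the histogram tables are consistent with 40 equal-width bins between each column's minimum and maximum (for deaths, 500 / 40 = 12.5). The sketch below reproduces those edges in plain Python. That Saturn bins this way is an inference from the printed tables, not something the notebook states, and `equal_width_edges` is a hypothetical helper.

```python
def equal_width_edges(lo, hi, n_bins=40):
    """Return n_bins + 1 evenly spaced edges from lo to hi (assumed binning rule)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# min/max taken from the deaths and year stats tables
deaths_edges = equal_width_edges(0, 500)
year_edges = equal_width_edges(1300, 1850)

print(deaths_edges[:3])   # first three deaths edges
print(year_edges[:3])     # first three year edges, before rounding for display
```

Under this reading, the 5 records in the last deaths bin sit between 487.5 and the maximum of 500.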

Fig 4.

tried · Most events involve a single accused person, but rare mass trials stretch the tail to 500.

Histogram bins for tried (median: 1.0).
bin	count
1 – 13.47	10493
13.47 – 25.95	196
25.95 – 38.42	88
38.42 – 50.9	33
50.9 – 63.38	21
63.38 – 75.85	14
75.85 – 88.33	9
88.33 – 100.8	24
100.8 – 113.3	6
113.3 – 125.8	9
125.8 – 138.2	5
138.2 – 150.7	6
150.7 – 163.2	8
163.2 – 175.7	3
175.7 – 188.1	5
188.1 – 200.6	1
200.6 – 213.1	0
213.1 – 225.5	3
225.5 – 238	1
238 – 250.5	1
250.5 – 263	1
263 – 275.4	0
275.4 – 287.9	0
287.9 – 300.4	0
300.4 – 312.9	1
312.9 – 325.3	0
325.3 – 337.8	1
337.8 – 350.3	0
350.3 – 362.8	3
362.8 – 375.2	0
375.2 – 387.7	0
387.7 – 400.2	0
400.2 – 412.7	3
412.7 – 425.1	0
425.1 – 437.6	0
437.6 – 450.1	0
450.1 – 462.6	0
462.6 – 475.1	0
475.1 – 487.5	0
487.5 – 500	5

Fig 5.

decade · A coarser view of temporal concentration, useful for spotting which decades dominate the record.

Histogram bins for decade (median: 1620.0).
bin	count
1300 – 1314	32
1314 – 1328	21
1328 – 1341	33
1341 – 1355	4
1355 – 1369	3
1369 – 1382	32
1382 – 1396	25
1396 – 1410	32
1410 – 1424	55
1424 – 1438	52
1438 – 1451	147
1451 – 1465	65
1465 – 1479	68
1479 – 1492	188
1492 – 1506	39
1506 – 1520	43
1520 – 1534	137
1534 – 1548	106
1548 – 1561	420
1561 – 1575	291
1575 – 1589	373
1589 – 1602	1576
1602 – 1616	878
1616 – 1630	934
1630 – 1644	1835
1644 – 1658	872
1658 – 1671	1469
1671 – 1685	260
1685 – 1699	295
1699 – 1712	302
1712 – 1726	105
1726 – 1740	68
1740 – 1754	151
1754 – 1768	8
1768 – 1781	14
1781 – 1795	4
1795 – 1809	0
1809 – 1822	1
1822 – 1836	1
1836 – 1850	1

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
year	numeric	8.5%
decade	numeric	0.0%
city	categorical	47.7%
country	categorical	0.0%
tried	numeric	0.0%
deaths	numeric	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
	year	decade	tried	deaths
year	+1.00	+0.36	-0.19	-0.16
decade	+0.36	+1.00	+0.00	+0.03
tried	-0.19	+0.00	+1.00	+0.65
deaths	-0.16	+0.03	+0.65	+1.00
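
For reference, the Pearson coefficient in each cell is the covariance of two columns divided by the product of their standard deviations. A minimal stdlib sketch follows. The `pearson` helper and the toy `tried`/`deaths` values are illustrative and are not rows from the dataset, and Saturn's sampling step is not reproduced.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Toy values only (hypothetical): people tried and deaths for six made-up records.
tried = [1, 1, 2, 5, 12, 40]
deaths = [0, 0, 1, 0, 7, 25]
r = pearson(tried, deaths)
print(round(r, 2))
```

Because Pearson is sensitive to extreme values, the +0.65 between tried and deaths may be driven largely by the few mass-trial records; a rank-based coefficient would be a reasonable cross-check.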

year numeric timestamp

This is the 'year' field, spanning 1300-1850 with a median of 1630; in this dataset it records the year of the trial. The distribution is left-skewed (skew -1.59, kurtosis 4.32). 714 rows (7.13% of the 10,009 non-null values) fall outside the 1.5 × IQR fences at 1502.5 and 1754.5, most of them in the pre-1503 tail, and 8.51% of rows are null. The tight IQR of 63 years (1597-1660) shows that the records concentrate in the late 16th and the 17th century.

Treatment: Impute or bucket into era bins before modelling; consider winsorising the pre-1500 tail.

anthropic:claude-opus-4-7 · confidence high
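
As a sketch of the treatment suggested for year, the snippet below derives the 1.5 × IQR fences from the reported q1 and q3 and clips values to them, passing nulls through as None. The `winsorise` helper and the sample years are hypothetical, and clipping at the Tukey fences is only one of several reasonable cut-offs.

```python
Q1, Q3 = 1597, 1660            # from saturn.columns["year"].stats
IQR = Q3 - Q1                  # 63
LOWER = Q1 - 1.5 * IQR         # lower Tukey fence
UPPER = Q3 + 1.5 * IQR         # upper Tukey fence

def winsorise(year, lower=LOWER, upper=UPPER):
    """Clip a year to [lower, upper]; pass nulls through unchanged."""
    if year is None:
        return None
    return min(max(year, lower), upper)

sample_years = [1324, 1487, 1630, None, 1782]   # illustrative values only
print(LOWER, UPPER)
print([winsorise(y) for y in sample_years])
```

Keeping nulls as None leaves the imputation decision separate from the clipping step.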

Out[13]:

saturn.columns["year"].stats

stat	value
n	10,940
nulls	931 (8.5%)
unique	430
min	1300
max	1850
mean	1621
median	1630
std	66.25
q1	1597
q3	1660
iqr	63
skew	-1.586
kurtosis	4.319
n_outliers	714
outlier_rate	0.07134
zero_rate	0
alert: outliers	7.1% rows beyond 1.5 IQR

Fig 8.

Distribution of year. Vertical dash marks the median.

Histogram bins for year (median: 1630.0).
bin	count
1300 – 1314	13
1314 – 1328	39
1328 – 1341	29
1341 – 1355	8
1355 – 1369	2
1369 – 1382	16
1382 – 1396	29
1396 – 1410	36
1410 – 1424	31
1424 – 1438	49
1438 – 1451	78
1451 – 1465	85
1465 – 1479	86
1479 – 1492	120
1492 – 1506	68
1506 – 1520	40
1520 – 1534	77
1534 – 1548	101
1548 – 1561	145
1561 – 1575	338
1575 – 1589	405
1589 – 1602	1075
1602 – 1616	971
1616 – 1630	1042
1630 – 1644	936
1644 – 1658	1425
1658 – 1671	1372
1671 – 1685	415
1685 – 1699	307
1699 – 1712	276
1712 – 1726	139
1726 – 1740	90
1740 – 1754	128
1754 – 1768	21
1768 – 1781	9
1781 – 1795	5
1795 – 1809	0
1809 – 1822	0
1822 – 1836	2
1836 – 1850	1

decade numeric timestamp

This is a 'decade' field expressed as a four-digit year, ranging from 1300 to 1850 with a median of 1620 and a tight IQR of 60 years (Q1 1590, Q3 1650). The distribution is strongly left-skewed (skew -1.48, kurtosis 3.85), and 848 rows (7.75%) fall outside the Tukey fences (below 1500 or above 1740), most of them pre-1500 records in the left tail. With only 53 unique values across 10,940 rows and no nulls, it behaves as a coarse temporal bin rather than a continuous measurement.

Treatment: Treat as an ordered categorical decade bin; consider grouping pre-1500 tail before modelling.

anthropic:claude-opus-4-7 · confidence high
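
One way to treat decade as an ordered categorical is to collapse everything before 1500 into a single bin and assign integer codes in chronological order. The helpers `decade_bin` and `ordered_codes`, the 1500 floor, and the sample values below are all assumptions made for illustration.

```python
def decade_bin(decade, floor=1500):
    """Return (sort_key, label), collapsing decades before `floor` into one bin."""
    if decade < floor:
        return (floor - 10, f"pre-{floor}")
    return (decade, f"{decade}s")

def ordered_codes(decades, floor=1500):
    """Map decades to integer codes that preserve chronological order."""
    bins = sorted({decade_bin(d, floor) for d in decades})
    code_of = {b: i for i, b in enumerate(bins)}
    return [code_of[decade_bin(d, floor)] for d in decades]

sample = [1310, 1480, 1500, 1620, 1620, 1740]   # illustrative values only
print([decade_bin(d)[1] for d in sample])
print(ordered_codes(sample))
```

The sort key keeps the collapsed bin ahead of 1500 without relying on string ordering of the labels.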

Out[16]:

saturn.columns["decade"].stats

stat	value
n	10,940
nulls	0 (0.0%)
unique	53
min	1300
max	1850
mean	1615
median	1620
std	66.68
q1	1590
q3	1650
iqr	60
skew	-1.478
kurtosis	3.85
n_outliers	848
outlier_rate	0.07751
zero_rate	0
alert: outliers	7.8% rows beyond 1.5 IQR

Fig 9.

Distribution of decade. Vertical dash marks the median.

Histogram bins for decade (median: 1620.0).
bin	count
1300 – 1314	32
1314 – 1328	21
1328 – 1341	33
1341 – 1355	4
1355 – 1369	3
1369 – 1382	32
1382 – 1396	25
1396 – 1410	32
1410 – 1424	55
1424 – 1438	52
1438 – 1451	147
1451 – 1465	65
1465 – 1479	68
1479 – 1492	188
1492 – 1506	39
1506 – 1520	43
1520 – 1534	137
1534 – 1548	106
1548 – 1561	420
1561 – 1575	291
1575 – 1589	373
1589 – 1602	1576
1602 – 1616	878
1616 – 1630	934
1630 – 1644	1835
1644 – 1658	872
1658 – 1671	1469
1671 – 1685	260
1685 – 1699	295
1699 – 1712	302
1712 – 1726	105
1726 – 1740	68
1740 – 1754	151
1754 – 1768	8
1768 – 1781	14
1781 – 1795	4
1795 – 1809	0
1809 – 1822	1
1822 – 1836	1
1836 – 1850	1

city categorical feature

This is a city-name categorical feature with 906 distinct values across 10,940 rows, dominated by European locations (Geneva, Budingen, Bruges, Munich). Nearly half the rows are null (47.65%), and even the top value only covers 5.59% of non-nulls, giving high entropy (ratio 0.856) and a long tail. The null rate combined with high cardinality is the main concern for downstream use.

Treatment: Impute or flag missingness, then group rare cities into an 'other' bucket before encoding.

anthropic:claude-opus-4-7 · confidence high
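
The city treatment can be sketched as two steps: record a missingness flag, then fold rare names into an 'other' label. The `encode_city` helper, the `min_count` threshold, and the sample list are hypothetical; a real threshold would be chosen against the 906-value distribution.

```python
from collections import Counter

def encode_city(cities, min_count=2):
    """Return (is_missing flags, grouped labels) for a list of city names."""
    counts = Counter(c for c in cities if c is not None)
    flags, labels = [], []
    for c in cities:
        flags.append(c is None)
        if c is None:
            labels.append("missing")      # keep nulls as their own category
        elif counts[c] < min_count:
            labels.append("other")        # rare city folded into the tail bucket
        else:
            labels.append(c)
    return flags, labels

cities = ["Geneva", None, "Geneva", "Namur", None, "Bruges", "Bruges"]  # illustrative
flags, labels = encode_city(cities)
print(flags)
print(labels)
```

Treating null as an explicit category avoids inventing a location for the roughly 48% of records that have none.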

Out[19]:

saturn.columns["city"].stats

stat	value
n	10,940
nulls	5,213 (47.7%)
unique	906
top_value	Geneva
top_rate	0.05588
cardinality	906
entropy	8.406
entropy_ratio	0.8557
alert: null_rate	47.7% null

Fig 10.

Top values for city.

Top values for city (20 unique shown, of 906 total).
value	count	share
Geneva	320	2.9%
Budingen	264	2.4%
Bruges	121	1.1%
Munich	80	0.7%
Augsburg	75	0.7%
Vesoul	64	0.6%
Venice	63	0.6%
Boudry	59	0.5%
Valangin	58	0.5%
Fleurier	54	0.5%
Gelnhausen	50	0.5%
Arnsberg	49	0.4%
Thielle-Wavre	47	0.4%
Burghausen	44	0.4%
Grundau	44	0.4%
Neuchatel	42	0.4%
Colombier	42	0.4%
Stuttgart	39	0.4%
Mitterfels	38	0.3%
Namur	37	0.3%

country categorical feature

Country in which the trial took place, as a categorical label with 19 distinct values across 10,940 rows and no nulls. The distribution is concentrated: the United Kingdom alone accounts for 34.3% (3,750), followed by Germany (3,417) and Switzerland (1,272), while the long tail runs from Italy (107) down to double and single digits (Denmark 90, Spain 29, Czech Republic 1). An entropy ratio of 0.59 confirms moderate concentration rather than a uniform spread.

Treatment: One-hot encode the top countries and bucket the low-frequency tail into 'Other' before modelling.

anthropic:claude-opus-4-7 · confidence high
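
A minimal sketch of the country treatment, using the counts from the top-values table: countries below a cut-off are summed into 'Other', and each record is then one-hot encoded against the reduced category list. The cut-off of 100 records and the helper names are assumptions, not part of the Saturn output.

```python
# Counts copied from the country top-values table.
COUNTRY_COUNTS = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26, "Norway": 20,
    "Luxembourg": 20, "Estonia": 17, "Finland": 17, "Austria": 16,
    "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}
MIN_COUNT = 100  # hypothetical cut-off for keeping a country as its own category

def bucket_tail(counts, min_count=MIN_COUNT):
    """Keep frequent countries and sum the rest into 'Other'."""
    kept = {k: v for k, v in counts.items() if v >= min_count}
    kept["Other"] = sum(v for v in counts.values() if v < min_count)
    return kept

def one_hot(country, categories):
    """Encode one country against the reduced category list."""
    label = country if country in categories else "Other"
    return [1 if label == c else 0 for c in categories]

bucketed = bucket_tail(COUNTRY_COUNTS)
categories = list(bucketed)
print(bucketed["Other"])
print(one_hot("Germany", categories))
print(one_hot("Poland", categories))
```

With this cut-off, eight countries keep their own column and the remaining eleven share one.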

Out[22]:

saturn.columns["country"].stats

stat	value
n	10,940
nulls	0 (0.0%)
unique	19
top_value	United Kingdom
top_rate	0.3428
cardinality	19
entropy	2.502
entropy_ratio	0.5889
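
The entropy figures above appear to be Shannon entropy in bits, with the ratio normalised by log2 of the number of distinct values. That reading is an inference from the numbers rather than documented Saturn behaviour. The sketch below recomputes both from the country counts in the top-values table; `entropy_bits` is a hypothetical helper.

```python
import math

# Counts copied from the country top-values table (19 values, 10,940 rows).
counts = [3750, 3417, 1272, 807, 671, 353, 314, 107, 90, 29,
          26, 20, 20, 17, 17, 16, 9, 4, 1]

def entropy_bits(counts):
    """Shannon entropy in bits of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

h = entropy_bits(counts)
ratio = h / math.log2(len(counts))   # divide by the maximum entropy for 19 categories
print(round(h, 3), round(ratio, 4))
```

The result should land close to the reported entropy of 2.502 and entropy_ratio of 0.5889.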

Fig 11.

Top values for country.

Top values for country (19 unique shown, of 19 total).
value	count	share
United Kingdom	3750	34.3%
Germany	3417	31.2%
Switzerland	1272	11.6%
France	807	7.4%
Belgium	671	6.1%
Sweden	353	3.2%
Netherlands	314	2.9%
Italy	107	1.0%
Denmark	90	0.8%
Spain	29	0.3%
Hungary	26	0.2%
Norway	20	0.2%
Luxembourg	20	0.2%
Estonia	17	0.2%
Finland	17	0.2%
Austria	16	0.1%
Poland	9	0.1%
Ireland	4	0.0%
Czech Republic	1	0.0%

tried numeric feature

`tried` is the number of people tried in each record and is heavily concentrated at 1: the median, q1, and q3 are all 1, yet values stretch up to 500 with a mean of 3.95. Skew of 15.6 and kurtosis of 316 confirm an extremely long right tail. Because the IQR is 0, the 1.5 × IQR fences collapse onto 1, so the 2,457 rows (22.5%) flagged as outliers are simply the records with more than one person tried. With only 111 distinct values and no nulls or zeros, most records are single-defendant trials and a small minority are mass trials.

Treatment: log1p-transform or cap at a high quantile before modelling to tame the long tail.

anthropic:claude-opus-4-7 · confidence high
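
Both options in the tried treatment can be sketched with the standard library: capping at an empirical percentile (nearest-rank method) and applying log1p. The `cap_at_percentile` helper, the 99th-percentile choice, and the sample list are hypothetical; the sample only mimics the shape of the column.

```python
import math

def cap_at_percentile(values, pct=99):
    """Cap values at the pct-th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))   # 1-based rank
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap

tried = [1] * 95 + [2, 3, 8, 40, 500]   # illustrative values only
capped, cap = cap_at_percentile(tried, pct=99)
logged = [math.log1p(v) for v in tried]  # log(1 + v), defined at v = 0 as well

print(cap, max(capped))
print(round(max(logged), 3))
```

Capping discards the size of the largest trials, while log1p keeps their ordering, so the choice depends on whether mass trials matter to the model.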

Out[25]:

saturn.columns["tried"].stats

stat	value
n	10,940
nulls	0 (0.0%)
unique	111
min	1
max	500
mean	3.952
median	1
std	19.26
q1	1
q3	1
iqr	0
skew	15.61
kurtosis	316
n_outliers	2,457
outlier_rate	0.2246
zero_rate	0
alert: high_skew	skew=+15.61
alert: outliers	22.5% rows beyond 1.5 IQR

Fig 12.

Distribution of tried. Vertical dash marks the median.

Histogram bins for tried (median: 1.0).
bin	count
1 – 13.47	10493
13.47 – 25.95	196
25.95 – 38.42	88
38.42 – 50.9	33
50.9 – 63.38	21
63.38 – 75.85	14
75.85 – 88.33	9
88.33 – 100.8	24
100.8 – 113.3	6
113.3 – 125.8	9
125.8 – 138.2	5
138.2 – 150.7	6
150.7 – 163.2	8
163.2 – 175.7	3
175.7 – 188.1	5
188.1 – 200.6	1
200.6 – 213.1	0
213.1 – 225.5	3
225.5 – 238	1
238 – 250.5	1
250.5 – 263	1
263 – 275.4	0
275.4 – 287.9	0
287.9 – 300.4	0
300.4 – 312.9	1
312.9 – 325.3	0
325.3 – 337.8	1
337.8 – 350.3	0
350.3 – 362.8	3
362.8 – 375.2	0
375.2 – 387.7	0
387.7 – 400.2	0
400.2 – 412.7	3
412.7 – 425.1	0
425.1 – 437.6	0
437.6 – 450.1	0
450.1 – 462.6	0
462.6 – 475.1	0
475.1 – 487.5	0
487.5 – 500	5

deaths numeric numeric_target

Numeric count of deaths per record, with 10,940 rows and only 74 distinct values. The distribution is heavily zero-inflated (zero_rate 0.7547) with median, q1, and q3 all at 0, yet the max reaches 500 and skew is 28.52 with kurtosis 991.6. Because the IQR is 0, every non-zero record is flagged as an outlier: 2,684 rows (24.5%), exactly the complement of the zero rate. The flag therefore marks any record with at least one death, not only the rare extreme events.

Treatment: Model with a zero-inflated or hurdle approach, or log1p-transform before regression.

anthropic:claude-opus-4-7 · confidence high
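
A hurdle approach splits deaths into two parts: whether a record has any death, and how many deaths it has given at least one. The arithmetic below applies the identity E[deaths] = P(deaths > 0) × E[deaths | deaths > 0] to the reported stats. It illustrates the decomposition only and is not a fitted model; the variable names are assumptions.

```python
N_ROWS = 10940        # from saturn.columns["deaths"].stats
ZERO_RATE = 0.7547
MEAN_DEATHS = 1.493

p_any = 1 - ZERO_RATE                 # P(deaths > 0)
mean_if_any = MEAN_DEATHS / p_any     # E[deaths | deaths > 0]
n_nonzero = round(N_ROWS * p_any)     # approximate count of non-zero records

print(round(p_any, 4), round(mean_if_any, 2), n_nonzero)
```

The non-zero count recovered this way matches the reported n_outliers, consistent with every non-zero record being flagged.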

Out[28]:

saturn.columns["deaths"].stats

stat	value
n	10,940
nulls	0 (0.0%)
unique	74
min	0
max	500
mean	1.493
median	0
std	13.19
q1	0
q3	0
iqr	0
skew	28.52
kurtosis	991.6
n_outliers	2,684
outlier_rate	0.2453
zero_rate	0.7547
alert: high_skew	skew=+28.52
alert: outliers	24.5% rows beyond 1.5 IQR

Fig 13.

Distribution of deaths. Vertical dash marks the median.

Histogram bins for deaths (median: 0.0).
bin	count
0 – 12.5	10727
12.5 – 25	100
25 – 37.5	43
37.5 – 50	20
50 – 62.5	13
62.5 – 75	15
75 – 87.5	2
87.5 – 100	1
100 – 112.5	4
112.5 – 125	2
125 – 137.5	0
137.5 – 150	0
150 – 162.5	3
162.5 – 175	1
175 – 187.5	0
187.5 – 200	0
200 – 212.5	1
212.5 – 225	1
225 – 237.5	0
237.5 – 250	1
250 – 262.5	0
262.5 – 275	0
275 – 287.5	0
287.5 – 300	0
300 – 312.5	1
312.5 – 325	0
325 – 337.5	0
337.5 – 350	0
350 – 362.5	0
362.5 – 375	0
375 – 387.5	0
387.5 – 400	0
400 – 412.5	0
412.5 – 425	0
425 – 437.5	0
437.5 – 450	0
450 – 462.5	0
462.5 – 475	0
475 – 487.5	0
487.5 – 500	5
