quirky witch trials

source /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json · 10,940 rows · 6 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset catalogs 10,940 historical witch trial records across 6 columns, covering when and where trials occurred, how many people were tried, and how many died. Trials span from 1300 to 1850, with the middle half falling between 1597 and 1660 (median year 1630), and they are heavily dominated by the United Kingdom (3,750 records) and Germany (3,417), which together account for roughly two-thirds of the data. The 'deaths' and 'tried' columns are extremely skewed: about 75% of records report zero deaths, yet a small set of outlier events reaches up to 500, so any aggregate analysis should treat these tails carefully. Also worth flagging: the 'city' field is 47.7% null and spans 906 unique values, so geographic analysis below the country level will be patchy.

citing: row_count · column_count · deaths.stats · tried.stats · year.stats · decade.stats · country.top_values · city.null_rate · city.n_unique

Charts the summary said to look at first

country · Shows how heavily the dataset is concentrated in the UK and Germany versus the long tail of other countries.

Top values for country (19 unique shown, of 19 total).
value	count	share
United Kingdom	3750	34.3%
Germany	3417	31.2%
Switzerland	1272	11.6%
France	807	7.4%
Belgium	671	6.1%
Sweden	353	3.2%
Netherlands	314	2.9%
Italy	107	1.0%
Denmark	90	0.8%
Spain	29	0.3%
Hungary	26	0.2%
Norway	20	0.2%
Luxembourg	20	0.2%
Estonia	17	0.2%
Finland	17	0.2%
Austria	16	0.1%
Poland	9	0.1%
Ireland	4	0.0%
Czech Republic	1	0.0%
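
As a quick sanity check on the "roughly two-thirds" claim, the shares can be recomputed from the counts in the table above. This is a minimal sketch using only the figures shown here; the variable names are illustrative and not part of the profiler.

```python
# Counts copied from the country table above.
country_counts = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26,
    "Norway": 20, "Luxembourg": 20, "Estonia": 17, "Finland": 17,
    "Austria": 16, "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}

# The counts should add up to the reported row count.
total = sum(country_counts.values())

# Share of each country as a fraction of all rows.
shares = {name: count / total for name, count in country_counts.items()}

# Combined share of the two largest countries.
top_two_share = shares["United Kingdom"] + shares["Germany"]

print(f"total rows: {total}")
print(f"UK + Germany: {top_two_share:.1%}")
```

The counts sum to the reported 10,940 rows, and the two largest countries together cover about 65.5% of them.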

year · Reveals when trials clustered in time: a broad peak from about 1589 to 1671, with the densest bins in 1644-1671.

Histogram bins for year (median: 1630.0).
bin	count
1300 – 1314	13
1314 – 1328	39
1328 – 1341	29
1341 – 1355	8
1355 – 1369	2
1369 – 1382	16
1382 – 1396	29
1396 – 1410	36
1410 – 1424	31
1424 – 1438	49
1438 – 1451	78
1451 – 1465	85
1465 – 1479	86
1479 – 1492	120
1492 – 1506	68
1506 – 1520	40
1520 – 1534	77
1534 – 1548	101
1548 – 1561	145
1561 – 1575	338
1575 – 1589	405
1589 – 1602	1075
1602 – 1616	971
1616 – 1630	1042
1630 – 1644	936
1644 – 1658	1425
1658 – 1671	1372
1671 – 1685	415
1685 – 1699	307
1699 – 1712	276
1712 – 1726	139
1726 – 1740	90
1740 – 1754	128
1754 – 1768	21
1768 – 1781	9
1781 – 1795	5
1795 – 1809	0
1809 – 1822	0
1822 – 1836	2
1836 – 1850	1

deaths · Highlights the extreme skew: most trials record zero deaths, but a few reach into the hundreds.

Histogram bins for deaths (median: 0.0).
bin	count
0 – 12.5	10727
12.5 – 25	100
25 – 37.5	43
37.5 – 50	20
50 – 62.5	13
62.5 – 75	15
75 – 87.5	2
87.5 – 100	1
100 – 112.5	4
112.5 – 125	2
125 – 137.5	0
137.5 – 150	0
150 – 162.5	3
162.5 – 175	1
175 – 187.5	0
187.5 – 200	0
200 – 212.5	1
212.5 – 225	1
225 – 237.5	0
237.5 – 250	1
250 – 262.5	0
262.5 – 275	0
275 – 287.5	0
287.5 – 300	0
300 – 312.5	1
312.5 – 325	0
325 – 337.5	0
337.5 – 350	0
350 – 362.5	0
362.5 – 375	0
375 – 387.5	0
387.5 – 400	0
400 – 412.5	0
412.5 – 425	0
425 – 437.5	0
437.5 – 450	0
450 – 462.5	0
462.5 – 475	0
475 – 487.5	0
487.5 – 500	5

tried · Most events involve a single accused person, but rare mass trials stretch the tail to 500.

Histogram bins for tried (median: 1.0).
bin	count
1 – 13.47	10493
13.47 – 25.95	196
25.95 – 38.42	88
38.42 – 50.9	33
50.9 – 63.38	21
63.38 – 75.85	14
75.85 – 88.33	9
88.33 – 100.8	24
100.8 – 113.3	6
113.3 – 125.8	9
125.8 – 138.2	5
138.2 – 150.7	6
150.7 – 163.2	8
163.2 – 175.7	3
175.7 – 188.1	5
188.1 – 200.6	1
200.6 – 213.1	0
213.1 – 225.5	3
225.5 – 238	1
238 – 250.5	1
250.5 – 263	1
263 – 275.4	0
275.4 – 287.9	0
287.9 – 300.4	0
300.4 – 312.9	1
312.9 – 325.3	0
325.3 – 337.8	1
337.8 – 350.3	0
350.3 – 362.8	3
362.8 – 375.2	0
375.2 – 387.7	0
387.7 – 400.2	0
400.2 – 412.7	3
412.7 – 425.1	0
425.1 – 437.6	0
437.6 – 450.1	0
450.1 – 462.6	0
462.6 – 475.1	0
475.1 – 487.5	0
487.5 – 500	5

decade · A coarser view of temporal concentration, useful for spotting which decades dominate the record.

Histogram bins for decade (median: 1620.0).
bin	count
1300 – 1314	32
1314 – 1328	21
1328 – 1341	33
1341 – 1355	4
1355 – 1369	3
1369 – 1382	32
1382 – 1396	25
1396 – 1410	32
1410 – 1424	55
1424 – 1438	52
1438 – 1451	147
1451 – 1465	65
1465 – 1479	68
1479 – 1492	188
1492 – 1506	39
1506 – 1520	43
1520 – 1534	137
1534 – 1548	106
1548 – 1561	420
1561 – 1575	291
1575 – 1589	373
1589 – 1602	1576
1602 – 1616	878
1616 – 1630	934
1630 – 1644	1835
1644 – 1658	872
1658 – 1671	1469
1671 – 1685	260
1685 – 1699	295
1699 – 1712	302
1712 – 1726	105
1726 – 1740	68
1740 – 1754	151
1754 – 1768	8
1768 – 1781	14
1781 – 1795	4
1795 – 1809	0
1809 – 1822	1
1822 – 1836	1
1836 – 1850	1

Schema

6 columns

Per-column summary.
column	type	null rate	unique	alerts
year	numeric	8.5%	430	outliers
decade	numeric	0.0%	53	outliers
city	categorical	47.7%	906	null_rate
country	categorical	0.0%	19
tried	numeric	0.0%	111	high_skew outliers
deaths	numeric	0.0%	74	high_skew outliers

year

numeric timestamp outliers

This is the year in which each trial took place, spanning 1300-1850 with a median of 1630. The distribution is left-skewed (skew -1.59, kurtosis 4.32): 714 values (7.13% of the 10,009 non-null rows) are flagged as outliers, most of them in the pre-1500 tail, and 8.51% of rows are null. The tight IQR of 63 years (1597-1660) shows the corpus concentrates heavily in the late-Renaissance/early-Baroque period. Treatment: Impute or bucket into era bins before modelling; consider winsorising the pre-1500 tail. high · anthropic:claude-opus-4-7

n: 10,940
nulls: 931 (8.5%)
unique: 430
min: 1300
max: 1850
mean: 1621
median: 1630
std: 66.25
q1: 1597
q3: 1660
iqr: 63
skew: -1.586
kurtosis: 4.319
n_outliers: 714
outlier_rate: 0.07134
zero_rate: 0
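
The outlier figures above can be reproduced from the quartiles. The sketch below assumes the profiler uses the standard 1.5 × IQR (Tukey) rule, which the report names for other columns but does not confirm for this one; `tukey_fences` is an illustrative helper, not a profiler function.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and counts copied from the stats listed above.
lower, upper = tukey_fences(1597, 1660)
n_rows, n_nulls, n_outliers = 10940, 931, 714

# The reported outlier_rate matches a denominator of non-null rows only.
rate_non_null = n_outliers / (n_rows - n_nulls)
rate_all_rows = n_outliers / n_rows

print(f"fences: {lower} to {upper}")
print(f"outlier rate (non-null rows): {rate_non_null:.5f}")
print(f"outlier rate (all rows): {rate_all_rows:.5f}")
```

Under that assumption the fences sit at 1502.5 and 1754.5, so a year is flagged when it falls before 1503 or after 1754. The reported `outlier_rate` of 0.07134 corresponds to dividing by the non-null rows rather than by all 10,940 rows.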

decade

numeric timestamp outliers

This is a 'decade' field expressed as a four-digit year, ranging from 1300 to 1850 with a median of 1620 and a tight IQR of 60 years (Q1 1590, Q3 1650). The distribution is strongly left-skewed (skew -1.48, kurtosis 3.85) and 848 rows (7.75%) fall outside the Tukey fences — likely the pre-1500 records pulling the tail. With only 53 unique values across 10,940 rows and no nulls, it behaves as a coarse temporal bin rather than a continuous measurement. Treatment: Treat as an ordered categorical decade bin; consider grouping pre-1500 tail before modelling. high · anthropic:claude-opus-4-7

n: 10,940
nulls: 0 (0.0%)
unique: 53
min: 1300
max: 1850
mean: 1615
median: 1620
std: 66.68
q1: 1590
q3: 1650
iqr: 60
skew: -1.478
kurtosis: 3.85
n_outliers: 848
outlier_rate: 0.07751
zero_rate: 0
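
The suggested treatment (an ordered categorical with the early tail grouped) could look roughly like the sketch below. The 1500 cutoff is taken from the lower 1.5 × IQR fence computed from the quartiles above; the function names and the `pre-1500` label are hypothetical choices, not something the report prescribes.

```python
# Lower Tukey fence from the reported quartiles: q1 - 1.5 * iqr.
lower_fence = 1590 - 1.5 * 60  # 1500.0


def decade_label(decade, floor=1500):
    """Map a decade to a label, grouping everything before `floor`."""
    return f"pre-{floor}" if decade < floor else str(decade)


def label_sort_key(label):
    """Sort the grouped early bucket first, then decades in time order."""
    return -1 if label.startswith("pre-") else int(label)


# Illustrative decade values, not rows from the dataset.
sample_decades = [1620, 1300, 1850, 1490, 1500]
labels = [decade_label(d) for d in sample_decades]
ordered_levels = sorted(set(labels), key=label_sort_key)

print(labels)
print(ordered_levels)
```

Keeping an explicit sort key preserves the time order of the levels, which a plain string sort would not do once the grouped bucket is added.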

city

categorical feature null_rate

This is a city-name categorical feature with 906 distinct values across 10,940 rows, dominated by European locations (Geneva, Budingen, Bruges, Munich). Nearly half the rows are null (47.65%), and even the top value only covers 5.59% of non-nulls, giving high entropy (ratio 0.856) and a long tail. The null rate combined with high cardinality is the main concern for downstream use. Treatment: Impute or flag missingness, then group rare cities into an 'other' bucket before encoding. high · anthropic:claude-opus-4-7

n: 10,940
nulls: 5,213 (47.7%)
unique: 906
top_value: Geneva
top_rate: 0.05588
cardinality: 906
entropy: 8.406
entropy_ratio: 0.8557
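
The entropy ratio above is consistent with Shannon entropy in bits divided by log2 of the cardinality; that reading is an assumption checked against the reported numbers, not something the report states. The same sketch shows one way to flag missing cities and group rare ones, with `bucket_cities`, `min_count`, and the bucket labels all being illustrative.

```python
import math
from collections import Counter

# Reported stats for the city column.
entropy_bits = 8.406
cardinality = 906

# Normalised entropy: 1.0 would mean all cities are equally common.
entropy_ratio = entropy_bits / math.log2(cardinality)


def bucket_cities(cities, min_count=2, other="other", missing="missing"):
    """Flag nulls explicitly and group cities seen fewer than min_count times."""
    counts = Counter(c for c in cities if c is not None)
    return [
        missing if c is None else (c if counts[c] >= min_count else other)
        for c in cities
    ]


# Illustrative values only, not rows from the dataset.
sample = ["Geneva", "Geneva", None, "Bruges", "Munich", "Geneva", None]
bucketed = bucket_cities(sample)

print(f"entropy ratio: {entropy_ratio:.4f}")
print(bucketed)
```

Treating missingness as its own level avoids inventing a city for the 47.7% of rows that have none.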

country

categorical feature

Country in which the trial took place, as a categorical label with 19 distinct values across 10,940 rows and no nulls. The distribution is concentrated: United Kingdom alone accounts for 34.3% (3,750), Germany 3,417, and Switzerland 1,272, while the long tail (Denmark, Spain, Poland, Ireland) drops to double or even single digits. Entropy ratio of 0.59 confirms moderate concentration rather than a uniform spread. Treatment: One-hot encode the top countries and bucket the low-frequency tail into 'Other' before modelling. high · anthropic:claude-opus-4-7

n: 10,940
nulls: 0 (0.0%)
unique: 19
top_value: United Kingdom
top_rate: 0.3428
cardinality: 19
entropy: 2.502
entropy_ratio: 0.5889
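
The entropy figures can be recomputed from the country counts listed earlier in this report, and the same counts show what an 'Other' bucket would absorb. This is a sketch: the base-2 entropy is an assumption that happens to match the reported values, and the `min_count=100` threshold is an arbitrary illustration.

```python
import math

# Counts copied from the country table in this report.
country_counts = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26,
    "Norway": 20, "Luxembourg": 20, "Estonia": 17, "Finland": 17,
    "Austria": 16, "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}


def shannon_entropy_bits(counts):
    """Shannon entropy in bits of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


entropy = shannon_entropy_bits(list(country_counts.values()))
entropy_ratio = entropy / math.log2(len(country_counts))

# Keep countries with at least min_count rows; the rest become 'Other'.
min_count = 100
kept = {name: c for name, c in country_counts.items() if c >= min_count}
other_count = sum(country_counts.values()) - sum(kept.values())

print(f"entropy: {entropy:.3f} bits, ratio: {entropy_ratio:.4f}")
print(f"kept {len(kept)} countries, 'Other' holds {other_count} rows")
```

With this threshold eight countries keep their own level and the remaining eleven collapse into a bucket of 249 rows.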

tried

numeric feature high_skew outliers

`tried` is the number of people tried in each record, and it is heavily concentrated at 1: the median, q1, and q3 are all 1.0, yet values stretch up to 500.0 with a mean of 3.95. Skew of 15.6 and kurtosis of 316 confirm an extremely long right tail. Because the IQR is 0, the IQR rule flags every record with more than one person tried, which is why 2457 rows (22.5%) count as outliers. With only 111 distinct values and no nulls or zeros, most records describe a single accused person and a small minority describe mass trials. Treatment: log1p-transform or cap at a high quantile before modelling to tame the long tail. high · anthropic:claude-opus-4-7

n: 10,940
nulls: 0 (0.0%)
unique: 111
min: 1
max: 500
mean: 3.952
median: 1
std: 19.26
q1: 1
q3: 1
iqr: 0
skew: 15.61
kurtosis: 316
n_outliers: 2,457
outlier_rate: 0.2246
zero_rate: 0
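
When q1 and q3 coincide, the IQR rule collapses to "anything different from the quartile value". The sketch below shows that effect on a small made-up sample and applies the suggested log1p transform; `iqr_outliers` is an illustrative helper, and the quartile method used by the profiler is not known, so `method="inclusive"` is an assumption.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample shaped like the column: mostly 1, with a long right tail.
tried_sample = [1, 1, 1, 1, 1, 1, 1, 1, 2, 500]

# With q1 == q3 == 1 the fences are both 1, so every value above 1 is flagged.
flagged = iqr_outliers(tried_sample)

# log1p compresses the tail while keeping the ordering of the values.
transformed = [math.log1p(v) for v in tried_sample]

print(flagged)
print([round(t, 2) for t in transformed])
```

The transform brings the largest value down to the same order of magnitude as the rest, which is the point of applying it before modelling.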

deaths

numeric numeric_target high_skew outliers

Numeric count of deaths per record, with 10940 rows and only 74 distinct values. The distribution is heavily zero-inflated (zero_rate 0.7547) with median, q1, and q3 all at 0, yet the max reaches 500 and skew is 28.52 with kurtosis 991.65. Because the IQR is 0, every record with at least one death is flagged as an outlier: 2684 rows (24.5%), the complement of the zero rate. The flag therefore marks any fatal outcome, not only the rare extreme events, and does not by itself indicate dirty data. Treatment: Model with a zero-inflated or hurdle approach, or log1p-transform before regression. high · anthropic:claude-opus-4-7

n: 10,940
nulls: 0 (0.0%)
unique: 74
min: 0
max: 500
mean: 1.493
median: 0
std: 13.19
q1: 0
q3: 0
iqr: 0
skew: 28.52
kurtosis: 991.6
n_outliers: 2,684
outlier_rate: 0.2453
zero_rate: 0.7547
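
A hurdle approach treats "any deaths at all" and "how many deaths, given at least one" as separate questions. The sketch below shows only that two-part split on a made-up sample, not a fitted model; `hurdle_split` and its output keys are hypothetical names.

```python
import math

# Reported rates for the deaths column: the outlier rate is the
# complement of the zero rate, since every non-zero record is flagged.
zero_rate = 0.7547
non_zero_rate = round(1 - zero_rate, 4)


def hurdle_split(deaths):
    """Split counts into a zero/non-zero indicator and the positive part."""
    positives = [d for d in deaths if d > 0]
    return {
        "zero_rate": 1 - len(positives) / len(deaths),
        "positives": positives,
        "log1p_positives": [math.log1p(d) for d in positives],
    }


# Made-up sample, not rows from the dataset.
parts = hurdle_split([0, 0, 0, 1, 3])

print(non_zero_rate)
print(parts["zero_rate"], parts["positives"])
```

A fitted hurdle model would then use a binary model for the indicator and a count model for the positive part; which models to use is left open here.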