data trove witch trials

source /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json · 10,940 rows · 6 columns · profiled 2026-06-22

Reading

dataset summary · high confidence anthropic:default

This dataset records historical witch trials across Europe: 10,940 records with information on location, time period, and outcomes (people tried and deaths). Two things stand out immediately. First, both 'deaths' and 'tried' are extremely skewed: about three-quarters of records show zero deaths (zero rate 75.5%) and at least three-quarters involve a single person tried (Q3 = 1), yet the maximum in both columns is 500, suggesting that a small number of mass trials account for a disproportionate share of the deaths. Second, activity clusters between roughly 1597 and 1660 (the IQR of 'year'), a well-known peak persecution era, with a long tail back to 1300 worth examining. Geographically, the United Kingdom and Germany together account for about two-thirds of all records (65.5%), while Geneva is the most frequent city even though nearly half of city values are missing.

citing: deaths.stats.median · deaths.stats.mean · deaths.stats.max · deaths.stats.zero_rate · deaths.stats.n_outliers · tried.stats.median · tried.stats.max · tried.stats.outlier_rate · year.stats.q1 · year.stats.q3 · year.stats.median · year.stats.min · country.top_values · city.top_values · city.null_rate

Charts the summary said to look at first

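decade · Look for the concentration between roughly 1590 and 1670, where the three tallest bins sit, plus the sparse early centuries. The 40 bins are about 13.75 years wide while decade values step by 10, so some bins span two decade values and others only one; part of the bin-to-bin spikiness is likely a binning artefact.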

Histogram bins for decade (median: 1620.0).
bin	count
1300 – 1314	32
1314 – 1328	21
1328 – 1341	33
1341 – 1355	4
1355 – 1369	3
1369 – 1382	32
1382 – 1396	25
1396 – 1410	32
1410 – 1424	55
1424 – 1438	52
1438 – 1451	147
1451 – 1465	65
1465 – 1479	68
1479 – 1492	188
1492 – 1506	39
1506 – 1520	43
1520 – 1534	137
1534 – 1548	106
1548 – 1561	420
1561 – 1575	291
1575 – 1589	373
1589 – 1602	1576
1602 – 1616	878
1616 – 1630	934
1630 – 1644	1835
1644 – 1658	872
1658 – 1671	1469
1671 – 1685	260
1685 – 1699	295
1699 – 1712	302
1712 – 1726	105
1726 – 1740	68
1740 – 1754	151
1754 – 1768	8
1768 – 1781	14
1781 – 1795	4
1795 – 1809	0
1809 – 1822	1
1822 – 1836	1
1836 – 1850	1
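
The uneven bin heights in the table above can be partly explained by how 10-year decade values fall into 13.75-year bins. The sketch below counts how many possible decade values land in each bin. It assumes 40 equal-width, half-open bins between the column minimum (1300) and maximum (1850); the profiler's actual binning rule is not stated in this report, and the names `bin_index` and `decades_per_bin` are illustrative.

```python
# Assumption: 40 equal-width bins from min (1300) to max (1850), each
# half-open [start, end) except the last, which also includes the max.
LOW, HIGH, N_BINS = 1300, 1850, 40
WIDTH = (HIGH - LOW) / N_BINS  # 13.75 years per bin


def bin_index(value: float) -> int:
    """Return the 0-based bin for a value; the max lands in the last bin."""
    return min(int((value - LOW) // WIDTH), N_BINS - 1)


# Count how many possible 10-year decade values fall into each bin.
decades_per_bin = [0] * N_BINS
for decade in range(LOW, HIGH + 1, 10):
    decades_per_bin[bin_index(decade)] += 1

# Inspect the bins around the peak period (bin 21 starts at 1588.75).
for i in range(21, 27):
    start = LOW + i * WIDTH
    print(f"{start:.2f} to {start + WIDTH:.2f}: {decades_per_bin[i]} decade value(s)")
```

Under these assumptions, the three tallest bins (1589 – 1602, 1630 – 1644, 1658 – 1671) each contain two decade values, while their neighbours contain one. Re-binning with one bin per decade would avoid this effect.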

country · United Kingdom and Germany dominate; compare their shares against the next-largest contributors, Switzerland and France.

Top values for country (19 unique shown, of 19 total).
value	count	share
United Kingdom	3750	34.3%
Germany	3417	31.2%
Switzerland	1272	11.6%
France	807	7.4%
Belgium	671	6.1%
Sweden	353	3.2%
Netherlands	314	2.9%
Italy	107	1.0%
Denmark	90	0.8%
Spain	29	0.3%
Hungary	26	0.2%
Norway	20	0.2%
Luxembourg	20	0.2%
Estonia	17	0.2%
Finland	17	0.2%
Austria	16	0.1%
Poland	9	0.1%
Ireland	4	0.0%
Czech Republic	1	0.0%

deaths · The zero-death majority (75.5% of records) versus a thin but extreme tail of mass-execution events tells the core story of how unevenly lethal these trials were.

Histogram bins for deaths (median: 0.0).
bin	count
0 – 12.5	10727
12.5 – 25	100
25 – 37.5	43
37.5 – 50	20
50 – 62.5	13
62.5 – 75	15
75 – 87.5	2
87.5 – 100	1
100 – 112.5	4
112.5 – 125	2
125 – 137.5	0
137.5 – 150	0
150 – 162.5	3
162.5 – 175	1
175 – 187.5	0
187.5 – 200	0
200 – 212.5	1
212.5 – 225	1
225 – 237.5	0
237.5 – 250	1
250 – 262.5	0
262.5 – 275	0
275 – 287.5	0
287.5 – 300	0
300 – 312.5	1
312.5 – 325	0
325 – 337.5	0
337.5 – 350	0
350 – 362.5	0
362.5 – 375	0
375 – 387.5	0
387.5 – 400	0
400 – 412.5	0
412.5 – 425	0
425 – 437.5	0
437.5 – 450	0
450 – 462.5	0
462.5 – 475	0
475 – 487.5	0
487.5 – 500	5

city · Geneva leads all cities, with Budingen close behind (320 vs. 264 records); check whether these counts reflect genuine concentration or a data-recording artefact given the high null rate.

Top values for city (20 unique shown, of 906 total).
value	count	share
Geneva	320	2.9%
Budingen	264	2.4%
Bruges	121	1.1%
Munich	80	0.7%
Augsburg	75	0.7%
Vesoul	64	0.6%
Venice	63	0.6%
Boudry	59	0.5%
Valangin	58	0.5%
Fleurier	54	0.5%
Gelnhausen	50	0.5%
Arnsberg	49	0.4%
Thielle-Wavre	47	0.4%
Burghausen	44	0.4%
Grundau	44	0.4%
Neuchatel	42	0.4%
Colombier	42	0.4%
Stuttgart	39	0.4%
Mitterfels	38	0.3%
Namur	37	0.3%

tried · Most trials involved a single accused, but a handful of records at or near 500 reveal the rare mass trials that skew the overall statistics.

Histogram bins for tried (median: 1.0).
bin	count
1 – 13.47	10493
13.47 – 25.95	196
25.95 – 38.42	88
38.42 – 50.9	33
50.9 – 63.38	21
63.38 – 75.85	14
75.85 – 88.33	9
88.33 – 100.8	24
100.8 – 113.3	6
113.3 – 125.8	9
125.8 – 138.2	5
138.2 – 150.7	6
150.7 – 163.2	8
163.2 – 175.7	3
175.7 – 188.1	5
188.1 – 200.6	1
200.6 – 213.1	0
213.1 – 225.5	3
225.5 – 238	1
238 – 250.5	1
250.5 – 263	1
263 – 275.4	0
275.4 – 287.9	0
287.9 – 300.4	0
300.4 – 312.9	1
312.9 – 325.3	0
325.3 – 337.8	1
337.8 – 350.3	0
350.3 – 362.8	3
362.8 – 375.2	0
375.2 – 387.7	0
387.7 – 400.2	0
400.2 – 412.7	3
412.7 – 425.1	0
425.1 – 437.6	0
437.6 – 450.1	0
450.1 – 462.6	0
462.6 – 475.1	0
475.1 – 487.5	0
487.5 – 500	5

Schema

6 columns

Per-column summary.
column	type	null rate	unique	alerts
year	numeric	8.5%	430	outliers
decade	numeric	0.0%	53	outliers
city	categorical	47.7%	906	null_rate
country	categorical	0.0%	19
tried	numeric	0.0%	111	high_skew outliers
deaths	numeric	0.0%	74	high_skew outliers

year

numeric timestamp outliers

This column records the year of each trial, spanning 1300 to 1850. The distribution is strongly left-skewed (skew = -1.59) with an unusually heavy left tail: 50% of non-null records fall between 1597 and 1660 (IQR = 63 years), while the minimum reaches back to 1300. The profiler flags 714 outliers (7.1% of non-null rows), apparently most of them at the early extreme. The high kurtosis (4.32) indicates a sharp central peak around the median of 1630 with fat tails, and the 8.5% null rate warrants attention before temporal analysis. Treatment: Treat as an ordinal/temporal feature; review records before about 1500 to confirm they are genuine early trials rather than data-quality issues before binning into periods for modelling. high · anthropic:default

n: 10,940
nulls: 931 (8.5%)
unique: 430
min: 1300
max: 1850
mean: 1621
median: 1630
std: 66.25
q1: 1597
q3: 1660
iqr: 63
skew: -1.586
kurtosis: 4.319
n_outliers: 714
outlier_rate: 0.07134
zero_rate: 0
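
To make the outlier figures above easier to interpret, the sketch below computes outlier fences from the reported quartiles and reconciles the outlier rate with the row counts. It assumes the profiler uses the common 1.5 × IQR (Tukey) rule, which this report does not confirm; `tukey_fences` is an illustrative helper, not part of the profiling tool.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Lower and upper outlier fences under the k * IQR rule (assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the year statistics above.
low, high = tukey_fences(1597, 1660)
print(low, high)  # 1502.5 1754.5

# The reported outlier_rate matches a denominator of non-null rows.
n_rows, n_nulls, n_outliers = 10_940, 931, 714
rate_non_null = n_outliers / (n_rows - n_nulls)
rate_all_rows = n_outliers / n_rows
print(round(rate_non_null, 5))  # 0.07134
print(round(rate_all_rows, 5))  # 0.06527
```

If the assumption holds, records before about 1503 or after about 1754 are the ones being flagged, and the 7.1% figure is a share of the 10,009 rows that have a year, not of all 10,940 rows.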

decade

numeric feature outliers

This column records the decade in which each trial took place, spanning 1300 to 1850, with only 53 distinct values across 10,940 rows and no nulls. The distribution is heavily left-skewed (skew = -1.48) with high kurtosis (3.85): the middle 50% of records fall in the 1590 to 1650 range (IQR = 60 years), while a long tail stretches back to 1300. Notably, 848 rows (≈7.75%) are flagged as outliers, most likely early-period records far from the central mass near the median of 1620. Treatment: Treat as an ordinal/temporal feature; consider binning into broader periods or applying a robust scaler given the heavy left tail and outlier concentration. high · anthropic:default

n: 10,940
nulls: 0 (0.0%)
unique: 53
min: 1300
max: 1850
mean: 1615
median: 1620
std: 66.68
q1: 1590
q3: 1650
iqr: 60
skew: -1.478
kurtosis: 3.85
n_outliers: 848
outlier_rate: 0.07751
zero_rate: 0

city

categorical feature null_rate

This column contains city names, presumably the place where each trial was held. The null rate of 47.65% is a significant concern: nearly half of all 10,940 rows are missing a city value, which triggers the null_rate alert. With 906 unique cities and an entropy ratio of 0.856, the distribution is fairly broad; Geneva is the most frequent value with 320 occurrences (5.59% of non-null rows, 2.9% of all rows). The top cities (Geneva, Budingen, Bruges, Munich, Augsburg, Vesoul, Venice) are consistent with the dataset's European scope. Treatment: Impute or flag nulls (47.65% missing); encode as categorical feature, potentially grouping rare cities below a frequency threshold given 906 unique values. high · anthropic:default

n: 10,940
nulls: 5,213 (47.7%)
unique: 906
top_value: Geneva
top_rate: 0.05588
cardinality: 906
entropy: 8.406
entropy_ratio: 0.8557
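
The city table reports Geneva's share as 2.9% while `top_rate` is 0.05588; the two differ only in the denominator. A small check, using only counts from this report (variable names are illustrative):

```python
n_rows = 10_940   # total rows
n_nulls = 5_213   # rows with no city value
geneva = 320      # rows where city is Geneva

non_null = n_rows - n_nulls          # rows that have a city
share_of_all = geneva / n_rows       # denominator used in the top-values table
share_of_non_null = geneva / non_null  # denominator behind top_rate

print(f"{share_of_all:.1%} of all rows, {share_of_non_null:.2%} of non-null rows")
# 2.9% of all rows, 5.59% of non-null rows
```

When comparing cities against countries (which have no nulls), it is worth stating which denominator is in use.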

country

categorical feature

This column records the country in which each trial took place, covering 19 distinct countries across 10,940 rows with no nulls. The distribution is heavily concentrated in Western Europe, with the United Kingdom (3,750 rows, 34.3%) and Germany (3,417 rows, 31.2%) together accounting for roughly two-thirds of all records. Switzerland (1,272) and France (807) are the next largest groups, while the remaining 15 countries collectively make up about 15.5% of rows; Spain, for example, appears only 29 times. The entropy ratio of 0.59 reflects this moderate-to-high imbalance, which could bias any model trained on country as a feature. Treatment: One-hot encode or target-encode; consider grouping low-frequency countries (e.g., Spain with 29 rows) into an 'Other' bucket to reduce sparsity. high · anthropic:default

n: 10,940
nulls: 0 (0.0%)
unique: 19
top_value: United Kingdom
top_rate: 0.3428
cardinality: 19
entropy: 2.502
entropy_ratio: 0.5889
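
The entropy figures above can be reproduced from the country counts listed earlier in this report. The sketch assumes the profiler reports Shannon entropy in bits and normalises it by log2 of the cardinality; that convention is inferred from the numbers, not documented here, and `shannon_entropy_bits` is an illustrative helper.

```python
import math

# Counts copied from the country top-values table.
counts = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26,
    "Norway": 20, "Luxembourg": 20, "Estonia": 17, "Finland": 17,
    "Austria": 16, "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}


def shannon_entropy_bits(values) -> float:
    """Shannon entropy (base 2) of a collection of category counts."""
    values = [v for v in values if v > 0]
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values)


entropy = shannon_entropy_bits(counts.values())
# Assumed normalisation: divide by the maximum entropy for 19 categories.
entropy_ratio = entropy / math.log2(len(counts))
print(round(entropy, 3), round(entropy_ratio, 3))
```

A ratio of 1.0 would mean all 19 countries are equally represented; values well below 1.0 indicate that a few countries hold most of the rows.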

tried

numeric feature high_skew outliers

This column records the number of people tried in each record; values are integers starting at 1. The distribution is extremely concentrated: Q1, median, and Q3 are all 1, yet the mean is about 3.95 and the max is 500, indicating that a small fraction of records drives nearly all the variance. Because the IQR is 0, an IQR-based rule presumably flags every record with more than one person tried, which would explain the 22.5% outlier rate; with a kurtosis of 316, the tail is extraordinarily heavy, and most records involve exactly one accused. Treatment: Log-transform (log1p) before modelling, or cap at a meaningful percentile threshold to reduce outlier influence; consider binning into ordinal buckets (1, 2–5, 6+). high · anthropic:default

n: 10,940
nulls: 0 (0.0%)
unique: 111
min: 1
max: 500
mean: 3.952
median: 1
std: 19.26
q1: 1
q3: 1
iqr: 0
skew: 15.61
kurtosis: 316
n_outliers: 2,457
outlier_rate: 0.2246
zero_rate: 0
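
The two treatments suggested for this column, a log1p transform and ordinal buckets (1, 2–5, 6+), can be sketched as follows. The sample values are illustrative and are not rows taken from the dataset; `tried_bucket` is a hypothetical helper name.

```python
import math


def tried_bucket(n: int) -> str:
    """Map a 'tried' count to the ordinal buckets 1, 2-5, 6+."""
    if n <= 1:
        return "1"
    if n <= 5:
        return "2-5"
    return "6+"


# Illustrative values spanning the reported range (min 1, max 500).
samples = [1, 1, 2, 5, 6, 40, 500]
for n in samples:
    # log1p(n) = ln(1 + n) compresses the long right tail.
    print(n, round(math.log1p(n), 3), tried_bucket(n))
```

The log transform keeps the ordering of values while shrinking the gap between a single-person trial and a 500-person one; the buckets discard magnitude within the tail, which may or may not be acceptable depending on the analysis.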

deaths

numeric numeric_target high_skew outliers

This column records the number of deaths per trial record, with 75.5% of rows being exactly zero and a median of 0. The distribution is extraordinarily right-skewed (skew = 28.52, kurtosis = 991.6), driven by rare but extreme values reaching a maximum of 500. The 24.5% of rows flagged as outliers matches the share of non-zero rows (1 - 0.7547 = 0.2453): with an IQR of 0, an IQR-based rule flags every record with at least one death, so the outlier alert here reflects zero inflation rather than unusual values. Only 74 unique values across 10,940 rows confirm the heavily zero-inflated, discrete nature of the data. Treatment: Model with zero-inflated or negative-binomial regression; apply log1p transform if used as a feature in standard ML pipelines. high · anthropic:default

n: 10,940
nulls: 0 (0.0%)
unique: 74
min: 0
max: 500
mean: 1.493
median: 0
std: 13.19
q1: 0
q3: 0
iqr: 0
skew: 28.52
kurtosis: 991.6
n_outliers: 2,684
outlier_rate: 0.2453
zero_rate: 0.7547
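
The outlier count for deaths can be reconciled with the zero rate using only the statistics above. The sketch assumes the profiler applies a 1.5 × IQR rule, which this report does not state; with Q1 = Q3 = 0 both fences collapse to zero, so every non-zero row falls outside them.

```python
# Statistics copied from the deaths column above.
n_rows = 10_940
zero_rate = 0.7547
n_outliers = 2_684
q1 = q3 = 0

# Assumed rule: fences at q1 - 1.5 * IQR and q3 + 1.5 * IQR.
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Rows with at least one death, estimated from the rounded zero rate.
non_zero_rows = round(n_rows * (1 - zero_rate))
print(lower, upper, non_zero_rows)  # 0.0 0.0 2684
```

The estimated number of non-zero rows equals `n_outliers`, which supports reading the outlier flag on this column as "at least one death" rather than as a sign of anomalous records.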