data trove witch trials
Reading
This dataset records historical witch trials across Europe: 10,940 cases with information on location, time period, and outcomes (people tried and deaths). Three things stand out immediately. First, both 'deaths' and 'tried' are extremely skewed: 75.5% of records show zero deaths and at least three-quarters involve just one person tried (Q3 = 1), yet outliers reach as high as 500, suggesting that a small number of mass trials account for a disproportionate share of the deaths. Second, half of the dated records fall between 1597 and 1660 (the IQR of 'year'), consistent with the well-known peak persecution era, with a long tail back to 1300 worth examining. Third, the United Kingdom and Germany together account for just under two-thirds of all records (65.5%), while Geneva is the most frequent city even though nearly half of city values are missing.
citing: deaths.stats.median · deaths.stats.mean · deaths.stats.max · deaths.stats.zero_rate · deaths.stats.n_outliers · tried.stats.median · tried.stats.max · tried.stats.outlier_rate · year.stats.q1 · year.stats.q3 · year.stats.median · year.stats.min · country.top_values · city.top_values · city.null_rate
Charts the summary said to look at first
Histogram: trial date (1300–1850)
| bin | count |
|---|---|
| 1300 – 1314 | 32 |
| 1314 – 1328 | 21 |
| 1328 – 1341 | 33 |
| 1341 – 1355 | 4 |
| 1355 – 1369 | 3 |
| 1369 – 1382 | 32 |
| 1382 – 1396 | 25 |
| 1396 – 1410 | 32 |
| 1410 – 1424 | 55 |
| 1424 – 1438 | 52 |
| 1438 – 1451 | 147 |
| 1451 – 1465 | 65 |
| 1465 – 1479 | 68 |
| 1479 – 1492 | 188 |
| 1492 – 1506 | 39 |
| 1506 – 1520 | 43 |
| 1520 – 1534 | 137 |
| 1534 – 1548 | 106 |
| 1548 – 1561 | 420 |
| 1561 – 1575 | 291 |
| 1575 – 1589 | 373 |
| 1589 – 1602 | 1576 |
| 1602 – 1616 | 878 |
| 1616 – 1630 | 934 |
| 1630 – 1644 | 1835 |
| 1644 – 1658 | 872 |
| 1658 – 1671 | 1469 |
| 1671 – 1685 | 260 |
| 1685 – 1699 | 295 |
| 1699 – 1712 | 302 |
| 1712 – 1726 | 105 |
| 1726 – 1740 | 68 |
| 1740 – 1754 | 151 |
| 1754 – 1768 | 8 |
| 1768 – 1781 | 14 |
| 1781 – 1795 | 4 |
| 1795 – 1809 | 0 |
| 1809 – 1822 | 1 |
| 1822 – 1836 | 1 |
| 1836 – 1850 | 1 |
Top values: country
| value | count | share |
|---|---|---|
| United Kingdom | 3750 | 34.3% |
| Germany | 3417 | 31.2% |
| Switzerland | 1272 | 11.6% |
| France | 807 | 7.4% |
| Belgium | 671 | 6.1% |
| Sweden | 353 | 3.2% |
| Netherlands | 314 | 2.9% |
| Italy | 107 | 1.0% |
| Denmark | 90 | 0.8% |
| Spain | 29 | 0.3% |
| Hungary | 26 | 0.2% |
| Norway | 20 | 0.2% |
| Luxembourg | 20 | 0.2% |
| Estonia | 17 | 0.2% |
| Finland | 17 | 0.2% |
| Austria | 16 | 0.1% |
| Poland | 9 | 0.1% |
| Ireland | 4 | 0.0% |
| Czech Republic | 1 | 0.0% |
Histogram: deaths
| bin | count |
|---|---|
| 0 – 12.5 | 10727 |
| 12.5 – 25 | 100 |
| 25 – 37.5 | 43 |
| 37.5 – 50 | 20 |
| 50 – 62.5 | 13 |
| 62.5 – 75 | 15 |
| 75 – 87.5 | 2 |
| 87.5 – 100 | 1 |
| 100 – 112.5 | 4 |
| 112.5 – 125 | 2 |
| 125 – 137.5 | 0 |
| 137.5 – 150 | 0 |
| 150 – 162.5 | 3 |
| 162.5 – 175 | 1 |
| 175 – 187.5 | 0 |
| 187.5 – 200 | 0 |
| 200 – 212.5 | 1 |
| 212.5 – 225 | 1 |
| 225 – 237.5 | 0 |
| 237.5 – 250 | 1 |
| 250 – 262.5 | 0 |
| 262.5 – 275 | 0 |
| 275 – 287.5 | 0 |
| 287.5 – 300 | 0 |
| 300 – 312.5 | 1 |
| 312.5 – 325 | 0 |
| 325 – 337.5 | 0 |
| 337.5 – 350 | 0 |
| 350 – 362.5 | 0 |
| 362.5 – 375 | 0 |
| 375 – 387.5 | 0 |
| 387.5 – 400 | 0 |
| 400 – 412.5 | 0 |
| 412.5 – 425 | 0 |
| 425 – 437.5 | 0 |
| 437.5 – 450 | 0 |
| 450 – 462.5 | 0 |
| 462.5 – 475 | 0 |
| 475 – 487.5 | 0 |
| 487.5 – 500 | 5 |
Top values: city (top 20; share is of all rows, including nulls)
| value | count | share |
|---|---|---|
| Geneva | 320 | 2.9% |
| Budingen | 264 | 2.4% |
| Bruges | 121 | 1.1% |
| Munich | 80 | 0.7% |
| Augsburg | 75 | 0.7% |
| Vesoul | 64 | 0.6% |
| Venice | 63 | 0.6% |
| Boudry | 59 | 0.5% |
| Valangin | 58 | 0.5% |
| Fleurier | 54 | 0.5% |
| Gelnhausen | 50 | 0.5% |
| Arnsberg | 49 | 0.4% |
| Thielle-Wavre | 47 | 0.4% |
| Burghausen | 44 | 0.4% |
| Grundau | 44 | 0.4% |
| Neuchatel | 42 | 0.4% |
| Colombier | 42 | 0.4% |
| Stuttgart | 39 | 0.4% |
| Mitterfels | 38 | 0.3% |
| Namur | 37 | 0.3% |
Histogram: tried
| bin | count |
|---|---|
| 1 – 13.47 | 10493 |
| 13.47 – 25.95 | 196 |
| 25.95 – 38.42 | 88 |
| 38.42 – 50.9 | 33 |
| 50.9 – 63.38 | 21 |
| 63.38 – 75.85 | 14 |
| 75.85 – 88.33 | 9 |
| 88.33 – 100.8 | 24 |
| 100.8 – 113.3 | 6 |
| 113.3 – 125.8 | 9 |
| 125.8 – 138.2 | 5 |
| 138.2 – 150.7 | 6 |
| 150.7 – 163.2 | 8 |
| 163.2 – 175.7 | 3 |
| 175.7 – 188.1 | 5 |
| 188.1 – 200.6 | 1 |
| 200.6 – 213.1 | 0 |
| 213.1 – 225.5 | 3 |
| 225.5 – 238 | 1 |
| 238 – 250.5 | 1 |
| 250.5 – 263 | 1 |
| 263 – 275.4 | 0 |
| 275.4 – 287.9 | 0 |
| 287.9 – 300.4 | 0 |
| 300.4 – 312.9 | 1 |
| 312.9 – 325.3 | 0 |
| 325.3 – 337.8 | 1 |
| 337.8 – 350.3 | 0 |
| 350.3 – 362.8 | 3 |
| 362.8 – 375.2 | 0 |
| 375.2 – 387.7 | 0 |
| 387.7 – 400.2 | 0 |
| 400.2 – 412.7 | 3 |
| 412.7 – 425.1 | 0 |
| 425.1 – 437.6 | 0 |
| 437.6 – 450.1 | 0 |
| 450.1 – 462.6 | 0 |
| 462.6 – 475.1 | 0 |
| 475.1 – 487.5 | 0 |
| 487.5 – 500 | 5 |
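The bin edges in the histogram tables above are consistent with 40 equal-width bins between each column's minimum and maximum. A minimal sketch of that binning, assuming this is how the profiler builds its histograms (the function name `equal_width_edges` is hypothetical, not part of the report):

```python
def equal_width_edges(lo, hi, n_bins=40):
    """Return n_bins + 1 equally spaced edges from lo to hi (inclusive)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Minimum and maximum values are taken from the per-column statistics.
tried_edges = equal_width_edges(1, 500)       # tried: min 1, max 500
deaths_edges = equal_width_edges(0, 500)      # deaths: min 0, max 500
date_edges = equal_width_edges(1300, 1850)    # trial dates: 1300 to 1850

# First interior edge of each histogram (the tables show rounded values).
print(tried_edges[1], deaths_edges[1], date_edges[1])
```

With bins roughly 12.5 units wide, the first bin absorbs almost every row (10,727 of 10,940 for deaths), so a log-scaled axis or hand-picked bins would probably show the bulk of the data better.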
Schema
6 columns
| Column | Type | Null rate | Unique | Alerts |
|---|---|---|---|---|
| year | numeric | 8.5% | 430 | outliers |
| decade | numeric | 0.0% | 53 | outliers |
| city | categorical | 47.7% | 906 | null_rate |
| country | categorical | 0.0% | 19 | |
| tried | numeric | 0.0% | 111 | high_skew, outliers |
| deaths | numeric | 0.0% | 74 | high_skew, outliers |
year
numeric timestamp outliers
This column gives the year of each trial, spanning 1300 to 1850. The distribution is strongly left-skewed (skew = -1.59) with a heavy left tail: 50% of non-null records fall between 1597 and 1660 (IQR = 63 years), while the minimum reaches back to 1300. The profiler flags 714 outliers (7.1% of the 10,009 non-null rows), mostly at the early extreme. The high kurtosis (4.32) indicates a sharp central peak around the median of 1630 with fat tails, and the 8.5% null rate warrants attention before temporal analysis. Treatment: Treat as an ordinal/temporal feature; check records before ~1500 for data-quality problems before binning into periods for modelling, bearing in mind that they may be genuine early trials.
- n: 10,940
- nulls: 931 (8.5%)
- unique: 430
- min: 1300
- max: 1850
- mean: 1621
- median: 1630
- std: 66.25
- q1: 1597
- q3: 1660
- iqr: 63
- skew: -1.586
- kurtosis: 4.319
- n_outliers: 714
- outlier_rate: 0.07134
- zero_rate: 0
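The suggestion to look at records before about 1500 lines up with the common 1.5 × IQR (Tukey) outlier rule. The sketch below works through that arithmetic from the reported quartiles; it assumes the profiler uses this rule, which the report does not state, and `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for 'year': q1 = 1597, q3 = 1660 (IQR = 63).
year_low, year_high = tukey_fences(1597, 1660)
print(year_low, year_high)  # 1502.5 1754.5
```

Under this assumption, any trial dated before 1503 or after 1754 would be flagged, which matches the "~1500" threshold mentioned in the treatment note.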
decade
numeric feature outliers
This column gives the decade of each trial (53 distinct values from 1300 to 1850) and, unlike 'year', has no nulls. The distribution is heavily left-skewed (skew = −1.48) with high kurtosis (3.85): half of the records fall between 1590 and 1650 (IQR = 60 years), while a long tail stretches back to 1300. 848 rows (7.75%) are flagged as outliers, mostly early-period records far from the central mass near the median of 1620. Treatment: Treat as an ordinal/temporal feature; consider binning into broader periods or applying a robust scaler given the heavy left tail and outlier concentration.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 53
- min: 1300
- max: 1850
- mean: 1615
- median: 1620
- std: 66.68
- q1: 1590
- q3: 1650
- iqr: 60
- skew: -1.478
- kurtosis: 3.85
- n_outliers: 848
- outlier_rate: 0.07751
- zero_rate: 0
city
categorical feature null_rate
This column contains the name of the city where each trial took place. The null rate of 47.65% is a significant concern: nearly half of all 10,940 rows are missing a city value, which triggers an alert. With 906 unique cities and an entropy ratio of 0.856, the distribution is fairly broad; Geneva is the most frequent value with 320 occurrences (5.59% of non-null rows, 2.9% of all rows). The top cities (Geneva, Budingen, Bruges, Munich, Augsburg, Vesoul, Venice) are all in continental Europe. Treatment: Impute or flag nulls (47.65% missing); encode as a categorical feature, potentially grouping rare cities below a frequency threshold given the 906 unique values.
- n: 10,940
- nulls: 5,213 (47.7%)
- unique: 906
- top_value: Geneva
- top_rate: 0.05588
- cardinality: 906
- entropy: 8.406
- entropy_ratio: 0.8557
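The treatment for 'city' (flag missing values, pool rare cities) could be sketched as follows. This is an illustration only: the function name `encode_city`, the labels "Unknown" and "Other", and the `min_count` threshold are assumptions, and the sample list is toy data rather than rows from the dataset.

```python
from collections import Counter


def encode_city(values, min_count=50, missing="Unknown", other="Other"):
    """Flag missing cities and pool cities seen fewer than min_count times."""
    counts = Counter(v for v in values if v is not None)
    encoded = []
    for v in values:
        if v is None:
            encoded.append(missing)      # keep missingness as its own category
        elif counts[v] < min_count:
            encoded.append(other)        # pool low-frequency cities
        else:
            encoded.append(v)
    return encoded


# Toy sample, not the real data: 3 Geneva rows, 1 Namur row, 2 missing.
sample = ["Geneva", None, "Geneva", "Namur", None, "Geneva"]
print(encode_city(sample, min_count=2))
# ['Geneva', 'Unknown', 'Geneva', 'Other', 'Unknown', 'Geneva']
```

Keeping "Unknown" as an explicit category avoids inventing locations for the roughly 48% of rows where none was recorded.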
country
categorical feature
This column records the country in which each trial took place, covering 19 distinct countries across 10,940 rows with no nulls. The distribution is heavily concentrated in Western Europe: the United Kingdom (3,750 rows, 34.3%) and Germany (3,417 rows, 31.2%) together account for about 65.5% of all records. Switzerland (1,272) and France (807) are the next largest groups, while the remaining 15 countries together make up about 15.5% of rows; Spain, for example, appears only 29 times. The entropy ratio of 0.59 reflects this moderate-to-high imbalance, which could bias any model trained on country as a feature. Treatment: One-hot encode or target-encode; consider grouping low-frequency countries (e.g., Spain with 29 rows) into an 'Other' bucket to reduce sparsity.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 19
- top_value: United Kingdom
- top_rate: 0.3428
- cardinality: 19
- entropy: 2.502
- entropy_ratio: 0.5889
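The reported entropy ratio is consistent with Shannon entropy in bits divided by log2 of the cardinality (2.502 / log2(19) ≈ 0.589). A minimal sketch that recomputes both figures from the country counts in the top-values table, assuming that definition (the report does not spell it out):

```python
import math

# Counts copied from the country top-values table (they sum to 10,940).
country_counts = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26, "Norway": 20,
    "Luxembourg": 20, "Estonia": 17, "Finland": 17, "Austria": 16,
    "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}


def entropy_bits(counts):
    """Shannon entropy (base 2) of a collection of category counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)


h = entropy_bits(country_counts.values())
ratio = h / math.log2(len(country_counts))   # 1.0 would mean a uniform spread
print(round(h, 3), round(ratio, 4))
```

The result should land close to the reported entropy of 2.502 and ratio of 0.5889.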
tried
numeric feature high_skew outliers
This column records the number of people tried in each case; values are integers starting at 1. The distribution is extremely concentrated: Q1, median, and Q3 are all 1, yet the mean is ~3.95 and the max is 500, so a minority of records drives nearly all the variance. Because the IQR is 0, an IQR-based rule flags any row with more than one person tried, which is consistent with the 22.5% of rows flagged as outliers. A kurtosis of 316 confirms an extraordinarily heavy tail, and most cases involve exactly one defendant. Treatment: Log-transform (log1p) before modelling, or cap at a meaningful percentile threshold to reduce outlier influence; consider binning into ordinal buckets (1, 2–5, 6+).
- n: 10,940
- nulls: 0 (0.0%)
- unique: 111
- min: 1
- max: 500
- mean: 3.952
- median: 1
- std: 19.26
- q1: 1
- q3: 1
- iqr: 0
- skew: 15.61
- kurtosis: 316
- n_outliers: 2,457
- outlier_rate: 0.2246
- zero_rate: 0
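The two treatments suggested for 'tried' (a log1p transform and ordinal buckets of 1, 2–5, 6+) can be sketched with the standard library alone. The helper name `tried_bucket` and the bucket labels are illustrative choices, not part of the report.

```python
import math


def tried_bucket(n):
    """Map a count of people tried to the ordinal buckets 1, 2-5, 6+."""
    if n < 1:
        raise ValueError("tried is expected to be at least 1")
    if n == 1:
        return "1"
    if n <= 5:
        return "2-5"
    return "6+"


# A few illustrative counts, including the reported minimum (1) and maximum (500).
for n in (1, 3, 12, 500):
    print(n, round(math.log1p(n), 3), tried_bucket(n))
```

The log1p transform compresses the range from 1–500 to roughly 0.69–6.22, while the buckets discard magnitude within the tail entirely; which is preferable depends on whether the size of mass trials matters for the analysis.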
deaths
numeric numeric_target high_skew outliers
This column records the number of deaths per case. Most rows (75.5%) are exactly zero, and the median is 0. The distribution is extraordinarily right-skewed (skew = 28.52, kurtosis = 991.65), driven by rare extreme values reaching a maximum of 500. The 24.5% of rows flagged as outliers correspond to the rows with a non-zero death count (1 − 0.7547 = 0.2453), so the flag here marks a substantial minority of cases rather than rare anomalies. Having only 74 unique values across 10,940 rows is consistent with zero-inflated, discrete count data. Treatment: Model with zero-inflated or negative-binomial regression; apply log1p transform if used as a feature in standard ML pipelines.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 74
- min: 0
- max: 500
- mean: 1.493
- median: 0
- std: 13.19
- q1: 0
- q3: 0
- iqr: 0
- skew: 28.52
- kurtosis: 991.6
- n_outliers: 2,684
- outlier_rate: 0.2453
- zero_rate: 0.7547
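A few derived figures follow directly from the statistics above. The sketch below is plain arithmetic on the reported values; because the mean and zero rate are rounded to four significant figures, the results are approximate.

```python
# Reported statistics for the 'deaths' column.
n_rows = 10_940
mean_deaths = 1.493
zero_rate = 0.7547

# Rows with at least one death; this matches the reported n_outliers (2,684).
nonzero_rows = round(n_rows * (1 - zero_rate))

# Approximate total deaths across all records (mean times row count).
total_deaths = mean_deaths * n_rows

# Average deaths among only those records with at least one death.
mean_when_nonzero = total_deaths / nonzero_rows

print(nonzero_rows, round(total_deaths), round(mean_when_nonzero, 2))
# 2684 16333 6.09
```

So roughly 16,300 deaths are recorded in total, and a case with any deaths averages about six, far above the overall mean of 1.49.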