quirky witch trials
Reading
This dataset catalogs 10,940 historical witch trial records across 6 columns, covering when and where trials occurred, how many people were tried, and how many died. Trials span from 1300 to 1850, with the bulk concentrated in the early-to-mid 1600s (median year 1630), and they are dominated by the United Kingdom (3,750 records) and Germany (3,417), which together account for 65.5% of the data. The 'deaths' and 'tried' columns are extremely skewed: 75% of records report zero deaths, yet a handful of outlier records reach 500, so any aggregate analysis should treat these tails carefully. Also worth flagging: the 'city' field is 47.7% null and spans 906 unique values, so geographic analysis below the country level will be patchy.
citing: row_count · column_count · deaths.stats · tried.stats · year.stats · decade.stats · country.top_values · city.null_rate · city.n_unique
Charts the summary said to look at first
country: top values
| value | count | share |
|---|---|---|
| United Kingdom | 3750 | 34.3% |
| Germany | 3417 | 31.2% |
| Switzerland | 1272 | 11.6% |
| France | 807 | 7.4% |
| Belgium | 671 | 6.1% |
| Sweden | 353 | 3.2% |
| Netherlands | 314 | 2.9% |
| Italy | 107 | 1.0% |
| Denmark | 90 | 0.8% |
| Spain | 29 | 0.3% |
| Hungary | 26 | 0.2% |
| Norway | 20 | 0.2% |
| Luxembourg | 20 | 0.2% |
| Estonia | 17 | 0.2% |
| Finland | 17 | 0.2% |
| Austria | 16 | 0.1% |
| Poland | 9 | 0.1% |
| Ireland | 4 | 0.0% |
| Czech Republic | 1 | 0.0% |
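The share column appears to be each count divided by the 10,940 total rows. A minimal Python sketch of that arithmetic follows, using only counts copied from the table; the names `TOTAL_ROWS`, `counts`, and `share` are illustrative and not part of the report.

```python
# Counts copied from the country table above; all names here are illustrative.
TOTAL_ROWS = 10_940
counts = {"United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272}


def share(count, total=TOTAL_ROWS):
    """Return a count as a percentage of all rows, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {country: share(n) for country, n in counts.items()}
top_two = share(counts["United Kingdom"] + counts["Germany"])

print(shares)
print(top_two)
```

The combined figure for the two largest countries is what backs the "roughly two-thirds" reading in the summary.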
year: histogram
| bin | count |
|---|---|
| 1300 – 1314 | 13 |
| 1314 – 1328 | 39 |
| 1328 – 1341 | 29 |
| 1341 – 1355 | 8 |
| 1355 – 1369 | 2 |
| 1369 – 1382 | 16 |
| 1382 – 1396 | 29 |
| 1396 – 1410 | 36 |
| 1410 – 1424 | 31 |
| 1424 – 1438 | 49 |
| 1438 – 1451 | 78 |
| 1451 – 1465 | 85 |
| 1465 – 1479 | 86 |
| 1479 – 1492 | 120 |
| 1492 – 1506 | 68 |
| 1506 – 1520 | 40 |
| 1520 – 1534 | 77 |
| 1534 – 1548 | 101 |
| 1548 – 1561 | 145 |
| 1561 – 1575 | 338 |
| 1575 – 1589 | 405 |
| 1589 – 1602 | 1075 |
| 1602 – 1616 | 971 |
| 1616 – 1630 | 1042 |
| 1630 – 1644 | 936 |
| 1644 – 1658 | 1425 |
| 1658 – 1671 | 1372 |
| 1671 – 1685 | 415 |
| 1685 – 1699 | 307 |
| 1699 – 1712 | 276 |
| 1712 – 1726 | 139 |
| 1726 – 1740 | 90 |
| 1740 – 1754 | 128 |
| 1754 – 1768 | 21 |
| 1768 – 1781 | 9 |
| 1781 – 1795 | 5 |
| 1795 – 1809 | 0 |
| 1809 – 1822 | 0 |
| 1822 – 1836 | 2 |
| 1836 – 1850 | 1 |
deaths: histogram
| bin | count |
|---|---|
| 0 – 12.5 | 10727 |
| 12.5 – 25 | 100 |
| 25 – 37.5 | 43 |
| 37.5 – 50 | 20 |
| 50 – 62.5 | 13 |
| 62.5 – 75 | 15 |
| 75 – 87.5 | 2 |
| 87.5 – 100 | 1 |
| 100 – 112.5 | 4 |
| 112.5 – 125 | 2 |
| 125 – 137.5 | 0 |
| 137.5 – 150 | 0 |
| 150 – 162.5 | 3 |
| 162.5 – 175 | 1 |
| 175 – 187.5 | 0 |
| 187.5 – 200 | 0 |
| 200 – 212.5 | 1 |
| 212.5 – 225 | 1 |
| 225 – 237.5 | 0 |
| 237.5 – 250 | 1 |
| 250 – 262.5 | 0 |
| 262.5 – 275 | 0 |
| 275 – 287.5 | 0 |
| 287.5 – 300 | 0 |
| 300 – 312.5 | 1 |
| 312.5 – 325 | 0 |
| 325 – 337.5 | 0 |
| 337.5 – 350 | 0 |
| 350 – 362.5 | 0 |
| 362.5 – 375 | 0 |
| 375 – 387.5 | 0 |
| 387.5 – 400 | 0 |
| 400 – 412.5 | 0 |
| 412.5 – 425 | 0 |
| 425 – 437.5 | 0 |
| 437.5 – 450 | 0 |
| 450 – 462.5 | 0 |
| 462.5 – 475 | 0 |
| 475 – 487.5 | 0 |
| 487.5 – 500 | 5 |
tried: histogram
| bin | count |
|---|---|
| 1 – 13.47 | 10493 |
| 13.47 – 25.95 | 196 |
| 25.95 – 38.42 | 88 |
| 38.42 – 50.9 | 33 |
| 50.9 – 63.38 | 21 |
| 63.38 – 75.85 | 14 |
| 75.85 – 88.33 | 9 |
| 88.33 – 100.8 | 24 |
| 100.8 – 113.3 | 6 |
| 113.3 – 125.8 | 9 |
| 125.8 – 138.2 | 5 |
| 138.2 – 150.7 | 6 |
| 150.7 – 163.2 | 8 |
| 163.2 – 175.7 | 3 |
| 175.7 – 188.1 | 5 |
| 188.1 – 200.6 | 1 |
| 200.6 – 213.1 | 0 |
| 213.1 – 225.5 | 3 |
| 225.5 – 238 | 1 |
| 238 – 250.5 | 1 |
| 250.5 – 263 | 1 |
| 263 – 275.4 | 0 |
| 275.4 – 287.9 | 0 |
| 287.9 – 300.4 | 0 |
| 300.4 – 312.9 | 1 |
| 312.9 – 325.3 | 0 |
| 325.3 – 337.8 | 1 |
| 337.8 – 350.3 | 0 |
| 350.3 – 362.8 | 3 |
| 362.8 – 375.2 | 0 |
| 375.2 – 387.7 | 0 |
| 387.7 – 400.2 | 0 |
| 400.2 – 412.7 | 3 |
| 412.7 – 425.1 | 0 |
| 425.1 – 437.6 | 0 |
| 437.6 – 450.1 | 0 |
| 450.1 – 462.6 | 0 |
| 462.6 – 475.1 | 0 |
| 475.1 – 487.5 | 0 |
| 487.5 – 500 | 5 |
decade: histogram
| bin | count |
|---|---|
| 1300 – 1314 | 32 |
| 1314 – 1328 | 21 |
| 1328 – 1341 | 33 |
| 1341 – 1355 | 4 |
| 1355 – 1369 | 3 |
| 1369 – 1382 | 32 |
| 1382 – 1396 | 25 |
| 1396 – 1410 | 32 |
| 1410 – 1424 | 55 |
| 1424 – 1438 | 52 |
| 1438 – 1451 | 147 |
| 1451 – 1465 | 65 |
| 1465 – 1479 | 68 |
| 1479 – 1492 | 188 |
| 1492 – 1506 | 39 |
| 1506 – 1520 | 43 |
| 1520 – 1534 | 137 |
| 1534 – 1548 | 106 |
| 1548 – 1561 | 420 |
| 1561 – 1575 | 291 |
| 1575 – 1589 | 373 |
| 1589 – 1602 | 1576 |
| 1602 – 1616 | 878 |
| 1616 – 1630 | 934 |
| 1630 – 1644 | 1835 |
| 1644 – 1658 | 872 |
| 1658 – 1671 | 1469 |
| 1671 – 1685 | 260 |
| 1685 – 1699 | 295 |
| 1699 – 1712 | 302 |
| 1712 – 1726 | 105 |
| 1726 – 1740 | 68 |
| 1740 – 1754 | 151 |
| 1754 – 1768 | 8 |
| 1768 – 1781 | 14 |
| 1781 – 1795 | 4 |
| 1795 – 1809 | 0 |
| 1809 – 1822 | 1 |
| 1822 – 1836 | 1 |
| 1836 – 1850 | 1 |
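The histogram tables above each have 40 rows, and their printed edges are consistent with 40 equal-width bins between the column's min and max. That is an inference from the edges, not something the report states. Under that assumption, a short Python sketch of the binning (function names are hypothetical):

```python
def equal_width_edges(lo, hi, n_bins=40):
    """Return n_bins + 1 equally spaced bin edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(value, lo, hi, n_bins=40):
    """Index of the bin holding value; the top edge falls in the last bin."""
    if not lo <= value <= hi:
        raise ValueError("value outside histogram range")
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)


year_edges = equal_width_edges(1300, 1850)   # bin width 13.75
deaths_edges = equal_width_edges(0, 500)     # bin width 12.5
tried_edges = equal_width_edges(1, 500)      # bin width 12.475

print(year_edges[:3])
print(bin_index(1630, 1300, 1850))
```

With bins this wide relative to the data, almost all 'deaths' and 'tried' records land in the first bin, which is why the tables show one very large count followed by a sparse tail.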
Schema
6 columns

| Column | Type | Null rate | Unique | Alerts |
|---|---|---|---|---|
| year | numeric | 8.5% | 430 | outliers |
| decade | numeric | 0.0% | 53 | outliers |
| city | categorical | 47.7% | 906 | null_rate |
| country | categorical | 0.0% | 19 | |
| tried | numeric | 0.0% | 111 | high_skew, outliers |
| deaths | numeric | 0.0% | 74 | high_skew, outliers |
year
numeric · timestamp · outliers
The 'year' field records the year in which each trial took place, spanning 1300-1850 with a median of 1630. The distribution is left-skewed (skew -1.59, kurtosis 4.32): 714 of the 10,009 non-null values (7.13%) fall outside the Tukey fences (below 1502.5 or above 1754.5), most of them in the pre-1500 tail, and 8.51% of rows are null. The tight IQR of 63 years (1597-1660) shows that the records concentrate in the late 16th and 17th centuries. Treatment: impute or bucket into era bins before modelling; consider winsorising the pre-1500 tail.
- n: 10,940
- nulls: 931 (8.5%)
- unique: 430
- min: 1300
- max: 1850
- mean: 1621
- median: 1630
- std: 66.25
- q1: 1597
- q3: 1660
- iqr: 63
- skew: -1.586
- kurtosis: 4.319
- n_outliers: 714
- outlier_rate: 0.07134
- zero_rate: 0
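The outlier count can be checked against the quartiles. The sketch below assumes the flag uses the usual Tukey rule with a multiplier of 1.5, and that the outlier rate is taken over non-null rows; both are inferences from the reported numbers, and `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and counts copied from the 'year' stats above.
year_low, year_high = tukey_fences(1597, 1660)

non_null = 10_940 - 931          # rows with a usable year
outlier_rate = 714 / non_null    # matches the reported 0.07134 only with this denominator

print(year_low, year_high)
print(round(outlier_rate, 5))
```

Dividing 714 by all 10,940 rows would give about 0.065 instead, which suggests the profile excludes nulls before computing the rate.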
decade
numeric · timestamp · outliers
This is a 'decade' field expressed as a four-digit year, ranging from 1300 to 1850 with a median of 1620 and a tight IQR of 60 years (Q1 1590, Q3 1650). The distribution is strongly left-skewed (skew -1.48, kurtosis 3.85), and 848 rows (7.75%) fall outside the Tukey fences (below 1500 or above 1740), mostly pre-1500 records in the long left tail. With only 53 unique values across 10,940 rows and no nulls, it behaves as a coarse temporal bin rather than a continuous measurement. Treatment: treat as an ordered categorical decade bin; consider grouping the pre-1500 tail before modelling.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 53
- min: 1300
- max: 1850
- mean: 1615
- median: 1620
- std: 66.68
- q1: 1590
- q3: 1650
- iqr: 60
- skew: -1.478
- kurtosis: 3.85
- n_outliers: 848
- outlier_rate: 0.07751
- zero_rate: 0
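One way to follow the suggested treatment is to label each decade, collapse everything before 1500 into a single bin, and keep the labels in chronological order. The sketch below is an illustration under assumptions: the helper names, the `pre-1500` label, and the sample values are made up, and flooring a year to its decade is only a guess at how the column was built. Note that 'decade' has no nulls while 'year' has 931, so it cannot have been derived from 'year' alone.

```python
def decade_of(year):
    """Floor a year to the start of its decade, e.g. 1647 -> 1640."""
    return (year // 10) * 10


def decade_label(decade, cutoff=1500):
    """Label a decade, collapsing everything before cutoff into one bin."""
    return f"pre-{cutoff}" if decade < cutoff else f"{decade}s"


def ordered_levels(decades, cutoff=1500):
    """Chronologically ordered category labels, with the early bin first."""
    levels = []
    for decade in sorted(set(decades)):
        label = decade_label(decade, cutoff)
        if label not in levels:
            levels.append(label)
    return levels


sample_decades = [1640, 1430, 1590, 1300, 1640]  # made-up values
levels = ordered_levels(sample_decades)
print(levels)
```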
city
categorical · feature · null_rate
This is a city-name categorical feature with 906 distinct values across 10,940 rows, dominated by European locations (Geneva, Budingen, Bruges, Munich). Nearly half the rows are null (47.65%), and even the top value, Geneva, covers only 5.59% of non-null rows, giving high entropy (ratio 0.856) and a long tail. The null rate combined with high cardinality is the main concern for downstream use. Treatment: impute or flag missingness, then group rare cities into an 'other' bucket before encoding.
- n: 10,940
- nulls: 5,213 (47.7%)
- unique: 906
- top_value: Geneva
- top_rate: 0.05588
- cardinality: 906
- entropy: 8.406
- entropy_ratio: 0.8557
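The entropy ratio is consistent with Shannon entropy in bits divided by the maximum possible entropy for the given cardinality, log2(906). The base is inferred from the reported numbers rather than stated by the report, and the function names below are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy of a sequence of labels, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(entropy_bits, cardinality):
    """Entropy as a fraction of the uniform-distribution maximum."""
    if cardinality < 2:
        return 0.0
    return entropy_bits / math.log2(cardinality)


# Entropy and cardinality copied from the 'city' stats above.
city_ratio = entropy_ratio(8.406, 906)
print(round(city_ratio, 4))
```

A ratio of 1.0 would mean every city appears equally often; values near 0 mean a few cities dominate.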
country
categorical · feature
The country in which the trial took place, as a categorical label with 19 distinct values across 10,940 rows and no nulls. All 19 are European countries, and the distribution is concentrated: United Kingdom accounts for 34.3% (3,750), Germany for 31.2% (3,417), and Switzerland for 11.6% (1,272), while the long tail (Denmark, Spain, Poland, Ireland) drops to double or even single digits. An entropy ratio of 0.59 confirms moderate concentration rather than a uniform spread. Treatment: one-hot encode the top countries and bucket the low-frequency tail into 'Other' before modelling.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 19
- top_value: United Kingdom
- top_rate: 0.3428
- cardinality: 19
- entropy: 2.502
- entropy_ratio: 0.5889
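A rough sketch of the suggested treatment, in plain Python: countries below a minimum count are lumped into 'Other', then the result is one-hot encoded. The threshold of 100 is an arbitrary illustrative choice, the helper names are hypothetical, and the counts are copied from the country table earlier in the report.

```python
from collections import Counter

# Counts copied from the country table; the threshold below is illustrative.
country_counts = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26, "Norway": 20,
    "Luxembourg": 20, "Estonia": 17, "Finland": 17, "Austria": 16,
    "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}


def lump_rare(values, min_count, other="Other"):
    """Replace labels seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def one_hot(values):
    """Return (rows, levels): one 0/1 indicator per sorted level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return rows, levels


# Rebuild one label per record so the helpers see realistic frequencies.
countries = [name for name, n in country_counts.items() for _ in range(n)]
lumped = Counter(lump_rare(countries, min_count=100))
print(lumped.most_common())
```

With this threshold, eight countries keep their own column and the remaining eleven share one, which keeps the encoded width small without discarding rows.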
tried
numeric · feature · high_skew · outliers
`tried` is the number of people tried in each record. It is heavily concentrated at 1: the median, q1, and q3 are all 1, yet values stretch up to 500 with a mean of 3.95. Skew of 15.6 and kurtosis of 316 confirm an extremely long right tail. Because the IQR is 0, the IQR rule flags every value other than 1, so the 2,457 flagged rows (22.5%) are simply the records with more than one person tried rather than anomalies. With only 111 distinct values and no nulls or zeros, most records describe a single defendant and a small minority describe mass trials. Treatment: log1p-transform or cap at a high quantile before modelling to tame the long tail.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 111
- min: 1
- max: 500
- mean: 3.952
- median: 1
- std: 19.26
- q1: 1
- q3: 1
- iqr: 0
- skew: 15.61
- kurtosis: 316
- n_outliers: 2,457
- outlier_rate: 0.2246
- zero_rate: 0
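The two suggested treatments can be sketched as follows. The sample is made up to mimic the shape of the column (mostly 1s with a few large values), the 99th percentile is an arbitrary choice of "high quantile", and `cap_at_percentile` is a hypothetical helper using the nearest-rank definition.

```python
import math


def cap_at_percentile(values, pct=99):
    """Winsorise the upper tail at the nearest-rank pct-th percentile."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap


# Made-up sample: 95 single-defendant records plus a few larger trials.
sample = [1] * 95 + [2, 3, 10, 40, 500]

capped, cap = cap_at_percentile(sample, pct=99)
logged = [math.log1p(v) for v in sample]

print(cap, max(capped))
print(round(max(logged), 3))
```

Capping changes only the extreme records, while log1p compresses the whole range and keeps the ordering of every value.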
deaths
numeric · numeric_target · high_skew · outliers
Numeric count of deaths per record, with 10,940 rows and only 74 distinct values. The distribution is heavily zero-inflated (zero_rate 0.7547) with median, q1, and q3 all at 0, yet the max reaches 500 and skew is 28.52 with kurtosis 991.6. Because the IQR is 0, the IQR rule flags every nonzero row: the 2,684 outliers (24.5%) are exactly the records with at least one death, so the flag reflects zero inflation rather than dirty data. Truly extreme events are much rarer; only 213 records exceed 12.5 deaths. Treatment: model with a zero-inflated or hurdle approach, or log1p-transform before regression.
- n: 10,940
- nulls: 0 (0.0%)
- unique: 74
- min: 0
- max: 500
- mean: 1.493
- median: 0
- std: 13.19
- q1: 0
- q3: 0
- iqr: 0
- skew: 28.52
- kurtosis: 991.6
- n_outliers: 2,684
- outlier_rate: 0.2453
- zero_rate: 0.7547
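A hurdle approach splits the target in two: whether any death occurred, and how many deaths there were when at least one did. The sketch below shows only that data-preparation split, not a fitted model; the sample values are made up and `hurdle_split` is a hypothetical name.

```python
import math


def hurdle_split(deaths):
    """Split counts into a 0/1 'any death' flag and log1p of nonzero counts."""
    any_death = [1 if d > 0 else 0 for d in deaths]
    log_positive = [math.log1p(d) for d in deaths if d > 0]
    return any_death, log_positive


sample = [0, 0, 0, 2, 0, 0, 15, 0]  # made-up counts, mostly zero
any_death, log_positive = hurdle_split(sample)
zero_rate = any_death.count(0) / len(sample)

# Reported figures: every nonzero row is flagged, so the two rates sum to 1.
reported_outlier_rate = round(2684 / 10_940, 4)

print(any_death, zero_rate)
print(reported_outlier_rate)
```

The first output would feed a binary classifier and the second a regression on the nonzero records only, which avoids letting the 75% of zeros dominate the fit.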