saturn·

quirky witch trials

source /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json · 10,940 rows · 6 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset catalogs 10,940 historical witch trial records across 6 columns, covering when and where each trial occurred, how many people were tried, and how many died. Trials span 1300 to 1850, with the bulk concentrated in the early 1600s (median year 1630). The United Kingdom (3,750 records) and Germany (3,417) dominate, together accounting for roughly two-thirds of the data. The 'deaths' and 'tried' columns are extremely skewed: about 75% of records report zero deaths and the median number tried is 1, yet a few outlier events reach 500 in both columns, so any aggregate analysis should treat these tails carefully. The 'city' field is 47.7% null and spans 906 unique values, so geographic analysis below the country level will be patchy.

citing: row_count · column_count · deaths.stats · tried.stats · year.stats · decade.stats · country.top_values · city.null_rate · city.n_unique
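The "roughly two-thirds" figure can be checked directly from the counts quoted in the summary. This is plain arithmetic on the reported numbers, not a recomputation from the source file.

```python
# Counts quoted in the dataset summary.
total_rows = 10_940
uk_rows = 3_750
germany_rows = 3_417

# Combined share of the two largest countries.
combined_share = (uk_rows + germany_rows) / total_rows
print(f"UK + Germany share: {combined_share:.1%}")
```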

Schema

6 columns
Per-column summary.
column    type          nulls    unique   alerts
year      numeric       8.5%     430      outliers
decade    numeric       0.0%     53       outliers
city      categorical   47.7%    906      null_rate
country   categorical   0.0%     19       (none)
tried     numeric       0.0%     111      high_skew, outliers
deaths    numeric       0.0%     74       high_skew, outliers

year

numeric timestamp outliers
This is the year in which each trial took place, spanning 1300 to 1850 with a median of 1630. The distribution is left-skewed (skew -1.59, kurtosis 4.32). 714 values are flagged as outliers (7.13% of the 10,009 non-null rows), which the left skew suggests sit mainly in the early tail, and 931 rows (8.51%) are null. The tight IQR of 63 years (1597 to 1660) shows that the records concentrate in the late 16th and 17th centuries. Treatment: Impute or bucket into era bins before modelling; consider winsorising the pre-1500 tail. high · anthropic:claude-opus-4-7
n: 10,940
nulls: 931 (8.5%)
unique: 430
min: 1300
max: 1850
mean: 1621
median: 1630
std: 66.25
q1: 1597
q3: 1660
iqr: 63
skew: -1.586
kurtosis: 4.319
n_outliers: 714
outlier_rate: 0.07134
zero_rate: 0
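The outlier figures for `year` can be sanity-checked from the quartiles. The sketch below assumes the profiler uses the standard Tukey rule with a multiplier of 1.5, which the report does not state. The helper name `tukey_fences` is hypothetical. It also shows that the reported `outlier_rate` matches a denominator of non-null rows rather than all rows.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for `year`; k = 1.5 is an assumption.
lower, upper = tukey_fences(1597, 1660)
print(lower, upper)

# The reported outlier_rate uses non-null rows as the denominator.
non_null_rows = 10_940 - 931
outlier_rate = 714 / non_null_rows
print(round(outlier_rate, 5))
```

Under that assumption, years before about 1503 or after about 1754 would be flagged, which fits the suggestion to winsorise the pre-1500 tail.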

decade

numeric timestamp outliers
This is a 'decade' field expressed as a four-digit year, ranging from 1300 to 1850 with a median of 1620 and a tight IQR of 60 years (Q1 1590, Q3 1650). The distribution is strongly left-skewed (skew -1.48, kurtosis 3.85), and 848 rows (7.75%) fall outside the Tukey fences, most likely the pre-1500 records in the left tail. With only 53 unique values across 10,940 rows and no nulls, it behaves as a coarse temporal bin rather than a continuous measurement. Treatment: Treat as an ordered categorical decade bin; consider grouping the pre-1500 tail before modelling. high · anthropic:claude-opus-4-7
n: 10,940
nulls: 0 (0.0%)
unique: 53
min: 1300
max: 1850
mean: 1615
median: 1620
std: 66.68
q1: 1590
q3: 1650
iqr: 60
skew: -1.478
kurtosis: 3.85
n_outliers: 848
outlier_rate: 0.07751
zero_rate: 0
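One way to follow the suggested treatment is to map each decade to an ordered label and collapse everything before 1500 into a single bin. This is a minimal sketch: the helper names, the 1500 cut-off, and the sample years are illustrative assumptions. It also assumes a decade is the year floored to a multiple of ten, which the report does not confirm. Since `decade` has no nulls while `year` is 8.5% null, the source column cannot have been derived from `year` alone for every row.

```python
from collections import Counter

def to_decade(year: int) -> int:
    """Floor a year to its decade, e.g. 1634 -> 1630 (assumed convention)."""
    return (year // 10) * 10

def bin_decade(decade: int, floor: int = 1500) -> str:
    """Label a decade, grouping everything before `floor` into one bin."""
    return f"pre-{floor}" if decade < floor else f"{decade}s"

# Hypothetical sample years, not taken from the dataset.
years = [1324, 1499, 1597, 1630, 1634, 1660]
bins = Counter(bin_decade(to_decade(y)) for y in years)
print(dict(bins))
```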

city

categorical feature null_rate
This is a city-name categorical feature with 906 distinct values across 10,940 rows, all among the top values being European locations (Geneva, Budingen, Bruges, Munich). Nearly half the rows are null (5,213, or 47.65%), and even the top value, Geneva, covers only 5.59% of non-null rows, giving high entropy (8.41 bits, ratio 0.856) and a long tail. The null rate combined with high cardinality is the main concern for downstream use. Treatment: Impute or flag missingness, then group rare cities into an 'other' bucket before encoding. high · anthropic:claude-opus-4-7
n: 10,940
nulls: 5,213 (47.7%)
unique: 906
top_value: Geneva
top_rate: 0.05588
cardinality: 906
entropy: 8.406
entropy_ratio: 0.8557
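The entropy figures are consistent with Shannon entropy in bits divided by the maximum possible entropy for the observed cardinality. The sketch below assumes that definition, which the report does not spell out, and uses hypothetical helper names. It checks the assumption against the reported numbers for `city`.

```python
import math
from collections import Counter

def entropy_bits(values) -> float:
    """Shannon entropy (base 2) over non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_ratio(values) -> float:
    """Entropy divided by log2(cardinality); 1.0 means a uniform spread."""
    cardinality = len({v for v in values if v is not None})
    if cardinality < 2:
        return 0.0
    return entropy_bits(values) / math.log2(cardinality)

# Check the assumed definition against the reported city statistics.
reported_ratio = 8.406 / math.log2(906)
print(round(reported_ratio, 4))
```

A ratio near 0.86 means the non-null city values are spread fairly evenly across many places, so no small set of cities captures most of the data.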

country

categorical feature
Country in which the trial took place, as a categorical label with 19 distinct values across 10,940 rows and no nulls. The distribution is European-heavy and concentrated: the United Kingdom alone accounts for 34.3% (3,750), followed by Germany with 31.2% (3,417) and Switzerland with 11.6% (1,272), while the long tail (Italy, Denmark, Spain) drops to double or even single digits. An entropy ratio of 0.59 confirms moderate concentration rather than a uniform spread. Treatment: One-hot encode the top countries and bucket the low-frequency tail into 'Other' before modelling. high · anthropic:claude-opus-4-7
n: 10,940
nulls: 0 (0.0%)
unique: 19
top_value: United Kingdom
top_rate: 0.3428
cardinality: 19
entropy: 2.502
entropy_ratio: 0.5889
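The suggested treatment can be sketched in two steps: keep the most frequent countries, relabel the rest as 'Other', then one-hot encode. The function names, the `top_n` threshold, and the sample list are illustrative assumptions. A real pipeline would more likely use a dataframe library's encoder.

```python
from collections import Counter

def bucket_rare(values, top_n=3, other="Other"):
    """Keep the top_n most frequent labels and relabel the rest."""
    keep = {label for label, _ in Counter(values).most_common(top_n)}
    return [v if v in keep else other for v in values]

def one_hot(values):
    """Return (sorted categories, one indicator row per input value)."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows

# Hypothetical sample, not taken from the dataset.
sample = ["United Kingdom", "Germany", "United Kingdom", "Switzerland",
          "Italy", "Germany", "Denmark", "United Kingdom"]
bucketed = bucket_rare(sample, top_n=2)
categories, rows = one_hot(bucketed)
print(categories)
print(rows[0], rows[3])
```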

tried

numeric feature high_skew outliers
`tried` is the number of people tried in each record. It is heavily concentrated at 1: the median, q1, and q3 are all 1, yet values stretch up to 500 with a mean of 3.95. Skew of 15.6 and kurtosis of 316 confirm an extremely long right tail. Because the IQR is 0, the standard IQR rule flags every value other than 1, so the 2,457 flagged rows (22.5%) are simply the records in which more than one person was tried. With 111 distinct values and no nulls or zeros, most records describe a single accused and a small minority describe mass trials. Treatment: log1p-transform or cap at a high quantile before modelling to tame the long tail. high · anthropic:claude-opus-4-7
n: 10,940
nulls: 0 (0.0%)
unique: 111
min: 1
max: 500
mean: 3.952
median: 1
std: 19.26
q1: 1
q3: 1
iqr: 0
skew: 15.61
kurtosis: 316
n_outliers: 2,457
outlier_rate: 0.2246
zero_rate: 0
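Both suggested treatments for `tried` are short to express. The sketch below applies `math.log1p` and a nearest-rank quantile cap to a hypothetical sample. The helper name, the sample values, and the quantile are illustrative choices, not settings from the report.

```python
import math

def cap_at_quantile(values, q=0.99):
    """Cap values at the nearest-rank q-th quantile of the sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]

# Hypothetical sample shaped like the column: mostly 1, one extreme value.
tried = [1, 1, 1, 1, 1, 1, 1, 2, 5, 500]

logged = [math.log1p(v) for v in tried]   # compresses the right tail
capped = cap_at_quantile(tried, q=0.9)    # clips the extreme value

print([round(x, 3) for x in logged])
print(capped)
```

The log transform keeps the ordering of every record, while capping discards the magnitude of the largest trials, so the choice depends on whether mass trials are signal or noise for the model.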

deaths

numeric numeric_target high_skew outliers
Numeric count of deaths per record, with 10,940 rows and only 74 distinct values. The distribution is heavily zero-inflated (zero_rate 0.7547) with median, q1, and q3 all at 0, yet the max reaches 500, skew is 28.52, and kurtosis is 991.65. Because the IQR is 0, the 2,684 flagged outliers (24.5%) are exactly the non-zero rows (1 - 0.7547 = 0.2453), so the flag marks any record with at least one death rather than dirty data or only extreme events. Treatment: Model with a zero-inflated or hurdle approach, or log1p-transform before regression. high · anthropic:claude-opus-4-7
n: 10,940
nulls: 0 (0.0%)
unique: 74
min: 0
max: 500
mean: 1.493
median: 0
std: 13.19
q1: 0
q3: 0
iqr: 0
skew: 28.52
kurtosis: 991.6
n_outliers: 2,684
outlier_rate: 0.2453
zero_rate: 0.7547
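A hurdle approach splits the question into two parts: whether a record has any deaths, and how many deaths there are given at least one. The sketch below only computes the descriptive version of that split. It is not a fitted hurdle model, and the helper name and sample values are hypothetical. Applied to the reported statistics, it gives the implied mean death count among records with at least one death.

```python
def hurdle_summary(values):
    """Return (share of positive values, mean of the positive values)."""
    positives = [v for v in values if v > 0]
    p_positive = len(positives) / len(values)
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return p_positive, mean_positive

# Hypothetical sample: three records without deaths, two with.
p_pos, mean_pos = hurdle_summary([0, 0, 0, 2, 6])
print(p_pos, mean_pos)

# Implied positive-part mean from the reported statistics:
# overall mean = P(deaths > 0) * mean(deaths | deaths > 0).
zero_rate = 0.7547
overall_mean = 1.493
implied_mean_positive = overall_mean / (1 - zero_rate)
print(round(implied_mean_positive, 2))
```

On the reported figures, records with at least one death average about 6 deaths, which is far below the maximum of 500 and shows how much a handful of extreme events drive the skew.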