
data trove witch trials

saturn notebook · generated 2026-06-22

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json

Saturn profiled 10,940 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
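# Install the profiler (IPython "!" shell escape; run inside a notebook)
!pip install saturn-dissect
import subprocess

# Profile the JSON source, write the machine-readable findings to a file,
# and name the model used for the opt-in prose layer
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json",
    "--findings", "data-trove-witch-trials.json",
    "--llm", "anthropic:default",
], check=True)  # raise CalledProcessError if the CLI exits non-zero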

Summary confidence: high

This dataset records historical witch trials across Europe: 10,940 records with location, date, and outcome (number of people tried and number of deaths). Three things stand out. First, both 'deaths' and 'tried' are extremely skewed. About three-quarters of records show zero deaths, and about three-quarters involve exactly one person tried, yet both columns reach a maximum of 500. This suggests that a small number of mass trials account for much of the killing. Second, activity clusters between 1597 and 1660 (the IQR of 'year'), a well-known peak persecution era, with a long tail back to 1300 that is worth examining. Third, the United Kingdom and Germany together account for 65.5% of records, just under two-thirds. Geneva is the most frequent city, even though nearly half of all city values are missing.

citing: deaths.stats.median · deaths.stats.mean · deaths.stats.max · deaths.stats.zero_rate · deaths.stats.n_outliers · tried.stats.median · tried.stats.max · tried.stats.outlier_rate · year.stats.q1 · year.stats.q3 · year.stats.median · year.stats.min · country.top_values · city.top_values · city.null_rate

Out[4]:

saturn.schema() · 6 columns

column | kind | n | null% | unique | alerts
year | numeric | 10,940 | 8.5% | 430 | outliers
decade | numeric | 10,940 | 0.0% | 53 | outliers
city | categorical | 10,940 | 47.7% | 906 | null_rate
country | categorical | 10,940 | 0.0% | 19 | -
tried | numeric | 10,940 | 0.0% | 111 | high_skew, outliers
deaths | numeric | 10,940 | 0.0% | 74 | high_skew, outliers
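The n, null% and unique columns of the schema can be reproduced with a few lines of standard-library Python. This is a minimal sketch under three assumptions: the JSON file holds a list of row objects, a missing value is either an absent key or JSON null, and `unique` counts distinct non-null values. Both `profile_column` and `profile_file` are hypothetical helpers, not part of saturn-dissect.

```python
import json
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarise one column: row count, null count and rate, distinct non-null values."""
    n = len(values)
    non_null = [v for v in values if v is not None]
    nulls = n - len(non_null)
    return {
        "n": n,
        "nulls": nulls,
        "null_pct": round(100 * nulls / n, 1) if n else 0.0,
        "unique": len(set(non_null)),
    }


def profile_file(path: str) -> dict[str, dict[str, Any]]:
    """Profile every column of a JSON file assumed to hold a list of row dicts."""
    with open(path, encoding="utf-8") as fh:
        rows = json.load(fh)
    # Keep columns in first-seen (input) order, matching the null-rate figure
    columns = list(dict.fromkeys(key for row in rows for key in row))
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Toy check: 4 rows, one missing city, two distinct cities
print(profile_column(["Geneva", None, "Bruges", "Geneva"]))
# -> {'n': 4, 'nulls': 1, 'null_pct': 25.0, 'unique': 2}
```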
Fig 1.
decade · Look for the sharp peak in the 1630–1644 bin, inside the roughly 1590–1670 band that marks the height of witch-trial persecution, plus the sparse early centuries.
Histogram bins for decade (median: 1620.0).
bin | count
1300 – 1314 | 32
1314 – 1328 | 21
1328 – 1341 | 33
1341 – 1355 | 4
1355 – 1369 | 3
1369 – 1382 | 32
1382 – 1396 | 25
1396 – 1410 | 32
1410 – 1424 | 55
1424 – 1438 | 52
1438 – 1451 | 147
1451 – 1465 | 65
1465 – 1479 | 68
1479 – 1492 | 188
1492 – 1506 | 39
1506 – 1520 | 43
1520 – 1534 | 137
1534 – 1548 | 106
1548 – 1561 | 420
1561 – 1575 | 291
1575 – 1589 | 373
1589 – 1602 | 1576
1602 – 1616 | 878
1616 – 1630 | 934
1630 – 1644 | 1835
1644 – 1658 | 872
1658 – 1671 | 1469
1671 – 1685 | 260
1685 – 1699 | 295
1699 – 1712 | 302
1712 – 1726 | 105
1726 – 1740 | 68
1740 – 1754 | 151
1754 – 1768 | 8
1768 – 1781 | 14
1781 – 1795 | 4
1795 – 1809 | 0
1809 – 1822 | 1
1822 – 1836 | 1
1836 – 1850 | 1
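The bin edges in these histogram tables fit 40 equal-width bins running from each column's minimum to its maximum. For decade the width is (1850 − 1300) / 40 = 13.75 years, and the edges are displayed rounded. Below is a minimal sketch of that binning, applied after nulls are dropped. It assumes every bin is half-open except the last, which is closed so that the maximum is counted. The report does not say how Saturn treats boundaries, so `equal_width_histogram` is a hypothetical helper, not Saturn's implementation.

```python
def equal_width_histogram(values, n_bins=40):
    """Return (edges, counts) for n_bins equal-width bins spanning min..max.

    Bins are half-open [lo, hi) except the last, which also includes the max.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate column: a single bin holds every value
        return [lo, hi], [len(values)]
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum (and any float overshoot) lands in the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = equal_width_histogram([1300, 1310, 1620, 1850])
print(edges[:3], counts[0], counts[23], counts[39])
# -> [1300.0, 1313.75, 1327.5] 2 1 1
```

In this toy run, 1620 falls in bin 23, which spans 1616.25 to 1630. That matches the "1616 – 1630" row in the table above.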
Fig 2.
country · United Kingdom and Germany dominate; compare their shares against smaller nations like Switzerland and France.
Top values for country (all 19 unique values shown).
value | count | share
United Kingdom | 3750 | 34.3%
Germany | 3417 | 31.2%
Switzerland | 1272 | 11.6%
France | 807 | 7.4%
Belgium | 671 | 6.1%
Sweden | 353 | 3.2%
Netherlands | 314 | 2.9%
Italy | 107 | 1.0%
Denmark | 90 | 0.8%
Spain | 29 | 0.3%
Hungary | 26 | 0.2%
Norway | 20 | 0.2%
Luxembourg | 20 | 0.2%
Estonia | 17 | 0.2%
Finland | 17 | 0.2%
Austria | 16 | 0.1%
Poland | 9 | 0.1%
Ireland | 4 | 0.0%
Czech Republic | 1 | 0.0%
Fig 3.
deaths · The overwhelming zero-death majority versus a thin but extreme tail of mass-execution events tells the core story of how unevenly lethal these trials were.
Histogram bins for deaths (median: 0.0).
bin | count
0 – 12.5 | 10727
12.5 – 25 | 100
25 – 37.5 | 43
37.5 – 50 | 20
50 – 62.5 | 13
62.5 – 75 | 15
75 – 87.5 | 2
87.5 – 100 | 1
100 – 112.5 | 4
112.5 – 125 | 2
125 – 137.5 | 0
137.5 – 150 | 0
150 – 162.5 | 3
162.5 – 175 | 1
175 – 187.5 | 0
187.5 – 200 | 0
200 – 212.5 | 1
212.5 – 225 | 1
225 – 237.5 | 0
237.5 – 250 | 1
250 – 262.5 | 0
262.5 – 275 | 0
275 – 287.5 | 0
287.5 – 300 | 0
300 – 312.5 | 1
312.5 – 325 | 0
325 – 337.5 | 0
337.5 – 350 | 0
350 – 362.5 | 0
362.5 – 375 | 0
375 – 387.5 | 0
387.5 – 400 | 0
400 – 412.5 | 0
412.5 – 425 | 0
425 – 437.5 | 0
437.5 – 450 | 0
450 – 462.5 | 0
462.5 – 475 | 0
475 – 487.5 | 0
487.5 – 500 | 5
Fig 4.
city · Geneva leads all cities, though only modestly ahead of Budingen (320 vs 264 records). Check whether its count reflects genuine concentration or a data-recording artefact, given the high null rate.
Top values for city (top 20 of 906 unique values; shares are of all 10,940 rows, including nulls).
value | count | share
Geneva | 320 | 2.9%
Budingen | 264 | 2.4%
Bruges | 121 | 1.1%
Munich | 80 | 0.7%
Augsburg | 75 | 0.7%
Vesoul | 64 | 0.6%
Venice | 63 | 0.6%
Boudry | 59 | 0.5%
Valangin | 58 | 0.5%
Fleurier | 54 | 0.5%
Gelnhausen | 50 | 0.5%
Arnsberg | 49 | 0.4%
Thielle-Wavre | 47 | 0.4%
Burghausen | 44 | 0.4%
Grundau | 44 | 0.4%
Neuchatel | 42 | 0.4%
Colombier | 42 | 0.4%
Stuttgart | 39 | 0.4%
Mitterfels | 38 | 0.3%
Namur | 37 | 0.3%
Fig 5.
tried · Most trials involved a single accused, but outliers at 500 reveal the rare but devastating mass trials that skew the overall statistics.
Histogram bins for tried (median: 1.0).
bin | count
1 – 13.47 | 10493
13.47 – 25.95 | 196
25.95 – 38.42 | 88
38.42 – 50.9 | 33
50.9 – 63.38 | 21
63.38 – 75.85 | 14
75.85 – 88.33 | 9
88.33 – 100.8 | 24
100.8 – 113.3 | 6
113.3 – 125.8 | 9
125.8 – 138.2 | 5
138.2 – 150.7 | 6
150.7 – 163.2 | 8
163.2 – 175.7 | 3
175.7 – 188.1 | 5
188.1 – 200.6 | 1
200.6 – 213.1 | 0
213.1 – 225.5 | 3
225.5 – 238 | 1
238 – 250.5 | 1
250.5 – 263 | 1
263 – 275.4 | 0
275.4 – 287.9 | 0
287.9 – 300.4 | 0
300.4 – 312.9 | 1
312.9 – 325.3 | 0
325.3 – 337.8 | 1
337.8 – 350.3 | 0
350.3 – 362.8 | 3
362.8 – 375.2 | 0
375.2 – 387.7 | 0
387.7 – 400.2 | 0
400.2 – 412.7 | 3
412.7 – 425.1 | 0
425.1 – 437.6 | 0
437.6 – 450.1 | 0
450.1 – 462.6 | 0
462.6 – 475.1 | 0
475.1 – 487.5 | 0
487.5 – 500 | 5
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column | kind | null %
year | numeric | 8.5%
decade | numeric | 0.0%
city | categorical | 47.7%
country | categorical | 0.0%
tried | numeric | 0.0%
deaths | numeric | 0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values rounded to 2 decimals).
column | year | decade | tried | deaths
year | +1.00 | +0.36 | -0.19 | -0.16
decade | +0.36 | +1.00 | +0.00 | +0.03
tried | -0.19 | +0.00 | +1.00 | +0.65
deaths | -0.16 | +0.03 | +0.65 | +1.00
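For reference, the Pearson coefficient in Fig 7 is the covariance of two columns divided by the product of their standard deviations. Below is a minimal standard-library sketch. It assumes that rows with a null in either column are dropped pairwise. The report does not specify this; it says only that the matrix is sampled and bounded. Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient on complete data.

```python
import math


def pearson(xs, ys):
    """Pearson r over pairs where neither value is None (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# The None row is skipped, leaving a perfectly linear relationship
print(pearson([1, 2, 3, None], [2, 4, 6, 8]))
```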

year numeric timestamp

This column records the year of each trial, spanning 1300 to 1850. The distribution is strongly left-skewed (skew = -1.59) with a heavy left tail. The middle 50% of non-null records falls between 1597 and 1660 (IQR = 63 years), but the minimum reaches back to 1300. The 1.5 × IQR rule therefore flags 714 outliers (7.1% of non-null rows), most of them earlier than the lower fence near 1502. The kurtosis of 4.32 indicates a sharp central peak around the median of 1630 with fat tails. The 8.5% null rate (931 rows) warrants attention before any temporal analysis.

Treatment: Treat as an ordinal/temporal feature. Before binning into periods for modelling, investigate records below ~1500 as potential data-quality issues or genuine early trials.

anthropic:default · confidence high
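The treatment above suggests checking pre-1500 records and binning years into periods. Below is a minimal sketch using `bisect`. The period boundaries and labels are illustrative assumptions, not values from the report. Null years get an explicit "missing" label instead of being dropped, since 8.5% of rows lack a year. `year_period` and `flag_early` are hypothetical helpers.

```python
import bisect

# Hypothetical, left-closed period boundaries; adjust to the analysis at hand
BREAKS = [1500, 1560, 1630, 1700]
LABELS = ["pre-1500", "1500-1559", "1560-1629", "1630-1699", "1700+"]


def year_period(year):
    """Map a year (or None) to a period label."""
    if year is None:
        return "missing"
    # bisect_right puts a year equal to a boundary into the later period
    return LABELS[bisect.bisect_right(BREAKS, year)]


def flag_early(year, cutoff=1500):
    """True for years before the cutoff that merit a data-quality check."""
    return year is not None and year < cutoff


for y in (1300, 1500, 1630, 1850, None):
    print(y, year_period(y), flag_early(y))
```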
Out[13]:

saturn.columns["year"].stats

stat | value
n | 10,940
nulls | 931 (8.5%)
unique | 430
min | 1300
max | 1850
mean | 1621
median | 1630
std | 66.25
q1 | 1597
q3 | 1660
iqr | 63
skew | -1.586
kurtosis | 4.319
n_outliers | 714
outlier_rate | 0.07134
zero_rate | 0
alert: outliers | 7.1% rows beyond 1.5 IQR
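The outlier alert applies the 1.5 × IQR (Tukey) rule. With the reported quartiles, the fences are 1597 − 1.5 × 63 = 1502.5 and 1660 + 1.5 × 63 = 1754.5. The reported rate of 0.07134 is consistent with 714 / 10,009, which suggests the rate is computed over non-null years only. Below is a minimal sketch that takes the quartiles as inputs, because the report does not say which quantile method Saturn uses. It also assumes that "beyond" means strictly outside the fences. The same function shows why the rule degenerates for `tried` and `deaths`, whose IQR is 0.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_rate(values, q1, q3, k=1.5):
    """Share of non-null values strictly outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    present = [v for v in values if v is not None]
    outside = sum(1 for v in present if v < lower or v > upper)
    return outside / len(present) if present else 0.0


# Reported quartiles for `year`
print(tukey_fences(1597, 1660))   # -> (1502.5, 1754.5)
# Reported quartiles for `deaths` (q1 = q3 = 0): every non-zero value is outside
print(tukey_fences(0, 0))         # -> (0.0, 0.0)
```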
Fig 8.
Distribution of year. Vertical dash marks the median.
Histogram bins for year (median: 1630.0).
bin | count
1300 – 1314 | 13
1314 – 1328 | 39
1328 – 1341 | 29
1341 – 1355 | 8
1355 – 1369 | 2
1369 – 1382 | 16
1382 – 1396 | 29
1396 – 1410 | 36
1410 – 1424 | 31
1424 – 1438 | 49
1438 – 1451 | 78
1451 – 1465 | 85
1465 – 1479 | 86
1479 – 1492 | 120
1492 – 1506 | 68
1506 – 1520 | 40
1520 – 1534 | 77
1534 – 1548 | 101
1548 – 1561 | 145
1561 – 1575 | 338
1575 – 1589 | 405
1589 – 1602 | 1075
1602 – 1616 | 971
1616 – 1630 | 1042
1630 – 1644 | 936
1644 – 1658 | 1425
1658 – 1671 | 1372
1671 – 1685 | 415
1685 – 1699 | 307
1699 – 1712 | 276
1712 – 1726 | 139
1726 – 1740 | 90
1740 – 1754 | 128
1754 – 1768 | 21
1768 – 1781 | 9
1781 – 1795 | 5
1795 – 1809 | 0
1809 – 1822 | 0
1822 – 1836 | 2
1836 – 1850 | 1

decade numeric feature

This column records the decade of each trial. It spans 1300 to 1850, with only 53 distinct values across 10,940 rows and no nulls. The distribution is left-skewed (skew = -1.48) with high kurtosis (3.85). The middle half of records falls in the 1590–1650 decades (IQR = 60 years), while a long tail stretches back to 1300. The 1.5 × IQR rule flags 848 rows (7.75%) as outliers, most of them early-period records far below the median of 1620.

Treatment: Treat as an ordinal/temporal feature; consider binning into broader periods or applying a robust scaler given the heavy left tail and outlier concentration.

anthropic:default · confidence high
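One way to apply the robust scaler mentioned above is to centre on the median and divide by the IQR. The heavy left tail affects these statistics far less than it affects the mean and standard deviation. Below is a minimal sketch that uses the reported decade median (1620) and IQR (60) as defaults; `robust_scale` is a hypothetical helper name.

```python
def robust_scale(value, median=1620.0, iqr=60.0):
    """Centre on the median and scale by the IQR (defaults: reported decade stats)."""
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return (value - median) / iqr


for decade in (1300, 1590, 1620, 1650, 1850):
    print(decade, round(robust_scale(decade), 2))
```

On this scale the IQR edges 1590 and 1650 map to -0.5 and +0.5, and the 1300 decade maps to about -5.33. The left tail stays visible but no longer dominates the feature's range.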
Out[16]:

saturn.columns["decade"].stats

stat | value
n | 10,940
nulls | 0 (0.0%)
unique | 53
min | 1300
max | 1850
mean | 1615
median | 1620
std | 66.68
q1 | 1590
q3 | 1650
iqr | 60
skew | -1.478
kurtosis | 3.85
n_outliers | 848
outlier_rate | 0.07751
zero_rate | 0
alert: outliers | 7.8% rows beyond 1.5 IQR
Fig 9.
Distribution of decade. Vertical dash marks the median.
Histogram bins for decade (median: 1620.0): identical to the Fig 1 table.

city categorical feature

This column records the city associated with each trial, presumably where it took place. The null rate of 47.65% (5,213 of 10,940 rows) is a significant concern and triggers an alert. The non-null distribution is broad, with 906 unique cities and an entropy ratio of 0.856. Geneva is the most common value: its 320 occurrences make up 5.59% of non-null rows (2.9% of all rows). The leading cities (Geneva, Budingen, Bruges, Munich, Augsburg, Vesoul, Venice) fit the dataset's focus on European witch trials.

Treatment: Impute or flag nulls (47.65% missing); encode as categorical feature, potentially grouping rare cities below a frequency threshold given 906 unique values.

anthropic:default · confidence high
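The paragraph above gives Geneva 5.59% while the Fig 4 table shows 2.9%, and the difference is the denominator. The summary's `top_rate` fits 320 / 5,727 non-null rows, whereas the table share fits 320 / 10,940 rows including nulls. Below is a minimal sketch that flags missing cities explicitly and reports both shares. The `"<missing>"` sentinel and the `city_shares` helper are illustrative choices.

```python
from collections import Counter

MISSING = "<missing>"  # illustrative sentinel for null cities


def city_shares(cities):
    """Return {city: (share_of_all_rows, share_of_non_null_rows)}, nulls flagged."""
    flagged = [c if c is not None else MISSING for c in cities]
    counts = Counter(flagged)
    total = len(flagged)
    non_null = total - counts[MISSING]
    return {
        city: (n / total, n / non_null if non_null else 0.0)
        for city, n in counts.items()
        if city != MISSING
    }


# Rebuild the reported counts: 320 Geneva, 5,407 other non-null, 5,213 null
cities = ["Geneva"] * 320 + ["other city"] * (5727 - 320) + [None] * 5213
all_rows, non_null = city_shares(cities)["Geneva"]
print(f"{all_rows:.1%} of all rows, {non_null:.2%} of non-null rows")
```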
Out[19]:

saturn.columns["city"].stats

stat | value
n | 10,940
nulls | 5,213 (47.7%)
unique | 906
top_value | Geneva
top_rate | 0.05588
cardinality | 906
entropy | 8.406
entropy_ratio | 0.8557
alert: null_rate | 47.7% null
Fig 10.
Top values for city.
Top 20 of 906 city values: identical to the Fig 4 table.

country categorical feature

This column records the country associated with each trial record, covering 19 distinct countries across 10,940 rows with no nulls. The distribution is heavily concentrated in Western Europe. The United Kingdom (3,750 rows, 34.3%) and Germany (3,417 rows, 31.2%) together account for 65.5% of records, just under two-thirds. Switzerland (1,272) and France (807) are the next largest groups. The remaining 15 countries together contribute 1,694 rows (15.5%), most of them from Belgium, Sweden, and the Netherlands; Spain, for example, appears only 29 times. The entropy ratio of 0.59 reflects this marked imbalance, which could bias any model trained on country as a feature.

Treatment: One-hot encode or target-encode; consider grouping low-frequency countries (e.g., Spain with 29 rows) into an 'Other' bucket to reduce sparsity.

anthropic:default · confidence high
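The suggested 'Other' bucket can be built from value counts. Below is a minimal sketch; `group_rare` is a hypothetical helper, and the default threshold of 50 rows is an illustrative assumption. Applied to the counts in the Fig 2 table, that threshold would fold ten countries, from Spain (29) down to Czech Republic (1), into 'Other'. Together they hold 159 rows, leaving 10 categories.

```python
from collections import Counter


def group_rare(values, min_count=50, other="Other"):
    """Replace categories seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


sample = ["United Kingdom"] * 3 + ["Germany"] * 2 + ["Spain"]
print(Counter(group_rare(sample, min_count=2)))
```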
Out[22]:

saturn.columns["country"].stats

stat | value
n | 10,940
nulls | 0 (0.0%)
unique | 19
top_value | United Kingdom
top_rate | 0.3428
cardinality | 19
entropy | 2.502
entropy_ratio | 0.5889
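The entropy figures fit Shannon entropy in bits, normalised by log2 of the number of categories. For country this gives 2.502 / log2(19) ≈ 0.589, and for city 8.406 / log2(906) ≈ 0.856. The report does not define `entropy_ratio`, so treat this formula as an inference from the numbers. Below is a minimal sketch computed from the country counts in the Fig 2 table; the helper names are hypothetical.

```python
import math

# Country counts from the Fig 2 table
COUNTRY_COUNTS = [3750, 3417, 1272, 807, 671, 353, 314, 107, 90,
                  29, 26, 20, 20, 17, 17, 16, 9, 4, 1]


def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


print(round(entropy_bits(COUNTRY_COUNTS), 3), round(entropy_ratio(COUNTRY_COUNTS), 3))
```

A ratio of 1.0 would mean all 19 countries were equally common. The reported 0.589 shows how far the country distribution is from that.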
Fig 11.
Top values for country.
All 19 country values: identical to the Fig 2 table.

tried numeric feature

This column records the number of people tried in each record. It is integer-like and starts at 1. The distribution is extremely concentrated: Q1, the median, and Q3 are all 1, yet the mean is about 3.95 and the maximum is 500. A minority of large trials therefore drives nearly all of the variance. Because the IQR is 0, the 1.5 × IQR rule flags every value other than 1. The 22.5% outlier rate therefore means that 22.5% of records involve more than one person tried, while the remaining 77.5% involve exactly one. A kurtosis of 316 confirms an extraordinarily heavy tail.

Treatment: Log-transform (log1p) before modelling, or cap at a meaningful percentile threshold to reduce outlier influence; consider binning into ordinal buckets (1, 2–5, 6+).

anthropic:default · confidence high
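Both treatments above take only a few lines of standard-library Python. Below is a minimal sketch; the function names are hypothetical. `math.log1p` compresses the 1–500 range to roughly 0.69–6.22, and the bucket edges (1, 2–5, 6+) follow the treatment line.

```python
import math


def tried_log(tried):
    """log(1 + tried): compresses the heavy right tail."""
    return math.log1p(tried)


def tried_bucket(tried):
    """Ordinal bucket: 0 = single accused, 1 = 2-5 accused, 2 = 6 or more."""
    if tried < 1:
        raise ValueError("tried is expected to be >= 1")
    if tried == 1:
        return 0
    return 1 if tried <= 5 else 2


for t in (1, 3, 6, 500):
    print(t, round(tried_log(t), 2), tried_bucket(t))
```

About 77.5% of records would land in bucket 0, so the buckets mainly separate small group trials from mass trials.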
Out[25]:

saturn.columns["tried"].stats

stat | value
n | 10,940
nulls | 0 (0.0%)
unique | 111
min | 1
max | 500
mean | 3.952
median | 1
std | 19.26
q1 | 1
q3 | 1
iqr | 0
skew | 15.61
kurtosis | 316
n_outliers | 2,457
outlier_rate | 0.2246
zero_rate | 0
alert: high_skew | skew=+15.61
alert: outliers | 22.5% rows beyond 1.5 IQR
Fig 12.
Distribution of tried. Vertical dash marks the median.
Histogram bins for tried (median: 1.0): identical to the Fig 5 table.

deaths numeric numeric_target

This column records the number of deaths per record, presumably executions resulting from the trial. About three-quarters of rows (75.5%) are exactly zero, and the median is 0. The distribution is extraordinarily right-skewed (skew = 28.52, kurtosis = 991.65), driven by rare but extreme values up to a maximum of 500. Because Q1 and Q3 are both 0, the 1.5 × IQR rule flags every non-zero value. The 24.5% outlier rate is therefore exactly the complement of the 75.5% zero rate: it counts records with any deaths at all, not unusually large values. The high zero rate, together with only 74 distinct values, marks the column as zero-inflated count data.

Treatment: Model with zero-inflated or negative-binomial regression; apply log1p transform if used as a feature in standard ML pipelines.

anthropic:default · confidence high
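Before fitting a zero-inflated or negative-binomial model, it helps to separate the two parts of the distribution. One part is whether a record has any deaths; the other is how many deaths occur when there are some. The reported variance (13.19² ≈ 174) far exceeds the mean (1.493). Under a Poisson model the variance would equal the mean, so this overdispersion is what motivates the negative binomial. Below is a minimal sketch of the hurdle-style split plus the log1p feature; `describe_counts` is a hypothetical helper.

```python
import math
import statistics


def describe_counts(counts):
    """Summarise a zero-inflated count column to guide model choice."""
    n = len(counts)
    nonzero = [c for c in counts if c > 0]
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts)
    return {
        "zero_rate": (n - len(nonzero)) / n,
        # Mean of the positive part only (the "count" half of a hurdle model)
        "mean_if_positive": statistics.fmean(nonzero) if nonzero else 0.0,
        # Variance/mean ratio: about 1 for Poisson, much larger means overdispersion
        "dispersion": var / mean if mean else float("nan"),
        # log1p keeps zeros at 0 and compresses the tail
        "log1p": [math.log1p(c) for c in counts],
    }


toy = [0, 0, 0, 0, 0, 0, 2, 40]
summary = describe_counts(toy)
print(summary["zero_rate"], summary["mean_if_positive"], round(summary["dispersion"], 1))
```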
Out[28]:

saturn.columns["deaths"].stats

stat | value
n | 10,940
nulls | 0 (0.0%)
unique | 74
min | 0
max | 500
mean | 1.493
median | 0
std | 13.19
q1 | 0
q3 | 0
iqr | 0
skew | 28.52
kurtosis | 991.6
n_outliers | 2,684
outlier_rate | 0.2453
zero_rate | 0.7547
alert: high_skew | skew=+28.52
alert: outliers | 24.5% rows beyond 1.5 IQR
Fig 13.
Distribution of deaths. Vertical dash marks the median.
Histogram bins for deaths (median: 0.0): identical to the Fig 3 table.

How to cite


BibTeX
@misc{saturn-data-trove-witch-trials-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove witch trials},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-witch-trials}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove witch trials. Source: /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-witch-trials