quirky witch trials

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json

Saturn profiled 10,940 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json",
    "--findings", "quirky-witch_trials.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset catalogs 10,940 historical witch trial records across 6 columns, covering when and where trials occurred, how many people were tried, and how many died. Trials span from 1300 to 1850, with the bulk concentrated around the early 1600s (median year 1630), and they are heavily dominated by the United Kingdom (3,750 records) and Germany (3,417), which together account for roughly two-thirds of the data (65.5%). The 'deaths' and 'tried' columns are extremely skewed: 75% of records report zero deaths, yet a small set of outlier events reach up to 500, so any aggregate analysis should treat these tails carefully. Also worth flagging: the 'city' field is 47.7% null and spans 906 unique values, so geographic analysis below the country level will be patchy.

citing: row_count · column_count · deaths.stats · tried.stats · year.stats · decade.stats · country.top_values · city.null_rate · city.n_unique

Out[4]:

saturn.schema() · 6 columns

column   kind         n       null%  unique  alerts
year     numeric      10,940  8.5%   430     outliers
decade   numeric      10,940  0.0%   53      outliers
city     categorical  10,940  47.7%  906     null_rate
country  categorical  10,940  0.0%   19
tried    numeric      10,940  0.0%   111     high_skew, outliers
deaths   numeric      10,940  0.0%   74      high_skew, outliers
Fig 1.
country · Shows how heavily the dataset is concentrated in the UK and Germany versus the long tail of other countries.
Top values for country (19 unique shown, of 19 total).
value           count  share
United Kingdom  3750   34.3%
Germany         3417   31.2%
Switzerland     1272   11.6%
France          807    7.4%
Belgium         671    6.1%
Sweden          353    3.2%
Netherlands     314    2.9%
Italy           107    1.0%
Denmark         90     0.8%
Spain           29     0.3%
Hungary         26     0.2%
Norway          20     0.2%
Luxembourg      20     0.2%
Estonia         17     0.2%
Finland         17     0.2%
Austria         16     0.1%
Poland          9      0.1%
Ireland         4      0.0%
Czech Republic  1      0.0%
Fig 2.
year · Reveals when trials clustered in time — expect a sharp peak around the early 1600s.
Histogram bins for year (median: 1630.0).
bin          count
1300 – 1314  13
1314 – 1328  39
1328 – 1341  29
1341 – 1355  8
1355 – 1369  2
1369 – 1382  16
1382 – 1396  29
1396 – 1410  36
1410 – 1424  31
1424 – 1438  49
1438 – 1451  78
1451 – 1465  85
1465 – 1479  86
1479 – 1492  120
1492 – 1506  68
1506 – 1520  40
1520 – 1534  77
1534 – 1548  101
1548 – 1561  145
1561 – 1575  338
1575 – 1589  405
1589 – 1602  1075
1602 – 1616  971
1616 – 1630  1042
1630 – 1644  936
1644 – 1658  1425
1658 – 1671  1372
1671 – 1685  415
1685 – 1699  307
1699 – 1712  276
1712 – 1726  139
1726 – 1740  90
1740 – 1754  128
1754 – 1768  21
1768 – 1781  9
1781 – 1795  5
1795 – 1809  0
1809 – 1822  0
1822 – 1836  2
1836 – 1850  1
Fig 3.
deaths · Highlights the extreme skew: most trials record zero deaths, but a few reach into the hundreds.
Histogram bins for deaths (median: 0.0).
bin            count
0 – 12.5       10727
12.5 – 25      100
25 – 37.5      43
37.5 – 50      20
50 – 62.5      13
62.5 – 75      15
75 – 87.5      2
87.5 – 100     1
100 – 112.5    4
112.5 – 125    2
125 – 137.5    0
137.5 – 150    0
150 – 162.5    3
162.5 – 175    1
175 – 187.5    0
187.5 – 200    0
200 – 212.5    1
212.5 – 225    1
225 – 237.5    0
237.5 – 250    1
250 – 262.5    0
262.5 – 275    0
275 – 287.5    0
287.5 – 300    0
300 – 312.5    1
312.5 – 325    0
325 – 337.5    0
337.5 – 350    0
350 – 362.5    0
362.5 – 375    0
375 – 387.5    0
387.5 – 400    0
400 – 412.5    0
412.5 – 425    0
425 – 437.5    0
437.5 – 450    0
450 – 462.5    0
462.5 – 475    0
475 – 487.5    0
487.5 – 500    5
Fig 4.
tried · Most events involve a single accused person, but rare mass trials stretch the tail to 500.
Histogram bins for tried (median: 1.0).
bin            count
1 – 13.47      10493
13.47 – 25.95  196
25.95 – 38.42  88
38.42 – 50.9   33
50.9 – 63.38   21
63.38 – 75.85  14
75.85 – 88.33  9
88.33 – 100.8  24
100.8 – 113.3  6
113.3 – 125.8  9
125.8 – 138.2  5
138.2 – 150.7  6
150.7 – 163.2  8
163.2 – 175.7  3
175.7 – 188.1  5
188.1 – 200.6  1
200.6 – 213.1  0
213.1 – 225.5  3
225.5 – 238    1
238 – 250.5    1
250.5 – 263    1
263 – 275.4    0
275.4 – 287.9  0
287.9 – 300.4  0
300.4 – 312.9  1
312.9 – 325.3  0
325.3 – 337.8  1
337.8 – 350.3  0
350.3 – 362.8  3
362.8 – 375.2  0
375.2 – 387.7  0
387.7 – 400.2  0
400.2 – 412.7  3
412.7 – 425.1  0
425.1 – 437.6  0
437.6 – 450.1  0
450.1 – 462.6  0
462.6 – 475.1  0
475.1 – 487.5  0
487.5 – 500    5
Fig 5.
decade · A coarser view of temporal concentration, useful for spotting which decades dominate the record.
Histogram bins for decade (median: 1620.0).
bin          count
1300 – 1314  32
1314 – 1328  21
1328 – 1341  33
1341 – 1355  4
1355 – 1369  3
1369 – 1382  32
1382 – 1396  25
1396 – 1410  32
1410 – 1424  55
1424 – 1438  52
1438 – 1451  147
1451 – 1465  65
1465 – 1479  68
1479 – 1492  188
1492 – 1506  39
1506 – 1520  43
1520 – 1534  137
1534 – 1548  106
1548 – 1561  420
1561 – 1575  291
1575 – 1589  373
1589 – 1602  1576
1602 – 1616  878
1616 – 1630  934
1630 – 1644  1835
1644 – 1658  872
1658 – 1671  1469
1671 – 1685  260
1685 – 1699  295
1699 – 1712  302
1712 – 1726  105
1726 – 1740  68
1740 – 1754  151
1754 – 1768  8
1768 – 1781  14
1781 – 1795  4
1795 – 1809  0
1809 – 1822  1
1822 – 1836  1
1836 – 1850  1
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column   kind         null %
year     numeric      8.5%
decade   numeric      0.0%
city     categorical  47.7%
country  categorical  0.0%
tried    numeric      0.0%
deaths   numeric      0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
        year   decade  tried  deaths
year    +1.00  +0.36   -0.19  -0.16
decade  +0.36  +1.00   +0.00  +0.03
tried   -0.19  +0.00   +1.00  +0.65
deaths  -0.16  +0.03   +0.65  +1.00

year numeric timestamp

This is a 'year' field spanning 1300-1850 with median 1630; in a witch-trial dataset it records the year each trial took place. The distribution is left-skewed (skew -1.59, kurtosis 4.32), and 8.51% of rows are null. Of the non-null rows, 714 (7.13%) are flagged as outliers: with Q1 1597 and Q3 1660, the 1.5 IQR fences sit at 1502.5 and 1754.5, so the flagged rows are mostly the long pre-1500 tail plus a few records from the late 1700s and 1800s. The tight IQR of 63 years (1597-1660) shows the corpus concentrates heavily in the late-Renaissance/early-Baroque period.

Treatment: Impute or bucket into era bins before modelling; consider winsorising the pre-1500 tail.
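
The 1.5 IQR rule behind the outlier alert, and the winsorising suggested above, can be sketched as follows. This is a minimal illustration, not saturn-dissect's own code: the function names `tukey_fences` and `winsorise` are hypothetical, the quartiles are the q1 and q3 reported for this column, and clamping to the fences is only one possible choice of cut points.

```python
from typing import Optional


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorise(value: Optional[float], lower: float, upper: float) -> Optional[float]:
    """Clamp a value into [lower, upper]; nulls pass through for separate handling."""
    if value is None:
        return None
    return min(max(value, lower), upper)


# Quartiles reported for the year column in this report.
lower, upper = tukey_fences(1597, 1660)
print(lower, upper)

# Made-up sample values, including a null, purely for illustration.
sample_years = [1300, 1450, 1630, 1850, None]
print([winsorise(y, lower, upper) for y in sample_years])
```

Nulls are left untouched here on purpose, since the 8.5% missing years need an explicit imputation or era-bin decision rather than a silent default.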

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["year"].stats

stat          value
n             10,940
nulls         931 (8.5%)
unique        430
min           1300
max           1850
mean          1621
median        1630
std           66.25
q1            1597
q3            1660
iqr           63
skew          -1.586
kurtosis      4.319
n_outliers    714
outlier_rate  0.07134
zero_rate     0
alert: outliers · 7.1% rows beyond 1.5 IQR
Fig 8.
Distribution of year. Vertical dash marks the median.

decade numeric timestamp

This is a 'decade' field expressed as a four-digit year, ranging from 1300 to 1850 with a median of 1620 and a tight IQR of 60 years (Q1 1590, Q3 1650). The distribution is strongly left-skewed (skew -1.48, kurtosis 3.85) and 848 rows (7.75%) fall outside the Tukey fences — likely the pre-1500 records pulling the tail. With only 53 unique values across 10,940 rows and no nulls, it behaves as a coarse temporal bin rather than a continuous measurement.

Treatment: Treat as an ordered categorical decade bin; consider grouping pre-1500 tail before modelling.
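
One way to read that treatment is sketched below: each decade becomes an ordered label and everything before a cutoff is pooled into a single bin. The 1500 cutoff follows the "pre-1500 tail" wording above; the helper names `decade_label` and `label_order` are hypothetical and the sample values are made up.

```python
def decade_label(decade: int, floor: int = 1500) -> str:
    """Map a four-digit decade to a label, pooling everything before `floor`."""
    if decade < floor:
        return f"pre-{floor}"
    return f"{decade}s"


def label_order(labels: set[str], floor: int = 1500) -> list[str]:
    """Sort labels chronologically, with the pooled bin first."""
    pooled = f"pre-{floor}"
    # Four-digit decade labels sort correctly as plain strings.
    rest = sorted(label for label in labels if label != pooled)
    return ([pooled] if pooled in labels else []) + rest


sample_decades = [1300, 1490, 1500, 1620, 1850]  # illustrative values only
labels = [decade_label(d) for d in sample_decades]
print(labels)
print(label_order(set(labels)))
```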

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["decade"].stats

stat          value
n             10,940
nulls         0 (0.0%)
unique        53
min           1300
max           1850
mean          1615
median        1620
std           66.68
q1            1590
q3            1650
iqr           60
skew          -1.478
kurtosis      3.85
n_outliers    848
outlier_rate  0.07751
zero_rate     0
alert: outliers · 7.8% rows beyond 1.5 IQR
Fig 9.
Distribution of decade. Vertical dash marks the median.

city categorical feature

This is a city-name categorical feature with 906 distinct values across 10,940 rows, dominated by European locations (Geneva, Budingen, Bruges, Munich). Nearly half the rows are null (47.65%), and even the top value only covers 5.59% of non-nulls, giving high entropy (ratio 0.856) and a long tail. The null rate combined with high cardinality is the main concern for downstream use.

Treatment: Impute or flag missingness, then group rare cities into an 'other' bucket before encoding.
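
A minimal sketch of that treatment, assuming nulls arrive as `None` and that "rare" means fewer than `min_count` occurrences. Both the threshold and the helper name `encode_city` are illustrative assumptions, not something the report specifies.

```python
from collections import Counter
from typing import Optional


def encode_city(cities: list[Optional[str]], min_count: int = 2) -> list[tuple[str, bool]]:
    """Return (label, was_missing) pairs with rare cities pooled into 'other'."""
    counts = Counter(c for c in cities if c is not None)
    encoded = []
    for city in cities:
        if city is None:
            # Keep missingness as its own label and as an explicit flag.
            encoded.append(("missing", True))
        elif counts[city] < min_count:
            encoded.append(("other", False))
        else:
            encoded.append((city, False))
    return encoded


# Made-up sample; the real column has 906 distinct cities and 47.7% nulls.
sample = ["Geneva", None, "Geneva", "Namur", None, "Bruges", "Bruges"]
for label, was_missing in encode_city(sample):
    print(label, was_missing)
```

Keeping the missing flag separate from the label lets a downstream model treat "city unknown" as a signal instead of folding it into the rare-city bucket.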

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["city"].stats

stat           value
n              10,940
nulls          5,213 (47.7%)
unique         906
top_value      Geneva
top_rate       0.05588
cardinality    906
entropy        8.406
entropy_ratio  0.8557
alert: null_rate · 47.7% null
Fig 10.
Top values for city.
Top values for city (20 unique shown, of 906 total).
value          count  share
Geneva         320    2.9%
Budingen       264    2.4%
Bruges         121    1.1%
Munich         80     0.7%
Augsburg       75     0.7%
Vesoul         64     0.6%
Venice         63     0.6%
Boudry         59     0.5%
Valangin       58     0.5%
Fleurier       54     0.5%
Gelnhausen     50     0.5%
Arnsberg       49     0.4%
Thielle-Wavre  47     0.4%
Burghausen     44     0.4%
Grundau        44     0.4%
Neuchatel      42     0.4%
Colombier      42     0.4%
Stuttgart      39     0.4%
Mitterfels     38     0.3%
Namur          37     0.3%

country categorical feature

Country in which the trial was recorded, as a categorical label with 19 distinct values across 10,940 rows and no nulls. The distribution is entirely European and concentrated: the United Kingdom alone accounts for 34.3% (3,750), followed by Germany (3,417, 31.2%) and Switzerland (1,272, 11.6%), while the long tail runs from Italy (107) and Denmark (90) down to single digits (Poland 9, Ireland 4, Czech Republic 1). An entropy ratio of 0.59 confirms moderate concentration rather than a uniform spread.

Treatment: One-hot encode the top countries and bucket the low-frequency tail into 'Other' before modelling.
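
The entropy figures quoted for this column can be reproduced from the country counts in this report. The sketch below assumes that entropy is Shannon entropy in bits and that entropy_ratio is entropy divided by log2 of the number of distinct values; that reading is consistent with the reported 2.502 and 0.5889 but is not confirmed by the source.

```python
import math

# Counts taken from the country top-values table in this report.
country_counts = {
    "United Kingdom": 3750, "Germany": 3417, "Switzerland": 1272,
    "France": 807, "Belgium": 671, "Sweden": 353, "Netherlands": 314,
    "Italy": 107, "Denmark": 90, "Spain": 29, "Hungary": 26,
    "Norway": 20, "Luxembourg": 20, "Estonia": 17, "Finland": 17,
    "Austria": 16, "Poland": 9, "Ireland": 4, "Czech Republic": 1,
}


def shannon_entropy_bits(counts: dict[str, int]) -> float:
    """Shannon entropy of a categorical distribution, in bits."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


entropy = shannon_entropy_bits(country_counts)
# Assumed normalisation: divide by the maximum entropy for this cardinality.
ratio = entropy / math.log2(len(country_counts))
print(round(entropy, 3), round(ratio, 4))
```

A ratio of 1.0 would mean all 19 countries are equally represented, so a value near 0.59 reflects the dominance of the top two or three countries.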

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["country"].stats

stat           value
n              10,940
nulls          0 (0.0%)
unique         19
top_value      United Kingdom
top_rate       0.3428
cardinality    19
entropy        2.502
entropy_ratio  0.5889
Fig 11.
Top values for country.

tried numeric feature

`tried` counts the number of people tried in each record and is heavily concentrated at 1: the median, q1, and q3 are all 1.0, yet values stretch up to 500.0 with a mean of 3.95. Skew of 15.6 and kurtosis of 316 confirm an extremely long right tail. Because the IQR is 0, both fences of the 1.5 IQR rule collapse to 1, so all 2,457 rows (22.5%) with more than one person tried are flagged as outliers. With only 111 distinct values and no nulls or zeros, most records describe a single accused person, and a small minority are mass trials.

Treatment: log1p-transform or cap at a high quantile before modelling to tame the long tail.
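
Both options in that treatment can be sketched with the standard library alone. The nearest-rank quantile and the helper names `cap_at_quantile` and `log1p_transform` are assumptions made for illustration; the report does not say which quantile method or cap level it has in mind.

```python
import math


def cap_at_quantile(values: list[float], q: float = 0.99) -> list[float]:
    """Cap values at the q-th quantile, using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


def log1p_transform(values: list[float]) -> list[float]:
    """Apply log(1 + x), which compresses the long right tail."""
    return [math.log1p(v) for v in values]


# Made-up sample shaped like the column: mostly 1, with one mass trial.
sample_tried = [1, 1, 1, 1, 1, 1, 1, 2, 5, 500]
print(cap_at_quantile(sample_tried, q=0.9))
print([round(x, 3) for x in log1p_transform(sample_tried)])
```

Capping discards the size of the largest trials, while log1p keeps their ordering, so the choice depends on whether mass trials matter to the model.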

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["tried"].stats

stat          value
n             10,940
nulls         0 (0.0%)
unique        111
min           1
max           500
mean          3.952
median        1
std           19.26
q1            1
q3            1
iqr           0
skew          15.61
kurtosis      316
n_outliers    2,457
outlier_rate  0.2246
zero_rate     0
alert: high_skew · skew=+15.61
alert: outliers · 22.5% rows beyond 1.5 IQR
Fig 12.
Distribution of tried. Vertical dash marks the median.

deaths numeric numeric_target

Numeric count of deaths per record, with 10,940 rows and only 74 distinct values. The distribution is heavily zero-inflated (zero_rate 0.7547) with median, q1, and q3 all at 0, yet the max reaches 500 and skew is 28.52 with kurtosis 991.65. Because the IQR is 0, the 1.5 IQR rule flags every record with at least one death: 2,684 rows (24.5%), exactly the complement of the zero rate. The outlier alert therefore reflects zero inflation plus a handful of extreme mortality events rather than dirty data.

Treatment: Model with a zero-inflated or hurdle approach, or log1p-transform before regression.
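
The data preparation for a hurdle approach can be sketched as a two-part split: a binary target for whether any death was recorded, and a log1p-transformed count for the records that clear the hurdle. The helper name `hurdle_split` and the sample values are hypothetical, and fitting the two models is left out.

```python
import math


def hurdle_split(deaths: list[int]) -> tuple[list[int], list[float]]:
    """Split counts into a binary any-deaths target and log1p of positive counts.

    The first list would feed a classifier (did any death occur?); the second
    would feed a regressor fitted only on records with at least one death.
    """
    any_deaths = [1 if d > 0 else 0 for d in deaths]
    positive_log = [math.log1p(d) for d in deaths if d > 0]
    return any_deaths, positive_log


# Made-up sample shaped like the column: mostly zeros, one extreme event.
sample_deaths = [0, 0, 0, 1, 0, 12, 0, 500]
flags, positives = hurdle_split(sample_deaths)
zero_rate = 1 - sum(flags) / len(flags)
print(flags, [round(p, 3) for p in positives], zero_rate)
```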

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["deaths"].stats

stat          value
n             10,940
nulls         0 (0.0%)
unique        74
min           0
max           500
mean          1.493
median        0
std           13.19
q1            0
q3            0
iqr           0
skew          28.52
kurtosis      991.6
n_outliers    2,684
outlier_rate  0.2453
zero_rate     0.7547
alert: high_skew · skew=+28.52
alert: outliers · 24.5% rows beyond 1.5 IQR
Fig 13.
Distribution of deaths. Vertical dash marks the median.

How to cite

BibTeX
@misc{saturn-quirky-witch-trials-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky witch trials},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-witch_trials}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky witch trials. Source: /home/coolhand/html/datavis/data_trove/data/quirky/witch_trials.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-witch_trials