joshua-project-joshua_project_countries

Overview

Source: /home/coolhand/html/datavis/data_trove/joshua-project/joshua_project_countries.json

Saturn profiled 238 rows across 39 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

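!pip install saturn-dissect  # IPython shell escape; runs in a notebook cell, not in a plain Python script
import subprocess

# Run the saturn CLI on the source JSON; --llm opts in to the model-written prose shown below.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/joshua-project/joshua_project_countries.json",
        "--findings", "joshua-project-joshua_project_countries.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the command exits with a non-zero status
)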

Summary confidence: high

This dataset profiles 238 countries from the Joshua Project, combining demographic data (population, people groups, languages) with religious composition percentages and Bible translation/evangelization status. Christianity is the primary religion in 159 of 238 countries, and the JPScaleText field shows 89 countries as 'Significantly Reached' versus 43 'Unreached', a useful starting lens for mission analysis. Population and people-group columns are extremely right-skewed (skew 8.2 for CntPeoples and above 11 for PoplPeoplesLR and PoplPeoplesFPG, with outliers such as a 1.46B population), so log-scale views or per-capita ratios will be more informative than raw totals. Several religion percentage columns also have very high zero rates (e.g., Hinduism 56%, Buddhism 48%), reflecting that most countries have a negligible presence of any given non-dominant religion. Note also that PoplPeoplesFPG and CntPeoplesFPG have substantial null rates (32% and 29%), so any analysis of frontier/unreached people groups should account for missing coverage.

citing: ReligionPrimary · JPScaleText · RegionName · Window1040 · PercentChristianity · Population · PoplPeoplesFPG · CntPeoplesFPG · PercentIslam · PercentHinduism · PercentBuddhism
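
The advice to prefer per-capita ratios, and to account for missing FPG coverage, can be made concrete with a small helper. A minimal sketch, assuming each country is a dict keyed by the profiled column names, that PoplPeoplesFPG is the population living in frontier people groups, and that a missing value arrives as None; `fpg_share` is a hypothetical name. It returns None instead of 0 when an input is missing, so gaps stay visible.

```python
from typing import Optional


def fpg_share(row: dict) -> Optional[float]:
    """Hypothetical helper: share of a country's population in frontier people groups.

    Returns None when PoplPeoplesFPG or Population is missing (or Population is 0),
    so missing coverage is not silently treated as zero.
    """
    fpg = row.get("PoplPeoplesFPG")
    pop = row.get("Population")
    if fpg is None or not pop:
        return None
    return fpg / pop


# Illustrative rows only; these are not values from the dataset.
rows = [
    {"Ctry": "Example A", "Population": 1_000_000, "PoplPeoplesFPG": 250_000},
    {"Ctry": "Example B", "Population": 500_000, "PoplPeoplesFPG": None},
]
for r in rows:
    print(r["Ctry"], fpg_share(r))
```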

saturn.schema() · 39 columns

column	kind	n	null%	unique	alerts
PoplPeoplesLR	numeric	238	17.6%	180	high_skew outliers
PercentUnknown	numeric	238	0.0%	158	high_skew
ReligionPrimary	categorical	238	0.0%	6
SecurityLevel	numeric	238	0.0%	3
PercentNonReligious	numeric	238	0.0%	228	high_skew outliers
ISO2	categorical	238	0.0%	238	long_tail
JPScaleImageURL	categorical	238	0.0%	5
PercentEthnicReligions	numeric	238	0.0%	199	high_skew outliers
RegionName	categorical	238	0.0%	12
CntPeoples	numeric	238	0.0%	96	high_skew outliers
PoplPeoplesFPG	numeric	238	31.5%	142	null_rate high_skew outliers
ROG3	categorical	238	0.0%	238	long_tail
PercentEvangelical	numeric	238	2.5%	232
ROL3OfficialLanguage	categorical	238	0.0%	88	long_tail
TranslationUnspecified	numeric	238	0.0%	21	high_skew outliers
TranslationNeeded	numeric	238	0.0%	18	high_skew outliers
TranslationStarted	numeric	238	0.0%	30	high_skew outliers
RegionCode	numeric	238	0.0%	12
Ctry	categorical	238	0.0%	238	long_tail
BibleNewTestament	numeric	238	0.0%	45	high_skew outliers
BibleComplete	numeric	238	0.0%	54	high_skew outliers
Window1040	categorical	238	0.0%	2
JPScaleText	categorical	238	0.0%	5
CntPrimaryLanguages	numeric	238	0.0%	91	high_skew outliers
OfficialLang	categorical	238	0.4%	87	long_tail
BiblePortions	numeric	238	0.0%	35	high_skew outliers
ROG2	categorical	238	0.0%	7
PercentIslam	numeric	238	0.0%	198	outliers
RLG3Primary	numeric	238	0.0%	6
Capital	categorical	238	1.7%	233	long_tail
Population	numeric	238	0.0%	230	high_skew outliers
CntPeoplesLR	numeric	238	15.1%	57	high_skew outliers
CntPeoplesFPG	numeric	238	28.6%	44	null_rate high_skew outliers
PercentHinduism	numeric	238	0.0%	106	high_skew outliers
PercentOtherSmall	numeric	238	0.0%	199	high_skew outliers
PercentBuddhism	numeric	238	0.0%	125	high_skew outliers
JPScaleCtry	numeric	238	0.0%	5
PercentChristianity	numeric	238	0.0%	237
ISO3	categorical	238	0.0%	238	long_tail

Fig 1.

ReligionPrimary · Christianity is the primary religion for two-thirds of countries; Islam is a distant second.

Top values for ReligionPrimary (6 unique shown, of 6 total).
value	count	share
Christianity	159	66.8%
Islam	55	23.1%
Buddhism	10	4.2%
Ethnic Religions	6	2.5%
Non-Religious	5	2.1%
Hinduism	3	1.3%

Fig 2.

JPScaleText · Distribution across the Joshua Project reach scale — 'Significantly Reached' leads but 'Unreached' is the third-largest bucket.

Top values for JPScaleText (5 unique shown, of 5 total).
value	count	share
Significantly Reached	89	37.4%
Partially Reached	67	28.2%
Unreached	43	18.1%
Superficially Reached	28	11.8%
Minimally Reached	11	4.6%

Fig 3.

RegionName · Country counts are fairly even across the African, European, American, and Pacific regions (19 to 30 countries each); the four Asian sub-regions are much smaller (8 to 11 each).

Top values for RegionName (12 unique shown, of 12 total).
value	count	share
America, North and Caribbean	30	12.6%
Europe, Western	28	11.8%
Africa, East and Southern	28	11.8%
Australia and Pacific	27	11.3%
Africa, West and Central	24	10.1%
Europe, Eastern and Eurasia	23	9.7%
America, Latin	22	9.2%
Africa, North and Middle East	19	8.0%
Asia, Southeast	11	4.6%
Asia, Central	10	4.2%
Asia, South	8	3.4%
Asia, Northeast	8	3.4%

Fig 4.

PercentChristianity · Bimodal-leaning distribution: many countries are either heavily Christian or barely so, with few in between.

Histogram bins for PercentChristianity (median: 75.30076398378219).
bin	count
0.0165 – 6.682	48
6.682 – 13.35	16
13.35 – 20.01	2
20.01 – 26.68	2
26.68 – 33.34	5
33.34 – 40.01	3
40.01 – 46.68	5
46.68 – 53.34	8
53.34 – 60.01	6
60.01 – 66.67	13
66.67 – 73.34	7
73.34 – 80	15
80 – 86.67	25
86.67 – 93.33	42
93.33 – 100	41

Fig 5.

Window1040 · About 29% of countries fall inside the 10/40 Window, the typical focus zone for unreached-peoples work.

Top values for Window1040 (2 unique shown, of 2 total).
value	count	share
N	170	71.4%
Y	68	28.6%

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
PoplPeoplesLR	numeric	17.6%
PercentUnknown	numeric	0.0%
ReligionPrimary	categorical	0.0%
SecurityLevel	numeric	0.0%
PercentNonReligious	numeric	0.0%
ISO2	categorical	0.0%
JPScaleImageURL	categorical	0.0%
PercentEthnicReligions	numeric	0.0%
RegionName	categorical	0.0%
CntPeoples	numeric	0.0%
PoplPeoplesFPG	numeric	31.5%
ROG3	categorical	0.0%
PercentEvangelical	numeric	2.5%
ROL3OfficialLanguage	categorical	0.0%
TranslationUnspecified	numeric	0.0%
TranslationNeeded	numeric	0.0%
TranslationStarted	numeric	0.0%
RegionCode	numeric	0.0%
Ctry	categorical	0.0%
BibleNewTestament	numeric	0.0%
BibleComplete	numeric	0.0%
Window1040	categorical	0.0%
JPScaleText	categorical	0.0%
CntPrimaryLanguages	numeric	0.0%
OfficialLang	categorical	0.4%
BiblePortions	numeric	0.0%
ROG2	categorical	0.0%
PercentIslam	numeric	0.0%
RLG3Primary	numeric	0.0%
Capital	categorical	1.7%
Population	numeric	0.0%
CntPeoplesLR	numeric	15.1%
CntPeoplesFPG	numeric	28.6%
PercentHinduism	numeric	0.0%
PercentOtherSmall	numeric	0.0%
PercentBuddhism	numeric	0.0%
JPScaleCtry	numeric	0.0%
PercentChristianity	numeric	0.0%
ISO3	categorical	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 12 numeric columns (values rounded to 2 decimal places).
	PoplPeoplesLR	PercentUnknown	SecurityLevel	PercentNonReligious	PercentEthnicReligions	CntPeoples	PoplPeoplesFPG	PercentEvangelical	TranslationUnspecified	TranslationNeeded	TranslationStarted	RegionCode
PoplPeoplesLR	+1.00	+0.03	+0.02	-0.03	+0.03	-0.00	-0.00	-0.10	-0.02	+0.00	-0.01	+0.03
PercentUnknown	+0.03	+1.00	+0.40	-0.17	+0.06	+0.26	+0.04	-0.09	+0.06	+0.10	+0.11	-0.14
SecurityLevel	+0.02	+0.40	+1.00	-0.16	+0.09	+0.31	-0.02	-0.10	+0.22	+0.27	+0.30	-0.30
PercentNonReligious	-0.03	-0.17	-0.16	+1.00	-0.01	-0.06	+0.04	-0.14	+0.02	+0.13	-0.11	-0.02
PercentEthnicReligions	+0.03	+0.06	+0.09	-0.01	+1.00	+0.02	-0.01	-0.05	+0.08	+0.21	+0.07	-0.16
CntPeoples	-0.00	+0.26	+0.31	-0.06	+0.02	+1.00	-0.03	-0.04	+0.38	+0.33	+0.41	-0.15
PoplPeoplesFPG	-0.00	+0.04	-0.02	+0.04	-0.01	-0.03	+1.00	-0.00	-0.02	-0.03	-0.03	+0.08
PercentEvangelical	-0.10	-0.09	-0.10	-0.14	-0.05	-0.04	-0.00	+1.00	-0.03	-0.06	-0.02	+0.08
TranslationUnspecified	-0.02	+0.06	+0.22	+0.02	+0.08	+0.38	-0.02	-0.03	+1.00	+0.62	+0.70	-0.14
TranslationNeeded	+0.00	+0.10	+0.27	+0.13	+0.21	+0.33	-0.03	-0.06	+0.62	+1.00	+0.42	-0.14
TranslationStarted	-0.01	+0.11	+0.30	-0.11	+0.07	+0.41	-0.03	-0.02	+0.70	+0.42	+1.00	-0.13
RegionCode	+0.03	-0.14	-0.30	-0.02	-0.16	-0.15	+0.08	+0.08	-0.14	-0.14	-0.13	+1.00
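
The caption does not say how nulls are handled (PoplPeoplesLR and PoplPeoplesFPG are partly missing) or how rows are sampled. A sketch of one common choice, pairwise deletion, where each pair of columns uses only rows in which both values are present; whether Saturn does exactly this is an assumption. Pearson r is also sensitive to the heavy right tails flagged throughout this profile, so near-zero coefficients for the population columns may partly reflect a few extreme rows.

```python
import math
from typing import Optional, Sequence


def pearson_pairwise(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> float:
    """Pearson r over rows where both values are present (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either column is constant over the kept rows
    return sxy / math.sqrt(sxx * syy)


# Illustrative values: only the first three rows have both columns present.
print(round(pearson_pairwise([1, 2, 3, None, 5], [2, 4, 6, 8, None]), 2))
```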

PoplPeoplesLR numeric feature

Given the column name and the Joshua Project's 'LR' (least-reached) naming, this is most likely the population living in least-reached people groups in each country. Values span 50 to ~1.39B with a median of 532,500. The distribution is extremely right-skewed (skew 12.1, kurtosis 156), with 31 outliers (15.8% of the 196 non-null rows) and a std (~104M) far exceeding the mean (~18M), so a few very populous countries dominate. Also notable: 17.6% of rows (42) are null.

Treatment: Log-transform and impute the ~17.6% nulls before modelling.

anthropic:claude-opus-4-7 · confidence high
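
A minimal stdlib sketch of that treatment. It log1p-transforms the non-null values (log1p rather than log, so a zero would not break it) and fills nulls with the median of the transformed values. Median imputation is one simple choice, not necessarily the right one: if a null here means "no least-reached population" rather than "unknown", filling with 0 would be more faithful, and the profile does not say which applies.

```python
import math
import statistics
from typing import List, Optional


def log1p_impute_median(values: List[Optional[float]]) -> List[float]:
    """log1p-transform non-null values, then fill nulls with the median of the transformed values."""
    logged = [math.log1p(v) if v is not None else None for v in values]
    present = [v for v in logged if v is not None]
    fill = statistics.median(present)  # the median is robust to the heavy right tail
    return [fill if v is None else v for v in logged]


# Illustrative input spanning the profiled min, median, and max, plus one null.
sample = [50, 532_500, None, 1_395_000_000]
print([round(v, 2) for v in log1p_impute_median(sample)])
```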

saturn.columns["PoplPeoplesLR"].stats

stat	value
n	238
nulls	42 (17.6%)
unique	180
min	50
max	1.395e+09
mean	1.823e+07
median	532,500
std	1.038e+08
q1	32,750
q3	6.022e+06
iqr	5.99e+06
skew	12.1
kurtosis	156.4
n_outliers	31
outlier_rate	0.1582
zero_rate	0
alert: high_skew	skew=+12.10
alert: outliers	15.8% rows beyond 1.5 IQR
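
The outlier_rate values in these tables appear to use non-null rows as the denominator: 31 / 196 = 0.1582 here, and 31 / 163 = 0.1902 for PoplPeoplesFPG. A sketch of the 1.5 IQR fence named in the alert; Saturn's quantile interpolation is not stated, so `method="inclusive"` is an assumption and q1/q3 may differ slightly from the table.

```python
import statistics
from typing import List, Optional, Tuple


def iqr_outliers(values: List[Optional[float]], k: float = 1.5) -> Tuple[List[float], float]:
    """Return (outliers, outlier_rate) using the k * IQR fence on non-null values."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < lo or v > hi]
    # Denominator is the non-null count, matching outlier_rate in the tables above.
    return outliers, len(outliers) / len(present)


out, rate = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100, None])
print(out, rate)
```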

Fig 8.

Distribution of PoplPeoplesLR. Vertical dash marks the median.

Histogram bins for PoplPeoplesLR (median: 532500.0).
bin	count
50 – 9.963e+07	190
9.963e+07 – 1.993e+08	3
1.993e+08 – 2.989e+08	2
2.989e+08 – 3.985e+08	0
3.985e+08 – 4.982e+08	0
4.982e+08 – 5.978e+08	0
5.978e+08 – 6.974e+08	0
6.974e+08 – 7.971e+08	0
7.971e+08 – 8.967e+08	0
8.967e+08 – 9.963e+08	0
9.963e+08 – 1.096e+09	0
1.096e+09 – 1.196e+09	0
1.196e+09 – 1.295e+09	0
1.295e+09 – 1.395e+09	1

PercentUnknown numeric feature

PercentUnknown is most likely the percentage of each country's population whose religion is unknown, ranging from 0 to 2.679 with a mean of 0.218 and a median of 0.176. Like the other Percent* columns, it is on a 0-100 scale (PercentChristianity reaches 100), so a maximum of 2.679 is an ordinary small percentage rather than an out-of-range share. About 34% of the 238 rows are exactly zero (q1 is also 0), and the column is right-tailed (skew 3.92, kurtosis 32.16), with three outliers (1.3%) well above the bulk of the distribution.

Treatment: log1p-transform given the right skew and zero inflation; no range correction is needed.

anthropic:claude-opus-4-7 · confidence high
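
Because the Percent* religion columns share a 0-100 scale, a per-row consistency check is straightforward. A sketch assuming the eight religion-share columns listed below are meant to partition each country's population (PercentEvangelical is excluded because it presumably overlaps with PercentChristianity); the 0.5-point tolerance is an arbitrary choice.

```python
# Assumption: these eight columns together should account for ~100% of a country.
RELIGION_COLUMNS = [
    "PercentChristianity", "PercentIslam", "PercentHinduism", "PercentBuddhism",
    "PercentEthnicReligions", "PercentNonReligious", "PercentOtherSmall", "PercentUnknown",
]


def religion_total(row: dict) -> float:
    """Sum the religion-share columns; missing values count as 0."""
    return sum(row.get(col) or 0.0 for col in RELIGION_COLUMNS)


def off_by(row: dict, expected: float = 100.0, tol: float = 0.5) -> bool:
    """True if the shares miss `expected` by more than `tol` percentage points."""
    return abs(religion_total(row) - expected) > tol


example = {  # illustrative values, not taken from the dataset
    "PercentChristianity": 90.0, "PercentIslam": 2.5,
    "PercentNonReligious": 5.0, "PercentUnknown": 2.5,
}
print(religion_total(example), off_by(example))
```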

saturn.columns["PercentUnknown"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	158
min	0
max	2.679
mean	0.2176
median	0.1758
std	0.2604
q1	0
q3	0.3673
iqr	0.3673
skew	3.918
kurtosis	32.16
n_outliers	3
outlier_rate	0.01261
zero_rate	0.3403
alert: high_skew	skew=+3.92

Fig 9.

Distribution of PercentUnknown. Vertical dash marks the median.

Histogram bins for PercentUnknown (median: 0.175768845635591).
bin	count
0 – 0.1786	119
0.1786 – 0.3572	57
0.3572 – 0.5358	50
0.5358 – 0.7144	6
0.7144 – 0.8931	3
0.8931 – 1.072	2
1.072 – 1.25	0
1.25 – 1.429	0
1.429 – 1.607	0
1.607 – 1.786	0
1.786 – 1.965	0
1.965 – 2.143	0
2.143 – 2.322	0
2.322 – 2.501	0
2.501 – 2.679	1

ReligionPrimary categorical feature

Primary religion of each country across 238 rows, with 6 distinct values and no nulls. Christianity dominates at 159/238 (top_rate 0.668), followed by Islam at 55; Buddhism, Ethnic Religions, Non-Religious, and Hinduism share the long tail with 10 or fewer countries each. An entropy ratio of 0.54 confirms the heavy concentration in one category.

Treatment: One-hot encode, optionally collapsing the four smallest categories into 'Other' to handle imbalance.

anthropic:claude-opus-4-7 · confidence high
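
A minimal stdlib sketch of the suggested treatment. The column is rebuilt from the reported counts (159 / 55 / 10 / 6 / 5 / 3), and the `min_count=20` threshold is an illustrative choice that happens to fold exactly the four smallest categories into an 'Other' bucket of 24 countries.

```python
from collections import Counter
from typing import Dict, List


def collapse_rare(values: List[str], min_count: int, other: str = "Other") -> List[str]:
    """Replace categories seen fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def one_hot(values: List[str]) -> List[Dict[str, int]]:
    """One 0/1 indicator per distinct category, keys in sorted order."""
    cats = sorted(set(values))
    return [{c: int(v == c) for c in cats} for v in values]


# Rebuilt from the ReligionPrimary counts reported in this profile.
religion = (["Christianity"] * 159 + ["Islam"] * 55 + ["Buddhism"] * 10
            + ["Ethnic Religions"] * 6 + ["Non-Religious"] * 5 + ["Hinduism"] * 3)
collapsed = collapse_rare(religion, min_count=20)
print(Counter(collapsed))
print(one_hot(collapsed)[0])
```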

saturn.columns["ReligionPrimary"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	6
top_value	Christianity
top_rate	0.6681
cardinality	6
entropy	1.4
entropy_ratio	0.5415
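
The entropy figures in these tables are consistent with Shannon entropy in bits and entropy_ratio = entropy / log2(cardinality); for example, log2(238) = 7.895 matches the ISO2 and Ctry tables. A sketch that recomputes both from the ReligionPrimary counts; the formula is inferred from the numbers, not taken from Saturn's documentation.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def entropy_bits(values: Iterable[str]) -> Tuple[float, float]:
    """Shannon entropy in bits and its ratio to the maximum, log2(cardinality)."""
    counts = Counter(values)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    k = len(counts)
    ratio = h / math.log2(k) if k > 1 else 0.0
    return h, ratio


religion = (["Christianity"] * 159 + ["Islam"] * 55 + ["Buddhism"] * 10
            + ["Ethnic Religions"] * 6 + ["Non-Religious"] * 5 + ["Hinduism"] * 3)
h, ratio = entropy_bits(religion)
print(round(h, 2), round(ratio, 4))
```

With these counts it prints 1.4 and 0.5415, matching the table above.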

Fig 10.

Top values for ReligionPrimary.

Top values for ReligionPrimary (6 unique shown, of 6 total).
value	count	share
Christianity	159	66.8%
Islam	55	23.1%
Buddhism	10	4.2%
Ethnic Religions	6	2.5%
Non-Religious	5	2.1%
Hinduism	3	1.3%

SecurityLevel numeric feature

SecurityLevel takes only 3 distinct integer values (0, 1, 2) across 238 rows with no nulls, which suggests an ordinal tier code rather than a continuous measure. Most rows sit in the lowest tier: 67.6% (161) are zero, the median is 0, and the mean is 0.55, giving a right skew of 1.01. The middle tier is the rarest (24 rows at level 1 versus 53 at level 2). No outliers were flagged, which is consistent with a bounded scale.

Treatment: Treat as an ordinal category (0/1/2) rather than a continuous numeric.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["SecurityLevel"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	3
min	0
max	2
mean	0.5462
median	0
std	0.8344
q1	0
q3	1
iqr	1
skew	1.011
kurtosis	-0.796
n_outliers	0
outlier_rate	0
zero_rate	0.6765

Fig 11.

Distribution of SecurityLevel. Vertical dash marks the median.

Histogram bins for SecurityLevel (median: 0.0).
bin	count
0 – 0.1333	161
0.1333 – 0.2667	0
0.2667 – 0.4	0
0.4 – 0.5333	0
0.5333 – 0.6667	0
0.6667 – 0.8	0
0.8 – 0.9333	0
0.9333 – 1.067	24
1.067 – 1.2	0
1.2 – 1.333	0
1.333 – 1.467	0
1.467 – 1.6	0
1.6 – 1.733	0
1.733 – 1.867	0
1.867 – 2	53
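
Twelve of the fifteen bins are empty because the histogram uses equal-width bins over [min, max], which a three-valued column fills in only three places. A sketch of equal-width binning that reproduces the SecurityLevel counts above (bins 0, 7 and 14 hold 161, 24 and 53), assuming 15 bins with the maximum placed in the last bin; bin counts differ for other columns, so Saturn's rule for choosing the number of bins is not known from this report.

```python
from typing import List, Sequence


def equal_width_hist(values: Sequence[float], bins: int = 15) -> List[int]:
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands exactly on the right edge, so clamp it into the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# SecurityLevel rebuilt from its reported counts: 161 zeros, 24 ones, 53 twos.
security = [0] * 161 + [1] * 24 + [2] * 53
counts = equal_width_hist(security)
print([(i, c) for i, c in enumerate(counts) if c])
```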

PercentNonReligious numeric feature

This column reports the percentage of a population that is non-religious across 238 rows, with 228 unique values and no nulls. The distribution is heavily right-skewed (skew 2.61, kurtosis 8.00): the median is just 2.85% while the mean is 7.31% and the max reaches 68.81%, with 30 outliers (12.6% of rows) and 4.6% of values exactly zero. The std (11.39) dwarfs the IQR (7.08), confirming a long upper tail rather than a symmetric spread.

Treatment: Log1p- or rank-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
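
The treatment offers a rank transform as an alternative to log1p. A sketch of a percentile-rank transform that gives tied values their average rank, which matters here because 4.6% of rows (about 11 countries) are exactly zero; scaling to (0, 1] by dividing by n is one convention among several.

```python
from typing import List, Sequence


def percentile_ranks(values: Sequence[float]) -> List[float]:
    """Map each value to its average 1-based rank divided by n; ties share a rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n = len(values)
    return [r / n for r in ranks]


print(percentile_ranks([0.0, 0.0, 2.85, 68.81]))  # illustrative values
```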

saturn.columns["PercentNonReligious"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	228
min	0
max	68.81
mean	7.308
median	2.851
std	11.39
q1	0.5635
q3	7.646
iqr	7.083
skew	2.611
kurtosis	7.999
n_outliers	30
outlier_rate	0.1261
zero_rate	0.04622
alert: high_skew	skew=+2.61
alert: outliers	12.6% rows beyond 1.5 IQR

Fig 12.

Distribution of PercentNonReligious. Vertical dash marks the median.

Histogram bins for PercentNonReligious (median: 2.851063037151095).
bin	count
0 – 4.587	150
4.587 – 9.174	34
9.174 – 13.76	9
13.76 – 18.35	15
18.35 – 22.94	6
22.94 – 27.52	7
27.52 – 32.11	5
32.11 – 36.7	3
36.7 – 41.28	3
41.28 – 45.87	3
45.87 – 50.46	0
50.46 – 55.05	1
55.05 – 59.63	0
59.63 – 64.22	0
64.22 – 68.81	2

ISO2 categorical identifier

This column holds ISO2 country codes (AF, AL, DZ, AS, AD...), serving as a unique key with 238 distinct values across 238 rows and zero nulls. Cardinality equals row count and entropy_ratio is 1.0, meaning every code appears exactly once — top_rate is just 0.0042. The long_tail alert is expected here since the column is effectively a primary identifier.

Treatment: Use as the join key to merge country-level attributes.

anthropic:claude-opus-4-7 · confidence high
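
A minimal sketch of using ISO2 as a join key, with a guard that fails on missing or duplicate keys in the right-hand table (the left side is already unique here). The second dataset and its `gdp_hypothetical` field are invented for illustration; on field-name collisions the right-hand value wins.

```python
from typing import Dict, List


def index_unique(rows: List[dict], key: str) -> Dict[str, dict]:
    """Build key -> row, failing loudly if the key is missing or duplicated."""
    index: Dict[str, dict] = {}
    for row in rows:
        k = row.get(key)
        if k is None:
            raise ValueError(f"row without {key}: {row}")
        if k in index:
            raise ValueError(f"duplicate {key}: {k}")
        index[k] = row
    return index


def left_join(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Attach right-hand fields to each left row; unmatched rows keep only their own fields."""
    lookup = index_unique(right, key)
    return [{**row, **lookup.get(row[key], {})} for row in left]


countries = [{"ISO2": "AF", "Ctry": "Afghanistan"}, {"ISO2": "AL", "Ctry": "Albania"}]
extra = [{"ISO2": "AF", "gdp_hypothetical": 1.0}]  # hypothetical second dataset
print(left_join(countries, extra, "ISO2"))
```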

saturn.columns["ISO2"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	238
top_value	AF
top_rate	0.004202
cardinality	238
entropy	7.895
entropy_ratio	1
alert: long_tail	238 singleton categories

Fig 13.

Top values for ISO2.

Top values for ISO2 (20 unique shown, of 238 total).
value	count	share
AF	1	0.4%
AL	1	0.4%
DZ	1	0.4%
AS	1	0.4%
AD	1	0.4%
AO	1	0.4%
AI	1	0.4%
AG	1	0.4%
AR	1	0.4%
AM	1	0.4%
AW	1	0.4%
AU	1	0.4%
AT	1	0.4%
AZ	1	0.4%
BS	1	0.4%
BH	1	0.4%
BD	1	0.4%
BB	1	0.4%
BY	1	0.4%
BE	1	0.4%

JPScaleImageURL categorical feature

This column holds URLs to one of five Joshua Project gauge images (gauge-1.png through gauge-5.png). Its counts match JPScaleText exactly (gauge-5: 89 = Significantly Reached, gauge-4: 67 = Partially Reached, gauge-1: 43 = Unreached, gauge-3: 28 = Superficially Reached, gauge-2: 11 = Minimally Reached), so it very likely encodes the same 1-5 reach scale as JPScaleText and JPScaleCtry. There are only 5 unique values across 238 rows and no nulls; gauge-5 leads at 37.4% (89 rows) and gauge-2 is rarest at 11 rows. The URL carries no information beyond the trailing digit.

Treatment: Extract the trailing digit (1-5) as an ordinal feature and drop the URL string; if JPScaleCtry holds the same level, keep only one of the two.

anthropic:claude-opus-4-7 · confidence high
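
A sketch of the extraction step. The level-to-label map is inferred from the matching counts between this column and JPScaleText, not from any documentation, so treat it as an assumption to confirm against JPScaleCtry.

```python
import re
from typing import Optional

# Inferred by matching counts with JPScaleText (e.g. gauge-5: 89 = Significantly Reached).
GAUGE_LABELS = {
    1: "Unreached",
    2: "Minimally Reached",
    3: "Superficially Reached",
    4: "Partially Reached",
    5: "Significantly Reached",
}

_GAUGE_RE = re.compile(r"gauge-(\d)\.png$")


def gauge_level(url: str) -> Optional[int]:
    """Extract the 1-5 level from a gauge image URL, or None if the pattern is absent."""
    m = _GAUGE_RE.search(url)
    return int(m.group(1)) if m else None


url = "https://joshuaproject.net/assets/img/gauge/gauge-5.png"
level = gauge_level(url)
print(level, GAUGE_LABELS.get(level))
```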

saturn.columns["JPScaleImageURL"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	5
top_value	https://joshuaproject.net/assets/img/gauge/gauge-5.png
top_rate	0.3739
cardinality	5
entropy	2.06
entropy_ratio	0.8871

Fig 14.

Top values for JPScaleImageURL.

Top values for JPScaleImageURL (5 unique shown, of 5 total).
value	count	share
https://joshuaproject.net/assets/img/gauge/gauge-5.png	89	37.4%
https://joshuaproject.net/assets/img/gauge/gauge-4.png	67	28.2%
https://joshuaproject.net/assets/img/gauge/gauge-1.png	43	18.1%
https://joshuaproject.net/assets/img/gauge/gauge-3.png	28	11.8%
https://joshuaproject.net/assets/img/gauge/gauge-2.png	11	4.6%

PercentEthnicReligions numeric feature

Percentage of each country's population following ethnic religions, ranging from 0 to 75.16 across 238 rows with 199 unique values. The distribution is heavily right-skewed (skew 3.29, kurtosis 12.38), with a median of just 1.12% but a mean of 5.59% and a long tail producing 27 outliers (11.3% of rows); 16.8% of rows are exact zeros. The IQR (5.52) is far smaller than the std (11.21), confirming that most values cluster near zero while a few countries dominate.

Treatment: Apply a log1p or similar transform before modelling to tame the heavy right tail and zero mass.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["PercentEthnicReligions"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	199
min	0
max	75.16
mean	5.59
median	1.116
std	11.21
q1	0.05109
q3	5.576
iqr	5.525
skew	3.295
kurtosis	12.38
n_outliers	27
outlier_rate	0.1134
zero_rate	0.1681
alert: high_skew	skew=+3.29
alert: outliers	11.3% rows beyond 1.5 IQR

Fig 15.

Distribution of PercentEthnicReligions. Vertical dash marks the median.

Histogram bins for PercentEthnicReligions (median: 1.1158466177266).
bin	count
0 – 5.011	172
5.011 – 10.02	29
10.02 – 15.03	12
15.03 – 20.04	5
20.04 – 25.05	6
25.05 – 30.06	1
30.06 – 35.07	3
35.07 – 40.09	4
40.09 – 45.1	2
45.1 – 50.11	0
50.11 – 55.12	0
55.12 – 60.13	2
60.13 – 65.14	1
65.14 – 70.15	0
70.15 – 75.16	1

RegionName categorical feature

RegionName is a categorical geographic grouping that assigns each country to one of 12 world regions, with no nulls across 238 rows. The distribution is fairly even (entropy ratio 0.96), and the top bucket, 'America, North and Caribbean', holds only 12.6% (30 rows). All four Asian regions (Southeast: 11, Central: 10, South: 8, Northeast: 8) are notably smaller than the African, European, American, and Pacific groupings.

Treatment: one-hot or target-encode for modelling; safe to use as a grouping key.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["RegionName"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	12
top_value	America, North and Caribbean
top_rate	0.1261
cardinality	12
entropy	3.454
entropy_ratio	0.9634

Fig 16.

Top values for RegionName.

Top values for RegionName (12 unique shown, of 12 total).
value	count	share
America, North and Caribbean	30	12.6%
Europe, Western	28	11.8%
Africa, East and Southern	28	11.8%
Australia and Pacific	27	11.3%
Africa, West and Central	24	10.1%
Europe, Eastern and Eurasia	23	9.7%
America, Latin	22	9.2%
Africa, North and Middle East	19	8.0%
Asia, Southeast	11	4.6%
Asia, Central	10	4.2%
Asia, South	8	3.4%
Asia, Northeast	8	3.4%

CntPeoples numeric feature

Most likely the number of people groups in each country, fully populated across 238 rows with 96 distinct values ranging from 1 to 2,262. The distribution is severely right-skewed (skew 8.20, kurtosis 85.91): the median is 24.5 and Q3 is 56.75, yet the mean is 68.83 and the max reaches 2,262, with 24 outliers (10.1%). The std (184.37) far exceeds the IQR (48.75), confirming a long, heavy tail.

Treatment: log-transform (or winsorize the top decile) before any distance- or variance-based modelling.

anthropic:claude-opus-4-7 · confidence high
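
A minimal sketch of the winsorizing option, capping the upper tail at the 90th percentile. The percentile interpolation (`method="inclusive"`) is an assumption; a log transform is the other option named in the treatment.

```python
import statistics
from typing import List


def winsorize_upper(values: List[float], pct: int = 90) -> List[float]:
    """Cap values above the `pct`-th percentile (1-99) at that percentile."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    cap = cuts[pct - 1]  # cuts[k - 1] is the k-th percentile
    return [min(v, cap) for v in values]


counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # illustrative, one extreme value
print(winsorize_upper(counts))
```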

saturn.columns["CntPeoples"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	96
min	1
max	2,262
mean	68.83
median	24.5
std	184.4
q1	8
q3	56.75
iqr	48.75
skew	8.202
kurtosis	85.91
n_outliers	24
outlier_rate	0.1008
zero_rate	0
alert: high_skew	skew=+8.20
alert: outliers	10.1% rows beyond 1.5 IQR

Fig 17.

Distribution of CntPeoples. Vertical dash marks the median.

Histogram bins for CntPeoples (median: 24.5).
bin	count
1 – 151.7	216
151.7 – 302.5	13
302.5 – 453.2	2
453.2 – 603.9	3
603.9 – 754.7	0
754.7 – 905.4	3
905.4 – 1056	0
1056 – 1207	0
1207 – 1358	0
1358 – 1508	0
1508 – 1659	0
1659 – 1810	0
1810 – 1961	0
1961 – 2111	0
2111 – 2262	1

PoplPeoplesFPG numeric feature

Most likely the population living in frontier people groups (FPG) in each country, by analogy with PoplPeoplesLR. Values range from 50 to roughly 1.09 billion with a median of 217,000. The distribution is extremely right-skewed (skew 11.6, kurtosis 138.7), 31 of the 163 non-null values (19%) are flagged as outliers, and 31.5% of rows (75) are null. The mean (~12.26M) sits far above the median, confirming that a handful of countries dominate.

Treatment: log-transform before modelling; before imputing the 31.5% nulls, check whether a null means 'no frontier people groups' (effectively zero) rather than 'unknown'.

anthropic:claude-opus-4-7 · confidence high
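
CntPeoplesFPG is null in 28.6% of rows (about 68) versus 75 rows here, so the two FPG columns do not always go missing together. A small cross-tabulation of their null patterns helps decide whether a null means "none" or "not recorded"; the rows below are illustrative, not from the dataset.

```python
from collections import Counter
from typing import List


def missing_crosstab(rows: List[dict], a: str, b: str) -> Counter:
    """Count rows by (a is null, b is null) to see whether two columns go missing together."""
    return Counter((row.get(a) is None, row.get(b) is None) for row in rows)


rows = [  # illustrative rows only
    {"PoplPeoplesFPG": 1200, "CntPeoplesFPG": 2},
    {"PoplPeoplesFPG": None, "CntPeoplesFPG": None},
    {"PoplPeoplesFPG": None, "CntPeoplesFPG": 0},
]
table = missing_crosstab(rows, "PoplPeoplesFPG", "CntPeoplesFPG")
for (a_null, b_null), n in sorted(table.items()):
    print(f"PoplPeoplesFPG null={a_null}, CntPeoplesFPG null={b_null}: {n}")
```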

saturn.columns["PoplPeoplesFPG"].stats

stat	value
n	238
nulls	75 (31.5%)
unique	142
min	50
max	1.09e+09
mean	1.226e+07
median	217,000
std	8.773e+07
q1	16,500
q3	1.842e+06
iqr	1.825e+06
skew	11.6
kurtosis	138.7
n_outliers	31
outlier_rate	0.1902
zero_rate	0
alert: null_rate	31.5% null
alert: high_skew	skew=+11.60
alert: outliers	19.0% rows beyond 1.5 IQR

Fig 18.

Distribution of PoplPeoplesFPG. Vertical dash marks the median.

Histogram bins for PoplPeoplesFPG (median: 217000.0).
bin	count
50 – 9.082e+07	161
9.082e+07 – 1.816e+08	0
1.816e+08 – 2.725e+08	1
2.725e+08 – 3.633e+08	0
3.633e+08 – 4.541e+08	0
4.541e+08 – 5.449e+08	0
5.449e+08 – 6.357e+08	0
6.357e+08 – 7.265e+08	0
7.265e+08 – 8.174e+08	0
8.174e+08 – 9.082e+08	0
9.082e+08 – 9.99e+08	0
9.99e+08 – 1.09e+09	1

ROG3 categorical identifier

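ROG3 is a country-code identifier: every one of the 238 rows holds a unique two-letter value, giving cardinality 238 and entropy_ratio 1.0. The codes are not ISO 3166-1 alpha-2; they appear to follow FIPS 10-4 (GEC) country codes. In the listings in this report, ROG3 uses AG for Algeria, AS for Australia, and AU for Austria, where ISO2 uses DZ, AU, and AT. With top_rate at 0.0042 and no nulls, the column carries no predictive signal on its own and behaves as a primary key.

Treatment: Use as a join key only against data coded in the same scheme (never mix it with ISO2, since codes such as AG, AS, and AU refer to different countries); drop it as a model feature.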

anthropic:claude-opus-4-7 · confidence high
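
Because ROG3 and ISO2 draw on the same two-letter space, a join on the wrong one silently attaches the wrong country. A sketch that lists codes meaning different countries in the two columns; the five rows are copied from the Ctry, ISO2, and ROG3 top-value listings in this report, assuming those listings share the same alphabetical row order.

```python
from typing import Dict, List, Tuple


def code_collisions(rows: List[dict], a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Codes used by both columns for different countries: code -> (country via a, country via b)."""
    via_a = {row[a]: row["Ctry"] for row in rows}
    via_b = {row[b]: row["Ctry"] for row in rows}
    return {
        code: (via_a[code], via_b[code])
        for code in sorted(via_a.keys() & via_b.keys())
        if via_a[code] != via_b[code]
    }


rows = [
    {"Ctry": "Algeria", "ISO2": "DZ", "ROG3": "AG"},
    {"Ctry": "American Samoa", "ISO2": "AS", "ROG3": "AQ"},
    {"Ctry": "Antigua and Barbuda", "ISO2": "AG", "ROG3": "AC"},
    {"Ctry": "Australia", "ISO2": "AU", "ROG3": "AS"},
    {"Ctry": "Austria", "ISO2": "AT", "ROG3": "AU"},
]
for code, (iso_country, rog_country) in code_collisions(rows, "ISO2", "ROG3").items():
    print(f"{code}: ISO2 -> {iso_country}, ROG3 -> {rog_country}")
```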

saturn.columns["ROG3"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	238
top_value	AF
top_rate	0.004202
cardinality	238
entropy	7.895
entropy_ratio	1
alert: long_tail	238 singleton categories

Fig 19.

Top values for ROG3.

Top values for ROG3 (20 unique shown, of 238 total).
value	count	share
AF	1	0.4%
AL	1	0.4%
AG	1	0.4%
AQ	1	0.4%
AN	1	0.4%
AO	1	0.4%
AV	1	0.4%
AC	1	0.4%
AR	1	0.4%
AM	1	0.4%
AA	1	0.4%
AS	1	0.4%
AU	1	0.4%
AJ	1	0.4%
BF	1	0.4%
BA	1	0.4%
BG	1	0.4%
BB	1	0.4%
BO	1	0.4%
BE	1	0.4%

PercentEvangelical numeric feature

Percentage of each country's population that is evangelical (0 to 53.4%) across 238 rows, almost all unique (232 distinct values). The distribution is right-skewed (skew 1.25), with a mean of 10.46% well above the median of 6.92% and an IQR spanning 1.38 to 17.69%, so a long tail of highly evangelical countries pulls the average up. About 2.5% of rows (6) are null, 4 outliers sit beyond the upper whisker, and no exact zeros were recorded.

Treatment: Consider a log or sqrt transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["PercentEvangelical"].stats

stat	value
n	238
nulls	6 (2.5%)
unique	232
min	0.000766
max	53.44
mean	10.46
median	6.916
std	11.35
q1	1.38
q3	17.69
iqr	16.31
skew	1.247
kurtosis	0.9595
n_outliers	4
outlier_rate	0.01724
zero_rate	0

Fig 20.

Distribution of PercentEvangelical. Vertical dash marks the median.

Histogram bins for PercentEvangelical (median: 6.915966415391055).
bin	count
0.000766 – 3.563	93
3.563 – 7.126	27
7.126 – 10.69	27
10.69 – 14.25	18
14.25 – 17.81	9
17.81 – 21.38	14
21.38 – 24.94	11
24.94 – 28.5	14
28.5 – 32.07	4
32.07 – 35.63	8
35.63 – 39.19	0
39.19 – 42.75	3
42.75 – 46.32	3
46.32 – 49.88	0
49.88 – 53.44	1

ROL3OfficialLanguage categorical feature

ISO 639-3 codes for each country's official language, with 88 distinct values across 238 rows and no nulls. English dominates at 26.5% (63 rows), followed by French (25), Spanish (21), and Arabic (20), but the tail is long: 70 of the 88 codes are singletons, and most of the rest appear only a few times (e.g., aln and smo at 2). Coding granularity is also mixed: 'arb', 'cmn', and 'zlm' are individual languages within the ISO 639-3 macrolanguages Arabic, Chinese, and Malay, while 'nor' is itself a macrolanguage code, so check consistency before grouping.

Treatment: Group the long tail into an 'other' bucket or target-encode before modelling.

anthropic:claude-opus-4-7 · confidence high
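
If codes need to be compared at a common granularity, one option is to roll individual-language codes up to their ISO 639-3 macrolanguage. A minimal sketch with a hand-written, deliberately partial mapping that covers only codes visible in the Fig 21 listing; it is not a complete ISO 639-3 table.

```python
# Partial, hand-written map: individual-language code -> ISO 639-3 macrolanguage code.
MACRO = {
    "arb": "ara",  # Standard Arabic -> Arabic
    "cmn": "zho",  # Mandarin Chinese -> Chinese
    "zlm": "msa",  # Malay (individual language) -> Malay (macrolanguage)
    "pbt": "pus",  # Southern Pashto -> Pashto
    "aln": "sqi",  # Gheg Albanian -> Albanian
}


def to_macro(code: str) -> str:
    """Return the macrolanguage code if one is mapped, else the code unchanged."""
    return MACRO.get(code, code)


print([to_macro(c) for c in ["eng", "arb", "cmn", "nor"]])
```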

saturn.columns["ROL3OfficialLanguage"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	88
top_value	eng
top_rate	0.2647
cardinality	88
entropy	4.802
entropy_ratio	0.7434
alert: long_tail	70 singleton categories

Fig 21.

Top values for ROL3OfficialLanguage.

Top values for ROL3OfficialLanguage (20 unique shown, of 88 total).
value	count	share
eng	63	26.5%
fra	25	10.5%
spa	21	8.8%
arb	20	8.4%
por	8	3.4%
nld	4	1.7%
cmn	4	1.7%
deu	3	1.3%
aln	2	0.8%
smo	2	0.8%
zlm	2	0.8%
ell	2	0.8%
dan	2	0.8%
ita	2	0.8%
kor	2	0.8%
ron	2	0.8%
srp	2	0.8%
nor	2	0.8%
pbt	1	0.4%
cat	1	0.4%

TranslationUnspecified numeric feature

Most likely the number of languages in each country whose Bible translation status is unspecified, with 21 distinct integer values across 238 rows. Roughly 42% of rows are zero and the median is 1, yet the maximum reaches 80 against a Q3 of 2, producing extreme skew (6.02) and kurtosis (45.1). About 11% of values (26 rows) flag as outliers, so a small tail pulls the mean (2.80) well above the median (1).

Treatment: Log1p- or rank-transform before modelling, and consider winsorising the long tail.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["TranslationUnspecified"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	21
min	0
max	80
mean	2.798
median	1
std	7.881
q1	0
q3	2
iqr	2
skew	6.016
kurtosis	45.14
n_outliers	26
outlier_rate	0.1092
zero_rate	0.4244
alert: high_skew	skew=+6.02
alert: outliers	10.9% rows beyond 1.5 IQR

Fig 22.

Distribution of TranslationUnspecified. Vertical dash marks the median.

Histogram bins for TranslationUnspecified (median: 1.0).
bin	count
0 – 5.333	212
5.333 – 10.67	14
10.67 – 16	2
16 – 21.33	3
21.33 – 26.67	2
26.67 – 32	0
32 – 37.33	1
37.33 – 42.67	2
42.67 – 48	1
48 – 53.33	0
53.33 – 58.67	0
58.67 – 64	0
64 – 69.33	0
69.33 – 74.67	0
74.67 – 80	1

TranslationNeeded numeric feature

TranslationNeeded is most likely the number of languages in each country that still need Bible translation. Of the 238 rows, 68% are zero; the median and Q1 are 0 and Q3 is 1. The distribution is extremely right-skewed (skew 8.56, kurtosis 79.6), with a max of 104 against a mean of 2.24, and 31 rows (13%) flag as outliers. With only 18 unique values, it behaves more like a sparse count than a continuous metric.

Treatment: Log1p-transform or binarise (zero vs non-zero) before modelling to tame the heavy tail.

anthropic:claude-opus-4-7 · confidence high
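
The two options in the treatment can be combined: a 0/1 flag for any need at all plus log1p of the count, which keeps the 68% zero mass distinct from the tail. A minimal sketch; this pairing is a common way to represent zero-inflated counts, not something the profile prescribes.

```python
import math
from typing import List, Tuple


def two_part(values: List[int]) -> List[Tuple[int, float]]:
    """Encode a zero-inflated count as (1 if count > 0 else 0, log1p(count))."""
    return [(int(v > 0), math.log1p(v)) for v in values]


print(two_part([0, 1, 104]))  # illustrative counts, including the profiled max
```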

saturn.columns["TranslationNeeded"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	18
min	0
max	104
mean	2.235
median	0
std	10.27
q1	0
q3	1
iqr	1
skew	8.56
kurtosis	79.64
n_outliers	31
outlier_rate	0.1303
zero_rate	0.6807
alert: high_skew	skew=+8.56
alert: outliers	13.0% rows beyond 1.5 IQR

Fig 23.

Distribution of TranslationNeeded. Vertical dash marks the median.

Histogram bins for TranslationNeeded (median: 0.0).
bin	count
0 – 6.933	224
6.933 – 13.87	7
13.87 – 20.8	2
20.8 – 27.73	2
27.73 – 34.67	0
34.67 – 41.6	1
41.6 – 48.53	0
48.53 – 55.47	0
55.47 – 62.4	0
62.4 – 69.33	0
69.33 – 76.27	0
76.27 – 83.2	0
83.2 – 90.13	0
90.13 – 97.07	0
97.07 – 104	2

TranslationStarted numeric feature

TranslationStarted is most likely the number of languages in each country where Bible translation has started. Over half the rows are zero (zero_rate 0.517), the median is 0 and Q3 is 3, yet the max reaches 261, producing extreme skew (7.81) and kurtosis (64.98) plus 35 outliers (14.7%). These heavy-tail cases pull the mean (6.46) far above the median.

Treatment: Apply a log1p (or zero-inflated) transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["TranslationStarted"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	30
min	0
max	261
mean	6.458
median	0
std	27.06
q1	0
q3	3
iqr	3
skew	7.807
kurtosis	64.98
n_outliers	35
outlier_rate	0.1471
zero_rate	0.5168
alert: high_skew	skew=+7.81
alert: outliers	14.7% rows beyond 1.5 IQR

Fig 24.

Distribution of TranslationStarted. Vertical dash marks the median.

Histogram bins for TranslationStarted (median: 0.0).
bin	count
0 – 17.4	222
17.4 – 34.8	8
34.8 – 52.2	4
52.2 – 69.6	1
69.6 – 87	0
87 – 104.4	0
104.4 – 121.8	0
121.8 – 139.2	0
139.2 – 156.6	0
156.6 – 174	0
174 – 191.4	1
191.4 – 208.8	0
208.8 – 226.2	0
226.2 – 243.6	0
243.6 – 261	2

RegionCode numeric feature

RegionCode holds 12 distinct integer values from 1 to 12 across 238 rows with no nulls. Its twelve per-code counts match the twelve RegionName counts exactly (for example, 30 rows for code 12 and 30 for 'America, North and Caribbean'), so it is very likely the numeric code for RegionName rather than a true measure. Skew and other moment statistics are therefore not meaningful here; the negative skew (-0.47, median 8) only reflects that the higher-numbered regions contain more countries.

Treatment: Cast to categorical, or drop it in favour of RegionName if the two map one-to-one.

anthropic:claude-opus-4-7 · confidence high
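
Before dropping either column, it is worth confirming on the full table that RegionCode and RegionName map one-to-one. A sketch of that check; the example pairings (code 12 with 'America, North and Caribbean', code 1 with 'Australia and Pacific') are inferred from matching counts (30 and 27), not read from the data.

```python
from collections import defaultdict
from typing import Dict, List, Set


def mapping_conflicts(rows: List[dict], a: str, b: str) -> Dict[object, Set[object]]:
    """Values of column `a` that map to more than one value of column `b`."""
    seen: Dict[object, Set[object]] = defaultdict(set)
    for row in rows:
        seen[row[a]].add(row[b])
    return {k: v for k, v in seen.items() if len(v) > 1}


def is_one_to_one(rows: List[dict], a: str, b: str) -> bool:
    """True if a -> b and b -> a are both functions over the observed rows."""
    return not mapping_conflicts(rows, a, b) and not mapping_conflicts(rows, b, a)


rows = [  # pairings inferred from matching counts, for illustration
    {"RegionCode": 12, "RegionName": "America, North and Caribbean"},
    {"RegionCode": 12, "RegionName": "America, North and Caribbean"},
    {"RegionCode": 1, "RegionName": "Australia and Pacific"},
]
print(is_one_to_one(rows, "RegionCode", "RegionName"))
```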

saturn.columns["RegionCode"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	12
min	1
max	12
mean	7.336
median	8
std	3.528
q1	5
q3	10
iqr	5
skew	-0.4715
kurtosis	-0.9103
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 25.

Distribution of RegionCode. Vertical dash marks the median.

Histogram bins for RegionCode (median: 8.0).
bin	count
1 – 1.733	27
1.733 – 2.467	11
2.467 – 3.2	8
3.2 – 3.933	0
3.933 – 4.667	8
4.667 – 5.4	10
5.4 – 6.133	19
6.133 – 6.867	0
6.867 – 7.6	28
7.6 – 8.333	24
8.333 – 9.067	23
9.067 – 9.8	0
9.8 – 10.53	28
10.53 – 11.27	22
11.27 – 12	30

Ctry categorical identifier

Despite the abbreviated header `Ctry`, this column holds full country names (Afghanistan, Albania, Algeria, …) and acts as a unique key: all 238 rows have distinct values, with entropy_ratio of ~1.0 and a top_rate of just 0.0042. There are no nulls, and the alphabetical run in top_values suggests the dataset is a one-row-per-country reference table.

Treatment: Use as a readable unique key, but prefer ISO2 or ISO3 when joining external country-level data, since country-name spellings vary between sources.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["Ctry"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	238
top_value	Afghanistan
top_rate	0.004202
cardinality	238
entropy	7.895
entropy_ratio	1
alert: long_tail	238 singleton categories

Fig 26.

Top values for Ctry.

Top values for Ctry (20 unique shown, of 238 total).
value	count	share
Afghanistan	1	0.4%
Albania	1	0.4%
Algeria	1	0.4%
American Samoa	1	0.4%
Andorra	1	0.4%
Angola	1	0.4%
Anguilla	1	0.4%
Antigua and Barbuda	1	0.4%
Argentina	1	0.4%
Armenia	1	0.4%
Aruba	1	0.4%
Australia	1	0.4%
Austria	1	0.4%
Azerbaijan	1	0.4%
Bahamas	1	0.4%
Bahrain	1	0.4%
Bangladesh	1	0.4%
Barbados	1	0.4%
Belarus	1	0.4%
Belgium	1	0.4%

BibleNewTestament numeric feature

Most likely the number of languages in each country whose Bible status is a New Testament (presumably excluding languages with a complete Bible), ranging from 0 to 274 with a median of just 3. The distribution is extremely right-skewed (skew 6.08, kurtosis 49.4), with 23.1% zeros and 10.5% outliers, so a few countries dominate the totals while most have few or none.

Treatment: Apply a log1p transform and consider winsorising before modelling.
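One way to combine both suggestions is to cap values at the upper Tukey fence and then apply log1p. The fence uses the same 1.5 × IQR rule behind the outlier alert, and log1p maps 0 to 0. Capping at the fence is an assumption; a fixed percentile cap is an equally common alternative. On this column the fence would clip all 25 outlier rows.

```python
import math


def iqr_upper_fence(q1, q3, k=1.5):
    """Upper Tukey fence: values above q3 + k * IQR are flagged as outliers."""
    return q3 + k * (q3 - q1)


def winsorise_log1p(values, cap):
    """Clip each value at `cap`, then apply log1p (log(1 + x), defined at 0)."""
    return [math.log1p(min(v, cap)) for v in values]


# Quartiles taken from the BibleNewTestament stats table above.
cap = iqr_upper_fence(q1=1, q3=10)
print(cap)  # 23.5
print(winsorise_log1p([0, 3, 274], cap))  # 274 is clipped to 23.5 first
```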

anthropic:claude-opus-4-7 · confidence high

Out[70]:

saturn.columns["BibleNewTestament"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	45
min	0
max	274
mean	10.82
median	3
std	25.76
q1	1
q3	10
iqr	9
skew	6.079
kurtosis	49.43
n_outliers	25
outlier_rate	0.105
zero_rate	0.2311
alert: high_skew	skew=+6.08
alert: outliers	10.5% rows beyond 1.5 IQR

Fig 27.

Distribution of BibleNewTestament. Vertical dash marks the median.

Histogram bins for BibleNewTestament (median: 3.0).
bin	count
0 – 18.27	204
18.27 – 36.53	20
36.53 – 54.8	5
54.8 – 73.07	1
73.07 – 91.33	3
91.33 – 109.6	2
109.6 – 127.9	1
127.9 – 146.1	1
146.1 – 164.4	0
164.4 – 182.7	0
182.7 – 200.9	0
200.9 – 219.2	0
219.2 – 237.5	0
237.5 – 255.7	0
255.7 – 274	1

BibleComplete numeric feature

This is a numeric count ranging from 0 to 162 across 238 rows, with no nulls. Because each row is a country, it most plausibly records the number of languages in that country with a complete Bible translation. The distribution is heavily right-skewed (skew 3.32, kurtosis 17.0). The median is 9 with an IQR of 15, yet 18 values (7.6%) qualify as outliers, and the max of 162 sits far above q3 = 19. Only 1.7% of rows are zero, so under that reading almost every country has at least one such language. The long tail of high counts is the notable feature.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[73]:

saturn.columns["BibleComplete"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	54
min	0
max	162
mean	15.56
median	9
std	18.9
q1	4
q3	19
iqr	15
skew	3.319
kurtosis	17.02
n_outliers	18
outlier_rate	0.07563
zero_rate	0.01681
alert: high_skew	skew=+3.32
alert: outliers	7.6% rows beyond 1.5 IQR

Fig 28.

Distribution of BibleComplete. Vertical dash marks the median.

Histogram bins for BibleComplete (median: 9.0).
bin	count
0 – 10.8	127
10.8 – 21.6	59
21.6 – 32.4	20
32.4 – 43.2	17
43.2 – 54	6
54 – 64.8	2
64.8 – 75.6	2
75.6 – 86.4	2
86.4 – 97.2	2
97.2 – 108	0
108 – 118.8	0
118.8 – 129.6	0
129.6 – 140.4	0
140.4 – 151.2	0
151.2 – 162	1

Window1040 categorical feature

Binary Y/N flag with no nulls across 238 rows. The distribution is imbalanced toward 'N' at 71.4% (170 of 238), versus 'Y' at 28.6% (68), giving an entropy ratio of 0.86. The name most likely refers to the "10/40 Window", the band roughly between 10° and 40° north latitude that mission organisations use as a planning region. On that reading, 'Y' marks countries inside the band.

Treatment: Encode as a 0/1 boolean for modelling.
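A strict encoder is safer than a blind `== "Y"` comparison, because it fails loudly if a future export introduces lowercase values or blanks. The following is a minimal sketch; `encode_yn` is a hypothetical helper.

```python
def encode_yn(values):
    """Map 'Y' -> 1 and 'N' -> 0; raise on any other value."""
    mapping = {"Y": 1, "N": 0}
    unexpected = {repr(v) for v in values if v not in mapping}
    if unexpected:
        raise ValueError(f"unexpected values: {sorted(unexpected)}")
    return [mapping[v] for v in values]


flags = encode_yn(["N", "Y", "N"])
print(flags)  # [0, 1, 0]
```

On this column, the mean of the encoded values is the share of 'Y' rows: 68 / 238 ≈ 0.286.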

anthropic:claude-opus-4-7 · confidence high

Out[76]:

saturn.columns["Window1040"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	2
top_value	N
top_rate	0.7143
cardinality	2
entropy	0.8631
entropy_ratio	0.8631

Fig 29.

Top values for Window1040.

Top values for Window1040 (2 unique shown, of 2 total).
value	count	share
N	170	71.4%
Y	68	28.6%

JPScaleText categorical label

JPScaleText is a 5-level ordinal label describing reach status (Unreached, Minimally, Superficially, Partially, Significantly Reached) across 238 complete rows. The distribution is fairly balanced with high entropy ratio (0.887) and a modal class of 'Significantly Reached' at 37.4%, though 'Minimally Reached' is sparse at only 11 records. No nulls and tight cardinality make this a clean categorical feature.

Treatment: Encode as an ordered ordinal (Unreached → Significantly Reached) before modelling.
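An explicit mapping keeps the ordering under your control. Alphabetical sorting would scramble this scale, placing "Minimally Reached" first and "Unreached" last. The following is a minimal sketch; the 1-based codes and the names `JP_SCALE_ORDER` and `encode_jp_scale` are assumptions.

```python
# Order from least to most reached, as listed in the description above.
JP_SCALE_ORDER = [
    "Unreached",
    "Minimally Reached",
    "Superficially Reached",
    "Partially Reached",
    "Significantly Reached",
]
# 1-based codes: "Unreached" -> 1 ... "Significantly Reached" -> 5.
JP_SCALE_CODE = {label: i for i, label in enumerate(JP_SCALE_ORDER, start=1)}


def encode_jp_scale(labels):
    """Convert JPScaleText labels to ordered integer codes (KeyError on unknown labels)."""
    return [JP_SCALE_CODE[label] for label in labels]


print(encode_jp_scale(["Partially Reached", "Unreached"]))  # [4, 1]
```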

anthropic:claude-opus-4-7 · confidence high

Out[79]:

saturn.columns["JPScaleText"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	5
top_value	Significantly Reached
top_rate	0.3739
cardinality	5
entropy	2.06
entropy_ratio	0.8871

Fig 30.

Top values for JPScaleText.

Top values for JPScaleText (5 unique shown, of 5 total).
value	count	share
Significantly Reached	89	37.4%
Partially Reached	67	28.2%
Unreached	43	18.1%
Superficially Reached	28	11.8%
Minimally Reached	11	4.6%

CntPrimaryLanguages numeric feature

CntPrimaryLanguages is a numeric count (likely the number of primary languages associated with each row, e.g., a country or region) ranging from 1 to 827 across 238 rows with no nulls. The distribution is heavily right-skewed (skew 5.42, kurtosis 36.5): the median is just 20 while the mean is 45.6 and the max reaches 827, with 20 outliers (8.4%) sitting well above the Q3 of 46.5. Most entities have modest language counts but a small tail dominates the variance (std 90.1).

Treatment: log-transform (or winsorize the upper tail) before any distance- or regression-based modelling.
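The following is a minimal sketch of the winsorising option, capping at a percentile computed from the data itself. The 99th-percentile cap and the helper name are assumptions. The sketch uses `statistics.quantiles` with `method="inclusive"`, which treats the data as the whole population; that fits a table intended to cover every country.

```python
import statistics


def winsorise_upper(values, pct=99):
    """Clip values above the `pct`-th percentile; return (clipped, cap)."""
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    cap = cuts[pct - 1]
    return [min(v, cap) for v in values], cap


# Illustrative values only (1..100), not the real column.
clipped, cap = winsorise_upper(list(range(1, 101)))
print(cap)  # 99.01
```

With 238 rows, a 99th-percentile cap touches only the largest two or three values (including the 827 maximum). It is therefore much gentler than a log transform, which rescales every row.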

anthropic:claude-opus-4-7 · confidence high

Out[82]:

saturn.columns["CntPrimaryLanguages"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	91
min	1
max	827
mean	45.55
median	20
std	90.13
q1	7
q3	46.5
iqr	39.5
skew	5.42
kurtosis	36.52
n_outliers	20
outlier_rate	0.08403
zero_rate	0
alert: high_skew	skew=+5.42
alert: outliers	8.4% rows beyond 1.5 IQR

Fig 31.

Distribution of CntPrimaryLanguages. Vertical dash marks the median.

Histogram bins for CntPrimaryLanguages (median: 20.0).
bin	count
1 – 56.07	192
56.07 – 111.1	28
111.1 – 166.2	5
166.2 – 221.3	5
221.3 – 276.3	1
276.3 – 331.4	3
331.4 – 386.5	1
386.5 – 441.5	0
441.5 – 496.6	0
496.6 – 551.7	1
551.7 – 606.7	0
606.7 – 661.8	0
661.8 – 716.9	1
716.9 – 771.9	0
771.9 – 827	1

OfficialLang categorical feature

This column lists the official language of 238 entities, almost certainly countries or territories, with one null. English dominates with 63 occurrences: 26.5% of all rows, or 26.6% of the 237 non-null values (the basis of the 0.2658 top_rate). French (25), Spanish (21) and Standard Arabic (20) follow. The long tail spans 87 distinct values, including narrow entries such as Gheg Albanian and Samoan (two rows each). Of those 87 languages, 69 occur only once, yielding entropy 4.78 (ratio 0.74).

Treatment: Group rare languages into an 'Other' bucket before one-hot or target encoding.
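The following is a minimal stdlib sketch of the "Other" bucketing. The helper `group_rare` and its `min_count` threshold are assumptions. In a modelling pipeline, the set of rare languages should be learned on the training split only.

```python
from collections import Counter


def group_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than `min_count` times with `other`.

    Nulls (None) are kept as None so missingness stays visible.
    """
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is None or counts[v] >= min_count else other
        for v in values
    ]


langs = ["English", "English", "French", "French", "Catalan", None]
print(group_rare(langs))
# ['English', 'English', 'French', 'French', 'Other', None]
```

With `min_count=2` on the real column, the 69 singleton languages would merge into a single "Other" level of 69 rows, which is larger than English (63). A higher threshold or target encoding may therefore be preferable.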

anthropic:claude-opus-4-7 · confidence high

Out[85]:

saturn.columns["OfficialLang"].stats

stat	value
n	238
nulls	1 (0.4%)
unique	87
top_value	English
top_rate	0.2658
cardinality	87
entropy	4.783
entropy_ratio	0.7423
alert: long_tail	69 singleton categories

Fig 32.

Top values for OfficialLang.

Top values for OfficialLang (20 unique shown, of 87 total).
value	count	share
English	63	26.5%
French	25	10.5%
Spanish	21	8.8%
Arabic, Standard	20	8.4%
Portuguese	8	3.4%
Dutch	4	1.7%
Chinese, Mandarin	4	1.7%
German, Standard	3	1.3%
Albanian, Gheg	2	0.8%
Samoan	2	0.8%
Malay	2	0.8%
Greek	2	0.8%
Danish	2	0.8%
Italian	2	0.8%
Korean	2	0.8%
Romanian	2	0.8%
Serbian	2	0.8%
Norwegian	2	0.8%
Pashto, Southern	1	0.4%
Catalan	1	0.4%

BiblePortions numeric feature

This is a numeric count across 238 rows with no nulls and only 35 unique values. Like the other Bible columns, it most likely counts the languages in each country that have portions of the Bible translated. The distribution is severely right-skewed (skew 5.57, kurtosis 36.36), with a median of 2 against a max of 161, and 26.1% of rows are zero. Roughly 10.9% of values flag as outliers, so a few countries dominate the tail.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[88]:

saturn.columns["BiblePortions"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	35
min	0
max	161
mean	7.681
median	2
std	18.63
q1	0
q3	6.75
iqr	6.75
skew	5.573
kurtosis	36.36
n_outliers	26
outlier_rate	0.1092
zero_rate	0.2605
alert: high_skew	skew=+5.57
alert: outliers	10.9% rows beyond 1.5 IQR

Fig 33.

Distribution of BiblePortions. Vertical dash marks the median.

Histogram bins for BiblePortions (median: 2.0).
bin	count
0 – 10.73	192
10.73 – 21.47	25
21.47 – 32.2	12
32.2 – 42.93	1
42.93 – 53.67	2
53.67 – 64.4	1
64.4 – 75.13	2
75.13 – 85.87	0
85.87 – 96.6	0
96.6 – 107.3	0
107.3 – 118.1	0
118.1 – 128.8	0
128.8 – 139.5	2
139.5 – 150.3	0
150.3 – 161	1

ROG2 categorical feature

ROG2 is a low-cardinality categorical with 7 region-like codes (AFR, EUR, ASI, NAR, SOP, LAM, AUS) across 238 rows and no nulls. The distribution is fairly even — entropy ratio 0.893 and the top value AFR holds just 24.4% — though AUS is a tiny tail with only 2 records. The codes look like geographic groupings (Africa, Europe, Asia, North America, South Pacific, Latin America, Australia).

Treatment: One-hot encode; consider merging AUS (n=2) into a neighbouring region or 'Other' to avoid sparse dummies.
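The following is a dependency-free sketch of one-hot encoding with a fixed level list, applied after folding the 2-row AUS level into another region. The `MERGE` mapping (AUS into SOP) is purely hypothetical; which neighbour AUS belongs with depends on what the codes actually denote.

```python
def one_hot(values, categories):
    """Return one 0/1 list per row, columns ordered as in `categories`."""
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1  # KeyError here flags an unseen category
        rows.append(vec)
    return rows


# Hypothetical merge: fold the 2-row AUS level into SOP before encoding.
MERGE = {"AUS": "SOP"}
levels = ["AFR", "EUR", "ASI", "NAR", "SOP", "LAM"]
regions = [MERGE.get(r, r) for r in ["AFR", "AUS", "LAM"]]
print(one_hot(regions, levels))
# [[1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]
```

A fixed level list guarantees the same column order at training and scoring time, even if a region is absent from a later batch.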

anthropic:claude-opus-4-7 · confidence high

Out[91]:

saturn.columns["ROG2"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	7
top_value	AFR
top_rate	0.2437
cardinality	7
entropy	2.508
entropy_ratio	0.8934

Fig 34.

Top values for ROG2.

Top values for ROG2 (7 unique shown, of 7 total).
value	count	share
AFR	58	24.4%
EUR	51	21.4%
ASI	50	21.0%
NAR	38	16.0%
SOP	25	10.5%
LAM	14	5.9%
AUS	2	0.8%

PercentIslam numeric feature

This is the numeric share (0–99.47) of each row's population that is Muslim; there is almost certainly one row per country or territory, given n = 238. The distribution is right-skewed (skew 1.35), with a median of just 2.51% against a mean of 22.31%. Exactly 17.2% of rows are zero, and 16.8% flag as outliers, namely the rows above about 71.8% (q3 + 1.5 × IQR). The histogram is bimodal: 146 rows fall below 6.6% and 39 sit above 79.6%. Most places therefore have small Muslim populations, while a minority are overwhelmingly Muslim.

Treatment: Consider a log1p or logit transform, or bucket into low/medium/high bands, before modelling.
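A logit transform suits bounded shares but is undefined at exactly 0% or 100%, so values must be clipped first. The following is a minimal sketch; the clip margin `eps=0.5` (half a percentage point) is an assumption.

```python
import math


def logit_percent(p, eps=0.5):
    """Logit of a 0-100 percentage after clipping to [eps, 100 - eps].

    Clipping is required because exact zeros (17.2% of PercentIslam rows)
    would otherwise make log(0) undefined.
    """
    p = min(max(p, eps), 100 - eps)
    frac = p / 100
    return math.log(frac / (1 - frac))


for share in (0, 2.514, 50, 99.47):
    print(share, round(logit_percent(share), 3))
```

The logit spreads out values near both ends of the range, which suits this bimodal column better than log1p. Log1p only stretches the low end.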

anthropic:claude-opus-4-7 · confidence high

Out[94]:

saturn.columns["PercentIslam"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	198
min	0
max	99.47
mean	22.31
median	2.514
std	34.82
q1	0.1012
q3	28.76
iqr	28.66
skew	1.349
kurtosis	0.1114
n_outliers	40
outlier_rate	0.1681
zero_rate	0.1723
alert: outliers	16.8% rows beyond 1.5 IQR

Fig 35.

Distribution of PercentIslam. Vertical dash marks the median.

Histogram bins for PercentIslam (median: 2.51392156930404).
bin	count
0 – 6.631	146
6.631 – 13.26	17
13.26 – 19.89	12
19.89 – 26.53	3
26.53 – 33.16	3
33.16 – 39.79	2
39.79 – 46.42	1
46.42 – 53.05	4
53.05 – 59.68	5
59.68 – 66.31	3
66.31 – 72.94	2
72.94 – 79.58	1
79.58 – 86.21	6
86.21 – 92.84	10
92.84 – 99.47	23

RLG3Primary numeric feature

RLG3Primary is a small-cardinality numeric code with only 6 unique values spanning 1 to 7 across 238 rows, with no nulls. The distribution is bottom-heavy: the median is 1 and Q1 equals the minimum, yet Q3 reaches 5.75, producing a wide IQR of 4.75 and a right skew of 0.98. The 159 rows coded 1 match the 159 countries whose primary religion is Christianity (see the overview). This is therefore most likely a numeric code for ReligionPrimary, not a continuous measurement or an ordered scale.

Treatment: Treat as a nominal categorical (one-hot encode) rather than an ordinal, and drop it if ReligionPrimary is already in the model.

anthropic:claude-opus-4-7 · confidence high

Out[97]:

saturn.columns["RLG3Primary"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	6
min	1
max	7
mean	2.45
median	1
std	2.219
q1	1
q3	5.75
iqr	4.75
skew	0.9751
kurtosis	-0.9467
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 36.

Distribution of RLG3Primary. Vertical dash marks the median.

Histogram bins for RLG3Primary (median: 1.0).
bin	count
1 – 1.4	159
1.4 – 1.8	0
1.8 – 2.2	10
2.2 – 2.6	0
2.6 – 3	0
3 – 3.4	0
3.4 – 3.8	0
3.8 – 4.2	6
4.2 – 4.6	0
4.6 – 5	0
5 – 5.4	3
5.4 – 5.8	0
5.8 – 6.2	55
6.2 – 6.6	0
6.6 – 7	5

Capital categorical identifier

This column lists capital cities, with 233 unique values among the 234 non-null rows (4 nulls, 1.7%). Cardinality is essentially one per row (entropy ratio 0.9997). The only repeat is Kingston, which appears twice, likely for Jamaica and Norfolk Island, whose capitals share that name. The column is therefore a near-unique descriptive label for each country or territory record.

Treatment: Treat as a descriptive label and leave it out of models. Because of the duplicate Kingston and the 4 nulls, it is not a safe join key; join on ISO3 or Ctry instead.

anthropic:claude-opus-4-7 · confidence high

Out[100]:

saturn.columns["Capital"].stats

stat	value
n	238
nulls	4 (1.7%)
unique	233
top_value	Kingston
top_rate	0.008547
cardinality	233
entropy	7.862
entropy_ratio	0.9997
alert: long_tail	232 singleton categories

Fig 37.

Top values for Capital.

Top values for Capital (20 unique shown, of 233 total).
value	count	share
Kingston	2	0.8%
Kabul	1	0.4%
Tirana	1	0.4%
Algiers	1	0.4%
Pago Pago	1	0.4%
Andorra la Vella	1	0.4%
Luanda	1	0.4%
The Valley	1	0.4%
Saint John's	1	0.4%
Buenos Aires	1	0.4%
Yerevan	1	0.4%
Oranjestad	1	0.4%
Canberra	1	0.4%
Vienna	1	0.4%
Baku	1	0.4%
Nassau	1	0.4%
Manama	1	0.4%
Dhaka	1	0.4%
Bridgetown	1	0.4%
Minsk	1	0.4%

Population numeric feature

This is a country/region population count, with 238 rows and 230 unique values, no nulls. The distribution is extremely right-skewed (skew 9.10, kurtosis 89.06): the median is 5,606,500 but the max reaches 1,463,866,000, dwarfing the Q3 of 23,200,000. About 10.9% of values (26 rows) flag as outliers, consistent with a handful of population giants like China/India-scale entities.

Treatment: Log-transform for modelling and plotting, but compute totals and other aggregates on the raw counts.
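The log scale suits models and charts, but aggregates such as regional totals must be computed on raw counts. Summing logs multiplies populations, and back-transforming a mean of logs yields the geometric mean rather than a total. The following short stdlib illustration uses the reported minimum, median and maximum as three stand-in rows; the helper names are hypothetical.

```python
import math


def total(pops):
    """Totals must be computed on the raw scale."""
    return sum(pops)


def back_transformed_mean_of_logs(pops):
    """exp(mean(log x)) is the geometric mean, far below the arithmetic mean here."""
    return math.exp(sum(math.log(p) for p in pops) / len(pops))


# Min, median and max from the Population stats, used as illustrative rows.
pops = [50, 5_606_500, 1_463_866_000]
print(total(pops))  # 1469472550
print(round(back_transformed_mean_of_logs(pops)))
```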

anthropic:claude-opus-4-7 · confidence high

Out[103]:

saturn.columns["Population"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	230
min	50
max	1.464e+09
mean	3.459e+07
median	5.606e+06
std	1.378e+08
q1	399,250
q3	2.32e+07
iqr	2.28e+07
skew	9.099
kurtosis	89.06
n_outliers	26
outlier_rate	0.1092
zero_rate	0
alert: high_skew	skew=+9.10
alert: outliers	10.9% rows beyond 1.5 IQR

Fig 38.

Distribution of Population. Vertical dash marks the median.

Histogram bins for Population (median: 5606500.0).
bin	count
50 – 9.759e+07	222
9.759e+07 – 1.952e+08	9
1.952e+08 – 2.928e+08	4
2.928e+08 – 3.904e+08	1
3.904e+08 – 4.88e+08	0
4.88e+08 – 5.855e+08	0
5.855e+08 – 6.831e+08	0
6.831e+08 – 7.807e+08	0
7.807e+08 – 8.783e+08	0
8.783e+08 – 9.759e+08	0
9.759e+08 – 1.074e+09	0
1.074e+09 – 1.171e+09	0
1.171e+09 – 1.269e+09	0
1.269e+09 – 1.366e+09	0
1.366e+09 – 1.464e+09	2

CntPeoplesLR numeric feature

CntPeoplesLR is a numeric count, most likely the number of least-reached (LR) people groups in each country. It has 57 distinct values across 238 rows and a 15.1% null rate. The distribution is severely right-skewed (skew 10.79, kurtosis 129.0): the median is 6.5 and Q3 is 24.75, yet the max reaches 2,032, and the mean is 35.27 with a std of 157.4. The 17 outliers (8.4%) stretch the tail dramatically. No zeros are recorded and the minimum is 1, so the nulls may stand for countries with no such groups rather than for missing data.

Treatment: Log-transform (or winsorize the upper tail) before modelling. Before imputing the ~15% nulls, check whether they mean "zero groups"; if so, fill them with 0 rather than a median.

anthropic:claude-opus-4-7 · confidence high

Out[106]:

saturn.columns["CntPeoplesLR"].stats

stat	value
n	238
nulls	36 (15.1%)
unique	57
min	1
max	2,032
mean	35.27
median	6.5
std	157.4
q1	2
q3	24.75
iqr	22.75
skew	10.79
kurtosis	129
n_outliers	17
outlier_rate	0.08416
zero_rate	0
alert: high_skew	skew=+10.79
alert: outliers	8.4% rows beyond 1.5 IQR

Fig 39.

Distribution of CntPeoplesLR. Vertical dash marks the median.

Histogram bins for CntPeoplesLR (median: 6.5).
bin	count
1 – 146.1	195
146.1 – 291.1	4
291.1 – 436.2	0
436.2 – 581.3	1
581.3 – 726.4	0
726.4 – 871.4	1
871.4 – 1017	0
1017 – 1162	0
1162 – 1307	0
1307 – 1452	0
1452 – 1597	0
1597 – 1742	0
1742 – 1887	0
1887 – 2032	1

CntPeoplesFPG numeric feature

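This numeric count column is most likely the number of Frontier People Groups (FPG) per country. It has only 44 unique values across 238 rows and 28.6% nulls. The distribution is extremely right-skewed (skew 10.07, kurtosis 109.5): the median is 4 and Q3 is 12.75, yet the max reaches 1,700. That tail produces 18 outliers (10.6% of the non-null rows) and inflates the mean to 28.04, against a std of 143.7. As with CntPeoplesLR, the minimum is 1 and there are no zeros, so nulls may mean "no FPGs" rather than missing data.

Treatment: Resolve the 28.6% nulls (fill with 0 if they mean "none", otherwise impute) and apply a log1p transform before modelling to tame the heavy right tail.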

anthropic:claude-opus-4-7 · confidence high

Out[109]:

saturn.columns["CntPeoplesFPG"].stats

stat	value
n	238
nulls	68 (28.6%)
unique	44
min	1
max	1,700
mean	28.04
median	4
std	143.7
q1	1
q3	12.75
iqr	11.75
skew	10.07
kurtosis	109.5
n_outliers	18
outlier_rate	0.1059
zero_rate	0
alert: null_rate	28.6% null
alert: high_skew	skew=+10.07
alert: outliers	10.6% rows beyond 1.5 IQR

Fig 40.

Distribution of CntPeoplesFPG. Vertical dash marks the median.

Histogram bins for CntPeoplesFPG (median: 4.0).
bin	count
1 – 131.7	166
131.7 – 262.4	1
262.4 – 393.1	1
393.1 – 523.8	0
523.8 – 654.5	0
654.5 – 785.2	1
785.2 – 915.8	0
915.8 – 1047	0
1047 – 1177	0
1177 – 1308	0
1308 – 1439	0
1439 – 1569	0
1569 – 1700	1

PercentHinduism numeric feature

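This is the country-level share of the population identifying as Hindu, expressed as a percentage. The distribution is dominated by zeros (zero_rate 0.559, median 0), with a long right tail to 82.4, producing extreme skew (6.90) and kurtosis (53.0). Because the IQR is only 0.27 percentage points, any value above about 0.69% (q3 + 1.5 × IQR) counts as an outlier. Most of the 43 flagged rows (18.1%) are therefore countries with modest Hindu minorities; only three rows exceed about 44%.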

Treatment: Apply a log1p or zero-inflated transform before modelling, since most values are 0 with a heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[112]:

saturn.columns["PercentHinduism"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	106
min	0
max	82.4
mean	2.01
median	0
std	8.927
q1	0
q3	0.2745
iqr	0.2745
skew	6.898
kurtosis	53.02
n_outliers	43
outlier_rate	0.1807
zero_rate	0.5588
alert: high_skew	skew=+6.90
alert: outliers	18.1% rows beyond 1.5 IQR

Fig 41.

Distribution of PercentHinduism. Vertical dash marks the median.

Histogram bins for PercentHinduism (median: 0.0).
bin	count
0 – 5.493	223
5.493 – 10.99	5
10.99 – 16.48	1
16.48 – 21.97	2
21.97 – 27.47	2
27.47 – 32.96	1
32.96 – 38.45	1
38.45 – 43.95	0
43.95 – 49.44	1
49.44 – 54.93	0
54.93 – 60.43	0
60.43 – 65.92	0
65.92 – 71.41	0
71.41 – 76.91	0
76.91 – 82.4	2

PercentOtherSmall numeric feature

This is a numeric percentage feature, likely the combined share of small "other" religions. It has 238 rows, 199 unique values and no nulls, but 16.4% zeros and a long right tail (median 0.29, max 12.84). A skew of 5.05 and kurtosis of 39.4, with 15 outliers (6.3%), signal an extremely heavy-tailed distribution. Like the other Percent* columns, values are on a 0–100 scale rather than a 0–1 proportion.

Treatment: Apply a log1p transform and consider a zero-inflation indicator before modelling.
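A zero-inflation indicator can sit alongside the log1p value. Because log1p(0) = 0 lies on the same continuum as small positive shares, only the separate flag lets a linear model treat "absent" differently from "small but present". The following is a minimal sketch; the helper name is hypothetical, and the sample inputs are the column's reported median and max.

```python
import math


def zero_inflation_features(values):
    """Return (is_zero, log1p(value)) pairs for a zero-heavy, non-negative column."""
    return [(int(v == 0), math.log1p(v)) for v in values]


print(zero_inflation_features([0, 0.2884, 12.84]))
```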

anthropic:claude-opus-4-7 · confidence high

Out[115]:

saturn.columns["PercentOtherSmall"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	199
min	0
max	12.84
mean	0.6962
median	0.2884
std	1.242
q1	0.01289
q3	0.9557
iqr	0.9428
skew	5.052
kurtosis	39.39
n_outliers	15
outlier_rate	0.06303
zero_rate	0.1639
alert: high_skew	skew=+5.05
alert: outliers	6.3% rows beyond 1.5 IQR

Fig 42.

Distribution of PercentOtherSmall. Vertical dash marks the median.

Histogram bins for PercentOtherSmall (median: 0.2884398013359145).
bin	count
0 – 0.8563	170
0.8563 – 1.713	46
1.713 – 2.569	8
2.569 – 3.425	6
3.425 – 4.282	5
4.282 – 5.138	0
5.138 – 5.994	2
5.994 – 6.85	0
6.85 – 7.707	0
7.707 – 8.563	0
8.563 – 9.419	0
9.419 – 10.28	0
10.28 – 11.13	0
11.13 – 11.99	0
11.99 – 12.84	1

PercentBuddhism numeric feature

This column appears to be the percentage of Buddhists per country (or similar geographic unit), with 238 rows and 125 unique values. The distribution is extremely right-skewed (skew 4.59, kurtosis 20.4). Nearly half the rows are zero (zero_rate 0.48) and the median is 0.004%, yet the max reaches 88.74%. Saturn flagged 38 outliers (16% of rows). Because the IQR is only 0.29 points, however, any value above about 0.74% (q3 + 1.5 × IQR) counts as an outlier. Most flagged rows therefore have small Buddhist minorities, and only about nine rows exceed 47%.

Treatment: Apply a log1p or similar transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[118]:

saturn.columns["PercentBuddhism"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	125
min	0
max	88.74
mean	3.701
median	0.004281
std	14.74
q1	0
q3	0.2949
iqr	0.2949
skew	4.592
kurtosis	20.42
n_outliers	38
outlier_rate	0.1597
zero_rate	0.479
alert: high_skew	skew=+4.59
alert: outliers	16.0% rows beyond 1.5 IQR

Fig 43.

Distribution of PercentBuddhism. Vertical dash marks the median.

Histogram bins for PercentBuddhism (median: 0.004281117324954205).
bin	count
0 – 5.916	220
5.916 – 11.83	3
11.83 – 17.75	2
17.75 – 23.66	2
23.66 – 29.58	1
29.58 – 35.5	0
35.5 – 41.41	1
41.41 – 47.33	0
47.33 – 53.24	1
53.24 – 59.16	0
59.16 – 65.08	1
65.08 – 70.99	2
70.99 – 76.91	1
76.91 – 82.83	1
82.83 – 88.74	3

JPScaleCtry numeric feature

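JPScaleCtry holds an integer 1–5 code with only 5 unique values across 238 rows, with no nulls. Its counts per value (1: 43, 2: 11, 3: 28, 4: 67, 5: 89) exactly match the JPScaleText counts from Unreached through Significantly Reached. It is therefore almost certainly the numeric form of that label rather than a survey-style rating. The distribution leans high (mean 3.62, median 4, Q1 = 3, Q3 = 5), with a left skew of -0.78, meaning most countries fall in the Partially or Significantly Reached bands. No outliers are flagged.

Treatment: Treat as an ordinal feature. Keep either JPScaleCtry or JPScaleText in a model, not both, since they appear to encode the same information.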

anthropic:claude-opus-4-7 · confidence high

Out[121]:

saturn.columns["JPScaleCtry"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	5
min	1
max	5
mean	3.622
median	4
std	1.473
q1	3
q3	5
iqr	2
skew	-0.7839
kurtosis	-0.8065
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 44.

Distribution of JPScaleCtry. Vertical dash marks the median.

Histogram bins for JPScaleCtry (median: 4.0).
bin	count
1 – 1.267	43
1.267 – 1.533	0
1.533 – 1.8	0
1.8 – 2.067	11
2.067 – 2.333	0
2.333 – 2.6	0
2.6 – 2.867	0
2.867 – 3.133	28
3.133 – 3.4	0
3.4 – 3.667	0
3.667 – 3.933	0
3.933 – 4.2	67
4.2 – 4.467	0
4.467 – 4.733	0
4.733 – 5	89

PercentChristianity numeric feature

This column reports the percentage of Christians per row (likely per country or territory). It spans nearly the full range, from 0.02% to 100%, with 237 unique values across 238 rows. The distribution is bimodal: 48 rows fall below 6.7% while 83 sit above 86.7%. The median (75.3%) sits far above the mean (58.2%), with a wide IQR from 11.7% to 90.9% and negative skew (-0.54). Together these reflect many heavily Christian populations alongside a substantial cluster of very low-share rows. There are no nulls, no zeros and no statistical outliers. With an IQR of 79.2 points, the 1.5 × IQR fences fall outside the 0–100 range, so no value can be flagged.

Treatment: Use as-is or rescale to 0-1; consider pairing with other religion-share columns since values are bounded percentages.
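The absence of outliers follows directly from the Tukey fence arithmetic applied to the quartiles in the stats table. The following is a short check; the helper name is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper Tukey fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles from the PercentChristianity stats table above.
low, high = tukey_fences(11.72, 90.92)
print(round(low, 2), round(high, 2))  # -107.08 209.72
print(low < 0 and high > 100)         # True: no percentage can be flagged
```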

anthropic:claude-opus-4-7 · confidence high

Out[124]:

saturn.columns["PercentChristianity"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	237
min	0.0165
max	100
mean	58.17
median	75.3
std	37.03
q1	11.72
q3	90.92
iqr	79.2
skew	-0.5364
kurtosis	-1.392
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 45.

Distribution of PercentChristianity. Vertical dash marks the median.

Histogram bins for PercentChristianity (median: 75.30076398378219).
bin	count
0.0165 – 6.682	48
6.682 – 13.35	16
13.35 – 20.01	2
20.01 – 26.68	2
26.68 – 33.34	5
33.34 – 40.01	3
40.01 – 46.68	5
46.68 – 53.34	8
53.34 – 60.01	6
60.01 – 66.67	13
66.67 – 73.34	7
73.34 – 80	15
80 – 86.67	25
86.67 – 93.33	42
93.33 – 100	41

ISO3 categorical identifier

ISO3 is the standard three-letter country code, with all 238 values unique (AFG, ALB, DZA, ASM, ...) and zero nulls. Maximum entropy ratio (≈1.0) and a top_rate of 0.0042 confirm one row per country, making this a clean primary key for the table.

Treatment: use as the join key to merge country-level data; do not feed into models as a feature.
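Joining on ISO3 codes avoids mismatches caused by spelling variants in country names. The following is a dependency-free sketch of a left join; the function name, the `external` table and its `metric` field are hypothetical. In pandas, the rough equivalent is `DataFrame.merge(other, on="ISO3", how="left", validate="one_to_one")`, which raises if either side has duplicate keys.

```python
def left_join(left, right, key):
    """Left-join two lists of dicts on `key`.

    Unmatched left rows get None for every right-hand field; right-hand
    fields overwrite same-named left fields.
    """
    index = {}
    for row in right:
        if row[key] in index:
            raise ValueError(f"duplicate key in right table: {row[key]!r}")
        index[row[key]] = row
    # Right-hand field names in first-seen order, excluding the key itself.
    right_fields = [f for f in dict.fromkeys(f for r in right for f in r) if f != key]
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for f in right_fields:
            merged[f] = match.get(f)
        joined.append(merged)
    return joined


countries = [
    {"ISO3": "AFG", "Ctry": "Afghanistan"},
    {"ISO3": "ALB", "Ctry": "Albania"},
]
external = [{"ISO3": "AFG", "metric": 1.0}]  # hypothetical placeholder table
for row in left_join(countries, external, "ISO3"):
    print(row)
# {'ISO3': 'AFG', 'Ctry': 'Afghanistan', 'metric': 1.0}
# {'ISO3': 'ALB', 'Ctry': 'Albania', 'metric': None}
```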

anthropic:claude-opus-4-7 · confidence high

Out[127]:

saturn.columns["ISO3"].stats

stat	value
n	238
nulls	0 (0.0%)
unique	238
top_value	AFG
top_rate	0.004202
cardinality	238
entropy	7.895
entropy_ratio	1
alert: long_tail	238 singleton categories

Fig 46.

Top values for ISO3.

Top values for ISO3 (20 unique shown, of 238 total).
value	count	share
AFG	1	0.4%
ALB	1	0.4%
DZA	1	0.4%
ASM	1	0.4%
AND	1	0.4%
AGO	1	0.4%
AIA	1	0.4%
ATG	1	0.4%
ARG	1	0.4%
ARM	1	0.4%
ABW	1	0.4%
AUS	1	0.4%
AUT	1	0.4%
AZE	1	0.4%
BHS	1	0.4%
BHR	1	0.4%
BGD	1	0.4%
BRB	1	0.4%
BLR	1	0.4%
BEL	1	0.4%

How to cite