archive api data sample

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/joshua-project/archive/api_data_sample.json

Saturn profiled 50 rows across 107 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/joshua-project/archive/api_data_sample.json",
    "--findings", "archive-api_data_sample.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This is a 50-row, 107-column sample from the Joshua Project API describing Arab and related Muslim people groups across 41 countries. The dataset is dominated by one affinity bloc ('Arab World', 100%) and one religion ('Islam', 98%), so the interesting variation lies in geography, population size, and reachedness rather than in identity fields. Look first at Population and PopulationPGAC, which are heavily right-skewed (max 2.22M and 7.56M respectively, with multiple outliers), and at PCIslam, which is high but varies from 25% to 100%. JPScaleText shows that 76% of groups are classified 'Unreached', making that the most actionable signal alongside Continent/RegionName for where these groups sit. Note also the high null rates on language, Bible-translation, and nomadic descriptors (86–98% missing), which limit any analysis of those attributes.

citing: Population · PopulationPGAC · PCIslam · JPScaleText · Continent · RegionName · PrimaryLanguageName · LeastReached · PrimaryReligion · PeopleCluster · PrimaryLanguageDialect · NomadicTypeDescription

Out[4]:

saturn.schema() · 107 columns

column kind n null% unique alerts
ROL3 categorical 50 0.0% 10
PhotoCredits categorical 50 0.0% 5
PrimaryReligion categorical 50 0.0% 2 imbalance
Ctry categorical 50 0.0% 41 long_tail
RegionName categorical 50 0.0% 11
BibleYear categorical 50 92.0% 3 long_tail null_rate
RLG3PC numeric 50 0.0% 1 constant
Population numeric 50 0.0% 49 high_skew outliers
Resources unknown 50 0.0% skipped
LeastReachedPGAC categorical 50 0.0% 2
NumberLanguagesSpoken numeric 50 0.0% 2 high_skew
GSEC categorical 50 0.0% 2
AudioRecordings categorical 50 0.0% 1 imbalance
PercentAdherents categorical 50 0.0% 29 long_tail
ROP1 categorical 50 0.0% 1 imbalance
JPScalePGAC categorical 50 0.0% 2
Latitude numeric 50 0.0% 50
PeopNameInCountry categorical 50 0.0% 7 long_tail
Window1040 categorical 50 0.0% 2
PeopleGroupMapURL categorical 50 0.0% 17 long_tail
CountryURL categorical 50 0.0% 41 long_tail
PercentEvangelicalPC categorical 50 0.0% 3 long_tail imbalance
CountOfProvinces unknown 50 0.0% skipped
PercentEvangelicalPGAC categorical 50 0.0% 5
NomadicTypeDescription categorical 50 90.0% 1 null_rate imbalance
MapCredits categorical 50 0.0% 7
HasJesusFilm categorical 50 0.0% 2
HowReach categorical 50 0.0% 20 long_tail
PCIslam numeric 50 0.0% 30 high_skew outliers
NTYear categorical 50 42.0% 8 long_tail null_rate
RLG4 numeric 50 86.0% 1 null_rate constant
AffinityBloc categorical 50 0.0% 1 imbalance
NaturalName categorical 50 0.0% 7 long_tail
PercentChristianPGAC categorical 50 0.0% 5
PrimaryLanguageName categorical 50 0.0% 10
CountOfCountries numeric 50 0.0% 4
PeopleID2 numeric 50 0.0% 3 high_skew
Summary categorical 50 0.0% 21 long_tail
Obstacles categorical 50 0.0% 21 long_tail
ROP2 categorical 50 0.0% 3 long_tail imbalance
RLG3 numeric 50 0.0% 2 high_skew
PercentEvangelical categorical 50 0.0% 18 long_tail
LeastReached categorical 50 0.0% 2
Continent categorical 50 0.0% 6
JPScalePC categorical 50 0.0% 1 imbalance
JPScaleText categorical 50 0.0% 4
SecurityLevel numeric 50 0.0% 3
LRTop100 categorical 50 0.0% 1 imbalance
PrimaryReligionPGAC categorical 50 0.0% 1 imbalance
PCNonReligious numeric 50 2.0% 6 outliers
PhotoCreditURL categorical 50 4.0% 3
PhotoCreativeCommons categorical 50 0.0% 2
PrayForPG categorical 50 0.0% 21 long_tail
PeopleGroupPhotoURL categorical 50 0.0% 5
ROG2 categorical 50 0.0% 6
PhotoCCVersionText categorical 50 0.0% 2
Longitude numeric 50 0.0% 50 outliers
JPScaleImageURL categorical 50 0.0% 4
OfficialLang categorical 50 0.0% 21 long_tail
PhotoPermission categorical 50 0.0% 2 imbalance
PCHinduism numeric 50 2.0% 3 high_skew
PeopleID3 numeric 50 0.0% 5 high_skew outliers
PeopleID1 numeric 50 0.0% 1 constant
SpeakNationalLang unknown 50 0.0% skipped
PortionsYear categorical 50 16.0% 9 long_tail
PrimaryReligionPC categorical 50 0.0% 1 imbalance
PCUnknown numeric 50 2.0% 1 constant
ProfileTextExists categorical 50 0.0% 2
PCOtherSmall numeric 50 2.0% 3 high_skew outliers
BibleStatus numeric 50 0.0% 4
Frontier categorical 50 0.0% 2
MapAddress categorical 50 0.0% 17 long_tail
PeopleID3ROG3 categorical 50 0.0% 50 long_tail
ROP3 numeric 50 0.0% 5 high_skew outliers
PrimaryLanguageDialect categorical 50 98.0% 1 long_tail null_rate imbalance
JPScale numeric 50 0.0% 4 high_skew outliers
HasAudioRecordings categorical 50 0.0% 1 imbalance
PCBuddhism numeric 50 2.0% 1 constant
PeopNameAcrossCountries categorical 50 0.0% 5
PhotoCCVersionURL categorical 50 0.0% 2
MapCCVersionText categorical 50 0.0% 1 imbalance
PercentChristianPC categorical 50 0.0% 3 long_tail imbalance
Nomadic categorical 50 0.0% 2
PrayForChurch categorical 50 0.0% 9 long_tail
RLG3PGAC numeric 50 0.0% 1 constant
ISO3 categorical 50 0.0% 41 long_tail
NaturalPronunciation categorical 50 2.0% 6
PhotoAddress categorical 50 0.0% 5
RegionCode numeric 50 0.0% 11
LocationInCountry categorical 50 72.0% 13 long_tail null_rate
JF categorical 50 0.0% 2
PopulationPGAC numeric 50 0.0% 5 outliers
PeopleGroupMapExpandedURL categorical 50 0.0% 11 long_tail
TranslationNeedQuestionable unknown 50 0.0% skipped
Category categorical 50 0.0% 3
PhotoCopyright categorical 50 0.0% 2 imbalance
NTOnline categorical 50 18.0% 1 imbalance
LeastReachedPC categorical 50 0.0% 1 imbalance
ROG3 categorical 50 0.0% 41 long_tail
ReligionSubdivision categorical 50 86.0% 1 null_rate imbalance
PCEthnicReligions numeric 50 2.0% 3 high_skew outliers
PeopleCluster categorical 50 0.0% 3 long_tail imbalance
IndigenousCode categorical 50 0.0% 2
MapCreditURL categorical 50 0.0% 1 imbalance
MapCopyright categorical 50 0.0% 1 imbalance
MapCCVersionURL categorical 50 0.0% 1 imbalance
PeopleGroupURL categorical 50 0.0% 50 long_tail
Fig 1.
Population · Highly right-skewed group sizes — most groups are small but a few exceed a million.
Histogram bins for Population (median: 46500.0).
bin count
200 – 3.175e+05 40
3.175e+05 – 6.347e+05 4
6.347e+05 – 9.52e+05 2
9.52e+05 – 1.269e+06 0
1.269e+06 – 1.586e+06 2
1.586e+06 – 1.904e+06 1
1.904e+06 – 2.221e+06 1
Fig 2.
JPScaleText · 76% of groups are labelled 'Unreached'; only a small share are partially or superficially reached.
Top values for JPScaleText (4 unique shown, of 4 total).
value count share
Unreached 38 76.0%
Minimally Reached 8 16.0%
Partially Reached 3 6.0%
Superficially Reached 1 2.0%
Fig 3.
RegionName · Geographic spread is concentrated in North Africa & Middle East (32%) with a long tail across 11 regions.
Top values for RegionName (11 unique shown, of 11 total).
value count share
Africa, North and Middle East 16 32.0%
Africa, East and Southern 7 14.0%
Europe, Western 6 12.0%
America, North and Caribbean 5 10.0%
Africa, West and Central 4 8.0%
Europe, Eastern and Eurasia 4 8.0%
Asia, Southeast 3 6.0%
Asia, South 2 4.0%
Australia and Pacific 1 2.0%
America, Latin 1 2.0%
Asia, Central 1 2.0%
Fig 4.
PCIslam · Islam share per group clusters near 100% but ranges down to 25% — watch the low-end outliers.
Histogram bins for PCIslam (median: 95.99423076923074).
bin count
25 – 35.71 1
35.71 – 46.43 0
46.43 – 57.14 1
57.14 – 67.86 2
67.86 – 78.57 3
78.57 – 89.29 1
89.29 – 100 42
Fig 5.
PrimaryLanguageName · Arabic dialects dominate, with Levantine, Gulf, and Omani Arabic accounting for most rows.
Top values for PrimaryLanguageName (10 unique shown, of 10 total).
value count share
Arabic, Levantine 18 36.0%
Arabic, Gulf 13 26.0%
Arabic, Omani 8 16.0%
Arabic, Mesopotamian 4 8.0%
Swahili 2 4.0%
Tamajeq, Tayart 1 2.0%
Arabic, Sudanese 1 2.0%
English 1 2.0%
Arabic, Moroccan 1 2.0%
Arabic, Egyptian 1 2.0%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column kind null%
ROL3 categorical 0.0%
PhotoCredits categorical 0.0%
PrimaryReligion categorical 0.0%
Ctry categorical 0.0%
RegionName categorical 0.0%
BibleYear categorical 92.0%
RLG3PC numeric 0.0%
Population numeric 0.0%
Resources unknown 0.0%
LeastReachedPGAC categorical 0.0%
NumberLanguagesSpoken numeric 0.0%
GSEC categorical 0.0%
AudioRecordings categorical 0.0%
PercentAdherents categorical 0.0%
ROP1 categorical 0.0%
JPScalePGAC categorical 0.0%
Latitude numeric 0.0%
PeopNameInCountry categorical 0.0%
Window1040 categorical 0.0%
PeopleGroupMapURL categorical 0.0%
CountryURL categorical 0.0%
PercentEvangelicalPC categorical 0.0%
CountOfProvinces unknown 0.0%
PercentEvangelicalPGAC categorical 0.0%
NomadicTypeDescription categorical 90.0%
MapCredits categorical 0.0%
HasJesusFilm categorical 0.0%
HowReach categorical 0.0%
PCIslam numeric 0.0%
NTYear categorical 42.0%
RLG4 numeric 86.0%
AffinityBloc categorical 0.0%
NaturalName categorical 0.0%
PercentChristianPGAC categorical 0.0%
PrimaryLanguageName categorical 0.0%
CountOfCountries numeric 0.0%
PeopleID2 numeric 0.0%
Summary categorical 0.0%
Obstacles categorical 0.0%
ROP2 categorical 0.0%
RLG3 numeric 0.0%
PercentEvangelical categorical 0.0%
LeastReached categorical 0.0%
Continent categorical 0.0%
JPScalePC categorical 0.0%
JPScaleText categorical 0.0%
SecurityLevel numeric 0.0%
LRTop100 categorical 0.0%
PrimaryReligionPGAC categorical 0.0%
PCNonReligious numeric 2.0%
PhotoCreditURL categorical 4.0%
PhotoCreativeCommons categorical 0.0%
PrayForPG categorical 0.0%
PeopleGroupPhotoURL categorical 0.0%
ROG2 categorical 0.0%
PhotoCCVersionText categorical 0.0%
Longitude numeric 0.0%
JPScaleImageURL categorical 0.0%
OfficialLang categorical 0.0%
PhotoPermission categorical 0.0%
PCHinduism numeric 2.0%
PeopleID3 numeric 0.0%
PeopleID1 numeric 0.0%
SpeakNationalLang unknown 0.0%
PortionsYear categorical 16.0%
PrimaryReligionPC categorical 0.0%
PCUnknown numeric 2.0%
ProfileTextExists categorical 0.0%
PCOtherSmall numeric 2.0%
BibleStatus numeric 0.0%
Frontier categorical 0.0%
MapAddress categorical 0.0%
PeopleID3ROG3 categorical 0.0%
ROP3 numeric 0.0%
PrimaryLanguageDialect categorical 98.0%
JPScale numeric 0.0%
HasAudioRecordings categorical 0.0%
PCBuddhism numeric 2.0%
PeopNameAcrossCountries categorical 0.0%
PhotoCCVersionURL categorical 0.0%
MapCCVersionText categorical 0.0%
PercentChristianPC categorical 0.0%
Nomadic categorical 0.0%
PrayForChurch categorical 0.0%
RLG3PGAC numeric 0.0%
ISO3 categorical 0.0%
NaturalPronunciation categorical 2.0%
PhotoAddress categorical 0.0%
RegionCode numeric 0.0%
LocationInCountry categorical 72.0%
JF categorical 0.0%
PopulationPGAC numeric 0.0%
PeopleGroupMapExpandedURL categorical 0.0%
TranslationNeedQuestionable unknown 0.0%
Category categorical 0.0%
PhotoCopyright categorical 0.0%
NTOnline categorical 18.0%
LeastReachedPC categorical 0.0%
ROG3 categorical 0.0%
ReligionSubdivision categorical 86.0%
PCEthnicReligions numeric 2.0%
PeopleCluster categorical 0.0%
IndigenousCode categorical 0.0%
MapCreditURL categorical 0.0%
MapCopyright categorical 0.0%
MapCCVersionURL categorical 0.0%
PeopleGroupURL categorical 0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
column RLG3PC Population NumberLanguagesSpoken Latitude PCIslam RLG4 CountOfCountries PeopleID2 RLG3 SecurityLevel PCNonReligious Longitude
RLG3PC +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan
Population +nan +1.00 +nan -0.26 +0.71 +nan -0.28 +0.16 +nan +0.45 -0.46 +0.46
NumberLanguagesSpoken +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan
Latitude +nan -0.26 +nan +1.00 -0.10 +nan +0.16 -0.18 +nan -0.46 +0.61 -0.45
PCIslam +nan +0.71 +nan -0.10 +1.00 +nan -0.66 +0.54 +nan +0.16 -0.71 -0.15
RLG4 +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan
CountOfCountries +nan -0.28 +nan +0.16 -0.66 +nan +1.00 -0.54 +nan -0.42 +0.68 +0.14
PeopleID2 +nan +0.16 +nan -0.18 +0.54 +nan -0.54 +1.00 +nan +0.34 -0.27 -0.05
RLG3 +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan
SecurityLevel +nan +0.45 +nan -0.46 +0.16 +nan -0.42 +0.34 +nan +1.00 -0.21 +0.47
PCNonReligious +nan -0.46 +nan +0.61 -0.71 +nan +0.68 -0.27 +nan -0.21 +1.00 +0.08
Longitude +nan +0.46 +nan -0.45 -0.15 +nan +0.14 -0.05 +nan +0.47 +0.08 +1.00
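
A note on the all-NaN rows and columns: Pearson's r is undefined when either column has zero variance, because the denominator of the formula is zero. RLG3PC is flagged constant in the schema, which fits. Whether NumberLanguagesSpoken, RLG4 and RLG3 were constant within saturn's bounded sample is an assumption, not something the report states. A minimal stdlib sketch with made-up values and a hypothetical `pearson` helper:

```python
import math

def pearson(xs, ys):
    """Pearson r for two equal-length sequences; NaN when either has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined: the formula would divide by zero
    return cov / math.sqrt(var_x * var_y)

constant = [6.0, 6.0, 6.0, 6.0]                # mimics RLG3PC, where every value is 6.0
varying = [200.0, 13000.0, 46500.0, 272500.0]  # illustrative values only

print(pearson(constant, varying))  # nan
print(pearson(varying, varying))   # 1.0, up to floating-point rounding
```

If this is the cause, dropping constant columns before computing the matrix would remove the NaN rows without changing any of the remaining coefficients.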

ROL3 categorical feature

ROL3 holds 3-letter codes (likely ISO 639-3 language tags such as 'eng', 'arz', 'ary', 'apc', 'afb') across 50 complete rows with 10 distinct values. The distribution is skewed toward Levantine and Gulf Arabic variants: 'apc' covers 36% (18/50) and 'afb' another 26% (13/50), while six codes appear only once or twice. An entropy ratio of 0.75 indicates moderate concentration rather than a uniform spread.

Treatment: Group rare codes into an 'other' bucket, then one-hot encode.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["ROL3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 10
top_value apc
top_rate 0.36
cardinality 10
entropy 2.501
entropy_ratio 0.7527
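
The entropy and entropy_ratio figures can be reproduced from the value counts. The sketch below assumes Shannon entropy in bits, normalised by log2 of the cardinality. That assumption lands on the reported 2.501 and 0.7527 for ROL3, but the report does not spell out saturn's formula, and `entropy_stats` is a hypothetical helper, not part of saturn.

```python
import math

def entropy_stats(counts):
    """Return (entropy in bits, entropy / log2(cardinality)) for category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Maximum entropy is log2(k) for k categories; a single category has none.
    max_entropy = math.log2(len(probs)) if len(probs) > 1 else 0.0
    ratio = entropy / max_entropy if max_entropy else 0.0
    return entropy, ratio

# ROL3 counts as reported: apc, afb, acx, acm, swh, then five singletons.
rol3_counts = [18, 13, 8, 4, 2, 1, 1, 1, 1, 1]
entropy, ratio = entropy_stats(rol3_counts)
print(f"entropy={entropy:.3f} entropy_ratio={ratio:.4f}")
```

A ratio near 1.0 means the categories are close to evenly used; a ratio near 0 means one value dominates.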
Fig 8.
Top values for ROL3.
Top values for ROL3 (10 unique shown, of 10 total).
value count share
apc 18 36.0%
afb 13 26.0%
acx 8 16.0%
acm 4 8.0%
swh 2 4.0%
thz 1 2.0%
apd 1 2.0%
eng 1 2.0%
ary 1 2.0%
arz 1 2.0%
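
The treatment suggested for ROL3 (bucket rare codes, then one-hot encode) could look roughly like this. It is a stdlib-only sketch: the `min_count=3` threshold and both helper names are assumptions chosen for illustration, and the column is rebuilt from the reported counts rather than read from the source file.

```python
from collections import Counter

def bucket_rare(values, min_count=3, other="other"):
    """Replace categories seen fewer than min_count times with a single bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def one_hot(values):
    """One-hot encode labels; returns (sorted category list, rows of 0/1)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

# Rebuild ROL3 from the reported counts; row order is arbitrary.
rol3 = (["apc"] * 18 + ["afb"] * 13 + ["acx"] * 8 + ["acm"] * 4 + ["swh"] * 2
        + ["thz", "apd", "eng", "ary", "arz"])

bucketed = bucket_rare(rol3, min_count=3)
categories, rows = one_hot(bucketed)
print(categories)                  # ['acm', 'acx', 'afb', 'apc', 'other']
print(Counter(bucketed)["other"])  # 7
```

With this threshold the ten codes collapse to five columns, and the 'other' bucket holds the 7 rows from the six rare codes.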

PhotoCredits categorical metadata

Attribution string for the image accompanying each row, naming photographer and source platform. Just 5 distinct credits cover all 50 rows, with 'Hashim Abdullah - Pixabay' alone accounting for 56% and the top three Pixabay/Flickr contributors covering 48 of 50 entries. Two credits ('Link Up Africa', 'Claudiovidri - Shutterstock') appear only once, suggesting a long tail of incidental sources.

Treatment: Retain as provenance metadata; drop from modelling features.

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["PhotoCredits"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value Hashim Abdullah - Pixabay
top_rate 0.56
cardinality 5
entropy 1.611
entropy_ratio 0.694
Fig 9.
Top values for PhotoCredits.
Top values for PhotoCredits (5 unique shown, of 5 total).
value count share
Hashim Abdullah - Pixabay 28 56.0%
Hella Nijssen - Pixabay 12 24.0%
CharlesFred - Flickr 8 16.0%
Link Up Africa 1 2.0%
Claudiovidri - Shutterstock 1 2.0%

PrimaryReligion categorical feature

Categorical column capturing the dominant religion of each record, with only two observed values across 50 rows. The distribution is severely imbalanced: Islam accounts for 49 of 50 entries (top_rate 0.98) and Christianity for just 1, yielding an entropy ratio of 0.14. With effectively no variance, this column carries almost no discriminative signal.

Treatment: Drop or collapse to a binary 'is_Islam' flag; near-constant for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["PrimaryReligion"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value Islam
top_rate 0.98
cardinality 2
entropy 0.1414
entropy_ratio 0.1414
alert: imbalance · top value is 98.0% of rows
Fig 10.
Top values for PrimaryReligion.
Top values for PrimaryReligion (2 unique shown, of 2 total).
value count share
Islam 49 98.0%
Christianity 1 2.0%

Ctry categorical feature

Country field with 41 distinct values across 50 rows and no nulls. The distribution is essentially flat: the entropy ratio is 0.986 and the most common value, United Arab Emirates, appears just twice (4%). Nine countries tie at 2 occurrences each and the remaining 32 are singletons, hence the long_tail alert.

Treatment: Group rare countries into regions or an 'Other' bucket before encoding.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["Ctry"].stats

stat value
n 50
nulls 0 (0.0%)
unique 41
top_value United Arab Emirates
top_rate 0.04
cardinality 41
entropy 5.284
entropy_ratio 0.9862
alert: long_tail · 32 singleton categories
Fig 11.
Top values for Ctry.
Top values for Ctry (20 unique shown, of 41 total).
value count share
United Arab Emirates 2 4.0%
Canada 2 4.0%
Egypt 2 4.0%
Kenya 2 4.0%
Somalia 2 4.0%
Kuwait 2 4.0%
Oman 2 4.0%
Saudi Arabia 2 4.0%
Yemen 2 4.0%
Niger 1 2.0%
Sudan 1 2.0%
Tanzania 1 2.0%
Ukraine 1 2.0%
Algeria 1 2.0%
Australia 1 2.0%
Austria 1 2.0%
Bahrain 1 2.0%
Brazil 1 2.0%
Bulgaria 1 2.0%
Sri Lanka 1 2.0%

RegionName categorical feature

RegionName is a categorical geographic grouping with 11 distinct regions across 50 rows and no nulls. 'Africa, North and Middle East' alone accounts for 32% (16/50), and the three Africa-labelled regions together cover 54% (27/50). The entropy ratio of 0.86 shows that the remaining rows are spread fairly evenly given the cardinality, but three regions ('Australia and Pacific', 'America, Latin', 'Asia, Central') appear only once.

Treatment: one-hot or target-encode; consider grouping single-row regions to avoid sparse levels.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["RegionName"].stats

stat value
n 50
nulls 0 (0.0%)
unique 11
top_value Africa, North and Middle East
top_rate 0.32
cardinality 11
entropy 2.973
entropy_ratio 0.8595
Fig 12.
Top values for RegionName.
Top values for RegionName (11 unique shown, of 11 total).
value count share
Africa, North and Middle East 16 32.0%
Africa, East and Southern 7 14.0%
Europe, Western 6 12.0%
America, North and Caribbean 5 10.0%
Africa, West and Central 4 8.0%
Europe, Eastern and Eurasia 4 8.0%
Asia, Southeast 3 6.0%
Asia, South 2 4.0%
Australia and Pacific 1 2.0%
America, Latin 1 2.0%
Asia, Central 1 2.0%

BibleYear categorical metadata

BibleYear appears to be a metadata field capturing the publication year or year-range of a Bible edition, with values like "1890-2024", "1382-2020", and "2021". The column is almost entirely empty: 92% null with only 4 of 50 rows populated and 3 distinct values. Format is inconsistent, mixing single years with hyphenated ranges, which blocks numeric parsing.

Treatment: Drop or quarantine; null rate of 0.92 and mixed year/range formats make it unusable without manual curation.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["BibleYear"].stats

stat value
n 50
nulls 46 (92.0%)
unique 3
top_value 1890-2024
top_rate 0.5
cardinality 3
entropy 1.5
entropy_ratio 0.9464
alert: long_tail · 2 singleton categories
alert: null_rate · 92.0% null
Fig 13.
Top values for BibleYear.
Top values for BibleYear (3 unique shown, of 3 total).
value count share
1890-2024 2 4.0%
1382-2020 1 2.0%
2021 1 2.0%
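
The recommended treatment for BibleYear is to drop it. If the column is kept instead, the mixed single-year and range formats can be normalised into a (start, end) pair. This is a sketch under the assumption that only the two shapes seen in the sample ('YYYY' and 'YYYY-YYYY') occur; `parse_year_span` is a hypothetical helper.

```python
import re

# Matches 'YYYY' or 'YYYY-YYYY'; any other shape is treated as unparseable.
YEAR_OR_RANGE = re.compile(r"^(\d{4})(?:-(\d{4}))?$")

def parse_year_span(value):
    """Parse 'YYYY' or 'YYYY-YYYY' into (start, end); None for nulls or other shapes."""
    if value is None:
        return None
    match = YEAR_OR_RANGE.match(value.strip())
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return (start, end)

for raw in ["1890-2024", "1382-2020", "2021", None]:
    print(raw, parse_year_span(raw))
```

A single year becomes a span whose start and end are equal, so downstream code can treat every populated row the same way.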

RLG3PC numeric other

RLG3PC is a numeric column that is entirely constant: all 50 rows hold the value 6.0, with zero variance and only 1 unique value. There is no information for a model to learn from, and no nulls or outliers to caveat.

Treatment: Drop; constant column carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["RLG3PC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
min 6
max 6
mean 6
median 6
std 0
q1 6
q3 6
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant · only one distinct value
Fig 14.
Distribution of RLG3PC. Vertical dash marks the median.
Histogram bins for RLG3PC (median: 6.0).
bin count
5.5 – 5.643 0
5.643 – 5.786 0
5.786 – 5.929 0
5.929 – 6.071 50
6.071 – 6.214 0
6.214 – 6.357 0
6.357 – 6.5 0

Population numeric feature

Population counts across 50 rows, ranging from 200 to 2,221,000 with a median of just 46,500 versus a mean of 264,074. The distribution is severely right-skewed (skew 2.52, kurtosis 5.78) and 12% of rows (6 values) are flagged as outliers, so a few very large groups dominate a sample of mostly small ones. No nulls or zeros, and 49 of 50 values are unique.

Treatment: log-transform before any modelling to tame the right skew and outliers.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["Population"].stats

stat value
n 50
nulls 0 (0.0%)
unique 49
min 200
max 2.221e+06
mean 264,074
median 46,500
std 4.927e+05
q1 13,000
q3 272,500
iqr 259,500
skew 2.52
kurtosis 5.781
n_outliers 6
outlier_rate 0.12
zero_rate 0
alert: high_skew · skew=+2.52
alert: outliers · 12.0% rows beyond 1.5 IQR
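
The outliers alert counts rows "beyond 1.5 IQR". Assuming this means Tukey's fences (q1 - 1.5·IQR and q3 + 1.5·IQR), which the report does not state explicitly, the thresholds for Population follow directly from the quartiles above. `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles as reported for Population.
lower, upper = tukey_fences(q1=13_000, q3=272_500)
print(lower, upper)  # -376250.0 661750.0
```

The lower fence is negative and the minimum is 200, so under this assumption all 6 flagged rows sit above 661,750.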
Fig 15.
Distribution of Population. Vertical dash marks the median.
Histogram bins for Population (median: 46500.0).
bin count
200 – 3.175e+05 40
3.175e+05 – 6.347e+05 4
6.347e+05 – 9.52e+05 2
9.52e+05 – 1.269e+06 0
1.269e+06 – 1.586e+06 2
1.586e+06 – 1.904e+06 1
1.904e+06 – 2.221e+06 1
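
To see why a log transform is the suggested treatment for Population, it helps to compare where the median sits on the raw and log scales. The sketch uses only the five-number summary reported above, not the raw rows, and picks base 10 for readability (an arbitrary choice). Since zero_rate is 0, the logarithm is defined for every row.

```python
import math

# Five-number summary reported for Population.
summary = {"min": 200, "q1": 13_000, "median": 46_500, "q3": 272_500, "max": 2_221_000}

logged = {name: math.log10(value) for name, value in summary.items()}
for name, value in logged.items():
    print(f"{name:>6}: {value:.2f}")

# Position of the median within the min..max range, from 0 (at min) to 1 (at max).
raw_pos = (summary["median"] - summary["min"]) / (summary["max"] - summary["min"])
log_pos = (logged["median"] - logged["min"]) / (logged["max"] - logged["min"])
print(f"median position: raw {raw_pos:.3f}, log10 {log_pos:.3f}")
```

On the raw scale the median is squeezed against the minimum; after the transform it sits near the middle of the range, which is the effect the treatment is after.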

Resources unknown other

The column is named "Resources" and contains 50 non-null entries, but saturn skipped profiling it so its kind is unknown and no descriptive stats (uniqueness, distribution, type) are available. Without further signals, its content and structure cannot be characterized from this evidence.

Treatment: Re-run profiling with type coercion or inspect raw values manually before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[37]:

saturn.columns["Resources"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not reported)
alert: skipped · no profiler for kind=unknown

LeastReachedPGAC categorical feature

Binary Y/N flag indicating whether some 'least reached PGAC' condition was met, with no nulls across 50 rows. The split is nearly balanced (28 N, 22 Y; top_rate 0.56), and the entropy_ratio of 0.99 is close to the maximum of 1.0 for a binary feature.

Treatment: Encode as 0/1 boolean for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[39]:

saturn.columns["LeastReachedPGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.56
cardinality 2
entropy 0.9896
entropy_ratio 0.9896
Fig 16.
Top values for LeastReachedPGAC.
Top values for LeastReachedPGAC (2 unique shown, of 2 total).
value count share
N 28 56.0%
Y 22 44.0%
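
Encoding the Y/N flag as 0/1 is a one-line mapping. The sketch below adds one design choice that is an assumption rather than anything the report prescribes: it raises on unexpected values instead of silently coercing them. `encode_yes_no` is a hypothetical helper.

```python
def encode_yes_no(value):
    """Map 'Y' -> 1 and 'N' -> 0; anything else raises so bad data is not hidden."""
    mapping = {"Y": 1, "N": 0}
    if value not in mapping:
        raise ValueError(f"unexpected flag value: {value!r}")
    return mapping[value]

# Rebuild the column from the reported counts (28 N, 22 Y); row order is arbitrary.
flags = ["N"] * 28 + ["Y"] * 22
encoded = [encode_yes_no(v) for v in flags]
print(sum(encoded), len(encoded))  # 22 50
```

The same mapping would apply to the other Y/N columns in this sample, such as Window1040.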

NumberLanguagesSpoken numeric feature

Counts the number of languages spoken, but with only 2 unique values across 50 rows it is effectively a binary indicator of monolingual vs bilingual. The mean of 1.04 and median of 1.0 show 48 of 50 records sit at 1, with just 2 outliers at 2.0 driving the extreme skew (4.69) and kurtosis (20.04).

Treatment: Recode as a binary multilingual flag; the raw count adds no information.

anthropic:claude-opus-4-7 · confidence high
Out[42]:

saturn.columns["NumberLanguagesSpoken"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
min 1
max 2
mean 1.04
median 1
std 0.1979
q1 1
q3 1
iqr 0
skew 4.695
kurtosis 20.04
n_outliers 2
outlier_rate 0.04
zero_rate 0
alert: high_skew · skew=+4.69
Fig 17.
Distribution of NumberLanguagesSpoken. Vertical dash marks the median.
Histogram bins for NumberLanguagesSpoken (median: 1.0).
bin count
1 – 1.143 48
1.143 – 1.286 0
1.286 – 1.429 0
1.429 – 1.571 0
1.571 – 1.714 0
1.714 – 1.857 0
1.857 – 2 2

GSEC categorical feature

GSEC is a binary categorical field with exactly two values, "1" and an empty string, split perfectly 25/25 across the 50 rows. The maximum entropy (1.0) confirms a balanced flag, but the empty string rather than "0" or null suggests the absent state is encoded as a blank string rather than a true missing value.

Treatment: Recode empty string to 0 (or NaN if it means missing) and treat as a binary indicator.

anthropic:claude-opus-4-7 · confidence high
Out[45]:

saturn.columns["GSEC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value 1
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
Fig 18.
Top values for GSEC.
Top values for GSEC (2 unique shown, of 2 total).
value count share
1 25 50.0%
(empty string) 25 50.0%
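
Both options in the GSEC treatment (blank as 0, or blank as missing) can be kept behind one switch until the meaning of the empty string is known. This is a sketch; `recode_gsec` and its `blank_means_missing` parameter are hypothetical names.

```python
def recode_gsec(value, blank_means_missing=False):
    """Recode GSEC strings: '1' -> 1; '' -> 0, or None when blanks mean 'missing'."""
    if value == "1":
        return 1
    if value == "":
        return None if blank_means_missing else 0
    raise ValueError(f"unexpected GSEC value: {value!r}")

# Counts from the table above (25 and 25); row order is arbitrary.
gsec = ["1"] * 25 + [""] * 25

as_flag = [recode_gsec(v) for v in gsec]
as_nullable = [recode_gsec(v, blank_means_missing=True) for v in gsec]
print(sum(as_flag))                         # 25
print(sum(v is None for v in as_nullable))  # 25
```

Note that under the second reading the column's true null rate would be 50%, not the 0.0% shown in the stats.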

AudioRecordings categorical metadata

AudioRecordings is a categorical flag that takes the single value 'Y' across all 50 rows, with zero nulls and entropy of 0. Because cardinality is 1 and top_rate is 1.0, the column carries no information and cannot discriminate between records.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[48]:

saturn.columns["AudioRecordings"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value Y
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 19.
Top values for AudioRecordings.
Top values for AudioRecordings (1 unique shown, of 1 total).
value count share
Y 50 100.0%

PercentAdherents categorical feature

PercentAdherents holds numeric percentages stored as strings, with 29 distinct values across 50 rows and no nulls. The mode is "0.000" at 12% of rows, and entropy ratio 0.942 indicates a very flat distribution with a long tail of small-frequency values. The mix of fractional (0.200, 0.500) and whole-number (5.000, 6.000) entries suggests the values are raw percentages rather than proportions.

Treatment: Cast strings to float and treat as a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence medium
Out[51]:

saturn.columns["PercentAdherents"].stats

stat value
n 50
nulls 0 (0.0%)
unique 29
top_value 0.000
top_rate 0.12
cardinality 29
entropy 4.576
entropy_ratio 0.942
alert: long_tail · 18 singleton categories
Fig 20.
Top values for PercentAdherents.
Top values for PercentAdherents (20 unique shown, of 29 total).
value count share
0.000 6 12.0%
5.000 5 10.0%
0.200 3 6.0%
0.500 3 6.0%
2.000 3 6.0%
1.000 2 4.0%
6.000 2 4.0%
4.000 2 4.0%
3.000 2 4.0%
0.300 2 4.0%
0.100 2 4.0%
0.400 1 2.0%
1.900 1 2.0%
7.000 1 2.0%
8.500 1 2.0%
36.000 1 2.0%
37.000 1 2.0%
65.000 1 2.0%
33.000 1 2.0%
25.000 1 2.0%
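
Casting PercentAdherents to float is straightforward, but a defensive version is worth having because the column arrives as strings. The sketch assumes the values are percentages on a 0 to 100 scale, which the sample suggests but does not prove; `to_percent` is a hypothetical helper.

```python
def to_percent(value):
    """Cast a percentage string such as '8.500' to float; None if blank or malformed."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # Guard the assumption that values are percentages on a 0-100 scale.
    return number if 0.0 <= number <= 100.0 else None

samples = ["0.000", "0.200", "5.000", "8.500", "65.000", "", None]
print([to_percent(s) for s in samples])
# [0.0, 0.2, 5.0, 8.5, 65.0, None, None]
```

Returning None rather than raising keeps malformed rows visible as missing values after the cast.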

ROP1 categorical metadata

ROP1 is a categorical column holding a single constant value 'A001' across all 50 rows, with zero nulls and entropy of 0.0. It carries no information for modelling or segmentation since cardinality is 1 and top_rate is 1.0.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[54]:

saturn.columns["ROP1"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value A001
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 21.
Top values for ROP1.
Top values for ROP1 (1 unique shown, of 1 total).
value count share
A001 50 100.0%

JPScalePGAC categorical feature

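A binary categorical field with only the values "2" and "1", split 28/22 across 50 rows. The near-maximal entropy ratio (0.99) indicates an almost balanced two-class distribution with no nulls. By analogy with JPScale and JPScaleText elsewhere in this dataset, the name most plausibly denotes a Joshua Project progress-scale value at the PGAC level (presumably 'people group across countries'), not a seismic or other physical scale. The 28/22 split mirrors LeastReachedPGAC (28 N, 22 Y), which is consistent with that reading, though the stats alone do not confirm it.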

Treatment: Cast to categorical (or 0/1 indicator) before modelling.

anthropic:claude-opus-4-7 · confidence medium
Out[57]:

saturn.columns["JPScalePGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value 2
top_rate 0.56
cardinality 2
entropy 0.9896
entropy_ratio 0.9896
Fig 22.
Top values for JPScalePGAC.
Top values for JPScalePGAC (2 unique shown, of 2 total).
value count share
2 28 56.0%
1 22 44.0%

Latitude numeric feature

Numeric column holding geographic latitude in degrees, with all 50 values unique and no nulls. The range spans -33.87 to 59.11, consistent with worldwide coordinates, and the distribution is mildly left-skewed (-0.46) with a mean of 22.28 sitting below the median of 24.12. Only one outlier (2%) is flagged, suggesting one row sits far from the otherwise broad spread (IQR 26.18).

Treatment: Pair with longitude as a geospatial coordinate; consider binning or distance features rather than using as a raw scalar.

anthropic:claude-opus-4-7 · confidence high
Out[60]:

saturn.columns["Latitude"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
min -33.87
max 59.11
mean 22.28
median 24.12
std 20.19
q1 9.247
q3 35.43
iqr 26.18
skew -0.4594
kurtosis 0.0487
n_outliers 1
outlier_rate 0.02
zero_rate 0
Fig 23.
Distribution of Latitude. Vertical dash marks the median.
Histogram bins for Latitude (median: 24.115615886064248).
bin count
-33.87 – -20.58 2
-20.58 – -7.302 0
-7.302 – 5.981 9
5.981 – 19.26 9
19.26 – 32.55 16
32.55 – 45.83 7
45.83 – 59.11 7
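
One way to build the distance features mentioned in the Latitude treatment is the haversine great-circle distance from each row's (Latitude, Longitude) pair to a chosen reference point. The sketch below is illustrative only: the reference point and the example row are made-up coordinates, not values from the dataset, and 6371 km is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical reference point and row; neither pair comes from the dataset.
reference = (21.42, 39.83)
row = (24.12, 45.00)
print(round(haversine_km(*row, *reference)), "km")
```

A distance feature like this avoids treating latitude as a plain scalar, where equal numeric gaps do not mean equal geographic separation once longitude is taken into account.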

PeopNameInCountry categorical label

Categorical label naming the people group in-country, with only 7 distinct values across 50 rows and no nulls. The distribution is heavily concentrated on Arab variants: 'Arab' alone covers 54% of rows, and the top three Arab-prefixed labels account for 46 of 50 entries, leaving a long tail of singletons like 'Tuareg, Air' and 'Amri'. The entropy ratio of 0.65 reflects that concentration, and the long_tail alert flags the 4 singleton categories.

Treatment: Collapse rare singletons into an 'Other' bucket before any group-wise analysis.

anthropic:claude-opus-4-7 · confidence high
Out[63]:

saturn.columns["PeopNameInCountry"].stats

stat value
n 50
nulls 0 (0.0%)
unique 7
top_value Arab
top_rate 0.54
cardinality 7
entropy 1.835
entropy_ratio 0.6537
alert: long_tail · 4 singleton categories
Fig 24.
Top values for PeopNameInCountry.
Top values for PeopNameInCountry (7 unique shown, of 7 total).
value count share
Arab 27 54.0%
Arab, Arabic Gulf Spoken 11 22.0%
Arab, Omani 8 16.0%
Tuareg, Air 1 2.0%
Amri 1 2.0%
Arab, Emirati 1 2.0%
Arab, Levantine 1 2.0%

Window1040 categorical feature

Window1040 is a binary Y/N flag, almost perfectly balanced with 26 'Y' and 24 'N' across 50 rows. Entropy ratio of 0.999 confirms a near-maximum-uncertainty split, and there are no nulls. The name most likely refers to the 10/40 Window, the band between roughly 10°N and 40°N used in mission research, so the flag probably marks whether a group's country falls inside it; the stats alone do not confirm these semantics.

Treatment: Encode as a 0/1 boolean for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[66]:

saturn.columns["Window1040"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value Y
top_rate 0.52
cardinality 2
entropy 0.9988
entropy_ratio 0.9988
Fig 25.
Top values for Window1040.
Top values for Window1040 (2 unique shown, of 2 total).
value count share
Y 26 52.0%
N 24 48.0%

PeopleGroupMapURL categorical metadata

URL pointing to a people-group map image hosted on joshuaproject.net, one per row. 48% of the 50 rows (24) are empty strings that the profiler does not count as nulls, so the field is blank almost as often as it is populated, and a single map (m00007.png) accounts for 9 of the 26 non-blank entries. Of the 17 unique values (16 URLs plus the empty string), 13 appear only once, which triggers the long_tail alert.

Treatment: Treat empty strings as missing and drop from modelling; keep only as a display link.

anthropic:claude-opus-4-7 · confidence high
Out[69]:

saturn.columns["PeopleGroupMapURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 17
top_value (empty string)
top_rate 0.48
cardinality 17
entropy 2.792
entropy_ratio 0.6832
alert: long_tail · 13 singleton categories
Fig 26.
Top values for PeopleGroupMapURL.
Top values for PeopleGroupMapURL (17 unique shown, of 17 total).
value count share
(empty string) 24 48.0%
https://joshuaproject.net/assets/media/profiles/maps/m00007.png 9 18.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375.png 2 4.0%
https://joshuaproject.net/assets/media/profiles/maps/m00307.png 2 4.0%
https://joshuaproject.net/assets/media/profiles/maps/m10208_ng.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m00005.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375_tz.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10376_ae.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375_ke.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375_rp.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10376_ir.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10376_us.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10378_ae.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10378_ku.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10378_mu.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10378_sa.png 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10378_ym.png 1 2.0%
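
Because the blanks are empty strings rather than nulls, the reported null rate of 0.0% understates how sparse this column is. A small sketch of the "treat empty strings as missing" step, with hypothetical helper names and a column rebuilt from the reported counts:

```python
def blank_to_none(value):
    """Treat empty or whitespace-only strings as missing."""
    if value is None or value.strip() == "":
        return None
    return value

def null_rate(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)

# 24 blanks and 26 populated rows, as in the table above.
# One real URL is repeated to stand in for all populated rows.
urls = [""] * 24 + ["https://joshuaproject.net/assets/media/profiles/maps/m00007.png"] * 26

print(null_rate(urls))                              # 0.0
print(null_rate([blank_to_none(u) for u in urls]))  # 0.48
```

After the conversion the effective null rate is 48%, which is the figure to use when deciding whether the column is worth keeping.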

CountryURL categorical foreign_key

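URLs to country pages on joshuaproject.net, with a two-letter country code as the path suffix. With 41 unique values across 50 rows and entropy ratio 0.986, the column is close to one value per row; the most frequent URL (AE, United Arab Emirates) appears just twice (top_rate 0.04). The base domain is constant, so the country code is the only informative part. The codes do not look like ISO 3166-1 alpha-2 (for example 'KU', 'MU' and 'YM' appear to line up with Kuwait, Oman and Yemen in the Ctry counts), so check which code list they follow before joining to external data.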

Treatment: Extract the trailing country code and use that as the join/grouping key instead of the raw URL.

anthropic:claude-opus-4-7 · confidence high
Out[72]:

saturn.columns["CountryURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 41
top_value https://joshuaproject.net/countries/AE
top_rate 0.04
cardinality 41
entropy 5.284
entropy_ratio 0.9862
alert: long_tail · 32 singleton categories
Fig 27.
Top values for CountryURL.
Top values for CountryURL (20 unique shown, of 41 total).
value count share
https://joshuaproject.net/countries/AE 2 4.0%
https://joshuaproject.net/countries/CA 2 4.0%
https://joshuaproject.net/countries/EG 2 4.0%
https://joshuaproject.net/countries/KE 2 4.0%
https://joshuaproject.net/countries/SO 2 4.0%
https://joshuaproject.net/countries/KU 2 4.0%
https://joshuaproject.net/countries/MU 2 4.0%
https://joshuaproject.net/countries/SA 2 4.0%
https://joshuaproject.net/countries/YM 2 4.0%
https://joshuaproject.net/countries/NG 1 2.0%
https://joshuaproject.net/countries/SU 1 2.0%
https://joshuaproject.net/countries/TZ 1 2.0%
https://joshuaproject.net/countries/UP 1 2.0%
https://joshuaproject.net/countries/AG 1 2.0%
https://joshuaproject.net/countries/AS 1 2.0%
https://joshuaproject.net/countries/AU 1 2.0%
https://joshuaproject.net/countries/BA 1 2.0%
https://joshuaproject.net/countries/BR 1 2.0%
https://joshuaproject.net/countries/BU 1 2.0%
https://joshuaproject.net/countries/CE 1 2.0%

PercentEvangelicalPC categorical feature

Numeric-looking field stored as a categorical, with only 3 distinct values across 50 rows; 96% of rows (48) share the single value '0.197'. The other two values ('0.103' and '0.265') each appear exactly once, giving an extreme imbalance and an entropy ratio of just 0.178. The PC suffix more plausibly stands for people cluster (compare the PeopleCluster column) than for a principal component. If so, this is a cluster-level evangelical percentage repeated on every row of the same cluster, which leaves almost no signal within this sample.

Treatment: Drop or treat as constant; near-zero variance offers no modelling signal.

anthropic:claude-opus-4-7 · confidence high
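
The entropy figures quoted for this column can be reproduced from its value counts. This is a minimal sketch, assuming Saturn reports Shannon entropy in bits and defines entropy_ratio as entropy divided by log2 of the cardinality; that definition is inferred from the numbers in this report, not taken from Saturn's documentation. The function names are hypothetical.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of observed categories)."""
    k = sum(1 for c in counts if c > 0)
    return 0.0 if k < 2 else entropy_bits(counts) / math.log2(k)


# Counts for PercentEvangelicalPC: '0.197' -> 48, '0.103' -> 1, '0.265' -> 1
counts = [48, 1, 1]
print(round(entropy_bits(counts), 4), round(entropy_ratio(counts), 4))
```

Under these assumptions the sketch gives 0.2823 and 0.1781, matching the entropy and entropy_ratio rows in the stats table for this column.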
Out[75]:

saturn.columns["PercentEvangelicalPC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
top_value 0.197
top_rate 0.96
cardinality 3
entropy 0.2823
entropy_ratio 0.1781
alert: long_tail · 2 singleton categories
alert: imbalance · top value is 96.0% of rows
Fig 28.
Top values for PercentEvangelicalPC.
Top values for PercentEvangelicalPC (3 unique shown, of 3 total).
value count share
0.197 48 96.0%
0.103 1 2.0%
0.265 1 2.0%

CountOfProvinces unknown other

CountOfProvinces was skipped by the profiler, so no type, uniqueness, or distribution stats are available beyond a row count of 50 with no nulls. The name suggests an integer tally of provinces per record, but this cannot be confirmed from the evidence. No further signal is present to flag skew, duplicates, or range.

Treatment: Re-run the profiler on this column to recover type and distribution before deciding how to use it.

anthropic:claude-opus-4-7 · confidence low
Out[78]:

saturn.columns["CountOfProvinces"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not reported)
alert: skipped · no profiler for kind=unknown

PercentEvangelicalPGAC categorical feature

Likely the percentage of evangelical adherents for the people group across countries (the most plausible reading of the PGAC suffix, by analogy with PopulationPGAC), stored as strings rather than floats, with only 5 distinct values across 50 rows. The distribution is lumpy: '1.892' covers 56% of rows, and the top three values ('1.892', '0.233', '0.023') account for 48 of 50 observations. The value counts (28, 12, 8, 1, 1) match those of PercentChristianPGAC exactly, which suggests group-level figures repeated on every row of the same group rather than per-row measurements.

Treatment: Cast to float, then check whether the value is constant within each people group; if so, treat it as a group-level attribute rather than a per-row continuous measurement.

anthropic:claude-opus-4-7 · confidence medium
Out[80]:

saturn.columns["PercentEvangelicalPGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value 1.892
top_rate 0.56
cardinality 5
entropy 1.611
entropy_ratio 0.694
Fig 29.
Top values for PercentEvangelicalPGAC.
Top values for PercentEvangelicalPGAC (5 unique shown, of 5 total).
value count share
1.892 28 56.0%
0.233 12 24.0%
0.023 8 16.0%
0.200 1 2.0%
0.000 1 2.0%

NomadicTypeDescription categorical metadata

This appears to be a descriptive label for a nomadic lifestyle classification, but it carries almost no information in this sample. 90% of rows are null, and the 5 non-null rows all hold the single value 'Agro-Pastoralists' (top_rate 1.0, cardinality 1, entropy 0.0). As-is, the column cannot discriminate between records.

Treatment: Drop: 90% null and only one observed value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[83]:

saturn.columns["NomadicTypeDescription"].stats

stat value
n 50
nulls 45 (90.0%)
unique 1
top_value Agro-Pastoralists
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 90.0% null
alert: imbalance · top value is 100.0% of rows
Fig 30.
Top values for NomadicTypeDescription.
Top values for NomadicTypeDescription (1 unique shown, of 1 total).
value count share
Agro-Pastoralists 5 10.0%

MapCredits categorical metadata

MapCredits holds attribution strings for the map associated with each row, citing sources such as Joshua Project, GMI, ESRI, and Bethany World Prayer Center. Nearly half the rows (24 of 50) carry an empty string rather than a null, and the single credit 'Bethany World Prayer Center' covers another 14 rows; the column has only 7 distinct values in total. The main surprise is that missingness is encoded as empty text, not NULL, so the reported null rate of 0% understates how many rows lack a credit.

Treatment: Normalize empty strings to nulls and treat as provenance metadata; drop from modelling features.

anthropic:claude-opus-4-7 · confidence high
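
Normalizing blanks before computing null rates might look like the following. It is a minimal sketch: the helper name blank_to_none is hypothetical, the sample list is illustrative rather than taken from the file, and treating whitespace-only strings as missing is an assumption that goes slightly beyond the empty strings observed here.

```python
def blank_to_none(value):
    """Map empty or whitespace-only strings to None; pass other values through."""
    if isinstance(value, str) and not value.strip():
        return None
    return value


# Illustrative values only, not rows from the dataset.
credits = ["", "Bethany World Prayer Center", "   ", "Ethnic Peoples of Somalia"]
cleaned = [blank_to_none(v) for v in credits]
null_rate = sum(v is None for v in cleaned) / len(cleaned)
print(cleaned, null_rate)
```

Applied to the full column, the same step would move the 24 blank rows into the null count, so the null rate would read 48% instead of 0%.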
Out[86]:

saturn.columns["MapCredits"].stats

stat value
n 50
nulls 0 (0.0%)
unique 7
top_value (empty string)
top_rate 0.48
cardinality 7
entropy 1.987
entropy_ratio 0.7077
Fig 31.
Top values for MapCredits.
Top values for MapCredits (7 unique shown, of 7 total).
value count share
(empty string) 24 48.0%
Bethany World Prayer Center 14 28.0%
Location: IMB. Imagery: GMI, ESRI, Maxar, Earthstar Geographics, ESRI User Community. Design: Joshua Project. 6 12.0%
People Group data: Omid. Map geography: UNESCO / GMI. Map Design: Joshua Project 2 4.0%
Ethnic Peoples of Somalia 2 4.0%
Location: Philippine Census 2020 / web research. Imagery: GMI, ESRI, Maxar, Earthstar Geographics, ESRI User Community. Design: Joshua Project. 1 2.0%
Location: US Census Bureau. Imagery: GMI, ESRI, Maxar, Earthstar Geographics, ESRI User Community. Design: Joshua Project. 1 2.0%

HasJesusFilm categorical feature

Binary Y/N flag indicating whether each record has an associated 'Jesus Film' resource. The column is complete (null_rate 0.0) with only 2 unique values, heavily skewed toward 'Y' at 82% (41 of 50), leaving just 9 'N' cases.

Treatment: Encode as boolean; expect limited discriminatory power given the 82/18 imbalance.

anthropic:claude-opus-4-7 · confidence high
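
The boolean encoding can be made strict so that any value other than 'Y' or 'N' fails loudly instead of silently becoming False. This is a minimal sketch; the helper name yn_to_bool is hypothetical, and the list of flags is rebuilt from the 41/9 split reported for this column rather than read from the file.

```python
def yn_to_bool(value: str) -> bool:
    """Convert a 'Y'/'N' flag to bool, rejecting anything else."""
    mapping = {"Y": True, "N": False}
    try:
        return mapping[value]
    except KeyError:
        raise ValueError(f"unexpected flag value: {value!r}") from None


# Rebuilt from the reported counts: 41 'Y' and 9 'N'.
flags = ["Y"] * 41 + ["N"] * 9
encoded = [yn_to_bool(f) for f in flags]
print(sum(encoded), len(encoded))
```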
Out[89]:

saturn.columns["HasJesusFilm"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value Y
top_rate 0.82
cardinality 2
entropy 0.6801
entropy_ratio 0.6801
Fig 32.
Top values for HasJesusFilm.
Top values for HasJesusFilm (2 unique shown, of 2 total).
value count share
Y 41 82.0%
N 9 18.0%

HowReach categorical free_text

HowReach holds free-text outreach suggestions, likely missionary engagement strategies for various people groups. The column is dominated by empty strings (62% top_rate, 31 of 50 rows blank), and every non-blank value is unique, yielding 20 distinct values across 50 rows with no nulls flagged. Entropy ratio of 0.60 plus the long_tail alert confirm this is essentially sparse prose, not a categorical variable.

Treatment: Treat blanks as missing and tokenize/embed the prose entries rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[92]:

saturn.columns["HowReach"].stats

stat value
n 50
nulls 0 (0.0%)
unique 20
top_value (empty string)
top_rate 0.62
cardinality 20
entropy 2.572
entropy_ratio 0.5952
alert: long_tail · 19 singleton categories
Fig 33.
Top values for HowReach.
Top values for HowReach (20 unique shown, of 20 total).
value count share
(empty string) 31 62.0%
Christ followers can take gospel recordings and Bible portions to the Air Tuareg people during their festivals. It would be helpful if these people spoke French since that is a key trade language in Niger.12.0%
Christ followers with either medical or veterinarian skills can open doors to the gospel by blessing the Amri people.12.0%
Christian believers with faith to move mountains can go to the Arabs in Tanzania and pray for miracles among those who are sick or needy. These Arabs need to see God's power over sickness and death.12.0%
As U.A.E.'s economy boomed, they had to import large numbers of expatriates to build their economy. It's very possible for Christ followers to share the Lord with the Muslims of Oman.12.0%
Pray the Lord will start a movement of Arab families experiencing God's blessings. Canadian believers can join Arabs in their celebrations, and offer them the JESUS Film and other appropriate materials. Canadian believers can probably share gospel materials with Arabs they meet while traveling to Gulf countries, so long as they do it discreetly.12.0%
Believers can reach out to Arab refugees in train stations with food, water and gospel materials.12.0%
Some of the Kenyan Arabs also believe in spirits, so they practice magic to gain their favor. They would welcome Christ's servants who come as his hands to heal the sick and spiritually stricken.12.0%
Christian workers who appreciate Arabic food and culture can befriend them and share the love of their best friend, Jesus Christ.12.0%
Radio and satellite TV efforts like SAT-7 provide all Arabic speakers with a gospel message in the privacy of their own homes. Gulf Arabs who respond can be sent Bibles and the JESUS Film.12.0%
It will take some very creative and dedicated groups to adopt a similar lifestyle as the nomadic Arabs and live close to them, befriend them, and share his blessings with them. For that reason, believers in Tehran should reach any Gulf Arabs they find in the city.12.0%
Perhaps many Arabs of the Persian Gulf region will discover the availability on the internet of the Jesus film, scripture, and other resources in the Arabic language. Pray especially that youth will discover these and download them and view them.12.0%
Christ followers with professional skills can live in Kuwait and have personal contact with them.12.0%
Radio broadcasts and sending them copies of the JESUS Film might help them to find their way to the cross.12.0%
Pray that some of the many Christian expatriates in Qatar will seize their opportunities to share Jesus Christ with indigenous Arabs. Pray for boldness, coupled with tact.12.0%
Christ followers with professional skills can live in Saudi Arabia and have personal contact with them.12.0%
Omanis and other Gulf Arabic speakers have an opportunity to hear the gospel in a country where there is freedom of religion. Christ followers should take advantage of that opportunity.12.0%
Gulf Arabs in Yemen cannot accept Christ's abundant life unless the Lord moves among them and sends workers.12.0%
The Omani standard is to accept others on their terms. For example, they view anything less than excessive generosity as rudeness. Even Christians are tolerated as long as they are not Muslim converts. Perhaps those who genuinely follow Christ who can live up to high social standards can be used of the Lord to reach Omani Arabs in Kenya.12.0%
Omani Arabs may be attracted to the poetry and stories in the Old Testament as well as stories in the four gospels.12.0%

PCIslam numeric feature

PCIslam appears to be the percentage of each group that is Muslim, ranging from 25.0 to 100.0 with a median of 95.99 and a mean of 91.12. The distribution is heavily left-skewed (skew -2.65, kurtosis 7.39) with a tight IQR of 6.55 between Q1 92.93 and Q3 99.47, yet 8 outliers (16%) sit well below that cluster. Most observations are near saturation, while a small tail of low-share records pulls the mean below the median.

Treatment: Consider a reflected log transform, such as log(1 + (100 - x)), before modelling to tame the left skew and compress the low-share outliers.

anthropic:claude-opus-4-7 · confidence high
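
The outlier rule and a reflected log transform can be sketched as follows. This is a minimal sketch, assuming the outlier alert uses the usual 1.5 × IQR fences (as its text suggests) and taking log1p(100 - x) as one possible reflected log; the function names and the sample values are hypothetical, and only the quartiles come from the stats table.

```python
import math


def reflected_log(pct: float, upper: float = 100.0) -> float:
    """Reflect around the upper bound, then apply log1p to compress the left tail."""
    return math.log1p(upper - pct)


def iqr_fences(q1: float, q3: float, k: float = 1.5):
    """Lower and upper outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for PCIslam in the stats table.
low, high = iqr_fences(92.93, 99.47)

# Hypothetical values for illustration only; not rows from the dataset.
sample = [25.0, 60.0, 95.99, 100.0]
flagged = [x for x in sample if x < low or x > high]
transformed = [round(reflected_log(x), 3) for x in sample]
print(round(low, 2), flagged, transformed)
```

Because the reflection reverses the ordering, larger transformed values correspond to lower Islam shares, which matters when reading model coefficients.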
Out[95]:

saturn.columns["PCIslam"].stats

stat value
n 50
nulls 0 (0.0%)
unique 30
min 25
max 100
mean 91.12
median 95.99
std 14.7
q1 92.93
q3 99.47
iqr 6.55
skew -2.648
kurtosis 7.388
n_outliers 8
outlier_rate 0.16
zero_rate 0
alert: high_skew · skew=-2.65
alert: outliers · 16.0% rows beyond 1.5 IQR
Fig 34.
Distribution of PCIslam. Vertical dash marks the median.
Histogram bins for PCIslam (median: 95.99423076923074).
bin count
25 – 35.71 1
35.71 – 46.43 0
46.43 – 57.14 1
57.14 – 67.86 2
67.86 – 78.57 3
78.57 – 89.29 1
89.29 – 100 42

NTYear categorical free_text

NTYear appears to record New Testament availability (an inference from the column name), mixing a yes/no flag with single years (e.g. 2005, 1932, 2012) and year ranges (e.g. 1879-1989, 1990-2003). The format is inconsistent: 'Yes' accounts for 62% of non-null entries (18 of 29), 42% of all rows are null, and the remaining 11 cells split across 7 year-like values. This is effectively two or three different fields collapsed into one string column.

Treatment: Split into a boolean indicator and parsed year/year-range fields before use.

anthropic:claude-opus-4-7 · confidence high
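
One way to carry out the split is a small parser that returns a flag plus an optional year span. This is a minimal sketch under stated assumptions: the names NTYearParts and parse_nt_year are hypothetical, a bare year or a year range is assumed to imply the flag is true, and only the three formats seen in this sample ('Yes', YYYY, YYYY-YYYY) are handled.

```python
import re
from typing import NamedTuple, Optional

# Matches "2005" or "1879-1989"; other formats are rejected explicitly.
_YEAR_RANGE = re.compile(r"^(\d{4})(?:-(\d{4}))?$")


class NTYearParts(NamedTuple):
    has_nt: Optional[bool]  # None when the cell is missing
    first_year: Optional[int]
    last_year: Optional[int]


def parse_nt_year(raw: Optional[str]) -> NTYearParts:
    """Split a raw NTYear cell into a flag and an optional year span."""
    if raw is None or not raw.strip():
        return NTYearParts(None, None, None)
    text = raw.strip()
    if text.lower() == "yes":
        return NTYearParts(True, None, None)
    match = _YEAR_RANGE.match(text)
    if match is None:
        raise ValueError(f"unrecognized NTYear value: {raw!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    return NTYearParts(True, first, last)  # assumption: a year implies "yes"


for cell in ["Yes", "2005", "1879-1989", None]:
    print(cell, parse_nt_year(cell))
```

Raising on unrecognized input, rather than returning nulls, makes new formats visible when the parser is run on the full dataset instead of this 50-row sample.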
Out[98]:

saturn.columns["NTYear"].stats

stat value
n 50
nulls 21 (42.0%)
unique 8
top_value Yes
top_rate 0.6207
cardinality 8
entropy 1.925
entropy_ratio 0.6416
alert: long_tail · 5 singleton categories
alert: null_rate · 42.0% null
Fig 35.
Top values for NTYear.
Top values for NTYear (8 unique shown, of 8 total).
value count share
Yes 18 36.0%
2005 4 8.0%
1879-1989 2 4.0%
1990-2003 1 2.0%
1978-2022 1 2.0%
1380-2011 1 2.0%
2012 1 2.0%
1932 1 2.0%

RLG4 numeric other

RLG4 is a numeric column that is effectively unusable: 86% of its 50 rows are null, and every one of the remaining values equals 20.0 (min, median, max, std all confirm this). With a single distinct value and no variance, it carries no information for modelling.

Treatment: Drop the column; it is constant where present and 86% null.

anthropic:claude-opus-4-7 · confidence high
Out[101]:

saturn.columns["RLG4"].stats

stat value
n 50
nulls 43 (86.0%)
unique 1
min 20
max 20
mean 20
median 20
std 0
q1 20
q3 20
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate · 86.0% null
alert: constant · only one distinct value
Fig 36.
Distribution of RLG4. Vertical dash marks the median.
Histogram bins for RLG4 (median: 20.0).
bin count
19.5 – 19.7 0
19.7 – 19.9 0
19.9 – 20.1 7
20.1 – 20.3 0
20.3 – 20.5 0

AffinityBloc categorical metadata

AffinityBloc is a categorical grouping label, but every one of the 50 rows holds the same value, "Arab World". With cardinality 1 and entropy 0, this column carries no information for distinguishing records in this slice.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[104]:

saturn.columns["AffinityBloc"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value Arab World
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 37.
Top values for AffinityBloc.
Top values for AffinityBloc (1 unique shown, of 1 total).
value count share
Arab World 50 100.0%

NaturalName categorical label

NaturalName is a low-cardinality categorical label, likely an ethnic or linguistic group identifier, with 7 distinct values across 50 rows and no nulls. The distribution is heavily concentrated: 'Arab' alone covers 54% of rows, and together with 'Gulf-spoken Arab' (11) and 'Omani Arab' (8) accounts for 46 of 50 records, leaving four singleton categories in a long tail. Entropy ratio of 0.65 confirms the imbalance.

Treatment: Group the four singleton categories into an 'Other' bucket before encoding.

anthropic:claude-opus-4-7 · confidence high
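
Grouping the singleton categories can be sketched with a frequency threshold. This is a minimal sketch; the helper name lump_rare and the min_count threshold of 2 are assumptions, and the list of names is rebuilt from the counts in this section's value table rather than read from the file.

```python
from collections import Counter


def lump_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than min_count times with a single label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Rebuilt from the NaturalName value counts reported in this section.
names = (
    ["Arab"] * 27
    + ["Gulf-spoken Arab"] * 11
    + ["Omani Arab"] * 8
    + ["Air Tuareg", "Amri", "Emirati Arab", "Levantine Arab"]
)
lumped = Counter(lump_rare(names))
print(lumped)
```

With this threshold the seven categories collapse to four: the three frequent names plus an 'Other' bucket holding the four singleton rows.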
Out[107]:

saturn.columns["NaturalName"].stats

stat value
n 50
nulls 0 (0.0%)
unique 7
top_value Arab
top_rate 0.54
cardinality 7
entropy 1.835
entropy_ratio 0.6537
alert: long_tail · 4 singleton categories
Fig 38.
Top values for NaturalName.
Top values for NaturalName (7 unique shown, of 7 total).
value count share
Arab 27 54.0%
Gulf-spoken Arab 11 22.0%
Omani Arab 8 16.0%
Air Tuareg 1 2.0%
Amri 1 2.0%
Emirati Arab 1 2.0%
Levantine Arab 1 2.0%

PercentChristianPGAC categorical feature

This column appears to be a percentage (likely the Christian share of the people group across countries, the most plausible reading of PGAC) stored as strings, with only 5 distinct values across 50 rows. The distribution is heavily concentrated: '14.741' accounts for 56% of records, followed by '0.935' (12 rows) and '0.066' (8 rows), suggesting these are repeated group-level constants rather than per-row measurements. Just 5 unique values for what looks like a continuous percentage is suspicious and points to either aggregated/joined reference data or coarse bucketing.

Treatment: Cast to numeric and verify whether values are constants from a join key before using as a feature.

anthropic:claude-opus-4-7 · confidence medium
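
The check described in the treatment can be sketched as a group-by on a candidate key. This is a minimal sketch: the helper name values_per_key and the key column group_id are hypothetical (the report does not say which column identifies the group), and the rows are made up to show the shape of the check.

```python
from collections import defaultdict


def values_per_key(rows, key, column):
    """Map each key to the set of distinct float values seen in column."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(float(row[column]))  # cast the string to float
    return dict(seen)


# Hypothetical rows; "group_id" stands in for whichever key identifies the group.
rows = [
    {"group_id": "A", "PercentChristianPGAC": "14.741"},
    {"group_id": "A", "PercentChristianPGAC": "14.741"},
    {"group_id": "B", "PercentChristianPGAC": "0.935"},
]
per_key = values_per_key(rows, "group_id", "PercentChristianPGAC")
is_group_constant = all(len(vals) == 1 for vals in per_key.values())
print(per_key, is_group_constant)
```

If every key maps to exactly one value, the column is a group-level attribute carried in by a join and is better stored once per group than used as a per-row feature.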
Out[110]:

saturn.columns["PercentChristianPGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value 14.741
top_rate 0.56
cardinality 5
entropy 1.611
entropy_ratio 0.694
Fig 39.
Top values for PercentChristianPGAC.
Top values for PercentChristianPGAC (5 unique shown, of 5 total).
value count share
14.741 28 56.0%
0.935 12 24.0%
0.066 8 16.0%
0.200 1 2.0%
0.000 1 2.0%

PrimaryLanguageName categorical feature

This is a categorical column naming a primary language, dominated by Arabic dialects across 10 distinct values in 50 rows. Levantine Arabic leads at 18/50 (36%), followed by Gulf (13), Omani (8), and Mesopotamian (4); non-Arabic entries are rare (Swahili 2, plus singletons for Tamajeq and English). An entropy ratio of 0.75 indicates moderate concentration without a single overwhelming class, and there are no nulls.

Treatment: Group rare non-Arabic and minor dialects into an 'Other' bucket before one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[113]:

saturn.columns["PrimaryLanguageName"].stats

stat value
n 50
nulls 0 (0.0%)
unique 10
top_value Arabic, Levantine
top_rate 0.36
cardinality 10
entropy 2.501
entropy_ratio 0.7527
Fig 40.
Top values for PrimaryLanguageName.
Top values for PrimaryLanguageName (10 unique shown, of 10 total).
value count share
Arabic, Levantine 18 36.0%
Arabic, Gulf 13 26.0%
Arabic, Omani 8 16.0%
Arabic, Mesopotamian 4 8.0%
Swahili 2 4.0%
Tamajeq, Tayart 1 2.0%
Arabic, Sudanese 1 2.0%
English 1 2.0%
Arabic, Moroccan 1 2.0%
Arabic, Egyptian 1 2.0%

CountOfCountries numeric feature

Numeric count of countries with only 4 unique values across 50 rows, ranging from 1 to 28 with median 28 and mean 19.88. The distribution is concentrated at the maximum (median and Q3 both equal the max of 28): 28 of 50 rows sit at the ceiling, while the rest fall at 12 or below. Negative kurtosis (-1.52) and mild left skew (-0.43) are consistent with this clumped, multi-modal spread rather than a smooth distribution.

Treatment: Treat as a low-cardinality ordinal or bin into categorical buckets rather than as a continuous variable.

anthropic:claude-opus-4-7 · confidence high
Out[116]:

saturn.columns["CountOfCountries"].stats

stat value
n 50
nulls 0 (0.0%)
unique 4
min 1
max 28
mean 19.88
median 28
std 9.512
q1 12
q3 28
iqr 16
skew -0.4253
kurtosis -1.521
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 41.
Distribution of CountOfCountries. Vertical dash marks the median.
Histogram bins for CountOfCountries (median: 28.0).
bin count
1 – 4.857 2
4.857 – 8.714 8
8.714 – 12.57 12
12.57 – 16.43 0
16.43 – 20.29 0
20.29 – 24.14 0
24.14 – 28 28

PeopleID2 numeric foreign_key

PeopleID2 is stored as numeric but behaves like a categorical key, with only 3 unique values across 50 rows and an IQR of 0 because Q1, median, and Q3 all equal 111. With a zero IQR, every value other than 111 counts as an outlier; the 2 such rows, the larger at 307, pull the mean (115) above the median and produce extreme skew (6.85) and kurtosis (44.93). No nulls or zeros are present.

Treatment: Treat as a categorical identifier (not a numeric feature) and left-join on it rather than aggregating.

anthropic:claude-opus-4-7 · confidence high
Out[119]:

saturn.columns["PeopleID2"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
min 111
max 307
mean 115
median 111
std 27.71
q1 111
q3 111
iqr 0
skew 6.847
kurtosis 44.93
n_outliers 2
outlier_rate 0.04
zero_rate 0
alert: high_skew · skew=+6.85
Fig 42.
Distribution of PeopleID2. Vertical dash marks the median.
Histogram bins for PeopleID2 (median: 111.0).
bin count
111 – 139 49
139 – 167 0
167 – 195 0
195 – 223 0
223 – 251 0
251 – 279 0
279 – 307 1

Summary categorical free_text

Free-text ethnographic summaries describing people groups, with 21 unique values across 50 rows and a 60% top rate driven entirely by empty strings (30 of 50). The non-empty entries are long, prose paragraphs about Tuareg, Arab, and other groups, so this is descriptive content rather than a category. Entropy ratio of 0.61 and the long_tail alert confirm most non-empty values appear only once.

Treatment: Treat empty strings as missing and tokenize/embed the prose before any modelling.

anthropic:claude-opus-4-7 · confidence high
Out[122]:

saturn.columns["Summary"].stats

stat value
n 50
nulls 0 (0.0%)
unique 21
top_value (empty string)
top_rate 0.6
cardinality 21
entropy 2.7
entropy_ratio 0.6146
alert: long_tail · 20 singleton categories
Fig 43.
Top values for Summary.
Top values for Summary (20 unique shown, of 21 total).
value count share
(empty string) 30 60.0%
There are many Tuareg subgroups, including the Air Tuareg, named after their homeland in the Air Mountains. This area is part of the Sahel Desert, but the location is relatively wet, with enough water to sustain small herds. Though the Tuareg are known for being fierce desert warriors, the Air Tuareg hold a public festival which they open to outsiders. The festival features cultural events, camel rides and a market for locally made goods.12.0%
The Amri are one of the “Arabized” tribes in Sudan that have adopted Arab culture and the Islamic religion. Agriculture is the basis of the economy in most Arabized tribes. Sorghum and millet are their staple crops, including watermelons, gourds, okra, sesame, and cotton. They also raise livestock, making cheese and butter from cow and goat milk.12.0%
The (Coastal) Arabs live along the coasts of Tanzania and Kenya, in an area commonly known as the Coastal Belt. They are concentrated in some of the ancient settlements along the coast and in cities such as Dar es Salaam. These Arabs speak Arabiya, or Coast Arabic, which is an Arabic dialect. They also speak the regional language, Swahili. They interact with other Muslim communities but have their own culture and even their own cuisine.12.0%
U.A.E. society is one of the most diverse in the world, with workers from over 200 countries, including Oman. Omani Arabs go to the relatively prosperous UAE for jobs in the oil and service industries. They usually stay temporarily, though some are permanent.12.0%
Most of the Arabs in Canada are college students. They come to Canada to receive a good education in medicine and other high level studies. Their numbers are increasing, and Canadians are also beginning to attend universities in Arab countries. In addition to studying, they like to celebrate their own holidays. For example, Kuwaiti Arabs celebrate National Day, held on February 25-26. This is Kuwait's independence day, and some of them celebrate by building snowmen, something which they cannot do in hot, dry Kuwait.12.0%
There are two kinds of Arabs either passing through or living in Hungary. The poorer ones are trying to escape Iraq or Syria. Hungary is not their final destination; they are passing through in hopes of finding jobs in Germany or one of the Scandinavian countries. The other kind of Arab is much wealthier. There are Kuwaitis, Egyptian, and others purchasing land in places like Komlo and Pecs, Hungary.12.0%
The Kenyan Arabs live in fishing villages along the coasts of Tanzania and Kenya in an area known as the Coastal Belt. They are concentrated in some of the ancient settlements along the coast and in cities such as Mombasa and on Kenya's Lamu Islands. Their culture is still very similar to that of the first Arabs (desert nomads or Bedouins) and refer to their ancestors as the "true" Arabs. Virtually all Muslim, many believe that the Koran provides hope for a better life for now and for eternity.12.0%
Arab traders have been visiting the Philippines for about 2,000 years. Until around 1380 Syrian Arabs brought Christianity to the region along with pre-Islamic belief systems. Following 1380, Islam was the religion that most Arabs brought with them. Generally moving from the southern islands like Mindanao towards the northern ones, they converted the Filipinos to the Islamic religion. Many of these early Arabs married Filipina women. Those Filipinos with Arab parentage live primarily in Mindanao, while the more recent immigrants are living in Manila.12.0%
The Arab culture was developed by tribes of nomads and villagers who lived in the Arabian Desert. It was from there that Arab migrations throughout the Middle East and northern Africa began, leading to the expansion of the Arab world. Today, the region is home to a number of different types of Arabs. The Gulf Arabs (also known as the Saudi Arabs) live primarily along the southern edges of the Arabian Desert, though some have migrated to other Arabic-speaking countries like Egypt. They speak Arabiya or, as it is more commonly known, Gulf Arabic. Gulf Arabs in Egypt are usually from Saudi Arabia, a strict Islamic state. Alcohol and loose sexual behavior are strictly forbidden in Saudi Arabia, so they go to places like Egypt to indulge in these self-destructive activities.12.0%
Through the centuries, life has changed little for Gulf Arabs, who live in long tents made from woven goat or animal hair. Gulf Arabs live in the desert regions of Iran, herding their goats and sheep, traveling by camel from one oasis area to another. These nomadic people are very proud of their lifestyle and feel like it is a step down to farm or have any other job. Some of them have had to settle in Tehran.12.0%
Gulf Arabs are those from the countries bordering the Persian Gulf: Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, and the United Arab Emirates. The Arab states of the Persian Gulf share a regional culture that is sometimes referred to as "khaleeji (gulf) culture". They all speak Gulf Arabic and share similar music styles, cuisine, and dress. Most Arabs living near the Persian Gulf also trace their ancestry back to Arab tribes of either Najd (in what is now central Saudi Arabia) or Yemen. All the Arab States of the Persian Gulf have significant revenues from oil and gas and, with the exception of Saudi Arabia, have small local populations. Gulf Arabs today control much wealth.12.0%
The 19th century brought about a thriving trading community dominated by a couple of prominent merchant families that still hold much power in Kuwait today. Though there are many other Arabic speaking peoples in Kuwait, the native Kuwaitis are those who speak a Gulf language, of which Kuwaiti Arabic is a dialect. Since the development of the oil industry, many Kuwaiti Arab men are now bureaucrats, clerical technicians, industrialists, and other professionals. The oil economy has brought new communication systems, water systems and roads. The reformed educational system has led to one of the highest literacy rates in the region. Health care, affordable housing, and other social services have given Kuwaitis comfortable lives.12.0%
Gulf Arabic is the dialect of Arabic spoken mainly in the eastern half of Saudi Arabia and the nearby smaller countries such as Bahrain, Qatar and the UAE. It is also spoken by some in Yemen and Oman, two countries that border Saudi Arabia. Most Omanis speak the Omani dialect of Arabic, though some speak Gulf Arabic. Oman is a rural and under-populated nation on the Arabian Peninsula. Their economy has grown considerably, partly because of the discovery of oil deposits.12.0%
Qatar has been ruled as an absolute and hereditary emirate by the Al Thani family since the mid-19th century. Formerly one of the poorest Gulf states, it has become one of the region's wealthiest states due to its enormous oil and natural gas revenues. In 2010, Qatar had the world's highest GDP per capita, while the economy grew by 19%, the fastest in the world. With a small citizen population of fewer than 350,000 people, foreign workers outnumber native Qataris. Foreign expatriates come mainly from other Arab nations and the Indian subcontinent. Shari 'a (Islamic law) is the main source of Qatari legislation, and is applied to aspects of family law, inheritance, and certain criminal acts.12.0%
What is now Saudi Arabia is the birthplace of Islam and the Islamic prophet, Mohammed, who united Arabia under the banner of Islam and began to conquer lands as far west as Morocco and Spain and as far east as India. For hundreds of years, Arabia was inhabited by Bedouins. Little changed until 1902, when King Abdulaziz (aka, Ibn Saud) began to conquer the four regions of what is now Saudi Arabia. By 1932 the Kingdom of Saudi Arabia was founded by that same king. Saudi Arabia is also known as the "House of Saud" since it is ruled by one family.12.0%
In most cases Gulf Arabs have no need to leave their countries to live somewhere else as they are well provided for and so there are not many of them living in America. After World War II, many of the students from Gulf Arab countries came to America for higher education. The Gulf Arabs in America speak Arabic and English. Many live in California. Omanis are a small part of the overall population that speaks Gulf Arabic.12.0%
Gulf Arabic, also known as Arabiya, is the dialect of Arabic spoken mainly in the eastern half of Saudi Arabia and the nearby smaller countries such as Bahrain, Qatar and the UAE. It is also spoken by some in Yemen and Oman, two countries that border Saudi Arabia. Fewer people speak it in Yemen than one of the dialects collectively known as Yemeni Arabic.12.0%
Omani Arabs are one of the world's least-reached people groups. Most live in Oman in the Arabian Peninsula, but others live in places like Somalia or Kenya. Kenya is one of the only places Omanis live where there is freedom of religion and a Christian majority.12.0%
Omani Arabs were among the first people in the Middle East to accept Islam. Most Omani belong to the Ibadi sect of Islam, one of the religion's oldest and most traditional branches. Ibadi principles of puritanism (including reverence for the text of the Koran) and idealism have greatly influenced Arabs in neighboring countries as well. Family ties and religious traditions are strong.12.0%

Obstacles categorical free_text

Free-text commentary describing barriers to gospel outreach for various people groups, one paragraph per populated row. 30 of 50 rows (top_rate 0.6) are empty strings, and the remaining 20 entries are all unique, yielding 21 distinct values and an entropy_ratio of 0.61. Despite the categorical kind, the content is prose about Islam, identity, and mission access, not a bounded label set.

Treatment: Treat empty strings as missing and tokenize/embed the prose rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[125]:

saturn.columns["Obstacles"].stats

stat value
n 50
nulls 0 (0.0%)
unique 21
top_value (empty string)
top_rate 0.6
cardinality 21
entropy 2.7
entropy_ratio 0.6146
alert: long_tail · 20 singleton categories
Fig 44.
Top values for Obstacles.
Top values for Obstacles (20 unique shown, of 21 total).
value count share
(empty string) 30 60.0%
The Air Tuareg are a proud people. It will be hard for them to humbly accept that they are sinners in desperate need of a savior.12.0%
The majority of the Arabized tribes are not being ministered to by mission agencies. Bible portions, audio, and visual Bible resources exist in Arabic, the spoken language of Amri people in Sudan, however, they are not presently available to them.12.0%
Arabs in Tanzania are mostly Muslim. Many follow the teachings of the Koran because they believe it provides hope for a better life after death. In addition to their Islamic beliefs, some of the Arabs in Tanzania also believe in spirits. They try to appea12.0%
There is no spiritual alternative to Islam among the Gulf Arabs in Oman. They know nothing else.12.0%
There are two main obstacles to Arabs in Canada hearing and responding to the gospel. First of all, they identify with Islam as their heart religion, even if they don't practice it. Islam is a key part of their identity. Secondly, Arabs in Canada have large amounts of money, a situation which makes it hard for them to accept new ideas.12.0%
Islam offers a false hope that is presented to Muslims as the final, perfect answer to our spiritual problems. Muslims are taught that Christians follow a corrupted Bible.12.0%
Because of their strong adherence to Islam, Kenyan Arabs have been reluctant to accept the idea of Jesus as savior. They need the JESUS Film and radio broadcasts to be widely available.12.0%
The historical link between Arabs and the Islamic religion is very strong. Today, Arabs in the Philippines are Muslims.12.0%
As the prodigy of the founders of Islam, Gulf Arabs have a strong vested interest in remaining faithful to the Islamic political/religious system. They are among the most resistant people groups in the world. Gulf Arabs are often part of the most conservative forms of Islam like Wahhabism.12.0%
Gulf Arabs are very dedicated to the Islamic religious system that keeps them bound to works righteousness rather than the finished work of Christ.12.0%
Great wealth often lures people into a false sense of security, allowing them to think they don't need God.12.0%
Kuwaiti Arabs are so devoted to the Islamic religious system that they have not allowed themselves to consider the claims of Jesus Christ, the only one who can save mankind from sin and death.12.0%
The needs of the Omanis are mainly spiritual. They have very few chances to hear about the only Savior, and they have remained closed to ideas outside of their sect of Islam. Salvation by the grace of a sin-free Savior is a foreign and threatening concept12.0%
The great wealth of the Qatar Arabs is likely to blind them to their spiritual needs.12.0%
Those who speak Gulf Arabic in Saudi Arabia are usually not lacking in wealth, but they are completely closed to allowing Jesus Christ to transform their lives and give them life to the full. Instead, they are fully committed to Wahhabism, a conservative 12.0%
Islam is their religion and their identity. Most feel threatened by anyone suggesting that they need spiritual answers outside of Islam.12.0%
The Gulf Arabs in Yemen have very few opportunities to cling to the one who is the way, the truth, and the life. Few Christ followers remain in that land where civil war reigns.12.0%
devout, but one must remain loyal to the Islamic religious-political system. Giving allegiance to Jesus Christ would be unacceptable to Omani Arabs, even in Kenya.12.0%
Omani Arabs are averse to change in general, especially when they think it could threaten cultural traditions. The Omani Arabs are strongly committed to Islam. To follow Jesus in this environment would be difficult, as it would break their traditions of “puritanism.”12.0%
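
The treatment suggested for Obstacles (empty strings become missing, prose gets tokenized) could look roughly like the sketch below. The helper names `clean_text` and `tokenize` are hypothetical, and the regex tokenizer is only a stand-in for whatever tokenizer or embedding model is actually used.

```python
import re

def clean_text(value):
    """Map None, empty, or whitespace-only strings to None (missing)."""
    if value is None or not value.strip():
        return None
    return value.strip()

def tokenize(text):
    """Lowercase word tokens; a placeholder for a real tokenizer or embedder."""
    return re.findall(r"[a-z']+", text.lower())

# Illustrative rows: two blank sentinels and one value from the table above.
rows = [
    "",
    "The great wealth of the Qatar Arabs is likely to blind them to their spiritual needs.",
    "   ",
]
cleaned = [clean_text(r) for r in rows]
tokens = [tokenize(c) if c is not None else None for c in cleaned]
```

The same cleaning step applies to any other free-text column where the empty string acts as a missing-value sentinel.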

ROP2 categorical feature

ROP2 is a categorical column with only 3 distinct codes (C0013, C0219, C0019) across 50 rows and no nulls. The distribution is extremely lopsided: C0013 covers 96% of rows while the other two codes appear once each, yielding an entropy ratio of just 0.178. As a near-constant feature it carries almost no signal.

Treatment: Drop or collapse rare levels into 'other'; near-constant column unlikely to aid modelling.

anthropic:claude-opus-4-7 · confidence high
Out[128]:

saturn.columns["ROP2"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
top_value C0013
top_rate 0.96
cardinality 3
entropy 0.2823
entropy_ratio 0.1781
alert: long_tail 2 singleton categories
alert: imbalance top value is 96.0% of rows
Fig 45.
Top values for ROP2.
Top values for ROP2 (3 unique shown, of 3 total).
value count share
C0013 48 96.0%
C0219 1 2.0%
C0019 1 2.0%
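
As a worked check of the entropy figures for ROP2, the sketch below assumes that saturn reports Shannon entropy in bits and that entropy_ratio is that entropy divided by log2(cardinality). This is inferred from the numbers in the report, not from saturn's documentation. Under that assumption the counts 48/1/1 reproduce the reported 0.2823 and 0.1781 to four decimals.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Counts for C0013, C0219, C0019 taken from the ROP2 table.
counts = [48, 1, 1]
h = entropy_bits(counts)

# Assumed normalisation: divide by the maximum entropy for this cardinality.
ratio = h / math.log2(len(counts))
print(round(h, 4), round(ratio, 4))
```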

RLG3 numeric feature

RLG3 is a near-constant numeric feature: 49 of 50 rows take the value 6.0 (median, Q1, and Q3 all equal 6.0), with a single outlier at 1.0 producing extreme negative skew (-6.86) and kurtosis (45.02). With only 2 unique values and an IQR of 0, this behaves more like a degenerate flag than a continuous measurement.

Treatment: Drop or binarize as is_not_6; near-zero variance makes it useless for most models.

anthropic:claude-opus-4-7 · confidence high
Out[131]:

saturn.columns["RLG3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
min 1
max 6
mean 5.9
median 6
std 0.7071
q1 6
q3 6
iqr 0
skew -6.857
kurtosis 45.02
n_outliers 1
outlier_rate 0.02
zero_rate 0
alert: high_skew skew=-6.86
Fig 46.
Distribution of RLG3. Vertical dash marks the median.
Histogram bins for RLG3 (median: 6.0).
bin count
1 – 1.714 1
1.714 – 2.429 0
2.429 – 3.143 0
3.143 – 3.857 0
3.857 – 4.571 0
4.571 – 5.286 0
5.286 – 6 49

PercentEvangelical categorical feature

PercentEvangelical appears to be a numeric share of evangelicals stored as strings, with 18 distinct values across 50 rows and no nulls. The distribution is concentrated on small values: '0.000' and '0.500' tie for the mode at 8 occurrences each (top_rate 0.16), and 43 of 50 rows are at or below '1.000', while a thin tail reaches '5.000'. The high entropy_ratio (0.88) indicates the mass is spread fairly evenly across the 18 distinct values, and the long_tail alert reflects the 10 values that appear only once.

Treatment: Cast strings to float and treat as a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence medium
Out[134]:

saturn.columns["PercentEvangelical"].stats

stat value
n 50
nulls 0 (0.0%)
unique 18
top_value 0.000
top_rate 0.16
cardinality 18
entropy 3.686
entropy_ratio 0.884
alert: long_tail 10 singleton categories
Fig 47.
Top values for PercentEvangelical.
Top values for PercentEvangelical (18 unique shown, of 18 total).
value count share
0.000 8 16.0%
0.500 8 16.0%
0.200 6 12.0%
0.100 5 10.0%
0.300 4 8.0%
2.000 4 8.0%
1.000 3 6.0%
0.600 2 4.0%
0.800 1 2.0%
2.500 1 2.0%
5.000 1 2.0%
3.500 1 2.0%
0.020 1 2.0%
0.105 1 2.0%
0.532 1 2.0%
0.081 1 2.0%
0.090 1 2.0%
0.080 1 2.0%
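
The cast suggested for PercentEvangelical can be sketched as follows. The helper `to_float` is hypothetical, and mapping blanks or unparseable strings to None is an assumption about how bad values should be handled; the sample shows neither.

```python
def to_float(value):
    """Parse a numeric string; return None for blanks, None, or bad text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# A few values from the table above, plus two illustrative bad inputs.
raw = ["0.000", "0.500", "2.500", "5.000", "", None]
parsed = [to_float(v) for v in raw]
```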

LeastReached categorical feature

Binary Y/N flag, likely indicating whether each people group is classed as least reached. The class is imbalanced toward Y at 76% (38 of 50), with N covering the remaining 12 (24%). No nulls, and the entropy ratio of 0.80 reflects the moderate skew.

Treatment: Encode as a 0/1 indicator; consider class-imbalance handling if used as a target.

anthropic:claude-opus-4-7 · confidence high
Out[137]:

saturn.columns["LeastReached"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value Y
top_rate 0.76
cardinality 2
entropy 0.795
entropy_ratio 0.795
Fig 48.
Top values for LeastReached.
Top values for LeastReached (2 unique shown, of 2 total).
value count share
Y 38 76.0%
N 12 24.0%

Continent categorical feature

Categorical continent label with 6 distinct values across 50 rows and no nulls. Asia leads at 38% (19 rows), followed by Africa (14), Europe (10), and North America (5), while Australia and South America appear only once each. The entropy ratio of 0.80 shows a reasonable spread across the larger continents, with Asia and Africa together covering 66% of rows.

Treatment: One-hot encode; consider grouping Australia and South America into an 'Other' bucket given single-row counts.

anthropic:claude-opus-4-7 · confidence high
Out[140]:

saturn.columns["Continent"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
top_value Asia
top_rate 0.38
cardinality 6
entropy 2.067
entropy_ratio 0.7996
Fig 49.
Top values for Continent.
Top values for Continent (6 unique shown, of 6 total).
value count share
Asia 19 38.0%
Africa 14 28.0%
Europe 10 20.0%
North America 5 10.0%
Australia 1 2.0%
South America 1 2.0%
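
The Continent treatment (bucket single-row levels into 'Other', then one-hot encode) might be sketched like this. The helpers `collapse_rare` and `one_hot` and the `min_count=2` threshold are illustrative choices, not part of saturn.

```python
from collections import Counter

def collapse_rare(values, min_count=2, other="Other"):
    """Replace levels seen fewer than min_count times with a shared bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def one_hot(values):
    """Return the sorted level names and one 0/1 indicator row per value."""
    levels = sorted(set(values))
    return levels, [[int(v == level) for level in levels] for v in values]

# Rebuild the column from the counts in the table above.
values = (["Asia"] * 19 + ["Africa"] * 14 + ["Europe"] * 10
          + ["North America"] * 5 + ["Australia", "South America"])
collapsed = collapse_rare(values)
levels, matrix = one_hot(collapsed)
```

The same two steps would apply to other low-cardinality columns with singleton levels.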

JPScalePC categorical metadata

JPScalePC is a categorical column that holds the single value "1" across all 50 rows, giving it cardinality 1 and zero entropy. With a top_rate of 1.0 and no nulls, it carries no information and likely represents a constant flag or scale parameter that was never varied in this slice.

Treatment: Drop before modelling; the column is constant.

anthropic:claude-opus-4-7 · confidence high
Out[143]:

saturn.columns["JPScalePC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value 1
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 50.
Top values for JPScalePC.
Top values for JPScalePC (1 unique shown, of 1 total).
value count share
1 50 100.0%

JPScaleText categorical label

Categorical label describing a Joshua Project reach scale, with 4 distinct levels across 50 rows and no nulls. The distribution is heavily skewed: 'Unreached' covers 76% of rows, while 'Superficially Reached' appears just once, giving an entropy ratio of 0.54. Class imbalance will dominate any model trained on this field.

Treatment: Treat as ordinal categorical and address class imbalance (e.g., stratify or collapse rare levels) before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[146]:

saturn.columns["JPScaleText"].stats

stat value
n 50
nulls 0 (0.0%)
unique 4
top_value Unreached
top_rate 0.76
cardinality 4
entropy 1.08
entropy_ratio 0.5402
Fig 51.
Top values for JPScaleText.
Top values for JPScaleText (4 unique shown, of 4 total).
value count share
Unreached 38 76.0%
Minimally Reached 8 16.0%
Partially Reached 3 6.0%
Superficially Reached 1 2.0%
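
Treating JPScaleText as ordinal requires an explicit level order, which this report does not state. The sketch below assumes an order inferred by matching these row counts (38, 8, 3, 1) with the counts of the gauge-1 to gauge-4 images in JPScaleImageURL; it should be checked against the Joshua Project scale definition before use.

```python
# ASSUMPTION: rank order inferred from matching counts with the gauge-N images
# (gauge-1: 38, gauge-2: 8, gauge-4: 3, gauge-3: 1); not confirmed by the source.
SCALE_RANK = {
    "Unreached": 1,
    "Minimally Reached": 2,
    "Superficially Reached": 3,
    "Partially Reached": 4,
}

def scale_rank(label):
    """Return the assumed ordinal rank, or None for an unknown label."""
    return SCALE_RANK.get(label)

labels = ["Unreached", "Minimally Reached", "Partially Reached",
          "Superficially Reached", "Something Else"]
ranks = [scale_rank(label) for label in labels]
```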

SecurityLevel numeric feature

SecurityLevel takes only 3 distinct integer values across 50 rows (min 0, max 2, median 2), so it reads as an ordinal category encoded as a number rather than a continuous measure. The distribution is bimodal: 21 rows (42%) are 0, 27 rows (54%) are 2, and only 2 rows are 1. The strongly negative kurtosis (-1.90) is consistent with this two-peaked shape, and no outliers are flagged.

Treatment: Treat as an ordinal category (0/1/2) or one-hot encode before modelling rather than using as a continuous numeric.

anthropic:claude-opus-4-7 · confidence high
Out[149]:

saturn.columns["SecurityLevel"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
min 0
max 2
mean 1.12
median 2
std 0.9823
q1 0
q3 2
iqr 2
skew -0.2416
kurtosis -1.899
n_outliers 0
outlier_rate 0
zero_rate 0.42
Fig 52.
Distribution of SecurityLevel. Vertical dash marks the median.
Histogram bins for SecurityLevel (median: 2.0).
bin count
0 – 0.2857 21
0.2857 – 0.5714 0
0.5714 – 0.8571 0
0.8571 – 1.143 2
1.143 – 1.429 0
1.429 – 1.714 0
1.714 – 2 27

LRTop100 categorical feature

This is a categorical flag (likely 'is in LR Top 100') that takes the single value 'N' across all 50 rows. With cardinality 1 and entropy 0.0, it carries no information for any downstream model. The 'imbalance' alert here reflects total constancy rather than skew.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[152]:

saturn.columns["LRTop100"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value N
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 53.
Top values for LRTop100.
Top values for LRTop100 (1 unique shown, of 1 total).
value count share
N 50 100.0%

PrimaryReligionPGAC categorical metadata

This column records the primary religion classification (PGAC), but every one of the 50 rows holds the single value "Islam". With cardinality of 1 and entropy of 0.0, it carries no information for modelling or segmentation.

Treatment: Drop; constant column with zero variance.

anthropic:claude-opus-4-7 · confidence high
Out[155]:

saturn.columns["PrimaryReligionPGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value Islam
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 54.
Top values for PrimaryReligionPGAC.
Top values for PrimaryReligionPGAC (1 unique shown, of 1 total).
value count share
Islam 50 100.0%

PCNonReligious numeric feature

Likely a percentage of non-religious population per row, with 49 non-null values (one missing, 2.0%) and only 6 distinct values. The distribution is dominated by zeros (zero_rate 0.755, or 37 of 49) with median and IQR both 0, yet a right tail pushes the max to 10.0. Because the IQR is 0, every nonzero value falls beyond the 1.5 IQR fences, which is why all 12 nonzero rows (24.5%) are flagged as outliers. A skew of 1.89 confirms a sparse, heavily right-skewed feature.

Treatment: Treat as sparse; consider a zero/non-zero binary flag plus log1p transform before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[158]:

saturn.columns["PCNonReligious"].stats

stat value
n 50
nulls 1 (2.0%)
unique 6
min 0
max 10
mean 1.255
median 0
std 2.45
q1 0
q3 0
iqr 0
skew 1.892
kurtosis 2.849
n_outliers 12
outlier_rate 0.2449
zero_rate 0.7551
alert: outliers 24.5% rows beyond 1.5 IQR
Fig 55.
Distribution of PCNonReligious. Vertical dash marks the median.
Histogram bins for PCNonReligious (median: 0.0).
bin count
0 – 1.429 37
1.429 – 2.857 1
2.857 – 4.286 4
4.286 – 5.714 5
5.714 – 7.143 0
7.143 – 8.571 1
8.571 – 10 1
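
The sparse-feature treatment for PCNonReligious (a zero/non-zero flag plus a log1p transform) could be sketched as below. The helper name `sparse_features` is hypothetical, and passing missing values through as None is an assumption; an imputation step could be used instead.

```python
import math

def sparse_features(value):
    """Return (is_nonzero, log1p(value)); None inputs stay missing."""
    if value is None:
        return None, None
    return int(value > 0), math.log1p(value)

# The column's min (0) and max (10), plus one missing row as in the stats.
features = [sparse_features(v) for v in (0, 10, None)]
```

The flag captures whether any non-religious share is recorded at all, while log1p compresses the right tail without failing on zeros.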

PhotoCreditURL categorical metadata

This column holds photo credit URLs (Pixabay and Flickr links), but with only 3 unique values across 50 rows it functions as a coarse source tag rather than a per-record citation. The top URL appears in 28 rows, which is 58.3% of the 48 non-null rows (56% of all 50), suggesting the same stock photo is reused widely. Null rate is 4% (2 rows).

Treatment: Drop or retain as a low-cardinality source label; not useful as a modelling feature.

anthropic:claude-opus-4-7 · confidence high
Out[161]:

saturn.columns["PhotoCreditURL"].stats

stat value
n 50
nulls 2 (4.0%)
unique 3
top_value https://pixabay.com/photos/people-students-walk-street-muslim-6284192/
top_rate 0.5833
cardinality 3
entropy 1.384
entropy_ratio 0.8735
Fig 56.
Top values for PhotoCreditURL.
Top values for PhotoCreditURL (3 unique shown, of 3 total).
value count share
https://pixabay.com/photos/people-students-walk-street-muslim-6284192/ 28 56.0%
https://pixabay.com/photos/portrait-man-face-male-person-4695272/ 12 24.0%
https://flickr.com/photos/charlesfred/5150706650 8 16.0%
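
For PhotoCreditURL the stats report top_rate 0.5833 while the table shows a 56.0% share for the same value. The two figures are consistent if top_rate is computed over non-null rows and share over all rows; that reading is inferred from the numbers rather than documented. A small check, with a hypothetical helper name:

```python
def rates(count, n_rows, n_nulls):
    """Return (rate over non-null rows, share over all rows)."""
    return count / (n_rows - n_nulls), count / n_rows

# Top PhotoCreditURL value: 28 rows, 50 rows in total, 2 nulls.
top_rate, share = rates(28, 50, 2)
print(round(top_rate, 4), round(share, 2))
```

The same distinction matters for any column with nulls, where top_rate and the table share will differ.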

PhotoCreativeCommons categorical feature

Binary Y/N flag indicating whether a photo carries a Creative Commons license, with no nulls across 50 rows. The distribution is heavily skewed toward 'N' at 84% (42 of 50), leaving only 8 records flagged 'Y'.

Treatment: Encode as a boolean indicator; note class imbalance if used as a predictor.

anthropic:claude-opus-4-7 · confidence high
Out[164]:

saturn.columns["PhotoCreativeCommons"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.84
cardinality 2
entropy 0.6343
entropy_ratio 0.6343
Fig 57.
Top values for PhotoCreativeCommons.
Top values for PhotoCreativeCommons (2 unique shown, of 2 total).
value count share
N 42 84.0%
Y 8 16.0%

PrayForPG categorical free_text

Free-text prayer prompts for unreached people groups, with 60% of the 50 rows being empty strings and the remaining 20 entries each unique multi-sentence paragraphs. The dominance of the blank value (top_rate 0.60) coexists with very high textual diversity among non-blank rows, hence the long_tail alert and entropy_ratio of 0.61. No nulls are recorded, but the empty string is functioning as a missing-value sentinel.

Treatment: Treat empty strings as missing and tokenize/embed the remaining prose if used as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[167]:

saturn.columns["PrayForPG"].stats

stat value
n 50
nulls 0 (0.0%)
unique 21
top_value (empty string)
top_rate 0.6
cardinality 21
entropy 2.7
entropy_ratio 0.6146
alert: long_tail 20 singleton categories
Fig 58.
Top values for PrayForPG.
Top values for PrayForPG (20 unique shown, of 21 total).
value count share
(empty string) 30 60.0%
Pray the festivals would flourish and become a means for outsiders to take Christian materials to the Air Tuareg people in Niger. Pray for the Lord to speak to Air Tuareg decision makers through dreams and visions. Pray for spiritual humility and a desire to come close to the Holy One. 1 2.0%
Ask the Lord to call people who are willing to go to Sudan and share Christ with the Amri. Pray that the Christians of South Sudan will be compelled to take the gospel to the unreached peoples. Ask the Lord to raise up Christian medical teams to work among the Arabized tribes. Pray that Amri communities and families will be transformed with the gospel, growing roots downward and planting strong churches that will plant other churches. 1 2.0%
Pray for the Lord to thrust out workers to the Arabs in Tanzania. Pray for Arab persons of peace to welcome Christ's ambassadors. Pray for Arabs in Tanzania to understand they can never gain God's favor apart from the finished work of Jesus Christ. 1 2.0%
Ask the Holy Spirit to call people who are willing to share the love of Christ with Gulf Arabs. Ask the Lord of the harvest to open the doors of the region to the preaching of the gospel. Ask God to raise up prayer teams who will begin breaking up the soil through worship and intercession. Pray that strong local churches will be raised up among Gulf Arabs. 1 2.0%
Pray that despite their wealth and relatively easy life that Arabs will knock, seek, and find the Savior. Pray for strong friendships to develop between Muslims and Canadian believers that will lead to opportunities to share Christ. Pray that Muslims will use the freedom of Canada to find out about Jesus. 1 2.0%
Pray for Arab leaders to have dreams and visions of Jesus Christ, the only Savior. Pray for a Disciple-Making movement to emerge among Arabs in Hungary. 1 2.0%
Pray for the Lord to thrust out workers to the Kenyan Arabs and for persons of peace to welcome them. Pray for a movement to Christ among the Arabs of Kenya this decade. 1 2.0%
Pray for spiritual discernment and hunger among the Muslim Arabs in the Philippines. Pray for the Lord to send out Holy Spirit anointed workers to them. Pray for Arabs to have dreams of Jesus that will lead them to the cross. Pray for a disciple making movement to flourish among Arabs in the Philippines. 1 2.0%
Ask God to raise up faithful prayer teams who will begin breaking up the soil through worship and intercession. Ask the Holy Spirit to call people who are willing to go and share the love of Christ with the Gulf Arabs in Egypt. Pray that strong local churches would be raised up among the Gulf Arabs in Egypt. 1 2.0%
Pray for God to send dreams and revelations of himself to the nomadic Gulf Arabs so that their hearts will be open and ready to accept the good news when they hear it. Pray for God to call special people to this challenge to reach these people with the good news. Pray that Tehran would be the home of a Disciple Making Movement for Arabs. 1 2.0%
Please pray wealthy Gulf Arabs will be given an awareness of how much they need forgiveness of sin, and will be led to the humility needed to accept the free gift of eternal life found through faith in Christ. 1 2.0%
Pray that this will be the decade where there is an unstoppable movement to Christ among Kuwait's Muslims. Pray for Kuwaiti Muslims to understand that they cannot be saved apart from a sin-free Savior. Pray for the Holy Spirit to anoint and send out Christ's ambassadors to Kuwaitis. 1 2.0%
Pray for open hearts and minds to the ways of Christ among the Gulf speaking Arabs in Oman. Pray for the Lord to raise up persons of peace to welcome Christ's ambassadors to Omani families. Pray for the Lord to send His appointed workers to take Christ to this highly unreached people group. Pray that there will be an unstoppable movement to Christ among every people group in Oman. 1 2.0%
Pray the enormously wealthy Qatar Arabs will use their wealth to benefit many people, not merely themselves. Pray they will understand that riches are dangerous: they can prevent people from seeing their spiritual need. Pray they will be given the gift of perceiving their spiritual needs. 1 2.0%
Pray for a Disciple Making Movement to flourish among these staunch Muslims. Pray for open hearts and minds to the life-changing ways of Jesus. Pray for believers to find success in getting the JESUS Film and other materials to Gulf speaking Saudis. 1 2.0%
Pray for the Lord to thrust out workers who love the Lord and the people of Oman. Pray that Gulf Arabs will read the Bible and give their lives to Jesus Christ. Pray that God will give them dreams and visions leading them to salvation. Pray for Omani Arabs to embrace the only savior and share him with their friends and family in Oman. 1 2.0%
Ask the Holy Spirit to call people who are willing to go and share the love of Christ with Gulf Arabs. Ask God to raise up prayer teams who will begin breaking up the soil in Yemen through worship and intercession. Pray for a strong church planting movement among Gulf Arabs. Pray for leaders among the Gulf Arabs to welcome Christ's ambassadors into their communities so they can share the riches of his blessings. 1 2.0%
Ask God to give the few known Omani believers living in Kenya opportunities to share the gospel with their own people. Ask God to soften the hearts of the Omani Arab to the gospel as it is presented to them. Ask God to raise up prayer teams who will begin breaking up the soil through intercession. Pray that strong local churches will be raised up among the Omani Arab in Kenya that will lead to Discipleship Movements. 1 2.0%
Pray that a strong movement to Jesus would bring whole Omani families and communities into a rich experience of God's blessings. Pray for the Lord to send dreams and visions to Omani Arab family leaders, opening entire clans to the grace of God. Pray for the few followers of Christ among the Omani Arabs to find each other and fellowship together. Pray they would be faithful witnesses to the goodness of Christ to their family and friends. 1 2.0%

PeopleGroupPhotoURL categorical metadata

This column holds URLs to people-group profile photos hosted on joshuaproject.net, one per row with no nulls. Despite 50 rows, only 5 distinct images appear, and a single photo (p10375.jpg) covers 56% of records while the top three URLs account for 48 of 50 — suggesting many rows share the same people-group identity rather than being unique entities.

Treatment: Drop for modelling; retain as a display asset or join key to a people-group lookup.

anthropic:claude-opus-4-7 · confidence high
Out[170]:

saturn.columns["PeopleGroupPhotoURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value https://joshuaproject.net/assets/media/profiles/photos/p10375.jpg
top_rate 0.56
cardinality 5
entropy 1.611
entropy_ratio 0.694
Fig 59.
Top values for PeopleGroupPhotoURL.
Top values for PeopleGroupPhotoURL (5 unique shown, of 5 total).
value count share
https://joshuaproject.net/assets/media/profiles/photos/p10375.jpg 28 56.0%
https://joshuaproject.net/assets/media/profiles/photos/p10376.jpg 12 24.0%
https://joshuaproject.net/assets/media/profiles/photos/p10378.jpg 8 16.0%
https://joshuaproject.net/assets/media/profiles/photos/p10208.jpg 1 2.0%
https://joshuaproject.net/assets/media/profiles/photos/p10301.jpg 1 2.0%

ROG2 categorical feature

ROG2 looks like a regional grouping code with 6 categories (ASI, AFR, EUR, NAR, AUS, LAM), fully populated across all 50 rows. Distribution is moderately concentrated: ASI leads at 38% (19 rows) and AFR follows at 28% (14 rows), while AUS and LAM appear only once each. Its counts, entropy (2.067), and entropy ratio (0.80) are identical to those of Continent, so the two columns are likely redundant encodings of the same grouping.

Treatment: One-hot encode, but consider collapsing AUS and LAM into an 'Other' bucket given single-row support.

anthropic:claude-opus-4-7 · confidence high
Out[173]:

saturn.columns["ROG2"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
top_value ASI
top_rate 0.38
cardinality 6
entropy 2.067
entropy_ratio 0.7996
Fig 60.
Top values for ROG2.
Top values for ROG2 (6 unique shown, of 6 total).
value count share
ASI 19 38.0%
AFR 14 28.0%
EUR 10 20.0%
NAR 5 10.0%
AUS 1 2.0%
LAM 1 2.0%

PhotoCCVersionText categorical metadata

This is a categorical license-text field for photo Creative Commons versioning, with only 2 distinct values across 50 rows. 84% of entries are empty strings (42/50), and the remaining 8 carry 'CC BY-NC-SA 2.0' — there are no nulls, just blanks standing in for missing licenses. Entropy ratio of 0.63 reflects this binary, heavily imbalanced split.

Treatment: Recode empty strings to missing and collapse to a binary has_license flag.

anthropic:claude-opus-4-7 · confidence high
Out[176]:

saturn.columns["PhotoCCVersionText"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value (empty string)
top_rate 0.84
cardinality 2
entropy 0.6343
entropy_ratio 0.6343
Fig 61.
Top values for PhotoCCVersionText.
Top values for PhotoCCVersionText (2 unique shown, of 2 total).
value count share
(empty string) 42 84.0%
CC BY-NC-SA 2.0 8 16.0%

Longitude numeric feature

Geographic longitude coordinates spanning -118.3 to 151.2, covering most of the globe's east-west range with all 50 values unique. The distribution is left-skewed (skew -0.77) with a median of 39.4 sitting well above the mean of 27.5; the middle half of the rows lies between 10.7 and 51.2 degrees east. The 7 outliers (14%) are locations far to the west or east of that central band.

Treatment: Pair with Latitude as a 2D geospatial feature; avoid scaling independently.

anthropic:claude-opus-4-7 · confidence high
Out[179]:

saturn.columns["Longitude"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
min -118.3
max 151.2
mean 27.55
median 39.45
std 52.08
q1 10.72
q3 51.19
iqr 40.47
skew -0.7666
kurtosis 1.346
n_outliers 7
outlier_rate 0.14
zero_rate 0
alert: outliers 14.0% rows beyond 1.5 IQR
Fig 62.
Distribution of Longitude. Vertical dash marks the median.
Histogram bins for Longitude (median: 39.44568851784755).
bin count
-118.3 – -79.81 3
-79.81 – -41.31 3
-41.31 – -2.806 2
-2.806 – 35.7 16
35.7 – 74.2 21
74.2 – 112.7 3
112.7 – 151.2 2
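
One common way to keep Longitude and Latitude together as a single geospatial feature, rather than scaling each independently, is to project the pair onto the unit sphere. This is a swapped-in illustration, not something saturn prescribes; the function name is hypothetical, and a Latitude column in degrees is assumed to exist alongside Longitude.

```python
import math

def to_unit_sphere(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to (x, y, z) on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )

# Longitudes of +180 and -180 describe the same meridian, so their
# projections coincide even though the raw values are 360 apart.
east = to_unit_sphere(0.0, 180.0)
west = to_unit_sphere(0.0, -180.0)
```

The three resulting coordinates are already bounded in [-1, 1], so no further scaling is needed, and distances between points no longer break at the antimeridian.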

JPScaleImageURL categorical feature

This column holds URLs to Joshua Project gauge images (gauge-1 through gauge-4), almost certainly a visual encoding of an ordinal progress/status scale. Distribution is heavily skewed: 76% point to gauge-1.png, while gauge-3.png appears only once across 50 rows. With just 4 unique values and no nulls, it's a low-cardinality categorical masquerading as a URL.

Treatment: Extract the gauge digit (1-4) and treat as an ordinal feature rather than keeping the URL string.

anthropic:claude-opus-4-7 · confidence high
Out[182]:

saturn.columns["JPScaleImageURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 4
top_value https://joshuaproject.net/assets/img/gauge/gauge-1.png
top_rate 0.76
cardinality 4
entropy 1.08
entropy_ratio 0.5402
Fig 63.
Top values for JPScaleImageURL.
Top values for JPScaleImageURL (4 unique shown, of 4 total).
value count share
https://joshuaproject.net/assets/img/gauge/gauge-1.png 38 76.0%
https://joshuaproject.net/assets/img/gauge/gauge-2.png 8 16.0%
https://joshuaproject.net/assets/img/gauge/gauge-4.png 3 6.0%
https://joshuaproject.net/assets/img/gauge/gauge-3.png 1 2.0%
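
Extracting the gauge digit from JPScaleImageURL, as the treatment suggests, can be sketched with a regular expression. The pattern assumes every URL ends in `gauge-<digits>.png`, which holds for the four values shown above but is not guaranteed for the full dataset; the helper name is hypothetical.

```python
import re

# Assumed URL shape: ".../gauge-<digits>.png" at the end of the string.
GAUGE_RE = re.compile(r"gauge-(\d+)\.png$")

def gauge_level(url):
    """Return the gauge number as an int, or None if the URL does not match."""
    match = GAUGE_RE.search(url or "")
    return int(match.group(1)) if match else None

urls = [
    "https://joshuaproject.net/assets/img/gauge/gauge-1.png",
    "https://joshuaproject.net/assets/img/gauge/gauge-4.png",
    "",
]
levels = [gauge_level(u) for u in urls]
```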

OfficialLang categorical feature

This is a categorical column naming an official language, with 21 distinct values across 50 rows and no nulls. The distribution is heavily skewed: 'Arabic, Standard' alone covers 36% of rows, followed by English (8) and French (5), while a long tail of languages like Swahili, Ukrainian, Sinhala and Bulgarian appear only once. Entropy ratio of 0.77 confirms concentration at the top despite the wide vocabulary.

Treatment: Group rare languages into an 'Other' bucket before one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[185]:

saturn.columns["OfficialLang"].stats

stat value
n 50
nulls 0 (0.0%)
unique 21
top_value Arabic, Standard
top_rate 0.36
cardinality 21
entropy 3.39
entropy_ratio 0.7719
alert: long_tail 17 singleton categories
Fig 64.
Top values for OfficialLang.
Top values for OfficialLang (20 unique shown, of 21 total).
value count share
Arabic, Standard 18 36.0%
English 8 16.0%
French 5 10.0%
Somali 2 4.0%
Swahili 1 2.0%
Ukrainian 1 2.0%
German, Standard 1 2.0%
Portuguese 1 2.0%
Bulgarian 1 2.0%
Sinhala 1 2.0%
Spanish 1 2.0%
Hungarian 1 2.0%
Indonesian 1 2.0%
Luxembourgish 1 2.0%
Macedonian 1 2.0%
Maltese 1 2.0%
Malay 1 2.0%
Urdu 1 2.0%
Tagalog 1 2.0%
Persian, Iranian 1 2.0%

PhotoPermission categorical feature

Binary Y/N flag indicating whether photo permission has been granted, with no missing values across 50 rows. The distribution is severely imbalanced: 'N' covers 49 of 50 rows (top_rate 0.98) with only a single 'Y', yielding entropy_ratio of 0.14. With one positive case, this column carries almost no discriminative signal in the current sample.

Treatment: Drop or hold aside as a near-constant flag until more 'Y' cases accumulate.

anthropic:claude-opus-4-7 · confidence high
Out[188]:

saturn.columns["PhotoPermission"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.98
cardinality 2
entropy 0.1414
entropy_ratio 0.1414
alert: imbalance top value is 98.0% of rows
Fig 65.
Top values for PhotoPermission.
Top values for PhotoPermission (2 unique shown, of 2 total).
value count share
N 49 98.0%
Y 1 2.0%

PCHinduism numeric feature

PCHinduism is likely a percentage of Hindu adherents per row, with one missing value and only 3 distinct values among the 49 non-null rows. The distribution is extremely sparse and right-skewed (skew 5.96, kurtosis 35.36): 47 rows (95.9%) are zero, and the two nonzero rows, including the max of 6.0, are both flagged as outliers against a median and IQR of 0. Effectively a near-constant feature with rare nonzero spikes.

Treatment: Binarize to zero/nonzero or drop, given 95.9% zeros and only 3 unique values.

anthropic:claude-opus-4-7 · confidence high
Out[191]:

saturn.columns["PCHinduism"].stats

stat value
n 50
nulls 1 (2.0%)
unique 3
min 0
max 6
mean 0.1633
median 0
std 0.8978
q1 0
q3 0
iqr 0
skew 5.957
kurtosis 35.36
n_outliers 2
outlier_rate 0.04082
zero_rate 0.9592
alert: high_skew skew=+5.96
Fig 66.
Distribution of PCHinduism. Vertical dash marks the median.
Histogram bins for PCHinduism (median: 0.0).
bin count
0 – 0.8571 47
0.8571 – 1.714 0
1.714 – 2.571 1
2.571 – 3.429 0
3.429 – 4.286 0
4.286 – 5.143 0
5.143 – 6 1

PeopleID3 numeric foreign_key

PeopleID3 is a numeric identifier-like field with only 5 unique values across 50 rows, clustered tightly around 10375-10376 (IQR of 1.0). The distribution is severely left-skewed (skew -5.59, kurtosis 31.2) because two rows sit far below the bulk, one at the min of 10208 and one near 10300, while the other 48 rows fall in the top histogram bin. The 20% outlier rate is mostly an artifact of the tiny IQR: the 1.5 IQR fences are 10373.5 and 10377.5, so even the max of 10378 is flagged. Despite being typed as numeric, the low cardinality suggests this behaves as a categorical key rather than a measurement.

Treatment: Treat as a categorical identifier; do not use as a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence high
Out[194]:

saturn.columns["PeopleID3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
min 10,208
max 10,378
mean 1.037e+04
median 10,375
std 25.8
q1 10,375
q3 10,376
iqr 1
skew -5.593
kurtosis 31.24
n_outliers 10
outlier_rate 0.2
zero_rate 0
alert: high_skew skew=-5.59
alert: outliers 20.0% rows beyond 1.5 IQR
Fig 67.
Distribution of PeopleID3. Vertical dash marks the median.
Histogram bins for PeopleID3 (median: 10375.0).
bin count
1.021e+04 – 1.023e+04 1
1.023e+04 – 1.026e+04 0
1.026e+04 – 1.028e+04 0
1.028e+04 – 1.031e+04 1
1.031e+04 – 1.033e+04 0
1.033e+04 – 1.035e+04 0
1.035e+04 – 1.038e+04 48
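
The outlier alert for PeopleID3 refers to rows "beyond 1.5 IQR". Assuming this means the usual Tukey fences (q1 - 1.5 * IQR and q3 + 1.5 * IQR), the reported quartiles give a very narrow acceptance band, which helps explain how an ID column reaches a 20% outlier rate. A minimal sketch with a hypothetical helper:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (low, high) fences for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 as reported for PeopleID3.
low, high = tukey_fences(10375, 10376)

# Check the reported min, q1, q3, and max against the fences.
candidates = [10208, 10375, 10376, 10378]
flagged = [v for v in candidates if v < low or v > high]
```

With fences this tight, the rule is measuring how the IDs happen to be numbered rather than anything unusual about the rows.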

PeopleID1 numeric other

PeopleID1 is flagged as constant: every one of the 50 rows holds the value 10, with zero variance and a single unique value. Despite the 'ID' name, it carries no identifying information and cannot distinguish records. There is no null or outlier activity to interpret.

Treatment: Drop, constant column with no signal.

anthropic:claude-opus-4-7 · confidence high
Out[197]:

saturn.columns["PeopleID1"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
min 10
max 10
mean 10
median 10
std 0
q1 10
q3 10
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant only one distinct value
Fig 68.
Distribution of PeopleID1. Vertical dash marks the median.
Histogram bins for PeopleID1 (median: 10.0).
bin count
9.5 – 9.643 0
9.643 – 9.786 0
9.786 – 9.929 0
9.929 – 10.07 50
10.07 – 10.21 0
10.21 – 10.36 0
10.36 – 10.5 0

SpeakNationalLang unknown feature

Saturn skipped this column, so no profiling stats were computed beyond row count (50) and a null rate of 0.0. The name 'SpeakNationalLang' suggests a binary or categorical indicator of whether a people group speaks the national language, but kind is 'unknown' and n_unique is missing, so the actual value distribution cannot be confirmed from this evidence.

Treatment: Re-profile or manually inspect to determine dtype before use; if binary, encode as 0/1.

anthropic:claude-opus-4-7 · confidence low
Out[200]:

saturn.columns["SpeakNationalLang"].stats

stat value
n 50
nulls 0 (0.0%)
unique (missing)
alert: skipped no profiler for kind=unknown

PortionsYear categorical feature

PortionsYear is a low-cardinality categorical with 9 unique values across 50 rows and a 16% null rate. Most entries are year ranges: "1939-2021" accounts for 18 rows (42.9% of the 42 non-null rows, 36% of all rows) and "2009-2024" for 13. However, 4 rows contain the string "Yes", indicating a mixed schema where a boolean answer was recorded in a date-range field. The entropy ratio of 0.70 and the long-tail alert reflect five singleton ranges alongside two dominant values.

Treatment: Clean the type mismatch (separate "Yes" from year ranges), then parse ranges into start/end year numerics before modelling.

anthropic:claude-opus-4-7 · confidence high
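The cleaning step described in the treatment can be sketched as follows. The function name `parse_portions_year` and the output shape are hypothetical, and reading a year range as "portions exist" is an assumption about what the field means.

```python
import re

# A value such as "1939-2021": two four-digit years joined by a hyphen.
RANGE_RE = re.compile(r"^(\d{4})-(\d{4})$")

def parse_portions_year(raw):
    """Split a raw PortionsYear value into (has_portions, start_year, end_year)."""
    if raw is None or raw.strip() == "":
        return (None, None, None)            # missing value
    text = raw.strip()
    match = RANGE_RE.match(text)
    if match:
        # Assumption: a recorded year range implies that portions exist.
        return (True, int(match.group(1)), int(match.group(2)))
    if text.lower() == "yes":
        return (True, None, None)            # flag recorded without dates
    return (None, None, None)                # unrecognised; review manually

samples = ["1939-2021", "2009-2024", "Yes", "1530-1995", None]
parsed = [parse_portions_year(s) for s in samples]
for raw, row in zip(samples, parsed):
    print(raw, row)
```

Keeping the boolean separate from the two year columns means the four "Yes" rows stay usable as a flag without polluting the numeric start and end years.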
Out[202]:

saturn.columns["PortionsYear"].stats

stat value
n 50
nulls 8 (16.0%)
unique 9
top_value 1939-2021
top_rate 0.4286
cardinality 9
entropy 2.222
entropy_ratio 0.7009
alert: long_tail 5 singleton categories
Fig 69.
Top values for PortionsYear.
Top values for PortionsYear (9 unique shown, of 9 total).
value count share
1939-2021 18 36.0%
2009-2024 13 26.0%
Yes 4 8.0%
1868-1968 2 4.0%
1934-1998 1 2.0%
1927-1964 1 2.0%
1530-1995 1 2.0%
1902-1952 1 2.0%
1905-2007 1 2.0%

PrimaryReligionPC categorical feature

This column records the primary religion of a people-cluster (PC), but every one of the 50 rows holds the value "Islam". With cardinality 1 and entropy 0, it carries no information for distinguishing records in this slice.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[205]:

saturn.columns["PrimaryReligionPC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value Islam
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 70.
Top values for PrimaryReligionPC.
Top values for PrimaryReligionPC (1 unique shown, of 1 total).
value count share
Islam 50 100.0%

PCUnknown numeric feature

PCUnknown is a numeric column that is effectively constant: every one of the 49 non-null observations is exactly 0, and 2% of rows are null. There is no variance, no spread, and no outliers, so the column carries no information as currently populated.

Treatment: Drop; constant column with zero variance.

anthropic:claude-opus-4-7 · confidence high
Out[208]:

saturn.columns["PCUnknown"].stats

stat value
n 50
nulls 1 (2.0%)
unique 1
min 0
max 0
mean 0
median 0
std 0
q1 0
q3 0
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 1
alert: constant only one distinct value
Fig 71.
Distribution of PCUnknown. Vertical dash marks the median.
Histogram bins for PCUnknown (median: 0.0).
bin count
-0.5 – -0.3571 0
-0.3571 – -0.2143 0
-0.2143 – -0.07143 0
-0.07143 – 0.07143 49
0.07143 – 0.2143 0
0.2143 – 0.3571 0
0.3571 – 0.5 0

ProfileTextExists categorical feature

Binary Y/N flag indicating whether a profile text exists, with no nulls across 50 rows. The distribution is heavily imbalanced: 'Y' covers 45 of 50 (top_rate 0.9) versus only 5 'N', giving an entropy ratio of 0.47.

Treatment: Encode as a 0/1 boolean; watch for low signal given the 90/10 imbalance.

anthropic:claude-opus-4-7 · confidence high
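The entropy figures in this report are consistent with Shannon entropy in bits, and entropy_ratio with that entropy divided by log2 of the cardinality. Saturn's exact definition is not stated here, so the sketch below is an assumption that happens to match the reported numbers; the function names are hypothetical.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of categories)."""
    k = len(counts)
    if k < 2:
        return 0.0                      # a constant column has no spread
    return entropy_bits(counts) / math.log2(k)

profile_text_counts = [45, 5]           # Y and N counts for ProfileTextExists
h = entropy_bits(profile_text_counts)
r = entropy_ratio(profile_text_counts)
print(round(h, 3), round(r, 3))
```

With two categories the maximum entropy is 1 bit, which is why entropy and entropy_ratio coincide for every Y/N flag in this report.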
Out[211]:

saturn.columns["ProfileTextExists"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value Y
top_rate 0.9
cardinality 2
entropy 0.469
entropy_ratio 0.469
Fig 72.
Top values for ProfileTextExists.
Top values for ProfileTextExists (2 unique shown, of 2 total).
value count share
Y 45 90.0%
N 5 10.0%

PCOtherSmall numeric feature

PCOtherSmall is a numeric feature (by analogy with PCIslam, probably a percentage) that is zero in nearly every row: 93.9% of the 49 non-null rows are 0 and only 3 distinct values appear. The distribution is extremely right-skewed (skew 6.41, kurtosis 40.46), with a max of 7 and 3 outliers (6.1% outlier rate), so almost all signal lives in a tiny tail.

Treatment: Binarize to zero/non-zero or drop, since variance is concentrated in a handful of outliers.

anthropic:claude-opus-4-7 · confidence high
Out[214]:

saturn.columns["PCOtherSmall"].stats

stat value
n 50
nulls 1 (2.0%)
unique 3
min 0
max 7
mean 0.1837
median 0
std 1.014
q1 0
q3 0
iqr 0
skew 6.411
kurtosis 40.46
n_outliers 3
outlier_rate 0.06122
zero_rate 0.9388
alert: high_skew skew=+6.41
alert: outliers 6.1% rows beyond 1.5 IQR
Fig 73.
Distribution of PCOtherSmall. Vertical dash marks the median.
Histogram bins for PCOtherSmall (median: 0.0).
bin count
0 – 1 46
1 – 2 2
2 – 3 0
3 – 4 0
4 – 5 0
5 – 6 0
6 – 7 1

BibleStatus numeric feature

BibleStatus is a small-range integer code with only 4 distinct values spanning 2 to 5, mean 3.5 and median 4. The tight IQR (3 to 4) and absence of zeros or nulls suggest it's an ordinal status flag rather than a true numeric measurement. Mild left skew (-0.38) indicates most records sit at the higher end of the scale.

Treatment: Treat as an ordinal categorical and one-hot or ordinal-encode before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[217]:

saturn.columns["BibleStatus"].stats

stat value
n 50
nulls 0 (0.0%)
unique 4
min 2
max 5
mean 3.5
median 4
std 0.8631
q1 3
q3 4
iqr 1
skew -0.3848
kurtosis -0.6309
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 74.
Distribution of BibleStatus. Vertical dash marks the median.
Histogram bins for BibleStatus (median: 4.0).
bin count
2 – 2.429 8
2.429 – 2.857 0
2.857 – 3.286 13
3.286 – 3.714 0
3.714 – 4.143 25
4.143 – 4.571 0
4.571 – 5 4

Frontier categorical feature

Frontier is a binary Y/N flag with no nulls across 50 rows. The distribution is heavily skewed toward 'N' at 84% (42 of 50), leaving only 8 'Y' cases, which limits its discriminative power.

Treatment: Encode as a 0/1 indicator; watch for class imbalance when using as a predictor.

anthropic:claude-opus-4-7 · confidence high
Out[220]:

saturn.columns["Frontier"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.84
cardinality 2
entropy 0.6343
entropy_ratio 0.6343
Fig 75.
Top values for Frontier.
Top values for Frontier (2 unique shown, of 2 total).
value count share
N 42 84.0%
Y 8 16.0%

MapAddress categorical metadata

MapAddress holds filenames of map images (e.g. 'm00007.png', 'm10375_ke.png'), with country/region suffixes appearing on some variants. Nearly half the rows (24/50) are empty strings and a single file 'm00007.png' covers 9 more, leaving a long tail of 15 other filenames at 1-2 occurrences each. Cardinality is 17 unique values with entropy ratio 0.68, so the column is dominated by the empty string and one heavily reused image.

Treatment: Treat empty string as missing and group the long tail before any categorical encoding.

anthropic:claude-opus-4-7 · confidence high
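A minimal sketch of that treatment, assuming a threshold of two occurrences for keeping a filename as its own category; the function name, the threshold and the "other" label are illustrative choices, not part of Saturn.

```python
from collections import Counter

def clean_and_group(values, min_count=2, other_label="other"):
    """Map '' to None, then fold categories rarer than min_count into other_label."""
    cleaned = [v if v else None for v in values]
    counts = Counter(v for v in cleaned if v is not None)
    return [
        None if v is None
        else (v if counts[v] >= min_count else other_label)
        for v in cleaned
    ]

# Small illustrative sample, not the full column.
sample = ["", "m00007.png", "m00007.png", "m10375_ke.png", "", "m10378_sa.png"]
grouped = clean_and_group(sample)
print(grouped)
```

Converting the empty strings first matters: otherwise the blank value would be counted as the most frequent category instead of as missing data.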
Out[223]:

saturn.columns["MapAddress"].stats

stat value
n 50
nulls 0 (0.0%)
unique 17
top_value (empty string)
top_rate 0.48
cardinality 17
entropy 2.792
entropy_ratio 0.6832
alert: long_tail 13 singleton categories
Fig 76.
Top values for MapAddress.
Top values for MapAddress (17 unique shown, of 17 total).
value count share
(empty string) 24 48.0%
m00007.png 9 18.0%
m10375.png 2 4.0%
m00307.png 2 4.0%
m10208_ng.png 1 2.0%
m00005.png 1 2.0%
m10375_tz.png 1 2.0%
m10376_ae.png 1 2.0%
m10375_ke.png 1 2.0%
m10375_rp.png 1 2.0%
m10376_ir.png 1 2.0%
m10376_us.png 1 2.0%
m10378_ae.png 1 2.0%
m10378_ku.png 1 2.0%
m10378_mu.png 1 2.0%
m10378_sa.png 1 2.0%
m10378_ym.png 1 2.0%

PeopleID3ROG3 categorical identifier

PeopleID3ROG3 looks like a per-row identifier: every one of the 50 rows holds a distinct alphanumeric code (5 digits followed by 2 letters, e.g. '10208NG'), giving cardinality 50 and entropy_ratio 1.0. Top_rate is 0.02 because no value repeats, and there are no nulls. The long_tail alert simply reflects that uniqueness rather than any skew.

Treatment: Drop from modelling features; retain as a join/lookup key.

anthropic:claude-opus-4-7 · confidence high
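Since each key is five digits followed by two letters, it can be split into its two parts for joining. The sketch below assumes, from the column name only, that the digits are the PeopleID3 value and the letters the ROG3 geography code; `split_key` is a hypothetical helper.

```python
import re

# Five digits then two uppercase letters, e.g. "10208NG".
KEY_RE = re.compile(r"^(\d{5})([A-Z]{2})$")

def split_key(key):
    """Return (numeric part, two-letter part) of a PeopleID3ROG3-style key."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    # Assumption: digits are PeopleID3, letters are the ROG3 geography code.
    return int(match.group(1)), match.group(2)

people_id, geo_code = split_key("10208NG")
print(people_id, geo_code)
```

Raising on an unexpected format, rather than returning a partial result, keeps malformed keys from silently joining to the wrong record.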
Out[226]:

saturn.columns["PeopleID3ROG3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value 10208NG
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail 50 singleton categories
Fig 77.
Top values for PeopleID3ROG3.
Top values for PeopleID3ROG3 (20 unique shown, of 50 total).
value count share
10208NG 1 2.0%
10301SU 1 2.0%
10375TZ 1 2.0%
10375UP 1 2.0%
10376AE 1 2.0%
10376CA 1 2.0%
10375AG 1 2.0%
10375AS 1 2.0%
10375AU 1 2.0%
10375BA 1 2.0%
10375BR 1 2.0%
10375BU 1 2.0%
10375CA 1 2.0%
10375CE 1 2.0%
10375CG 1 2.0%
10375CU 1 2.0%
10375EG 1 2.0%
10375EI 1 2.0%
10375GB 1 2.0%
10375HA 1 2.0%

ROP3 numeric feature

ROP3 is a near-constant numeric code clustered tightly around 100425 (median) with an IQR of just 2, yet 20% of values are flagged as outliers and the minimum drops to 100161 versus a max of 100431. The extreme negative skew (-5.53) and kurtosis above 30 indicate a heavy left tail dragging the mean (100418.74) below the median. With only 5 unique values across 50 rows, and the same outlier count (10) and histogram shape as PeopleID3, this looks like an identifier code rather than a measurement, so the skew and outlier alerts are artefacts of treating a code as a number.

Treatment: Treat as a categorical identifier rather than a continuous feature, and check whether it is redundant with PeopleID3 before keeping both.

anthropic:claude-opus-4-7 · confidence medium
Out[229]:

saturn.columns["ROP3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
min 100,161
max 100,431
mean 1.004e+05
median 100,425
std 41.09
q1 100,425
q3 100,427
iqr 2
skew -5.53
kurtosis 30.52
n_outliers 10
outlier_rate 0.2
zero_rate 0
alert: high_skew skew=-5.53
alert: outliers 20.0% rows beyond 1.5 IQR
Fig 78.
Distribution of ROP3. Vertical dash marks the median.
Histogram bins for ROP3 (median: 100425.0).
bin count
1.002e+05 – 1.002e+05 1
1.002e+05 – 1.002e+05 0
1.002e+05 – 1.003e+05 0
1.003e+05 – 1.003e+05 1
1.003e+05 – 1.004e+05 0
1.004e+05 – 1.004e+05 0
1.004e+05 – 1.004e+05 48

PrimaryLanguageDialect categorical metadata

PrimaryLanguageDialect is a categorical field that is effectively empty: 98% of the 50 rows are null, and the single non-null value is the string "Air". That value matches the one 'Tuareg, Air' row in PeopNameAcrossCountries, so it is plausibly a genuine dialect name rather than an error. With only 1 unique value, entropy is 0, so the column carries no signal in this sample.

Treatment: Drop from modelling; the lone "Air" value is consistent with the single 'Tuareg, Air' people group, but confirm it against the source record if the dialect matters.

anthropic:claude-opus-4-7 · confidence high
Out[232]:

saturn.columns["PrimaryLanguageDialect"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Air
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 79.
Top values for PrimaryLanguageDialect.
Top values for PrimaryLanguageDialect (1 unique shown, of 1 total).
value count share
Air 1 2.0%

JPScale numeric feature

JPScale is a small-integer ordinal feature with only 4 distinct values ranging from 1 to 4, where the bulk of records sit at 1 (median, Q1 and Q3 all equal 1.0, IQR 0.0). The distribution is heavily right-skewed (skew 2.29, kurtosis 4.44) and 24% of rows (12 of 50) flag as outliers simply because anything above 1 deviates from the dominant value. Mean 1.38 against std 0.81 confirms most mass is at the floor with a long thin tail toward 4.

Treatment: Treat as an ordinal/categorical scale (1-4) rather than continuous; one-hot or bin the rare 2-4 levels.

anthropic:claude-opus-4-7 · confidence high
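One way to bin the rare levels is to collapse 2-4 into a single category. This is a sketch of one option, not a recommendation from the profiler; the label names are hypothetical, and the per-level counts (38, 8, 1, 3) are read from the JPScale histogram bins in this report.

```python
from collections import Counter

def collapse_jpscale(level):
    """Return 'level_1' for the dominant level and 'level_2_to_4' otherwise."""
    if level not in (1, 2, 3, 4):
        raise ValueError(f"JPScale outside 1-4: {level!r}")
    return "level_1" if level == 1 else "level_2_to_4"

# Per-level counts taken from the histogram; row order is arbitrary.
levels = [1] * 38 + [2] * 8 + [3] * 1 + [4] * 3
collapsed = Counter(collapse_jpscale(v) for v in levels)
print(collapsed)
```

Collapsing trades ordinal detail for stability: with a single row at level 3, a separate one-hot column for it would be almost entirely zeros.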
Out[235]:

saturn.columns["JPScale"].stats

stat value
n 50
nulls 0 (0.0%)
unique 4
min 1
max 4
mean 1.38
median 1
std 0.8053
q1 1
q3 1
iqr 0
skew 2.29
kurtosis 4.437
n_outliers 12
outlier_rate 0.24
zero_rate 0
alert: high_skew skew=+2.29
alert: outliers 24.0% rows beyond 1.5 IQR
Fig 80.
Distribution of JPScale. Vertical dash marks the median.
Histogram bins for JPScale (median: 1.0).
bin count
1 – 1.429 38
1.429 – 1.857 0
1.857 – 2.286 8
2.286 – 2.714 0
2.714 – 3.143 1
3.143 – 3.571 0
3.571 – 4 3

HasAudioRecordings categorical metadata

This column is a flag indicating whether audio recordings exist, but every one of the 50 rows holds the value "Y". Cardinality is 1 and entropy is 0, so it carries no information for any downstream task.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[238]:

saturn.columns["HasAudioRecordings"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value Y
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 81.
Top values for HasAudioRecordings.
Top values for HasAudioRecordings (1 unique shown, of 1 total).
value count share
Y 50 100.0%

PCBuddhism numeric feature

PCBuddhism appears to be a numeric feature (by analogy with PCIslam, likely the percentage of the group adhering to Buddhism) that carries no information in this sample: every one of the 49 non-null values is exactly 0.0, with a 2% null rate. The constant alert and zero_rate of 1.0 confirm there is no variance to model.

Treatment: Drop; constant column with no predictive signal.

anthropic:claude-opus-4-7 · confidence high
Out[241]:

saturn.columns["PCBuddhism"].stats

stat value
n 50
nulls 1 (2.0%)
unique 1
min 0
max 0
mean 0
median 0
std 0
q1 0
q3 0
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 1
alert: constant only one distinct value
Fig 82.
Distribution of PCBuddhism. Vertical dash marks the median.
Histogram bins for PCBuddhism (median: 0.0).
bin count
-0.5 – -0.3571 0
-0.3571 – -0.2143 0
-0.2143 – -0.07143 0
-0.07143 – 0.07143 49
0.07143 – 0.2143 0
0.2143 – 0.3571 0
0.3571 – 0.5 0

PeopNameAcrossCountries categorical feature

This column appears to label ethnic or people-group identities across countries, with only 5 unique values across 50 rows and no nulls. The distribution is heavily skewed toward 'Arab' (28 of 50, top_rate 0.56), with 'Arab, Arabic Gulf Spoken' and 'Arab, Omani' as the next most common, while 'Tuareg, Air' and 'Amri' appear just once each. Entropy ratio of 0.69 confirms moderate concentration rather than uniform spread.

Treatment: Group rare categories or one-hot encode after consolidating the long tail.

anthropic:claude-opus-4-7 · confidence high
Out[244]:

saturn.columns["PeopNameAcrossCountries"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value Arab
top_rate 0.56
cardinality 5
entropy 1.611
entropy_ratio 0.694
Fig 83.
Top values for PeopNameAcrossCountries.
Top values for PeopNameAcrossCountries (5 unique shown, of 5 total).
value count share
Arab 28 56.0%
Arab, Arabic Gulf Spoken 12 24.0%
Arab, Omani 8 16.0%
Tuareg, Air 1 2.0%
Amri 1 2.0%

PhotoCCVersionURL categorical metadata

This column appears to hold a Creative Commons license URL associated with a photo, but it is overwhelmingly empty: 42 of 50 rows are blank strings and only 8 carry the single license value 'https://creativecommons.org/licenses/by-nc-sa/2.0/'. With just 2 unique values and a top_rate of 0.84, it functions more as a binary licensed/unlicensed flag than a true URL field. Note that nulls are reported as 0.0 because the missing entries are empty strings rather than true nulls.

Treatment: Convert to a boolean has_license flag rather than treating as a URL.

anthropic:claude-opus-4-7 · confidence high
Out[247]:

saturn.columns["PhotoCCVersionURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value (empty string)
top_rate 0.84
cardinality 2
entropy 0.6343
entropy_ratio 0.6343
Fig 84.
Top values for PhotoCCVersionURL.
Top values for PhotoCCVersionURL (2 unique shown, of 2 total).
value count share
(empty string) 42 84.0%
https://creativecommons.org/licenses/by-nc-sa/2.0/ 8 16.0%

MapCCVersionText categorical metadata

MapCCVersionText is a categorical column that contains a single value — the empty string — across all 50 rows. Cardinality is 1, entropy is 0, and null_rate is 0.0, so the field is technically populated but carries no information.

Treatment: Drop; constant empty value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[250]:

saturn.columns["MapCCVersionText"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 85.
Top values for MapCCVersionText.
Top values for MapCCVersionText (1 unique shown, of 1 total).
value count share
(empty string) 50 100.0%

PercentChristianPC categorical feature

This appears to be a percent-Christian figure, probably at the people-cluster level (PrimaryReligionPC uses the same PC suffix), stored as strings with only 3 distinct values across 50 rows. It is overwhelmingly degenerate: the value '0.997' covers 48/50 rows (top_rate 0.96), with '0.116' and '1.344' each appearing once, yielding an entropy ratio of just 0.178. The two non-modal values may be data-entry artefacts, or may simply belong to rows from a different people cluster, and are worth investigating.

Treatment: Drop or treat as near-constant; inspect the two non-modal rows as potential outliers before modelling.

anthropic:claude-opus-4-7 · confidence high
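Because the values are stored as strings, inspecting the non-modal rows starts with a cast to float. A minimal sketch using the value counts from this report; the row order is hypothetical, so the printed indices are illustrative only.

```python
from collections import Counter

# Value counts from the report's table; the row order here is made up.
raw = ["0.997"] * 48 + ["0.116", "1.344"]

numeric = [float(v) for v in raw]            # cast the string column to float
mode_value, mode_count = Counter(numeric).most_common(1)[0]

# Rows whose value differs from the dominant one, with their positions.
non_modal_rows = [(i, v) for i, v in enumerate(numeric) if v != mode_value]

print(mode_value, mode_count)
print(non_modal_rows)
```

On the real data the positions returned would point at the two records to check against the source before deciding whether to drop the column.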
Out[253]:

saturn.columns["PercentChristianPC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
top_value 0.997
top_rate 0.96
cardinality 3
entropy 0.2823
entropy_ratio 0.1781
alert: long_tail 2 singleton categories
alert: imbalance top value is 96.0% of rows
Fig 86.
Top values for PercentChristianPC.
Top values for PercentChristianPC (3 unique shown, of 3 total).
value count share
0.997 48 96.0%
0.116 1 2.0%
1.344 1 2.0%

Nomadic categorical feature

Binary Y/N flag indicating whether a people group is nomadic, with no nulls across 50 rows. The distribution is heavily imbalanced: 'N' covers 45 of 50 (top_rate 0.9) while 'Y' appears only 5 times, yielding a low entropy_ratio of 0.47.

Treatment: Encode as a 0/1 indicator; watch for class imbalance given only 5 positives.

anthropic:claude-opus-4-7 · confidence high
Out[256]:

saturn.columns["Nomadic"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.9
cardinality 2
entropy 0.469
entropy_ratio 0.469
Fig 87.
Top values for Nomadic.
Top values for Nomadic (2 unique shown, of 2 total).
value count share
N 45 90.0%
Y 5 10.0%

PrayForChurch categorical free_text

Free-text prayer prompts about Christian outreach to Arab/Muslim people groups, stored as a categorical but functionally a short-document field. 42 of 50 rows (top_rate 0.84) are empty strings and the remaining 8 are all unique long sentences, giving 9 distinct values and an entropy ratio of 0.35. The long_tail alert reflects this empty-vs-unique split rather than meaningful category structure.

Treatment: Treat empty strings as missing and tokenize/embed the remaining prose rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[259]:

saturn.columns["PrayForChurch"].stats

stat value
n 50
nulls 0 (0.0%)
unique 9
top_value (empty string)
top_rate 0.84
cardinality 9
entropy 1.114
entropy_ratio 0.3515
alert: long_tail 8 singleton categories
Fig 88.
Top values for PrayForChurch.
Top values for PrayForChurch (9 unique shown, of 9 total).
value count share
(empty string) 42 84.0%
There are a couple of Arab believers in Canada. Pray that they will become stronger in their faith and in the fruit of the Holy Spirit so they can be prepared to give an answer when Muslims question them. 1 2.0%
Pray that the followers of Christ among the Arabs in Hungary will exhibit Christ-like behavior that will draw Muslims to the Savior. 1 2.0%
The few Christian believers from this background need to allow the Holy Spirit to shine brightly among their Muslim neighbors. 1 2.0%
For the Gulf Arab wherever they live, a profession of faith in Jesus may cost a person his family, his honor, his job or even his life. Evangelization of this people group will be challenging due to the nature of their lifestyle and belief system. Prayer is the key to reaching them with the gospel. 1 2.0%
Please pray for the few Gulf Arabs who identify themselves as Christians, and also for those followers of Christ who are secret believers. Pray they will find scripture and other resources which will lead them to a correct understanding of what it means to know and follow Christ. 1 2.0%
Please pray for Christian Arabs in Qatar. Pray they will clearly understand that forgiveness of sin is God's gift to them as a result of their trust in Christ's death on the cross. Pray they will live in obedience to Christ's commands, in gratitude for what He has given them. 1 2.0%
Pray that the few Omani Arab believers will yield to the Holy Spirit and reach out to others in a loving, Christ-honoring way. 1 2.0%
As far as we know, there are almost no followers of Christ among the Omani Arabs and no fellowships of believers no matter where they live. 1 2.0%

RLG3PGAC numeric metadata

RLG3PGAC is a numeric column that holds the constant value 6.0 across all 50 rows, with zero variance and no nulls. Since min, max, mean, and both quartiles are all 6.0, the column carries no information for modelling or analysis.

Treatment: Drop, constant column.

anthropic:claude-opus-4-7 · confidence high
Out[262]:

saturn.columns["RLG3PGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
min 6
max 6
mean 6
median 6
std 0
q1 6
q3 6
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant only one distinct value
Fig 89.
Distribution of RLG3PGAC. Vertical dash marks the median.
Histogram bins for RLG3PGAC (median: 6.0).
bin count
5.5 – 5.643 0
5.643 – 5.786 0
5.786 – 5.929 0
5.929 – 6.071 50
6.071 – 6.214 0
6.214 – 6.357 0
6.357 – 6.5 0

ISO3 categorical foreign_key

ISO3 holds three-letter country codes (ARE, CAN, EGY, KEN...), making it a country identifier. With 41 unique values across 50 rows and entropy ratio 0.986, it is near-uniform with only nine countries appearing twice; no nulls. The long_tail alert reflects that most countries appear exactly once.

Treatment: Left-join on this code to enrich rows with country attributes.

anthropic:claude-opus-4-7 · confidence high
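A left join keeps every profiled row and leaves the new attribute empty where the lookup has no match. The sketch below uses a tiny hypothetical lookup table (`country_attrs`) with plain dictionaries; in practice the lookup would come from a full country reference table.

```python
# Hypothetical lookup table keyed by ISO3; only two entries for illustration.
country_attrs = {
    "ARE": {"name": "United Arab Emirates"},
    "KEN": {"name": "Kenya"},
}

# Three illustrative rows; NER is deliberately absent from the lookup.
rows = [{"ISO3": "ARE"}, {"ISO3": "KEN"}, {"ISO3": "NER"}]

def left_join(rows, lookup, key="ISO3"):
    """Keep every row; attach country_name when the key is found, else None."""
    joined = []
    for row in rows:
        attrs = lookup.get(row[key], {})
        joined.append({**row, "country_name": attrs.get("name")})
    return joined

enriched = left_join(rows, country_attrs)
unmatched = [r["ISO3"] for r in enriched if r["country_name"] is None]
print(enriched)
print(unmatched)
```

Listing the unmatched codes after the join is a cheap check that the lookup covers all 41 countries before the enriched attributes are used.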
Out[265]:

saturn.columns["ISO3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 41
top_value ARE
top_rate 0.04
cardinality 41
entropy 5.284
entropy_ratio 0.9862
alert: long_tail 32 singleton categories
Fig 90.
Top values for ISO3.
Top values for ISO3 (20 unique shown, of 41 total).
value count share
ARE 2 4.0%
CAN 2 4.0%
EGY 2 4.0%
KEN 2 4.0%
SOM 2 4.0%
KWT 2 4.0%
OMN 2 4.0%
SAU 2 4.0%
YEM 2 4.0%
NER 1 2.0%
SDN 1 2.0%
TZA 1 2.0%
UKR 1 2.0%
DZA 1 2.0%
AUS 1 2.0%
AUT 1 2.0%
BHR 1 2.0%
BRA 1 2.0%
BGR 1 2.0%
LKA 1 2.0%

NaturalPronunciation categorical feature

Phonetic pronunciation guides for people-group names, with most variants ending in 'AE-rub' (likely 'Arab'). The top value 'AE-rub' covers 27 rows (55% of the 49 non-null rows, 54% of all 50), and the top three values account for 46 of the 49 non-null entries, leaving three singleton long-tail spellings: 'AH-eer TWA-reg', 'em-ee-RAH-tee AE-rub' and 'KEN-yun AE-rub'. Cardinality is just 6 with a 2% null rate, suggesting a controlled vocabulary rather than free text.

Treatment: One-hot encode or group the three singleton categories into 'other' before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[268]:

saturn.columns["NaturalPronunciation"].stats

stat value
n 50
nulls 1 (2.0%)
unique 6
top_value AE-rub
top_rate 0.551
cardinality 6
entropy 1.728
entropy_ratio 0.6686
Fig 91.
Top values for NaturalPronunciation.
Top values for NaturalPronunciation (6 unique shown, of 6 total).
value count share
AE-rub 27 54.0%
gulf AE-rub 11 22.0%
oh-MAH-nee AE-rub 8 16.0%
AH-eer TWA-reg 1 2.0%
em-ee-RAH-tee AE-rub 1 2.0%
KEN-yun AE-rub 1 2.0%

PhotoAddress categorical metadata

PhotoAddress holds JPG filenames (e.g., p10375.jpg), so it points to image assets associated with each row. With only 5 unique values across 50 rows and the top file p10375.jpg covering 56% of records, the same images are reused heavily rather than being row-specific. Entropy ratio of 0.69 confirms a skewed distribution dominated by three filenames, while two others appear just once each.

Treatment: Treat as a low-cardinality asset reference; join to an image table or drop unless image features are needed.

anthropic:claude-opus-4-7 · confidence high
Out[271]:

saturn.columns["PhotoAddress"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value p10375.jpg
top_rate 0.56
cardinality 5
entropy 1.611
entropy_ratio 0.694
Fig 92.
Top values for PhotoAddress.
Top values for PhotoAddress (5 unique shown, of 5 total).
value count share
p10375.jpg 28 56.0%
p10376.jpg 12 24.0%
p10378.jpg 8 16.0%
p10208.jpg 1 2.0%
p10301.jpg 1 2.0%

RegionCode numeric foreign_key

RegionCode is stored as an integer but only takes 11 distinct values across 50 rows (min 1, max 12), so it is almost certainly a categorical region identifier rather than a true numeric quantity. The distribution is roughly centered (mean 7.28, median 7) with low skew (-0.09) and one flagged outlier, but those moments are not meaningful for a code. No nulls or zeros are present.

Treatment: Cast to categorical and one-hot or target-encode before modelling.

anthropic:claude-opus-4-7 · confidence high
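Casting the codes to categories before encoding removes the false ordering that integers imply. A minimal one-hot sketch in plain Python; the function name and the five sample codes are illustrative, not taken from the raw rows.

```python
def one_hot(codes):
    """One-hot encode integer codes as unordered categories."""
    categories = sorted(set(codes))                 # one column per distinct code
    index = {code: i for i, code in enumerate(categories)}
    encoded = []
    for code in codes:
        row = [0] * len(categories)
        row[index[code]] = 1                        # mark this row's category
        encoded.append(row)
    return categories, encoded

# Illustrative codes within the reported 1-12 range.
categories, encoded = one_hot([7, 1, 12, 7, 9])
print(categories)
print(encoded)
```

After encoding, regions 1 and 12 are no further apart than regions 6 and 7, which is the point of treating the column as categorical.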
Out[274]:

saturn.columns["RegionCode"].stats

stat value
n 50
nulls 0 (0.0%)
unique 11
min 1
max 12
mean 7.28
median 7
std 2.711
q1 6
q3 9
iqr 3
skew -0.08855
kurtosis -0.2134
n_outliers 1
outlier_rate 0.02
zero_rate 0
Fig 93.
Distribution of RegionCode. Vertical dash marks the median.
Histogram bins for RegionCode (median: 7.0).
bin count
1 – 2.571 4
2.571 – 4.143 2
4.143 – 5.714 1
5.714 – 7.286 23
7.286 – 8.857 4
8.857 – 10.43 10
10.43 – 12 6

LocationInCountry categorical free_text

Free-text geographic descriptions of where a group is located within a country, ranging from single words like "Scattered" to multi-clause sentences naming provinces and landmarks. The column is 72% null and only 14 of 50 rows carry values, yet entropy_ratio is 0.99 with 13 unique strings across 14 non-nulls, so essentially every entry is unique. The top value "Widespread." appears just twice, so there is no usable category structure.

Treatment: Treat as free-text notes; geocode or NER-extract place names rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[277]:

saturn.columns["LocationInCountry"].stats

stat value
n 50
nulls 36 (72.0%)
unique 13
top_value Widespread.
top_rate 0.1429
cardinality 13
entropy 3.664
entropy_ratio 0.9903
alert: long_tail 12 singleton categories
alert: null_rate 72.0% null
Fig 94.
Top values for LocationInCountry.
Top values for LocationInCountry (13 unique shown, of 13 total).
value count share
Widespread. 2 4.0%
Central, Agadez area 1 2.0%
Primarily north 1 2.0%
Widespread. Formerly Zanzibar, coastal areas. 1 2.0%
Gulf Bedu or village peoples 1 2.0%
Middle East, North Africa 1 2.0%
Lamu and Garissa counties: Somali border toTana river mouth, along coast and inland. 1 2.0%
Scattered 1 2.0%
Mainly in Hormozgan Province and nearby Persian gulf islands; in east Fars, Bushehr, Kerman, and Yazd provinces; Khamseh nomads and other Arab nomadic groups in south central Iran. 1 2.0%
Al Basrah Governorate, south of Basrah city, near Persian Gulf. 1 2.0%
North, Ash Sharqiyah Province, from southeast Kuwait border inland, then east to Persian Gulf north of Al Damman; south, Yeman and Oman borders, Ash Sharqiyah and Najran provinces. 1 2.0%
Scattered. Kilifi, Kwale, Lamu, and Tana River counties. 1 2.0%
Mainly in Hajar Mountains highlands; a few coastal regions. 1 2.0%

JF categorical feature

JF is a binary Y/N flag with no missing values across 50 rows. The distribution is imbalanced: 'Y' accounts for 41 of 50 records (top_rate 0.82) versus 9 'N's, yielding entropy of 0.68.

Treatment: Encode as a 0/1 indicator and account for the 82/18 class imbalance in modelling.

anthropic:claude-opus-4-7 · confidence high
Out[280]:

saturn.columns["JF"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value Y
top_rate 0.82
cardinality 2
entropy 0.6801
entropy_ratio 0.6801
Fig 95.
Top values for JF.
Top values for JF (2 unique shown, of 2 total).
value count share
Y 41 82.0%
N 9 18.0%

PopulationPGAC numeric feature

PopulationPGAC appears to be a population total for a people group across countries (PGAC plausibly abbreviates this; compare PeopNameAcrossCountries), with values ranging from 101,000 to 7,562,600 across 50 rows. Only 5 unique values populate the column, so each row seems to repeat its group's total and the column behaves more like a coarse categorical bucket than a row-level measurement. The right skew (1.03) and 26% outlier rate stem mainly from the 12 rows in the top histogram bin (up to about 7.56M), far above the Q3 of 3,096,000.

Treatment: Treat as a categorical/ordinal bucket given only 5 unique values, or log-transform if kept numeric.

anthropic:claude-opus-4-7 · confidence medium
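If the column is kept numeric, a base-10 log compresses the range considerably. A small sketch using only the min, median and max quoted in this report (three points, not the full column):

```python
import math

# Min, median and max of PopulationPGAC as quoted in this report.
populations = [101_000, 1_927_100, 7_562_600]

log_pop = [math.log10(p) for p in populations]

raw_ratio = populations[-1] / populations[0]   # max is ~75x the min
log_span = log_pop[-1] - log_pop[0]            # under 2 orders of magnitude

print([round(v, 3) for v in log_pop])
print(round(raw_ratio, 1), round(log_span, 3))
```

The transform does not change the fact that only five distinct values exist, so it helps with scale but not with the bucket-like behaviour noted in the treatment.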
Out[283]:

saturn.columns["PopulationPGAC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
min 101,000
max 7.563e+06
mean 3.402e+06
median 1.927e+06
std 2.427e+06
q1 1.927e+06
q3 3.096e+06
iqr 1.169e+06
skew 1.03
kurtosis -0.6488
n_outliers 13
outlier_rate 0.26
zero_rate 0
alert: outliers 26.0% rows beyond 1.5 IQR
Fig 96.
Distribution of PopulationPGAC. Vertical dash marks the median.
Histogram bins for PopulationPGAC (median: 1927100.0).
bin count
1.01e+05 – 1.167e+06 2
1.167e+06 – 2.233e+06 28
2.233e+06 – 3.299e+06 8
3.299e+06 – 4.365e+06 0
4.365e+06 – 5.431e+06 0
5.431e+06 – 6.497e+06 0
6.497e+06 – 7.563e+06 12

PeopleGroupMapExpandedURL categorical metadata

This column holds URLs to expanded people-group map PDFs hosted on joshuaproject.net. It is mostly empty: 38 of 50 rows (top_rate 0.76) are blank strings, and the remaining 12 rows hold 10 distinct links, most of them appearing once. Despite null_rate being 0, the dominant value is an empty string, so true coverage is roughly a quarter of rows.

Treatment: Convert empty strings to nulls and treat as an optional reference link rather than a modelling feature.

anthropic:claude-opus-4-7 · confidence high
Out[286]:

saturn.columns["PeopleGroupMapExpandedURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 11
top_value (empty string)
top_rate 0.76
cardinality 11
entropy 1.575
entropy_ratio 0.4554
alert: long_tail 8 singleton categories
Fig 97.
Top values for PeopleGroupMapExpandedURL.
Top values for PeopleGroupMapExpandedURL (11 unique shown, of 11 total).
value count share
(empty string) 38 76.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375.pdf 2 4.0%
https://joshuaproject.net/assets/media/profiles/maps/m00307.pdf 2 4.0%
https://joshuaproject.net/assets/media/profiles/maps/m10208_ng.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375_tz.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10376_ae.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375_ke.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10375_rp.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10376_ir.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10376_us.pdf 1 2.0%
https://joshuaproject.net/assets/media/profiles/maps/m10378_mu.pdf 1 2.0%

TranslationNeedQuestionable unknown other

The column 'TranslationNeedQuestionable' was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 with no nulls. The name suggests a boolean or flag indicating whether a translation need is in doubt, but this cannot be confirmed from the evidence. No distributional signals are present to flag.

Treatment: Re-profile or manually inspect to determine type before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[289]:

saturn.columns["TranslationNeedQuestionable"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

Category categorical label

A low-cardinality categorical with 3 distinct values ('1','2','3') across 50 rows and no nulls, likely a class label or category code. The distribution is imbalanced: '1' dominates at 68% (34/50) while '2' and '3' account for just 7 and 9 rows respectively, giving an entropy ratio of 0.77.

Treatment: Treat as a categorical class label and address the class imbalance (e.g., stratified splits or reweighting) before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[291]:

saturn.columns["Category"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
top_value 1
top_rate 0.68
cardinality 3
entropy 1.221
entropy_ratio 0.7702
Fig 98.
Top values for Category.
Top values for Category (3 unique shown, of 3 total).
value count share
1 34 68.0%
3 9 18.0%
2 7 14.0%
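
The entropy and entropy_ratio figures can be checked by hand from the counts in this table. The sketch below assumes that entropy is Shannon entropy in bits and that entropy_ratio divides it by log2(cardinality). That definition is an inference from the reported numbers, not something the report states, and the function names are illustrative.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0  # a constant column has no spread to normalise
    return entropy_bits(counts) / math.log2(k)

category_counts = [34, 9, 7]  # counts for '1', '3', '2' in the Category table
h = entropy_bits(category_counts)
r = entropy_ratio(category_counts)
print(round(h, 3), round(r, 4))
```

Under that assumption the counts 34, 9 and 7 give the reported 1.221 and 0.7702. A ratio of 1 would mean a perfectly even split across the three codes.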

PhotoCopyright categorical feature

PhotoCopyright is a binary Y/N flag, almost certainly indicating whether a photo carries a copyright restriction. The distribution is severely imbalanced: 49 of 50 rows are 'N' and only 1 is 'Y', giving an entropy ratio of just 0.14. With effectively no variance, this column carries little signal on its own.

Treatment: Drop or retain only as a rare-event indicator; near-constant at 98% 'N'.

anthropic:claude-opus-4-7 · confidence high
Out[294]:

saturn.columns["PhotoCopyright"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.98
cardinality 2
entropy 0.1414
entropy_ratio 0.1414
alert: imbalance top value is 98.0% of rows
Fig 99.
Top values for PhotoCopyright.
Top values for PhotoCopyright (2 unique shown, of 2 total).
value count share
N 49 98.0%
Y 1 2.0%

NTOnline categorical feature

NTOnline is a categorical flag whose only observed value is 'Y': all 41 non-null rows carry it, and the remaining 9 rows (18%) are null. The column is effectively constant, so the missingness pattern is the only potential signal.

Treatment: Drop as a zero-variance column, or replace with a binary is_null indicator if missingness is meaningful.

anthropic:claude-opus-4-7 · confidence high
Out[297]:

saturn.columns["NTOnline"].stats

stat value
n 50
nulls 9 (18.0%)
unique 1
top_value Y
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 100.
Top values for NTOnline.
Top values for NTOnline (1 unique shown, of 1 total).
value count share
Y 41 82.0%
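
If missingness in NTOnline turns out to be meaningful, the is_null indicator named in the treatment could look like the sketch below. The `nt_online` list is a hypothetical stand-in that mirrors the reported counts (41 'Y', 9 null), and `nt_online_is_null` is an illustrative name.

```python
# Hypothetical stand-in for NTOnline: 41 'Y' values and 9 nulls.
nt_online = ["Y"] * 41 + [None] * 9

# 1 where the value is missing, 0 where it is present.
nt_online_is_null = [int(v is None) for v in nt_online]

null_rate = sum(nt_online_is_null) / len(nt_online_is_null)
print(null_rate)
```

The indicator keeps the one piece of variation the column has and discards the constant 'Y'.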

LeastReachedPC categorical metadata

This is a categorical flag that takes the value "Y" for all 50 rows, giving cardinality 1 and entropy 0. A single constant value carries no information and cannot discriminate between records. The name suggests a 'least reached' status flag, but in this sample it is degenerate.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[300]:

saturn.columns["LeastReachedPC"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value Y
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 101.
Top values for LeastReachedPC.
Top values for LeastReachedPC (1 unique shown, of 1 total).
value count share
Y 50 100.0%
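
Several columns in this report are constant in the same way (LeastReachedPC, MapCreditURL, MapCopyright, MapCCVersionURL), so one pass that finds and drops them may be simpler than handling each by hand. The sketch below assumes the rows have been loaded as a list of dicts. `constant_columns` is an illustrative helper, and the three sample rows are hypothetical values attached to real column names.

```python
def constant_columns(rows):
    """Return names of columns whose values are identical in every row."""
    if not rows:
        return []
    return [name for name in rows[0] if len({row.get(name) for row in rows}) == 1]

# Hypothetical three-row sample; values are illustrative only.
rows = [
    {"LeastReachedPC": "Y", "MapCopyright": "N", "IndigenousCode": "N"},
    {"LeastReachedPC": "Y", "MapCopyright": "N", "IndigenousCode": "Y"},
    {"LeastReachedPC": "Y", "MapCopyright": "N", "IndigenousCode": "N"},
]

to_drop = constant_columns(rows)

# Rebuild each row without the constant columns.
kept = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
print(to_drop)
```

Note that a column constant in a 50-row sample is not guaranteed to be constant in the full source, so it may be worth re-running the check on more data before dropping anything permanently.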

ROG3 categorical feature

ROG3 holds two-letter country codes (AE, CA, EG, KE, SO, etc.), making it a geographic categorical feature. With 41 unique values across 50 rows and a top rate of just 0.04, the column is close to unique per row: 9 codes appear twice and the other 32 appear once. The entropy ratio of 0.9862 confirms an almost flat, long-tailed distribution. There are no nulls, but the sample is too thin for any country to dominate.

Treatment: Group rare countries into an 'Other' bucket or map to region before encoding.

anthropic:claude-opus-4-7 · confidence high
Out[303]:

saturn.columns["ROG3"].stats

stat value
n 50
nulls 0 (0.0%)
unique 41
top_value AE
top_rate 0.04
cardinality 41
entropy 5.284
entropy_ratio 0.9862
alert: long_tail 32 singleton categories
Fig 102.
Top values for ROG3.
Top values for ROG3 (20 unique shown, of 41 total).
value count share
AE 2 4.0%
CA 2 4.0%
EG 2 4.0%
KE 2 4.0%
SO 2 4.0%
KU 2 4.0%
MU 2 4.0%
SA 2 4.0%
YM 2 4.0%
NG 1 2.0%
SU 1 2.0%
TZ 1 2.0%
UP 1 2.0%
AG 1 2.0%
AS 1 2.0%
AU 1 2.0%
BA 1 2.0%
BR 1 2.0%
BU 1 2.0%
CE 1 2.0%
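
Grouping rare countries into an 'Other' bucket, as the treatment suggests, can be sketched as follows. `bucket_rare` and its `min_count` threshold are illustrative choices. The input uses only the 20 codes visible in this table, so it is a subset of the column rather than all 50 rows.

```python
from collections import Counter

def bucket_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Only the codes shown in the table: 9 that occur twice, 11 that occur once.
doubles = ["AE", "CA", "EG", "KE", "SO", "KU", "MU", "SA", "YM"]
singles = ["NG", "SU", "TZ", "UP", "AG", "AS", "AU", "BA", "BR", "BU", "CE"]
rog3 = doubles * 2 + singles

bucketed = Counter(bucket_rare(rog3))
print(bucketed["Other"], len(bucketed))
```

On the full sample the 32 singleton codes would put 32 of 50 rows into 'Other', so mapping codes to a region may preserve more signal than bucketing at this sample size.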

ReligionSubdivision categorical metadata

This column records a religious subdivision, but it is overwhelmingly empty: 86% of the 50 rows are null, and every one of the 7 populated rows is 'Sunni'. With cardinality of 1 and entropy of 0, the field carries no discriminative signal in this sample.

Treatment: Drop or collapse to a binary 'Sunni vs missing' indicator; otherwise non-informative.

anthropic:claude-opus-4-7 · confidence high
Out[306]:

saturn.columns["ReligionSubdivision"].stats

stat value
n 50
nulls 43 (86.0%)
unique 1
top_value Sunni
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 86.0% null
alert: imbalance top value is 100.0% of rows
Fig 103.
Top values for ReligionSubdivision.
Top values for ReligionSubdivision (1 unique shown, of 1 total).
value count share
Sunni 7 14.0%

PCEthnicReligions numeric feature

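Numeric column in which 93.9% of the 49 non-null values are zero and only 3 distinct values appear. (The PC prefix, as in PCIslam, suggests a percentage rather than a count.) The distribution is highly right-skewed (skew 4.45, kurtosis 19.79), with a max of 10 against a median of 0. Because the IQR is 0, every non-zero value falls outside the 1.5 IQR fences, which is why all 3 non-zero rows (6.1%) are flagged as outliers. One null is present.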

Treatment: Binarize to zero/non-zero or drop, since near-constant zeros carry little signal.

anthropic:claude-opus-4-7 · confidence high
Out[309]:

saturn.columns["PCEthnicReligions"].stats

stat value
n 50
nulls 1 (2.0%)
unique 3
min 0
max 10
mean 0.4082
median 0
std 1.719
q1 0
q3 0
iqr 0
skew 4.446
kurtosis 19.79
n_outliers 3
outlier_rate 0.06122
zero_rate 0.9388
alert: high_skew skew=+4.45
alert: outliers 6.1% rows beyond 1.5 IQR
Fig 104.
Distribution of PCEthnicReligions. Vertical dash marks the median.
Histogram bins for PCEthnicReligions (median: 0.0).
bin count
0 – 1.429 46
1.429 – 2.857 0
2.857 – 4.286 0
4.286 – 5.714 2
5.714 – 7.143 0
7.143 – 8.571 0
8.571 – 10 1
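
The outlier alert and the binarize treatment describe the same three rows, which a short calculation makes visible. The values below are reconstructed from the stats rather than read from raw rows, and that is an assumption: 46 zeros, two 5s and one 10 are consistent with unique=3, max=10, the histogram bins, mean 0.4082 and std 1.719. The quartile method used here is also an assumption about how the profiler computes q1 and q3.

```python
import statistics

# Reconstructed values (assumption): 46 zeros, two 5s, one 10.
values = [0] * 46 + [5, 5, 10]

# Quartiles and the usual 1.5 * IQR fences.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < low or v > high]
is_nonzero = [int(v != 0) for v in values]  # the binarised feature

print(iqr, len(outliers), round(len(outliers) / len(values), 5))
```

With an IQR of 0 both fences collapse onto 0, so the outlier flag and the zero/non-zero indicator select exactly the same rows.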

PeopleCluster categorical feature

Categorical grouping of people clusters, with only 3 distinct values across 50 rows and no nulls. The distribution is extremely imbalanced: 'Arab, Arabian' covers 96% of rows, leaving 'Tuareg' and 'Arab, Sudan' with a single record each, yielding a very low entropy ratio of 0.178.

Treatment: Drop or collapse rare levels; near-constant column offers little signal.

anthropic:claude-opus-4-7 · confidence high
Out[312]:

saturn.columns["PeopleCluster"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
top_value Arab, Arabian
top_rate 0.96
cardinality 3
entropy 0.2823
entropy_ratio 0.1781
alert: long_tail 2 singleton categories
alert: imbalance top value is 96.0% of rows
Fig 105.
Top values for PeopleCluster.
Top values for PeopleCluster (3 unique shown, of 3 total).
value count share
Arab, Arabian 48 96.0%
Tuareg 1 2.0%
Arab, Sudan 1 2.0%

IndigenousCode categorical feature

Binary Y/N flag indicating Indigenous status, with 'N' dominating at 86% (43 of 50) versus 7 'Y' records. The column is fully populated with no nulls and only 2 distinct values, yielding a low entropy ratio of 0.58. The class imbalance is notable for any modelling use.

Treatment: Encode as a binary indicator and account for class imbalance in any downstream model.

anthropic:claude-opus-4-7 · confidence high
Out[315]:

saturn.columns["IndigenousCode"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value N
top_rate 0.86
cardinality 2
entropy 0.5842
entropy_ratio 0.5842
Fig 106.
Top values for IndigenousCode.
Top values for IndigenousCode (2 unique shown, of 2 total).
value count share
N 43 86.0%
Y 7 14.0%
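
One common way to account for this imbalance is inverse-frequency class weighting, sketched below. The formula n_samples / (n_classes * class_count) is the same heuristic scikit-learn uses for class_weight="balanced". It is offered here as one option, not as the report's prescribed method, and `balanced_class_weights` is an illustrative name.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# Stand-in labels matching the table: 43 'N' and 7 'Y'.
labels = ["N"] * 43 + ["Y"] * 7
weights = balanced_class_weights(labels)
print({label: round(w, 3) for label, w in weights.items()})
```

Each class ends up contributing the same total weight (25 of 50 here), so the 7 'Y' rows are not drowned out by the 43 'N' rows. With only 7 positives, any estimate built on them will still be noisy.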

MapCreditURL categorical metadata

MapCreditURL appears to be a metadata field intended to hold a URL crediting a map's source, but every one of the 50 rows is an empty string. Cardinality is 1, entropy is 0, and the top value (empty) accounts for 100% of records, so the column carries no information.

Treatment: Drop; the column is constant (all empty strings).

anthropic:claude-opus-4-7 · confidence high
Out[318]:

saturn.columns["MapCreditURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 107.
Top values for MapCreditURL.
Top values for MapCreditURL (1 unique shown, of 1 total).
value count share
(empty string) 50 100.0%

MapCopyright categorical metadata

MapCopyright is a categorical column that holds the single value "N" across all 50 rows, giving it zero entropy and a top_rate of 1.0. With cardinality of 1 and no nulls, it carries no information for any downstream model or comparison.

Treatment: Drop; constant column with a single value.

anthropic:claude-opus-4-7 · confidence high
Out[321]:

saturn.columns["MapCopyright"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value N
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 108.
Top values for MapCopyright.
Top values for MapCopyright (1 unique shown, of 1 total).
value count share
N 50 100.0%

MapCCVersionURL categorical metadata

This column appears to be a metadata field intended to hold a Creative Commons version URL for a map, but every one of the 50 rows contains an empty string. Cardinality is 1 with entropy of 0.0, so it carries no information whatsoever.

Treatment: Drop the column; it is constant (empty) across all rows.

anthropic:claude-opus-4-7 · confidence high
Out[324]:

saturn.columns["MapCCVersionURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 109.
Top values for MapCCVersionURL.
Top values for MapCCVersionURL (1 unique shown, of 1 total).
value count share
(empty string) 50 100.0%

PeopleGroupURL categorical identifier

This column holds Joshua Project people-group URLs, with each row pointing to a /people_groups/{id}/{country} path. All 50 rows are unique (cardinality 50, entropy_ratio 1.0, top_rate 0.02) and there are no nulls, so it functions as a per-row identifier rather than a categorical feature. The URL stems repeat (e.g. 10375 appears across TZ, UP, AG, AS, AU, BA), suggesting the same people group is tracked across multiple country codes.

Treatment: Drop from modelling; retain as a row-level link or parse out the group id and country code into separate keys.

anthropic:claude-opus-4-7 · confidence high
Out[327]:

saturn.columns["PeopleGroupURL"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://joshuaproject.net/people_groups/10208/NG
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail 50 singleton categories
Fig 110.
Top values for PeopleGroupURL.
Top values for PeopleGroupURL (20 unique shown, of 50 total).
value count share
https://joshuaproject.net/people_groups/10208/NG 1 2.0%
https://joshuaproject.net/people_groups/10301/SU 1 2.0%
https://joshuaproject.net/people_groups/10375/TZ 1 2.0%
https://joshuaproject.net/people_groups/10375/UP 1 2.0%
https://joshuaproject.net/people_groups/10376/AE 1 2.0%
https://joshuaproject.net/people_groups/10376/CA 1 2.0%
https://joshuaproject.net/people_groups/10375/AG 1 2.0%
https://joshuaproject.net/people_groups/10375/AS 1 2.0%
https://joshuaproject.net/people_groups/10375/AU 1 2.0%
https://joshuaproject.net/people_groups/10375/BA 1 2.0%
https://joshuaproject.net/people_groups/10375/BR 1 2.0%
https://joshuaproject.net/people_groups/10375/BU 1 2.0%
https://joshuaproject.net/people_groups/10375/CA 1 2.0%
https://joshuaproject.net/people_groups/10375/CE 1 2.0%
https://joshuaproject.net/people_groups/10375/CG 1 2.0%
https://joshuaproject.net/people_groups/10375/CU 1 2.0%
https://joshuaproject.net/people_groups/10375/EG 1 2.0%
https://joshuaproject.net/people_groups/10375/EI 1 2.0%
https://joshuaproject.net/people_groups/10375/GB 1 2.0%
https://joshuaproject.net/people_groups/10375/HA 1 2.0%
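
Parsing the group id and country code out of each URL, as the treatment suggests, might look like the sketch below. It assumes every value follows the /people_groups/{id}/{country} shape seen in the table. `split_people_group_url` is an illustrative helper, and the three URLs are taken from the rows above.

```python
from urllib.parse import urlparse

def split_people_group_url(url):
    """Return (group_id, country_code) from a /people_groups/{id}/{country} URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "people_groups":
        raise ValueError(f"unexpected URL shape: {url}")
    return parts[1], parts[2]

urls = [
    "https://joshuaproject.net/people_groups/10208/NG",
    "https://joshuaproject.net/people_groups/10375/TZ",
    "https://joshuaproject.net/people_groups/10375/UP",
]
keys = [split_people_group_url(u) for u in urls]
print(keys)
```

The group id is kept as a string so that any leading zeros or non-numeric ids survive. Raising on an unexpected shape surfaces malformed rows instead of silently producing wrong keys.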

How to cite

BibTeX
@misc{saturn-archive-api-data-sample-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: archive api data sample},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/archive-api_data_sample}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: archive api data sample. Source: /home/coolhand/html/datavis/data_trove/joshua-project/archive/api_data_sample.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/archive-api_data_sample