Summary confidence: high
This is the Joshua Project people-groups dataset: 16,382 rows and 109 columns describing ethnic groups by country, with demographics, language, religion mix, and Christian-engagement indicators. Most columns are categorical or text; Continent, AffinityBloc, PrimaryReligion, and the JPScale 'reachedness' rating give the cleanest first read on who is in the file. Population is extremely long-tailed (median 20,000, max ~913M, skewness ~91), so the mean is dominated by a handful of very large groups and any size analysis should use logs or quantiles instead. Religion-share columns like PCIslam, PCHinduism, and PCBuddhism are mostly zero, with a minority of groups at very high percentages, which suggests that most groups have a single dominant religion. Watch out for columns that are almost entirely null (RLG4 96%, NomadicTypeDescription 98%, PrimaryLanguageDialect 92%), for NTOnline, which is missing in a more moderate 29% of rows, and for the many near-duplicate URL/ID fields that won't add analytic value.
citing: Continent · AffinityBloc · PrimaryReligion · JPScaleText · Population · PCIslam · PCHinduism · RegionName · LeastReached · Frontier
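
The Population advice above (logs or quantiles rather than means) can be illustrated with a minimal sketch. The values below are hypothetical, chosen only to mimic a long right tail that reaches the reported maximum; they are not rows from the dataset, and `size_band` is an illustrative helper, not part of any library.

```python
import math
import statistics

# Hypothetical population values with a long right tail (NOT taken from the dataset).
populations = [300, 1_200, 5_000, 20_000, 20_000, 85_000,
               400_000, 2_500_000, 913_000_000]

# The mean is pulled far above the typical group by the single largest value.
mean_pop = statistics.mean(populations)
median_pop = statistics.median(populations)

# A log10 transform compresses the tail so the largest group no longer dominates.
log_pops = [math.log10(p) for p in populations]

# Quartile cut points; method="inclusive" treats the list as the full population.
q1, q2, q3 = statistics.quantiles(populations, n=4, method="inclusive")


def size_band(p):
    """Assign a quartile-based size band (illustrative helper)."""
    if p <= q1:
        return "Q1"
    if p <= q2:
        return "Q2"
    if p <= q3:
        return "Q3"
    return "Q4"


bands = [size_band(p) for p in populations]

print(f"mean={mean_pop:,.0f}  median={median_pop:,.0f}")
print("log10 range:", round(min(log_pops), 2), "to", round(max(log_pops), 2))
print("bands:", bands)
```

With real data the same idea would apply to the Population column directly. Quantile bands are robust to the extreme maximum, while the log scale keeps the ordering and relative spacing of group sizes.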