Summary confidence: high
This dataset is a Glottolog languoid catalog with 23,740 rows and 16 columns describing languages, dialects, and families, along with geographic and endangerment metadata. The `level` field splits the records into three classes: dialect (10,920), language (8,481), and family (4,339). These sum to all 23,740 rows, which makes `level` the natural primary lens. Endangerment `status` is dominated by 'safe' (~79.9%), but the remaining ~20% of records, several thousand languoids, fall into categories ranging from vulnerable to extinct and are worth investigating. The most frequent `country_ids` values are PG (874), ID (695), and NG (480). `family_id` is heavily skewed toward atla1278 (4,663) and aust1307 (3,850), which together account for about 36% of all rows. Note that `iso639P3code`, `latitude`, and `longitude` are each ~66% null, so spatial analysis will cover only about a third of rows (roughly 8,000).
citing: level · status · country_ids · family_id · latitude · longitude · iso639P3code · child_dialect_count · bookkeeping
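To re-derive the headline figures from the raw table, a minimal sketch like the one below could be used. It assumes the catalog is a CSV export readable with `csv.DictReader`, that the column names match those cited above, and that missing values appear as empty strings. The function names `load`, `profile`, and `null_fraction` are hypothetical helpers, not part of any Glottolog tooling.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def load(path: str) -> List[Dict[str, str]]:
    """Read a CSV export into a list of row dicts (column name -> string)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def null_fraction(rows: List[Mapping[str, str]], column: str) -> float:
    """Fraction of rows where `column` is missing or an empty string."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if not r.get(column)) / len(rows)


def profile(rows: Iterable[Mapping[str, str]]) -> Dict[str, object]:
    """Recompute level breakdown, non-safe status share, and coordinate coverage."""
    rows = list(rows)
    n = len(rows)
    level_counts = Counter(r["level"] for r in rows)
    # Assumption: an empty status means "not recorded", so it is excluded
    # from the denominator rather than counted as non-safe.
    rated = [r["status"] for r in rows if r.get("status")]
    status_counts = Counter(rated)
    non_safe = len(rated) - status_counts.get("safe", 0)
    # A row is mappable only if both coordinates are present.
    mappable = sum(1 for r in rows if r.get("latitude") and r.get("longitude"))
    return {
        "n_rows": n,
        "level_counts": dict(level_counts),
        "non_safe_rows": non_safe,
        "non_safe_share": non_safe / len(rated) if rated else 0.0,
        "geo_coverage": mappable / n if n else 0.0,
    }
```

Usage would be `profile(load("languoid.csv"))`, where the file name is a placeholder. On the full table, the `level_counts` values should sum to 23,740 (10,920 + 8,481 + 4,339), and `geo_coverage` should land near one third if the ~66% null rate holds for both coordinates on the same rows.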