Summary confidence: high
This dataset catalogues 19,401 world languages, each identified by a unique Glottocode and name, with attributes including geographic coordinates, macroarea, language family, ISO code, and phoneme count. Two things stand out for closer inspection. First, phoneme_count is missing for 88.8% of rows (leaving roughly 2,170 populated) and is heavily right-skewed (mean ~38, max 231), so any analysis of phonological inventories will rest on a small subsample with notable outliers. Second, latitude and longitude are null for 59.1% of rows, which will limit mapping coverage. On the categorical side, macroarea spans six regions, with Africa the largest at 32%, while the status column is uninformative because every language is labelled 'living'.
citing: phoneme_count · latitude · longitude · macroarea · status · name · glottocode
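
The checks behind this summary (missing-value share, a right-skew hint, constant columns) can be reproduced with a few lines of pandas. The sketch below is illustrative only: the helper name `profile` and the six-row toy frame, including its placeholder glottocode values, are assumptions made for the example and are not taken from the dataset. Only the column names follow the ones cited above.

```python
import pandas as pd


def profile(df: pd.DataFrame, numeric_col: str) -> dict:
    """Return simple data-quality indicators for a DataFrame (hypothetical helper)."""
    # Percentage of null values per column, rounded to one decimal place.
    missing_pct = (df.isna().mean() * 100).round(1).to_dict()

    # Columns with at most one distinct non-null value carry no information.
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]

    # A mean well above the median is a rough hint of right skew.
    values = df[numeric_col].dropna()
    right_skew_hint = bool(values.mean() > values.median())

    return {
        "missing_pct": missing_pct,
        "constant_cols": constant_cols,
        "right_skew_hint": right_skew_hint,
    }


# Toy data: every value here is made up for illustration.
toy = pd.DataFrame(
    {
        "glottocode": ["xxxx0001", "xxxx0002", "xxxx0003",
                       "xxxx0004", "xxxx0005", "xxxx0006"],
        "macroarea": ["Africa", "Africa", "Eurasia",
                      "Papunesia", "Australia", "Africa"],
        "status": ["living"] * 6,
        "phoneme_count": [20.0, None, 30.0, None, None, 150.0],
        "latitude": [1.5, None, -3.0, 10.0, 42.0, 7.25],
    }
)

report = profile(toy, numeric_col="phoneme_count")
print(report)
```

Run against the real table, the same helper should surface phoneme_count, latitude and longitude as the sparsest columns and status as constant, assuming nulls are stored as NaN rather than as empty strings or sentinel values.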