Summary confidence: high
This dataset contains 76,475 rows of linguistic feature observations, all sourced from WALS (World Atlas of Language Structures). Each row pairs a language (2,659 unique language IDs) with one of 192 typological features (e.g., 'Order of Object and Verb') and a categorical value encoded as both a short code (value_name) and a small integer (value). The most common features cluster around word order and negation, with 'Order of Object and Verb' being the top feature at 1,518 rows. Worth a closer look: the `value` column is highly skewed (skew 3.49, kurtosis 16.4) with ~3.2% outliers reaching up to 28, suggesting most features have only a few possible values but a handful have many categories. The `source` column is constant ('WALS') and can be ignored as a variable.
citing: row_count · column_count · columns.value_name.n_unique · columns.value_name.top_values · columns.language_id.n_unique · columns.language_id.top_values · columns.source.top_value · columns.source.top_rate · columns.value.skew · columns.value.kurtosis · columns.value.outlier_rate · columns.value.max · columns.feature_id.n_unique · columns.feature_name.top_values · columns.feature_name.top_rate