This dataset is a catalogue of the world's languoids from Glottolog, covering 23,740 entries: 10,920 dialects, 8,481 languages, and 4,339 language families. The most striking pattern is in endangerment status: 18,965 entries (79.9%) are marked 'safe', while the remaining 4,775 are vulnerable, endangered to some degree, or extinct, which is worth examining against family and geographic distribution. A second area of interest is the highly skewed child-count columns: most languoids have no children (74% have zero child dialects and 82% have zero child languages), but a handful of nodes have hundreds or even thousands of descendants, indicating a very uneven tree. Geographic coverage is also incomplete: latitude and longitude are missing for 66.5% of rows, which limits spatial analysis to roughly a third of the data.
saturn
/home/coolhand/html/datavis/data_trove/data/linguistic/glottolog_languoid.csv 23,740 rows sample n=23,740 seed 42 2026-06-22T00:19:32+00:00
Overview
| field | value |
|---|---|
| Source | /home/coolhand/html/datavis/data_trove/data/linguistic/glottolog_languoid.csv |
| Total rows | 23,740 |
| Profiled sample | 23,740 |
| Columns | 16 |
| Generated | 2026-06-22T00:19:32+00:00 |
| column | kind | null % |
|---|---|---|
| id | text | 0.0% |
| family_id | categorical | 1.8% |
| parent_id | text | 1.8% |
| name | text | 0.0% |
| bookkeeping | categorical | 0.0% |
| level | categorical | 0.0% |
| status | categorical | 0.0% |
| latitude | numeric | 66.5% |
| longitude | numeric | 66.5% |
| iso639P3code | text | 66.4% |
| description | unknown | 0.0% |
| markup_description | unknown | 0.0% |
| child_family_count | numeric | 0.0% |
| child_language_count | numeric | 0.0% |
| child_dialect_count | numeric | 0.0% |
| country_ids | categorical | 64.2% |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.
The iso639P3code column contains ISO 639-3 language codes: standardised three-letter identifiers for individual languages (e.g., 'aiz', 'kbt', 'mij'). Every non-null value is exactly 3 characters long (min=3, max=3, mean=3.0) and appears only once, giving 7,968 unique codes with zero duplicates. The main concern is the 66.44% null rate: two-thirds of the 23,740 rows carry no code. A likely explanation is that ISO 639-3 codes are assigned to languages, whereas dialect and family rows (15,259 of 23,740, about 64%) generally have none; the 7,968 codes would then cover most of the 8,481 language-level rows.
The parent_id column contains the Glottolog identifier (glottocode) of each row's parent node: fixed 8-character alphanumeric codes (e.g. 'book1242', 'pidg1258') that link the row to the node directly above it in the classification tree. Every value is exactly 8 characters long (len_min=8, len_max=8) and is a single token (one_word_rate=1.0), consistent with the glottocode format. The duplicate rate is high at 68.5%, with 'book1242' appearing 399 times, because many children share the same parent, as expected in a hierarchical classification. The null rate is low at 1.81% (429 rows); these rows have no parent recorded and are presumably top-level nodes.
The id column is the languoid identifier, following the Glottolog 8-character code pattern (4 letters + 4 digits, e.g. 'tibe1272', 'east2553'). All 23,740 values are exactly 8 characters long (len_min, len_max, mean, and median all equal 8), with zero nulls and zero duplicates, so the column can serve as a primary key. The top_words sample matches Glottolog languoid codes. No anomalies were found; the column is unusually clean.
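The format and uniqueness claims for the id column can be checked mechanically. The sketch below is a minimal illustration and not part of saturn: it tests the ten sample ids listed in this report against the "4 letters + 4 digits" pattern stated in the insight and confirms that they are distinct. The names GLOTTOCODE and is_glottocode are made up for this example, and the pattern may need loosening if the full file contains ids with digits among the first four characters.

```python
import re

# Pattern as described in the insight: 4 lowercase letters followed by 4 digits.
# This is an assumption for illustration, not an official definition.
GLOTTOCODE = re.compile(r"[a-z]{4}[0-9]{4}")

def is_glottocode(value):
    """Return True if the whole string matches the assumed id pattern."""
    return GLOTTOCODE.fullmatch(value) is not None

# The ten sample values shown for the id column in this report.
SAMPLE_IDS = [
    "abbe1238", "sanm1298", "thur1255", "suar1238", "kukn1238",
    "yagu1244", "labu1252", "uist1237", "suku1258", "arak1251",
]

ALL_VALID = all(is_glottocode(v) for v in SAMPLE_IDS)
ALL_UNIQUE = len(set(SAMPLE_IDS)) == len(SAMPLE_IDS)
print(ALL_VALID, ALL_UNIQUE)
```

Running the same two checks over the whole column would confirm or refute the primary-key reading for the full 23,740 rows.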
The name column contains languoid names, that is, names of languages, dialects, and family-level groupings, as suggested by top words such as 'nuclear', 'central', 'western', 'eastern', 'northern', 'southern', 'language', and 'sign'. All 23,740 values are unique, with zero nulls, so the name can serve as a human-readable identifier. 69.5% of values are single words; the mean length is ~9.95 characters and the longest name has 58 characters, so the column mixes short atomic names with multi-word descriptors. The vocabulary holds 17,915 distinct tokens.
The child_dialect_count column counts the dialects below each languoid in the classification tree. The distribution is extreme: 74.4% of rows have zero child dialects, the median is 0, and the quartiles are 0 and 1, yet the mean is 3.39 and the maximum reaches 2,369, giving a skew of 42.2 and a kurtosis of 2159.3. Nearly 18% of rows (4,272) are flagged as outliers: a small number of nodes act as hubs with enormous fan-out, while the vast majority are leaf nodes.
The child_family_count column counts the family-level nodes (subfamilies) below each languoid in the classification tree. 90.8% of values are exactly zero and the median is 0.0, so the vast majority of languoids have no subfamilies beneath them. The distribution is extremely right-skewed (skew = 44.4, kurtosis = 2352.9), with a maximum of 859 and 2,179 outliers (9.2% of rows). Large counts are expected for the biggest families and are not in themselves evidence of data-entry errors: the column correlates at +0.96 with child_language_count, so the nodes that contain many languages are also the ones that contain many subfamilies.
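How a few nodes end up with very large child counts is easiest to see on a toy tree. The sketch below assumes that the three child-count columns count all descendants of a given level (not only direct children) and that they can be derived from id, parent_id, and level; the report does not state either point, so treat both as assumptions. The ids in ROWS and the function name descendant_counts are invented for illustration.

```python
# Toy rows of (id, parent_id, level); the ids are made up, not real glottocodes.
ROWS = [
    ("fam00001", None, "family"),
    ("sub00001", "fam00001", "family"),
    ("lan00001", "sub00001", "language"),
    ("lan00002", "sub00001", "language"),
    ("lan00003", "fam00001", "language"),
    ("dia00001", "lan00001", "dialect"),
    ("dia00002", "lan00001", "dialect"),
    ("iso00001", None, "language"),  # an isolate: no parent and no children
]

def descendant_counts(rows):
    """Return {id: {"family": n, "language": n, "dialect": n}} over all descendants."""
    parent = {rid: pid for rid, pid, _ in rows}
    counts = {rid: {"family": 0, "language": 0, "dialect": 0} for rid, _, _ in rows}
    for rid, _, level in rows:
        # Walk up the parent chain and credit every ancestor with this node.
        ancestor = parent[rid]
        seen = set()  # guards against accidental cycles in the input
        while ancestor in counts and ancestor not in seen:
            seen.add(ancestor)
            counts[ancestor][level] += 1
            ancestor = parent[ancestor]
    return counts

COUNTS = descendant_counts(ROWS)
print(COUNTS["fam00001"])
```

Under this reading every node is counted once for each of its ancestors, so the root of a large family accumulates the whole subtree while leaves stay at zero. That is the zero-heavy, long-tailed shape the profile reports.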
The child_language_count column counts the languages below each languoid in the classification tree. The distribution is extremely right-skewed (skew = 41.86, kurtosis = 2115.08): 81.7% of rows have a value of zero, the median is 0.0, and the IQR is 0.0, yet the mean is ~2.0 and the maximum reaches 1,435. A small number of high-level nodes therefore hold very large numbers of languages, while most entries have none. Just over 18% of rows (4,339) are flagged as outliers. This equals the number of family-level rows and the share of non-zero values, which suggests that every row with a non-zero count is being flagged.
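The outlier share for child_language_count is what a Tukey fence would produce on a zero-heavy column. saturn's actual outlier rule is not stated in this report, so the 1.5 × IQR rule used below is an assumption; it is chosen because it fits the numbers (an IQR of 0.0 puts both fences at zero, so every non-zero row is flagged). The function name tukey_outliers and the TOY data are invented for illustration.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]

# Toy column: 80 zeros and a long tail, loosely shaped like a child-count column.
TOY = [0] * 80 + [1] * 8 + [2] * 5 + [5, 9, 14, 30, 75, 140, 600]

LOWER, UPPER, OUTLIERS = tukey_outliers(TOY)
print(LOWER, UPPER, len(OUTLIERS))
```

With more than 75% zeros, both quartiles are zero and the "outliers" are simply the non-zero rows. On such a column the outlier flag measures sparsity more than it measures anomalous values.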
The latitude column contains geographic latitudes, spanning from -55.27° (e.g. southern South America) to 73.14° (Arctic latitudes), consistent with global coverage. The main issue is the 66.54% null rate: two-thirds of rows lack a latitude, which severely limits spatial analysis. The distribution is mildly right-skewed (skew 0.54) with a mean of ~8.2° and a median of ~6.3°, reflecting a concentration of records in tropical and equatorial regions. Only 129 outliers are flagged (1.6% of non-null values), so the non-null values appear geographically plausible.
The longitude column contains geographic longitudes, covering nearly the full global range from -178.785° to 179.306°. The null rate of 66.54% matches that of latitude: two-thirds of records have no coordinates. The IQR of ~117° and standard deviation of ~81° confirm that values are spread broadly across both hemispheres. The mean (51.27°) and median (47.72°) lie in the Eastern Hemisphere, but they should not be read as the location of a cluster: the histogram is multimodal, with its largest bins at roughly 9° to 18°E and 134° to 152°E, and comparatively few records between 45° and 63°E. Only 13 outliers are flagged despite the wide range, consistent with legitimate global coordinates rather than data-entry errors.
The bookkeeping column is a boolean flag (stored as the strings 'True'/'False'). It appears to mark languoids that Glottolog keeps only for bookkeeping purposes, such as retired entries; consistent with this, the 399 'True' rows match the 399 rows whose family_id is 'book1242'. The distribution is severely imbalanced: 98.3% of rows are 'False' (23,341) versus 1.7% 'True' (399), which triggered the imbalance alert. The entropy of 0.123 reflects how rarely the flag is set. Analysts should handle the 'True' rows separately, since they may need to be excluded from counts of languages.
The country_ids column contains ISO 3166-1 alpha-2 country codes for each record. The main issue is the 64.24% null rate: nearly two-thirds of the 23,740 rows carry no country information. Among populated rows, Papua New Guinea ('PG') leads at 10.29% of non-null values, followed by Indonesia ('ID') and Nigeria ('NG'); this ordering is consistent with the high linguistic diversity of these countries, although collection bias cannot be ruled out from the profile alone. The count of 680 unique values is far higher than the roughly 250 two-letter codes that exist, which, together with the plural column name, suggests that some cells hold several codes in one string (for example a delimiter-separated list) and need to be split before counting.
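If some country_ids cells do hold several codes, the number of distinct cell values will exceed the number of distinct codes. The sketch below shows how to tell the two apart. The space delimiter, the CELLS values, and the function name count_codes are assumptions made for illustration; check the delimiter against the real file before relying on it.

```python
from collections import Counter

# Hypothetical cell values; None stands for a null cell.
CELLS = ["PG", "ID", None, "CM NG", "PG", "ID PG", None, "NG"]

def count_codes(cells, sep=" "):
    """Return (distinct non-null cell values, per-code counts after splitting)."""
    distinct_cells = {c for c in cells if c}
    code_counts = Counter(
        code
        for c in cells if c
        for code in c.split(sep) if code
    )
    return distinct_cells, code_counts

DISTINCT_CELLS, CODE_COUNTS = count_codes(CELLS)
print(len(DISTINCT_CELLS), len(CODE_COUNTS), CODE_COUNTS.most_common(2))
```

In the toy data there are 5 distinct cell values but only 4 distinct codes. The same comparison on the real column would show whether the 680 unique values collapse to a plausible number of country codes.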
The description column has zero null values across 23,740 rows, but saturn skipped its profile, so no uniqueness, length, or token statistics are available and the column's nature (short labels vs. long free text) cannot be determined from the evidence. The 'skipped' alert and the missing statistics suggest that the profiler either timed out or excludes this column kind (listed as 'unknown') from deep analysis.
The markup_description column has zero nulls across 23,740 rows. The profiler skipped detailed analysis, as indicated by the 'skipped' alert, so no uniqueness, length, or content statistics are available. The name implies free-form or templated descriptive text with embedded formatting syntax (HTML, Markdown, or similar), but without further inspection the content type and cardinality are unknown.
The level column classifies each languoid into one of three hierarchical levels, 'dialect', 'language', or 'family', following Glottolog's taxonomy. With only 3 unique values across 23,740 rows and zero nulls, it is a clean, fully populated categorical field. 'dialect' is the modal class at 46.0% (10,920), followed by 'language' at 35.7% (8,481) and 'family' at 18.3% (4,339), so the class imbalance is moderate. The entropy ratio of 0.943 (observed entropy divided by the maximum possible for 3 classes) agrees: the distribution is fairly even, though not uniform.
The status column is an endangerment classification for languoids, with 6 ordered categories: 'safe', 'vulnerable', 'definitely endangered', 'severely endangered', 'critically endangered', and 'extinct'. The distribution is heavily skewed: 'safe' accounts for 79.9% of records (18,965 of 23,740), while the other five categories together account for 20.1% (4,775). The low entropy ratio (0.445) confirms this imbalance, and analysts building classification models on this column should expect a significant class-imbalance problem.
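The entropy figures quoted for the categorical columns can be recomputed from the value counts in this report. The sketch below assumes that "entropy ratio" means Shannon entropy divided by the logarithm of the number of classes; saturn's exact definition is not given here, but this reading matches the reported 0.943 for level, 0.445 for status, and 0.123 for bookkeeping to three decimals. The function name entropy_ratio is made up for the example.

```python
import math

def entropy_ratio(counts):
    """Shannon entropy of the counts divided by its maximum, log(k) for k classes."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))

# Value counts copied from the level, status, and bookkeeping tables in this report.
LEVEL = [10920, 8481, 4339]
STATUS = [18965, 1814, 1194, 889, 465, 413]
BOOKKEEPING = [23341, 399]

for name, counts in [("level", LEVEL), ("status", STATUS), ("bookkeeping", BOOKKEEPING)]:
    print(name, round(entropy_ratio(counts), 3))
```

A ratio of 1.0 would mean perfectly equal class sizes and 0.0 a single class, so 0.943 describes a fairly even split and 0.445 a column dominated by one value.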
The family_id column holds the identifier of the top-level language family each row belongs to, using Glottolog codes (e.g., 'atla1278' = Atlantic-Congo, 'aust1307' = Austronesian, 'indo1319' = Indo-European). With only 287 unique values across 23,740 rows, it acts as a low-to-medium cardinality grouping key. The top value 'atla1278' accounts for 20.0% of non-null values (4,663 records, or 19.6% of all rows), and the top two families together account for roughly 36% of the dataset, so the distribution is heavily skewed toward the large Atlantic-Congo and Austronesian families.
Numeric correlation
| | latitude | longitude | child_family_count | child_language_count | child_dialect_count |
|---|---|---|---|---|---|
| latitude | +1.00 | -0.31 | +0.03 | +0.01 | +0.06 |
| longitude | -0.31 | +1.00 | -0.03 | -0.04 | -0.05 |
| child_family_count | +0.03 | -0.03 | +1.00 | +0.96 | +0.74 |
| child_language_count | +0.01 | -0.04 | +0.96 | +1.00 | +0.69 |
| child_dialect_count | +0.06 | -0.05 | +0.74 | +0.69 | +1.00 |
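The report does not say which coefficient the matrix uses or how nulls are handled. The sketch below assumes the common default: Pearson's r computed over the rows where both columns are present. That choice matters for latitude and longitude, which are null in about two-thirds of rows. The function name pearson and the toy columns are invented for illustration and are not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson r over rows where both values are present (None marks a null)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

# Toy columns: the second grows with the first; the last row is dropped
# because its first value is missing.
FAMILIES = [0, 0, 1, 3, 10, None]
LANGUAGES = [0, 1, 4, 9, 31, 2]

R = pearson(FAMILIES, LANGUAGES)
print(round(R, 3))
```

Because Pearson's r is sensitive to extreme values, the high coefficients among the three child-count columns are probably driven largely by the few very large nodes. A rank-based coefficient would be a useful cross-check.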
id text
| chars | count |
|---|---|
| 8 – 8 | 23740 |
Sample values (first 10)
- abbe1238
- sanm1298
- thur1255
- suar1238
- kukn1238
- yagu1244
- labu1252
- uist1237
- suku1258
- arak1251
family_id categorical
Top values (rank 1–20)
| value | count | share |
|---|---|---|
| atla1278 | 4663 | 19.6% |
| aust1307 | 3850 | 16.2% |
| indo1319 | 2201 | 9.3% |
| sino1245 | 1666 | 7.0% |
| afro1255 | 1259 | 5.3% |
| nucl1709 | 762 | 3.2% |
| pama1250 | 598 | 2.5% |
| aust1305 | 503 | 2.1% |
| book1242 | 399 | 1.7% |
| otom1299 | 338 | 1.4% |
| mand1469 | 303 | 1.3% |
| sign1238 | 259 | 1.1% |
| drav1251 | 255 | 1.1% |
| cent2225 | 251 | 1.1% |
| turk1311 | 229 | 1.0% |
| taik1256 | 223 | 0.9% |
| nilo1247 | 201 | 0.8% |
| ural1272 | 185 | 0.8% |
| japo1237 | 179 | 0.8% |
| tupi1275 | 157 | 0.7% |
parent_id text
| chars | count |
|---|---|
| 8 – 8 | 23311 |
Sample values (first 10)
- abee1242
- cent2144
- yuag1237
- mnon1258
- pama1253
- raic1241
- kenh1234
- uygh1240
- taih1244
- book1242
name text
| chars | count |
|---|---|
| 1 – 2 | 54 |
| 2 – 4 | 496 |
| 4 – 5 | 4648 |
| 5 – 7 | 3145 |
| 7 – 8 | 4510 |
| 8 – 10 | 1373 |
| 10 – 11 | 1069 |
| 11 – 12 | 1997 |
| 12 – 14 | 1023 |
| 14 – 15 | 1768 |
| 15 – 17 | 644 |
| 17 – 18 | 848 |
| 18 – 20 | 340 |
| 20 – 21 | 281 |
| 21 – 22 | 518 |
| 22 – 24 | 224 |
| 24 – 25 | 298 |
| 25 – 27 | 101 |
| 27 – 28 | 140 |
| 28 – 30 | 46 |
| 30 – 31 | 44 |
| 31 – 32 | 69 |
| 32 – 34 | 24 |
| 34 – 35 | 34 |
| 35 – 37 | 12 |
| 37 – 38 | 9 |
| 38 – 39 | 4 |
| 39 – 41 | 2 |
| 41 – 42 | 3 |
| 42 – 44 | 3 |
| 44 – 45 | 5 |
| 45 – 47 | 1 |
| 47 – 48 | 2 |
| 48 – 49 | 1 |
| 49 – 51 | 0 |
| 51 – 52 | 2 |
| 52 – 54 | 0 |
| 54 – 55 | 0 |
| 55 – 57 | 0 |
| 57 – 58 | 2 |
Sample values (first 10)
- Abbey-Ve
- San Martín Itunyoso Triqui
- Thuri
- Asabano
- Kukna
- Yagua
- Labuan
- Uis Tasae
- Sukurase
- Araki (Iran)
bookkeeping categorical
Top values (rank 1–20)
| value | count | share |
|---|---|---|
| False | 23341 | 98.3% |
| True | 399 | 1.7% |
level categorical
Top values (rank 1–20)
| value | count | share |
|---|---|---|
| dialect | 10920 | 46.0% |
| language | 8481 | 35.7% |
| family | 4339 | 18.3% |
status categorical
Top values (rank 1–20)
| value | count | share |
|---|---|---|
| safe | 18965 | 79.9% |
| definitely endangered | 1814 | 7.6% |
| vulnerable | 1194 | 5.0% |
| extinct | 889 | 3.7% |
| critically endangered | 465 | 2.0% |
| severely endangered | 413 | 1.7% |
latitude numeric
| bin | count |
|---|---|
| -55.27 – -52.06 | 5 |
| -52.06 – -48.85 | 1 |
| -48.85 – -45.64 | 1 |
| -45.64 – -42.43 | 4 |
| -42.43 – -39.22 | 7 |
| -39.22 – -36.01 | 16 |
| -36.01 – -32.8 | 29 |
| -32.8 – -29.59 | 26 |
| -29.59 – -26.38 | 48 |
| -26.38 – -23.17 | 78 |
| -23.17 – -19.96 | 125 |
| -19.96 – -16.75 | 141 |
| -16.75 – -13.54 | 281 |
| -13.54 – -10.33 | 256 |
| -10.33 – -7.121 | 495 |
| -7.121 – -3.911 | 788 |
| -3.911 – -0.7005 | 681 |
| -0.7005 – 2.51 | 379 |
| 2.51 – 5.72 | 469 |
| 5.72 – 8.93 | 664 |
| 8.93 – 12.14 | 710 |
| 12.14 – 15.35 | 303 |
| 15.35 – 18.56 | 387 |
| 18.56 – 21.77 | 233 |
| 21.77 – 24.98 | 318 |
| 24.98 – 28.19 | 373 |
| 28.19 – 31.4 | 167 |
| 31.4 – 34.61 | 144 |
| 34.61 – 37.82 | 179 |
| 37.82 – 41.03 | 113 |
| 41.03 – 44.24 | 138 |
| 44.24 – 47.45 | 79 |
| 47.45 – 50.66 | 78 |
| 50.66 – 53.87 | 76 |
| 53.87 – 57.08 | 46 |
| 57.08 – 60.29 | 21 |
| 60.29 – 63.5 | 41 |
| 63.5 – 66.71 | 23 |
| 66.71 – 69.93 | 14 |
| 69.93 – 73.14 | 6 |
longitude numeric
| bin | count |
|---|---|
| -178.8 – -169.8 | 13 |
| -169.8 – -160.9 | 4 |
| -160.9 – -151.9 | 10 |
| -151.9 – -143 | 11 |
| -143 – -134 | 10 |
| -134 – -125.1 | 17 |
| -125.1 – -116.1 | 124 |
| -116.1 – -107.2 | 47 |
| -107.2 – -98.21 | 78 |
| -98.21 – -89.26 | 280 |
| -89.26 – -80.31 | 59 |
| -80.31 – -71.36 | 235 |
| -71.36 – -62.41 | 218 |
| -62.41 – -53.45 | 150 |
| -53.45 – -44.5 | 60 |
| -44.5 – -35.55 | 40 |
| -35.55 – -26.6 | 0 |
| -26.6 – -17.64 | 4 |
| -17.64 – -8.692 | 105 |
| -8.692 – 0.2605 | 275 |
| 0.2605 – 9.213 | 444 |
| 9.213 – 18.17 | 751 |
| 18.17 – 27.12 | 322 |
| 27.12 – 36.07 | 430 |
| 36.07 – 45.02 | 228 |
| 45.02 – 53.97 | 126 |
| 53.97 – 62.93 | 35 |
| 62.93 – 71.88 | 80 |
| 71.88 – 80.83 | 210 |
| 80.83 – 89.78 | 208 |
| 89.78 – 98.74 | 269 |
| 98.74 – 107.7 | 457 |
| 107.7 – 116.6 | 239 |
| 116.6 – 125.6 | 502 |
| 125.6 – 134.5 | 316 |
| 134.5 – 143.5 | 598 |
| 143.5 – 152.4 | 667 |
| 152.4 – 161.4 | 123 |
| 161.4 – 170.4 | 186 |
| 170.4 – 179.3 | 12 |
iso639P3code text
| chars | count |
|---|---|
| 3 – 3 | 7968 |
Sample values (first 10)
- aau
- mat
- twx
- sui
- ktr
- ygw
- key
- tui
- ssk
- agg
description unknown
markup_description unknown
child_family_count numeric
| bin | count |
|---|---|
| 0 – 21.48 | 23588 |
| 21.48 – 42.95 | 93 |
| 42.95 – 64.43 | 27 |
| 64.43 – 85.9 | 6 |
| 85.9 – 107.4 | 4 |
| 107.4 – 128.9 | 3 |
| 128.9 – 150.3 | 2 |
| 150.3 – 171.8 | 2 |
| 171.8 – 193.3 | 1 |
| 193.3 – 214.8 | 1 |
| 214.8 – 236.2 | 0 |
| 236.2 – 257.7 | 0 |
| 257.7 – 279.2 | 1 |
| 279.2 – 300.7 | 2 |
| 300.7 – 322.1 | 1 |
| 322.1 – 343.6 | 1 |
| 343.6 – 365.1 | 0 |
| 365.1 – 386.6 | 0 |
| 386.6 – 408 | 2 |
| 408 – 429.5 | 1 |
| 429.5 – 451 | 0 |
| 451 – 472.5 | 0 |
| 472.5 – 493.9 | 0 |
| 493.9 – 515.4 | 0 |
| 515.4 – 536.9 | 0 |
| 536.9 – 558.4 | 0 |
| 558.4 – 579.8 | 1 |
| 579.8 – 601.3 | 0 |
| 601.3 – 622.8 | 0 |
| 622.8 – 644.2 | 0 |
| 644.2 – 665.7 | 0 |
| 665.7 – 687.2 | 0 |
| 687.2 – 708.7 | 2 |
| 708.7 – 730.2 | 0 |
| 730.2 – 751.6 | 0 |
| 751.6 – 773.1 | 0 |
| 773.1 – 794.6 | 0 |
| 794.6 – 816.1 | 1 |
| 816.1 – 837.5 | 0 |
| 837.5 – 859 | 1 |
child_language_count numeric
| bin | count |
|---|---|
| 0 – 35.88 | 23547 |
| 35.88 – 71.75 | 121 |
| 71.75 – 107.6 | 37 |
| 107.6 – 143.5 | 6 |
| 143.5 – 179.4 | 3 |
| 179.4 – 215.2 | 4 |
| 215.2 – 251.1 | 3 |
| 251.1 – 287 | 2 |
| 287 – 322.9 | 2 |
| 322.9 – 358.8 | 0 |
| 358.8 – 394.6 | 1 |
| 394.6 – 430.5 | 1 |
| 430.5 – 466.4 | 0 |
| 466.4 – 502.2 | 1 |
| 502.2 – 538.1 | 1 |
| 538.1 – 574 | 2 |
| 574 – 609.9 | 1 |
| 609.9 – 645.8 | 0 |
| 645.8 – 681.6 | 0 |
| 681.6 – 717.5 | 1 |
| 717.5 – 753.4 | 2 |
| 753.4 – 789.2 | 0 |
| 789.2 – 825.1 | 0 |
| 825.1 – 861 | 0 |
| 861 – 896.9 | 0 |
| 896.9 – 932.8 | 0 |
| 932.8 – 968.6 | 0 |
| 968.6 – 1004 | 1 |
| 1004 – 1040 | 0 |
| 1040 – 1076 | 0 |
| 1076 – 1112 | 0 |
| 1112 – 1148 | 0 |
| 1148 – 1184 | 0 |
| 1184 – 1220 | 0 |
| 1220 – 1256 | 0 |
| 1256 – 1292 | 2 |
| 1292 – 1327 | 0 |
| 1327 – 1363 | 0 |
| 1363 – 1399 | 1 |
| 1399 – 1435 | 1 |
child_dialect_count numeric
| bin | count |
|---|---|
| 0 – 59.23 | 23575 |
| 59.23 – 118.5 | 99 |
| 118.5 – 177.7 | 24 |
| 177.7 – 236.9 | 18 |
| 236.9 – 296.1 | 4 |
| 296.1 – 355.4 | 2 |
| 355.4 – 414.6 | 0 |
| 414.6 – 473.8 | 1 |
| 473.8 – 533 | 1 |
| 533 – 592.2 | 1 |
| 592.2 – 651.5 | 1 |
| 651.5 – 710.7 | 2 |
| 710.7 – 769.9 | 0 |
| 769.9 – 829.1 | 1 |
| 829.1 – 888.4 | 0 |
| 888.4 – 947.6 | 3 |
| 947.6 – 1007 | 0 |
| 1007 – 1066 | 0 |
| 1066 – 1125 | 1 |
| 1125 – 1184 | 1 |
| 1184 – 1244 | 0 |
| 1244 – 1303 | 0 |
| 1303 – 1362 | 1 |
| 1362 – 1421 | 0 |
| 1421 – 1481 | 0 |
| 1481 – 1540 | 0 |
| 1540 – 1599 | 1 |
| 1599 – 1658 | 0 |
| 1658 – 1718 | 0 |
| 1718 – 1777 | 0 |
| 1777 – 1836 | 1 |
| 1836 – 1895 | 1 |
| 1895 – 1954 | 0 |
| 1954 – 2014 | 0 |
| 2014 – 2073 | 0 |
| 2073 – 2132 | 0 |
| 2132 – 2191 | 0 |
| 2191 – 2251 | 1 |
| 2251 – 2310 | 0 |
| 2310 – 2369 | 1 |
country_ids categorical
Top values (rank 1–20)
| value | count | share |
|---|---|---|
| PG | 874 | 3.7% |
| ID | 695 | 2.9% |
| NG | 480 | 2.0% |
| AU | 432 | 1.8% |
| IN | 356 | 1.5% |
| MX | 297 | 1.3% |
| CN | 271 | 1.1% |
| BR | 263 | 1.1% |
| US | 247 | 1.0% |
| CM | 196 | 0.8% |
| PH | 177 | 0.7% |
| CD | 156 | 0.7% |
| VU | 118 | 0.5% |
| SD | 99 | 0.4% |
| PE | 97 | 0.4% |
| TZ | 93 | 0.4% |
| MY | 90 | 0.4% |
| TD | 88 | 0.4% |
| RU | 83 | 0.3% |
| CO | 82 | 0.3% |