saturn

/home/coolhand/html/datavis/data_trove/data/linguistic/glottolog_languoid.csv 23,740 rows sample n=23,740 seed 42 2026-06-22T00:19:32+00:00

Overview

Source           /home/coolhand/html/datavis/data_trove/data/linguistic/glottolog_languoid.csv
Total rows       23,740
Profiled sample  23,740
Columns          16
Generated        2026-06-22T00:19:32+00:00
Per-column null rate across the corpus.
column                kind         null %
id                    text         0.0%
family_id             categorical  1.8%
parent_id             text         1.8%
name                  text         0.0%
bookkeeping           categorical  0.0%
level                 categorical  0.0%
status                categorical  0.0%
latitude              numeric      66.5%
longitude             numeric      66.5%
iso639P3code          text         66.4%
description           unknown      0.0%
markup_description    unknown      0.0%
child_family_count    numeric      0.0%
child_language_count  numeric      0.0%
child_dialect_count   numeric      0.0%
country_ids           categorical  64.2%
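
The null rates above can be recomputed directly from the CSV. The following is a minimal sketch using only the standard library; the set of strings treated as missing and the function name `null_rates` are assumptions for illustration, not saturn's documented behaviour.

```python
import csv
import io

# Strings counted as missing. This set is an assumption, not saturn's rule.
MISSING = {"", "NA", "NaN", "null"}

def null_rates(lines):
    """Return {column: percentage of rows whose value is missing}."""
    reader = csv.DictReader(lines)
    missing = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in missing:
            value = row[name]
            # DictReader yields None for fields absent from a short row.
            if value is None or value.strip() in MISSING:
                missing[name] += 1
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in missing.items()}

# Tiny made-up sample reusing three column names from the report.
sample = io.StringIO(
    "id,family_id,latitude\n"
    "aaaa1234,,\n"
    "bbbb1234,aaaa1234,1.5\n"
    "cccc1234,aaaa1234,\n"
    "dddd1234,aaaa1234,-3.0\n"
)
rates = null_rates(sample)
```

To run it against the real file, pass an open file handle (opened with newline="") instead of the in-memory sample.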

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is a catalogue of the world's languoids from Glottolog, covering 23,740 entries: dialects (10,920), languages (8,481), and language families (4,339). The most striking pattern is in endangerment status: 18,965 entries (79.9%) are marked 'safe', while 4,775 are vulnerable, endangered to some degree, or extinct, which is worth examining against family and geographic distribution. A second area of interest is the highly skewed child-count columns: most rows have no children (74.4% have zero child dialects, 81.7% zero child languages, 90.8% zero child families), but a handful of nodes have hundreds or even thousands of descendants, which points to a very uneven tree. Geographic coverage is also incomplete: latitude and longitude are missing for 66.5% of rows, limiting spatial analysis to roughly a third of the data.

iso639P3code high anthropic:default

This column contains ISO 639-3 language codes, standardised three-letter identifiers for individual languages (e.g., 'aiz', 'kbt', 'mij'). Every non-null value is exactly 3 characters long (min=3, max=3, mean=3.0) and appears only once: 7,968 unique codes with zero duplicates, consistent with a reference table of distinct languages. The main concern is the 66.44% null rate (15,772 of 23,740 rows). Because ISO 639-3 codes identify individual languages and the 7,968 populated rows are close to the 8,481 language-level rows, the nulls plausibly correspond mostly to dialect and family rows rather than to unmapped languages; the profile alone does not confirm this.

parent_id high anthropic:default

This column contains Glottolog parent identifiers: fixed 8-character glottocodes (e.g. 'book1242', 'pidg1258') that link each row to its parent node in the classification tree. Every value is exactly 8 characters long (len_min=8, len_max=8) and all are single tokens (one_word_rate=1.0), consistent with Glottolog's standardised glottocode format. The duplicate rate is high at 68.5% of non-null values (15,973 repeats across 7,338 distinct parents), with 'book1242' appearing 399 times, indicating that many children share the same parent, as expected in a hierarchical classification. The null rate is low at 1.81% (429 rows), the same count as the family_id nulls, which presumably marks top-level nodes with no parent.
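
A parent column like this invites two checks: does every parent_id resolve to an existing id, and how many children does each parent have? The sketch below shows both, plus the duplicate rate computed as (non-null values minus distinct values) / non-null values, which reproduces the reported 15,973 / 23,311 ≈ 0.685. The function name `parent_stats` and the sample rows are hypothetical.

```python
from collections import Counter

def parent_stats(rows):
    """Return (dangling parent ids, children per parent, duplicate rate)."""
    ids = {row["id"] for row in rows}
    # Empty strings stand in for nulls in this sketch.
    parents = [row["parent_id"] for row in rows if row["parent_id"]]
    # Parents that never appear in the id column.
    dangling = sorted({p for p in parents if p not in ids})
    children = Counter(parents)
    # Share of non-null values that repeat an earlier value.
    duplicate_rate = (len(parents) - len(children)) / len(parents) if parents else 0.0
    return dangling, children, duplicate_rate

# Made-up rows; none of these codes are taken from the dataset.
rows = [
    {"id": "root0001", "parent_id": ""},
    {"id": "fami0001", "parent_id": "root0001"},
    {"id": "lang0001", "parent_id": "fami0001"},
    {"id": "lang0002", "parent_id": "fami0001"},
    {"id": "dial0001", "parent_id": "miss0001"},
]
dangling, children, duplicate_rate = parent_stats(rows)
```

Here `children` counts direct children only; whether the report's child_*_count columns count direct children or all descendants is not stated in the profile.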

id high anthropic:default

This column is a structured language or dialect identifier, following the Glottolog-style 8-character code pattern (4 letters + 4 digits, e.g. 'tibe1272', 'east2553'). All 23,740 values are exactly 8 characters long (len_min/max/mean/median all equal 8) with zero nulls and zero duplicates, making it a perfect primary key. The top_words sample aligns precisely with Glottolog languoid codes, suggesting this is a linguistic database. No anomalies — the column is unusually clean.
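
The primary-key claim can be checked with a format test and a duplicate scan. This is a minimal sketch: the regular expression is an assumption that also permits digits among the first four characters, which is looser than the "4 letters + 4 digits" wording above, so tighten it if that wording is meant strictly. The function name `check_ids` and the last two sample values are hypothetical.

```python
import re

# Assumed shape: four lowercase alphanumerics followed by four digits.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")

def check_ids(ids):
    """Return (values that fail the pattern, values seen more than once)."""
    malformed = [value for value in ids if not GLOTTOCODE.fullmatch(value)]
    seen = set()
    duplicates = []
    for value in ids:
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return malformed, duplicates

# First three values come from the report's sample; the last two are made up.
malformed, duplicates = check_ids(
    ["abbe1238", "sanm1298", "thur1255", "abbe1238", "bad-id"]
)
```

On the real column both lists should be empty if the profile's figures (zero duplicates, all lengths equal to 8) hold.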

name high anthropic:default

This column contains languoid names: names of families, languages, and dialects, with frequent tokens such as 'nuclear', 'central', 'western', 'eastern', 'northern', 'southern', 'language', and 'sign'. All 23,740 values are unique with zero nulls, so the name works as a human-readable key alongside id. 69.5% of values are single words (mean length ~9.95 characters, median 8), and the rest are multi-word descriptors of up to 58 characters; the vocabulary holds 17,915 distinct tokens across the corpus.

child_dialect_count high anthropic:default

This column counts the child dialects recorded for each languoid. The distribution is extreme: 74.4% of rows have zero child dialects, the median is 0, and the IQR is just 0–1, yet the mean is 3.39 and the maximum reaches 2,369, producing a skew of 42.2 and a kurtosis of 2,159.3. With q3 = 1 and IQR = 1, the usual 1.5 IQR upper fence sits at 2.5, so the 4,272 rows flagged as outliers (18.0%) would simply be those with three or more child dialects. A small number of nodes act as hubs with enormous fan-out, while the vast majority are leaves.

child_family_count high anthropic:default

This column counts the child families (subgroups) recorded under each languoid, not family members in any demographic sense. 90.8% of values are exactly zero and the median is 0.0, so the vast majority of rows have no subgroups beneath them. The distribution is extremely right-skewed (skew = 44.4, kurtosis = 2,352.9) with a max of 859 and 2,179 outliers (9.2% of rows). Because q1 = q3 = 0, the IQR rule flags every nonzero row, so the outlier count reflects tree structure rather than suspect values: large counts are expected for the biggest families, and the +0.96 correlation with child_language_count supports reading them as genuine hubs rather than data-entry errors.

child_language_count high anthropic:default

This column counts the child languages recorded under each languoid. The distribution is extremely right-skewed (skew = 41.86, kurtosis = 2,115.08): 81.7% of rows are zero, the median is 0.0, and the IQR is 0.0, yet the mean is ~2.0 and the max reaches 1,435. A small number of nodes therefore carry very large child counts, while most entries are leaves. The 4,339 rows flagged as outliers (18.3%) correspond to the nonzero rows, because a zero IQR collapses the 1.5 IQR fence onto 0; the count also equals the number of family-level rows, which suggests (but does not prove) that the nonzero rows are the families.
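
The outlier counts for the child-count columns follow from the 1.5 IQR rule once most values are zero. The sketch below illustrates the effect on made-up data; the quartile method (`inclusive`) is an assumption and saturn may compute quartiles differently.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values outside [q1 - k*IQR, q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up counts: 80% zeros, as in a column whose zero_rate is about 0.8.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 3, 40]
outliers = tukey_outliers(counts)
```

With q1 = q3 = 0 both fences sit at 0, so every nonzero value is flagged. That matches the report, where outlier_rate plus zero_rate equals 1 for child_family_count (0.092 + 0.908) and child_language_count (0.183 + 0.817).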

latitude high anthropic:default

This column contains geographic latitude values, spanning from -55.27° (the latitude of southernmost South America) to 73.14° (Arctic latitudes), consistent with global coverage. The main issue is a 66.54% null rate: two-thirds of rows lack a latitude, which severely limits spatial analysis. Among the 7,943 populated rows the distribution is mildly right-skewed (skew 0.54) with a mean of ~8.2° and a median of ~6.3°, and the histogram is densest between roughly -10° and 12°, indicating a concentration in tropical and equatorial regions. Only 129 outliers are flagged (1.6% of non-null values), so the populated values appear geographically plausible.

longitude high anthropic:default

This column represents geographic longitude, covering nearly the full global range from -178.785° to 179.306°. The null rate of 66.54% mirrors latitude (15,797 rows in both), so coordinates appear to be missing in pairs for two-thirds of records. The IQR of ~117° and std of ~81° confirm values are spread across both hemispheres. The mean (51.27°) and median (47.72°) are both positive, so most located languoids are in the Eastern Hemisphere; the histogram's densest bins fall at roughly 9° to 18° E and 134° to 152° E, so the central values sit between two clusters rather than marking a concentration near 50° E. Only 13 outliers are flagged despite the wide range, consistent with legitimate global coordinates rather than data-entry errors.

bookkeeping high anthropic:default

This column is a boolean flag (stored as strings 'True'/'False'). As we understand Glottolog's usage, bookkeeping languoids are entries retained for record-keeping rather than as part of the current classification; the 399 'True' rows equal the 399 rows whose family_id is 'book1242', which fits that reading, although the profile does not cross-tabulate the two columns. The distribution is severely imbalanced: 98.3% of rows are 'False' (23,341) versus 1.7% 'True' (399), which triggered the imbalance alert. The entropy of 0.123 bits (out of a possible 1.0 for two classes) confirms that the flag carries little information. Analysts should decide up front whether to exclude the 'True' rows or handle them separately.

country_ids medium anthropic:default

This column holds ISO 3166-1 alpha-2 country codes for each languoid. Its null rate is 64.24%, so nearly two-thirds of the 23,740 rows carry no country information. Among the 8,490 populated rows, Papua New Guinea ('PG') leads at 10.29% (874 rows, or 3.7% of all rows, which is the share shown in the table), followed by Indonesia ('ID') and Nigeria ('NG'). This ordering may reflect where language diversity is concentrated as much as any data collection bias. The 680 unique values exceed the roughly 250 two-letter codes that exist, and the plural column name points the same way: some values are probably lists of several codes for languoids spoken in more than one country, although malformed or non-standard entries cannot be ruled out from the profile alone.
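
If some values do hold several codes, per-country counts need the values split first. The sketch below assumes a single space as the separator; that separator, the function name `country_counts`, and the multi-code sample value are all assumptions to verify against the raw file.

```python
from collections import Counter

def country_counts(values, sep=" "):
    """Return (rows per individual code, number of multi-code values)."""
    per_code = Counter()
    multi = 0
    for value in values:
        # Drop empty fragments so blank values contribute nothing.
        codes = [code for code in value.split(sep) if code]
        if len(codes) > 1:
            multi += 1
        per_code.update(codes)
    return per_code, multi

# Made-up values; "ID PG" illustrates a hypothetical multi-country entry.
per_code, multi = country_counts(["PG", "ID", "ID PG", "NG", ""])
```

After splitting, the number of distinct codes should fall to at most the size of the ISO 3166-1 list if the multi-code explanation is right.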

description low anthropic:default

This column is typed 'unknown', with zero null values reported across 23,740 rows. saturn has no profiler for kind=unknown, so no uniqueness, length, or token statistics are available, and the column's nature (short labels, long free text, or empty strings) cannot be determined from the profile. The missing statistics reflect the absent profiler rather than a timeout.

markup_description low anthropic:default

This column is also typed 'unknown', with zero nulls reported across 23,740 rows. saturn has no profiler for kind=unknown, so no uniqueness, length, or content statistics are available. The name suggests descriptive text with embedded markup (HTML, Markdown, or similar), but the content type and cardinality remain unknown without direct inspection.

level high anthropic:default

This column classifies each languoid into one of three hierarchical levels: 'dialect', 'language', and 'family', as used in Glottolog's classification. With only 3 unique values across 23,740 rows and zero nulls, it is a clean, fully populated categorical field. 'dialect' is the modal class at ~46% (10,920), followed by 'language' at ~35.7% (8,481) and 'family' at ~18.3% (4,339). The entropy ratio of 0.943 (entropy 1.494 bits against a maximum of log2(3) ≈ 1.585) shows the three classes are fairly evenly spread, so class imbalance is mild.
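
The entropy and entropy_ratio figures can be reproduced from the class counts. This sketch assumes base-2 logarithms and normalisation by log2 of the number of classes; the base is inferred from bookkeeping, where entropy and entropy_ratio are both 0.123 for two classes, and is not documented in the report.

```python
import math

def entropy_ratio(counts):
    """Return (Shannon entropy in bits, entropy / log2(number of classes))."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    k = len(probs)
    ratio = entropy / math.log2(k) if k > 1 else 0.0
    return entropy, ratio

# Class counts taken from the level and bookkeeping tables in this report.
level_entropy, level_ratio = entropy_ratio([10920, 8481, 4339])
book_entropy, book_ratio = entropy_ratio([23341, 399])
```

Under these assumptions the results come out close to the reported values: about 1.494 and 0.943 for level, and about 0.123 for bookkeeping.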

status high anthropic:default

This column is an endangerment status classification for languoids, with 6 distinct ordinal categories ranging from 'safe' to 'extinct'. The distribution is heavily skewed: 'safe' dominates at 79.9% of records (18,965 of 23,740), while the five remaining categories together account for ~20.1% (4,775). The low entropy ratio (0.445) confirms this imbalance, and analysts building classification models should expect a significant class imbalance problem.

family_id high anthropic:default

This column represents language family identifiers, using Glottolog-style codes (e.g., 'atla1278' = Atlantic-Congo, 'aust1307' = Austronesian, 'indo1319' = Indo-European). With only 287 unique values across 23,740 rows, it acts as a low-to-medium cardinality grouping key. The top value 'atla1278' covers 4,663 records: 20.0% of the 23,311 non-null rows (the reported top_rate) or 19.6% of all rows (the share shown in the table). The top two families together account for roughly 36% of the dataset, a distribution heavily skewed toward large African and Pacific language families.
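
The report's top_rate and the table's share column differ because they use different denominators. A short sketch of the arithmetic, using counts from this report; the function name `shares` is hypothetical.

```python
def shares(count, total_rows, null_rows):
    """Return (share of all rows, share of non-null rows)."""
    non_null = total_rows - null_rows
    return count / total_rows, count / non_null

# family_id: top value 'atla1278' with 4,663 rows, 429 nulls.
family_all, family_non_null = shares(4663, 23740, 429)
# country_ids: top value 'PG' with 874 rows, 15,250 nulls.
country_all, country_non_null = shares(874, 23740, 15250)
```

The gap is small for family_id (1.8% null) but large for country_ids (64.2% null), so it matters which denominator a quoted percentage uses.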

Numeric correlation

Pearson correlation across 5 numeric columns (values clipped to 2 decimals).
                      latitude  longitude  child_family_count  child_language_count  child_dialect_count
latitude                 +1.00      -0.31               +0.03                 +0.01                +0.06
longitude                -0.31      +1.00               -0.03                 -0.04                -0.05
child_family_count       +0.03      -0.03               +1.00                 +0.96                +0.74
child_language_count     +0.01      -0.04               +0.96                 +1.00                +0.69
child_dialect_count      +0.06      -0.05               +0.74                 +0.69                +1.00
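
Pearson correlation on heavy-tailed counts can be dominated by a few very large nodes. The sketch below compares Pearson with a rank-based (Spearman) coefficient on made-up data; it illustrates the sensitivity only and does not describe how saturn computes its matrix. All function names and sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up counts: mostly small values plus one very large hub node.
families = [0, 0, 0, 0, 1, 0, 2, 300]
languages = [0, 3, 0, 1, 0, 2, 5, 900]
r_pearson = pearson(families, languages)
r_spearman = spearman(families, languages)
```

On this sample the single hub pushes Pearson close to 1 while the rank-based value is much lower, so the +0.96 between child_family_count and child_language_count is worth rechecking with a rank-based measure.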

id text

100.0% of rows are unique strings; 100.0% of rows are a single word; 95th-percentile length under 20 chars
rows                     23,740
null                     0 (0.0%)
unique                   23,740
len_min                  8
len_max                  8
len_mean                 8.000
len_median               8.000
len_p95                  8.000
word_mean                1.000
word_median              1.000
n_empty                  0
n_duplicates             0
duplicate_rate           0.000
vocab_size               20,000
readability_flesch_mean  86.111
emoji_rate               0.000
url_rate                 0.000
one_word_rate            1.000
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for id (mean: 8.0).
chars  count
8 – 8  23740
(all 40 bins span 8 – 8; the other 39 are empty)
Sample values (first 10)
  1. abbe1238
  2. sanm1298
  3. thur1255
  4. suar1238
  5. kukn1238
  6. yagu1244
  7. labu1252
  8. uist1237
  9. suku1258
  10. arak1251

family_id categorical

rows           23,740
null           429 (1.8%)
unique         287
top_value      atla1278
top_rate       0.200
cardinality    287
entropy        4.886
entropy_ratio  0.598
Top values for family_id (20 unique shown, of 287 total).
value     count  share of all rows
atla1278  4663   19.6%
aust1307  3850   16.2%
indo1319  2201   9.3%
sino1245  1666   7.0%
afro1255  1259   5.3%
nucl1709  762    3.2%
pama1250  598    2.5%
aust1305  503    2.1%
book1242  399    1.7%
otom1299  338    1.4%
mand1469  303    1.3%
sign1238  259    1.1%
drav1251  255    1.1%
cent2225  251    1.1%
turk1311  229    1.0%
taik1256  223    0.9%
nilo1247  201    0.8%
ural1272  185    0.8%
japo1237  179    0.8%
tupi1275  157    0.7%
Top values (rank 1–20)
  1. atla1278 — 4,663
  2. aust1307 — 3,850
  3. indo1319 — 2,201
  4. sino1245 — 1,666
  5. afro1255 — 1,259
  6. nucl1709 — 762
  7. pama1250 — 598
  8. aust1305 — 503
  9. book1242 — 399
  10. otom1299 — 338
  11. mand1469 — 303
  12. sign1238 — 259
  13. drav1251 — 255
  14. cent2225 — 251
  15. turk1311 — 229
  16. taik1256 — 223
  17. nilo1247 — 201
  18. ural1272 — 185
  19. japo1237 — 179
  20. tupi1275 — 157

parent_id text

100.0% of rows are a single word; 95th-percentile length under 20 chars; 68.5% duplicate strings
rows                     23,740
null                     429 (1.8%)
unique                   7,338
len_min                  8
len_max                  8
len_mean                 8.000
len_median               8.000
len_p95                  8.000
word_mean                1.000
word_median              1.000
n_empty                  0
n_duplicates             15,973
duplicate_rate           0.685
vocab_size               7,189
readability_flesch_mean  91.187
emoji_rate               0.000
url_rate                 0.000
one_word_rate            1.000
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for parent_id (mean: 8.0).
chars  count
8 – 8  23311
(all 40 bins span 8 – 8; the other 39 are empty)
Sample values (first 10)
  1. abee1242
  2. cent2144
  3. yuag1237
  4. mnon1258
  5. pama1253
  6. raic1241
  7. kenh1234
  8. uygh1240
  9. taih1244
  10. book1242

name text

100.0% of rows are unique strings; 69.5% of rows are a single word
rows                     23,740
null                     0 (0.0%)
unique                   23,740
len_min                  1
len_max                  58
len_mean                 9.950
len_median               8.000
len_p95                  22.000
word_mean                1.398
word_median              1.000
n_empty                  0
n_duplicates             0
duplicate_rate           0.000
vocab_size               17,915
readability_flesch_mean  42.625
emoji_rate               0.000
url_rate                 0.000
one_word_rate            0.695
allcaps_rate             1.68e-04
boilerplate_rate         0.000
Character-length distribution for name (mean: 9.950).
chars    count
1 – 2    54
2 – 4    496
4 – 5    4648
5 – 7    3145
7 – 8    4510
8 – 10   1373
10 – 11  1069
11 – 12  1997
12 – 14  1023
14 – 15  1768
15 – 17  644
17 – 18  848
18 – 20  340
20 – 21  281
21 – 22  518
22 – 24  224
24 – 25  298
25 – 27  101
27 – 28  140
28 – 30  46
30 – 31  44
31 – 32  69
32 – 34  24
34 – 35  34
35 – 37  12
37 – 38  9
38 – 39  4
39 – 41  2
41 – 42  3
42 – 44  3
44 – 45  5
45 – 47  1
47 – 48  2
48 – 49  1
49 – 51  0
51 – 52  2
52 – 54  0
54 – 55  0
55 – 57  0
57 – 58  2
Sample values (first 10)
  1. Abbey-Ve
  2. San Martín Itunyoso Triqui
  3. Thuri
  4. Asabano
  5. Kukna
  6. Yagua
  7. Labuan
  8. Uis Tasae
  9. Sukurase
  10. Araki (Iran)

bookkeeping categorical

top value is 98.3% of rows
rows           23,740
null           0 (0.0%)
unique         2
top_value      False
top_rate       0.983
cardinality    2
entropy        0.123
entropy_ratio  0.123
Top values for bookkeeping (2 unique shown, of 2 total).
value  count  share
False  23341  98.3%
True   399    1.7%
Top values (rank 1–20)
  1. False — 23,341
  2. True — 399

level categorical

rows           23,740
null           0 (0.0%)
unique         3
top_value      dialect
top_rate       0.460
cardinality    3
entropy        1.494
entropy_ratio  0.943
Top values for level (3 unique shown, of 3 total).
value     count  share
dialect   10920  46.0%
language  8481   35.7%
family    4339   18.3%
Top values (rank 1–20)
  1. dialect — 10,920
  2. language — 8,481
  3. family — 4,339

status categorical

rows           23,740
null           0 (0.0%)
unique         6
top_value      safe
top_rate       0.799
cardinality    6
entropy        1.150
entropy_ratio  0.445
Top values for status (6 unique shown, of 6 total).
value                  count  share
safe                   18965  79.9%
definitely endangered  1814   7.6%
vulnerable             1194   5.0%
extinct                889    3.7%
critically endangered  465    2.0%
severely endangered    413    1.7%
Top values (rank 1–20)
  1. safe — 18,965
  2. definitely endangered — 1,814
  3. vulnerable — 1,194
  4. extinct — 889
  5. critically endangered — 465
  6. severely endangered — 413

latitude numeric

66.5% null
rows          23,740
null          15,797 (66.5%)
unique        7,798
min           -55.275
max           73.135
mean          8.170
median        6.306
std           18.962
q1            -5.137
q3            19.336
iqr           24.472
skew          0.540
kurtosis      0.301
n_outliers    129
outlier_rate  0.016
zero_rate     0.000
Histogram bins for latitude (median: 6.30619).
bin               count
-55.27 – -52.06   5
-52.06 – -48.85   1
-48.85 – -45.64   1
-45.64 – -42.43   4
-42.43 – -39.22   7
-39.22 – -36.01   16
-36.01 – -32.8    29
-32.8 – -29.59    26
-29.59 – -26.38   48
-26.38 – -23.17   78
-23.17 – -19.96   125
-19.96 – -16.75   141
-16.75 – -13.54   281
-13.54 – -10.33   256
-10.33 – -7.121   495
-7.121 – -3.911   788
-3.911 – -0.7005  681
-0.7005 – 2.51    379
2.51 – 5.72       469
5.72 – 8.93       664
8.93 – 12.14      710
12.14 – 15.35     303
15.35 – 18.56     387
18.56 – 21.77     233
21.77 – 24.98     318
24.98 – 28.19     373
28.19 – 31.4      167
31.4 – 34.61      144
34.61 – 37.82     179
37.82 – 41.03     113
41.03 – 44.24     138
44.24 – 47.45     79
47.45 – 50.66     78
50.66 – 53.87     76
53.87 – 57.08     46
57.08 – 60.29     21
60.29 – 63.5      41
63.5 – 66.71      23
66.71 – 69.93     14
69.93 – 73.14     6

longitude numeric

66.5% null
rows          23,740
null          15,797 (66.5%)
unique        7,757
min           -178.785
max           179.306
mean          51.270
median        47.724
std           81.138
q1            7.235
q3            124.122
iqr           116.887
skew          -0.483
kurtosis      -0.774
n_outliers    13
outlier_rate  1.64e-03
zero_rate     0.000
Histogram bins for longitude (median: 47.7236).
bin               count
-178.8 – -169.8   13
-169.8 – -160.9   4
-160.9 – -151.9   10
-151.9 – -143     11
-143 – -134       10
-134 – -125.1     17
-125.1 – -116.1   124
-116.1 – -107.2   47
-107.2 – -98.21   78
-98.21 – -89.26   280
-89.26 – -80.31   59
-80.31 – -71.36   235
-71.36 – -62.41   218
-62.41 – -53.45   150
-53.45 – -44.5    60
-44.5 – -35.55    40
-35.55 – -26.6    0
-26.6 – -17.64    4
-17.64 – -8.692   105
-8.692 – 0.2605   275
0.2605 – 9.213    444
9.213 – 18.17     751
18.17 – 27.12     322
27.12 – 36.07     430
36.07 – 45.02     228
45.02 – 53.97     126
53.97 – 62.93     35
62.93 – 71.88     80
71.88 – 80.83     210
80.83 – 89.78     208
89.78 – 98.74     269
98.74 – 107.7     457
107.7 – 116.6     239
116.6 – 125.6     502
125.6 – 134.5     316
134.5 – 143.5     598
143.5 – 152.4     667
152.4 – 161.4     123
161.4 – 170.4     186
170.4 – 179.3     12

iso639P3code text

100.0% of rows are unique strings; 100.0% of rows are a single word; 66.4% null; 95th-percentile length under 20 chars
rows                     23,740
null                     15,772 (66.4%)
unique                   7,968
len_min                  3
len_max                  3
len_mean                 3.000
len_median               3.000
len_p95                  3.000
word_mean                1.000
word_median              1.000
n_empty                  0
n_duplicates             0
duplicate_rate           0.000
vocab_size               7,968
readability_flesch_mean  119.528
emoji_rate               0.000
url_rate                 0.000
one_word_rate            1.000
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for iso639P3code (mean: 3.0).
chars  count
3 – 3  7968
(the other 39 of 40 bins, spanning 2 – 4, are empty)
Sample values (first 10)
  1. aau
  2. mat
  3. twx
  4. sui
  5. ktr
  6. ygw
  7. key
  8. tui
  9. ssk
  10. agg

description unknown

no profiler for kind=unknown
rows  23,740
null  0 (0.0%)

markup_description unknown

no profiler for kind=unknown
rows  23,740
null  0 (0.0%)

child_family_count numeric

skew=+44.40; 9.2% of rows beyond 1.5 IQR
rows          23,740
null          0 (0.0%)
unique        88
min           0.000
max           859.000
mean          0.879
median        0.000
std           13.204
q1            0.000
q3            0.000
iqr           0.000
skew          44.398
kurtosis      2,353
n_outliers    2,179
outlier_rate  0.092
zero_rate     0.908
Histogram bins for child_family_count (median: 0.0).
bin            count
0 – 21.48      23588
21.48 – 42.95  93
42.95 – 64.43  27
64.43 – 85.9   6
85.9 – 107.4   4
107.4 – 128.9  3
128.9 – 150.3  2
150.3 – 171.8  2
171.8 – 193.3  1
193.3 – 214.8  1
214.8 – 236.2  0
236.2 – 257.7  0
257.7 – 279.2  1
279.2 – 300.7  2
300.7 – 322.1  1
322.1 – 343.6  1
343.6 – 365.1  0
365.1 – 386.6  0
386.6 – 408    2
408 – 429.5    1
429.5 – 451    0
451 – 472.5    0
472.5 – 493.9  0
493.9 – 515.4  0
515.4 – 536.9  0
536.9 – 558.4  0
558.4 – 579.8  1
579.8 – 601.3  0
601.3 – 622.8  0
622.8 – 644.2  0
644.2 – 665.7  0
665.7 – 687.2  0
687.2 – 708.7  2
708.7 – 730.2  0
730.2 – 751.6  0
751.6 – 773.1  0
773.1 – 794.6  0
794.6 – 816.1  1
816.1 – 837.5  0
837.5 – 859    1

child_language_count numeric

skew=+41.86; 18.3% of rows beyond 1.5 IQR
rows          23,740
null          0 (0.0%)
unique        126
min           0.000
max           1,435
mean          1.996
median        0.000
std           23.408
q1            0.000
q3            0.000
iqr           0.000
skew          41.859
kurtosis      2,115
n_outliers    4,339
outlier_rate  0.183
zero_rate     0.817
Histogram bins for child_language_count (median: 0.0).
bin            count
0 – 35.88      23547
35.88 – 71.75  121
71.75 – 107.6  37
107.6 – 143.5  6
143.5 – 179.4  3
179.4 – 215.2  4
215.2 – 251.1  3
251.1 – 287    2
287 – 322.9    2
322.9 – 358.8  0
358.8 – 394.6  1
394.6 – 430.5  1
430.5 – 466.4  0
466.4 – 502.2  1
502.2 – 538.1  1
538.1 – 574    2
574 – 609.9    1
609.9 – 645.8  0
645.8 – 681.6  0
681.6 – 717.5  1
717.5 – 753.4  2
753.4 – 789.2  0
789.2 – 825.1  0
825.1 – 861    0
861 – 896.9    0
896.9 – 932.8  0
932.8 – 968.6  0
968.6 – 1004   1
1004 – 1040    0
1040 – 1076    0
1076 – 1112    0
1112 – 1148    0
1148 – 1184    0
1184 – 1220    0
1220 – 1256    0
1256 – 1292    2
1292 – 1327    0
1327 – 1363    0
1363 – 1399    1
1399 – 1435    1

child_dialect_count numeric

skew=+42.22; 18.0% of rows beyond 1.5 IQR
rows          23,740
null          0 (0.0%)
unique        164
min           0.000
max           2,369
mean          3.389
median        0.000
std           36.799
q1            0.000
q3            1.000
iqr           1.000
skew          42.219
kurtosis      2,159
n_outliers    4,272
outlier_rate  0.180
zero_rate     0.744
Histogram bins for child_dialect_count (median: 0.0).
bin            count
0 – 59.23      23575
59.23 – 118.5  99
118.5 – 177.7  24
177.7 – 236.9  18
236.9 – 296.1  4
296.1 – 355.4  2
355.4 – 414.6  0
414.6 – 473.8  1
473.8 – 533    1
533 – 592.2    1
592.2 – 651.5  1
651.5 – 710.7  2
710.7 – 769.9  0
769.9 – 829.1  1
829.1 – 888.4  0
888.4 – 947.6  3
947.6 – 1007   0
1007 – 1066    0
1066 – 1125    1
1125 – 1184    1
1184 – 1244    0
1244 – 1303    0
1303 – 1362    1
1362 – 1421    0
1421 – 1481    0
1481 – 1540    0
1540 – 1599    1
1599 – 1658    0
1658 – 1718    0
1718 – 1777    0
1777 – 1836    1
1836 – 1895    1
1895 – 1954    0
1954 – 2014    0
2014 – 2073    0
2073 – 2132    0
2132 – 2191    0
2191 – 2251    1
2251 – 2310    0
2310 – 2369    1

country_ids categorical

64.2% null
rows           23,740
null           15,250 (64.2%)
unique         680
top_value      PG
top_rate       0.103
cardinality    680
entropy        6.493
entropy_ratio  0.690
Top values for country_ids (20 unique shown, of 680 total).
value  count  share of all rows
PG     874    3.7%
ID     695    2.9%
NG     480    2.0%
AU     432    1.8%
IN     356    1.5%
MX     297    1.3%
CN     271    1.1%
BR     263    1.1%
US     247    1.0%
CM     196    0.8%
PH     177    0.7%
CD     156    0.7%
VU     118    0.5%
SD     99     0.4%
PE     97     0.4%
TZ     93     0.4%
MY     90     0.4%
TD     88     0.4%
RU     83     0.3%
CO     82     0.3%
Top values (rank 1–20)
  1. PG — 874
  2. ID — 695
  3. NG — 480
  4. AU — 432
  5. IN — 356
  6. MX — 297
  7. CN — 271
  8. BR — 263
  9. US — 247
  10. CM — 196
  11. PH — 177
  12. CD — 156
  13. VU — 118
  14. SD — 99
  15. PE — 97
  16. TZ — 93
  17. MY — 90
  18. TD — 88
  19. RU — 83
  20. CO — 82