saturn

/home/coolhand/html/datavis/data_trove/data/linguistic/phoible.csv 105,484 rows sample n=105,484 seed 42 2026-06-22T00:14:33+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/linguistic/phoible.csv
Total rows: 105,484
Profiled sample: 105,484
Columns: 49
Generated: 2026-06-22T00:14:33+00:00
Per-column null rate across the corpus.
column                  kind         null %
InventoryID             numeric      0.0%
Glottocode              text         0.0%
ISO6393                 text         0.0%
LanguageName            text         0.0%
SpecificDialect         categorical  0.0%
GlyphID                 text         0.0%
Phoneme                 text         0.0%
Allophones              text         0.0%
Marginal                categorical  0.0%
SegmentClass            categorical  0.0%
Source                  categorical  0.0%
tone                    categorical  0.0%
stress                  categorical  0.0%
syllabic                categorical  0.0%
short                   categorical  0.0%
long                    categorical  0.0%
consonantal             categorical  0.0%
sonorant                categorical  0.0%
continuant              categorical  0.0%
delayedRelease          categorical  0.0%
approximant             categorical  0.0%
tap                     categorical  0.0%
trill                   categorical  0.0%
nasal                   categorical  0.0%
lateral                 categorical  0.0%
labial                  categorical  0.0%
round                   categorical  0.0%
labiodental             categorical  0.0%
coronal                 categorical  0.0%
anterior                categorical  0.0%
distributed             categorical  0.0%
strident                categorical  0.0%
dorsal                  categorical  0.0%
high                    categorical  0.0%
low                     categorical  0.0%
front                   categorical  0.0%
back                    categorical  0.0%
tense                   categorical  0.0%
retractedTongueRoot     categorical  0.0%
advancedTongueRoot      categorical  0.0%
periodicGlottalSource   categorical  0.0%
epilaryngealSource      categorical  0.0%
spreadGlottis           categorical  0.0%
constrictedGlottis      categorical  0.0%
fortis                  categorical  0.0%
lenis                   categorical  0.0%
raisedLarynxEjective    categorical  0.0%
loweredLarynxImplosive  categorical  0.0%
click                   categorical  0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is PHOIBLE, a cross-linguistic phonological inventory database containing 105,484 phoneme-level records spanning 3,020 inventories and 2,177 distinct Glottocodes, each row describing a single phoneme and its distinctive feature values. The most immediate thing to examine is the breakdown by SegmentClass: consonants dominate (~68.5%), followed by vowels (~29.4%) and tones (~2%), which shapes how almost every other feature distributes. A second focus is the Source column, which shows that the data comes from eight contributing sources ('ph' alone accounts for 34%), meaning coverage and coding conventions are uneven across the corpus and could introduce systematic biases in any cross-linguistic comparison.

GlyphID high anthropic:default

GlyphID is a text column holding the Unicode code points of each phoneme as uppercase four-digit hexadecimal strings (e.g., '006D' = 'm', '0069' = 'i', '0061' = 'a'), with multi-character phonemes joined by '+' (e.g., '0069+0303', '006E+0064+007A'). It has 3,142 unique values across 105,484 rows, the same count as the Phoneme column, so the two columns appear to be alternative encodings of the same segment. The duplicate rate is 97.0%, which is expected when the same segments recur across many inventories; despite its name, the column is a repeating label rather than a unique row identifier. All values are uppercase (allcaps_rate 1.0) and single-token (one_word_rate 1.0). The median length is 4 characters (one code point), but lengths run up to 54 characters for segments built from many code points.
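
The encoding described above can be reversed with a few lines of Python. This is a minimal sketch, assuming every GlyphID is a '+'-joined run of hexadecimal code points as in the sample values; `decode_glyph_id` is a hypothetical helper name, not part of saturn or PHOIBLE.

```python
def decode_glyph_id(glyph_id: str) -> str:
    """Turn a '+'-joined run of hex code points into the string it encodes.

    Assumption: each part is a valid hexadecimal Unicode code point.
    """
    return "".join(chr(int(code_point, 16)) for code_point in glyph_id.split("+"))


print(decode_glyph_id("006D"))            # m
print(decode_glyph_id("006E+0064+007A"))  # ndz
```

Comparing the decoded string with the Phoneme column on the same row would be one way to check that the two columns really are alternative encodings.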

LanguageName high anthropic:default

This column contains human language names (e.g., 'Dutch', 'Chechen', 'Bengali', 'Iron Ossetic'), functioning as a categorical label drawn from a linguistically diverse vocabulary of 2,716 unique values across 105,484 rows. The duplicate rate of 97.4% (102,768 duplicates) confirms it is a repeating label, not a free-text field. Notably, 13.1% of values are all-caps, which suggests that capitalisation conventions differ between contributing sources and that names may need case normalisation before grouping. The distribution is uneven: the top value 'Iron Ossetic' appears 444 times while many languages appear rarely, indicating a long-tail spread across minority and regional languages.

Allophones high anthropic:default

This column contains allophone representations from a phonology or linguistics dataset, storing the phonetic variants of phonemes (e.g., 'm', 'j', 'w', 's') as very short strings with a mean length of ~2 characters. Strikingly, 53,580 of 105,484 rows (roughly 50.8%) carry the sentinel value 'NA', indicating no allophone recorded, which dominates the duplicate rate of 93.5%. The 6,892 unique values across a vocabulary of only 1,263 words suggest a modest set of phonetic symbols combined in small clusters, consistent with IPA or similar notation.

Glottocode high anthropic:default

This column contains Glottocodes, the standardized 8-character language identifiers used by the Glottolog database (e.g., 'kham1282', 'dutc1256'), confirmed by the near-uniform length of 8 characters (mean 7.999, median 8.0) and the structured alphanumeric format. The exception is 19 rows with a 2-character value (len_min 2), which is likely a placeholder rather than a valid code. With only 2,177 unique codes across 105,484 rows, the duplicate rate is extremely high at 97.9%, meaning each code recurs on average ~48 times, consistent with a dataset where each language contributes many phoneme rows. The top code 'kham1282' (Kham) appears 622 times, suggesting uneven language coverage in the dataset.
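
A format check makes it easy to isolate values that are not well-formed Glottocodes. The sketch below assumes the usual shape of four lowercase alphanumeric characters followed by four digits; the pattern and the name `is_glottocode` are illustrative assumptions, and the check says nothing about whether a code actually exists in Glottolog.

```python
import re

# Assumed shape: four lowercase alphanumerics, then four digits (e.g. 'kham1282').
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def is_glottocode(value: str) -> bool:
    """Return True when the whole string matches the assumed Glottocode shape."""
    return GLOTTOCODE_RE.fullmatch(value) is not None


samples = ["kham1282", "west2456", "NA"]
print([value for value in samples if not is_glottocode(value)])
```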

ISO6393 high anthropic:default

This column contains ISO 639-3 three-letter language codes, a standardised identifier for natural languages. Every value is exactly 3 characters long (min, mean, max all equal 3) with zero nulls, confirming strict conformity to the standard's format. The duplicate rate is extremely high at 98.0%: only 2,095 distinct codes cover 105,484 rows, which is expected for a language tag applied to many records. The most frequent code, 'mis' (the ISO 639-3 special code for uncoded languages), appears 828 times and may warrant attention, as it signals a non-trivial share of records whose language has no ISO 639-3 code of its own.
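
The duplicate figures follow directly from the row and unique counts. A small sketch, assuming saturn defines n_duplicates as rows minus unique values (which matches the numbers reported for this column); `duplicate_stats` is a hypothetical helper.

```python
def duplicate_stats(rows: int, unique: int) -> tuple[int, float]:
    """Return (n_duplicates, duplicate_rate), assuming duplicates = rows - unique."""
    n_duplicates = rows - unique
    return n_duplicates, n_duplicates / rows


n_dup, rate = duplicate_stats(rows=105_484, unique=2_095)
print(n_dup, round(rate, 3))  # 103389 0.98
```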

Phoneme high anthropic:default

This column contains the phoneme symbols themselves, with 3,142 unique phoneme strings across 105,484 rows. Values are overwhelmingly single characters (len_median 1.0, len_mean 1.5, len_max 11), consistent with IPA notation. The duplicate rate is 97.0% (102,342 duplicates), which is expected given that common phonemes recur across many inventories. The vocab_size of 1,339 distinct tokens against 3,142 unique values, together with a maximum length of 11, indicates that multi-character phoneme strings (e.g., base symbols carrying diacritics) are present alongside single-character ones.

advancedTongueRoot high anthropic:default

This column encodes the Advanced Tongue Root (ATR) phonological feature, a binary articulatory distinction marked as '+' (present), '-' (absent), or '0' (neutral/not applicable), typical of linguistic phoneme inventories. The distribution is severely imbalanced: '-' dominates at 97.87% (103,238 of 105,484 rows), '0' appears in only 2,235 rows, and '+' is nearly absent with just 11 occurrences. The near-zero entropy ratio (0.094) confirms that '+ATR' is a vanishingly rare feature in this dataset, which would make any model predicting '+' extremely difficult to train without resampling.
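
The entropy ratio quoted above can be reproduced from the three counts. This is a sketch under the assumption that saturn reports Shannon entropy in bits divided by log2 of the number of distinct values; that definition is inferred from the reported figures, not taken from saturn's documentation.

```python
import math


def entropy_ratio(counts: list[int]) -> float:
    """Shannon entropy (bits) of the counts, normalised by log2(number of values)."""
    total = sum(counts)
    probs = [count / total for count in counts if count > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(probs))
    # A single-valued column has no spread at all.
    return entropy / max_entropy if max_entropy > 0 else 0.0


# Counts for '-', '0', '+' in advancedTongueRoot.
print(round(entropy_ratio([103_238, 2_235, 11]), 3))  # 0.094
```

A ratio near 0 means one value dominates; a ratio of 1 means all values are equally frequent.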

epilaryngealSource high anthropic:default

This column encodes whether an epilaryngeal phonation source is present, using a ternary scheme: absent ('-'), present ('+'), or a neutral/not-applicable state ('0'). The distribution is severely imbalanced: the '-' (absent) class dominates at 97.9% of 105,484 rows, while '+' (present) appears in only 31 records (~0.03%), making positive-class detection extremely challenging. The near-zero entropy (0.147) confirms that the column carries almost no information as-is.

loweredLarynxImplosive high anthropic:default

This column encodes a phonological feature — specifically whether a sound has a lowered larynx implosive articulation — using a symbolic notation system ('+', '-', and combinations). The distribution is severely imbalanced: 97.3% of the 105,484 rows carry the default/absent value '-', with only 2,150 instances of '0', 716 of '+', and fewer than 10 combined entries. The near-zero entropy ratio (0.088) confirms this column carries almost no information for most records, which would make it a very weak standalone predictor.

raisedLarynxEjective high anthropic:default

This column encodes the presence or absence of a raised-larynx ejective phonological feature, using the '+'/'-'/'0' notation common in distinctive feature matrices. The dominant value is '-' (absent), appearing in 96.4% of 105,484 rows, creating severe class imbalance with an entropy ratio of only 0.10. Minority values include '0' (2,150 rows), '+' (1,573 rows), and compound sequences like '-,+' and '+,-', suggesting some segments receive multi-valued or composite feature annotations. The absence of nulls (0.0%) indicates complete annotation coverage, but the extreme skew means this feature will contribute negligible signal in most modelling contexts without deliberate resampling or grouping.

retractedTongueRoot high anthropic:default

This column encodes the phonological feature 'retracted tongue root' (RTR), an annotation used in phonological databases to mark vowel or consonant articulation. The dominant value is '-' (absence of RTR) at 97.44% of 105,484 rows, so only ~2,696 rows carry any other value, and that remainder includes '0' rows as well as '+' and compound values. The compound values ('-,+', '-,-,+', etc.) suggest per-segment annotation strings rather than single-token labels, indicating a sequence or multi-segment scope.

short high anthropic:default

This column encodes the phonological feature [short] with only 4 distinct values: '-', '0', '+', and the compound '-,+'. It is severely imbalanced: the dominant value '-' accounts for 97.76% of rows (103,125 of 105,484), while '+' appears just 204 times and the mixed label '-,+' only 5 times; the remaining 2,150 rows are '0'. The near-zero entropy ratio (0.082) confirms the column carries very little information, and any model trained on it will be overwhelmed by the '-' class.

stress medium anthropic:default

This column encodes the phonological feature [stress]. Only two values occur: '-' (feature absent) and '0', which in this feature notation usually means not applicable rather than present. The distribution is severely imbalanced: '-' accounts for 97.96% of all 105,484 rows (103,334 occurrences) versus only 2,150 rows for '0', the same count as the tone segments in SegmentClass. The entropy ratio of 0.144 confirms near-minimum information content; with no '+' values at all, the column does little more than separate those 2,150 rows from everything else.

tap medium anthropic:default

This column encodes the phonological feature [tap], with values '-', '+', '0', and compound sequences such as '-,+' and '-,-,+'. The dominant value '-' accounts for 96.7% of all 105,484 rows, producing severe class imbalance (entropy ratio of only 0.10), so segments marked for this feature are rare in the corpus. The compound entries ('-,+', '-,-,+') are notable: they show that the field can hold an ordered sequence of values, likely one per component of a complex segment, rather than a single state. No nulls are present.

tone high anthropic:default

This column is a binary tone indicator with only two values: '0' (neutral/absent) and '+' (positive), across 105,484 rows with no nulls. The distribution is severely imbalanced — '0' accounts for 97.96% of records (103,334) while '+' appears in just 2,150 rows (~2%). The entropy of 0.144 (out of a maximum of 1.0 for a binary variable) confirms the near-total dominance of a single class, which would make any model trained on this label prone to predicting '0' exclusively.

trill high anthropic:default

This column encodes a phonological feature, specifically the presence or absence of a trill articulation, using a small symbol vocabulary of 6 distinct values. The dominant value is "-" (absent/negative), which accounts for 96.15% of 105,484 rows, producing an extremely low entropy of 0.276 and triggering an imbalance alert. The minority values ("0", "+", and compound sequences like "-,+") together cover only about 4,060 rows (the remaining 3.85%), so trill is a rare feature in this phonological dataset. The compound values ("-,+", "-,-,+", "+,-") with counts of 26, 8, and 2 respectively hint at multi-segment or allophonic annotation, but are statistically negligible.

InventoryID high anthropic:default

InventoryID is a numeric identifier for the phoneme inventory each row belongs to, with exactly 3,020 distinct values spanning 1–3,020 across 105,484 rows, so each ID is reused heavily (on average ~35 rows, i.e. about 35 segments, per inventory). The distribution is remarkably flat and symmetric (skew ≈ −0.002, kurtosis ≈ −1.15, zero outliers), consistent with an identifier rather than a measured quantity.

Marginal high anthropic:default

This column is a boolean flag indicating whether a record is classified as 'marginal', with three distinct string values: FALSE, NA, and TRUE. The dominant value is FALSE (78.9% of 105,484 rows), while TRUE is strikingly rare at only 1,347 records (≈1.3%). The presence of 'NA' as a literal string value in 20,874 rows (≈19.8%) is noteworthy — these are not system nulls (null_rate is 0.0) but encoded string missings, which must be handled explicitly rather than via standard null imputation.
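
Because the missing values are literal strings, they have to be mapped explicitly. The following sketch uses only the standard library; `parse_marginal` and `MISSING_LABELS` are hypothetical names, and treating the empty string as missing is an assumption added for safety.

```python
from typing import Optional

# Assumption: 'NA' and the empty string both mean "no value recorded".
MISSING_LABELS = {"NA", ""}


def parse_marginal(raw: str) -> Optional[bool]:
    """Map the three string values of Marginal to True, False, or None."""
    value = raw.strip()
    if value in MISSING_LABELS:
        return None
    if value == "TRUE":
        return True
    if value == "FALSE":
        return False
    # Fail loudly on anything outside the three documented values.
    raise ValueError(f"unexpected Marginal value: {raw!r}")


print([parse_marginal(v) for v in ["FALSE", "NA", "TRUE"]])  # [False, None, True]
```

Raising on unknown values, rather than silently returning None, keeps new encodings from being mistaken for missing data.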

SegmentClass high anthropic:default

SegmentClass is a phonological category label classifying 105,484 linguistic segments into exactly three classes: consonant, vowel, and tone. Consonants dominate at 68.5% (72,282 occurrences), vowels account for 29.4% (31,052), and tones are a small minority at just 2.0% (2,150) — a distribution consistent with natural language phoneme inventories but with tones notably underrepresented, suggesting the dataset skews toward non-tonal languages or tonal markings are partially absent. Zero nulls and perfect coverage make this a clean, reliable label.

Source high anthropic:default

This column identifies the source database or linguistic inventory from which each phonological record was drawn, with 8 distinct source codes across 105,484 rows and no nulls. The top source 'ph' dominates at 34.4% (36,274 rows), likely referring to PHOIBLE or a similar phoneme database, followed by 'ea', 'upsid', 'er', 'saphon', 'aa', 'spa', and 'ra'. The high entropy ratio of 0.899 indicates a relatively even spread across sources despite 'ph' leading — no single source overwhelmingly controls the data. Analysts should be aware that cross-source comparisons may introduce systematic coding differences, as each source may apply distinct phonological conventions.

SpecificDialect high anthropic:default

This column captures a specific dialect designation for each record in what appears to be a linguistic or language survey dataset. The dominant signal is extreme missingness-by-label: 71.9% of rows carry the value 'NA' and a further 7,692 rows (≈7.3%) are empty strings, meaning roughly 79% of records lack a meaningful dialect value. Despite 546 unique values, the effective entropy ratio is only 0.33, confirming that real dialect labels are thinly and unevenly spread across the remaining ~22,000 rows.

anterior medium anthropic:default

This column encodes the phonological feature [anterior] as a signed categorical variable, with 6 distinct values across 105,484 rows and zero nulls. The dominant value '0' (64.8% of rows) likely marks segments to which the feature does not apply, while '+' (24.4%) and '-' (10.8%) mark anterior and non-anterior segments. A small number of rows (9, 5, and 3) contain compound multi-value strings like '-,+', '+,-', and '-,-,+', which deviate from the single-symbol schema and likely describe complex segments whose value changes partway through.

approximant high anthropic:default

This column captures whether a phoneme or linguistic segment is classified as an approximant, using a compact symbolic encoding where '-' means absent and '+' means present. The dominant value is '-' at 55.9% of rows (58,966), with '+' accounting for 42.0% (44,266), giving a near-binary distribution (entropy 1.12). A small number of compound values ('-,+', '-,-,+', '+,-'), totalling just 102 rows, shows that a minority of segments carry multi-valued approximant classifications, possibly transcription artifacts or multi-segment bundles.

back medium anthropic:default

This column encodes the phonological feature [back], using '+', '-', and '0' tokens. The dominant value is '0' at 46.7% (49,270 rows), followed by '-' at 39,749 and '+' at 15,547, so among segments specified for the feature, '-' outnumbers '+' by roughly 2.6 to 1. Compound sequences like '+,-' (511) and '-,+' (367) exist but are rare, and a handful of three-value sequences appear at the tail; this variable-length encoding could cause parsing issues if the column is treated as a simple label.

click high anthropic:default

This column encodes the phonological feature [click], which marks click consonants, using symbolic tokens ('-', '0', '+') and combination values ('+,-', '-,+'). The dominant value is '-' at 68.2% of 105,484 rows, while values containing '+' ('+', '+,-', '-,+') account for only ~311 rows combined (~0.3%), so clicks are very rare in the corpus. The multi-value strings '+,-' and '-,+' show that a sequence of values is sometimes stored in a single cell rather than normalized; this warrants investigation before modelling.

consonantal high anthropic:default

This column encodes a binary phonological feature, consonantal, marking whether a segment is consonantal (+) or not (-), a standard distinctive feature in linguistics datasets. The dominant value is '+' at 60.9% (64,257 rows), with '-' covering 37.0% (39,041 rows). The small third category '0' (2,151 rows) marks segments for which the feature is unspecified, while the composite values '+,-' (34) and '-,+' (1) are rare multi-value entries that may reflect complex segments, ambiguous annotations, or data quality issues.

constrictedGlottis high anthropic:default

This column encodes the phonological feature [constricted glottis] as presence/absence symbols across 105,484 records, with no nulls and only 7 unique values. The dominant value is '-' (absent), appearing in 94.5% of rows (99,727), while '+' (present) accounts for just 3,383 rows, so the feature is rare. The compound values ('+,-', '-,+', '+,-,-', '-,-,+') suggest multi-segment sequences for a tiny fraction of records, but with counts of 141, 93, 1, and 1 respectively, these are near-negligible. The extremely low entropy (0.372, entropy ratio 0.132) confirms the column is heavily imbalanced toward the negative class.

continuant high anthropic:default

This column encodes a phonological feature, whether a speech sound is a continuant (airflow continues through the vocal tract), using the '+', '-', and '0' (unspecified/not applicable) notation standard in distinctive feature theory. The dominant values are '+' (54.9%, 57,952 rows) and '-' (44,585 rows), which together account for ~97% of records. Most of the 9 unique values are composite strings like '-,+' or '-,-,+', suggesting some rows encode sequences or multi-segment entries rather than single-segment features; these should be parsed before the column is treated as a simple label. The entropy ratio of 0.37 reflects moderate concentration driven almost entirely by the binary +/- split.

coronal high anthropic:default

This column encodes coronal articulation features in what appears to be a phonological or linguistic dataset, using binary '+'/'-' notation common in distinctive feature theory. The dominant value is '-' (62.8% of 105,484 rows), with '+' covering another 35.0%, making the two primary values account for ~98% of observations. The remaining values ('+,-', '-,+', '-,-,+', '+,-,+') suggest multi-segment or compound entries, which are rare (≤87 occurrences) but worth flagging as potential encoding inconsistencies or multi-value cells that may need splitting.

delayedRelease high anthropic:default

This column encodes the phonological feature [delayed release] for 105,484 records using a small set of symbolic tokens: '0' (55% of rows, likely segments to which the feature does not apply), '-' (26%, no delayed release), '+' (19%, delayed release), and compound combinations thereof. The compound values ('-,+', '0,-,+', '+,-', '0,0,-,+') show that the column is sometimes multi-valued, packed as comma-separated strings, which is structurally inconsistent with a simple categorical and implies sequence-like semantics. No nulls are present, and the entropy ratio of 0.52 reflects a moderately skewed but non-trivial distribution dominated by the '0' class.

distributed medium anthropic:default

This column encodes the phonological feature [distributed] using a compact symbolic notation: '0' (likely not applicable), '-', '+', and comma-separated sequences for compound values. The dominant value '0' covers 66% of 105,484 rows, with '-' (21.1%) and '+' (12.5%) accounting for most of the remainder. In total, 11 distinct values arise from combinations of these three symbols, so some records hold a sequence of values rather than a single state. The entropy ratio of 0.37 confirms moderate but uneven information content, heavily concentrated in the single '0' class.

dorsal high anthropic:default

This column encodes the phonological feature [dorsal], which marks sounds articulated with the body of the tongue, using '+', '-', and '0' symbols, sometimes as comma-separated sequences. The dominant values are '+' (51.7%, n=54,535) and '-' (n=47,052), together accounting for ~96.3% of rows, with '0' (n=2,160) covering segments where the feature is unspecified. Most of the 13 categories are multi-value strings like '-,+' or '-,-,+', so some records hold an ordered sequence of values rather than a single binary state, an implicit structural inconsistency that warrants normalization.

fortis high anthropic:default

This column encodes the phonological feature [fortis] as a three-valued categorical with values '-', '0', and '+'. The dominant value is '-' at 68.1% (71,867 rows), '0' accounts for 31.5% (33,202 rows), and '+' is strikingly rare at only 415 occurrences (~0.4%), creating a heavily imbalanced distribution. The near-absence of '+' values means fortis is marked on very few segments, and the class imbalance should be addressed before modelling.

front medium anthropic:default

This column encodes the phonological feature [front], using values '0', '-', and '+' as atomic tokens that can be combined into sequences (e.g., '-,+', '+,-,-'). The dominant value '0' accounts for 46.75% of rows, with '-' (34,225) and '+' (20,683) making up most of the remainder; '-' outnumbers '+' by ~1.65:1. The compound sequence values ('-,+', '+,-', etc.) pack several values into a single string, likely one per component of a complex segment, and may require parsing before modelling.

high medium anthropic:default

This column encodes the phonological feature [high] (tongue height), using tokens '0', '+', and '-', including multi-value sequences like '-,+' or '+,-,+'. The dominant value is '0' at 46.7% (49,247 rows), followed by '+' at 35,559 and '-' at 19,156, so '+' outnumbers '-' by roughly 1.9 to 1. The compound sequence values (e.g., '+,-,+', '-,+,+') have very low frequency, some appearing only once or twice, and are rare edge cases that may need consolidation or separate treatment.

labial high anthropic:default

This column captures labial articulation features in a phonological or linguistic dataset, encoding presence/absence/neutrality of a labial feature per segment (or per segment sequence). The dominant value is '-' (absent) at 68.2% of 105,484 rows, with '+' (present) at 26.8%, suggesting most segments are non-labial. Surprisingly, ~2.9% of values are compound strings like '-,+', '+,-', '-,-,+', indicating multi-segment bundles packed into a single cell rather than a flat atomic feature, which creates a parsing challenge and implies the column is not fully normalized.
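
One way to handle the compound strings is to parse every cell into a tuple of atomic values, so that '+' and '-,+' share one representation. This is a sketch, assuming the only atomic tokens are '+', '-', and '0' and that commas separate them; `split_feature` is a hypothetical helper.

```python
# Assumption: these three symbols are the only atomic feature values.
VALID_TOKENS = {"+", "-", "0"}


def split_feature(value: str) -> tuple[str, ...]:
    """Split a feature cell such as '-,+' into its atomic tokens."""
    tokens = tuple(token.strip() for token in value.split(","))
    unknown = [token for token in tokens if token not in VALID_TOKENS]
    if unknown:
        raise ValueError(f"unexpected feature tokens {unknown} in {value!r}")
    return tokens


print(split_feature("+"))      # ('+',)
print(split_feature("-,-,+"))  # ('-', '-', '+')
```

Keeping single values as one-element tuples means downstream code never has to branch on whether a cell was compound.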

labiodental high anthropic:default

This column encodes labiodental phonetic feature annotations, likely from a linguistic or phonological dataset where each row represents a speech sound or segment. The dominant value '0' (74,124 rows, 70.3%) indicates the feature is absent or neutral, while '+' and '-' mark positive and negative feature values respectively — a standard binary distinctive-feature notation. Notably, multi-valued strings like '+,-', '-,+', and '+,+,-' appear (60 rows total), suggesting a small number of segments carry conflicting or composite annotations, which may indicate data entry inconsistency or deliberate underspecification. Entropy ratio of 0.39 confirms moderate imbalance with '0' dominating.

lateral high anthropic:default

This column encodes the phonological feature [lateral], which marks sounds produced with airflow along the sides of the tongue, using '-' (not lateral), '+' (lateral), and '0' (unspecified). The dominant value '-' accounts for 93.8% of all 105,484 rows, producing a very low entropy ratio of 0.134, meaning the column is heavily skewed toward a single class. Some values are sequences (e.g., '-,+', '-,-,+'), suggesting multi-segment entries encoded as comma-delimited strings rather than separate fields. No nulls are present.

lenis high anthropic:default

This column represents a ternary linguistic or phonological feature, almost certainly a 'lenis' (weak/voiced) marker with values '-' (absent/irrelevant), '0' (neutral), and '+' (present/lenis). The dominant value is '-' at 68.1% (71,866 rows), while the positive lenis marker '+' is strikingly rare at only 416 occurrences (~0.4%), creating a heavily imbalanced distribution. No nulls exist across all 105,484 rows, and entropy is moderate at 0.93 bits, well below the theoretical maximum, confirming the skew toward the negative class.

long medium anthropic:default

This column encodes the phonological feature [long] (segment length), with only 6 distinct values across 105,484 rows. The dominant value '-' accounts for 89.9% of records, so most segments are not marked as long. The compound values '-,+', '+,-', and '-,-,+' appear in just 104 rows combined, hinting at rare multi-value entries that warrant inspection before modelling.

InventoryID numeric

rows: 105,484
null: 0 (0.0%)
unique: 3,020
min: 1.000
max: 3,020
mean: 1,479
median: 1,464
std: 843.111
q1: 769.000
q3: 2,237
iqr: 1,468
skew: -2.40e-03
kurtosis: -1.146
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
Histogram bins for InventoryID (median: 1464.0).
bin             count
1 – 76.47       2729
76.47 – 151.9   2947
151.9 – 227.4   2742
227.4 – 302.9   2385
302.9 – 378.4   2399
378.4 – 453.8   2530
453.8 – 529.3   2270
529.3 – 604.8   2323
604.8 – 680.3   2533
680.3 – 755.8   3039
755.8 – 831.2   2896
831.2 – 906.7   2609
906.7 – 982.2   2693
982.2 – 1058    2513
1058 – 1133     2637
1133 – 1209     2514
1209 – 1284     2898
1284 – 1360     3314
1360 – 1435     3634
1435 – 1510     2993
1510 – 1586     3172
1586 – 1661     3290
1661 – 1737     2686
1737 – 1812     3001
1812 – 1888     1779
1888 – 1963     2046
1963 – 2039     1878
2039 – 2114     1864
2114 – 2190     2685
2190 – 2265     3387
2265 – 2341     3096
2341 – 2416     3142
2416 – 2492     3264
2492 – 2567     3557
2567 – 2643     2966
2643 – 2718     1975
2718 – 2794     1761
2794 – 2869     1659
2869 – 2945     1855
2945 – 3020     1823
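
The bin boundaries in this histogram are consistent with 40 equal-width bins between the column's min and max. The sketch below reproduces the edges under that assumption; the bin count is inferred from the table, and `equal_width_edges` is a hypothetical helper rather than saturn's own code.

```python
def equal_width_edges(low: float, high: float, n_bins: int) -> list[float]:
    """Return the n_bins + 1 edges of equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    return [low + i * width for i in range(n_bins + 1)]


# InventoryID runs from 1 to 3,020; 40 bins is an assumption read off the table.
edges = equal_width_edges(1, 3020, 40)
print(len(edges), edges[1], edges[2])
```

Each bin is about 75.5 IDs wide, so a flat distribution at ~35 rows per ID would put roughly 2,600 rows in every bin.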

Glottocode text

100.0% rows are a single word; 95th-percentile length under 20 chars; 97.9% duplicate strings
rows: 105,484
null: 0 (0.0%)
unique: 2,177
len_min: 2
len_max: 8
len_mean: 7.999
len_median: 8.000
len_p95: 8.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 103,307
duplicate_rate: 0.979
vocab_size: 2,168
readability_flesch_mean: 94.148
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for Glottocode (mean: 7.998919267377043).
chars                      count
2 – 2                      19
(38 bins between 2 and 8)  0
8 – 8                      105465
Sample values (first 10)
  1. lakk1252
  2. lazz1240
  3. kham1282
  4. cebu1242
  5. west2456
  6. gurd1238
  7. copi1238
  8. kham1282
  9. gata1239
  10. paez1247

ISO6393 text

100.0% rows are a single word; 95th-percentile length under 20 chars; 98.0% duplicate strings
rows: 105,484
null: 0 (0.0%)
unique: 2,095
len_min: 3
len_max: 3
len_mean: 3.000
len_median: 3.000
len_p95: 3.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 103,389
duplicate_rate: 0.980
vocab_size: 2,086
readability_flesch_mean: 119.528
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for ISO6393 (mean: 3.0).
chars                           count
3 – 3                           105484
(39 other bins spanning 2 – 4)  0
Sample values (first 10)
  1. lbe
  2. lzz
  3. khg
  4. ceb
  5. xwl
  6. gdj
  7. cce
  8. khg
  9. gaq
  10. pbb

LanguageName text

84.3% rows are a single word; 13.1% rows are all-caps; 95th-percentile length under 20 chars; 97.4% duplicate strings
rows: 105,484
null: 0 (0.0%)
unique: 2,716
len_min: 2
len_max: 79
len_mean: 7.822
len_median: 7.000
len_p95: 16.000
word_mean: 1.201
word_median: 1.000
n_empty: 0
n_duplicates: 102,768
duplicate_rate: 0.974
vocab_size: 2,670
readability_flesch_mean: 53.181
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.843
allcaps_rate: 0.131
boilerplate_rate: 0.000
Character-length distribution for LanguageName (mean: 7.822219483523567).
chars     count
2 – 4     3915
4 – 6     29541
6 – 8     34409
8 – 10    14985
10 – 12   7358
12 – 14   4849
14 – 15   3897
15 – 17   2893
17 – 19   1292
19 – 21   993
21 – 23   198
23 – 25   224
25 – 27   218
27 – 29   39
29 – 31   93
31 – 33   130
33 – 35   64
35 – 37   23
37 – 39   37
39 – 40   57
40 – 42   0
42 – 44   52
44 – 46   0
46 – 48   23
48 – 50   0
50 – 52   0
52 – 54   20
54 – 56   0
56 – 58   40
58 – 60   0
60 – 62   87
62 – 64   0
64 – 66   0
66 – 67   0
67 – 69   0
69 – 71   0
71 – 73   0
73 – 75   0
75 – 77   0
77 – 79   47
Sample values (first 10)
  1. Lak
  2. Laz
  3. Kami Tibetan
  4. Cebuano
  5. western xwla
  6. Kurtjar
  7. Copi
  8. Kham Tibetan
  9. Gtaʔ
  10. Paez

SpecificDialect categorical

rows: 105,484
null: 0 (0.0%)
unique: 546
top_value: NA
top_rate: 0.719
cardinality: 546
entropy: 2.969
entropy_ratio: 0.326
Top values for SpecificDialect (20 unique shown, of 546 total).
value  count  share
NA  75807  71.9%
(empty string)  7692  7.3%
W2  120  0.1%
Lezgian (Güne)  96  0.1%
Santa  92  0.1%
Central Pakistan  83  0.1%
Babungo (Grassfields Bantu, Ring)  82  0.1%
Scottish Gaelic (Lewis)  82  0.1%
Tangari  81  0.1%
Kanga  76  0.1%
Kufa  75  0.1%
Skolt Saami (Suõʹnnʼjel)  75  0.1%
Standard Hindi (as spoken in Varanasi, Lucknow, Delhi etc.)  74  0.1%
Standard (eastern)  74  0.1%
Guovdageaidnu  74  0.1%
Nuosu (Black Yi)  74  0.1%
Northern Qiang (Yadu)  73  0.1%
Bangladeshi Standard (spoken in Dhaka and other urban aread of Bangladesh)  72  0.1%
Standard Italian  70  0.1%
Chechen (Ploskost)  70  0.1%
Top values (rank 1–20)
  1. NA — 75,807
  2. — 7,692
  3. W2 — 120
  4. Lezgian (Güne) — 96
  5. Santa — 92
  6. Central Pakistan — 83
  7. Babungo (Grassfields Bantu, Ring) — 82
  8. Scottish Gaelic (Lewis) — 82
  9. Tangari — 81
  10. Kanga — 76
  11. Kufa — 75
  12. Skolt Saami (Suõʹnnʼjel) — 75
  13. Standard Hindi (as spoken in Varanasi, Lucknow, Delhi etc.) — 74
  14. Standard (eastern) — 74
  15. Guovdageaidnu — 74
  16. Nuosu (Black Yi) — 74
  17. Northern Qiang (Yadu) — 73
  18. Bangladeshi Standard (spoken in Dhaka and other urban aread of Bangladesh) — 72
  19. Standard Italian — 70
  20. Chechen (Ploskost) — 70
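
SpecificDialect reports a null rate of 0.0%, yet its two most frequent values are the literal string NA and a blank value. A minimal sketch of how an effective missing rate could be estimated from the value counts, assuming (this is not confirmed by the report) that both markers mean "no dialect recorded"; `MISSING_MARKERS` and `effective_missing_rate` are hypothetical names.

```python
# Assumption: these literal strings stand in for a missing value.
MISSING_MARKERS = {"NA", ""}


def effective_missing_rate(value_counts: dict[str, int], total_rows: int) -> float:
    """Share of rows whose value is a missing-value marker."""
    missing = sum(
        count
        for value, count in value_counts.items()
        if value.strip() in MISSING_MARKERS
    )
    return missing / total_rows


# Top of the SpecificDialect value counts from the profile.
specific_dialect_counts = {"NA": 75_807, "": 7_692, "W2": 120}
missing_rate = effective_missing_rate(specific_dialect_counts, total_rows=105_484)
print(f"{missing_rate:.1%}")
```

On these counts roughly four in five rows carry no dialect label, so the column is far sparser than its null rate suggests.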

GlyphID text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars; 97.0% duplicate strings
rows: 105,484
null: 0 (0.0%)
unique: 3,142
len_min: 4
len_max: 54
len_mean: 6.503
len_median: 4.000
len_p95: 14.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 102,342
duplicate_rate: 0.970
vocab_size: 1,343
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Character-length distribution for GlyphID (mean: 6.503).
chars | count
4 – 5 | 67,114
5 – 6 | 0
6 – 8 | 0
8 – 9 | 0
9 – 10 | 28,726
10 – 12 | 0
12 – 13 | 0
13 – 14 | 0
14 – 15 | 6,559
15 – 16 | 0
16 – 18 | 0
18 – 19 | 0
19 – 20 | 2,225
20 – 22 | 0
22 – 23 | 0
23 – 24 | 0
24 – 25 | 401
25 – 26 | 0
26 – 28 | 0
28 – 29 | 0
29 – 30 | 267
30 – 32 | 0
32 – 33 | 0
33 – 34 | 0
34 – 35 | 104
35 – 36 | 0
36 – 38 | 0
38 – 39 | 0
39 – 40 | 5
40 – 42 | 0
42 – 43 | 0
43 – 44 | 0
44 – 45 | 70
45 – 46 | 0
46 – 48 | 0
48 – 49 | 0
49 – 50 | 1
50 – 52 | 0
52 – 53 | 0
53 – 54 | 12
Sample values (first 10)
  1. 03C7
  2. 0075
  3. 006E+0064+007A
  4. 0072
  5. 02E6
  6. 0268
  7. 0064+0324+026E+0324
  8. 0075
  9. 0069+0303
  10. 0061+0303
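
The sample GlyphID values look like four-digit hexadecimal Unicode code points joined by "+", one per character of the Phoneme in the same row (03C7 for χ, 006E+0064+007A for ndz). A small sketch of that mapping, assuming this encoding holds for every row; both function names are hypothetical.

```python
def decode_glyph_id(glyph_id: str) -> str:
    """Turn '006E+0064+007A' into the string those code points spell."""
    return "".join(chr(int(code, 16)) for code in glyph_id.split("+"))


def encode_glyph_id(phoneme: str) -> str:
    """Inverse mapping: one upper-case 4-digit hex code per code point."""
    return "+".join(f"{ord(ch):04X}" for ch in phoneme)


print(decode_glyph_id("006E+0064+007A"))
print(encode_glyph_id("ndz"))
```

If the assumption holds, a phoneme of n code points yields a GlyphID of 5n - 1 characters, which would explain why GlyphID lengths cluster at 4, 9, 14, 19 and so on.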

Phoneme text

100.0% rows are a single word; 95th-percentile length under 20 chars; 97.0% duplicate strings
rows: 105,484
null: 0 (0.0%)
unique: 3,142
len_min: 1
len_max: 11
len_mean: 1.501
len_median: 1.000
len_p95: 3.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 102,342
duplicate_rate: 0.970
vocab_size: 1,339
readability_flesch_mean: 114.439
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.75e-03
boilerplate_rate: 0.000
Character-length distribution for Phoneme (mean: 1.501). The 40 fractional-width bins are collapsed to whole character lengths; every omitted bin had a count of 0.
chars | count
1 | 67,114
2 | 28,726
3 | 6,559
4 | 2,225
5 | 401
6 | 267
7 | 104
8 | 5
9 | 70
10 | 1
11 | 12
Sample values (first 10)
  1. χ
  2. u
  3. ndz
  4. r
  5. ˦
  6. ɨ
  7. d̤ɮ̤
  8. u
  9. ĩ
  10. ã

Allophones text

91.3% rows are a single word; 95th-percentile length under 20 chars; 93.5% duplicate strings
rows: 105,484
null: 0 (0.0%)
unique: 6,892
len_min: 1
len_max: 37
len_mean: 2.083
len_median: 2.000
len_p95: 4.000
word_mean: 1.129
word_median: 1.000
n_empty: 0
n_duplicates: 98,592
duplicate_rate: 0.935
vocab_size: 1,263
readability_flesch_mean: 116.186
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.913
allcaps_rate: 2.91e-03
boilerplate_rate: 0.000
Character-length distribution for Allophones (mean: 2.083).
chars | count
1 – 2 | 26,762
2 – 3 | 66,019
3 – 4 | 4,934
4 – 5 | 3,125
5 – 6 | 1,246
6 – 6 | 1,181
6 – 7 | 789
7 – 8 | 409
8 – 9 | 334
9 – 10 | 0
10 – 11 | 226
11 – 12 | 160
12 – 13 | 69
13 – 14 | 65
14 – 14 | 41
14 – 15 | 36
15 – 16 | 12
16 – 17 | 16
17 – 18 | 12
18 – 19 | 0
19 – 20 | 14
20 – 21 | 9
21 – 22 | 8
22 – 23 | 0
23 – 24 | 3
24 – 24 | 3
24 – 25 | 3
25 – 26 | 3
26 – 27 | 2
27 – 28 | 0
28 – 29 | 1
29 – 30 | 0
30 – 31 | 0
31 – 32 | 0
32 – 32 | 0
32 – 33 | 0
33 – 34 | 1
34 – 35 | 0
35 – 36 | 0
36 – 37 | 1
Sample values (first 10)
  1. χ
  2. NA
  3. NA
  4. NA
  5. ˦
  6. NA
  7. d̤ɮ̤
  8. NA
  9. NA
  10. ã ə̃

Marginal categorical

rows: 105,484
null: 0 (0.0%)
unique: 3
top_value: FALSE
top_rate: 0.789
cardinality: 3
entropy: 0.812
entropy_ratio: 0.512
Top values for Marginal (3 unique shown, of 3 total).
value | count | share
FALSE | 83,263 | 78.9%
NA | 20,874 | 19.8%
TRUE | 1,347 | 1.3%
Top values (rank 1–20)
  1. FALSE — 83,263
  2. NA — 20,874
  3. TRUE — 1,347

SegmentClass categorical

rows: 105,484
null: 0 (0.0%)
unique: 3
top_value: consonant
top_rate: 0.685
cardinality: 3
entropy: 1.008
entropy_ratio: 0.636
Top values for SegmentClass (3 unique shown, of 3 total).
value | count | share
consonant | 72,282 | 68.5%
vowel | 31,052 | 29.4%
tone | 2,150 | 2.0%
Top values (rank 1–20)
  1. consonant — 72,282
  2. vowel — 31,052
  3. tone — 2,150
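
The entropy and entropy_ratio figures for SegmentClass can be reproduced from its three counts. The sketch below assumes base-2 Shannon entropy, with entropy_ratio defined as entropy divided by log2(cardinality). That definition is inferred from the reported numbers (it also matches tone and Source), not taken from saturn's documentation, and the function names are hypothetical.

```python
import math


def entropy_bits(counts) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def entropy_ratio(counts) -> float:
    """Entropy normalised by its maximum, log2(number of distinct values)."""
    counts = list(counts)
    if len(counts) < 2:
        return 0.0
    return entropy_bits(counts) / math.log2(len(counts))


segment_class = {"consonant": 72_282, "vowel": 31_052, "tone": 2_150}
h = entropy_bits(segment_class.values())
r = entropy_ratio(segment_class.values())
print(f"entropy={h:.3f} bits, ratio={r:.3f}")
```

A ratio near 1 means the values are spread almost evenly (Source, at 0.899); a ratio near 0 means one value dominates (tone, at 0.144).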

Source categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: ph
top_rate: 0.344
cardinality: 8
entropy: 2.697
entropy_ratio: 0.899
Top values for Source (8 unique shown, of 8 total).
value | count | share
ph | 36,274 | 34.4%
ea | 16,883 | 16.0%
upsid | 13,966 | 13.2%
er | 9,423 | 8.9%
saphon | 9,047 | 8.6%
aa | 8,064 | 7.6%
spa | 7,566 | 7.2%
ra | 4,261 | 4.0%
Top values (rank 1–20)
  1. ph — 36,274
  2. ea — 16,883
  3. upsid — 13,966
  4. er — 9,423
  5. saphon — 9,047
  6. aa — 8,064
  7. spa — 7,566
  8. ra — 4,261

tone categorical

top value is 98.0% of rows
rows: 105,484
null: 0 (0.0%)
unique: 2
top_value: 0
top_rate: 0.980
cardinality: 2
entropy: 0.144
entropy_ratio: 0.144
Top values for tone (2 unique shown, of 2 total).
value | count | share
0 | 103,334 | 98.0%
+ | 2,150 | 2.0%
Top values (rank 1–20)
  1. 0 — 103,334
  2. + — 2,150

stress categorical

top value is 98.0% of rows
rows: 105,484
null: 0 (0.0%)
unique: 2
top_value: -
top_rate: 0.980
cardinality: 2
entropy: 0.144
entropy_ratio: 0.144
Top values for stress (2 unique shown, of 2 total).
value | count | share
- | 103,334 | 98.0%
0 | 2,150 | 2.0%
Top values (rank 1–20)
  1. - — 103,334
  2. 0 — 2,150

syllabic categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: -
top_rate: 0.685
cardinality: 8
entropy: 1.042
entropy_ratio: 0.347
Top values for syllabic (8 unique shown, of 8 total).
value | count | share
- | 72,248 | 68.5%
+ | 30,692 | 29.1%
0 | 2,150 | 2.0%
+,- | 244 | 0.2%
-,+ | 124 | 0.1%
-,+,- | 12 | 0.0%
-,+,+ | 12 | 0.0%
+,+,- | 2 | 0.0%
Top values (rank 1–20)
  1. - — 72,248
  2. + — 30,692
  3. 0 — 2,150
  4. +,- — 244
  5. -,+ — 124
  6. -,+,- — 12
  7. -,+,+ — 12
  8. +,+,- — 2
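
Values such as +,- and -,+,- recur across the feature columns. One plausible reading, stated here as an assumption rather than something the profile confirms, is that each is a comma-separated sequence of +, - or 0 values covering the successive parts of a complex segment. A minimal sketch of splitting such values and counting the multi-part rows for syllabic; the function names are hypothetical.

```python
def split_feature_value(raw: str) -> tuple[str, ...]:
    """Split a feature cell like '-,+,-' into its component values."""
    return tuple(part.strip() for part in raw.split(","))


def is_multi_part(raw: str) -> bool:
    """True when the cell holds more than one component value."""
    return len(split_feature_value(raw)) > 1


# Value counts for the syllabic column, taken from the profile.
syllabic_counts = {
    "-": 72_248, "+": 30_692, "0": 2_150,
    "+,-": 244, "-,+": 124, "-,+,-": 12, "-,+,+": 12, "+,+,-": 2,
}
multi_part_rows = sum(n for v, n in syllabic_counts.items() if is_multi_part(v))
print(split_feature_value("-,+,-"), multi_part_rows)
```

Treating these cells as sequences rather than as opaque categories keeps the effective alphabet at three symbols, which can matter when comparing feature columns whose cardinality is inflated only by rare multi-part values.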

short categorical

top value is 97.8% of rows
rows: 105,484
null: 0 (0.0%)
unique: 4
top_value: -
top_rate: 0.978
cardinality: 4
entropy: 0.164
entropy_ratio: 0.082
Top values for short (4 unique shown, of 4 total).
value | count | share
- | 103,125 | 97.8%
0 | 2,150 | 2.0%
+ | 204 | 0.2%
-,+ | 5 | 0.0%
Top values (rank 1–20)
  1. - — 103,125
  2. 0 — 2,150
  3. + — 204
  4. -,+ — 5

long categorical

rows: 105,484
null: 0 (0.0%)
unique: 6
top_value: -
top_rate: 0.899
cardinality: 6
entropy: 0.554
entropy_ratio: 0.214
Top values for long (6 unique shown, of 6 total).
value | count | share
- | 94,844 | 89.9%
+ | 8,386 | 8.0%
0 | 2,150 | 2.0%
-,+ | 63 | 0.1%
+,- | 40 | 0.0%
-,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. - — 94,844
  2. + — 8,386
  3. 0 — 2,150
  4. -,+ — 63
  5. +,- — 40
  6. -,-,+ — 1

consonantal categorical

rows: 105,484
null: 0 (0.0%)
unique: 5
top_value: +
top_rate: 0.609
cardinality: 5
entropy: 1.085
entropy_ratio: 0.467
Top values for consonantal (5 unique shown, of 5 total).
value | count | share
+ | 64,257 | 60.9%
- | 39,041 | 37.0%
0 | 2,151 | 2.0%
+,- | 34 | 0.0%
-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. + — 64,257
  2. - — 39,041
  3. 0 — 2,151
  4. +,- — 34
  5. -,+ — 1

sonorant categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: +
top_rate: 0.530
cardinality: 8
entropy: 1.245
entropy_ratio: 0.415
Top values for sonorant (8 unique shown, of 8 total).
value | count | share
+ | 55,920 | 53.0%
- | 45,322 | 43.0%
0 | 2,150 | 2.0%
+,- | 1,948 | 1.8%
-,+ | 89 | 0.1%
+,-,- | 29 | 0.0%
+,-,+ | 25 | 0.0%
+,-,+,- | 1 | 0.0%
Top values (rank 1–20)
  1. + — 55,920
  2. - — 45,322
  3. 0 — 2,150
  4. +,- — 1,948
  5. -,+ — 89
  6. +,-,- — 29
  7. +,-,+ — 25
  8. +,-,+,- — 1

continuant categorical

rows: 105,484
null: 0 (0.0%)
unique: 9
top_value: +
top_rate: 0.549
cardinality: 9
entropy: 1.172
entropy_ratio: 0.370
Top values for continuant (9 unique shown, of 9 total).
value | count | share
+ | 57,952 | 54.9%
- | 44,585 | 42.3%
0 | 2,151 | 2.0%
-,+ | 728 | 0.7%
-,-,+ | 50 | 0.0%
+,- | 9 | 0.0%
0,-,+ | 4 | 0.0%
-,+,+ | 4 | 0.0%
0,0,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. + — 57,952
  2. - — 44,585
  3. 0 — 2,151
  4. -,+ — 728
  5. -,-,+ — 50
  6. +,- — 9
  7. 0,-,+ — 4
  8. -,+,+ — 4
  9. 0,0,-,+ — 1

delayedRelease categorical

rows: 105,484
null: 0 (0.0%)
unique: 7
top_value: 0
top_rate: 0.550
cardinality: 7
entropy: 1.471
entropy_ratio: 0.524
Top values for delayedRelease (7 unique shown, of 7 total).
value | count | share
0 | 58,035 | 55.0%
- | 27,384 | 26.0%
+ | 19,533 | 18.5%
-,+ | 492 | 0.5%
0,-,+ | 33 | 0.0%
+,- | 6 | 0.0%
0,0,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 58,035
  2. - — 27,384
  3. + — 19,533
  4. -,+ — 492
  5. 0,-,+ — 33
  6. +,- — 6
  7. 0,0,-,+ — 1

approximant categorical

rows: 105,484
null: 0 (0.0%)
unique: 6
top_value: -
top_rate: 0.559
cardinality: 6
entropy: 1.120
entropy_ratio: 0.433
Top values for approximant (6 unique shown, of 6 total).
value | count | share
- | 58,966 | 55.9%
+ | 44,266 | 42.0%
0 | 2,150 | 2.0%
-,+ | 71 | 0.1%
-,-,+ | 25 | 0.0%
+,- | 6 | 0.0%
Top values (rank 1–20)
  1. - — 58,966
  2. + — 44,266
  3. 0 — 2,150
  4. -,+ — 71
  5. -,-,+ — 25
  6. +,- — 6

tap categorical

top value is 96.7% of rows
rows: 105,484
null: 0 (0.0%)
unique: 5
top_value: -
top_rate: 0.967
cardinality: 5
entropy: 0.242
entropy_ratio: 0.104
Top values for tap (5 unique shown, of 5 total).
value | count | share
- | 102,023 | 96.7%
0 | 2,203 | 2.1%
+ | 1,218 | 1.2%
-,+ | 25 | 0.0%
-,-,+ | 15 | 0.0%
Top values (rank 1–20)
  1. - — 102,023
  2. 0 — 2,203
  3. + — 1,218
  4. -,+ — 25
  5. -,-,+ — 15

trill categorical

top value is 96.2% of rows
rows: 105,484
null: 0 (0.0%)
unique: 6
top_value: -
top_rate: 0.962
cardinality: 6
entropy: 0.276
entropy_ratio: 0.107
Top values for trill (6 unique shown, of 6 total).
value | count | share
- | 101,427 | 96.2%
0 | 2,202 | 2.1%
+ | 1,819 | 1.7%
-,+ | 26 | 0.0%
-,-,+ | 8 | 0.0%
+,- | 2 | 0.0%
Top values (rank 1–20)
  1. - — 101,427
  2. 0 — 2,202
  3. + — 1,819
  4. -,+ — 26
  5. -,-,+ — 8
  6. +,- — 2

nasal categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: -
top_rate: 0.808
cardinality: 8
entropy: 0.897
entropy_ratio: 0.299
Top values for nasal (8 unique shown, of 8 total).
value | count | share
- | 85,269 | 80.8%
+ | 15,941 | 15.1%
0 | 2,150 | 2.0%
+,- | 1,973 | 1.9%
-,+ | 95 | 0.1%
+,-,- | 54 | 0.1%
+,-,+,- | 1 | 0.0%
-,+,- | 1 | 0.0%
Top values (rank 1–20)
  1. - — 85,269
  2. + — 15,941
  3. 0 — 2,150
  4. +,- — 1,973
  5. -,+ — 95
  6. +,-,- — 54
  7. +,-,+,- — 1
  8. -,+,- — 1

lateral categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: -
top_rate: 0.938
cardinality: 8
entropy: 0.401
entropy_ratio: 0.134
Top values for lateral (8 unique shown, of 8 total).
value | count | share
- | 98,968 | 93.8%
+ | 4,211 | 4.0%
0 | 2,150 | 2.0%
-,+ | 135 | 0.1%
+,- | 12 | 0.0%
-,-,+ | 4 | 0.0%
-,+,- | 3 | 0.0%
0,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. - — 98,968
  2. + — 4,211
  3. 0 — 2,150
  4. -,+ — 135
  5. +,- — 12
  6. -,-,+ — 4
  7. -,+,- — 3
  8. 0,-,+ — 1

labial categorical

rows: 105,484
null: 0 (0.0%)
unique: 15
top_value: -
top_rate: 0.682
cardinality: 15
entropy: 1.182
entropy_ratio: 0.302
Top values for labial (15 unique shown, of 15 total).
value | count | share
- | 71,961 | 68.2%
+ | 28,241 | 26.8%
-,+ | 2,414 | 2.3%
0 | 2,160 | 2.0%
+,- | 531 | 0.5%
-,-,+ | 121 | 0.1%
+,-,- | 21 | 0.0%
0,+,- | 8 | 0.0%
-,+,- | 6 | 0.0%
0,-,+ | 5 | 0.0%
-,+,+ | 5 | 0.0%
+,+,- | 5 | 0.0%
+,-,+ | 4 | 0.0%
-,-,+,+ | 1 | 0.0%
0,+,-,- | 1 | 0.0%
Top values (rank 1–20)
  1. - — 71,961
  2. + — 28,241
  3. -,+ — 2,414
  4. 0 — 2,160
  5. +,- — 531
  6. -,-,+ — 121
  7. +,-,- — 21
  8. 0,+,- — 8
  9. -,+,- — 6
  10. 0,-,+ — 5
  11. -,+,+ — 5
  12. +,+,- — 5
  13. +,-,+ — 4
  14. -,-,+,+ — 1
  15. 0,+,-,- — 1

round categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: 0
top_rate: 0.703
cardinality: 8
entropy: 1.194
entropy_ratio: 0.398
Top values for round (8 unique shown, of 8 total).
value | count | share
0 | 74,155 | 70.3%
+ | 16,956 | 16.1%
- | 14,082 | 13.3%
-,+ | 269 | 0.3%
-,-,+ | 17 | 0.0%
-,0,+ | 3 | 0.0%
0,-,+ | 1 | 0.0%
+,- | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 74,155
  2. + — 16,956
  3. - — 14,082
  4. -,+ — 269
  5. -,-,+ — 17
  6. -,0,+ — 3
  7. 0,-,+ — 1
  8. +,- — 1

labiodental categorical

rows: 105,484
null: 0 (0.0%)
unique: 6
top_value: 0
top_rate: 0.703
cardinality: 6
entropy: 1.006
entropy_ratio: 0.389
Top values for labiodental (6 unique shown, of 6 total).
value | count | share
0 | 74,124 | 70.3%
- | 28,726 | 27.2%
+ | 2,574 | 2.4%
+,- | 56 | 0.1%
-,+ | 3 | 0.0%
+,+,- | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 74,124
  2. - — 28,726
  3. + — 2,574
  4. +,- — 56
  5. -,+ — 3
  6. +,+,- — 1

coronal categorical

rows: 105,484
null: 0 (0.0%)
unique: 7
top_value: -
top_rate: 0.628
cardinality: 7
entropy: 1.080
entropy_ratio: 0.385
Top values for coronal (7 unique shown, of 7 total).
value | count | share
- | 66,234 | 62.8%
+ | 36,955 | 35.0%
0 | 2,160 | 2.0%
+,- | 87 | 0.1%
-,+ | 41 | 0.0%
-,-,+ | 6 | 0.0%
+,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. - — 66,234
  2. + — 36,955
  3. 0 — 2,160
  4. +,- — 87
  5. -,+ — 41
  6. -,-,+ — 6
  7. +,-,+ — 1

anterior categorical

rows: 105,484
null: 0 (0.0%)
unique: 6
top_value: 0
top_rate: 0.648
cardinality: 6
entropy: 1.251
entropy_ratio: 0.484
Top values for anterior (6 unique shown, of 6 total).
value | count | share
0 | 68,372 | 64.8%
+ | 25,704 | 24.4%
- | 11,391 | 10.8%
-,+ | 9 | 0.0%
+,- | 5 | 0.0%
-,-,+ | 3 | 0.0%
Top values (rank 1–20)
  1. 0 — 68,372
  2. + — 25,704
  3. - — 11,391
  4. -,+ — 9
  5. +,- — 5
  6. -,-,+ — 3

distributed categorical

rows: 105,484
null: 0 (0.0%)
unique: 11
top_value: 0
top_rate: 0.660
cardinality: 11
entropy: 1.273
entropy_ratio: 0.368
Top values for distributed (11 unique shown, of 11 total).
value | count | share
0 | 69,639 | 66.0%
- | 22,283 | 21.1%
+ | 13,228 | 12.5%
-,+ | 296 | 0.3%
-,-,+ | 25 | 0.0%
+,- | 5 | 0.0%
0,-,+ | 3 | 0.0%
+,-,+ | 2 | 0.0%
0,+,- | 1 | 0.0%
+,+,- | 1 | 0.0%
0,0,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 69,639
  2. - — 22,283
  3. + — 13,228
  4. -,+ — 296
  5. -,-,+ — 25
  6. +,- — 5
  7. 0,-,+ — 3
  8. +,-,+ — 2
  9. 0,+,- — 1
  10. +,+,- — 1
  11. 0,0,-,+ — 1

strident categorical

rows: 105,484
null: 0 (0.0%)
unique: 9
top_value: 0
top_rate: 0.649
cardinality: 9
entropy: 1.287
entropy_ratio: 0.406
Top values for strident (9 unique shown, of 9 total).
value | count | share
0 | 68,410 | 64.9%
- | 25,410 | 24.1%
+ | 11,039 | 10.5%
-,+ | 585 | 0.6%
-,-,+ | 26 | 0.0%
+,- | 7 | 0.0%
-,+,- | 3 | 0.0%
0,-,+ | 3 | 0.0%
0,0,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 68,410
  2. - — 25,410
  3. + — 11,039
  4. -,+ — 585
  5. -,-,+ — 26
  6. +,- — 7
  7. -,+,- — 3
  8. 0,-,+ — 3
  9. 0,0,-,+ — 1

dorsal categorical

rows: 105,484
null: 0 (0.0%)
unique: 13
top_value: +
top_rate: 0.517
cardinality: 13
entropy: 1.235
entropy_ratio: 0.334
Top values for dorsal (13 unique shown, of 13 total).
value | count | share
+ | 54,535 | 51.7%
- | 47,052 | 44.6%
0 | 2,160 | 2.0%
-,+ | 1,530 | 1.5%
+,- | 144 | 0.1%
-,-,+ | 44 | 0.0%
0,-,+ | 6 | 0.0%
+,-,+ | 5 | 0.0%
-,+,+ | 4 | 0.0%
+,+,-,- | 1 | 0.0%
-,+,- | 1 | 0.0%
+,+,- | 1 | 0.0%
0,0,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. + — 54,535
  2. - — 47,052
  3. 0 — 2,160
  4. -,+ — 1,530
  5. +,- — 144
  6. -,-,+ — 44
  7. 0,-,+ — 6
  8. +,-,+ — 5
  9. -,+,+ — 4
  10. +,+,-,- — 1
  11. -,+,- — 1
  12. +,+,- — 1
  13. 0,0,-,+ — 1

high categorical

rows: 105,484
null: 0 (0.0%)
unique: 11
top_value: 0
top_rate: 0.467
cardinality: 11
entropy: 1.594
entropy_ratio: 0.461
Top values for high (11 unique shown, of 11 total).
value | count | share
0 | 49,247 | 46.7%
+ | 35,559 | 33.7%
- | 19,156 | 18.2%
-,+ | 845 | 0.8%
+,- | 627 | 0.6%
+,-,+ | 38 | 0.0%
+,+,- | 6 | 0.0%
-,+,+ | 2 | 0.0%
-,-,+ | 2 | 0.0%
+,-,0 | 1 | 0.0%
-,+,- | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 49,247
  2. + — 35,559
  3. - — 19,156
  4. -,+ — 845
  5. +,- — 627
  6. +,-,+ — 38
  7. +,+,- — 6
  8. -,+,+ — 2
  9. -,-,+ — 2
  10. +,-,0 — 1
  11. -,+,- — 1

low categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: -
top_rate: 0.473
cardinality: 8
entropy: 1.305
entropy_ratio: 0.435
Top values for low (8 unique shown, of 8 total).
value | count | share
- | 49,930 | 47.3%
0 | 49,244 | 46.7%
+ | 5,598 | 5.3%
+,- | 417 | 0.4%
-,+ | 270 | 0.3%
-,+,- | 21 | 0.0%
-,-,+ | 3 | 0.0%
+,-,- | 1 | 0.0%
Top values (rank 1–20)
  1. - — 49,930
  2. 0 — 49,244
  3. + — 5,598
  4. +,- — 417
  5. -,+ — 270
  6. -,+,- — 21
  7. -,-,+ — 3
  8. +,-,- — 1

front categorical

rows: 105,484
null: 0 (0.0%)
unique: 13
top_value: 0
top_rate: 0.468
cardinality: 13
entropy: 1.592
entropy_ratio: 0.430
Top values for front (13 unique shown, of 13 total).
value | count | share
0 | 49,316 | 46.8%
- | 34,225 | 32.4%
+ | 20,683 | 19.6%
-,+ | 838 | 0.8%
+,- | 359 | 0.3%
-,-,+ | 24 | 0.0%
+,-,- | 14 | 0.0%
-,+,+ | 10 | 0.0%
+,-,+ | 6 | 0.0%
-,0,+ | 3 | 0.0%
+,+,- | 2 | 0.0%
0,-,+ | 2 | 0.0%
-,+,- | 2 | 0.0%
Top values (rank 1–20)
  1. 0 — 49,316
  2. - — 34,225
  3. + — 20,683
  4. -,+ — 838
  5. +,- — 359
  6. -,-,+ — 24
  7. +,-,- — 14
  8. -,+,+ — 10
  9. +,-,+ — 6
  10. -,0,+ — 3
  11. +,+,- — 2
  12. 0,-,+ — 2
  13. -,+,- — 2

back categorical

rows: 105,484
null: 0 (0.0%)
unique: 12
top_value: 0
top_rate: 0.467
cardinality: 12
entropy: 1.521
entropy_ratio: 0.424
Top values for back (12 unique shown, of 12 total).
value | count | share
0 | 49,270 | 46.7%
- | 39,749 | 37.7%
+ | 15,547 | 14.7%
+,- | 511 | 0.5%
-,+ | 367 | 0.3%
+,-,- | 19 | 0.0%
-,-,+ | 8 | 0.0%
-,+,+ | 5 | 0.0%
-,+,- | 5 | 0.0%
0,+,- | 1 | 0.0%
+,-,+ | 1 | 0.0%
+,+,- | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 49,270
  2. - — 39,749
  3. + — 15,547
  4. +,- — 511
  5. -,+ — 367
  6. +,-,- — 19
  7. -,-,+ — 8
  8. -,+,+ — 5
  9. -,+,- — 5
  10. 0,+,- — 1
  11. +,-,+ — 1
  12. +,+,- — 1

tense categorical

rows: 105,484
null: 0 (0.0%)
unique: 8
top_value: 0
top_rate: 0.713
cardinality: 8
entropy: 1.114
entropy_ratio: 0.371
Top values for tense (8 unique shown, of 8 total).
value | count | share
0 | 75,230 | 71.3%
+ | 23,411 | 22.2%
- | 6,386 | 6.1%
+,- | 268 | 0.3%
-,+ | 179 | 0.2%
+,-,+ | 6 | 0.0%
+,-,- | 3 | 0.0%
+,+,- | 1 | 0.0%
Top values (rank 1–20)
  1. 0 — 75,230
  2. + — 23,411
  3. - — 6,386
  4. +,- — 268
  5. -,+ — 179
  6. +,-,+ — 6
  7. +,-,- — 3
  8. +,+,- — 1

retractedTongueRoot categorical

top value is 97.4% of rows
rows: 105,484
null: 0 (0.0%)
unique: 7
top_value: -
top_rate: 0.974
cardinality: 7
entropy: 0.193
entropy_ratio: 0.069
Top values for retractedTongueRoot (7 unique shown, of 7 total).
value | count | share
- | 102,788 | 97.4%
0 | 2,235 | 2.1%
-,+ | 251 | 0.2%
+ | 199 | 0.2%
-,-,+ | 9 | 0.0%
-,+,- | 1 | 0.0%
+,- | 1 | 0.0%
Top values (rank 1–20)
  1. - — 102,788
  2. 0 — 2,235
  3. -,+ — 251
  4. + — 199
  5. -,-,+ — 9
  6. -,+,- — 1
  7. +,- — 1

advancedTongueRoot categorical

top value is 97.9% of rows
rows: 105,484
null: 0 (0.0%)
unique: 3
top_value: -
top_rate: 0.979
cardinality: 3
entropy: 0.150
entropy_ratio: 0.094
Top values for advancedTongueRoot (3 unique shown, of 3 total).
value | count | share
- | 103,238 | 97.9%
0 | 2,235 | 2.1%
+ | 11 | 0.0%
Top values (rank 1–20)
  1. - — 103,238
  2. 0 — 2,235
  3. + — 11

periodicGlottalSource categorical

rows: 105,484
null: 0 (0.0%)
unique: 7
top_value: +
top_rate: 0.680
cardinality: 7
entropy: 1.051
entropy_ratio: 0.375
Top values for periodicGlottalSource (7 unique shown, of 7 total).
value | count | share
+ | 71,694 | 68.0%
- | 31,179 | 29.6%
0 | 2,139 | 2.0%
+,- | 371 | 0.4%
-,+ | 87 | 0.1%
+,-,- | 8 | 0.0%
+,-,+ | 6 | 0.0%
Top values (rank 1–20)
  1. + — 71,694
  2. - — 31,179
  3. 0 — 2,139
  4. +,- — 371
  5. -,+ — 87
  6. +,-,- — 8
  7. +,-,+ — 6

epilaryngealSource categorical

top value is 97.9% of rows
rows: 105,484
null: 0 (0.0%)
unique: 3
top_value: -
top_rate: 0.979
cardinality: 3
entropy: 0.147
entropy_ratio: 0.093
Top values for epilaryngealSource (3 unique shown, of 3 total).
value | count | share
- | 103,303 | 97.9%
0 | 2,150 | 2.0%
+ | 31 | 0.0%
Top values (rank 1–20)
  1. - — 103,303
  2. 0 — 2,150
  3. + — 31

spreadGlottis categorical

rows: 105,484
null: 0 (0.0%)
unique: 10
top_value: -
top_rate: 0.918
cardinality: 10
entropy: 0.497
entropy_ratio: 0.149
Top values for spreadGlottis (10 unique shown, of 10 total).
value | count | share
- | 96,855 | 91.8%
+ | 6,156 | 5.8%
0 | 2,138 | 2.0%
-,+ | 206 | 0.2%
+,- | 115 | 0.1%
-,-,+ | 5 | 0.0%
+,0,- | 5 | 0.0%
+,-,- | 2 | 0.0%
+,0,-,- | 1 | 0.0%
+,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. - — 96,855
  2. + — 6,156
  3. 0 — 2,138
  4. -,+ — 206
  5. +,- — 115
  6. -,-,+ — 5
  7. +,0,- — 5
  8. +,-,- — 2
  9. +,0,-,- — 1
  10. +,-,+ — 1

constrictedGlottis categorical

rows: 105,484
null: 0 (0.0%)
unique: 7
top_value: -
top_rate: 0.945
cardinality: 7
entropy: 0.372
entropy_ratio: 0.132
Top values for constrictedGlottis (7 unique shown, of 7 total).
value | count | share
- | 99,727 | 94.5%
+ | 3,383 | 3.2%
0 | 2,138 | 2.0%
+,- | 141 | 0.1%
-,+ | 93 | 0.1%
+,-,- | 1 | 0.0%
-,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. - — 99,727
  2. + — 3,383
  3. 0 — 2,138
  4. +,- — 141
  5. -,+ — 93
  6. +,-,- — 1
  7. -,-,+ — 1

fortis categorical

rows: 105,484
null: 0 (0.0%)
unique: 3
top_value: -
top_rate: 0.681
cardinality: 3
entropy: 0.934
entropy_ratio: 0.589
Top values for fortis (3 unique shown, of 3 total).
value | count | share
- | 71,867 | 68.1%
0 | 33,202 | 31.5%
+ | 415 | 0.4%
Top values (rank 1–20)
  1. - — 71,867
  2. 0 — 33,202
  3. + — 415

lenis categorical

rows: 105,484
null: 0 (0.0%)
unique: 3
top_value: -
top_rate: 0.681
cardinality: 3
entropy: 0.934
entropy_ratio: 0.589
Top values for lenis (3 unique shown, of 3 total).
value | count | share
- | 71,866 | 68.1%
0 | 33,202 | 31.5%
+ | 416 | 0.4%
Top values (rank 1–20)
  1. - — 71,866
  2. 0 — 33,202
  3. + — 416

raisedLarynxEjective categorical

top value is 96.4% of rows
rows: 105,484
null: 0 (0.0%)
unique: 6
top_value: -
top_rate: 0.964
cardinality: 6
entropy: 0.267
entropy_ratio: 0.103
Top values for raisedLarynxEjective (6 unique shown, of 6 total).
value | count | share
- | 101,652 | 96.4%
0 | 2,150 | 2.0%
+ | 1,573 | 1.5%
-,+ | 85 | 0.1%
+,- | 23 | 0.0%
-,-,+ | 1 | 0.0%
Top values (rank 1–20)
  1. - — 101,652
  2. 0 — 2,150
  3. + — 1,573
  4. -,+ — 85
  5. +,- — 23
  6. -,-,+ — 1

loweredLarynxImplosive categorical

top value is 97.3% of rows
rows: 105,484
null: 0 (0.0%)
unique: 5
top_value: -
top_rate: 0.973
cardinality: 5
entropy: 0.203
entropy_ratio: 0.088
Top values for loweredLarynxImplosive (5 unique shown, of 5 total).
value | count | share
- | 102,609 | 97.3%
0 | 2,150 | 2.0%
+ | 716 | 0.7%
-,+ | 7 | 0.0%
+,- | 2 | 0.0%
Top values (rank 1–20)
  1. - — 102,609
  2. 0 — 2,150
  3. + — 716
  4. -,+ — 7
  5. +,- — 2

click categorical

rows: 105,484
null: 0 (0.0%)
unique: 5
top_value: -
top_rate: 0.682
cardinality: 5
entropy: 0.928
entropy_ratio: 0.400
Top values for click (5 unique shown, of 5 total).
value | count | share
- | 71,971 | 68.2%
0 | 33,202 | 31.5%
+ | 253 | 0.2%
+,- | 52 | 0.0%
-,+ | 6 | 0.0%
Top values (rank 1–20)
  1. - — 71,971
  2. 0 — 33,202
  3. + — 253
  4. +,- — 52
  5. -,+ — 6