Data Trove: WLASL (Word-Level American Sign Language)
Reading
This dataset appears to be a sign language lexicon index (WLASL: Word-Level American Sign Language) containing 2,000 entries, each pairing a gloss (a written word label for a sign) with its associated instances, likely video or image examples. Every gloss is unique, which indicates a vocabulary index rather than a log of repeated observations. The gloss labels are almost entirely single words (97.75% one-word rate) and short, averaging about 6 characters (range 1 to 16). They cover everyday vocabulary such as 'up', 'hearing', 'dog', and 'hot'. The most promising angle to explore is the 'instances' column, which has not yet been analysed: the number of example instances per sign likely varies considerably and would reveal which signs are well represented and which are data-sparse.
citing: row_count · column_count · columns[0].n_unique · columns[0].stats.one_word_rate · columns[0].stats.len_mean · columns[0].stats.len_min · columns[0].stats.len_max · columns[0].top_words · columns[1].alerts
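A minimal sketch of that per-sign analysis follows. It assumes the raw data has been loaded as a list of records, each a dict with a 'gloss' string and an 'instances' list. The profiler skipped 'instances', so this structure is an assumption. The function names, the toy records, and the sparsity threshold of 5 are hypothetical and chosen only for illustration.

```python
from collections import Counter  # noqa: F401  (handy for tallying counts further)
from statistics import mean, median


def instances_per_gloss(records):
    """Map each gloss to the number of entries in its 'instances' field.

    Assumes each record is a dict with a 'gloss' string and an 'instances'
    list; any non-list value (e.g. None) is counted as 0 instances.
    """
    counts = {}
    for rec in records:
        inst = rec.get("instances")
        counts[rec["gloss"]] = len(inst) if isinstance(inst, list) else 0
    return counts


def summarize(counts, sparse_threshold=5):
    """Summarize per-gloss instance counts and list data-sparse glosses.

    sparse_threshold is a hypothetical cut-off: glosses with fewer
    instances than this are reported as data-sparse.
    """
    values = list(counts.values())
    sparse = sorted(g for g, n in counts.items() if n < sparse_threshold)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "sparse_glosses": sparse,
    }


if __name__ == "__main__":
    # Toy records for illustration only; these are not real WLASL counts.
    toy = [
        {"gloss": "dog", "instances": [{}, {}, {}, {}, {}, {}]},
        {"gloss": "hot", "instances": [{}, {}]},
        {"gloss": "up", "instances": [{}, {}, {}, {}]},
    ]
    counts = instances_per_gloss(toy)
    print(counts)
    print(summarize(counts))
```

With the real file, pass the list produced by `json.load` to `instances_per_gloss`. A histogram of the resulting counts would then show how unevenly examples are spread across signs.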
Charts the summary said to look at first
Gloss length in characters (n = 2,000):

| Length (chars) | Count |
|---|---|
| 1 | 21 |
| 2 | 13 |
| 3 | 139 |
| 4 | 377 |
| 5 | 376 |
| 6 | 337 |
| 7 | 285 |
| 8 | 189 |
| 9 | 127 |
| 10 | 66 |
| 11 | 35 |
| 12 | 19 |
| 13 | 10 |
| 14 | 3 |
| 15 | 2 |
| 16 | 1 |
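Gloss lengths are integers, so the empty rows in the raw chart export appear to come from fractional bin edges, not from real gaps. The following minimal sketch bins by exact integer length instead. As a consistency check, it recomputes the total and the mean from the counts in the length table, which should agree with n = 2,000 and len_mean ≈ 6.008. The function names are hypothetical, and counting spaces in multi-word glosses toward the length is an assumption about how the profiler measured it.

```python
from collections import Counter


def length_histogram(glosses):
    """Count strings by exact character length, one bin per integer length.

    Spaces inside multi-word glosses count toward the length (assumed to
    match the profiler). Lengths inside the observed range that never occur
    are kept with a count of 0 so real gaps stay visible.
    """
    counts = Counter(len(g) for g in glosses)
    lo, hi = min(counts), max(counts)
    return [(n, counts.get(n, 0)) for n in range(lo, hi + 1)]


def mean_from_histogram(hist):
    """Weighted mean length from (length, count) pairs."""
    total = sum(c for _, c in hist)
    return sum(n * c for n, c in hist) / total


# (length, count) pairs copied from the gloss-length table.
GLOSS_LENGTH_COUNTS = [
    (1, 21), (2, 13), (3, 139), (4, 377), (5, 376), (6, 337), (7, 285), (8, 189),
    (9, 127), (10, 66), (11, 35), (12, 19), (13, 10), (14, 3), (15, 2), (16, 1),
]

if __name__ == "__main__":
    print(length_histogram(["up", "dog", "hot", "hearing", "take"]))
    total = sum(c for _, c in GLOSS_LENGTH_COUNTS)
    print(total, round(mean_from_histogram(GLOSS_LENGTH_COUNTS), 4))
```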
Schema
2 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| gloss | text | 0.0% | 2,000 | near_unique, one_word, short_text |
| instances | unknown | 0.0% | not computed | skipped |
gloss
text · label · near_unique · one_word · short_text

This column contains linguistic glosses: short, mostly single-word labels. In linguistics datasets, a gloss gives the English translation or morphological tag for a lexical item. In sign language data, it is the written English word used to name each sign. With 2,000 rows, 2,000 unique values, and zero duplicates, every gloss is distinct, which is consistent with a lexicon in which each entry is a different sign. The 97.75% one-word rate and mean length of about 6 characters fit single English words or abbreviations. Top words such as 'up', 'hearing', 'dog', and 'take' reinforce a natural-language vocabulary context. Because every value is unique, the column behaves like an identifier and, as an input feature, would carry no predictive signal in modelling. Treatment: use it as a human-readable label or key (in a sign-recognition setting, the natural class label), drop it from feature matrices, or embed it with a lightweight word encoder if its semantic content is needed.
- n: 2,000
- nulls: 0 (0.0%)
- unique: 2,000
- len_min: 1
- len_max: 16
- len_mean: 6.008
- len_median: 6
- len_p95: 10
- word_mean: 1.024
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,984
- readability_flesch_mean: 54.58
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.9775
- allcaps_rate: 0
- boilerplate_rate: 0
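The sketch below recomputes a subset of these statistics for any list of strings. The profiler's exact definitions are not documented here, so it makes several assumptions:

- Words are whitespace-separated tokens.
- vocab_size counts distinct case-sensitive tokens.
- n_duplicates is the row count minus the number of unique values.
- len_p95 uses linear interpolation between order statistics, the same rule as NumPy's default percentile.

The readability, emoji, URL, all-caps and boilerplate rates are omitted, and the function names are hypothetical.

```python
import math
from statistics import mean, median


def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 1] (NumPy's default rule)."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def profile_text_column(texts):
    """Recompute a subset of the text-column profile for a list of strings."""
    lengths = [len(t) for t in texts]              # character lengths
    word_counts = [len(t.split()) for t in texts]  # whitespace tokens per value
    n = len(texts)
    n_unique = len(set(texts))
    vocab = {w for t in texts for w in t.split()}  # distinct tokens, case-sensitive
    return {
        "n": n,
        "unique": n_unique,
        "n_duplicates": n - n_unique,
        "duplicate_rate": (n - n_unique) / n,
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "len_p95": percentile(lengths, 0.95),
        "word_mean": mean(word_counts),
        "word_median": median(word_counts),
        "n_empty": sum(1 for t in texts if not t.strip()),
        "one_word_rate": sum(1 for c in word_counts if c == 1) / n,
        "vocab_size": len(vocab),
    }


if __name__ == "__main__":
    # Toy glosses for illustration only.
    for key, value in profile_text_column(["up", "dog", "hot", "hearing", "take", "thank you"]).items():
        print(f"{key}: {value}")
```

Run on the full gloss column, this should land close to the listed values if the assumed definitions match the profiler's.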
instances
unknown · other · skipped

This column ('instances') was skipped during profiling, so almost no statistical evidence is available. Its 2,000 rows contain no nulls, but no other statistics or uniqueness count were computed, so its data type and distribution are unknown. The 'skipped' alert suggests that the profiler either encountered an unsupported type (for example, nested lists or objects) or was explicitly configured to bypass this column. Treatment: inspect the raw values manually to determine their type and semantics before assigning a role or transformation.
- n: 2,000
- nulls: 0 (0.0%)
- unique: not computed
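One way to start that manual inspection is to tally the Python types of the raw values. Where values are lists of dicts, it also helps to tally the keys those nested records carry. This sketch assumes the column has already been loaded into Python objects (for example via `json.load`). The function name, the truncation length of 80 characters, and the keys in the toy values are hypothetical.

```python
from collections import Counter


def inspect_column(values, max_examples=3):
    """Report Python types in a column and, for list-of-dict values,
    which keys the nested records carry and how long the lists are."""
    type_counts = Counter(type(v).__name__ for v in values)
    nested_keys = Counter()
    list_lengths = []
    for v in values:
        if isinstance(v, list):
            list_lengths.append(len(v))
            for item in v:
                if isinstance(item, dict):
                    nested_keys.update(item.keys())
    return {
        "types": dict(type_counts),
        "list_len_min": min(list_lengths) if list_lengths else None,
        "list_len_max": max(list_lengths) if list_lengths else None,
        "nested_keys": dict(nested_keys),
        # Truncated reprs so very long nested values stay readable.
        "examples": [repr(v)[:80] for v in values[:max_examples]],
    }


if __name__ == "__main__":
    # Toy values with hypothetical keys, for illustration only.
    toy = [
        [{"video_id": "a", "split": "train"}, {"video_id": "b", "split": "test"}],
        [{"video_id": "c", "split": "val"}],
    ]
    print(inspect_column(toy))
```

If the tally shows lists of dicts, the list lengths directly give the number of instances per gloss, and the nested keys suggest which fields to profile next.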