data trove wlasl word level american sign language

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/accessibility/wlasl_index.json

Saturn profiled 2,000 rows across 2 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
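# Install the profiler (IPython shell escape, valid inside a notebook cell).
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source file with the flags used for this report.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/accessibility/wlasl_index.json",
    "--findings", "data-trove-wlasl-word-level-american-sign-language.json",
    "--llm", "anthropic:default",
])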

Summary confidence: medium

This dataset appears to be a sign language lexicon index (WLASL, Word-Level American Sign Language) containing 2,000 entries, each pairing a gloss (a written word label for a sign) with associated instances, likely video or image examples. Every gloss is unique, which indicates a vocabulary index rather than a repeated-observation log. The gloss labels are almost entirely single words (97.75% one-word rate) and short, averaging about 6 characters (range 1 to 16), covering everyday vocabulary such as 'up', 'hearing', 'dog', and 'hot'. The most interesting angle to explore is the 'instances' column, which the profiler skipped: the number of example instances per sign likely varies considerably and would reveal which signs are well represented and which are data-sparse.

citing: row_count · column_count · columns[0].n_unique · columns[0].stats.one_word_rate · columns[0].stats.len_mean · columns[0].stats.len_min · columns[0].stats.len_max · columns[0].top_words · columns[1].alerts
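
The citation keys above read like dotted paths into the findings file named by the `--findings` flag. The following is a minimal sketch of resolving such paths, assuming the findings file is JSON whose layout mirrors those keys. The `resolve` helper and the `sample` dictionary are hypothetical; the sample only reuses values reported in this notebook, and the list form of `alerts` is a guess.

```python
import re

def resolve(path, data):
    """Follow a dotted path such as 'columns[0].stats.len_min' into nested dicts and lists."""
    current = data
    # Each match is either a key name or a bracketed list index.
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        current = current[name] if name else current[int(index)]
    return current

# Hypothetical stand-in for the findings file (assumed layout, values from this report).
sample = {
    "row_count": 2000,
    "column_count": 2,
    "columns": [
        {"n_unique": 2000,
         "stats": {"one_word_rate": 0.9775, "len_min": 1, "len_max": 16}},
        {"alerts": ["skipped"]},
    ],
}

for key in ["row_count", "columns[0].stats.one_word_rate", "columns[1].alerts"]:
    print(key, "=", resolve(key, sample))
```

To try it against the real output, load the findings file with `json.load` and pass the resulting object in place of `sample`; if the actual layout differs from the assumed one, `resolve` will raise a `KeyError` or `IndexError` at the first mismatching path segment.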

Out[4]:

saturn.schema() · 2 columns

column     kind     n      null%  unique  alerts
gloss      text     2,000  0.0%   2,000   near_unique, one_word, short_text
instances  unknown  2,000  0.0%           skipped
Fig 1.
gloss · Look for the distribution of gloss label lengths — most signs map to short words, but check for any multi-word or unusually long outliers.
Character-length distribution for gloss (mean: 6.0075).
chars  count
1      21
2      13
3      139
4      377
5      376
6      337
7      285
8      189
9      127
10     66
11     35
12     19
13     10
14     3
15     2
16     1
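
As a consistency check, the Fig 1 data table can be tied back to the reported statistics. The sketch below reads the table as one count per integer character length (an interpretation of the table, not something Saturn emits in this form) and recomputes the row total, mean, median, and 95th percentile. The nearest-rank percentile rule is an assumption; Saturn's own percentile method is not stated in this report.

```python
# Counts per character length, read from the Fig 1 data table
# (assumes each integer length corresponds to exactly one non-empty bin).
counts = {1: 21, 2: 13, 3: 139, 4: 377, 5: 376, 6: 337, 7: 285, 8: 189,
          9: 127, 10: 66, 11: 35, 12: 19, 13: 10, 14: 3, 15: 2, 16: 1}

total = sum(counts.values())
mean = sum(length * n for length, n in counts.items()) / total

def percentile_nearest_rank(counts, q):
    """Smallest length whose cumulative count reaches q * total (nearest-rank rule)."""
    target = q * sum(counts.values())
    running = 0
    for length in sorted(counts):
        running += counts[length]
        if running >= target:
            return length

print(total)                                  # 2000
print(mean)                                   # 6.0075
print(percentile_nearest_rank(counts, 0.5))   # 6
print(percentile_nearest_rank(counts, 0.95))  # 10
```

These values agree with n, len_mean, len_median, and len_p95 in the gloss statistics.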
Fig 2.
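gloss · Top words in the gloss column reveal the most common sign labels; 'up', 'hearing', and 'dog' each appear more than once across glosses even though every gloss string is unique, which is possible only because they also occur inside multi-word glosses.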
Fig 3.
gloss · A histogram of word count per gloss confirms the overwhelming dominance of single-word signs versus occasional multi-word phrases.
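
The top-word and word-count views described for Fig 2 and Fig 3 can be approximated from a plain list of gloss strings. This is a rough sketch that assumes lowercase whitespace tokenization, which may differ from Saturn's own tokenizer; `word_profile` is a hypothetical helper and the glosses are made up for illustration.

```python
from collections import Counter

def word_profile(glosses, top_n=3):
    """Return (words-per-gloss histogram, most common tokens) for a list of gloss strings."""
    tokens_per_gloss = [g.lower().split() for g in glosses]
    words_per_gloss = Counter(len(tokens) for tokens in tokens_per_gloss)
    token_counts = Counter(tok for tokens in tokens_per_gloss for tok in tokens)
    return words_per_gloss, token_counts.most_common(top_n)

# Made-up glosses for illustration, not rows from the dataset.
glosses = ["up", "wake up", "look up", "dog", "hot dog", "hearing"]
histogram, top = word_profile(glosses, top_n=2)
print(dict(histogram))  # {1: 3, 2: 3}
print(top)              # [('up', 3), ('dog', 2)]
```

Every gloss in the example is unique, yet 'up' and 'dog' each occur more than once as tokens, which is the same pattern the Fig 2 caption describes.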
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column     kind     null %
gloss      text     0.0%
instances  unknown  0.0%

gloss text label

This column contains glosses: short, single-word (or near-single-word) labels of the kind used in linguistics datasets to give the English translation or morphological tag for a lexical item. With 2,000 rows, 2,000 unique values, and zero duplicates, every gloss is distinct, which is consistent with a vocabulary or lexicon dataset in which each entry has a unique meaning. The near-complete one-word rate (97.75%) and mean length of about 6 characters align with single English words or abbreviations; top words like 'up', 'hearing', 'dog', and 'take' reinforce a natural-language vocabulary context. Because every value is unique, the column functions effectively as an identifier and would carry no predictive signal as a raw categorical feature in modelling.

Treatment: Use as a human-readable label or key; drop from feature matrices, or embed with a lightweight word encoder if semantic content is needed.

anthropic:default · confidence high
Out[10]:

saturn.columns["gloss"].stats

stat                     value
n                        2,000
nulls                    0 (0.0%)
unique                   2,000
len_min                  1
len_max                  16
len_mean                 6.008
len_median               6
len_p95                  10
word_mean                1.024
word_median              1
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,984
readability_flesch_mean  54.58
emoji_rate               0
url_rate                 0
one_word_rate            0.9775
allcaps_rate             0
boilerplate_rate         0
alert: near_unique       100.0% of rows are unique strings
alert: one_word          97.8% rows are a single word
alert: short_text        95th-percentile length under 20 chars
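
Several of the statistics above are simple enough to recompute by hand as a spot check. The sketch below shows one plausible definition for each, assuming character length via `len`, whitespace tokenization, and a case-insensitive vocabulary; Saturn's exact definitions are not documented in this report and may differ. `gloss_stats` is a hypothetical helper and the input glosses are made up.

```python
from statistics import mean, median

def gloss_stats(glosses):
    """Recompute a few of the reported text statistics from raw gloss strings."""
    lengths = [len(g) for g in glosses]
    word_counts = [len(g.split()) for g in glosses]
    vocab = {tok.lower() for g in glosses for tok in g.split()}
    return {
        "n": len(glosses),
        "unique": len(set(glosses)),
        "n_duplicates": len(glosses) - len(set(glosses)),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "word_mean": mean(word_counts),
        "one_word_rate": sum(c == 1 for c in word_counts) / len(glosses),
        "vocab_size": len(vocab),
    }

# Made-up glosses for illustration, not rows from the dataset.
stats = gloss_stats(["up", "dog", "hot dog", "hearing"])
print(stats["one_word_rate"], stats["vocab_size"], stats["len_max"])  # 0.75 4 7
```

Running the same function over the real gloss column and comparing against the table is a quick way to learn which definitions Saturn actually uses.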
Fig 5.
Character-length distribution for gloss.

instances unknown other

This column ('instances') was skipped during profiling, so almost no statistical evidence is available. It has 2,000 rows and zero nulls, but no computed stats or uniqueness count, so its data type and distribution are unknown. The 'skipped' alert reads 'no profiler for kind=unknown', which points to a value type the profiler does not support rather than an explicit configuration to bypass the column.

Treatment: Manually inspect raw values to determine type and semantics before assigning a role or transformation.

anthropic:default · confidence low
Out[13]:

saturn.columns["instances"].stats

stat            value
n               2,000
nulls           0 (0.0%)
unique
alert: skipped  no profiler for kind=unknown
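
One way to follow the treatment advice is to tally the raw value types in the skipped column before deciding how to handle it. The sketch below assumes the source JSON is a top-level list of objects keyed by column name, and that 'instances' may hold lists; neither is confirmed by the profile. `summarize_column` is a hypothetical helper and the sample records are made up.

```python
import json
from collections import Counter

def summarize_column(records, column):
    """Tally Python types in one column and, for list values, summarize their lengths."""
    values = [rec.get(column) for rec in records]
    types = Counter(type(v).__name__ for v in values)
    list_lengths = [len(v) for v in values if isinstance(v, list)]
    summary = {"types": dict(types)}
    if list_lengths:
        summary["list_len_min"] = min(list_lengths)
        summary["list_len_max"] = max(list_lengths)
        summary["list_len_mean"] = sum(list_lengths) / len(list_lengths)
    return summary

# Hypothetical records shaped like the two columns; not real rows.
sample = json.loads('[{"gloss": "up", "instances": [{}, {}, {}]},'
                    ' {"gloss": "dog", "instances": [{}]}]')
print(summarize_column(sample, "instances"))

# To run on the real file (assuming its top level is a list of objects):
# with open("/home/coolhand/html/datavis/data_trove/cache/accessibility/wlasl_index.json") as f:
#     print(summarize_column(json.load(f), "instances"))
```

If the values do turn out to be lists, the per-row list length gives the instances-per-sign count that the summary flags as the most interesting angle to explore.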

How to cite

BibTeX
@misc{saturn-data-trove-wlasl-word-level-american-sign-language-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove wlasl word level american sign language},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-wlasl-word-level-american-sign-language}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove wlasl word level american sign language. Source: /home/coolhand/html/datavis/data_trove/cache/accessibility/wlasl_index.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-wlasl-word-level-american-sign-language