saturn

/home/coolhand/html/datavis/data_trove/cache/accessibility/wlasl_index.json | 2,000 rows | sample n=2,000 | seed 42 | 2026-06-21T23:17:47+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/cache/accessibility/wlasl_index.json
Total rows	2,000
Profiled sample	2,000
Columns	2
Generated	2026-06-21T23:17:47+00:00

Per-column null rate across the corpus.
column	kind	null %
gloss	text	0.0%
instances	unknown	0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset medium anthropic:default

This dataset appears to be a sign language lexicon index (WLASL, Word-Level American Sign Language). It contains 2,000 entries, each pairing a gloss (a written word label for a sign) with associated instances, likely video or image examples. Every gloss is unique, which is consistent with a vocabulary index rather than a repeated-observation log. The gloss labels are almost entirely single words (97.75% one-word rate) and are short, averaging about 6 characters, covering everyday vocabulary like 'up', 'hearing', 'dog', and 'hot'. The most useful next step is the 'instances' column, which was not profiled: the number of example instances per sign likely varies considerably and would show which signs are well represented and which are data-sparse.

gloss high anthropic:default

This column contains glosses: short, single-word (or near-single-word) English labels of the kind used in linguistics datasets to name a lexical item. With 2,000 rows, 2,000 unique values, and zero duplicates, every gloss is distinct, which is consistent with a vocabulary or lexicon dataset where each entry has its own meaning. The near-complete one-word rate (97.75%) and mean string length of about 6 characters align with single English words or abbreviations; top words like 'up', 'hearing', 'dog', and 'take' reinforce a natural-language vocabulary context. Because every value is unique, the column functions as an identifier and would carry no predictive signal as a categorical feature in modelling.

instances low anthropic:default

This column ('instances') was skipped during profiling, so almost no statistical evidence is available. Beyond 2,000 rows and zero nulls, no stats or uniqueness count were computed, so its data type and distribution are unknown. The column's status note, 'no profiler for kind=unknown', suggests the profiler encountered a type it does not support rather than being explicitly configured to bypass this column.

gloss text

100.0% of rows are unique strings
97.8% of rows are a single word
95th-percentile length under 20 chars

rows	2,000
null	0 (0.0%)
unique	2,000
len_min	1
len_max	16
len_mean	6.008
len_median	6.000
len_p95	10.000
word_mean	1.024
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	1,984
readability_flesch_mean	54.577
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.978
allcaps_rate	0.000
boilerplate_rate	0.000
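
For readers who want to re-derive some of these figures, the following is a minimal sketch of how length and word statistics of this kind could be computed from a list of strings. The function name `gloss_stats` and the definitions used here (whitespace tokenisation, lower-cased vocabulary) are assumptions for illustration; saturn's own definitions are not documented in this report and may differ. The input is the ten sample values shown elsewhere in this report, not the full column.

```python
from statistics import mean, median


def gloss_stats(glosses):
    """Recompute a few text statistics for a non-empty list of strings.

    Assumption: words are split on whitespace and the vocabulary is
    case-insensitive. The profiler's real definitions may differ.
    """
    lengths = [len(g) for g in glosses]
    word_counts = [len(g.split()) for g in glosses]
    vocab = {w.lower() for g in glosses for w in g.split()}
    n = len(glosses)
    n_unique = len(set(glosses))
    return {
        "rows": n,
        "unique": n_unique,
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "word_mean": mean(word_counts),
        "one_word_rate": sum(c == 1 for c in word_counts) / n,
        "n_duplicates": n - n_unique,
        "vocab_size": len(vocab),
    }


# Illustrative input only: the first 10 sample values from this report.
sample = ["computer", "reputation", "eraser", "camel", "community",
          "stadium", "exact", "laundry", "caterpillar", "birthday"]
stats = gloss_stats(sample)
print(stats)
```

Run against the full gloss column instead of the ten samples, the same function gives figures that can be compared with the table above; any mismatch would indicate that the profiler uses a different definition.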

Character-length distribution for gloss (mean: 6.0075).
chars	count
1	21
2	13
3	139
4	377
5	376
6	337
7	285
8	189
9	127
10	66
11	35
12	19
13	10
14	3
15	2
16	1
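
As a consistency check, the non-empty bins of the histogram above can be used to recompute the row total, the mean length, and the percentile figures reported for gloss. This is a sketch only: `percentile_nearest_rank` is a hypothetical helper using the nearest-rank method, and the profiler may use a different percentile definition.

```python
# Non-empty bins from the character-length histogram: {length: count}.
hist = {1: 21, 2: 13, 3: 139, 4: 377, 5: 376, 6: 337, 7: 285, 8: 189,
        9: 127, 10: 66, 11: 35, 12: 19, 13: 10, 14: 3, 15: 2, 16: 1}

total = sum(hist.values())
mean_len = sum(length * count for length, count in hist.items()) / total


def percentile_nearest_rank(hist, q):
    """Smallest length whose cumulative count reaches q percent of all rows."""
    threshold = q / 100 * sum(hist.values())
    running = 0
    for length in sorted(hist):
        running += hist[length]
        if running >= threshold:
            return length
    return max(hist)  # fallback in case of float round-off near q = 100


median_len = percentile_nearest_rank(hist, 50)
p95_len = percentile_nearest_rank(hist, 95)
print(total, mean_len, median_len, p95_len)
```

The bins sum to 2,000 rows and give a mean of 12,015 / 2,000 = 6.0075, matching the caption; the nearest-rank median and 95th percentile come out as 6 and 10, in line with len_median and len_p95.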

Sample values (first 10)

computer
reputation
eraser
camel
community
stadium
exact
laundry
caterpillar
birthday

instances unknown

no profiler for kind=unknown

rows	2,000
null	0 (0.0%)
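
Since no profiler ran for this column, a manual look is the obvious follow-up. The sketch below assumes, without confirmation from this report, that the JSON file holds a list of objects with "gloss" and "instances" keys and that "instances" is usually a list. The helper names `profile_instances` and `load_records` are hypothetical, and the demo records are made up for illustration rather than taken from the file.

```python
import json
from collections import Counter
from statistics import mean, median


def profile_instances(records):
    """Summarise each record's 'instances' field by Python type and,
    where it is a list, by the number of items it contains."""
    kinds = Counter(type(r.get("instances")).__name__ for r in records)
    sizes = [len(r["instances"]) for r in records
             if isinstance(r.get("instances"), list)]
    summary = {"kinds": dict(kinds), "n_lists": len(sizes)}
    if sizes:
        summary.update(min=min(sizes), max=max(sizes),
                       mean=mean(sizes), median=median(sizes))
    return summary


def load_records(path):
    """Load the index file. Assumption: the top level is a list of objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Made-up records for illustration; not taken from the real file.
demo = [
    {"gloss": "alpha", "instances": [{"id": 1}, {"id": 2}, {"id": 3}]},
    {"gloss": "beta", "instances": [{"id": 4}]},
    {"gloss": "gamma", "instances": 7},
]
result = profile_instances(demo)
print(result)
```

Calling `profile_instances(load_records(path))` on the source file would first show which types the column actually holds; the per-sign counts are only meaningful if the values turn out to be lists.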