saturn

/home/coolhand/html/datavis/data_trove/cache/wlasl_index.json | 2,000 rows | sample n=2,000 | seed 42 | 2026-05-01T23:08:03+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/wlasl_index.json
Total rows: 2,000
Profiled sample: 2,000
Columns: 2
Generated: 2026-05-01T23:08:03+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset is a 2000-row index from a WLASL (Word-Level American Sign Language) source, with two columns: 'gloss' (text labels) and 'instances' (an unparsed/unknown field, likely nested data). The 'gloss' column is essentially a vocabulary list — every one of the 2000 rows is unique, 97.75% are single words, and the mean length is just 6 characters. The 'instances' column was skipped by the profiler and warrants manual inspection, since it likely contains the actual sign-language sample records keyed to each gloss. Start by looking at the gloss length distribution to confirm the single-word pattern, then dig into the structure of 'instances' separately.

gloss high anthropic:claude-opus-4-7

This column holds short glosses: 2,000 rows, all unique, 97.75% of them a single word, with a mean length of 6 characters (max 16). The token vocabulary is 1,984 distinct tokens, slightly fewer than the 2,000 rows, so almost every entry is its own term. Because no full gloss is duplicated, the few repeated tokens, such as 'up' (7) or 'hearing' (3), presumably recur inside multi-word glosses. It reads as a lexicon-style label field rather than free text.

instances low anthropic:claude-opus-4-7

The column is named "instances" but saturn skipped detailed profiling, so its kind is unknown and no descriptive statistics were computed. We can only confirm 2000 rows with a 0.0 null rate; uniqueness, distribution, and dtype are all unreported. Without a sample value or type signal, the semantic role cannot be inferred from the evidence.

gloss text

100.0% of rows are unique strings; 97.8% of rows are a single word; 95th-percentile length under 20 chars
rows: 2,000
null: 0 (0.0%)
unique: 2,000
len_min: 1
len_max: 16
len_mean: 6.008
len_median: 6.000
len_p95: 10.000
word_mean: 1.024
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,984
readability_flesch_mean: 54.577
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.978
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. computer
  2. reputation
  3. eraser
  4. camel
  5. community
  6. stadium
  7. exact
  8. laundry
  9. caterpillar
  10. birthday
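
The length, word-count, and vocabulary figures above can be spot-checked independently. The following is a minimal sketch, not saturn's implementation: the function name `gloss_stats` is hypothetical, and it assumes lengths are counted in characters and words are split on whitespace, which may differ from the definitions saturn uses. It is run here only on the ten sample values listed above, so its numbers will not match the full-column statistics.

```python
from statistics import mean, median


def gloss_stats(glosses):
    """Recompute a few text-column statistics for a list of gloss strings.

    Assumption: characters are counted with len() and tokens are produced
    by whitespace splitting; saturn's own definitions are not documented here.
    """
    lengths = [len(g) for g in glosses]
    word_counts = [len(g.split()) for g in glosses]
    # Distinct whitespace-delimited tokens across all glosses.
    vocab = {tok for g in glosses for tok in g.split()}
    return {
        "rows": len(glosses),
        "unique": len(set(glosses)),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "word_mean": mean(word_counts),
        "one_word_rate": sum(1 for n in word_counts if n == 1) / len(glosses),
        "vocab_size": len(vocab),
    }


# The ten sample values shown in the report (not the full 2,000 rows).
sample = [
    "computer", "reputation", "eraser", "camel", "community",
    "stadium", "exact", "laundry", "caterpillar", "birthday",
]
stats = gloss_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Running the same function over the full 'gloss' column would show whether whitespace tokenization reproduces the reported vocab_size of 1,984 and one_word_rate of 0.978.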

instances unknown

no profiler for kind=unknown
rows: 2,000
null: 0 (0.0%)
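
Since no profiler ran on this column, a manual look at its structure is the natural next step. The sketch below is one way to do that, under stated assumptions: the source file is taken to be a JSON array of objects with 'gloss' and 'instances' keys (the report confirms only the two column names, not the layout), the helper names `load_records` and `describe_instances` are hypothetical, and the `demo` records are made up purely to exercise the code. They say nothing about what 'instances' really contains.

```python
import json
from collections import Counter


def load_records(path):
    """Load the index file; assumes it is a JSON array of objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def describe_instances(records, field="instances"):
    """Summarize the Python types and sizes found in one field."""
    # Which JSON types appear in the field (list, dict, str, ...).
    type_counts = Counter(type(r.get(field)).__name__ for r in records)
    # Sizes of container values only.
    sizes = [len(r[field]) for r in records
             if isinstance(r.get(field), (list, dict))]
    # If the field holds lists of objects, count the keys those objects use.
    inner_keys = Counter()
    for r in records:
        value = r.get(field)
        if isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    inner_keys.update(item.keys())
    return {
        "type_counts": dict(type_counts),
        "min_size": min(sizes) if sizes else None,
        "max_size": max(sizes) if sizes else None,
        "inner_keys": dict(inner_keys),
    }


# Made-up records for illustration only; the real structure is unknown.
demo = [
    {"gloss": "computer", "instances": [{"id": 1}, {"id": 2}]},
    {"gloss": "camel", "instances": [{"id": 3}]},
]
summary = describe_instances(demo)
print(summary)
```

On the real file, `describe_instances(load_records(path))` with the source path from the Overview would report the actual value types, which is the signal the profiler lacked when it assigned kind=unknown.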