saturn·

vizwiz

source /home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv · 4,319 rows · 5 columns · profiled 2026-04-22

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This is the VizWiz validation annotation set: 4,319 rows linking an image filename to a question, a bundle of crowd answers, an answer_type label, and a binary 'answerable' flag. The question column is where the dataset's character lives: it has only 2,798 unique values (a 35% duplicate rate), and the most frequent are short generic prompts such as 'What is this?' (523 occurrences). Worth a closer look: the answer_type distribution is heavily skewed toward 'other' (62%), with 'unanswerable' second at 1,385 rows (32%). The numeric 'answerable' flag agrees, with a zero rate of ~32%, so roughly a third of items must be accounted for as unanswerable in any downstream evaluation.

citing: row_count · column_count · columns.question.n_unique · columns.question.stats.duplicate_rate · columns.question.top_values · columns.answer_type.top_values · columns.answer_type.stats.top_rate · columns.answerable.stats.mean · columns.answerable.stats.zero_rate · columns.question.stats.word_mean
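
The summary treats the 32% 'unanswerable' share of answer_type and the 32% zero rate of answerable as describing the same rows, but the profile reports only each column's marginal counts. A minimal sketch of a row-level cross-check follows. It assumes the CSV parses with Python's csv.DictReader and that answerable is stored as '0'/'1'; both are assumptions, and the helper names are hypothetical.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the annotation CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def crosstab_answerable(rows):
    """Count (answer_type, answerable) pairs across an iterable of dict rows."""
    return Counter(
        (row["answer_type"], str(row["answerable"]).strip()) for row in rows
    )
```

Calling crosstab_answerable(load_rows(path)) on the profiled file would show whether any row pairs answer_type 'unanswerable' with answerable 1, or the reverse. If none do, one of the two columns is redundant as a modelling target.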

Schema

5 columns
Per-column summary.

column       type         nulls  unique  alerts
image        text         0.0%   4,319   near_unique, one_word
question     text         0.0%   2,798   duplicates, multilingual
answers      text         0.0%   4,295   near_unique
answer_type  categorical  0.0%   4       -
answerable   numeric      0.0%   2       -

image

text identifier near_unique one_word
This column holds image filenames following the pattern `vizwiz_val_########.jpg`, with all 4,319 values unique and exactly 23 characters long. Every entry is a single token with no duplicates or nulls, confirming it functions as a per-row file pointer rather than analyzable text. Treatment: Use as a file-path key; join to image assets rather than modelling the string. high · anthropic:claude-opus-4-7
n
4,319
nulls
0 (0.0%)
unique
4,319
len_min
23
len_max
23
len_mean
23
len_median
23
len_p95
23
word_mean
1
word_median
1
n_empty
0
n_duplicates
0
duplicate_rate
0
vocab_size
4,319
readability_flesch_mean
-47.98
emoji_rate
0
url_rate
0
one_word_rate
1
allcaps_rate
0
boilerplate_rate
0
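
Before joining on this key, it may be worth confirming that every value really fits the reported pattern. A small sketch, assuming each '#' in `vizwiz_val_########.jpg` stands for one decimal digit (the report does not say so explicitly); the function name is hypothetical.

```python
import re

# Assumption: each '#' in the reported pattern is one decimal digit (0-9).
FILENAME_RE = re.compile(r"vizwiz_val_[0-9]{8}\.jpg")


def invalid_filenames(names):
    """Return the names that do not fully match the assumed pattern."""
    return [name for name in names if FILENAME_RE.fullmatch(name) is None]
```

An empty result over the whole column would be consistent with the fixed length of 23 reported above. Any returned names are the ones to inspect before the join.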

question

text free_text duplicates multilingual
Short English questions, averaging 7.26 words and 35 characters (medians: 5 words, 26 characters). The single most common string is 'What is this?' at 523 occurrences, about 12% of rows. 35.2% of the 4,319 rows are duplicates (1,521), leaving 2,798 unique strings; the vocabulary is small (2,779 tokens) and Flesch readability is very high (101.7). A handful of rows are tagged as non-English (es, la, it, fy, hu, ia, ast), but 4,308 of 4,319 are tagged English. Treatment: Tokenize and embed for modelling; consider deduplicating or weighting given the 35% duplicate rate. high · anthropic:claude-opus-4-7
n
4,319
nulls
0 (0.0%)
unique
2,798
len_min
7
len_max
264
len_mean
35.1
len_median
26
len_p95
95
word_mean
7.259
word_median
5
n_empty
0
n_duplicates
1,521
duplicate_rate
0.3522
vocab_size
2,779
readability_flesch_mean
101.7
emoji_rate
0
url_rate
0
one_word_rate
0
allcaps_rate
0.002547
boilerplate_rate
0.003473
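
One way to act on the weighting suggestion is inverse-frequency weights, sketched below. The lowercasing and whitespace normalization is an added assumption (the profile's duplicate count is presumably on exact strings), and the function names are hypothetical.

```python
from collections import Counter


def normalize(question):
    """Lowercase and collapse whitespace so trivially different strings group together."""
    return " ".join(question.lower().split())


def inverse_frequency_weights(questions):
    """Weight each row by 1 / (number of rows sharing its normalized question).

    The weights for each distinct question sum to 1, so a question asked
    many times contributes as much in total as one asked once.
    """
    keys = [normalize(q) for q in questions]
    counts = Counter(keys)
    return [1.0 / counts[key] for key in keys]
```

Under this scheme the 523 'What is this?' rows would each get weight 1/523 (or less, if normalization merges further variants). Whether that is desirable depends on whether the evaluation should reflect how often real users ask a question.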

answers

text feature near_unique
This column holds serialized lists of answer dicts (keys like 'answer' and 'answer_confidence', with values such as 'yes', 'maybe', 'unanswerable'), not free-form text. Rows are long and fall in a fairly narrow length band (len_mean 559.7, len_min 450, len_max 933), and nearly all are unique (4,295 of 4,319), for a duplicate rate of just 0.56%. The strongly negative Flesch score (-56.5) is consistent with structured payload rather than natural language. Treatment: Parse the stringified dicts and explode answer/confidence fields into structured columns before modelling. high · anthropic:claude-opus-4-7
n
4,319
nulls
0 (0.0%)
unique
4,295
len_min
450
len_max
933
len_mean
559.7
len_median
550
len_p95
660.1
word_mean
47.66
word_median
45
n_empty
0
n_duplicates
24
duplicate_rate
0.005557
vocab_size
11,308
readability_flesch_mean
-56.5
emoji_rate
0
url_rate
0
one_word_rate
0
allcaps_rate
0
boilerplate_rate
0
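
The parsing step in the treatment could look like the sketch below. The report does not say whether the cells are JSON or Python-literal text, so the code tries JSON first and falls back to ast.literal_eval; the key names 'answer' and 'answer_confidence' come from the description above, and the function names are hypothetical.

```python
import ast
import json
from collections import Counter


def parse_answers(raw):
    """Parse one serialized cell into a list of (answer, confidence) tuples."""
    try:
        items = json.loads(raw)        # succeeds if the cell is valid JSON
    except json.JSONDecodeError:
        items = ast.literal_eval(raw)  # fallback: Python-literal list of dicts
    return [(item["answer"], item.get("answer_confidence")) for item in items]


def most_common_answer(pairs):
    """Return the most frequent answer string among the parsed pairs."""
    return Counter(answer for answer, _ in pairs).most_common(1)[0][0]
```

ast.literal_eval only evaluates literals, so it is safer than eval on untrusted cells, but it still raises on malformed input. A real pipeline would likely want to catch and log those rows.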

answer_type

categorical label
Categorical label tagging the type of answer expected, with just 4 classes: 'other' dominates at 62.3% (2,691 of 4,319), followed by 'unanswerable' at 1,385 (32.1%), while 'yes/no' (195, 4.5%) and 'number' (48, 1.1%) are rare. There are no nulls, but the class imbalance is severe. The entropy is 1.225 bits against a maximum of 2 bits for four equally likely classes, giving an entropy ratio of 0.61, so the distribution is far from uniform. Treatment: Use as a stratified target; consider class weighting or merging rare classes ('yes/no', 'number') given the imbalance. high · anthropic:claude-opus-4-7
n
4,319
nulls
0 (0.0%)
unique
4
top_value
other
top_rate
0.6231
cardinality
4
entropy
1.225
entropy_ratio
0.6127
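
The entropy figures can be recomputed from the four class counts given in the description, and the same counts yield class weights. A minimal sketch: the weighting formula n / (k * count) is the common 'balanced' heuristic, chosen here as an assumption and not something the report prescribes.

```python
import math

# Class counts as stated in the answer_type description.
COUNTS = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}


def entropy_bits(counts):
    """Shannon entropy in bits of the empirical class distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def balanced_weights(counts):
    """Weight n / (k * count) per class, so every class carries equal total weight."""
    total, k = sum(counts.values()), len(counts)
    return {label: total / (k * c) for label, c in counts.items()}


entropy = entropy_bits(COUNTS)
ratio = entropy / math.log2(len(COUNTS))  # normalize by the 2-bit maximum for 4 classes
weights = balanced_weights(COUNTS)
```

The resulting entropy and ratio should agree with the reported 1.225 and 0.6127 to the printed precision. The weight for 'number' comes out above 20, which is a hint that merging the rare classes may be more stable than weighting them.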

answerable

numeric label
Binary 0/1 flag indicating whether an item is answerable, with 4,319 rows and no nulls. The class is imbalanced toward 1: mean 0.6793 means about 68% positives against a 0.3207 zero rate (about 1,385 rows). Skew -0.768 and kurtosis -1.41 carry no extra information here; they are the values a two-point distribution with that mean produces. Treatment: Use as binary target; account for the ~68/32 class imbalance via stratified splits or class weights. high · anthropic:claude-opus-4-7
n
4,319
nulls
0 (0.0%)
unique
2
min
0
max
1
mean
0.6793
median
1
std
0.4668
q1
0
q3
1
iqr
1
skew
-0.7684
kurtosis
-1.41
n_outliers
0
outlier_rate
0
zero_rate
0.3207
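
For a 0/1 column, the standard deviation, skew and kurtosis are all determined by the mean. The sketch below recomputes them from the reported mean using the closed-form Bernoulli moments. It assumes the profiler reports excess kurtosis and may apply small-sample corrections, so agreement is expected only to about three decimals.

```python
import math


def bernoulli_moments(p):
    """Population std, skewness and excess kurtosis of a 0/1 variable with P(1) = p."""
    q = 1.0 - p
    var = p * q
    return {
        "std": math.sqrt(var),
        "skew": (q - p) / math.sqrt(var),
        "excess_kurtosis": (1.0 - 6.0 * var) / var,
    }


moments = bernoulli_moments(0.6793)  # mean reported for 'answerable'
```

The negative skew simply reflects that 1 is the majority value. For modelling, the mean (or equivalently the zero rate) is the only figure needed to set class weights or check that a stratified split preserved the ratio.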