vizwiz

source /home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv · 4,319 rows · 5 columns · profiled 2026-04-22

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This is the VizWiz validation annotation set: 4,319 rows linking an image filename to a question, a bundle of crowd answers, an answer_type label, and a binary 'answerable' flag. The question column is where the dataset's character lives: it has only 2,798 unique values (a 35% duplicate rate), and the single most common string is the short generic prompt 'What is this?' (523 occurrences, about 12% of rows). Worth a closer look: the answer_type distribution is heavily skewed toward 'other' (62%), with 'unanswerable' second at 32%. The numeric 'answerable' flag agrees, with a zero rate of about 32% (1,385 rows, the same count as the 'unanswerable' class), a meaningful portion to account for in any downstream evaluation.

citing: row_count · column_count · columns.question.n_unique · columns.question.stats.duplicate_rate · columns.question.top_values · columns.answer_type.top_values · columns.answer_type.stats.top_rate · columns.answerable.stats.mean · columns.answerable.stats.zero_rate · columns.question.stats.word_mean

Charts the summary said to look at first

answer_type · Shows how 'other' dominates over unanswerable, yes/no, and number — useful for sizing class imbalance.

Top values for answer_type (4 unique shown, of 4 total).
value	count	share
other	2691	62.3%
unanswerable	1385	32.1%
yes/no	195	4.5%
number	48	1.1%

answerable · Roughly two-thirds answerable vs. one-third unanswerable; check this before any accuracy calculation.

Histogram bins for answerable (median: 1.0). Only the two non-empty bins are shown; the 38 bins between 0.025 and 0.975 all have a count of 0.
bin	count
0 – 0.025	1385
0.975 – 1	2934
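
Since the caption asks for this split to be checked before any accuracy calculation, one useful reference point is the accuracy of always predicting the majority class. The sketch below uses only the two counts from the table above; the function name `majority_baseline` is illustrative and not part of the profiler.

```python
# Counts copied from the histogram above: 0 = unanswerable, 1 = answerable.
answerable_counts = {0: 1385, 1: 2934}


def majority_baseline(counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    total = sum(counts.values())
    label, count = max(counts.items(), key=lambda item: item[1])
    return label, count / total


label, accuracy = majority_baseline(answerable_counts)
print(f"always predict {label}: accuracy {accuracy:.4f}")
```

Any answerability classifier evaluated on this split would need to beat that figure to show it has learned something beyond the class prior.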

question · Top question strings reveal heavy repetition of generic prompts like 'What is this?' — confirms the 35% duplicate rate.

question · Question length distribution is short and right-skewed (median 26 chars, max 264) — a few long outliers worth inspecting.

Character-length distribution for question (mean: 35.1).
chars	count
7 – 13	759
13 – 20	550
20 – 26	931
26 – 33	609
33 – 39	368
39 – 46	250
46 – 52	143
52 – 58	143
58 – 65	96
65 – 71	84
71 – 78	68
78 – 84	42
84 – 91	35
91 – 97	35
97 – 103	30
103 – 110	23
110 – 116	19
116 – 123	12
123 – 129	23
129 – 136	14
136 – 142	8
142 – 148	14
148 – 155	8
155 – 161	6
161 – 168	10
168 – 174	8
174 – 180	8
180 – 187	3
187 – 193	5
193 – 200	3
200 – 206	3
206 – 213	2
213 – 219	1
219 – 225	2
225 – 232	1
232 – 238	1
238 – 245	0
245 – 251	1
251 – 258	0
258 – 264	1
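
To pull out the long outliers mentioned in the caption, one option is to flag questions longer than a chosen percentile of the length distribution. The sketch below uses a nearest-rank percentile, which is an assumption and may differ slightly from the interpolated percentiles the profiler reports; the sample questions are hypothetical.

```python
def nearest_rank_percentile(values, pct):
    """Smallest value with at least pct percent of the data at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, to avoid floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def long_outliers(questions, pct=95):
    """Return the length cutoff and the questions strictly longer than it."""
    lengths = [len(q) for q in questions]
    cutoff = nearest_rank_percentile(lengths, pct)
    return cutoff, [q for q in questions if len(q) > cutoff]


# Hypothetical sample: 19 short prompts and one long question.
long_question = (
    "Can you please tell me what flavor this soup is and whether "
    "the cooking instructions are printed anywhere on the label?"
)
sample = ["What is this?"] * 19 + [long_question]
cutoff, outliers = long_outliers(sample)
print(cutoff, len(outliers))
```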

answers · Answer-bundle string lengths cluster tightly (min 450, median 550, p95 about 660 chars, with a thin tail up to 933), consistent with each row storing a fixed-size list of crowd responses.

Character-length distribution for answers (mean: 559.7).
chars	count
450 – 462	14
462 – 474	71
474 – 486	126
486 – 498	175
498 – 510	228
510 – 522	279
522 – 535	464
535 – 547	598
547 – 559	585
559 – 571	369
571 – 583	330
583 – 595	282
595 – 607	212
607 – 619	133
619 – 631	91
631 – 643	72
643 – 655	54
655 – 667	44
667 – 679	38
679 – 692	24
692 – 704	28
704 – 716	18
716 – 728	18
728 – 740	6
740 – 752	10
752 – 764	8
764 – 776	10
776 – 788	3
788 – 800	7
800 – 812	4
812 – 824	5
824 – 836	2
836 – 848	2
848 – 861	1
861 – 873	2
873 – 885	1
885 – 897	1
897 – 909	0
909 – 921	2
921 – 933	2

Schema

5 columns

Per-column summary.
column	type	nulls	unique	alerts
image	text	0.0%	4,319	near_unique one_word
question	text	0.0%	2,798	duplicates multilingual
answers	text	0.0%	4,295	near_unique
answer_type	categorical	0.0%	4
answerable	numeric	0.0%	2

image

text identifier near_unique one_word

This column holds image filenames following the pattern `vizwiz_val_########.jpg`, with all 4319 values unique and exactly 23 characters long. Every entry is a single token with no duplicates or nulls, confirming it functions as a per-row file pointer rather than analyzable text. Treatment: Treat as a file-path key; join to image assets rather than modelling the string. high · anthropic:claude-opus-4-7

n: 4,319
nulls: 0 (0.0%)
unique: 4,319
len_min: 23
len_max: 23
len_mean: 23
len_median: 23
len_p95: 23
word_mean: 1
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 4,319
readability_flesch_mean: -47.98
emoji_rate: 0
url_rate: 0
one_word_rate: 1
allcaps_rate: 0
boilerplate_rate: 0
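
Before joining to image assets, it may be worth confirming that every filename really follows the reported pattern. This is a minimal sketch, assuming each `#` in the pattern stands for one decimal digit; the case-insensitive match and the sample names are also assumptions.

```python
import re

# Assumption: each '#' in the reported pattern is one decimal digit.
# IGNORECASE is a further assumption, in case the files on disk use
# different casing than the pattern shown in the report.
FILENAME_RE = re.compile(r"vizwiz_val_\d{8}\.jpg", re.IGNORECASE)


def invalid_filenames(names):
    """Return the names that do not match the expected pattern."""
    return [name for name in names if FILENAME_RE.fullmatch(name) is None]


# Hypothetical sample: one valid name, one with 7 digits, one wrong extension.
sample = [
    "vizwiz_val_00000000.jpg",
    "vizwiz_val_0000001.jpg",
    "vizwiz_val_00000002.png",
]
bad = invalid_filenames(sample)
print(bad)
```

An empty result on the real column would support treating it purely as a join key.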

question

text free_text duplicates multilingual

Short English questions averaging 7.26 words and 35 characters (median 5 words, 26 characters). The single most common string is 'What is this?', at 523 occurrences or about 12% of rows. 35.2% of the 4319 rows are duplicates, leaving only 2798 unique strings, and the vocabulary is small (2779 tokens) with very high Flesch readability (101.7). A handful of rows are tagged as non-English (es, la, it, fy, hu, ia, ast), but English accounts for 4308 of the 4319 rows; on questions this short, such tags may be language-detection noise rather than genuine non-English text. Treatment: Tokenize and embed for modelling; consider deduplicating or weighting given the 35% duplicate rate. high · anthropic:claude-opus-4-7

n: 4,319
nulls: 0 (0.0%)
unique: 2,798
len_min: 7
len_max: 264
len_mean: 35.1
len_median: 26
len_p95: 95
word_mean: 7.259
word_median: 5
n_empty: 0
n_duplicates: 1,521
duplicate_rate: 0.3522
vocab_size: 2,779
readability_flesch_mean: 101.7
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0.002547
boilerplate_rate: 0.003473
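
The weighting option in the treatment note can be sketched as inverse-frequency weights, so that each distinct question contributes the same total weight however often it repeats. The `normalize` step (lowercasing and whitespace collapsing) is a hypothetical choice; it will merge more strings than the exact-match duplicate count reported above.

```python
from collections import Counter


def normalize(question):
    """Hypothetical normalisation: lowercase and collapse whitespace."""
    return " ".join(question.lower().split())


def inverse_frequency_weights(questions):
    """Weight each row by 1 / (number of rows sharing its normalised text)."""
    keys = [normalize(q) for q in questions]
    counts = Counter(keys)
    return [1.0 / counts[key] for key in keys]


# Hypothetical sample with one question repeated three times.
sample = [
    "What is this?",
    "what is  this?",
    "What is this?",
    "What color is this shirt?",
]
weights = inverse_frequency_weights(sample)
print(weights)
```

Dropping duplicates outright is simpler, but weighting keeps every image in play, which matters here because duplicated questions still point at different images.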

answers

text feature near_unique

This column holds serialized lists of answer dicts (keys like 'answer' and 'answer_confidence' with values such as 'yes', 'maybe', 'unanswerable'), not free-form text. Rows are long and uniform (len_mean 559.7, len_min 450, len_max 933) and nearly all unique (4295/4319), with a tiny 0.56% duplicate rate. The strongly negative Flesch score (-56.5) confirms this is structured payload rather than natural language. Treatment: Parse the stringified dicts and explode answer/confidence fields into structured columns before modelling. high · anthropic:claude-opus-4-7

n: 4,319
nulls: 0 (0.0%)
unique: 4,295
len_min: 450
len_max: 933
len_mean: 559.7
len_median: 550
len_p95: 660.1
word_mean: 47.66
word_median: 45
n_empty: 0
n_duplicates: 24
duplicate_rate: 0.005557
vocab_size: 11,308
readability_flesch_mean: -56.5
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
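
The parsing step in the treatment note might look like the sketch below. It assumes each cell is a Python-literal list of dicts (single-quoted strings) with the keys 'answer' and 'answer_confidence'; if the file actually stores JSON, `json.loads` would be the tool to use instead. The sample cell is hypothetical and shortened to three answers.

```python
import ast
from collections import Counter


def parse_answers(cell):
    """Parse one stringified list of answer dicts into Python objects."""
    parsed = ast.literal_eval(cell)
    if not isinstance(parsed, list):
        raise ValueError("expected a list of answer dicts")
    return parsed


def summarize(cell):
    """Reduce one answer bundle to a few structured fields."""
    answers = parse_answers(cell)
    votes = Counter(a["answer"] for a in answers)
    top_answer, top_votes = votes.most_common(1)[0]
    n_confident = sum(1 for a in answers if a.get("answer_confidence") == "yes")
    return {
        "n_answers": len(answers),
        "top_answer": top_answer,
        "top_votes": top_votes,
        "n_confident": n_confident,
    }


# Hypothetical cell, shortened to three crowd answers.
cell = (
    "[{'answer': 'unanswerable', 'answer_confidence': 'yes'}, "
    "{'answer': 'soda can', 'answer_confidence': 'maybe'}, "
    "{'answer': 'unanswerable', 'answer_confidence': 'yes'}]"
)
summary = summarize(cell)
print(summary)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for cells read from a file.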

answer_type

categorical label

Categorical label tagging the type of answer expected, with just 4 classes: 'other' dominates at 62.3% (2691/4319), followed by 'unanswerable' at 32.1% (1385), while 'yes/no' (195, 4.5%) and 'number' (48, 1.1%) are rare. There are no nulls, but the class imbalance is severe. Entropy is 1.225 bits against a maximum of 2 bits for four classes, an entropy ratio of 0.61, so the distribution is far from uniform. Treatment: Use as a stratified target; consider class weighting or merging rare classes ('yes/no', 'number') given the imbalance. high · anthropic:claude-opus-4-7

n: 4,319
nulls: 0 (0.0%)
unique: 4
top_value: other
top_rate: 0.6231
cardinality: 4
entropy: 1.225
entropy_ratio: 0.6127
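
The entropy figures can be reproduced from the four class counts, and the same counts give one common form of class weights. This is a sketch: the "balanced" formula, total / (n_classes * class_count), is one conventional choice and not something the report prescribes.

```python
import math

# Class counts copied from the answer_type table.
counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}
total = sum(counts.values())

# Shannon entropy in bits, and its ratio to the maximum log2(n_classes).
entropy_bits = -sum((c / total) * math.log2(c / total) for c in counts.values())
entropy_ratio = entropy_bits / math.log2(len(counts))

# "Balanced" class weights: rarer classes receive proportionally larger weights.
class_weights = {label: total / (len(counts) * c) for label, c in counts.items()}

print(f"entropy = {entropy_bits:.3f} bits, ratio = {entropy_ratio:.4f}")
for label, weight in class_weights.items():
    print(f"{label}: {weight:.2f}")
```

Under this weighting, 'number' is weighted roughly 56 times more heavily than 'other', which is one reason merging the rare classes may be the more stable option.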

answerable

numeric label

Binary 0/1 flag indicating whether an item is answerable, with 4319 rows and no nulls. The classes are imbalanced toward 1: the mean of 0.6793 means about 68% positives (2,934 rows) against a zero rate of 0.3207 (1,385 rows). Skew (-0.768) and excess kurtosis (-1.41) follow directly from that split in a two-point distribution, so they add no information beyond the mean. Treatment: Use as binary target; account for the ~68/32 class imbalance via stratified splits or class weights. high · anthropic:claude-opus-4-7

n: 4,319
nulls: 0 (0.0%)
unique: 2
min: 0
max: 1
mean: 0.6793
median: 1
std: 0.4668
q1: 0
q3: 1
iqr: 1
skew: -0.7684
kurtosis: -1.41
n_outliers: 0
outlier_rate: 0
zero_rate: 0.3207
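
For a 0/1 variable, every moment in the list above is determined by the share of ones. The sketch below recomputes them from the two counts using the standard Bernoulli formulas with population (not sample) variance, so the last digit can differ slightly from the profiler's values.

```python
import math

p = 2934 / 4319   # share of rows with answerable == 1
q = 1 - p         # share of rows with answerable == 0 (the zero rate)

mean = p
std = math.sqrt(p * q)                        # population standard deviation
skew = (q - p) / math.sqrt(p * q)             # equals (1 - 2p) / sqrt(pq)
excess_kurtosis = (1 - 6 * p * q) / (p * q)

print(f"mean={mean:.4f} std={std:.4f} skew={skew:.4f} kurtosis={excess_kurtosis:.2f}")
```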