accessibility atlas vizwiz val annotations

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv

Saturn profiled 4,319 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

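# Install the Saturn CLI (IPython shell escape; run this inside a notebook cell).
!pip install saturn-dissect
import subprocess

# Profile the CSV and pass the findings filename and LLM model shown in this report.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv",
    "--findings", "accessibility-atlas-vizwiz_val_annotations.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if the command exits non-zero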

Summary confidence: high

This dataset is the VizWiz validation annotations file, with 4,319 rows and 5 columns: an image filename, a question, a set of crowd answers, an answer_type label, and a binary answerable flag. The questions are dominated by a small number of generic openers: 'What is this?' alone accounts for 523 rows (12.1%), and 35% of question strings are duplicates, so much of the visual variety sits behind repeated prompts. Answer_type is heavily skewed: 'other' covers 62% of rows and 'unanswerable' another 32% (1,385 rows), while 'yes/no' and 'number' together make up under 6%. Consistent with that, the answerable flag has a mean of 0.68, so roughly 32% of items are flagged unanswerable, a share worth inspecting before modeling. The answers column is a serialized list of dicts (strings averaging ~560 characters) and needs parsing before any text analysis.

citing: row_count · column_count · columns.question.stats.duplicate_rate · columns.question.top_values · columns.answer_type.top_values · columns.answer_type.stats.top_rate · columns.answerable.stats.mean · columns.answerable.stats.zero_rate · columns.answers.stats.len_mean
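The citation keys above read like dotted paths into the findings JSON. A minimal sketch of resolving such a path, assuming the JSON nests dictionaries exactly as the dotted names suggest (the layout below is a hypothetical excerpt, not the confirmed Saturn schema, and `resolve` is an illustrative helper, not part of saturn-dissect):

```python
from functools import reduce
from typing import Any


def resolve(findings: dict, path: str) -> Any:
    """Walk a dotted path such as 'columns.question.stats.duplicate_rate'."""
    return reduce(lambda node, key: node[key], path.split("."), findings)


# Hypothetical excerpt: values are copied from this report, the nesting is assumed.
findings = {
    "row_count": 4319,
    "column_count": 5,
    "columns": {
        "question": {"stats": {"duplicate_rate": 0.3522}},
        "answerable": {"stats": {"mean": 0.6793, "zero_rate": 0.3207}},
    },
}

for path in (
    "row_count",
    "columns.question.stats.duplicate_rate",
    "columns.answerable.stats.zero_rate",
):
    print(path, "=", resolve(findings, path))
```

In practice the dictionary would come from `json.load` on the findings file written by the analyze step, provided its structure matches this assumption.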

Out[4]:

saturn.schema() · 5 columns

column	kind	n	null%	unique	alerts
image	text	4,319	0.0%	4,319	near_unique one_word
question	text	4,319	0.0%	2,798	multilingual duplicates
answers	text	4,319	0.0%	4,295	near_unique
answer_type	categorical	4,319	0.0%	4
answerable	numeric	4,319	0.0%	2

Fig 1.

answer_type · Shows how 'other' dominates and 'unanswerable' is the clear second category.

Top values for answer_type (4 unique shown, of 4 total).
value	count	share
other	2691	62.3%
unanswerable	1385	32.1%
yes/no	195	4.5%
number	48	1.1%

Fig 2.

answerable · Visualizes the ~68/32 split between answerable and unanswerable items.

Histogram bins for answerable (median: 1.0). Only the two end bins are populated; the 38 bins in between are empty.
bin	count
0 – 0.025	1385
0.975 – 1	2934

Fig 3.

question · Highlights the most repeated prompts, led by 'What is this?' at 523 occurrences.

Fig 4.

question · Reveals that most questions are short (median 26 chars) with a long tail up to 264.

Character-length distribution for question (mean: 35.1).
chars	count
7 – 13	759
13 – 20	550
20 – 26	931
26 – 33	609
33 – 39	368
39 – 46	250
46 – 52	143
52 – 58	143
58 – 65	96
65 – 71	84
71 – 78	68
78 – 84	42
84 – 91	35
91 – 97	35
97 – 103	30
103 – 110	23
110 – 116	19
116 – 123	12
123 – 129	23
129 – 136	14
136 – 142	8
142 – 148	14
148 – 155	8
155 – 161	6
161 – 168	10
168 – 174	8
174 – 180	8
180 – 187	3
187 – 193	5
193 – 200	3
200 – 206	3
206 – 213	2
213 – 219	1
219 – 225	2
225 – 232	1
232 – 238	1
238 – 245	0
245 – 251	1
251 – 258	0
258 – 264	1
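The printed bin edges appear consistent with 40 equal-width bins spanning len_min (7) to len_max (264), with edges rounded for display. That is an inference from the table, not documented Saturn behaviour, and `equal_width_edges` is a hypothetical helper. A short sketch of the arithmetic:

```python
def equal_width_edges(lo: float, hi: float, bins: int = 40) -> list:
    """Return bins + 1 evenly spaced edges from lo to hi."""
    step = (hi - lo) / bins
    return [lo + step * k for k in range(bins + 1)]


edges = equal_width_edges(7, 264)
# Round each edge to the nearest integer, as the table labels seem to do.
pairs = [(round(a), round(b)) for a, b in zip(edges, edges[1:])]
print(pairs[:4])
```

Because the bins are narrow (about 6.4 characters each), neighbouring bins can alternate between covering six and seven integer lengths, which may explain some of the small jumps in adjacent counts.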

Fig 5.

answers · Confirms answers are long serialized structures (~560 chars) needing parsing before use.

Character-length distribution for answers (mean: 559.7).
chars	count
450 – 462	14
462 – 474	71
474 – 486	126
486 – 498	175
498 – 510	228
510 – 522	279
522 – 535	464
535 – 547	598
547 – 559	585
559 – 571	369
571 – 583	330
583 – 595	282
595 – 607	212
607 – 619	133
619 – 631	91
631 – 643	72
643 – 655	54
655 – 667	44
667 – 679	38
679 – 692	24
692 – 704	28
704 – 716	18
716 – 728	18
728 – 740	6
740 – 752	10
752 – 764	8
764 – 776	10
776 – 788	3
788 – 800	7
800 – 812	4
812 – 824	5
824 – 836	2
836 – 848	2
848 – 861	1
861 – 873	2
873 – 885	1
885 – 897	1
897 – 909	0
909 – 921	2
921 – 933	2

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
image	text	0.0%
question	text	0.0%
answers	text	0.0%
answer_type	categorical	0.0%
answerable	numeric	0.0%

Fig 7.

Language mix across all text columns (per-string detection, sampled).

Per-language counts (total 4,317 detected strings).
lang	count	share
en	4308	99.8%
la	2	0.0%
es	2	0.0%
hu	1	0.0%
fy	1	0.0%
ast	1	0.0%
it	1	0.0%
ia	1	0.0%

image text identifier

This column holds image filenames following a fixed `vizwiz_val_########.jpg` pattern, with all 4319 values unique and exactly 23 characters long. It functions as a per-row image identifier rather than analysable text — vocab_size equals n, one_word_rate is 1.0, and there are no duplicates or nulls. The negative Flesch score (-47.98) is an artifact of scoring filenames as prose and should be ignored.

Treatment: Use as a key to join image features or load pixel data; do not treat as text.
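As an illustration of the treatment above, here is a minimal sketch that validates the filename pattern and uses it as a join key. The regular expression is an assumption based on the `vizwiz_val_########.jpg` pattern quoted in this report (matched case-insensitively, since the exact casing in the CSV is not shown), and the row and feature values are made up:

```python
import re

# Assumed pattern: fixed prefix, eight digits, .jpg suffix (23 characters in total).
IMAGE_ID = re.compile(r"vizwiz_val_\d{8}\.jpg", re.IGNORECASE)


def is_valid_image_id(name: str) -> bool:
    """True when the whole string matches the assumed filename pattern."""
    return IMAGE_ID.fullmatch(name) is not None


def join_features(rows: list, features: dict) -> list:
    """Attach per-image features to annotation rows; unmatched rows get None."""
    return [{**row, "features": features.get(row["image"])} for row in rows]


rows = [{"image": "vizwiz_val_00000000.jpg", "question": "What is this?"}]  # made-up row
features = {"vizwiz_val_00000000.jpg": [0.1, 0.2]}  # made-up feature vector
joined = join_features(rows, features)
print(joined[0]["features"])
```

Since every value is unique, a plain dictionary lookup is enough; no many-to-one handling is needed on the annotation side.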

anthropic:claude-opus-4-7 · confidence high

Out[13]:

saturn.columns["image"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4,319
len_min	23
len_max	23
len_mean	23
len_median	23
len_p95	23
word_mean	1
word_median	1
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	4,319
readability_flesch_mean	-47.98
emoji_rate	0
url_rate	0
one_word_rate	1
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
alert: one_word	100.0% rows are a single word

Fig 8.

Character-length distribution for image.

Character-length distribution for image (mean: 23.0). Every value is exactly 23 characters long, so all 4,319 rows fall in a single bin and the other 39 bins are empty.
chars	count
23	4319

question text free_text

Short natural-language questions, predominantly English (4,308 of the 4,317 strings with a detected language) with a handful of other-language detections, averaging 7.26 words and 35.1 characters. The column is heavily repetitive: 1,521 of 4,319 rows (35.2%) are duplicates, and the single string "What is this?" alone accounts for 523 rows. Vocabulary is small (2,779 unique tokens) and dominated by question-opener words like "what", "is", "this", consistent with VQA-style prompts directed at images or objects.

Treatment: Tokenize and embed as a text feature; expect heavy duplication so consider pairing with the associated image/context rather than treating as a standalone signal.
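The duplicate figures reported for this column are consistent with n_duplicates = n - unique and duplicate_rate = n_duplicates / n. A small sketch of that arithmetic on a toy list (the four example questions are made up, and `duplicate_stats` is an illustrative helper rather than a Saturn function):

```python
from collections import Counter


def duplicate_stats(values: list) -> dict:
    """Count rows that repeat an already-seen string and find the most common one."""
    counts = Counter(values)
    n = len(values)
    n_duplicates = n - len(counts)
    return {
        "n": n,
        "unique": len(counts),
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n,
        "top": counts.most_common(1)[0],
    }


questions = ["What is this?", "What is this?", "What color is this?", "What is this?"]
stats = duplicate_stats(questions)
print(stats)

# The same formula applied to the totals reported for the question column.
print(round((4319 - 2798) / 4319, 4))  # 0.3522
```

Knowing which rows share a question string makes it easier to check that a model is using the image rather than memorising an answer for a repeated prompt.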

anthropic:claude-opus-4-7 · confidence high

Out[16]:

saturn.columns["question"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	2,798
len_min	7
len_max	264
len_mean	35.1
len_median	26
len_p95	95
word_mean	7.259
word_median	5
n_empty	0
n_duplicates	1,521
duplicate_rate	0.3522
vocab_size	2,779
readability_flesch_mean	101.7
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0.002547
boilerplate_rate	0.003473
alert: multilingual	9 languages detected in sample
alert: duplicates	35.2% duplicate strings

Fig 9.

Character-length distribution for question.

Character-length distribution for question (mean: 35.1).
chars	count
7 – 13	759
13 – 20	550
20 – 26	931
26 – 33	609
33 – 39	368
39 – 46	250
46 – 52	143
52 – 58	143
58 – 65	96
65 – 71	84
71 – 78	68
78 – 84	42
84 – 91	35
91 – 97	35
97 – 103	30
103 – 110	23
110 – 116	19
116 – 123	12
123 – 129	23
129 – 136	14
136 – 142	8
142 – 148	14
148 – 155	8
155 – 161	6
161 – 168	10
168 – 174	8
174 – 180	8
180 – 187	3
187 – 193	5
193 – 200	3
200 – 206	3
206 – 213	2
213 – 219	1
219 – 225	2
225 – 232	1
232 – 238	1
238 – 245	0
245 – 251	1
251 – 258	0
258 – 264	1

answers text free_text

This column holds serialized lists of answer dictionaries (each containing 'answer' and 'answer_confidence' keys with values like 'yes', 'maybe', 'unanswerable', 'unsuitable'), not raw natural-language text. Lengths are tightly bounded (min 450, max 933, median 550 chars) and 4,295 of 4,319 values are unique; the 24 duplicates leave the column 99.4% unique, which triggers the near_unique alert. The strongly negative Flesch score (-56.5) is consistent with structured, JSON-like content rather than prose.

Treatment: Parse the literal dict/list structure and explode answer and answer_confidence into separate fields before modelling.
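A minimal sketch of the parsing step described above. The report does not show a raw cell, so the sample string is an assumption about the serialization (a Python-literal list of dicts with the two keys named above); the parser tries JSON first and falls back to `ast.literal_eval`, which only evaluates literals and does not execute code:

```python
import ast
import json


def parse_answers(cell: str) -> list:
    """Parse one serialized answers cell into a list of dicts."""
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        # Single-quoted Python literals are not valid JSON, so fall back.
        return ast.literal_eval(cell)


def explode(image: str, cell: str) -> list:
    """Return one flat record per crowd answer for a given image."""
    return [
        {
            "image": image,
            "answer": item["answer"],
            "answer_confidence": item["answer_confidence"],
        }
        for item in parse_answers(cell)
    ]


# Made-up sample in the assumed format.
sample = (
    "[{'answer': 'unanswerable', 'answer_confidence': 'yes'}, "
    "{'answer': 'soda can', 'answer_confidence': 'maybe'}]"
)
records = explode("vizwiz_val_00000000.jpg", sample)
print(records)
```

If the real cells turn out to be valid JSON, the first branch handles them and the fallback is never reached.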

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["answers"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4,295
len_min	450
len_max	933
len_mean	559.7
len_median	550
len_p95	660.1
word_mean	47.66
word_median	45
n_empty	0
n_duplicates	24
duplicate_rate	0.005557
vocab_size	11,308
readability_flesch_mean	-56.5
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	99.4% of rows are unique strings

Fig 10.

Character-length distribution for answers.

Character-length distribution for answers (mean: 559.7).
chars	count
450 – 462	14
462 – 474	71
474 – 486	126
486 – 498	175
498 – 510	228
510 – 522	279
522 – 535	464
535 – 547	598
547 – 559	585
559 – 571	369
571 – 583	330
583 – 595	282
595 – 607	212
607 – 619	133
619 – 631	91
631 – 643	72
643 – 655	54
655 – 667	44
667 – 679	38
679 – 692	24
692 – 704	28
704 – 716	18
716 – 728	18
728 – 740	6
740 – 752	10
752 – 764	8
764 – 776	10
776 – 788	3
788 – 800	7
800 – 812	4
812 – 824	5
824 – 836	2
836 – 848	2
848 – 861	1
861 – 873	2
873 – 885	1
885 – 897	1
897 – 909	0
909 – 921	2
921 – 933	2

answer_type categorical label

Categorical label tagging the type of answer for each row, with only 4 distinct values and no nulls across 4,319 rows. The distribution is heavily skewed: 'other' covers 62.3% (2,691) and 'unanswerable' another 32.1% (1,385), while 'yes/no' (195) and 'number' (48) are rare. An entropy ratio of 0.61 (1.225 bits out of a possible 2 bits for four classes) quantifies the imbalance, which will matter for any stratification or class-weighted modelling.

Treatment: One-hot encode and apply class weighting or stratified sampling to handle the imbalance.
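One common way to derive class weights is the inverse-frequency heuristic n / (k * count), the same formula scikit-learn uses for its 'balanced' class weighting. The sketch below applies it to the counts reported for this column; whether this weighting suits a given model is a judgment call, not something the report prescribes:

```python
# Counts copied from the answer_type top-values table in this report.
counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}

n = sum(counts.values())
k = len(counts)

# Inverse-frequency weights: rare classes receive proportionally larger weights.
weights = {label: n / (k * count) for label, count in counts.items()}

for label, weight in weights.items():
    print(f"{label:13s} {weight:.3f}")
```

With only 48 'number' rows, that class ends up weighted far above 'other', so it may be worth capping weights or checking per-class metrics rather than relying on weighting alone.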

anthropic:claude-opus-4-7 · confidence high

Out[22]:

saturn.columns["answer_type"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4
top_value	other
top_rate	0.6231
cardinality	4
entropy	1.225
entropy_ratio	0.6127
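The entropy figures above can be reproduced from the category counts, assuming entropy is Shannon entropy in bits and entropy_ratio divides it by log2 of the cardinality (an assumption that matches the reported values to the printed precision):

```python
import math

# Counts copied from the answer_type top-values table in this report.
counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}

n = sum(counts.values())
probs = [count / n for count in counts.values()]

entropy_bits = -sum(p * math.log2(p) for p in probs)
entropy_ratio = entropy_bits / math.log2(len(counts))  # 1.0 would mean a uniform split
top_rate = max(counts.values()) / n

print(round(entropy_bits, 3), round(entropy_ratio, 4), round(top_rate, 4))
```

A ratio near 1 would indicate four equally likely classes; 0.61 reflects that two classes carry almost all of the mass.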

Fig 11.

Top values for answer_type.

Top values for answer_type (4 unique shown, of 4 total).
value	count	share
other	2691	62.3%
unanswerable	1385	32.1%
yes/no	195	4.5%
number	48	1.1%

answerable numeric label

Binary 0/1 flag, almost certainly indicating whether a question is answerable: the count of zeros (1,385) equals the number of rows with answer_type 'unanswerable'. About 67.9% of rows are 1 and 32.1% are 0, with no nulls across 4,319 rows. The class imbalance is moderate but worth noting for any classifier trained on this label.

Treatment: Use directly as a binary target; account for the ~68/32 class imbalance during training.

anthropic:claude-opus-4-7 · confidence high

Out[25]:

saturn.columns["answerable"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	2
min	0
max	1
mean	0.6793
median	1
std	0.4668
q1	0
q3	1
iqr	1
skew	-0.7684
kurtosis	-1.41
n_outliers	0
outlier_rate	0
zero_rate	0.3207
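For a 0/1 column, every statistic in this table follows from the two counts alone. The sketch below recomputes them using the standard Bernoulli moment formulas; the choice of sample standard deviation (ddof = 1) and population skew and excess kurtosis is an assumption that happens to match the printed values, not a statement about how Saturn computes them:

```python
import math

ones, zeros = 2934, 1385  # counts from the answerable histogram in this report
n = ones + zeros

p = ones / n              # mean of a 0/1 variable
q = 1 - p                 # zero_rate
var_pop = p * q           # population variance of a Bernoulli variable

std_sample = math.sqrt(var_pop * n / (n - 1))  # ddof = 1
skew = (q - p) / math.sqrt(var_pop)
excess_kurtosis = (1 - 6 * var_pop) / var_pop

print(round(p, 4), round(q, 4), round(std_sample, 4),
      round(skew, 4), round(excess_kurtosis, 2))
```

The negative skew and negative excess kurtosis are therefore properties of any two-valued column with a majority of ones, not signs of anything unusual in the data.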

Fig 12.

Distribution of answerable. Vertical dash marks the median.

Histogram bins for answerable (median: 1.0). Only the two end bins are populated; the 38 bins in between are empty.
bin	count
0 – 0.025	1385
0.975 – 1	2934

How to cite

BibTeX

@misc{saturn-accessibility-atlas-vizwiz-val-annotations-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: accessibility atlas vizwiz val annotations},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/accessibility-atlas-vizwiz_val_annotations}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: accessibility atlas vizwiz val annotations. Source: /home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/accessibility-atlas-vizwiz_val_annotations