vizwiz

saturn notebook · generated 2026-04-22

Overview

Source: /home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv

Saturn profiled 4,319 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

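# Install the profiler (IPython shell escape, so this runs in a notebook cell).
!pip install saturn-dissect
import subprocess

# Run saturn on the CSV with the findings file and LLM model named below.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv",
    "--findings", "vizwiz.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if saturn exits non-zero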

Summary confidence: high

This is the VizWiz validation annotation set: 4,319 rows linking an image filename to a question, a bundle of crowd answers, an answer_type label, and a binary 'answerable' flag. The question column is where the dataset's character lives: it has only 2,798 unique values (a 35% duplicate rate), and the most frequent are short generic prompts such as 'What is this?' (523 occurrences). Worth a closer look: the answer_type distribution is heavily skewed toward 'other' (62%), with 'unanswerable' second at 32%. The numeric 'answerable' flag agrees, with a zero rate of about 32%, so roughly a third of items are unanswerable and must be accounted for in any downstream evaluation.

citing: row_count · column_count · columns.question.n_unique · columns.question.stats.duplicate_rate · columns.question.top_values · columns.answer_type.top_values · columns.answer_type.stats.top_rate · columns.answerable.stats.mean · columns.answerable.stats.zero_rate · columns.question.stats.word_mean
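
The `citing:` entries read like dotted key paths into the findings file named in the cell above. A minimal sketch of resolving such paths, assuming the file is nested JSON keyed the same way (the schema is not shown in this report, so the `resolve` helper and the `sample` dict are hypothetical; the values are copied from this report):

```python
from functools import reduce


def resolve(findings: dict, dotted: str):
    """Walk a nested dict along a dotted key path such as 'columns.question.n_unique'."""
    return reduce(lambda node, key: node[key], dotted.split("."), findings)


# Hypothetical excerpt shaped like the cited paths; not the real vizwiz.json.
sample = {
    "row_count": 4319,
    "column_count": 5,
    "columns": {
        "question": {"n_unique": 2798, "stats": {"duplicate_rate": 0.3522}},
        "answerable": {"stats": {"mean": 0.6793, "zero_rate": 0.3207}},
    },
}

for path in ("row_count", "columns.question.n_unique", "columns.answerable.stats.zero_rate"):
    print(path, "=", resolve(sample, path))
```

If the real file follows this layout, `json.load` on `vizwiz.json` would supply the `findings` argument; a `KeyError` from `resolve` would signal that the layout differs.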

Out[4]:

saturn.schema() · 5 columns

column	kind	n	null%	unique	alerts
image	text	4,319	0.0%	4,319	near_unique one_word
question	text	4,319	0.0%	2,798	duplicates multilingual
answers	text	4,319	0.0%	4,295	near_unique
answer_type	categorical	4,319	0.0%	4
answerable	numeric	4,319	0.0%	2

Fig 1.

answer_type · Shows how 'other' dominates over unanswerable, yes/no, and number — useful for sizing class imbalance.

Top values for answer_type (4 unique shown, of 4 total).
value	count	share
other	2691	62.3%
unanswerable	1385	32.1%
yes/no	195	4.5%
number	48	1.1%

Fig 2.

answerable · Roughly two-thirds answerable vs. one-third unanswerable; check this before any accuracy calculation.

Histogram bins for answerable (median: 1.0).
bin	count
0 – 0.025	1385
0.025 – 0.975 (38 empty bins)	0
0.975 – 1	2934

Fig 3.

question · Top question strings reveal heavy repetition of generic prompts like 'What is this?' — confirms the 35% duplicate rate.

Fig 4.

question · Question length distribution is short and right-skewed (median 26 chars, max 264) — a few long outliers worth inspecting.

Character-length distribution for question (mean: 35.10141236397314).
chars	count
7 – 13	759
13 – 20	550
20 – 26	931
26 – 33	609
33 – 39	368
39 – 46	250
46 – 52	143
52 – 58	143
58 – 65	96
65 – 71	84
71 – 78	68
78 – 84	42
84 – 91	35
91 – 97	35
97 – 103	30
103 – 110	23
110 – 116	19
116 – 123	12
123 – 129	23
129 – 136	14
136 – 142	8
142 – 148	14
148 – 155	8
155 – 161	6
161 – 168	10
168 – 174	8
174 – 180	8
180 – 187	3
187 – 193	5
193 – 200	3
200 – 206	3
206 – 213	2
213 – 219	1
219 – 225	2
225 – 232	1
232 – 238	1
238 – 245	0
245 – 251	1
251 – 258	0
258 – 264	1

Fig 5.

answers · Answer-bundle string lengths cluster tightly (95% of rows fall between 450 and 660 chars) because each row stores a fixed-size list of crowd responses.

Character-length distribution for answers (mean: 559.675387821255).
chars	count
450 – 462	14
462 – 474	71
474 – 486	126
486 – 498	175
498 – 510	228
510 – 522	279
522 – 535	464
535 – 547	598
547 – 559	585
559 – 571	369
571 – 583	330
583 – 595	282
595 – 607	212
607 – 619	133
619 – 631	91
631 – 643	72
643 – 655	54
655 – 667	44
667 – 679	38
679 – 692	24
692 – 704	28
704 – 716	18
716 – 728	18
728 – 740	6
740 – 752	10
752 – 764	8
764 – 776	10
776 – 788	3
788 – 800	7
800 – 812	4
812 – 824	5
824 – 836	2
836 – 848	2
848 – 861	1
861 – 873	2
873 – 885	1
885 – 897	1
897 – 909	0
909 – 921	2
921 – 933	2

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
image	text	0.0%
question	text	0.0%
answers	text	0.0%
answer_type	categorical	0.0%
answerable	numeric	0.0%

Fig 7.

Language mix across all text columns (per-string detection, sampled).

Per-language counts (total 4,317 detected strings).
lang	count	share
en	4308	99.8%
la	2	0.0%
es	2	0.0%
hu	1	0.0%
fy	1	0.0%
ast	1	0.0%
it	1	0.0%
ia	1	0.0%

image text identifier

This column holds image filenames following the pattern `vizwiz_val_########.jpg`, with all 4319 values unique and exactly 23 characters long. Every entry is a single token with no duplicates or nulls, confirming it functions as a per-row file pointer rather than analyzable text.

Treatment: Treat as a file-path key; join to image assets rather than modelling the string.

anthropic:claude-opus-4-7 · confidence high
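
Treating the column as a file-path key can be sketched as a validate-then-join step. The pattern comes from the description above; the directory `images/val`, the helper `image_path`, and the example filename are hypothetical, since the report does not say where the JPEGs live:

```python
import re
from pathlib import Path

# Pattern from the report: 'vizwiz_val_' + 8 digits + '.jpg' (23 characters in total).
FILENAME_RE = re.compile(r"vizwiz_val_\d{8}\.jpg")


def image_path(image_dir: Path, filename: str) -> Path:
    """Check a filename against the reported pattern, then join it to a directory."""
    if not FILENAME_RE.fullmatch(filename):
        raise ValueError(f"unexpected image key: {filename!r}")
    return image_dir / filename


# Illustrative values only: both the directory and the filename are made up.
example = image_path(Path("images/val"), "vizwiz_val_00000000.jpg")
print(example.name, len(example.name))
```

Rejecting keys that do not match the pattern surfaces malformed rows before the join rather than as missing-file errors later.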

Out[13]:

saturn.columns["image"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4,319
len_min	23
len_max	23
len_mean	23
len_median	23
len_p95	23
word_mean	1
word_median	1
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	4,319
readability_flesch_mean	-47.98
emoji_rate	0
url_rate	0
one_word_rate	1
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
alert: one_word	100.0% rows are a single word
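
The negative readability score is what the standard Flesch Reading Ease formula gives for a one-word "sentence" with several syllables. A small sketch, assuming Saturn uses the standard formula and counts each filename as 1 sentence, 1 word, and 3 syllables (both are assumptions; the report states neither, but the numbers are consistent with them):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher is easier, negative is possible."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Assumed tokenization of a filename: 1 sentence, 1 word, 3 syllables.
score = flesch_reading_ease(words=1, sentences=1, syllables=3)
print(round(score, 2))
```

Under those assumptions the result rounds to the reported readability_flesch_mean of -47.98, which supports reading the score as an artifact of tokenization rather than a property of natural-language text.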

Fig 8.

Character-length distribution for image.

Character-length distribution for image (mean: 23.0).
chars	count
23	4319

question text free_text

Short English questions, averaging 7.26 words and 35 characters (medians: 5 words, 26 characters). The single most common string is 'What is this?' (523 occurrences, about 12% of rows). 35.2% of the 4319 rows are duplicates, leaving only 2798 unique strings; the vocabulary is small (2779 tokens) and Flesch readability is very high (101.7). A handful of rows are tagged as non-English (es, la, it, fy, hu, ia, ast), but English dominates at 4308 of 4317 detected strings.

Treatment: Tokenize and embed for modelling; consider deduplicating or weighting given the 35% duplicate rate.

anthropic:claude-opus-4-7 · confidence high
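
The deduplicate-or-weight treatment can be sketched as follows. The duplicate count follows the report's arithmetic (rows minus unique strings); the lowercase-and-collapse-whitespace normalization and all helper names are assumptions added for illustration:

```python
from collections import Counter


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(question.lower().split())


def duplicate_stats(questions: list[str]) -> tuple[int, float]:
    """Return (n_duplicates, duplicate_rate), counting exact-string repeats."""
    n_dup = len(questions) - len(set(questions))
    return n_dup, n_dup / len(questions)


def inverse_frequency_weights(questions: list[str]) -> list[float]:
    """Weight each row by 1 / count of its normalized question."""
    counts = Counter(normalize(q) for q in questions)
    return [1.0 / counts[normalize(q)] for q in questions]


# Toy rows, not taken from the dataset.
toy = ["What is this?", "what is this?", "What is this?", "What color is this shirt?"]
print(duplicate_stats(toy))
print(inverse_frequency_weights(toy))

# Reported figures: 4,319 rows and 2,798 unique strings give 1,521 duplicates.
reported_rate = (4319 - 2798) / 4319
print(round(reported_rate, 4))
```

With these weights every distinct question contributes a total weight of 1, so 'What is this?' no longer counts 523 times in an aggregate metric.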

Out[16]:

saturn.columns["question"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	2,798
len_min	7
len_max	264
len_mean	35.1
len_median	26
len_p95	95
word_mean	7.259
word_median	5
n_empty	0
n_duplicates	1,521
duplicate_rate	0.3522
vocab_size	2,779
readability_flesch_mean	101.7
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0.002547
boilerplate_rate	0.003473
alert: duplicates	35.2% duplicate strings
alert: multilingual	9 languages detected in sample

Fig 9.

Character-length distribution for question.

answers text feature

This column holds serialized lists of answer dicts (keys like 'answer' and 'answer_confidence' with values such as 'yes', 'maybe', 'unanswerable'), not free-form text. Rows are long and uniform (len_mean 559.7, len_min 450, len_max 933) and nearly all unique (4295/4319), with a tiny 0.56% duplicate rate. The strongly negative Flesch score (-56.5) confirms this is structured payload rather than natural language.

Treatment: Parse the stringified dicts and explode answer/confidence fields into structured columns before modelling.

anthropic:claude-opus-4-7 · confidence high
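
Parsing the serialized answer lists might look like the sketch below. The key names 'answer' and 'answer_confidence' come from the description above; the serialization format (JSON or a Python-literal repr) is not stated in this report, so the parser tries both, and the sample cell is invented:

```python
import ast
import json
from collections import Counter


def parse_answers(cell: str) -> list[dict]:
    """Parse one serialized answers cell: JSON first, then a Python-literal repr."""
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        return ast.literal_eval(cell)


def majority_answer(answers: list[dict]) -> tuple[str, float]:
    """Most common answer string and the share of annotators who gave it."""
    counts = Counter(a["answer"].strip().lower() for a in answers)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(answers)


# Hypothetical cell: key names follow the report, the values are made up.
cell = (
    "[{'answer': 'unanswerable', 'answer_confidence': 'yes'}, "
    "{'answer': 'unanswerable', 'answer_confidence': 'maybe'}, "
    "{'answer': 'soda can', 'answer_confidence': 'yes'}]"
)
parsed = parse_answers(cell)
print(majority_answer(parsed))
```

`ast.literal_eval` only evaluates literals, so it is a safer fallback than `eval` for untrusted CSV content.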

Out[19]:

saturn.columns["answers"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4,295
len_min	450
len_max	933
len_mean	559.7
len_median	550
len_p95	660.1
word_mean	47.66
word_median	45
n_empty	0
n_duplicates	24
duplicate_rate	0.005557
vocab_size	11,308
readability_flesch_mean	-56.5
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	99.4% of rows are unique strings

Fig 10.

Character-length distribution for answers.

answer_type categorical label

Categorical label tagging the type of answer expected, with just 4 classes: 'other' dominates at 62.3% (2691/4319), followed by 'unanswerable' at 32.1% (1385), while 'yes/no' (195, 4.5%) and 'number' (48, 1.1%) are rare. No nulls, but the class imbalance is severe: 'number' represents barely 1% of rows. The entropy ratio of 0.61 (1.225 bits out of a possible 2 for four classes) confirms the distribution is far from uniform.

Treatment: Use as a stratified target; consider class weighting or merging rare classes ('yes/no', 'number') given the imbalance.

anthropic:claude-opus-4-7 · confidence high
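
One way to act on the class-weighting suggestion is the common "balanced" heuristic, weight = n / (k × count), sketched here with the counts reported for answer_type. The choice of heuristic and the merged class name 'rare' are assumptions, not something the report prescribes:

```python
counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}


def balanced_weights(class_counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights scaled so the weighted row count equals n."""
    n = sum(class_counts.values())
    k = len(class_counts)
    return {label: n / (k * c) for label, c in class_counts.items()}


weights = balanced_weights(counts)
for label, w in weights.items():
    print(f"{label:13s} {w:.3f}")

# Alternative from the treatment note: merge the two rare classes before weighting.
merged = {
    "other": counts["other"],
    "unanswerable": counts["unanswerable"],
    "rare": counts["yes/no"] + counts["number"],  # 'rare' is a hypothetical label
}
merged_weights = balanced_weights(merged)
```

Merging caps the largest weight at a much smaller value than weighting 'number' on its own, which may matter if a few rare rows would otherwise dominate the loss.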

Out[22]:

saturn.columns["answer_type"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4
top_value	other
top_rate	0.6231
cardinality	4
entropy	1.225
entropy_ratio	0.6127
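
The entropy figures can be recomputed from the four class counts. This sketch assumes entropy is Shannon entropy in bits and that entropy_ratio divides by log2 of the number of classes; the report does not state either, but the results agree with the table to the precision shown:

```python
import math

counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}
n = sum(counts.values())

# Shannon entropy in bits; dividing by log2(k) gives a ratio in [0, 1], 1 meaning uniform.
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
entropy_ratio = entropy / math.log2(len(counts))

print(round(entropy, 3), round(entropy_ratio, 4))
```

A uniform four-class label would score 2 bits and a ratio of 1.0, so a ratio near 0.61 quantifies how much of that headroom the skew removes.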

Fig 11.

Top values for answer_type.

answerable numeric label

Binary 0/1 flag indicating whether an item is answerable, with 4319 rows and no nulls. The classes are imbalanced toward 1: a mean of 0.6793 means about 68% positives (2934 rows) against a zero rate of 0.3207 (1385 rows). The skew (-0.768) and kurtosis (-1.41) are what a two-point distribution with that split produces.

Treatment: Use as binary target; account for the ~68/32 class imbalance via stratified splits or class weights.

anthropic:claude-opus-4-7 · confidence high
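
A stratified split can be sketched with the standard library alone. The 80/20 ratio, the seed, and the helper name are assumptions; the labels below are synthetic, built only to match the reported class counts:

```python
import random
from collections import defaultdict


def stratified_split(labels: list[int], test_fraction: float = 0.2, seed: int = 0):
    """Return (train_idx, test_idx), preserving each label's share up to rounding."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_label.values():
        rng.shuffle(idx)  # shuffle within each class, then cut
        cut = round(len(idx) * test_fraction)
        test_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels with the reported counts: 2,934 answerable (1), 1,385 unanswerable (0).
labels = [1] * 2934 + [0] * 1385
train_idx, test_idx = stratified_split(labels)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(test_rate, 4))
```

Because the cut is taken per class, the answerable share in the held-out part stays close to the overall 0.6793 regardless of the seed.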

Out[25]:

saturn.columns["answerable"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	2
min	0
max	1
mean	0.6793
median	1
std	0.4668
q1	0
q3	1
iqr	1
skew	-0.7684
kurtosis	-1.41
n_outliers	0
outlier_rate	0
zero_rate	0.3207
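
For a 0/1 column every statistic in this table follows from the two counts. The sketch below uses the closed-form Bernoulli moments; treating the reported std as a sample standard deviation (n - 1 in the denominator) and kurtosis as excess kurtosis are assumptions that happen to match the table:

```python
import math

n_zero, n_one = 1385, 2934  # counts from the answerable histogram
n = n_zero + n_one
p = n_one / n               # share of answerable rows (the mean)
q = 1 - p                   # share of unanswerable rows (the zero rate)

sample_std = math.sqrt(p * q * n / (n - 1))   # assumed ddof=1
skew = (q - p) / math.sqrt(p * q)             # Bernoulli skewness
excess_kurtosis = (1 - 6 * p * q) / (p * q)   # Bernoulli excess kurtosis

# Always predicting the majority class scores max(p, q); compare accuracy to this.
majority_baseline = max(p, q)

print(round(p, 4), round(q, 4), round(sample_std, 4))
print(round(skew, 4), round(excess_kurtosis, 2), round(majority_baseline, 4))
```

The majority baseline is the practical consequence of the imbalance: an answerability classifier has to beat roughly 68% accuracy to show it has learned anything.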

Fig 12.

Distribution of answerable. Vertical dash marks the median.

How to cite

BibTeX

@misc{saturn-vizwiz-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: vizwiz},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/vizwiz}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: vizwiz. Source: /home/coolhand/datasets/accessibility-atlas/vizwiz_val_annotations.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/vizwiz