vizwiz val annotations

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/vizwiz_val_annotations.json

Saturn profiled 4,319 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/vizwiz_val_annotations.json",
    "--findings", "vizwiz_val_annotations.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset contains 4,319 rows from the VizWiz validation annotations, structured around five columns: the image filename, the question asked about each image, the answers, an answer_type label, and an answerable flag. The question column is the most interesting: about 35% of its rows are duplicates, and 'What is this?' alone appears 523 times, which points to a heavy concentration of generic identification queries. Answer_type is dominated by 'other' (62%) and 'unanswerable' (32%), and the answerable flag agrees, with roughly 32% of items (1,385 rows in both columns) flagged as not answerable, a key signal for any downstream modeling. The image column is unique per row and not worth deeper analysis, while the answers column was skipped by the profiler.

citing: row_count · columns.question.stats.duplicate_rate · columns.question.top_values · columns.answer_type.top_values · columns.answer_type.stats.top_rate · columns.answerable.stats.mean · columns.answerable.stats.zero_rate · columns.question.language_counts
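
The summary reads the matching 32% shares in answer_type and answerable as the same signal. Equal aggregate counts do not by themselves show that the two columns agree row by row, so a direct check is worth running on the raw file. A minimal sketch, assuming each row is a dict with `answer_type` and `answerable` keys; the helper name `count_mismatches` and the toy rows are invented for illustration and are not part of Saturn.

```python
def count_mismatches(rows):
    """Count rows where answer_type and answerable disagree about answerability."""
    mismatches = 0
    for row in rows:
        typed_unanswerable = row["answer_type"] == "unanswerable"
        flagged_unanswerable = row["answerable"] == 0
        if typed_unanswerable != flagged_unanswerable:
            mismatches += 1
    return mismatches


# Invented rows for illustration; real rows would come from the annotations file.
rows = [
    {"answer_type": "other", "answerable": 1},
    {"answer_type": "unanswerable", "answerable": 0},
    {"answer_type": "yes/no", "answerable": 0},  # deliberately inconsistent
]
print(count_mismatches(rows))
```

A result of zero on the real data would support treating the two columns as redundant encodings of the same label.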

Out[4]:

saturn.schema() · 5 columns

column	kind	n	null%	unique	alerts
image	text	4,319	0.0%	4,319	near_unique one_word
question	text	4,319	0.0%	2,798	multilingual duplicates
answers	unknown	4,319	0.0%	—	skipped
answer_type	categorical	4,319	0.0%	4
answerable	numeric	4,319	0.0%	2

Fig 1.

answer_type · Shows the strong skew toward 'other' and 'unanswerable' categories that together cover ~94% of rows.

Top values for answer_type (4 unique shown, of 4 total).
value	count	share
other	2691	62.3%
unanswerable	1385	32.1%
yes/no	195	4.5%
number	48	1.1%

Fig 2.

answerable · Highlights that about 32% of questions are flagged as not answerable.

Histogram bins for answerable (median: 1.0). The 38 bins between 0.025 and 0.975 are empty and omitted.
bin	count
0 – 0.025	1385
0.975 – 1	2934

Fig 3.

question · Top question values reveal how often generic prompts like 'What is this?' dominate the dataset.

Fig 4.

question · Question length distribution shows most are short (median 26 chars) with a long tail up to 264 chars.

Character-length distribution for question (mean: 35.1).
chars	count
7 – 13	759
13 – 20	550
20 – 26	931
26 – 33	609
33 – 39	368
39 – 46	250
46 – 52	143
52 – 58	143
58 – 65	96
65 – 71	84
71 – 78	68
78 – 84	42
84 – 91	35
91 – 97	35
97 – 103	30
103 – 110	23
110 – 116	19
116 – 123	12
123 – 129	23
129 – 136	14
136 – 142	8
142 – 148	14
148 – 155	8
155 – 161	6
161 – 168	10
168 – 174	8
174 – 180	8
180 – 187	3
187 – 193	5
193 – 200	3
200 – 206	3
206 – 213	2
213 – 219	1
219 – 225	2
225 – 232	1
232 – 238	1
238 – 245	0
245 – 251	1
251 – 258	0
258 – 264	1

Fig 5.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
image	text	0.0%
question	text	0.0%
answers	unknown	0.0%
answer_type	categorical	0.0%
answerable	numeric	0.0%

Fig 6.

Language mix across all text columns (per-string detection, sampled).

Per-language counts (total 4,317 detected strings).
lang	count	share
en	4308	99.8%
la	2	0.0%
es	2	0.0%
hu	1	0.0%
fy	1	0.0%
ast	1	0.0%
it	1	0.0%
ia	1	0.0%

image text identifier

This column holds image filenames following the pattern `vizwiz_val_########.jpg`, with all 4319 values being unique single tokens of exactly 23 characters. There are no nulls, duplicates, or vocabulary variation — every row maps one-to-one to a distinct image in what appears to be the VizWiz validation split. The negative Flesch score is an artifact of scoring filenames as prose and can be ignored.

Treatment: Use as a foreign key to load the corresponding image file; do not feed as text to a model.

anthropic:claude-opus-4-7 · confidence high

Out[12]:

saturn.columns["image"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4,319
len_min	23
len_max	23
len_mean	23
len_median	23
len_p95	23
word_mean	1
word_median	1
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	4,319
readability_flesch_mean	-47.98
emoji_rate	0
url_rate	0
one_word_rate	1
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
alert: one_word	100.0% rows are a single word
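
The treatment for this column amounts to validating each filename and joining it to an image directory. A minimal sketch, assuming the filenames follow the `vizwiz_val_########.jpg` pattern described above. The helper name `resolve_image_path` and the `images/val` directory are hypothetical, and matching is case-insensitive because the report does not state the exact casing of the prefix.

```python
import re
from pathlib import Path

# Pattern from the column description: prefix, eight digits, .jpg (23 characters).
FILENAME_RE = re.compile(r"vizwiz_val_\d{8}\.jpg", re.IGNORECASE)


def resolve_image_path(filename: str, image_root: str) -> Path:
    """Return the path for a filename, rejecting values that break the pattern."""
    if not FILENAME_RE.fullmatch(filename):
        raise ValueError(f"unexpected image filename: {filename!r}")
    return Path(image_root) / filename


# Hypothetical usage: "images/val" is a placeholder, not a path from the report.
path = resolve_image_path("vizwiz_val_00000042.jpg", "images/val")
print(path)
```

Rejecting anything that does not match the pattern also keeps stray values such as relative paths from being joined onto the image directory.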

Fig 7.

Character-length distribution for image.

Character-length distribution for image (mean: 23.0). All 4,319 values are exactly 23 characters long, so the histogram collapses to a single occupied bin.
chars	count
23	4319

question text free_text

Short natural-language questions, almost all English (4,308 of 4,317 language-detected strings) and overwhelmingly identification prompts: "What is this?" alone appears 523 times, and the top 10 values are all generic "what is/color/says" queries. Heavy duplication (35.2%, 1,521 rows) and a small vocabulary (2,779 unique words across 4,319 rows) suggest a VQA-style prompt set rather than diverse free text. Mean length is 35 chars / 7.3 words, with very high Flesch readability (101.7). The nine strings tagged as non-English (es, la, it, fy, hu, ia, ast) are too few to matter and may be detector noise on very short questions rather than genuine language drift.

Treatment: Tokenize and embed for modeling; deduplicate or weight by frequency given the 35% duplicate rate.

anthropic:claude-opus-4-7 · confidence high

Out[15]:

saturn.columns["question"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	2,798
len_min	7
len_max	264
len_mean	35.1
len_median	26
len_p95	95
word_mean	7.259
word_median	5
n_empty	0
n_duplicates	1,521
duplicate_rate	0.3522
vocab_size	2,779
readability_flesch_mean	101.7
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0.002547
boilerplate_rate	0.003473
alert: multilingual	9 languages detected in sample
alert: duplicates	35.2% duplicate strings
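
The reported n_duplicates equals n minus unique (4,319 − 2,798 = 1,521), which suggests a duplicate is counted as every occurrence of a string after its first. That reading is an inference from the numbers, not confirmed Saturn behavior. A minimal sketch of the same calculation, using a hypothetical helper `duplicate_stats` and toy input rather than rows from the dataset:

```python
from collections import Counter


def duplicate_stats(values):
    """Count rows that repeat an earlier string (every occurrence after the first)."""
    counts = Counter(values)
    n = len(values)
    unique = len(counts)
    n_duplicates = n - unique
    return {
        "n": n,
        "unique": unique,
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n if n else 0.0,
        "top_values": counts.most_common(3),
    }


# Toy input, not rows from the dataset.
sample = ["What is this?", "What color is this?", "What is this?", "What is this?"]
print(duplicate_stats(sample))

# The figures in the stats table are consistent with this definition.
assert 4319 - 2798 == 1521
assert round(1521 / 4319, 4) == 0.3522
```

The same `Counter` also yields the frequencies needed to weight or deduplicate questions, as the treatment suggests.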

Fig 8.

Character-length distribution for question.

Character-length distribution for question (mean: 35.1).
chars	count
7 – 13	759
13 – 20	550
20 – 26	931
26 – 33	609
33 – 39	368
39 – 46	250
46 – 52	143
52 – 58	143
58 – 65	96
65 – 71	84
71 – 78	68
78 – 84	42
84 – 91	35
91 – 97	35
97 – 103	30
103 – 110	23
110 – 116	19
116 – 123	12
123 – 129	23
129 – 136	14
136 – 142	8
142 – 148	14
148 – 155	8
155 – 161	6
161 – 168	10
168 – 174	8
174 – 180	8
180 – 187	3
187 – 193	5
193 – 200	3
200 – 206	3
206 – 213	2
213 – 219	1
219 – 225	2
225 – 232	1
232 – 238	1
238 – 245	0
245 – 251	1
251 – 258	0
258 – 264	1

answers unknown other

The column 'answers' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. All 4,319 rows are non-null, but uniqueness, type, and value distribution are unavailable. The name suggests it holds response content, likely structured (e.g., nested objects or arrays), which would explain why automatic profiling skipped it.

Treatment: Inspect raw values manually and parse into a typed structure before further profiling.

anthropic:claude-opus-4-7 · confidence low

Out[18]:

saturn.columns["answers"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown
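
One way to start the manual inspection is to tally the Python types that appear in the column once the JSON is loaded. A minimal sketch, assuming the file parses to a list of row objects with an `answers` key. The helper `summarize_kinds` is hypothetical, and the toy rows are invented: their list-of-objects layout is only a guess at what the profiler skipped, not something the stats confirm.

```python
from collections import Counter


def summarize_kinds(rows, column):
    """Tally the type of each value in `column`, plus element types for lists."""
    outer = Counter()
    inner = Counter()
    for row in rows:
        value = row.get(column)
        outer[type(value).__name__] += 1
        if isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    # Record the key set so a consistent object shape is visible.
                    inner["dict(" + ",".join(sorted(item)) + ")"] += 1
                else:
                    inner[type(item).__name__] += 1
    return outer, inner


# Invented rows for illustration; real data would come from json.load on the source file.
rows = [
    {"answers": [{"answer": "a can", "answer_confidence": "yes"}]},
    {"answers": [{"answer": "unanswerable", "answer_confidence": "maybe"}]},
]
outer, inner = summarize_kinds(rows, "answers")
print(outer)
print(inner)
```

If the tallies show a single outer type and a single inner key set, the column can be flattened into typed fields and profiled again.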

answer_type categorical label

Categorical label with only 4 distinct values across 4,319 rows and no nulls, classifying answers into 'other', 'unanswerable', 'yes/no', and 'number'. The distribution is heavily imbalanced: 'other' covers 62.3% (2,691 rows) and 'unanswerable' another 32.1% (1,385 rows), while 'yes/no' (195) and 'number' (48) together make up under 6%. An entropy ratio of 0.61 (1.225 bits out of a possible 2) confirms the skew toward the top two classes.

Treatment: One-hot or integer-encode; consider class-weighting or stratified sampling given the imbalance toward 'other'.

anthropic:claude-opus-4-7 · confidence high

Out[20]:

saturn.columns["answer_type"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	4
top_value	other
top_rate	0.6231
cardinality	4
entropy	1.225
entropy_ratio	0.6127
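
The entropy figures can be recomputed from the four class counts. A minimal sketch, assuming Saturn reports Shannon entropy in bits and normalizes by log2 of the cardinality; both are inferences from the fact that the numbers line up, not documented behavior. Under those assumptions the results agree with the reported 1.225 and 0.6127 to about three decimal places.

```python
import math

counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}


def entropy_bits(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


entropy = entropy_bits(counts)
# Normalize by the maximum entropy for this many classes (log2 of the cardinality).
entropy_ratio = entropy / math.log2(len(counts))

print(f"entropy = {entropy:.3f} bits, ratio = {entropy_ratio:.4f}")
```

A ratio of 1.0 would mean four equally frequent classes, so the gap from 1.0 gives a rough sense of how far the distribution is from uniform.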

Fig 9.

Top values for answer_type.

Top values for answer_type (4 unique shown, of 4 total).
value	count	share
other	2691	62.3%
unanswerable	1385	32.1%
yes/no	195	4.5%
number	48	1.1%
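
The class-weighting suggestion in the treatment note can be sketched directly from these counts. The example below uses the common inverse-frequency heuristic n / (k × count), the same formula behind scikit-learn's `class_weight="balanced"` option. Choosing this scheme is an assumption for illustration; the report does not prescribe one.

```python
counts = {"other": 2691, "unanswerable": 1385, "yes/no": 195, "number": 48}


def balanced_class_weights(counts):
    """Weight each class by n / (k * count), so rarer classes get larger weights."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


weights = balanced_class_weights(counts)
for label, w in weights.items():
    print(f"{label:>12}: {w:.3f}")
```

With only 48 'number' rows, the resulting weight for that class is large, so stratified sampling or a cap on the weights may be the more stable option in practice.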

answerable numeric label

Binary 0/1 flag indicating whether a question is answerable, with 4319 rows and no nulls. Roughly 68% are marked answerable (mean 0.6793) and 32% are zeros, giving a moderate class imbalance toward the positive class. Only two unique values confirm this is a clean indicator rather than a probability score.

Treatment: Use directly as a binary target; account for the ~68/32 class imbalance during training or evaluation.

anthropic:claude-opus-4-7 · confidence high

Out[23]:

saturn.columns["answerable"].stats

stat	value
n	4,319
nulls	0 (0.0%)
unique	2
min	0
max	1
mean	0.6793
median	1
std	0.4668
q1	0
q3	1
iqr	1
skew	-0.7684
kurtosis	-1.41
n_outliers	0
outlier_rate	0
zero_rate	0.3207
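
Because the column is a 0/1 flag, every statistic in this table follows from the two class counts in the histogram (1,385 zeros and 2,934 ones). A minimal sketch using the closed-form Bernoulli moments; using the sample (n − 1) form for the standard deviation is an assumption about how Saturn computes it, and the majority-class baseline is added here only as an illustration of the imbalance.

```python
import math

n_zero, n_one = 1385, 2934  # counts from the histogram of answerable
n = n_zero + n_one

p = n_one / n                                 # mean of a 0/1 column is the share of ones
q = 1 - p                                     # zero_rate
std_sample = math.sqrt(p * q * n / (n - 1))   # sample standard deviation (ddof=1)
skew = (q - p) / math.sqrt(p * q)             # Bernoulli skewness
excess_kurtosis = (1 - 6 * p * q) / (p * q)   # Bernoulli excess kurtosis
majority_baseline = max(p, q)                 # accuracy of always predicting the larger class

print(f"mean={p:.4f} zero_rate={q:.4f} std={std_sample:.4f}")
print(f"skew={skew:.4f} kurtosis={excess_kurtosis:.2f} baseline={majority_baseline:.4f}")
```

The baseline is the number to beat: a classifier that always predicts "answerable" already scores about 68% accuracy on this split, so accuracy alone is a weak metric here.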

Fig 10.

Distribution of answerable. Vertical dash marks the median.

Histogram bins for answerable (median: 1.0). The 38 bins between 0.025 and 0.975 are empty and omitted.
bin	count
0 – 0.025	1385
0.975 – 1	2934

How to cite

BibTeX

@misc{saturn-vizwiz-val-annotations-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: vizwiz val annotations},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/vizwiz_val_annotations}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: vizwiz val annotations. Source: /home/coolhand/html/datavis/data_trove/cache/vizwiz_val_annotations.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/vizwiz_val_annotations