saturn·

web accessibility data top100

saturn notebook · generated 2026-05-01

Overview

Source: /tmp/saturn-uploads/f55be675ada5/web_accessibility_data_top100.csv

Saturn profiled 92 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
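!pip install saturn-dissect
import subprocess

# Profile the CSV with the saturn CLI. Findings are written to JSON, and the
# optional prose commentary is generated by the named model.
subprocess.run(
    [
        "saturn", "analyze", "/tmp/saturn-uploads/f55be675ada5/web_accessibility_data_top100.csv",
        "--findings", "web_accessibility_data_top100.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the CLI exits non-zero
)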

Summary confidence: high

This dataset profiles 92 popular websites with accessibility metrics: error counts, error density, and two ranking signals (popularity_rank and wave_rank). The error metrics are highly skewed. Errors range from 0 to 364 with a median of just 5, and 8 of the 83 sites with error data (9.6%) qualify as outliers, which makes them worth flagging as the worst offenders. The notes field is the richest qualitative signal. 'Low contrast text' (12 sites) is the most common note, followed by 'Missing form input label' and 'Low contrast text, missing alt text' (9 sites each). Because popularity_rank is spread roughly evenly across 1–100, it works well as a control axis when comparing error patterns across the popularity spectrum.

citing: errors · error_density · notes · popularity_rank · wave_rank
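The summary suggests using popularity_rank as a control axis. The sketch below shows one way to run that comparison. It assumes the rows are already parsed into (popularity_rank, errors) pairs, with None for missing values. The function name and the 25-rank band width are illustrative choices and are not part of saturn. Each band reports the median because errors is heavily right-skewed.

```python
from statistics import median

def errors_by_popularity_band(rows, band_width=25):
    """Median errors per popularity_rank band, skipping rows with a missing value.

    rows: iterable of (popularity_rank, errors) pairs, with None for missing values.
    """
    bands = {}
    for rank, errors in rows:
        if rank is None or errors is None:
            continue  # both columns have ~10% nulls in this dataset
        start = int((rank - 1) // band_width) * band_width + 1
        bands.setdefault(start, []).append(errors)
    return {
        f"{start}-{start + band_width - 1}": median(vals)
        for start, vals in sorted(bands.items())
    }
```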

Out[4]:

saturn.schema() · 6 columns

column           kind         n   null%  unique  alerts
domain           categorical  92  0.0%   92      long_tail
wave_rank        numeric      92  9.8%   83
popularity_rank  numeric      92  9.8%   82
errors           numeric      92  9.8%   36      high_skew, outliers
error_density    numeric      92  9.8%   64      high_skew, outliers
notes            categorical  92  0.0%   38      long_tail
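The schema columns can be approximated with the standard library. This sketch rests on three assumptions:

- Empty CSV fields count as nulls.
- `unique` counts distinct non-null values.
- A column is numeric when every non-null value parses as a float.

saturn's actual rules are not documented here, and `read_rows` and `profile_columns` are hypothetical helpers. Under these rules, 9 nulls out of 92 rows round to the 9.8% shown above.

```python
import csv
import io

def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def profile_columns(rows, columns):
    """Per-column n, null %, distinct non-null count, and a crude kind label."""
    n = len(rows)
    profile = {}
    for col in columns:
        # Assumption: an empty field (or a missing key) is a null.
        present = [r.get(col) for r in rows if r.get(col) not in ("", None)]
        try:
            for value in present:
                float(value)
            kind = "numeric"
        except ValueError:
            kind = "categorical"
        profile[col] = {
            "kind": kind,
            "n": n,
            "null_pct": round(100 * (n - len(present)) / n, 1) if n else 0.0,
            "unique": len(set(present)),  # distinct non-null values only
        }
    return profile
```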
Fig 1.
errors · Most sites have few errors but a long right tail reaches 364 — look for the outlier sites driving the mean above the median.
Histogram bins for errors (median: 5.0).
bin             count
0 – 40.44          70
40.44 – 80.89       7
80.89 – 121.3       1
121.3 – 161.8       2
161.8 – 202.2       0
202.2 – 242.7       2
242.7 – 283.1       0
283.1 – 323.6       0
323.6 – 364         1
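The bin edges in Fig 1 step by 364 / 9 ≈ 40.44, and the nine counts sum to 83, the number of non-null values. That pattern fits nine equal-width bins over [min, max] with nulls dropped. The sketch below assumes exactly that binning. `equal_width_bins` is a hypothetical name, not a saturn function.

```python
def equal_width_bins(values, n_bins=9):
    """Equal-width histogram over [min, max] of the non-null values.

    Returns (edges, counts); the last bin is closed on the right so the
    maximum is counted.
    """
    data = [v for v in values if v is not None]
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        idx = int((v - lo) / width) if width else 0
        counts[min(idx, n_bins - 1)] += 1  # clamp the maximum into the last bin
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts
```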
Fig 2.
error_density · Error density is similarly skewed; check whether the same sites top both error count and density.
Histogram bins for error_density (median: 0.0057).
bin                count
0 – 0.03367           65
0.03367 – 0.06733     10
0.06733 – 0.101        2
0.101 – 0.1347         2
0.1347 – 0.1683        0
0.1683 – 0.202         1
0.202 – 0.2357         2
0.2357 – 0.2693        0
0.2693 – 0.303         1
Fig 3.
notes · See which accessibility issues recur most often. 'Low contrast text' leads with 12 sites, followed by 'Missing form input label' and 'Low contrast text, missing alt text' at 9 each.
Top values for notes (20 unique shown, of 38 total).
count  share  value
   12  13.0%  Low contrast text
    9   9.8%  Missing form input label
    9   9.8%  Low contrast text, missing alt text
    5   5.4%  Missing alt text
    5   5.4%  Asia-based
    5   5.4%  No detected errors
    4   4.3%  Low contrast text, empty link
    4   4.3%  Low contrast text, missing alt text, empty link, empty button
    3   3.3%  No data
    3   3.3%  Low contrast text, missing alt text, missing labels
    3   3.3%  Low contrast text, missing alt text, empty button
    2   2.2%  Missing document language
    2   2.2%  Missing alt text, missing document language
    2   2.2%  Missing form input label, empty button
    1   1.1%  Low contrast text, missing alt text, empty link, missing form labels, empty button
    1   1.1%  Low contrast text, missing alt text, empty links
    1   1.1%  Missing alt text, missing form labels, empty buttons
    1   1.1%  Missing form input label, missing document language
    1   1.1%  Low contrast text, missing alt text, missing language
    1   1.1%  Missing alt text, missing form labels
Fig 4.
popularity_rank · Popularity is roughly uniform across 1–100, confirming the sample spans the full top-100 range.
Histogram bins for popularity_rank (median: 52.0).
bin        count
1 – 12        11
12 – 23        5
23 – 34       10
34 – 45       10
45 – 56        9
56 – 67        9
67 – 78        9
78 – 89        9
89 – 100      11
Fig 5.
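wave_rank · Wave rank spans roughly three orders of magnitude (1,183 to 991,094) with a moderate right skew (0.96). It is nearly uncorrelated with popularity_rank (+0.01) but correlates +0.77 with errors, so confirm what it measures before using it to segment sites by traffic tier.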
Histogram bins for wave_rank (median: 203569.0).
bin                    count
1183 – 1.112e+05          27
1.112e+05 – 2.212e+05     17
2.212e+05 – 3.312e+05     10
3.312e+05 – 4.411e+05      5
4.411e+05 – 5.511e+05      7
5.511e+05 – 6.611e+05      6
6.611e+05 – 7.711e+05      2
7.711e+05 – 8.811e+05      4
8.811e+05 – 9.911e+05      5
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column           kind         null %
domain           categorical  0.0%
wave_rank        numeric      9.8%
popularity_rank  numeric      9.8%
errors           numeric      9.8%
error_density    numeric      9.8%
notes            categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values rounded to 2 decimals).
                 wave_rank  popularity_rank  errors  error_density
wave_rank            +1.00            +0.01   +0.77          +0.52
popularity_rank      +0.01            +1.00   +0.06          -0.09
errors               +0.77            +0.06   +1.00          +0.28
error_density        +0.52            -0.09   +0.28          +1.00

domain categorical identifier

This column holds web domain names (e.g., google.com, youtube.com, wikipedia.org), with one row per domain. Every one of the 92 values is unique (n_unique equals n, entropy_ratio = 1.0, top_rate ≈ 0.0109), so it functions as a row identifier rather than a categorical feature. No nulls are present, but the long_tail alert reflects the fully flat distribution.

Treatment: Use as a row key or join key; do not one-hot encode.

anthropic:claude-opus-4-7 · confidence high
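The reported entropy of 6.524 equals log2(92). For the notes column, 4.663 / log2(38) ≈ 0.8885, which matches its entropy_ratio. Both fit Shannon entropy in bits with entropy_ratio = entropy / log2(cardinality). That definition is inferred from the numbers rather than documented, and the sketch below assumes it. `entropy_stats` is a hypothetical name.

```python
import math
from collections import Counter

def entropy_stats(values):
    """Shannon entropy (bits) of a categorical column and entropy / log2(cardinality).

    Nulls (None) are ignored. The ratio is 1.0 when every category is equally
    frequent, as with a column of 92 distinct domains.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    k = len(counts)
    ratio = entropy / math.log2(k) if k > 1 else 0.0
    return entropy, ratio
```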
Out[13]:

saturn.columns["domain"].stats

stat           value
n              92
nulls          0 (0.0%)
unique         92
top_value      google.com
top_rate       0.01087
cardinality    92
entropy        6.524
entropy_ratio  1
alert          long_tail: 92 singleton categories
Fig 8.
Top values for domain.
Top values for domain (20 unique shown, of 92 total).
count  share  value
    1   1.1%  google.com
    1   1.1%  youtube.com
    1   1.1%  facebook.com
    1   1.1%  wikipedia.org
    1   1.1%  instagram.com
    1   1.1%  bing.com
    1   1.1%  reddit.com
    1   1.1%  x.com
    1   1.1%  chatgpt.com
    1   1.1%  yandex.ru
    1   1.1%  whatsapp.com
    1   1.1%  amazon.com
    1   1.1%  yahoo.com
    1   1.1%  weather.com
    1   1.1%  duckduckgo.com
    1   1.1%  microsoftonline.com
    1   1.1%  twitch.tv
    1   1.1%  linkedin.com
    1   1.1%  live.com
    1   1.1%  fandom.com

wave_rank numeric feature

A numeric ranking-style field spanning 1,183 to 991,094, with mean 301,136 and median 203,569. The values look like positions or scores rather than counts. The distribution is right-skewed (skew 0.96) with a wide IQR of 395,414, but no outliers were flagged. The null rate is 9.78%. Assuming the unique count excludes nulls, all 83 non-null values are distinct, so the gap between 83 unique values and 92 rows reflects the nulls rather than repeated ranks.

Treatment: Impute or flag the ~10% nulls and consider a log or rank transform before modelling given the right skew.

anthropic:claude-opus-4-7 · confidence medium
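The treatment above names two options. wave_rank runs from 1,183 to 991,094, so log10 compresses it to roughly 3.07 to 6.00. A percentile rank instead maps it onto (0, 1] and ignores the spacing between values entirely. Both sketches leave nulls as None for separate imputation, and both function names are hypothetical.

```python
import math

def log10_transform(values):
    """log10 of each value; None (missing) passes through for separate imputation.

    Raises ValueError for zero or negative values, which wave_rank (min 1,183) does not have.
    """
    return [None if v is None else math.log10(v) for v in values]

def percentile_ranks(values):
    """Map each non-null value to (average rank) / (non-null count); ties share a rank."""
    present = sorted(v for v in values if v is not None)
    positions = {}
    for rank, v in enumerate(present, start=1):
        positions.setdefault(v, []).append(rank)
    avg_rank = {v: sum(r) / len(r) for v, r in positions.items()}
    n = len(present)
    return [None if v is None else avg_rank[v] / n for v in values]
```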
Out[16]:

saturn.columns["wave_rank"].stats

stat           value
n              92
nulls          9 (9.8%)
unique         83
min            1,183
max            991,094
mean           3.011e+05
median         203,569
std            2.746e+05
q1             8.221e+04
q3             477,628
iqr            3.954e+05
skew           0.9633
kurtosis       -0.1876
n_outliers     0
outlier_rate   0
zero_rate      0
Fig 9.
Distribution of wave_rank. Vertical dash marks the median.
Histogram bins for wave_rank (median: 203569.0).
bin                    count
1183 – 1.112e+05          27
1.112e+05 – 2.212e+05     17
2.212e+05 – 3.312e+05     10
3.312e+05 – 4.411e+05      5
4.411e+05 – 5.511e+05      7
5.511e+05 – 6.611e+05      6
6.611e+05 – 7.711e+05      2
7.711e+05 – 8.811e+05      4
8.811e+05 – 9.911e+05      5

popularity_rank numeric feature

Almost certainly a 1-to-100 popularity ranking, with min 1, max 100, and a near-symmetric distribution (mean 51.23, median 52, skew -0.05). The spread is essentially uniform across the range: std 29.39, IQR 49, no outliers, and kurtosis -1.19, close to the -1.2 excess kurtosis of a uniform distribution. That pattern is consistent with rank data rather than a measured score. The null rate is 9.78%. Assuming the unique count excludes nulls, the 83 non-null values hold 82 distinct ranks, so exactly one rank appears twice.

Treatment: Impute or flag the ~10% nulls; treat as ordinal and avoid log transforms given the uniform 1-100 spread.

anthropic:claude-opus-4-7 · confidence high
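A gap between `unique` and `n` can come from ties or simply from nulls, and the two cases can be checked directly. This sketch assumes nulls are represented as None and excluded from the distinct count. `duplicate_report` is a hypothetical helper. Applied to popularity_rank (83 non-null, 82 distinct), it would list exactly one repeated rank.

```python
from collections import Counter

def duplicate_report(values):
    """Summarise non-null count, distinct count, and any values that repeat."""
    present = [v for v in values if v is not None]  # assumption: nulls are None
    counts = Counter(present)
    repeated = sorted(v for v, c in counts.items() if c > 1)
    return {"non_null": len(present), "distinct": len(counts), "repeated": repeated}
```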
Out[19]:

saturn.columns["popularity_rank"].stats

stat           value
n              92
nulls          9 (9.8%)
unique         82
min            1
max            100
mean           51.23
median         52
std            29.39
q1             27.5
q3             76.5
iqr            49
skew           -0.05046
kurtosis       -1.192
n_outliers     0
outlier_rate   0
zero_rate      0
Fig 10.
Distribution of popularity_rank. Vertical dash marks the median.
Histogram bins for popularity_rank (median: 52.0).
bin        count
1 – 12        11
12 – 23        5
23 – 34       10
34 – 45       10
45 – 56        9
56 – 67        9
67 – 78        9
78 – 89        9
89 – 100      11

errors numeric feature

A count of accessibility errors per site, ranging from 0 to 364 with a median of just 5 but a mean of 27.17: a classic long-tail count. The distribution is severely right-skewed (skew 3.80, kurtosis 16.12). It has 8 outliers (9.6% of the 83 non-null values) and a std of 57.20 that dwarfs the IQR of 24.5. The null rate is 9.78%, and 6.02% of non-null values (5 sites) are exactly zero, so both the missing values and the zeros need handling.

Treatment: Impute nulls and apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
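The sketch below implements the treatment suggested for errors. It assumes median imputation, because the mean (27.17) is pulled up by the tail. It also returns a parallel flag list so that imputed rows stay identifiable. `impute_and_log1p` is a hypothetical helper. log1p maps zero counts to 0, which a plain log could not handle.

```python
import math
from statistics import median

def impute_and_log1p(values):
    """Median-impute missing counts, then apply log1p.

    Returns (transformed values, flags) where flags[i] is True if values[i] was imputed.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)  # median, not mean: the mean is inflated by the long tail
    flags = [v is None for v in values]
    transformed = [math.log1p(fill if v is None else v) for v in values]
    return transformed, flags
```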
Out[22]:

saturn.columns["errors"].stats

stat           value
n              92
nulls          9 (9.8%)
unique         36
min            0
max            364
mean           27.17
median         5
std            57.2
q1             2
q3             26.5
iqr            24.5
skew           3.803
kurtosis       16.12
n_outliers     8
outlier_rate   0.09639
zero_rate      0.06024
alert          high_skew: skew=+3.80
alert          outliers: 9.6% of rows beyond 1.5 × IQR
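The outliers alert counts values beyond 1.5 × IQR. With the reported q1 = 2 and q3 = 26.5 for errors, the fences are 2 − 36.75 = −34.75 and 26.5 + 36.75 = 63.25. The lower fence is below zero, so only large counts can be flagged. The sketch below applies the rule using linear-interpolation quartiles (`method="inclusive"`). That quartile method is an assumption and may differ slightly from saturn's. `iqr_outliers` is a hypothetical name.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the (low, high) fences and the sorted non-null values outside them.

    Quartiles use linear interpolation (method="inclusive"); the profiler's own
    quartile method is an assumption and may differ slightly.
    """
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return (low, high), [v for v in data if v < low or v > high]
```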
Fig 11.
Distribution of errors. Vertical dash marks the median.
Histogram bins for errors (median: 5.0).
bin             count
0 – 40.44          70
40.44 – 80.89       7
80.89 – 121.3       1
121.3 – 161.8       2
161.8 – 202.2       0
202.2 – 242.7       2
242.7 – 283.1       0
283.1 – 323.6       0
323.6 – 364         1

error_density numeric feature

Likely a per-record error density (errors per unit), bounded at 0 with a long right tail: median 0.0057 vs mean 0.0272 and max 0.303, with skew 3.23 and kurtosis 10.95. Roughly 9.6% of values are flagged outliers, 6% are exactly zero, and 9.8% are null, so a small set of high-error records dominates the distribution.

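Treatment: Winsorise, or log-transform with a small offset suited to the data's scale, before modelling. log1p alone does little here, since every value is below 0.31 and log1p(x) ≈ x for small x. Impute or flag the ~10% nulls.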

anthropic:claude-opus-4-7 · confidence high
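Below is a minimal winsorising sketch for error_density. It clips only the upper tail, because the column is bounded at 0 and only the right tail is heavy. The 95th-percentile cap and the name `winsorise_upper` are illustrative assumptions. Percentiles come from the standard library's `statistics.quantiles`.

```python
from statistics import quantiles

def winsorise_upper(values, upper_pct=95):
    """Cap non-null values at the given percentile; None passes through unchanged.

    Percentile cut points come from statistics.quantiles(n=100, method="inclusive").
    """
    present = [v for v in values if v is not None]
    cuts = quantiles(present, n=100, method="inclusive")  # cuts[i] is the (i+1)th percentile
    cap = cuts[upper_pct - 1]
    return [None if v is None else min(v, cap) for v in values]
```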
Out[25]:

saturn.columns["error_density"].stats

stat           value
n              92
nulls          9 (9.8%)
unique         64
min            0
max            0.303
mean           0.02725
median         0.0057
std            0.05376
q1             0.0025
q3             0.02545
iqr            0.02295
skew           3.229
kurtosis       10.95
n_outliers     8
outlier_rate   0.09639
zero_rate      0.06024
alert          high_skew: skew=+3.23
alert          outliers: 9.6% of rows beyond 1.5 × IQR
Fig 12.
Distribution of error_density. Vertical dash marks the median.
Histogram bins for error_density (median: 0.0057).
bin                count
0 – 0.03367           65
0.03367 – 0.06733     10
0.06733 – 0.101        2
0.101 – 0.1347         2
0.1347 – 0.1683        0
0.1683 – 0.202         1
0.202 – 0.2357         2
0.2357 – 0.2693        0
0.2693 – 0.303         1

notes categorical free_text

Free-form QA notes describing the accessibility issues found on each site, dominated by recurring phrases such as "Low contrast text" (12/92) and "Missing form input label" (9/92). A high entropy ratio (0.89) and 38 unique values across only 92 rows indicate a long tail of compound descriptions (e.g., "Low contrast text, missing alt text, empty link, empty button"). The same issue is often spelled several ways across these notes: "empty link" vs "empty links", and "missing labels" vs "missing form labels" vs "Missing form input label". Other notable values are the placeholder "No data", "No detected errors" (likely a genuine zero-issue result rather than missing data), and an off-topic "Asia-based" tag (5 rows) that suggests inconsistent note conventions.

Treatment: Split on commas into multi-label issue tags and normalise variant spellings before aggregating; isolate sentinel values like "No data".

anthropic:claude-opus-4-7 · confidence high
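The sketch below shows the suggested treatment for notes: split on commas, normalise variant spellings, and set sentinels aside. The synonym map is hypothetical. It is built only from variants visible in the table below and is not exhaustive. Treating "Asia-based" as a sentinel alongside "No data" is also an assumption. "No detected errors" maps to an empty tag set rather than to unknown.

```python
from collections import Counter

# Hypothetical normalisation map built from spelling variants visible in the
# notes table; it is illustrative, not exhaustive.
SYNONYMS = {
    "empty links": "empty link",
    "empty buttons": "empty button",
    "missing labels": "missing form input label",
    "missing form labels": "missing form input label",
    "missing language": "missing document language",
}
# Assumption: these notes carry no issue information and are treated as unknown.
SENTINELS = {"no data", "asia-based"}
NO_ISSUES = "no detected errors"

def note_tags(note):
    """Normalised issue tags for one note.

    Returns None for a sentinel note, an empty set for "No detected errors",
    and otherwise the set of comma-separated issues after normalisation.
    """
    text = note.strip().lower()
    if text in SENTINELS:
        return None
    if text == NO_ISSUES:
        return set()
    tags = set()
    for part in text.split(","):
        tag = part.strip()
        if tag:
            tags.add(SYNONYMS.get(tag, tag))
    return tags

def tag_counts(notes):
    """Count how many notes mention each normalised issue tag."""
    counts = Counter()
    for note in notes:
        tags = note_tags(note)
        if tags:  # skips sentinels (None) and issue-free notes (empty set)
            counts.update(tags)
    return counts
```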
Out[28]:

saturn.columns["notes"].stats

stat           value
n              92
nulls          0 (0.0%)
unique         38
top_value      Low contrast text
top_rate       0.1304
cardinality    38
entropy        4.663
entropy_ratio  0.8885
alert          long_tail: 24 singleton categories
Fig 13.
Top values for notes.
Top values for notes (20 unique shown, of 38 total).
count  share  value
   12  13.0%  Low contrast text
    9   9.8%  Missing form input label
    9   9.8%  Low contrast text, missing alt text
    5   5.4%  Missing alt text
    5   5.4%  Asia-based
    5   5.4%  No detected errors
    4   4.3%  Low contrast text, empty link
    4   4.3%  Low contrast text, missing alt text, empty link, empty button
    3   3.3%  No data
    3   3.3%  Low contrast text, missing alt text, missing labels
    3   3.3%  Low contrast text, missing alt text, empty button
    2   2.2%  Missing document language
    2   2.2%  Missing alt text, missing document language
    2   2.2%  Missing form input label, empty button
    1   1.1%  Low contrast text, missing alt text, empty link, missing form labels, empty button
    1   1.1%  Low contrast text, missing alt text, empty links
    1   1.1%  Missing alt text, missing form labels, empty buttons
    1   1.1%  Missing form input label, missing document language
    1   1.1%  Low contrast text, missing alt text, missing language
    1   1.1%  Missing alt text, missing form labels

How to cite


BibTeX
@misc{saturn-web-accessibility-data-top100-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: web accessibility data top100},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/web_accessibility_data_top100}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: web accessibility data top100. Source: /tmp/saturn-uploads/f55be675ada5/web_accessibility_data_top100.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/web_accessibility_data_top100