saturn

/home/coolhand/html/datavis/data_trove/accessibility_audit/web_accessibility_data_top100.csv 92 rows sample n=92 seed 42 2026-06-21T23:18:38+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/accessibility_audit/web_accessibility_data_top100.csv
Total rows	92
Profiled sample	92
Columns	6
Generated	2026-06-21T23:18:38+00:00

Per-column null rate across the corpus.
column	kind	null %
domain	categorical	0.0%
wave_rank	numeric	9.8%
popularity_rank	numeric	9.8%
errors	numeric	9.8%
error_density	numeric	9.8%
notes	categorical	0.0%
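
The null rates above follow from the raw counts: each numeric column is missing 9 of 92 values. A minimal sketch of that calculation is shown below. The function name `null_rate` and the rule that empty strings and `None` count as null are assumptions for illustration, not saturn's documented behaviour.

```python
def null_rate(values, null_tokens=("", None)):
    """Fraction of entries treated as null (assumed: empty string or None)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v in null_tokens)
    return missing / len(values)


# 83 populated cells and 9 empty ones, mirroring the numeric columns above.
column = ["1"] * 83 + [""] * 9
print(f"{null_rate(column):.1%}")  # 9 / 92
```

Because the same 9.8% appears in all four numeric columns, it is worth checking whether the same 9 rows are missing in each.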

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is a web accessibility audit of approximately 100 top websites: 92 rows with an error count, an error density, a popularity rank, and a WAVE rank for each domain. The most striking finding is that both 'errors' and 'error_density' are heavily right-skewed with extreme outliers. The median error count is just 5, but the maximum reaches 364, suggesting that a small cluster of sites is dramatically worse than the rest. A second angle worth exploring is the 'notes' column, where 'Low contrast text' is the most common value on its own (12 rows) and also appears inside many compound entries, pointing to a systemic problem across high-traffic sites. The near-uniform distribution of 'popularity_rank' suggests the sample spans the full top-100 range evenly, making comparisons across popularity tiers feasible.

error_density high anthropic:default

This column is a density or rate of errors, given its name and bounded [0, 0.303] range. In an accessibility audit the likely unit is errors per page element, although the denominator is not stated in the data. The distribution is severely right-skewed (skew = 3.23, kurtosis = 10.95): the median is only 0.0057 while the mean is 0.027, and 8 outliers (9.6% of the 83 non-null rows) pull the tail toward the maximum of 0.303. About 9.8% of rows are null and about 6% of values are zero, so the bulk of observations cluster near zero while a meaningful minority show substantially elevated error rates.

errors high anthropic:default

This column is an error count per site, most plausibly the number of accessibility errors detected on each domain, with values ranging from 0 to 364. The distribution is severely right-skewed (skew = 3.80, kurtosis = 16.12): the median is only 5.0 while the mean is 27.17 and the standard deviation is 57.20, indicating that a small number of extreme cases pull the average sharply upward. Eight outliers (9.6% of non-null rows) drive this effect, and about 9.8% of rows are null. Both figures warrant investigation before modelling.
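
The outlier count can be sanity-checked against the quartiles reported for this column. The sketch below assumes the standard Tukey rule (fences at 1.5 times the IQR beyond Q1 and Q3), which matches the "1.5 IQR" wording in the column summary but is not confirmed as saturn's exact method. The helper names are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count non-null values that fall outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v is not None and (v < lower or v > upper))


# Quartiles reported for the errors column: q1 = 2.0, q3 = 26.5.
lower, upper = tukey_fences(q1=2.0, q3=26.5)
print(lower, upper)  # -34.75 63.25
```

Counts cannot be negative, so only the upper fence matters here: under this assumption, any site with more than 63 errors is flagged.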

domain high anthropic:default

This column contains internet domain names (e.g., google.com, youtube.com, wikipedia.org), functioning as a unique identifier for each row in the dataset. With 92 rows and 92 unique values, cardinality is 100% and entropy_ratio is exactly 1.0, meaning every domain appears exactly once — the top_rate of 0.0109 confirms this. There is no distributional signal here beyond the list itself, making the column unsuitable as a categorical feature without further engineering.
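
The entropy figures can be reproduced with a few lines of code. The sketch below assumes that `entropy` is Shannon entropy in bits and that `entropy_ratio` divides it by log2 of the number of distinct values. Those definitions are inferred, not documented, but they agree with the reported values for both categorical columns: log2(92) ≈ 6.524 for domain, and 4.663 / log2(38) ≈ 0.889 for notes.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical value distribution."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # assumed convention for a constant column
    return shannon_entropy(values) / math.log2(k)


# 92 distinct placeholder domains, each appearing exactly once.
domains = [f"site{i}.example" for i in range(92)]
print(round(shannon_entropy(domains), 3))  # 6.524
print(round(entropy_ratio(domains), 3))    # 1.0
```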

notes high anthropic:default

This column contains free-text accessibility audit notes describing detected WCAG/accessibility violations (e.g., 'Low contrast text', 'Missing alt text', 'Missing form input label') for 92 records. The top value 'Low contrast text' appears 12 times (13% of rows), and with 38 unique values across 92 rows the entropy ratio is high at 0.89, flagging a long-tail distribution of compound violation combinations. Notably, values like 'Asia-based' and 'No data' appear alongside structured violation strings, suggesting the column conflates geographic metadata and data-availability flags with accessibility findings — an inconsistency an analyst should investigate. No nulls exist, but the semantic mixing across categories limits direct modelling use.
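
One way to make this column usable is to split compound entries into individual violations and set aside the values that are not findings. The sketch below is an illustration only: the function name, the comma delimiter, and the list of non-finding labels are assumptions, and the sample holds a handful of (value, count) pairs copied from the notes table rather than the full column.

```python
from collections import Counter

NON_FINDINGS = ("asia-based", "no data", "no detected errors")


def violation_counts(note_counts, non_findings=NON_FINDINGS):
    """Split compound notes on commas and tally each violation by row count."""
    tally = Counter()
    for note, rows in note_counts.items():
        if note.strip().lower() in non_findings:
            continue  # metadata or availability flags, not violations
        for part in note.split(","):
            tally[part.strip().lower()] += rows
    return tally


# A few (value, count) pairs from the notes table; not the full column.
sample = {
    "Low contrast text": 12,
    "Low contrast text, missing alt text": 9,
    "Missing alt text": 5,
    "Asia-based": 5,
    "No data": 3,
    "Low contrast text, empty link": 4,
}
counts = violation_counts(sample)
print(counts.most_common(2))
```

Wording variants in the source, such as 'empty link' versus 'empty links' or 'missing labels' versus 'missing form labels', would need an explicit mapping before the tallies are comparable.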

popularity_rank high anthropic:default

This column is a popularity rank, almost certainly an ordinal position from 1 to 100 assigned to each site. Its distribution is strikingly uniform: mean 51.2, median 52.0, IQR of 49.0 spanning Q1 = 27.5 to Q3 = 76.5, near-zero skew (-0.05), and a platykurtic shape (kurtosis -1.19), all consistent with an approximately flat distribution across the full 1–100 range. With 82 unique values among the 83 non-null rows, exactly one rank is shared by two records, which points to a single tie or duplicate entry rather than binned scoring. The 9.78% null rate (9 rows) warrants attention: records missing this field may represent unranked or newly added sites.
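
The uniformity claim can be checked against the theoretical moments of a complete 1 to 100 ranking. The sketch below computes population moments for the integers 1 through 100. The reported figures (mean 51.2, std 29.4, kurtosis -1.19) sit close to these values. Small gaps are expected because only 83 ranks are present, and saturn may use sample rather than population estimators, which is an assumption here.

```python
import math


def moments(values):
    """Population mean, standard deviation and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, math.sqrt(m2), m4 / m2 ** 2 - 3.0


# A complete ranking with every position from 1 to 100 used once.
mean, std, kurt = moments(list(range(1, 101)))
print(round(mean, 1), round(std, 2), round(kurt, 2))  # 50.5 28.87 -1.2
```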

wave_rank medium anthropic:default

This column appears to be a numeric rank tied to WAVE, most likely the automated accessibility checker mentioned in the dataset summary rather than a survey wave, although the data does not say so explicitly. Values range from 1,183 to 991,094, a span that suggests ranking within a population of roughly a million entries rather than a small cohort. The distribution is moderately right-skewed (skew ≈ 0.96) with a wide IQR of 395,414.5 and a mean (301,136) well above the median (203,569), so most values sit at the low end of the range with a long tail toward high rank numbers. All 83 non-null values are unique, consistent with a rank-like identifier, yet about 9.8% of rows are null, which warrants investigation before use.

Numeric correlation

Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
	wave_rank	popularity_rank	errors	error_density
wave_rank	+1.00	+0.01	+0.77	+0.52
popularity_rank	+0.01	+1.00	+0.06	-0.09
errors	+0.77	+0.06	+1.00	+0.28
error_density	+0.52	-0.09	+0.28	+1.00
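
For reference, a minimal sketch of how a Pearson coefficient like those above can be computed when columns contain nulls. Dropping any pair with a missing value (pairwise deletion) is an assumption about how the matrix was produced. The function name and the toy inputs are illustrative and are not taken from the audit file.

```python
import math


def pearson(xs, ys):
    """Pearson r over pairs where both values are present (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Toy values only; rows with a missing value on either side are skipped.
rank = [1, 2, 3, None, 5]
errs = [2, 4, 6, 9, None]
print(round(pearson(rank, errs), 2))  # 1.0
```

Pearson's r is sensitive to extreme values, so with columns as skewed as errors and error_density a rank-based coefficient may be a useful cross-check.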

domain categorical

92 singleton categories

rows	92
null	0 (0.0%)
unique	92
top_value	google.com
top_rate	0.011
cardinality	92
entropy	6.524
entropy_ratio	1.000

Top values for domain (20 unique shown, of 92 total).
value	count	share
google.com	1	1.1%
youtube.com	1	1.1%
facebook.com	1	1.1%
wikipedia.org	1	1.1%
instagram.com	1	1.1%
bing.com	1	1.1%
reddit.com	1	1.1%
x.com	1	1.1%
chatgpt.com	1	1.1%
yandex.ru	1	1.1%
whatsapp.com	1	1.1%
amazon.com	1	1.1%
yahoo.com	1	1.1%
weather.com	1	1.1%
duckduckgo.com	1	1.1%
microsoftonline.com	1	1.1%
twitch.tv	1	1.1%
linkedin.com	1	1.1%
live.com	1	1.1%
fandom.com	1	1.1%

wave_rank numeric

rows	92
null	9 (9.8%)
unique	83
min	1,183
max	991,094
mean	301,136
median	203,569
std	274,563
q1	82,214
q3	477,628
iqr	395,414
skew	0.963
kurtosis	-0.188
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

Histogram bins for wave_rank (median: 203569.0).
bin	count
1183 – 1.112e+05	27
1.112e+05 – 2.212e+05	17
2.212e+05 – 3.312e+05	10
3.312e+05 – 4.411e+05	5
4.411e+05 – 5.511e+05	7
5.511e+05 – 6.611e+05	6
6.611e+05 – 7.711e+05	2
7.711e+05 – 8.811e+05	4
8.811e+05 – 9.911e+05	5
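
The bin edges above are consistent with nine equal-width bins between the column minimum and maximum. That saturn bins this way is an inference from the table, and the helper names below are hypothetical. A minimal sketch:

```python
def equal_width_edges(lo, hi, bins):
    """Return bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [hi]


def bin_index(value, lo, hi, bins):
    """Index of the bin holding value; the maximum falls in the last bin."""
    if value >= hi:
        return bins - 1
    return int((value - lo) / ((hi - lo) / bins))


# Minimum and maximum reported for wave_rank.
edges = equal_width_edges(1183, 991094, 9)
print([f"{e:.4g}" for e in edges[:3]])  # ['1183', '1.112e+05', '2.212e+05']
```

Equal-width bins put 27 of the 83 values in the first bin. For a right-skewed column, quantile-based bins would spread the counts more evenly.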

popularity_rank numeric

rows	92
null	9 (9.8%)
unique	82
min	1.000
max	100.000
mean	51.229
median	52.000
std	29.388
q1	27.500
q3	76.500
iqr	49.000
skew	-0.050
kurtosis	-1.192
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

Histogram bins for popularity_rank (median: 52.0).
bin	count
1 – 12	11
12 – 23	5
23 – 34	10
34 – 45	10
45 – 56	9
56 – 67	9
67 – 78	9
78 – 89	9
89 – 100	11

errors numeric

skew=+3.80; 9.6% of non-null rows beyond 1.5×IQR

rows	92
null	9 (9.8%)
unique	36
min	0.000
max	364.000
mean	27.169
median	5.000
std	57.198
q1	2.000
q3	26.500
iqr	24.500
skew	3.803
kurtosis	16.120
n_outliers	8
outlier_rate	0.096
zero_rate	0.060

Histogram bins for errors (median: 5.0).
bin	count
0 – 40.44	70
40.44 – 80.89	7
80.89 – 121.3	1
121.3 – 161.8	2
161.8 – 202.2	0
202.2 – 242.7	2
242.7 – 283.1	0
283.1 – 323.6	0
323.6 – 364	1

error_density numeric

skew=+3.23; 9.6% of non-null rows beyond 1.5×IQR

rows	92
null	9 (9.8%)
unique	64
min	0.000
max	0.303
mean	0.027
median	5.70e-03
std	0.054
q1	2.50e-03
q3	0.025
iqr	0.023
skew	3.229
kurtosis	10.949
n_outliers	8
outlier_rate	0.096
zero_rate	0.060

Histogram bins for error_density (median: 0.0057).
bin	count
0 – 0.03367	65
0.03367 – 0.06733	10
0.06733 – 0.101	2
0.101 – 0.1347	2
0.1347 – 0.1683	0
0.1683 – 0.202	1
0.202 – 0.2357	2
0.2357 – 0.2693	0
0.2693 – 0.303	1

notes categorical

24 singleton categories

rows	92
null	0 (0.0%)
unique	38
top_value	Low contrast text
top_rate	0.130
cardinality	38
entropy	4.663
entropy_ratio	0.889

Top values for notes (20 unique shown, of 38 total).
value	count	share
Low contrast text	12	13.0%
Missing form input label	9	9.8%
Low contrast text, missing alt text	9	9.8%
Missing alt text	5	5.4%
Asia-based	5	5.4%
No detected errors	5	5.4%
Low contrast text, empty link	4	4.3%
Low contrast text, missing alt text, empty link, empty button	4	4.3%
No data	3	3.3%
Low contrast text, missing alt text, missing labels	3	3.3%
Low contrast text, missing alt text, empty button	3	3.3%
Missing document language	2	2.2%
Missing alt text, missing document language	2	2.2%
Missing form input label, empty button	2	2.2%
Low contrast text, missing alt text, empty link, missing form labels, empty button	1	1.1%
Low contrast text, missing alt text, empty links	1	1.1%
Missing alt text, missing form labels, empty buttons	1	1.1%
Missing form input label, missing document language	1	1.1%
Low contrast text, missing alt text, missing language	1	1.1%
Missing alt text, missing form labels	1	1.1%
