saturn·

accessibility atlas cdc dhds disability prevalence

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/accessibility-atlas/cdc_dhds_disability_prevalence.csv

Saturn profiled 3,592 rows across 30 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/accessibility-atlas/cdc_dhds_disability_prevalence.csv",
    "--findings", "accessibility-atlas-cdc_dhds_disability_prevalence.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset contains 3,592 BRFSS-derived records of age-adjusted disability prevalence among U.S. adults aged 18 and older, broken out by location (65 values covering states, territories, and HHS regional aggregates), year (2016-2022), and 8 response categories (six disability types plus "Any Disability" and "No Disability"). The core measure is Data_Value (percent prevalence), which ranges from 1.8% to 81.3% with a median of 9.1%; the distribution is right-skewed and flagged for outliers. Most metadata columns (Category, Indicator, DataSource, Stratification1, etc.) hold a single constant value, so they are useless as filters and can be ignored. The two things worth a closer look are how Data_Value varies across the 8 Response categories and how it varies geographically via LocationDesc. Response is perfectly balanced in row counts (449 each) and LocationDesc is nearly so (56 rows for most locations), so any variation will come from the prevalence values themselves.

citing: row_count · column_count · Data_Value · Response · LocationDesc · Year · WeightedNumber · Category

Out[4]:

saturn.schema() · 30 columns

column kind n null% unique alerts
Year numeric 3,592 0.0% 7
LocationAbbr categorical 3,592 0.0% 65
LocationDesc categorical 3,592 0.0% 65
DataSource categorical 3,592 0.0% 1 imbalance
Category categorical 3,592 0.0% 1 imbalance
Indicator categorical 3,592 0.0% 1 imbalance
Response categorical 3,592 0.0% 8
Data_Value_Unit categorical 3,592 0.0% 1 imbalance
Data_Value_Type categorical 3,592 0.0% 1 imbalance
Data_Value numeric 3,592 0.1% 486 outliers
Data_Value_Alt numeric 3,592 0.1% 486 outliers
Data_Value_Footnote_Symbol categorical 3,592 99.9% 1 null_rate imbalance
Data_Value_Footnote categorical 3,592 99.9% 1 null_rate imbalance
Low_Confidence_Limit numeric 3,592 0.1% 489 outliers
High_Confidence_Limit numeric 3,592 0.1% 503 outliers
Number numeric 3,592 0.1% 2,267 high_skew outliers
WeightedNumber numeric 3,592 0.1% 3,580 high_skew outliers
StratificationCategory1 categorical 3,592 0.0% 1 imbalance
Stratification1 categorical 3,592 0.0% 1 imbalance
StratificationCategory2 unknown 3,592 0.0% skipped
Stratification2 unknown 3,592 0.0% skipped
CategoryID categorical 3,592 0.0% 1 imbalance
IndicatorID categorical 3,592 0.0% 1 imbalance
LocationID numeric 3,592 0.0% 65
ResponseID categorical 3,592 0.0% 8
DataValueTypeID categorical 3,592 0.0% 1 imbalance
StratificationCategoryID1 categorical 3,592 0.0% 1 imbalance
StratificationID1 categorical 3,592 0.0% 1 imbalance
StratificationCategoryID2 unknown 3,592 0.0% skipped
StratificationID2 unknown 3,592 0.0% skipped
Fig 1.
Data_Value · Prevalence percentages are right-skewed with a median near 9.1% and a maximum of 81.3%. The upper tail is not continuous: no values fall between 43.54% and 57.45%, and a separate cluster of 449 rows sits between 57.45% and 81.3%.
Histogram bins for Data_Value (median: 9.1).
bin  count
1.8 – 3.788  442
3.788 – 5.775  606
5.775 – 7.763  551
7.763 – 9.75  297
9.75 – 11.74  343
11.74 – 13.73  237
13.73 – 15.71  112
15.71 – 17.7  58
17.7 – 19.69  38
19.69 – 21.68  50
21.68 – 23.66  84
23.66 – 25.65  88
25.65 – 27.64  79
27.64 – 29.62  66
29.62 – 31.61  29
31.61 – 33.6  25
33.6 – 35.59  14
35.59 – 37.57  7
37.57 – 39.56  8
39.56 – 41.55  2
41.55 – 43.54  2
43.54 – 45.52  0
45.52 – 47.51  0
47.51 – 49.5  0
49.5 – 51.49  0
51.49 – 53.48  0
53.48 – 55.46  0
55.46 – 57.45  0
57.45 – 59.44  3
59.44 – 61.42  5
61.42 – 63.41  8
63.41 – 65.4  10
65.4 – 67.39  21
67.39 – 69.38  25
69.38 – 71.36  44
71.36 – 73.35  76
73.35 – 75.34  81
75.34 – 77.33  92
77.33 – 79.31  66
79.31 – 81.3  18
Fig 2.
Response · All 8 disability types appear in equal counts (449 each), confirming a balanced design across Cognitive, Mobility, Vision, Hearing, Self-care, Independent Living, Any, and No Disability.
Top values for Response (8 unique shown, of 8 total).
value  count  share
Cognitive Disability  449  12.5%
No Disability  449  12.5%
Mobility Disability  449  12.5%
Independent Living Disability  449  12.5%
Any Disability  449  12.5%
Vision Disability  449  12.5%
Self-care Disability  449  12.5%
Hearing Disability  449  12.5%
Fig 3.
Year · Records span 2016-2022. Coverage is close to even but not identical: each year has between 504 and 520 rows, so confirm which locations are missing in a given year before doing any trend analysis.
Histogram bins for Year (median: 2019.0). The 33 empty bins between the seven observed years are omitted.
year  count
2016  520
2017  512
2018  512
2019  504
2020  512
2021  512
2022  520
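The per-year counts can be turned into a quick coverage check. The sketch below assumes (the report does not state this) that every location reporting in a given year contributes exactly one row per Response value; under that assumption the row counts imply how many locations reported each year.

```python
# Per-year row counts as reported in the Year histogram above.
rows_per_year = {
    2016: 520, 2017: 512, 2018: 512, 2019: 504,
    2020: 512, 2021: 512, 2022: 520,
}
N_RESPONSES = 8  # Response has 8 values with 449 rows each

# Assumption: one row per (location, year, response) combination.
locations_per_year = {
    year: rows // N_RESPONSES for year, rows in rows_per_year.items()
}
total_rows = sum(rows_per_year.values())
location_years = sum(locations_per_year.values())

print(locations_per_year)
print(total_rows, location_years)
```

If the assumption holds, only 2016 and 2022 include all 65 locations, and the implied 449 location-year pairs match the 449 rows reported per Response value.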
Fig 4.
LocationDesc · 65 locations (states, DC, territories, and HHS regional aggregates). Each of the 20 values shown contributes 56 rows (7 years × 8 response types), but 65 × 56 = 3,640 exceeds the 3,592 rows present, so some locations must contribute fewer. Useful for confirming geographic completeness before mapping prevalence.
Top values for LocationDesc (20 unique shown, of 65 total).
value  count  share
Pennsylvania  56  1.6%
Louisiana  56  1.6%
Arkansas  56  1.6%
Wyoming  56  1.6%
Alaska  56  1.6%
Maryland  56  1.6%
Guam  56  1.6%
Massachusetts  56  1.6%
West Virginia  56  1.6%
Utah  56  1.6%
North Dakota  56  1.6%
North Carolina  56  1.6%
Ohio  56  1.6%
South Dakota  56  1.6%
Connecticut  56  1.6%
Oregon  56  1.6%
Minnesota  56  1.6%
HHS Region 6  56  1.6%
Michigan  56  1.6%
HHS Region 8  56  1.6%
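One way to confirm geographic completeness is to compare the observed (location, year) pairs against the full grid. This is a minimal sketch on toy pairs; the location names below are placeholders, not values from the dataset.

```python
from itertools import product

# Toy (location, year) pairs standing in for the real rows (illustrative only).
observed = {
    ("Alpha", 2016), ("Alpha", 2017), ("Alpha", 2018),
    ("Beta", 2016), ("Beta", 2018),
}

locations = sorted({loc for loc, _ in observed})
years = sorted({year for _, year in observed})

# Every pair in the full grid that has no rows in the data.
missing = [pair for pair in product(locations, years) if pair not in observed]
print(missing)  # [('Beta', 2017)]
```

On the real file, the observed set would be built from the LocationDesc and Year columns; any pair listed in `missing` is a location-year with no rows.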
Fig 5.
WeightedNumber · Weighted population estimates range from ~1.6K to 181M with extreme skew (kurtosis ~262). The largest values exceed any single state's adult population, so aggregate rows (such as the HHS regions) likely dominate the tail.
Histogram bins for WeightedNumber (median: 418252.0).
bin  count
1641 – 4.532e+06  3285
4.532e+06 – 9.063e+06  156
9.063e+06 – 1.359e+07  42
1.359e+07 – 1.812e+07  40
1.812e+07 – 2.265e+07  15
2.265e+07 – 2.718e+07  4
2.718e+07 – 3.172e+07  19
3.172e+07 – 3.625e+07  12
3.625e+07 – 4.078e+07  0
4.078e+07 – 4.531e+07  0
4.531e+07 – 4.984e+07  0
4.984e+07 – 5.437e+07  0
5.437e+07 – 5.89e+07  0
5.89e+07 – 6.343e+07  1
6.343e+07 – 6.796e+07  5
6.796e+07 – 7.249e+07  0
7.249e+07 – 7.702e+07  1
7.702e+07 – 8.155e+07  0
8.155e+07 – 8.608e+07  0
8.608e+07 – 9.061e+07  0
9.061e+07 – 9.514e+07  0
9.514e+07 – 9.967e+07  0
9.967e+07 – 1.042e+08  0
1.042e+08 – 1.087e+08  0
1.087e+08 – 1.133e+08  0
1.133e+08 – 1.178e+08  0
1.178e+08 – 1.223e+08  0
1.223e+08 – 1.269e+08  0
1.269e+08 – 1.314e+08  0
1.314e+08 – 1.359e+08  0
1.359e+08 – 1.404e+08  0
1.404e+08 – 1.45e+08  0
1.45e+08 – 1.495e+08  0
1.495e+08 – 1.54e+08  0
1.54e+08 – 1.586e+08  0
1.586e+08 – 1.631e+08  0
1.631e+08 – 1.676e+08  1
1.676e+08 – 1.722e+08  2
1.722e+08 – 1.767e+08  0
1.767e+08 – 1.812e+08  4
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column  kind  null %
Year  numeric  0.0%
LocationAbbr  categorical  0.0%
LocationDesc  categorical  0.0%
DataSource  categorical  0.0%
Category  categorical  0.0%
Indicator  categorical  0.0%
Response  categorical  0.0%
Data_Value_Unit  categorical  0.0%
Data_Value_Type  categorical  0.0%
Data_Value  numeric  0.1%
Data_Value_Alt  numeric  0.1%
Data_Value_Footnote_Symbol  categorical  99.9%
Data_Value_Footnote  categorical  99.9%
Low_Confidence_Limit  numeric  0.1%
High_Confidence_Limit  numeric  0.1%
Number  numeric  0.1%
WeightedNumber  numeric  0.1%
StratificationCategory1  categorical  0.0%
Stratification1  categorical  0.0%
StratificationCategory2  unknown  0.0%
Stratification2  unknown  0.0%
CategoryID  categorical  0.0%
IndicatorID  categorical  0.0%
LocationID  numeric  0.0%
ResponseID  categorical  0.0%
DataValueTypeID  categorical  0.0%
StratificationCategoryID1  categorical  0.0%
StratificationID1  categorical  0.0%
StratificationCategoryID2  unknown  0.0%
StratificationID2  unknown  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 8 numeric columns (values clipped to 2 decimals).
column  Year  Data_Value  Data_Value_Alt  Low_Confidence_Limit  High_Confidence_Limit  Number  WeightedNumber  LocationID
Year  +1.00  +0.02  +0.02  +0.02  +0.02  +0.03  +0.04  +0.03
Data_Value  +0.02  +1.00  +1.00  +1.00  +1.00  +0.25  +0.23  -0.00
Data_Value_Alt  +0.02  +1.00  +1.00  +1.00  +1.00  +0.25  +0.23  -0.00
Low_Confidence_Limit  +0.02  +1.00  +1.00  +1.00  +1.00  +0.26  +0.23  -0.00
High_Confidence_Limit  +0.02  +1.00  +1.00  +1.00  +1.00  +0.25  +0.22  -0.00
Number  +0.03  +0.25  +0.25  +0.26  +0.25  +1.00  +0.98  -0.00
WeightedNumber  +0.04  +0.23  +0.23  +0.23  +0.22  +0.98  +1.00  -0.01
LocationID  +0.03  -0.00  -0.00  -0.00  -0.00  -0.00  -0.01  +1.00
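The matrix contains several near-perfect correlations, which usually signals redundant columns. The sketch below scans the reported values for pairs at or above a cut-off; the 0.95 threshold is an arbitrary choice for illustration, not something saturn applies.

```python
# Correlation values copied from the matrix above (rounded to 2 decimals).
cols = [
    "Year", "Data_Value", "Data_Value_Alt", "Low_Confidence_Limit",
    "High_Confidence_Limit", "Number", "WeightedNumber", "LocationID",
]
corr = [
    [1.00, 0.02, 0.02, 0.02, 0.02, 0.03, 0.04, 0.03],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.25, 0.23, -0.00],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.25, 0.23, -0.00],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.26, 0.23, -0.00],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.25, 0.22, -0.00],
    [0.03, 0.25, 0.25, 0.26, 0.25, 1.00, 0.98, -0.00],
    [0.04, 0.23, 0.23, 0.23, 0.22, 0.98, 1.00, -0.01],
    [0.03, -0.00, -0.00, -0.00, -0.00, -0.00, -0.01, 1.00],
]

THRESHOLD = 0.95  # hypothetical cut-off for "effectively redundant"

# Walk the upper triangle so each pair is reported once.
redundant = [
    (cols[i], cols[j], corr[i][j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr[i][j]) >= THRESHOLD
]
for left, right, r in redundant:
    print(f"{left} ~ {right}: {r:+.2f}")
```

The flagged pairs fall into two groups: Data_Value with its alternate encoding and confidence limits, and Number with WeightedNumber. Feeding more than one column from either group into a linear model would introduce strong collinearity.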

Year numeric timestamp

Year spans 2016 to 2022 with only 7 unique values across 3,592 rows, no nulls, and a symmetric distribution centered on 2019 (mean = median = 2019, skew 0). Despite being typed numeric, it functions as a low-cardinality temporal category. There are no outliers and no zeros, so the field is clean.

Treatment: Treat as an ordinal/categorical year for grouping or one-hot encoding rather than a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence high
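The treatment above could look like the following in pandas. This is a sketch on a toy frame (the four rows are made up); it uses `pd.CategoricalDtype` to keep the year ordering and `pd.get_dummies` for one-hot encoding.

```python
import pandas as pd

# Toy rows standing in for the real Year column (illustrative only).
df = pd.DataFrame({"Year": [2016, 2019, 2022, 2019]})

# Ordered categorical covering the full 2016-2022 range reported above.
year_type = pd.CategoricalDtype(categories=list(range(2016, 2023)), ordered=True)
df["Year_cat"] = df["Year"].astype(year_type)

# One-hot encoding; categorical input yields one column per declared category,
# including years that happen to be absent from this toy sample.
dummies = pd.get_dummies(df["Year_cat"], prefix="Year")
print(dummies.columns.tolist())
```

Declaring all seven categories up front keeps the encoded width stable even when a subset of the data lacks some years.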
Out[13]:

saturn.columns["Year"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 7
min 2016
max 2022
mean 2019
median 2019
std 2.008
q1 2017
q3 2021
iqr 4
skew 0
kurtosis -1.259
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 8.
Distribution of Year. Vertical dash marks the median.
Histogram bins for Year (median: 2019.0). The 33 empty bins between the seven observed years are omitted.
year  count
2016  520
2017  512
2018  512
2019  504
2020  512
2021  512
2022  520

LocationAbbr categorical foreign_key

This is a US state/territory abbreviation code (e.g., PA, LA, AR, WY, GU), serving as a geographic key. With 65 unique values across 3592 rows and a near-uniform distribution (entropy ratio 0.999, top_rate just 0.0156), most codes appear exactly 56 times — suggesting a balanced panel of states/territories repeated across another dimension. The cardinality of 65 exceeds the 50 states, indicating territories and possibly national/regional aggregates are included.

Treatment: Left-join on this code to enrich with state/territory metadata, or one-hot encode for modelling.

anthropic:claude-opus-4-7 · confidence high
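A left join on LocationAbbr might be sketched as follows. The lookup table and its `location_type` column are hypothetical, and the data values are toy numbers; `validate` and `indicator` are standard `DataFrame.merge` parameters used here to guard against duplicate keys and to surface unmatched codes.

```python
import pandas as pd

# Toy slice of the dataset (values are illustrative, not real prevalences).
df = pd.DataFrame({
    "LocationAbbr": ["PA", "GU", "HHS6", "PA"],
    "Data_Value": [10.0, 12.0, 11.0, 9.5],
})

# Hypothetical lookup table; it deliberately lacks an entry for HHS6.
lookup = pd.DataFrame({
    "LocationAbbr": ["PA", "GU"],
    "location_type": ["state", "territory"],
})

enriched = df.merge(
    lookup,
    on="LocationAbbr",
    how="left",               # keep every row of the dataset
    validate="many_to_one",   # fail if the lookup has duplicate codes
    indicator=True,           # adds a _merge column marking unmatched rows
)
unmatched = sorted(
    enriched.loc[enriched["_merge"] == "left_only", "LocationAbbr"].unique()
)
print(unmatched)  # ['HHS6']
```

Checking `unmatched` matters here because the 65 codes include regional aggregates such as HHS6 that a states-only lookup would not cover.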
Out[16]:

saturn.columns["LocationAbbr"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 65
top_value PA
top_rate 0.01559
cardinality 65
entropy 6.017
entropy_ratio 0.9992
Fig 9.
Top values for LocationAbbr.
Top values for LocationAbbr (20 unique shown, of 65 total).
value  count  share
PA  56  1.6%
LA  56  1.6%
AR  56  1.6%
WY  56  1.6%
AK  56  1.6%
MD  56  1.6%
GU  56  1.6%
MA  56  1.6%
WV  56  1.6%
UT  56  1.6%
ND  56  1.6%
NC  56  1.6%
OH  56  1.6%
SD  56  1.6%
CT  56  1.6%
OR  56  1.6%
MN  56  1.6%
HHS6  56  1.6%
MI  56  1.6%
HHS8  56  1.6%

LocationDesc categorical feature

LocationDesc is a location name field with 65 distinct values, including states, DC, territories like Guam, and aggregates such as HHS Region 6. The distribution is close to uniform: entropy_ratio is 0.999 and all 20 values shown tie at 56 occurrences. Because 65 × 56 = 3,640 exceeds the 3,592 rows present, this is a nearly balanced panel in which a few locations contribute fewer rows. No nulls and a tidy, closed vocabulary.

Treatment: Use as a categorical grouping key; one-hot or target-encode if modelling.

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["LocationDesc"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 65
top_value Pennsylvania
top_rate 0.01559
cardinality 65
entropy 6.017
entropy_ratio 0.9992
Fig 10.
Top values for LocationDesc.
Top values for LocationDesc (20 unique shown, of 65 total).
value  count  share
Pennsylvania  56  1.6%
Louisiana  56  1.6%
Arkansas  56  1.6%
Wyoming  56  1.6%
Alaska  56  1.6%
Maryland  56  1.6%
Guam  56  1.6%
Massachusetts  56  1.6%
West Virginia  56  1.6%
Utah  56  1.6%
North Dakota  56  1.6%
North Carolina  56  1.6%
Ohio  56  1.6%
South Dakota  56  1.6%
Connecticut  56  1.6%
Oregon  56  1.6%
Minnesota  56  1.6%
HHS Region 6  56  1.6%
Michigan  56  1.6%
HHS Region 8  56  1.6%

DataSource categorical metadata

This column records the dataset's provenance, with every one of the 3592 rows tagged "BRFSS". Cardinality is 1 and entropy is 0, so it carries no discriminative signal.

Treatment: Drop; constant column adds no information.

anthropic:claude-opus-4-7 · confidence high
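Since many columns in this dataset share the same "constant, drop" treatment, they can be removed in one pass. The sketch below runs on a toy frame (values are illustrative) and counts nulls as a distinct value, so a mostly-null flag column is not mistaken for a constant.

```python
import pandas as pd

# Toy frame with two constant columns and two informative ones.
df = pd.DataFrame({
    "DataSource": ["BRFSS", "BRFSS", "BRFSS"],
    "Data_Value_Unit": ["%", "%", "%"],
    "Data_Value_Footnote": [None, "Data suppressed", None],
    "Data_Value": [9.1, None, 12.4],
})

# dropna=False treats a null as its own value, so only truly constant
# columns (one value on every row) are selected.
constant_cols = [
    col for col in df.columns if df[col].nunique(dropna=False) == 1
]
trimmed = df.drop(columns=constant_cols)
print(constant_cols)  # ['DataSource', 'Data_Value_Unit']
```

With the default `dropna=True`, Data_Value_Footnote would also report a single unique value and be dropped, which would lose the suppression marker.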
Out[22]:

saturn.columns["DataSource"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value BRFSS
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 11.
Top values for DataSource.
Top values for DataSource (1 unique shown, of 1 total).
value  count  share
BRFSS  3592  100.0%

Category categorical metadata

This column is a single-valued tag labeling every row as "Disability Estimates" across all 3592 records. With cardinality of 1, top_rate of 1.0, and entropy of 0.0, it carries no information for modelling or filtering.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["Category"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value Disability Estimates
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 12.
Top values for Category.
Top values for Category (1 unique shown, of 1 total).
value  count  share
Disability Estimates  3592  100.0%

Indicator categorical metadata

This column holds a single constant string ('Disability status and types among adults 18 years of age or older') across all 3,592 rows, with cardinality 1 and entropy 0. It carries no information for modelling and likely just labels the survey indicator the dataset was filtered to.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["Indicator"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value Disability status and types among adults 18 years of age or older
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 13.
Top values for Indicator.
Top values for Indicator (1 unique shown, of 1 total).
value  count  share
Disability status and types among adults 18 years of age or older  3592  100.0%

Response categorical label

This column enumerates a disability response category, with 8 distinct values such as 'Cognitive Disability', 'No Disability', and 'Hearing Disability'. The distribution is perfectly uniform — each of the 8 values appears exactly 449 times (top_rate 0.125, entropy_ratio 1.0), indicating the dataset is balanced or pivoted by category rather than sampled organically. There are no nulls.

Treatment: Use as a categorical label; one-hot or factor encode for modelling.

anthropic:claude-opus-4-7 · confidence high
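The reported entropy of 3 and entropy_ratio of 1 can be reproduced from the counts. The sketch assumes saturn uses Shannon entropy in bits and defines entropy_ratio as entropy divided by log2 of the cardinality; the report does not document either definition, though both are consistent with the numbers shown.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Each of the 8 Response values appears 449 times.
response_counts = [449] * 8

entropy = entropy_bits(response_counts)
# Assumed definition: entropy relative to the maximum for this cardinality.
entropy_ratio = entropy / math.log2(len(response_counts))

print(entropy, entropy_ratio)
```

A perfectly uniform column reaches the maximum of log2(8) = 3 bits, so the ratio is 1; any imbalance would pull both figures down.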
Out[31]:

saturn.columns["Response"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 8
top_value Cognitive Disability
top_rate 0.125
cardinality 8
entropy 3
entropy_ratio 1
Fig 14.
Top values for Response.
Top values for Response (8 unique shown, of 8 total).
value  count  share
Cognitive Disability  449  12.5%
No Disability  449  12.5%
Mobility Disability  449  12.5%
Independent Living Disability  449  12.5%
Any Disability  449  12.5%
Vision Disability  449  12.5%
Self-care Disability  449  12.5%
Hearing Disability  449  12.5%

Data_Value_Unit categorical metadata

This column records the unit of measurement for the data values, and it is constant: every one of the 3592 rows carries the value "%". With cardinality 1, entropy 0, and top_rate 1.0, it provides no information for modelling or segmentation.

Treatment: Drop; constant column carrying no signal.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["Data_Value_Unit"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value %
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 15.
Top values for Data_Value_Unit.
Top values for Data_Value_Unit (1 unique shown, of 1 total).
value  count  share
%  3592  100.0%

Data_Value_Type categorical metadata

This column records the type of data value reported, but every one of the 3592 rows holds the single label "Age-adjusted Prevalence". Cardinality is 1 and entropy is 0, so the field carries no information for modelling or segmentation. It likely exists as a schema placeholder from a wider source where multiple value types are possible.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["Data_Value_Type"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value Age-adjusted Prevalence
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 16.
Top values for Data_Value_Type.
Top values for Data_Value_Type (1 unique shown, of 1 total).
value  count  share
Age-adjusted Prevalence  3592  100.0%

Data_Value numeric feature

Data_Value is a continuous numeric measurement spanning 1.8 to 81.3 with a median of 9.1 but a mean of 18.25, indicating heavy right skew (skew 1.88, kurtosis 2.09). The profiler flags 450 outliers (12.5% of rows), and the standard deviation (22.16) exceeds the mean. The histogram points to a mixture rather than a single long tail: 449 values, the same count as one Response category, form a separate cluster between 57.45 and 81.3, most plausibly the "No Disability" rows. Nulls are negligible (0.14%) and there are no zeros; the 486 unique values across 3,592 rows are consistent with percentages reported at limited decimal precision.

Treatment: Group or split by Response before modelling; within a single response type, log-transform or winsorize if skew remains. Transforming the pooled column would treat what appears to be an entire response category as outliers.

anthropic:claude-opus-4-7 · confidence high
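The "beyond 1.5 IQR" alert can be made concrete by computing the fences from the reported quartiles. This sketch assumes saturn uses the conventional Tukey rule (q1 - 1.5 × IQR and q3 + 1.5 × IQR); the report states the multiplier but not the exact formula.

```python
# Quartiles for Data_Value as reported in the stats table.
q1, q3 = 5.3, 19.95
data_min, data_max = 1.8, 81.3

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # assumed Tukey lower fence
upper_fence = q3 + 1.5 * iqr   # assumed Tukey upper fence

print(f"IQR = {iqr:.2f}")
print(f"fences = [{lower_fence:.3f}, {upper_fence:.3f}]")
print("low-side outliers possible:", data_min < lower_fence)
```

The lower fence is negative, so every flagged row lies on the high side. The upper fence of about 41.9 sits just below the empty stretch in the histogram, which is consistent with the high cluster accounting for nearly all of the 450 flagged rows.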
Out[40]:

saturn.columns["Data_Value"].stats

stat value
n 3,592
nulls 5 (0.1%)
unique 486
min 1.8
max 81.3
mean 18.25
median 9.1
std 22.16
q1 5.3
q3 19.95
iqr 14.65
skew 1.876
kurtosis 2.086
n_outliers 450
outlier_rate 0.1255
zero_rate 0
alert: outliers · 12.5% rows beyond 1.5 IQR
Fig 17.
Distribution of Data_Value. Vertical dash marks the median.
Histogram bins for Data_Value (median: 9.1).
bin  count
1.8 – 3.788  442
3.788 – 5.775  606
5.775 – 7.763  551
7.763 – 9.75  297
9.75 – 11.74  343
11.74 – 13.73  237
13.73 – 15.71  112
15.71 – 17.7  58
17.7 – 19.69  38
19.69 – 21.68  50
21.68 – 23.66  84
23.66 – 25.65  88
25.65 – 27.64  79
27.64 – 29.62  66
29.62 – 31.61  29
31.61 – 33.6  25
33.6 – 35.59  14
35.59 – 37.57  7
37.57 – 39.56  8
39.56 – 41.55  2
41.55 – 43.54  2
43.54 – 45.52  0
45.52 – 47.51  0
47.51 – 49.5  0
49.5 – 51.49  0
51.49 – 53.48  0
53.48 – 55.46  0
55.46 – 57.45  0
57.45 – 59.44  3
59.44 – 61.42  5
61.42 – 63.41  8
63.41 – 65.4  10
65.4 – 67.39  21
67.39 – 69.38  25
69.38 – 71.36  44
71.36 – 73.35  76
73.35 – 75.34  81
75.34 – 77.33  92
77.33 – 79.31  66
79.31 – 81.3  18

Data_Value_Alt numeric feature

A numeric measurement field whose summary statistics are identical to those of Data_Value (range 1.8 to 81.3, median 9.1, mean 18.25) and which correlates with it at +1.00, so it is most likely a numeric duplicate of that column. The distribution is heavily right-skewed (skew 1.88, kurtosis 2.09), the standard deviation (22.16) exceeds the IQR (14.65), and 12.5% of rows (450) are flagged as outliers. Only 486 distinct values across 3,592 rows suggest a rounded scale rather than a fully continuous measure.

Treatment: Keep only one of Data_Value and Data_Value_Alt. If this column is the one retained, log-transform or winsorize before modelling to tame the right skew and outlier mass.

anthropic:claude-opus-4-7 · confidence high
Out[43]:

saturn.columns["Data_Value_Alt"].stats

stat value
n 3,592
nulls 5 (0.1%)
unique 486
min 1.8
max 81.3
mean 18.25
median 9.1
std 22.16
q1 5.3
q3 19.95
iqr 14.65
skew 1.876
kurtosis 2.086
n_outliers 450
outlier_rate 0.1255
zero_rate 0
alert: outliers · 12.5% rows beyond 1.5 IQR
Fig 18.
Distribution of Data_Value_Alt. Vertical dash marks the median.
Histogram bins for Data_Value_Alt (median: 9.1).
bin  count
1.8 – 3.788  442
3.788 – 5.775  606
5.775 – 7.763  551
7.763 – 9.75  297
9.75 – 11.74  343
11.74 – 13.73  237
13.73 – 15.71  112
15.71 – 17.7  58
17.7 – 19.69  38
19.69 – 21.68  50
21.68 – 23.66  84
23.66 – 25.65  88
25.65 – 27.64  79
27.64 – 29.62  66
29.62 – 31.61  29
31.61 – 33.6  25
33.6 – 35.59  14
35.59 – 37.57  7
37.57 – 39.56  8
39.56 – 41.55  2
41.55 – 43.54  2
43.54 – 45.52  0
45.52 – 47.51  0
47.51 – 49.5  0
49.5 – 51.49  0
51.49 – 53.48  0
53.48 – 55.46  0
55.46 – 57.45  0
57.45 – 59.44  3
59.44 – 61.42  5
61.42 – 63.41  8
63.41 – 65.4  10
65.4 – 67.39  21
67.39 – 69.38  25
69.38 – 71.36  44
71.36 – 73.35  76
73.35 – 75.34  81
75.34 – 77.33  92
77.33 – 79.31  66
79.31 – 81.3  18

Data_Value_Footnote_Symbol categorical metadata

This appears to be a footnote symbol marker, almost entirely empty with a 99.86% null rate and only 5 non-null entries — all the single character '*'. With cardinality of 1 and entropy of 0, the column carries no discriminative information.

Treatment: Drop; effectively constant with 99.86% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[46]:

saturn.columns["Data_Value_Footnote_Symbol"].stats

stat value
n 3,592
nulls 3,587 (99.9%)
unique 1
top_value *
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 99.9% null
alert: imbalance · top value is 100.0% of rows
Fig 19.
Top values for Data_Value_Footnote_Symbol.
Top values for Data_Value_Footnote_Symbol (1 unique shown, of 1 total).
value  count  share
*  5  0.1%

Data_Value_Footnote categorical metadata

This column is a footnote/annotation field accompanying a Data_Value column, used to flag exceptional rows. It is effectively empty: 99.86% null, with only 5 non-null entries, all carrying the single value "Data suppressed" (cardinality 1, entropy 0). It carries no discriminative information on its own and only marks the handful of suppressed measurements.

Treatment: Convert to a boolean is_suppressed flag and drop the original column.

anthropic:claude-opus-4-7 · confidence high
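The suggested conversion to a boolean flag could be sketched as below. The frame is a toy example, and pairing the footnote with a null Data_Value is an assumption suggested by the matching counts (5 non-null footnotes, 5 null values), not something the report confirms row by row.

```python
import pandas as pd

# Toy rows; the middle row mimics a suppressed measurement (assumed pairing).
df = pd.DataFrame({
    "Data_Value": [9.1, None, 12.4],
    "Data_Value_Footnote": [None, "Data suppressed", None],
})

# A footnote is present only on suppressed rows, so non-null means suppressed.
df["is_suppressed"] = df["Data_Value_Footnote"].notna()
df = df.drop(columns=["Data_Value_Footnote"])

print(df["is_suppressed"].tolist())  # [False, True, False]
```

Before relying on the flag, it is worth confirming on the real file that `is_suppressed` and `Data_Value.isna()` agree on every row.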
Out[49]:

saturn.columns["Data_Value_Footnote"].stats

stat value
n 3,592
nulls 3,587 (99.9%)
unique 1
top_value Data suppressed
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 99.9% null
alert: imbalance · top value is 100.0% of rows
Fig 20.
Top values for Data_Value_Footnote.
Top values for Data_Value_Footnote (1 unique shown, of 1 total).
value  count  share
Data suppressed  5  0.1%

Low_Confidence_Limit numeric feature

This is the lower bound of a confidence interval; it correlates with Data_Value at +1.00, so it is presumably the lower limit for that prevalence estimate. It ranges from 1.1 to 80.5 with a median of 8.2. The distribution is heavily right-skewed (skew 1.90, kurtosis 2.16) and 12.57% of values are flagged as outliers, reflecting a cluster of high lower bounds well above the bulk of small values. Nulls are negligible (0.14%) and there are no zeros.

Treatment: Log-transform before modelling to tame the right skew, and pair with the matching upper limit.

anthropic:claude-opus-4-7 · confidence high
Out[52]:

saturn.columns["Low_Confidence_Limit"].stats

stat value
n 3,592
nulls 5 (0.1%)
unique 489
min 1.1
max 80.5
mean 17.31
median 8.2
std 21.89
q1 4.7
q3 18.7
iqr 14
skew 1.899
kurtosis 2.159
n_outliers 451
outlier_rate 0.1257
zero_rate 0
alert: outliers · 12.6% rows beyond 1.5 IQR
Fig 21.
Distribution of Low_Confidence_Limit. Vertical dash marks the median.
Histogram bins for Low_Confidence_Limit (median: 8.2).
bin  count
1.1 – 3.085  375
3.085 – 5.07  669
5.07 – 7.055  587
7.055 – 9.04  317
9.04 – 11.03  324
11.03 – 13.01  232
13.01 – 15  104
15 – 16.98  48
16.98 – 18.97  38
18.97 – 20.95  56
20.95 – 22.94  91
22.94 – 24.92  95
24.92 – 26.91  78
26.91 – 28.89  52
28.89 – 30.88  26
30.88 – 32.86  20
32.86 – 34.85  12
34.85 – 36.83  7
36.83 – 38.82  4
38.82 – 40.8  3
40.8 – 42.79  0
42.79 – 44.77  0
44.77 – 46.76  0
46.76 – 48.74  0
48.74 – 50.73  0
50.73 – 52.71  0
52.71 – 54.7  0
54.7 – 56.68  3
56.68 – 58.67  1
58.67 – 60.65  8
60.65 – 62.64  8
62.64 – 64.62  19
64.62 – 66.61  22
66.61 – 68.59  24
68.59 – 70.58  52
70.58 – 72.56  79
72.56 – 74.55  76
74.55 – 76.53  85
76.53 – 78.52  61
78.52 – 80.5  11

High_Confidence_Limit numeric feature

A numeric upper-confidence-bound feature, ranging from 2.2 to 83.0 with a median of 10.1 but a mean of 19.26, indicating a long right tail. The distribution is heavily right-skewed (skew 1.85, kurtosis 2.01) and 12.5% of values (449 rows) are flagged as outliers. With 503 unique values across 3592 rows and only 0.14% nulls, it behaves as a continuous measurement rather than a categorical bound.

Treatment: Log-transform before modelling to compress the right tail and dampen the 12.5% outlier mass.

anthropic:claude-opus-4-7 · confidence high
Out[55]:

saturn.columns["High_Confidence_Limit"].stats

stat value
n 3,592
nulls 5 (0.1%)
unique 503
min 2.2
max 83
mean 19.26
median 10.1
std 22.4
q1 6
q3 21.5
iqr 15.5
skew 1.851
kurtosis 2.011
n_outliers 449
outlier_rate 0.1252
zero_rate 0
alert: outliers · 12.5% rows beyond 1.5 IQR
Fig 22.
Distribution of High_Confidence_Limit. Vertical dash marks the median.
Histogram bins for High_Confidence_Limit (median: 10.1).
bin  count
2.2 – 4.22  407
4.22 – 6.24  580
6.24 – 8.26  534
8.26 – 10.28  290
10.28 – 12.3  347
12.3 – 14.32  257
14.32 – 16.34  137
16.34 – 18.36  70
18.36 – 20.38  46
20.38 – 22.4  49
22.4 – 24.42  76
24.42 – 26.44  92
26.44 – 28.46  75
28.46 – 30.48  72
30.48 – 32.5  34
32.5 – 34.52  21
34.52 – 36.54  26
36.54 – 38.56  11
38.56 – 40.58  7
40.58 – 42.6  4
42.6 – 44.62  3
44.62 – 46.64  0
46.64 – 48.66  0
48.66 – 50.68  0
50.68 – 52.7  0
52.7 – 54.72  0
54.72 – 56.74  0
56.74 – 58.76  0
58.76 – 60.78  3
60.78 – 62.8  3
62.8 – 64.82  8
64.82 – 66.84  10
66.84 – 68.86  20
68.86 – 70.88  26
70.88 – 72.9  49
72.9 – 74.92  78
74.92 – 76.94  96
76.94 – 78.96  88
78.96 – 80.98  55
80.98 – 83  13

Number numeric feature

This is a numeric 'Number' column, almost certainly a count or quantity metric rather than an identifier, given 2,267 unique values across 3,592 rows. Nulls are negligible (5 rows, 0.14%). The distribution is severely right-skewed (skew 14.57, kurtosis 256.99): the median is 978 while the mean is 3,780 and the max reaches 327,817, with 385 outliers (10.7%) flagged. The interquartile range (467 to 2,750, a width of 2,283) is tiny relative to the max, so a handful of extreme values dominate the variance (std 15,294).

Treatment: Log-transform (or winsorize) before any distance- or variance-based modelling.

anthropic:claude-opus-4-7 · confidence high
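To show what the two suggested treatments do to this range, the sketch below applies them to the five summary points reported for Number. The clipping step caps values at the upper 1.5 × IQR fence, which is a simple stand-in for winsorizing (true winsorizing clips at chosen percentiles); using the fence as the cap is an illustrative choice, not saturn's method.

```python
import math

# Summary points for Number as reported: min, q1, median, q3, max.
summary = [31, 467, 978, 2750, 327817]
q1, q3 = 467, 2750

# Option 1: log-transform. All values are positive, so log10 is safe here.
logged = [math.log10(v) for v in summary]
log_span = max(logged) - min(logged)

# Option 2: cap at the upper Tukey fence (a stand-in for winsorizing).
upper_fence = q3 + 1.5 * (q3 - q1)
clipped = [min(v, upper_fence) for v in summary]

print([round(v, 2) for v in logged])
print(upper_fence, clipped)
```

The log transform keeps the ordering of every row while compressing the range to roughly four orders of magnitude; clipping discards the size of the extremes entirely, which matters if the largest rows are aggregates rather than errors.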
Out[58]:

saturn.columns["Number"].stats

stat value
n 3,592
nulls 5 (0.1%)
unique 2,267
min 31
max 327,817
mean 3780
median 978
std 1.529e+04
q1 467
q3 2,750
iqr 2,283
skew 14.57
kurtosis 257
n_outliers 385
outlier_rate 0.1073
zero_rate 0
alert: high_skew · skew=+14.57
alert: outliers · 10.7% rows beyond 1.5 IQR
Fig 23.
Distribution of Number. Vertical dash marks the median.
Histogram bins for Number (median: 978.0).
bin  count
31 – 8226  3323
8226 – 1.642e+04  124
1.642e+04 – 2.461e+04  48
2.461e+04 – 3.281e+04  35
3.281e+04 – 4.1e+04  25
4.1e+04 – 4.92e+04  9
4.92e+04 – 5.739e+04  3
5.739e+04 – 6.559e+04  0
6.559e+04 – 7.378e+04  4
7.378e+04 – 8.198e+04  2
8.198e+04 – 9.017e+04  0
9.017e+04 – 9.837e+04  0
9.837e+04 – 1.066e+05  0
1.066e+05 – 1.148e+05  1
1.148e+05 – 1.23e+05  0
1.23e+05 – 1.311e+05  3
1.311e+05 – 1.393e+05  2
1.393e+05 – 1.475e+05  1
1.475e+05 – 1.557e+05  0
1.557e+05 – 1.639e+05  0
1.639e+05 – 1.721e+05  0
1.721e+05 – 1.803e+05  0
1.803e+05 – 1.885e+05  0
1.885e+05 – 1.967e+05  0
1.967e+05 – 2.049e+05  0
2.049e+05 – 2.131e+05  0
2.131e+05 – 2.213e+05  0
2.213e+05 – 2.295e+05  0
2.295e+05 – 2.377e+05  0
2.377e+05 – 2.459e+05  0
2.459e+05 – 2.541e+05  0
2.541e+05 – 2.623e+05  0
2.623e+05 – 2.705e+05  0
2.705e+05 – 2.786e+05  2
2.786e+05 – 2.868e+05  1
2.868e+05 – 2.95e+05  2
2.95e+05 – 3.032e+05  1
3.032e+05 – 3.114e+05  0
3.114e+05 – 3.196e+05  0
3.196e+05 – 3.278e+05  1

WeightedNumber numeric feature

WeightedNumber is a numeric measure with 3,580 distinct values across 3,592 rows, ranging from 1,641 to 181,223,676 with a median of 418,252 but a mean of 2,103,449. The distribution is severely right-skewed (skew 14.65, kurtosis 262.16) and 444 rows (12.4%) fall outside the 1.5 × IQR fences, so a long tail of very large population estimates dominates the mean.

Treatment: Log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[61]:

saturn.columns["WeightedNumber"].stats

stat value
n 3,592
nulls 5 (0.1%)
unique 3,580
min 1,641
max 1.812e+08
mean 2.103e+06
median 418,252
std 9.082e+06
q1 149,677
q3 1.303e+06
iqr 1.153e+06
skew 14.65
kurtosis 262.2
n_outliers 444
outlier_rate 0.1238
zero_rate 0
alert: high_skew · skew=+14.65
alert: outliers · 12.4% rows beyond 1.5 IQR
Fig 24.
Distribution of WeightedNumber. Vertical dash marks the median.
Histogram bins for WeightedNumber (median: 418252.0).
bin  count
1641 – 4.532e+06  3285
4.532e+06 – 9.063e+06  156
9.063e+06 – 1.359e+07  42
1.359e+07 – 1.812e+07  40
1.812e+07 – 2.265e+07  15
2.265e+07 – 2.718e+07  4
2.718e+07 – 3.172e+07  19
3.172e+07 – 3.625e+07  12
3.625e+07 – 4.078e+07  0
4.078e+07 – 4.531e+07  0
4.531e+07 – 4.984e+07  0
4.984e+07 – 5.437e+07  0
5.437e+07 – 5.89e+07  0
5.89e+07 – 6.343e+07  1
6.343e+07 – 6.796e+07  5
6.796e+07 – 7.249e+07  0
7.249e+07 – 7.702e+07  1
7.702e+07 – 8.155e+07  0
8.155e+07 – 8.608e+07  0
8.608e+07 – 9.061e+07  0
9.061e+07 – 9.514e+07  0
9.514e+07 – 9.967e+07  0
9.967e+07 – 1.042e+08  0
1.042e+08 – 1.087e+08  0
1.087e+08 – 1.133e+08  0
1.133e+08 – 1.178e+08  0
1.178e+08 – 1.223e+08  0
1.223e+08 – 1.269e+08  0
1.269e+08 – 1.314e+08  0
1.314e+08 – 1.359e+08  0
1.359e+08 – 1.404e+08  0
1.404e+08 – 1.45e+08  0
1.45e+08 – 1.495e+08  0
1.495e+08 – 1.54e+08  0
1.54e+08 – 1.586e+08  0
1.586e+08 – 1.631e+08  0
1.631e+08 – 1.676e+08  1
1.676e+08 – 1.722e+08  2
1.722e+08 – 1.767e+08  0
1.767e+08 – 1.812e+08  4

StratificationCategory1 categorical metadata

This column is a stratification dimension label, but every one of the 3592 rows holds the single value "Overall" (top_rate 1.0, cardinality 1, entropy 0.0). It carries no information and likely indicates this slice of the source dataset was filtered to the un-stratified aggregate.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[64]:

saturn.columns["StratificationCategory1"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value Overall
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 25.
Top values for StratificationCategory1.
Top values for StratificationCategory1 (1 unique shown, of 1 total).
value  count  share
Overall  3592  100.0%

Stratification1 categorical metadata

This column is a stratification label that takes the single value "Overall" across all 3592 rows. With cardinality 1 and entropy 0, it carries no information and cannot differentiate records. It likely indicates that this slice of the source data was not broken out by any subgroup.

Treatment: Drop; constant column with a single value.

anthropic:claude-opus-4-7 · confidence high
Out[67]:

saturn.columns["Stratification1"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value Overall
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 26.
Top values for Stratification1.
Top values for Stratification1 (1 unique shown, of 1 total).
value  count  share
Overall  3592  100.0%

StratificationCategory2 unknown metadata

Column was skipped by the profiler, so no value-level statistics are available beyond a row count of 3592 and a null rate of 0.0. The name suggests a secondary stratification dimension used alongside a primary category, typical of public health or survey datasets. Without unique counts or value distributions, its content cannot be characterised further.

Treatment: Re-profile with the skip removed to inspect cardinality before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
Out[70]:

saturn.columns["StratificationCategory2"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique (not reported)
alert: skipped · no profiler for kind=unknown

Stratification2 unknown other

Saturn skipped detailed profiling for Stratification2, so only the row count (3592) and a 0.0 null rate are known. With no unique count, type, or value distribution available, the column's content cannot be characterised from this evidence alone. The name suggests a secondary stratification key paired with a primary Stratification1 field, but that is not confirmed by the stats.

Treatment: Re-profile or inspect raw values before deciding; do not use until kind and cardinality are established.

anthropic:claude-opus-4-7 · confidence low
Out[72]:

saturn.columns["Stratification2"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown
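
For the skipped columns, "inspect raw values" can be done directly in pandas without relying on any Saturn re-profiling option. The sketch below is an assumption-laden illustration: `inspect_column` is a hypothetical helper, and the toy values are invented, since the report does not say what Stratification2 actually contains.

```python
import pandas as pd


def inspect_column(series: pd.Series, top: int = 5) -> dict:
    """Summarise a column the profiler skipped (hypothetical helper)."""
    as_text = series.astype("string")
    # Blank or whitespace-only strings are not nulls, so count them separately.
    blank = as_text.str.strip().eq("").fillna(False)
    return {
        "dtype": str(series.dtype),
        "n": int(series.size),
        "nulls": int(series.isna().sum()),
        "blank_strings": int(blank.sum()),
        "unique": int(series.nunique(dropna=False)),
        "top_values": series.value_counts(dropna=False).head(top).to_dict(),
    }


# Invented toy values; replace with the real column read from the CSV.
toy = pd.DataFrame({"Stratification2": ["", " ", ""]})
summary = inspect_column(toy["Stratification2"])
print(summary)
```

Reporting blank strings alongside nulls is useful because a column can show a 0.0% null rate while still holding no usable values.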

CategoryID categorical metadata

CategoryID is a categorical column that carries no information: every one of the 3592 rows holds the single value "DISEST", giving cardinality 1 and entropy 0. It likely encodes a fixed dataset-level tag or filter rather than a per-row attribute.

Treatment: Drop; constant column with zero variance.

anthropic:claude-opus-4-7 · confidence high
Out[74]:

saturn.columns["CategoryID"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value DISEST
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 27.
Top values for CategoryID.
Top values for CategoryID (1 unique shown, of 1 total).
value count share
DISEST 3592 100.0%
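
The entropy and entropy_ratio figures in these stat tables are consistent with Shannon entropy in bits, normalised by log2 of the cardinality. That reading is an assumption inferred from the reported numbers (0 for a constant column such as CategoryID, 3 and 1 for the eight equally frequent ResponseID codes), not a documented Saturn formula. A minimal sketch under that assumption:

```python
import math
from collections import Counter


def shannon_entropy_bits(values) -> float:
    """Shannon entropy of the empirical value distribution, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_ratio(values) -> float:
    """Entropy divided by its maximum, log2(cardinality).

    Returns 0.0 for a constant column (assumed convention, since the
    maximum entropy is 0 there and the ratio is otherwise undefined).
    """
    cardinality = len(set(values))
    if cardinality <= 1:
        return 0.0
    return shannon_entropy_bits(values) / math.log2(cardinality)


# One value repeated 3592 times, as in CategoryID.
constant = ["DISEST"] * 3592

# Eight codes, 449 rows each, as in ResponseID.
codes = ["Q6COG", "Q6DIS2", "Q6MOB", "Q6IND", "Q6DIS1", "Q6VIS", "Q6SEL", "Q6HEAR"]
uniform = [code for code in codes for _ in range(449)]

print(shannon_entropy_bits(constant), entropy_ratio(constant))
print(shannon_entropy_bits(uniform), entropy_ratio(uniform))
```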

IndicatorID categorical metadata

IndicatorID is a categorical column that holds the single value "STATTYPE" across all 3592 rows, with zero nulls and cardinality of 1. Entropy is 0.0, so the field carries no information and likely functions as a constant tag identifying the indicator type for this slice of the dataset.

Treatment: Drop before modelling; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[77]:

saturn.columns["IndicatorID"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value STATTYPE
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 28.
Top values for IndicatorID.
Top values for IndicatorID (1 unique shown, of 1 total).
value count share
STATTYPE 3592 100.0%

LocationID numeric foreign_key

LocationID is almost certainly a categorical location key encoded as integers, with 65 distinct values across 3592 rows and no nulls. Values range from 1 to 89 with a median of 36 and mild positive skew (0.50), consistent with an ID lookup rather than a measured quantity. Treating it as numeric would be misleading despite its int dtype.

Treatment: Cast to categorical and left-join to a location lookup table rather than using as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
Out[80]:

saturn.columns["LocationID"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 65
min 1
max 89
mean 39.69
median 36
std 25.34
q1 20
q3 54
iqr 34
skew 0.5048
kurtosis -0.7622
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 29.
Distribution of LocationID. Vertical dash marks the median.
Histogram bins for LocationID (median: 36.0).
bin count
1 – 3.2 112
3.2 – 5.4 112
5.4 – 7.6 56
7.6 – 9.8 112
9.8 – 12 112
12 – 14.2 104
14.2 – 16.4 112
16.4 – 18.6 112
18.6 – 20.8 112
20.8 – 23 112
23 – 25.2 168
25.2 – 27.4 112
27.4 – 29.6 112
29.6 – 31.8 112
31.8 – 34 112
34 – 36.2 160
36.2 – 38.4 112
38.4 – 40.6 112
40.6 – 42.8 112
42.8 – 45 56
45 – 47.2 168
47.2 – 49.4 112
49.4 – 51.6 112
51.6 – 53.8 56
53.8 – 56 168
56 – 58.2 0
58.2 – 60.4 56
60.4 – 62.6 0
62.6 – 64.8 0
64.8 – 67 56
67 – 69.2 0
69.2 – 71.4 0
71.4 – 73.6 56
73.6 – 75.8 0
75.8 – 78 0
78 – 80.2 80
80.2 – 82.4 112
82.4 – 84.6 112
84.6 – 86.8 112
86.8 – 89 168
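
The treatment for LocationID (cast to categorical, left-join to a lookup) might look like the following in pandas. This is a sketch under stated assumptions: the lookup table, its `LocationName` column, and all values are hypothetical, and ID 89 is left out of the lookup on purpose to show how unmatched keys surface.

```python
import pandas as pd

# Toy rows standing in for the real table; values are invented.
facts = pd.DataFrame(
    {
        "LocationID": [1, 1, 36, 89],
        "Data_Value": [9.1, 10.2, 8.4, 12.0],
    }
)

# Hypothetical lookup table; ID 89 is deliberately missing.
locations = pd.DataFrame(
    {
        "LocationID": [1, 36],
        "LocationName": ["Location A", "Location B"],
    }
)

# Left join keeps every fact row; validate guards against duplicate lookup keys.
joined = facts.merge(
    locations,
    on="LocationID",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Keys present in the facts but absent from the lookup.
unmatched = joined.loc[joined["_merge"] == "left_only", "LocationID"].unique().tolist()

# Cast after the merge so the ID is no longer treated as a quantity.
joined["LocationID"] = joined["LocationID"].astype("category")

print(unmatched)
print(joined.dtypes)
```

Casting after the merge rather than before keeps the join on plain integers, which avoids dtype surprises when the two key columns are not the same categorical type.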

ResponseID categorical feature

ResponseID holds 8 distinct codes (Q6COG, Q6DIS2, Q6MOB, Q6IND, Q6DIS1, Q6VIS, Q6SEL, Q6HEAR), each appearing exactly 449 times across 3592 rows with no nulls. The perfectly uniform distribution and entropy ratio of 1.0 indicate a question/disability-domain identifier repeated across the aggregated location-year records rather than a unique response key; the rows are prevalence estimates, not individual respondents. Despite the name, it behaves as a categorical factor, not a row identifier.

Treatment: Treat as a categorical factor (one-hot or group-by key); do not use as a unique row id.

anthropic:claude-opus-4-7 · confidence high
Out[83]:

saturn.columns["ResponseID"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 8
top_value Q6COG
top_rate 0.125
cardinality 8
entropy 3
entropy_ratio 1
Fig 30.
Top values for ResponseID.
Top values for ResponseID (8 unique shown, of 8 total).
value count share
Q6COG 449 12.5%
Q6DIS2 449 12.5%
Q6MOB 449 12.5%
Q6IND 449 12.5%
Q6DIS1 449 12.5%
Q6VIS 449 12.5%
Q6SEL 449 12.5%
Q6HEAR 449 12.5%
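
Using ResponseID as a categorical factor rather than a row id can be sketched as follows. The toy frame is an assumption: the location abbreviations and prevalence values are invented, and only the column names and the two response codes come from the report.

```python
import pandas as pd

# Invented toy rows; two locations, two response codes.
toy = pd.DataFrame(
    {
        "LocationAbbr": ["AA", "AA", "BB", "BB"],
        "ResponseID": ["Q6COG", "Q6MOB", "Q6COG", "Q6MOB"],
        "Data_Value": [10.0, 12.0, 14.0, 16.0],
    }
)

# The codes repeat, so the column cannot identify individual rows.
is_row_id = toy["ResponseID"].is_unique

# Group-by key: mean prevalence per response code.
mean_by_response = toy.groupby("ResponseID")["Data_Value"].mean()

# One-hot encoding: one indicator column per code.
one_hot = pd.get_dummies(toy["ResponseID"], prefix="ResponseID")

print(is_row_id)
print(mean_by_response)
print(list(one_hot.columns))
```

Grouping is usually the more natural choice for descriptive summaries of Data_Value, while one-hot columns are mainly useful when the codes feed a model.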

DataValueTypeID categorical metadata

DataValueTypeID is a categorical metadata field indicating the type of statistical measure reported, but every one of the 3592 rows carries the single value 'AGEADJPREV' (age-adjusted prevalence). Cardinality is 1 and entropy is 0, so the column carries no information for modelling or filtering.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[86]:

saturn.columns["DataValueTypeID"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value AGEADJPREV
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 31.
Top values for DataValueTypeID.
Top values for DataValueTypeID (1 unique shown, of 1 total).
value count share
AGEADJPREV 3592 100.0%

StratificationCategoryID1 categorical metadata

This column holds a single constant value "CAT1" across all 3592 rows, with zero nulls and cardinality of 1. Entropy is 0, meaning it carries no information for any downstream task. The name suggests it was meant to identify a stratification category, but only one category is represented in this slice.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[89]:

saturn.columns["StratificationCategoryID1"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value CAT1
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 32.
Top values for StratificationCategoryID1.
Top values for StratificationCategoryID1 (1 unique shown, of 1 total).
value count share
CAT1 3592 100.0%

StratificationID1 categorical metadata

This column holds a single constant value 'BO1' across all 3592 rows, with cardinality 1 and entropy 0. As a 'StratificationID1' it likely encodes a stratification dimension (e.g., overall/total) that was never varied in this slice. It carries no information for modelling or grouping.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[92]:

saturn.columns["StratificationID1"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique 1
top_value BO1
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 33.
Top values for StratificationID1.
Top values for StratificationID1 (1 unique shown, of 1 total).
value count share
BO1 3592 100.0%

StratificationCategoryID2 unknown other

This column is named StratificationCategoryID2, suggesting it holds a secondary stratification category identifier in a public-health style dataset. Saturn skipped profiling, so no uniqueness, value, or distribution stats are available beyond a row count of 3592 and a null rate of 0.0%. Without further signals, its actual content and cardinality cannot be characterised here.

Treatment: Re-profile with type coercion to confirm whether this is a categorical key before use.

anthropic:claude-opus-4-7 · confidence low
Out[95]:

saturn.columns["StratificationCategoryID2"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

StratificationID2 unknown foreign_key

StratificationID2 was skipped by the profiler, so its kind, uniqueness, and value distribution are unknown. The only confirmed signals are that it has 3592 rows and a null rate of 0.0%. The name suggests a secondary stratification key (e.g., demographic subgroup) commonly paired with a StratificationCategoryID2 in CDC-style indicator tables.

Treatment: Re-profile the column to determine cardinality, then treat as a categorical join key against its stratification lookup.

anthropic:claude-opus-4-7 · confidence low
Out[97]:

saturn.columns["StratificationID2"].stats

stat value
n 3,592
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

How to cite

BibTeX
@misc{saturn-accessibility-atlas-cdc-dhds-disability-prevalence-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: accessibility atlas cdc dhds disability prevalence},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/accessibility-atlas-cdc_dhds_disability_prevalence}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: accessibility atlas cdc dhds disability prevalence. Source: /home/coolhand/datasets/accessibility-atlas/cdc_dhds_disability_prevalence.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/accessibility-atlas-cdc_dhds_disability_prevalence