saturn·

accessibility atlas cdc dhds disability prevalence

source /home/coolhand/datasets/accessibility-atlas/cdc_dhds_disability_prevalence.csv · 3,592 rows · 30 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,592 BRFSS-derived records of age-adjusted disability prevalence among U.S. adults aged 18 and over, broken out by location (65 states, territories, and other areas), year (2016-2022), and 8 disability response types. The core measure is Data_Value (percent prevalence), which ranges from 1.8% to 81.3% with a median of 9.1%. Its distribution is heavily right-skewed and flagged for outliers, plausibly because the 8 response types sit on very different scales. Most metadata columns (Category, Indicator, DataSource, Stratification1, etc.) hold a single constant value and are useless as filters. The two things worth a closer look are the distribution of Data_Value across the 8 disability types in Response, and the geographic spread via LocationDesc. Response is perfectly balanced at 449 rows per type. LocationDesc is only nearly balanced: no location has more than 56 rows (7 years × 8 response types), and a complete panel of 65 locations would need 65 × 56 = 3,640 rows, 48 more than are present. Differences between groups therefore come almost entirely from the prevalence values themselves, not from row counts.

citing: row_count · column_count · Data_Value · Response · LocationDesc · Year · WeightedNumber · Category
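The per-type comparison suggested in the summary can be sketched with the standard library alone. This is a minimal illustration, assuming rows are read as dictionaries (for example via csv.DictReader) and that suppressed estimates appear as empty strings; the helper name median_by_group and the sample rows are made up for the example and are not taken from the CSV.

```python
from collections import defaultdict
from statistics import median


def median_by_group(rows, group_key="Response", value_key="Data_Value"):
    """Return {group: median of value_key}, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_key)
        if value in (None, ""):
            continue  # assumed encoding of suppressed estimates
        groups[row[group_key]].append(float(value))
    return {name: median(values) for name, values in groups.items()}


# Made-up rows for illustration only; not real values from the dataset.
sample_rows = [
    {"Response": "No Disability", "Data_Value": "74.0"},
    {"Response": "No Disability", "Data_Value": "76.0"},
    {"Response": "Hearing Disability", "Data_Value": "5.0"},
    {"Response": "Hearing Disability", "Data_Value": "7.0"},
    {"Response": "Hearing Disability", "Data_Value": ""},
]
medians = median_by_group(sample_rows)
```

If one response type sits far above the others, the pooled skew and outlier flags describe the mixture rather than any single type.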

Schema

30 columns
Per-column summary.
Column                      Type         Nulls   Unique  Alerts
Year                        numeric      0.0%    7       -
LocationAbbr                categorical  0.0%    65      -
LocationDesc                categorical  0.0%    65      -
DataSource                  categorical  0.0%    1       imbalance
Category                    categorical  0.0%    1       imbalance
Indicator                   categorical  0.0%    1       imbalance
Response                    categorical  0.0%    8       -
Data_Value_Unit             categorical  0.0%    1       imbalance
Data_Value_Type             categorical  0.0%    1       imbalance
Data_Value                  numeric      0.1%    486     outliers
Data_Value_Alt              numeric      0.1%    486     outliers
Data_Value_Footnote_Symbol  categorical  99.9%   1       null_rate imbalance
Data_Value_Footnote         categorical  99.9%   1       null_rate imbalance
Low_Confidence_Limit        numeric      0.1%    489     outliers
High_Confidence_Limit       numeric      0.1%    503     outliers
Number                      numeric      0.1%    2,267   high_skew outliers
WeightedNumber              numeric      0.1%    3,580   high_skew outliers
StratificationCategory1     categorical  0.0%    1       imbalance
Stratification1             categorical  0.0%    1       imbalance
StratificationCategory2     unknown      0.0%    -       skipped
Stratification2             unknown      0.0%    -       skipped
CategoryID                  categorical  0.0%    1       imbalance
IndicatorID                 categorical  0.0%    1       imbalance
LocationID                  numeric      0.0%    65      -
ResponseID                  categorical  0.0%    8       -
DataValueTypeID             categorical  0.0%    1       imbalance
StratificationCategoryID1   categorical  0.0%    1       imbalance
StratificationID1           categorical  0.0%    1       imbalance
StratificationCategoryID2   unknown      0.0%    -       skipped
StratificationID2           unknown      0.0%    -       skipped

Year

numeric timestamp
Year spans 2016 to 2022 with 7 unique values across 3,592 rows, no nulls, and a symmetric distribution centered on 2019 (mean = median = 2019, skew 0). Although typed numeric, it functions as a low-cardinality temporal category. There are no outliers and no zero values, so the field is clean. Treatment: Treat as an ordinal or categorical year for grouping or one-hot encoding rather than as a continuous numeric feature. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
7
min
2016
max
2022
mean
2019
median
2019
std
2.008
q1
2017
q3
2021
iqr
4
skew
0
kurtosis
-1.259
n_outliers
0
outlier_rate
0
zero_rate
0
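
The categorical treatment recommended for Year can be sketched as a plain one-hot encoding. This is only an illustration using the standard library; the function name one_hot_years is hypothetical, and a real pipeline would more likely rely on its modelling library's encoder.

```python
def one_hot_years(years):
    """Return (sorted levels, one-hot vectors) for a sequence of years."""
    levels = sorted(set(years))
    position = {year: i for i, year in enumerate(levels)}
    encoded = []
    for year in years:
        vector = [0] * len(levels)
        vector[position[year]] = 1  # exactly one indicator set per row
        encoded.append(vector)
    return levels, encoded


levels, encoded = one_hot_years([2016, 2018, 2016])
```

Sorting the levels keeps the indicator columns in chronological order, so the same structure also supports an ordinal reading.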

LocationAbbr

categorical foreign_key
This is a US state/territory abbreviation code (e.g., PA, LA, AR, WY, GU), serving as a geographic key. With 65 unique values across 3,592 rows and a near-uniform distribution (entropy ratio 0.999, top_rate just 0.0156), the most frequent codes appear 56 times each, which matches 7 years × 8 response types. A fully balanced panel would hold 65 × 56 = 3,640 rows, so 48 rows are absent and some locations must have fewer than 56. The cardinality of 65 exceeds the 50 states, indicating territories and possibly national/regional aggregates are included. Treatment: left-join on this code to enrich with state/territory metadata, or one-hot encode for modelling. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
65
top_value
PA
top_rate
0.01559
cardinality
65
entropy
6.017
entropy_ratio
0.9992
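
The panel arithmetic implied by these counts can be checked directly. The sketch below assumes a complete location contributes 7 years × 8 response types = 56 rows; the helper short_locations and the location codes "XX" and "YY" are hypothetical and say nothing about which real locations are incomplete.

```python
from collections import Counter

YEARS = 7
RESPONSE_TYPES = 8
EXPECTED_PER_LOCATION = YEARS * RESPONSE_TYPES  # 56 rows for a complete location


def short_locations(rows, key="LocationAbbr", expected=EXPECTED_PER_LOCATION):
    """Return {location: row count} for locations below the expected count."""
    counts = Counter(row[key] for row in rows)
    return {loc: n for loc, n in counts.items() if n < expected}


# Rows a fully balanced 65-location panel would have, versus the reported 3,592.
full_panel_rows = 65 * EXPECTED_PER_LOCATION
missing_rows = full_panel_rows - 3592

# Hypothetical rows: "XX" is complete, "YY" is missing one year (8 rows).
demo_rows = [{"LocationAbbr": "XX"}] * 56 + [{"LocationAbbr": "YY"}] * 48
gaps = short_locations(demo_rows)
```

The 48-row shortfall equals six location-years if gaps always cover whole years, which is an assumption the raw data would need to confirm.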

LocationDesc

categorical feature
LocationDesc is a US state/territory name field with 65 distinct values including states, DC, and territories like Guam. The distribution is close to uniform: entropy_ratio is 0.999 and the top 10 values all tie at 56 occurrences. It is not a perfectly balanced panel, however, because 65 locations at 56 rows each would total 3,640 rows rather than 3,592, so a few locations contribute fewer rows. No nulls and a tidy, closed vocabulary. Treatment: Use as a categorical grouping key; one-hot or target-encode if modelling. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
65
top_value
Pennsylvania
top_rate
0.01559
cardinality
65
entropy
6.017
entropy_ratio
0.9992

DataSource

categorical metadata imbalance
This column records the dataset's provenance, with every one of the 3592 rows tagged "BRFSS". Cardinality is 1 and entropy is 0, so it carries no discriminative signal. Treatment: Drop; constant column adds no information. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
BRFSS
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
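
Because many columns in this report are constant, detecting and dropping them in one pass may be more convenient than handling each by name. This is a minimal sketch, assuming rows are dictionaries and missing values are None or empty strings; both helper names are hypothetical.

```python
def constant_columns(rows):
    """Names of columns whose non-missing values are all identical."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            if value in (None, ""):
                continue  # ignore missing cells when judging constancy
            seen.setdefault(name, set()).add(value)
    return sorted(name for name, values in seen.items() if len(values) == 1)


def drop_columns(rows, names):
    """Return copies of the rows without the named columns."""
    names = set(names)
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


demo_rows = [
    {"DataSource": "BRFSS", "Response": "No Disability"},
    {"DataSource": "BRFSS", "Response": "Hearing Disability"},
]
constant = constant_columns(demo_rows)
trimmed = drop_columns(demo_rows, constant)
```

Note that this rule would also flag mostly-null columns with a single non-null value, such as the footnote fields, so those deserve a look before dropping.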

Category

categorical metadata imbalance
This column is a single-valued tag labeling every row as "Disability Estimates" across all 3592 records. With cardinality of 1, top_rate of 1.0, and entropy of 0.0, it carries no information for modelling or filtering. Treatment: Drop; constant column with no variance. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
Disability Estimates
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

Indicator

categorical metadata imbalance
This column holds a single constant string ('Disability status and types among adults 18 years of age or older') across all 3,592 rows, with cardinality 1 and entropy 0. It carries no information for modelling and likely just labels the survey indicator the dataset was filtered to. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
Disability status and types among adults 18 years of age or older
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

Response

categorical label
This column enumerates a disability response category, with 8 distinct values such as 'Cognitive Disability', 'No Disability', and 'Hearing Disability'. The distribution is perfectly uniform — each of the 8 values appears exactly 449 times (top_rate 0.125, entropy_ratio 1.0), indicating the dataset is balanced or pivoted by category rather than sampled organically. There are no nulls. Treatment: Use as a categorical label; one-hot or factor encode for modelling. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
8
top_value
Cognitive Disability
top_rate
0.125
cardinality
8
entropy
3
entropy_ratio
1
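
The entropy figures above can be reproduced under the assumption that the profiler reports Shannon entropy in bits and defines entropy_ratio as entropy divided by log2(cardinality). That definition is inferred from the numbers shown, not confirmed by the report. The category labels in the demo are placeholders.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # a constant column carries no information
    return entropy_bits(values) / math.log2(k)


# Eight placeholder categories with 449 rows each, as reported for Response.
demo = [f"type_{i}" for i in range(8) for _ in range(449)]
h = entropy_bits(demo)
ratio = entropy_ratio(demo)
```

With 8 equally frequent values each has probability 0.125, so the entropy is log2(8) = 3 bits and the ratio is 1, matching the statistics listed for this column.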

Data_Value_Unit

categorical metadata imbalance
This column records the unit of measurement for the data values, and it is constant: every one of the 3592 rows carries the value "%". With cardinality 1, entropy 0, and top_rate 1.0, it provides no information for modelling or segmentation. Treatment: Drop; constant column carrying no signal. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
%
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

Data_Value_Type

categorical metadata imbalance
This column records the type of data value reported, but every one of the 3592 rows holds the single label "Age-adjusted Prevalence". Cardinality is 1 and entropy is 0, so the field carries no information for modelling or segmentation. It likely exists as a schema placeholder from a wider source where multiple value types are possible. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
Age-adjusted Prevalence
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

Data_Value

numeric feature outliers
Data_Value is a continuous numeric measurement spanning 1.8 to 81.3 with a median of 9.1 but a mean of 18.25, indicating heavy right skew (skew 1.88, kurtosis 2.09). The profiler flags 450 outliers (12.5% of non-null rows), and the standard deviation (22.16) exceeds the mean, which points to a long upper tail or a mixture of differently scaled metrics. The outlier count is close to the 449 rows held by each Response type, so the flagged values may largely belong to one high-prevalence category rather than being anomalies. Nulls are negligible (0.14%) and there are no zeros; the 486 unique values across 3,592 rows hint at rounding or a discrete reporting grid. Treatment: Log-transform or winsorize before modelling to tame the right skew and 12.5% outlier load, after checking the distribution within each Response type. high · anthropic:claude-opus-4-7
n
3,592
nulls
5 (0.1%)
unique
486
min
1.8
max
81.3
mean
18.25
median
9.1
std
22.16
q1
5.3
q3
19.95
iqr
14.65
skew
1.876
kurtosis
2.086
n_outliers
450
outlier_rate
0.1255
zero_rate
0
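
The outlier flag can be made concrete with the quartiles listed above. The sketch assumes the profiler applies the common Tukey rule (values beyond 1.5 × IQR from the quartiles); the report mentions an IQR fence elsewhere but does not state the multiplier, so k = 1.5 is an assumption.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outside(values, lower, upper):
    """Count non-missing values strictly outside the fences."""
    return sum(1 for v in values if v is not None and (v < lower or v > upper))


# Quartiles reported for Data_Value: q1 = 5.3, q3 = 19.95.
lower, upper = tukey_fences(5.3, 19.95)

# Made-up values for illustration; only 81.3 (the reported max) is from the report.
n_flagged = count_outside([5.3, 9.1, 19.95, 81.3, None], lower, upper)
```

Under this rule the upper fence is about 41.9 and the lower fence is negative, so every flagged Data_Value lies on the high side.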

Data_Value_Alt

numeric feature outliers
A numeric measurement field whose summary statistics match Data_Value exactly (range 1.8 to 81.3, median 9.1, mean 18.25, 486 distinct values, 5 nulls), so it is most likely an alternate encoding of the same measure. The distribution is heavily right-skewed (skew 1.88, kurtosis 2.09) with std 22.16 exceeding the IQR of 14.65, and 12.5% of rows (450) flagged as outliers. Only 486 distinct values across 3,592 rows suggest a discretised or rounded scale rather than a continuous measure. Treatment: Log-transform or winsorize before modelling to tame the right skew and outlier mass; if it proves to duplicate Data_Value row for row, keep only one of the two. high · anthropic:claude-opus-4-7
n
3,592
nulls
5 (0.1%)
unique
486
min
1.8
max
81.3
mean
18.25
median
9.1
std
22.16
q1
5.3
q3
19.95
iqr
14.65
skew
1.876
kurtosis
2.086
n_outliers
450
outlier_rate
0.1255
zero_rate
0
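
Since these statistics are identical to those of Data_Value, a row-wise comparison is a cheap way to test whether the column is a duplicate. This is a sketch under the assumption that values arrive as strings with empty strings for missing cells; the helper names are hypothetical.

```python
import math


def _to_float(value):
    """Parse a cell, mapping missing markers to None."""
    return None if value in (None, "") else float(value)


def mismatched_rows(rows, left="Data_Value", right="Data_Value_Alt", tol=1e-9):
    """Return indices of rows where the two columns disagree."""
    bad = []
    for i, row in enumerate(rows):
        a, b = _to_float(row.get(left)), _to_float(row.get(right))
        if a is None or b is None:
            if a is not b:  # exactly one side is missing
                bad.append(i)
        elif not math.isclose(a, b, abs_tol=tol):
            bad.append(i)
    return bad


# Made-up rows: the third one disagrees on purpose.
demo_rows = [
    {"Data_Value": "9.1", "Data_Value_Alt": "9.1"},
    {"Data_Value": "", "Data_Value_Alt": ""},
    {"Data_Value": "5.3", "Data_Value_Alt": "5.4"},
]
diffs = mismatched_rows(demo_rows)
```

An empty result on the real file would justify dropping one of the two columns.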

Data_Value_Footnote_Symbol

categorical metadata null_rate imbalance
This appears to be a footnote symbol marker, almost entirely empty with a 99.86% null rate and only 5 non-null entries — all the single character '*'. With cardinality of 1 and entropy of 0, the column carries no discriminative information. Treatment: Drop; effectively constant with 99.86% nulls. high · anthropic:claude-opus-4-7
n
3,592
nulls
3,587 (99.9%)
unique
1
top_value
*
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

Data_Value_Footnote

categorical metadata null_rate imbalance
This column is a footnote/annotation field accompanying a Data_Value column, used to flag exceptional rows. It is effectively empty: 99.86% null, with only 5 non-null entries, all carrying the single value "Data suppressed" (cardinality 1, entropy 0). It carries no discriminative information on its own and only marks the handful of suppressed measurements. Treatment: Convert to a boolean is_suppressed flag and drop the original column. high · anthropic:claude-opus-4-7
n
3,592
nulls
3,587 (99.9%)
unique
1
top_value
Data suppressed
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
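
The recommended conversion to a boolean flag could look like the following. It is a minimal sketch assuming rows are dictionaries and that a missing footnote is None or an empty string; the function name add_suppressed_flag is hypothetical, while is_suppressed is the flag name proposed in the treatment.

```python
def add_suppressed_flag(rows, source="Data_Value_Footnote", flag="is_suppressed"):
    """Replace the footnote column with a boolean suppression flag."""
    flagged = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != source}
        new_row[flag] = row.get(source) not in (None, "")
        flagged.append(new_row)
    return flagged


demo_rows = [
    {"Data_Value": "9.1", "Data_Value_Footnote": ""},
    {"Data_Value": "", "Data_Value_Footnote": "Data suppressed"},
]
flagged_rows = add_suppressed_flag(demo_rows)
```

Because the only non-null footnote is "Data suppressed", any non-empty cell is treated as suppression; a source with several footnote texts would need an explicit comparison instead.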

Low_Confidence_Limit

numeric feature outliers
This is the lower bound of a confidence interval for a rate or percentage, presumably Data_Value, ranging from 1.1 to 80.5 with a median of 8.2. The distribution is heavily right-skewed (skew 1.90, kurtosis 2.16) and 12.57% of values are flagged as outliers, reflecting a long tail of high lower bounds above the bulk of small values. Nulls are negligible (0.14%) and there are no zeros. Treatment: Log-transform before modelling to tame the right skew, and pair with the matching upper limit. high · anthropic:claude-opus-4-7
n
3,592
nulls
5 (0.1%)
unique
489
min
1.1
max
80.5
mean
17.31
median
8.2
std
21.89
q1
4.7
q3
18.7
iqr
14
skew
1.899
kurtosis
2.159
n_outliers
451
outlier_rate
0.1257
zero_rate
0

High_Confidence_Limit

numeric feature outliers
A numeric upper-confidence-bound feature, ranging from 2.2 to 83.0 with a median of 10.1 but a mean of 19.26, indicating a long right tail. The distribution is heavily right-skewed (skew 1.85, kurtosis 2.01) and 12.5% of values (449 rows) are flagged as outliers. With 503 unique values across 3592 rows and only 0.14% nulls, it behaves as a continuous measurement rather than a categorical bound. Treatment: Log-transform before modelling to compress the right tail and dampen the 12.5% outlier mass. high · anthropic:claude-opus-4-7
n
3,592
nulls
5 (0.1%)
unique
503
min
2.2
max
83
mean
19.26
median
10.1
std
22.4
q1
6
q3
21.5
iqr
15.5
skew
1.851
kurtosis
2.011
n_outliers
449
outlier_rate
0.1252
zero_rate
0
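
Pairing the two confidence limits suggests a simple sanity check: each estimate should lie between its bounds, and the interval width gives a rough precision measure. The sketch assumes the limits belong to Data_Value, which the report implies but does not state, and uses made-up rows.

```python
def interval_width(row, low="Low_Confidence_Limit", high="High_Confidence_Limit"):
    """Width of the confidence interval for one row."""
    return float(row[high]) - float(row[low])


def interval_problems(rows, value="Data_Value",
                      low="Low_Confidence_Limit", high="High_Confidence_Limit"):
    """Indices of rows where low <= value <= high does not hold."""
    bad = []
    for i, row in enumerate(rows):
        cells = [row.get(low), row.get(value), row.get(high)]
        if any(c in (None, "") for c in cells):
            continue  # suppressed rows have nothing to check
        lo, v, hi = (float(c) for c in cells)
        if not lo <= v <= hi:
            bad.append(i)
    return bad


# Made-up rows: the second one is deliberately inconsistent.
demo_rows = [
    {"Low_Confidence_Limit": "4.7", "Data_Value": "5.3", "High_Confidence_Limit": "6.0"},
    {"Low_Confidence_Limit": "9.0", "Data_Value": "8.0", "High_Confidence_Limit": "10.0"},
    {"Low_Confidence_Limit": "", "Data_Value": "", "High_Confidence_Limit": ""},
]
problems = interval_problems(demo_rows)
width = interval_width(demo_rows[0])
```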

Number

numeric feature high_skew outliers
Number is almost certainly a count or quantity metric rather than an identifier, given 2,267 unique values across 3,592 rows spread over a wide range. Nulls are negligible (5 rows, 0.14%). The distribution is severely right-skewed (skew 14.57, kurtosis 256.99): the median is 978 while the mean is 3,780 and the max reaches 327,817, with 385 outliers (10.7%) flagged. The interquartile range (467 to 2,750) is tiny relative to the max, so a handful of extreme values dominate the variance (std 15,294). Treatment: Log-transform (or winsorize) before any distance- or variance-based modelling. high · anthropic:claude-opus-4-7
n
3,592
nulls
5 (0.1%)
unique
2,267
min
31
max
327,817
mean
3,780
median
978
std
1.529e+04
q1
467
q3
2,750
iqr
2,283
skew
14.57
kurtosis
257
n_outliers
385
outlier_rate
0.1073
zero_rate
0
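
Winsorizing, one of the two treatments named above, clips extreme values to chosen percentiles instead of removing them. The sketch below uses the 5th and 95th percentiles as an arbitrary illustration; the report does not specify cut-offs, and the function name is hypothetical.

```python
from statistics import quantiles


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given lower and upper percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


# Integers 1..101, chosen so the percentiles fall on whole numbers.
clipped = winsorize(list(range(1, 102)))
```

Clipping keeps every row, which matters here because each row is a location, year and response type combination rather than a disposable sample.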

WeightedNumber

numeric feature high_skew outliers
WeightedNumber is a numeric measure with 3,580 distinct values across 3,592 rows, ranging from 1,641 to 181,223,676 with a median of 418,252 but a mean of 2,103,449. The distribution is severely right-skewed (skew 14.65, kurtosis 262.16) and 444 rows (12.4%) fall outside the IQR fence, so a long tail of very large weighted counts dominates the mean. Treatment: Log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n
3,592
nulls
5 (0.1%)
unique
3,580
min
1,641
max
1.812e+08
mean
2.103e+06
median
418,252
std
9.082e+06
q1
149,677
q3
1.303e+06
iqr
1.153e+06
skew
14.65
kurtosis
262.2
n_outliers
444
outlier_rate
0.1238
zero_rate
0
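
A log transform is safe for this column because the reported minimum is 1,641 and the zero rate is 0, so every non-missing value is positive. The sketch below uses base 10 for readability, which is a choice made for this example rather than something the report prescribes, and passes missing values through as None.

```python
import math


def log10_transform(values):
    """Apply log10 to each value, leaving missing entries as None."""
    return [None if v is None else math.log10(v) for v in values]


# Made-up inputs chosen so the results are easy to read.
transformed = log10_transform([1000, None, 100])
```

On this scale the reported range of roughly 1.6 thousand to 181 million spans about five orders of magnitude instead of dominating the variance.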

StratificationCategory1

categorical metadata imbalance
This column is a stratification dimension label, but every one of the 3592 rows holds the single value "Overall" (top_rate 1.0, cardinality 1, entropy 0.0). It carries no information and likely indicates this slice of the source dataset was filtered to the un-stratified aggregate. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
Overall
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

Stratification1

categorical metadata imbalance
This column is a stratification label that takes the single value "Overall" across all 3592 rows. With cardinality 1 and entropy 0, it carries no information and cannot differentiate records. It likely indicates that this slice of the source data was not broken out by any subgroup. Treatment: drop, constant column with a single value. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
Overall
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

StratificationCategory2

unknown metadata skipped
Column was skipped by the profiler, so no value-level statistics are available beyond a row count of 3592 and a null rate of 0.0. The name suggests a secondary stratification dimension used alongside a primary category, typical of public health or survey datasets. Without unique counts or value distributions, its content cannot be characterised further. Treatment: Re-profile with the skip removed to inspect cardinality before deciding on encoding. low · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
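Before re-running the profiler, a quick manual look at the raw values can show whether a skipped column is empty, constant, or genuinely varied. This is a rough sketch that treats None and whitespace-only strings as missing; that convention and the function name quick_profile are assumptions, not Saturn behaviour.

```python
from collections import Counter


def quick_profile(values, top_n=3):
    """Summarise a raw column: size, missing cells, distinct values, top values."""
    cleaned = [
        None if v is None or str(v).strip() == "" else str(v).strip()
        for v in values
    ]
    present = [v for v in cleaned if v is not None]
    return {
        "n": len(cleaned),
        "n_missing": len(cleaned) - len(present),
        "n_unique": len(set(present)),
        "top": Counter(present).most_common(top_n),
    }


# Made-up values for illustration only.
profile = quick_profile(["", " ", None, "A", "A", "B"])
```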

Stratification2

unknown other skipped
Saturn skipped detailed profiling for Stratification2, so only the row count (3592) and a 0.0 null rate are known. With no unique count, type, or value distribution available, the column's content cannot be characterised from this evidence alone. The name suggests a secondary stratification key paired with a primary Stratification1 field, but that is not confirmed by the stats. Treatment: Re-profile or inspect raw values before deciding; do not use until kind and cardinality are established. low · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique

CategoryID

categorical metadata imbalance
CategoryID is a categorical column that carries no information: every one of the 3592 rows holds the single value "DISEST", giving cardinality 1 and entropy 0. It likely encodes a fixed dataset-level tag or filter rather than a per-row attribute. Treatment: Drop; constant column with zero variance. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
DISEST
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

IndicatorID

categorical metadata imbalance
IndicatorID is a categorical column that holds the single value "STATTYPE" across all 3592 rows, with zero nulls and cardinality of 1. Entropy is 0.0, so the field carries no information and likely functions as a constant tag identifying the indicator type for this slice of the dataset. Treatment: Drop before modelling; constant column with no variance. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
STATTYPE
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

LocationID

numeric foreign_key
LocationID is almost certainly a categorical location key encoded as integers, with 65 distinct values across 3592 rows and no nulls. Values range from 1 to 89 with a median of 36 and mild positive skew (0.50), consistent with an ID lookup rather than a measured quantity. Treating it as numeric would be misleading despite its int dtype. Treatment: Cast to categorical and left-join to a location lookup table rather than using as a numeric feature. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
65
min
1
max
89
mean
39.69
median
36
std
25.34
q1
20
q3
54
iqr
34
skew
0.5048
kurtosis
-0.7622
n_outliers
0
outlier_rate
0
zero_rate
0
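
The cast-and-join treatment can be sketched with a dictionary lookup. The lookup table here is hypothetical: it assumes LocationID follows state FIPS codes (under which 42 is Pennsylvania), which the report does not confirm, and 99 is used only as an unmatched example key.

```python
def left_join_location(rows, lookup, key="LocationID"):
    """Cast the key to a string and attach lookup fields where available."""
    joined = []
    for row in rows:
        code = str(row[key])  # treat the ID as a label, not a quantity
        joined.append({**row, key: code, **lookup.get(code, {})})
    return joined


# Hypothetical lookup table, keyed by the string form of the ID.
location_lookup = {"42": {"LocationName": "Pennsylvania"}}

demo_rows = [{"LocationID": 42}, {"LocationID": 99}]
joined_rows = left_join_location(demo_rows, location_lookup)
```

As in any left join, rows without a match are kept unchanged apart from the cast, so missing lookup entries show up as absent fields rather than dropped rows.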

ResponseID

categorical feature
ResponseID holds 8 distinct codes (Q6COG, Q6DIS2, Q6MOB, Q6IND, Q6DIS1, Q6VIS, Q6SEL, Q6HEAR), each appearing exactly 449 times across 3,592 rows with no nulls. The perfectly uniform distribution and entropy ratio of 1.0 indicate this is a question or disability-domain identifier repeated for every location and year rather than a unique response key. Despite the name, it behaves as a categorical factor, not an identifier. Treatment: Treat as a categorical factor (one-hot or group-by key); do not use as a unique row id. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
8
top_value
Q6COG
top_rate
0.125
cardinality
8
entropy
3
entropy_ratio
1
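
Since ResponseID and Response both have 8 values with 449 rows each, they are probably two encodings of the same factor, and a one-to-one check would confirm that one of them is redundant. The pairing of codes to labels in the demo rows is assumed for illustration; the report lists the codes and some labels but not how they map.

```python
from collections import defaultdict


def is_one_to_one(rows, left, right):
    """True if each left value maps to one right value and vice versa."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        left_to_right[row[left]].add(row[right])
        right_to_left[row[right]].add(row[left])
    return (all(len(v) == 1 for v in left_to_right.values())
            and all(len(v) == 1 for v in right_to_left.values()))


# Assumed code-to-label pairs, for illustration only.
demo_rows = [
    {"ResponseID": "Q6COG", "Response": "Cognitive Disability"},
    {"ResponseID": "Q6HEAR", "Response": "Hearing Disability"},
    {"ResponseID": "Q6COG", "Response": "Cognitive Disability"},
]
consistent = is_one_to_one(demo_rows, "ResponseID", "Response")
```

The same check applies to LocationAbbr, LocationDesc and LocationID, which also share a cardinality of 65.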

DataValueTypeID

categorical metadata imbalance
DataValueTypeID is a categorical metadata field indicating the type of statistical measure reported, but every one of the 3592 rows carries the single value 'AGEADJPREV' (age-adjusted prevalence). Cardinality is 1 and entropy is 0, so the column carries no information for modelling or filtering. Treatment: Drop; constant column with no variance. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
AGEADJPREV
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

StratificationCategoryID1

categorical metadata imbalance
This column holds a single constant value "CAT1" across all 3592 rows, with zero nulls and cardinality of 1. Entropy is 0, meaning it carries no information for any downstream task. The name suggests it was meant to identify a stratification category, but only one category is represented in this slice. Treatment: Drop; constant column with no variance. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
CAT1
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

StratificationID1

categorical metadata imbalance
This column holds a single constant value 'BO1' across all 3592 rows, with cardinality 1 and entropy 0. As a 'StratificationID1' it likely encodes a stratification dimension (e.g., overall/total) that was never varied in this slice. It carries no information for modelling or grouping. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique
1
top_value
BO1
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

StratificationCategoryID2

unknown other skipped
This column is named StratificationCategoryID2, suggesting it holds a secondary stratification category identifier in a public-health style dataset. Saturn skipped profiling, so no uniqueness, value, or distribution stats are available beyond a row count of 3592 and a null rate of 0.0. Without further signals, its actual content and cardinality cannot be characterised here. Treatment: Re-profile with type coercion to confirm whether this is a categorical key before use. low · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique

StratificationID2

unknown foreign_key skipped
StratificationID2 was skipped by the profiler, so its kind, uniqueness, and value distribution are unknown. The only confirmed signals are that it has 3592 rows and a null rate of 0.0. The name suggests a secondary stratification key (e.g., demographic subgroup) commonly paired with a StratificationCategoryID2 in CDC-style indicator tables. Treatment: Re-profile the column to determine cardinality, then treat as a categorical join key against its stratification lookup. low · anthropic:claude-opus-4-7
n
3,592
nulls
0 (0.0%)
unique