
disability · census disability by county 2022

Source: /home/coolhand/datasets/us-inequality-atlas/disability/census_disability_by_county_2022.csv · 3,222 rows · 16 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset contains 2022 US Census disability counts for 3,222 counties, broken out by disability type (ambulatory, cognitive, hearing, vision, self-care, independent living) along with totals, a derived disability rate, and FIPS identifiers. Nearly every count column is heavily right-skewed (skew above 10) with substantial outliers — total_population alone ranges from 47 to 9.87M with a mean of ~102K but a median of just 25,328, so a handful of large counties dominate the raw counts. The disability_rate field is the most analyst-friendly view: it's bounded, less skewed (skew 2.17), and centers around a median of 1.07 with an IQR of 0.77–1.42. Start with disability_rate to compare counties on equal footing, then look at total_population to understand the size distribution before interpreting any raw disability counts.

citing: row_count · column_count · total_population · disability_rate · disability_total · ambulatory_disability · independent_living_disability · state_fips
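
As a rough plausibility check on the advice to compare counties on disability_rate: the ratio of the two medians, 298 / 25,328 ≈ 1.18%, is close to the median rate of 1.07. That is consistent with (but does not prove) the rate being disability_total expressed as a percentage of total_population. The sketch below assumes that definition; the helper names `recomputed_rate` and `rate_mismatches` are hypothetical, and the tolerance is an assumption based on the apparent two-decimal rounding.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def recomputed_rate(disability_total: float, total_population: float) -> float:
    """Disability rate in percent, ASSUMING rate = 100 * total / population."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return 100.0 * disability_total / total_population


def rate_mismatches(rows: Iterable[Mapping[str, str]], tol: float = 0.006) -> list[str]:
    """Return the fips of rows whose reported disability_rate differs from the
    recomputed value by more than `tol`. The default is a little over half a
    unit in the second decimal, to absorb two-decimal rounding plus float error."""
    mismatched = []
    for row in rows:
        expected = recomputed_rate(float(row["disability_total"]),
                                   float(row["total_population"]))
        if abs(expected - float(row["disability_rate"])) > tol:
            mismatched.append(row["fips"])
    return mismatched


print(round(recomputed_rate(298, 25_328), 2))  # 1.18 (ratio of the two medians)
```

On the real file, `rate_mismatches(csv.DictReader(f))`, with `f` opened on the CSV path above, should return an empty list if the assumed definition holds; a long list would mean the rate is defined some other way (for example, over a different population).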

Schema

16 columns. Per-column summary:

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| state_fips | numeric | 0.0% | 52 | |
| county_fips | numeric | 0.0% | 330 | high_skew, outliers |
| total_population | numeric | 0.0% | 3,141 | high_skew, outliers |
| disability_total | numeric | 0.0% | 1,393 | high_skew, outliers |
| disability_rate | numeric | 0.0% | 305 | high_skew |
| no_disability | numeric | 0.0% | 2,955 | high_skew, outliers |
| one_disability | numeric | 0.0% | 1,212 | high_skew, outliers |
| two_plus_disabilities | numeric | 0.0% | 786 | high_skew, outliers |
| hearing_disability | numeric | 0.0% | 2,314 | high_skew, outliers |
| vision_disability | numeric | 0.0% | 2,349 | high_skew, outliers |
| cognitive_disability | numeric | 0.0% | 2,473 | high_skew, outliers |
| ambulatory_disability | numeric | 0.0% | 2,614 | high_skew, outliers |
| self_care_disability | numeric | 0.0% | 1,961 | high_skew, outliers |
| independent_living_disability | numeric | 0.0% | 2,773 | high_skew, outliers |

fips

numeric identifier
This column is the FIPS county code: all 3,222 rows are unique and non-null, and the value range (1001 to 72153) matches the 5-digit state+county FIPS encoding with the leading zero lost to integer storage (1001 is county 01001). The distribution is near-symmetric (skew 0.16, kurtosis -0.63) with no outliers, as expected for an identifier rather than a measured quantity. Treat it as a categorical key, not a number. Treatment: cast to a zero-padded 5-character string and use it as the join key to other county-level data; do not feed it to a model as a numeric feature.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
min: 1,001 · q1: 1.903e+04 · median: 30,022 · q3: 4.61e+04 · max: 72,153 · iqr: 27,075
mean: 3.138e+04 · std: 1.63e+04 · skew: 0.1574 · kurtosis: -0.6314
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
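
A minimal standard-library sketch of the suggested treatment: the hypothetical helper `fips_key` restores the leading zero that integer storage drops, so that 1001 and a lookup table's "01001" compare equal. It assumes the CSV field holds a plain integer string such as "1001"; a value written as "1001.0" would need an extra float conversion first.

```python
from __future__ import annotations


def fips_key(fips: int | str) -> str:
    """Return a county FIPS code as a zero-padded, 5-character string key.

    Assumes a plain integer or integer string (not "1001.0").
    """
    code = int(fips)  # accepts 1001, "1001" or "01001"
    if not 1 <= code <= 99_999:
        raise ValueError(f"not a 5-digit county FIPS code: {fips!r}")
    return f"{code:05d}"


print(fips_key(1001))   # 01001
print(fips_key(72153))  # 72153
```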

county_name

text identifier near_unique
Each of the 3,222 rows holds a unique county-and-state string (e.g., '... County, Texas'), averaging 24 characters and about 3 words. The token 'county,' appears 2,999 times, so the remaining 223 entries use a different designation (parish, borough, census area, independent city, or Puerto Rico municipio). The most frequent state tokens are 'texas' (256), 'virginia' (189), and 'georgia' (159). These look like word counts rather than per-state row counts: Texas has 254 counties, and 'virginia' also matches West Virginia. Either way, the spread is consistent with a full U.S. county roster. Treatment: split into county and state fields, then left-join to a FIPS lookup.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
len_min: 16 · len_max: 59 · len_mean: 24.32 · len_median: 24 · len_p95: 31
word_mean: 3.248 · word_median: 3 · vocab_size: 1,990
n_empty: 0 · n_duplicates: 0 · duplicate_rate: 0
readability_flesch_mean: 10.28 · emoji_rate: 0 · url_rate: 0 · one_word_rate: 0 · allcaps_rate: 0 · boilerplate_rate: 0
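
The split can be done on the last comma, which keeps the state as the final field even if a county name itself contains a comma. In the sketch below, `split_county_name`, `county_designation` and the `DESIGNATIONS` tuple are illustrative assumptions: the tuple is not guaranteed to be exhaustive, and the example string only illustrates the expected format. Any row for which `county_designation` returns None points to a designation missing from the tuple.

```python
from __future__ import annotations

# Illustrative, possibly incomplete list of county-equivalent designations.
DESIGNATIONS = ("County", "Parish", "Borough", "Census Area", "city", "Municipio")


def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Autauga County, Alabama' into ('Autauga County', 'Alabama')."""
    county, sep, state = name.rpartition(",")  # split on the LAST comma
    if not sep:
        raise ValueError(f"expected 'County, State' format: {name!r}")
    return county.strip(), state.strip()


def county_designation(county: str) -> str | None:
    """Return the designation that ends the county part, or None if unrecognised."""
    for designation in DESIGNATIONS:
        if county.endswith(" " + designation):
            return designation
    return None


county, state = split_county_name("Autauga County, Alabama")
print(county, "|", state, "|", county_designation(county))
# Autauga County | Alabama | County
```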

state_fips

numeric foreign_key
This is almost certainly the US state FIPS code: 52 unique integer values across 3,222 rows, ranging from 1 to 72 with no nulls or zeros. A count of 52 rather than 50, with a maximum of 72, is consistent with the 50 states plus the District of Columbia (11) and Puerto Rico (72). The distribution is near-symmetric and flat (skew 0.16, kurtosis -0.63), consistent with a categorical geographic identifier rather than a measurement. Treatment: cast to a 2-character zero-padded string and join to a state lookup table; do not treat it as a numeric feature.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 52
min: 1 · q1: 19 · median: 30 · q3: 46 · max: 72 · iqr: 27
mean: 31.27 · std: 16.29 · skew: 0.1574 · kurtosis: -0.6267
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
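
A sketch of the lookup join, assuming the lookup table is keyed by 2-character strings. The `STATE_NAMES` dictionary is deliberately partial (a real lookup would cover all 52 codes), although the codes it does list are the official state FIPS codes. `state_key` and `unmatched_states` are hypothetical helpers; the latter reports codes that a left join would leave without a name.

```python
from __future__ import annotations

from typing import Iterable

# Deliberately partial lookup, for illustration only.
STATE_NAMES = {
    "01": "Alabama",
    "11": "District of Columbia",
    "13": "Georgia",
    "48": "Texas",
    "51": "Virginia",
    "72": "Puerto Rico",
}


def state_key(state_fips: int | str) -> str:
    """Zero-pad a state FIPS code to 2 characters (1 -> '01')."""
    return f"{int(state_fips):02d}"


def unmatched_states(codes: Iterable[int | str], lookup: dict[str, str]) -> list[str]:
    """State keys that a left join against `lookup` would leave unnamed."""
    return sorted({state_key(c) for c in codes} - set(lookup))


print(STATE_NAMES.get(state_key(72)))             # Puerto Rico
print(unmatched_states([1, 6, 72], STATE_NAMES))  # ['06']
```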

county_fips

numeric foreign_key high_skew outliers
This is the county-level (3-digit) portion of the FIPS code, stored as an integer from 1 to 840 across 3,222 rows with no nulls. Only 330 values are distinct because the same county code recurs across states. The distribution is heavily right-skewed (skew 2.87, kurtosis 11.64) with 178 outliers (5.5%). This reflects the coding scheme rather than the counties themselves: most codes cluster low (median 79, Q3 133) while a few counties carry much larger codes. Treating this column as a numeric feature would be misleading despite its numeric dtype. Treatment: cast to categorical and combine with state_fips to form the full county identifier for joins.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 330
min: 1 · q1: 35 · median: 79 · q3: 133 · max: 840 · iqr: 98
mean: 103.2 · std: 106.6 · skew: 2.866 · kurtosis: 11.64
n_outliers: 178 · outlier_rate: 0.05525 · zero_rate: 0
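
Because only 330 county codes are shared among 3,222 counties, county_fips is a key only in combination with state_fips. The sketch below (hypothetical helper names) builds the 5-digit identifier and checks it against the existing fips column. It relies on the standard layout of two state digits followed by three county digits, so that fips = 1000 × state_fips + county_fips.

```python
def compose_fips(state_fips: int, county_fips: int) -> str:
    """Join a 2-digit state code and a 3-digit county code into a 5-digit key."""
    if not (1 <= state_fips <= 99 and 1 <= county_fips <= 999):
        raise ValueError(f"code out of range: {state_fips}, {county_fips}")
    return f"{state_fips:02d}{county_fips:03d}"


def ids_agree(fips: int, state_fips: int, county_fips: int) -> bool:
    """True if the three identifier columns describe the same county."""
    return fips == 1000 * state_fips + county_fips


print(compose_fips(1, 1))         # 01001
print(ids_agree(72153, 72, 153))  # True
```

Running `ids_agree` over every row and counting failures is a cheap integrity check to do before any join.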

total_population

numeric feature high_skew outliers
The county-level total population count, with no nulls across 3,222 rows and 3,141 unique values. The distribution is extremely right-skewed (skew 13.38, kurtosis 298.69): the median is 25,328 but the mean is 102,232 and the maximum reaches 9,866,623, with 453 outliers (14.06%) flagged above the IQR fence. Treatment: log-transform before regression to tame the heavy right tail; with a minimum of 47, a plain log is defined for every row.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,141
min: 47 · q1: 1.061e+04 · median: 25,328 · q3: 65,190 · max: 9.867e+06 · iqr: 5.458e+04
mean: 1.022e+05 · std: 3.269e+05 · skew: 13.38 · kurtosis: 298.7
n_outliers: 453 · outlier_rate: 0.1406 · zero_rate: 0
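
The arithmetic behind the outlier flag and the suggested transform, as a sketch. It assumes the profiler uses Tukey's fences with the conventional multiplier k = 1.5 (the profile does not state k) and plugs in the reported, rounded quartiles, so the fences are approximate. Under that assumption the lower fence is negative, so for a population count every flagged outlier is a large county.

```python
from __future__ import annotations

import math


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Outlier fences q1 - k*IQR and q3 + k*IQR (k = 1.5 is an assumption)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported (rounded) quartiles of total_population.
lower, upper = tukey_fences(10_610, 65_190)
print(lower, upper)  # -71260.0 147060.0

# log10 compresses the reported min, median and max onto a narrow scale.
for value in (47, 25_328, 9_866_623):
    print(value, round(math.log10(value), 2))
# 47 1.67
# 25328 4.4
# 9866623 6.99
```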

disability_total

numeric feature high_skew outliers
A heavily right-skewed count of people with a disability per county, ranging from 0 to 69,705 with a median of just 298 but a mean of 1,043. Skew of 10.28 and kurtosis of 166.8 indicate an extreme long tail, with 404 outliers (12.5% of rows) and only 1.7% zeros. The standard deviation (2,906) is more than four times the IQR (689), so a small number of very large counties dominate the distribution. Treatment: apply a log1p transform before modelling to tame the extreme skew and outliers; log1p rather than log, because log is undefined at the zero rows.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,393
min: 0 · q1: 107 · median: 298 · q3: 796.2 · max: 69,705 · iqr: 689.2
mean: 1,043 · std: 2,906 · skew: 10.28 · kurtosis: 166.8
n_outliers: 404 · outlier_rate: 0.1254 · zero_rate: 0.01676
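
Why log1p rather than log: about 1.7% of rows (0.01676 × 3,222 ≈ 54 counties) are exactly zero, and the natural log is undefined there. A minimal standard-library sketch using the reported minimum, median and maximum as inputs; `math.expm1` inverts the transform when results need to be reported back in counts.

```python
import math

counts = [0, 298, 69_705]  # reported min, median and max of disability_total

try:
    math.log(0)  # raises ValueError: the log of zero is undefined
except ValueError as err:
    print("log(0) fails:", err)

transformed = [math.log1p(c) for c in counts]
print([round(t, 2) for t in transformed])  # [0.0, 5.7, 11.15]

# expm1 undoes log1p, recovering the original counts (up to float error).
restored = [math.expm1(t) for t in transformed]
print([round(r) for r in restored])  # [0, 298, 69705]
```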

disability_rate

numeric feature high_skew
A per-county disability rate spanning 0 to 9.17, with a median of 1.07 and an IQR of 0.77 to 1.42; the scale suggests a percentage. The distribution is right-skewed (skew 2.17, kurtosis 15.24) with 117 outliers (3.6%) stretching well beyond the typical range, and 1.7% of rows sit at exactly zero. There are no nulls across 3,222 rows, and only 305 distinct values suggest rounding to two decimals. Treatment: apply log1p rather than a plain log (which is undefined at the zero rows), or winsorise, before modelling to tame the right tail.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 305
min: 0 · q1: 0.77 · median: 1.07 · q3: 1.42 · max: 9.17 · iqr: 0.65
mean: 1.145 · std: 0.6215 · skew: 2.167 · kurtosis: 15.24
n_outliers: 117 · outlier_rate: 0.03631 · zero_rate: 0.01676
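
A minimal winsorising sketch using only `statistics.quantiles`: values below the 1st or above the 99th percentile are clipped to those percentiles. The 1/99 cut-offs are an assumption, not something the profile prescribes. Clipping the top 1% touches roughly 32 rows, fewer than the 117 flagged outliers, so a tighter upper cut-off would be needed to pull in the whole tail. Unlike a log, clipping is well defined at the zero rows.

```python
from __future__ import annotations

import statistics


def winsorize(values: list[float], lower_pct: int = 1, upper_pct: int = 99) -> list[float]:
    """Clip values to their lower_pct-th and upper_pct-th percentiles.

    The default 1st/99th cut-offs are an assumption; tune them per column.
    """
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


# Toy data, not from the file: 1..100 plus one extreme value.
toy = [float(v) for v in range(1, 101)] + [1000.0]
clipped = winsorize(toy)
print(min(clipped), max(clipped))  # 2.0 100.0
```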

no_disability

numeric feature high_skew outliers
Counts of people recorded as having no disability per county, ranging from 0 to 2,091,332 with a median of 5,607. The distribution is extremely right-skewed (skew 12.67, kurtosis 259.77), and the mean of 22,872 sits well above the Q3 of 14,739, with 442 outliers (13.7%) forming a long tail of very large counties. Only one zero is present and there are no nulls, so the heavy tail, not missingness, is the dominant feature. Treatment: log-transform (log1p) before modelling to tame the skew and outliers.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,955
min: 0 · q1: 2,384 · median: 5,607 · q3: 1.474e+04 · max: 2.091e+06 · iqr: 12,355
mean: 2.287e+04 · std: 7.329e+04 · skew: 12.67 · kurtosis: 259.8
n_outliers: 442 · outlier_rate: 0.1372 · zero_rate: 0.0003104
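
The mean sitting above Q3 has a concrete reading: at least three quarters of counties fall below the average, so the mean is a poor description of a typical county and the median is the better summary. A small standard-library check for that condition on any column, with a hypothetical helper name and toy data:

```python
from __future__ import annotations

import statistics


def mean_above_q3(values: list[float]) -> bool:
    """True when the mean exceeds the third quartile (a strong right-skew signal)."""
    _q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.fmean(values) > q3


# Toy data, not from the file: four small units and one very large one.
print(mean_above_q3([1, 2, 3, 4, 1000]))  # True
print(mean_above_q3([1, 2, 3, 4, 5]))     # False
```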

one_disability

numeric feature high_skew outliers
A count of people with exactly one disability per county, ranging from 0 to 44,466 with a median of 217.5. The distribution is severely right-skewed (skew 9.45, kurtosis 139.4): the mean (755.7) is more than triple the median and there are 408 outliers (12.7% of rows), consistent with a few very large counties dominating a long tail of small ones. About 2.8% of rows are zero and there are no nulls. Treatment: apply a log1p transform before modelling to tame the heavy right tail.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,212
min: 0 · q1: 76.25 · median: 217.5 · q3: 586.8 · max: 44,466 · iqr: 510.5
mean: 755.7 · std: 2,032 · skew: 9.449 · kurtosis: 139.4
n_outliers: 408 · outlier_rate: 0.1266 · zero_rate: 0.02793

two_plus_disabilities

numeric feature high_skew outliers
A count of people per county reporting two or more disabilities, ranging from 0 to 25,239 with a median of just 76. The distribution is extremely right-skewed (skew 12.57, kurtosis 253.95), with 11.67% of rows flagged as outliers and about 9% exact zeros, suggesting that a few very large counties dominate while most are small. The mean (287.7) sits well above Q3 (222), confirming the long tail. Treatment: apply a log1p transform before modelling to tame the heavy right tail; with about 9% zeros, a plain log is not an option.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 786
min: 0 · q1: 21 · median: 76 · q3: 222 · max: 25,239 · iqr: 201
mean: 287.7 · std: 890.1 · skew: 12.57 · kurtosis: 253.9
n_outliers: 376 · outlier_rate: 0.1167 · zero_rate: 0.0897
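
The reported summaries hint that disability_total may simply be one_disability + two_plus_disabilities: the means add up (755.7 + 287.7 = 1,043.4, against a disability_total mean of 1,043) and so do the maxima (44,466 + 25,239 = 69,705). That is suggestive, not proof, so it is worth checking row by row before treating the three columns as independent features. A sketch with a hypothetical helper, assuming rows arrive as dictionaries of strings (as `csv.DictReader` yields them):

```python
from __future__ import annotations

import math
from typing import Iterable, Mapping


def sum_mismatches(rows: Iterable[Mapping[str, str]]) -> list[str]:
    """fips of rows where one_disability + two_plus_disabilities != disability_total.

    The identity being tested is a hypothesis suggested by the summary stats.
    """
    bad = []
    for row in rows:
        parts = float(row["one_disability"]) + float(row["two_plus_disabilities"])
        # abs_tol=0.5 tolerates small rounding if counts are stored as floats.
        if not math.isclose(parts, float(row["disability_total"]), abs_tol=0.5):
            bad.append(row["fips"])
    return bad


rows = [  # toy rows, not from the file
    {"fips": "toy-1", "one_disability": "200", "two_plus_disabilities": "98",
     "disability_total": "298"},
    {"fips": "toy-2", "one_disability": "10", "two_plus_disabilities": "5",
     "disability_total": "16"},
]
print(sum_mismatches(rows))  # ['toy-2']
```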

hearing_disability

numeric feature high_skew outliers
A count of people with a hearing disability per county, with all 3,222 rows populated and 2,314 distinct values ranging from 1 to 296,898. The distribution is extremely right-skewed (skew 11.54, kurtosis 226.6), with a median of 1,326 well below the mean of 4,003, and 391 outliers (12.1%) in the upper tail. The minimum of 1 and absence of zeros are consistent with aggregated counts rather than 0/1 indicators, and mean a plain log is defined for every row. Treatment: log-transform before regression to tame the heavy right tail.
medium · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,314
min: 1 · q1: 579.2 · median: 1,326 · q3: 3,193 · max: 296,898 · iqr: 2,614
mean: 4,003 · std: 1.068e+04 · skew: 11.54 · kurtosis: 226.6
n_outliers: 391 · outlier_rate: 0.1214 · zero_rate: 0

vision_disability

numeric feature high_skew outliers
Counts of people with a vision disability per county, ranging from 0 to 346,901 with a median of 1,361. The distribution is extremely right-skewed (skew 12.29, kurtosis 254.79), and the mean of 4,246 sits well above the Q3 of 3,291, with 380 outliers (11.8%) in the upper tail. A single zero (zero_rate 0.03%) and no nulls indicate a cleanly, consistently populated aggregate. Treatment: log-transform before regression to tame the heavy right tail, using log1p to handle the one zero row.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,349
min: 0 · q1: 567 · median: 1,361 · q3: 3,291 · max: 346,901 · iqr: 2,724
mean: 4,246 · std: 1.205e+04 · skew: 12.29 · kurtosis: 254.8
n_outliers: 380 · outlier_rate: 0.1179 · zero_rate: 0.0003104

cognitive_disability

numeric feature high_skew outliers
Likely a count of people with a cognitive disability per county, ranging from 0 to 413,990 with a median of 1,623. The distribution is severely right-skewed (skew 12.09, kurtosis 254.7) with 375 outliers (11.6% of rows) and a mean (5,142) more than triple the median, indicating that a few very large counties dominate. There are no nulls and only one zero, so the count is reliably populated. Treatment: apply a log1p transform before modelling to tame the heavy right tail; a plain log would fail on the zero row.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,473
min: 0 · q1: 634 · median: 1,623 · q3: 4,173 · max: 413,990 · iqr: 3,539
mean: 5,142 · std: 1.421e+04 · skew: 12.09 · kurtosis: 254.7
n_outliers: 375 · outlier_rate: 0.1164 · zero_rate: 0.0003104

ambulatory_disability

numeric feature high_skew outliers
Counts of people with an ambulatory disability per county, ranging from 3 to 548,175 with a median of 2,197. The distribution is severely right-skewed (skew 13.0, kurtosis 288.7), and 11.4% of rows are flagged as outliers: a long tail of very large counties pulls the mean (6,497) to roughly three times the median. Treatment: log-transform before modelling to tame the heavy right tail.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,614
min: 3 · q1: 917.2 · median: 2,197 · q3: 5,261 · max: 548,175 · iqr: 4,344
mean: 6,497 · std: 1.822e+04 · skew: 13.01 · kurtosis: 288.7
n_outliers: 366 · outlier_rate: 0.1136 · zero_rate: 0

self_care_disability

numeric feature high_skew outliers
Counts of people with a self-care disability per county (3,222 rows, 1,961 unique values). The distribution is severely right-skewed (skew 16.8, kurtosis 478.7, the highest of any column) with a median of 772.5 but a maximum of 281,611, and 355 outliers (11.0% of rows) sit far above the Q3 of 1,948.5. There are no nulls and only 4 zeros (0.12%), so the field is consistently populated. Treatment: log-transform (log1p, given the zeros) before modelling and consider winsorising the long upper tail.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,961
min: 0 · q1: 307 · median: 772.5 · q3: 1,948 · max: 281,611 · iqr: 1,642
mean: 2,504 · std: 8,061 · skew: 16.82 · kurtosis: 478.7
n_outliers: 355 · outlier_rate: 0.1102 · zero_rate: 0.001241

independent_living_disability

numeric feature high_skew outliers
Counts of people with an independent-living disability per county, ranging from 2 to 1,417,825 with a median of 3,135. The distribution is severely right-skewed (skew 14.09, kurtosis 329.97), and 13.8% of rows (445) are flagged as outliers, reflecting a handful of very large counties among many small ones. No nulls or zeros are present. Treatment: log-transform before modelling and consider normalising by county population to control the heavy skew.
high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,773
min: 2 · q1: 1,242 · median: 3,135 · q3: 8,586 · max: 1.418e+06 · iqr: 7,344
mean: 1.363e+04 · std: 4.56e+04 · skew: 14.09 · kurtosis: 330
n_outliers: 445 · outlier_rate: 0.1381 · zero_rate: 0
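
A sketch of the suggested normalisation: express the count per 1,000 residents before any log transform, so that large counties stop dominating simply by size. Using total_population as the denominator is an assumption; the population the disability questions cover may differ from the total, and the profile does not say which population these counts refer to. Helper names are hypothetical.

```python
from __future__ import annotations

import math


def per_1000(count: float, population: float) -> float:
    """Count per 1,000 residents (ASSUMES total_population is the right denominator)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * count / population


def normalised_feature(count: float, population: float) -> float:
    """Per-capita rate followed by log1p, so zero rates remain defined."""
    return math.log1p(per_1000(count, population))


# Toy inputs, not from the file.
print(per_1000(50, 10_000))                      # 5.0
print(round(normalised_feature(50, 10_000), 3))  # 1.792
```

Per-capita rates can still be skewed, so it is worth re-profiling the normalised column before deciding whether the log step is needed.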