saturn·

healthcare desert merged

source /home/coolhand/datasets/us-inequality-atlas/healthcare/healthcare_desert_merged.csv · 3,222 rows · 10 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset profiles 3,222 U.S. counties (one row per county, keyed by FIPS). It covers population, uninsured counts and rates, poverty rate, a hospital closure risk score, and rural/urban flags. Population and uninsured figures are extremely right-skewed (total_pop skew 13.4, uninsured_pop skew 17.8), so a handful of large counties will dominate any raw totals; analysis should likely use rates or log scales. The hospital_closure_risk_score takes just 3 distinct values (~29% of counties score 0). risk_category is heavily imbalanced, with 84% of counties labeled 'Low' and the remaining 16% 'Moderate'. Both columns are worth examining first. About 69% of counties are flagged Rural, so comparing uninsured and poverty rates between rural and urban counties should be a productive next cut.

citing: total_pop · uninsured_pop · uninsured_rate · poverty_rate · hospital_closure_risk_score · risk_category · rural_category
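The sketch below is one starting point for the rural/urban cut. It is written in Python and assumes the CSV is read with the standard-library csv module and that rural_category holds the two labels reported below. It compares group medians rather than means because both rates are right-skewed. The inline rows are hypothetical toy values, not records from the dataset, and median_by_group is an illustrative helper name.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical toy rows using the report's column names; values are made up.
TOY_CSV = """rural_category,uninsured_rate,poverty_rate
Urban/Suburban,0.10,12.0
Urban/Suburban,0.06,9.5
Rural,0.30,25.1
Rural,0.22,18.4
Rural,0.14,14.0
"""


def median_by_group(rows, group_col, value_col):
    """Median of value_col within each level of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {level: statistics.median(vals) for level, vals in groups.items()}


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
for col in ("uninsured_rate", "poverty_rate"):
    print(col, median_by_group(rows, "rural_category", col))
```

To run it on the real file, replace the StringIO object with a file opened from the source path listed above.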

Schema

10 columns
Per-column summary.

column                       type         nulls  unique  alerts
fips                         numeric      0.0%   3,222
county_name                  text         0.0%   3,222   near_unique
total_pop                    numeric      0.0%   3,141   high_skew, outliers
uninsured_pop                numeric      0.0%   584     high_skew, outliers
uninsured_rate               numeric      0.0%   152     high_skew, outliers
poverty_rate                 numeric      0.0%   1,719   high_skew
rural                        categorical  0.0%   2
rural_category               categorical  0.0%   2
hospital_closure_risk_score  numeric      0.0%   3
risk_category                categorical  0.0%   2

fips

numeric identifier
This is the FIPS county code. It has 3,222 rows with 3,222 unique values, no nulls, a min of 1001, and a max of 72153. These values are consistent with the U.S. county FIPS scheme (state code × 1000 + county code). State code 72 is Puerto Rico, so its municipios are included. The values spread across that range without outliers (skew 0.16, kurtosis -0.63), but that shape carries no meaning: the code indexes geography rather than measuring anything. Treat it as a categorical key, not a quantity, despite the numeric dtype. Note that the integer dtype has dropped the leading zero for state codes 01 to 09. Treatment: Cast to a zero-padded five-digit string (1001 → "01001") and left-join other county-level data on it; do not use it as a numeric feature. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          3.138e+04
median        30,022
std           1.63e+04
q1            1.903e+04
q3            4.61e+04
iqr           27,075
skew          0.1574
kurtosis      -0.6314
n_outliers    0
outlier_rate  0
zero_rate     0
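Here is a minimal sketch of the zero-padding treatment. It assumes codes may arrive as ints or as strings like "1001.0" from a CSV reader. fips_to_str and split_fips are hypothetical helper names.

```python
def fips_to_str(fips) -> str:
    """Zero-pad a county FIPS code to five digits, e.g. 1001 -> "01001".

    Accepts ints or strings such as "1001" or "1001.0" (as a CSV reader
    may produce when the column was stored as a float).
    """
    return str(int(float(fips))).zfill(5)


def split_fips(fips) -> tuple[str, str]:
    """Split a FIPS code into its 2-digit state and 3-digit county parts."""
    code = fips_to_str(fips)
    return code[:2], code[2:]


print(fips_to_str(1001))  # 01001
print(split_fips(72153))  # ('72', '153')
```

Keeping the padded string as the join key avoids silent mismatches against sources that store FIPS as text.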

county_name

text identifier near_unique
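This column holds fully qualified U.S. county names (e.g. 'X County, State'), with all 3,222 values unique and no nulls. The token 'county,' appears 2,999 times, confirming a 'County, State' format for most rows. The other 223 rows likely use county-equivalent suffixes such as Parish, Borough, Census Area, city, or Municipio (fips includes state code 72, Puerto Rico). The top state tokens, Texas (256), Virginia (189), and Georgia (159), appear to be token frequencies rather than per-state row counts. Georgia's 159 matches its county count. Texas, however, has 254 counties; the extra two probably come from names such as Texas County. 'virginia' also matches West Virginia's 55 counties in addition to Virginia's 133 counties and independent cities. Treatment: Prefer fips as the join key; if joining on names, split into county and state components and normalize suffixes first. high · anthropic:claude-opus-4-7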
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.32
len_median               24
len_p95                  31
word_mean                3.248
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,990
readability_flesch_mean  10.28
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
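One way to do the split is sketched below in Python. It assumes every value ends in ', State' with a single separating comma. The SUFFIXES tuple is a hypothetical, non-exhaustive list of county-equivalent suffixes, not something the profile reports, and split_county_name is an illustrative helper.

```python
# Hypothetical suffix list; longer suffixes come first so that
# "City and Borough" is removed before "Borough".
SUFFIXES = (
    " City and Borough",
    " Census Area",
    " Municipio",
    " Borough",
    " County",
    " Parish",
    " city",
)


def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Autauga County, Alabama' into ('Autauga', 'Alabama')."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', State' part in {name!r}")
    for suffix in SUFFIXES:
        if county.endswith(suffix):
            county = county[: -len(suffix)]
            break
    return county.strip(), state.strip()


print(split_county_name("Orleans Parish, Louisiana"))  # ('Orleans', 'Louisiana')
```

Stripping suffixes can make names collide. Maryland has both Baltimore County and Baltimore city, and both reduce to ('Baltimore', 'Maryland'). This is one more reason to join on fips where possible.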

total_pop

numeric feature high_skew outliers
This is a population count per county, with values ranging from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69). The mean (102,232) is about four times the median, and 453 values (14.06%) are flagged as outliers. The standard deviation of 326,934 is roughly six times the IQR of 54,579. There are no nulls or zeros, and 3,141 of 3,222 values are unique. Treatment: Log-transform before any modelling or distance-based analysis to tame the extreme right skew. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        3,141
min           47
max           9.867e+06
mean          1.022e+05
median        25,328
std           3.269e+05
q1            1.061e+04
q3            65,190
iqr           5.458e+04
skew          13.38
kurtosis      298.7
n_outliers    453
outlier_rate  0.1406
zero_rate     0
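The sketch below shows why the log transform helps. It places the reported median between the reported min and max, first on the raw scale and then on the log10 scale. It uses only the three summary values above; log10 is safe here because zero_rate is 0.

```python
import math

# Summary values reported for total_pop above.
reported = {"min": 47, "median": 25_328, "max": 9_866_623}
log10_vals = {k: math.log10(v) for k, v in reported.items()}


def position(vals):
    """Where the median sits between min (0.0) and max (1.0)."""
    return (vals["median"] - vals["min"]) / (vals["max"] - vals["min"])


raw_pos = position(reported)    # median crushed against the minimum
log_pos = position(log10_vals)  # roughly centred after log10
print(f"raw: {raw_pos:.4f}  log10: {log_pos:.2f}")  # raw: 0.0026  log10: 0.51
```

On the raw scale a few very large counties stretch the axis so far that the typical county is indistinguishable from the smallest. After log10 the median lands near the middle of the range.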

uninsured_pop

numeric feature high_skew outliers
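Counts of uninsured residents per county, ranging from 0 to 20,915 across 3,222 rows with no nulls. The distribution is severely right-skewed (skew 17.81, kurtosis 462.87): the median is 36 while the mean is 159.95, and 17.2% of rows are zero. 368 outliers (11.4%) lie in the upper tail. Under the usual 1.5 × IQR rule, the cutoff would be about 290 (Q3 of 120 plus 1.5 × 113). This is consistent with a few very large counties dominating the tail.

The scale is worth checking against total_pop. A median of 36 uninsured against a median population of 25,328 is roughly 0.14%, far below typical county uninsured rates. In addition, 17.2% of counties reporting zero uninsured looks implausible for an all-ages count. The column may therefore cover only a subgroup or use a different definition. Treatment: Confirm the definition at the source, then apply a log1p transform before modelling. log1p handles the zeros and tames the heavy right tail. high · anthropic:claude-opus-4-7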
n             3,222
nulls         0 (0.0%)
unique        584
min           0
max           20,915
mean          159.9
median        36
std           627.2
q1            7
q3            120
iqr           113
skew          17.81
kurtosis      462.9
n_outliers    368
outlier_rate  0.1142
zero_rate     0.1723
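The profile does not state the outlier rule behind n_outliers. Assuming the common Tukey rule (1.5 × IQR beyond the quartiles), the fences for uninsured_pop work out as below. The sketch also shows why log1p rather than log suits a column with 17.2% zeros. tukey_fences is a hypothetical helper.

```python
import math


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for uninsured_pop above.
lower, upper = tukey_fences(q1=7, q3=120)
print(lower, upper)  # -162.5 289.5

# log1p(x) = log(1 + x) maps zero counts to 0 instead of failing like log(0).
for count in (0, 36, 120, 20_915):
    print(count, round(math.log1p(count), 2))
```

The lower fence is negative, so under this rule every flagged outlier must sit in the upper tail.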

uninsured_rate

numeric feature high_skew outliers
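This is an uninsured rate per county, ranging from 0.0 to 3.7 with a median of 0.12. Its unit is ambiguous. Read as a proportion, the maximum of 3.7 would be impossible, since a proportion caps at 1.0. Read as a percentage, it fits the counts: the median uninsured_pop (36) over the median total_pop (25,328) is roughly 0.14%, close to the median rate of 0.12. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70). It has 230 outliers (7.1%) and 17.5% exact zeros, close to the 17.2% zero rate of uninsured_pop. Treatment: Confirm the unit by recomputing uninsured_pop / total_pop. Then apply a log1p transform (the zeros rule out a plain log) or winsorize before modelling. high · anthropic:claude-opus-4-7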
n             3,222
nulls         0 (0.0%)
unique        152
min           0
max           3.7
mean          0.2002
median        0.12
std           0.2829
q1            0.04
q3            0.25
iqr           0.21
skew          4.095
kurtosis      27.7
n_outliers    230
outlier_rate  0.07138
zero_rate     0.1754
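Here is a minimal sketch of the unit check. It assumes rows are (total_pop, uninsured_pop, uninsured_rate) tuples already parsed as numbers. rate_scale, classify_scale, and the toy rows are hypothetical. The median ratio is used because rates rounded to two decimals make individual ratios noisy for small counts.

```python
import statistics


def rate_scale(rows) -> float:
    """Median of reported rate / (uninsured_pop / total_pop).

    About 1 means the rate is a proportion; about 100 means a percentage.
    Rows with a zero population, count, or rate are skipped.
    """
    ratios = [
        rate / (uninsured / pop)
        for pop, uninsured, rate in rows
        if pop > 0 and uninsured > 0 and rate > 0
    ]
    return statistics.median(ratios)


def classify_scale(scale: float) -> str:
    if 0.5 <= scale <= 2:
        return "proportion"
    if 50 <= scale <= 200:
        return "percentage"
    return "unclear: inspect the rows"


# Hypothetical (total_pop, uninsured_pop, uninsured_rate) tuples, not real data.
toy = [(10_000, 12, 0.12), (50_000, 100, 0.20), (2_000, 74, 3.70), (500, 0, 0.0)]
s = rate_scale(toy)
print(round(s, 6), classify_scale(s))  # 100.0 percentage
```

The toy row with a rate of 3.70 shows how a value above 1.0 is unremarkable once the column is read as a percentage.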

poverty_rate

numeric feature high_skew
This is a numeric poverty rate, likely the percentage of the population in poverty. Values range from 1.6 to 66.32, and values above 1 rule out a proportion. The column has 3,222 rows with no nulls and 1,719 unique values. The distribution is right-skewed (skew 2.10, kurtosis 6.89), with a median of 13.55 and a mean of 15.10; 137 outliers (4.25%) sit in the upper tail. The high_skew alert reflects a long tail of high-poverty counties pulling the mean above the median. Because this column is on a 0 to 100 scale, check that uninsured_rate uses the same unit before comparing the two. Treatment: Consider a log or sqrt transform before regression to tame the right skew. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        1,719
min           1.6
max           66.32
mean          15.1
median        13.55
std           7.706
q1            10.16
q3            17.91
iqr           7.75
skew          2.096
kurtosis      6.891
n_outliers    137
outlier_rate  0.04252
zero_rate     0
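A plain skewness function is enough to compare the two suggested transforms. This sketch uses toy values (a doubling sequence), not poverty data, and the population (biased) skewness formula. The profiler's exact estimator is not stated, so its figures may differ slightly. Both transforms are safe for poverty_rate because its minimum is 1.6.

```python
import math


def skewness(xs):
    """Population (biased) skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Toy right-skewed values (a doubling sequence), not poverty data.
toy = [1, 2, 4, 8, 16, 32, 64]
print(round(skewness(toy), 2))                          # 1.28
print(round(skewness([math.sqrt(x) for x in toy]), 2))  # 0.74
print(skewness([math.log(x) for x in toy]))             # ~0, up to float noise
```

sqrt is the milder option and log the stronger one. For a moderate skew like poverty_rate's 2.10, it is worth checking that log does not overshoot into a left skew.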

rural

categorical feature
Binary flag indicating whether a county is rural, stored as the strings "True"/"False" rather than booleans. The split leans rural: 68.7% (2,212 of 3,222) versus 1,010 non-rural, with no nulls. An entropy ratio of 0.897 (1.0 would be a 50/50 split) confirms the split is skewed but far from degenerate. Treatment: Map "True"/"False" explicitly to 1/0 and use it directly as a feature. high · anthropic:claude-opus-4-7
n              3,222
nulls          0 (0.0%)
unique         2
top_value      True
top_rate       0.6865
cardinality    2
entropy        0.8971
entropy_ratio  0.8971
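Here is a minimal sketch of the cast, assuming values are read as strings. to_bool01 is a hypothetical helper. An explicit mapping matters because Python's bool() treats every non-empty string as true.

```python
def to_bool01(value: str) -> int:
    """Map the strings "True"/"False" to 1/0, rejecting anything else."""
    mapping = {"True": 1, "False": 0}
    try:
        return mapping[value.strip()]
    except KeyError:
        raise ValueError(f"unexpected rural value: {value!r}") from None


# bool() would be wrong here: any non-empty string is truthy.
print(bool("False"), to_bool01("False"))  # True 0
```

Raising on unexpected values surfaces spelling variants (for example "true" or "1") instead of silently mislabeling them.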

rural_category

categorical feature
Binary categorical flag splitting counties into 'Rural' (2,212, 68.7%) and 'Urban/Suburban' (1,010), with no nulls across 3,222 rows. The split is moderately imbalanced, but an entropy ratio of 0.90 indicates both classes are well represented. Its counts, top rate, and entropy are identical to those of rural, so it is very likely a relabelled copy of that flag. Treatment: Verify that the two columns agree row by row and keep only one. One-hot or binary-encode it for modelling, and consider stratifying splits on it. high · anthropic:claude-opus-4-7
n              3,222
nulls          0 (0.0%)
unique         2
top_value      Rural
top_rate       0.6865
cardinality    2
entropy        0.8971
entropy_ratio  0.8971
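The row-by-row check can be sketched as a test that the two columns form a one-to-one mapping. is_relabel and the sample pairs are hypothetical. On the real data, pass it zip(rural_values, rural_category_values).

```python
from collections import Counter


def is_relabel(pairs) -> bool:
    """True if each value of column A maps to exactly one value of B and vice versa."""
    a_to_b, b_to_a = {}, {}
    for a, b in pairs:
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical (rural, rural_category) pairs, not real rows.
pairs = [("True", "Rural"), ("False", "Urban/Suburban"), ("True", "Rural")]
print(Counter(pairs))     # cross-tab: only two of the four cells are filled
print(is_relabel(pairs))  # True
```

If the check passes on the full file, dropping one column loses no information.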

hospital_closure_risk_score

numeric feature
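Despite the numeric dtype, hospital_closure_risk_score takes only 3 distinct values across 3,222 rows. It spans 0 to 50, with a median of 25 and 28.8% zeros. It is effectively an ordinal risk band masquerading as a continuous score, so the reported mean of 21.69 and std of 16.34 describe the category mix rather than a smooth distribution.

If the three values are 0, 25, and 50, the mean and zero rate imply shares of roughly 28.8%, 55.6%, and 15.6%. The implied std (about 16.3) and skew (about 0.14) match the reported figures. The 15.6% share at 50 also matches the share of 'Moderate' in risk_category (503 of 3,222), which suggests that label is derived from this score. Treatment: Treat as an ordinal categorical with three levels (0 < 25 < 50) rather than a continuous numeric. high · anthropic:claude-opus-4-7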
n             3,222
nulls         0 (0.0%)
unique        3
min           0
max           50
mean          21.69
median        25
std           16.34
q1            0
q3            25
iqr           25
skew          0.1414
kurtosis      -0.6949
n_outliers    0
outlier_rate  0
zero_rate     0.2883
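The share calculation can be reproduced in a few lines, assuming the three values are exactly 0, 25, and 50 (the profile reports only min, median, and max). implied_shares is a hypothetical helper. It solves two linear equations for the shares of 25 and 50, given the zero rate and the mean. It then recomputes the population standard deviation as a consistency check.

```python
import math


def implied_shares(mean: float, zero_rate: float, mid: float = 25, top: float = 50):
    """Shares of (0, mid, top) implied by the mean and the share of zeros.

    Assumes the score takes exactly the three values 0, mid and top:
        p_mid + p_top = 1 - zero_rate
        mid * p_mid + top * p_top = mean
    """
    rest = 1 - zero_rate
    p_top = (mean - mid * rest) / (top - mid)
    p_mid = rest - p_top
    return zero_rate, p_mid, p_top


p0, p25, p50 = implied_shares(mean=21.69, zero_rate=0.2883)
levels = (0, 25, 50)
shares = (p0, p25, p50)
std = math.sqrt(sum(p * (x - 21.69) ** 2 for p, x in zip(shares, levels)))

print(f"{p0:.3f} {p25:.3f} {p50:.3f}")       # 0.288 0.556 0.156
print(f"std ≈ {std:.2f}")                     # std ≈ 16.33 (reported: 16.34)
print(f"Moderate share = {503 / 3222:.3f}")  # Moderate share = 0.156
```

The small gap between 16.33 and 16.34 is consistent with rounding in the reported mean and zero rate, plus the sample-versus-population std convention.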

risk_category

categorical label
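Binary risk classification labeling each county either Low or Moderate, with no nulls across 3,222 rows. The distribution is heavily imbalanced: 84.4% are Low (2,719) versus only 503 Moderate (15.6%), and no High tier appears at all. An entropy ratio of 0.62 confirms the skew. The Moderate share matches the share of counties with hospital_closure_risk_score = 50, so the label is probably a threshold on that score. If so, the score must not be used as a feature when predicting this label. Treatment: Treat as a binary target; account for class imbalance via stratified sampling or class weighting. high · anthropic:claude-opus-4-7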
n              3,222
nulls          0 (0.0%)
unique         2
top_value      Low
top_rate       0.8439
cardinality    2
entropy        0.6249
entropy_ratio  0.6249
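For the class-weighting option, a common heuristic gives each class the weight n_samples / (n_classes × class_count). This is the formula scikit-learn uses for class_weight="balanced". The sketch applies it to the reported counts; balanced_class_weights is a hypothetical helper name.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """n_samples / (n_classes * count) per class: rarer classes get larger weights."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


weights = balanced_class_weights({"Low": 2719, "Moderate": 503})
print({label: round(w, 3) for label, w in weights.items()})
# {'Low': 0.592, 'Moderate': 3.203}
```

With these weights each class contributes the same total weight (2,719 × 0.5925 ≈ 503 × 3.2028 ≈ 1,611). A Moderate county therefore counts about 5.4 times as much as a Low one.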