data trove healthcare deserts

source /home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv · 3,222 rows · 10 columns · profiled 2026-06-22

Reading

dataset summary · high confidence anthropic:default

This dataset covers healthcare access indicators for 3,222 U.S. counties and county equivalents, combining population size, uninsured counts and rates, poverty rates, and hospital closure risk scores. The most striking pattern is the extreme skew in both total population and uninsured population: the median county has just 25,328 residents and 36 uninsured individuals, yet the maxima reach nearly 10 million people and 20,915 uninsured, so a small number of large counties dominate the raw counts. Two things warrant a closer look. First, 84% of counties are rated 'Low' hospital closure risk, and the underlying score takes only 3 unique values, with nearly 29% of counties scoring exactly zero, so the score is much coarser than its numeric type suggests. Second, 69% of counties are classified as Rural, and uninsured_rate runs from 0 to 3.7 (median 0.12) with heavy right skew; the profile does not state the unit, so confirm the scale before reading the tail as pockets of severe coverage gaps worth isolating geographically.

citing: row_count · column_count · total_pop.stats.median · total_pop.stats.max · uninsured_pop.stats.median · uninsured_pop.stats.max · uninsured_rate.stats.max · uninsured_rate.stats.median · hospital_closure_risk_score.n_unique · hospital_closure_risk_score.stats.zero_rate · risk_category.top_value · risk_category.top_rate · rural_category.top_value · rural_category.top_rate · poverty_rate.stats.median · poverty_rate.stats.max

Charts the summary said to look at first

uninsured_rate · Look for the heavy right tail — most counties cluster near low uninsured rates, but extreme outliers signal counties with severe coverage gaps.

Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1

risk_category · Notice how overwhelmingly 'Low' dominates — only about 16% of counties carry Moderate hospital closure risk.

Top values for risk_category (2 unique shown, of 2 total).
value	count	share
Low	2719	84.4%
Moderate	503	15.6%

rural_category · Nearly 7 in 10 counties are Rural, which sets important context for interpreting healthcare access shortfalls across the dataset.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

poverty_rate · The distribution is right-skewed with a median near 13.6%. Watch the upper tail: 65 counties fall in the bins above 40.43%, and the maximum is 66.32%.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

hospital_closure_risk_score · With only 3 unique values (0, 25, 50), this chart shows 929 counties (nearly 29%) scoring zero, 1,790 scoring 25, and 503 scoring 50, which points to a coarse three-tier measure rather than a continuous score.

Histogram bins for hospital_closure_risk_score (median: 25.0).
bin	count
0 – 1.25	929
25 – 26.25	1790
48.75 – 50	503
(the other 37 of 40 bins are empty)

Schema

10 columns

Per-column summary.
column	type	nulls	unique	alerts
fips	numeric	0.0%	3,222
county_name	text	0.0%	3,222	near_unique
total_pop	numeric	0.0%	3,141	high_skew outliers
uninsured_pop	numeric	0.0%	584	high_skew outliers
uninsured_rate	numeric	0.0%	152	high_skew outliers
poverty_rate	numeric	0.0%	1,719	high_skew
rural	categorical	0.0%	2
rural_category	categorical	0.0%	2
hospital_closure_risk_score	numeric	0.0%	3
risk_category	categorical	0.0%	2

fips

numeric identifier

This column contains US county FIPS (Federal Information Processing Standard) codes. These are 5-digit identifiers (a 2-digit state code plus a 3-digit county code); because the column is stored as a number, leading zeros are dropped and a code such as 01001 appears as the 4-digit value 1,001. Every one of the 3,222 rows has a unique FIPS code with no nulls, a count consistent with the ~3,143 US counties and county equivalents plus Puerto Rico (the max of 72153 falls under state code 72, Puerto Rico). Despite the numeric storage, FIPS codes are categorical identifiers, so arithmetic on them is meaningless, and the near-uniform distribution (skew of 0.157, kurtosis of -0.63) simply reflects the sequential geographic assignment of codes. Treatment: Cast to a zero-padded 5-character string and use as a geographic join key; do not use as a numeric feature. high · anthropic:default
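As a minimal sketch of the treatment above, assuming the codes arrive as integers with leading zeros dropped, the following restores the 5-character form and splits out the state prefix. The helper names `normalize_fips` and `state_prefix` are illustrative and not part of the profiler.

```python
def normalize_fips(code: int) -> str:
    """Return a county FIPS code as a zero-padded 5-character string."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a valid county FIPS code: {code!r}")
    return f"{code:05d}"


def state_prefix(fips: str) -> str:
    """Return the 2-digit state portion of a normalized county FIPS code."""
    return fips[:2]


# Values taken from the profile: min 1,001 and max 72,153.
print(normalize_fips(1001))                 # 01001
print(state_prefix(normalize_fips(72153)))  # 72
```

Joining on the padded string avoids silent mismatches against reference tables that store FIPS codes as text.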

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0

county_name

text label near_unique

This column contains fully qualified county name strings, almost certainly formatted as '[Name] County, [State]'. The evidence is the token 'county,' appearing in 2,999 of 3,222 rows, a mean string length of ~24 characters, and a mean word count of ~3.25. The remaining 223 rows presumably use other county-equivalent terms (for example parishes, boroughs, independent cities, or Puerto Rico municipios). Every row is unique (n_unique = 3,222, duplicate_rate = 0.0), triggering the near-unique alert, which is expected for a geographic identifier combining county and state. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). These counts mostly reflect how many counties each state has rather than overrepresentation in the dataset, and a token count can also pick up other names containing the word (for example 'West Virginia'). Treatment: Parse into county and state components for join or groupby operations; do not treat as a free-text feature. high · anthropic:default
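The parsing step could look like the sketch below. It assumes every value follows the '[Name] County, [State]' pattern inferred above, with the state after the last comma; the function name `split_county_name` and the sample strings are illustrative, not rows read from the file.

```python
def split_county_name(full_name: str) -> tuple[str, str]:
    """Split '[Name] County, [State]' into (county, state).

    Splits on the last ', ' so county names that contain commas stay intact.
    """
    county, sep, state = full_name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {full_name!r}")
    return county.strip(), state.strip()


# Illustrative inputs in the assumed format.
print(split_county_name("Autauga County, Alabama"))
print(split_county_name("Adjuntas Municipio, Puerto Rico"))
```

Raising on a missing separator surfaces rows that break the assumed format instead of silently producing an empty state.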

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0

total_pop

numeric feature high_skew outliers

This column holds the total population of each county. The distribution is severely right-skewed (skew=13.38, kurtosis=298.69): the median is 25,328 while the mean is 102,232, a gap driven by a small number of very large counties, and the maximum of 9,866,623 is roughly 390× the median. With 453 outliers (14.1% of rows) and a standard deviation of 326,934, raw values will distort any distance- or variance-sensitive model. Treatment: Log-transform (log1p) before modelling to compress the extreme right tail. high · anthropic:default
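To illustrate how much log1p compresses the tail, the sketch below applies it to the three summary statistics reported above rather than to the full column, which is not reproduced here. The variable names are illustrative.

```python
import math

# Summary statistics copied from the total_pop profile.
stats = {"min": 47, "median": 25_328, "max": 9_866_623}

# log1p(x) = ln(1 + x); it is defined at zero and inverted by expm1.
logged = {name: math.log1p(value) for name, value in stats.items()}

raw_ratio = stats["max"] / stats["median"]
log_ratio = logged["max"] / logged["median"]

print(f"raw max/median:   {raw_ratio:.1f}")
print(f"log1p max/median: {log_ratio:.2f}")

# Round-trip check: expm1 undoes log1p, so predictions can be mapped back.
restored = math.expm1(logged["median"])
```

On the raw scale the largest county is hundreds of times the median; after the transform the two values differ by less than a factor of two.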

n: 3,222
nulls: 0 (0.0%)
unique: 3,141
min: 47
max: 9.867e+06
mean: 1.022e+05
median: 25,328
std: 3.269e+05
q1: 1.061e+04
q3: 65,190
iqr: 5.458e+04
skew: 13.38
kurtosis: 298.7
n_outliers: 453
outlier_rate: 0.1406
zero_rate: 0

uninsured_pop

numeric feature high_skew outliers

This column holds the count of uninsured individuals in each county. The distribution is extremely right-skewed (skew=17.81, kurtosis=462.87): the median is just 36 while the mean is 159.95 and the max reaches 20,915, so a small number of very large counties dominate the tail. Notably, 17.2% of rows have a zero value and 11.4% are flagged as outliers (368 rows). A median of 36 uninsured against a median population of 25,328 is only about 0.14%, far lower than a county-wide uninsured share would normally be, so the column may count a subgroup or a sample; confirm its definition with the source. Treatment: Log-transform (e.g., log1p) before regression or clustering to reduce skew; when normalizing by total_pop, compare the result against the existing uninsured_rate column. high · anthropic:default

n: 3,222
nulls: 0 (0.0%)
unique: 584
min: 0
max: 20,915
mean: 159.9
median: 36
std: 627.2
q1: 7
q3: 120
iqr: 113
skew: 17.81
kurtosis: 462.9
n_outliers: 368
outlier_rate: 0.1142
zero_rate: 0.1723

uninsured_rate

numeric feature high_skew outliers

This column holds an uninsured rate, presumably the share of each county's population lacking health insurance. With only 152 unique values across 3,222 rows, the data appears rounded. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with a median of 0.12, a mean of 0.20, and a max of 3.7. Values above 1.0 rule out a 0–1 proportion for at least some rows. The figures are consistent with a percent scale: the median uninsured_pop over the median total_pop (36 / 25,328) is about 0.14%, close to the median rate of 0.12. That is only a plausibility check on summary statistics, not row-level proof, and 230 rows (7.1%) are flagged as outliers. The 17.5% zero rate is close to the 17.2% zero rate of uninsured_pop, so most zeros probably come from counties with a zero uninsured count, though they could also be missing data coded as zero. Treatment: Confirm the scale against uninsured_pop / total_pop, recode zeros if they represent missingness, then log-transform or apply a bounded transformation before modelling. medium · anthropic:default
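One way to settle the scale question is to recompute the rate per row and see which scale reproduces the stored value. The sketch below is a hypothetical helper (`infer_rate_scale` is not part of the profiler) that assumes uninsured_rate is derived from uninsured_pop and total_pop; the sample rows are invented for illustration.

```python
from statistics import median


def infer_rate_scale(rows):
    """Guess whether uninsured_rate is a fraction (0-1) or a percent (0-100).

    rows: iterable of (total_pop, uninsured_pop, uninsured_rate) tuples.
    Returns "fraction" or "percent", whichever reproduces the stored rate
    with the smaller median absolute error.
    """
    errors = {"fraction": [], "percent": []}
    for total_pop, uninsured_pop, rate in rows:
        if total_pop <= 0:
            continue  # a rate cannot be recomputed without a denominator
        share = uninsured_pop / total_pop
        errors["fraction"].append(abs(rate - share))
        errors["percent"].append(abs(rate - 100 * share))
    if not errors["fraction"]:
        raise ValueError("no rows with a positive total_pop")
    return min(errors, key=lambda scale: median(errors[scale]))


# Hypothetical rows for illustration only, not values read from the CSV.
sample = [(25_000, 30, 0.12), (10_000, 7, 0.07), (60_000, 150, 0.25)]
print(infer_rate_scale(sample))
```

Using the median error keeps a handful of inconsistent rows from deciding the answer; if neither scale fits well, the rate likely comes from a different numerator or denominator.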

n: 3,222
nulls: 0 (0.0%)
unique: 152
min: 0
max: 3.7
mean: 0.2002
median: 0.12
std: 0.2829
q1: 0.04
q3: 0.25
iqr: 0.21
skew: 4.095
kurtosis: 27.7
n_outliers: 230
outlier_rate: 0.07138
zero_rate: 0.1754

poverty_rate

numeric feature high_skew

This column holds a poverty rate (likely the percentage of each county's population below a poverty threshold) across 3,222 records with no nulls and 1,719 unique values. The distribution is right-skewed (skew = 2.10, kurtosis = 6.89), with a median of 13.55% and a mean pulled up to 15.10% by a long upper tail reaching 66.32%, nearly 5× the median. There are 137 flagged outliers (4.25% of rows), concentrated in that upper tail, which likely represent unusually deprived counties and warrant special attention in any model. Treatment: Log-transform or apply a Box-Cox transformation before regression to reduce skew; inspect the 137 outliers above the upper fence for data quality or genuine extreme cases. high · anthropic:default
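The profile does not say which outlier rule it uses. Assuming the common Tukey rule (1.5 × IQR beyond the quartiles), the fences can be recomputed from the reported quartiles as a quick sanity check; `tukey_fences` is an illustrative name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the poverty_rate profile (q1 = 10.16, q3 = 17.91).
lower, upper = tukey_fences(10.16, 17.91)
print(f"lower fence: {lower:.3f}, upper fence: {upper:.3f}")
```

Under this assumption the upper fence sits near 29.5%, and the lower fence is negative, below the observed minimum of 1.6, so only the upper tail can be flagged.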

n: 3,222
nulls: 0 (0.0%)
unique: 1,719
min: 1.6
max: 66.32
mean: 15.1
median: 13.55
std: 7.706
q1: 10.16
q3: 17.91
iqr: 7.75
skew: 2.096
kurtosis: 6.891
n_outliers: 137
outlier_rate: 0.04252
zero_rate: 0

rural

categorical feature

This column is a binary flag indicating whether a county is rural, stored as the strings 'True'/'False' rather than a native boolean. The dominant class is 'True' (2,212 of 3,222 rows, ~68.7%), so the dataset is weighted toward rural counties and analysts should account for this imbalance in any comparative or predictive analysis. Its top_rate (0.6865) and entropy (0.8971) are identical to those of rural_category, so the two columns appear to encode the same information; keep only one as a model feature. Treatment: Cast to boolean, then use as a binary feature or stratification variable; monitor class imbalance (~2:1 rural vs. non-rural) during modelling. high · anthropic:default
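The cast needs an explicit mapping, because Python treats any non-empty string as true. A minimal sketch, assuming the only values are the strings 'True' and 'False' (the helper name `parse_bool` is illustrative):

```python
def parse_bool(text: str) -> bool:
    """Convert the strings 'True'/'False' to a real boolean.

    bool("False") is True in Python because any non-empty string is truthy,
    so the mapping has to be explicit.
    """
    mapping = {"true": True, "false": False}
    try:
        return mapping[text.strip().lower()]
    except KeyError:
        raise ValueError(f"unexpected rural flag: {text!r}") from None


flags = [parse_bool(value) for value in ["True", "False", "True"]]
print(flags)  # [True, False, True]
```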

n: 3,222
nulls: 0 (0.0%)
unique: 2
top_value: True
top_rate: 0.6865
cardinality: 2
entropy: 0.8971
entropy_ratio: 0.8971

rural_category

categorical label

This column is a binary geographic classification indicating whether a record is from a Rural or Urban/Suburban setting. Across 3,222 records with no nulls, 'Rural' dominates at 68.7% (2,212 records) versus 'Urban/Suburban' at 31.3% (1,010 records), representing a meaningful class imbalance. The entropy ratio of 0.897 confirms the distribution is moderately skewed but not extreme. Analysts should be aware this imbalance could bias models trained without stratification or reweighting. Treatment: One-hot or binary encode; consider stratified sampling or class weighting to address 69/31 imbalance. high · anthropic:default
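The entropy figures can be reproduced from the category counts. The sketch below assumes the profiler reports Shannon entropy in bits and divides by log2 of the number of categories for the ratio, which the profile does not state explicitly; the function names are illustrative.

```python
import math


def shannon_entropy(counts: list[int]) -> float:
    """Shannon entropy in bits of a categorical distribution given by counts."""
    total = sum(counts)
    probabilities = [count / total for count in counts if count > 0]
    return -sum(p * math.log2(p) for p in probabilities)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(number of categories).

    Assumes at least two categories, otherwise the maximum is zero.
    """
    return shannon_entropy(counts) / math.log2(len(counts))


# Counts from the rural_category table: Rural 2,212, Urban/Suburban 1,010.
print(round(entropy_ratio([2212, 1010]), 4))
```

For a two-category column the maximum entropy is 1 bit, so entropy and entropy ratio coincide, which matches the identical values reported above. A ratio of 1.0 would mean a perfect 50/50 split.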

n: 3,222
nulls: 0 (0.0%)
unique: 2
top_value: Rural
top_rate: 0.6865
cardinality: 2
entropy: 0.8971
entropy_ratio: 0.8971

hospital_closure_risk_score

numeric feature

This column is presented as a continuous risk score but contains only 3 unique values across 3,222 rows, almost certainly 0, 25, and 50: the min and Q1 are 0, the median and Q3 are 25, and the max is 50. This makes it a de facto ordinal categorical variable with three tiers despite its numeric framing. The histogram confirms the tier structure, with every row falling in one of three bins (929 at 0, or 28.8% of records; 1,790 at 25; and 503 at 50). The 503 rows in the top tier match the 503 'Moderate' rows in risk_category, which suggests that label is derived from this score. Treatment: Treat as ordinal categorical with three levels (0/25/50); one-hot encode or ordinal-encode before modelling rather than using raw numeric values. high · anthropic:default
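The three-tier reading can be checked against the reported statistics. The sketch below assumes the three non-empty histogram bins hold exactly the values 0, 25, and 50, then builds an ordinal code and recomputes the mean and zero rate; the variable names are illustrative.

```python
# Bin counts copied from the hospital_closure_risk_score histogram.
tier_counts = {0: 929, 25: 1790, 50: 503}

# Ordinal codes keep the ordering 0 < 25 < 50 without implying that
# the gaps between tiers are meaningful distances.
ordinal_code = {score: rank for rank, score in enumerate(sorted(tier_counts))}

n_rows = sum(tier_counts.values())
mean_score = sum(score * count for score, count in tier_counts.items()) / n_rows
zero_rate = tier_counts[0] / n_rows

print(ordinal_code)          # {0: 0, 25: 1, 50: 2}
print(n_rows)                # 3222
print(round(mean_score, 2))  # 21.69
print(round(zero_rate, 4))   # 0.2883
```

The recomputed mean and zero rate agree with the profile's 21.69 and 0.2883, which supports the assumption that no other score values are present.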

n: 3,222
nulls: 0 (0.0%)
unique: 3
min: 0
max: 50
mean: 21.69
median: 25
std: 16.34
q1: 0
q3: 25
iqr: 25
skew: 0.1414
kurtosis: -0.6949
n_outliers: 0
outlier_rate: 0
zero_rate: 0.2883

risk_category

categorical label

This column is a binary risk classification label with exactly two categories: 'Low' and 'Moderate'. The distribution is heavily skewed: 'Low' accounts for 84.4% of all 3,222 rows (2,719 records), leaving only 503 'Moderate' cases. The complete absence of higher risk tiers (e.g., 'High', 'Critical') is notable and may indicate data filtering, a low-risk population, or an incomplete taxonomy. The 503 'Moderate' count equals the number of rows in the top bin of hospital_closure_risk_score, so the label appears to be derived from that score; using one to predict the other would leak the answer. With zero nulls and only two values, the column is clean but class-imbalanced. Treatment: Encode as binary (0/1); if it is used as a model target, apply class-imbalance handling (e.g., class weights or SMOTE) before modelling. high · anthropic:default
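As one illustration of class weighting, the sketch below computes inverse-frequency weights with the n_samples / (n_classes × class_count) heuristic, the same formula scikit-learn uses for `class_weight="balanced"`. It is one option among several, and the function name `balanced_class_weights` is illustrative.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Counts from the risk_category table: Low 2,719, Moderate 503.
weights = balanced_class_weights({"Low": 2719, "Moderate": 503})
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these counts each 'Moderate' row carries roughly 5.4 times the weight of a 'Low' row, so both classes contribute equally to the total weighted loss.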

n: 3,222
nulls: 0 (0.0%)
unique: 2
top_value: Low
top_rate: 0.8439
cardinality: 2
entropy: 0.6249
entropy_ratio: 0.6249