data trove healthcare deserts
Reading
This dataset covers healthcare access indicators for 3,222 U.S. counties, combining population size, uninsured counts and rates, poverty rates, and hospital closure risk scores. The most striking pattern is the extreme right skew in both total population and uninsured population: the median county has just 25,328 residents and 36 uninsured individuals, yet the maxima reach nearly 10 million people and 20,915 uninsured, so a small number of large counties dominate the raw counts. Two things warrant a closer look. First, 84% of counties are rated 'Low' hospital closure risk and nearly 29% score exactly zero on the closure risk score, which takes only 3 unique values, so the score is a coarse tier rather than a continuous measure. Second, 69% of counties are classified as Rural, and uninsured_rate runs from 0 to 3.7 with heavy right skew (median 0.12). The unit of that column is not documented, so whether 3.7 means 3.7% or 370% has to be settled before the high-rate counties are isolated geographically.
citing: row_count · column_count · total_pop.stats.median · total_pop.stats.max · uninsured_pop.stats.median · uninsured_pop.stats.max · uninsured_rate.stats.max · uninsured_rate.stats.median · hospital_closure_risk_score.n_unique · hospital_closure_risk_score.stats.zero_rate · risk_category.top_value · risk_category.top_rate · rural_category.top_value · rural_category.top_rate · poverty_rate.stats.median · poverty_rate.stats.max
Charts the summary said to look at first
uninsured_rate: histogram (40 bins)
| bin | count |
|---|---|
| 0 – 0.0925 | 1403 |
| 0.0925 – 0.185 | 704 |
| 0.185 – 0.2775 | 403 |
| 0.2775 – 0.37 | 213 |
| 0.37 – 0.4625 | 158 |
| 0.4625 – 0.555 | 101 |
| 0.555 – 0.6475 | 65 |
| 0.6475 – 0.74 | 43 |
| 0.74 – 0.8325 | 27 |
| 0.8325 – 0.925 | 23 |
| 0.925 – 1.018 | 9 |
| 1.018 – 1.11 | 15 |
| 1.11 – 1.202 | 14 |
| 1.202 – 1.295 | 5 |
| 1.295 – 1.387 | 7 |
| 1.387 – 1.48 | 7 |
| 1.48 – 1.573 | 5 |
| 1.573 – 1.665 | 2 |
| 1.665 – 1.758 | 4 |
| 1.758 – 1.85 | 1 |
| 1.85 – 1.942 | 1 |
| 1.942 – 2.035 | 1 |
| 2.035 – 2.127 | 2 |
| 2.127 – 2.22 | 2 |
| 2.22 – 2.312 | 1 |
| 2.312 – 2.405 | 0 |
| 2.405 – 2.498 | 0 |
| 2.498 – 2.59 | 1 |
| 2.59 – 2.683 | 0 |
| 2.683 – 2.775 | 1 |
| 2.775 – 2.868 | 0 |
| 2.868 – 2.96 | 1 |
| 2.96 – 3.052 | 1 |
| 3.052 – 3.145 | 0 |
| 3.145 – 3.237 | 1 |
| 3.237 – 3.33 | 0 |
| 3.33 – 3.422 | 0 |
| 3.422 – 3.515 | 0 |
| 3.515 – 3.607 | 0 |
| 3.607 – 3.7 | 1 |
risk_category: value counts
| value | count | share |
|---|---|---|
| Low | 2719 | 84.4% |
| Moderate | 503 | 15.6% |
rural_category: value counts
| value | count | share |
|---|---|---|
| Rural | 2212 | 68.7% |
| Urban/Suburban | 1010 | 31.3% |
poverty_rate: histogram (40 bins)
| bin | count |
|---|---|
| 1.6 – 3.218 | 7 |
| 3.218 – 4.836 | 34 |
| 4.836 – 6.454 | 106 |
| 6.454 – 8.072 | 246 |
| 8.072 – 9.69 | 320 |
| 9.69 – 11.31 | 354 |
| 11.31 – 12.93 | 393 |
| 12.93 – 14.54 | 364 |
| 14.54 – 16.16 | 306 |
| 16.16 – 17.78 | 262 |
| 17.78 – 19.4 | 192 |
| 19.4 – 21.02 | 149 |
| 21.02 – 22.63 | 123 |
| 22.63 – 24.25 | 91 |
| 24.25 – 25.87 | 52 |
| 25.87 – 27.49 | 44 |
| 27.49 – 29.11 | 34 |
| 29.11 – 30.72 | 23 |
| 30.72 – 32.34 | 18 |
| 32.34 – 33.96 | 14 |
| 33.96 – 35.58 | 6 |
| 35.58 – 37.2 | 8 |
| 37.2 – 38.81 | 3 |
| 38.81 – 40.43 | 8 |
| 40.43 – 42.05 | 5 |
| 42.05 – 43.67 | 9 |
| 43.67 – 45.29 | 4 |
| 45.29 – 46.9 | 11 |
| 46.9 – 48.52 | 7 |
| 48.52 – 50.14 | 8 |
| 50.14 – 51.76 | 2 |
| 51.76 – 53.38 | 6 |
| 53.38 – 54.99 | 5 |
| 54.99 – 56.61 | 5 |
| 56.61 – 58.23 | 1 |
| 58.23 – 59.85 | 0 |
| 59.85 – 61.47 | 0 |
| 61.47 – 63.08 | 0 |
| 63.08 – 64.7 | 1 |
| 64.7 – 66.32 | 1 |
hospital_closure_risk_score: histogram (40 bins)
| bin | count |
|---|---|
| 0 – 1.25 | 929 |
| 1.25 – 2.5 | 0 |
| 2.5 – 3.75 | 0 |
| 3.75 – 5 | 0 |
| 5 – 6.25 | 0 |
| 6.25 – 7.5 | 0 |
| 7.5 – 8.75 | 0 |
| 8.75 – 10 | 0 |
| 10 – 11.25 | 0 |
| 11.25 – 12.5 | 0 |
| 12.5 – 13.75 | 0 |
| 13.75 – 15 | 0 |
| 15 – 16.25 | 0 |
| 16.25 – 17.5 | 0 |
| 17.5 – 18.75 | 0 |
| 18.75 – 20 | 0 |
| 20 – 21.25 | 0 |
| 21.25 – 22.5 | 0 |
| 22.5 – 23.75 | 0 |
| 23.75 – 25 | 0 |
| 25 – 26.25 | 1790 |
| 26.25 – 27.5 | 0 |
| 27.5 – 28.75 | 0 |
| 28.75 – 30 | 0 |
| 30 – 31.25 | 0 |
| 31.25 – 32.5 | 0 |
| 32.5 – 33.75 | 0 |
| 33.75 – 35 | 0 |
| 35 – 36.25 | 0 |
| 36.25 – 37.5 | 0 |
| 37.5 – 38.75 | 0 |
| 38.75 – 40 | 0 |
| 40 – 41.25 | 0 |
| 41.25 – 42.5 | 0 |
| 42.5 – 43.75 | 0 |
| 43.75 – 45 | 0 |
| 45 – 46.25 | 0 |
| 46.25 – 47.5 | 0 |
| 47.5 – 48.75 | 0 |
| 48.75 – 50 | 503 |
Schema
10 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| total_pop | numeric | 0.0% | 3,141 | high_skew, outliers |
| uninsured_pop | numeric | 0.0% | 584 | high_skew, outliers |
| uninsured_rate | numeric | 0.0% | 152 | high_skew, outliers |
| poverty_rate | numeric | 0.0% | 1,719 | high_skew |
| rural | categorical | 0.0% | 2 | |
| rural_category | categorical | 0.0% | 2 | |
| hospital_closure_risk_score | numeric | 0.0% | 3 | |
| risk_category | categorical | 0.0% | 2 | |
fips
numeric identifier
This column contains US FIPS (Federal Information Processing Standard) county codes. A county FIPS code has five digits (a two-digit state code followed by a three-digit county code); because the column is stored as a number, leading zeros are dropped and codes for states 01 through 09 appear with four digits (the min of 1,001 corresponds to 01001). Every one of the 3,222 rows has a unique FIPS code with no nulls, close to the ~3,143 US counties plus territories (the max of 72,153 indicates Puerto Rico codes are included). Despite being stored as a numeric type, FIPS codes are categorical identifiers, so arithmetic on them is meaningless, and the near-uniform distribution (skew of 0.157, kurtosis of -0.63) simply reflects the sequential geographic assignment of codes. Treatment: Cast to a zero-padded five-character string and use as a geographic join key; do not use as a numeric feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
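The cast recommended in the treatment can be sketched as follows. This is a minimal illustration using only the standard library; the helper name `fips_to_str` is hypothetical and not part of the report's tooling.

```python
def fips_to_str(code: int) -> str:
    """Return a county FIPS code as a zero-padded 5-character string.

    Numeric storage drops leading zeros, so 1001 must become "01001".
    """
    if not 0 < code <= 99999:
        raise ValueError(f"not a valid 5-digit county FIPS code: {code}")
    return str(code).zfill(5)


# The min and max reported for this column.
print(fips_to_str(1001))
print(fips_to_str(72153))
```

If the data is held in a pandas DataFrame (an assumption; the report does not say how it is loaded), `df["fips"].astype(str).str.zfill(5)` performs the same padding column-wise.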
county_name
text label near_unique
This column contains fully qualified county name strings, apparently in a 'Name County, State' pattern: the token 'county,' appears in 2,999 of 3,222 rows, the mean string length is ~24 characters, and the mean word count is ~3.25. The other 223 rows lack that token and should be inspected before parsing. Every row is unique (n_unique = 3,222, duplicate_rate = 0.0), which triggers the near_unique alert and is expected for a geographic identifier combining county and state. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). These counts largely track how many counties each state has (and, if they are word counts, 'Virginia' also matches West Virginia), so they are not by themselves evidence that those states are overrepresented. Treatment: Parse into county and state components for join or groupby operations; do not treat as a free-text feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
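A minimal sketch of the parsing step named in the treatment, assuming the county and state parts are separated by a comma and a space. The helper name and the sample string are illustrative only; the sample is not copied from the dataset.

```python
def split_county_name(full_name: str) -> tuple[str, str]:
    """Split 'Some County, State' into (county, state).

    Splits on the LAST ', ' so that a comma inside the county part
    does not shift the state into the wrong field.
    """
    county, sep, state = full_name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator found in {full_name!r}")
    return county.strip(), state.strip()


# Illustrative input, assumed to follow the pattern described above.
print(split_county_name("Autauga County, Alabama"))
```

Raising on a missing separator, rather than returning a partial result, makes rows that break the assumed pattern visible instead of silently producing an empty state.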
total_pop
numeric feature high_skew outliers
This column holds the total population of each county. The distribution is severely right-skewed (skew=13.38, kurtosis=298.69): the median is 25,328 while the mean is 102,232, a gap driven by a small number of very large counties. The maximum of 9,866,623 is roughly 390× the median. With 453 outliers (14.1% of rows) and a standard deviation of 326,934, raw values will distort any distance- or variance-sensitive model. Treatment: Log-transform (log1p) before modelling to compress the extreme right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,141
- min: 47
- max: 9.867e+06
- mean: 1.022e+05
- median: 25,328
- std: 3.269e+05
- q1: 1.061e+04
- q3: 65,190
- iqr: 5.458e+04
- skew: 13.38
- kurtosis: 298.7
- n_outliers: 453
- outlier_rate: 0.1406
- zero_rate: 0
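To illustrate how much the log1p treatment compresses the tail, the sketch below applies it to the three summary values reported for this column (min, median, max). It works on the summary statistics only, not on the underlying rows.

```python
import math

# Summary values reported for total_pop.
reported = {"min": 47, "median": 25_328, "max": 9_866_623}

# log1p(x) = ln(1 + x); it stays defined at x = 0, unlike a plain log.
logged = {name: math.log1p(value) for name, value in reported.items()}

raw_ratio = reported["max"] / reported["median"]
log_ratio = logged["max"] / logged["median"]

print(f"max/median before transform: {raw_ratio:.1f}")
print(f"max/median after log1p:      {log_ratio:.2f}")
```

On the raw scale the largest county is about 390 times the median; after the transform the same ratio is about 1.6, which is why distance-based methods behave better on the logged values.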
uninsured_pop
numeric feature high_skew outliers
This column holds the count of uninsured individuals in each county. The distribution is extremely right-skewed (skew=17.81, kurtosis=462.87): the median is just 36 while the mean is 159.95 and the max reaches 20,915, so a small number of very large counties dominate the tail. Notably, 17.2% of rows have a zero value and 11.4% are flagged as outliers (368 rows), suggesting a mix of very small or fully insured counties alongside a few large uninsured concentrations. The counts are small relative to population (the median of 36 against a median population of 25,328 is about 0.14%), so confirm the column's definition before reading it as the full uninsured population. Treatment: Log-transform (e.g., log1p) before regression or clustering to reduce skew; consider normalizing by total population to produce an uninsured rate.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 584
- min: 0
- max: 20,915
- mean: 159.9
- median: 36
- std: 627.2
- q1: 7
- q3: 120
- iqr: 113
- skew: 17.81
- kurtosis: 462.9
- n_outliers: 368
- outlier_rate: 0.1142
- zero_rate: 0.1723
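The normalization suggested in the treatment can be sketched as below. The function name is hypothetical, and the example pairs the two column medians purely to show the order of magnitude; a ratio of medians is not the median of the per-county rates.

```python
from typing import Optional


def uninsured_rate_pct(uninsured_pop: int, total_pop: int) -> Optional[float]:
    """Return uninsured_pop as a percentage of total_pop.

    Returns None when the population is not positive, so the caller
    decides how to treat such rows instead of hitting a division error.
    """
    if total_pop <= 0:
        return None
    return 100.0 * uninsured_pop / total_pop


# Median uninsured_pop over median total_pop, as a rough scale check only.
print(uninsured_rate_pct(36, 25_328))
```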
uninsured_rate
numeric feature high_skew outliers
This column is the uninsured rate, the share of a county's population without health insurance. With only 152 unique values across 3,222 rows, the data appears rounded (the reported median, quartiles, and max all have at most two decimals). The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with a median of 0.12, a mean of 0.20, and a max of 3.7. A max above 1.0 rules out a plain proportion for at least some rows. The likelier reading is that values are percentages, since 36 / 25,328 ≈ 0.14% (median uninsured_pop over median total_pop) is on the same scale as the 0.12 median, though mixed scales or genuine errors among the 230 flagged outliers (7.1% of rows) cannot be excluded without recomputing the rate. The 17.5% zero rate is close to the 17.2% zero rate of uninsured_pop, so most zeros probably reflect zero counts rather than missing data, but this should be verified. Treatment: Recompute the rate from uninsured_pop and total_pop to confirm the scale, recode zeros only if they prove to represent missingness, then log-transform or apply a bounded transformation before modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 152
- min: 0
- max: 3.7
- mean: 0.2002
- median: 0.12
- std: 0.2829
- q1: 0.04
- q3: 0.25
- iqr: 0.21
- skew: 4.095
- kurtosis: 27.7
- n_outliers: 230
- outlier_rate: 0.07138
- zero_rate: 0.1754
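One way to investigate the scale question is to recompute the rate from the two count columns and see which scaling the reported values agree with. The sketch below is one possible check, not the profiler's method; the function name and the sample rows are made up for illustration.

```python
from statistics import median


def guess_rate_scale(rows):
    """Guess whether reported rates are proportions or percentages.

    rows: iterable of (uninsured_pop, total_pop, reported_rate).
    Compares each reported rate with the recomputed fraction under both
    scalings and returns the label with the smaller median absolute error.
    """
    errors = {"proportion": [], "percent": []}
    for uninsured, total, reported in rows:
        if total <= 0:
            continue  # rate is undefined without a positive population
        fraction = uninsured / total
        errors["proportion"].append(abs(reported - fraction))
        errors["percent"].append(abs(reported - 100 * fraction))
    if not errors["proportion"]:
        raise ValueError("no rows with a positive population")
    return min(errors, key=lambda label: median(errors[label]))


# Hypothetical rows, NOT taken from the dataset.
made_up_rows = [(36, 25_328, 0.14), (120, 65_190, 0.18), (7, 10_610, 0.07)]
print(guess_rate_scale(made_up_rows))
```

Using the median error rather than the mean keeps a handful of bad rows from deciding the outcome, which matters here because 7.1% of rows are flagged as outliers.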
poverty_rate
numeric feature high_skew
This column is a poverty rate (likely the percentage of each county's population below a poverty threshold) across 3,222 counties, with no nulls and 1,719 unique values. The distribution is right-skewed (skew = 2.10, kurtosis = 6.89), with a median of 13.55% but a mean pulled up to 15.10% by a long upper tail reaching 66.32%, about 4.9× the median. There are 137 flagged outliers (4.25% of rows), concentrated in that upper tail, which likely represent unusually deprived counties and warrant special attention in any model. Treatment: Log-transform or apply a Box-Cox transformation before regression to reduce skew (all values are positive, so both are applicable); inspect the 137 outliers above the upper fence for data quality or genuine extreme cases.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,719
- min: 1.6
- max: 66.32
- mean: 15.1
- median: 13.55
- std: 7.706
- q1: 10.16
- q3: 17.91
- iqr: 7.75
- skew: 2.096
- kurtosis: 6.891
- n_outliers: 137
- outlier_rate: 0.04252
- zero_rate: 0
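The "upper fence" mentioned in the treatment can be worked out from the reported quartiles. The sketch assumes the profiler uses Tukey's 1.5 × IQR rule, which the report does not state; the function name is illustrative.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for poverty_rate.
lower, upper = tukey_fences(10.16, 17.91)
print(f"lower fence: {lower:.3f}")
print(f"upper fence: {upper:.3f}")
```

Under that assumption the upper fence sits near 29.5%, and the lower fence is negative, below the column minimum of 1.6, so every flagged outlier would lie in the upper tail.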
rural
categorical feature
This column is a binary flag indicating whether a county is rural, stored as the strings 'True'/'False' rather than a native boolean. The dominant class is 'True' (2,212 of 3,222 rows, ~68.7%), so the dataset is weighted toward rural counties and analysts should account for this imbalance in any comparative or predictive analysis. Treatment: Cast to boolean, then use as a binary feature or stratification variable; monitor class imbalance (~2:1 rural vs. non-rural) during modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: True
- top_rate: 0.6865
- cardinality: 2
- entropy: 0.8971
- entropy_ratio: 0.8971
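A short sketch of the boolean cast, assuming the only values present are the strings 'True' and 'False' as described above. The helper name is hypothetical. An explicit mapping is used because Python's built-in `bool("False")` evaluates to `True` (any non-empty string is truthy).

```python
_BOOL_MAP = {"true": True, "false": False}


def parse_bool(value: str) -> bool:
    """Convert the strings 'True'/'False' (any casing) to a real boolean."""
    try:
        return _BOOL_MAP[value.strip().lower()]
    except KeyError:
        # Fail loudly on anything unexpected instead of guessing.
        raise ValueError(f"unexpected boolean string: {value!r}") from None


print(parse_bool("True"), parse_bool("False"))
```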
rural_category
categorical label
This column is a binary geographic classification indicating whether a county is Rural or Urban/Suburban. Across 3,222 records with no nulls, 'Rural' dominates at 68.7% (2,212 records) versus 'Urban/Suburban' at 31.3% (1,010 records), a meaningful class imbalance. The entropy ratio of 0.897 indicates the split is moderately uneven but not extreme. The counts match the rural flag exactly (2,212 'True' there, 2,212 'Rural' here), so the two columns appear to encode the same information and only one of them should enter a model. Analysts should be aware this imbalance could bias models trained without stratification or reweighting. Treatment: One-hot or binary encode; consider stratified sampling or class weighting to address the 69/31 imbalance.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: Rural
- top_rate: 0.6865
- cardinality: 2
- entropy: 0.8971
- entropy_ratio: 0.8971
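The entropy figures can be reproduced from the value counts. The sketch assumes Shannon entropy in bits (base 2) and an entropy ratio defined as entropy divided by log2 of the number of categories; the report does not state either definition, but both reproduce the reported 0.8971.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Value counts reported for rural_category: Rural, Urban/Suburban.
counts = [2212, 1010]
h = entropy_bits(counts)
ratio = h / math.log2(len(counts))  # 1.0 would mean a perfectly even split

print(f"entropy: {h:.4f} bits, entropy ratio: {ratio:.4f}")
```

With two categories the maximum entropy is exactly 1 bit, which is why the entropy and the entropy ratio are the same number for this column.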
hospital_closure_risk_score
numeric feature
This column is named as a continuous risk score but contains only 3 unique values across 3,222 rows: 0, 25, and 50, as its histogram shows (929, 1,790, and 503 rows respectively). These match the min and Q1 (0), the median and Q3 (25), and the max (50). This makes it a de facto ordinal variable with three tiers despite its numeric framing. Notably, 28.8% of records are zero, and the low skew (0.14) with no outliers is consistent with a discrete tier structure rather than a true continuous score. The 503 rows at 50 equal the 'Moderate' count in risk_category, and 929 + 1,790 = 2,719 equals the 'Low' count, so risk_category looks like a two-way binning of this score (0 and 25 as 'Low', 50 as 'Moderate'). Treatment: Treat as ordinal categorical with three levels (0/25/50); one-hot encode or ordinal-encode before modelling rather than using raw numeric values.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3
- min: 0
- max: 50
- mean: 21.69
- median: 25
- std: 16.34
- q1: 0
- q3: 25
- iqr: 25
- skew: 0.1414
- kurtosis: -0.6949
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.2883
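The three-tier reading can be cross-checked against the summary statistics, and the ordinal encoding from the treatment sketched alongside it. The tier counts below are the non-empty bins of this column's histogram; the rank mapping and helper name are illustrative choices.

```python
# Non-empty histogram bins for hospital_closure_risk_score: score -> rows.
tier_counts = {0: 929, 25: 1790, 50: 503}

n_rows = sum(tier_counts.values())
mean_score = sum(score * rows for score, rows in tier_counts.items()) / n_rows
zero_rate = tier_counts[0] / n_rows

print(f"rows: {n_rows}, mean: {mean_score:.2f}, zero_rate: {zero_rate:.4f}")

# Ordinal encoding: keep the order of the tiers, drop the arbitrary spacing.
ORDINAL_RANK = {0: 0, 25: 1, 50: 2}


def encode_tier(score: int) -> int:
    """Map a raw score (0, 25, or 50) to its ordinal rank (0, 1, or 2)."""
    if score not in ORDINAL_RANK:
        raise ValueError(f"unexpected score: {score}")
    return ORDINAL_RANK[score]
```

The recomputed row count, mean, and zero rate agree with the reported n (3,222), mean (21.69), and zero_rate (0.2883), which supports reading the column as exactly three tiers.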
risk_category
categorical label
This column is a binary risk classification label with exactly two categories: 'Low' and 'Moderate'. The distribution is heavily imbalanced: 'Low' accounts for 84.4% of all 3,222 rows (2,719 records), leaving only 503 'Moderate' cases. The complete absence of higher risk tiers (e.g., 'High', 'Critical') is notable and may indicate data filtering, a low-risk population, or an incomplete taxonomy. The 503 'Moderate' rows match the number of counties scoring 50 on hospital_closure_risk_score, so the label appears to be derived from that score; using the score as a feature to predict this label would leak the answer. With zero nulls and only two values, the column is clean but class-imbalanced. Treatment: Encode as binary (0/1) and, if it is used as a target, apply class-imbalance handling (e.g., SMOTE or class weights) before modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: Low
- top_rate: 0.8439
- cardinality: 2
- entropy: 0.6249
- entropy_ratio: 0.6249
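As an illustration of the class-weight option in the treatment, the sketch below computes weights with the common "balanced" heuristic, n_samples / (n_classes × class_count), which is the formula scikit-learn uses for `class_weight="balanced"`. Choosing this heuristic is an assumption; the report does not prescribe one.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Value counts reported for risk_category.
weights = balanced_class_weights({"Low": 2719, "Moderate": 503})
for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

With these weights each class contributes the same total weight to a loss function, so the 503 'Moderate' counties are not drowned out by the 2,719 'Low' ones.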