healthcare healthcare desert merged
Reading
This dataset profiles 3,222 U.S. counties (one row per county, keyed by FIPS) with population, uninsured counts and rates, poverty rate, a hospital closure risk score, and rural/urban flags. Population and uninsured counts are extremely right-skewed (total_pop skew 13.4, uninsured_pop skew 17.8), so a handful of large counties will dominate any raw totals; analysis should use rates or log scales instead. The hospital_closure_risk_score takes only 3 distinct values (about 29% of counties score 0), and risk_category is heavily imbalanced, with 84% of counties labeled 'Low' and the remaining 16% 'Moderate'; both are worth examining first. About 69% of counties are flagged Rural, so comparing uninsured and poverty rates between rural and urban counties is a natural next cut.
citing: total_pop · uninsured_pop · uninsured_rate · poverty_rate · hospital_closure_risk_score · risk_category · rural_category
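To make the "large counties dominate raw totals" point concrete, the sketch below estimates how much of the total population sits in the single largest county. It uses only figures reported in this profile (row count, mean, and maximum of total_pop); the function name is illustrative, and the result is approximate because the reported mean is rounded.

```python
# Figures reported in the profile for total_pop.
N_COUNTIES = 3_222
MEAN_POP = 102_232    # reported mean (rounded)
MAX_POP = 9_866_623   # reported maximum


def largest_row_share(n: int, mean: float, largest: float) -> float:
    """Share of the column total held by the single largest row."""
    total = n * mean  # the column sum is recoverable from n and the mean
    return largest / total


share = largest_row_share(N_COUNTIES, MEAN_POP, MAX_POP)
equal_share = 1 / N_COUNTIES  # what each county would hold if all were equal

print(f"largest county: {share:.1%} of total population")
print(f"equal-share baseline: {equal_share:.3%}")
```

Under these figures one county holds roughly 3% of the total, close to a hundred times the equal-share baseline, which is why sums and unweighted means of the raw counts say more about a few metropolitan counties than about the typical county.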
Charts the summary said to look at first
risk_category (value counts)
| value | count | share |
|---|---|---|
| Low | 2719 | 84.4% |
| Moderate | 503 | 15.6% |
hospital_closure_risk_score (histogram)
| bin | count |
|---|---|
| 0 – 1.25 | 929 |
| 1.25 – 2.5 | 0 |
| 2.5 – 3.75 | 0 |
| 3.75 – 5 | 0 |
| 5 – 6.25 | 0 |
| 6.25 – 7.5 | 0 |
| 7.5 – 8.75 | 0 |
| 8.75 – 10 | 0 |
| 10 – 11.25 | 0 |
| 11.25 – 12.5 | 0 |
| 12.5 – 13.75 | 0 |
| 13.75 – 15 | 0 |
| 15 – 16.25 | 0 |
| 16.25 – 17.5 | 0 |
| 17.5 – 18.75 | 0 |
| 18.75 – 20 | 0 |
| 20 – 21.25 | 0 |
| 21.25 – 22.5 | 0 |
| 22.5 – 23.75 | 0 |
| 23.75 – 25 | 0 |
| 25 – 26.25 | 1790 |
| 26.25 – 27.5 | 0 |
| 27.5 – 28.75 | 0 |
| 28.75 – 30 | 0 |
| 30 – 31.25 | 0 |
| 31.25 – 32.5 | 0 |
| 32.5 – 33.75 | 0 |
| 33.75 – 35 | 0 |
| 35 – 36.25 | 0 |
| 36.25 – 37.5 | 0 |
| 37.5 – 38.75 | 0 |
| 38.75 – 40 | 0 |
| 40 – 41.25 | 0 |
| 41.25 – 42.5 | 0 |
| 42.5 – 43.75 | 0 |
| 43.75 – 45 | 0 |
| 45 – 46.25 | 0 |
| 46.25 – 47.5 | 0 |
| 47.5 – 48.75 | 0 |
| 48.75 – 50 | 503 |
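The histogram above has 40 rows but only three non-empty bins. A minimal sketch of equal-width binning shows why, assuming the chart uses 40 bins over the range 0 to 50 with the last bin closed on the right (a common convention, not something the report states); `bin_index` is a hypothetical helper.

```python
def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Index of the equal-width bin containing value; the last bin includes hi."""
    width = (hi - lo) / n_bins
    idx = int((value - lo) // width)
    return min(idx, n_bins - 1)  # fold the upper edge into the last bin


# Assumed layout: 40 bins over [0, 50], so each bin is 1.25 wide.
occupied = {score: bin_index(score, 0.0, 50.0, 40) for score in (0, 25, 50)}
print(occupied)
```

The three scores land in bins 0, 20, and 39, which correspond to the rows "0 – 1.25", "25 – 26.25", and "48.75 – 50". A three-row value-count table would carry the same information as the 40-bin histogram.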
rural_category (value counts)
| value | count | share |
|---|---|---|
| Rural | 2212 | 68.7% |
| Urban/Suburban | 1010 | 31.3% |
uninsured_rate (histogram)
| bin | count |
|---|---|
| 0 – 0.0925 | 1403 |
| 0.0925 – 0.185 | 704 |
| 0.185 – 0.2775 | 403 |
| 0.2775 – 0.37 | 213 |
| 0.37 – 0.4625 | 158 |
| 0.4625 – 0.555 | 101 |
| 0.555 – 0.6475 | 65 |
| 0.6475 – 0.74 | 43 |
| 0.74 – 0.8325 | 27 |
| 0.8325 – 0.925 | 23 |
| 0.925 – 1.018 | 9 |
| 1.018 – 1.11 | 15 |
| 1.11 – 1.202 | 14 |
| 1.202 – 1.295 | 5 |
| 1.295 – 1.387 | 7 |
| 1.387 – 1.48 | 7 |
| 1.48 – 1.573 | 5 |
| 1.573 – 1.665 | 2 |
| 1.665 – 1.758 | 4 |
| 1.758 – 1.85 | 1 |
| 1.85 – 1.942 | 1 |
| 1.942 – 2.035 | 1 |
| 2.035 – 2.127 | 2 |
| 2.127 – 2.22 | 2 |
| 2.22 – 2.312 | 1 |
| 2.312 – 2.405 | 0 |
| 2.405 – 2.498 | 0 |
| 2.498 – 2.59 | 1 |
| 2.59 – 2.683 | 0 |
| 2.683 – 2.775 | 1 |
| 2.775 – 2.868 | 0 |
| 2.868 – 2.96 | 1 |
| 2.96 – 3.052 | 1 |
| 3.052 – 3.145 | 0 |
| 3.145 – 3.237 | 1 |
| 3.237 – 3.33 | 0 |
| 3.33 – 3.422 | 0 |
| 3.422 – 3.515 | 0 |
| 3.515 – 3.607 | 0 |
| 3.607 – 3.7 | 1 |
poverty_rate (histogram)
| bin | count |
|---|---|
| 1.6 – 3.218 | 7 |
| 3.218 – 4.836 | 34 |
| 4.836 – 6.454 | 106 |
| 6.454 – 8.072 | 246 |
| 8.072 – 9.69 | 320 |
| 9.69 – 11.31 | 354 |
| 11.31 – 12.93 | 393 |
| 12.93 – 14.54 | 364 |
| 14.54 – 16.16 | 306 |
| 16.16 – 17.78 | 262 |
| 17.78 – 19.4 | 192 |
| 19.4 – 21.02 | 149 |
| 21.02 – 22.63 | 123 |
| 22.63 – 24.25 | 91 |
| 24.25 – 25.87 | 52 |
| 25.87 – 27.49 | 44 |
| 27.49 – 29.11 | 34 |
| 29.11 – 30.72 | 23 |
| 30.72 – 32.34 | 18 |
| 32.34 – 33.96 | 14 |
| 33.96 – 35.58 | 6 |
| 35.58 – 37.2 | 8 |
| 37.2 – 38.81 | 3 |
| 38.81 – 40.43 | 8 |
| 40.43 – 42.05 | 5 |
| 42.05 – 43.67 | 9 |
| 43.67 – 45.29 | 4 |
| 45.29 – 46.9 | 11 |
| 46.9 – 48.52 | 7 |
| 48.52 – 50.14 | 8 |
| 50.14 – 51.76 | 2 |
| 51.76 – 53.38 | 6 |
| 53.38 – 54.99 | 5 |
| 54.99 – 56.61 | 5 |
| 56.61 – 58.23 | 1 |
| 58.23 – 59.85 | 0 |
| 59.85 – 61.47 | 0 |
| 61.47 – 63.08 | 0 |
| 63.08 – 64.7 | 1 |
| 64.7 – 66.32 | 1 |
Schema
10 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| total_pop | numeric | 0.0% | 3,141 | high_skew, outliers |
| uninsured_pop | numeric | 0.0% | 584 | high_skew, outliers |
| uninsured_rate | numeric | 0.0% | 152 | high_skew, outliers |
| poverty_rate | numeric | 0.0% | 1,719 | high_skew |
| rural | categorical | 0.0% | 2 | |
| rural_category | categorical | 0.0% | 2 | |
| hospital_closure_risk_score | numeric | 0.0% | 3 | |
| risk_category | categorical | 0.0% | 2 | |
fips
numeric identifier
This is the county FIPS code: 3,222 rows with 3,222 unique values, no nulls, and a range of 1001 to 72153, consistent with the U.S. county FIPS scheme (state code × 1000 + county code). The distribution is roughly uniform across that range (skew 0.16, kurtosis -0.63, no outliers), as expected for a geographic index rather than a measurement. Treat it as a categorical key, not a quantity, despite the numeric dtype.
Treatment: Cast to a zero-padded 5-character string and left-join on this county FIPS code; do not use it as a numeric feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
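The cast described in the treatment can be sketched as follows. The helper names are hypothetical; the only assumption is the standard layout of a county FIPS code, a 2-digit state part followed by a 3-digit county part.

```python
def fips_to_key(fips: int) -> str:
    """Render a numeric county FIPS code as a 5-character, zero-padded string."""
    if not 0 < fips < 100_000:
        raise ValueError(f"not a 5-digit county FIPS code: {fips}")
    return f"{fips:05d}"


def split_fips(key: str) -> tuple[str, str]:
    """Split a 5-character key into its (state, county) parts."""
    return key[:2], key[2:]


print(fips_to_key(1001))                # the reported minimum
print(split_fips(fips_to_key(72153)))   # the reported maximum
```

Padding matters because the numeric dtype drops the leading zero of state codes below 10, so 1001 and "01001" would fail to match in a string-keyed join.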
county_name
text identifier · near_unique
This column holds fully qualified U.S. county names (e.g. 'X County, State'), with all 3,222 values unique and no nulls. The token 'county,' appears 2,999 times, confirming the 'County,' format; the remaining 223 rows presumably use other suffixes such as Parish or Borough. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). These are token counts rather than rows per state, so 'Virginia' likely also matches West Virginia rows and 'Texas' any county named Texas in another state.
Treatment: Split into county and state components before using it as a join key.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
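A minimal sketch of the split, assuming every value follows the 'Name, State' pattern described above with the state after the last comma. The function name and the sample strings are illustrative, not taken from the dataset.

```python
def split_county_name(full_name: str) -> tuple[str, str]:
    """Split 'X County, State' on the last comma into (county, state)."""
    county, sep, state = full_name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma separator in {full_name!r}")
    return county.strip(), state.strip()


# Illustrative values in the described format.
print(split_county_name("Autauga County, Alabama"))
print(split_county_name("Orleans Parish, Louisiana"))
```

Splitting on the last comma rather than the first keeps any comma inside the county part intact, and the explicit error surfaces rows that do not follow the pattern instead of silently returning an empty county.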
total_pop
numeric feature · high_skew · outliers
This is almost certainly a population count per county (n = 3,222), with values ranging from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69): the mean (102,232) is about four times the median, 453 values (14.06%) are flagged as outliers, and the standard deviation of 326,934 dwarfs the IQR of 54,579. There are no nulls or zeros, and 3,141 of 3,222 values are unique.
Treatment: Log-transform before any modelling or distance-based analysis to tame the extreme right skew.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,141
- min: 47
- max: 9.867e+06
- mean: 1.022e+05
- median: 25,328
- std: 3.269e+05
- q1: 1.061e+04
- q3: 65,190
- iqr: 5.458e+04
- skew: 13.38
- kurtosis: 298.7
- n_outliers: 453
- outlier_rate: 0.1406
- zero_rate: 0
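The report does not say how n_outliers is computed. A common choice is the Tukey fence at 1.5 × IQR beyond the quartiles, and the sketch below applies that rule to the reported quartiles as an assumption, not as a description of the profiler. The q1 value is the rounded figure from the list above.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported quartiles for total_pop (q1 is rounded in the report).
lower, upper = tukey_fences(10_610, 65_190)
print(f"lower fence: {lower:,.0f}")
print(f"upper fence: {upper:,.0f}")
```

Under this rule the lower fence is negative, so every flagged county would sit in the upper tail, above a population of about 147,000. That reading fits the positive skew, and it suggests the flagged rows are simply large counties rather than data errors.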
uninsured_pop
numeric feature · high_skew · outliers
Counts of uninsured residents per county, ranging from 0 to 20,915 across 3,222 rows with no nulls. The distribution is severely right-skewed (skew 17.81, kurtosis 462.87): the median is 36 while the mean is 159.95, and 17.2% of rows are zero. 368 outliers (11.4%) sit far above the Q3 of 120, consistent with a few very large counties dominating the tail.
Treatment: Apply a log1p transform before modelling to tame the heavy right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 584
- min: 0
- max: 20,915
- mean: 159.9
- median: 36
- std: 627.2
- q1: 7
- q3: 120
- iqr: 113
- skew: 17.81
- kurtosis: 462.9
- n_outliers: 368
- outlier_rate: 0.1142
- zero_rate: 0.1723
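Because about 17% of the counts are zero, a plain logarithm is undefined for part of the column, which is presumably why the treatment names log1p. The sketch below applies it to the reported five-number summary; the function name is illustrative.

```python
import math


def log1p_counts(counts):
    """Apply log(1 + x) so that zero counts map to 0 instead of failing."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    return [math.log1p(c) for c in counts]


# min, q1, median, q3, max as reported for uninsured_pop.
summary = [0, 7, 36, 120, 20_915]
transformed = log1p_counts(summary)
print([round(t, 2) for t in transformed])
```

The transform is monotonic, so ranks and quantile order are preserved, while the maximum shrinks from more than 170 times Q3 to a little over twice the transformed Q3.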
uninsured_rate
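numeric feature · high_skew · outliers
This is an uninsured rate per county, ranging from 0.0 to 3.7 with a median of 0.12. If the column were a proportion, the maximum of 3.7 would be impossible, since a proportion caps at 1.0. The other columns point to percent units instead: the median uninsured_pop of 36 against a median total_pop of 25,328 gives roughly 0.14%, close to the median rate of 0.12, though this should be confirmed row by row. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%) and 17.5% exact zeros.
Treatment: Confirm the units by recomputing the rate from uninsured_pop and total_pop, investigate any values that do not reconcile, then log-transform or winsorize before modelling.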
- n: 3,222
- nulls: 0 (0.0%)
- unique: 152
- min: 0
- max: 3.7
- mean: 0.2002
- median: 0.12
- std: 0.2829
- q1: 0.04
- q3: 0.25
- iqr: 0.21
- skew: 4.095
- kurtosis: 27.7
- n_outliers: 230
- outlier_rate: 0.07138
- zero_rate: 0.1754
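One way to investigate the units is to recompute the rate from the two count columns and see which scale reproduces the stored value. The sketch below does this for a single (uninsured, population, reported rate) triple. `infer_rate_unit` is a hypothetical helper, and the example call uses the column medians only as a rough stand-in, since the medians need not come from the same county.

```python
def infer_rate_unit(uninsured: float, population: float, reported: float) -> str:
    """Return whichever unit ('fraction' or 'percent') is closer to the reported rate."""
    fraction = uninsured / population
    candidates = {"fraction": fraction, "percent": 100 * fraction}
    return min(candidates, key=lambda unit: abs(candidates[unit] - reported))


# Column medians from the profile: uninsured_pop, total_pop, uninsured_rate.
print(infer_rate_unit(36, 25_328, 0.12))
```

Run over every row, a check like this would show whether the column is consistently in percent units, in which case values above 1.0 are not errors, or whether only some rows fail to reconcile.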
poverty_rate
numeric feature · high_skew
This is a poverty rate (likely the percentage of the population in poverty) across 3,222 rows with no nulls and 1,719 unique values. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with a median of 13.55 and a mean of 15.10, ranging from 1.6 to 66.32; 137 outliers (4.25%) sit in the upper tail. The high_skew alert reflects a long tail of high-poverty counties that pulls the mean above the median.
Treatment: Consider a log or sqrt transform before regression to tame the right skew.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,719
- min: 1.6
- max: 66.32
- mean: 15.1
- median: 13.55
- std: 7.706
- q1: 10.16
- q3: 17.91
- iqr: 7.75
- skew: 2.096
- kurtosis: 6.891
- n_outliers: 137
- outlier_rate: 0.04252
- zero_rate: 0
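To compare the two suggested transforms, the sketch below measures how lopsided the range is around the median before and after each one, using only the reported minimum, median, and maximum. The tail-ratio measure is an illustrative choice, not the skew statistic the profiler reports.

```python
import math

# Min, median, and max as reported for poverty_rate.
MIN, MEDIAN, MAX = 1.6, 13.55, 66.32


def tail_ratio(transform) -> float:
    """Upper-tail length divided by lower-tail length after a transform."""
    lo, mid, hi = transform(MIN), transform(MEDIAN), transform(MAX)
    return (hi - mid) / (mid - lo)


raw_ratio = tail_ratio(lambda x: x)
sqrt_ratio = tail_ratio(math.sqrt)
log_ratio = tail_ratio(math.log)

print(f"raw: {raw_ratio:.2f}, sqrt: {sqrt_ratio:.2f}, log: {log_ratio:.2f}")
```

On these three points the raw upper tail is more than four times the lower tail, sqrt leaves it somewhat longer than the lower tail, and log makes it shorter. That suggests sqrt is the gentler correction and log may overcorrect, though a check on the full column is needed before choosing.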
rural
categorical feature
Binary flag indicating whether a county is rural, stored as the strings "True"/"False" rather than booleans. The split leans rural: 2,212 of 3,222 (68.7%) versus 1,010 non-rural, with no nulls. An entropy ratio of 0.897 indicates a skewed split in which both classes remain well represented. The counts match rural_category exactly, so the two columns appear to encode the same information.
Treatment: Cast the strings "True"/"False" to a 0/1 flag and use directly as a feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: True
- top_rate: 0.6865
- cardinality: 2
- entropy: 0.8971
- entropy_ratio: 0.8971
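The cast needs an explicit mapping, because Python's built-in `bool("False")` evaluates to `True` (any non-empty string is truthy). A minimal sketch, with a hypothetical helper name:

```python
_FLAG_VALUES = {"true": 1, "false": 0}


def parse_flag(raw: str) -> int:
    """Map the strings 'True'/'False' (any case, padded or not) to 1/0."""
    try:
        return _FLAG_VALUES[raw.strip().lower()]
    except KeyError:
        raise ValueError(f"unexpected flag value: {raw!r}") from None


print([parse_flag(v) for v in ("True", "False")])
print(bool("False"))  # the pitfall: a non-empty string is always truthy
```

Raising on unknown values keeps a stray entry such as an empty string from being silently coded as 0.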
rural_category
categorical feature
Binary categorical flag splitting counties into 'Rural' (2,212, 68.7%) and 'Urban/Suburban' (1,010, 31.3%), with no nulls across 3,222 rows. The split is moderately imbalanced, but an entropy ratio of 0.90 indicates both classes are well represented. This is a clean two-level partition suitable as a stratifier or feature.
Treatment: One-hot or binary-encode for modelling; consider stratifying splits on this flag.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: Rural
- top_rate: 0.6865
- cardinality: 2
- entropy: 0.8971
- entropy_ratio: 0.8971
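The entropy figures can be reproduced from the value counts. The sketch assumes the profiler reports Shannon entropy in bits and defines the ratio as entropy divided by log2 of the cardinality. That definition is an assumption, but it is consistent with the two figures being equal for a two-level column.

```python
import math


def entropy_bits(counts) -> float:
    """Shannon entropy in bits of a categorical distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts) -> float:
    """Entropy divided by its maximum, log2(k), for k categories."""
    k = len(counts)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Rural vs Urban/Suburban counts from the profile.
print(round(entropy_ratio([2212, 1010]), 4))
```

A ratio of 1.0 would mean a 50/50 split. A value near 0.90 corresponds to the roughly 69/31 split reported here.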
hospital_closure_risk_score
numeric feature
Despite being typed as numeric, hospital_closure_risk_score takes only 3 distinct values across 3,222 rows, spanning 0 to 50 with a median of 25 and roughly 28.8% zeros. The histogram places those values at 0, 25, and 50, so this is effectively an ordinal risk band rather than a continuous score, and the reported mean of 21.69 and std of 16.34 reflect the category mix rather than a smooth distribution.
Treatment: Treat as an ordinal categorical (low/medium/high) rather than a continuous numeric.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3
- min: 0
- max: 50
- mean: 21.69
- median: 25
- std: 16.34
- q1: 0
- q3: 25
- iqr: 25
- skew: 0.1414
- kurtosis: -0.6949
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.2883
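The "category mix" reading can be checked by rebuilding the summary statistics from the three band counts in the histogram table. The sketch assumes the three occupied bins correspond to the exact scores 0, 25, and 50; the rank mapping is one possible ordinal encoding, not one prescribed by the report.

```python
# Counts per score, read from the three non-empty histogram bins.
BAND_COUNTS = {0: 929, 25: 1790, 50: 503}

# Ordinal encoding: rank the distinct scores from lowest to highest.
RANK = {score: rank for rank, score in enumerate(sorted(BAND_COUNTS))}

n = sum(BAND_COUNTS.values())
mean = sum(score * count for score, count in BAND_COUNTS.items()) / n
zero_rate = BAND_COUNTS[0] / n

print(n, round(mean, 2), round(zero_rate, 4), RANK)
```

The reconstructed row count, mean, and zero rate match the reported figures. The two lower bands also sum to 2,719, the Low count in risk_category, and the top band's 503 matches Moderate. That is consistent with risk_category being derived from this score, although only a row-level cross-tabulation could confirm it.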
risk_category
categorical label
Binary risk classification labeling each county as either Low or Moderate, with no nulls across 3,222 rows. The distribution is heavily imbalanced: 84.4% fall into Low (2,719) versus only 503 Moderate (15.6%), and no High tier appears at all. The entropy ratio of 0.62 reflects this imbalance.
Treatment: Treat as a binary target; account for class imbalance via stratified sampling or class weighting.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: Low
- top_rate: 0.8439
- cardinality: 2
- entropy: 0.6249
- entropy_ratio: 0.6249
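For the class-weighting option, one common heuristic is inverse-frequency weighting, where each class gets weight n / (k × count). This is the formula behind scikit-learn's `class_weight="balanced"` setting. The sketch below applies it to the reported counts; the function name is illustrative and the choice of heuristic is an assumption, not something the report prescribes.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: n / (k * count) for each of k classes."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Class counts reported for risk_category.
weights = balanced_class_weights({"Low": 2719, "Moderate": 503})
print({label: round(w, 3) for label, w in weights.items()})
```

With these weights each Moderate county counts about 5.4 times as much as a Low county in a weighted loss, so the two classes contribute equally in aggregate.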