county health rankings
Reading
This dataset contains 3,222 rows of US county-level health data. Each row is identified by a unique county name and FIPS code and carries three numeric measures: total population, uninsured population, and uninsured rate. Both population fields are extremely right-skewed. total_pop ranges from 47 to nearly 9.87 million with a median of 25,328, and uninsured_pop is skewed even more heavily (median 36, max 20,915), so a few large counties dominate. The uninsured_rate is the most analytically interesting field. Its median is 0.12, yet it stretches up to 3.7, and about 17.5% of counties report exactly zero. This points either to very small counties or to data quality issues worth investigating. Start by examining the distribution of uninsured_rate and how it relates to total_pop.
citing: row_count · columns.total_pop.stats · columns.uninsured_pop.stats · columns.uninsured_rate.stats · columns.county_name.top_words
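Because total_pop spans roughly five orders of magnitude, one skew-robust way to start on the uninsured_rate vs total_pop comparison is to bucket counties by order of magnitude and compare median rates per bucket. The sketch below assumes the rows are available as (total_pop, uninsured_rate) pairs. The function name and the sample rows are hypothetical illustrations, not values taken from the dataset.

```python
from collections import defaultdict
from statistics import median


def median_rate_by_magnitude(rows):
    """Median uninsured_rate per order of magnitude of total_pop.

    `rows` is an iterable of (total_pop, uninsured_rate) pairs with integer
    total_pop >= 1. Keys of the result are powers of ten: 10 covers 10..99,
    100 covers 100..999, and so on.
    """
    groups = defaultdict(list)
    for total_pop, rate in rows:
        magnitude = len(str(int(total_pop))) - 1  # digit count minus one
        groups[magnitude].append(rate)
    return {10 ** m: median(rates) for m, rates in sorted(groups.items())}


# Hypothetical rows for illustration only; substitute the real columns.
sample = [
    (47, 0.0),
    (850, 1.2),
    (12_000, 0.3),
    (25_328, 0.12),
    (31_000, 0.2),
    (9_866_623, 0.21),
]
print(median_rate_by_magnitude(sample))
# {10: 0.0, 100: 1.2, 10000: 0.2, 1000000: 0.21}
```

Using medians rather than means keeps the handful of extreme rates from dominating a bucket. On the real data, a rank (Spearman) correlation would be another summary of the same relationship that is insensitive to the skew.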
Charts the summary said to look at first
total_pop histogram (40 equal-width bins)
| bin | count |
|---|---|
| 47 – 2.467e+05 | 2942 |
| 2.467e+05 – 4.934e+05 | 137 |
| 4.934e+05 – 7.4e+05 | 56 |
| 7.4e+05 – 9.867e+05 | 39 |
| 9.867e+05 – 1.233e+06 | 13 |
| 1.233e+06 – 1.48e+06 | 9 |
| 1.48e+06 – 1.727e+06 | 7 |
| 1.727e+06 – 1.973e+06 | 3 |
| 1.973e+06 – 2.22e+06 | 3 |
| 2.22e+06 – 2.467e+06 | 4 |
| 2.467e+06 – 2.713e+06 | 3 |
| 2.713e+06 – 2.96e+06 | 0 |
| 2.96e+06 – 3.207e+06 | 2 |
| 3.207e+06 – 3.453e+06 | 0 |
| 3.453e+06 – 3.7e+06 | 0 |
| 3.7e+06 – 3.947e+06 | 0 |
| 3.947e+06 – 4.193e+06 | 0 |
| 4.193e+06 – 4.44e+06 | 1 |
| 4.44e+06 – 4.687e+06 | 0 |
| 4.687e+06 – 4.933e+06 | 1 |
| 4.933e+06 – 5.18e+06 | 0 |
| 5.18e+06 – 5.427e+06 | 1 |
| 5.427e+06 – 5.673e+06 | 0 |
| 5.673e+06 – 5.92e+06 | 0 |
| 5.92e+06 – 6.167e+06 | 0 |
| 6.167e+06 – 6.413e+06 | 0 |
| 6.413e+06 – 6.66e+06 | 0 |
| 6.66e+06 – 6.907e+06 | 0 |
| 6.907e+06 – 7.153e+06 | 0 |
| 7.153e+06 – 7.4e+06 | 0 |
| 7.4e+06 – 7.647e+06 | 0 |
| 7.647e+06 – 7.893e+06 | 0 |
| 7.893e+06 – 8.14e+06 | 0 |
| 8.14e+06 – 8.387e+06 | 0 |
| 8.387e+06 – 8.633e+06 | 0 |
| 8.633e+06 – 8.88e+06 | 0 |
| 8.88e+06 – 9.127e+06 | 0 |
| 9.127e+06 – 9.373e+06 | 0 |
| 9.373e+06 – 9.62e+06 | 0 |
| 9.62e+06 – 9.867e+06 | 1 |
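The bins in these tables are equal width. For total_pop each one spans (9,866,623 - 47) / 40 ≈ 246,664 people, so 2,942 of the 3,222 counties (about 91%) fall into the first bin. The following minimal sketch shows how such edges can be computed, next to log-spaced edges that would spread the small counties out. The helper names are hypothetical, and the profiler's own binning code is not shown in the report.

```python
import bisect
import math


def equal_width_edges(lo, hi, bins=40):
    """Return bins + 1 edges splitting [lo, hi] into equal-width intervals."""
    width = (hi - lo) / bins
    # Set the last edge to hi exactly so float error cannot push the max out of range.
    return [lo + i * width for i in range(bins)] + [hi]


def log10_edges(lo, hi, bins=40):
    """Return bins + 1 edges evenly spaced in log10 space (requires lo > 0)."""
    a, b = math.log10(lo), math.log10(hi)
    step = (b - a) / bins
    inner = [10 ** (a + i * step) for i in range(1, bins)]
    return [lo] + inner + [hi]  # exact end points avoid float round-off at the ends


def histogram(values, edges):
    """Count values per bin; assumes edges[0] <= v <= edges[-1] for every v.

    Bins are half-open [left, right) except the last, which also holds the maximum.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(i, len(counts) - 1)] += 1
    return counts


edges = equal_width_edges(47, 9_866_623)
print(round(edges[1]))  # 246711, the 2.467e+05 boundary in the table above
```

With the log-spaced edges, a 47-person county and a 25,328-person county land in different bins. The equal-width edges put both in the first bin.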
uninsured_rate histogram (40 equal-width bins)
| bin | count |
|---|---|
| 0 – 0.0925 | 1403 |
| 0.0925 – 0.185 | 704 |
| 0.185 – 0.2775 | 403 |
| 0.2775 – 0.37 | 213 |
| 0.37 – 0.4625 | 158 |
| 0.4625 – 0.555 | 101 |
| 0.555 – 0.6475 | 65 |
| 0.6475 – 0.74 | 43 |
| 0.74 – 0.8325 | 27 |
| 0.8325 – 0.925 | 23 |
| 0.925 – 1.018 | 9 |
| 1.018 – 1.11 | 15 |
| 1.11 – 1.202 | 14 |
| 1.202 – 1.295 | 5 |
| 1.295 – 1.387 | 7 |
| 1.387 – 1.48 | 7 |
| 1.48 – 1.573 | 5 |
| 1.573 – 1.665 | 2 |
| 1.665 – 1.758 | 4 |
| 1.758 – 1.85 | 1 |
| 1.85 – 1.942 | 1 |
| 1.942 – 2.035 | 1 |
| 2.035 – 2.127 | 2 |
| 2.127 – 2.22 | 2 |
| 2.22 – 2.312 | 1 |
| 2.312 – 2.405 | 0 |
| 2.405 – 2.498 | 0 |
| 2.498 – 2.59 | 1 |
| 2.59 – 2.683 | 0 |
| 2.683 – 2.775 | 1 |
| 2.775 – 2.868 | 0 |
| 2.868 – 2.96 | 1 |
| 2.96 – 3.052 | 1 |
| 3.052 – 3.145 | 0 |
| 3.145 – 3.237 | 1 |
| 3.237 – 3.33 | 0 |
| 3.33 – 3.422 | 0 |
| 3.422 – 3.515 | 0 |
| 3.515 – 3.607 | 0 |
| 3.607 – 3.7 | 1 |
uninsured_pop histogram (40 equal-width bins)
| bin | count |
|---|---|
| 0 – 522.9 | 3022 |
| 522.9 – 1046 | 124 |
| 1046 – 1569 | 32 |
| 1569 – 2092 | 16 |
| 2092 – 2614 | 7 |
| 2614 – 3137 | 5 |
| 3137 – 3660 | 5 |
| 3660 – 4183 | 2 |
| 4183 – 4706 | 0 |
| 4706 – 5229 | 1 |
| 5229 – 5752 | 2 |
| 5752 – 6274 | 1 |
| 6274 – 6797 | 0 |
| 6797 – 7320 | 0 |
| 7320 – 7843 | 0 |
| 7843 – 8366 | 1 |
| 8366 – 8889 | 1 |
| 8889 – 9412 | 0 |
| 9412 – 9935 | 0 |
| 9935 – 1.046e+04 | 0 |
| 1.046e+04 – 1.098e+04 | 0 |
| 1.098e+04 – 1.15e+04 | 2 |
| 1.15e+04 – 1.203e+04 | 0 |
| 1.203e+04 – 1.255e+04 | 0 |
| 1.255e+04 – 1.307e+04 | 0 |
| 1.307e+04 – 1.359e+04 | 0 |
| 1.359e+04 – 1.412e+04 | 0 |
| 1.412e+04 – 1.464e+04 | 0 |
| 1.464e+04 – 1.516e+04 | 0 |
| 1.516e+04 – 1.569e+04 | 0 |
| 1.569e+04 – 1.621e+04 | 0 |
| 1.621e+04 – 1.673e+04 | 0 |
| 1.673e+04 – 1.725e+04 | 0 |
| 1.725e+04 – 1.778e+04 | 0 |
| 1.778e+04 – 1.83e+04 | 0 |
| 1.83e+04 – 1.882e+04 | 0 |
| 1.882e+04 – 1.935e+04 | 0 |
| 1.935e+04 – 1.987e+04 | 0 |
| 1.987e+04 – 2.039e+04 | 0 |
| 2.039e+04 – 2.092e+04 | 1 |
fips histogram (40 equal-width bins)
| bin | count |
|---|---|
| 1001 – 2780 | 97 |
| 2780 – 4559 | 15 |
| 4559 – 6337 | 133 |
| 6337 – 8116 | 59 |
| 8116 – 9895 | 14 |
| 9895 – 1.167e+04 | 4 |
| 1.167e+04 – 1.345e+04 | 226 |
| 1.345e+04 – 1.523e+04 | 5 |
| 1.523e+04 – 1.701e+04 | 49 |
| 1.701e+04 – 1.879e+04 | 189 |
| 1.879e+04 – 2.057e+04 | 204 |
| 2.057e+04 – 2.235e+04 | 184 |
| 2.235e+04 – 2.413e+04 | 39 |
| 2.413e+04 – 2.59e+04 | 15 |
| 2.59e+04 – 2.768e+04 | 170 |
| 2.768e+04 – 2.946e+04 | 196 |
| 2.946e+04 – 3.124e+04 | 150 |
| 3.124e+04 – 3.302e+04 | 27 |
| 3.302e+04 – 3.48e+04 | 21 |
| 3.48e+04 – 3.658e+04 | 95 |
| 3.658e+04 – 3.836e+04 | 153 |
| 3.836e+04 – 4.013e+04 | 155 |
| 4.013e+04 – 4.191e+04 | 46 |
| 4.191e+04 – 4.369e+04 | 67 |
| 4.369e+04 – 4.547e+04 | 51 |
| 4.547e+04 – 4.725e+04 | 161 |
| 4.725e+04 – 4.903e+04 | 268 |
| 4.903e+04 – 5.081e+04 | 29 |
| 5.081e+04 – 5.259e+04 | 133 |
| 5.259e+04 – 5.436e+04 | 94 |
| 5.436e+04 – 5.614e+04 | 95 |
| 5.614e+04 – 5.792e+04 | 0 |
| 5.792e+04 – 5.97e+04 | 0 |
| 5.97e+04 – 6.148e+04 | 0 |
| 6.148e+04 – 6.326e+04 | 0 |
| 6.326e+04 – 6.504e+04 | 0 |
| 6.504e+04 – 6.682e+04 | 0 |
| 6.682e+04 – 6.86e+04 | 0 |
| 6.86e+04 – 7.037e+04 | 0 |
| 7.037e+04 – 7.215e+04 | 78 |
Schema
5 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| total_pop | numeric | 0.0% | 3,141 | high_skew, outliers |
| uninsured_pop | numeric | 0.0% | 584 | high_skew, outliers |
| uninsured_rate | numeric | 0.0% | 152 | high_skew, outliers |
fips
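numeric · identifier
This is the U.S. county FIPS code. All 3,222 rows are unique and non-null. The range (1001 to 72153) fits the standard 5-digit encoding, a 2-digit state code followed by a 3-digit county code, once the leading zero dropped by numeric storage is restored (1001 is "01001"). The 78 rows in the top histogram bin (70,370 to 72,153) match the 78 municipios of Puerto Rico, whose state code is 72. Summary statistics such as the near-zero skew (0.16) and the absence of outliers carry no analytic meaning here, because the values are labels rather than measurements. Treatment: treat as a categorical key stored as a zero-padded 5-character string. Left-join on it to bring in county-level attributes rather than feeding it into a model as a number.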
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
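FIPS codes stored as numbers lose their leading zero, so a join against a reference table keyed by 5-character strings would silently miss every state with a code below 10. The following minimal sketch normalizes the key and performs a left join with plain dicts. The function names and the `lookup` table are hypothetical. In pandas the join itself would usually be `DataFrame.merge(..., how="left")` after the same zero-padding.

```python
def normalize_fips(code):
    """Render a county FIPS code as a zero-padded 5-character string."""
    return f"{int(code):05d}"


def split_fips(code):
    """Split a county FIPS code into its 2-digit state and 3-digit county parts."""
    s = normalize_fips(code)
    return s[:2], s[2:]


def left_join(rows, lookup):
    """Left-join dict rows to `lookup`, a dict keyed by 5-character FIPS strings.

    Rows without a match are kept unchanged, as in a SQL LEFT JOIN.
    """
    return [{**row, **lookup.get(normalize_fips(row["fips"]), {})} for row in rows]


print(normalize_fips(1001))  # 01001
print(split_fips(72153))     # ('72', '153'); state code 72 is Puerto Rico
```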
county_name
text · identifier · near_unique
This column holds US county names paired with their state (for example, names ending in 'County, Texas'). All 3,222 values are unique, with no nulls. The token 'county,' appears 2,999 times, so about 223 rows (3,222 - 2,999) use a different suffix. These are likely 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, 'Municipio' in Puerto Rico, or independent cities. State tokens look as expected. 'texas' leads with 256, two more than Texas's 254 counties, presumably because Texas County, Missouri and Texas County, Oklahoma also contain the word. Treatment: split into county and state fields, then left-join on this key to geographic reference tables (fips is the more robust key where the reference table has it).
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
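The suggested split into county and state fields can be done on the last comma, which keeps the state intact even if a county name ever contained a comma. The following is a minimal sketch with hypothetical function names. The `SUFFIXES` list is an assumption based on the suffixes guessed above and should be checked against the actual names before use.

```python
# Hypothetical suffix list; verify it against the actual county_name values.
SUFFIXES = ("County", "Parish", "Borough", "Census Area", "Municipio", "city")


def split_county_name(name):
    """Split 'Harris County, Texas' into ('Harris County', 'Texas') on the last comma."""
    county, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no ', State' suffix in {name!r}")
    return county.strip(), state.strip()


def county_suffix(county):
    """Return the first matching suffix from SUFFIXES, or None if none matches."""
    for suffix in SUFFIXES:
        if county.endswith(suffix):
            return suffix
    return None


county, state = split_county_name("Harris County, Texas")
print(county, "|", state, "|", county_suffix(county))  # Harris County | Texas | County
```

Counting `county_suffix` results over all rows, along with the names that return `None`, would show exactly which suffixes make up the roughly 223 non-"County" rows.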
total_pop
numeric · feature · high_skew · outliers
This is each county's total population, ranging from 47 to 9,866,623 with a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69), and 453 values (14.06%) are flagged as outliers. A long tail of large metropolitan counties sits far above thousands of small ones. The mean (102,232) is about four times the median, confirming the heavy tail. Treatment: log-transform before any modelling or distance-based analysis.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,141
- min: 47
- max: 9.867e+06
- mean: 1.022e+05
- median: 25,328
- std: 3.269e+05
- q1: 1.061e+04
- q3: 65,190
- iqr: 5.458e+04
- skew: 13.38
- kurtosis: 298.7
- n_outliers: 453
- outlier_rate: 0.1406
- zero_rate: 0
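The report does not say how n_outliers is computed. The sketch below assumes the common Tukey rule, which flags values more than 1.5 × IQR beyond the quartiles, and applies it to the reported (rounded) quartiles. This is a hypothetical check, not the profiler's code. It also illustrates why the log transform matters. On the raw scale the lower fence is negative, so every flagged county is a large one. On the log10 scale the fences sit at roughly 700 and 1 million people.

```python
import math


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Rounded quartiles from the total_pop stats above.
Q1, Q3 = 10_610, 65_190

raw_lower, raw_upper = tukey_fences(Q1, Q3)
print(raw_lower, raw_upper)  # -71260.0 147060.0

# Order-statistic quartiles are (approximately) preserved by a monotonic
# transform, so fences can be computed on log10 values and mapped back.
log_lower, log_upper = tukey_fences(math.log10(Q1), math.log10(Q3))
print(round(10 ** log_lower), round(10 ** log_upper))
```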
uninsured_pop
numeric · feature · high_skew · outliers
This is presumably the number of uninsured residents in each county (3,222 rows, 584 unique values). The distribution is extremely right-skewed (skew 17.81, kurtosis 462.9). The median is 36, the mean 159.9 and the maximum 20,915, and 17.2% of rows are zero. About 11.4% of values (368) are flagged as outliers, consistent with a few very populous counties dominating. Treatment: log1p-transform before modelling. log1p maps zeros to 0 and compresses the heavy tail, but it does not remove the zero inflation. That may still call for a separate zero indicator or a zero-inflated count model.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 584
- min: 0
- max: 20,915
- mean: 159.9
- median: 36
- std: 627.2
- q1: 7
- q3: 120
- iqr: 113
- skew: 17.81
- kurtosis: 462.9
- n_outliers: 368
- outlier_rate: 0.1142
- zero_rate: 0.1723
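The following minimal sketch shows the log1p treatment paired with a zero indicator, because the transform alone keeps zeros defined but does not separate them from small counts. The function name is hypothetical. Plain lists stand in for a DataFrame column, and `numpy.log1p` would do the same transform vectorized.

```python
import math


def log1p_with_zero_flag(counts):
    """Return (log1p values, zero flags) for a list of non-negative counts.

    log1p(0) == 0, so zeros stay defined after the transform; the separate
    flag keeps the zero/non-zero distinction available to a model.
    """
    transformed = [math.log1p(c) for c in counts]
    is_zero = [c == 0 for c in counts]
    return transformed, is_zero


# min, median and max of uninsured_pop from the stats above
values, zeros = log1p_with_zero_flag([0, 36, 20_915])
print([round(v, 2) for v in values], zeros)
# [0.0, 3.61, 9.95] [True, False, False]
```

The transform compresses a raw range of 0 to 20,915 into roughly 0 to 10, while the flag preserves the 17.2% of zero rows as their own signal.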
uninsured_rate
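numeric · feature · high_skew · outliers
This is the uninsured rate per county. Read as a fraction (median 0.12, q3 0.25), its long tail reaching 3.7 would be impossible, which suggests mixed units or data entry errors. The count columns hint at a percentage instead. A county with the median total_pop (25,328) and the median uninsured_pop (36) would have an uninsured share of about 0.14%, close to the median rate of 0.12 if that value is a percentage. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%) and 17.5% exact zeros. That zero rate is slightly above uninsured_pop's 17.2%, which is what rounding very small ratios to zero would produce. There are no nulls across 3,222 rows and only 152 unique values, consistent with values rounded to two decimals. Treatment: validate units first by recomputing the rate from uninsured_pop and total_pop. Cap or winsorize the tail above 1.0 only if those values prove to be errors, and use log1p rather than log because of the zeros.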
- n: 3,222
- nulls: 0 (0.0%)
- unique: 152
- min: 0
- max: 3.7
- mean: 0.2002
- median: 0.12
- std: 0.2829
- q1: 0.04
- q3: 0.25
- iqr: 0.21
- skew: 4.095
- kurtosis: 27.7
- n_outliers: 230
- outlier_rate: 0.07138
- zero_rate: 0.1754
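The unit question can be checked by recomputing the share implied by the two count columns and seeing whether the reported rate sits closer to the fraction or to the percentage. The following is a minimal sketch with hypothetical function names. In practice the comparison would run over every row. Winsorizing should only follow if that check shows the values above 1.0 are genuine errors. The example row pairs medians from different columns, so it is only a rough illustration.

```python
def implied_share(uninsured_pop, total_pop):
    """Uninsured share implied by the two count columns, as (fraction, percent)."""
    fraction = uninsured_pop / total_pop
    return fraction, 100 * fraction


def likely_unit(reported_rate, uninsured_pop, total_pop):
    """Guess whether reported_rate is closer to a fraction or a percentage of the counts."""
    fraction, percent = implied_share(uninsured_pop, total_pop)
    if abs(reported_rate - fraction) <= abs(reported_rate - percent):
        return "fraction"
    return "percent"


def winsorize_upper(values, cap):
    """Clip values above `cap`; only appropriate once the tail is confirmed erroneous."""
    return [min(v, cap) for v in values]


# Medians from the stats above, combined into a rough, hypothetical row.
print(implied_share(36, 25_328))      # fraction ~0.0014, percent ~0.14
print(likely_unit(0.12, 36, 25_328))  # percent
```

Tallying `likely_unit` across all non-zero rows would show whether the column uses one unit consistently or mixes the two.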