healthcare data poverty data 20260121
Reading
This dataset contains 3,222 rows of U.S. county-level poverty data in three columns: a FIPS code, a county name, and a poverty rate. Each row is a unique county (3,222 unique FIPS codes and county names), so the analytical signal lives in the poverty_rate column. Poverty rates range from 1.6% to 66.32%, with a mean of 15.1% and a median of 13.55%. The distribution is right-skewed (skew ≈ 2.10), with 137 outliers (4.25%) on the high end. The most frequent state names in county_name are Texas (256), Virginia (189), and Georgia (159). These are word counts, so they mainly track how many counties each state has, and "Virginia" may also be counting "West Virginia" entries. Start by examining the shape of poverty_rate and which states the high-poverty outliers cluster in.
citing: row_count · column_count · columns.poverty_rate.stats · columns.county_name.top_words · columns.fips.n_unique
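A minimal sketch of the outlier-by-state check, assuming the profiler uses the common 1.5 × IQR (Tukey) rule and that every county_name ends in ", <State>". The report confirms neither assumption. Under that rule, the reported quartiles give an upper fence of 17.91 + 1.5 × 7.75 ≈ 29.5% and a lower fence below zero (10.16 - 11.625), which would explain why all outliers sit on the high end. The helper names (`tukey_fences`, `state_of`, `outliers_by_state`) are hypothetical, and the sample rows are made up.

```python
from collections import Counter
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences q1 - k*IQR and q3 + k*IQR.

    Assumption: the profiler's exact quartile method is unknown; "inclusive"
    interpolates linearly between order statistics.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def state_of(county_name):
    """Assumes names end in ', <State>', e.g. 'Example County, Texas' -> 'Texas'."""
    return county_name.rsplit(",", 1)[-1].strip()


def outliers_by_state(rows, k=1.5):
    """Count rows above the upper fence, grouped by state.

    `rows` is a list of (county_name, poverty_rate) pairs.
    """
    _, upper = tukey_fences([rate for _, rate in rows], k)
    return Counter(state_of(name) for name, rate in rows if rate > upper)


# Made-up rows for illustration only; these are not values from the dataset.
rows = [
    ("A County, Texas", 10.0), ("B County, Texas", 12.0),
    ("C County, Georgia", 11.0), ("D County, Georgia", 13.0),
    ("E County, Texas", 14.0), ("F County, Georgia", 45.0),
    ("G County, Texas", 50.0), ("H County, Virginia", 12.5),
]
print(outliers_by_state(rows).most_common())
```

On the full table, the per-state outlier counts are most informative when read against each state's total number of counties, because states with many counties will show more outliers by volume alone.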
Charts to look at first (as flagged by the summary)
poverty_rate histogram (40 equal-width bins, %)
| poverty_rate bin (%) | counties |
|---|---|
| 1.6 – 3.218 | 7 |
| 3.218 – 4.836 | 34 |
| 4.836 – 6.454 | 106 |
| 6.454 – 8.072 | 246 |
| 8.072 – 9.69 | 320 |
| 9.69 – 11.31 | 354 |
| 11.31 – 12.93 | 393 |
| 12.93 – 14.54 | 364 |
| 14.54 – 16.16 | 306 |
| 16.16 – 17.78 | 262 |
| 17.78 – 19.4 | 192 |
| 19.4 – 21.02 | 149 |
| 21.02 – 22.63 | 123 |
| 22.63 – 24.25 | 91 |
| 24.25 – 25.87 | 52 |
| 25.87 – 27.49 | 44 |
| 27.49 – 29.11 | 34 |
| 29.11 – 30.72 | 23 |
| 30.72 – 32.34 | 18 |
| 32.34 – 33.96 | 14 |
| 33.96 – 35.58 | 6 |
| 35.58 – 37.2 | 8 |
| 37.2 – 38.81 | 3 |
| 38.81 – 40.43 | 8 |
| 40.43 – 42.05 | 5 |
| 42.05 – 43.67 | 9 |
| 43.67 – 45.29 | 4 |
| 45.29 – 46.9 | 11 |
| 46.9 – 48.52 | 7 |
| 48.52 – 50.14 | 8 |
| 50.14 – 51.76 | 2 |
| 51.76 – 53.38 | 6 |
| 53.38 – 54.99 | 5 |
| 54.99 – 56.61 | 5 |
| 56.61 – 58.23 | 1 |
| 58.23 – 59.85 | 0 |
| 59.85 – 61.47 | 0 |
| 61.47 – 63.08 | 0 |
| 63.08 – 64.7 | 1 |
| 64.7 – 66.32 | 1 |
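The bin edges above are consistent with 40 equal-width bins spanning the observed range: (66.32 - 1.6) / 40 = 1.618, so the first bin ends at 3.218. A minimal sketch of that binning follows. Placing the maximum in the last bin mirrors the convention `numpy.histogram` documents, but how this particular tool handles edges is an assumption, and `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(values, n_bins=40):
    """Histogram with n_bins equal-width bins over [min, max].

    Bins are half-open [left, right) except the last, which also includes
    the maximum (an assumed convention, matching numpy.histogram).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so use a single bin.
        return [lo, hi], [len(values)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum lands in the last bin instead of index n_bins.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


edges, counts = equal_width_bins([1.6, 3.0, 10.0, 66.32])
print(round(edges[1], 3), counts[0], counts[-1])
```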
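county_name length histogram (characters; bin edges are rounded, so some labels span two characters)
| county_name length (chars) | counties |
|---|---|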
| 16 – 17 | 26 |
| 17 – 18 | 72 |
| 18 – 19 | 121 |
| 19 – 20 | 190 |
| 20 – 21 | 264 |
| 21 – 22 | 407 |
| 22 – 24 | 420 |
| 24 – 25 | 363 |
| 25 – 26 | 320 |
| 26 – 27 | 240 |
| 27 – 28 | 231 |
| 28 – 29 | 152 |
| 29 – 30 | 139 |
| 30 – 31 | 165 |
| 31 – 32 | 41 |
| 32 – 33 | 28 |
| 33 – 34 | 16 |
| 34 – 35 | 10 |
| 35 – 36 | 5 |
| 36 – 38 | 0 |
| 38 – 39 | 1 |
| 39 – 40 | 1 |
| 40 – 41 | 0 |
| 41 – 42 | 1 |
| 42 – 43 | 1 |
| 43 – 44 | 0 |
| 44 – 45 | 2 |
| 45 – 46 | 0 |
| 46 – 47 | 1 |
| 47 – 48 | 1 |
| 48 – 49 | 0 |
| 49 – 50 | 0 |
| 50 – 51 | 0 |
| 51 – 53 | 0 |
| 53 – 54 | 2 |
| 54 – 55 | 1 |
| 55 – 56 | 0 |
| 56 – 57 | 0 |
| 57 – 58 | 0 |
| 58 – 59 | 1 |
Schema
3 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | text | 0.0% | 3,222 | near_unique, one_word, allcaps, short_text |
| county_name | text | 0.0% | 3,222 | near_unique |
| poverty_rate | numeric | 0.0% | 1,719 | high_skew |
fips
text identifier near_unique one_word allcaps short_text
This column holds 5-character FIPS codes, one per row: all 3,222 values are distinct (n_unique equals n) and none are null. Every value is exactly 5 characters and a single token, with no empties or duplicates; the allcaps flag likely reflects that the codes contain no lowercase letters rather than any actual capitals. Sample values such as 01001, 01003, and 01005 follow the standard US county FIPS encoding: a 2-digit state code followed by a 3-digit county code. Treatment: treat this as the county-level key, left-join on it, keep it as text so leading zeros survive, and exclude it from model features.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 5
- len_max: 5
- len_mean: 5
- len_median: 5
- len_p95: 5
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 3,222
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
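A minimal sketch of the key checks described above, using only the standard library. It relies on the standard county FIPS layout (2-digit state code followed by a 3-digit county code); `split_fips` and `bad_keys` are hypothetical helper names.

```python
import re
from typing import Tuple

# Standard county FIPS layout: 2-digit state code + 3-digit county code.
FIPS_RE = re.compile(r"[0-9]{5}")


def split_fips(code: str) -> Tuple[str, str]:
    """Split '01001' into ('01', '001'); raises ValueError on malformed codes."""
    if not FIPS_RE.fullmatch(code):
        raise ValueError(f"not a 5-digit FIPS code: {code!r}")
    return code[:2], code[2:]


def bad_keys(codes):
    """Return codes that are malformed or repeat an earlier code."""
    seen, bad = set(), []
    for code in codes:
        if not FIPS_RE.fullmatch(code) or code in seen:
            bad.append(code)
        seen.add(code)
    return bad


# Reading the column as integers drops the leading zero: 01001 -> 1001.
codes = ["01001", "01003", "01005", str(int("01001"))]
print(bad_keys(codes))
print(split_fips("01003"))
```

The last entry shows why the column should stay text: an integer round-trip turns 01001 into 1001, which the check flags as malformed.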
county_name
text identifier near_unique
This is a county identifier string, likely formatted as "<Name> County, <State>": the token "county," appears in 2,999 of 3,222 rows, and Texas (256), Virginia (189), and Georgia (159) are the most frequent state names. All 3,222 values are unique, with no nulls or duplicates. Most lengths fall between 16 and 31 characters (median 24, p95 31), with a thin tail up to 59, consistent with a clean US county roster. The 223 rows without the "county," token are worth checking; they are likely parishes, boroughs, or independent cities. Treatment: split into county and state fields and use as a join key rather than a model feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
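The split suggested in the treatment note can be sketched as below, assuming every value has the form "<Name> County, <State>" with the state after the last comma. That format is inferred from the token counts, not documented. `split_county_name` and `lacks_county_token` are hypothetical helpers, and the parish example is illustrative rather than a row taken from the dataset.

```python
from typing import Optional, Tuple


def split_county_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Autauga County, Alabama' into ('Autauga County', 'Alabama').

    Splits on the last comma; returns (name, None) when there is no comma.
    """
    if "," not in name:
        return name.strip(), None
    place, state = name.rsplit(",", 1)
    return place.strip(), state.strip()


def lacks_county_token(name: str) -> bool:
    """True when 'county,' is absent (case-insensitive), e.g. parishes or cities."""
    return "county," not in name.lower()


names = ["Autauga County, Alabama", "Orleans Parish, Louisiana"]
print([split_county_name(n) for n in names])
print(sum(lacks_county_token(n) for n in names))
```

Run over the full column, the `lacks_county_token` count would be expected to land near the 223 rows noted above, though the profiler's tokenizer may treat punctuation slightly differently.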
poverty_rate
numeric feature high_skew
This is a county-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55 and an IQR of 7.75. The distribution is right-skewed (skew 2.10, kurtosis 6.89), with 137 high outliers (4.25%) marking pockets of severe poverty well above the interquartile band of roughly 10–18%. There are no nulls or zeros. The 1,719 distinct values across 3,222 rows mean many counties share the same rate; one record per county is established by the unique FIPS codes, not by this column. Treatment: apply a log or sqrt transform before regression to tame the right skew.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,719
- min: 1.6
- max: 66.32
- mean: 15.1
- median: 13.55
- std: 7.706
- q1: 10.16
- q3: 17.91
- iqr: 7.75
- skew: 2.096
- kurtosis: 6.891
- n_outliers: 137
- outlier_rate: 0.04252
- zero_rate: 0
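To check whether a log or sqrt transform actually tames the skew, compare skewness before and after. The sketch below uses the plain Fisher-Pearson coefficient g1 = m3 / m2^1.5. The profiler may apply a small-sample bias correction (as pandas' `Series.skew` does), so its 2.096 need not match this formula exactly. Because the minimum rate is 1.6 and zero_rate is 0, log is defined for every row. The sample values are made up, and `skewness` and `transformed` are hypothetical names.

```python
import math


def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def transformed(values):
    """Raw, sqrt and log versions of a strictly positive series."""
    return {
        "raw": list(values),
        "sqrt": [math.sqrt(v) for v in values],
        "log": [math.log(v) for v in values],
    }


# Made-up, right-skewed rates (percent); not taken from the dataset.
rates = [2, 3, 4, 5, 6, 8, 10, 15, 30, 60]
for label, series in transformed(rates).items():
    print(label, round(skewness(series), 2))
```

On this toy series, log compresses the long right tail more strongly than sqrt. Which one to use on the real column is a modelling choice that the profile does not settle.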