poverty data
Reading
This dataset contains 3,222 U.S. counties and county equivalents with three columns: a county name, a FIPS code identifier, and a poverty rate. Every row has a unique county_name and a unique fips, so the analytical signal lives almost entirely in poverty_rate. The poverty rate ranges from 1.6% to 66.32%, with a mean of 15.1% and a median of 13.55%. It is right-skewed (skew 2.10), with 137 high-end outliers (~4.25% of counties). That long upper tail is the first thing worth a closer look, since a small number of counties have poverty rates several times the median county rate (the maximum is roughly 4.9 times the median).
citing: row_count · column_count · columns.poverty_rate.stats.min · columns.poverty_rate.stats.max · columns.poverty_rate.stats.mean · columns.poverty_rate.stats.median · columns.poverty_rate.stats.skew · columns.poverty_rate.stats.n_outliers · columns.poverty_rate.stats.outlier_rate · columns.county_name.n_unique · columns.fips.n_unique
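The outlier count quoted above can be sanity-checked against the reported quartiles. The sketch below assumes the profiler flags outliers with the common 1.5 × IQR (Tukey) rule; the report does not state the rule it uses, so treat the fence values as illustrative. The function name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported in the poverty_rate stats of this profile.
# Assumption: the profiler uses k = 1.5; the report does not confirm this.
lower, upper = tukey_fences(q1=10.16, q3=17.91)

reported_min = 1.6  # poverty_rate minimum from the profile
print(f"lower fence: {lower:.3f}, upper fence: {upper:.3f}")
print("low-end outliers possible:", reported_min < lower)
```

Under that assumption the lower fence falls below the observed minimum, which would explain why all 137 flagged counties sit in the upper tail.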
Charts the summary said to look at first
poverty_rate histogram (40 bins)
| bin | count |
|---|---|
| 1.6 – 3.218 | 7 |
| 3.218 – 4.836 | 34 |
| 4.836 – 6.454 | 106 |
| 6.454 – 8.072 | 246 |
| 8.072 – 9.69 | 320 |
| 9.69 – 11.31 | 354 |
| 11.31 – 12.93 | 393 |
| 12.93 – 14.54 | 364 |
| 14.54 – 16.16 | 306 |
| 16.16 – 17.78 | 262 |
| 17.78 – 19.4 | 192 |
| 19.4 – 21.02 | 149 |
| 21.02 – 22.63 | 123 |
| 22.63 – 24.25 | 91 |
| 24.25 – 25.87 | 52 |
| 25.87 – 27.49 | 44 |
| 27.49 – 29.11 | 34 |
| 29.11 – 30.72 | 23 |
| 30.72 – 32.34 | 18 |
| 32.34 – 33.96 | 14 |
| 33.96 – 35.58 | 6 |
| 35.58 – 37.2 | 8 |
| 37.2 – 38.81 | 3 |
| 38.81 – 40.43 | 8 |
| 40.43 – 42.05 | 5 |
| 42.05 – 43.67 | 9 |
| 43.67 – 45.29 | 4 |
| 45.29 – 46.9 | 11 |
| 46.9 – 48.52 | 7 |
| 48.52 – 50.14 | 8 |
| 50.14 – 51.76 | 2 |
| 51.76 – 53.38 | 6 |
| 53.38 – 54.99 | 5 |
| 54.99 – 56.61 | 5 |
| 56.61 – 58.23 | 1 |
| 58.23 – 59.85 | 0 |
| 59.85 – 61.47 | 0 |
| 61.47 – 63.08 | 0 |
| 63.08 – 64.7 | 1 |
| 64.7 – 66.32 | 1 |
fips histogram (40 bins)
| bin | count |
|---|---|
| 1001 – 2780 | 97 |
| 2780 – 4559 | 15 |
| 4559 – 6337 | 133 |
| 6337 – 8116 | 59 |
| 8116 – 9895 | 14 |
| 9895 – 1.167e+04 | 4 |
| 1.167e+04 – 1.345e+04 | 226 |
| 1.345e+04 – 1.523e+04 | 5 |
| 1.523e+04 – 1.701e+04 | 49 |
| 1.701e+04 – 1.879e+04 | 189 |
| 1.879e+04 – 2.057e+04 | 204 |
| 2.057e+04 – 2.235e+04 | 184 |
| 2.235e+04 – 2.413e+04 | 39 |
| 2.413e+04 – 2.59e+04 | 15 |
| 2.59e+04 – 2.768e+04 | 170 |
| 2.768e+04 – 2.946e+04 | 196 |
| 2.946e+04 – 3.124e+04 | 150 |
| 3.124e+04 – 3.302e+04 | 27 |
| 3.302e+04 – 3.48e+04 | 21 |
| 3.48e+04 – 3.658e+04 | 95 |
| 3.658e+04 – 3.836e+04 | 153 |
| 3.836e+04 – 4.013e+04 | 155 |
| 4.013e+04 – 4.191e+04 | 46 |
| 4.191e+04 – 4.369e+04 | 67 |
| 4.369e+04 – 4.547e+04 | 51 |
| 4.547e+04 – 4.725e+04 | 161 |
| 4.725e+04 – 4.903e+04 | 268 |
| 4.903e+04 – 5.081e+04 | 29 |
| 5.081e+04 – 5.259e+04 | 133 |
| 5.259e+04 – 5.436e+04 | 94 |
| 5.436e+04 – 5.614e+04 | 95 |
| 5.614e+04 – 5.792e+04 | 0 |
| 5.792e+04 – 5.97e+04 | 0 |
| 5.97e+04 – 6.148e+04 | 0 |
| 6.148e+04 – 6.326e+04 | 0 |
| 6.326e+04 – 6.504e+04 | 0 |
| 6.504e+04 – 6.682e+04 | 0 |
| 6.682e+04 – 6.86e+04 | 0 |
| 6.86e+04 – 7.037e+04 | 0 |
| 7.037e+04 – 7.215e+04 | 78 |
Schema
3 columns
| column | type | nulls | unique | Alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| poverty_rate | numeric | 0.0% | 1,719 | high_skew |
fips
numeric identifier
This is the U.S. county FIPS code, a 5-digit geographic identifier whose first two digits encode the state. Stored as a number, the code loses any leading zero, so only one or two state digits remain in front of the three-digit county code. All 3,222 values are unique with zero nulls, and the range 1001 to 72153 spans Alabama through Puerto Rico, consistent with a complete county roster. Treating it as numeric is misleading despite the clean distribution (skew 0.16, no outliers), since the magnitudes are categorical codes, not measurements. Treatment: cast to a zero-padded string and use as a join key to geographic reference tables.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
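The treatment recommended for fips (cast to a zero-padded string) can be sketched as follows. The helper names `normalize_fips` and `state_fips` are hypothetical, and the sketch assumes the column arrives as plain integers, as the numeric type in the schema suggests.

```python
def normalize_fips(code: int) -> str:
    """Format a numeric county FIPS code as a 5-character, zero-padded string."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a valid 5-digit county FIPS code: {code}")
    return f"{code:05d}"


def state_fips(code: int) -> str:
    """Return the 2-digit state prefix of a county FIPS code."""
    return normalize_fips(code)[:2]


# The minimum and maximum values reported in the profile.
for raw in (1001, 72153):
    print(raw, "->", normalize_fips(raw), "state prefix:", state_fips(raw))
```

Keeping the code as a string preserves the leading zero, so joins against reference tables that store "01001" rather than 1001 do not silently miss rows.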
county_name
text identifier near_unique
This column holds fully qualified US county names; all 3,222 rows are unique and none are null. The token 'county,' appears 2,999 times, so about 223 entries don't follow the 'X County, State' pattern (likely parishes in Louisiana, boroughs in Alaska, independent cities, or municipios in Puerto Rico, given that fips runs up to 72153). Texas (256), Virginia (189), and Georgia (159) lead the state-name token counts. Georgia's figure matches its 159 counties, while Texas (254 counties) and Virginia (133 counties and independent cities) come out higher, probably because the tokens also match names such as 'Texas County' and West Virginia entries. Treatment: use as a join key against county-level reference data; split into county and state fields before modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
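Splitting county_name into county and state fields, as the treatment suggests, might look like the sketch below. It assumes each value has the form 'X County, State' with a single ", " before the state, which the profile implies but does not show. The function name `split_county_name` and the sample strings are illustrative, not taken from the dataset.

```python
def split_county_name(full_name: str) -> tuple[str, str]:
    """Split 'X County, State' into (county, state) at the last ', '."""
    county, sep, state = full_name.rpartition(", ")
    if not sep or not county or not state:
        raise ValueError(f"unexpected county_name format: {full_name!r}")
    return county.strip(), state.strip()


# Illustrative inputs only; the second one lacks the 'County' token,
# like the ~223 entries the profile flags.
samples = ["Autauga County, Alabama", "Orleans Parish, Louisiana"]
for name in samples:
    county, state = split_county_name(name)
    print(f"{county!r} | {state!r} | has 'County': {county.endswith('County')}")
```

Splitting at the last separator rather than the first keeps the sketch robust to county names that themselves contain a comma, should any exist.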
poverty_rate
numeric feature high_skew
Numeric poverty rate (likely the percent of the population below the poverty line) across 3,222 rows with no nulls and 1,719 unique values. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with median 13.55 and mean 15.10, ranging from 1.6 to 66.32, and 137 outliers (4.25%) sit above the upper whisker. The long upper tail reflects a small set of high-poverty counties pulling the mean above the median. Treatment: apply a log or Box-Cox transform before linear modelling to tame the right skew; both are applicable here because every value is strictly positive (minimum 1.6).
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,719
- min: 1.6
- max: 66.32
- mean: 15.1
- median: 13.55
- std: 7.706
- q1: 10.16
- q3: 17.91
- iqr: 7.75
- skew: 2.096
- kurtosis: 6.891
- n_outliers: 137
- outlier_rate: 0.04252
- zero_rate: 0
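The effect of the suggested log transform can be illustrated with a small, standard-library sketch. The `rates` list is made-up data spanning the reported range, not rows from the dataset, and `skewness` is a hypothetical helper that computes the biased (population) moment coefficient, which may differ slightly from the estimator the profiler uses. Box-Cox is not shown; it would normally come from a library such as SciPy.

```python
import math


def skewness(values: list[float]) -> float:
    """Biased skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative rates only (assumption: not taken from the dataset).
rates = [5.0, 8.0, 10.0, 12.0, 13.5, 15.0, 18.0, 25.0, 40.0, 66.32]

# The natural log is defined here because every rate is strictly positive.
log_rates = [math.log(r) for r in rates]

skew_raw = skewness(rates)
skew_log = skewness(log_rates)
print(f"raw skew: {skew_raw:.2f}, log skew: {skew_log:.2f}")
```

On right-skewed, strictly positive data like this, the log compresses the upper tail more than the lower one, so the transformed skew should come out noticeably smaller than the raw skew.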