healthcare data poverty data 20260121

source /home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet · 3,222 rows · 3 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset contains 3,222 rows of U.S. county-level poverty data in three columns: a FIPS code, a county name, and a poverty rate. Each row is a unique county (3,222 unique FIPS codes and 3,222 unique county names), so the analytical signal lives in the poverty_rate column. Poverty rates range from 1.6% to 66.32%, with a mean of 15.1% and a median of 13.55%; the distribution is right-skewed (skew ≈ 2.10), with 137 outliers on the high end. The county_name field also hints at geographic concentration: the most frequent state words are Texas (256), Virginia (189), and Georgia (159). These are word counts rather than per-state county counts, so "Virginia" also matches West Virginia entries and "Texas" would also match a county named Texas in another state. Start by examining the shape of poverty_rate and which states the high-poverty outliers cluster in.

citing: row_count · column_count · columns.poverty_rate.stats · columns.county_name.top_words · columns.fips.n_unique

Charts the summary said to look at first

poverty_rate · Look for the right-skewed tail above roughly 29.5% (q3 + 1.5 × IQR = 17.91 + 11.625, assuming the usual 1.5 × IQR rule), where the 137 high-poverty outlier counties sit.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1
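
As a cross-check on the table above, the bin counts can be summed and compared with the reported outlier count. This is a minimal sketch, assuming the profiler flags outliers with the common 1.5 × IQR rule (the report does not say which rule it uses) and that the 40 bins are equal-width over the reported min and max. The counts are copied from the table, and q1 and q3 from the poverty_rate statistics.

```python
# Bin counts copied from the poverty_rate histogram table (40 bins).
COUNTS = [
    7, 34, 106, 246, 320, 354, 393, 364, 306, 262,
    192, 149, 123, 91, 52, 44, 34, 23, 18, 14,
    6, 8, 3, 8, 5, 9, 4, 11, 7, 8,
    2, 6, 5, 5, 1, 0, 0, 0, 1, 1,
]
LOW, HIGH = 1.6, 66.32   # reported min and max of poverty_rate
Q1, Q3 = 10.16, 17.91    # reported quartiles
REPORTED_OUTLIERS = 137


def upper_fence(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper Tukey fence; k=1.5 is an assumption about the profiler's rule."""
    return q3 + k * (q3 - q1)


def bin_index(value: float) -> int:
    """Index of the equal-width bin over [LOW, HIGH] that contains `value`."""
    width = (HIGH - LOW) / len(COUNTS)
    return min(int((value - LOW) / width), len(COUNTS) - 1)


total = sum(COUNTS)
fence = upper_fence(Q1, Q3)
fence_bin = bin_index(fence)

# Counties at or beyond the fence lie somewhere between these two tail sums:
# the bins strictly after the fence bin, and the same bins plus the fence bin.
tail_min = sum(COUNTS[fence_bin + 1:])
tail_max = sum(COUNTS[fence_bin:])

print(f"rows in histogram: {total}")
print(f"upper fence: {fence:.3f} (bin {fence_bin})")
print(f"outliers bracketed by histogram: {tail_min} to {tail_max}")
```

The counts sum to 3,222, matching the row count. The fence of about 29.5 falls inside the 29.11 – 30.72 bin, so the histogram places between 122 and 145 counties beyond it, which brackets the reported 137.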

county_name · Top words show which state names appear most often in the county list, led by Texas, Virginia, and Georgia.

county_name · Name lengths cluster tightly (median 24 characters, 95th percentile 31), a useful sanity check that entries follow a consistent 'X County, State' format.

Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

Schema

3 columns

Per-column summary.
column	type	nulls	unique	alerts
fips	text	0.0%	3,222	near_unique one_word allcaps short_text
county_name	text	0.0%	3,222	near_unique
poverty_rate	numeric	0.0%	1,719	high_skew
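
The schema above can be turned into a small row-level check. The following is a sketch only: it assumes each record has already been loaded into a plain dict with the keys fips, county_name, and poverty_rate, and the function name validate_rows and the sample rows (including their rates) are hypothetical, not taken from the file.

```python
import re

FIPS_PATTERN = re.compile(r"[0-9]{5}")


def validate_rows(rows):
    """Return a list of human-readable problems found in `rows`.

    Checks mirror the schema table: fips is a unique 5-digit string,
    county_name is non-empty text, poverty_rate is a percentage in [0, 100].
    """
    problems = []
    seen_fips = set()
    for i, row in enumerate(rows):
        fips = row.get("fips")
        if not isinstance(fips, str) or not FIPS_PATTERN.fullmatch(fips):
            problems.append(f"row {i}: fips {fips!r} is not a 5-digit string")
        elif fips in seen_fips:
            problems.append(f"row {i}: duplicate fips {fips}")
        else:
            seen_fips.add(fips)

        name = row.get("county_name")
        if not isinstance(name, str) or not name.strip():
            problems.append(f"row {i}: county_name is missing or empty")

        rate = row.get("poverty_rate")
        is_number = isinstance(rate, (int, float)) and not isinstance(rate, bool)
        if not is_number or not 0 <= rate <= 100:
            problems.append(f"row {i}: poverty_rate {rate!r} is not in [0, 100]")
    return problems


# Illustrative rows with made-up rates.
good = [
    {"fips": "01001", "county_name": "Autauga County, Alabama", "poverty_rate": 12.0},
    {"fips": "01003", "county_name": "Baldwin County, Alabama", "poverty_rate": 10.5},
]
bad = [
    {"fips": 1001, "county_name": "Autauga County, Alabama", "poverty_rate": 12.0},
    {"fips": "01003", "county_name": "", "poverty_rate": 150.0},
]
print(validate_rows(good))
for problem in validate_rows(bad):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to see every schema violation in a single pass.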

fips

text identifier near_unique one_word allcaps short_text

This column holds 5-character FIPS codes that uniquely identify each of the 3,222 rows (n_unique equals n, null rate 0). Every value is exactly 5 characters, one word, and all-caps/numeric, with no duplicates or empty strings. Sample values such as 01001, 01003, and 01005 match the standard U.S. county FIPS encoding (a 2-digit state code followed by a 3-digit county code). Treatment: Treat as the county-level key. Keep it as text so leading zeros survive, left-join on this id, and exclude it from modelling features. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 5
len_max: 5
len_mean: 5
len_median: 5
len_p95: 5
word_mean: 1
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 3,222
readability_flesch_mean: 121.2
emoji_rate: 0
url_rate: 0
one_word_rate: 1
allcaps_rate: 1
boilerplate_rate: 0
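
Because the codes carry leading zeros (01001, for example), a join can silently fail if another table stored them as integers. The sketch below shows one way to normalise a code to the 5-character form and split it into state and county parts. The helper names normalize_fips and split_fips are hypothetical, not part of the profiler.

```python
def normalize_fips(value) -> str:
    """Return a 5-character, zero-padded county FIPS string.

    Accepts ints or strings, since leading zeros are often lost when a
    code such as 01001 passes through a numeric column.
    """
    text = str(value).strip()
    if not (text.isascii() and text.isdigit()) or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_fips(fips) -> tuple[str, str]:
    """Split a county FIPS code into (state_code, county_code)."""
    code = normalize_fips(fips)
    return code[:2], code[2:]


print(normalize_fips(1001))   # integer that lost its leading zero
print(split_fips("01005"))    # 2-digit state part, 3-digit county part
```

Normalising both sides of a join with the same function keeps the key comparison a plain string equality.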

county_name

text identifier near_unique

This is a county identifier string, likely formatted as "X County, State": the token "county," appears in 2,999 of 3,222 rows, and Texas (256), Virginia (189), and Georgia (159) lead the state mentions (these are word counts, so "Virginia" also matches West Virginia). Every one of the 3,222 values is unique, with zero nulls or duplicates, and lengths cluster tightly (min 16, median 24, max 59), consistent with a clean U.S. county roster. The 223 rows lacking the "county," token are worth checking; they are likely parishes, boroughs, independent cities, or similar county equivalents. Treatment: Split into county and state fields and use as a join key rather than a model feature. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
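
The suggested split, and the check for rows without the "county," token, might look like the sketch below. It assumes the 'X County, State' format holds with the state after the last comma; the helper names and the sample strings are illustrative and not drawn from the file.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' into (county, state) on the last comma."""
    county, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma in county_name: {name!r}")
    return county.strip(), state.strip()


def lacks_county_token(name: str) -> bool:
    """True when no whitespace-separated token equals 'county,' (case-insensitive)."""
    return "county," not in name.lower().split()


samples = [
    "Autauga County, Alabama",
    "Texas County, Missouri",      # the word "Texas" appears, but the state is Missouri
    "Orleans Parish, Louisiana",   # a county equivalent without the "county," token
]
for name in samples:
    print(split_county_name(name), lacks_county_token(name))
```

Splitting on the last comma rather than the first keeps any comma inside the county part intact, and counting by the parsed state field avoids the word-count ambiguity in names such as "Texas County, Missouri".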

poverty_rate

numeric feature high_skew

This is a county-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55 and an IQR of 7.75. The distribution is right-skewed (skew 2.10, kurtosis 6.89), with 137 high outliers (4.25%) reflecting pockets of severe poverty well above the typical 10–18% band (q1 10.16, q3 17.91). There are no nulls or zeros. The 1,719 distinct values across 3,222 rows mean that some counties share the same rate; the one-record-per-county structure follows from the unique fips key, not from this column. Treatment: Apply a log or sqrt transform before regression to tame the right skew. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 1,719
min: 1.6
max: 66.32
mean: 15.1
median: 13.55
std: 7.706
q1: 10.16
q3: 17.91
iqr: 7.75
skew: 2.096
kurtosis: 6.891
n_outliers: 137
outlier_rate: 0.04252
zero_rate: 0
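
To see what the suggested transform does, the sketch below compares skewness before and after a log and a square-root transform. It uses the moment-based (population) skewness, which may differ slightly from the estimator the profiler used, and a small made-up sample rather than the real column. Since the reported minimum is 1.6 and zero_rate is 0, the log is defined for every value in the actual data.

```python
import math


def skewness(values):
    """Population (moment-based) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed rates for illustration, NOT taken from the dataset.
sample = [2, 5, 8, 10, 12, 13, 14, 16, 18, 22, 30, 45, 66]
logged = [math.log(v) for v in sample]
rooted = [math.sqrt(v) for v in sample]

print(f"raw skew:  {skewness(sample):.2f}")
print(f"sqrt skew: {skewness(rooted):.2f}")
print(f"log skew:  {skewness(logged):.2f}")
```

On this sample both transforms pull the skew closer to zero, with the log being the stronger of the two. Which one suits the real column is worth checking on the data itself, since a log can overcorrect and leave a mild left skew.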