saturn

healthcare data poverty data 20260121

source /home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet · 3,222 rows · 3 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 rows of U.S. county-level poverty data in three columns: a FIPS code, a county name, and a poverty rate. Each row is a distinct county (3,222 unique FIPS codes and 3,222 unique county names), so the analytical signal lives in the poverty_rate column. Poverty rates range from 1.6% to 66.32% with a mean of 15.1% and a median of 13.55%; the distribution is right-skewed (skew ≈ 2.10), with 137 outliers, all on the high end. The most frequent state tokens in county_name are Texas (256), Virginia (189), and Georgia (159). These are word counts rather than per-state county totals: "Virginia" also matches West Virginia, and a state name can appear inside a county's own name. Start by examining the shape of poverty_rate and the states in which the high-poverty outliers cluster.

citing: row_count · column_count · columns.poverty_rate.stats · columns.county_name.top_words · columns.fips.n_unique
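The headline claims in the summary can be recomputed from the file itself. The following is a minimal sketch, assuming the column names shown in the schema (fips, county_name, poverty_rate) and, for the loader only, that pandas and a parquet engine are installed. The helper names summarize and load_rows are illustrative and not part of the profiler, and the sample rows use made-up names and rates.

```python
from statistics import mean, median

def summarize(rows):
    """Recompute the headline profile numbers from a list of row dicts."""
    rows = list(rows)
    rates = [row["poverty_rate"] for row in rows]
    return {
        "row_count": len(rows),
        "unique_fips": len({row["fips"] for row in rows}),
        "unique_names": len({row["county_name"] for row in rows}),
        "min": min(rates),
        "max": max(rates),
        "mean": mean(rates),
        "median": median(rates),
    }

def load_rows(path):
    """Read the parquet file into row dicts (needs pandas plus a parquet engine)."""
    import pandas as pd  # imported lazily so summarize() works without pandas
    return pd.read_parquet(path).to_dict("records")

# Toy rows with made-up names and rates, only to show the shape of the check.
sample = [
    {"fips": "01001", "county_name": "County A, State X", "poverty_rate": 10.0},
    {"fips": "01003", "county_name": "County B, State X", "poverty_rate": 20.0},
    {"fips": "01005", "county_name": "County C, State X", "poverty_rate": 30.0},
]
summary = summarize(sample)

# One row per county holds when every FIPS code is distinct.
is_one_row_per_county = summary["row_count"] == summary["unique_fips"]
```

If the profile is accurate, calling summarize(load_rows(path)) on the source file should give a row_count and unique_fips of 3,222 and a median near 13.55.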

Schema

3 columns
Per-column summary.

column        type     nulls  unique  alerts
fips          text     0.0%   3,222   near_unique one_word allcaps short_text
county_name   text     0.0%   3,222   near_unique
poverty_rate  numeric  0.0%   1,719   high_skew

fips

text identifier near_unique one_word allcaps short_text
This column holds 5-character FIPS codes that uniquely identify each of the 3,222 rows (n_unique equals n; null_rate is 0). Every value is exactly 5 characters and a single token, with zero duplicates or empties; the values are all-caps only in the trivial sense that they are numeric. Sample values such as 01001, 01003, and 01005 match the standard U.S. county FIPS encoding (a 2-digit state code followed by a 3-digit county code). Treatment: Treat as a county-level key. Keep it as text so that leading zeros survive, left-join other tables on it, and exclude it from modelling features. high · anthropic:claude-opus-4-7
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  5
len_max                  5
len_mean                 5
len_median               5
len_p95                  5
word_mean                1
word_median              1
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               3,222
readability_flesch_mean  121.2
emoji_rate               0
url_rate                 0
one_word_rate            1
allcaps_rate             1
boilerplate_rate         0
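The treatment for fips (use it as a join key) depends on the codes staying 5-character strings. The sketch below shows one way to restore leading zeros lost to an integer cast, split a code into its state and county parts, and left-join a second table on the key. The function names, the population column, and all numeric values are hypothetical; only the sample codes 01001 and 01003 come from the profile.

```python
import re

FIPS_RE = re.compile(r"\d{5}")

def normalize_fips(value):
    """Return a 5-character FIPS string, restoring leading zeros lost to int casts."""
    text = str(value).strip().zfill(5)
    if not FIPS_RE.fullmatch(text):
        raise ValueError(f"not a 5-digit county FIPS code: {value!r}")
    return text

def split_fips(code):
    """Split a county FIPS code into (2-digit state part, 3-digit county part)."""
    code = normalize_fips(code)
    return code[:2], code[2:]

def left_join(left_rows, right_rows, key="fips"):
    """Keep every left row; attach matching right-hand columns when the key exists."""
    lookup = {normalize_fips(row[key]): row for row in right_rows}
    joined = []
    for row in left_rows:
        fips = normalize_fips(row[key])
        match = lookup.get(fips, {})
        # Left-hand values win on column-name clashes; the key is stored normalized.
        joined.append({**match, **row, key: fips})
    return joined

# Made-up values; the right-hand table stores the key as an integer (1001).
poverty = [{"fips": "01001", "poverty_rate": 12.0},
           {"fips": "01003", "poverty_rate": 9.5}]
population = [{"fips": 1001, "population": 50000}]
merged = left_join(poverty, population)
```

Rows without a match (01003 here) are kept with only their left-hand columns, which is the behaviour a left join is meant to give.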

county_name

text identifier near_unique
This is a county identifier string, likely formatted as "<name> County, <state>": the token "county," appears in 2,999 of 3,222 rows, and Texas (256), Virginia (189), and Georgia (159) lead the state mentions. All 3,222 values are unique, with zero nulls or duplicates, and lengths cluster tightly (min 16, median 24, p95 31, max 59), consistent with a clean U.S. county roster. The 223 rows lacking the "county," token are worth checking; they are likely parishes, boroughs, or independent cities. Treatment: Split into county and state fields and use as a join key rather than a model feature. high · anthropic:claude-opus-4-7
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.32
len_median               24
len_p95                  31
word_mean                3.248
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,990
readability_flesch_mean  10.28
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
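The split suggested for county_name, and the check on rows lacking the "county," token, can be sketched as follows. This assumes each value has the form "<area>, <state>" with the state after the last comma, which the profile suggests but does not confirm. The function names are illustrative, and the three example strings are real place names chosen for illustration rather than values read from the file.

```python
from collections import Counter

def split_county_name(full_name):
    """Split '<area>, <state>' on the last comma; state is None if no comma is found."""
    area, sep, state = full_name.rpartition(", ")
    if not sep:
        return full_name.strip(), None
    return area.strip(), state.strip()

def lacks_county_token(full_name):
    """True when the area part does not end in the word 'County'."""
    area, _ = split_county_name(full_name)
    return not area.lower().endswith(" county")

# Illustrative names, not read from the file.
names = [
    "Autauga County, Alabama",
    "Orleans Parish, Louisiana",
    "Alexandria city, Virginia",
]
non_county = [name for name in names if lacks_county_token(name)]

# Counting the parsed state field avoids the word-count ambiguity of raw tokens.
state_counts = Counter(split_county_name(name)[1] for name in names)
```

Counting the parsed state field rather than raw words keeps "West Virginia" separate from "Virginia", which a token count cannot do.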

poverty_rate

numeric feature high_skew
This is a county-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55 and an IQR of 7.75. The distribution is right-skewed (skew 2.10, kurtosis 6.89), with 137 high outliers (4.25%) reflecting pockets of severe poverty well above the typical 10–18% band (q1 10.16, q3 17.91). There are no nulls or zeros. The 1,719 unique values across 3,222 rows mean that some counties share the same rate; the one-record-per-county structure follows from the unique fips key, not from this column. Treatment: Apply a log or sqrt transform before regression to tame the right skew; the log is defined for every row because the minimum is 1.6. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        1,719
min           1.6
max           66.32
mean          15.1
median        13.55
std           7.706
q1            10.16
q3            17.91
iqr           7.75
skew          2.096
kurtosis      6.891
n_outliers    137
outlier_rate  0.04252
zero_rate     0
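Two of the claims about poverty_rate can be worked through from the reported statistics. The sketch below assumes Tukey's 1.5 × IQR fences, since the profile does not state which outlier rule it used. It also uses a plain population-moment skewness, which may differ slightly from the estimator behind the reported 2.096. The toy rates are made up to be right-skewed and are not taken from the file.

```python
import math
from statistics import fmean

# Quartiles, IQR, and minimum as reported in the profile.
Q1, Q3, IQR = 10.16, 17.91, 7.75
MIN_RATE = 1.6

# Assumption: Tukey's 1.5 x IQR rule; the profiler's actual rule is not stated.
lower_fence = Q1 - 1.5 * IQR   # about -1.465, below the observed minimum
upper_fence = Q3 + 1.5 * IQR   # about 29.535

# Low outliers exist only if some value falls under the lower fence.
low_outliers_possible = MIN_RATE < lower_fence

def skewness(values):
    """Population (biased) moment skewness: m3 / m2**1.5."""
    values = list(values)
    centre = fmean(values)
    m2 = fmean([(v - centre) ** 2 for v in values])
    m3 = fmean([(v - centre) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Made-up right-skewed rates, not taken from the file.
toy_rates = [5, 8, 10, 12, 14, 16, 20, 35, 66]
raw_skew = skewness(toy_rates)
sqrt_skew = skewness([math.sqrt(v) for v in toy_rates])
log_skew = skewness([math.log(v) for v in toy_rates])
```

Under these assumptions the lower fence is negative, so every outlier must sit above roughly 29.5%, which is consistent with the profile describing all 137 outliers as high. On the toy data the log transform reduces skew more than the square root does; whether that ordering holds on the real column should be checked before choosing a transform.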