saturn·

poverty data

source /home/coolhand/html/datavis/data_trove/cache/poverty_data.parquet · 3,222 rows · 3 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset contains 3,222 U.S. counties and county equivalents with three columns: a county name, a FIPS code identifier, and a poverty rate. Each row is unique by county_name and fips, so the analytical signal lives almost entirely in poverty_rate. Poverty rate ranges from 1.6% to 66.32% with a mean of 15.1% and a median of 13.55%, and it is right-skewed (skew 2.10) with 137 high-end outliers (~4.25% of counties). That long upper tail is the first thing worth a closer look: a small number of counties have poverty rates several times the median county rate, and the maximum of 66.32% is nearly five times the 13.55% median.

citing: row_count · column_count · columns.poverty_rate.stats.min · columns.poverty_rate.stats.max · columns.poverty_rate.stats.mean · columns.poverty_rate.stats.median · columns.poverty_rate.stats.skew · columns.poverty_rate.stats.n_outliers · columns.poverty_rate.stats.outlier_rate · columns.county_name.n_unique · columns.fips.n_unique
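The outlier figures in the summary can be sanity-checked against the reported quartiles. The sketch below assumes the profiler flags outliers with the conventional 1.5 × IQR (Tukey) rule, which the report does not state; the function name `tukey_fences` is illustrative and not part of the profiler.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes as reported for poverty_rate.
Q1, Q3 = 10.16, 17.91
MIN_RATE, MAX_RATE = 1.6, 66.32

# ASSUMPTION: k = 1.5 is the common boxplot convention, not a documented setting.
lower, upper = tukey_fences(Q1, Q3)

# A value is an outlier only if it falls outside the fences.
has_low_outliers = MIN_RATE < lower
has_high_outliers = MAX_RATE > upper

print(f"lower fence: {lower:.3f}, upper fence: {upper:.3f}")
print(f"low-end outliers possible: {has_low_outliers}")
print(f"high-end outliers present: {has_high_outliers}")
```

Under that assumption the upper fence lands near 29.5% and the lower fence is below zero, which would explain why every flagged county sits in the upper tail.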

Schema

3 columns
Per-column summary.

column        type     nulls  unique  alerts
fips          numeric  0.0%   3,222
county_name   text     0.0%   3,222   near_unique
poverty_rate  numeric  0.0%   1,719   high_skew

fips

numeric identifier
This is the U.S. county FIPS code, a 5-digit geographic identifier whose first two digits encode the state and whose last three encode the county. Stored as a number, the code loses its leading zero, so the state prefix appears as one or two digits (Alabama's 01001 shows up as 1001). All 3,222 values are unique with zero nulls, and the range 1001 to 72153 spans Alabama through Puerto Rico, consistent with a complete county roster. Treating it as numeric is misleading despite the clean distribution (skew 0.16, no outliers), since the values are categorical codes, not measurements. Treatment: cast to a zero-padded 5-character string and use as a join key to geographic reference tables. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
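The recommended cast can be sketched as follows. The helper names `fips_to_str` and `split_fips` are hypothetical; the two sample values are the reported minimum and maximum of the column.

```python
def fips_to_str(fips: int) -> str:
    """Render a numeric county FIPS code as a 5-character zero-padded string."""
    return f"{int(fips):05d}"


def split_fips(code: str) -> tuple[str, str]:
    """Split a 5-character FIPS string into (state, county) parts."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS string, got {code!r}")
    return code[:2], code[2:]


# The reported min and max of the fips column.
for raw in (1001, 72153):
    code = fips_to_str(raw)
    state, county = split_fips(code)
    print(f"{raw} -> {code} (state {state}, county {county})")
```

Restoring the leading zero before any join matters because reference tables usually store the code as text, and a numeric 1001 will not match the string "01001".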

county_name

text identifier near_unique
This column holds fully qualified US county names in the form 'X County, State', with every one of the 3,222 rows unique and zero nulls. The token 'county,' appears 2,999 times, suggesting ~223 entries don't follow that pattern (likely parishes in Louisiana, boroughs in Alaska, independent cities, or Puerto Rico municipios). The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159), roughly in line with actual county counts; these look like token counts rather than per-state tallies, so they can overcount (for example, 'Virginia' also matches West Virginia entries). Treatment: Use as a join key against county-level reference data; split into county and state fields before modelling. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
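Splitting the name into county and state fields might look like the sketch below. It assumes every value ends in ', State', which the report implies but does not guarantee. The function names and the two sample names are illustrative and were not taken from the dataset rows.

```python
def split_county_name(full_name: str) -> tuple[str, str]:
    """Split 'X County, State' into ('X County', 'State') on the last comma."""
    county, sep, state = full_name.rpartition(",")
    if not sep:
        raise ValueError(f"no ', State' suffix in {full_name!r}")
    return county.strip(), state.strip()


def follows_county_pattern(full_name: str) -> bool:
    """True when the county part ends in the word 'County'."""
    county, _ = split_county_name(full_name)
    return county.lower().endswith(" county")


# Illustrative names only (ASSUMPTION: the dataset uses this exact format).
for name in ("Autauga County, Alabama", "Orleans Parish, Louisiana"):
    county, state = split_county_name(name)
    print(f"{county!r} | {state!r} | county pattern: {follows_county_pattern(name)}")
```

Splitting on the last comma rather than the first keeps any comma inside the county part intact, and the pattern check gives a quick way to list the entries that are parishes, boroughs, or other county equivalents.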

poverty_rate

numeric feature high_skew
Numeric poverty rate (likely percent of population below the poverty line) across 3,222 rows with no nulls and 1,719 unique values. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with median 13.55 and mean 15.10, ranging from 1.6 to 66.32, and 137 outliers (4.25%) sit above the upper whisker. The long upper tail reflects a small set of high-poverty counties pulling the mean above the median. Treatment: Apply a log or Box-Cox transform before linear modelling to tame the right skew; both are applicable because every value is strictly positive (min 1.6). high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,719
min: 1.6
max: 66.32
mean: 15.1
median: 13.55
std: 7.706
q1: 10.16
q3: 17.91
iqr: 7.75
skew: 2.096
kurtosis: 6.891
n_outliers: 137
outlier_rate: 0.04252
zero_rate: 0
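The effect of the suggested log transform can be illustrated on synthetic data. The sketch below does not use the real poverty rates: it draws a right-skewed, strictly positive sample from a lognormal distribution with arbitrary parameters. Its skewness formula (population moments) is an assumption and may differ from the profiler's definition.

```python
import math
import random


def sample_skewness(values: list[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic stand-in for poverty_rate (NOT the real data; parameters are arbitrary).
rng = random.Random(0)
raw = [rng.lognormvariate(2.6, 0.5) for _ in range(3222)]

# The natural log is defined here because every value is strictly positive.
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_logged = sample_skewness(logged)

print(f"skew before log: {skew_raw:.2f}")
print(f"skew after log:  {skew_logged:.2f}")
```

On the real column the log-transformed skew would not necessarily fall this close to zero, so it is worth recomputing skew after the transform and comparing against a Box-Cox fit before settling on either.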