saturn·

economic poverty depth by county

source /home/coolhand/datasets/us-inequality-atlas/economic/poverty_depth_by_county.csv · 3,222 rows · 7 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset contains 3,222 rows of US county-level poverty statistics. Each row is identified by a FIPS code, county name, and state abbreviation, and carries three poverty rate measures and a population total. The poverty measures are all right-skewed: pct_poverty ranges from 1.6% to 66.32% with a median of 13.55%, while pct_deep_poverty has a median of 5.82% but reaches 34.7%. The total column is extremely skewed (skew 13.36, kurtosis 297.6), with a median of 25,174 and a maximum near 9.8 million. An unweighted average across counties therefore gives a county of 47 people the same weight as one of 9.8 million, so weight by total whenever a population-level figure is needed. Texas (254 counties), Georgia (159), and Virginia (133) contribute the most rows, which matters for any state-level rollups.

citing: row_count · column_count · columns · kinds
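
The caution about the skewed total column can be made concrete with a population-weighted mean. The sketch below is illustrative only: the two rows are made-up numbers, not values from the file, and it assumes total is the right denominator for the percentage columns, which the report does not confirm.

```python
def weighted_mean(rates, weights):
    """Mean of `rates` where each rate counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(r * w for r, w in zip(rates, weights)) / total_weight


# Hypothetical (pct_poverty, total) pairs, NOT taken from the dataset.
rates = [30.0, 10.0]
populations = [1_000, 9_000]

unweighted = sum(rates) / len(rates)          # every county counts equally
weighted = weighted_mean(rates, populations)  # every person counts equally

print(unweighted, weighted)  # 20.0 12.0
```

The gap between the two figures grows with the spread in county sizes, which is why the distinction matters for a column this skewed.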

Schema

7 columns
Per-column summary.

column            kind         nulls  unique  alerts
fips              numeric      0.0%   3,222
county_name       text         0.0%   1,960   short_text, duplicates
state             categorical  0.0%   52
total             numeric      0.0%   3,173   high_skew, outliers
pct_deep_poverty  numeric      0.0%   1,131   high_skew, outliers
pct_poverty       numeric      0.0%   1,719   high_skew
pct_near_poverty  numeric      0.0%   1,237

fips

numeric identifier
This column is the US county FIPS code. All 3,222 values are unique and non-null, and the range (1001 to 72153) matches the standard 5-digit state+county encoding once leading zeros are restored (1001 stands for 01001). Although the distribution looks clean (skew 0.16, no outliers), treating the column as numeric is misleading: the digits are categorical identifiers, not measurements, so the mean, standard deviation, and quantiles listed here carry no meaning. Treatment: cast to a zero-padded 5-character string and use as a join key to other county-level data.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
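
The treatment for fips, casting to a zero-padded string, can be sketched as follows. The helper name fips_to_key is hypothetical, and the split into a 2-digit state prefix and 3-digit county suffix follows the standard FIPS layout rather than anything stated in this report.

```python
def fips_to_key(code: int) -> str:
    """Render a numeric county FIPS code as a 5-character, zero-padded string."""
    key = f"{int(code):05d}"
    if len(key) != 5:
        raise ValueError(f"not a 5-digit county FIPS code: {code!r}")
    return key


# The reported min and max of the column.
low = fips_to_key(1001)    # "01001"
high = fips_to_key(72153)  # "72153"

# Standard layout: first two digits are the state, last three the county.
state_part, county_part = low[:2], low[2:]
print(low, high, state_part, county_part)
```

Padding before any join avoids silent mismatches against reference tables that store the code as text.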

county_name

text feature short_text duplicates
This column holds US county-level place names. Most values end in 'County' (2,999 occurrences, about 93% of rows), with smaller groups of Louisiana 'parish' (64) and Puerto Rican 'municipio' (78) entries. Only 1,960 of the 3,222 values are distinct (1,262 duplicates, a rate of 39.2%), because common names such as Washington County (30), Jefferson County (25), and Franklin County (24) recur across states. The name alone therefore does not uniquely identify a county. Values are short and uniform (mean 14.2 characters, about 2 words). Treatment: use fips as the key where possible; otherwise pair the name with the state column before joining or grouping.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,960
len_min: 10
len_max: 46
len_mean: 14.17
len_median: 14
len_p95: 18
word_mean: 2.083
word_median: 2
n_empty: 0
n_duplicates: 1,262
duplicate_rate: 0.3917
vocab_size: 1,963
readability_flesch_mean: 33.36
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
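
Pairing county_name with state, as the treatment suggests, is worth verifying rather than assuming. A minimal sketch of that check is below; the three sample rows are illustrative, and whether the pair is unique across the full file is not established by this report.

```python
from collections import Counter


def repeated_keys(keys):
    """Return the keys that occur more than once, in first-seen order."""
    counts = Counter(keys)
    return [key for key, count in counts.items() if count > 1]


# Illustrative (county_name, state) rows, not read from the file.
rows = [
    ("Washington County", "AL"),
    ("Washington County", "GA"),
    ("Jefferson County", "AL"),
]

by_name = repeated_keys(name for name, _ in rows)  # name alone collides
by_pair = repeated_keys(rows)                      # name + state does not

print(by_name, by_pair)
```

If the second list comes back non-empty on the real data, fall back to fips as the key.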

state

categorical feature
This is a US state code field with 52 distinct values, consistent with the 50 states plus DC and, most likely, Puerto Rico: county_name includes 78 'municipio' entries and fips reaches 72153, which falls under Puerto Rico's 72 prefix. The distribution is broad and close to uniform (entropy ratio 0.93). TX leads with just 7.88% (254 of 3,222 rows), followed by GA, VA, KY, and MO. There are no nulls, and each state spans many rows because every row is a county. Treatment: one-hot or target-encode for modelling; usable as a join key to state-level reference data.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 52
top_value: TX
top_rate: 0.07883
cardinality: 52
entropy: 5.314
entropy_ratio: 0.9322
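
The entropy figures for state are consistent with Shannon entropy in bits divided by its maximum, log2 of the cardinality: 5.314 / log2(52) is about 0.932. The report does not state the profiler's formula, so the sketch below is an assumption that happens to reproduce the reported ratio.

```python
import math
from collections import Counter


def entropy_ratio(labels):
    """Shannon entropy (bits) of the label distribution over log2(#distinct labels)."""
    labels = list(labels)
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    if k <= 1:
        return 0.0  # a constant column carries no information
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(k)


# Check against the reported numbers: entropy 5.314 with 52 categories.
reported_ratio = 5.314 / math.log2(52)
print(round(reported_ratio, 4))
```

A ratio of 1.0 means every category is equally common; values near 0 mean one category dominates.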

total

numeric feature high_skew outliers
This is a heavily right-skewed numeric measure (skew 13.36, kurtosis 297.59) that ranges from 47 to 9,782,602, with a median of 25,174 but a mean of 101,340: the upper tail dwarfs the center. Roughly 13.9% of rows (449) flag as outliers, and the standard deviation (324,628) is more than three times the mean, which signals that a few very large values dominate. With 3,173 unique values across 3,222 rows and no nulls or zeros, the column behaves as a per-county count, which matches the population total described in the dataset summary. Treatment: log-transform before modelling and consider winsorising the extreme tail.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,173
min: 47
max: 9.783e+06
mean: 1.013e+05
median: 25,174
std: 3.246e+05
q1: 1.059e+04
q3: 6.501e+04
iqr: 5.442e+04
skew: 13.36
kurtosis: 297.6
n_outliers: 449
outlier_rate: 0.1394
zero_rate: 0
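
Two quick calculations help read the total statistics. The first shows how a log10 transform compresses the range; the second computes Tukey fences (1.5 × IQR beyond the quartiles) from the reported q1 and q3. Whether the profiler flags outliers with Tukey fences is an assumption; the report does not say which rule produced the 449 count.

```python
import math


def log10_total(value):
    """Base-10 log of a strictly positive count."""
    if value <= 0:
        raise ValueError("log10 is undefined for non-positive values")
    return math.log10(value)


def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported min, median, max: the span shrinks from ~5 orders of magnitude
# to a range of roughly 1.7 to 7.0 on the log10 scale.
for value in (47, 25_174, 9_782_602):
    print(value, round(log10_total(value), 2))

# Reported quartiles (1.059e+04 and 6.501e+04).
lower, upper = tukey_fences(10_590.0, 65_010.0)
print(lower, upper)  # -71040.0 146640.0
```

The negative lower fence means that, under this rule, only large counties can be flagged.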

pct_deep_poverty

numeric feature high_skew outliers
This is a numeric feature giving the percent of each county's population in deep poverty (n=3,222 with 1,131 unique values). The distribution is right-skewed (skew 2.67, kurtosis 10.4), with a median of 5.82 but a maximum of 34.7, and 176 outliers (5.46%) sit in the upper tail. The minimum is 0.0, but the zero rate is just 0.09% (3 rows), so the floor is rarely hit. Treatment: apply log1p or a similar transform that tolerates zeros before regression to tame the right skew.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,131
min: 0
max: 34.7
mean: 6.743
median: 5.82
std: 4.154
q1: 4.27
q3: 7.918
iqr: 3.648
skew: 2.665
kurtosis: 10.4
n_outliers: 176
outlier_rate: 0.05462
zero_rate: 0.0009311
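
Because pct_deep_poverty contains a few zeros, a plain log would fail on those rows, which is why log1p is the suggested transform. A minimal sketch, with a hypothetical helper name and the reported min, median, and max as inputs:

```python
import math


def log1p_pct(value):
    """log(1 + value) for a non-negative percentage; defined at zero."""
    if value < 0:
        raise ValueError("percentages should not be negative")
    return math.log1p(value)


# Reported min, median, and max of the column.
for value in (0.0, 5.82, 34.7):
    print(value, round(log1p_pct(value), 3))

# The transform is invertible with expm1, so predictions can be mapped back.
restored = math.expm1(log1p_pct(5.82))
```

Model outputs on the transformed scale need the expm1 step before they are reported as percentages.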

pct_poverty

numeric feature high_skew
This column is the county-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55. The distribution is clearly right-skewed (skew 2.10, kurtosis 6.89), with 137 high-end outliers (about 4.3%) pulling the mean (15.10) above the median. There are no nulls or zeros across the 3,222 rows. Treatment: consider a log or Yeo-Johnson transform before linear modelling to tame the right tail.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,719
min: 1.6
max: 66.32
mean: 15.1
median: 13.55
std: 7.706
q1: 10.16
q3: 17.91
iqr: 7.75
skew: 2.096
kurtosis: 6.891
n_outliers: 137
outlier_rate: 0.04252
zero_rate: 0
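
For reference, the Yeo-Johnson transform mentioned in the treatment can be written out directly. This is a sketch of the standard formula with a fixed, hand-picked lambda; in practice lambda is usually estimated from the data (for example by scikit-learn's PowerTransformer with method="yeo-johnson"), and the value 0.5 used here is purely illustrative.

```python
import math


def yeo_johnson(x, lmbda):
    """Yeo-Johnson power transform of a single value for a given lambda."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1) ** lmbda - 1) / lmbda
    # Negative branch; never reached for a percentage but kept for completeness.
    if lmbda == 2:
        return -math.log1p(-x)
    return -(((-x + 1) ** (2 - lmbda)) - 1) / (2 - lmbda)


# Reported min, median, and max, with an illustrative lambda of 0.5.
for value in (1.6, 13.55, 66.32):
    print(value, round(yeo_johnson(value, 0.5), 3))
```

With lambda = 1 the transform leaves non-negative values unchanged, and with lambda = 0 it reduces to log1p, so a plain log-style transform is one special case of it.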

pct_near_poverty

numeric feature
This column reports the percentage of the population near the poverty line, ranging from 0.58 to 49.14 with a mean of 9.81 and a median of 9.38. The distribution is moderately right-skewed (skew 1.19, kurtosis 5.73), with 82 outliers (2.55%) in the high tail and no nulls or zeros. The IQR is tight at 4.43: the middle half of observations lies between 7.33 and 11.76, followed by a long upper tail. Treatment: consider a log transform or winsorization before regression to dampen the right tail.
high confidence · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,237
min: 0.58
max: 49.14
mean: 9.813
median: 9.38
std: 3.644
q1: 7.33
q3: 11.76
iqr: 4.43
skew: 1.19
kurtosis: 5.729
n_outliers: 82
outlier_rate: 0.02545
zero_rate: 0
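
Winsorization, the alternative suggested for pct_near_poverty, caps extreme values at chosen quantiles instead of dropping them. The sketch below uses a simple floor-index quantile and made-up sample values; the 0.9 upper quantile is an arbitrary choice for the small example, not a recommendation from the report.

```python
def winsorize(values, lower_q=0.0, upper_q=0.99):
    """Clip values to the [lower_q, upper_q] quantile range (floor-index quantiles)."""
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(lower_q * last)]
    high = ordered[int(upper_q * last)]
    return [min(max(v, low), high) for v in values]


# Made-up percentages with one extreme value, NOT taken from the dataset.
sample = [7.3, 9.4, 8.1, 11.8, 49.1, 10.2, 6.9, 9.9, 12.5, 8.8]
capped = winsorize(sample, upper_q=0.9)

print(max(sample), max(capped))  # 49.1 12.5
```

Unlike a log transform, this keeps the column on its original percentage scale, at the cost of flattening genuine differences among the highest counties.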