economic gini by county

source: /home/coolhand/datasets/us-inequality-atlas/economic/gini_by_county.csv · 3,222 rows · 4 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 US county-level records (including Puerto Rico's municipios) with four fields: county name, FIPS code, Gini index, and state. The Gini index is the most analytically interesting column, with a mean of 0.448, a max of 0.721, and 56 flagged outliers, most of them in the upper tail, that are worth investigating for unusually high local inequality. The state distribution is broad (52 unique values) but uneven, led by Texas (254 counties) and Georgia (159), so any state-level comparison should account for that imbalance. County names have a 39% duplicate rate because common names such as Washington, Jefferson, and Franklin County recur across states.

citing: row_count · column_count · columns.gini_index.stats · columns.state.top_values · columns.county_name.stats · columns.county_name.top_values

Charts the summary said to look at first

gini_index · Look at the right tail and the 56 flagged outliers to spot counties with unusually high inequality.

Histogram bins for gini_index (median: 0.4457).
bin	count
0.2744 – 0.2856	1
0.2856 – 0.2967	1
0.2967 – 0.3079	1
0.3079 – 0.3191	0
0.3191 – 0.3302	1
0.3302 – 0.3414	3
0.3414 – 0.3526	6
0.3526 – 0.3637	10
0.3637 – 0.3749	29
0.3749 – 0.3861	66
0.3861 – 0.3972	123
0.3972 – 0.4084	202
0.4084 – 0.4195	277
0.4195 – 0.4307	365
0.4307 – 0.4419	375
0.4419 – 0.453	402
0.453 – 0.4642	370
0.4642 – 0.4754	299
0.4754 – 0.4865	227
0.4865 – 0.4977	162
0.4977 – 0.5089	104
0.5089 – 0.52	80
0.52 – 0.5312	37
0.5312 – 0.5424	26
0.5424 – 0.5535	22
0.5535 – 0.5647	14
0.5647 – 0.5759	6
0.5759 – 0.587	5
0.587 – 0.5982	2
0.5982 – 0.6093	3
0.6093 – 0.6205	2
0.6205 – 0.6317	0
0.6317 – 0.6428	0
0.6428 – 0.654	0
0.654 – 0.6652	0
0.6652 – 0.6763	0
0.6763 – 0.6875	0
0.6875 – 0.6987	0
0.6987 – 0.7098	0
0.7098 – 0.721	1
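The 40 bins in the table above are consistent with equal-width binning between the observed minimum (0.2744) and maximum (0.721): (0.721 - 0.2744) / 40 ≈ 0.0112 per bin, which reproduces edges such as 0.2856 and 0.2967. A minimal standard-library sketch of that scheme, assuming half-open bins with the maximum placed in the last bin; `equal_width_histogram` is a hypothetical helper, not the profiler's own code:

```python
from typing import Sequence


# Hypothetical helper: a sketch of equal-width binning, not the profiler's own code.
def equal_width_histogram(values: Sequence[float], n_bins: int = 40) -> list[tuple[float, float, int]]:
    """Count values in n_bins equal-width bins spanning min(values)..max(values).

    Bins are half-open [low, high), except that the maximum is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Every value is identical: a single bin holds them all.
        return [(lo, hi, len(values))]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == hi lands in the final bin rather than at index n_bins.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(n_bins)]


# Toy data: five integers in four bins; the maximum (4) joins the last bin.
for low, high, count in equal_width_histogram([0, 1, 2, 3, 4], n_bins=4):
    print(f"{low:g} - {high:g}\t{count}")
```

Applied to the full gini_index column, the same function should give edges matching the table; counts could differ slightly for values that sit exactly on a bin edge if the profiler assigns those differently.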

state · Texas and Georgia have the most counties (254 and 159); weight any state comparisons accordingly.

Top 20 values for state (of 52 unique).
value	count	share
TX	254	7.9%
GA	159	4.9%
VA	133	4.1%
KY	120	3.7%
MO	115	3.6%
KS	105	3.3%
IL	102	3.2%
NC	100	3.1%
IA	99	3.1%
TN	95	2.9%
NE	93	2.9%
IN	92	2.9%
OH	88	2.7%
MN	87	2.7%
MI	83	2.6%
MS	82	2.5%
PR	78	2.4%
OK	77	2.4%
AR	75	2.3%
WI	72	2.2%
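Because states contribute very different numbers of counties, a state-level comparison is easier to read when each state's mean Gini is reported next to its county count. A minimal standard-library sketch, assuming rows are dicts keyed by the column names in the schema (`state`, `gini_index`), as `csv.DictReader` would produce; `load_rows` and `state_summary` are hypothetical helpers:

```python
import csv
from collections import defaultdict
from statistics import mean


# Hypothetical helpers: a sketch, not part of the profiler.
def load_rows(path: str) -> list[dict[str, str]]:
    """Read the profiled CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def state_summary(rows: list[dict[str, str]]) -> dict[str, tuple[int, float]]:
    """Map each state code to (county count, unweighted mean of its county Gini values)."""
    by_state: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(float(row["gini_index"]))
    return {state: (len(vals), mean(vals)) for state, vals in by_state.items()}


# Toy rows in the same shape csv.DictReader would produce (values are strings).
rows = [
    {"state": "TX", "gini_index": "0.40"},
    {"state": "TX", "gini_index": "0.50"},
    {"state": "GA", "gini_index": "0.45"},
]
# Largest states first, so the county count is read alongside each mean.
for state, (n, avg) in sorted(state_summary(rows).items(), key=lambda kv: -kv[1][0]):
    print(f"{state}\tn={n}\tmean={avg:.3f}")
```

On the real file this would be `state_summary(load_rows("/home/coolhand/datasets/us-inequality-atlas/economic/gini_by_county.csv"))`. Note that the mean of county Gini values is not the Gini of a state's pooled income distribution; it only summarises how unequal that state's counties are on average.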

county_name · Top recurring county names like Washington, Jefferson, and Franklin drive the 39% duplicate rate.

Character-length distribution for county_name, one row per exact length (mean: 14.17).
chars	count
10	29
11	255
12	465
13	682
14	588
15	493
16	291
17	219
18	67
19	49
20	23
21	16
22	14
23	8
24	4
25	5
26	2
27	1
28	1
29	0
30	0
31	2
32	1
33	1
34	1
35	1
36	0
37	0
38	0
39	0
40	2
41	1
42	0
43	0
44	0
45	0
46	1

fips · FIPS codes span 1,001 to 72,153 and act as a unique row identifier across states and territories.

Histogram bins for fips (median: 30022).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 11674	4
11674 – 13453	226
13453 – 15231	5
15231 – 17010	49
17010 – 18789	189
18789 – 20568	204
20568 – 22347	184
22347 – 24125	39
24125 – 25904	15
25904 – 27683	170
27683 – 29462	196
29462 – 31241	150
31241 – 33019	27
33019 – 34798	21
34798 – 36577	95
36577 – 38356	153
38356 – 40135	155
40135 – 41913	46
41913 – 43692	67
43692 – 45471	51
45471 – 47250	161
47250 – 49029	268
49029 – 50807	29
50807 – 52586	133
52586 – 54365	94
54365 – 56144	95
56144 – 57923	0
57923 – 59701	0
59701 – 61480	0
61480 – 63259	0
63259 – 65038	0
65038 – 66817	0
66817 – 68595	0
68595 – 70374	0
70374 – 72153	78

Schema

4 columns

Per-column summary.
column	type	nulls	unique	alerts
fips	numeric	0.0%	3,222
county_name	text	0.0%	1,960	short_text duplicates
state	categorical	0.0%	52
gini_index	numeric	0.0%	1,317

fips

numeric identifier

This is the U.S. county FIPS code: a 5-digit identifier in which the first two digits encode the state and the last three encode the county. Because the column is stored as a number, codes with state prefixes 01–09 have lost their leading zero (1001 is 01001). With 3,222 unique values across 3,222 rows, no nulls, and a range from 1001 to 72153 spanning the standard FIPS state prefixes (72 is Puerto Rico), every row corresponds to a distinct county or county-equivalent. Distribution stats (mean 31,377, std 16,299, near-zero skew) are artifacts of the prefix encoding and not meaningful as a numeric feature. Treatment: use it as a categorical key, zero-padded to 5 characters, and left-join on it to bring in county-level attributes rather than using it as a numeric feature. high · anthropic:claude-opus-4-7
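Because the codes are stored as integers, restoring the leading zero is the first step before splitting or joining on them. A minimal sketch; `normalise_fips` and `split_fips` are hypothetical helper names, and inputs are assumed to be the integer values in this column or their string forms:

```python
from __future__ import annotations


# Hypothetical helpers: a sketch, not part of the profiler.
def normalise_fips(code: int | str) -> str:
    """Return the 5-character county FIPS code with its leading zero restored (1001 -> '01001')."""
    value = int(code)  # accepts 1001, "1001" or "01001"
    if not 1000 <= value <= 99999:
        # Values outside this range cannot be a 2-digit state + 3-digit county code.
        raise ValueError(f"not a plausible county FIPS code: {code!r}")
    return f"{value:05d}"


def split_fips(code: int | str) -> tuple[str, str]:
    """Split a county FIPS code into its (state, county) parts."""
    padded = normalise_fips(code)
    return padded[:2], padded[2:]


print(split_fips(1001))   # ('01', '001')
print(split_fips(72153))  # ('72', '153')
```

Normalising the key the same way on both sides of a join keeps `1001` and `"01001"` from silently failing to match.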

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0

county_name

text metadata short_text duplicates

This column holds US county-level place names: most values end in 'County' (2,999 of 3,222 rows), with smaller groups of 'Parish' (64, Louisiana), 'Municipio' (78, Puerto Rico), and 'City' (47). Heavy duplication is expected and present, with a 39.2% duplicate rate (1,262 repeated rows), because common names such as Washington, Jefferson, and Franklin County recur across states. Lengths cluster tightly (mean 14.2, median 14, 95th percentile 18 characters, about 2 words) with a thin tail out to 46 characters, and there are no nulls or empty strings. Treatment: pair with the state column to form a unique geographic key before joining or grouping, or use fips, which is already unique. high · anthropic:claude-opus-4-7
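The duplicate figures follow from simple counting: 3,222 rows minus 1,960 distinct names gives 1,262 repeats, and 1,262 / 3,222 ≈ 0.3917. A minimal sketch of that calculation and of a check that (county_name, state) really is a unique key; `duplicate_stats` and `colliding_keys` are hypothetical helpers, and rows are assumed to be dicts keyed by the schema's column names:

```python
from collections import Counter


# Hypothetical helpers: a sketch, not the profiler's own code.
def duplicate_stats(values: list[str]) -> tuple[int, float]:
    """Return (repeated rows, duplicate rate), counting every row beyond a value's first occurrence."""
    n = len(values)
    n_duplicates = n - len(set(values))
    return n_duplicates, (n_duplicates / n if n else 0.0)


def colliding_keys(rows: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (county_name, state) pairs that appear more than once; [] means the pair is a unique key."""
    counts = Counter((row["county_name"], row["state"]) for row in rows)
    return [key for key, c in counts.items() if c > 1]


rows = [
    {"county_name": "Washington County", "state": "OH"},
    {"county_name": "Washington County", "state": "PA"},
    {"county_name": "Jefferson County", "state": "OH"},
]
print(duplicate_stats([r["county_name"] for r in rows]))  # (1, 0.3333333333333333)
print(colliding_keys(rows))                              # []
```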

n: 3,222
nulls: 0 (0.0%)
unique: 1,960
len_min: 10
len_max: 46
len_mean: 14.17
len_median: 14
len_p95: 18
word_mean: 2.083
word_median: 2
n_empty: 0
n_duplicates: 1,262
duplicate_rate: 0.3917
vocab_size: 1,963
readability_flesch_mean: 33.36
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
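The length figures in the list above (len_mean 14.17, len_median 14, len_p95 18, word_mean 2.083) can be recomputed from the raw names. A minimal standard-library sketch; `length_stats` is a hypothetical helper, words are assumed to be whitespace-separated tokens, and linear interpolation (`method="inclusive"`) is an assumption about how the profiler computes percentiles:

```python
from statistics import mean, median, quantiles


# Hypothetical helper: a sketch, not the profiler's own code.
def length_stats(names: list[str]) -> dict[str, float]:
    """Character-length and word-count summaries in the style of len_mean, len_median, len_p95, word_mean."""
    lengths = [len(name) for name in names]
    words = [len(name.split()) for name in names]  # whitespace-separated tokens
    return {
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        "len_p95": quantiles(lengths, n=20, method="inclusive")[18],
        "word_mean": mean(words),
    }


print(length_stats(["Ada County", "Cook County", "Orleans Parish", "Adjuntas Municipio"]))
```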

state

categorical feature

This is a US state code field with 52 distinct values across 3,222 rows and no nulls, consistent with the 50 states plus DC and Puerto Rico (PR appears among the top values with 78 rows). Because each row is one county, the value counts are counts of counties per state: TX leads at 254 (7.88%), followed by GA (159) and VA (133). Entropy is high at 5.31 bits (ratio 0.93 of the maximum for 52 values), indicating a broad spread rather than concentration. The 52-value cardinality is the only mild surprise: PR accounts for one of the two extra codes, and the other is most likely DC, but it is worth confirming that it is not a stray code. Treatment: one-hot or target-encode for modelling, after verifying the two codes beyond the 50 states. high · anthropic:claude-opus-4-7
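The entropy ratio reported for state (0.9322) matches Shannon entropy in bits divided by its maximum for 52 categories: log2(52) ≈ 5.700, and 5.314 / 5.700 ≈ 0.932. A minimal sketch of that calculation from value counts; the profiler's own code is not shown, so the formula is inferred from these numbers, and `entropy_bits` and `entropy_ratio` are hypothetical helpers:

```python
import math


# Hypothetical helpers: the formula is inferred from the reported numbers, not taken from the profiler.
def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy H = -sum(p * log2(p)) of a categorical distribution, from raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum log2(k) for k observed categories; 1.0 means perfectly even."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Maximum possible entropy for 52 categories, and the ratio implied by the reported 5.314 bits.
print(round(math.log2(52), 3))          # 5.7
print(round(5.314 / math.log2(52), 4))  # 0.9322
```

Passing the per-state counts, e.g. `entropy_ratio(list(collections.Counter(states).values()))`, should land near the reported 0.9322 if this is the profiler's formula.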

n: 3,222
nulls: 0 (0.0%)
unique: 52
top_value: TX
top_rate: 0.07883
cardinality: 52
entropy: 5.314
entropy_ratio: 0.9322

gini_index

numeric feature

Numeric column holding Gini index values, all within the valid 0–1 range (observed 0.2744–0.721), with no nulls or zeros across 3,222 rows. The distribution is tight (IQR 0.049, std 0.038) and centred near 0.448, with a mild right skew (0.50) and heavier-than-normal tails (excess kurtosis 1.63). The 56 flagged outliers (1.7%) sit mostly in the upper tail and point to a handful of unusually unequal counties, but the histogram also shows a few unusually low values (7 below 0.3414, down to 0.2744). Treatment: use as-is as a numeric feature; consider winsorising both tails if downstream models are sensitive to outliers. high · anthropic:claude-opus-4-7
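The report does not say which rule flagged the 56 outliers. One common choice is Tukey's 1.5 × IQR rule; applied to the reported quartiles (q1 0.422, q3 0.4714, IQR 0.04938) it puts the fences near 0.348 and 0.545, which would flag values in both tails of the histogram. A minimal sketch of those fences and of winsorising to them; `tukey_fences` and `winsorise` are hypothetical helpers, and `statistics.quantiles(..., method="inclusive")` (linear interpolation) is an assumed stand-in for the profiler's quartile method:

```python
from statistics import quantiles


# Hypothetical helpers: Tukey's rule is an assumption; the profiler's outlier rule is not stated.
def tukey_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorise(values: list[float], lower: float, upper: float) -> list[float]:
    """Clip every value into [lower, upper] so extreme values are capped, not dropped."""
    return [min(max(v, lower), upper) for v in values]


# Fences implied by the reported quartiles and IQR for gini_index.
print(round(0.422 - 1.5 * 0.04938, 3), round(0.4714 + 1.5 * 0.04938, 3))  # 0.348 0.545

toy = [1.0, 2.0, 3.0, 4.0, 100.0]
low, high = tukey_fences(toy)
print(low, high, winsorise(toy, low, high))  # -1.0 7.0 [1.0, 2.0, 3.0, 4.0, 7.0]
```

Winsorising keeps every county in the data while capping its influence; dropping flagged rows instead would discard counties that may be genuinely unusual rather than erroneous. The count flagged by this rule on the real column may differ from 56 if the profiler uses another rule or quartile method.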

n: 3,222
nulls: 0 (0.0%)
unique: 1,317
min: 0.2744
max: 0.721
mean: 0.4481
median: 0.4457
std: 0.03841
q1: 0.422
q3: 0.4714
iqr: 0.04938
skew: 0.4999
kurtosis: 1.634
n_outliers: 56
outlier_rate: 0.01738
zero_rate: 0
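Skew and kurtosis are standardised moments. The profiler's estimator is not stated, but fips reports a kurtosis of -0.6314, and plain (Pearson) kurtosis can never fall below 1, so these figures are evidently excess kurtosis, for which a normal distribution scores 0; the 1.634 for gini_index therefore indicates heavier tails than a normal curve. A minimal population-moment sketch; `moment_shape` is a hypothetical helper, and pandas' `Series.skew` and `Series.kurt` apply small-sample bias corrections, so their output can differ slightly:

```python
# Hypothetical helper: uncorrected population moments; the profiler may use a bias-corrected estimator.
def moment_shape(values: list[float]) -> tuple[float, float]:
    """Return (skewness g1, excess kurtosis g2) from population moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("shape statistics are undefined for constant data")
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # a normal distribution scores 0
    return skew, excess_kurtosis


# Symmetric toy data: skew is 0 and excess kurtosis is about -1.5 (lighter tails than normal).
print(moment_shape([1.0, 2.0, 3.0]))
```

With 3,222 rows, the gap between corrected and uncorrected estimates is tiny.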