saturn·

economic gini by county

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/economic/gini_by_county.csv

Saturn profiled 3,222 rows across 4 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
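# Install the profiler (IPython shell escape; run once per environment).
!pip install saturn-dissect

import subprocess

# Profile the CSV, write the findings to JSON, and request the LLM commentary.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/us-inequality-atlas/economic/gini_by_county.csv",
    "--findings", "economic-gini_by_county.json",
    "--llm", "anthropic:claude-opus-4-7",
])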

Summary confidence: high

This dataset contains 3,222 US county-level records with four fields: county name, FIPS code, Gini index, and state. The Gini index is the most analytically interesting column, with a mean of 0.448 and a maximum of 0.721, plus 56 flagged outliers (1.7% of rows) worth investigating; the histogram shows sparse bins in both tails, so the flagged values are not necessarily all high-inequality counties. The state field has 52 unique values, led by Texas (254 counties) and Georgia (159), so any state-level comparison should account for that imbalance. County names show a 39% duplicate rate, reflecting common names like Washington, Jefferson, and Franklin County that recur across states.

citing: row_count · column_count · columns.gini_index.stats · columns.state.top_values · columns.county_name.stats · columns.county_name.top_values
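
The `citing:` line names the statistics behind the summary as dotted paths. A minimal sketch of resolving such paths, assuming the JSON written by `--findings` is a nested object whose keys mirror those paths (this layout is an assumption, not confirmed by the report; `resolve` is a hypothetical helper, and the inline `findings` dict stands in for the real file):

```python
import json
from functools import reduce

def resolve(findings: dict, dotted_path: str):
    """Walk a nested dict along a dotted path such as 'columns.gini_index.stats'."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), findings)

# Stand-in for the parsed findings file; the nesting shown here is assumed.
findings = {
    "row_count": 3222,
    "column_count": 4,
    "columns": {"gini_index": {"stats": {"mean": 0.4481, "max": 0.721, "n_outliers": 56}}},
}

for path in ("row_count", "column_count", "columns.gini_index.stats"):
    print(path, "->", json.dumps(resolve(findings, path)))
```

A missing key raises `KeyError`, which is a useful signal that the file's actual layout differs from the one assumed here.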

Out[4]:

saturn.schema() · 4 columns

column       kind         n      null%  unique  alerts
fips         numeric      3,222  0.0%   3,222
county_name  text         3,222  0.0%   1,960   short_text, duplicates
state        categorical  3,222  0.0%   52
gini_index   numeric      3,222  0.0%   1,317
Fig 1.
gini_index · Look at the right tail and the 56 flagged outliers to spot counties with unusually high inequality.
Histogram bins for gini_index (median: 0.4457).
bin              count
0.2744 – 0.2856  1
0.2856 – 0.2967  1
0.2967 – 0.3079  1
0.3079 – 0.3191  0
0.3191 – 0.3302  1
0.3302 – 0.3414  3
0.3414 – 0.3526  6
0.3526 – 0.3637  10
0.3637 – 0.3749  29
0.3749 – 0.3861  66
0.3861 – 0.3972  123
0.3972 – 0.4084  202
0.4084 – 0.4195  277
0.4195 – 0.4307  365
0.4307 – 0.4419  375
0.4419 – 0.453   402
0.453 – 0.4642   370
0.4642 – 0.4754  299
0.4754 – 0.4865  227
0.4865 – 0.4977  162
0.4977 – 0.5089  104
0.5089 – 0.52    80
0.52 – 0.5312    37
0.5312 – 0.5424  26
0.5424 – 0.5535  22
0.5535 – 0.5647  14
0.5647 – 0.5759  6
0.5759 – 0.587   5
0.587 – 0.5982   2
0.5982 – 0.6093  3
0.6093 – 0.6205  2
0.6205 – 0.6317  0
0.6317 – 0.6428  0
0.6428 – 0.654   0
0.654 – 0.6652   0
0.6652 – 0.6763  0
0.6763 – 0.6875  0
0.6875 – 0.6987  0
0.6987 – 0.7098  0
0.7098 – 0.721   1
Fig 2.
state · Texas and Georgia dominate the county counts; weight any state comparisons accordingly.
Top values for state (20 unique shown, of 52 total).
value  count  share
TX     254    7.9%
GA     159    4.9%
VA     133    4.1%
KY     120    3.7%
MO     115    3.6%
KS     105    3.3%
IL     102    3.2%
NC     100    3.1%
IA     99     3.1%
TN     95     2.9%
NE     93     2.9%
IN     92     2.9%
OH     88     2.7%
MN     87     2.7%
MI     83     2.6%
MS     82     2.5%
PR     78     2.4%
OK     77     2.4%
AR     75     2.3%
WI     72     2.2%
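
Because county counts differ so much by state, a state-level mean is easier to interpret when it is shown next to the number of counties behind it. A minimal sketch on made-up rows (the Gini values below are illustrative, not taken from the dataset, and `state_summary` is a hypothetical helper):

```python
from collections import defaultdict
from statistics import mean

def state_summary(rows):
    """Return {state: (n_counties, mean_gini)} from (state, gini_index) pairs."""
    by_state = defaultdict(list)
    for state, gini in rows:
        by_state[state].append(gini)
    return {s: (len(v), mean(v)) for s, v in by_state.items()}

# Illustrative rows only; real values would come from gini_by_county.csv.
rows = [("TX", 0.44), ("TX", 0.46), ("TX", 0.48), ("DE", 0.45)]
for state, (n, avg) in sorted(state_summary(rows).items()):
    print(f"{state}: n={n}, mean gini={avg:.3f}")
```

Reporting `n` alongside the mean makes it clear when a state's figure rests on a handful of counties rather than a few hundred.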
Fig 3.
county_name · Top recurring county names like Washington, Jefferson, and Franklin drive the 39% duplicate rate.
Character-length distribution for county_name (mean: 14.17).
chars    count
10 – 11  29
11 – 12  255
12 – 13  465
13 – 14  682
14 – 14  588
14 – 15  493
15 – 16  291
16 – 17  219
17 – 18  67
18 – 19  0
19 – 20  49
20 – 21  23
21 – 22  16
22 – 23  14
23 – 24  8
24 – 24  4
24 – 25  5
25 – 26  2
26 – 27  1
27 – 28  0
28 – 29  1
29 – 30  0
30 – 31  0
31 – 32  2
32 – 32  1
32 – 33  1
33 – 34  1
34 – 35  1
35 – 36  0
36 – 37  0
37 – 38  0
38 – 39  0
39 – 40  0
40 – 41  2
41 – 42  1
42 – 42  0
42 – 43  0
43 – 44  0
44 – 45  0
45 – 46  1
Fig 4.
fips · FIPS codes span 1,001 to 72,153 and act as a unique row identifier across states and territories.
Histogram bins for fips (median: 30022.0).
bin                    count
1001 – 2780            97
2780 – 4559            15
4559 – 6337            133
6337 – 8116            59
8116 – 9895            14
9895 – 1.167e+04       4
1.167e+04 – 1.345e+04  226
1.345e+04 – 1.523e+04  5
1.523e+04 – 1.701e+04  49
1.701e+04 – 1.879e+04  189
1.879e+04 – 2.057e+04  204
2.057e+04 – 2.235e+04  184
2.235e+04 – 2.413e+04  39
2.413e+04 – 2.59e+04   15
2.59e+04 – 2.768e+04   170
2.768e+04 – 2.946e+04  196
2.946e+04 – 3.124e+04  150
3.124e+04 – 3.302e+04  27
3.302e+04 – 3.48e+04   21
3.48e+04 – 3.658e+04   95
3.658e+04 – 3.836e+04  153
3.836e+04 – 4.013e+04  155
4.013e+04 – 4.191e+04  46
4.191e+04 – 4.369e+04  67
4.369e+04 – 4.547e+04  51
4.547e+04 – 4.725e+04  161
4.725e+04 – 4.903e+04  268
4.903e+04 – 5.081e+04  29
5.081e+04 – 5.259e+04  133
5.259e+04 – 5.436e+04  94
5.436e+04 – 5.614e+04  95
5.614e+04 – 5.792e+04  0
5.792e+04 – 5.97e+04   0
5.97e+04 – 6.148e+04   0
6.148e+04 – 6.326e+04  0
6.326e+04 – 6.504e+04  0
6.504e+04 – 6.682e+04  0
6.682e+04 – 6.86e+04   0
6.86e+04 – 7.037e+04   0
7.037e+04 – 7.215e+04  78
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column       kind         null %
fips         numeric      0.0%
county_name  text         0.0%
state        categorical  0.0%
gini_index   numeric      0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
            fips   gini_index
fips        +1.00  +0.00
gini_index  +0.00  +1.00
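
For reference, the Pearson coefficient is the covariance of two columns divided by the product of their standard deviations. A minimal pure-Python sketch on toy data (the numbers are illustrative and `pearson` is a hypothetical helper, not Saturn's implementation, which the report describes only as sampled and bounded):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy data: ys is an exact linear function of xs, so r is 1 up to rounding.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(round(pearson(xs, ys), 2))
```

A coefficient near zero between fips and gini_index is unsurprising, since fips is an identifier rather than a measurement.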

fips numeric identifier

This is the U.S. county FIPS code: a 5-digit identifier where the first two digits encode the state and the last three encode the county. Because the column is stored as a number, leading zeros are dropped, so the minimum appears as 1001 rather than 01001. With 3,222 unique values across 3,222 rows, no nulls, and a range from 1001 to 72153 spanning the standard FIPS state prefixes, every row corresponds to a distinct county. Distribution stats (mean 31,377, std 16,299, skew 0.16) are artifacts of the prefix encoding and not meaningful as a numeric feature.
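
A minimal sketch of restoring the 5-digit form from the numeric value and splitting it into its state and county parts (`split_fips` is a hypothetical helper; 72 is the FIPS state prefix for Puerto Rico):

```python
def split_fips(fips: int) -> tuple[str, str]:
    """Zero-pad a numeric county FIPS code to 5 digits and split it into (state, county)."""
    code = f"{fips:05d}"
    return code[:2], code[2:]

print(split_fips(1001))   # minimum value in the column
print(split_fips(72153))  # maximum value in the column
```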

Treatment: Treat as a categorical key, zero-padded to five characters when stored as text; left-join on it to bring in county-level attributes rather than using it as a numeric feature.
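
A left join on the FIPS key keeps every county row and fills in attributes where a match exists. A minimal sketch with plain dicts (the `population` attribute, all values shown, and the `left_join` helper are made up for illustration):

```python
def left_join(rows, lookup, key="fips"):
    """Attach lookup attributes to each row by key; unmatched rows get None values."""
    extra_fields = {field for attrs in lookup.values() for field in attrs}
    joined = []
    for row in rows:
        attrs = lookup.get(row[key], {})
        joined.append({**row, **{f: attrs.get(f) for f in extra_fields}})
    return joined

# Keys are zero-padded strings so "01001" is treated as a label, not a number.
# The gini_index and population values are illustrative, not from the dataset.
counties = [{"fips": "01001", "gini_index": 0.45}, {"fips": "72153", "gini_index": 0.50}]
population = {"01001": {"population": 1000}}  # hypothetical attribute table

for row in left_join(counties, population):
    print(row)
```

Unmatched rows surface as `None`, which makes gaps in the joined attribute table easy to count afterwards.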

anthropic:claude-opus-4-7 · confidence high
Out[12]:

saturn.columns["fips"].stats

stat          value
n             3,222
nulls         0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          3.138e+04
median        30,022
std           1.63e+04
q1            1.903e+04
q3            4.61e+04
iqr           27,075
skew          0.1574
kurtosis      -0.6314
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 7.
Distribution of fips. Vertical dash marks the median.

county_name text metadata

This column holds US county-level place names: about 93% of values end in 'County' (2,999 of 3,222 rows), with smaller contingents of 'Parish' (64, Louisiana), 'Municipio' (78, Puerto Rico), and 'City' (47). Heavy duplication is expected and present: a 39.2% duplicate rate with 1,262 repeats, because common names like Washington, Jefferson, and Franklin County recur across states. Lengths are tight (10–46 characters, mean 14.2, about 2 words) and there are no nulls or empty strings.
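
The duplicate figures follow from the row and unique counts: 3,222 rows minus 1,960 distinct names leaves 1,262 repeats. A minimal sketch of the same calculation, assuming a duplicate is any occurrence of a value beyond its first (this definition is inferred from the reported numbers, and `duplicate_stats` is a hypothetical helper):

```python
def duplicate_stats(values):
    """Return (n_duplicates, duplicate_rate), counting repeats beyond each value's first occurrence."""
    n = len(values)
    n_duplicates = n - len(set(values))
    return n_duplicates, n_duplicates / n

names = ["Washington County", "Jefferson County", "Washington County", "Franklin County"]
print(duplicate_stats(names))

# The same arithmetic applied to the reported counts.
print(3222 - 1960, round((3222 - 1960) / 3222, 4))
```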

Treatment: Pair with a state column to form a unique geographic key before joining or grouping.
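
One way to confirm that the pairing is unique is to count (state, county_name) pairs and list any that occur more than once. A minimal sketch on illustrative rows (`repeated_keys` is a hypothetical helper):

```python
from collections import Counter

def repeated_keys(rows):
    """Return (state, county_name) pairs that appear more than once."""
    counts = Counter((r["state"], r["county_name"]) for r in rows)
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"state": "GA", "county_name": "Washington County"},
    {"state": "TX", "county_name": "Washington County"},
    {"state": "TX", "county_name": "Jefferson County"},
]
print(repeated_keys(rows))  # an empty list means the composite key is unique
```

If the list comes back non-empty, fips is the safer key, since the report shows it is unique per row.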

anthropic:claude-opus-4-7 · confidence high
Out[15]:

saturn.columns["county_name"].stats

stat                     value
n                        3,222
nulls                    0 (0.0%)
unique                   1,960
len_min                  10
len_max                  46
len_mean                 14.17
len_median               14
len_p95                  18
word_mean                2.083
word_median              2
n_empty                  0
n_duplicates             1,262
duplicate_rate           0.3917
vocab_size               1,963
readability_flesch_mean  33.36
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: short_text        95th-percentile length under 20 chars
alert: duplicates        39.2% duplicate strings
Fig 8.
Character-length distribution for county_name.

state categorical feature

This is a US state code field with 52 distinct values across 3,222 rows and no nulls, consistent with the 50 states plus two additional codes. One of them is PR (Puerto Rico), which appears in the top values with 78 rows; the other is probably DC, but it does not appear among the 20 values shown. Because each row is one county, the distribution simply reflects how many counties each state has: TX leads at 254 (7.88%), followed by GA (159) and VA (133). Entropy is high at 5.31 bits (ratio 0.93 of the maximum for 52 values), indicating broad spread rather than concentration.
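
The entropy ratio can be reproduced from the reported figures: the maximum Shannon entropy for 52 categories is log2(52), roughly 5.70 bits, and 5.314 divided by that is about 0.93. A minimal sketch, assuming Saturn measures entropy in bits and normalises by log2 of the cardinality (the reported ratio is consistent with this, but the report does not state the formula; `entropy_bits` is a hypothetical helper):

```python
from math import log2

def entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as raw category counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Uniform counts reach the maximum, log2(number of categories).
uniform = [10] * 52
print(round(entropy_bits(uniform), 3), round(log2(52), 3))

# Ratio implied by the reported entropy and cardinality.
print(round(5.314 / log2(52), 4))
```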

Treatment: One-hot or target-encode for modelling; confirm that the two codes beyond the 50 states are PR and DC rather than stray values.
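
Checking which codes fall outside the 50 states is a set difference against the standard two-letter postal abbreviations. A minimal sketch; the `observed` set is a small stand-in for the column's 52 distinct values, of which only the top 20 are listed in the report, and `non_state_codes` is a hypothetical helper:

```python
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)

def non_state_codes(codes):
    """Return the codes that are not one of the 50 state abbreviations, sorted."""
    return sorted(set(codes) - US_STATES)

observed = {"TX", "GA", "VA", "PR"}  # stand-in; replace with the column's distinct values
print(non_state_codes(observed))
```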

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["state"].stats

stat           value
n              3,222
nulls          0 (0.0%)
unique         52
top_value      TX
top_rate       0.07883
cardinality    52
entropy        5.314
entropy_ratio  0.9322
Fig 9.
Top values for state.

gini_index numeric feature

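Numeric column holding Gini index values, all within the plausible 0.2744–0.721 range, with no nulls or zeros across 3,222 rows. The distribution is tight (IQR 0.049, std 0.038) and centred near 0.448, with a mild right skew (0.50) and 56 flagged outliers (1.7%). The histogram shows a long, sparse upper tail reaching 0.721, which points to a handful of unusually unequal counties, but it also has a few isolated values below 0.33, so not every flagged outlier is necessarily high-end.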
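
One common outlier rule, Tukey's fences, flags values more than 1.5 × IQR beyond the quartiles. Whether Saturn uses this rule is an assumption (the report does not say how `n_outliers` is computed), but applying it to the reported quartiles shows where such cut-offs would fall. `tukey_fences` is a hypothetical helper:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles as reported in saturn.columns["gini_index"].stats.
lower, upper = tukey_fences(0.422, 0.4714)
print(f"lower fence = {lower:.4f}, upper fence = {upper:.4f}")
```

Under this rule, values below about 0.348 would be flagged as well as values above about 0.546, so a flagged set of this kind can include both tails.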

Treatment: Use as-is as a numeric feature; consider winsorising the upper outliers if downstream models are sensitive.
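
Winsorising the upper tail replaces values above a chosen cap with the cap itself and leaves the rest unchanged. A minimal sketch; the cap of 0.5455 is q3 + 1.5 × IQR computed from the reported quartiles and is only an illustrative choice, the sample values are made up apart from the reported maximum of 0.721, and `winsorise_upper` is a hypothetical helper:

```python
def winsorise_upper(values, cap):
    """Clip values above `cap` down to `cap`; values at or below it are unchanged."""
    return [min(v, cap) for v in values]

sample = [0.41, 0.45, 0.47, 0.60, 0.721]  # illustrative values
print(winsorise_upper(sample, cap=0.5455))
```

Clipping keeps every county in the dataset, which suits models that are sensitive to extreme values but should still see all rows.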

anthropic:claude-opus-4-7 · confidence high
Out[21]:

saturn.columns["gini_index"].stats

stat          value
n             3,222
nulls         0 (0.0%)
unique        1,317
min           0.2744
max           0.721
mean          0.4481
median        0.4457
std           0.03841
q1            0.422
q3            0.4714
iqr           0.04938
skew          0.4999
kurtosis      1.634
n_outliers    56
outlier_rate  0.01738
zero_rate     0
Fig 10.
Distribution of gini_index. Vertical dash marks the median.

How to cite


BibTeX
@misc{saturn-economic-gini-by-county-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: economic gini by county},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/economic-gini_by_county}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: economic gini by county. Source: /home/coolhand/datasets/us-inequality-atlas/economic/gini_by_county.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/economic-gini_by_county