poverty data

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/poverty_data.parquet

Saturn profiled 3,222 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

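[2]:
# Install the profiler (IPython shell escape), then run the CLI from Python.
!pip install saturn-dissect
import subprocess

subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/poverty_data.parquet",
        "--findings", "poverty_data.json",
        "--llm", "anthropic:claude-opus-4-7",  # opt-in model for the prose interpretation
    ],
    check=True,  # raise CalledProcessError if the saturn command exits non-zero
)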

Summary confidence: high

This dataset contains 3,222 U.S. counties and county equivalents with three columns: a county name, a FIPS code identifier, and a poverty rate. Each row is unique by county_name and fips, so the analytical signal lives almost entirely in poverty_rate. Poverty rate ranges from 1.6% to 66.32% with a mean of 15.1% and a median of 13.55%. It is right-skewed (skew 2.10) with 137 high-end outliers (~4.25% of counties). That long upper tail is the first thing worth a closer look: the maximum is nearly five times the median county rate.

citing: row_count · column_count · columns.poverty_rate.stats.min · columns.poverty_rate.stats.max · columns.poverty_rate.stats.mean · columns.poverty_rate.stats.median · columns.poverty_rate.stats.skew · columns.poverty_rate.stats.n_outliers · columns.poverty_rate.stats.outlier_rate · columns.county_name.n_unique · columns.fips.n_unique
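
The citing line lists dotted keys into the machine-readable findings. A minimal sketch of how such keys could be resolved, assuming the findings JSON nests exactly as the dotted paths suggest. The layout of poverty_data.json is not shown in this report, so the dictionary below is a hand-built stand-in filled with values reported above, and `resolve` is a hypothetical helper, not part of saturn-dissect.

```python
from functools import reduce

# Hand-built stand-in for the findings file; the real JSON layout is an assumption.
findings = {
    "row_count": 3222,
    "column_count": 3,
    "columns": {
        "poverty_rate": {
            "stats": {"min": 1.6, "max": 66.32, "median": 13.55, "n_outliers": 137}
        }
    },
}


def resolve(doc, dotted_key):
    """Walk a nested dict along a dotted path such as 'columns.poverty_rate.stats.min'."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), doc)


for key in ("row_count", "columns.poverty_rate.stats.max"):
    print(key, "=", resolve(findings, key))
```

A missing key raises `KeyError`, which is probably the behaviour you want when checking that every cited stat actually exists.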

Out[4]:

saturn.schema() · 3 columns

column        kind     n      null%  unique  alerts
fips          numeric  3,222  0.0%   3,222
county_name   text     3,222  0.0%   3,222   near_unique
poverty_rate  numeric  3,222  0.0%   1,719   high_skew
Fig 1.
poverty_rate · Look for the right-skewed shape and the long tail of high-poverty counties stretching toward 66%.
Histogram bins for poverty_rate (median: 13.55).
bin            count
1.6 – 3.218    7
3.218 – 4.836  34
4.836 – 6.454  106
6.454 – 8.072  246
8.072 – 9.69   320
9.69 – 11.31   354
11.31 – 12.93  393
12.93 – 14.54  364
14.54 – 16.16  306
16.16 – 17.78  262
17.78 – 19.4   192
19.4 – 21.02   149
21.02 – 22.63  123
22.63 – 24.25  91
24.25 – 25.87  52
25.87 – 27.49  44
27.49 – 29.11  34
29.11 – 30.72  23
30.72 – 32.34  18
32.34 – 33.96  14
33.96 – 35.58  6
35.58 – 37.2   8
37.2 – 38.81   3
38.81 – 40.43  8
40.43 – 42.05  5
42.05 – 43.67  9
43.67 – 45.29  4
45.29 – 46.9   11
46.9 – 48.52   7
48.52 – 50.14  8
50.14 – 51.76  2
51.76 – 53.38  6
53.38 – 54.99  5
54.99 – 56.61  5
56.61 – 58.23  1
58.23 – 59.85  0
59.85 – 61.47  0
61.47 – 63.08  0
63.08 – 64.7   1
64.7 – 66.32   1
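
The bin edges listed for poverty_rate are consistent with 40 equal-width bins between the column minimum and maximum. A short sketch of that arithmetic, assuming equal-width binning; the report does not state Saturn's binning rule, and `bin_edges` is a hypothetical helper.

```python
def bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# min and max of poverty_rate as reported in the summary
edges = bin_edges(1.6, 66.32, 40)
print("bin width:", round(edges[1] - edges[0], 3))
print("first edges:", [round(e, 3) for e in edges[:3]])
```

The first edges can be compared directly with the opening rows of the bin table.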
Fig 2.
fips · FIPS codes are state-prefixed identifiers, so the distribution roughly shows how counties are spread across states.
Histogram bins for fips (median: 30022.0).
bin                      count
1001 – 2780              97
2780 – 4559              15
4559 – 6337              133
6337 – 8116              59
8116 – 9895              14
9895 – 1.167e+04         4
1.167e+04 – 1.345e+04    226
1.345e+04 – 1.523e+04    5
1.523e+04 – 1.701e+04    49
1.701e+04 – 1.879e+04    189
1.879e+04 – 2.057e+04    204
2.057e+04 – 2.235e+04    184
2.235e+04 – 2.413e+04    39
2.413e+04 – 2.59e+04     15
2.59e+04 – 2.768e+04     170
2.768e+04 – 2.946e+04    196
2.946e+04 – 3.124e+04    150
3.124e+04 – 3.302e+04    27
3.302e+04 – 3.48e+04     21
3.48e+04 – 3.658e+04     95
3.658e+04 – 3.836e+04    153
3.836e+04 – 4.013e+04    155
4.013e+04 – 4.191e+04    46
4.191e+04 – 4.369e+04    67
4.369e+04 – 4.547e+04    51
4.547e+04 – 4.725e+04    161
4.725e+04 – 4.903e+04    268
4.903e+04 – 5.081e+04    29
5.081e+04 – 5.259e+04    133
5.259e+04 – 5.436e+04    94
5.436e+04 – 5.614e+04    95
5.614e+04 – 5.792e+04    0
5.792e+04 – 5.97e+04     0
5.97e+04 – 6.148e+04     0
6.148e+04 – 6.326e+04    0
6.326e+04 – 6.504e+04    0
6.504e+04 – 6.682e+04    0
6.682e+04 – 6.86e+04     0
6.86e+04 – 7.037e+04     0
7.037e+04 – 7.215e+04    78
Fig 3.
Per-column null rate across the corpus. Columns are ordered by input position.
column        kind     null %
fips          numeric  0.0%
county_name   text     0.0%
poverty_rate  numeric  0.0%
Fig 4.
Pearson correlation across 2 numeric columns (sampled, bounded; values clipped to 2 decimals).
              fips   poverty_rate
fips          +1.00  +0.16
poverty_rate  +0.16  +1.00

fips numeric identifier

This is the U.S. county FIPS code, a 5-digit geographic identifier whose first two digits encode the state. Because the column is stored as a number, leading zeros are dropped, so the state prefix appears as one or two digits (1001 is 01001). All 3,222 values are unique with zero nulls, and the range 1001 to 72153 spans Alabama through Puerto Rico, consistent with a complete county roster. Treating it as numeric is misleading despite the clean distribution (skew 0.16, no outliers), since the magnitudes are categorical codes, not measurements; the +0.16 correlation with poverty_rate in Fig 4 should be read in that light.

Treatment: cast to a zero-padded 5-character string and use as a join key to geographic reference tables.
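
A minimal sketch of that treatment, using only the standard library. `fips_to_key` and `state_prefix` are hypothetical helper names; the two inputs are the minimum and maximum reported for this column.

```python
def fips_to_key(fips):
    """Render an integer FIPS code as a zero-padded 5-character string."""
    return f"{int(fips):05d}"


def state_prefix(fips_key):
    """The first two characters of the padded code identify the state."""
    return fips_key[:2]


for raw in (1001, 72153):  # min and max reported for the fips column
    key = fips_to_key(raw)
    print(raw, "->", key, "state prefix", state_prefix(key))
```

Padding before joining matters because reference tables commonly store the code as text, where "01001" and "1001" do not match.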

anthropic:claude-opus-4-7 · confidence high
Out[10]:

saturn.columns["fips"].stats

stat          value
n             3,222
nulls         0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          3.138e+04
median        30,022
std           1.63e+04
q1            1.903e+04
q3            4.61e+04
iqr           27,075
skew          0.1574
kurtosis      -0.6314
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 5.
Distribution of fips. Vertical dash marks the median.

county_name text identifier

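This column holds fully qualified U.S. county names in the 'X County, State' pattern, with every one of the 3,222 rows unique and zero nulls. The token 'county,' appears 2,999 times, so about 223 entries do not follow that pattern (likely parishes in Louisiana, boroughs in Alaska, independent cities, or Puerto Rico municipios). The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). These figures look like token counts rather than per-state row counts: Texas has 254 counties and Virginia has 133 counties and independent cities, while 'virginia' also matches West Virginia and 'texas' matches counties named Texas in other states. Georgia's 159 matches its actual county count.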

Treatment: Use as a join key against county-level reference data; split into county and state fields before modelling.
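
A minimal sketch of the split, assuming names follow the 'X County, State' pattern described above with a single ', ' before the state. `split_county_name` and `follows_county_pattern` are hypothetical helpers, and the example strings in the loop are illustrations, not rows quoted from the dataset.

```python
def split_county_name(full_name):
    """Split 'X County, State' on the last ', ' into (county, state)."""
    county, sep, state = full_name.rpartition(", ")
    if not sep:
        # No ', ' separator: leave the state unknown rather than guess.
        return full_name.strip(), None
    return county.strip(), state.strip()


def follows_county_pattern(county):
    """True when the local part ends in 'County' (not Parish, Borough, city, ...)."""
    return county.lower().endswith(" county")


for name in ("Autauga County, Alabama", "Orleans Parish, Louisiana"):
    county, state = split_county_name(name)
    print(county, "|", state, "|", follows_county_pattern(county))
```

Splitting on the last separator keeps any commas inside the local name with the county part.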

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["county_name"].stats

stat                     value
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.32
len_median               24
len_p95                  31
word_mean                3.248
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,990
readability_flesch_mean  10.28
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
Fig 6.
Character-length distribution for county_name (mean: 24.32).
chars    count
16 – 17  26
17 – 18  72
18 – 19  121
19 – 20  190
20 – 21  264
21 – 22  407
22 – 24  420
24 – 25  363
25 – 26  320
26 – 27  240
27 – 28  231
28 – 29  152
29 – 30  139
30 – 31  165
31 – 32  41
32 – 33  28
33 – 34  16
34 – 35  10
35 – 36  5
36 – 38  0
38 – 39  1
39 – 40  1
40 – 41  0
41 – 42  1
42 – 43  1
43 – 44  0
44 – 45  2
45 – 46  0
46 – 47  1
47 – 48  1
48 – 49  0
49 – 50  0
50 – 51  0
51 – 53  0
53 – 54  2
54 – 55  1
55 – 56  0
56 – 57  0
57 – 58  0
58 – 59  1

poverty_rate numeric feature

Numeric poverty rate (likely the percent of population below the poverty line) across 3,222 rows with no nulls and 1,719 unique values. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with median 13.55 and mean 15.10, ranging from 1.6 to 66.32. The 137 outliers (4.25%) sit above the upper whisker. The long upper tail is a small set of high-poverty counties pulling the mean above the median.

Treatment: Apply a log or Box-Cox transform before linear modelling to tame the right skew.
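
A log transform is defined for this column because its minimum is 1.6 and its zero_rate is 0. The sketch below uses only the standard library. The ten values are made-up illustrations inside the reported range, not rows from the dataset, and `skewness` is a hypothetical helper that computes the population moment coefficient; Saturn's exact skew estimator is not stated in this report. Box-Cox is not shown, since it needs a fitted parameter.

```python
import math


def skewness(values):
    """Population moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up rates inside the reported 1.6 to 66.32 range; NOT rows from the dataset.
raw = [5.0, 8.0, 10.2, 12.9, 13.6, 15.0, 17.9, 24.0, 38.0, 66.32]
logged = [math.log(v) for v in raw]  # valid because every value is > 0

print("skew before:", round(skewness(raw), 2))
print("skew after: ", round(skewness(logged), 2))

# Values on the log scale map back to rates with exp().
restored = [math.exp(v) for v in logged]
```

If a model is fitted on the log scale, its predictions need the `exp` step before they can be read as poverty rates again.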

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["poverty_rate"].stats

stat          value
n             3,222
nulls         0 (0.0%)
unique        1,719
min           1.6
max           66.32
mean          15.1
median        13.55
std           7.706
q1            10.16
q3            17.91
iqr           7.75
skew          2.096
kurtosis      6.891
n_outliers    137
outlier_rate  0.04252
zero_rate     0
alert: high_skew · skew=+2.10
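
The quartiles above are enough to work out where the whiskers would fall. This sketch assumes the common Tukey rule of 1.5 × IQR beyond the quartiles; the report does not say which rule Saturn uses to count outliers, so treat the fences as an illustration.

```python
q1, q3 = 10.16, 17.91  # quartiles from the stats table
iqr = q3 - q1          # agrees with the reported iqr of 7.75

# Assumed rule: Tukey fences at 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print("fences:", round(lower_fence, 3), round(upper_fence, 3))

# The lower fence falls below the column minimum (1.6), so under this rule
# every flagged value would sit in the upper tail.
print("lower fence below min:", lower_fence < 1.6)

# Reported outlier share: 137 of 3,222 rows.
print("outlier rate:", round(137 / 3222, 5))
```

Under this assumption only the upper fence can flag rows, which fits the description of all 137 outliers sitting above the upper whisker.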
Fig 7.
Distribution of poverty_rate. Vertical dash marks the median.

How to cite

BibTeX
@misc{saturn-poverty-data-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: poverty data},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/poverty_data}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: poverty data. Source: /home/coolhand/html/datavis/data_trove/cache/poverty_data.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/poverty_data