healthcare data poverty data 20260121

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet

Saturn profiled 3,222 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
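!pip install saturn-dissect
import subprocess

# Profile the parquet file, write the machine-readable findings to JSON, and
# request the opt-in LLM interpretation (flag meanings inferred from their names).
# check=True raises CalledProcessError if the CLI exits with a non-zero status.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet",
        "--findings", "healthcare_data-poverty_data_20260121.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,
)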

Summary confidence: high

This dataset contains 3,222 rows of U.S. county-level poverty data in three columns: a FIPS code, a county name, and a poverty rate. Every row is a distinct county (3,222 unique FIPS codes and 3,222 unique county names), so the analytical signal lives in the poverty_rate column. Poverty rates range from 1.6% to 66.32%, with a mean of 15.1% and a median of 13.55%; the distribution is right-skewed (skew ≈ 2.10), with 137 outliers on the high end. The most frequent words in county_name are Texas (256), Virginia (189), and Georgia (159); these are word counts rather than county counts, so "Virginia", for example, also matches West Virginia rows. Start by examining the shape of poverty_rate and the states in which the high-poverty outliers cluster.

citing: row_count · column_count · columns.poverty_rate.stats · columns.county_name.top_words · columns.fips.n_unique
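The `citing:` line reads like a list of dotted paths into the findings JSON written by the `--findings` flag. Below is a minimal sketch for looking those paths up. It assumes the JSON nests objects exactly as the dotted names suggest, which this notebook does not document. `resolve_citation` and the sample `findings` dictionary are hypothetical, and the sample values are copied from the stats reported in this notebook.

```python
import json
from typing import Any


def resolve_citation(findings: dict, path: str) -> Any:
    """Walk a dotted path such as 'columns.poverty_rate.stats' through nested dicts.

    Raises KeyError naming the prefix that failed if a segment is missing.
    """
    node: Any = findings
    walked = []
    for part in path.split("."):
        walked.append(part)
        if not isinstance(node, dict) or part not in node:
            raise KeyError(".".join(walked))
        node = node[part]
    return node


# Hypothetical findings layout, filled with values reported in this notebook.
findings = {
    "row_count": 3222,
    "column_count": 3,
    "columns": {"poverty_rate": {"stats": {"median": 13.55, "skew": 2.096}}},
}

print(json.dumps(resolve_citation(findings, "columns.poverty_rate.stats")))
```

To try it on the real file, load healthcare_data-poverty_data_20260121.json with `json.load` and pass the result as `findings`. If saturn stores columns as a list rather than a dict keyed by column name, the lookup would need an extra step.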

Out[4]:

saturn.schema() · 3 columns

column         kind      n      null%   unique   alerts
fips           text      3,222  0.0%    3,222    near_unique, one_word, allcaps, short_text
county_name    text      3,222  0.0%    3,222    near_unique
poverty_rate   numeric   3,222  0.0%    1,719    high_skew
Fig 1.
poverty_rate · The 137 high-poverty outlier counties sit in the right-skewed tail, above roughly 29.5% if outliers are flagged with the usual 1.5 × IQR rule (17.91 + 1.5 × 7.75 = 29.535).
Histogram bins for poverty_rate (%, median 13.55).
bin (%)          count
1.60 – 3.22          7
3.22 – 4.84         34
4.84 – 6.45        106
6.45 – 8.07        246
8.07 – 9.69        320
9.69 – 11.31       354
11.31 – 12.93      393
12.93 – 14.54      364
14.54 – 16.16      306
16.16 – 17.78      262
17.78 – 19.40      192
19.40 – 21.02      149
21.02 – 22.63      123
22.63 – 24.25       91
24.25 – 25.87       52
25.87 – 27.49       44
27.49 – 29.11       34
29.11 – 30.72       23
30.72 – 32.34       18
32.34 – 33.96       14
33.96 – 35.58        6
35.58 – 37.20        8
37.20 – 38.81        3
38.81 – 40.43        8
40.43 – 42.05        5
42.05 – 43.67        9
43.67 – 45.29        4
45.29 – 46.90       11
46.90 – 48.52        7
48.52 – 50.14        8
50.14 – 51.76        2
51.76 – 53.38        6
53.38 – 54.99        5
54.99 – 56.61        5
56.61 – 58.23        1
58.23 – 59.85        0
59.85 – 61.47        0
61.47 – 63.08        0
63.08 – 64.70        1
64.70 – 66.32        1
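The histogram counts can be checked against the summary stats directly. This small sketch sums the counts and walks the cumulative total to the middle rank. The edges and counts are transcribed from the poverty_rate histogram table above, with edges rounded to two decimals. `median_bin` is a hypothetical helper, not part of saturn.

```python
from itertools import accumulate

# Bin edges (%) and counts transcribed from the poverty_rate histogram table.
EDGES = [1.60, 3.22, 4.84, 6.45, 8.07, 9.69, 11.31, 12.93, 14.54, 16.16,
         17.78, 19.40, 21.02, 22.63, 24.25, 25.87, 27.49, 29.11, 30.72, 32.34,
         33.96, 35.58, 37.20, 38.81, 40.43, 42.05, 43.67, 45.29, 46.90, 48.52,
         50.14, 51.76, 53.38, 54.99, 56.61, 58.23, 59.85, 61.47, 63.08, 64.70,
         66.32]
COUNTS = [7, 34, 106, 246, 320, 354, 393, 364, 306, 262,
          192, 149, 123, 91, 52, 44, 34, 23, 18, 14,
          6, 8, 3, 8, 5, 9, 4, 11, 7, 8,
          2, 6, 5, 5, 1, 0, 0, 0, 1, 1]


def median_bin(edges: list[float], counts: list[int]) -> tuple[float, float]:
    """Return the (lower, upper) edges of the bin holding the middle observation."""
    target = (sum(counts) + 1) / 2  # middle rank: 1611.5 when n = 3,222
    for i, running in enumerate(accumulate(counts)):
        if running >= target:
            return edges[i], edges[i + 1]
    raise ValueError("empty histogram")


print(sum(COUNTS), median_bin(EDGES, COUNTS))  # 3222 (12.93, 14.54)
```

The counts add up to the 3,222 profiled rows, and the middle observation lands in the 12.93–14.54 bin, which contains the reported median of 13.55.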
Fig 2.
county_name · Top words show which state names appear most often, led by Texas, Virginia (a count that also includes West Virginia rows), and Georgia.
Data table: the exported table for this figure repeats the county_name character-length distribution listed under Fig 3, rather than the top-word counts (Texas 256, Virginia 189, Georgia 159).
Fig 3.
county_name · Name length is tightly clustered (median 24 characters, 95th percentile 31), a quick sanity check that most entries follow a consistent 'X County, State' format.
Character-length distribution for county_name (mean 24.32 characters; bin edges rounded to whole characters).
length (chars)   count
16 – 17             26
17 – 18             72
18 – 19            121
19 – 20            190
20 – 21            264
21 – 22            407
22 – 24            420
24 – 25            363
25 – 26            320
26 – 27            240
27 – 28            231
28 – 29            152
29 – 30            139
30 – 31            165
31 – 32             41
32 – 33             28
33 – 34             16
34 – 35             10
35 – 36              5
36 – 38              0
38 – 39              1
39 – 40              1
40 – 41              0
41 – 42              1
42 – 43              1
43 – 44              0
44 – 45              2
45 – 46              0
46 – 47              1
47 – 48              1
48 – 49              0
49 – 50              0
50 – 51              0
51 – 53              0
53 – 54              2
54 – 55              1
55 – 56              0
56 – 57              0
57 – 58              0
58 – 59              1
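The length table's labels skip some values (22 – 24, 36 – 38, 51 – 53) because the bin edges are rounded for display. This sketch assumes the profiler splits the 16–59 range into 40 equal-width bins, as the 40 rows suggest. Saturn's binning code is not shown here. Under that assumption the true width is 43 / 40 = 1.075 characters, so an edge at 23.525 prints as 24 and no label reads 23. `equal_width_edges` is a hypothetical helper.

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 evenly spaced edges from lo to hi, inclusive."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


# Assumed setup: min and max length from the stats table, 40 bins.
edges = equal_width_edges(16, 59, 40)
labels = [round(e) for e in edges]  # whole-character labels, as displayed

print(edges[6], edges[7])  # roughly 22.45 and 23.525
print(labels[:10])
```

Because lengths are whole numbers and the bins are slightly wider than one character, an occasional label pair spans two integers. Treat the table as showing shape, not exact per-length counts.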
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.
column         kind      null %
fips           text      0.0%
county_name    text      0.0%
poverty_rate   numeric   0.0%

fips text identifier

This column holds 5-character FIPS codes that uniquely identify each of the 3,222 rows (n_unique equals n; null rate 0). Every value is exactly 5 characters and a single token, with no duplicates or empty strings. The values are all digits, so the allcaps alert most likely just reflects the absence of lowercase letters. Sample values such as 01001, 01003, and 01005 match the standard US county FIPS layout: a 2-digit state code followed by a 3-digit county code.

Treatment: Treat as a county-level key; left-join on this id and exclude from modelling features.

anthropic:claude-opus-4-7 · confidence high
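FIPS codes carry leading zeros (01001 is Autauga County, Alabama), so they should stay strings. A reader that parses them as integers turns 01001 into 1001 and silently breaks joins. Below is a minimal standard-library sketch of normalizing codes and left-joining on them. `normalize_fips`, `split_fips`, `left_join`, and the example rows are hypothetical. The state/county split relies on the standard 2-digit state + 3-digit county layout.

```python
from typing import Iterable


def normalize_fips(code) -> str:
    """Return a 5-character county FIPS string, restoring zeros lost to int parsing."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {code!r}")
    return text.zfill(5)


def split_fips(code) -> tuple[str, str]:
    """Split a county FIPS code into its 2-digit state and 3-digit county parts."""
    fips = normalize_fips(code)
    return fips[:2], fips[2:]


def left_join(left: Iterable[dict], right: Iterable[dict], key: str = "fips") -> list[dict]:
    """Keep every left row; copy matching right-side columns, or None when unmatched."""
    index = {normalize_fips(row[key]): row for row in right}
    extra_cols = sorted({col for row in index.values() for col in row if col != key})
    joined = []
    for row in left:
        fips = normalize_fips(row[key])
        match = index.get(fips, {})
        merged = {**row, key: fips}
        for col in extra_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


# Illustrative rows only: the right side stored its code as an int (01001 -> 1001).
poverty_rows = [{"fips": "01001"}, {"fips": "01003"}]
other_rows = [{"fips": 1001, "label": "example"}]
print(left_join(poverty_rows, other_rows))
# [{'fips': '01001', 'label': 'example'}, {'fips': '01003', 'label': None}]
```

Normalizing on both sides before matching means a join succeeds even when one source lost its leading zeros. Unmatched counties stay in the result with `None`, which keeps the left-join semantics the treatment note asks for.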
Out[10]:

saturn.columns["fips"].stats

stat                       value
n                          3,222
nulls                      0 (0.0%)
unique                     3,222
len_min                    5
len_max                    5
len_mean                   5
len_median                 5
len_p95                    5
word_mean                  1
word_median                1
n_empty                    0
n_duplicates               0
duplicate_rate             0
vocab_size                 3,222
readability_flesch_mean    121.2
emoji_rate                 0
url_rate                   0
one_word_rate              1
allcaps_rate               1
boilerplate_rate           0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 100.0% of rows are a single word
alert: allcaps · 100.0% of rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
Fig 5.
Character-length distribution for fips.
Character-length distribution for fips (mean 5.0): all 3,222 values fall in the single bin at length 5; the other 39 bins in the plotted 4–6 range are empty.

county_name text identifier

This is a county identifier string, most likely formatted as "X County, State": the token "county," appears in 2,999 of 3,222 rows, and Texas (256), Virginia (189), and Georgia (159) are the most frequent state words. All 3,222 values are unique, with no nulls or duplicates, and lengths are fairly compact (min 16, median 24, 95th percentile 31, max 59), consistent with a clean US county roster. The 223 rows lacking the "county," token are worth checking; they are likely parishes, boroughs, or independent cities.

Treatment: Split into county and state fields and use as a join key rather than a model feature.

anthropic:claude-opus-4-7 · confidence high
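Below is a minimal sketch of the suggested split. It assumes every value follows the "X County, State" pattern, with the state after the last comma. The helper names and the two example strings are illustrative, not rows read from the file. The second helper flags names without the "county," token, which should approximate the 223 exceptions, though saturn's tokenizer may count slightly differently.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' at the last ', ' into (county part, state)."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county.strip(), state.strip()


def lacks_county_token(name: str) -> bool:
    """True when the name has no 'county,' (e.g. parishes, boroughs, cities)."""
    return "county," not in name.lower()


# Illustrative inputs in the assumed format.
for name in ["Autauga County, Alabama", "Orleans Parish, Louisiana"]:
    print(split_county_name(name), lacks_county_token(name))
```

Splitting at the last separator, rather than the first, keeps the state intact even if a county name itself ever contained a comma.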
Out[13]:

saturn.columns["county_name"].stats

stat                       value
n                          3,222
nulls                      0 (0.0%)
unique                     3,222
len_min                    16
len_max                    59
len_mean                   24.32
len_median                 24
len_p95                    31
word_mean                  3.248
word_median                3
n_empty                    0
n_duplicates               0
duplicate_rate             0
vocab_size                 1,990
readability_flesch_mean    10.28
emoji_rate                 0
url_rate                   0
one_word_rate              0
allcaps_rate               0
boilerplate_rate           0
alert: near_unique · 100.0% of rows are unique strings
Fig 6.
Character-length distribution for county_name.
Data table: identical to the county_name character-length distribution listed under Fig 3.

poverty_rate numeric feature

This is a county-level poverty rate expressed as a percentage (the 5-digit fips keys identify counties), ranging from 1.6 to 66.32 with a median of 13.55 and an IQR of 7.75. The distribution is right-skewed (skew 2.10, kurtosis 6.89), with 137 high outliers (4.25% of rows) reflecting pockets of severe poverty well above the typical 10–18% band (Q1 10.16, Q3 17.91). There are no nulls or zeros. The 1,719 distinct values across 3,222 rows mean that many counties share the same rate; the one-record-per-county structure comes from the unique fips column, not from this one.

Treatment: Apply a log or sqrt transform before regression to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
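Below is a minimal sketch of the suggested log transform, with a skewness check before and after. Because poverty_rate has a minimum of 1.6 and a zero rate of 0, a plain natural log is defined without an offset; log1p or sqrt would be the fallback if zeros appeared. The skewness here is the population (Fisher-Pearson) moment estimator. The notebook does not say which estimator produced skew = 2.096, so values computed this way may differ slightly. The toy sample is a doubling sequence, not data from this dataset. It was chosen because its logs are evenly spaced, so the transform removes its skew entirely.

```python
import math


def skewness(values: list[float]) -> float:
    """Population (Fisher-Pearson, biased) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values: list[float]) -> list[float]:
    """Natural log of each value; requires strictly positive input."""
    if min(values) <= 0:
        raise ValueError("log needs strictly positive values; consider log1p or sqrt")
    return [math.log(v) for v in values]


# Toy right-skewed sample (not from the dataset): a doubling sequence.
toy = [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
print(round(skewness(toy), 3), round(skewness(log_transform(toy)), 3))
```

On real county rates the log will not remove the skew completely, but it compresses the long upper tail so the high-poverty outliers exert less leverage in a linear fit.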
Out[16]:

saturn.columns["poverty_rate"].stats

stat            value
n               3,222
nulls           0 (0.0%)
unique          1,719
min             1.6
max             66.32
mean            15.1
median          13.55
std             7.706
q1              10.16
q3              17.91
iqr             7.75
skew            2.096
kurtosis        6.891
n_outliers      137
outlier_rate    0.04252
zero_rate       0
alert: high_skew · skew=+2.10
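The outlier count is easier to read once the fences are explicit. This sketch assumes saturn flags outliers with the common Tukey rule (values more than 1.5 × IQR beyond the quartiles). The report does not state its method, but the reported quartiles make the arithmetic concrete. Under that assumption, the lower fence falls below zero, so all 137 outliers must be high values, and the upper fence sits near 29.5%. `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and range reported for poverty_rate in the stats table.
Q1, Q3, MIN, MAX = 10.16, 17.91, 1.6, 66.32

low, high = tukey_fences(Q1, Q3)
print(f"fences: {low:.3f} .. {high:.3f}")  # fences: -1.465 .. 29.535
print("low outliers possible:", MIN < low)  # False
print("max above upper fence:", MAX > high)  # True
```

If saturn used a different multiplier or a z-score rule instead, the fence would move, but the quartiles alone already show the skew sits entirely in the upper tail.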
Fig 7.
Distribution of poverty_rate. Vertical dash marks the median.
Data table: identical to the poverty_rate histogram bins listed under Fig 1.

How to cite

BibTeX
@misc{saturn-healthcare-data-poverty-data-20260121-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: healthcare data poverty data 20260121},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/healthcare_data-poverty_data_20260121}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: healthcare data poverty data 20260121. Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/healthcare_data-poverty_data_20260121