healthcare data poverty data 20260121

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet

Saturn profiled 3,222 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

!pip install saturn-dissect
import subprocess

# Run the saturn CLI against the source parquet file.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet",
    "--findings", "healthcare_data-poverty_data_20260121.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # check=True raises CalledProcessError if the command exits non-zero

Summary confidence: high

This dataset contains 3,222 rows describing U.S. county-level poverty, with three columns: a FIPS code, a county name, and a poverty rate. Each row is a unique county (3,222 unique FIPS codes and county names), so the analytical signal lives in the poverty_rate column. Poverty rates range from 1.6% to 66.32% with a mean of 15.1% and a median of 13.55%, and the distribution is right-skewed (skew ≈ 2.10) with 137 outliers on the high end. The county_name top words indicate which states contribute the most rows: Texas (256), Virginia (189), and Georgia (159) lead, although these are word counts, so "Virginia" likely also picks up West Virginia entries. Start by examining the shape of poverty_rate and which states the high-poverty outliers cluster in.

citing: row_count · column_count · columns.poverty_rate.stats · columns.county_name.top_words · columns.fips.n_unique

Out[4]:

saturn.schema() · 3 columns

column	kind	n	null%	unique	alerts
fips	text	3,222	0.0%	3,222	near_unique one_word allcaps short_text
county_name	text	3,222	0.0%	3,222	near_unique
poverty_rate	numeric	3,222	0.0%	1,719	high_skew

Fig 1.

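poverty_rate · Look for the right-skewed tail above roughly 29.5% where the 137 high-poverty outlier counties sit (the histogram has 145 counties in bins from 29.11 upward and 122 from 30.72 upward, so the cutoff falls between those edges).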


Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1
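
The bin counts above can be checked against the reported row count and outlier count. The following is a minimal sketch, not part of Saturn's output: it copies the 40 counts from the table and assumes equal-width bins between the printed minimum (1.6) and maximum (66.32), which agrees with the printed edges. The helper name tail_count is illustrative.

```python
# Counts copied from the poverty_rate histogram table (40 bins).
counts = [7, 34, 106, 246, 320, 354, 393, 364, 306, 262,
          192, 149, 123, 91, 52, 44, 34, 23, 18, 14,
          6, 8, 3, 8, 5, 9, 4, 11, 7, 8,
          2, 6, 5, 5, 1, 0, 0, 0, 1, 1]

lo, hi = 1.6, 66.32                    # printed min and max
width = (hi - lo) / len(counts)        # assumption: equal-width bins
edges = [lo + i * width for i in range(len(counts) + 1)]

total = sum(counts)


def tail_count(start_bin):
    """Number of counties in the bin at start_bin and every bin above it."""
    return sum(counts[start_bin:])


print("rows in histogram:", total)
print("from edge", round(edges[17], 2), "upward:", tail_count(17))
print("from edge", round(edges[18], 2), "upward:", tail_count(18))
```

The counts sum to 3,222, matching the row count. The bins from about 29.11 upward hold 145 counties and those from about 30.72 upward hold 122, so a cutoff that yields 137 high outliers must fall between those two edges.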

Fig 2.

county_name · Top words show which states dominate the county list, led by Texas, Virginia, and Georgia.



Fig 3.

county_name · Name length is tight (median 24 chars) — useful as a sanity check that entries follow a consistent 'X County, State' format.


Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

Fig 4.

Per-column null rate across the corpus. Columns are ordered by input position.


Per-column null rate across the corpus.
column	kind	null %
fips	text	0.0%
county_name	text	0.0%
poverty_rate	numeric	0.0%

fips text identifier

This column holds 5-character FIPS codes that uniquely identify each of the 3,222 rows (n_unique equals n, null rate 0). Every value is exactly 5 characters and a single token, with zero duplicates or empties; the allcaps alert likely reflects digit-only strings rather than uppercase text. Sample values such as 01001, 01003, and 01005 match the standard US county FIPS encoding (a 2-digit state code followed by a 3-digit county code).

Treatment: Treat as a county-level key; left-join on this id and exclude from modelling features.

anthropic:claude-opus-4-7 · confidence high
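
One practical detail when using fips as a join key: the sample values (01001, 01003, 01005) carry leading zeros, which are lost if the column is ever read as an integer. The sketch below, using only the Python standard library, shows a zero-padding normalizer and a dictionary-based left join. The right-hand table and its placeholder payloads are hypothetical; only the three FIPS values come from this report.

```python
def normalize_fips(value):
    """Return a 5-character county FIPS string, zero-padded on the left."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_fips(fips):
    """Split a 5-character FIPS code into (state_code, county_code)."""
    return fips[:2], fips[2:]


# Left table: FIPS values quoted as samples in the report.
left = [{"fips": "01001"}, {"fips": "01003"}, {"fips": "01005"}]

# Hypothetical right table whose integer keys have lost the leading zero.
right_raw = {1001: "value-a", 1005: "value-b"}  # placeholder payloads

# Normalize the right-hand keys, then left-join: every left row is kept,
# and rows without a match get None.
right = {normalize_fips(key): payload for key, payload in right_raw.items()}
joined = [{**row, "extra": right.get(row["fips"])} for row in left]

for row in joined:
    print(row["fips"], split_fips(row["fips"]), row["extra"])
```

Keeping the key as text on both sides avoids silent mismatches such as "01001" versus 1001.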

Out[10]:

saturn.columns["fips"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	5
len_max	5
len_mean	5
len_median	5
len_p95	5
word_mean	1
word_median	1
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	3,222
readability_flesch_mean	121.2
emoji_rate	0
url_rate	0
one_word_rate	1
allcaps_rate	1
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
alert: one_word	100.0% rows are a single word
alert: allcaps	100.0% rows are all-caps
alert: short_text	95th-percentile length under 20 chars

Fig 5.

Character-length distribution for fips.

Character-length distribution for fips (mean: 5.0).
chars	count
5	3222

county_name text identifier

This is a county identifier string, likely formatted as "X County, State": the token "county," appears in 2,999 of 3,222 rows, and Texas (256), Virginia (189), and Georgia (159) lead the state mentions. Every one of the 3,222 values is unique with zero nulls or duplicates, and lengths cluster around the median (min 16, median 24, p95 31, max 59), consistent with a clean US county roster. The 223 rows lacking the "county," token are worth checking; they are likely parishes, boroughs, or independent cities.

Treatment: Split into county and state fields and use as a join key rather than a model feature.

anthropic:claude-opus-4-7 · confidence high
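
A minimal sketch of the suggested split, assuming the "X County, State" pattern holds and that the state follows the last comma. The example names are well-known US counties and county equivalents chosen for illustration, not rows read from this dataset, and both helper names are hypothetical.

```python
def split_county_name(name):
    """Split 'X County, State' into (area, state) on the last comma."""
    area, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma found in {name!r}")
    return area.strip(), state.strip()


def lacks_county_token(name):
    """True when the area part does not contain the word 'County'."""
    area, _state = split_county_name(name)
    return "county" not in area.lower().split()


# Illustrative names only (not taken from the profiled rows).
examples = [
    "Autauga County, Alabama",
    "Orleans Parish, Louisiana",
    "Anchorage Municipality, Alaska",
]

for name in examples:
    area, state = split_county_name(name)
    flag = "check" if lacks_county_token(name) else "ok"
    print(f"{area!r:28} {state!r:12} {flag}")
```

Rows flagged by lacks_county_token correspond to the group the profile suggests reviewing before the split is trusted as a join key.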

Out[13]:

saturn.columns["county_name"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.32
len_median	24
len_p95	31
word_mean	3.248
word_median	3
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,990
readability_flesch_mean	10.28
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 6.

Character-length distribution for county_name.



poverty_rate numeric feature

This is a county-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55 and an IQR of 7.75. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with 137 high outliers (4.25%), reflecting pockets of severe poverty well above the typical 10–18% band. There are no nulls or zeros. The 1,719 unique values across 3,222 rows mean some counties share a rate; the one-record-per-county structure follows from the unique FIPS codes, not from this column.

Treatment: Apply a log or sqrt transform before regression to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
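
To see how the suggested transforms might affect skew, the sketch below rebuilds an approximate distribution from the poverty_rate histogram in this report (bin midpoints weighted by bin counts, assuming 40 equal-width bins from 1.6 to 66.32) and compares population skewness before and after sqrt and log. This is an approximation from binned data, not a computation on the raw rows, and weighted_skew is an illustrative helper. A plain log is usable here because the column has no zeros and its minimum is 1.6.

```python
import math

# Counts copied from the poverty_rate histogram table (40 bins).
counts = [7, 34, 106, 246, 320, 354, 393, 364, 306, 262,
          192, 149, 123, 91, 52, 44, 34, 23, 18, 14,
          6, 8, 3, 8, 5, 9, 4, 11, 7, 8,
          2, 6, 5, 5, 1, 0, 0, 0, 1, 1]

lo, hi = 1.6, 66.32
width = (hi - lo) / len(counts)        # assumption: equal-width bins
midpoints = [lo + (i + 0.5) * width for i in range(len(counts))]


def weighted_skew(values, weights):
    """Population skewness (third standardized moment) of weighted values."""
    n = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / n
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / n
    m3 = sum(w * (v - mean) ** 3 for v, w in zip(values, weights)) / n
    return m3 / m2 ** 1.5


skew_raw = weighted_skew(midpoints, counts)
skew_sqrt = weighted_skew([math.sqrt(x) for x in midpoints], counts)
skew_log = weighted_skew([math.log(x) for x in midpoints], counts)

print(f"raw:  {skew_raw:.2f}")
print(f"sqrt: {skew_sqrt:.2f}")
print(f"log:  {skew_log:.2f}")
```

Both transforms are concave, so each lowers the skew relative to the raw values, with log lowering it more than sqrt. Whether the transformed variable suits a particular regression is a separate modelling question.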

Out[16]:

saturn.columns["poverty_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	1,719
min	1.6
max	66.32
mean	15.1
median	13.55
std	7.706
q1	10.16
q3	17.91
iqr	7.75
skew	2.096
kurtosis	6.891
n_outliers	137
outlier_rate	0.04252
zero_rate	0
alert: high_skew	skew=+2.10
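
The report does not state how n_outliers is computed. A common convention is Tukey's rule, which flags values more than 1.5 × IQR beyond the quartiles; the sketch below applies that rule to the quartiles in the table, purely as an assumption, to see where such fences would fall.

```python
# Quartiles and counts copied from the stats table above.
q1, q3 = 10.16, 17.91
n, n_outliers = 3222, 137

k = 1.5  # conventional Tukey multiplier; assumed, not stated in the report

iqr = q3 - q1
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr
outlier_rate = n_outliers / n

print(f"iqr:          {iqr:.2f}")
print(f"lower fence:  {lower_fence:.3f}")
print(f"upper fence:  {upper_fence:.3f}")
print(f"outlier rate: {outlier_rate:.5f}")
```

Under that assumption the upper fence is about 29.5 and the lower fence is negative, below the observed minimum of 1.6, so every flagged value would sit in the upper tail. The ratio 137 / 3,222 reproduces the reported outlier_rate of 0.04252.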

Fig 7.

Distribution of poverty_rate. Vertical dash marks the median.



How to cite


BibTeX

@misc{saturn-healthcare-data-poverty-data-20260121-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: healthcare data poverty data 20260121},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/healthcare_data-poverty_data_20260121}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: healthcare data poverty data 20260121. Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/poverty_data_20260121.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/healthcare_data-poverty_data_20260121