saturn·

healthcare data county health rankings 20260121

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/county_health_rankings_20260121.parquet

Saturn profiled 3,222 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
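!pip install saturn-dissect
import subprocess

# Run the saturn CLI against the source parquet file.
source = "/home/coolhand/html/datavis/data_trove/cache/healthcare_data/county_health_rankings_20260121.parquet"
subprocess.run(
    [
        "saturn", "analyze", source,
        "--findings", "healthcare_data-county_health_rankings_20260121.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the CLI exits with a non-zero status
)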

Summary confidence: high

This dataset covers 3,222 U.S. counties (one row per FIPS code) with population totals, uninsured counts, and uninsured rates. Both total_pop and uninsured_pop are extremely right-skewed (skew 13.4 and 17.8) with 453 and 368 outliers respectively, indicating that a handful of very large counties dominate the raw counts; analysts should work in per-capita or log space. The uninsured_rate is the more comparable metric: its median is 0.12, about 17.5% of counties report zero, and a long tail reaches 3.7, which warrants a data-quality check. The county_name field shows Texas, Virginia, and Georgia contributing the most counties, useful context for any state-level rollups.

citing: row_count · column_count · columns.total_pop.stats.skew · columns.total_pop.stats.median · columns.total_pop.stats.max · columns.total_pop.stats.n_outliers · columns.uninsured_pop.stats.skew · columns.uninsured_pop.stats.zero_rate · columns.uninsured_rate.stats.median · columns.uninsured_rate.stats.max · columns.uninsured_rate.stats.zero_rate · columns.county_name.top_words
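
The dotted names in the citing line read like paths into the findings document. The sketch below shows one way such a path could be resolved, assuming the findings JSON is a nested mapping that mirrors those paths; that layout is an inference, not a documented schema. The `findings` dict is hand-built from values reported in this notebook and `resolve` is a hypothetical helper, not part of saturn.

```python
from functools import reduce

# Hand-built stand-in for the findings file. The nesting is an assumption
# inferred from the dotted paths in the "citing" line.
findings = {
    "row_count": 3222,
    "column_count": 5,
    "columns": {
        "total_pop": {"stats": {"skew": 13.38, "median": 25328, "n_outliers": 453}},
        "uninsured_rate": {"stats": {"median": 0.12, "max": 3.7, "zero_rate": 0.1754}},
    },
}


def resolve(doc, dotted_path):
    """Walk a nested dict along a dot-separated path (hypothetical helper)."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), doc)


for path in (
    "row_count",
    "columns.total_pop.stats.skew",
    "columns.uninsured_rate.stats.max",
):
    print(path, "=", resolve(findings, path))
```

A missing key raises `KeyError`, which is probably the desired behaviour when checking that every cited statistic actually exists.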

Out[4]:

saturn.schema() · 5 columns

column kind n null% unique alerts
fips text 3,222 0.0% 3,222 near_unique one_word allcaps short_text
county_name text 3,222 0.0% 3,222 near_unique
total_pop numeric 3,222 0.0% 3,141 high_skew outliers
uninsured_pop numeric 3,222 0.0% 584 high_skew outliers
uninsured_rate numeric 3,222 0.0% 152 high_skew outliers
Fig 1.
total_pop · Look for the extreme right skew — most counties are small but a few exceed a million residents.
Histogram bins for total_pop (median: 25328.0).
bin                      count
47 – 2.467e+05           2942
2.467e+05 – 4.934e+05    137
4.934e+05 – 7.4e+05      56
7.4e+05 – 9.867e+05      39
9.867e+05 – 1.233e+06    13
1.233e+06 – 1.48e+06     9
1.48e+06 – 1.727e+06     7
1.727e+06 – 1.973e+06    3
1.973e+06 – 2.22e+06     3
2.22e+06 – 2.467e+06     4
2.467e+06 – 2.713e+06    3
2.713e+06 – 2.96e+06     0
2.96e+06 – 3.207e+06     2
3.207e+06 – 3.453e+06    0
3.453e+06 – 3.7e+06      0
3.7e+06 – 3.947e+06      0
3.947e+06 – 4.193e+06    0
4.193e+06 – 4.44e+06     1
4.44e+06 – 4.687e+06     0
4.687e+06 – 4.933e+06    1
4.933e+06 – 5.18e+06     0
5.18e+06 – 5.427e+06     1
5.427e+06 – 5.673e+06    0
5.673e+06 – 5.92e+06     0
5.92e+06 – 6.167e+06     0
6.167e+06 – 6.413e+06    0
6.413e+06 – 6.66e+06     0
6.66e+06 – 6.907e+06     0
6.907e+06 – 7.153e+06    0
7.153e+06 – 7.4e+06      0
7.4e+06 – 7.647e+06      0
7.647e+06 – 7.893e+06    0
7.893e+06 – 8.14e+06     0
8.14e+06 – 8.387e+06     0
8.387e+06 – 8.633e+06    0
8.633e+06 – 8.88e+06     0
8.88e+06 – 9.127e+06     0
9.127e+06 – 9.373e+06    0
9.373e+06 – 9.62e+06     0
9.62e+06 – 9.867e+06     1
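
The bin labels for total_pop look like 40 equal-width bins between the column minimum (47) and maximum (9,866,623). That is an inference from the edges, not something the report states. A minimal sketch under that assumption follows; `equal_width_edges` and `histogram` are illustrative names, not saturn functions.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return the n_bins + 1 edges of equal-width bins spanning [lo, hi]."""
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def histogram(values, lo, hi, n_bins):
    """Count values per equal-width bin; the maximum falls in the last bin."""
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


edges = equal_width_edges(47, 9_866_623, 40)
# Format a few edges with four significant digits, as the table labels appear to be.
print([f"{e:.4g}" for e in edges[:4]])
```

With this many bins on a heavily skewed column, most bins are empty, which is one reason a log-scaled axis is often easier to read.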
Fig 2.
uninsured_rate · Check the distribution around the 0.12 median and the tail of high-rate outliers up to 3.7.
Histogram bins for uninsured_rate (median: 0.12).
bin                count
0 – 0.0925         1403
0.0925 – 0.185     704
0.185 – 0.2775     403
0.2775 – 0.37      213
0.37 – 0.4625      158
0.4625 – 0.555     101
0.555 – 0.6475     65
0.6475 – 0.74      43
0.74 – 0.8325      27
0.8325 – 0.925     23
0.925 – 1.018      9
1.018 – 1.11       15
1.11 – 1.202       14
1.202 – 1.295      5
1.295 – 1.387      7
1.387 – 1.48       7
1.48 – 1.573       5
1.573 – 1.665      2
1.665 – 1.758      4
1.758 – 1.85       1
1.85 – 1.942       1
1.942 – 2.035      1
2.035 – 2.127      2
2.127 – 2.22       2
2.22 – 2.312       1
2.312 – 2.405      0
2.405 – 2.498      0
2.498 – 2.59       1
2.59 – 2.683       0
2.683 – 2.775      1
2.775 – 2.868      0
2.868 – 2.96       1
2.96 – 3.052       1
3.052 – 3.145      0
3.145 – 3.237      1
3.237 – 3.33       0
3.33 – 3.422       0
3.422 – 3.515      0
3.515 – 3.607      0
3.607 – 3.7        1
Fig 3.
uninsured_pop · Note how raw uninsured counts concentrate near zero with a long tail driven by populous counties.
Histogram bins for uninsured_pop (median: 36.0).
bin                      count
0 – 522.9                3022
522.9 – 1046             124
1046 – 1569              32
1569 – 2092              16
2092 – 2614              7
2614 – 3137              5
3137 – 3660              5
3660 – 4183              2
4183 – 4706              0
4706 – 5229              1
5229 – 5752              2
5752 – 6274              1
6274 – 6797              0
6797 – 7320              0
7320 – 7843              0
7843 – 8366              1
8366 – 8889              1
8889 – 9412              0
9412 – 9935              0
9935 – 1.046e+04         0
1.046e+04 – 1.098e+04    0
1.098e+04 – 1.15e+04     2
1.15e+04 – 1.203e+04     0
1.203e+04 – 1.255e+04    0
1.255e+04 – 1.307e+04    0
1.307e+04 – 1.359e+04    0
1.359e+04 – 1.412e+04    0
1.412e+04 – 1.464e+04    0
1.464e+04 – 1.516e+04    0
1.516e+04 – 1.569e+04    0
1.569e+04 – 1.621e+04    0
1.621e+04 – 1.673e+04    0
1.673e+04 – 1.725e+04    0
1.725e+04 – 1.778e+04    0
1.778e+04 – 1.83e+04     0
1.83e+04 – 1.882e+04     0
1.882e+04 – 1.935e+04    0
1.935e+04 – 1.987e+04    0
1.987e+04 – 2.039e+04    0
2.039e+04 – 2.092e+04    1
Fig 4.
county_name · Top words reveal which states contribute the most counties, led by Texas, Virginia, and Georgia.
Character-length distribution for county_name (mean: 24.324022346368714).
chars      count
16 – 17    26
17 – 18    72
18 – 19    121
19 – 20    190
20 – 21    264
21 – 22    407
22 – 24    420
24 – 25    363
25 – 26    320
26 – 27    240
27 – 28    231
28 – 29    152
29 – 30    139
30 – 31    165
31 – 32    41
32 – 33    28
33 – 34    16
34 – 35    10
35 – 36    5
36 – 38    0
38 – 39    1
39 – 40    1
40 – 41    0
41 – 42    1
42 – 43    1
43 – 44    0
44 – 45    2
45 – 46    0
46 – 47    1
47 – 48    1
48 – 49    0
49 – 50    0
50 – 51    0
51 – 53    0
53 – 54    2
54 – 55    1
55 – 56    0
56 – 57    0
57 – 58    0
58 – 59    1
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column          kind     null %
fips            text     0.0%
county_name     text     0.0%
total_pop       numeric  0.0%
uninsured_pop   numeric  0.0%
uninsured_rate  numeric  0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
                total_pop  uninsured_pop  uninsured_rate
total_pop       +1.00      +0.81          -0.05
uninsured_pop   +0.81      +1.00          +0.12
uninsured_rate  -0.05      +0.12          +1.00
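
For readers who want to reproduce one cell of the matrix, here is a minimal sketch of the Pearson coefficient in plain Python. The toy values are made up for illustration and are not drawn from the dataset. They show the pattern the matrix suggests: raw counts track population, while a rate divides that scale out.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up toy columns (not from the dataset).
total_pop = [1_000, 5_000, 20_000, 100_000, 1_000_000]
uninsured_pop = [2, 6, 30, 150, 1_400]
uninsured_rate = [100 * u / p for u, p in zip(uninsured_pop, total_pop)]

print(round(pearson(total_pop, uninsured_pop), 2))
print(round(pearson(total_pop, uninsured_rate), 2))
```

Because Pearson correlation is sensitive to extreme values, a few very large counties can dominate the coefficient between the two count columns.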

fips text identifier

This column is a 5-character FIPS code identifying U.S. counties, with every one of the 3222 rows holding a unique value (n_unique equals n) and zero nulls. Lengths are uniformly 5 (min, median, max all 5), values are single tokens (one_word_rate 1.0), and the leading samples like 01001, 01003, 01005 match Alabama county FIPS prefixes. It functions as a primary key rather than a feature.

Treatment: Use as a join key to county-level reference tables; do not feed into a model as a feature.

anthropic:claude-opus-4-7 · confidence high
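
A common hazard with FIPS join keys is losing the leading zero when the code is read as an integer (01001 becomes 1001). A minimal sketch of normalising keys before a join follows; a county FIPS code is a 2-digit state code followed by a 3-digit county code. The helper names are hypothetical, and the one-entry lookup table is for illustration only.

```python
def normalize_fips(value):
    """Return a 5-character, zero-padded county FIPS string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_fips(value):
    """Split a county FIPS code into (state_code, county_code)."""
    fips = normalize_fips(value)
    return fips[:2], fips[2:]


# Illustrative one-entry reference table keyed by state code.
state_names = {"01": "Alabama"}

for raw in (1001, "01003", " 01005 "):
    state_code, county_code = split_fips(raw)
    print(normalize_fips(raw), state_names.get(state_code), county_code)
```

Keeping the column as text end to end avoids the problem entirely; the normaliser is mainly useful when a reference table has already been parsed numerically.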
Out[12]:

saturn.columns["fips"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
len_min 5
len_max 5
len_mean 5
len_median 5
len_p95 5
word_mean 1
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 3,222
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
alert: one_word 100.0% rows are a single word
alert: allcaps 100.0% rows are all-caps
alert: short_text 95th-percentile length under 20 chars
Fig 7.
Character-length distribution for fips.
Character-length distribution for fips (mean: 5.0).
chars  count
5      3222
All 3,222 values are exactly 5 characters long; the other 39 bins, spanning lengths 4 to 6, are empty.

county_name text identifier

This column holds U.S. county identifiers, likely formatted as 'County Name County, State' given that 'county,' appears in 2,999 of 3,222 rows and state names like Texas (256), Virginia (189), and Georgia (159) dominate the top tokens. Every one of the 3,222 rows is unique with zero nulls and zero duplicates, consistent with a canonical roster of U.S. counties. String lengths cluster tightly (min 16, median 24, max 59) and average 3.25 words, so formatting is highly regular.

Treatment: Use as a join key to state/county references; do not feed raw into a model.

anthropic:claude-opus-4-7 · confidence high
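
For state-level rollups, the name can be split on its last comma. This is a sketch assuming the 'County Name County, State' layout suggested above; rows that do not follow it (only 2,999 of 3,222 contain 'county,') would need separate handling. The sample strings are illustrative values in the assumed format, and `split_county_name` is a hypothetical helper.

```python
from collections import Counter


def split_county_name(name):
    """Split 'Place, State' on the last comma; state is None if no comma."""
    place, sep, state = name.rpartition(", ")
    if not sep:
        return name.strip(), None
    return place.strip(), state.strip()


# Illustrative strings in the assumed format.
names = [
    "Autauga County, Alabama",
    "Baldwin County, Alabama",
    "Harris County, Texas",
]

state_counts = Counter(split_county_name(n)[1] for n in names)
print(state_counts.most_common())
```

Splitting on the last comma rather than the first keeps place names that themselves contain a comma intact.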
Out[15]:

saturn.columns["county_name"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
len_min 16
len_max 59
len_mean 24.32
len_median 24
len_p95 31
word_mean 3.248
word_median 3
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 1,990
readability_flesch_mean 10.28
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
Fig 8.
Character-length distribution for county_name.

total_pop numeric feature

Likely a county- or region-level total population count: 3222 rows with 3141 unique values, no nulls, integer-scale magnitudes from 47 up to 9,866,623. The distribution is extremely right-skewed (skew 13.38, kurtosis 298.69) with median 25,328 far below mean 102,232, and 14.06% of rows flagged as outliers. The std of 326,933 dwarfs the IQR of 54,579, consistent with a few massive metros pulling the tail.

Treatment: Log-transform before regression to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
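
To see what the suggested log transform does to the scale, the sketch below applies log10 to the minimum, median, and maximum reported for total_pop. It only illustrates the compression of range and says nothing about which base or transform a downstream model should use.

```python
import math

# Summary values reported for total_pop in this notebook.
reported = {"min": 47, "median": 25_328, "max": 9_866_623}

log10_scale = {name: math.log10(value) for name, value in reported.items()}

for name, value in log10_scale.items():
    print(f"{name}: raw={reported[name]:>9,}  log10={value:.2f}")

# On the raw scale the maximum is hundreds of times the median;
# on the log10 scale the whole column spans only a few units.
print("raw max/median ratio:", round(reported["max"] / reported["median"]))
```

Since the minimum is 47, every value is strictly positive and a plain logarithm is defined for all rows.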
Out[18]:

saturn.columns["total_pop"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,141
min 47
max 9.867e+06
mean 1.022e+05
median 25,328
std 3.269e+05
q1 1.061e+04
q3 65,190
iqr 5.458e+04
skew 13.38
kurtosis 298.7
n_outliers 453
outlier_rate 0.1406
zero_rate 0
alert: high_skew skew=+13.38
alert: outliers 14.1% rows beyond 1.5 IQR
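
The alert wording ("rows beyond 1.5 IQR") reads like Tukey's fences. Whether saturn applies exactly this rule is an assumption. Under that assumption, the fences for total_pop can be recomputed from the reported quartiles; q1 is shown rounded as 1.061e+04, so the result is approximate.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (low, high) fences at k * IQR outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count values falling outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < low or v > high)


# Quartiles as reported for total_pop (q1 is a rounded value).
low, high = tukey_fences(10_610, 65_190)
print(low, high)
```

The lower fence is negative, so with a minimum of 47 every flagged row must sit above the upper fence of roughly 147,000 residents.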
Fig 9.
Distribution of total_pop. Vertical dash marks the median.

uninsured_pop numeric feature

Counts of uninsured people per record, likely aggregated to a geographic unit (the 3,222 rows hint at U.S. counties). The distribution is extremely right-skewed: the median is 36, but the mean is 159.9 and the max reaches 20,915, with skew 17.8 and kurtosis 462.9. Roughly 17.2% of rows are zero and 11.4% are flagged as outliers, so a handful of large jurisdictions dominate the totals.

Treatment: Apply a log1p transform and consider normalising by population before modelling.

anthropic:claude-opus-4-7 · confidence high
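
A minimal sketch of the two suggested steps, using only the standard library. `log1p` is used rather than a plain logarithm because about 17% of the counts are zero. The five input counts are the min, q1, median, q3, and max reported for uninsured_pop. Pairing the median count with the median population is purely illustrative, since the two medians need not belong to the same county.

```python
import math


def per_100k(uninsured, population):
    """Uninsured people per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return uninsured / population * 100_000


def log1p_all(counts):
    """Apply log(1 + x) to each count; zero stays zero."""
    return [math.log1p(c) for c in counts]


# Five-number summary reported for uninsured_pop.
summary_counts = [0, 7, 36, 120, 20_915]
print([round(v, 2) for v in log1p_all(summary_counts)])

# Illustrative pairing of the two reported medians.
print(round(per_100k(36, 25_328), 1))
```

Normalising first and transforming second is usually the safer order, since the transform of a raw count still carries the population scale.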
Out[21]:

saturn.columns["uninsured_pop"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 584
min 0
max 20,915
mean 159.9
median 36
std 627.2
q1 7
q3 120
iqr 113
skew 17.81
kurtosis 462.9
n_outliers 368
outlier_rate 0.1142
zero_rate 0.1723
alert: high_skew skew=+17.81
alert: outliers 11.4% rows beyond 1.5 IQR
Fig 10.
Distribution of uninsured_pop. Vertical dash marks the median.

uninsured_rate numeric feature

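This looks like a per-record uninsured rate, ranging from 0.0 to 3.7 with a median of 0.12 and an IQR of 0.21. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.14%) and 17.54% exact zeros. The max of 3.7 is implausible if this is meant to be a proportion bounded at 1. If the column is instead expressed in percent, 3.7 is within range; the medians of uninsured_pop (36) and total_pop (25,328) give a ratio of about 0.14%, which is at least compatible with that reading.

Treatment: Confirm the unit first (proportion in [0,1] versus percent). If it is a proportion, validate the >1 values against the expected range, then log- or logit-transform after winsorising before modelling.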

anthropic:claude-opus-4-7 · confidence high
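
Whether 3.7 is out of range depends on the unit of the column. One way to check is to recompute the rate from the two count columns and see which scale matches the reported values better. The sketch below does that on made-up rows; `infer_rate_unit` is a hypothetical helper, and it assumes the rate was derived from uninsured_pop and total_pop, which the source does not confirm.

```python
def infer_rate_unit(rows):
    """Guess whether reported rates are proportions or percents.

    rows: iterable of (total_pop, uninsured_pop, reported_rate) tuples.
    Compares each reported rate with uninsured/total on both scales and
    returns the scale with the smaller total absolute error.
    """
    err_proportion = 0.0
    err_percent = 0.0
    used = 0
    for total, uninsured, reported in rows:
        if total <= 0:
            continue  # cannot recompute a rate without a population
        share = uninsured / total
        err_proportion += abs(reported - share)
        err_percent += abs(reported - 100 * share)
        used += 1
    if used == 0:
        raise ValueError("no rows with a positive population")
    return "percent" if err_percent < err_proportion else "proportion"


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    (25_328, 36, 0.14),
    (100_000, 250, 0.25),
    (5_000, 0, 0.00),
    (2_000, 74, 3.70),
]
print(infer_rate_unit(sample_rows))
```

If neither scale fits the real rows well, the rate probably comes from a different denominator, and the >1 values should be traced back to the source.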
Out[24]:

saturn.columns["uninsured_rate"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 152
min 0
max 3.7
mean 0.2002
median 0.12
std 0.2829
q1 0.04
q3 0.25
iqr 0.21
skew 4.095
kurtosis 27.7
n_outliers 230
outlier_rate 0.07138
zero_rate 0.1754
alert: high_skew skew=+4.10
alert: outliers 7.1% rows beyond 1.5 IQR
Fig 11.
Distribution of uninsured_rate. Vertical dash marks the median.

How to cite

BibTeX
@misc{saturn-healthcare-data-county-health-rankings-20260121-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: healthcare data county health rankings 20260121},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/healthcare_data-county_health_rankings_20260121}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: healthcare data county health rankings 20260121. Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/county_health_rankings_20260121.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/healthcare_data-county_health_rankings_20260121