saturn·

county health rankings

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet

Saturn profiled 3,222 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
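# IPython shell magic; outside a notebook, run `pip install saturn-dissect` in a terminal.
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the parquet file (flags exactly as in the generated report).
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet",
    "--findings", "county_health_rankings.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # check=True raises CalledProcessError if the command exits non-zero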

Summary confidence: high

This dataset contains 3,222 rows of US county-level health data. Each row is identified by a unique county name and FIPS code and carries three numeric measures: total population, uninsured population, and uninsured rate. The population fields are extremely right-skewed: total_pop ranges from 47 to nearly 9.87 million with a median of 25,328, and uninsured_pop shows similar skew (median 36, max 20,915), so a few large counties dominate. The uninsured_rate is the most analytically interesting field: it has a median of 0.12 but stretches up to 3.7, and about 17.5% of counties report exactly zero, which suggests either small-county edge cases or data-quality issues worth investigating. Start by examining the distribution of uninsured_rate and how it relates to total_pop.

citing: row_count · columns.total_pop.stats · columns.uninsured_pop.stats · columns.uninsured_rate.stats · columns.county_name.top_words
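
One way to act on the suggestion to relate uninsured_rate to total_pop is to bucket counties by population and compare the median rate per bucket. The sketch below is a minimal illustration using only the standard library. The five rows and the bucket edges are made up for the example (they are not taken from the dataset), and `median_rate_by_bucket` is a hypothetical helper name.

```python
from statistics import median

# Hypothetical sample rows: (total_pop, uninsured_rate). Not real data.
rows = [
    (47, 0.0),
    (8_500, 0.31),
    (25_328, 0.12),
    (410_000, 0.09),
    (9_866_623, 0.05),
]

# Illustrative bucket edges (assumption): small, mid-sized, large counties.
EDGES = [
    (0, 10_000, "under 10k"),
    (10_000, 250_000, "10k to 250k"),
    (250_000, float("inf"), "250k and up"),
]


def median_rate_by_bucket(rows):
    """Group rates by population bucket and return {label: median rate}."""
    grouped = {label: [] for _, _, label in EDGES}
    for pop, rate in rows:
        for lo, hi, label in EDGES:
            if lo <= pop < hi:
                grouped[label].append(rate)
                break
    # Skip empty buckets so median() never sees an empty list.
    return {label: median(vals) for label, vals in grouped.items() if vals}


summary = median_rate_by_bucket(rows)
for label, value in summary.items():
    print(f"{label:>12}: median rate {value:.3f}")
```

On the real table the same grouping would typically be done on a DataFrame. The point of the bucketing is that medians per size class are not dominated by the handful of very large counties.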

Out[4]:

saturn.schema() · 5 columns

column          kind     n      null%  unique  alerts
fips            numeric  3,222  0.0%   3,222
county_name     text     3,222  0.0%   3,222   near_unique
total_pop       numeric  3,222  0.0%   3,141   high_skew outliers
uninsured_pop   numeric  3,222  0.0%   584     high_skew outliers
uninsured_rate  numeric  3,222  0.0%   152     high_skew outliers
Fig 1.
total_pop · Heavy right skew: 2,942 of 3,222 counties (91%) fall in the first bin, below about 246,700 residents, while only six counties exceed 2.96 million.
Histogram bins for total_pop (median: 25328.0).
bin                     count
47 – 2.467e+05          2942
2.467e+05 – 4.934e+05   137
4.934e+05 – 7.4e+05     56
7.4e+05 – 9.867e+05     39
9.867e+05 – 1.233e+06   13
1.233e+06 – 1.48e+06    9
1.48e+06 – 1.727e+06    7
1.727e+06 – 1.973e+06   3
1.973e+06 – 2.22e+06    3
2.22e+06 – 2.467e+06    4
2.467e+06 – 2.713e+06   3
2.713e+06 – 2.96e+06    0
2.96e+06 – 3.207e+06    2
3.207e+06 – 3.453e+06   0
3.453e+06 – 3.7e+06     0
3.7e+06 – 3.947e+06     0
3.947e+06 – 4.193e+06   0
4.193e+06 – 4.44e+06    1
4.44e+06 – 4.687e+06    0
4.687e+06 – 4.933e+06   1
4.933e+06 – 5.18e+06    0
5.18e+06 – 5.427e+06    1
5.427e+06 – 5.673e+06   0
5.673e+06 – 5.92e+06    0
5.92e+06 – 6.167e+06    0
6.167e+06 – 6.413e+06   0
6.413e+06 – 6.66e+06    0
6.66e+06 – 6.907e+06    0
6.907e+06 – 7.153e+06   0
7.153e+06 – 7.4e+06     0
7.4e+06 – 7.647e+06     0
7.647e+06 – 7.893e+06   0
7.893e+06 – 8.14e+06    0
8.14e+06 – 8.387e+06    0
8.387e+06 – 8.633e+06   0
8.633e+06 – 8.88e+06    0
8.88e+06 – 9.127e+06    0
9.127e+06 – 9.373e+06   0
9.373e+06 – 9.62e+06    0
9.62e+06 – 9.867e+06    1
Fig 2.
uninsured_rate · About 17.5% of counties sit at exactly zero, and a long tail extends past 1.0 (73 counties fall in bins starting at 1.018 or higher), which warrants a unit and data-quality check.
Histogram bins for uninsured_rate (median: 0.12).
bin               count
0 – 0.0925        1403
0.0925 – 0.185    704
0.185 – 0.2775    403
0.2775 – 0.37     213
0.37 – 0.4625     158
0.4625 – 0.555    101
0.555 – 0.6475    65
0.6475 – 0.74     43
0.74 – 0.8325     27
0.8325 – 0.925    23
0.925 – 1.018     9
1.018 – 1.11      15
1.11 – 1.202      14
1.202 – 1.295     5
1.295 – 1.387     7
1.387 – 1.48      7
1.48 – 1.573      5
1.573 – 1.665     2
1.665 – 1.758     4
1.758 – 1.85      1
1.85 – 1.942      1
1.942 – 2.035     1
2.035 – 2.127     2
2.127 – 2.22      2
2.22 – 2.312      1
2.312 – 2.405     0
2.405 – 2.498     0
2.498 – 2.59      1
2.59 – 2.683      0
2.683 – 2.775     1
2.775 – 2.868     0
2.868 – 2.96      1
2.96 – 3.052      1
3.052 – 3.145     0
3.145 – 3.237     1
3.237 – 3.33      0
3.33 – 3.422      0
3.422 – 3.515     0
3.515 – 3.607     0
3.607 – 3.7       1
Fig 3.
uninsured_pop · Highly skewed counts of uninsured residents: the median is just 36, only three counties exceed about 10,980, and a single county sits in the top bin near the maximum of 20,915.
Histogram bins for uninsured_pop (median: 36.0).
bin                     count
0 – 522.9               3022
522.9 – 1046            124
1046 – 1569             32
1569 – 2092             16
2092 – 2614             7
2614 – 3137             5
3137 – 3660             5
3660 – 4183             2
4183 – 4706             0
4706 – 5229             1
5229 – 5752             2
5752 – 6274             1
6274 – 6797             0
6797 – 7320             0
7320 – 7843             0
7843 – 8366             1
8366 – 8889             1
8889 – 9412             0
9412 – 9935             0
9935 – 1.046e+04        0
1.046e+04 – 1.098e+04   0
1.098e+04 – 1.15e+04    2
1.15e+04 – 1.203e+04    0
1.203e+04 – 1.255e+04   0
1.255e+04 – 1.307e+04   0
1.307e+04 – 1.359e+04   0
1.359e+04 – 1.412e+04   0
1.412e+04 – 1.464e+04   0
1.464e+04 – 1.516e+04   0
1.516e+04 – 1.569e+04   0
1.569e+04 – 1.621e+04   0
1.621e+04 – 1.673e+04   0
1.673e+04 – 1.725e+04   0
1.725e+04 – 1.778e+04   0
1.778e+04 – 1.83e+04    0
1.83e+04 – 1.882e+04    0
1.882e+04 – 1.935e+04   0
1.935e+04 – 1.987e+04   0
1.987e+04 – 2.039e+04   0
2.039e+04 – 2.092e+04   1
Fig 4.
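fips · FIPS codes span 1,001 to 72,153 in uneven, state-sized clusters rather than a uniform spread. Bins between roughly 56,140 and 70,370 are empty, and the final bin holds 78 rows, most likely Puerto Rico, whose codes begin with 72, so coverage appears to extend beyond the 50 states and DC.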
Histogram bins for fips (median: 30022.0).
bin                     count
1001 – 2780             97
2780 – 4559             15
4559 – 6337             133
6337 – 8116             59
8116 – 9895             14
9895 – 1.167e+04        4
1.167e+04 – 1.345e+04   226
1.345e+04 – 1.523e+04   5
1.523e+04 – 1.701e+04   49
1.701e+04 – 1.879e+04   189
1.879e+04 – 2.057e+04   204
2.057e+04 – 2.235e+04   184
2.235e+04 – 2.413e+04   39
2.413e+04 – 2.59e+04    15
2.59e+04 – 2.768e+04    170
2.768e+04 – 2.946e+04   196
2.946e+04 – 3.124e+04   150
3.124e+04 – 3.302e+04   27
3.302e+04 – 3.48e+04    21
3.48e+04 – 3.658e+04    95
3.658e+04 – 3.836e+04   153
3.836e+04 – 4.013e+04   155
4.013e+04 – 4.191e+04   46
4.191e+04 – 4.369e+04   67
4.369e+04 – 4.547e+04   51
4.547e+04 – 4.725e+04   161
4.725e+04 – 4.903e+04   268
4.903e+04 – 5.081e+04   29
5.081e+04 – 5.259e+04   133
5.259e+04 – 5.436e+04   94
5.436e+04 – 5.614e+04   95
5.614e+04 – 5.792e+04   0
5.792e+04 – 5.97e+04    0
5.97e+04 – 6.148e+04    0
6.148e+04 – 6.326e+04   0
6.326e+04 – 6.504e+04   0
6.504e+04 – 6.682e+04   0
6.682e+04 – 6.86e+04    0
6.86e+04 – 7.037e+04    0
7.037e+04 – 7.215e+04   78
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column          kind     null %
fips            numeric  0.0%
county_name     text     0.0%
total_pop       numeric  0.0%
uninsured_pop   numeric  0.0%
uninsured_rate  numeric  0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
                fips   total_pop  uninsured_pop  uninsured_rate
fips            +1.00  -0.07      -0.02          +0.01
total_pop       -0.07  +1.00      +0.81          -0.05
uninsured_pop   -0.02  +0.81      +1.00          +0.12
uninsured_rate  +0.01  -0.05      +0.12          +1.00
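
The coefficients above are Pearson correlations, which measure linear association and are sensitive to extreme values in heavily skewed columns such as total_pop and uninsured_pop. As a hedged illustration of the computation (not Saturn's own implementation, which the report does not show), the sketch below computes Pearson's r from its definition on a small made-up sample, once on raw values and once after log1p. The function name `pearson` and the sample values are assumptions for the example.

```python
import math


def pearson(xs, ys):
    """Pearson's r: co-deviation sum divided by the root of the variance sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up (total_pop, uninsured_pop) pairs for illustration only.
total_pop = [47, 3_000, 25_000, 65_000, 400_000, 9_800_000]
uninsured_pop = [0, 9, 36, 120, 300, 20_000]

r_raw = pearson(total_pop, uninsured_pop)
r_log = pearson([math.log1p(v) for v in total_pop],
                [math.log1p(v) for v in uninsured_pop])

print(f"raw r = {r_raw:+.2f}, log1p r = {r_log:+.2f}")
```

On real data the raw and log-scale values can differ noticeably, because a handful of very large counties carry most of the weight in the raw computation.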

fips numeric identifier

This is the U.S. county FIPS code: all 3,222 rows are unique with no nulls, and the value range (1001 to 72153) matches the standard 5-digit state+county encoding once the leading zero dropped by numeric storage is restored (1001 is 01001). The distribution is near-symmetric (skew 0.16) with no statistical outliers, consistent with an identifier rather than a measured quantity.

Treatment: Treat as a categorical key, zero-padded to five characters; left-join on it to bring in county-level attributes rather than feeding it into a model as a number.
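
A minimal sketch of that treatment, assuming the reference table is keyed by the 5-character FIPS string: render the number as a zero-padded string, then left-join so that every profiled row survives even when the reference has no match. The helper `fips_key`, the `reference` lookup, and the population figures are hypothetical and only illustrate the shape of the join.

```python
def fips_key(value):
    """Render a numeric FIPS code as a 5-character string key (1001 -> '01001')."""
    return str(int(value)).zfill(5)


# Rows shaped like the profiled table (population values are illustrative).
counties = [
    {"fips": 1001, "total_pop": 58_000},
    {"fips": 48201, "total_pop": 4_700_000},
    {"fips": 72153, "total_pop": 34_000},
]

# Hypothetical reference table keyed by 5-character FIPS string.
reference = {
    "01001": {"state": "AL"},
    "48201": {"state": "TX"},
}

# Left join: keep every county row; unmatched keys get None for the new field.
joined = []
for row in counties:
    key = fips_key(row["fips"])
    match = reference.get(key, {})
    joined.append({**row, "fips": key, "state": match.get("state")})

for row in joined:
    print(row)
```

Rows that come back with `None` are worth listing explicitly, since they show which codes the reference table does not cover.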

anthropic:claude-opus-4-7 · confidence high
Out[12]:

saturn.columns["fips"].stats

stat            value
n               3,222
nulls           0 (0.0%)
unique          3,222
min             1,001
max             72,153
mean            3.138e+04
median          30,022
std             1.63e+04
q1              1.903e+04
q3              4.61e+04
iqr             27,075
skew            0.1574
kurtosis        -0.6314
n_outliers      0
outlier_rate    0
zero_rate       0
Fig 7.
Distribution of fips. Vertical dash marks the median.
Histogram bins for fips (median: 30022.0): identical to the data table under Fig 4.

county_name text identifier

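This column lists US county names paired with their state (e.g., 'County, Texas'), with all 3,222 values unique and no nulls. The token 'county,' appears 2,999 times, so about 223 rows (3,222 − 2,999) use a different suffix, likely 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, or 'Municipio' in Puerto Rico. State frequencies match expectations, with Texas (256) leading, consistent with Texas having the most counties nationally. The count is two above Texas's 254 counties, which is what a word-level tally would show if it also picked up Texas County in Missouri and in Oklahoma.

Treatment: Split into county and state fields, then left-join on the (county, state) pair to geographic reference tables; where fips is available, it is the safer join key.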
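
The split can be sketched as follows, assuming every value has the shape 'Name Suffix, State' with the state after the last comma. The three strings are examples of that shape rather than rows confirmed to be in the file, and `split_county_name` is a hypothetical helper. Tallying the last word of the county part is one way to check which non-'County' suffixes actually occur.

```python
from collections import Counter


def split_county_name(name):
    """Split 'X County, State' on the last ', ' into (county, state)."""
    county, sep, state = name.rpartition(", ")
    if not sep:  # no comma found: leave the state unknown rather than guess
        return name, None
    return county, state


# Example strings in the 'County, State' shape described above.
names = [
    "Harris County, Texas",
    "Orleans Parish, Louisiana",
    "Anchorage Municipality, Alaska",
]

parts = [split_county_name(n) for n in names]

# Tally the last word of the county part to see which suffixes occur.
suffixes = Counter(county.split()[-1] for county, _ in parts)

print(parts)
print(suffixes)
```

Splitting on the last comma rather than the first keeps any commas inside the county part intact.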

anthropic:claude-opus-4-7 · confidence high
Out[15]:

saturn.columns["county_name"].stats

stat                      value
n                         3,222
nulls                     0 (0.0%)
unique                    3,222
len_min                   16
len_max                   59
len_mean                  24.32
len_median                24
len_p95                   31
word_mean                 3.248
word_median               3
n_empty                   0
n_duplicates              0
duplicate_rate            0
vocab_size                1,990
readability_flesch_mean   10.28
emoji_rate                0
url_rate                  0
one_word_rate             0
allcaps_rate              0
boilerplate_rate          0
alert: near_unique · 100.0% of rows are unique strings
Fig 8.
Character-length distribution for county_name.
Character-length distribution for county_name (mean: 24.32).
chars      count
16 – 17    26
17 – 18    72
18 – 19    121
19 – 20    190
20 – 21    264
21 – 22    407
22 – 24    420
24 – 25    363
25 – 26    320
26 – 27    240
27 – 28    231
28 – 29    152
29 – 30    139
30 – 31    165
31 – 32    41
32 – 33    28
33 – 34    16
34 – 35    10
35 – 36    5
36 – 38    0
38 – 39    1
39 – 40    1
40 – 41    0
41 – 42    1
42 – 43    1
43 – 44    0
44 – 45    2
45 – 46    0
46 – 47    1
47 – 48    1
48 – 49    0
49 – 50    0
50 – 51    0
51 – 53    0
53 – 54    2
54 – 55    1
55 – 56    0
56 – 57    0
57 – 58    0
58 – 59    1

total_pop numeric feature

This is a population count, almost certainly per geographic unit (likely US counties given n=3222), with values from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69) with 453 outliers (14.06%), reflecting a few massive metros dwarfing thousands of small areas. Mean (102,232) sits far above the median, confirming the heavy tail.

Treatment: Log-transform before any modelling or distance-based analysis.
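
To see what the log transform does to this column, the sketch below applies log10 to the min, median, and max reported in the summary. The choice of base 10 is an assumption made for readability; any base compresses the tail in the same way.

```python
import math

# Summary values for total_pop as reported in this section.
reported = {"min": 47, "median": 25_328, "max": 9_866_623}

# A plain log is defined here because the column has no zeros (min 47).
logged = {name: math.log10(value) for name, value in reported.items()}

for name, value in logged.items():
    print(f"{name:>6}: {value:.2f}")
```

On the log10 scale the three values land at roughly 1.67, 4.40, and 6.99, so the median sits near the middle of the range instead of being crushed against the left edge.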

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["total_pop"].stats

stat            value
n               3,222
nulls           0 (0.0%)
unique          3,141
min             47
max             9.867e+06
mean            1.022e+05
median          25,328
std             3.269e+05
q1              1.061e+04
q3              65,190
iqr             5.458e+04
skew            13.38
kurtosis        298.7
n_outliers      453
outlier_rate    0.1406
zero_rate       0
alert: high_skew · skew=+13.38
alert: outliers · 14.1% rows beyond 1.5 IQR
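
The outlier alert refers to a 1.5 IQR rule. Assuming this means the usual Tukey fences (an assumption, since the report does not spell out the rule), the cut-offs for total_pop can be worked out from the reported quartiles. The `is_outlier` helper is hypothetical.

```python
# Quartiles for total_pop as reported above (q1 is rounded in the report).
q1 = 10_610      # 1.061e+04
q3 = 65_190
iqr = q3 - q1    # 54,580, matching the reported 5.458e+04

# Tukey fences (assumed rule): values outside [q1 - 1.5*IQR, q3 + 1.5*IQR].
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(value):
    """Return True if value falls outside the Tukey fences."""
    return value < lower_fence or value > upper_fence


print(f"lower fence: {lower_fence:,.0f}")
print(f"upper fence: {upper_fence:,.0f}")
print(is_outlier(25_328), is_outlier(9_866_623))
```

Under this assumption the upper fence is about 147,060 and the lower fence is negative, so with a minimum of 47 every flagged row would be a large county rather than a small one.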
Fig 9.
Distribution of total_pop. Vertical dash marks the median.
Histogram bins for total_pop (median: 25328.0): identical to the data table under Fig 1.

uninsured_pop numeric feature

A count of uninsured residents per county (the fips key on each row indicates county rather than tract level), with 3,222 rows and 584 unique values. The distribution is extremely right-skewed (skew 17.8, kurtosis 462.9): the median is 36 while the max is 20,915 and the mean is 159.9, and 17.2% of rows are zero. About 11.4% of values (368) are flagged as outliers, consistent with a few very populous areas dominating.

Treatment: Log1p-transform before modelling to tame the heavy tail and zero inflation.
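
Because 17.2% of the values are zero, a plain logarithm is undefined for part of the column. The sketch below, using the reported min, median, and max as inputs, shows how log1p handles zero and how expm1 maps transformed values back to counts.

```python
import math

# Reported summary values for uninsured_pop: min, median, max.
values = [0, 36, 20_915]

# log1p(x) = log(1 + x), so zero maps to zero instead of raising an error.
transformed = [math.log1p(v) for v in values]

# expm1 is the inverse, useful for mapping model outputs back to counts.
restored = [math.expm1(t) for t in transformed]

for v, t in zip(values, transformed):
    print(f"{v:>6} -> {t:.3f}")

try:
    math.log(0)
except ValueError as exc:
    print("plain log fails on zero:", exc)
```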

anthropic:claude-opus-4-7 · confidence high
Out[21]:

saturn.columns["uninsured_pop"].stats

stat            value
n               3,222
nulls           0 (0.0%)
unique          584
min             0
max             20,915
mean            159.9
median          36
std             627.2
q1              7
q3              120
iqr             113
skew            17.81
kurtosis        462.9
n_outliers      368
outlier_rate    0.1142
zero_rate       0.1723
alert: high_skew · skew=+17.81
alert: outliers · 11.4% rows beyond 1.5 IQR
Fig 10.
Distribution of uninsured_pop. Vertical dash marks the median.
Histogram bins for uninsured_pop (median: 36.0): identical to the data table under Fig 3.

uninsured_rate numeric feature

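Likely a per-record uninsured rate. Read as a fraction (median 0.12, q3 0.25), the long tail reaching 3.7 is implausible for a true rate and suggests mixed units or data-entry errors. A percent reading is also worth testing: median uninsured_pop over median total_pop is 36 / 25,328, or about 0.14%, and the ratio of the means is 159.9 / 102,232, or about 0.16%, both the same order of magnitude as the reported median (0.12) and mean (0.20). Ratios of summary statistics are only a rough guide, so neither reading is confirmed without a row-level check. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%) and 17.5% exact zeros. There are no nulls across 3,222 rows and only 152 unique values, hinting at rounded or binned source data.

Treatment: Validate units first by recomputing uninsured_pop / total_pop per row. If values above 1.0 prove to be errors, cap or winsorize that tail; use log1p rather than a plain log for modelling, since 17.5% of values are exactly zero.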
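
One way to validate units with the columns already in this file is to recompute the ratio per row and test it against a fraction reading and a percent reading. The sketch below does that on four made-up rows chosen to exercise each outcome. They say nothing about what the real file contains, and `guess_unit` and `cap_upper` are hypothetical helpers. The second helper is a plain cap at a fixed threshold, which is simpler than percentile-based winsorizing.

```python
import math

# Made-up rows for illustration: (total_pop, uninsured_pop, uninsured_rate).
rows = [
    (25_000, 30, 0.12),     # matches 100 * ratio
    (10_000, 1_200, 0.12),  # matches the ratio itself
    (50_000, 0, 0.0),       # zero is the same in either unit
    (2_000, 10, 3.7),       # matches neither reading
]


def guess_unit(total_pop, uninsured_pop, reported, rel_tol=0.05):
    """Compare the reported rate with the recomputed ratio under two unit readings."""
    ratio = uninsured_pop / total_pop
    if ratio == 0 and reported == 0:
        return "either"
    if math.isclose(reported, ratio, rel_tol=rel_tol):
        return "fraction"
    if math.isclose(reported, 100 * ratio, rel_tol=rel_tol):
        return "percent"
    return "inconsistent"


def cap_upper(values, cap):
    """Clamp values above cap to cap; lower values pass through unchanged."""
    return [min(v, cap) for v in values]


labels = [guess_unit(*row) for row in rows]
print(labels)
print(cap_upper([r[2] for r in rows], cap=1.0))
```

Counting the labels over all rows would show whether one reading explains nearly everything, in which case capping may be unnecessary, or whether a residue of inconsistent rows needs separate handling.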

anthropic:claude-opus-4-7 · confidence medium
Out[24]:

saturn.columns["uninsured_rate"].stats

stat            value
n               3,222
nulls           0 (0.0%)
unique          152
min             0
max             3.7
mean            0.2002
median          0.12
std             0.2829
q1              0.04
q3              0.25
iqr             0.21
skew            4.095
kurtosis        27.7
n_outliers      230
outlier_rate    0.07138
zero_rate       0.1754
alert: high_skew · skew=+4.10
alert: outliers · 7.1% rows beyond 1.5 IQR
Fig 11.
Distribution of uninsured_rate. Vertical dash marks the median.
Histogram bins for uninsured_rate (median: 0.12): identical to the data table under Fig 2.

How to cite

BibTeX
@misc{saturn-county-health-rankings-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: county health rankings},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/county_health_rankings}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: county health rankings. Source: /home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/county_health_rankings