county_health_rankings · saturn notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet

Saturn profiled 3,222 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

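!pip install saturn-dissect
import subprocess

DATASET = "/home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet"
# Profile the parquet file: --findings names the JSON file that receives the
# machine-readable stats, and --llm selects the model that writes the prose layer.
subprocess.run(
    [
        "saturn", "analyze", DATASET,
        "--findings", "county_health_rankings.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the CLI exits non-zero
)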

Summary confidence: high

This dataset contains 3,222 rows of US county-level health data. Each row is identified by a unique county name and FIPS code and carries three numeric measures: total population, uninsured population, and uninsured rate. The population fields are extremely right-skewed: total_pop ranges from 47 to nearly 9.87 million with a median of 25,328, and uninsured_pop is even more skewed (median 36, max 20,915), so a few large counties dominate. The uninsured_rate is the most analytically interesting field: it has a median of 0.12 but stretches up to 3.7, and about 17.5% of counties report exactly zero. Values above 1.0 are impossible for a fraction, so confirm the unit (fraction or percent) and investigate the zeros before modelling. Start by examining the distribution of uninsured_rate and how it relates to total_pop.

citing: row_count · columns.total_pop.stats · columns.uninsured_pop.stats · columns.uninsured_rate.stats · columns.county_name.top_words
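The citing line lists dotted paths into the findings file written by the analysis cell. Assuming county_health_rankings.json nests its keys the same way the citations suggest (an assumption; the file's schema is not shown in this notebook), a small helper can resolve a citation back to the stats it points at. Both helper names are hypothetical.

```python
import json
from typing import Any


def resolve_citation(findings: dict, path: str) -> Any:
    """Follow a dotted citation such as 'columns.total_pop.stats' into a nested dict.

    Assumes (hypothetically) that the findings JSON nests keys exactly as the
    citation paths suggest; raises KeyError naming the missing segment otherwise.
    """
    node: Any = findings
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            raise KeyError(f"{segment!r} not found while resolving {path!r}")
        node = node[segment]
    return node


def load_findings(path: str = "county_health_rankings.json") -> dict:
    # File name taken from the --findings argument in the analysis cell.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Usage, once the analysis cell has run:
# findings = load_findings()
# print(resolve_citation(findings, "columns.uninsured_rate.stats"))
```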

Out[4]:

saturn.schema() · 5 columns

column	kind	n	null%	unique	alerts
fips	numeric	3,222	0.0%	3,222
county_name	text	3,222	0.0%	3,222	near_unique
total_pop	numeric	3,222	0.0%	3,141	high_skew outliers
uninsured_pop	numeric	3,222	0.0%	584	high_skew outliers
uninsured_rate	numeric	3,222	0.0%	152	high_skew outliers

Fig 1.

total_pop · Heavy right skew — most counties are small but a handful exceed several million residents.


Histogram bins for total_pop (median: 25328.0).
bin	count
47 – 2.467e+05	2942
2.467e+05 – 4.934e+05	137
4.934e+05 – 7.4e+05	56
7.4e+05 – 9.867e+05	39
9.867e+05 – 1.233e+06	13
1.233e+06 – 1.48e+06	9
1.48e+06 – 1.727e+06	7
1.727e+06 – 1.973e+06	3
1.973e+06 – 2.22e+06	3
2.22e+06 – 2.467e+06	4
2.467e+06 – 2.713e+06	3
2.713e+06 – 2.96e+06	0
2.96e+06 – 3.207e+06	2
3.207e+06 – 3.453e+06	0
3.453e+06 – 3.7e+06	0
3.7e+06 – 3.947e+06	0
3.947e+06 – 4.193e+06	0
4.193e+06 – 4.44e+06	1
4.44e+06 – 4.687e+06	0
4.687e+06 – 4.933e+06	1
4.933e+06 – 5.18e+06	0
5.18e+06 – 5.427e+06	1
5.427e+06 – 5.673e+06	0
5.673e+06 – 5.92e+06	0
5.92e+06 – 6.167e+06	0
6.167e+06 – 6.413e+06	0
6.413e+06 – 6.66e+06	0
6.66e+06 – 6.907e+06	0
6.907e+06 – 7.153e+06	0
7.153e+06 – 7.4e+06	0
7.4e+06 – 7.647e+06	0
7.647e+06 – 7.893e+06	0
7.893e+06 – 8.14e+06	0
8.14e+06 – 8.387e+06	0
8.387e+06 – 8.633e+06	0
8.633e+06 – 8.88e+06	0
8.88e+06 – 9.127e+06	0
9.127e+06 – 9.373e+06	0
9.373e+06 – 9.62e+06	0
9.62e+06 – 9.867e+06	1
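The table uses 40 equal-width bins between the minimum (47) and the maximum (9,866,623), so each bin spans about 246,664 people and the first bin alone absorbs 2,942 of 3,222 counties. A minimal stdlib sketch of that binning follows, alongside log-spaced edges that spread a right-skewed column more evenly; the log-spaced alternative is a suggestion, not something Saturn is documented to do.

```python
import bisect
import math


def linear_edges(lo: float, hi: float, n_bins: int = 40) -> list[float]:
    """Equal-width edges from lo to hi (n_bins + 1 values)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def log_edges(lo: float, hi: float, n_bins: int = 40) -> list[float]:
    """Edges evenly spaced in log10 space; requires lo > 0."""
    a, b = math.log10(lo), math.log10(hi)
    step = (b - a) / n_bins
    return [10 ** (a + i * step) for i in range(n_bins + 1)]


def bin_counts(values: list[float], edges: list[float]) -> list[int]:
    """Count values per bin; the last bin is closed so the maximum is included."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return counts


edges = linear_edges(47, 9_866_623)
print(round(edges[1]))  # upper edge of the first bin: 246711
```

Re-binning total_pop with `log_edges(47, 9_866_623)` would spread the 2,942 small counties across many bins instead of one.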

Fig 2.

uninsured_rate · Note the mass near zero (17.5% of counties are exactly zero, and the first bin, 0–0.0925, holds 1,403 of 3,222) and the long tail past 1.0 (at least 73 counties above 1.018), which warrants a data-quality check.


Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1
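If the values above 1.0 turn out to be entry errors rather than a unit mismatch, percentile-based winsorization limits their leverage without dropping rows. A minimal stdlib sketch; the default 1st/99th percentile limits are an illustrative choice, not a recommendation from the profile.

```python
import statistics


def winsorize(values: list[float], lower: int = 1, upper: int = 99) -> list[float]:
    """Clamp values to their lower/upper percentiles (integer percentiles 1..99)."""
    if not 1 <= lower < upper <= 99:
        raise ValueError("need 1 <= lower < upper <= 99")
    # With n=100, quantiles() returns the 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower - 1], cuts[upper - 1]
    return [min(max(v, lo), hi) for v in values]
```

Unlike a fixed cap at 1.0, the limits here come from the data, so the same call works whether the column turns out to hold fractions or percentages.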

Fig 3.

uninsured_pop · Highly skewed counts of uninsured residents: the median is just 36, and only one county reports over 20,000 (max 20,915).


Histogram bins for uninsured_pop (median: 36.0).
bin	count
0 – 522.9	3022
522.9 – 1046	124
1046 – 1569	32
1569 – 2092	16
2092 – 2614	7
2614 – 3137	5
3137 – 3660	5
3660 – 4183	2
4183 – 4706	0
4706 – 5229	1
5229 – 5752	2
5752 – 6274	1
6274 – 6797	0
6797 – 7320	0
7320 – 7843	0
7843 – 8366	1
8366 – 8889	1
8889 – 9412	0
9412 – 9935	0
9935 – 1.046e+04	0
1.046e+04 – 1.098e+04	0
1.098e+04 – 1.15e+04	2
1.15e+04 – 1.203e+04	0
1.203e+04 – 1.255e+04	0
1.255e+04 – 1.307e+04	0
1.307e+04 – 1.359e+04	0
1.359e+04 – 1.412e+04	0
1.412e+04 – 1.464e+04	0
1.464e+04 – 1.516e+04	0
1.516e+04 – 1.569e+04	0
1.569e+04 – 1.621e+04	0
1.621e+04 – 1.673e+04	0
1.673e+04 – 1.725e+04	0
1.725e+04 – 1.778e+04	0
1.778e+04 – 1.83e+04	0
1.83e+04 – 1.882e+04	0
1.882e+04 – 1.935e+04	0
1.935e+04 – 1.987e+04	0
1.987e+04 – 2.039e+04	0
2.039e+04 – 2.092e+04	1

Fig 4.

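fips · FIPS codes span 1,001–72,153 but are far from uniform: counts cluster by two-digit state prefix, no codes fall between roughly 56,140 and 70,370, and the final bin's 78 rows carry state code 72 (Puerto Rico). The spread is consistent with nationwide coverage.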


Histogram bins for fips (median: 30022.0).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 1.167e+04	4
1.167e+04 – 1.345e+04	226
1.345e+04 – 1.523e+04	5
1.523e+04 – 1.701e+04	49
1.701e+04 – 1.879e+04	189
1.879e+04 – 2.057e+04	204
2.057e+04 – 2.235e+04	184
2.235e+04 – 2.413e+04	39
2.413e+04 – 2.59e+04	15
2.59e+04 – 2.768e+04	170
2.768e+04 – 2.946e+04	196
2.946e+04 – 3.124e+04	150
3.124e+04 – 3.302e+04	27
3.302e+04 – 3.48e+04	21
3.48e+04 – 3.658e+04	95
3.658e+04 – 3.836e+04	153
3.836e+04 – 4.013e+04	155
4.013e+04 – 4.191e+04	46
4.191e+04 – 4.369e+04	67
4.369e+04 – 4.547e+04	51
4.547e+04 – 4.725e+04	161
4.725e+04 – 4.903e+04	268
4.903e+04 – 5.081e+04	29
5.081e+04 – 5.259e+04	133
5.259e+04 – 5.436e+04	94
5.436e+04 – 5.614e+04	95
5.614e+04 – 5.792e+04	0
5.792e+04 – 5.97e+04	0
5.97e+04 – 6.148e+04	0
6.148e+04 – 6.326e+04	0
6.326e+04 – 6.504e+04	0
6.504e+04 – 6.682e+04	0
6.682e+04 – 6.86e+04	0
6.86e+04 – 7.037e+04	0
7.037e+04 – 7.215e+04	78
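A county FIPS code is a two-digit state code followed by a three-digit county code, so integer division by 1,000 recovers the state code. Counting rows by that prefix explains the lumpy histogram: states with many counties produce tall bins, and unassigned state codes leave empty ones. A minimal sketch, assuming fips is held as integers as in this file; the helper name is illustrative.

```python
from collections import Counter


def counties_per_state(fips_codes: list[int]) -> Counter:
    """Count rows per two-digit state code (fips // 1000)."""
    return Counter(code // 1000 for code in fips_codes)


# A few codes spanning Alabama (01) and Puerto Rico (72):
print(counties_per_state([1001, 1003, 72001, 72153]))
```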

Fig 5.

Per-column null rate across the corpus. Columns are ordered by input position.


Per-column null rate across the corpus.
column	kind	null %
fips	numeric	0.0%
county_name	text	0.0%
total_pop	numeric	0.0%
uninsured_pop	numeric	0.0%
uninsured_rate	numeric	0.0%

Fig 6.

Pearson correlation across numeric columns (sampled, bounded).


Pearson correlation across 4 numeric columns (values rounded to 2 decimals). fips is an identifier, so its row and column carry no analytical meaning.
	fips	total_pop	uninsured_pop	uninsured_rate
fips	+1.00	-0.07	-0.02	+0.01
total_pop	-0.07	+1.00	+0.81	-0.05
uninsured_pop	-0.02	+0.81	+1.00	+0.12
uninsured_rate	+0.01	-0.05	+0.12	+1.00
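With columns this skewed, Pearson's r is dominated by the largest counties, so the +0.81 between total_pop and uninsured_pop may largely reflect a handful of big counties. A rank-based (Spearman) correlation is a common robustness check; Spearman is a suggested alternative here, not something the profile computed. This stdlib sketch uses average ranks for ties, which matters because uninsured_rate has only 152 distinct values.

```python
import math


def average_ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

A large gap between `pearson` and `spearman` on the same pair of columns is a sign that a few extreme rows drive the linear coefficient.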

fips numeric identifier

This is the U.S. county FIPS code: all 3,222 values are unique and non-null, and the range (1,001 to 72,153) matches the 5-digit encoding of a 2-digit state code followed by a 3-digit county code, with the leading zero dropped because the column is stored as a number (01001 appears as 1001). The distribution is near-symmetric (skew 0.16) with no statistical outliers, consistent with an identifier rather than a measured quantity.

Treatment: Treat as a categorical key. Zero-pad to five characters, then left-join on it to bring in county-level attributes rather than feeding it into a model as a number.

anthropic:claude-opus-4-7 · confidence high
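Stored as an integer, the code loses its leading zero (01001 becomes 1001), which breaks joins against sources that keep FIPS as five-character strings. A minimal sketch of building the join key and splitting it into state and county parts; the helper names are illustrative.

```python
def fips_key(code: int) -> str:
    """Zero-pad an integer county FIPS code to the 5-character string form."""
    if not 1 <= code <= 99999:
        raise ValueError(f"not a 5-digit county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Return (2-digit state code, 3-digit county code) as strings."""
    key = fips_key(code)
    return key[:2], key[2:]


print(fips_key(1001))     # 01001
print(split_fips(72153))  # ('72', '153')
```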

Out[12]:

saturn.columns["fips"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
min	1,001
max	72,153
mean	3.138e+04
median	30,022
std	1.63e+04
q1	1.903e+04
q3	4.61e+04
iqr	27,075
skew	0.1574
kurtosis	-0.6314
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 7.

Distribution of fips. Vertical dash marks the median.


Histogram bins for fips (median: 30022.0): identical to the Fig 4 table.

county_name text identifier

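This column lists US county names paired with their state, in the form '<name> County, <state>', with all 3,222 values unique and no nulls. The token 'county,' appears 2,999 times, suggesting ~223 rows use a different designation: likely 'Municipio' in Puerto Rico (which has 78), 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, and independent cities such as those in Virginia. Token frequencies match expectations: 'texas' leads with 256, consistent with Texas's 254 counties plus Texas County in both Missouri and Oklahoma.

Treatment: Split on the last comma into county and state fields, then left-join to geographic reference tables; where a table also carries FIPS codes, the fips column is the more reliable key, since name spellings and suffixes vary across sources.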

anthropic:claude-opus-4-7 · confidence high
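Splitting on the last comma separates the state from the county name, and the trailing word(s) of the county part expose the suffix behind the roughly 223 rows that lack 'County'. A minimal sketch assuming every value follows the '<county part>, <state>' pattern; the suffix list is a hypothetical one based on common Census designations, not something the profile reports.

```python
from typing import Optional

# Hypothetical suffix list, checked in order so "City and Borough" is tried before "Borough".
SUFFIXES = ("City and Borough", "Census Area", "Municipality", "Municipio",
            "Borough", "Parish", "County", "city")


def split_county_name(value: str) -> tuple[str, str, Optional[str]]:
    """Return (county part, state, suffix or None) for '<county part>, <state>'."""
    county, sep, state = value.rpartition(",")
    if not sep:
        raise ValueError(f"no comma in {value!r}")
    county, state = county.strip(), state.strip()
    suffix = next((s for s in SUFFIXES if county.endswith(" " + s)), None)
    return county, state, suffix
```

Tallying the third element over the whole column would show which designations make up the non-'County' rows.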

Out[15]:

saturn.columns["county_name"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.32
len_median	24
len_p95	31
word_mean	3.248
word_median	3
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,990
readability_flesch_mean	10.28
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 8.

Character-length distribution for county_name.


Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

total_pop numeric feature

This is a population count per county (the fips column confirms one row per county), with values from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.7) with 453 outliers (14.06%), reflecting a few massive metros dwarfing thousands of small counties. The mean (102,232) sits far above the median, confirming the heavy tail.

Treatment: Log-transform before any modelling or distance-based analysis.

anthropic:claude-opus-4-7 · confidence high
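A minimal sketch of the recommended log transform, with a moment-based skewness (Fisher-Pearson g1) to confirm the tail is tamed. Saturn's exact skew formula is not documented here, so this helper may differ slightly from the table's values; the population figures below are illustrative, not rows from the dataset.

```python
import math


def skewness(xs: list[float]) -> float:
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log10_transform(xs: list[float]) -> list[float]:
    """log10 of strictly positive values (total_pop has min 47, so no zeros)."""
    return [math.log10(x) for x in xs]


pops = [100, 1_000, 10_000, 100_000, 1_000_000]  # illustrative, evenly spaced in log space
print(round(skewness(pops), 2), round(skewness(log10_transform(pops)), 2))
```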

Out[18]:

saturn.columns["total_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,141
min	47
max	9.867e+06
mean	1.022e+05
median	25,328
std	3.269e+05
q1	1.061e+04
q3	65,190
iqr	5.458e+04
skew	13.38
kurtosis	298.7
n_outliers	453
outlier_rate	0.1406
zero_rate	0
alert: high_skew	skew=+13.38
alert: outliers	14.1% rows beyond 1.5 IQR

Fig 9.

Distribution of total_pop. Vertical dash marks the median.


Histogram bins for total_pop (median: 25328.0): identical to the Fig 1 table.

uninsured_pop numeric feature

Likely a county-level count of uninsured residents (fips identifies each row as a county), with 3,222 rows and 584 unique values. The distribution is extremely right-skewed (skew 17.81, kurtosis 462.9): the median is 36, the mean 159.9, and the maximum 20,915, and 17.2% of rows are zero. About 11.4% of values (368) flag as outliers, consistent with a few very populous counties dominating. A median of 36 against a median total_pop of 25,328 is far below typical county uninsured rates, so confirm in the source whether this column counts a subgroup or uses a different unit.

Treatment: Log1p-transform before modelling to tame the heavy tail and zero inflation.

anthropic:claude-opus-4-7 · confidence high
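log1p maps the 17.2% of zero counts to 0 instead of failing the way log(0) would. Because zero inflation is a separate signal from magnitude, a common pattern (a suggestion, not part of the profile) is to keep an explicit zero indicator alongside the transformed value. A minimal sketch with a hypothetical helper name:

```python
import math


def uninsured_features(count: float) -> tuple[float, int]:
    """Return (log1p(count), is_zero flag) for a non-negative count."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    return math.log1p(count), int(count == 0)


for c in (0, 36, 20_915):  # zero, the median, and the maximum from the stats table
    print(c, uninsured_features(c))
```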

Out[21]:

saturn.columns["uninsured_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	584
min	0
max	20,915
mean	159.9
median	36
std	627.2
q1	7
q3	120
iqr	113
skew	17.81
kurtosis	462.9
n_outliers	368
outlier_rate	0.1142
zero_rate	0.1723
alert: high_skew	skew=+17.81
alert: outliers	11.4% rows beyond 1.5 IQR

Fig 10.

Distribution of uninsured_pop. Vertical dash marks the median.


Histogram bins for uninsured_pop (median: 36.0): identical to the Fig 3 table.

uninsured_rate numeric feature

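Likely a per-county uninsured rate. Read as a fraction (median 0.12, q3 0.25), the long tail reaching 3.7 is implausible and would point to mixed units or data-entry errors. The stats also fit a percentage computed as 100 × uninsured_pop / total_pop: the ratio of the two medians, 100 × 36 / 25,328 ≈ 0.14, is close to the median rate of 0.12, and the zero rates nearly match (17.5% here vs 17.2% for uninsured_pop). The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%). There are no nulls across 3,222 rows and only 152 unique values, hinting at source data rounded to two decimals.

Treatment: Confirm the unit first. If the values are fractions, cap or winsorize the >1.0 tail; in either case use log1p rather than log for modelling, since 17.5% of values are exactly zero.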

anthropic:claude-opus-4-7 · confidence medium
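One way to settle the unit question is to test whether uninsured_rate equals scale × uninsured_pop / total_pop after rounding to two decimals. The percentage reading is a hypothesis suggested by the summary stats, not a documented definition; this sketch reports the share of rows consistent with a given scale, with the tolerance set to half of the 0.01 rounding step. The helper name is hypothetical.

```python
def unit_match_share(rows: list[tuple[float, float, float]], scale: float,
                     tol: float = 0.005 + 1e-9) -> float:
    """Share of (total_pop, uninsured_pop, uninsured_rate) rows where
    uninsured_rate is within tol of scale * uninsured_pop / total_pop.

    scale=100 tests the percentage hypothesis, scale=1 the fraction hypothesis.
    The default tol is half of the 0.01 rounding step, plus a little float slack.
    """
    if not rows:
        raise ValueError("rows must be non-empty")
    hits = sum(
        1
        for total_pop, uninsured_pop, rate in rows
        if abs(scale * uninsured_pop / total_pop - rate) <= tol  # total_pop min is 47, never 0
    )
    return hits / len(rows)
```

Run it with both scale=100 and scale=1 on the full columns: a share near 1.0 for one scale and near 0 for the other would settle the unit, while a middling share for both would point back to mixed units or a different definition.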

Out[24]:

saturn.columns["uninsured_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	152
min	0
max	3.7
mean	0.2002
median	0.12
std	0.2829
q1	0.04
q3	0.25
iqr	0.21
skew	4.095
kurtosis	27.7
n_outliers	230
outlier_rate	0.07138
zero_rate	0.1754
alert: high_skew	skew=+4.10
alert: outliers	7.1% rows beyond 1.5 IQR

Fig 11.

Distribution of uninsured_rate. Vertical dash marks the median.

Show data table

Histogram bins for uninsured_rate (median: 0.12): identical to the Fig 2 table.
