county health rankings

source: /home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet · 3,222 rows · 5 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 rows of US county-level health data. Each row is identified by a unique county name and FIPS code and carries three numeric measures: total population, uninsured population, and uninsured rate. The population fields are extremely right-skewed: total_pop ranges from 47 to nearly 9.87 million with a median of 25,328, and uninsured_pop shows similar skew (median 36, max 20,915), so a few large counties dominate. The uninsured_rate field is the most analytically interesting: it has a median of 0.12 but stretches up to 3.7, with about 17.5% of counties reporting zero, suggesting small-county edge cases, a units mismatch, or data quality issues worth investigating. Start by examining the distribution of uninsured_rate and how it relates to total_pop.

citing: row_count · columns.total_pop.stats · columns.uninsured_pop.stats · columns.uninsured_rate.stats · columns.county_name.top_words

Charts the summary said to look at first

total_pop · Heavy right skew: 2,942 of 3,222 counties (about 91%) fall in the first bin, below roughly 247,000 residents, while a handful exceed several million.

Histogram bins for total_pop (median: 25328.0).
bin	count
47 – 2.467e+05	2942
2.467e+05 – 4.934e+05	137
4.934e+05 – 7.4e+05	56
7.4e+05 – 9.867e+05	39
9.867e+05 – 1.233e+06	13
1.233e+06 – 1.48e+06	9
1.48e+06 – 1.727e+06	7
1.727e+06 – 1.973e+06	3
1.973e+06 – 2.22e+06	3
2.22e+06 – 2.467e+06	4
2.467e+06 – 2.713e+06	3
2.713e+06 – 2.96e+06	0
2.96e+06 – 3.207e+06	2
3.207e+06 – 3.453e+06	0
3.453e+06 – 3.7e+06	0
3.7e+06 – 3.947e+06	0
3.947e+06 – 4.193e+06	0
4.193e+06 – 4.44e+06	1
4.44e+06 – 4.687e+06	0
4.687e+06 – 4.933e+06	1
4.933e+06 – 5.18e+06	0
5.18e+06 – 5.427e+06	1
5.427e+06 – 5.673e+06	0
5.673e+06 – 5.92e+06	0
5.92e+06 – 6.167e+06	0
6.167e+06 – 6.413e+06	0
6.413e+06 – 6.66e+06	0
6.66e+06 – 6.907e+06	0
6.907e+06 – 7.153e+06	0
7.153e+06 – 7.4e+06	0
7.4e+06 – 7.647e+06	0
7.647e+06 – 7.893e+06	0
7.893e+06 – 8.14e+06	0
8.14e+06 – 8.387e+06	0
8.387e+06 – 8.633e+06	0
8.633e+06 – 8.88e+06	0
8.88e+06 – 9.127e+06	0
9.127e+06 – 9.373e+06	0
9.373e+06 – 9.62e+06	0
9.62e+06 – 9.867e+06	1
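
The bin edges in the table above look like 40 equal-width bins between the column minimum and maximum. That count is inferred from the 40 table rows, not stated by the profiler. A minimal sketch of the arithmetic, using the total_pop min and max reported in this profile (the function name `equal_width_edges` is a hypothetical helper):

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# min and max of total_pop as reported in the profile;
# 40 bins is an assumption based on the number of table rows.
edges = equal_width_edges(47, 9_866_623, 40)
print(f"bin width: {edges[1] - edges[0]:,.1f}")
print(f"first bin: {edges[0]:,.0f} to {edges[1]:,.0f}")

# Share of counties in the first bin, using the count from the table.
first_bin_share = 2942 / 3222
print(f"first-bin share: {first_bin_share:.1%}")
```

With linear bins this wide, almost all counties land in the first bin, which is why a log-scaled axis tends to be more informative for this column.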

uninsured_rate · Look for the cluster at zero (about 17.5% of counties) and the long tail extending past 1.0 (at least 73 counties fall in bins above 1.018), which warrants a data-quality check.

Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1

uninsured_pop · Highly skewed counts of uninsured residents; the median is just 36, only three counties exceed 10,000, and a single county reaches about 20,900.

Histogram bins for uninsured_pop (median: 36.0).
bin	count
0 – 522.9	3022
522.9 – 1046	124
1046 – 1569	32
1569 – 2092	16
2092 – 2614	7
2614 – 3137	5
3137 – 3660	5
3660 – 4183	2
4183 – 4706	0
4706 – 5229	1
5229 – 5752	2
5752 – 6274	1
6274 – 6797	0
6797 – 7320	0
7320 – 7843	0
7843 – 8366	1
8366 – 8889	1
8889 – 9412	0
9412 – 9935	0
9935 – 1.046e+04	0
1.046e+04 – 1.098e+04	0
1.098e+04 – 1.15e+04	2
1.15e+04 – 1.203e+04	0
1.203e+04 – 1.255e+04	0
1.255e+04 – 1.307e+04	0
1.307e+04 – 1.359e+04	0
1.359e+04 – 1.412e+04	0
1.412e+04 – 1.464e+04	0
1.464e+04 – 1.516e+04	0
1.516e+04 – 1.569e+04	0
1.569e+04 – 1.621e+04	0
1.621e+04 – 1.673e+04	0
1.673e+04 – 1.725e+04	0
1.725e+04 – 1.778e+04	0
1.778e+04 – 1.83e+04	0
1.83e+04 – 1.882e+04	0
1.882e+04 – 1.935e+04	0
1.935e+04 – 1.987e+04	0
1.987e+04 – 2.039e+04	0
2.039e+04 – 2.092e+04	1

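fips · FIPS codes span the 1,001–72,153 range unevenly, as expected when states differ in their number of counties. The bins between roughly 56,140 and 70,370 are empty, and the 78 rows in the final bin are consistent with Puerto Rico (state code 72), so coverage appears nationwide, including Puerto Rico.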

Histogram bins for fips (median: 30022.0).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 1.167e+04	4
1.167e+04 – 1.345e+04	226
1.345e+04 – 1.523e+04	5
1.523e+04 – 1.701e+04	49
1.701e+04 – 1.879e+04	189
1.879e+04 – 2.057e+04	204
2.057e+04 – 2.235e+04	184
2.235e+04 – 2.413e+04	39
2.413e+04 – 2.59e+04	15
2.59e+04 – 2.768e+04	170
2.768e+04 – 2.946e+04	196
2.946e+04 – 3.124e+04	150
3.124e+04 – 3.302e+04	27
3.302e+04 – 3.48e+04	21
3.48e+04 – 3.658e+04	95
3.658e+04 – 3.836e+04	153
3.836e+04 – 4.013e+04	155
4.013e+04 – 4.191e+04	46
4.191e+04 – 4.369e+04	67
4.369e+04 – 4.547e+04	51
4.547e+04 – 4.725e+04	161
4.725e+04 – 4.903e+04	268
4.903e+04 – 5.081e+04	29
5.081e+04 – 5.259e+04	133
5.259e+04 – 5.436e+04	94
5.436e+04 – 5.614e+04	95
5.614e+04 – 5.792e+04	0
5.792e+04 – 5.97e+04	0
5.97e+04 – 6.148e+04	0
6.148e+04 – 6.326e+04	0
6.326e+04 – 6.504e+04	0
6.504e+04 – 6.682e+04	0
6.682e+04 – 6.86e+04	0
6.86e+04 – 7.037e+04	0
7.037e+04 – 7.215e+04	78

Schema

5 columns

Per-column summary.
column	type	nulls	unique	alerts
fips	numeric	0.0%	3,222
county_name	text	0.0%	3,222	near_unique
total_pop	numeric	0.0%	3,141	high_skew outliers
uninsured_pop	numeric	0.0%	584	high_skew outliers
uninsured_rate	numeric	0.0%	152	high_skew outliers

fips

numeric identifier

This is the U.S. county FIPS code: every one of the 3222 rows is unique with no nulls, and the value range (1001 to 72153) is consistent with the standard 5-digit state+county encoding once the leading zeros that a numeric type drops are restored (1001 corresponds to 01001). The distribution is near-symmetric (skew 0.16) with no statistical outliers, consistent with an identifier rather than a measured quantity. Treatment: Treat as a categorical key, ideally stored as a zero-padded 5-character string; left-join on it to bring in county-level attributes rather than feeding it into a model as numeric. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
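
Because the column is stored as a number, codes for states numbered below 10 lose their leading zero. A minimal sketch of restoring the 5-character form and splitting it into state and county parts follows; the helper names `fips_to_str` and `split_fips` are hypothetical, not part of the profiler.

```python
def fips_to_str(fips):
    """Format a numeric county FIPS code as a zero-padded 5-character string."""
    return f"{int(fips):05d}"


def split_fips(fips):
    """Return (state_code, county_code) as 2- and 3-character strings."""
    code = fips_to_str(fips)
    return code[:2], code[2:]


# The min and max values reported for this column.
print(fips_to_str(1001))   # 01001
print(split_fips(72153))   # ('72', '153')
```

Keeping the key as a string also prevents accidental arithmetic on it and matches how most reference tables publish FIPS codes.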

county_name

text identifier near_unique

This column lists US county names paired with their state (e.g., 'County, Texas'), with all 3222 values unique and no nulls. The token 'county,' appears 2999 times, so about 223 rows use a different suffix (likely 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, 'Municipio' in Puerto Rico, or independent cities). State frequencies match expectations, with the token 'texas' (256) leading, consistent with Texas having the most counties nationally (254); the two extra occurrences probably come from county names such as Texas County in Missouri and Oklahoma. Treatment: Split into county and state fields; if joining to geographic reference tables on names, normalize spelling first, or join on fips instead. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
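
The split suggested in the treatment note can be sketched as below. This assumes every value follows a 'Name, State' pattern with the state after the last comma; the sample strings are illustrative and have not been checked against the dataset, and both helper names are hypothetical.

```python
def split_county_name(full_name):
    """Split 'Name, State' on the last comma into (county, state)."""
    county, sep, state = full_name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {full_name!r}")
    return county.strip(), state.strip()


def uses_county_suffix(county):
    """True when the unit name ends in the word 'County'."""
    return county.endswith(" County")


# Illustrative values only; the exact strings in the dataset may differ.
samples = ["Harris County, Texas", "Orleans Parish, Louisiana"]
for name in samples:
    county, state = split_county_name(name)
    print(county, "|", state, "|", uses_county_suffix(county))
```

Splitting on the last comma rather than the first keeps any county name that itself contains a comma intact.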

total_pop

numeric feature high_skew outliers

This is a population count, almost certainly per geographic unit (likely US counties given n=3222), with values from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69) with 453 outliers (14.06%), reflecting a few massive metros dwarfing thousands of small areas. Mean (102,232) sits far above the median, confirming the heavy tail. Treatment: log-transform before any modelling or distance-based analysis. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,141
min: 47
max: 9.867e+06
mean: 1.022e+05
median: 25,328
std: 3.269e+05
q1: 1.061e+04
q3: 65,190
iqr: 5.458e+04
skew: 13.38
kurtosis: 298.7
n_outliers: 453
outlier_rate: 0.1406
zero_rate: 0
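
The profile does not say how n_outliers is computed. A common choice is Tukey's 1.5 × IQR rule, which is assumed here purely for illustration. The sketch applies that rule to the rounded q1 and q3 listed above and then shows the effect of the suggested log transform on the min, median, and max.

```python
import math


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences under the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# q1 and q3 for total_pop as listed in the profile (q1 is rounded there).
lower, upper = tukey_fences(10_610, 65_190)
print(f"upper fence: {upper:,.0f}")

# log10 compresses the range; total_pop has no zeros, so a plain log is safe.
log_pop = [math.log10(v) for v in (47, 25_328, 9_866_623)]
print([round(v, 2) for v in log_pop])
```

Under this assumed rule, any county above roughly 147,000 residents counts as an outlier, which would explain why such a large share of rows is flagged. After the transform the values span about five orders of magnitude on a compact scale.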

uninsured_pop

numeric feature high_skew outliers

Likely a county- or tract-level count of uninsured residents, with 3222 rows and 584 unique values. The distribution is extremely right-skewed (skew 17.8, kurtosis 462.9): median is 36 while the max hits 20915 and the mean is 159.9, and 17.2% of rows are zero. About 11.4% of values (368) flag as outliers, consistent with a few very populous areas dominating. Treatment: Log1p-transform before modelling to tame the heavy tail and zero inflation. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 584
min: 0
max: 20,915
mean: 159.9
median: 36
std: 627.2
q1: 7
q3: 120
iqr: 113
skew: 17.81
kurtosis: 462.9
n_outliers: 368
outlier_rate: 0.1142
zero_rate: 0.1723
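
A plain logarithm is undefined at zero, and about 17% of rows here are zero, which is why the treatment note points to log1p, that is, log(1 + x). A minimal sketch using the quantiles listed above; the helper name `log1p_counts` is hypothetical.

```python
import math


def log1p_counts(values):
    """Apply log(1 + x) to non-negative counts; zeros map to 0.0."""
    if any(v < 0 for v in values):
        raise ValueError("counts must be non-negative")
    return [math.log1p(v) for v in values]


# min, q1, median, q3, max of uninsured_pop as listed in the profile.
quantiles = [0, 7, 36, 120, 20_915]
transformed = log1p_counts(quantiles)
print([round(v, 2) for v in transformed])

# math.expm1 inverts the transform when results need reporting in counts.
restored = [math.expm1(v) for v in transformed]
```

The transform preserves ordering and keeps zero at zero, so the zero-inflated cluster stays identifiable after transformation.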

uninsured_rate

numeric feature high_skew outliers

Likely a per-record uninsured rate. The central values read like a fraction (median 0.12, q3 0.25), but the long tail reaching 3.7 is impossible for a true fraction and suggests percent units, mixed units, or data entry errors. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%) and 17.5% exact zeros. No nulls across 3222 rows and only 152 unique values, hinting at rounded or binned source data. Treatment: Validate units first, for example by comparing against uninsured_pop / total_pop; then cap or winsorize any tail that remains implausible, and use log1p rather than a plain log for modelling because of the exact zeros. medium · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 152
min: 0
max: 3.7
mean: 0.2002
median: 0.12
std: 0.2829
q1: 0.04
q3: 0.25
iqr: 0.21
skew: 4.095
kurtosis: 27.7
n_outliers: 230
outlier_rate: 0.07138
zero_rate: 0.1754
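
One way to validate the units is to recompute the rate from the two population columns and see whether the reported value matches it as a fraction or as a percent. The summary statistics hint at percent (median uninsured_pop 36 over median total_pop 25,328 is about 0.0014, or 0.14%), though a ratio of medians is only a rough guide. The sketch below is a row-level check under the assumption that rates are rounded to two decimals. The rows are made up for illustration and the function name `guess_rate_scale` is hypothetical.

```python
from collections import Counter


def guess_rate_scale(uninsured_pop, total_pop, reported_rate, tol=0.005):
    """Classify one row as 'fraction', 'percent', 'ambiguous', or 'unknown'.

    tol is half of an assumed two-decimal rounding step.
    """
    if total_pop <= 0:
        return "unknown"
    ratio = uninsured_pop / total_pop
    as_fraction = abs(reported_rate - ratio) <= tol
    as_percent = abs(reported_rate - 100 * ratio) <= tol
    if as_fraction and as_percent:
        return "ambiguous"  # typically rows where everything is near zero
    if as_fraction:
        return "fraction"
    if as_percent:
        return "percent"
    return "unknown"


# Hypothetical rows: (uninsured_pop, total_pop, uninsured_rate).
rows = [
    (36, 25_000, 0.14),     # matches 100 * 36 / 25000 = 0.144
    (3_000, 25_000, 0.12),  # matches 3000 / 25000 = 0.12
    (0, 500, 0.0),          # zero fits either reading
    (10, 1_000, 0.5),       # fits neither reading
]
tally = Counter(guess_rate_scale(*row) for row in rows)
print(tally)
```

If nearly all rows in the real data classify the same way, the tail above 1.0 is more likely a units question than a data entry problem; a large "unknown" share would point the other way.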