saturn·

county health rankings

source: /home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet · 3,222 rows · 5 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 rows of US county-level health data. Each row is identified by a unique county name and FIPS code and carries three numeric measures: total population, uninsured population, and uninsured rate. The population fields are extremely right-skewed: total_pop ranges from 47 to about 9.87 million with a median of 25,328, and uninsured_pop shows similar skew (median 36, max 20,915), so a few large counties dominate. The uninsured_rate is the most analytically interesting field: it has a median of 0.12 but stretches up to 3.7, and about 17.5% of counties report exactly zero, which points to small-population edge cases, rounding, or data quality issues worth investigating. Start by examining the distribution of uninsured_rate and how it relates to total_pop.

citing: row_count · columns.total_pop.stats · columns.uninsured_pop.stats · columns.uninsured_rate.stats · columns.county_name.top_words
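
One way to act on the suggested starting point is to summarise uninsured_rate within population bands. The sketch below assumes pandas is available and that the frame has the total_pop and uninsured_rate columns listed in the schema; the function name, band labels, and demo rows are illustrative and not part of the profile.

```python
import pandas as pd


def rate_by_pop_band(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise uninsured_rate within quartile bands of total_pop."""
    out = df.copy()
    # Q1 = smallest counties, Q4 = largest; the band labels are illustrative.
    out["pop_band"] = pd.qcut(out["total_pop"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
    grouped = out.groupby("pop_band", observed=True)["uninsured_rate"]
    return grouped.agg(
        n="count",
        median="median",
        share_zero=lambda s: (s == 0).mean(),
        max="max",
    )


# Synthetic rows for illustration only; they are not taken from the dataset.
demo = pd.DataFrame(
    {
        "total_pop": [47, 500, 5_000, 10_000, 25_000, 65_000, 500_000, 9_000_000],
        "uninsured_rate": [0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 3.7],
    }
)
summary = rate_by_pop_band(demo)
print(summary)

# On the real file (path as shown in the header), something like:
# df = pd.read_parquet("/home/coolhand/html/datavis/data_trove/cache/county_health_rankings.parquet")
# print(rate_by_pop_band(df))
```

If the zeros and the values above 1 cluster in one population band, that narrows down whether they come from small denominators or from a units problem.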

Schema

5 columns
Per-column summary.

column          type     nulls  unique  alerts
fips            numeric  0.0%   3,222
county_name     text     0.0%   3,222   near_unique
total_pop       numeric  0.0%   3,141   high_skew, outliers
uninsured_pop   numeric  0.0%   584     high_skew, outliers
uninsured_rate  numeric  0.0%   152     high_skew, outliers

fips

numeric identifier
This is the U.S. county FIPS code: all 3,222 values are unique with no nulls, and the value range (1001 to 72153) matches the standard 5-digit state+county encoding once leading zeros are restored (the column is stored as a number, so 01001 appears as 1001). The distribution is near-symmetric (skew 0.16) with no statistical outliers, consistent with an identifier rather than a measured quantity. Treatment: Treat as a categorical key; zero-pad it to five characters and left-join on it to bring in county-level attributes rather than feeding it into a model as a number. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
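
The recommended treatment (use fips as a categorical join key) can be sketched as follows. This is a minimal example assuming pandas; the helper name add_fips_keys and the state_lookup table are hypothetical, and the two fips values are simply the min and max reported above.

```python
import pandas as pd


def add_fips_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Add zero-padded string versions of the numeric fips column."""
    out = df.copy()
    # Restore the leading zero that is lost when the code is stored as a number.
    out["fips_str"] = out["fips"].astype(int).astype(str).str.zfill(5)
    # The first two digits of a 5-digit county FIPS code identify the state.
    out["state_fips"] = out["fips_str"].str[:2]
    return out


# The two fips values are the min and max reported in the profile.
health = add_fips_keys(pd.DataFrame({"fips": [1001, 72153]}))

# Hypothetical reference table keyed on the two-digit state code.
state_lookup = pd.DataFrame(
    {"state_fips": ["01", "72"], "state_name": ["Alabama", "Puerto Rico"]}
)

# A left join keeps every health row; validate guards against duplicate lookup keys.
joined = health.merge(state_lookup, on="state_fips", how="left", validate="many_to_one")
print(joined)
```

Keeping the key as a string also prevents it from being picked up as a numeric feature by accident.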

county_name

text identifier near_unique
This column lists US county names paired with their state (e.g., 'County, Texas'), with all 3,222 values unique and no nulls. The token 'county,' appears 2,999 times, so the remaining 223 rows use a different suffix (likely 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, or 'Municipio' in Puerto Rico, given that fips runs up to 72153). State frequencies match expectations: 'texas' is the most frequent state token (256), consistent with Texas having the most counties of any state. Texas itself has 254 counties, so the token count probably also picks up counties named Texas in other states. Treatment: Split into county and state fields, then left-join on this key to geographic reference tables. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
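
Splitting county_name into county and state fields could look like the sketch below. It assumes pandas and that every value contains a ', ' separator before the state, as the profile suggests; the function name and the three sample names are illustrative.

```python
import pandas as pd


def split_county_name(names: pd.Series) -> pd.DataFrame:
    """Split 'Some County, State' into county, state, and suffix columns."""
    # Split on the last ', ' so the state is always the final piece.
    parts = names.str.rsplit(", ", n=1, expand=True)
    parts.columns = ["county", "state"]
    # The last word of the county part: County, Parish, Municipio, and so on.
    parts["suffix"] = parts["county"].str.split().str[-1]
    return parts


# Illustrative names; assumed to follow the same pattern as the dataset.
sample = pd.Series(
    [
        "Autauga County, Alabama",
        "Acadia Parish, Louisiana",
        "Yauco Municipio, Puerto Rico",
    ]
)
parts = split_county_name(sample)
print(parts)
print(parts["suffix"].value_counts())
```

Running value_counts on the suffix column of the full data is a quick way to see which suffixes account for the rows that do not contain 'county,'.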

total_pop

numeric feature high_skew outliers
This is a population count per county (n=3,222), with values from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69) with 453 outliers (14.06%), reflecting a few massive metros dwarfing thousands of small areas. The mean (102,232) sits far above the median, confirming the heavy tail. Treatment: Log-transform before any modelling or distance-based analysis. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,141
min: 47
max: 9.867e+06
mean: 1.022e+05
median: 25,328
std: 3.269e+05
q1: 1.061e+04
q3: 65,190
iqr: 5.458e+04
skew: 13.38
kurtosis: 298.7
n_outliers: 453
outlier_rate: 0.1406
zero_rate: 0
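
To illustrate what the log transform does to this range, the sketch below applies a base-10 log to the min, median, and max reported above. It assumes NumPy and pandas; the function name is hypothetical, and base 10 is just one convenient choice.

```python
import numpy as np
import pandas as pd


def log10_population(total_pop: pd.Series) -> pd.Series:
    """Base-10 log of a strictly positive population count."""
    if (total_pop <= 0).any():
        raise ValueError("log10 needs strictly positive values")
    return np.log10(total_pop.astype(float))


# min, median, and max of total_pop as reported in the profile.
profile_points = pd.Series([47, 25_328, 9_866_623], index=["min", "median", "max"])
logged = log10_population(profile_points)
print(logged.round(2))
```

The three points land at roughly 1.67, 4.40, and 6.99, so a raw range spanning more than five orders of magnitude becomes a scale on which the median sits near the middle. Because min is 47 and zero_rate is 0, a plain log is safe for this column.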

uninsured_pop

numeric feature high_skew outliers
A county-level count of uninsured residents, with 3,222 rows and 584 unique values. The distribution is extremely right-skewed (skew 17.8, kurtosis 462.9): the median is 36 while the max is 20,915 and the mean is 159.9, and 17.2% of rows are zero. About 11.4% of values (368) are flagged as outliers, consistent with a few very populous areas dominating. Treatment: Log1p-transform before modelling to tame the heavy tail while keeping the zeros defined. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 584
min: 0
max: 20,915
mean: 159.9
median: 36
std: 627.2
q1: 7
q3: 120
iqr: 113
skew: 17.81
kurtosis: 462.9
n_outliers: 368
outlier_rate: 0.1142
zero_rate: 0.1723
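
Because 17.2% of the counts are zero, a plain log is undefined for part of this column; log1p avoids that. The sketch below, assuming NumPy and pandas, applies it to the min, median, and max reported above and shows that expm1 recovers the original counts.

```python
import numpy as np
import pandas as pd

# min, median, and max of uninsured_pop as reported in the profile.
counts = pd.Series([0, 36, 20_915], index=["min", "median", "max"])

transformed = np.log1p(counts)     # natural log of (1 + x); maps 0 to 0
recovered = np.expm1(transformed)  # inverse transform, back to counts

print(transformed.round(3))
print(recovered.round(0))
```

Model outputs on the log1p scale can be mapped back to counts with expm1 in the same way.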

uninsured_rate

numeric feature high_skew outliers
Likely a per-county uninsured rate. The central values look like a fraction (median 0.12, q3 0.25), but the tail reaches 3.7, which is impossible for a fraction of the population and suggests that the field is in percent, mixes units, or contains data entry errors. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%) and 17.5% exact zeros. There are no nulls across 3,222 rows and only 152 unique values, hinting at rounded or binned source data. Treatment: Validate units first, for example by recomputing the rate from uninsured_pop and total_pop; then cap or winsorize any implausible tail and use log1p rather than a plain log for modelling, since 17.5% of values are zero. medium · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 152
min: 0
max: 3.7
mean: 0.2002
median: 0.12
std: 0.2829
q1: 0.04
q3: 0.25
iqr: 0.21
skew: 4.095
kurtosis: 27.7
n_outliers: 230
outlier_rate: 0.07138
zero_rate: 0.1754
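
The unit validation can be sketched by recomputing the rate from the two count columns and checking whether the reported value agrees better as a fraction or as a percent. Whether uninsured_rate is actually derived from uninsured_pop and total_pop is an assumption to test, not something the profile states. The function names, the tolerance (which assumes rates rounded to two decimals), and the demo rows are all illustrative.

```python
import pandas as pd


def unit_agreement(df: pd.DataFrame, tol: float = 0.005) -> dict:
    """Share of rows where uninsured_rate matches pop-derived fraction or percent."""
    frac = df["uninsured_pop"] / df["total_pop"]
    reported = df["uninsured_rate"]
    eps = 1e-9  # slack for floating-point error at the tolerance boundary
    return {
        "fraction": float(((reported - frac).abs() <= tol + eps).mean()),
        "percent": float(((reported - 100 * frac).abs() <= tol + eps).mean()),
    }


def winsorize_upper(s: pd.Series, q: float = 0.99) -> pd.Series:
    """Cap values above the q-th quantile at that quantile."""
    return s.clip(upper=s.quantile(q))


# Synthetic rows for illustration only; they are not taken from the dataset.
demo = pd.DataFrame(
    {
        "total_pop": [25_328, 1_000, 50_000],
        "uninsured_pop": [36, 37, 0],
        "uninsured_rate": [0.14, 3.7, 0.0],
    }
)
agreement = unit_agreement(demo)
print(agreement)

capped = winsorize_upper(pd.Series([0.0, 1.0, 2.0, 3.0, 100.0]), q=0.75)
print(capped.tolist())
```

If the percent share comes out near 1 on the real data, values above 1 are ordinary percentages and capping at 1.0 would discard valid rows; if neither share is high, the rate was probably computed from a different denominator and needs its source documentation.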