saturn

/home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv | 3,222 rows | sample n=3,222 | seed 42 | 2026-06-22T00:10:44+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv
Total rows	3,222
Profiled sample	3,222
Columns	10
Generated	2026-06-22T00:10:44+00:00


Per-column null rate across the corpus.
column	kind	null %
fips	numeric	0.0%
county_name	text	0.0%
total_pop	numeric	0.0%
uninsured_pop	numeric	0.0%
uninsured_rate	numeric	0.0%
poverty_rate	numeric	0.0%
rural	categorical	0.0%
rural_category	categorical	0.0%
hospital_closure_risk_score	numeric	0.0%
risk_category	categorical	0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset covers healthcare access indicators for 3,222 U.S. counties and county equivalents, combining population size, uninsured counts and rates, poverty rates, and a hospital closure risk score. The most striking pattern is the extreme right skew in both total population and uninsured population: the median county has 25,328 residents and the median uninsured count is 36, yet the maximums reach nearly 10 million residents and 20,915 uninsured, so a small number of large counties dominate the raw counts. Two things warrant a closer look. First, 84% of counties are labeled 'Low' closure risk and nearly 29% score exactly zero on the closure risk score, which has only 3 unique values, so the score is much coarser than its numeric type suggests. Second, 69% of counties are classified as Rural, and uninsured_rate ranges from 0 to 3.7 with heavy right skew; because that column's scale is ambiguous, its upper tail should be validated before it is used to locate coverage gaps geographically.

total_pop high anthropic:default

This column holds the total population of each county. The distribution is severely right-skewed (skew = 13.38, kurtosis = 298.69): the median is 25,328 while the mean is 102,232, reflecting a long tail driven by a small number of very large population centers; the maximum of 9,866,623 is roughly 390× the median. All 453 outliers (14.1% of rows) lie above the upper 1.5×IQR fence, since the lower fence is negative. With a standard deviation of 326,934, raw values will distort any distance- or variance-sensitive model.
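A log transform is a common way to tame skew of this size before modeling. The sketch below applies `math.log10` to the minimum, median, and maximum reported in this profile to show the compression. `log_scale` is a hypothetical helper, and base 10 (rather than the natural log) is chosen only for readability. It assumes every value is positive, which holds here since the reported minimum is 47.

```python
import math
import statistics

def log_scale(values):
    """Return log10(x) for each value; raises if any value is not positive."""
    out = []
    for x in values:
        if x <= 0:
            raise ValueError(f"log10 needs positive values, got {x}")
        out.append(math.log10(x))
    return out

# Min, median, and max of total_pop as reported in this profile.
profile_points = [47, 25_328, 9_866_623]

print([round(v, 2) for v in log_scale(profile_points)])          # [1.67, 4.4, 6.99]
print(round(profile_points[2] / statistics.median(profile_points)))  # 390
```

On the log scale the median-to-max gap shrinks from a factor of about 390 to under 3 units, so distance-based methods are far less dominated by the largest counties.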

uninsured_pop high anthropic:default

This column holds the count of uninsured individuals in each county. The distribution is extremely right-skewed (skew = 17.81, kurtosis = 462.87): the median is just 36 while the mean is 159.95 and the maximum reaches 20,915, so a small number of very large counties dominate the tail. 17.2% of rows are zero, and 11.4% (368 rows) are flagged as outliers; with Q3 = 120 and IQR = 113, every outlier lies above 289.5. The magnitudes also deserve scrutiny: the ratio of this column's median to the total_pop median (36 / 25,328) is only about 0.14%, far below typical U.S. county uninsured rates, so the column's definition should be confirmed before it is read as the full number of uninsured residents.
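The outlier counts in this profile follow the 1.5 × IQR (Tukey) rule named in the "1.5 IQR" badges. A minimal sketch, assuming the fences are q1 − 1.5·IQR and q3 + 1.5·IQR. The profiler's quantile method is not stated, so using `statistics.quantiles` with `method="inclusive"` is an assumption and counts on raw data may differ slightly. `tukey_fences` and `count_outliers` are hypothetical helpers.

```python
import statistics

def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(values, k=1.5):
    """Count values outside the Tukey fences, estimating quartiles from the data."""
    # "inclusive" interpolates between order statistics; the profiler may differ.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < lower or v > upper)

# Quartiles reported for uninsured_pop: q1 = 7, q3 = 120.
print(tukey_fences(7, 120))  # (-162.5, 289.5)
```

The same arithmetic for total_pop (q1 = 10,611, q3 = 65,190) gives fences of −71,257.5 and 147,058.5; because the lower fence is below the minimum, every flagged total_pop outlier is a large county.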

uninsured_rate medium anthropic:default

This column is an uninsured rate for each county. With only 152 unique values across 3,222 rows, it appears to be rounded to two decimals. The distribution is severely right-skewed (skew = 4.10, kurtosis = 27.70), with a median of 0.12, a mean of 0.20, and a maximum of 3.7. The scale is ambiguous. As a proportion it cannot exceed 1.0, yet the histogram shows at least 73 rows above that. As a percentage, a median of 0.12% is implausibly low for U.S. counties, although it is close to the ratio of the uninsured_pop and total_pop medians (36 / 25,328 ≈ 0.14%), which hints that the column may be uninsured_pop ÷ total_pop × 100. Either reading should be verified row by row before the column is used. The 230 flagged outliers (7.1% of rows) all lie in the upper tail, above 0.565 (Q3 0.25 + 1.5 × IQR 0.21). The 17.5% zero rate, slightly above the 17.2% zero rate of uninsured_pop, may reflect missing data coded as zero, counties with no recorded uninsured residents, or small shares rounding to 0.00.
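One way to settle the scale question is to recompute the rate from the two count columns. This sketch assumes, without confirmation from the source, that uninsured_rate is meant to be uninsured_pop / total_pop × 100. `implied_rate_pct` and `rate_mismatches` are hypothetical helpers, and the sample rows are made up. Rows read with the `csv` module would first need their fields converted from strings to numbers.

```python
def implied_rate_pct(uninsured_pop, total_pop):
    """Uninsured share of the population as a percentage (assumed definition)."""
    if total_pop <= 0:
        raise ValueError("total_pop must be positive")
    return 100.0 * uninsured_pop / total_pop

def rate_mismatches(rows, tol=0.01):
    """Return rows whose reported uninsured_rate differs from the implied percentage.

    Each row is a dict keyed by the CSV's column names with numeric values;
    tol absorbs the two-decimal rounding the profile suggests.
    """
    bad = []
    for row in rows:
        implied = implied_rate_pct(row["uninsured_pop"], row["total_pop"])
        if abs(implied - row["uninsured_rate"]) > tol:
            bad.append(row)
    return bad

sample = [  # made-up rows for illustration
    {"fips": 1, "uninsured_pop": 36, "total_pop": 25_328, "uninsured_rate": 0.14},
    {"fips": 2, "uninsured_pop": 36, "total_pop": 25_328, "uninsured_rate": 3.7},
]
print([r["fips"] for r in rate_mismatches(sample)])  # [2]
```

If most real rows pass this check, the column is a percentage of the county population; if they fail systematically, the rate was computed from a different denominator.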

county_name high anthropic:default

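This column contains fully qualified county names, almost certainly formatted as '<name> County, <state>': the word 'county,' appears in 2,999 of 3,222 rows, the mean string length is about 24 characters, and the mean word count is about 3.25. The remaining 223 rows are presumably other county equivalents, such as Louisiana parishes, Alaska boroughs and census areas, Virginia independent cities, and Puerto Rico municipios. Every row is unique (n_unique = 3,222, duplicate_rate = 0.0), which triggers the near-unique alert; that is expected for an identifier combining county and state. The most frequent states are Texas (256), Virginia (189), and Georgia (159). These counts track how many county equivalents each state has (Texas has 254 counties and Georgia 159) rather than indicating overrepresentation; if they are token counts, the Virginia figure would also include 'West Virginia' rows.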
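Splitting each name at its last comma gives exact state strings, which avoids the 'Virginia' versus 'West Virginia' ambiguity of token counts and isolates the non-"County" rows. A sketch assuming every value ends in ', <state>'; the helper names are hypothetical, and the last two example names are illustrative rather than taken from the profile.

```python
from collections import Counter

def split_county_name(full_name):
    """Split '<name>, <state>' at the last ', ' and return (name, state)."""
    name, sep, state = full_name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', <state>' suffix in {full_name!r}")
    return name, state

def rows_per_state(names):
    """Count rows per state by exact state string, so 'West Virginia' is not 'Virginia'."""
    return Counter(split_county_name(n)[1] for n in names)

def non_county_names(names):
    """Names whose first part does not end in ' County' (parishes, boroughs, cities, ...)."""
    return [n for n in names if not split_county_name(n)[0].endswith(" County")]

names = [
    "Bibb County, Alabama",            # from the profile's sample values
    "Rockingham County, Virginia",     # from the profile's sample values
    "Harrison County, West Virginia",  # illustrative
    "Orleans Parish, Louisiana",       # illustrative
]
print(rows_per_state(names)["Virginia"])  # 1
print(non_county_names(names))            # ['Orleans Parish, Louisiana']
```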

poverty_rate high anthropic:default

This column is a poverty rate, most likely the percentage of each county's population below the poverty line, with no nulls and 1,719 unique values. The distribution is right-skewed (skew = 2.10, kurtosis = 6.89): the median is 13.55% but the mean is pulled up to 15.10% by a long upper tail reaching 66.32%, nearly 5× the median. The 137 flagged outliers (4.25% of rows) all lie in that upper tail, above 29.535% (Q3 17.91 + 1.5 × IQR 7.75), and likely represent unusually deprived counties that warrant special attention in any model. poverty_rate also has the strongest correlation with hospital_closure_risk_score in the correlation table (+0.58).
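The skew and kurtosis figures are moment statistics. The negative kurtosis reported for fips (−0.631) shows the profiler reports excess kurtosis (a normal distribution scores 0), since raw kurtosis is never below 1. Whether it applies small-sample bias corrections is not stated, so this sketch uses the plain population moments g1 and g2, which may differ slightly from the profile on small samples. `moment_stats` is a hypothetical helper.

```python
def moment_stats(values):
    """Return (skewness g1, excess kurtosis g2) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Symmetric, flat data: zero skew and negative excess kurtosis.
print(tuple(round(v, 3) for v in moment_stats([1, 2, 3, 4, 5])))  # (0.0, -1.3)
```

A long upper tail, as in poverty_rate, makes the third moment positive and inflates the fourth.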

fips high anthropic:default

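This column contains U.S. FIPS (Federal Information Processing Standards) county codes: five-digit identifiers made of a two-digit state code followed by a three-digit county code. Because the column is stored as an integer, leading zeros are lost, so codes for states 01–09 appear as four digits (the minimum, 1001, is 01001). Every one of the 3,222 rows has a unique code and no nulls. The top histogram bin (70,370–72,153) holds 78 rows, consistent with Puerto Rico's 78 municipios (state code 72); the remaining 3,144 rows line up with the roughly 3,143 county equivalents in the 50 states and DC. Despite the numeric type, FIPS codes are categorical identifiers, so arithmetic on them is meaningless. The low skew (0.157) and kurtosis (−0.63) carry no information either: the histogram is lumpy, with clusters that track how many counties each state has and an empty stretch from about 56,100 to 70,400 where no state code in this dataset falls. For the same reason, the +0.16 correlation between fips and poverty_rate is not a real relationship; it is plausibly driven by the Puerto Rico rows, which sort last.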
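Keeping fips as a zero-padded string avoids arithmetic on it and prevents join mismatches with sources that write codes like '01001'. This sketch works on the integers as profiled. The helper names are hypothetical, and the range check is a loose sanity bound, not a validation against the official code list.

```python
def fips_to_str(code):
    """Zero-pad an integer county FIPS code back to its five-digit string form."""
    code = int(code)
    # Loose sanity bound, not a check against the official code list.
    if not 1_000 <= code <= 99_999:
        raise ValueError(f"not a plausible county FIPS code: {code}")
    return f"{code:05d}"

def state_fips(code):
    """Return the two-digit state (or territory) prefix of a county FIPS code."""
    return fips_to_str(code)[:2]

print(fips_to_str(1001), state_fips(1001), state_fips(72153))  # 01001 01 72
```

Grouping by `state_fips` is also an easy way to confirm that the 78 rows in the top histogram bin all carry the Puerto Rico prefix 72.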

hospital_closure_risk_score high anthropic:default

This column looks like a continuous risk score but contains only 3 unique values across 3,222 rows: 0, 25, and 50. The histogram confirms this, with 929 rows at 0, 1,790 at 25, and 503 at 50, which is why the min and Q1 are both 0, the median and Q3 are both 25, and the max is 50. It is therefore a de facto ordinal variable with three tiers despite its numeric type. 28.8% of records are zero. The near-symmetric skew (0.14) and the absence of outliers follow from this three-point structure rather than confirming it: with fences at −37.5 and 62.5, the IQR rule cannot flag any value of 0, 25, or 50.
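Treating the score as ordinal makes the three-tier structure explicit and fails loudly if a future extract contains a value outside {0, 25, 50}. A minimal sketch; `SCORE_TO_TIER` and `score_tier` are hypothetical names, and the tier order simply follows the numeric values.

```python
from collections import Counter

# Tier order inferred from the three observed score values.
SCORE_TO_TIER = {0: 0, 25: 1, 50: 2}

def score_tier(score):
    """Map a hospital_closure_risk_score to an ordinal tier, rejecting unseen values."""
    try:
        # float keys hash like their int equivalents, so 25.0 and "25" both work.
        return SCORE_TO_TIER[float(score)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"unexpected hospital_closure_risk_score: {score!r}") from None

scores = [0, 25, 25, 50, 0]  # illustrative values
print(sorted(Counter(score_tier(s) for s in scores).items()))  # [(0, 2), (1, 2), (2, 1)]
```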

risk_category high anthropic:default

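This column is a binary risk label with exactly two categories, 'Low' and 'Moderate'. It is heavily imbalanced: 'Low' accounts for 84.4% of the 3,222 rows (2,719 records), leaving 503 'Moderate' cases. The counts line up exactly with hospital_closure_risk_score: 503 rows score 50, and 929 + 1,790 = 2,719 rows score 0 or 25. The label therefore appears to be derived from the score (0 and 25 map to 'Low', 50 maps to 'Moderate'), which a row-level cross-tab should confirm. The absence of higher tiers such as 'High' or 'Critical' then follows from the score never exceeding 50; whether that reflects data filtering, a genuinely low-risk population, or an incomplete scoring scheme cannot be determined from the profile. With zero nulls and two values, the column is clean but class-imbalanced.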
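Matching marginal counts suggest, but do not prove, that risk_category is a function of hospital_closure_risk_score. A row-level check settles it. A sketch assuming rows are dicts keyed by the CSV's column names; `crosstab` and `maps_uniquely` are hypothetical helpers, and the example rows are illustrative, not taken from the dataset.

```python
from collections import Counter, defaultdict

def crosstab(rows, row_key, col_key):
    """Count each (row_key value, col_key value) pair across a list of dict rows."""
    return Counter((r[row_key], r[col_key]) for r in rows)

def maps_uniquely(rows, source, target):
    """True if every value of `source` co-occurs with exactly one value of `target`."""
    seen = defaultdict(set)
    for r in rows:
        seen[r[source]].add(r[target])
    return all(len(targets) == 1 for targets in seen.values())

rows = [  # illustrative rows, not taken from the dataset
    {"hospital_closure_risk_score": 0, "risk_category": "Low"},
    {"hospital_closure_risk_score": 25, "risk_category": "Low"},
    {"hospital_closure_risk_score": 25, "risk_category": "Low"},
    {"hospital_closure_risk_score": 50, "risk_category": "Moderate"},
]
print(sorted(crosstab(rows, "hospital_closure_risk_score", "risk_category").items()))
print(maps_uniquely(rows, "hospital_closure_risk_score", "risk_category"))  # True
```

The same check on rural and rural_category, whose counts are also identical, would show whether those two columns are interchangeable.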

rural high anthropic:default

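This column is a binary flag indicating whether a county is rural. Its values appear as the strings 'True'/'False', suggesting it is stored as text rather than as a native boolean. 'True' dominates (2,212 of 3,222 rows, ~68.7%); this likely reflects the national mix, since most U.S. counties are rural, rather than a sampling bias, but analysts should still account for the imbalance in any comparative or predictive analysis. The counts are identical to those of rural_category ('Rural' 2,212, 'Urban/Suburban' 1,010), so the two columns are likely redundant encodings of the same flag.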
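If the flag is indeed text, converting it needs care: in Python, `bool("False")` is `True`, so a plain cast would mark every county as rural. A strict parser avoids that trap and rejects unexpected values. `parse_flag` is a hypothetical helper, and the accepted spellings are assumed from the two values in the profile.

```python
def parse_flag(text):
    """Strictly parse the 'True'/'False' strings seen in the rural column."""
    mapping = {"True": True, "False": False}
    value = str(text).strip()
    if value not in mapping:
        raise ValueError(f"expected 'True' or 'False', got {text!r}")
    return mapping[value]

# A non-empty string is truthy, so the naive cast gets this wrong.
print(bool("False"))                                        # True
print([parse_flag(v) for v in ["True", "False", " True"]])  # [True, False, True]
```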

rural_category high anthropic:default

This column is a binary geographic classification of each county as 'Rural' or 'Urban/Suburban'. Across 3,222 records with no nulls, 'Rural' makes up 68.7% (2,212 records) and 'Urban/Suburban' 31.3% (1,010 records). The entropy ratio of 0.897 (where 1.0 would be a perfect 50/50 split) indicates a moderate rather than extreme imbalance. Even so, this imbalance could bias models trained without stratification or reweighting.
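The entropy figures can be reproduced from the category counts alone. This sketch assumes the ratio is Shannon entropy in bits divided by log2 of the number of observed categories; that definition reproduces both the 0.897 reported for rural_category and the 0.625 reported for risk_category. The function names are hypothetical.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given its counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0

print(round(entropy_ratio([2212, 1010]), 3))  # 0.897 (rural / rural_category)
print(round(entropy_ratio([2719, 503]), 3))   # 0.625 (risk_category)
```

For two categories log2(2) = 1, which is why the profile's entropy and entropy_ratio values are identical for these columns.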

Numeric correlation


Pearson correlation across 6 numeric columns (values rounded to 2 decimals).
	fips	total_pop	uninsured_pop	uninsured_rate	poverty_rate	hospital_closure_risk_score
fips	+1.00	-0.07	-0.02	+0.01	+0.16	+0.01
total_pop	-0.07	+1.00	+0.81	-0.05	-0.11	-0.31
uninsured_pop	-0.02	+0.81	+1.00	+0.12	-0.09	-0.27
uninsured_rate	+0.01	-0.05	+0.12	+1.00	-0.04	+0.05
poverty_rate	+0.16	-0.11	-0.09	-0.04	+1.00	+0.58
hospital_closure_risk_score	+0.01	-0.31	-0.27	+0.05	+0.58	+1.00
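Pearson's r is sensitive to extreme values, and total_pop and uninsured_pop both have skew above 13, so the +0.81 between them may be driven largely by a handful of very large counties. A rank-based Spearman correlation is a common robustness check. This is a swapped-in technique, not something the profiler reports; the helper names are hypothetical, and the example data are made up to show the effect of a single dominant point.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# One dominant point drives Pearson's r; the ranks barely agree.
xs = [1, 2, 3, 4, 100]
ys = [5, 3, 4, 2, 50]
print(round(pearson(xs, ys), 3), round(spearman(xs, ys), 3))  # 0.997 0.1
```

A large gap between the two coefficients on the real columns would indicate that the Pearson value mostly reflects the few largest counties.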

fips numeric

rows	3,222
null	0 (0.0%)
unique	3,222
min	1,001
max	72,153
mean	31,378
median	30,022
std	16,300
q1	19,030
q3	46,104
iqr	27,075
skew	0.157
kurtosis	-0.631
n_outliers	0
outlier_rate	0.000
zero_rate	0.000


Histogram bins for fips (median: 30022.0).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 1.167e+04	4
1.167e+04 – 1.345e+04	226
1.345e+04 – 1.523e+04	5
1.523e+04 – 1.701e+04	49
1.701e+04 – 1.879e+04	189
1.879e+04 – 2.057e+04	204
2.057e+04 – 2.235e+04	184
2.235e+04 – 2.413e+04	39
2.413e+04 – 2.59e+04	15
2.59e+04 – 2.768e+04	170
2.768e+04 – 2.946e+04	196
2.946e+04 – 3.124e+04	150
3.124e+04 – 3.302e+04	27
3.302e+04 – 3.48e+04	21
3.48e+04 – 3.658e+04	95
3.658e+04 – 3.836e+04	153
3.836e+04 – 4.013e+04	155
4.013e+04 – 4.191e+04	46
4.191e+04 – 4.369e+04	67
4.369e+04 – 4.547e+04	51
4.547e+04 – 4.725e+04	161
4.725e+04 – 4.903e+04	268
4.903e+04 – 5.081e+04	29
5.081e+04 – 5.259e+04	133
5.259e+04 – 5.436e+04	94
5.436e+04 – 5.614e+04	95
5.614e+04 – 5.792e+04	0
5.792e+04 – 5.97e+04	0
5.97e+04 – 6.148e+04	0
6.148e+04 – 6.326e+04	0
6.326e+04 – 6.504e+04	0
6.504e+04 – 6.682e+04	0
6.682e+04 – 6.86e+04	0
6.86e+04 – 7.037e+04	0
7.037e+04 – 7.215e+04	78

county_name text

100.0% of rows are unique strings

rows	3,222
null	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.324
len_median	24.000
len_p95	31.000
word_mean	3.248
word_median	3.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	1,990
readability_flesch_mean	10.284
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.000
allcaps_rate	0.000
boilerplate_rate	0.000


Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

Sample values (first 10)

Bibb County, Alabama
Cheatham County, Tennessee
Piute County, Utah
Lamb County, Texas
Martin County, Minnesota
Sheridan County, Wyoming
Chickasaw County, Mississippi
Rockingham County, Virginia
Liberty County, Texas
Clark County, Arkansas

total_pop numeric

skew=+13.38, 14.1% of rows beyond 1.5×IQR

rows	3,222
null	0 (0.0%)
unique	3,141
min	47.000
max	9,866,623
mean	102,232
median	25,328
std	326,934
q1	10,611
q3	65,190
iqr	54,579
skew	13.377
kurtosis	298.689
n_outliers	453
outlier_rate	0.141
zero_rate	0.000


Histogram bins for total_pop (median: 25328.0).
bin	count
47 – 2.467e+05	2942
2.467e+05 – 4.934e+05	137
4.934e+05 – 7.4e+05	56
7.4e+05 – 9.867e+05	39
9.867e+05 – 1.233e+06	13
1.233e+06 – 1.48e+06	9
1.48e+06 – 1.727e+06	7
1.727e+06 – 1.973e+06	3
1.973e+06 – 2.22e+06	3
2.22e+06 – 2.467e+06	4
2.467e+06 – 2.713e+06	3
2.713e+06 – 2.96e+06	0
2.96e+06 – 3.207e+06	2
3.207e+06 – 3.453e+06	0
3.453e+06 – 3.7e+06	0
3.7e+06 – 3.947e+06	0
3.947e+06 – 4.193e+06	0
4.193e+06 – 4.44e+06	1
4.44e+06 – 4.687e+06	0
4.687e+06 – 4.933e+06	1
4.933e+06 – 5.18e+06	0
5.18e+06 – 5.427e+06	1
5.427e+06 – 5.673e+06	0
5.673e+06 – 5.92e+06	0
5.92e+06 – 6.167e+06	0
6.167e+06 – 6.413e+06	0
6.413e+06 – 6.66e+06	0
6.66e+06 – 6.907e+06	0
6.907e+06 – 7.153e+06	0
7.153e+06 – 7.4e+06	0
7.4e+06 – 7.647e+06	0
7.647e+06 – 7.893e+06	0
7.893e+06 – 8.14e+06	0
8.14e+06 – 8.387e+06	0
8.387e+06 – 8.633e+06	0
8.633e+06 – 8.88e+06	0
8.88e+06 – 9.127e+06	0
9.127e+06 – 9.373e+06	0
9.373e+06 – 9.62e+06	0
9.62e+06 – 9.867e+06	1

uninsured_pop numeric

skew=+17.81, 11.4% of rows beyond 1.5×IQR

rows	3,222
null	0 (0.0%)
unique	584
min	0.000
max	20,915
mean	159.945
median	36.000
std	627.163
q1	7.000
q3	120.000
iqr	113.000
skew	17.811
kurtosis	462.866
n_outliers	368
outlier_rate	0.114
zero_rate	0.172


Histogram bins for uninsured_pop (median: 36.0).
bin	count
0 – 522.9	3022
522.9 – 1046	124
1046 – 1569	32
1569 – 2092	16
2092 – 2614	7
2614 – 3137	5
3137 – 3660	5
3660 – 4183	2
4183 – 4706	0
4706 – 5229	1
5229 – 5752	2
5752 – 6274	1
6274 – 6797	0
6797 – 7320	0
7320 – 7843	0
7843 – 8366	1
8366 – 8889	1
8889 – 9412	0
9412 – 9935	0
9935 – 1.046e+04	0
1.046e+04 – 1.098e+04	0
1.098e+04 – 1.15e+04	2
1.15e+04 – 1.203e+04	0
1.203e+04 – 1.255e+04	0
1.255e+04 – 1.307e+04	0
1.307e+04 – 1.359e+04	0
1.359e+04 – 1.412e+04	0
1.412e+04 – 1.464e+04	0
1.464e+04 – 1.516e+04	0
1.516e+04 – 1.569e+04	0
1.569e+04 – 1.621e+04	0
1.621e+04 – 1.673e+04	0
1.673e+04 – 1.725e+04	0
1.725e+04 – 1.778e+04	0
1.778e+04 – 1.83e+04	0
1.83e+04 – 1.882e+04	0
1.882e+04 – 1.935e+04	0
1.935e+04 – 1.987e+04	0
1.987e+04 – 2.039e+04	0
2.039e+04 – 2.092e+04	1

uninsured_rate numeric

skew=+4.10, 7.1% of rows beyond 1.5×IQR

rows	3,222
null	0 (0.0%)
unique	152
min	0.000
max	3.700
mean	0.200
median	0.120
std	0.283
q1	0.040
q3	0.250
iqr	0.210
skew	4.095
kurtosis	27.703
n_outliers	230
outlier_rate	0.071
zero_rate	0.175


Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1

poverty_rate numeric

skew=+2.10

rows	3,222
null	0 (0.0%)
unique	1,719
min	1.600
max	66.320
mean	15.100
median	13.550
std	7.706
q1	10.160
q3	17.910
iqr	7.750
skew	2.096
kurtosis	6.891
n_outliers	137
outlier_rate	0.043
zero_rate	0.000


Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

rural categorical

rows	3,222
null	0 (0.0%)
unique	2
top_value	True
top_rate	0.687
cardinality	2
entropy	0.897
entropy_ratio	0.897


Top values for rural (2 unique shown, of 2 total).
value	count	share
True	2212	68.7%
False	1010	31.3%


rural_category categorical

rows	3,222
null	0 (0.0%)
unique	2
top_value	Rural
top_rate	0.687
cardinality	2
entropy	0.897
entropy_ratio	0.897


Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%


hospital_closure_risk_score numeric

rows	3,222
null	0 (0.0%)
unique	3
min	0.000
max	50.000
mean	21.695
median	25.000
std	16.338
q1	0.000
q3	25.000
iqr	25.000
skew	0.141
kurtosis	-0.695
n_outliers	0
outlier_rate	0.000
zero_rate	0.288


Value counts for hospital_closure_risk_score (median: 25.0). The 40-bin histogram has only three non-empty bins, one per observed value.
value	count	share
0	929	28.8%
25	1790	55.6%
50	503	15.6%

risk_category categorical

rows	3,222
null	0 (0.0%)
unique	2
top_value	Low
top_rate	0.844
cardinality	2
entropy	0.625
entropy_ratio	0.625


Top values for risk_category (2 unique shown, of 2 total).
value	count	share
Low	2719	84.4%
Moderate	503	15.6%
