data-trove-healthcare-deserts

Overview

Source: /home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv

Saturn profiled 3,222 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

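# Install the Saturn profiler (notebook shell escape).
!pip install saturn-dissect
import subprocess
# Profile the CSV, write the findings to JSON, and request the opt-in LLM prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv",
    "--findings", "data-trove-healthcare-deserts.json",
    "--llm", "anthropic:default",
], check=True)  # raise if the CLI exits with a non-zero status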

Summary confidence: high

This dataset covers healthcare access indicators for 3,222 U.S. counties and county equivalents, combining population size, uninsured counts and rates, poverty rates, and hospital closure risk scores. The most striking pattern is the extreme skew in both total population and uninsured population: the median county has just 25,328 residents and 36 uninsured individuals, yet the maxima reach nearly 10 million residents and over 20,000 uninsured, so a small number of large counties dominate the raw counts. Two things warrant a closer look. First, 84% of counties are rated 'Low' hospital closure risk, yet nearly 29% score exactly zero on the closure risk score, and the score takes only 3 unique values, so it is coarser than a numeric column suggests. Second, 69% of counties are classified as Rural, and uninsured_rate ranges from 0 to 3.7 (median 0.12) with heavy right skew; the unit is not stated, but the tail points to pockets of severe coverage gaps worth isolating geographically.

citing: row_count · column_count · total_pop.stats.median · total_pop.stats.max · uninsured_pop.stats.median · uninsured_pop.stats.max · uninsured_rate.stats.max · uninsured_rate.stats.median · hospital_closure_risk_score.n_unique · hospital_closure_risk_score.stats.zero_rate · risk_category.top_value · risk_category.top_rate · rural_category.top_value · rural_category.top_rate · poverty_rate.stats.median · poverty_rate.stats.max

Out[4]:

saturn.schema() · 10 columns

column	kind	n	null%	unique	alerts
fips	numeric	3,222	0.0%	3,222
county_name	text	3,222	0.0%	3,222	near_unique
total_pop	numeric	3,222	0.0%	3,141	high_skew outliers
uninsured_pop	numeric	3,222	0.0%	584	high_skew outliers
uninsured_rate	numeric	3,222	0.0%	152	high_skew outliers
poverty_rate	numeric	3,222	0.0%	1,719	high_skew
rural	categorical	3,222	0.0%	2
rural_category	categorical	3,222	0.0%	2
hospital_closure_risk_score	numeric	3,222	0.0%	3
risk_category	categorical	3,222	0.0%	2

Fig 1.

uninsured_rate · Look for the heavy right tail — most counties cluster near low uninsured rates, but extreme outliers signal counties with severe coverage gaps.

Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1
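
The bin edges in this table are consistent with 40 equal-width bins spanning the column's min and max (3.7 / 40 = 0.0925). Saturn's own binning code is not shown in this report, so the following is only a sketch of that assumed scheme; `equal_width_edges` and `bin_index` are hypothetical helper names.

```python
def equal_width_edges(lo: float, hi: float, n_bins: int = 40) -> list[float]:
    """Return n_bins + 1 evenly spaced edges from lo to hi (inclusive)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(value: float, lo: float, hi: float, n_bins: int = 40) -> int:
    """Index of the bin holding value; the max falls in the last bin."""
    if not lo <= value <= hi:
        raise ValueError("value outside histogram range")
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)


edges = equal_width_edges(0.0, 3.7)   # min and max of uninsured_rate
print(round(edges[1], 4))             # upper edge of the first bin
print(bin_index(0.12, 0.0, 3.7))      # bin that holds the median
```

With this many bins and such a long tail, most of the upper bins hold zero or one county, which is why the table is sparse above roughly 2.0.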

Fig 2.

risk_category · Notice how overwhelmingly 'Low' dominates — only about 16% of counties carry Moderate hospital closure risk.

Top values for risk_category (2 unique shown, of 2 total).
value	count	share
Low	2719	84.4%
Moderate	503	15.6%

Fig 3.

rural_category · Nearly 7 in 10 counties are Rural, which sets important context for interpreting healthcare access shortfalls across the dataset.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

Fig 4.

poverty_rate · The distribution is right-skewed with a median near 13.6%; watch for the tail of counties with poverty rates between roughly 40% and 66%.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

Fig 5.

hospital_closure_risk_score · With only 3 unique values (0, 25, 50), this chart shows that nearly 29% of counties score zero, which suggests a coarse three-tier measure rather than a continuous score.

Histogram bins for hospital_closure_risk_score (median: 25.0).
bin	count
0 – 1.25	929
1.25 – 2.5	0
2.5 – 3.75	0
3.75 – 5	0
5 – 6.25	0
6.25 – 7.5	0
7.5 – 8.75	0
8.75 – 10	0
10 – 11.25	0
11.25 – 12.5	0
12.5 – 13.75	0
13.75 – 15	0
15 – 16.25	0
16.25 – 17.5	0
17.5 – 18.75	0
18.75 – 20	0
20 – 21.25	0
21.25 – 22.5	0
22.5 – 23.75	0
23.75 – 25	0
25 – 26.25	1790
26.25 – 27.5	0
27.5 – 28.75	0
28.75 – 30	0
30 – 31.25	0
31.25 – 32.5	0
32.5 – 33.75	0
33.75 – 35	0
35 – 36.25	0
36.25 – 37.5	0
37.5 – 38.75	0
38.75 – 40	0
40 – 41.25	0
41.25 – 42.5	0
42.5 – 43.75	0
43.75 – 45	0
45 – 46.25	0
46.25 – 47.5	0
47.5 – 48.75	0
48.75 – 50	503

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
fips	numeric	0.0%
county_name	text	0.0%
total_pop	numeric	0.0%
uninsured_pop	numeric	0.0%
uninsured_rate	numeric	0.0%
poverty_rate	numeric	0.0%
rural	categorical	0.0%
rural_category	categorical	0.0%
hospital_closure_risk_score	numeric	0.0%
risk_category	categorical	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
	fips	total_pop	uninsured_pop	uninsured_rate	poverty_rate	hospital_closure_risk_score
fips	+1.00	-0.07	-0.02	+0.01	+0.16	+0.01
total_pop	-0.07	+1.00	+0.81	-0.05	-0.11	-0.31
uninsured_pop	-0.02	+0.81	+1.00	+0.12	-0.09	-0.27
uninsured_rate	+0.01	-0.05	+0.12	+1.00	-0.04	+0.05
poverty_rate	+0.16	-0.11	-0.09	-0.04	+1.00	+0.58
hospital_closure_risk_score	+0.01	-0.31	-0.27	+0.05	+0.58	+1.00
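
For reference, each cell in the matrix above is a Pearson coefficient: the covariance of two columns divided by the product of their standard deviations. The sketch below computes it in plain Python on made-up values (not rows from the dataset); `pearson` is a hypothetical helper, not part of Saturn.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: poverty rises with the risk tier in this toy sample.
poverty = [8.0, 12.0, 15.0, 22.0, 30.0]
risk = [0, 0, 25, 25, 50]
print(round(pearson(poverty, risk), 2))
```

Pearson coefficients are sensitive to extreme values, so for heavily skewed columns such as total_pop and uninsured_pop it may be worth comparing against a rank-based correlation.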

fips numeric identifier

This column contains US FIPS (Federal Information Processing Standard) county codes: five-digit identifiers (a two-digit state prefix plus a three-digit county code) assigned to every US county and county equivalent. Because the column is stored as a number, leading zeros are dropped, so codes such as 1001 appear with four digits. Every one of the 3,222 rows has a unique FIPS code with no nulls, close to the ~3,143 US counties plus territories (the max of 72153 and the 78 codes in the top histogram bin indicate that Puerto Rico is included). Despite the numeric storage, FIPS codes are categorical identifiers and arithmetic on them is meaningless; the low skew (0.157) and kurtosis (-0.63) reflect only how codes are assigned by state, not any property of the counties.

Treatment: Cast to string/categorical and use as a geographic join key; do not use as a numeric feature.
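
A minimal sketch of that cast, assuming the codes are read as integers with their leading zeros already lost: padding back to five characters keeps them joinable against sources that store FIPS as text. `normalize_fips` and `state_prefix` are hypothetical helper names.

```python
def normalize_fips(code: int) -> str:
    """Format a numeric county FIPS code as a zero-padded 5-character string."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a valid 5-digit FIPS code: {code}")
    return f"{code:05d}"


def state_prefix(code: int) -> str:
    """First two characters of the padded code: the state-level FIPS prefix."""
    return normalize_fips(code)[:2]


print(normalize_fips(1001))   # min value in the column
print(state_prefix(72153))    # max value in the column
```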

anthropic:default · confidence high

Out[13]:

saturn.columns["fips"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
min	1,001
max	72,153
mean	3.138e+04
median	30,022
std	1.63e+04
q1	1.903e+04
q3	4.61e+04
iqr	27,075
skew	0.1574
kurtosis	-0.6314
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 8.

Distribution of fips. Vertical dash marks the median.

Histogram bins for fips (median: 30022.0).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 1.167e+04	4
1.167e+04 – 1.345e+04	226
1.345e+04 – 1.523e+04	5
1.523e+04 – 1.701e+04	49
1.701e+04 – 1.879e+04	189
1.879e+04 – 2.057e+04	204
2.057e+04 – 2.235e+04	184
2.235e+04 – 2.413e+04	39
2.413e+04 – 2.59e+04	15
2.59e+04 – 2.768e+04	170
2.768e+04 – 2.946e+04	196
2.946e+04 – 3.124e+04	150
3.124e+04 – 3.302e+04	27
3.302e+04 – 3.48e+04	21
3.48e+04 – 3.658e+04	95
3.658e+04 – 3.836e+04	153
3.836e+04 – 4.013e+04	155
4.013e+04 – 4.191e+04	46
4.191e+04 – 4.369e+04	67
4.369e+04 – 4.547e+04	51
4.547e+04 – 4.725e+04	161
4.725e+04 – 4.903e+04	268
4.903e+04 – 5.081e+04	29
5.081e+04 – 5.259e+04	133
5.259e+04 – 5.436e+04	94
5.436e+04 – 5.614e+04	95
5.614e+04 – 5.792e+04	0
5.792e+04 – 5.97e+04	0
5.97e+04 – 6.148e+04	0
6.148e+04 – 6.326e+04	0
6.326e+04 – 6.504e+04	0
6.504e+04 – 6.682e+04	0
6.682e+04 – 6.86e+04	0
6.86e+04 – 7.037e+04	0
7.037e+04 – 7.215e+04	78

county_name text label

This column contains fully qualified county name strings, almost certainly formatted as '[Name] County, [State]': the token 'county,' appears in 2,999 of 3,222 rows, the mean string length is ~24 characters, and the mean word count is ~3.25. The rows without 'county,' are presumably county equivalents such as parishes, boroughs, or municipios. Every row is unique (n_unique = 3,222, duplicate_rate = 0.0), which triggers the near-unique alert; that is expected for a geographic identifier combining county and state. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). Since every row is a distinct county, these counts most likely reflect how many counties each state has (and 'Virginia' may also match West Virginia) rather than any overrepresentation.

Treatment: Parse into county and state components for join or groupby operations; do not treat as a free-text feature.
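
One way to do the split, assuming the 'Name County, State' layout inferred above holds for every row (the report never shows raw rows, so this is unverified). `split_county_name` is a hypothetical helper, and the sample string is illustrative rather than copied from the CSV.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Name County, State' on the last ', ' into (county, state)."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county.strip(), state.strip()


# Hypothetical input in the assumed format, not copied from the CSV.
county, state = split_county_name("Autauga County, Alabama")
print(county, "|", state)
```

Splitting on the last separator rather than the first keeps any comma inside the county part intact; rows that raise are worth listing before any groupby.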

anthropic:default · confidence high

Out[16]:

saturn.columns["county_name"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.32
len_median	24
len_p95	31
word_mean	3.248
word_median	3
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,990
readability_flesch_mean	10.28
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 9.

Character-length distribution for county_name.

Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

total_pop numeric feature

This column holds the total population of each county (rows are keyed by FIPS code and county name). The distribution is severely right-skewed (skew = 13.38, kurtosis = 298.69): the median is 25,328 while the mean is 102,232, indicating a long tail driven by a small number of very large counties. The maximum of 9,866,623 is roughly 390× the median. With 453 outliers (14.1% of rows) and a standard deviation of 326,934, raw values will distort any distance- or variance-sensitive model.

Treatment: Log-transform (log1p) before modelling to compress the extreme right tail.
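
To illustrate how much the transform compresses the tail, this sketch applies `math.log1p` to three summary values taken from the stats for this column. It is not a full preprocessing step, just the arithmetic on min, median, and max.

```python
import math

# Summary values from the total_pop stats.
total_pop_stats = {"min": 47, "median": 25_328, "max": 9_866_623}

# log1p(x) = ln(1 + x); it is defined at zero, unlike a plain log.
logged = {name: math.log1p(value) for name, value in total_pop_stats.items()}

raw_ratio = total_pop_stats["max"] / total_pop_stats["median"]
log_ratio = logged["max"] / logged["median"]
print(f"max/median raw: {raw_ratio:.0f}x, after log1p: {log_ratio:.2f}x")
```

The same call works unchanged for columns that contain zeros, such as uninsured_pop.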

anthropic:default · confidence high

Out[19]:

saturn.columns["total_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,141
min	47
max	9.867e+06
mean	1.022e+05
median	25,328
std	3.269e+05
q1	1.061e+04
q3	65,190
iqr	5.458e+04
skew	13.38
kurtosis	298.7
n_outliers	453
outlier_rate	0.1406
zero_rate	0
alert: high_skew	skew=+13.38
alert: outliers	14.1% rows beyond 1.5 IQR

Fig 10.

Distribution of total_pop. Vertical dash marks the median.

Histogram bins for total_pop (median: 25328.0).
bin	count
47 – 2.467e+05	2942
2.467e+05 – 4.934e+05	137
4.934e+05 – 7.4e+05	56
7.4e+05 – 9.867e+05	39
9.867e+05 – 1.233e+06	13
1.233e+06 – 1.48e+06	9
1.48e+06 – 1.727e+06	7
1.727e+06 – 1.973e+06	3
1.973e+06 – 2.22e+06	3
2.22e+06 – 2.467e+06	4
2.467e+06 – 2.713e+06	3
2.713e+06 – 2.96e+06	0
2.96e+06 – 3.207e+06	2
3.207e+06 – 3.453e+06	0
3.453e+06 – 3.7e+06	0
3.7e+06 – 3.947e+06	0
3.947e+06 – 4.193e+06	0
4.193e+06 – 4.44e+06	1
4.44e+06 – 4.687e+06	0
4.687e+06 – 4.933e+06	1
4.933e+06 – 5.18e+06	0
5.18e+06 – 5.427e+06	1
5.427e+06 – 5.673e+06	0
5.673e+06 – 5.92e+06	0
5.92e+06 – 6.167e+06	0
6.167e+06 – 6.413e+06	0
6.413e+06 – 6.66e+06	0
6.66e+06 – 6.907e+06	0
6.907e+06 – 7.153e+06	0
7.153e+06 – 7.4e+06	0
7.4e+06 – 7.647e+06	0
7.647e+06 – 7.893e+06	0
7.893e+06 – 8.14e+06	0
8.14e+06 – 8.387e+06	0
8.387e+06 – 8.633e+06	0
8.633e+06 – 8.88e+06	0
8.88e+06 – 9.127e+06	0
9.127e+06 – 9.373e+06	0
9.373e+06 – 9.62e+06	0
9.62e+06 – 9.867e+06	1

uninsured_pop numeric feature

This column is the count of uninsured individuals per county. The distribution is extremely right-skewed (skew = 17.81, kurtosis = 462.87): the median is just 36 while the mean is 159.9 and the max reaches 20,915, so a small number of large counties dominate the tail. Notably, 17.2% of rows are zero and 11.4% are flagged as outliers (368 rows). For all but the smallest counties a true count of zero is implausible, so the zeros may reflect suppressed or missing counts rather than full coverage and should be checked against the source.

Treatment: Log-transform (e.g., log1p) before regression or clustering to reduce skew; for per-capita comparisons, consider normalizing by total_pop and comparing the result with the existing uninsured_rate column.

anthropic:default · confidence high

Out[22]:

saturn.columns["uninsured_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	584
min	0
max	20,915
mean	159.9
median	36
std	627.2
q1	7
q3	120
iqr	113
skew	17.81
kurtosis	462.9
n_outliers	368
outlier_rate	0.1142
zero_rate	0.1723
alert: high_skew	skew=+17.81
alert: outliers	11.4% rows beyond 1.5 IQR

Fig 11.

Distribution of uninsured_pop. Vertical dash marks the median.

Histogram bins for uninsured_pop (median: 36.0).
bin	count
0 – 522.9	3022
522.9 – 1046	124
1046 – 1569	32
1569 – 2092	16
2092 – 2614	7
2614 – 3137	5
3137 – 3660	5
3660 – 4183	2
4183 – 4706	0
4706 – 5229	1
5229 – 5752	2
5752 – 6274	1
6274 – 6797	0
6797 – 7320	0
7320 – 7843	0
7843 – 8366	1
8366 – 8889	1
8889 – 9412	0
9412 – 9935	0
9935 – 1.046e+04	0
1.046e+04 – 1.098e+04	0
1.098e+04 – 1.15e+04	2
1.15e+04 – 1.203e+04	0
1.203e+04 – 1.255e+04	0
1.255e+04 – 1.307e+04	0
1.307e+04 – 1.359e+04	0
1.359e+04 – 1.412e+04	0
1.412e+04 – 1.464e+04	0
1.464e+04 – 1.516e+04	0
1.516e+04 – 1.569e+04	0
1.569e+04 – 1.621e+04	0
1.621e+04 – 1.673e+04	0
1.673e+04 – 1.725e+04	0
1.725e+04 – 1.778e+04	0
1.778e+04 – 1.83e+04	0
1.83e+04 – 1.882e+04	0
1.882e+04 – 1.935e+04	0
1.935e+04 – 1.987e+04	0
1.987e+04 – 2.039e+04	0
2.039e+04 – 2.092e+04	1

uninsured_rate numeric feature

This column is an uninsured rate: the share of each county's population lacking health insurance coverage. With only 152 unique values across 3,222 rows, the values appear rounded (the quartiles 0.04, 0.12, and 0.25 all have two decimals). The distribution is severely right-skewed (skew 4.10, kurtosis 27.70), with a median of 0.12, a mean of 0.20, and a max of 3.7. A max above 1.0 rules out a pure 0 to 1 proportion. The other columns hint at a percent scale: the median uninsured_pop (36) over the median total_pop (25,328) is about 0.14%, the same order of magnitude as the median rate of 0.12. That is a ratio of medians, not a row-level check, so the scale should still be confirmed against the source. 230 rows (7.1%) are flagged as outliers. The 17.5% zero rate also warrants investigation: it may reflect missing data coded as zero, or small counts that round to 0.00.

Treatment: Investigate values > 1.0 for scale inconsistency, recode zeros if they represent missingness, then log-transform or apply a bounded transformation before modelling.
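
A rough way to start the scale investigation uses only the medians already reported for uninsured_pop, total_pop, and uninsured_rate. A ratio of medians is not the median of row-level ratios, so this is a plausibility check under that caveat, not a proof; `implied_rate` is a hypothetical helper.

```python
def implied_rate(uninsured: float, total: float, scale: float = 1.0) -> float:
    """Uninsured share of the population, multiplied by scale (100 for percent)."""
    if total <= 0:
        raise ValueError("total population must be positive")
    return uninsured / total * scale


# Medians from the uninsured_pop, total_pop, and uninsured_rate stats.
median_uninsured, median_total, median_rate = 36, 25_328, 0.12

as_fraction = implied_rate(median_uninsured, median_total)
as_percent = implied_rate(median_uninsured, median_total, scale=100)
print(f"fraction: {as_fraction:.4f}  percent: {as_percent:.2f}  reported: {median_rate}")
```

If the percent reading is closer to the reported median, values above 1.0 are small percentages rather than impossible proportions; recomputing the rate row by row from the CSV would settle it.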

anthropic:default · confidence medium

Out[25]:

saturn.columns["uninsured_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	152
min	0
max	3.7
mean	0.2002
median	0.12
std	0.2829
q1	0.04
q3	0.25
iqr	0.21
skew	4.095
kurtosis	27.7
n_outliers	230
outlier_rate	0.07138
zero_rate	0.1754
alert: high_skew	skew=+4.10
alert: outliers	7.1% rows beyond 1.5 IQR

Fig 12.

Distribution of uninsured_rate. Vertical dash marks the median.

Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1

poverty_rate numeric feature

This column is a poverty rate (likely the percentage of the population below a poverty threshold) for all 3,222 records, with no nulls and 1,719 unique values. The distribution is right-skewed (skew = 2.10, kurtosis = 6.89), with a median of 13.55% and a mean pulled up to 15.10% by a long upper tail reaching 66.32%, nearly 5× the median. There are 137 flagged outliers (4.25% of rows), all in the upper tail because the lower 1.5 × IQR fence falls below zero. They likely represent unusually deprived counties and warrant special attention in any model.

Treatment: Log-transform or apply a Box-Cox transformation before regression to reduce skew; inspect the 137 outliers above the upper fence for data quality or genuine extreme cases.
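
The report's outlier alerts describe rows "beyond 1.5 IQR", which matches the usual Tukey fence rule. Assuming that is the rule applied here, the fences for this column can be recomputed from the reported quartiles; `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and range from the poverty_rate stats.
lower, upper = tukey_fences(q1=10.16, q3=17.91)
col_min, col_max = 1.6, 66.32

print(f"fences: [{lower:.3f}, {upper:.3f}]")
print("low-side outliers possible:", col_min < lower)
print("high-side outliers possible:", col_max > upper)
```

Because the lower fence is negative and the column minimum is 1.6, every flagged row under this rule sits above the upper fence.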

anthropic:default · confidence high

Out[28]:

saturn.columns["poverty_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	1,719
min	1.6
max	66.32
mean	15.1
median	13.55
std	7.706
q1	10.16
q3	17.91
iqr	7.75
skew	2.096
kurtosis	6.891
n_outliers	137
outlier_rate	0.04252
zero_rate	0
alert: high_skew	skew=+2.10

Fig 13.

Distribution of poverty_rate. Vertical dash marks the median.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

rural categorical feature

This column is a binary flag indicating whether a county is rural, apparently stored as the strings 'True'/'False' rather than a native boolean. The dominant class is 'True' (2,212 of 3,222 rows, ~68.7%), so the dataset is skewed toward rural counties and analysts should account for this imbalance in any comparative or predictive analysis. The counts match rural_category exactly (2,212 Rural, 1,010 Urban/Suburban), so the two columns appear to encode the same information.

Treatment: Cast to boolean, then use as a binary feature or stratification variable; monitor class imbalance (~2:1 rural vs. non-rural) during modelling.
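
If the values really are the strings 'True' and 'False' (the report infers this but does not show raw rows), the cast needs an explicit mapping, because Python treats any non-empty string as truthy. `parse_bool` is a hypothetical helper.

```python
def parse_bool(text: str) -> bool:
    """Map the strings 'True'/'False' (any case, padded) to a real boolean."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"unexpected boolean label: {text!r}")


# bool() on a non-empty string is always True, so it cannot be used here.
print(bool("False"))        # True
print(parse_bool("False"))  # False
```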

anthropic:default · confidence high

Out[31]:

saturn.columns["rural"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	True
top_rate	0.6865
cardinality	2
entropy	0.8971
entropy_ratio	0.8971

Fig 14.

Top values for rural.

Top values for rural (2 unique shown, of 2 total).
value	count	share
True	2212	68.7%
False	1010	31.3%

rural_category categorical label

This column is a binary geographic classification indicating whether a record is from a Rural or Urban/Suburban setting. Across 3,222 records with no nulls, 'Rural' dominates at 68.7% (2,212 records) versus 'Urban/Suburban' at 31.3% (1,010 records), representing a meaningful class imbalance. The entropy ratio of 0.897 confirms the distribution is moderately skewed but not extreme. Analysts should be aware this imbalance could bias models trained without stratification or reweighting.

Treatment: One-hot or binary encode; consider stratified sampling or class weighting to address 69/31 imbalance.
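
The entropy figures reported for this column can be reproduced from the two counts. The sketch assumes Saturn uses base-2 Shannon entropy and divides by log2 of the number of categories for the ratio; that definition is an assumption, and `entropy_bits` and `entropy_ratio` are hypothetical helpers.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy (base 2) of a categorical distribution given its counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


rural_counts = [2212, 1010]   # Rural, Urban/Suburban
print(round(entropy_bits(rural_counts), 4), round(entropy_ratio(rural_counts), 4))
```

With exactly two categories the maximum entropy is 1 bit, which would explain why the entropy and entropy_ratio rows show the same value.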

anthropic:default · confidence high

Out[34]:

saturn.columns["rural_category"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	Rural
top_rate	0.6865
cardinality	2
entropy	0.8971
entropy_ratio	0.8971

Fig 15.

Top values for rural_category.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

hospital_closure_risk_score numeric feature

This column is presented as a continuous risk score but contains only 3 unique values across 3,222 rows: 0, 25, and 50, which match the min and Q1 (0), the median and Q3 (25), and the max (50). The histogram confirms the tiers, with 929 rows at 0, 1,790 at 25, and 503 at 50. This makes it a de facto ordinal variable with three levels despite its numeric framing. Notably, 28.8% of records are zero.

Treatment: Treat as ordinal categorical with three levels (0/25/50); one-hot encode or ordinal-encode before modelling rather than using raw numeric values.
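
A small sketch of the ordinal option, using the tier counts read from the histogram for this column. It also recomputes the row count and mean from those counts as a consistency check against the stats. `encode_score` is a hypothetical helper.

```python
# Tier counts read from the histogram (929 at 0, 1,790 at 25, 503 at 50).
tier_counts = {0: 929, 25: 1790, 50: 503}

# Ordinal codes 0/1/2 preserve the tier order; one-hot encoding would drop it.
ordinal_code = {score: rank for rank, score in enumerate(sorted(tier_counts))}


def encode_score(score: int) -> int:
    """Map a raw score (0, 25, or 50) to its ordinal rank (0, 1, or 2)."""
    try:
        return ordinal_code[score]
    except KeyError:
        raise ValueError(f"unexpected score tier: {score}") from None


n_rows = sum(tier_counts.values())
mean_score = sum(score * n for score, n in tier_counts.items()) / n_rows
print(n_rows, round(mean_score, 2), [encode_score(s) for s in (0, 25, 50)])
```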

anthropic:default · confidence high

Out[37]:

saturn.columns["hospital_closure_risk_score"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3
min	0
max	50
mean	21.69
median	25
std	16.34
q1	0
q3	25
iqr	25
skew	0.1414
kurtosis	-0.6949
n_outliers	0
outlier_rate	0
zero_rate	0.2883

Fig 16.

Distribution of hospital_closure_risk_score. Vertical dash marks the median.

Histogram bins for hospital_closure_risk_score (median: 25.0).
bin	count
0 – 1.25	929
1.25 – 2.5	0
2.5 – 3.75	0
3.75 – 5	0
5 – 6.25	0
6.25 – 7.5	0
7.5 – 8.75	0
8.75 – 10	0
10 – 11.25	0
11.25 – 12.5	0
12.5 – 13.75	0
13.75 – 15	0
15 – 16.25	0
16.25 – 17.5	0
17.5 – 18.75	0
18.75 – 20	0
20 – 21.25	0
21.25 – 22.5	0
22.5 – 23.75	0
23.75 – 25	0
25 – 26.25	1790
26.25 – 27.5	0
27.5 – 28.75	0
28.75 – 30	0
30 – 31.25	0
31.25 – 32.5	0
32.5 – 33.75	0
33.75 – 35	0
35 – 36.25	0
36.25 – 37.5	0
37.5 – 38.75	0
38.75 – 40	0
40 – 41.25	0
41.25 – 42.5	0
42.5 – 43.75	0
43.75 – 45	0
45 – 46.25	0
46.25 – 47.5	0
47.5 – 48.75	0
48.75 – 50	503

risk_category categorical label

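This column is a binary risk classification label with exactly two categories: 'Low' and 'Moderate'. The distribution is heavily skewed: 'Low' accounts for 84.4% of all 3,222 rows (2,719 records), leaving only 503 'Moderate' cases. These counts line up with hospital_closure_risk_score: 929 + 1,790 = 2,719 rows score 0 or 25, and 503 rows score 50, so the label looks like a binning of the score (0 and 25 as 'Low', 50 as 'Moderate'), which a row-level cross-tabulation would confirm. The complete absence of higher risk tiers (e.g., 'High', 'Critical') is notable and may indicate data filtering, a low-risk population, or an incomplete taxonomy. With zero nulls and only two values, the column is clean but class-imbalanced.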

Treatment: Encode as binary (0/1) and apply class-imbalance handling (e.g., SMOTE or class weights) before modelling.
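
For the class-weight option, one common heuristic weights each class by n_samples / (n_classes * class_count), the same formula behind scikit-learn's "balanced" setting. The sketch below applies it to the two counts reported for this column; `balanced_class_weights` is a hypothetical helper, and whether weighting suits a given model is left open.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Counts from the risk_category top-values table.
weights = balanced_class_weights({"Low": 2719, "Moderate": 503})
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under these weights each class contributes the same total weight, so the 503 'Moderate' rows count as much in aggregate as the 2,719 'Low' rows.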

anthropic:default · confidence high

Out[40]:

saturn.columns["risk_category"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	Low
top_rate	0.8439
cardinality	2
entropy	0.6249
entropy_ratio	0.6249

Fig 17.

Top values for risk_category.

Top values for risk_category (2 unique shown, of 2 total).
value	count	share
Low	2719	84.4%
Moderate	503	15.6%
