healthcare-healthcare_desert_merged

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/healthcare/healthcare_desert_merged.csv

Saturn profiled 3,222 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

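!pip install saturn-dissect
import subprocess

# Run the Saturn CLI on the CSV with the flags recorded in this notebook.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/us-inequality-atlas/healthcare/healthcare_desert_merged.csv",
    "--findings", "healthcare-healthcare_desert_merged.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if the command exits non-zero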

Summary confidence: high

This dataset profiles 3,222 U.S. counties and county equivalents (one row per unit, keyed by FIPS) with population, uninsured counts and rates, poverty rate, a hospital closure risk score, and rural/urban flags. Population and uninsured figures are extremely right-skewed (total_pop skew 13.4, uninsured_pop skew 17.8), so a handful of large counties will dominate any raw totals; analysis should likely use rates or log scales. The hospital_closure_risk_score collapses to just 3 distinct values (with ~29% scoring 0), and risk_category is heavily imbalanced, with 84% of counties labeled 'Low' and the rest 'Moderate'; both are worth examining first. About 69% of counties are flagged Rural, so rural/urban comparisons of uninsured and poverty rates should be a productive next cut.

citing: total_pop · uninsured_pop · uninsured_rate · poverty_rate · hospital_closure_risk_score · risk_category · rural_category

Out[4]:

saturn.schema() · 10 columns

column	kind	n	null%	unique	alerts
fips	numeric	3,222	0.0%	3,222
county_name	text	3,222	0.0%	3,222	near_unique
total_pop	numeric	3,222	0.0%	3,141	high_skew outliers
uninsured_pop	numeric	3,222	0.0%	584	high_skew outliers
uninsured_rate	numeric	3,222	0.0%	152	high_skew outliers
poverty_rate	numeric	3,222	0.0%	1,719	high_skew
rural	categorical	3,222	0.0%	2
rural_category	categorical	3,222	0.0%	2
hospital_closure_risk_score	numeric	3,222	0.0%	3
risk_category	categorical	3,222	0.0%	2

Fig 1.

risk_category · Shows the strong class imbalance — about 84% of counties fall in the Low risk bucket.

Top values for risk_category (2 unique shown, of 2 total).
value	count	share
Low	2719	84.4%
Moderate	503	15.6%

Fig 2.

hospital_closure_risk_score · Only three distinct score values appear, so a bar of value counts reveals the underlying scoring buckets.

Histogram bins for hospital_closure_risk_score (median: 25.0). Only non-empty bins are listed; the other 37 of 40 bins have a count of 0.
bin	count
0 – 1.25	929
25 – 26.25	1790
48.75 – 50	503

Fig 3.

rural_category · Roughly two-thirds of counties are Rural, framing how to segment every other metric.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

Fig 4.

uninsured_rate · Right-skewed distribution of county uninsured rates; watch the long tail above the 0.25 third quartile.

Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1

Fig 5.

poverty_rate · Poverty rate spreads from 1.6 to 66.3 with a median of 13.55 — useful for spotting high-poverty outlier counties.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
fips	numeric	0.0%
county_name	text	0.0%
total_pop	numeric	0.0%
uninsured_pop	numeric	0.0%
uninsured_rate	numeric	0.0%
poverty_rate	numeric	0.0%
rural	categorical	0.0%
rural_category	categorical	0.0%
hospital_closure_risk_score	numeric	0.0%
risk_category	categorical	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
	fips	total_pop	uninsured_pop	uninsured_rate	poverty_rate	hospital_closure_risk_score
fips	+1.00	-0.07	-0.02	+0.01	+0.16	+0.01
total_pop	-0.07	+1.00	+0.81	-0.05	-0.11	-0.31
uninsured_pop	-0.02	+0.81	+1.00	+0.12	-0.09	-0.27
uninsured_rate	+0.01	-0.05	+0.12	+1.00	-0.04	+0.05
poverty_rate	+0.16	-0.11	-0.09	-0.04	+1.00	+0.58
hospital_closure_risk_score	+0.01	-0.31	-0.27	+0.05	+0.58	+1.00
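
For readers who want to reproduce a cell of this matrix by hand, the following is a minimal sketch of the Pearson coefficient in plain Python. The input values are toy numbers, not rows from the dataset, and the `pearson` helper is illustrative rather than Saturn's own implementation. Because hospital_closure_risk_score takes only three levels, a coefficient such as the +0.58 against poverty_rate treats those levels as evenly spaced numbers, which is an assumption worth keeping in mind.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-deviations and the two root sums of squared deviations.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)

# Toy values, not rows from the dataset.
poverty = [8.0, 12.0, 15.0, 22.0, 30.0]
score = [0, 25, 25, 50, 50]
r = pearson(poverty, score)
print(f"r = {r:+.2f}")
```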

fips numeric identifier

This is the FIPS county code: 3,222 rows with 3,222 unique values, no nulls, and a min of 1001 / max of 72153, consistent with the U.S. county FIPS scheme (state code * 1000 + county code). Values spread fairly evenly across the state-code range (skew 0.16, kurtosis -0.63, no outliers), apart from an empty stretch between the 56xxx and 72xxx codes, which confirms that the column indexes geography rather than measuring anything. Treat it as a categorical key, not a quantity, despite the numeric dtype.

Treatment: Cast to a zero-padded 5-character string (1001 becomes "01001") and left-join on this county FIPS code; do not use as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
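
The treatment for fips can be sketched with the standard library alone. The rows, the `extra` lookup table, and the `normalize_fips` helper below are hypothetical and only illustrate the zero-padding and the left-join behaviour; the poverty values are made up.

```python
def normalize_fips(code) -> str:
    """Return a 5-character county FIPS string, e.g. 1001 -> '01001'."""
    return str(int(code)).zfill(5)

# Hypothetical rows: numeric FIPS as in the profiled table (values made up).
counties = [
    {"fips": 1001, "poverty_rate": 13.55},
    {"fips": 72153, "poverty_rate": 40.0},
]
# Hypothetical second table keyed by string FIPS.
extra = {"01001": {"state": "AL"}}

# Left join: keep every county row, fill from `extra` when a match exists.
joined = []
for row in counties:
    key = normalize_fips(row["fips"])
    joined.append({**row, "fips": key, **extra.get(key, {"state": None})})

for row in joined:
    print(row["fips"], row["state"])
```

Keeping the key as a string preserves the leading zero that a numeric dtype drops for state codes 01 to 09.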

Out[13]:

saturn.columns["fips"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
min	1,001
max	72,153
mean	3.138e+04
median	30,022
std	1.63e+04
q1	1.903e+04
q3	4.61e+04
iqr	27,075
skew	0.1574
kurtosis	-0.6314
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 8.

Distribution of fips. Vertical dash marks the median.

Histogram bins for fips (median: 30022.0).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 1.167e+04	4
1.167e+04 – 1.345e+04	226
1.345e+04 – 1.523e+04	5
1.523e+04 – 1.701e+04	49
1.701e+04 – 1.879e+04	189
1.879e+04 – 2.057e+04	204
2.057e+04 – 2.235e+04	184
2.235e+04 – 2.413e+04	39
2.413e+04 – 2.59e+04	15
2.59e+04 – 2.768e+04	170
2.768e+04 – 2.946e+04	196
2.946e+04 – 3.124e+04	150
3.124e+04 – 3.302e+04	27
3.302e+04 – 3.48e+04	21
3.48e+04 – 3.658e+04	95
3.658e+04 – 3.836e+04	153
3.836e+04 – 4.013e+04	155
4.013e+04 – 4.191e+04	46
4.191e+04 – 4.369e+04	67
4.369e+04 – 4.547e+04	51
4.547e+04 – 4.725e+04	161
4.725e+04 – 4.903e+04	268
4.903e+04 – 5.081e+04	29
5.081e+04 – 5.259e+04	133
5.259e+04 – 5.436e+04	94
5.436e+04 – 5.614e+04	95
5.614e+04 – 5.792e+04	0
5.792e+04 – 5.97e+04	0
5.97e+04 – 6.148e+04	0
6.148e+04 – 6.326e+04	0
6.326e+04 – 6.504e+04	0
6.504e+04 – 6.682e+04	0
6.682e+04 – 6.86e+04	0
6.86e+04 – 7.037e+04	0
7.037e+04 – 7.215e+04	78

county_name text identifier

This column holds fully qualified U.S. county names (e.g. 'X County, State'), with all 3,222 values unique and no nulls. The token 'county,' appears 2,999 times, confirming a 'County, ' format, while the remaining 223 rows likely use alternate suffixes such as Parish or Borough. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). These are token counts rather than per-state row counts, so a token such as 'Virginia' also matches West Virginia rows, and the figures should be read as approximate.

Treatment: Use as a join key after splitting into county and state components.

anthropic:claude-opus-4-7 · confidence high
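
A minimal sketch of the split, assuming every value follows the 'Name, State' pattern described above. The `split_county_name` helper and the sample names are illustrative; splitting at the last ", " keeps any comma inside the county part intact.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' into ('X County', 'State') at the last ', '."""
    county, sep, state = name.rpartition(", ")
    if not sep or not county or not state:
        raise ValueError(f"unexpected county_name format: {name!r}")
    return county, state

# Illustrative names in the format described above.
for name in ["Autauga County, Alabama", "Orleans Parish, Louisiana"]:
    print(split_county_name(name))
```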

Out[16]:

saturn.columns["county_name"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.32
len_median	24
len_p95	31
word_mean	3.248
word_median	3
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,990
readability_flesch_mean	10.28
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 9.

Character-length distribution for county_name.

Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

total_pop numeric feature

This is almost certainly a population count per geographic unit (likely U.S. counties given n=3,222), with values ranging from 47 to 9,866,623 and a median of 25,328. The distribution is severely right-skewed (skew 13.38, kurtosis 298.69): the mean (102,232) is about four times the median, 453 rows (14.06%) are flagged as outliers, and the standard deviation of 326,934 dwarfs the IQR of 54,579. There are no nulls or zeros, and 3,141 of 3,222 values are unique.

Treatment: Log-transform before any modelling or distance-based analysis to tame the extreme right skew.

anthropic:claude-opus-4-7 · confidence high
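
To see what the log transform does to this column, the sketch below applies `math.log10` to the reported min, median, and max of total_pop. The choice of base 10 is an assumption made for readability; any base compresses the tail in the same way.

```python
import math

# Min, median, and max of total_pop as reported above.
reported = {"min": 47, "median": 25_328, "max": 9_866_623}

logged = {name: math.log10(value) for name, value in reported.items()}

raw_ratio = reported["max"] / reported["median"]
log_gap = logged["max"] - logged["median"]

for name, value in logged.items():
    print(f"{name:>6}: raw={reported[name]:>9,} log10={value:.2f}")
print(f"max is {raw_ratio:.0f}x the median, but only {log_gap:.2f} log10 units above it")
```

The column has no zeros, so a plain logarithm is defined for every row.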

Out[19]:

saturn.columns["total_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,141
min	47
max	9.867e+06
mean	1.022e+05
median	25,328
std	3.269e+05
q1	1.061e+04
q3	65,190
iqr	5.458e+04
skew	13.38
kurtosis	298.7
n_outliers	453
outlier_rate	0.1406
zero_rate	0
alert: high_skew	skew=+13.38
alert: outliers	14.1% rows beyond 1.5 IQR

Fig 10.

Distribution of total_pop. Vertical dash marks the median.

Histogram bins for total_pop (median: 25328.0).
bin	count
47 – 2.467e+05	2942
2.467e+05 – 4.934e+05	137
4.934e+05 – 7.4e+05	56
7.4e+05 – 9.867e+05	39
9.867e+05 – 1.233e+06	13
1.233e+06 – 1.48e+06	9
1.48e+06 – 1.727e+06	7
1.727e+06 – 1.973e+06	3
1.973e+06 – 2.22e+06	3
2.22e+06 – 2.467e+06	4
2.467e+06 – 2.713e+06	3
2.713e+06 – 2.96e+06	0
2.96e+06 – 3.207e+06	2
3.207e+06 – 3.453e+06	0
3.453e+06 – 3.7e+06	0
3.7e+06 – 3.947e+06	0
3.947e+06 – 4.193e+06	0
4.193e+06 – 4.44e+06	1
4.44e+06 – 4.687e+06	0
4.687e+06 – 4.933e+06	1
4.933e+06 – 5.18e+06	0
5.18e+06 – 5.427e+06	1
5.427e+06 – 5.673e+06	0
5.673e+06 – 5.92e+06	0
5.92e+06 – 6.167e+06	0
6.167e+06 – 6.413e+06	0
6.413e+06 – 6.66e+06	0
6.66e+06 – 6.907e+06	0
6.907e+06 – 7.153e+06	0
7.153e+06 – 7.4e+06	0
7.4e+06 – 7.647e+06	0
7.647e+06 – 7.893e+06	0
7.893e+06 – 8.14e+06	0
8.14e+06 – 8.387e+06	0
8.387e+06 – 8.633e+06	0
8.633e+06 – 8.88e+06	0
8.88e+06 – 9.127e+06	0
9.127e+06 – 9.373e+06	0
9.373e+06 – 9.62e+06	0
9.62e+06 – 9.867e+06	1

uninsured_pop numeric feature

Counts of uninsured residents per record, with values ranging from 0 to 20,915 across 3,222 rows and no nulls. The distribution is severely right-skewed (skew 17.81, kurtosis 462.87): the median is 36 while the mean is 159.95, and 17.2% of rows are zero. 368 outliers (11.4%) sit far above the Q3 of 120, consistent with a few very large populations dominating the tail. The counts are small relative to total_pop (median 36 against a median population of 25,328), so confirm which population this column actually counts before summing it.

Treatment: Apply a log1p transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
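
Because 17.2% of the rows are zero, a plain logarithm is undefined for part of this column, which is why log1p is suggested. The sketch below applies `math.log1p` to the reported five-number summary and checks that `math.expm1` inverts it.

```python
import math

# Reported min, q1, median, q3, and max of uninsured_pop.
counts = [0, 7, 36, 120, 20_915]

transformed = [math.log1p(c) for c in counts]    # log(1 + c), defined at c = 0
restored = [math.expm1(t) for t in transformed]  # inverse transform

for c, t in zip(counts, transformed):
    print(f"{c:>6} -> {t:.3f}")
```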

Out[22]:

saturn.columns["uninsured_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	584
min	0
max	20,915
mean	159.9
median	36
std	627.2
q1	7
q3	120
iqr	113
skew	17.81
kurtosis	462.9
n_outliers	368
outlier_rate	0.1142
zero_rate	0.1723
alert: high_skew	skew=+17.81
alert: outliers	11.4% rows beyond 1.5 IQR

Fig 11.

Distribution of uninsured_pop. Vertical dash marks the median.

Histogram bins for uninsured_pop (median: 36.0).
bin	count
0 – 522.9	3022
522.9 – 1046	124
1046 – 1569	32
1569 – 2092	16
2092 – 2614	7
2614 – 3137	5
3137 – 3660	5
3660 – 4183	2
4183 – 4706	0
4706 – 5229	1
5229 – 5752	2
5752 – 6274	1
6274 – 6797	0
6797 – 7320	0
7320 – 7843	0
7843 – 8366	1
8366 – 8889	1
8889 – 9412	0
9412 – 9935	0
9935 – 1.046e+04	0
1.046e+04 – 1.098e+04	0
1.098e+04 – 1.15e+04	2
1.15e+04 – 1.203e+04	0
1.203e+04 – 1.255e+04	0
1.255e+04 – 1.307e+04	0
1.307e+04 – 1.359e+04	0
1.359e+04 – 1.412e+04	0
1.412e+04 – 1.464e+04	0
1.464e+04 – 1.516e+04	0
1.516e+04 – 1.569e+04	0
1.569e+04 – 1.621e+04	0
1.621e+04 – 1.673e+04	0
1.673e+04 – 1.725e+04	0
1.725e+04 – 1.778e+04	0
1.778e+04 – 1.83e+04	0
1.83e+04 – 1.882e+04	0
1.882e+04 – 1.935e+04	0
1.935e+04 – 1.987e+04	0
1.987e+04 – 2.039e+04	0
2.039e+04 – 2.092e+04	1

uninsured_rate numeric feature

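This appears to be an uninsured rate per record, ranging from 0.0 to 3.7 with a median of 0.12. If the column is a proportion, the maximum of 3.7 is impossible, since a proportion caps at 1.0. The counts point to another reading: the median uninsured_pop (36) divided by the median total_pop (25,328) is about 0.0014, or 0.14%, which is close to the 0.12 median here, so the values may be percentages rather than proportions. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70) with 230 outliers (7.1%) and 17.5% exact zeros.

Treatment: Recompute uninsured_pop / total_pop to settle the unit, investigate any values that remain implausible, then log-transform or winsorize before modelling.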

anthropic:claude-opus-4-7 · confidence high
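
One way to investigate the values above 1.0 is to recompute the rate from the two count columns and see which scale the stored value is closer to. The `likely_unit` helper below is a hypothetical sketch, and its example input is built from the reported medians rather than from an actual row.

```python
def likely_unit(uninsured: float, total: float, reported_rate: float) -> str:
    """Guess whether reported_rate is a proportion or a percentage.

    Compares the reported value with uninsured / total on both scales and
    returns the label of the closer one.
    """
    if total <= 0:
        raise ValueError("total population must be positive")
    proportion = uninsured / total
    percent = 100.0 * proportion
    if abs(reported_rate - percent) < abs(reported_rate - proportion):
        return "percent"
    return "proportion"

# Illustrative inputs built from the reported medians, not an actual row.
print(likely_unit(uninsured=36, total=25_328, reported_rate=0.12))
```

Running the check per row and tallying the labels would show whether the column uses one unit throughout or mixes two.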

Out[25]:

saturn.columns["uninsured_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	152
min	0
max	3.7
mean	0.2002
median	0.12
std	0.2829
q1	0.04
q3	0.25
iqr	0.21
skew	4.095
kurtosis	27.7
n_outliers	230
outlier_rate	0.07138
zero_rate	0.1754
alert: high_skew	skew=+4.10
alert: outliers	7.1% rows beyond 1.5 IQR

Fig 12.

Distribution of uninsured_rate. Vertical dash marks the median.

Histogram bins for uninsured_rate (median: 0.12).
bin	count
0 – 0.0925	1403
0.0925 – 0.185	704
0.185 – 0.2775	403
0.2775 – 0.37	213
0.37 – 0.4625	158
0.4625 – 0.555	101
0.555 – 0.6475	65
0.6475 – 0.74	43
0.74 – 0.8325	27
0.8325 – 0.925	23
0.925 – 1.018	9
1.018 – 1.11	15
1.11 – 1.202	14
1.202 – 1.295	5
1.295 – 1.387	7
1.387 – 1.48	7
1.48 – 1.573	5
1.573 – 1.665	2
1.665 – 1.758	4
1.758 – 1.85	1
1.85 – 1.942	1
1.942 – 2.035	1
2.035 – 2.127	2
2.127 – 2.22	2
2.22 – 2.312	1
2.312 – 2.405	0
2.405 – 2.498	0
2.498 – 2.59	1
2.59 – 2.683	0
2.683 – 2.775	1
2.775 – 2.868	0
2.868 – 2.96	1
2.96 – 3.052	1
3.052 – 3.145	0
3.145 – 3.237	1
3.237 – 3.33	0
3.33 – 3.422	0
3.422 – 3.515	0
3.515 – 3.607	0
3.607 – 3.7	1

poverty_rate numeric feature

This is a numeric poverty rate (likely percentage of population in poverty) across 3222 rows with no nulls and 1719 unique values. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with a median of 13.55 and mean 15.10, ranging from 1.6 to 66.32; 137 outliers (4.25%) sit in the upper tail. The high skew alert means a long tail of high-poverty units pulls the mean above the median.

Treatment: Consider a log or sqrt transform before regression to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
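
The outlier counts in this report are described as rows beyond 1.5 IQR. Assuming that means the usual Tukey fences (an assumption about Saturn's rule, not something the report states in full), the cut-offs for poverty_rate follow directly from the reported quartiles. The `sample` list mixes the reported min and max with made-up values.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of poverty_rate as reported in the stats table.
lower, upper = tukey_fences(q1=10.16, q3=17.91)
print(f"lower fence = {lower:.3f}, upper fence = {upper:.3f}")

# Reported min and max plus made-up values in between.
sample = [1.6, 13.55, 29.0, 31.2, 66.32]
flagged = [v for v in sample if v < lower or v > upper]
print(flagged)
```

The lower fence is negative, so under this rule only the upper tail of poverty_rate can be flagged.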

Out[28]:

saturn.columns["poverty_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	1,719
min	1.6
max	66.32
mean	15.1
median	13.55
std	7.706
q1	10.16
q3	17.91
iqr	7.75
skew	2.096
kurtosis	6.891
n_outliers	137
outlier_rate	0.04252
zero_rate	0
alert: high_skew	skew=+2.10

Fig 13.

Distribution of poverty_rate. Vertical dash marks the median.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

rural categorical feature

Binary flag indicating whether a record is rural, stored as the strings "True"/"False" rather than booleans. The split is imbalanced toward rural at 68.7% (2212 of 3222) versus 1010 non-rural, with no nulls. Entropy ratio of 0.897 confirms a meaningful but skewed distribution.

Treatment: Cast string "True"/"False" to a 0/1 boolean and use directly as a feature.

anthropic:claude-opus-4-7 · confidence high
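
Casting the string flag needs an explicit mapping, because Python's `bool("False")` is `True` (any non-empty string is truthy). A minimal sketch, with `parse_flag` as an illustrative helper name:

```python
_FLAG_MAP = {"True": True, "False": False}

def parse_flag(value: str) -> bool:
    """Map the strings 'True' / 'False' to booleans; reject anything else."""
    try:
        return _FLAG_MAP[value.strip()]
    except KeyError:
        raise ValueError(f"unexpected flag value: {value!r}") from None

raw = ["True", "False", "True"]
as_int = [int(parse_flag(v)) for v in raw]  # 0/1 encoding for modelling
print(as_int)
```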

Out[31]:

saturn.columns["rural"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	True
top_rate	0.6865
cardinality	2
entropy	0.8971
entropy_ratio	0.8971

Fig 14.

Top values for rural.

Top values for rural (2 unique shown, of 2 total).
value	count	share
True	2212	68.7%
False	1010	31.3%

rural_category categorical feature

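Binary categorical flag splitting records into 'Rural' (2,212, 68.7%) and 'Urban/Suburban' (1,010), with no nulls across 3,222 rows. The split is moderately imbalanced, but an entropy ratio of 0.90 indicates both classes are well represented. The counts match the rural flag exactly, so the two columns likely carry the same information under different labels. This is a clean two-level partition suitable as a stratifier or feature.

Treatment: One-hot or binary-encode for modelling; consider stratifying splits on this flag, and keep only one of rural and rural_category as a feature if they prove identical.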

anthropic:claude-opus-4-7 · confidence high
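
The entropy and entropy_ratio figures can be reproduced from the value counts. The sketch below assumes Shannon entropy in bits, normalised by the log of the cardinality; that normalisation is an assumption about how Saturn defines the ratio, although it agrees with the reported 0.8971 for this column.

```python
import math

def entropy_ratio(counts):
    """Return (entropy in bits, entropy / log2(number of categories))."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    h = -sum(p * math.log2(p) for p in probs)
    h_max = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h, h / h_max

# Rural / Urban/Suburban counts from the table for this column.
h, ratio = entropy_ratio([2212, 1010])
print(f"entropy = {h:.4f} bits, ratio = {ratio:.4f}")
```

With two categories the maximum entropy is 1 bit, so the entropy and the ratio coincide.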

Out[34]:

saturn.columns["rural_category"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	Rural
top_rate	0.6865
cardinality	2
entropy	0.8971
entropy_ratio	0.8971

Fig 15.

Top values for rural_category.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

hospital_closure_risk_score numeric feature

Despite being typed as numeric, hospital_closure_risk_score takes only 3 distinct values across 3222 rows, spanning 0 to 50 with a median of 25 and roughly 28.8% zeros. This is effectively an ordinal risk band (likely 0/25/50) masquerading as a continuous score, so the reported mean of 21.69 and std of 16.34 reflect category mix rather than a smooth distribution.

Treatment: Treat as an ordinal categorical (low/medium/high) rather than a continuous numeric.

anthropic:claude-opus-4-7 · confidence high
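
A minimal sketch of the ordinal treatment, assuming the three observed values are 0, 25, and 50 as the histogram bins suggest. The band names follow the low/medium/high wording of the treatment and are otherwise hypothetical; they are not the labels used in risk_category.

```python
from collections import Counter

# Assumed mapping from the three observed scores to ordered bands.
SCORE_TO_BAND = {0: "low", 25: "medium", 50: "high"}
BAND_ORDER = ["low", "medium", "high"]

def to_band(score) -> str:
    """Map a score to its band; reject values outside the three levels."""
    if score not in SCORE_TO_BAND:
        raise ValueError(f"unexpected score: {score!r}")
    return SCORE_TO_BAND[score]

# Rebuild the column from the histogram counts for illustration.
scores = [0] * 929 + [25] * 1790 + [50] * 503
bands = Counter(to_band(s) for s in scores)
codes = {band: BAND_ORDER.index(band) for band in BAND_ORDER}  # ordinal codes 0, 1, 2

print(dict(bands))
print(codes)
```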

Out[37]:

saturn.columns["hospital_closure_risk_score"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3
min	0
max	50
mean	21.69
median	25
std	16.34
q1	0
q3	25
iqr	25
skew	0.1414
kurtosis	-0.6949
n_outliers	0
outlier_rate	0
zero_rate	0.2883

Fig 16.

Distribution of hospital_closure_risk_score. Vertical dash marks the median.

Histogram bins for hospital_closure_risk_score (median: 25.0). Only non-empty bins are listed; the other 37 of 40 bins have a count of 0.
bin	count
0 – 1.25	929
25 – 26.25	1790
48.75 – 50	503

risk_category categorical label

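Binary risk classification flagging records as either Low or Moderate, with no nulls across 3,222 rows. The distribution is heavily imbalanced: 84.4% fall into Low (2,719) versus only 503 Moderate, and no High tier appears at all. An entropy ratio of 0.62 confirms the skew. The 503 Moderate rows equal the number of rows in the top hospital_closure_risk_score bin, so the label is probably derived from the score.

Treatment: Treat as binary target; account for class imbalance via stratified sampling or class weighting. If the label proves to be derived from hospital_closure_risk_score, exclude that score from the features to avoid leakage.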

anthropic:claude-opus-4-7 · confidence high
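
Class weighting can be sketched from the counts alone. The example below uses the inverse-frequency heuristic n / (k * count), the same formula scikit-learn documents for `class_weight="balanced"`; choosing that heuristic here is an assumption, not something the report prescribes.

```python
def balanced_weights(class_counts: dict) -> dict:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n = sum(class_counts.values())
    k = len(class_counts)
    return {label: n / (k * count) for label, count in class_counts.items()}

# Counts from the risk_category table.
counts = {"Low": 2719, "Moderate": 503}
weights = balanced_weights(counts)

for label, w in weights.items():
    print(f"{label}: weight {w:.4f}")
```

With these weights each class contributes the same total weight, so the 503 Moderate rows are not drowned out by the 2,719 Low rows.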

Out[40]:

saturn.columns["risk_category"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	Low
top_rate	0.8439
cardinality	2
entropy	0.6249
entropy_ratio	0.6249

Fig 17.

Top values for risk_category.

Top values for risk_category (2 unique shown, of 2 total).
value	count	share
Low	2719	84.4%
Moderate	503	15.6%
