saturn·

data trove healthcare deserts

saturn notebook · generated 2026-06-22

Overview

Source: /home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv

Saturn profiled 3,222 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
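# Install the profiler (notebook shell magic), then run it on the source CSV.
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv",
    "--findings", "data-trove-healthcare-deserts.json",
    "--llm", "anthropic:default",
], check=True)  # raise CalledProcessError if the CLI exits with a non-zero status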

Summary confidence: high

This dataset covers healthcare access indicators for 3,222 U.S. counties and county equivalents, combining population size, uninsured counts and rates, poverty rates, and hospital closure risk scores. The most striking pattern is the extreme right skew in both total population and uninsured population: the median county has just 25,328 residents and 36 uninsured individuals, yet the maxima reach nearly 10 million people and over 20,000 uninsured, so a small number of large counties dominate the raw counts. Two things warrant a closer look. First, 84% of counties are rated 'Low' hospital closure risk and nearly 29% score exactly zero on the closure risk score, which takes only 3 unique values, so the score behaves as a coarse tier rather than a continuous measure. Second, uninsured_rate ranges from 0 to 3.7 (median 0.12) with heavy right skew. Its units are not documented; the ratio of the median uninsured count to the median population (36 / 25,328 ≈ 0.14%) suggests the column is already in percent, so the maximum is more plausibly 3.7% than 370%. Either way, the upper tail points to pockets of coverage gaps worth isolating geographically. For context, 69% of counties are classified as Rural.

citing: row_count · column_count · total_pop.stats.median · total_pop.stats.max · uninsured_pop.stats.median · uninsured_pop.stats.max · uninsured_rate.stats.max · uninsured_rate.stats.median · hospital_closure_risk_score.n_unique · hospital_closure_risk_score.stats.zero_rate · risk_category.top_value · risk_category.top_rate · rural_category.top_value · rural_category.top_rate · poverty_rate.stats.median · poverty_rate.stats.max

Out[4]:

saturn.schema() · 10 columns

column                       kind         n      null%  unique  alerts
fips                         numeric      3,222  0.0%   3,222
county_name                  text         3,222  0.0%   3,222   near_unique
total_pop                    numeric      3,222  0.0%   3,141   high_skew, outliers
uninsured_pop                numeric      3,222  0.0%   584     high_skew, outliers
uninsured_rate               numeric      3,222  0.0%   152     high_skew, outliers
poverty_rate                 numeric      3,222  0.0%   1,719   high_skew
rural                        categorical  3,222  0.0%   2
rural_category               categorical  3,222  0.0%   2
hospital_closure_risk_score  numeric      3,222  0.0%   3
risk_category                categorical  3,222  0.0%   2
Fig 1.
uninsured_rate · Look for the heavy right tail — most counties cluster near low uninsured rates, but extreme outliers signal counties with severe coverage gaps.
Histogram bins for uninsured_rate (median: 0.12).
bin  count
0 – 0.0925  1403
0.0925 – 0.185  704
0.185 – 0.2775  403
0.2775 – 0.37  213
0.37 – 0.4625  158
0.4625 – 0.555  101
0.555 – 0.6475  65
0.6475 – 0.74  43
0.74 – 0.8325  27
0.8325 – 0.925  23
0.925 – 1.018  9
1.018 – 1.11  15
1.11 – 1.202  14
1.202 – 1.295  5
1.295 – 1.387  7
1.387 – 1.48  7
1.48 – 1.573  5
1.573 – 1.665  2
1.665 – 1.758  4
1.758 – 1.85  1
1.85 – 1.942  1
1.942 – 2.035  1
2.035 – 2.127  2
2.127 – 2.22  2
2.22 – 2.312  1
2.312 – 2.405  0
2.405 – 2.498  0
2.498 – 2.59  1
2.59 – 2.683  0
2.683 – 2.775  1
2.775 – 2.868  0
2.868 – 2.96  1
2.96 – 3.052  1
3.052 – 3.145  0
3.145 – 3.237  1
3.237 – 3.33  0
3.33 – 3.422  0
3.422 – 3.515  0
3.515 – 3.607  0
3.607 – 3.7  1
Fig 2.
risk_category · Notice how overwhelmingly 'Low' dominates — only about 16% of counties carry Moderate hospital closure risk.
Top values for risk_category (2 unique shown, of 2 total).
value  count  share
Low  2719  84.4%
Moderate  503  15.6%
Fig 3.
rural_category · Nearly 7 in 10 counties are Rural, which sets important context for interpreting healthcare access shortfalls across the dataset.
Top values for rural_category (2 unique shown, of 2 total).
value  count  share
Rural  2212  68.7%
Urban/Suburban  1010  31.3%
Fig 4.
poverty_rate · The distribution is right-skewed with a median near 13.6%; watch for the tail of counties with poverty rates between 40% and 66%.
Histogram bins for poverty_rate (median: 13.55).
bin  count
1.6 – 3.218  7
3.218 – 4.836  34
4.836 – 6.454  106
6.454 – 8.072  246
8.072 – 9.69  320
9.69 – 11.31  354
11.31 – 12.93  393
12.93 – 14.54  364
14.54 – 16.16  306
16.16 – 17.78  262
17.78 – 19.4  192
19.4 – 21.02  149
21.02 – 22.63  123
22.63 – 24.25  91
24.25 – 25.87  52
25.87 – 27.49  44
27.49 – 29.11  34
29.11 – 30.72  23
30.72 – 32.34  18
32.34 – 33.96  14
33.96 – 35.58  6
35.58 – 37.2  8
37.2 – 38.81  3
38.81 – 40.43  8
40.43 – 42.05  5
42.05 – 43.67  9
43.67 – 45.29  4
45.29 – 46.9  11
46.9 – 48.52  7
48.52 – 50.14  8
50.14 – 51.76  2
51.76 – 53.38  6
53.38 – 54.99  5
54.99 – 56.61  5
56.61 – 58.23  1
58.23 – 59.85  0
59.85 – 61.47  0
61.47 – 63.08  0
63.08 – 64.7  1
64.7 – 66.32  1
Fig 5.
hospital_closure_risk_score · With only 3 unique values (0, 25, 50), this chart shows that nearly 29% of counties score zero, which points to a coarse, tiered underlying measure rather than a continuous score.
Histogram bins for hospital_closure_risk_score (median: 25.0).
bin  count
0 – 1.25  929
1.25 – 2.5  0
2.5 – 3.75  0
3.75 – 5  0
5 – 6.25  0
6.25 – 7.5  0
7.5 – 8.75  0
8.75 – 10  0
10 – 11.25  0
11.25 – 12.5  0
12.5 – 13.75  0
13.75 – 15  0
15 – 16.25  0
16.25 – 17.5  0
17.5 – 18.75  0
18.75 – 20  0
20 – 21.25  0
21.25 – 22.5  0
22.5 – 23.75  0
23.75 – 25  0
25 – 26.25  1790
26.25 – 27.5  0
27.5 – 28.75  0
28.75 – 30  0
30 – 31.25  0
31.25 – 32.5  0
32.5 – 33.75  0
33.75 – 35  0
35 – 36.25  0
36.25 – 37.5  0
37.5 – 38.75  0
38.75 – 40  0
40 – 41.25  0
41.25 – 42.5  0
42.5 – 43.75  0
43.75 – 45  0
45 – 46.25  0
46.25 – 47.5  0
47.5 – 48.75  0
48.75 – 50  503
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column  kind  null %
fips  numeric  0.0%
county_name  text  0.0%
total_pop  numeric  0.0%
uninsured_pop  numeric  0.0%
uninsured_rate  numeric  0.0%
poverty_rate  numeric  0.0%
rural  categorical  0.0%
rural_category  categorical  0.0%
hospital_closure_risk_score  numeric  0.0%
risk_category  categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
column                       fips   total_pop  uninsured_pop  uninsured_rate  poverty_rate  hospital_closure_risk_score
fips                         +1.00  -0.07      -0.02          +0.01           +0.16         +0.01
total_pop                    -0.07  +1.00      +0.81          -0.05           -0.11         -0.31
uninsured_pop                -0.02  +0.81      +1.00          +0.12           -0.09         -0.27
uninsured_rate               +0.01  -0.05      +0.12          +1.00           -0.04         +0.05
poverty_rate                 +0.16  -0.11      -0.09          -0.04           +1.00         +0.58
hospital_closure_risk_score  +0.01  -0.31      -0.27          +0.05           +0.58         +1.00
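
For readers who want to reproduce a single cell of the matrix, here is a minimal sketch of the Pearson coefficient in plain Python. The helper name `pearson` and the sample rows are made up for illustration; how Saturn samples or bounds the data before computing the matrix is not specified in this report, so results on the real file may differ slightly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up rows for illustration only; these are NOT values from the dataset.
poverty_rate = [8.0, 12.5, 13.6, 21.0, 35.2]
risk_score = [0, 25, 0, 25, 50]

r = pearson(poverty_rate, risk_score)
print(f"{r:+.2f}")
```

Because hospital_closure_risk_score takes only three values, a rank-based coefficient may be a better fit than Pearson for that column; that is a modelling choice, not something the report prescribes.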

fips numeric identifier

This column contains US FIPS (Federal Information Processing Standard) county codes: five-digit identifiers made of a two-digit state code and a three-digit county code. Because the column is stored as a number, leading zeros are dropped, which is why the minimum appears as 1,001 rather than 01001. Every one of the 3,222 rows has a unique code with no nulls, close to the roughly 3,143 US counties and county equivalents plus territories; the maximum of 72153 and the 78 codes in the top histogram bin indicate that Puerto Rico is included. FIPS codes are categorical identifiers, so arithmetic on them is meaningless, and the reported skew (0.157) and kurtosis (-0.63) describe how the codes were assigned rather than any property of the counties.

Treatment: Cast to a zero-padded five-character string (or categorical) and use as a geographic join key; do not use as a numeric feature.

anthropic:default · confidence high
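
A minimal sketch of the cast described in the treatment, assuming the codes arrive as integers (or integer-like strings) with their leading zeros lost. The helper names `normalize_fips` and `state_fips` are hypothetical.

```python
def normalize_fips(code):
    """Return a county FIPS code as a zero-padded 5-character string."""
    text = str(int(code))  # accepts 1001, "1001", or 1001.0
    if len(text) > 5:
        raise ValueError(f"not a county FIPS code: {code!r}")
    return text.zfill(5)

def state_fips(code):
    """The first two characters of the padded code identify the state or territory."""
    return normalize_fips(code)[:2]

print(normalize_fips(1001))   # 01001
print(normalize_fips(72153))  # 72153
print(state_fips(72153))      # 72
```

Padding before joining matters because most reference tables key counties on the five-character string, so an unpadded "1001" would silently fail to match "01001".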
Out[13]:

saturn.columns["fips"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
min 1,001
max 72,153
mean 3.138e+04
median 30,022
std 1.63e+04
q1 1.903e+04
q3 4.61e+04
iqr 27,075
skew 0.1574
kurtosis -0.6314
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 8.
Distribution of fips. Vertical dash marks the median.
Histogram bins for fips (median: 30022.0).
bin  count
1001 – 2780  97
2780 – 4559  15
4559 – 6337  133
6337 – 8116  59
8116 – 9895  14
9895 – 1.167e+04  4
1.167e+04 – 1.345e+04  226
1.345e+04 – 1.523e+04  5
1.523e+04 – 1.701e+04  49
1.701e+04 – 1.879e+04  189
1.879e+04 – 2.057e+04  204
2.057e+04 – 2.235e+04  184
2.235e+04 – 2.413e+04  39
2.413e+04 – 2.59e+04  15
2.59e+04 – 2.768e+04  170
2.768e+04 – 2.946e+04  196
2.946e+04 – 3.124e+04  150
3.124e+04 – 3.302e+04  27
3.302e+04 – 3.48e+04  21
3.48e+04 – 3.658e+04  95
3.658e+04 – 3.836e+04  153
3.836e+04 – 4.013e+04  155
4.013e+04 – 4.191e+04  46
4.191e+04 – 4.369e+04  67
4.369e+04 – 4.547e+04  51
4.547e+04 – 4.725e+04  161
4.725e+04 – 4.903e+04  268
4.903e+04 – 5.081e+04  29
5.081e+04 – 5.259e+04  133
5.259e+04 – 5.436e+04  94
5.436e+04 – 5.614e+04  95
5.614e+04 – 5.792e+04  0
5.792e+04 – 5.97e+04  0
5.97e+04 – 6.148e+04  0
6.148e+04 – 6.326e+04  0
6.326e+04 – 6.504e+04  0
6.504e+04 – 6.682e+04  0
6.682e+04 – 6.86e+04  0
6.86e+04 – 7.037e+04  0
7.037e+04 – 7.215e+04  78

county_name text label

This column contains fully qualified county name strings, almost certainly formatted as '<name> County, <state>': the token 'county,' appears in 2,999 of 3,222 rows, the mean string length is about 24 characters, and the mean word count is about 3.25. Every row is unique (n_unique = 3,222, duplicate_rate = 0.0), which triggers the near_unique alert; that is expected for a geographic label combining county and state. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). Since each county appears exactly once, these counts most likely reflect how many counties those states have, plus tokens shared with other names (for example, 'Virginia' also occurs in 'West Virginia'), rather than overrepresentation in the dataset.

Treatment: Parse into county and state components for join or groupby operations; do not treat as a free-text feature.

anthropic:default · confidence high
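
A minimal sketch of the parse suggested in the treatment, assuming every value ends in ', <state>' as the profile implies. The helper name `split_county_name` and the two sample strings are illustrative; the exact spelling used in the file has not been checked.

```python
def split_county_name(name):
    """Split 'Some County, Some State' at the last ', ' into (county, state)."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county.strip(), state.strip()

# Illustrative inputs; the actual strings in the file may be formatted differently.
print(split_county_name("Autauga County, Alabama"))
print(split_county_name("Yauco Municipio, Puerto Rico"))
```

Splitting at the last separator keeps any commas inside the county part intact. Rows that raise `ValueError` are worth listing, since they would contradict the assumed format.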
Out[16]:

saturn.columns["county_name"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
len_min 16
len_max 59
len_mean 24.32
len_median 24
len_p95 31
word_mean 3.248
word_median 3
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 1,990
readability_flesch_mean 10.28
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
Fig 9.
Character-length distribution for county_name.
Character-length distribution for county_name (mean: 24.32).
chars  count
16 – 17  26
17 – 18  72
18 – 19  121
19 – 20  190
20 – 21  264
21 – 22  407
22 – 24  420
24 – 25  363
25 – 26  320
26 – 27  240
27 – 28  231
28 – 29  152
29 – 30  139
30 – 31  165
31 – 32  41
32 – 33  28
33 – 34  16
34 – 35  10
35 – 36  5
36 – 38  0
38 – 39  1
39 – 40  1
40 – 41  0
41 – 42  1
42 – 43  1
43 – 44  0
44 – 45  2
45 – 46  0
46 – 47  1
47 – 48  1
48 – 49  0
49 – 50  0
50 – 51  0
51 – 53  0
53 – 54  2
54 – 55  1
55 – 56  0
56 – 57  0
57 – 58  0
58 – 59  1

total_pop numeric feature

This column holds the total population of each county or county equivalent. The distribution is severely right-skewed (skew = 13.38, kurtosis = 298.69): the median is 25,328 while the mean is 102,232, a gap driven by a small number of very large counties; the maximum of 9,866,623 is roughly 390× the median. With 453 outliers (14.1% of rows) and a standard deviation of 326,934, raw values will distort any distance- or variance-sensitive model.

Treatment: Log-transform (log1p) before modelling to compress the extreme right tail.

anthropic:default · confidence high
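
The effect of the suggested log1p transform can be seen on the reported summary values alone. The sketch below applies `math.log1p` to the five-number summary from this report (q1 is taken as 10,610 from the rounded 1.061e+04) and compares the max-to-median ratio before and after; it does not touch the underlying rows.

```python
import math

# Summary values for total_pop as reported in this notebook.
reported = {
    "min": 47,
    "q1": 10_610,       # shown as 1.061e+04, so rounded
    "median": 25_328,
    "q3": 65_190,
    "max": 9_866_623,
}

# log1p(x) = log(1 + x); it is defined at zero, which plain log is not.
logged = {name: math.log1p(value) for name, value in reported.items()}

raw_ratio = reported["max"] / reported["median"]
log_ratio = logged["max"] / logged["median"]

for name in reported:
    print(f"{name:>6}: raw={reported[name]:>10,}  log1p={logged[name]:.2f}")
print(f"max/median raw:   {raw_ratio:.0f}x")
print(f"max/median log1p: {log_ratio:.2f}x")
```

On the log scale the largest county sits well under twice the median instead of hundreds of times above it, which is the compression the treatment is after.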
Out[19]:

saturn.columns["total_pop"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,141
min 47
max 9.867e+06
mean 1.022e+05
median 25,328
std 3.269e+05
q1 1.061e+04
q3 65,190
iqr 5.458e+04
skew 13.38
kurtosis 298.7
n_outliers 453
outlier_rate 0.1406
zero_rate 0
alert: high_skew skew=+13.38
alert: outliers 14.1% rows beyond 1.5 IQR
Fig 10.
Distribution of total_pop. Vertical dash marks the median.
Histogram bins for total_pop (median: 25328.0).
bin  count
47 – 2.467e+05  2942
2.467e+05 – 4.934e+05  137
4.934e+05 – 7.4e+05  56
7.4e+05 – 9.867e+05  39
9.867e+05 – 1.233e+06  13
1.233e+06 – 1.48e+06  9
1.48e+06 – 1.727e+06  7
1.727e+06 – 1.973e+06  3
1.973e+06 – 2.22e+06  3
2.22e+06 – 2.467e+06  4
2.467e+06 – 2.713e+06  3
2.713e+06 – 2.96e+06  0
2.96e+06 – 3.207e+06  2
3.207e+06 – 3.453e+06  0
3.453e+06 – 3.7e+06  0
3.7e+06 – 3.947e+06  0
3.947e+06 – 4.193e+06  0
4.193e+06 – 4.44e+06  1
4.44e+06 – 4.687e+06  0
4.687e+06 – 4.933e+06  1
4.933e+06 – 5.18e+06  0
5.18e+06 – 5.427e+06  1
5.427e+06 – 5.673e+06  0
5.673e+06 – 5.92e+06  0
5.92e+06 – 6.167e+06  0
6.167e+06 – 6.413e+06  0
6.413e+06 – 6.66e+06  0
6.66e+06 – 6.907e+06  0
6.907e+06 – 7.153e+06  0
7.153e+06 – 7.4e+06  0
7.4e+06 – 7.647e+06  0
7.647e+06 – 7.893e+06  0
7.893e+06 – 8.14e+06  0
8.14e+06 – 8.387e+06  0
8.387e+06 – 8.633e+06  0
8.633e+06 – 8.88e+06  0
8.88e+06 – 9.127e+06  0
9.127e+06 – 9.373e+06  0
9.373e+06 – 9.62e+06  0
9.62e+06 – 9.867e+06  1

uninsured_pop numeric feature

This column is the count of uninsured individuals in each county. The distribution is extremely right-skewed (skew = 17.81, kurtosis = 462.87): the median is just 36, the mean is 159.95, and the maximum reaches 20,915, so a small number of very large counties dominate the tail. Notably, 17.2% of rows are zero and 11.4% are flagged as outliers (368 rows), which suggests a mix of very small or fully insured counties alongside a few large concentrations of uninsured residents.

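Treatment: Log-transform (e.g., log1p) before regression or clustering to reduce skew. For comparisons across counties, use a population-normalized measure; the dataset already provides uninsured_rate, so check that it agrees with uninsured_pop / total_pop before relying on either.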

anthropic:default · confidence high
Out[22]:

saturn.columns["uninsured_pop"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 584
min 0
max 20,915
mean 159.9
median 36
std 627.2
q1 7
q3 120
iqr 113
skew 17.81
kurtosis 462.9
n_outliers 368
outlier_rate 0.1142
zero_rate 0.1723
alert: high_skew skew=+17.81
alert: outliers 11.4% rows beyond 1.5 IQR
Fig 11.
Distribution of uninsured_pop. Vertical dash marks the median.
Histogram bins for uninsured_pop (median: 36.0).
bin  count
0 – 522.9  3022
522.9 – 1046  124
1046 – 1569  32
1569 – 2092  16
2092 – 2614  7
2614 – 3137  5
3137 – 3660  5
3660 – 4183  2
4183 – 4706  0
4706 – 5229  1
5229 – 5752  2
5752 – 6274  1
6274 – 6797  0
6797 – 7320  0
7320 – 7843  0
7843 – 8366  1
8366 – 8889  1
8889 – 9412  0
9412 – 9935  0
9935 – 1.046e+04  0
1.046e+04 – 1.098e+04  0
1.098e+04 – 1.15e+04  2
1.15e+04 – 1.203e+04  0
1.203e+04 – 1.255e+04  0
1.255e+04 – 1.307e+04  0
1.307e+04 – 1.359e+04  0
1.359e+04 – 1.412e+04  0
1.412e+04 – 1.464e+04  0
1.464e+04 – 1.516e+04  0
1.516e+04 – 1.569e+04  0
1.569e+04 – 1.621e+04  0
1.621e+04 – 1.673e+04  0
1.673e+04 – 1.725e+04  0
1.725e+04 – 1.778e+04  0
1.778e+04 – 1.83e+04  0
1.83e+04 – 1.882e+04  0
1.882e+04 – 1.935e+04  0
1.935e+04 – 1.987e+04  0
1.987e+04 – 2.039e+04  0
2.039e+04 – 2.092e+04  1

uninsured_rate numeric feature

This column is the health-insurance uninsured rate for each county. With only 152 unique values across 3,222 rows, the values appear to be rounded (most likely to two decimals). The distribution is severely right-skewed (skew 4.10, kurtosis 27.70), with a median of 0.12, a mean of 0.20, and a maximum of 3.7. A maximum above 1.0 rules out a simple 0-to-1 proportion. The ratio of the median uninsured count to the median population (36 / 25,328 ≈ 0.14%) is close to the median rate, which suggests the column is expressed in percent; in that case the 230 flagged outliers (7.1% of rows) would be genuine high values rather than scale errors, though this should be confirmed against the source. The 17.5% zero rate is slightly above the 17.2% zero rate of uninsured_pop, so most zeros correspond to counties with no recorded uninsured residents and the remainder are probably small rates rounded down to zero; zeros standing in for missing data cannot be ruled out.

Treatment: Confirm the unit by recomputing 100 × uninsured_pop / total_pop and comparing it with the stored rate, recode zeros if they represent missingness, then log-transform (log1p) or apply a bounded transformation before modelling.

anthropic:default · confidence medium
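
One way to settle the unit question is to recompute the rate from the two count columns and compare it with the stored value. The sketch below assumes the stored rate is meant to be a percentage of total population; both helper names are hypothetical, and the tolerance of 0.01 is an arbitrary allowance for two-decimal rounding.

```python
def implied_percent(uninsured_pop, total_pop):
    """Uninsured share of the population, expressed in percent."""
    if total_pop <= 0:
        raise ValueError("total_pop must be positive")
    return 100.0 * uninsured_pop / total_pop

def scale_mismatch(rate, uninsured_pop, total_pop, tol=0.01):
    """True when the stored rate differs from the recomputed percent by more than tol."""
    return abs(rate - implied_percent(uninsured_pop, total_pop)) > tol

# Medians reported in this notebook. The median of a ratio need not equal the
# ratio of medians, so this is only a rough plausibility check.
print(round(implied_percent(36, 25_328), 2))  # 0.14
```

Applied row by row, `scale_mismatch` would list the counties where the stored rate and the counts disagree, which is a more direct test than reasoning from summary statistics.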
Out[25]:

saturn.columns["uninsured_rate"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 152
min 0
max 3.7
mean 0.2002
median 0.12
std 0.2829
q1 0.04
q3 0.25
iqr 0.21
skew 4.095
kurtosis 27.7
n_outliers 230
outlier_rate 0.07138
zero_rate 0.1754
alert: high_skew skew=+4.10
alert: outliers 7.1% rows beyond 1.5 IQR
Fig 12.
Distribution of uninsured_rate. Vertical dash marks the median.
Histogram bins for uninsured_rate (median: 0.12).
bin  count
0 – 0.0925  1403
0.0925 – 0.185  704
0.185 – 0.2775  403
0.2775 – 0.37  213
0.37 – 0.4625  158
0.4625 – 0.555  101
0.555 – 0.6475  65
0.6475 – 0.74  43
0.74 – 0.8325  27
0.8325 – 0.925  23
0.925 – 1.018  9
1.018 – 1.11  15
1.11 – 1.202  14
1.202 – 1.295  5
1.295 – 1.387  7
1.387 – 1.48  7
1.48 – 1.573  5
1.573 – 1.665  2
1.665 – 1.758  4
1.758 – 1.85  1
1.85 – 1.942  1
1.942 – 2.035  1
2.035 – 2.127  2
2.127 – 2.22  2
2.22 – 2.312  1
2.312 – 2.405  0
2.405 – 2.498  0
2.498 – 2.59  1
2.59 – 2.683  0
2.683 – 2.775  1
2.775 – 2.868  0
2.868 – 2.96  1
2.96 – 3.052  1
3.052 – 3.145  0
3.145 – 3.237  1
3.237 – 3.33  0
3.33 – 3.422  0
3.422 – 3.515  0
3.515 – 3.607  0
3.607 – 3.7  1

poverty_rate numeric feature

This column is a poverty rate (likely the percentage of the population below a poverty threshold) for each of the 3,222 counties, with no nulls and 1,719 unique values. The distribution is right-skewed (skew = 2.10, kurtosis = 6.89), with a median of 13.55% and a mean pulled up to 15.10% by a long upper tail reaching 66.32%, nearly 5× the median. There are 137 flagged outliers (4.25% of rows). All of them sit in the upper tail: the lower fence, Q1 - 1.5 × IQR ≈ -1.5, is below the minimum of 1.6, so only values above the upper fence, Q3 + 1.5 × IQR ≈ 29.5, can be flagged. These are likely unusually deprived counties and warrant special attention in any model.

Treatment: Log-transform or apply a Box-Cox transformation before regression to reduce skew; inspect the 137 outliers above the upper fence for data quality or genuine extreme cases.

anthropic:default · confidence high
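
The outlier alert is described as "rows beyond 1.5 IQR", which matches the usual Tukey fence rule. A short worked example using the quartiles reported for poverty_rate (the helper name `tukey_fences` is made up for this sketch):

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for poverty_rate in this notebook.
lower, upper = tukey_fences(10.16, 17.91)
print(f"{lower:.1f} {upper:.1f}")  # -1.5 29.5
```

Since the observed minimum is 1.6, no county can fall below the lower fence; every flagged row is a county with a poverty rate above roughly 29.5%.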
Out[28]:

saturn.columns["poverty_rate"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 1,719
min 1.6
max 66.32
mean 15.1
median 13.55
std 7.706
q1 10.16
q3 17.91
iqr 7.75
skew 2.096
kurtosis 6.891
n_outliers 137
outlier_rate 0.04252
zero_rate 0
alert: high_skew skew=+2.10
Fig 13.
Distribution of poverty_rate. Vertical dash marks the median.
Histogram bins for poverty_rate (median: 13.55).
bin  count
1.6 – 3.218  7
3.218 – 4.836  34
4.836 – 6.454  106
6.454 – 8.072  246
8.072 – 9.69  320
9.69 – 11.31  354
11.31 – 12.93  393
12.93 – 14.54  364
14.54 – 16.16  306
16.16 – 17.78  262
17.78 – 19.4  192
19.4 – 21.02  149
21.02 – 22.63  123
22.63 – 24.25  91
24.25 – 25.87  52
25.87 – 27.49  44
27.49 – 29.11  34
29.11 – 30.72  23
30.72 – 32.34  18
32.34 – 33.96  14
33.96 – 35.58  6
35.58 – 37.2  8
37.2 – 38.81  3
38.81 – 40.43  8
40.43 – 42.05  5
42.05 – 43.67  9
43.67 – 45.29  4
45.29 – 46.9  11
46.9 – 48.52  7
48.52 – 50.14  8
50.14 – 51.76  2
51.76 – 53.38  6
53.38 – 54.99  5
54.99 – 56.61  5
56.61 – 58.23  1
58.23 – 59.85  0
59.85 – 61.47  0
61.47 – 63.08  0
63.08 – 64.7  1
64.7 – 66.32  1

rural categorical feature

This column is a binary flag indicating whether a county is rural, stored as the strings 'True'/'False' rather than a native boolean. The dominant class is 'True' (2,212 of 3,222 rows, about 68.7%), so the dataset is weighted toward rural counties and analysts should account for this imbalance in any comparative or predictive analysis. The counts match rural_category exactly (2,212 Rural, 1,010 Urban/Suburban), which suggests the two columns encode the same information; keep only one as a model feature.

Treatment: Cast to boolean, then use as a binary feature or stratification variable; monitor class imbalance (~2:1 rural vs. non-rural) during modelling.

anthropic:default · confidence high
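
Casting string flags needs a little care in Python, because any non-empty string is truthy. A minimal sketch with a hypothetical helper, `parse_bool`, that accepts only the two spellings reported for this column:

```python
def parse_bool(text):
    """Map the strings 'True'/'False' (any case, surrounding spaces ignored) to booleans."""
    key = str(text).strip().lower()
    if key == "true":
        return True
    if key == "false":
        return False
    raise ValueError(f"unexpected boolean label: {text!r}")

print(bool("False"))        # True: a naive cast gets this wrong
print(parse_bool("False"))  # False
print(parse_bool("True"))   # True
```

Raising on anything unexpected, rather than defaulting to False, keeps stray labels from being silently counted as non-rural.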
Out[31]:

saturn.columns["rural"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2
top_value True
top_rate 0.6865
cardinality 2
entropy 0.8971
entropy_ratio 0.8971
Fig 14.
Top values for rural.
Top values for rural (2 unique shown, of 2 total).
value  count  share
True  2212  68.7%
False  1010  31.3%

rural_category categorical label

This column is a binary geographic classification of each county as Rural or Urban/Suburban. Across 3,222 records with no nulls, 'Rural' accounts for 68.7% (2,212 records) and 'Urban/Suburban' for 31.3% (1,010 records). The entropy ratio of 0.897, where 1.0 would be an even split, indicates a moderate rather than extreme imbalance. Models trained without stratification or reweighting may still be biased toward the majority class.

Treatment: One-hot or binary encode; consider stratified sampling or class weighting to address 69/31 imbalance.

anthropic:default · confidence high
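
The entropy figures can be reproduced from the category counts. The sketch below assumes Shannon entropy in bits and an entropy ratio defined as entropy divided by log2 of the number of categories; the report does not state Saturn's exact definition, but with two categories the divisor is 1, which is consistent with the identical entropy and entropy_ratio values shown here.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0

# Counts reported for rural_category: Rural, Urban/Suburban.
print(f"{entropy_ratio([2212, 1010]):.3f}")
```

The same function applied to the risk_category counts (2,719 and 503) gives a lower ratio, reflecting its stronger imbalance.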
Out[34]:

saturn.columns["rural_category"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2
top_value Rural
top_rate 0.6865
cardinality 2
entropy 0.8971
entropy_ratio 0.8971
Fig 15.
Top values for rural_category.
Top values for rural_category (2 unique shown, of 2 total).
value  count  share
Rural  2212  68.7%
Urban/Suburban  1010  31.3%

hospital_closure_risk_score numeric feature

This column is presented as a continuous risk score but contains only 3 unique values across 3,222 rows: 0, 25, and 50, which are the min and Q1 (0), the median and Q3 (25), and the max (50); the histogram shows 929, 1,790, and 503 rows at those values. This makes it a de facto ordinal variable with three tiers despite its numeric framing. Notably, 28.8% of records are zero. The skew (0.14) and the absence of outliers say little here, because a three-level variable cannot have a long tail.

Treatment: Treat as ordinal categorical with three levels (0/25/50); one-hot encode or ordinal-encode before modelling rather than using raw numeric values.

anthropic:default · confidence high
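
The score counts (929, 1,790, 503) and the risk_category counts (2,719 'Low', 503 'Moderate') are consistent with the label being a coarser version of the score, since 929 + 1,790 = 2,719. A minimal sketch of how that could be tested on the actual rows; the helper names and the sample rows are hypothetical.

```python
from collections import Counter

def crosstab(scores, categories):
    """Count rows for each (score, category) pair."""
    return Counter(zip(scores, categories))

def is_function_of(scores, categories):
    """True when every score value maps to exactly one category."""
    seen = {}
    for score, category in zip(scores, categories):
        if seen.setdefault(score, category) != category:
            return False
    return True

# Made-up rows for illustration only; these are NOT values from the dataset.
scores = [0, 25, 50, 25, 0, 50]
labels = ["Low", "Low", "Moderate", "Low", "Low", "Moderate"]

print(crosstab(scores, labels))
print(is_function_of(scores, labels))  # True
```

If the check returns True on the real data, the score and the label carry overlapping information, and encoding both as features would double-count it.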
Out[37]:

saturn.columns["hospital_closure_risk_score"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3
min 0
max 50
mean 21.69
median 25
std 16.34
q1 0
q3 25
iqr 25
skew 0.1414
kurtosis -0.6949
n_outliers 0
outlier_rate 0
zero_rate 0.2883
Fig 16.
Distribution of hospital_closure_risk_score. Vertical dash marks the median.
Histogram bins for hospital_closure_risk_score (median: 25.0).
bin  count
0 – 1.25  929
1.25 – 2.5  0
2.5 – 3.75  0
3.75 – 5  0
5 – 6.25  0
6.25 – 7.5  0
7.5 – 8.75  0
8.75 – 10  0
10 – 11.25  0
11.25 – 12.5  0
12.5 – 13.75  0
13.75 – 15  0
15 – 16.25  0
16.25 – 17.5  0
17.5 – 18.75  0
18.75 – 20  0
20 – 21.25  0
21.25 – 22.5  0
22.5 – 23.75  0
23.75 – 25  0
25 – 26.25  1790
26.25 – 27.5  0
27.5 – 28.75  0
28.75 – 30  0
30 – 31.25  0
31.25 – 32.5  0
32.5 – 33.75  0
33.75 – 35  0
35 – 36.25  0
36.25 – 37.5  0
37.5 – 38.75  0
38.75 – 40  0
40 – 41.25  0
41.25 – 42.5  0
42.5 – 43.75  0
43.75 – 45  0
45 – 46.25  0
46.25 – 47.5  0
47.5 – 48.75  0
48.75 – 50  503

risk_category categorical label

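This column is a binary risk label with exactly two categories, 'Low' and 'Moderate'. The distribution is heavily imbalanced: 'Low' accounts for 84.4% of all 3,222 rows (2,719 records), leaving only 503 'Moderate' cases. The absence of higher tiers (e.g., 'High', 'Critical') is notable and may indicate data filtering, a low-risk population, or an incomplete taxonomy. The counts are also consistent with the label being derived from hospital_closure_risk_score: the 929 + 1,790 rows at scores 0 and 25 equal the 2,719 'Low' rows, and the 503 rows at score 50 equal the 'Moderate' rows. If that holds, the two columns are redundant and should not both be used as features, nor should one be used to predict the other. With zero nulls and only two values, the column is clean but class-imbalanced.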

Treatment: Encode as binary (0/1) and apply class-imbalance handling (e.g., SMOTE or class weights) before modelling.

anthropic:default · confidence high
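
As one concrete form of the class weighting mentioned in the treatment, the sketch below uses the common "balanced" heuristic, weight = n_samples / (n_classes × class_count), applied to the counts reported above. The helper name is hypothetical, and this heuristic is only one of several reasonable choices.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Counts reported for risk_category in this notebook.
weights = balanced_class_weights({"Low": 2719, "Moderate": 503})
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these counts a 'Moderate' row weighs a little over five times as much as a 'Low' row, which offsets the 84/16 split during training.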
Out[40]:

saturn.columns["risk_category"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2
top_value Low
top_rate 0.8439
cardinality 2
entropy 0.6249
entropy_ratio 0.6249
Fig 17.
Top values for risk_category.
Top values for risk_category (2 unique shown, of 2 total).
value  count  share
Low  2719  84.4%
Moderate  503  15.6%

How to cite


BibTeX
@misc{saturn-data-trove-healthcare-deserts-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove healthcare deserts},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-healthcare-deserts}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove healthcare deserts. Source: /home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-healthcare-deserts