saturn

/home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv | 3,222 rows | sample n=3,222 | seed 42 | 2026-06-22T00:10:44+00:00

Overview

Source           /home/coolhand/html/datavis/data_trove/data/healthcare/healthcare_desert_merged.csv
Total rows       3,222
Profiled sample  3,222
Columns          10
Generated        2026-06-22T00:10:44+00:00
Per-column null rate across the corpus.
column                       kind         null %
fips                         numeric      0.0%
county_name                  text         0.0%
total_pop                    numeric      0.0%
uninsured_pop                numeric      0.0%
uninsured_rate               numeric      0.0%
poverty_rate                 numeric      0.0%
rural                        categorical  0.0%
rural_category               categorical  0.0%
hospital_closure_risk_score  numeric      0.0%
risk_category                categorical  0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset covers healthcare access indicators for 3,222 U.S. counties, combining population size, uninsured counts and rates, poverty rates, and hospital closure risk scores. The most striking pattern is the extreme right skew in both total population and uninsured population: the median county has just 25,328 residents and 36 uninsured individuals, yet the maxima reach nearly 10 million people and 20,915 uninsured, so a small number of large counties dominate the raw counts. Two things warrant a closer look. First, 84.4% of counties are rated 'Low' hospital closure risk and 28.8% score exactly zero, but the score takes only 3 unique values, so it is coarser than a numeric column suggests. Second, 68.7% of counties are classified as Rural, and uninsured_rate runs from 0 to 3.7 with heavy right skew (skew 4.10); its scale is not documented, so the high-rate counties are worth isolating geographically once the units are confirmed.

total_pop high anthropic:default

This column holds the total population of each county. The distribution is severely right-skewed (skew=13.38, kurtosis=298.69): the median is 25,328 while the mean is 102,232, indicating a long tail driven by a small number of very large population centers. The maximum of 9,866,623 is roughly 390× the median. With 453 outliers (14.1% of rows) and a standard deviation of 326,934, raw values will distort any distance- or variance-sensitive model unless they are transformed or scaled first.
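One common way to tame a tail like this is a log transform. The sketch below is illustrative only: it applies `math.log1p` to the summary values reported for total_pop and compares the max-to-median ratio before and after. The helper name `log_scale` is hypothetical, and whether a log transform suits a given model is a judgment call this report does not settle.

```python
import math

# Summary values reported for total_pop in this profile.
reported = {"min": 47, "median": 25_328, "mean": 102_232, "max": 9_866_623}


def log_scale(value: float) -> float:
    """Return log(1 + value); defined at zero and monotonic for value >= 0."""
    return math.log1p(value)


scaled = {name: log_scale(v) for name, v in reported.items()}

# Ratio of max to median on the raw scale and on the log scale.
raw_ratio = reported["max"] / reported["median"]
log_ratio = scaled["max"] / scaled["median"]

print(f"raw max/median: {raw_ratio:.1f}")
print(f"log max/median: {log_ratio:.2f}")
```

The transform preserves ordering, so rank-based summaries such as the median and quartiles map directly onto the new scale.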

uninsured_pop high anthropic:default

This column holds the count of uninsured individuals in each county. The distribution is extremely right-skewed (skew=17.81, kurtosis=462.87): the median is just 36 while the mean is 159.95 and the max reaches 20,915, indicating that a small number of very large counties dominate the tail. Notably, 17.2% of rows are zero and 11.4% are flagged as outliers (368 rows), suggesting a mix of very small or fully insured counties alongside a few large concentrations of uninsured residents.

uninsured_rate medium anthropic:default

This column is the uninsured rate for each county, presumably the share of residents without health insurance given the dataset's subject; its scale (proportion or percentage) is not documented. With only 152 unique values across 3,222 rows, the data appears to be rounded. The distribution is severely right-skewed (skew 4.10, kurtosis 27.70), with a median of 0.12, a mean of 0.20, and a max of 3.7. A max above 1.0 is impossible for a true 0-to-1 proportion, which points to a percentage scale, a non-standard encoding, or erroneous values among the 230 flagged outliers (7.1% of rows). The 17.5% zero rate also warrants investigation, as it may indicate missing data coded as zero or counties with no recorded uninsured residents.
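One way to probe the scale question is to recompute the rate from uninsured_pop and total_pop and see which scale the stored value sits closer to. The sketch below assumes uninsured_rate is meant to be derived from those two columns, which the report does not confirm; `implied_rate` and `guess_scale` are hypothetical helpers. The plausibility check at the end uses the reported medians, which come from different rows, so it is only a rough hint.

```python
def implied_rate(uninsured_pop: float, total_pop: float) -> float:
    """Uninsured share of the population as a fraction in [0, 1]."""
    if total_pop <= 0:
        raise ValueError("total_pop must be positive")
    return uninsured_pop / total_pop


def guess_scale(reported_rate: float, uninsured_pop: float, total_pop: float) -> str:
    """Say whether reported_rate is closer to the fraction or to the percentage."""
    fraction = implied_rate(uninsured_pop, total_pop)
    percent = fraction * 100
    if abs(reported_rate - percent) < abs(reported_rate - fraction):
        return "percent"
    return "fraction"


# Rough check with the reported medians (36 uninsured, 25,328 residents).
# Assumption: medians of separate columns are only loosely comparable.
median_fraction = implied_rate(36, 25_328)
print(f"implied median share: {median_fraction * 100:.2f}%")
print(guess_scale(0.12, 36, 25_328))
```

Run per row against the actual CSV, a consistent "percent" answer would support reading 3.7 as 3.7% rather than as a broken proportion.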

county_name high anthropic:default

This column contains fully qualified county names, almost certainly formatted as '<Name> County, <State>' (for example, 'Bibb County, Alabama'): the token 'county,' appears in 2,999 of 3,222 rows, the mean string length is ~24 characters, and the mean word count is ~3.25. Every row is unique (n_unique = 3,222, duplicate_rate = 0.0), which triggers the near-unique alert; that is expected for a geographic identifier combining county and state. The most frequent state tokens are Texas (256), Virginia (189), and Georgia (159). Because the file holds one row per county, these counts largely track how many counties each state has rather than indicating overrepresentation.

poverty_rate high anthropic:default

This column represents a poverty rate measure (likely percentage of population below a poverty threshold) across 3,222 records with no nulls and a reasonable 1,719 unique values. The distribution is heavily right-skewed (skew = 2.10, kurtosis = 6.89), with a median of 13.55% but a mean pulled up to 15.10% by a long upper tail reaching 66.32% — more than 4× the median. There are 137 flagged outliers (4.25% of rows), concentrated in that upper tail, which likely represent unusually deprived geographic units or communities and warrant special attention in any model.

fips high anthropic:default

This column contains US FIPS (Federal Information Processing Standard) county codes. A county FIPS code is a 5-digit identifier (2-digit state code plus 3-digit county code); because the column is stored as a number, leading zeros are dropped and codes appear with 4 or 5 digits (the min of 1,001 corresponds to 01001). Every one of the 3,222 rows has a unique code with no nulls, close to the ~3,143 US counties plus territories (the max of 72153 indicates Puerto Rico codes are included). Despite the numeric storage type, FIPS codes are categorical identifiers, so arithmetic on them is meaningless, and the fairly flat distribution (skew 0.157, kurtosis -0.631) simply reflects the sequential geographic assignment of codes.
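If the codes are to be joined against other county-level sources, it usually helps to restore the 5-character string form first. A minimal sketch, with hypothetical helper names `fips_to_str` and `split_fips`:

```python
def fips_to_str(code: int) -> str:
    """Zero-pad a numeric county FIPS code to its 5-character form."""
    if not 0 < code <= 99_999:
        raise ValueError(f"not a county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Split a county FIPS code into (state part, county part)."""
    padded = fips_to_str(code)
    return padded[:2], padded[2:]


# The min and max reported in this profile.
print(fips_to_str(1001))
print(split_fips(72153))
```

Storing the result as a string column also prevents a profiler from computing means and correlations on it.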

hospital_closure_risk_score high anthropic:default

This column purports to be a continuous risk score but contains only 3 unique values across 3,222 rows. The values are 0, 25, and 50: they coincide with the min and Q1 (0), the median and Q3 (25), and the max (50). This makes it a de facto ordinal variable with three tiers despite its numeric framing. Notably, 28.8% of records are zero, and the near-symmetric distribution (skew 0.14) with no outliers is consistent with a discrete tier structure rather than a true continuous score.

risk_category high anthropic:default

This column is a binary risk classification label with exactly two categories: 'Low' and 'Moderate'. The distribution is heavily skewed — 'Low' accounts for 84.4% of all 3,222 rows (2,719 records), leaving only 503 'Moderate' cases. The complete absence of higher risk tiers (e.g., 'High', 'Critical') is notable and may indicate data filtering, a low-risk population, or an incomplete taxonomy. With zero nulls and only two values, the column is clean but class-imbalanced.
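The category counts line up with the hospital_closure_risk_score histogram: 929 rows at 0 plus 1,790 rows at 25 equals the 2,719 'Low' rows, and the 503 rows at 50 equals the 'Moderate' count. That is consistent with risk_category being derived from the score, but matching totals do not prove it. A cross-tabulation would settle the question; the sketch below uses hypothetical rows and helper names, not data from the CSV.

```python
from collections import Counter


def crosstab(rows) -> Counter:
    """Count (score, category) pairs across an iterable of row dicts."""
    return Counter(
        (r["hospital_closure_risk_score"], r["risk_category"]) for r in rows
    )


def is_function_of_score(table: Counter) -> bool:
    """True if every score value maps to exactly one category."""
    seen = {}
    for score, category in table:
        if seen.setdefault(score, category) != category:
            return False
    return True


# Hypothetical rows for illustration only; not taken from the CSV.
sample_rows = [
    {"hospital_closure_risk_score": 0, "risk_category": "Low"},
    {"hospital_closure_risk_score": 25, "risk_category": "Low"},
    {"hospital_closure_risk_score": 25, "risk_category": "Low"},
    {"hospital_closure_risk_score": 50, "risk_category": "Moderate"},
]
table = crosstab(sample_rows)
print(table)
print(is_function_of_score(table))
```

If the check holds on the real file, one of the two columns is redundant for modelling purposes.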

rural high anthropic:default

This column is a binary flag indicating whether a record is associated with a rural location, stored as string 'True'/'False' rather than a native boolean. The dominant class is 'True' (2,212 of 3,222 rows, ~68.7%), meaning the dataset is notably skewed toward rural records — analysts should account for this class imbalance in any comparative or predictive analysis.

rural_category high anthropic:default

This column is a binary geographic classification indicating whether a record is from a Rural or Urban/Suburban setting. Across 3,222 records with no nulls, 'Rural' dominates at 68.7% (2,212 records) versus 'Urban/Suburban' at 31.3% (1,010 records), representing a meaningful class imbalance. The entropy ratio of 0.897 confirms the distribution is moderately skewed but not extreme. Analysts should be aware this imbalance could bias models trained without stratification or reweighting.
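The entropy figures can be reproduced from the class counts. The sketch below assumes Shannon entropy in bits (base 2) and an entropy ratio normalized by log2 of the cardinality; the report does not state either choice, but both assumptions reproduce the reported 0.897 for the rural split and 0.625 for risk_category.

```python
import math


def shannon_entropy(counts) -> float:
    """Shannon entropy in bits of a distribution given as class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts) -> float:
    """Entropy divided by its maximum, log2(number of non-empty classes)."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0
    return shannon_entropy(counts) / math.log2(k)


rural_counts = [2212, 1010]  # Rural, Urban/Suburban
risk_counts = [2719, 503]    # Low, Moderate

print(f"rural entropy: {shannon_entropy(rural_counts):.3f}")
print(f"risk entropy ratio: {entropy_ratio(risk_counts):.3f}")
```

With two classes the maximum entropy is 1 bit, which is why entropy and entropy_ratio coincide for both columns.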

Numeric correlation

Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
                             fips   total_pop  uninsured_pop  uninsured_rate  poverty_rate  hospital_closure_risk_score
fips                         +1.00  -0.07      -0.02          +0.01           +0.16         +0.01
total_pop                    -0.07  +1.00      +0.81          -0.05           -0.11         -0.31
uninsured_pop                -0.02  +0.81      +1.00          +0.12           -0.09         -0.27
uninsured_rate               +0.01  -0.05      +0.12          +1.00           -0.04         +0.05
poverty_rate                 +0.16  -0.11      -0.09          -0.04           +1.00         +0.58
hospital_closure_risk_score  +0.01  -0.31      -0.27          +0.05           +0.58         +1.00
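Pearson correlation is sensitive to extreme values, which matters here given the heavy tails in total_pop and uninsured_pop. The sketch below is a plain-Python version of the standard formula, not saturn's implementation, and the numbers are hypothetical: four points with a perfectly decreasing relationship plus one extreme point are enough to push r close to +1.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: a decreasing trend, then the same with one outlier.
trend = pearson([1, 2, 3, 4], [4, 3, 2, 1])
with_outlier = pearson([1, 2, 3, 4, 1000], [4, 3, 2, 1, 1000])

print(f"without outlier: {trend:+.2f}")
print(f"with outlier:    {with_outlier:+.2f}")
```

For this reason the +0.81 between total_pop and uninsured_pop may largely reflect a handful of very large counties; a rank-based coefficient on the same columns would be a useful cross-check.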

fips numeric

rows          3,222
null          0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          31,378
median        30,022
std           16,300
q1            19,030
q3            46,104
iqr           27,075
skew          0.157
kurtosis      -0.631
n_outliers    0
outlier_rate  0.000
zero_rate     0.000
Histogram bins for fips (median: 30022.0).
bin                      count
1001 – 2780              97
2780 – 4559              15
4559 – 6337              133
6337 – 8116              59
8116 – 9895              14
9895 – 1.167e+04         4
1.167e+04 – 1.345e+04    226
1.345e+04 – 1.523e+04    5
1.523e+04 – 1.701e+04    49
1.701e+04 – 1.879e+04    189
1.879e+04 – 2.057e+04    204
2.057e+04 – 2.235e+04    184
2.235e+04 – 2.413e+04    39
2.413e+04 – 2.59e+04     15
2.59e+04 – 2.768e+04     170
2.768e+04 – 2.946e+04    196
2.946e+04 – 3.124e+04    150
3.124e+04 – 3.302e+04    27
3.302e+04 – 3.48e+04     21
3.48e+04 – 3.658e+04     95
3.658e+04 – 3.836e+04    153
3.836e+04 – 4.013e+04    155
4.013e+04 – 4.191e+04    46
4.191e+04 – 4.369e+04    67
4.369e+04 – 4.547e+04    51
4.547e+04 – 4.725e+04    161
4.725e+04 – 4.903e+04    268
4.903e+04 – 5.081e+04    29
5.081e+04 – 5.259e+04    133
5.259e+04 – 5.436e+04    94
5.436e+04 – 5.614e+04    95
5.614e+04 – 5.792e+04    0
5.792e+04 – 5.97e+04     0
5.97e+04 – 6.148e+04     0
6.148e+04 – 6.326e+04    0
6.326e+04 – 6.504e+04    0
6.504e+04 – 6.682e+04    0
6.682e+04 – 6.86e+04     0
6.86e+04 – 7.037e+04     0
7.037e+04 – 7.215e+04    78
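The histograms in this report each have 40 bins, and the edges look like equal-width intervals between the column's min and max. Under that assumption (the binning rule is not stated anywhere in the report), the edges can be regenerated as follows; `equal_width_edges` is a hypothetical helper.

```python
def equal_width_edges(lo: float, hi: float, bins: int = 40) -> list:
    """Return bins + 1 edges splitting [lo, hi] into equal-width intervals."""
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    width = (hi - lo) / bins
    # Use hi itself for the last edge to avoid floating-point drift.
    return [lo + i * width for i in range(bins)] + [hi]


# fips: min 1,001 and max 72,153 as reported above.
edges = equal_width_edges(1001, 72153)
print(round(edges[1]))   # upper edge of the first bin
print(round(edges[2]))   # upper edge of the second bin
```

The first two computed edges round to 2780 and 4559, matching the first two bin boundaries listed for fips.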

county_name text

100.0% of rows are unique strings
rows                     3,222
null                     0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.324
len_median               24.000
len_p95                  31.000
word_mean                3.248
word_median              3.000
n_empty                  0
n_duplicates             0
duplicate_rate           0.000
vocab_size               1,990
readability_flesch_mean  10.284
emoji_rate               0.000
url_rate                 0.000
one_word_rate            0.000
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for county_name (mean: 24.324).
chars      count
16 – 17    26
17 – 18    72
18 – 19    121
19 – 20    190
20 – 21    264
21 – 22    407
22 – 24    420
24 – 25    363
25 – 26    320
26 – 27    240
27 – 28    231
28 – 29    152
29 – 30    139
30 – 31    165
31 – 32    41
32 – 33    28
33 – 34    16
34 – 35    10
35 – 36    5
36 – 38    0
38 – 39    1
39 – 40    1
40 – 41    0
41 – 42    1
42 – 43    1
43 – 44    0
44 – 45    2
45 – 46    0
46 – 47    1
47 – 48    1
48 – 49    0
49 – 50    0
50 – 51    0
51 – 53    0
53 – 54    2
54 – 55    1
55 – 56    0
56 – 57    0
57 – 58    0
58 – 59    1
Sample values (first 10)
  1. Bibb County, Alabama
  2. Cheatham County, Tennessee
  3. Piute County, Utah
  4. Lamb County, Texas
  5. Martin County, Minnesota
  6. Sheridan County, Wyoming
  7. Chickasaw County, Mississippi
  8. Rockingham County, Virginia
  9. Liberty County, Texas
  10. Clark County, Arkansas

total_pop numeric

skew=+13.38 | 14.1% rows beyond 1.5 IQR
rows          3,222
null          0 (0.0%)
unique        3,141
min           47.000
max           9,866,623
mean          102,232
median        25,328
std           326,934
q1            10,611
q3            65,190
iqr           54,579
skew          13.377
kurtosis      298.689
n_outliers    453
outlier_rate  0.141
zero_rate     0.000
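The "rows beyond 1.5 IQR" badge suggests outliers are flagged with Tukey's fences, that is, values below q1 - 1.5 × iqr or above q3 + 1.5 × iqr. That reading is an assumption; the report does not spell out the rule. Under it, the cutoffs for total_pop follow directly from the quartiles above (`tukey_fences` is a hypothetical helper):

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Return (lower, upper) outlier cutoffs from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# total_pop quartiles as reported: q1 = 10,611 and q3 = 65,190.
lower, upper = tukey_fences(10_611, 65_190)
print(f"lower fence: {lower:,.1f}")
print(f"upper fence: {upper:,.1f}")
```

Because the lower fence is negative and the minimum population is 47, all 453 flagged rows would sit above the upper fence of roughly 147,000 residents.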
Histogram bins for total_pop (median: 25328.0).
bin                      count
47 – 2.467e+05           2942
2.467e+05 – 4.934e+05    137
4.934e+05 – 7.4e+05      56
7.4e+05 – 9.867e+05      39
9.867e+05 – 1.233e+06    13
1.233e+06 – 1.48e+06     9
1.48e+06 – 1.727e+06     7
1.727e+06 – 1.973e+06    3
1.973e+06 – 2.22e+06     3
2.22e+06 – 2.467e+06     4
2.467e+06 – 2.713e+06    3
2.713e+06 – 2.96e+06     0
2.96e+06 – 3.207e+06     2
3.207e+06 – 3.453e+06    0
3.453e+06 – 3.7e+06      0
3.7e+06 – 3.947e+06      0
3.947e+06 – 4.193e+06    0
4.193e+06 – 4.44e+06     1
4.44e+06 – 4.687e+06     0
4.687e+06 – 4.933e+06    1
4.933e+06 – 5.18e+06     0
5.18e+06 – 5.427e+06     1
5.427e+06 – 5.673e+06    0
5.673e+06 – 5.92e+06     0
5.92e+06 – 6.167e+06     0
6.167e+06 – 6.413e+06    0
6.413e+06 – 6.66e+06     0
6.66e+06 – 6.907e+06     0
6.907e+06 – 7.153e+06    0
7.153e+06 – 7.4e+06      0
7.4e+06 – 7.647e+06      0
7.647e+06 – 7.893e+06    0
7.893e+06 – 8.14e+06     0
8.14e+06 – 8.387e+06     0
8.387e+06 – 8.633e+06    0
8.633e+06 – 8.88e+06     0
8.88e+06 – 9.127e+06     0
9.127e+06 – 9.373e+06    0
9.373e+06 – 9.62e+06     0
9.62e+06 – 9.867e+06     1

uninsured_pop numeric

skew=+17.81 | 11.4% rows beyond 1.5 IQR
rows          3,222
null          0 (0.0%)
unique        584
min           0.000
max           20,915
mean          159.945
median        36.000
std           627.163
q1            7.000
q3            120.000
iqr           113.000
skew          17.811
kurtosis      462.866
n_outliers    368
outlier_rate  0.114
zero_rate     0.172
Histogram bins for uninsured_pop (median: 36.0).
bin                      count
0 – 522.9                3022
522.9 – 1046             124
1046 – 1569              32
1569 – 2092              16
2092 – 2614              7
2614 – 3137              5
3137 – 3660              5
3660 – 4183              2
4183 – 4706              0
4706 – 5229              1
5229 – 5752              2
5752 – 6274              1
6274 – 6797              0
6797 – 7320              0
7320 – 7843              0
7843 – 8366              1
8366 – 8889              1
8889 – 9412              0
9412 – 9935              0
9935 – 1.046e+04         0
1.046e+04 – 1.098e+04    0
1.098e+04 – 1.15e+04     2
1.15e+04 – 1.203e+04     0
1.203e+04 – 1.255e+04    0
1.255e+04 – 1.307e+04    0
1.307e+04 – 1.359e+04    0
1.359e+04 – 1.412e+04    0
1.412e+04 – 1.464e+04    0
1.464e+04 – 1.516e+04    0
1.516e+04 – 1.569e+04    0
1.569e+04 – 1.621e+04    0
1.621e+04 – 1.673e+04    0
1.673e+04 – 1.725e+04    0
1.725e+04 – 1.778e+04    0
1.778e+04 – 1.83e+04     0
1.83e+04 – 1.882e+04     0
1.882e+04 – 1.935e+04    0
1.935e+04 – 1.987e+04    0
1.987e+04 – 2.039e+04    0
2.039e+04 – 2.092e+04    1

uninsured_rate numeric

skew=+4.10 | 7.1% rows beyond 1.5 IQR
rows          3,222
null          0 (0.0%)
unique        152
min           0.000
max           3.700
mean          0.200
median        0.120
std           0.283
q1            0.040
q3            0.250
iqr           0.210
skew          4.095
kurtosis      27.703
n_outliers    230
outlier_rate  0.071
zero_rate     0.175
Histogram bins for uninsured_rate (median: 0.12).
bin                count
0 – 0.0925         1403
0.0925 – 0.185     704
0.185 – 0.2775     403
0.2775 – 0.37      213
0.37 – 0.4625      158
0.4625 – 0.555     101
0.555 – 0.6475     65
0.6475 – 0.74      43
0.74 – 0.8325      27
0.8325 – 0.925     23
0.925 – 1.018      9
1.018 – 1.11       15
1.11 – 1.202       14
1.202 – 1.295      5
1.295 – 1.387      7
1.387 – 1.48       7
1.48 – 1.573       5
1.573 – 1.665      2
1.665 – 1.758      4
1.758 – 1.85       1
1.85 – 1.942       1
1.942 – 2.035      1
2.035 – 2.127      2
2.127 – 2.22       2
2.22 – 2.312       1
2.312 – 2.405      0
2.405 – 2.498      0
2.498 – 2.59       1
2.59 – 2.683       0
2.683 – 2.775      1
2.775 – 2.868      0
2.868 – 2.96       1
2.96 – 3.052       1
3.052 – 3.145      0
3.145 – 3.237      1
3.237 – 3.33       0
3.33 – 3.422       0
3.422 – 3.515      0
3.515 – 3.607      0
3.607 – 3.7        1

poverty_rate numeric

skew=+2.10
rows          3,222
null          0 (0.0%)
unique        1,719
min           1.600
max           66.320
mean          15.100
median        13.550
std           7.706
q1            10.160
q3            17.910
iqr           7.750
skew          2.096
kurtosis      6.891
n_outliers    137
outlier_rate  0.043
zero_rate     0.000
Histogram bins for poverty_rate (median: 13.55).
bin                count
1.6 – 3.218        7
3.218 – 4.836      34
4.836 – 6.454      106
6.454 – 8.072      246
8.072 – 9.69       320
9.69 – 11.31       354
11.31 – 12.93      393
12.93 – 14.54      364
14.54 – 16.16      306
16.16 – 17.78      262
17.78 – 19.4       192
19.4 – 21.02       149
21.02 – 22.63      123
22.63 – 24.25      91
24.25 – 25.87      52
25.87 – 27.49      44
27.49 – 29.11      34
29.11 – 30.72      23
30.72 – 32.34      18
32.34 – 33.96      14
33.96 – 35.58      6
35.58 – 37.2       8
37.2 – 38.81       3
38.81 – 40.43      8
40.43 – 42.05      5
42.05 – 43.67      9
43.67 – 45.29      4
45.29 – 46.9       11
46.9 – 48.52       7
48.52 – 50.14      8
50.14 – 51.76      2
51.76 – 53.38      6
53.38 – 54.99      5
54.99 – 56.61      5
56.61 – 58.23      1
58.23 – 59.85      0
59.85 – 61.47      0
61.47 – 63.08      0
63.08 – 64.7       1
64.7 – 66.32       1

rural categorical

rows           3,222
null           0 (0.0%)
unique         2
top_value      True
top_rate       0.687
cardinality    2
entropy        0.897
entropy_ratio  0.897
Top values for rural (2 unique shown, of 2 total).
value   count  share
True    2212   68.7%
False   1010   31.3%
Top values (rank 1–20)
  1. True — 2,212
  2. False — 1,010

rural_category categorical

rows           3,222
null           0 (0.0%)
unique         2
top_value      Rural
top_rate       0.687
cardinality    2
entropy        0.897
entropy_ratio  0.897
Top values for rural_category (2 unique shown, of 2 total).
value           count  share
Rural           2212   68.7%
Urban/Suburban  1010   31.3%
Top values (rank 1–20)
  1. Rural — 2,212
  2. Urban/Suburban — 1,010

hospital_closure_risk_score numeric

rows          3,222
null          0 (0.0%)
unique        3
min           0.000
max           50.000
mean          21.695
median        25.000
std           16.338
q1            0.000
q3            25.000
iqr           25.000
skew          0.141
kurtosis      -0.695
n_outliers    0
outlier_rate  0.000
zero_rate     0.288
Histogram bins for hospital_closure_risk_score (median: 25.0).
bin              count
0 – 1.25         929
1.25 – 2.5       0
2.5 – 3.75       0
3.75 – 5         0
5 – 6.25         0
6.25 – 7.5       0
7.5 – 8.75       0
8.75 – 10        0
10 – 11.25       0
11.25 – 12.5     0
12.5 – 13.75     0
13.75 – 15       0
15 – 16.25       0
16.25 – 17.5     0
17.5 – 18.75     0
18.75 – 20       0
20 – 21.25       0
21.25 – 22.5     0
22.5 – 23.75     0
23.75 – 25       0
25 – 26.25       1790
26.25 – 27.5     0
27.5 – 28.75     0
28.75 – 30       0
30 – 31.25       0
31.25 – 32.5     0
32.5 – 33.75     0
33.75 – 35       0
35 – 36.25       0
36.25 – 37.5     0
37.5 – 38.75     0
38.75 – 40       0
40 – 41.25       0
41.25 – 42.5     0
42.5 – 43.75     0
43.75 – 45       0
45 – 46.25       0
46.25 – 47.5     0
47.5 – 48.75     0
48.75 – 50       503

risk_category categorical

rows           3,222
null           0 (0.0%)
unique         2
top_value      Low
top_rate       0.844
cardinality    2
entropy        0.625
entropy_ratio  0.625
Top values for risk_category (2 unique shown, of 2 total).
value     count  share
Low       2719   84.4%
Moderate  503    15.6%
Top values (rank 1–20)
  1. Low — 2,719
  2. Moderate — 503