
healthcare data rural urban classification 20260121

saturn notebook · generated 2026-05-01 Report Notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet

Saturn profiled 3,222 rows across 4 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet",
    "--findings", "healthcare_data-rural_urban_classification_20260121.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset catalogs 3,222 U.S. counties, each identified by a unique 5-character FIPS code and a unique county name, and classified as either rural or urban/suburban. The two classification columns (`rural` and `rural_category`) appear to carry the same information: both show 2,212 counties (about 68.7%) flagged as Rural versus 1,010 (31.3%) as Urban/Suburban. The profile compares only these marginal counts, so row-level agreement between the two columns is inferred rather than verified. The rural/urban split is the most useful angle, since `fips` and `county_name` are unique identifiers with no aggregate signal. The top words in `county_name` hint at which states contribute the most rows: texas (256), virginia (189), and georgia (159). These are token counts rather than per-state counts, so "virginia" may also match West Virginia rows, and a token can match a county's own name.

citing: row_count · column_count · columns[2].top_values · columns[2].stats.top_rate · columns[3].top_values · columns[1].top_words · columns[0].n_unique
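The redundancy claim above rests on matching counts, which is necessary but not sufficient. A minimal sketch of a row-level check follows, assuming the rows are available as dictionaries; the sample rows and the helper name `is_redundant` are made up for illustration and are not part of Saturn.

```python
from collections import Counter

# Made-up rows for illustration; the real values live in the parquet file.
rows = [
    {"rural": "False", "rural_category": "Urban/Suburban"},
    {"rural": "True", "rural_category": "Rural"},
    {"rural": "True", "rural_category": "Rural"},
]

def is_redundant(rows, col_a, col_b):
    """Return True when the values of col_a and col_b map one-to-one."""
    pairs = Counter((r[col_a], r[col_b]) for r in rows)
    a_values = {a for a, _ in pairs}
    b_values = {b for _, b in pairs}
    # A one-to-one mapping has as many distinct pairs as distinct values on each side.
    return len(pairs) == len(a_values) == len(b_values)

redundant = is_redundant(rows, "rural", "rural_category")
print(redundant)
```

If this returns True on the full table, one of the two columns can be dropped without losing information.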

Out[4]:

saturn.schema() · 4 columns

column kind n null% unique alerts
fips text 3,222 0.0% 3,222 near_unique one_word allcaps short_text
county_name text 3,222 0.0% 3,222 near_unique
rural categorical 3,222 0.0% 2
rural_category categorical 3,222 0.0% 2
Fig 1.
rural_category · Roughly 69% of counties are classified Rural versus 31% Urban/Suburban.
Top values for rural_category (2 unique shown, of 2 total).
value count share
Rural 2,212 68.7%
Urban/Suburban 1,010 31.3%
Fig 2.
rural · Confirms the same 2,212 vs 1,010 split as rural_category — these two columns are redundant.
Top values for rural (2 unique shown, of 2 total).
value count share
True 2,212 68.7%
False 1,010 31.3%
Fig 3.
county_name · County name lengths cluster tightly around 24 characters, reflecting the consistent 'X County, State' format.
Character-length distribution for county_name (mean: 24.32).
chars count
16–17 26
17–18 72
18–19 121
19–20 190
20–21 264
21–22 407
22–24 420
24–25 363
25–26 320
26–27 240
27–28 231
28–29 152
29–30 139
30–31 165
31–32 41
32–33 28
33–34 16
34–35 10
35–36 5
36–38 0
38–39 1
39–40 1
40–41 0
41–42 1
42–43 1
43–44 0
44–45 2
45–46 0
46–47 1
47–48 1
48–49 0
49–50 0
50–51 0
51–53 0
53–54 2
54–55 1
55–56 0
56–57 0
57–58 0
58–59 1
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.
column kind null %
fips text 0.0%
county_name text 0.0%
rural categorical 0.0%
rural_category categorical 0.0%

fips text identifier

This column holds 5-character FIPS codes, one per row across all 3,222 records, with zero nulls and zero duplicates. Every value is exactly 5 characters and a single word, and the sample tokens (01001, 01003, 01005...) match the standard 2-digit state + 3-digit county FIPS format. With n_unique equal to n, this is a row-level identifier rather than a feature.

Treatment: Treat as a county key and left-join to geographic reference tables; do not use as a model feature.

anthropic:claude-opus-4-7 · confidence high
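Because FIPS codes are text with significant leading zeros, a join key is easy to corrupt if the column is ever parsed as an integer. The sketch below shows one way to normalize, split, and left-join on the key using only the standard library. The helper names and the one-entry `reference` table are hypothetical stand-ins for a real geographic lookup.

```python
import re

FIPS_PATTERN = re.compile(r"[0-9]{5}")

def normalize_fips(value):
    """Return a 5-character FIPS string, restoring leading zeros lost to integer parsing."""
    text = str(value).strip().zfill(5)
    if not FIPS_PATTERN.fullmatch(text):
        raise ValueError(f"not a 5-digit FIPS code: {value!r}")
    return text

def split_fips(value):
    """Split a county FIPS code into (state_code, county_code)."""
    fips = normalize_fips(value)
    return fips[:2], fips[2:]

def left_join(rows, reference, key="fips"):
    """Left-join rows to a reference dict keyed by FIPS; unmatched rows get None."""
    joined = []
    for row in rows:
        fips = normalize_fips(row[key])
        extra = reference.get(fips, {})
        joined.append({**row, key: fips, "state_abbr": extra.get("state_abbr")})
    return joined

# Hypothetical reference table; a real one would cover every county.
reference = {"01001": {"state_abbr": "AL"}}
rows = [{"fips": 1001}, {"fips": "99999"}]
joined = left_join(rows, reference)
print(joined)
```

Every input row survives the join, and rows without a match in the reference table are easy to find by filtering on `None`.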
Out[10]:

saturn.columns["fips"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
len_min 5
len_max 5
len_mean 5
len_median 5
len_p95 5
word_mean 1
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 3,222
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 100.0% of rows are a single word
alert: allcaps · 100.0% of rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
Fig 5.
Character-length distribution for fips.
Character-length distribution for fips (mean: 5.0): all 3,222 values fall in the single 5-character bin; every other bin is empty.

county_name text identifier

Full county names, almost certainly formatted like 'X County, State': 2,999 of 3,222 rows contain the token 'county,' and the remaining top tokens (texas, virginia, georgia, north carolina) are US state names. All 3,222 values are unique, with no nulls or empty strings. Lengths range from 16 to 59 characters, with a median of 24 and 95% of values at 31 characters or fewer. The tokens texas (256) and virginia (189) lead the list. These are token counts rather than per-state counts: 'virginia' may also match West Virginia rows, and a token can match a county's own name, so they only roughly track the number of counties per state.

Treatment: Use as a geographic key; left-join to state/FIPS lookups rather than feeding to a model directly.

anthropic:claude-opus-4-7 · confidence high
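Token counts and per-state counts can diverge, which matters when the name is used as a geographic key. The sketch below assumes the 'X County, State' layout holds and splits each name on its last comma. The four names are illustrative examples, not rows read from the file, and `split_county_name` is a hypothetical helper.

```python
from collections import Counter

# Illustrative names only; not read from the profiled file.
names = [
    "Autauga County, Alabama",
    "Texas County, Missouri",
    "Mercer County, West Virginia",
    "Accomack County, Virginia",
]

def split_county_name(name):
    """Split 'X County, State' on the last ', ' into (county, state)."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', State' suffix in {name!r}")
    return county, state

# Per-state counts use the parsed state field.
by_state = Counter(split_county_name(n)[1] for n in names)

# Token counts match a word anywhere in the name.
virginia_tokens = sum("virginia" in n.lower().split() for n in names)
texas_tokens = sum("texas" in n.lower().split() for n in names)

print(by_state["Virginia"], virginia_tokens)
print(by_state["Texas"], texas_tokens)
```

In this sample the token "virginia" matches two names while only one row belongs to Virginia, and "texas" matches a Missouri county.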
Out[13]:

saturn.columns["county_name"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
len_min 16
len_max 59
len_mean 24.32
len_median 24
len_p95 31
word_mean 3.248
word_median 3
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 1,990
readability_flesch_mean 10.28
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: near_unique · 100.0% of rows are unique strings
Fig 6.
Character-length distribution for county_name.
Character-length distribution for county_name (mean: 24.32). The bin counts are identical to the data table under Fig 3.

rural categorical feature

Boolean flag indicating rural status, stored as the strings "True"/"False" with no nulls across 3,222 rows. The split is imbalanced toward rural: 2,212 True (68.7%) versus 1,010 False (31.3%), giving an entropy of 0.897 against a maximum of 1.0 for two classes.

Treatment: Cast string "True"/"False" to boolean or 0/1 before modelling.

anthropic:claude-opus-4-7 · confidence high
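One pitfall with this cast is that Python's built-in `bool()` treats any non-empty string as true, so `bool("False")` is `True`. A minimal sketch of a strict mapping follows; `parse_bool` and the sample labels are illustrative, not part of Saturn.

```python
BOOL_LABELS = {"True": True, "False": False}

def parse_bool(label):
    """Map the exact strings 'True'/'False' to booleans; reject anything else."""
    try:
        return BOOL_LABELS[label]
    except KeyError:
        raise ValueError(f"unexpected rural label: {label!r}") from None

labels = ["True", "False", "True"]  # illustrative sample values
flags = [parse_bool(label) for label in labels]
as_int = [int(flag) for flag in flags]  # 0/1 form for modelling
print(flags, as_int)
```

Rejecting unknown labels keeps a stray value such as "yes" or an empty string from being silently coerced.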
Out[16]:

saturn.columns["rural"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2
top_value True
top_rate 0.6865
cardinality 2
entropy 0.8971
entropy_ratio 0.8971
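The entropy figures can be reproduced from the two class counts. The sketch below assumes Saturn reports base-2 Shannon entropy and normalizes by the log of the cardinality; that assumption is consistent with entropy and entropy_ratio being equal for a two-class column, but it is not confirmed by the report.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits for a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

counts = [2212, 1010]  # True / False counts from the report
entropy = shannon_entropy(counts)
# Normalizing by log2(cardinality) is an assumption about how the ratio is defined.
entropy_ratio = entropy / math.log2(len(counts))
top_rate = max(counts) / sum(counts)

print(round(entropy, 4), round(entropy_ratio, 4), round(top_rate, 4))
```

With two classes the normalizer is log2(2) = 1, which is why the two entropy rows in the table coincide.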
Fig 7.
Top values for rural.
Top values for rural (2 unique shown, of 2 total).
value count share
True 2,212 68.7%
False 1,010 31.3%

rural_category categorical feature

A binary geographic classification that labels each record 'Rural' (2,212) or 'Urban/Suburban' (1,010), with no nulls across 3,222 rows. The split is uneven at roughly 68.7% rural, but an entropy ratio of 0.897 indicates the minority class is still well represented. With a cardinality of 2 and no variant spellings, this is a clean categorical feature.

Treatment: Encode as a binary indicator (e.g., is_rural) before modelling.

anthropic:claude-opus-4-7 · confidence high
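A minimal sketch of the suggested encoding follows, rebuilding the column from the reported counts to confirm that the indicator preserves the 68.7% share. The name `encode_is_rural` is hypothetical, and the synthetic list stands in for the real column.

```python
CATEGORY_TO_INDICATOR = {"Rural": 1, "Urban/Suburban": 0}

def encode_is_rural(category):
    """Return 1 for 'Rural' and 0 for 'Urban/Suburban'; fail on any other label."""
    if category not in CATEGORY_TO_INDICATOR:
        raise ValueError(f"unexpected rural_category: {category!r}")
    return CATEGORY_TO_INDICATOR[category]

# Synthetic column built from the reported counts, not from the raw rows.
categories = ["Rural"] * 2212 + ["Urban/Suburban"] * 1010
is_rural = [encode_is_rural(c) for c in categories]
rural_share = sum(is_rural) / len(is_rural)
print(len(is_rural), round(rural_share, 4))
```

The lookup is case-sensitive on purpose, so a label such as "rural" raises an error instead of being encoded.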
Out[19]:

saturn.columns["rural_category"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2
top_value Rural
top_rate 0.6865
cardinality 2
entropy 0.8971
entropy_ratio 0.8971
Fig 8.
Top values for rural_category.
Top values for rural_category (2 unique shown, of 2 total).
value count share
Rural 2,212 68.7%
Urban/Suburban 1,010 31.3%

How to cite


BibTeX
@misc{saturn-healthcare-data-rural-urban-classification-20260121-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: healthcare data rural urban classification 20260121},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/healthcare_data-rural_urban_classification_20260121}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: healthcare data rural urban classification 20260121. Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/healthcare_data-rural_urban_classification_20260121