healthcare data rural urban classification 20260121

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet

Saturn profiled 3,222 rows across 4 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

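# Install the Saturn CLI (notebook shell escape).
!pip install saturn-dissect
import subprocess

# Profile the parquet file and write the findings to a JSON file.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet",
    "--findings", "healthcare_data-rural_urban_classification_20260121.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if saturn exits with a non-zero status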

Summary confidence: high

This dataset catalogs 3,222 U.S. counties, each identified by a unique 5-character FIPS code and county name, and classified as either Rural or Urban/Suburban. The two classification columns (`rural` and `rural_category`) appear to be redundant: both show 2,212 counties (about 68.7%) in the rural class versus 1,010 in the urban/suburban class, although matching totals alone do not confirm row-level agreement. The most useful angle here is the rural/urban split, since `fips` and `county_name` are unique identifiers with no aggregate signal. Top words in `county_name` hint at which states contribute the most counties: texas (256), virginia (189), and georgia (159). These are token counts rather than per-state counts, so 'virginia' would also match West Virginia counties.

citing: row_count · column_count · columns[2].top_values · columns[2].stats.top_rate · columns[3].top_values · columns[1].top_words · columns[0].n_unique

Out[4]:

saturn.schema() · 4 columns

column	kind	n	null%	unique	alerts
fips	text	3,222	0.0%	3,222	near_unique one_word allcaps short_text
county_name	text	3,222	0.0%	3,222	near_unique
rural	categorical	3,222	0.0%	2
rural_category	categorical	3,222	0.0%	2

Fig 1.

rural_category · Roughly 69% of counties are classified Rural versus 31% Urban/Suburban.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

Fig 2.

rural · Shows the same 2,212 vs 1,010 split as rural_category, which suggests the two columns are redundant.

Top values for rural (2 unique shown, of 2 total).
value	count	share
True	2212	68.7%
False	1010	31.3%
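
Matching counts show that `rural` and `rural_category` share the same marginal distribution; whether they agree row by row can only be checked against the raw data. A minimal sketch of that check follows. It assumes the rows are available as (rural, rural_category) string pairs and that "True" corresponds to "Rural"; the sample pairs are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Assumed correspondence between the two encodings (not confirmed by the report).
EXPECTED = {"True": "Rural", "False": "Urban/Suburban"}

def pair_counts(pairs):
    """Count each (rural, rural_category) combination."""
    return Counter(pairs)

def is_redundant(pairs, mapping=EXPECTED):
    """True when every rural flag lines up with its expected category."""
    return all(mapping.get(flag) == category for flag, category in pairs)

# Hypothetical sample rows, for illustration only.
sample = [("True", "Rural"), ("False", "Urban/Suburban"), ("True", "Rural")]
counts = pair_counts(sample)
redundant = is_redundant(sample)
```

If the columns are truly redundant, `counts` contains only the two expected combinations and one of the columns can be dropped.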

Fig 3.

county_name · County name lengths cluster tightly around 24 characters, reflecting the consistent 'X County, State' format.

Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

Fig 4.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
fips	text	0.0%
county_name	text	0.0%
rural	categorical	0.0%
rural_category	categorical	0.0%

fips text identifier

This column holds 5-character FIPS codes, one per row across all 3,222 records, with zero nulls and zero duplicates. Every value is exactly 5 characters and a single word, and the sample tokens (01001, 01003, 01005...) match the standard 2-digit state + 3-digit county FIPS format. With n_unique equal to n, this is a row-level identifier rather than a feature.

Treatment: Treat as a county key and left-join to geographic reference tables; do not use as a model feature.
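
A minimal sketch of the left-join described above, using pandas. Both frames, the `label` and `state_abbr` columns, and all values are hypothetical stand-ins; only the `fips` column name comes from this report. The key is kept as a string so leading zeros survive.

```python
import pandas as pd

# Hypothetical rows shaped like this dataset; fips stays a string.
counties = pd.DataFrame({
    "fips": ["01001", "01003", "01005"],
    "label": ["a", "b", "c"],
})
# Hypothetical reference table keyed on the same 5-character code.
reference = pd.DataFrame({
    "fips": ["01001", "01003"],
    "state_abbr": ["AL", "AL"],
})

merged = counties.merge(
    reference,
    on="fips",
    how="left",
    validate="one_to_one",  # raises MergeError if either side repeats a key
    indicator=True,         # adds a _merge column: "both" or "left_only"
)
# Keys with no match in the reference table.
unmatched = merged.loc[merged["_merge"] == "left_only", "fips"].tolist()
```

The `indicator` column makes unmatched keys visible instead of silently producing nulls, which is useful when the reference table covers a different set of counties.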

anthropic:claude-opus-4-7 · confidence high

Out[10]:

saturn.columns["fips"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	5
len_max	5
len_mean	5
len_median	5
len_p95	5
word_mean	1
word_median	1
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	3,222
readability_flesch_mean	121.2
emoji_rate	0
url_rate	0
one_word_rate	1
allcaps_rate	1
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
alert: one_word	100.0% rows are a single word
alert: allcaps	100.0% rows are all-caps
alert: short_text	95th-percentile length under 20 chars
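
The length stats above (len_min and len_max both 5) are what a well-formed key column looks like. If the file is ever re-read with the codes parsed as integers, leading zeros are lost. The sketch below restores and validates them; the helper names are hypothetical and not part of saturn.

```python
import re

# 2-digit state code followed by a 3-digit county code.
FIPS_PATTERN = re.compile(r"[0-9]{5}")

def normalize_fips(value):
    """Return a 5-character FIPS string, restoring leading zeros if needed."""
    text = str(value).strip().zfill(5)
    if not FIPS_PATTERN.fullmatch(text):
        raise ValueError(f"not a 5-digit FIPS code: {value!r}")
    return text

def split_fips(value):
    """Split a county FIPS code into (state_code, county_code)."""
    fips = normalize_fips(value)
    return fips[:2], fips[2:]

state_code, county_code = split_fips("01003")
repaired = normalize_fips(1001)  # integer that lost its leading zero
```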

Fig 5.

Character-length distribution for fips.

Character-length distribution for fips (mean: 5.0).
chars	count
5	3222

county_name text identifier

Full county identifiers, almost certainly formatted like 'X County, State': 2,999 of 3,222 rows contain the token 'county,' and the remaining top tokens (texas, virginia, georgia, north carolina) are US state names. Every one of the 3,222 values is unique, with zero nulls, duplicates, or empty strings. Lengths run from 16 to 59 characters, with 95% of values at 31 characters or fewer. The tokens texas (256) and virginia (189) lead the distribution; because these are token counts, 'virginia' would also match West Virginia counties, so they approximate rather than equal per-state county counts.

Treatment: Use as a geographic key; left-join to state/FIPS lookups rather than feeding to a model directly.
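
To join on state, the name first has to be split into its county and state parts. The sketch below assumes the 'X County, State' format inferred above and splits on the last comma; the function name and the sample string are hypothetical.

```python
def split_county_name(name):
    """Split 'X County, State' into (county, state) on the last comma."""
    county, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no ', State' suffix in {name!r}")
    return county.strip(), state.strip()

# Hypothetical sample value in the assumed format.
county, state = split_county_name("Autauga County, Alabama")
```

Splitting on the last comma rather than the first keeps any comma inside the county part intact. Grouping by the resulting `state` value gives true per-state counts, which token counts on the raw string only approximate.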

anthropic:claude-opus-4-7 · confidence high

Out[13]:

saturn.columns["county_name"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.32
len_median	24
len_p95	31
word_mean	3.248
word_median	3
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,990
readability_flesch_mean	10.28
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 6.

Character-length distribution for county_name.

The character-length table for county_name is identical to the one under Fig 3.

rural categorical feature

Boolean flag indicating rural status, stored as the strings "True"/"False" with no nulls across 3,222 rows. The split is imbalanced toward rural: 2,212 True (68.7%) versus 1,010 False (31.3%), giving an entropy of 0.897 bits out of a maximum of 1.0 for a two-valued column.

Treatment: Cast string "True"/"False" to boolean or 0/1 before modelling.
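
One pitfall with this cast: in Python, `bool("False")` is `True`, because any non-empty string is truthy. A minimal sketch of an explicit mapping that avoids this follows; the helper name is hypothetical and the sample values are illustrative.

```python
_FLAGS = {"True": True, "False": False}

def parse_flag(value):
    """Map the strings 'True'/'False' to booleans; reject anything else."""
    try:
        return _FLAGS[value]
    except KeyError:
        raise ValueError(f"unexpected rural flag: {value!r}") from None

# Illustrative values only.
flags = [parse_flag(v) for v in ["True", "False", "True"]]
as_int = [int(f) for f in flags]  # 0/1 encoding for modelling
```

Rejecting unknown values is a deliberate choice here: a silent default would hide any dirty variants that appear in a later refresh of the data.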

anthropic:claude-opus-4-7 · confidence high

Out[16]:

saturn.columns["rural"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	True
top_rate	0.6865
cardinality	2
entropy	0.8971
entropy_ratio	0.8971
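
The entropy figures in this table can be reproduced from the two counts. The sketch below assumes saturn reports base-2 Shannon entropy and defines entropy_ratio as entropy divided by log2(cardinality); that assumption is consistent with the reported values but is not confirmed by the report.

```python
import math

def shannon_entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

counts = [2212, 1010]  # True / False counts from the table above
entropy = shannon_entropy_bits(counts)
# Normalize by the maximum entropy for this cardinality (1 bit for 2 values).
entropy_ratio = entropy / math.log2(len(counts))
```

For a two-valued column the maximum is exactly 1 bit, which is why entropy and entropy_ratio coincide here.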

Fig 7.

Top values for rural.

Top values for rural (2 unique shown, of 2 total).
value	count	share
True	2212	68.7%
False	1010	31.3%

rural_category categorical feature

A binary geographic classifier splitting records into 'Rural' (2,212) and 'Urban/Suburban' (1,010) with no nulls across 3,222 rows. The split is uneven at roughly 68.7% rural, but the entropy ratio of 0.897 indicates the minority class is still well represented. With a cardinality of 2, this is a clean categorical feature with no dirty variants.

Treatment: Encode as a binary indicator (e.g., is_rural) before modelling.
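
A minimal sketch of that encoding with pandas. The frame below is a hypothetical sample that uses the two labels reported for this column; `is_rural` is the indicator name suggested above.

```python
import pandas as pd

# Hypothetical sample using the two reported labels.
df = pd.DataFrame({"rural_category": ["Rural", "Urban/Suburban", "Rural"]})

# Guard against labels other than the two the profile reports.
expected = {"Rural", "Urban/Suburban"}
unexpected = set(df["rural_category"].unique()) - expected
if unexpected:
    raise ValueError(f"unexpected rural_category labels: {sorted(unexpected)}")

# 1 for Rural, 0 for Urban/Suburban.
df["is_rural"] = (df["rural_category"] == "Rural").astype("int8")
```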

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["rural_category"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2
top_value	Rural
top_rate	0.6865
cardinality	2
entropy	0.8971
entropy_ratio	0.8971

Fig 8.

Top values for rural_category.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

How to cite

BibTeX

@misc{saturn-healthcare-data-rural-urban-classification-20260121-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: healthcare data rural urban classification 20260121},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/healthcare_data-rural_urban_classification_20260121}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: healthcare data rural urban classification 20260121. Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/healthcare_data-rural_urban_classification_20260121