This dataset catalogs 3,222 U.S. counties, each identified by a unique 5-character FIPS code and a county name, and classified as either rural or urban/suburban. The two classification columns (`rural` and `rural_category`) appear to be redundant encodings of the same split: both flag 2,212 counties (about 68.7%) as rural and 1,010 as urban/suburban. The most useful angle here is the rural/urban split, since `fips` and `county_name` are unique identifiers with no aggregate signal. The most frequent words in `county_name` hint at geographic concentration, led by Texas (256), Virginia (189), and Georgia (159); these are word frequencies rather than per-state county counts, so "Virginia", for example, also matches every "West Virginia" entry.
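Matching totals (2,212 and 1,010 in each column) are consistent with the two classification columns being redundant, but only a row-wise cross-tabulation confirms it. The sketch below is a minimal illustration in plain Python, assuming both columns have been loaded as lists of strings with the values shown in this profile; `crosstab`, `is_redundant`, and the `EXPECTED` mapping are hypothetical names, not part of saturn.

```python
from collections import Counter

# Assumed mapping between the two columns, inferred from the matching counts.
EXPECTED = {"True": "Rural", "False": "Urban/Suburban"}


def crosstab(rural: list[str], rural_category: list[str]) -> Counter:
    """Count each (rural, rural_category) pair; unexpected pairs reveal disagreements."""
    if len(rural) != len(rural_category):
        raise ValueError("columns must have the same length")
    return Counter(zip(rural, rural_category))


def is_redundant(rural: list[str], rural_category: list[str]) -> bool:
    """True if every observed pair agrees with the assumed EXPECTED mapping."""
    table = crosstab(rural, rural_category)
    return all(EXPECTED.get(r) == c for (r, c) in table)


# Hypothetical toy rows; the real columns would be read from the parquet file.
rural = ["True", "False", "True", "True"]
rural_category = ["Rural", "Urban/Suburban", "Rural", "Rural"]
print(crosstab(rural, rural_category))
print(is_redundant(rural, rural_category))
```

With pandas available, `pd.crosstab(df["rural"], df["rural_category"])` produces the same table as a DataFrame; redundancy holds if only the two expected cells are non-zero.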
saturn
/home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet · 3,222 rows · sample n=3,222 · seed 42 · 2026-05-01T16:51:11+00:00
Overview
| Field | Value |
|---|---|
| Source | /home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet |
| Total rows | 3,222 |
| Profiled sample | 3,222 |
| Columns | 4 |
| Generated | 2026-05-01T16:51:11+00:00 |
Insights (opt-in)
Model-generated narrative. These are opinions, not facts; the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
This column holds 5-character FIPS codes, one per row across all 3,222 records, with zero nulls and zero duplicates. Every value is exactly 5 characters and a single token, and the sample tokens (01001, 01003, 01005...) match the standard FIPS layout of a 2-digit state code followed by a 3-digit county code. With n_unique equal to n, this is a row-level identifier rather than a feature.
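The FIPS structure described above can be checked mechanically. This is a minimal sketch, assuming the codes are kept as strings (reading them as integers would drop the leading zero in codes such as 01007); `split_fips` and `check_identifier` are hypothetical helpers, not part of saturn.

```python
import re

# Assumed format: exactly five ASCII digits, 2-digit state + 3-digit county.
FIPS_RE = re.compile(r"[0-9]{5}")


def split_fips(code: str) -> tuple[str, str]:
    """Split a 5-character county FIPS code into (state, county) parts."""
    if not FIPS_RE.fullmatch(code):
        raise ValueError(f"not a 5-digit FIPS code: {code!r}")
    return code[:2], code[2:]


def check_identifier(codes: list[str]) -> bool:
    """True if every code is well-formed and none repeats (n_unique == n)."""
    well_formed = all(FIPS_RE.fullmatch(c) for c in codes)
    return well_formed and len(set(codes)) == len(codes)


sample = ["01007", "47021", "49031", "48279", "05019"]
print([split_fips(c) for c in sample])
print(check_identifier(sample))
```

The character class `[0-9]` is used instead of `\d` because, for `str` patterns, `\d` also matches non-ASCII digits.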
Full county identifiers, almost certainly formatted like 'X County, State' (e.g., 'Bibb County, Alabama').
Boolean flag indicating rural status, stored as the strings "True"/"False" with no nulls across 3,222 rows. The split leans rural: 2,212 True (68.7%) versus 1,010 False, giving a normalized entropy of 0.897 out of a maximum of 1.0.
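Because the flag is stored as text, converting it to a real boolean needs an explicit mapping: in Python, `bool("False")` is `True`, since any non-empty string is truthy. A minimal sketch, assuming the only valid values are the two strings reported here; `parse_flag` is a hypothetical helper.

```python
def parse_flag(value: str) -> bool:
    """Map the strings "True"/"False" to booleans, rejecting anything else."""
    # bool("False") would be True, so an explicit lookup is required.
    lookup = {"True": True, "False": False}
    try:
        return lookup[value]
    except KeyError:
        raise ValueError(f"unexpected rural flag: {value!r}") from None


flags = ["True", "False", "True"]
parsed = [parse_flag(v) for v in flags]
share_rural = sum(parsed) / len(parsed)
print(parsed, share_rural)
```

Applied to the full column, the rural share is 2,212 / 3,222 ≈ 0.687, the 68.7% quoted above. Rejecting unexpected spellings (for example `"true"`) surfaces dirty values instead of silently misclassifying them.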
A binary geographic classifier splitting records into 'Rural' (2,212) and 'Urban/Suburban' (1,010) with no nulls across 3,222 rows. The split is uneven at roughly 68.7% rural, but an entropy ratio of 0.897 indicates the minority class is still well represented. A cardinality of 2 makes this a clean categorical feature with no dirty variants.
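The entropy ratio can be reproduced from the two category counts. This sketch assumes saturn's ratio is Shannon entropy in bits divided by log2 of the number of categories (its exact definition is not stated in the report); under that assumption, the counts 2,212 and 1,010 give 0.897. `normalized_entropy` is a hypothetical name.

```python
import math


def normalized_entropy(counts: list[int]) -> float:
    """Shannon entropy (bits) of category counts, divided by log2 of the category count."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Zero-count categories contribute nothing (0 * log 0 is taken as 0).
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    k = len(counts)
    return h / math.log2(k) if k > 1 else 0.0


# Counts from the profile: 2,212 Rural vs 1,010 Urban/Suburban.
print(round(normalized_entropy([2212, 1010]), 3))  # → 0.897
```

For a binary column the divisor log2(2) is 1, so the ratio equals the raw entropy; a perfectly balanced split would score 1.0.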
fips text
Sample values (first 10)
- 01007
- 47021
- 49031
- 48279
- 27091
- 56033
- 28017
- 51165
- 48291
- 05019
county_name text
Sample values (first 10)
- Bibb County, Alabama
- Cheatham County, Tennessee
- Piute County, Utah
- Lamb County, Texas
- Martin County, Minnesota
- Sheridan County, Wyoming
- Chickasaw County, Mississippi
- Rockingham County, Virginia
- Liberty County, Texas
- Clark County, Arkansas
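One way to check whether the top-word counts in the summary (Texas 256, Virginia 189, Georgia 159) reflect counties per state is to compare word frequencies with a count keyed on the state suffix. This is a minimal sketch, assuming every name ends in ", <State>" as in the sample values above; the county part is not always "County" (Louisiana, for instance, uses parishes). The whitespace tokenizer in `word_counts` is a hypothetical approximation, since saturn's own tokenization is not documented here.

```python
from collections import Counter


def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Bibb County, Alabama' into ('Bibb County', 'Alabama') at the last ', '."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county, state


def state_counts(names: list[str]) -> Counter:
    """Rows per state, using the text after the last comma."""
    return Counter(split_county_name(n)[1] for n in names)


def word_counts(names: list[str]) -> Counter:
    """Naive word frequencies: commas dropped, then split on whitespace."""
    return Counter(tok for n in names for tok in n.replace(",", " ").split())


names = [
    "Rockingham County, Virginia",
    "Berkeley County, West Virginia",
    "Texas County, Missouri",
]
print(state_counts(names)["Virginia"])  # 1 row in the state of Virginia
print(word_counts(names)["Virginia"])   # 2 occurrences of the word "Virginia"
```

The gap between the two counters is the same effect that can inflate "Virginia" (via West Virginia) and "Texas" (via Texas County in other states) in a word-frequency summary.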
rural categorical
Top values (rank 1–20)
- True — 2,212
- False — 1,010
rural_category categorical
Top values (rank 1–20)
- Rural — 2,212
- Urban/Suburban — 1,010