This dataset is a county-level reference table covering 3,222 U.S. counties, with each row uniquely identified by a county name and FIPS code and labeled as either rural or urban/suburban. The headline finding is the rural skew: 2,212 counties (about 68.7%) are flagged Rural versus 1,010 Urban/Suburban. The `rural` and `rural_category` columns carry the same information in two encodings (True corresponds to Rural, False to Urban/Suburban), so one of them is redundant. The most frequent state words in `county_name` are Texas (256), Virginia (189), and Georgia (159). These are word counts, so they track how many counties those states contain rather than signaling a data quality issue.
saturn
/home/coolhand/html/datavis/data_trove/cache/rural_urban.parquet 3,222 rows sample n=3,222 seed 42 2026-05-01T16:52:50+00:00
Overview
| Field | Value |
| --- | --- |
| Source | /home/coolhand/html/datavis/data_trove/cache/rural_urban.parquet |
| Total rows | 3,222 |
| Profiled sample | 3,222 |
| Columns | 4 |
| Generated | 2026-05-01T16:52:50+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
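`fips`: the county FIPS code (state and county digits combined), with all 3,222 rows unique and no nulls. Values span 1001 to 72153 with a near-symmetric distribution (skew 0.16, kurtosis -0.63). These moments describe how the code values are spread, not a measured quantity, and they fit the standard 5-digit US county FIPS encoding. Because the column is stored as a number, leading zeros are dropped (the minimum appears as 1001 rather than 01001). Treat it as a categorical key, not a number.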
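As a minimal sketch of treating the code as a key, the numeric value can be zero-padded back to a 5-character string. The helper name `fips_to_key` is hypothetical and not part of the profiled dataset or of saturn. The split into a 2-digit state prefix and 3-digit county suffix follows the standard county FIPS layout.

```python
def fips_to_key(fips: int) -> str:
    """Format a numeric county FIPS code as a zero-padded 5-character key.

    Hypothetical helper for illustration; assumes codes fit in 5 digits.
    """
    if not 0 < fips <= 99999:
        raise ValueError(f"not a 5-digit county FIPS code: {fips}")
    return f"{fips:05d}"


# The minimum and maximum values reported in the profile.
low_key = fips_to_key(1001)
high_key = fips_to_key(72153)

# Standard layout: first 2 characters are the state, last 3 the county.
state_part, county_part = low_key[:2], low_key[2:]

print(low_key, high_key)        # 01001 72153
print(state_part, county_part)  # 01 001
```

Keeping the key as a string avoids accidental arithmetic on it and preserves the leading zero when joining against other county-level tables.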
`county_name`: each of the 3,222 rows holds a unique county-plus-state string (e.g., 'X County, Texas'). The token 'county,' appears 2,999 times, and the state names Texas (256), Virginia (189), and Georgia (159) lead the top words. Lengths are tight (16 to 59 characters, median 24), and there are no nulls or duplicates, consistent with a complete US county roster. The near_unique alert is expected for an identifier column and is not a data-quality issue.
`rural`: Boolean flag indicating whether a county is rural, fully populated across all 3,222 rows. The split is roughly 69/31 in favour of True (2,212 vs 1,010), giving an entropy ratio of 0.90: imbalanced but far from degenerate.
`rural_category`: two-class geographic label splitting counties into 'Rural' (2,212) and 'Urban/Suburban' (1,010) across all 3,222 rows with no nulls. The split is roughly 69/31 toward Rural, giving an entropy ratio of 0.897, so both classes are well represented despite the imbalance.
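The entropy ratio quoted for both columns can be reproduced from the class counts. The sketch below assumes the ratio is Shannon entropy divided by the maximum entropy for the number of distinct values. Whether saturn computes it exactly this way is an assumption, but the definition yields the reported figure. The function name `entropy_ratio` is hypothetical.

```python
import math


def entropy_ratio(counts):
    """Shannon entropy of the counts divided by the maximum possible entropy.

    Assumed definition: H / log2(k), where k is the number of distinct values.
    Returns 0.0 when there is only one distinct value.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 0.0
    return entropy / max_entropy if max_entropy > 0 else 0.0


# Class counts from the profile: 2,212 rural vs 1,010 urban/suburban.
ratio = entropy_ratio([2212, 1010])
print(round(ratio, 3))  # 0.897
```

A perfectly balanced 50/50 split would give 1.0 and a single-valued column 0.0, so 0.897 sits close to the balanced end despite the 69/31 skew.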
fips numeric
county_name text
Sample values (first 10)
- Bibb County, Alabama
- Cheatham County, Tennessee
- Piute County, Utah
- Lamb County, Texas
- Martin County, Minnesota
- Sheridan County, Wyoming
- Chickasaw County, Mississippi
- Rockingham County, Virginia
- Liberty County, Texas
- Clark County, Arkansas
rural categorical
Top values (rank 1–20)
- True — 2,212
- False — 1,010
rural_category categorical
Top values (rank 1–20)
- Rural — 2,212
- Urban/Suburban — 1,010
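Matching counts (2,212 and 1,010 in both `rural` and `rural_category`) are consistent with the two columns being redundant, but counts alone do not prove a row-by-row correspondence. A minimal sketch of that check, using a hypothetical helper `is_one_to_one` and a few toy rows rather than the real table:

```python
def is_one_to_one(pairs):
    """Return True if each left value maps to exactly one right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        # setdefault records the first pairing seen; any later conflict fails the check.
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True


# Toy (rural, rural_category) rows for illustration only, not taken from the dataset.
toy_rows = [
    (True, "Rural"),
    (False, "Urban/Suburban"),
    (True, "Rural"),
]
print(is_one_to_one(toy_rows))  # True

# A conflicting row breaks the correspondence.
print(is_one_to_one(toy_rows + [(True, "Urban/Suburban")]))  # False
```

Running the same check over the full `(rural, rural_category)` column pair would confirm whether one column can be dropped without losing information.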