saturn

/home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet | 3,222 rows | sample n=3,222 | seed 42 | 2026-05-01T16:51:11+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/healthcare_data/rural_urban_classification_20260121.parquet
Total rows: 3,222
Profiled sample: 3,222
Columns: 4
Generated: 2026-05-01T16:51:11+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset catalogs 3,222 U.S. counties, each identified by a unique 5-character FIPS code and a county name, and classified as either rural or urban/suburban. The two classification columns (`rural` and `rural_category`) appear redundant: both show 2,212 counties (about 68.7%) as Rural and 1,010 as Urban/Suburban. Identical per-column counts are consistent with a one-to-one mapping but do not prove it on their own; a row-level cross-tabulation would confirm it. The rural/urban split is the most useful angle here, since `fips` and `county_name` are unique identifiers with no aggregate signal. Token counts in `county_name` mostly track how many counties each state has: Texas (256), Virginia (189), and Georgia (159) rank highest, although the "virginia" token likely also matches West Virginia rows.
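Because the profile only reports per-column counts, a row-level check is what would establish that `rural` and `rural_category` carry the same information. The sketch below is a minimal illustration, assuming the two columns can be read as `(rural, rural_category)` pairs with Python booleans for `rural`; the helper names `crosstab` and `is_one_to_one` are hypothetical, and the rows are toy values rather than data from the file.

```python
from collections import Counter

def crosstab(pairs):
    """Count how often each (rural, rural_category) combination occurs."""
    return Counter(pairs)

def is_one_to_one(pairs):
    """True if each rural value maps to exactly one category, and vice versa."""
    forward, backward = {}, {}
    for flag, category in pairs:
        # setdefault stores the first mapping seen; any later mismatch breaks redundancy
        if forward.setdefault(flag, category) != category:
            return False
        if backward.setdefault(category, flag) != flag:
            return False
    return True

# Hypothetical toy rows; a real check would iterate over all 3,222 records.
rows = [(True, "Rural"), (False, "Urban/Suburban"), (True, "Rural")]
print(crosstab(rows))
print(is_one_to_one(rows))  # True
```

If the real cross-tabulation contains only the two cells (True, Rural) = 2,212 and (False, Urban/Suburban) = 1,010, one of the columns can be dropped without losing information.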

fips high anthropic:claude-opus-4-7

This column holds 5-character FIPS codes, one per row across all 3222 records with zero nulls and zero duplicates. Every value is exactly 5 characters, single-word, and the sample tokens (01001, 01003, 01005...) match the standard 2-digit state + 3-digit county FIPS format. With n_unique equal to n, this is a row-level identifier rather than a feature.
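The 2-digit state + 3-digit county layout means a code can be split by position, and it also explains why the column must stay a string: codes such as 01007 (Bibb County, Alabama) begin with a zero that integer parsing would drop. A minimal sketch, assuming codes arrive either as 5-character strings or as integers that lost their padding; `split_fips` and `restore_fips` are illustrative helper names, not part of saturn.

```python
def split_fips(code: str) -> tuple[str, str]:
    """Split a 5-character county FIPS code into (state, county) parts."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"not a 5-digit FIPS code: {code!r}")
    return code[:2], code[2:]

def restore_fips(value: int) -> str:
    """Re-pad a FIPS code that lost its leading zero after integer parsing."""
    return f"{value:05d}"

print(split_fips("01007"))  # ('01', '007')
print(restore_fips(1007))   # 01007
```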

county_name high anthropic:claude-opus-4-7

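Full county names, formatted as "X County, State" (for example "Bibb County, Alabama"): 2,999 of 3,222 rows contain the token "county,", and the remaining 223 are presumably county equivalents such as parishes or independent cities. The other top tokens (texas, virginia, georgia, north carolina) are U.S. state names. All 3,222 values are unique, with zero nulls, duplicates, or empty strings. Lengths range from 16 to 59 characters with a median of 24 and a 95th percentile of 31, so most names are short and a few are much longer. Texas (256) and Virginia (189) lead the token counts, in line with states that have many counties, although whitespace tokenization likely counts "virginia" in West Virginia rows as well, so 189 overstates Virginia's own total.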
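Splitting on the last ", " separates the county from the state, and counting the state field instead of whitespace tokens keeps Virginia and West Virginia apart. This is a sketch under two assumptions: every value follows the "X County, State" pattern seen in the samples, and the profiler's tokens are lowercase whitespace splits that keep punctuation (which would explain the "county," token). The "Marion County, West Virginia" row is an illustrative value, not drawn from the profiled sample.

```python
from collections import Counter

def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Bibb County, Alabama' into ('Bibb County', 'Alabama')."""
    county, _, state = name.rpartition(", ")
    return county, state

names = [
    "Rockingham County, Virginia",
    "Marion County, West Virginia",  # illustrative row, not from the sample
    "Liberty County, Texas",
]

# Assumed tokenizer: lowercase whitespace split that keeps the trailing comma.
token_counts = Counter(tok.lower() for n in names for tok in n.split())
# Counting the parsed state field keeps Virginia and West Virginia separate.
state_counts = Counter(split_county_name(n)[1] for n in names)

print(token_counts["virginia"])  # 2
print(state_counts["Virginia"])  # 1
```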

rural high anthropic:claude-opus-4-7

Boolean flag indicating rural status, profiled as a two-level categorical with values True and False and no nulls across 3,222 rows. The split leans rural: 2,212 True (68.7%) versus 1,010 False, giving an entropy of 0.897 bits out of a maximum of 1.0 bit for two classes.
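The reported top_rate, entropy, and entropy_ratio can be reproduced from the two counts alone. The sketch assumes saturn uses Shannon entropy in bits and defines entropy_ratio as entropy divided by log2(cardinality); that definition is an inference from the matching values, not documented in the report. For a two-level column the maximum is 1 bit, which is why entropy and entropy_ratio coincide. `category_profile` is a hypothetical helper name.

```python
import math

def category_profile(counts: dict[str, int]) -> dict:
    """Top value, top rate, Shannon entropy (bits), and entropy / log2(k)."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Assumed normalization: divide by the maximum entropy for k levels.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    top_value = max(counts, key=counts.get)
    return {
        "top_value": top_value,
        "top_rate": counts[top_value] / total,
        "entropy": entropy,
        "entropy_ratio": entropy / max_entropy,
    }

stats = category_profile({"True": 2212, "False": 1010})
print({k: round(v, 3) if isinstance(v, float) else v for k, v in stats.items()})
# {'top_value': 'True', 'top_rate': 0.687, 'entropy': 0.897, 'entropy_ratio': 0.897}
```

The same call with {"Rural": 2212, "Urban/Suburban": 1010} gives identical numbers for `rural_category`.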

rural_category high anthropic:claude-opus-4-7

A binary geographic label splitting records into "Rural" (2,212) and "Urban/Suburban" (1,010) with no nulls across 3,222 rows. The split is uneven at roughly 68.7% rural, but an entropy ratio of 0.897 indicates the minority class is still well represented. With a cardinality of 2 and no variant spellings, this is a clean categorical feature.

fips text

Flags: 100.0% of rows are unique strings; 100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars
rows: 3,222
null: 0 (0.0%)
unique: 3,222
len_min: 5
len_max: 5
len_mean: 5.000
len_median: 5.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 3,222
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 01007
  2. 47021
  3. 49031
  4. 48279
  5. 27091
  6. 56033
  7. 28017
  8. 51165
  9. 48291
  10. 05019
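An allcaps_rate of 1.000 on purely numeric codes looks odd, since digits have no case. One plausible explanation, which is an assumption about saturn's internals, is that a value counts as all-caps when upper-casing leaves it unchanged. The sketch contrasts that assumed definition with Python's `str.isupper()`, which returns False for strings with no cased characters; `text_rates` is a hypothetical helper, and the inputs are three of the sample codes above.

```python
def text_rates(values: list[str]) -> dict:
    """Share of single-word values, under two possible all-caps definitions."""
    n = len(values)
    one_word = sum(1 for v in values if len(v.split()) == 1)
    # Assumed definition: "all caps" if upper() leaves the value unchanged,
    # which digit-only strings satisfy trivially.
    allcaps = sum(1 for v in values if v == v.upper())
    # str.isupper() requires at least one cased character, so digits fail it.
    strict_allcaps = sum(1 for v in values if v.isupper())
    return {
        "one_word_rate": one_word / n,
        "allcaps_rate": allcaps / n,
        "strict_allcaps_rate": strict_allcaps / n,
    }

print(text_rates(["01007", "47021", "49031"]))
# {'one_word_rate': 1.0, 'allcaps_rate': 1.0, 'strict_allcaps_rate': 0.0}
```

Either way, text-oriented metrics such as allcaps_rate and readability_flesch_mean carry little meaning for a code column like `fips`.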

county_name text

100.0% of rows are unique strings
rows: 3,222
null: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.324
len_median: 24.000
len_p95: 31.000
word_mean: 3.248
word_median: 3.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,990
readability_flesch_mean: 10.284
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County, Alabama
  2. Cheatham County, Tennessee
  3. Piute County, Utah
  4. Lamb County, Texas
  5. Martin County, Minnesota
  6. Sheridan County, Wyoming
  7. Chickasaw County, Mississippi
  8. Rockingham County, Virginia
  9. Liberty County, Texas
  10. Clark County, Arkansas
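The length and word statistics above can be recomputed with the standard library. This is a sketch only: it uses a nearest-rank 95th percentile and whitespace word counts, and saturn's actual percentile interpolation and word splitting are not documented in the report, so small differences from the reported values are possible. `length_summary` is a hypothetical helper, and the inputs are three of the sample values.

```python
import math
import statistics

def length_summary(values: list[str]) -> dict:
    """Character-length and word-count summary for a text column."""
    lengths = sorted(len(v) for v in values)
    # Nearest-rank percentile: the smallest value covering 95% of rows.
    rank = math.ceil(0.95 * len(lengths))
    words = [len(v.split()) for v in values]
    return {
        "len_min": lengths[0],
        "len_max": lengths[-1],
        "len_mean": statistics.mean(lengths),
        "len_median": statistics.median(lengths),
        "len_p95": lengths[rank - 1],
        "word_mean": statistics.mean(words),
    }

sample = ["Bibb County, Alabama", "Piute County, Utah", "Lamb County, Texas"]
print(length_summary(sample))
```

On the full column, the gap between len_p95 (31) and len_max (59) is what signals a small tail of unusually long names.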

rural categorical

rows: 3,222
null: 0 (0.0%)
unique: 2
top_value: True
top_rate: 0.687
cardinality: 2
entropy: 0.897
entropy_ratio: 0.897
Top values (rank 1–20)
  1. True — 2,212
  2. False — 1,010

rural_category categorical

rows: 3,222
null: 0 (0.0%)
unique: 2
top_value: Rural
top_rate: 0.687
cardinality: 2
entropy: 0.897
entropy_ratio: 0.897
Top values (rank 1–20)
  1. Rural — 2,212
  2. Urban/Suburban — 1,010