saturn

/home/coolhand/html/datavis/data_trove/cache/rural_urban.parquet 3,222 rows sample n=3,222 seed 42 2026-05-01T16:52:50+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/rural_urban.parquet
Total rows: 3,222
Profiled sample: 3,222
Columns: 4
Generated: 2026-05-01T16:52:50+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset is a county-level reference table covering 3,222 U.S. counties and county equivalents. Each row is uniquely identified by a county name and a FIPS code and is labeled either rural or urban/suburban. The headline finding is the rural skew: 2,212 rows (about 68.7%) are flagged Rural versus 1,010 Urban/Suburban (about 31.3%). The `rural` and `rural_category` columns have identical class counts, which suggests they encode the same label, although summary statistics alone cannot confirm a row-by-row match. The most frequent state words in county names are Texas (256), Virginia (189), and Georgia (159). This mostly reflects how many counties those states contain rather than a data-quality issue, though a word count may also pick up other names containing the same word (for example, "Virginia" also matches West Virginia).
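Whether `rural` and `rural_category` really carry the same information can only be settled row by row, since matching class counts do not rule out mismatched rows. The sketch below is one way to check it, assuming the rows have already been loaded as `(rural, rural_category)` pairs. The helper name `is_one_to_one` and the sample rows are hypothetical, not part of saturn.

```python
from collections import defaultdict


def is_one_to_one(pairs):
    """Return True if every left value maps to exactly one right value and vice versa."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    return (
        all(len(v) == 1 for v in left_to_right.values())
        and all(len(v) == 1 for v in right_to_left.values())
    )


# Hypothetical rows; real rows would come from rural_urban.parquet.
consistent = [(True, "Rural"), (False, "Urban/Suburban"), (True, "Rural")]
conflicting = [(True, "Rural"), (True, "Urban/Suburban"), (False, "Urban/Suburban")]

redundant_ok = is_one_to_one(consistent)    # each flag pairs with a single label
redundant_bad = is_one_to_one(conflicting)  # True pairs with two different labels
```

If the check passes on the full table, one of the two columns can be dropped or derived from the other without losing information.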

fips high anthropic:claude-opus-4-7

This is the county FIPS code (a two-digit state prefix followed by a three-digit county code), unique across all 3,222 rows with no nulls. Values span 1,001 to 72,153; because the column is stored as a number, codes below 10,000 have lost their leading zero (1001 stands for 01001). The distribution statistics (skew 0.16, kurtosis -0.63) describe how the codes were assigned, not a measured quantity. Treat the column as a categorical key, not a number.
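Treating the code as a key usually means restoring the five-digit string form before joining against other county tables. A minimal sketch, assuming the column arrives as plain integers; the function names `fips_to_key` and `split_fips` are illustrative only:

```python
def fips_to_key(code: int) -> str:
    """Format an integer county FIPS code as a zero-padded 5-character string."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a 5-digit county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(key: str) -> tuple[str, str]:
    """Split a 5-character key into (state prefix, county code)."""
    return key[:2], key[2:]


min_key = fips_to_key(1001)    # smallest value in the profile
max_key = fips_to_key(72153)   # largest value in the profile
min_parts = split_fips(min_key)
max_parts = split_fips(max_key)
```

Keeping the key as a string also prevents accidental arithmetic or averaging on the column downstream.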

county_name high anthropic:claude-opus-4-7

Each of the 3,222 rows holds a unique county-plus-state string (e.g., 'X County, Texas'). The token 'county,' appears 2,999 times, and state names such as Texas (256), Virginia (189), and Georgia (159) lead the remaining top words. Lengths cluster tightly (16 to 59 characters, median 24, 95th percentile 31), and there are no nulls or duplicates, consistent with a complete US county roster. The near_unique alert is therefore expected here and does not indicate a data-quality issue.
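Word frequencies and per-state county counts are related but not identical: a word such as "Texas" can appear in a county name as well as in the state part. The sketch below contrasts the two on a handful of names, assuming the 'County, State' layout seen in the sample values. The whitespace tokenizer is a simplification and is not claimed to match saturn's, and the last two names are added for illustration rather than taken from the profile.

```python
from collections import Counter


def split_county_state(name: str) -> tuple[str, str]:
    """Split 'Bibb County, Alabama' into ('Bibb County', 'Alabama')."""
    county, state = name.rsplit(", ", 1)
    return county, state


names = [
    "Bibb County, Alabama",
    "Lamb County, Texas",
    "Liberty County, Texas",
    "Rockingham County, Virginia",
    "Kanawha County, West Virginia",  # illustrative, not from the sample list
    "Texas County, Missouri",         # illustrative, not from the sample list
]

# Count by parsed state versus by individual word.
state_counts = Counter(split_county_state(n)[1] for n in names)
word_counts = Counter(w.strip(",") for n in names for w in n.split())
```

Here `word_counts["Texas"]` exceeds `state_counts["Texas"]` because of "Texas County, Missouri", and "Virginia" is counted as a word in both Virginia and West Virginia rows.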

rural high anthropic:claude-opus-4-7

Boolean flag indicating whether a county is rural, fully populated across all 3,222 rows. The split is roughly 69/31 in favour of True (2,212 vs 1,010), giving a high entropy ratio of 0.897: imbalanced but far from degenerate.
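The entropy figure can be reproduced from the two class counts. The sketch below assumes Shannon entropy in bits (base-2 logarithm) and a ratio normalised by the maximum entropy for the observed number of classes. That choice is consistent with the reported values but is not confirmed by the report itself.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits for a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def entropy_ratio(counts):
    """Entropy divided by the maximum possible entropy for the observed classes."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0  # a single class carries no uncertainty
    return shannon_entropy(counts) / math.log2(k)


counts = [2212, 1010]  # True / False counts from the profile
entropy = shannon_entropy(counts)
ratio = entropy_ratio(counts)
```

With exactly two classes the maximum entropy is 1 bit, so the entropy and the entropy ratio coincide; that would explain why the report lists the same number for both.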

rural_category high anthropic:claude-opus-4-7

Two-class geographic label splitting counties into 'Rural' (2,212) and 'Urban/Suburban' (1,010) across all 3,222 rows with no nulls. The split is roughly 69/31 toward Rural, giving an entropy ratio of 0.897, so both classes are well represented despite the imbalance.

fips numeric

rows: 3,222
null: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 31,378
median: 30,022
std: 16,300
q1: 19,030
q3: 46,104
iqr: 27,075
skew: 0.157
kurtosis: -0.631
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
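The zero outlier count is what the common 1.5 × IQR (Tukey) rule would give for these quartiles. Whether saturn uses that rule is an assumption; the sketch below only shows that the reported minimum and maximum fall inside the fences computed from the reported q1 and q3.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes as reported in the fips profile.
lower, upper = tukey_fences(19030, 46104)
has_outliers = 1001 < lower or 72153 > upper
```

Since the column is an identifier rather than a measurement, the outlier statistics carry little meaning here either way.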

county_name text

100.0% of rows are unique strings
rows: 3,222
null: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.324
len_median: 24.000
len_p95: 31.000
word_mean: 3.248
word_median: 3.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,990
readability_flesch_mean: 10.284
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County, Alabama
  2. Cheatham County, Tennessee
  3. Piute County, Utah
  4. Lamb County, Texas
  5. Martin County, Minnesota
  6. Sheridan County, Wyoming
  7. Chickasaw County, Mississippi
  8. Rockingham County, Virginia
  9. Liberty County, Texas
  10. Clark County, Arkansas

rural categorical

rows: 3,222
null: 0 (0.0%)
unique: 2
top_value: True
top_rate: 0.687
cardinality: 2
entropy: 0.897
entropy_ratio: 0.897
Top values (rank 1–20)
  1. True — 2,212
  2. False — 1,010

rural_category categorical

rows: 3,222
null: 0 (0.0%)
unique: 2
top_value: Rural
top_rate: 0.687
cardinality: 2
entropy: 0.897
entropy_ratio: 0.897
Top values (rank 1–20)
  1. Rural — 2,212
  2. Urban/Suburban — 1,010