saturn·

rural urban

source /home/coolhand/html/datavis/data_trove/cache/rural_urban.parquet 3,222 rows 4 columns profiled 2026-05-01 raw JSON static .html .ipynb Report Notebook

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset is a county-level reference table covering 3,222 U.S. counties, with each row uniquely identified by a county name and FIPS code and labeled as either rural or urban/suburban. The headline finding is the rural skew: 2,212 counties (about 68.7%) are flagged Rural versus 1,010 Urban/Suburban, and the `rural` and `rural_category` columns are perfectly redundant duplicates of each other. County names are dominated by Texas (256), Virginia (189), and Georgia (159), reflecting how many counties those states contain rather than any data quality issue.

citing: row_count · column_count · columns.rural.top_values · columns.rural.stats.top_rate · columns.rural_category.top_values · columns.county_name.top_words · columns.fips.stats.min · columns.fips.stats.max

Schema

4 columns
Per-column summary. Click column name to jump to its detail.
Alerts
fips numeric 0.0% 3,222
county_name text 0.0% 3,222
near_unique
rural categorical 0.0% 2
rural_category categorical 0.0% 2

fips

numeric identifier
This is the FIPS county/state code, with all 3222 rows unique and no nulls. Values span 1001 to 72153 with a near-symmetric distribution (skew 0.16, kurtosis -0.63), consistent with the standard 5-digit US county FIPS encoding rather than a measured quantity. Treat it as a categorical key, not a number. Treatment: Cast to zero-padded string and use as a join key to geographic reference tables. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
3,222
min
1,001
max
72,153
mean
3.138e+04
median
30,022
std
1.63e+04
q1
1.903e+04
q3
4.61e+04
iqr
27,075
skew
0.1574
kurtosis
-0.6314
n_outliers
0
outlier_rate
0
zero_rate
0

county_name

text identifier near_unique
Each of the 3,222 rows holds a unique county-plus-state string (e.g., 'X County, Texas'), with 'county,' appearing 2,999 times and state names like Texas (256), Virginia (189), and Georgia (159) dominating the top words. Lengths are tight (16-59 chars, median 24) and there are zero nulls or duplicates, consistent with a complete US county roster. The near_unique alert is expected here rather than a data-quality issue. Treatment: Use as a join key to county-level reference tables; do not feed raw into a model. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
3,222
len_min
16
len_max
59
len_mean
24.32
len_median
24
len_p95
31
word_mean
3.248
word_median
3
n_empty
0
n_duplicates
0
duplicate_rate
0
vocab_size
1,990
readability_flesch_mean
10.28
emoji_rate
0
url_rate
0
one_word_rate
0
allcaps_rate
0
boilerplate_rate
0

rural

categorical feature
Binary boolean flag indicating whether a record is rural, fully populated across all 3222 rows. The split is roughly 69/31 in favour of True (2212 vs 1010), giving a high entropy ratio of 0.90 — imbalanced but far from degenerate. Treatment: Cast to 0/1 and use directly as a binary feature. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
2
top_value
True
top_rate
0.6865
cardinality
2
entropy
0.8971
entropy_ratio
0.8971

rural_category

categorical feature
Binary geographic classifier splitting records into 'Rural' (2212) and 'Urban/Suburban' (1010) across all 3222 rows with no nulls. The split is roughly 69/31 toward Rural, giving an entropy ratio of 0.897, so both classes are well represented despite the imbalance. Treatment: Encode as a binary indicator (e.g., is_rural) for modelling. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
2
top_value
Rural
top_rate
0.6865
cardinality
2
entropy
0.8971
entropy_ratio
0.8971