rural urban

source /home/coolhand/html/datavis/data_trove/cache/rural_urban.parquet · 3,222 rows · 4 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset is a county-level reference table covering 3,222 U.S. counties and county equivalents, with each row uniquely identified by a county name and FIPS code and labeled as either rural or urban/suburban. The headline finding is the rural skew: 2,212 rows (about 68.7%) are flagged Rural versus 1,010 (31.3%) Urban/Suburban. The `rural` and `rural_category` columns show identical splits and appear to carry the same information, so one of them is redundant. The most frequent state words in county names are Texas (256), Virginia (189), and Georgia (159). These are word counts rather than per-state row counts (for example, 'Virginia' also matches West Virginia names), and they reflect how many counties those states contain rather than any data quality issue.

citing: row_count · column_count · columns.rural.top_values · columns.rural.stats.top_rate · columns.rural_category.top_values · columns.county_name.top_words · columns.fips.stats.min · columns.fips.stats.max

Charts the summary said to look at first

rural_category · Shows the roughly 69/31 split between Rural and Urban/Suburban counties.

Top values for rural_category (2 unique shown, of 2 total).
value	count	share
Rural	2212	68.7%
Urban/Suburban	1010	31.3%

fips · Distribution of FIPS codes spans 1001 to 72153, indicating coverage from Alabama through Puerto Rico.

Histogram bins for fips (median: 30,022; bin edges rounded to four significant figures).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 11670	4
11670 – 13450	226
13450 – 15230	5
15230 – 17010	49
17010 – 18790	189
18790 – 20570	204
20570 – 22350	184
22350 – 24130	39
24130 – 25900	15
25900 – 27680	170
27680 – 29460	196
29460 – 31240	150
31240 – 33020	27
33020 – 34800	21
34800 – 36580	95
36580 – 38360	153
38360 – 40130	155
40130 – 41910	46
41910 – 43690	67
43690 – 45470	51
45470 – 47250	161
47250 – 49030	268
49030 – 50810	29
50810 – 52590	133
52590 – 54360	94
54360 – 56140	95
56140 – 57920	0
57920 – 59700	0
59700 – 61480	0
61480 – 63260	0
63260 – 65040	0
65040 – 66820	0
66820 – 68600	0
68600 – 70370	0
70370 – 72150	78
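
The run of empty bins before the last one follows from how county FIPS codes are built: the leading digits are the state code and the last three digits are the county code. State codes for the 50 states and DC stop at 56 (Wyoming), and Puerto Rico is 72, so no code in this table falls in between. A minimal sketch of that decomposition, assuming the column is stored as an integer as profiled here; the function name and the sample values are illustrative and not read from the file:

```python
def state_code(fips: int) -> int:
    """Return the state part of a county FIPS code stored as an integer."""
    # The last three digits are the county code, so integer division strips them.
    return fips // 1000

# Illustrative values: the reported min and max plus two mid-range examples.
sample_fips = [1001, 48201, 56045, 72153]
codes = [state_code(f) for f in sample_fips]
print(codes)  # [1, 48, 56, 72]
```

Grouping the real column by this state code would likely give a more readable chart than an evenly spaced histogram over an identifier.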

rural · Shows the same 2,212 / 1,010 split as `rural_category`, consistent with `rural` mirroring it exactly; if so, one of the two columns is redundant.

Top values for rural (2 unique shown, of 2 total).
value	count	share
True	2212	68.7%
False	1010	31.3%
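
Matching totals alone do not prove that the two columns agree row by row, so the redundancy is worth checking directly before dropping either column. A minimal sketch, assuming `rural` holds booleans and `rural_category` holds the two labels shown in this report; the helper name and the sample rows are hypothetical:

```python
def columns_mirror(rural, rural_category, true_label="Rural"):
    """Return True if every boolean flag agrees with its category label."""
    if len(rural) != len(rural_category):
        return False
    return all(
        flag == (label == true_label)
        for flag, label in zip(rural, rural_category)
    )

# Hypothetical rows for illustration, not taken from the dataset.
rural = [True, False, True]
rural_category = ["Rural", "Urban/Suburban", "Rural"]
print(columns_mirror(rural, rural_category))  # True
```

With the table loaded into a pandas DataFrame (here assumed to be called `df`), the same check can be written as `(df["rural"] == df["rural_category"].eq("Rural")).all()`.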

county_name · County name lengths cluster tightly around 24 characters, useful for sanity-checking display widths.

Character-length distribution for county_name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

Schema

4 columns

Per-column summary.
column	type	nulls	unique	alerts
fips	numeric	0.0%	3,222
county_name	text	0.0%	3,222	near_unique
rural	categorical	0.0%	2
rural_category	categorical	0.0%	2

fips

numeric identifier

This is the 5-digit county FIPS code (a 2-digit state prefix followed by a 3-digit county code), with all 3,222 rows unique and no nulls. Values span 1001 to 72153; because the column is stored as a number, leading zeros are dropped (1001 stands for 01001). Shape statistics such as skew (0.16) and kurtosis (-0.63) only describe how the codes happen to be spread and carry no analytical meaning, since this is an identifier rather than a measured quantity. Treat it as a categorical key, not a number. Treatment: Cast to a 5-character zero-padded string and use as a join key to geographic reference tables. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
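
The recommended cast to a zero-padded string can be sketched as follows. This is one way to do it under the assumption that every value is a positive integer of at most five digits, as the min and max above suggest; `fips_to_key` is a hypothetical helper name:

```python
def fips_to_key(fips: int) -> str:
    """Format an integer county FIPS as a 5-character, zero-padded string."""
    if not 0 < fips < 100_000:
        raise ValueError(f"not a 5-digit county FIPS: {fips}")
    return f"{fips:05d}"

print(fips_to_key(1001))   # 01001
print(fips_to_key(72153))  # 72153
```

On a pandas DataFrame (assumed here to be named `df`), the equivalent is `df["fips"].astype(str).str.zfill(5)`. Reference tables that store FIPS as text will generally not match on the unpadded integer form.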

county_name

text identifier near_unique

Each of the 3,222 rows holds a unique county-plus-state string (e.g., 'X County, Texas'). The token 'county,' appears 2,999 times, and state names such as Texas (256), Virginia (189), and Georgia (159) dominate the top words. Lengths range from 16 to 59 characters (median 24, 95th percentile 31), and there are zero nulls or duplicates, consistent with a complete US county roster. The near_unique alert is expected for an identifier column and is not a data-quality issue. Treatment: Use as a join key to county-level reference tables; do not feed raw into a model. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
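
Top-word counts and per-state row counts are different measures: a state name can appear inside a county name, or inside another state's name. A small sketch of the difference, assuming every value follows the 'X County, State' shape described above; the helper name is hypothetical and the four sample names are illustrative, not read from the file:

```python
from collections import Counter

def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' into ('X County', 'State') at the last comma."""
    county, _, state = name.rpartition(", ")
    if not county:
        raise ValueError(f"no ', State' suffix in {name!r}")
    return county, state

# Illustrative names in the documented shape.
names = [
    "Harris County, Texas",
    "Texas County, Missouri",
    "Kanawha County, West Virginia",
    "Fairfax County, Virginia",
]

# Rows per state, using the parsed state suffix.
by_state = Counter(split_county_name(n)[1] for n in names)
# Word frequencies, the way a top-words statistic would count them.
by_word = Counter(w.strip(",") for n in names for w in n.split())

print(by_state["Texas"], by_word["Texas"])        # 1 2
print(by_state["Virginia"], by_word["Virginia"])  # 1 2
```

Splitting at the last comma rather than the first keeps any comma inside the county part intact.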

rural

categorical feature

Boolean flag indicating whether a county is rural, fully populated across all 3,222 rows. The split is roughly 69/31 in favour of True (2,212 vs 1,010), giving a high entropy ratio of 0.90: imbalanced but far from degenerate. Treatment: Cast to 0/1 and use directly as a binary feature, keeping only one of `rural` and `rural_category`. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 2
top_value: True
top_rate: 0.6865
cardinality: 2
entropy: 0.8971
entropy_ratio: 0.8971
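
The entropy figures can be reproduced from the two counts. The sketch below assumes the profiler uses Shannon entropy in bits and normalises by log2 of the cardinality; that assumption is consistent with the reported values but is not confirmed by the report itself. With two categories the maximum is log2(2) = 1 bit, which is why `entropy` and `entropy_ratio` are equal here.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of a discrete distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

counts = [2212, 1010]  # True / False counts reported for this column
h = entropy_bits(counts)
ratio = h / math.log2(len(counts))  # normalise by the maximum entropy, log2(k)
print(round(h, 3), round(ratio, 3))  # both approximately 0.897
```

A perfectly balanced 50/50 column would score 1.0 and a constant column 0.0, so a value near 0.9 indicates a moderate imbalance.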

rural_category

categorical feature

Binary geographic classifier splitting counties into 'Rural' (2,212) and 'Urban/Suburban' (1,010) across all 3,222 rows with no nulls. The split is roughly 69/31 toward Rural, giving an entropy ratio of 0.897, so both classes are well represented despite the imbalance. Treatment: Encode as a binary indicator (e.g., is_rural) for modelling. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 2
top_value: Rural
top_rate: 0.6865
cardinality: 2
entropy: 0.8971
entropy_ratio: 0.8971
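
The suggested `is_rural` encoding can be sketched as a strict lookup. It assumes the only labels are the two listed above and raises on anything else, so an unexpected value is not silently encoded as 0; `encode_is_rural` is a hypothetical helper name:

```python
def encode_is_rural(category: str) -> int:
    """Map the two documented labels to 1 (Rural) or 0 (Urban/Suburban)."""
    mapping = {"Rural": 1, "Urban/Suburban": 0}
    try:
        return mapping[category]
    except KeyError:
        raise ValueError(f"unexpected rural_category value: {category!r}") from None

labels = ["Rural", "Urban/Suburban", "Rural"]  # hypothetical sample rows
print([encode_is_rural(c) for c in labels])  # [1, 0, 1]
```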