census counties nationwide

source /home/coolhand/html/datavis/data_trove/data/geographic/nationwide/census_counties_nationwide.csv · 3,144 rows · 8 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 3,144 U.S. counties and county equivalents with four demographic and socioeconomic indicators (population, median income, college attainment rate, and poverty rate), keyed by state and county FIPS codes and state name. The most urgent issue is median_income: its minimum is -666,666,666 and its mean is -148,752, while its median is 60,931. The histogram places a single row in the lowest bin, so this is a sentinel for missing data stored as a number, and it must be replaced with null before any analysis that relies on the mean or standard deviation. Population is extremely right-skewed (skew ~13, max ~9.9M vs. median ~25,785), so a log scale is advisable for visualization and modeling. Row counts per state vary widely because states differ in how many counties they have: Texas (254), Georgia (159), and Virginia (133) contribute the most rows. College and poverty rates are the cleanest fields, with moderate right skew and no nulls.

citing: median_income · population · state_name · college_rate · poverty_rate · county_fips

Charts the summary said to look at first

state_name · Counties per state. Texas, Georgia, and Virginia contribute the most rows because they have the most counties.

Top values for state_name (20 unique shown, of 51 total).
value	count	share
Texas	254	8.1%
Georgia	159	5.1%
Virginia	133	4.2%
Kentucky	120	3.8%
Missouri	115	3.7%
Kansas	105	3.3%
Illinois	102	3.2%
North Carolina	100	3.2%
Iowa	99	3.1%
Tennessee	95	3.0%
Nebraska	93	3.0%
Indiana	92	2.9%
Ohio	88	2.8%
Minnesota	87	2.8%
Michigan	83	2.6%
Mississippi	82	2.6%
Oklahoma	77	2.4%
Arkansas	75	2.4%
Wisconsin	72	2.3%
Alabama	67	2.1%

population · Population is extremely right-skewed; consider a log scale to see the bulk of small counties.

Histogram bins for population (median: 25784.5).
bin	count
50 – 2.485e+05	2863
2.485e+05 – 4.969e+05	137
4.969e+05 – 7.453e+05	57
7.453e+05 – 9.937e+05	37
9.937e+05 – 1.242e+06	14
1.242e+06 – 1.491e+06	10
1.491e+06 – 1.739e+06	7
1.739e+06 – 1.987e+06	3
1.987e+06 – 2.236e+06	3
2.236e+06 – 2.484e+06	4
2.484e+06 – 2.733e+06	3
2.733e+06 – 2.981e+06	0
2.981e+06 – 3.229e+06	1
3.229e+06 – 3.478e+06	1
3.478e+06 – 3.726e+06	0
3.726e+06 – 3.975e+06	0
3.975e+06 – 4.223e+06	0
4.223e+06 – 4.472e+06	1
4.472e+06 – 4.72e+06	0
4.72e+06 – 4.968e+06	1
4.968e+06 – 5.217e+06	0
5.217e+06 – 5.465e+06	1
5.465e+06 – 5.714e+06	0
5.714e+06 – 5.962e+06	0
5.962e+06 – 6.21e+06	0
6.21e+06 – 6.459e+06	0
6.459e+06 – 6.707e+06	0
6.707e+06 – 6.956e+06	0
6.956e+06 – 7.204e+06	0
7.204e+06 – 7.453e+06	0
7.453e+06 – 7.701e+06	0
7.701e+06 – 7.949e+06	0
7.949e+06 – 8.198e+06	0
8.198e+06 – 8.446e+06	0
8.446e+06 – 8.695e+06	0
8.695e+06 – 8.943e+06	0
8.943e+06 – 9.191e+06	0
9.191e+06 – 9.44e+06	0
9.44e+06 – 9.688e+06	0
9.688e+06 – 9.937e+06	1

median_income · A single sentinel value of -666666666 drags the mean negative and squeezes all 3,143 valid counties into the last bin; replace it with null before plotting.

Histogram bins for median_income (median: 60931.0).
bin	count
-6.667e+08 – -6.5e+08	1
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.167e+08	0
-6.167e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.666e+08	0
-5.666e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.166e+08	0
-5.166e+08 – -5e+08	0
-5e+08 – -4.833e+08	0
-4.833e+08 – -4.666e+08	0
-4.666e+08 – -4.499e+08	0
-4.499e+08 – -4.333e+08	0
-4.333e+08 – -4.166e+08	0
-4.166e+08 – -3.999e+08	0
-3.999e+08 – -3.833e+08	0
-3.833e+08 – -3.666e+08	0
-3.666e+08 – -3.499e+08	0
-3.499e+08 – -3.332e+08	0
-3.332e+08 – -3.166e+08	0
-3.166e+08 – -2.999e+08	0
-2.999e+08 – -2.832e+08	0
-2.832e+08 – -2.666e+08	0
-2.666e+08 – -2.499e+08	0
-2.499e+08 – -2.332e+08	0
-2.332e+08 – -2.166e+08	0
-2.166e+08 – -1.999e+08	0
-1.999e+08 – -1.832e+08	0
-1.832e+08 – -1.665e+08	0
-1.665e+08 – -1.499e+08	0
-1.499e+08 – -1.332e+08	0
-1.332e+08 – -1.165e+08	0
-1.165e+08 – -9.986e+07	0
-9.986e+07 – -8.318e+07	0
-8.318e+07 – -6.651e+07	0
-6.651e+07 – -4.984e+07	0
-4.984e+07 – -3.317e+07	0
-3.317e+07 – -1.65e+07	0
-1.65e+07 – 1.705e+05	3143

poverty_rate · A cleaner distribution centered near 13% with a moderate right tail of high-poverty counties.

Histogram bins for poverty_rate (median: 12.95).
bin	count
1.603 – 2.941	6
2.941 – 4.278	20
4.278 – 5.616	64
5.616 – 6.953	149
6.953 – 8.291	227
8.291 – 9.628	300
9.628 – 10.97	313
10.97 – 12.3	361
12.3 – 13.64	308
13.64 – 14.98	260
14.98 – 16.32	259
16.32 – 17.65	216
17.65 – 18.99	151
18.99 – 20.33	118
20.33 – 21.66	95
21.66 – 23	94
23 – 24.34	51
24.34 – 25.68	40
25.68 – 27.01	32
27.01 – 28.35	20
28.35 – 29.69	18
29.69 – 31.03	12
31.03 – 32.36	9
32.36 – 33.7	6
33.7 – 35.04	2
35.04 – 36.38	3
36.38 – 37.71	2
37.71 – 39.05	1
39.05 – 40.39	1
40.39 – 41.73	1
41.73 – 43.06	1
43.06 – 44.4	1
44.4 – 45.74	0
45.74 – 47.08	1
47.08 – 48.41	0
48.41 – 49.75	0
49.75 – 51.09	0
51.09 – 52.43	1
52.43 – 53.76	0
53.76 – 55.1	1

college_rate · Right-skewed distribution showing most counties below 20% college attainment with a long tail of educated outliers.

Histogram bins for college_rate (median: 14.60).
bin	count
0 – 1.409	1
1.409 – 2.817	1
2.817 – 4.226	4
4.226 – 5.635	13
5.635 – 7.043	44
7.043 – 8.452	142
8.452 – 9.861	225
9.861 – 11.27	305
11.27 – 12.68	368
12.68 – 14.09	357
14.09 – 15.5	296
15.5 – 16.9	273
16.9 – 18.31	202
18.31 – 19.72	161
19.72 – 21.13	143
21.13 – 22.54	103
22.54 – 23.95	95
23.95 – 25.36	88
25.36 – 26.76	57
26.76 – 28.17	51
28.17 – 29.58	40
29.58 – 30.99	37
30.99 – 32.4	28
32.4 – 33.81	22
33.81 – 35.22	12
35.22 – 36.63	17
36.63 – 38.03	12
38.03 – 39.44	12
39.44 – 40.85	12
40.85 – 42.26	7
42.26 – 43.67	2
43.67 – 45.08	3
45.08 – 46.49	2
46.49 – 47.89	1
47.89 – 49.3	3
49.3 – 50.71	3
50.71 – 52.12	0
52.12 – 53.53	1
53.53 – 54.94	0
54.94 – 56.35	1

Schema

8 columns

Per-column summary.
column	type	nulls	unique	alerts
name	text	0.0%	3,144	near_unique
state_fips	numeric	0.0%	51
county_fips	numeric	0.0%	329	high_skew outliers
state_name	categorical	0.0%	51
median_income	numeric	0.0%	3,021	high_skew
poverty_rate	numeric	0.0%	3,144
college_rate	numeric	0.0%	3,143
population	numeric	0.0%	3,080	high_skew outliers
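
The report does not say how the outliers alert is computed. A common convention, assumed here rather than confirmed by the source, is Tukey's rule: flag values more than 1.5 × IQR beyond the quartiles. Under that assumption the upper fence for poverty_rate would be 16.77 + 1.5 × 7.074 ≈ 27.38. A minimal sketch, with `iqr_outliers` as a hypothetical helper name and made-up sample values:

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. The 1.5 multiplier is an assumption."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical values, not rows from the profiled file.
sample = pd.Series([1, 2, 3, 4, 5, 100])
flags = iqr_outliers(sample)
n_flagged = int(flags.sum())
```

Comparing `flags.sum()` on each real column against the reported n_outliers would show whether this rule matches the profiler.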

name

text identifier near_unique

This is the full name of a US county-state pair: 2999 of 3144 rows contain the word 'county,' and the remaining top tokens are state names (Texas 256, Virginia 189, Georgia 159). Every value is unique (n_unique=3144, duplicate_rate=0.0) with no nulls and a tight length band (min 16, mean 24.2, max 59). It functions as a row identifier rather than a modelling feature. Treatment: Use as the row key for joins; do not feed into models. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 3,144
len_min: 16
len_max: 59
len_mean: 24.16
len_median: 24
len_p95: 30.85
word_mean: 3.224
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,910
readability_flesch_mean: 6.826
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0

state_fips

numeric foreign_key

Numeric column with exactly 51 unique values across 3,144 rows, ranging from 1 to 56 with no nulls. This is the U.S. state FIPS code (50 states plus DC), and 3,144 matches the U.S. county count. The mean (30.26) and median (29) sit near the middle of the code range, and the near-zero skew (-0.08) shows only that rows are spread symmetrically across that range; it does not imply equal county counts per state. Although stored as numeric, the values are categorical identifiers, not measurements. Treatment: Cast to categorical or a zero-padded two-digit string and use as a join key to state-level reference tables; do not treat as a continuous feature. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 51
min: 1
max: 56
mean: 30.26
median: 29
std: 15.15
q1: 18
q3: 45
iqr: 27
skew: -0.08128
kurtosis: -1.099
n_outliers: 0
outlier_rate: 0
zero_rate: 0

county_fips

numeric identifier high_skew outliers

This is the county-level component of a FIPS code stored as an integer, with 3,144 rows and only 329 unique values, so many counties share the same within-state numeric suffix. Values run from 1 to 840 with a median of 79, but the high skew (2.84) and 176 outliers (5.6%) reflect the long tail of larger county codes used in a few states rather than a meaningful numeric distribution. There are no nulls or zeros. Treatment: Treat as a categorical code; zero-pad it to three digits and concatenate it with the two-digit state FIPS to form a unique five-digit county key for joins. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 329
min: 1
max: 840
mean: 103.9
median: 79
std: 107.6
q1: 35
q3: 133.5
iqr: 98.5
skew: 2.841
kurtosis: 11.38
n_outliers: 176
outlier_rate: 0.05598
zero_rate: 0
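
Because both FIPS columns are stored as integers, leading zeros are lost and county_fips alone is not unique. A minimal sketch of building a combined key, assuming pandas and the column names from the schema; `make_geoid` and the three sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample; only the column names come from the profiled schema.
df = pd.DataFrame({
    "state_fips": [1, 48, 51],
    "county_fips": [1, 201, 840],
})


def make_geoid(frame: pd.DataFrame) -> pd.Series:
    """Zero-pad state codes to 2 digits and county codes to 3, then join."""
    state = frame["state_fips"].astype(int).astype(str).str.zfill(2)
    county = frame["county_fips"].astype(int).astype(str).str.zfill(3)
    return state + county


df["geoid"] = make_geoid(df)
```

Keeping the key as a string preserves the leading zeros when the table is written back to CSV.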

state_name

categorical feature

This column holds US state names, with 51 distinct values across 3,144 rows and no nulls — consistent with a county-level dataset covering all states plus DC. Distribution mirrors county counts: Texas leads at 254 (8.08%), followed by Georgia (159) and Virginia (133), and entropy ratio of 0.93 indicates a fairly even spread across states. No anomalies flagged. Treatment: Use as a categorical grouping key or one-hot/target-encode for modelling. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 51
top_value: Texas
top_rate: 0.08079
cardinality: 51
entropy: 5.277
entropy_ratio: 0.9304
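
The two entropy figures are consistent with Shannon entropy in bits divided by its maximum for 51 categories: 5.277 / log2(51) ≈ 0.930. The sketch below shows that calculation under the assumption that the profiler uses base 2; `entropy_ratio` here is an illustrative function, not the profiler's code.

```python
import math
from collections import Counter


def entropy_ratio(values) -> float:
    """Shannon entropy (bits) of the value counts, divided by log2 of the
    number of distinct values. Returns 0.0 when there is only one category."""
    counts = Counter(values)
    if len(counts) < 2:
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(len(counts))


# Check the reported figures against each other.
reported_ratio = 5.277 / math.log2(51)

# Hypothetical toy inputs: a perfectly even split and a lopsided one.
even = entropy_ratio(["a", "b", "c", "d"])
lopsided = entropy_ratio(["a", "a", "a", "b"])
```

A ratio of 1.0 means every category has the same count; lower values mean a few categories hold more of the rows.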

median_income

numeric feature high_skew

This column appears to be county-level median household income in dollars, with a median of 60,931 and an IQR spanning 52,544.5 to 70,605.25. The minimum of -666,666,666 is a sentinel value stored as data; it drags the mean to -148,752.33 and produces a skew of -56.04 and kurtosis of 3,138.99, so the high_skew alert reflects the sentinel rather than the income distribution. Aside from that contamination, 3,021 unique values across 3,144 rows and 135 outliers (4.29%) suggest an otherwise plausible distribution with a maximum of 170,463. Treatment: Replace the -666,666,666 sentinel with null, recompute the summary statistics, then consider log or robust scaling before modelling. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 3,021
min: -6.667e+08
max: 170,463
mean: -1.488e+05
median: 60,931
std: 1.189e+07
q1: 5.254e+04
q3: 7.061e+04
iqr: 1.806e+04
skew: -56.04
kurtosis: 3139
n_outliers: 135
outlier_rate: 0.04294
zero_rate: 0
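
One way to apply the sentinel treatment, sketched with pandas. The sentinel constant is the minimum reported above; the sample values are hypothetical and are not rows from the file.

```python
import pandas as pd

SENTINEL = -666_666_666  # minimum reported for median_income

# Hypothetical sample values for illustration only.
income = pd.Series([52_000, 61_000, SENTINEL, 70_500], name="median_income")

# Cast to float so NaN is representable, then blank out the sentinel rows.
clean = income.astype("float64").mask(income == SENTINEL)

n_sentinel = int((income == SENTINEL).sum())
mean_before = income.mean()  # dominated by the sentinel
mean_after = clean.mean()    # NaN is skipped by default
```

Counting the sentinel rows first makes it easy to confirm how many counties lose their income value.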

poverty_rate

numeric feature

Numeric poverty_rate spanning 1.60 to 55.10 with mean 13.82 and median 12.95, suggesting a percentage-style measure across 3144 rows (no nulls, no zeros). Distribution is right-skewed (skew 1.15, kurtosis 2.90) with 74 high-end outliers (2.35%) stretching the tail well past the Q3 of 16.77. Every one of the 3144 values is unique, consistent with a per-geography rate (e.g., one row per US county). Treatment: Consider a log or winsorising transform before regression to tame the right tail. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 3,144
min: 1.603
max: 55.1
mean: 13.82
median: 12.95
std: 5.702
q1: 9.699
q3: 16.77
iqr: 7.074
skew: 1.15
kurtosis: 2.901
n_outliers: 74
outlier_rate: 0.02354
zero_rate: 0
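
The winsorising option can be sketched as clipping at chosen percentiles. The 1st and 99th percentile defaults below are an assumption, since the report does not name cut-offs, and `winsorize` is a hypothetical helper built on pandas `quantile` and `clip`.

```python
import pandas as pd


def winsorize(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values below the `lower` quantile and above the `upper` quantile.
    Row count is unchanged; only the extreme values are pulled in."""
    lo = s.quantile(lower)
    hi = s.quantile(upper)
    return s.clip(lower=lo, upper=hi)


# Hypothetical sample loosely echoing the reported min, quartiles, and max.
rates = pd.Series([1.6, 9.7, 12.95, 16.8, 55.1], name="poverty_rate")
capped = winsorize(rates, lower=0.05, upper=0.95)
```

Unlike dropping outliers, this keeps every county in the table, which matters if the rows are later joined back by key.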

college_rate

numeric feature

Likely a percentage of college-educated residents per row (probably a US county-level rate given n=3144). Values range from 0.0 to 56.35 with mean 16.26 and median 14.60, right-skewed (skew 1.42) with 134 outliers (4.26%) on the high tail. Near-unique (3143/3144) and no nulls, with only a single zero observation. Treatment: Use as-is or apply a mild log1p or sqrt transform to dampen the right skew before regression; a plain log is undefined for the one zero value. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 3,143
min: 0
max: 56.35
mean: 16.26
median: 14.6
std: 7.005
q1: 11.48
q3: 19.37
iqr: 7.892
skew: 1.422
kurtosis: 2.751
n_outliers: 134
outlier_rate: 0.04262
zero_rate: 0.0003181
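
Since the column contains one zero, a transform that is defined at zero is the safer choice. A minimal sketch using NumPy's `log1p` and `sqrt` on hypothetical values that echo the reported min, quartiles, and max:

```python
import numpy as np
import pandas as pd

# Hypothetical sample for illustration; not rows from the file.
college = pd.Series([0.0, 11.48, 14.6, 19.37, 56.35], name="college_rate")

log_rate = np.log1p(college)  # log(1 + x), finite at x = 0
sqrt_rate = np.sqrt(college)  # milder compression of the upper tail
```

Both transforms preserve the ordering of counties, so rank-based comparisons are unaffected.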

population

numeric feature high_skew outliers

This column reports a population count for 3,144 rows with no nulls and 3,080 unique values, consistent with one row per US county. The distribution is extremely right-skewed (skew 13.17, kurtosis 289.76): the median is 25,784.5 yet the mean is 105,310.94 and the max reaches 9,936,690, with 440 rows (14.0%) flagged as outliers. The std of 333,792 dwarfs the IQR of 57,244, confirming a heavy upper tail driven by a few very large jurisdictions. Treatment: log-transform before regression or modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 3,144
nulls: 0 (0.0%)
unique: 3,080
min: 50
max: 9.937e+06
mean: 1.053e+05
median: 2.578e+04
std: 3.338e+05
q1: 1.084e+04
q3: 6.808e+04
iqr: 57,244
skew: 13.17
kurtosis: 289.8
n_outliers: 440
outlier_rate: 0.1399
zero_rate: 0
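
The log transform can be sketched as follows. Because the reported minimum is 50 and zero_rate is 0, a plain base-10 log is defined for every row; the sample values are hypothetical and only span the reported range.

```python
import numpy as np
import pandas as pd

# Hypothetical values spanning the reported range (min 50, max 9,936,690).
population = pd.Series([50, 25_785, 105_311, 9_936_690], name="population")

# Base 10 keeps the axis readable: 2 = hundreds, 4 = ten thousands, and so on.
log_pop = np.log10(population)
```

On this scale the reported range of 50 to about 9.9 million compresses to roughly 1.7 to 7.0.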