nationwide census counties nationwide
Reading
This dataset covers 3,144 U.S. counties with demographic and socioeconomic indicators (population, median income, college attainment rate, and poverty rate), identified by state and county FIPS codes and state name. The most urgent issue is median_income: its minimum is -666,666,666, a sentinel for missing data stored as a number. The histogram shows a single row at that value, yet it is enough to drag the mean to -148,752 against a median of 60,931, so the sentinel must be replaced with null before any analysis. Population is also extremely right-skewed (skew ~13, max ~9.9M vs median ~25,785), so log-scaling is advisable for visualization and modeling. Row counts per state are uneven because states differ in how many counties they have: Texas (254 counties), Georgia (159), and Virginia (133) contribute the most rows. College and poverty rates are the cleanest fields: both have no nulls and only a moderate right skew (1.42 and 1.15).
citing: median_income · population · state_name · college_rate · poverty_rate · county_fips
Charts the summary said to look at first
state_name: county count per state (top 20 of 51 values)
| value | count | share |
|---|---|---|
| Texas | 254 | 8.1% |
| Georgia | 159 | 5.1% |
| Virginia | 133 | 4.2% |
| Kentucky | 120 | 3.8% |
| Missouri | 115 | 3.7% |
| Kansas | 105 | 3.3% |
| Illinois | 102 | 3.2% |
| North Carolina | 100 | 3.2% |
| Iowa | 99 | 3.1% |
| Tennessee | 95 | 3.0% |
| Nebraska | 93 | 3.0% |
| Indiana | 92 | 2.9% |
| Ohio | 88 | 2.8% |
| Minnesota | 87 | 2.8% |
| Michigan | 83 | 2.6% |
| Mississippi | 82 | 2.6% |
| Oklahoma | 77 | 2.4% |
| Arkansas | 75 | 2.4% |
| Wisconsin | 72 | 2.3% |
| Alabama | 67 | 2.1% |
population: histogram (40 bins, 50 to 9.937e+06)
| bin | count |
|---|---|
| 50 – 2.485e+05 | 2863 |
| 2.485e+05 – 4.969e+05 | 137 |
| 4.969e+05 – 7.453e+05 | 57 |
| 7.453e+05 – 9.937e+05 | 37 |
| 9.937e+05 – 1.242e+06 | 14 |
| 1.242e+06 – 1.491e+06 | 10 |
| 1.491e+06 – 1.739e+06 | 7 |
| 1.739e+06 – 1.987e+06 | 3 |
| 1.987e+06 – 2.236e+06 | 3 |
| 2.236e+06 – 2.484e+06 | 4 |
| 2.484e+06 – 2.733e+06 | 3 |
| 2.733e+06 – 2.981e+06 | 0 |
| 2.981e+06 – 3.229e+06 | 1 |
| 3.229e+06 – 3.478e+06 | 1 |
| 3.478e+06 – 3.726e+06 | 0 |
| 3.726e+06 – 3.975e+06 | 0 |
| 3.975e+06 – 4.223e+06 | 0 |
| 4.223e+06 – 4.472e+06 | 1 |
| 4.472e+06 – 4.72e+06 | 0 |
| 4.72e+06 – 4.968e+06 | 1 |
| 4.968e+06 – 5.217e+06 | 0 |
| 5.217e+06 – 5.465e+06 | 1 |
| 5.465e+06 – 5.714e+06 | 0 |
| 5.714e+06 – 5.962e+06 | 0 |
| 5.962e+06 – 6.21e+06 | 0 |
| 6.21e+06 – 6.459e+06 | 0 |
| 6.459e+06 – 6.707e+06 | 0 |
| 6.707e+06 – 6.956e+06 | 0 |
| 6.956e+06 – 7.204e+06 | 0 |
| 7.204e+06 – 7.453e+06 | 0 |
| 7.453e+06 – 7.701e+06 | 0 |
| 7.701e+06 – 7.949e+06 | 0 |
| 7.949e+06 – 8.198e+06 | 0 |
| 8.198e+06 – 8.446e+06 | 0 |
| 8.446e+06 – 8.695e+06 | 0 |
| 8.695e+06 – 8.943e+06 | 0 |
| 8.943e+06 – 9.191e+06 | 0 |
| 9.191e+06 – 9.44e+06 | 0 |
| 9.44e+06 – 9.688e+06 | 0 |
| 9.688e+06 – 9.937e+06 | 1 |
median_income: histogram (40 bins, -6.667e+08 to 1.705e+05)
| bin | count |
|---|---|
| -6.667e+08 – -6.5e+08 | 1 |
| -6.5e+08 – -6.333e+08 | 0 |
| -6.333e+08 – -6.167e+08 | 0 |
| -6.167e+08 – -6e+08 | 0 |
| -6e+08 – -5.833e+08 | 0 |
| -5.833e+08 – -5.666e+08 | 0 |
| -5.666e+08 – -5.5e+08 | 0 |
| -5.5e+08 – -5.333e+08 | 0 |
| -5.333e+08 – -5.166e+08 | 0 |
| -5.166e+08 – -5e+08 | 0 |
| -5e+08 – -4.833e+08 | 0 |
| -4.833e+08 – -4.666e+08 | 0 |
| -4.666e+08 – -4.499e+08 | 0 |
| -4.499e+08 – -4.333e+08 | 0 |
| -4.333e+08 – -4.166e+08 | 0 |
| -4.166e+08 – -3.999e+08 | 0 |
| -3.999e+08 – -3.833e+08 | 0 |
| -3.833e+08 – -3.666e+08 | 0 |
| -3.666e+08 – -3.499e+08 | 0 |
| -3.499e+08 – -3.332e+08 | 0 |
| -3.332e+08 – -3.166e+08 | 0 |
| -3.166e+08 – -2.999e+08 | 0 |
| -2.999e+08 – -2.832e+08 | 0 |
| -2.832e+08 – -2.666e+08 | 0 |
| -2.666e+08 – -2.499e+08 | 0 |
| -2.499e+08 – -2.332e+08 | 0 |
| -2.332e+08 – -2.166e+08 | 0 |
| -2.166e+08 – -1.999e+08 | 0 |
| -1.999e+08 – -1.832e+08 | 0 |
| -1.832e+08 – -1.665e+08 | 0 |
| -1.665e+08 – -1.499e+08 | 0 |
| -1.499e+08 – -1.332e+08 | 0 |
| -1.332e+08 – -1.165e+08 | 0 |
| -1.165e+08 – -9.986e+07 | 0 |
| -9.986e+07 – -8.318e+07 | 0 |
| -8.318e+07 – -6.651e+07 | 0 |
| -6.651e+07 – -4.984e+07 | 0 |
| -4.984e+07 – -3.317e+07 | 0 |
| -3.317e+07 – -1.65e+07 | 0 |
| -1.65e+07 – 1.705e+05 | 3143 |
poverty_rate: histogram (40 bins, 1.603 to 55.1)
| bin | count |
|---|---|
| 1.603 – 2.941 | 6 |
| 2.941 – 4.278 | 20 |
| 4.278 – 5.616 | 64 |
| 5.616 – 6.953 | 149 |
| 6.953 – 8.291 | 227 |
| 8.291 – 9.628 | 300 |
| 9.628 – 10.97 | 313 |
| 10.97 – 12.3 | 361 |
| 12.3 – 13.64 | 308 |
| 13.64 – 14.98 | 260 |
| 14.98 – 16.32 | 259 |
| 16.32 – 17.65 | 216 |
| 17.65 – 18.99 | 151 |
| 18.99 – 20.33 | 118 |
| 20.33 – 21.66 | 95 |
| 21.66 – 23 | 94 |
| 23 – 24.34 | 51 |
| 24.34 – 25.68 | 40 |
| 25.68 – 27.01 | 32 |
| 27.01 – 28.35 | 20 |
| 28.35 – 29.69 | 18 |
| 29.69 – 31.03 | 12 |
| 31.03 – 32.36 | 9 |
| 32.36 – 33.7 | 6 |
| 33.7 – 35.04 | 2 |
| 35.04 – 36.38 | 3 |
| 36.38 – 37.71 | 2 |
| 37.71 – 39.05 | 1 |
| 39.05 – 40.39 | 1 |
| 40.39 – 41.73 | 1 |
| 41.73 – 43.06 | 1 |
| 43.06 – 44.4 | 1 |
| 44.4 – 45.74 | 0 |
| 45.74 – 47.08 | 1 |
| 47.08 – 48.41 | 0 |
| 48.41 – 49.75 | 0 |
| 49.75 – 51.09 | 0 |
| 51.09 – 52.43 | 1 |
| 52.43 – 53.76 | 0 |
| 53.76 – 55.1 | 1 |
college_rate: histogram (40 bins, 0 to 56.35)
| bin | count |
|---|---|
| 0 – 1.409 | 1 |
| 1.409 – 2.817 | 1 |
| 2.817 – 4.226 | 4 |
| 4.226 – 5.635 | 13 |
| 5.635 – 7.043 | 44 |
| 7.043 – 8.452 | 142 |
| 8.452 – 9.861 | 225 |
| 9.861 – 11.27 | 305 |
| 11.27 – 12.68 | 368 |
| 12.68 – 14.09 | 357 |
| 14.09 – 15.5 | 296 |
| 15.5 – 16.9 | 273 |
| 16.9 – 18.31 | 202 |
| 18.31 – 19.72 | 161 |
| 19.72 – 21.13 | 143 |
| 21.13 – 22.54 | 103 |
| 22.54 – 23.95 | 95 |
| 23.95 – 25.36 | 88 |
| 25.36 – 26.76 | 57 |
| 26.76 – 28.17 | 51 |
| 28.17 – 29.58 | 40 |
| 29.58 – 30.99 | 37 |
| 30.99 – 32.4 | 28 |
| 32.4 – 33.81 | 22 |
| 33.81 – 35.22 | 12 |
| 35.22 – 36.63 | 17 |
| 36.63 – 38.03 | 12 |
| 38.03 – 39.44 | 12 |
| 39.44 – 40.85 | 12 |
| 40.85 – 42.26 | 7 |
| 42.26 – 43.67 | 2 |
| 43.67 – 45.08 | 3 |
| 45.08 – 46.49 | 2 |
| 46.49 – 47.89 | 1 |
| 47.89 – 49.3 | 3 |
| 49.3 – 50.71 | 3 |
| 50.71 – 52.12 | 0 |
| 52.12 – 53.53 | 1 |
| 53.53 – 54.94 | 0 |
| 54.94 – 56.35 | 1 |
Schema
8 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| name | text | 0.0% | 3,144 | near_unique |
| state_fips | numeric | 0.0% | 51 | |
| county_fips | numeric | 0.0% | 329 | high_skew, outliers |
| state_name | categorical | 0.0% | 51 | |
| median_income | numeric | 0.0% | 3,021 | high_skew |
| poverty_rate | numeric | 0.0% | 3,144 | |
| college_rate | numeric | 0.0% | 3,143 | |
| population | numeric | 0.0% | 3,080 | high_skew, outliers |
name
text · identifier · near_unique
This is the full name of a US county-state pair: 2,999 of 3,144 rows contain the word 'county', and the remaining top tokens are state names (Texas 256, Virginia 189, Georgia 159). Every value is unique (n_unique=3144, duplicate_rate=0.0) with no nulls and a tight length band (min 16, mean 24.2, max 59). It functions as a row identifier rather than a modelling feature. Treatment: Use as the row key for joins; do not feed into models.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- len_min: 16
- len_max: 59
- len_mean: 24.16
- len_median: 24
- len_p95: 30.85
- word_mean: 3.224
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,910
- readability_flesch_mean: 6.826
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
state_fips
numeric · foreign_key
Numeric column with exactly 51 unique values across 3,144 rows, ranging 1 to 56 with no nulls. This is the U.S. state FIPS code (50 states plus DC), and 3,144 matches the U.S. county count. The mean (30.26), median (29), and skew (-0.08) are arithmetic on code numbers and say nothing about how rows are spread across states; the state_name counts show that spread directly. Despite being stored as numeric, the values are categorical identifiers, not measurements. Treatment: Cast to categorical or a two-digit zero-padded string and use as a join key to state-level reference tables; do not treat as a continuous feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 51
- min: 1
- max: 56
- mean: 30.26
- median: 29
- std: 15.15
- q1: 18
- q3: 45
- iqr: 27
- skew: -0.08128
- kurtosis: -1.099
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
county_fips
numeric · identifier · high_skew · outliers
This is the county-level component of a FIPS code stored as an integer, with 3,144 rows and only 329 unique values, so many counties share the same within-state numeric suffix and the column is not unique on its own. Values run from 1 to 840 with a median of 79, but the high skew (2.84) and 176 outliers (5.6%) reflect the long tail of larger county codes used in a few states rather than a true distribution. There are no nulls or zeros. Treatment: Treat as a categorical code; zero-pad to three digits and concatenate with the two-digit state_fips to form a unique county key for joins.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 329
- min: 1
- max: 840
- mean: 103.9
- median: 79
- std: 107.6
- q1: 35
- q3: 133.5
- iqr: 98.5
- skew: 2.841
- kurtosis: 11.38
- n_outliers: 176
- outlier_rate: 0.05598
- zero_rate: 0
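The county-key treatment above can be sketched in pandas as follows. This is a minimal illustration, assuming the data is loaded in a DataFrame with the `state_fips` and `county_fips` columns from the schema; the helper name `build_fips_key`, the output column `fips`, and the three demo rows are hypothetical and only chosen to sit inside the reported code ranges.

```python
import pandas as pd


def build_fips_key(df: pd.DataFrame) -> pd.Series:
    """Return a 5-character county key: 2-digit state code + 3-digit county code."""
    # Cast through int first so values such as 1.0 do not become "1.0".
    state = df["state_fips"].astype(int).astype(str).str.zfill(2)
    county = df["county_fips"].astype(int).astype(str).str.zfill(3)
    return state + county


# Illustrative rows only (not taken from the dataset).
demo = pd.DataFrame({"state_fips": [1, 48, 51], "county_fips": [1, 201, 840]})
demo["fips"] = build_fips_key(demo)
print(demo)
```

Keeping the key as a string preserves the leading zeros that an integer column would drop.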
state_name
categorical · feature
This column holds US state names, with 51 distinct values across 3,144 rows and no nulls, consistent with a county-level dataset covering all states plus DC. Distribution mirrors county counts: Texas leads at 254 (8.08%), followed by Georgia (159) and Virginia (133), and an entropy ratio of 0.93 indicates a fairly even spread across states. No anomalies flagged. Treatment: Use as a categorical grouping key or one-hot/target-encode for modelling.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 51
- top_value: Texas
- top_rate: 0.08079
- cardinality: 51
- entropy: 5.277
- entropy_ratio: 0.9304
median_income
numeric · feature · high_skew
This column appears to be county-level median household income in dollars, with a median of 60,931 and an IQR spanning 52,544.5 to 70,605.25. The minimum of -666,666,666 is a sentinel value masquerading as data; it drags the mean to -148,752.33 and produces a skew of -56.04 and kurtosis of 3,138.99, so the high_skew alert is an artifact of the sentinel rather than a property of the incomes. Aside from that contamination, 3,021 unique values across 3,144 rows and 135 outliers (4.29%) suggest an otherwise plausible distribution with a maximum of 170,463. Treatment: Replace the -666666666 sentinel with null, recompute the summary statistics, then consider log or robust scaling before modelling.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,021
- min: -6.667e+08
- max: 170,463
- mean: -1.488e+05
- median: 60,931
- std: 1.189e+07
- q1: 5.254e+04
- q3: 7.061e+04
- iqr: 1.806e+04
- skew: -56.04
- kurtosis: 3139
- n_outliers: 135
- outlier_rate: 0.04294
- zero_rate: 0
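A minimal sketch of the sentinel cleanup, assuming the column is available as a pandas Series and that the sentinel is exactly -666666666 as quoted in the description. The helper name `clean_median_income` and the three demo values are hypothetical; the demo reuses the reported median and maximum purely for illustration.

```python
import numpy as np
import pandas as pd

# Sentinel as quoted in the column description (assumed to be the exact stored value).
SENTINEL = -666_666_666


def clean_median_income(s: pd.Series) -> pd.Series:
    """Replace the sentinel with NaN so it no longer distorts mean, std, or skew."""
    # Cast to float first because an integer column cannot hold NaN.
    return s.astype("float64").mask(s == SENTINEL, np.nan)


# Illustrative values only.
demo = pd.Series([60_931, SENTINEL, 170_463], name="median_income")
cleaned = clean_median_income(demo)
print(cleaned.describe())
```

If other negative codes turn up in the raw file, masking every value below zero (`s < 0`) would be a broader alternative, since a median income cannot be negative.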
poverty_rate
numeric · feature
Numeric poverty_rate spanning 1.60 to 55.10 with mean 13.82 and median 12.95, suggesting a percentage-style measure across 3,144 rows (no nulls, no zeros). Distribution is right-skewed (skew 1.15, kurtosis 2.90) with 74 high-end outliers (2.35%) stretching the tail well past the Q3 of 16.77. Every one of the 3,144 values is unique, consistent with a per-geography rate (e.g., one row per US county). Treatment: Consider a log or winsorising transform before regression to tame the right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 1.603
- max: 55.1
- mean: 13.82
- median: 12.95
- std: 5.702
- q1: 9.699
- q3: 16.77
- iqr: 7.074
- skew: 1.15
- kurtosis: 2.901
- n_outliers: 74
- outlier_rate: 0.02354
- zero_rate: 0
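One way to winsorise the upper tail is to cap values at a chosen quantile. The sketch below assumes a pandas Series; the helper name `winsorize_upper` and the default 99th-percentile cap are assumptions, not something the report prescribes, and the five demo values are illustrative.

```python
import pandas as pd


def winsorize_upper(s: pd.Series, upper_q: float = 0.99) -> pd.Series:
    """Cap values above the given quantile at that quantile (upper tail only)."""
    cap = s.quantile(upper_q)
    return s.clip(upper=cap)


# Illustrative values only; a low quantile is used so the cap is visible on five rows.
demo = pd.Series([1.6, 9.7, 12.95, 16.77, 55.1], name="poverty_rate")
capped = winsorize_upper(demo, upper_q=0.75)
print(capped.tolist())
```

Unlike a log transform, capping keeps the variable in percentage units, which can make regression coefficients easier to read.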
college_rate
numeric · feature
Likely a percentage of college-educated residents per row (probably a US county-level rate given n=3144). Values range from 0.0 to 56.35 with mean 16.26 and median 14.60, right-skewed (skew 1.42) with 134 outliers (4.26%) on the high tail. Near-unique (3,143/3,144) and no nulls, with only a single zero observation. Treatment: Use as-is or apply a mild sqrt or log1p transform to dampen the right skew before regression; a plain log is undefined for the one zero value.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,143
- min: 0
- max: 56.35
- mean: 16.26
- median: 14.6
- std: 7.005
- q1: 11.48
- q3: 19.37
- iqr: 7.892
- skew: 1.422
- kurtosis: 2.751
- n_outliers: 134
- outlier_rate: 0.04262
- zero_rate: 0.0003181
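Because this column contains a zero, a zero-safe transform is needed. The following sketch compares `np.log1p` and `np.sqrt` on three illustrative values (the reported minimum, median, and maximum); the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative values only: reported min, median, and max of college_rate.
demo = pd.Series([0.0, 14.6, 56.35], name="college_rate")

# log1p computes log(1 + x), so the zero row maps to 0 instead of -inf.
log1p_rate = np.log1p(demo)

# sqrt is a milder alternative that is also defined at zero.
sqrt_rate = np.sqrt(demo)

print(pd.DataFrame({"raw": demo, "log1p": log1p_rate, "sqrt": sqrt_rate}))
```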
population
numeric · feature · high_skew · outliers
This column reports a population count for 3,144 rows with no nulls and 3,080 unique values, consistent with one row per US county. The distribution is extremely right-skewed (skew 13.17, kurtosis 289.76): the median is 25,784.5 yet the mean is 105,310.94 and the max reaches 9,936,690, with 440 rows (14.0%) flagged as outliers. The std of 333,792 dwarfs the IQR of 57,244, confirming a heavy upper tail driven by a few very large jurisdictions. Treatment: log-transform before regression or modelling to tame the heavy right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,080
- min: 50
- max: 9.937e+06
- mean: 1.053e+05
- median: 2.578e+04
- std: 3.338e+05
- q1: 1.084e+04
- q3: 6.808e+04
- iqr: 57,244
- skew: 13.17
- kurtosis: 289.8
- n_outliers: 440
- outlier_rate: 0.1399
- zero_rate: 0
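A minimal sketch of the log transform, assuming a pandas Series of strictly positive counts (the reported minimum is 50 and zero_rate is 0). The helper name `log10_population` and the choice of base 10 are assumptions; base 10 is used here only because it reads as orders of magnitude.

```python
import numpy as np
import pandas as pd


def log10_population(s: pd.Series) -> pd.Series:
    """Return log10 of a strictly positive count column."""
    # Guard against zeros or negatives, for which the logarithm is undefined.
    if (s <= 0).any():
        raise ValueError("population must be strictly positive for a log transform")
    return np.log10(s)


# Illustrative values only: reported min, rounded median, and max.
demo = pd.Series([50, 25_785, 9_936_690], name="population")
logged = log10_population(demo)
print(logged.round(3).tolist())
```

On the log scale the reported range of 50 to 9,936,690 compresses to roughly 1.7 to 7.0, which is why histograms and linear models tend to behave better after the transform.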