acs 2022 county
Reading
This dataset covers 3,144 U.S. counties from the 2022 American Community Survey. Each row is identified by FIPS code, state code, county code, and name, and carries three Census variables: total population (B01003_001E), civilian veterans aged 18 and over (B21001_002E), and population aged 16 and over in the labor force (B23025_002E). All three measures are extremely right-skewed (skew of 13.2, 8.0, and 13.1 respectively). Total population, for example, ranges from 50 to 9.94 million while the median is just 25,784. Roughly 13-14% of counties (408 to 449 rows) are flagged as outliers on each measure, because large metro counties stretch the upper tail far beyond the typical county. Start by looking at the population and labor-force distributions on a log scale, and use the state field (51 unique values) to see how counties cluster geographically.
citing: row_count · column_count · columns.B01003_001E.stats · columns.B21001_002E.stats · columns.B23025_002E.stats · columns.state.n_unique · columns.NAME.top_words
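As a rough sketch of the log-scale view suggested above, the snippet below bins values by order of magnitude using only the standard library. The function name `decade_histogram` is hypothetical, and the five sample values are just the (rounded) min, quartiles, and max reported for B01003_001E, not actual rows.

```python
import math
from collections import Counter


def decade_histogram(values):
    """Count positive values per power-of-ten bin, e.g. 10^4 <= v < 10^5."""
    counts = Counter()
    for v in values:
        if v <= 0:
            continue  # log10 is undefined for zero or negative counts
        counts[math.floor(math.log10(v))] += 1
    return dict(sorted(counts.items()))


# Illustrative sample only: rounded min, q1, median, q3, max of total population.
sample = [50, 10_840, 25_784, 68_080, 9_936_690]
for exponent, count in decade_histogram(sample).items():
    print(f"1e{exponent} to 1e{exponent + 1}: {count}")
```

On the real column, decade bins spread the counties across several bars instead of piling about 91% of them into the first equal-width bin.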
Charts the summary said to look at first
Histogram of B01003_001E (total population), 40 equal-width bins
| bin | count |
|---|---|
| 50 – 2.485e+05 | 2863 |
| 2.485e+05 – 4.969e+05 | 137 |
| 4.969e+05 – 7.453e+05 | 57 |
| 7.453e+05 – 9.937e+05 | 37 |
| 9.937e+05 – 1.242e+06 | 14 |
| 1.242e+06 – 1.491e+06 | 10 |
| 1.491e+06 – 1.739e+06 | 7 |
| 1.739e+06 – 1.987e+06 | 3 |
| 1.987e+06 – 2.236e+06 | 3 |
| 2.236e+06 – 2.484e+06 | 4 |
| 2.484e+06 – 2.733e+06 | 3 |
| 2.733e+06 – 2.981e+06 | 0 |
| 2.981e+06 – 3.229e+06 | 1 |
| 3.229e+06 – 3.478e+06 | 1 |
| 3.478e+06 – 3.726e+06 | 0 |
| 3.726e+06 – 3.975e+06 | 0 |
| 3.975e+06 – 4.223e+06 | 0 |
| 4.223e+06 – 4.472e+06 | 1 |
| 4.472e+06 – 4.72e+06 | 0 |
| 4.72e+06 – 4.968e+06 | 1 |
| 4.968e+06 – 5.217e+06 | 0 |
| 5.217e+06 – 5.465e+06 | 1 |
| 5.465e+06 – 5.714e+06 | 0 |
| 5.714e+06 – 5.962e+06 | 0 |
| 5.962e+06 – 6.21e+06 | 0 |
| 6.21e+06 – 6.459e+06 | 0 |
| 6.459e+06 – 6.707e+06 | 0 |
| 6.707e+06 – 6.956e+06 | 0 |
| 6.956e+06 – 7.204e+06 | 0 |
| 7.204e+06 – 7.453e+06 | 0 |
| 7.453e+06 – 7.701e+06 | 0 |
| 7.701e+06 – 7.949e+06 | 0 |
| 7.949e+06 – 8.198e+06 | 0 |
| 8.198e+06 – 8.446e+06 | 0 |
| 8.446e+06 – 8.695e+06 | 0 |
| 8.695e+06 – 8.943e+06 | 0 |
| 8.943e+06 – 9.191e+06 | 0 |
| 9.191e+06 – 9.44e+06 | 0 |
| 9.44e+06 – 9.688e+06 | 0 |
| 9.688e+06 – 9.937e+06 | 1 |
Histogram of B21001_002E (veterans), 40 equal-width bins
| bin | count |
|---|---|
| 0 – 6104 | 2534 |
| 6104 – 1.221e+04 | 271 |
| 1.221e+04 – 1.831e+04 | 116 |
| 1.831e+04 – 2.442e+04 | 68 |
| 2.442e+04 – 3.052e+04 | 49 |
| 3.052e+04 – 3.662e+04 | 30 |
| 3.662e+04 – 4.273e+04 | 18 |
| 4.273e+04 – 4.883e+04 | 20 |
| 4.883e+04 – 5.494e+04 | 7 |
| 5.494e+04 – 6.104e+04 | 3 |
| 6.104e+04 – 6.714e+04 | 4 |
| 6.714e+04 – 7.325e+04 | 6 |
| 7.325e+04 – 7.935e+04 | 0 |
| 7.935e+04 – 8.546e+04 | 6 |
| 8.546e+04 – 9.156e+04 | 2 |
| 9.156e+04 – 9.766e+04 | 1 |
| 9.766e+04 – 1.038e+05 | 0 |
| 1.038e+05 – 1.099e+05 | 1 |
| 1.099e+05 – 1.16e+05 | 1 |
| 1.16e+05 – 1.221e+05 | 0 |
| 1.221e+05 – 1.282e+05 | 0 |
| 1.282e+05 – 1.343e+05 | 0 |
| 1.343e+05 – 1.404e+05 | 1 |
| 1.404e+05 – 1.465e+05 | 2 |
| 1.465e+05 – 1.526e+05 | 0 |
| 1.526e+05 – 1.587e+05 | 1 |
| 1.587e+05 – 1.648e+05 | 0 |
| 1.648e+05 – 1.709e+05 | 0 |
| 1.709e+05 – 1.77e+05 | 0 |
| 1.77e+05 – 1.831e+05 | 0 |
| 1.831e+05 – 1.892e+05 | 0 |
| 1.892e+05 – 1.953e+05 | 1 |
| 1.953e+05 – 2.014e+05 | 0 |
| 2.014e+05 – 2.075e+05 | 0 |
| 2.075e+05 – 2.136e+05 | 0 |
| 2.136e+05 – 2.197e+05 | 0 |
| 2.197e+05 – 2.258e+05 | 0 |
| 2.258e+05 – 2.32e+05 | 1 |
| 2.32e+05 – 2.381e+05 | 0 |
| 2.381e+05 – 2.442e+05 | 1 |
Histogram of B23025_002E (in labor force), 40 equal-width bins
| bin | count |
|---|---|
| 36 – 1.311e+05 | 2867 |
| 1.311e+05 – 2.621e+05 | 135 |
| 2.621e+05 – 3.931e+05 | 52 |
| 3.931e+05 – 5.241e+05 | 41 |
| 5.241e+05 – 6.551e+05 | 14 |
| 6.551e+05 – 7.862e+05 | 10 |
| 7.862e+05 – 9.172e+05 | 5 |
| 9.172e+05 – 1.048e+06 | 5 |
| 1.048e+06 – 1.179e+06 | 4 |
| 1.179e+06 – 1.31e+06 | 2 |
| 1.31e+06 – 1.441e+06 | 3 |
| 1.441e+06 – 1.572e+06 | 0 |
| 1.572e+06 – 1.703e+06 | 1 |
| 1.703e+06 – 1.834e+06 | 1 |
| 1.834e+06 – 1.965e+06 | 0 |
| 1.965e+06 – 2.096e+06 | 0 |
| 2.096e+06 – 2.227e+06 | 0 |
| 2.227e+06 – 2.358e+06 | 1 |
| 2.358e+06 – 2.489e+06 | 1 |
| 2.489e+06 – 2.62e+06 | 0 |
| 2.62e+06 – 2.751e+06 | 0 |
| 2.751e+06 – 2.882e+06 | 1 |
| 2.882e+06 – 3.013e+06 | 0 |
| 3.013e+06 – 3.145e+06 | 0 |
| 3.145e+06 – 3.276e+06 | 0 |
| 3.276e+06 – 3.407e+06 | 0 |
| 3.407e+06 – 3.538e+06 | 0 |
| 3.538e+06 – 3.669e+06 | 0 |
| 3.669e+06 – 3.8e+06 | 0 |
| 3.8e+06 – 3.931e+06 | 0 |
| 3.931e+06 – 4.062e+06 | 0 |
| 4.062e+06 – 4.193e+06 | 0 |
| 4.193e+06 – 4.324e+06 | 0 |
| 4.324e+06 – 4.455e+06 | 0 |
| 4.455e+06 – 4.586e+06 | 0 |
| 4.586e+06 – 4.717e+06 | 0 |
| 4.717e+06 – 4.848e+06 | 0 |
| 4.848e+06 – 4.979e+06 | 0 |
| 4.979e+06 – 5.11e+06 | 0 |
| 5.11e+06 – 5.241e+06 | 1 |
Histogram of state (FIPS state code), 40 equal-width bins
| bin | count |
|---|---|
| 1 – 2.375 | 97 |
| 2.375 – 3.75 | 0 |
| 3.75 – 5.125 | 90 |
| 5.125 – 6.5 | 58 |
| 6.5 – 7.875 | 0 |
| 7.875 – 9.25 | 73 |
| 9.25 – 10.62 | 3 |
| 10.62 – 12 | 1 |
| 12 – 13.38 | 226 |
| 13.38 – 14.75 | 0 |
| 14.75 – 16.12 | 49 |
| 16.12 – 17.5 | 102 |
| 17.5 – 18.88 | 92 |
| 18.88 – 20.25 | 204 |
| 20.25 – 21.62 | 120 |
| 21.62 – 23 | 64 |
| 23 – 24.38 | 40 |
| 24.38 – 25.75 | 14 |
| 25.75 – 27.12 | 170 |
| 27.12 – 28.5 | 82 |
| 28.5 – 29.88 | 115 |
| 29.88 – 31.25 | 149 |
| 31.25 – 32.62 | 17 |
| 32.62 – 34 | 10 |
| 34 – 35.38 | 54 |
| 35.38 – 36.75 | 62 |
| 36.75 – 38.12 | 153 |
| 38.12 – 39.5 | 88 |
| 39.5 – 40.88 | 77 |
| 40.88 – 42.25 | 103 |
| 42.25 – 43.62 | 0 |
| 43.62 – 45 | 5 |
| 45 – 46.38 | 112 |
| 46.38 – 47.75 | 95 |
| 47.75 – 49.12 | 283 |
| 49.12 – 50.5 | 14 |
| 50.5 – 51.88 | 133 |
| 51.88 – 53.25 | 39 |
| 53.25 – 54.62 | 55 |
| 54.62 – 56 | 95 |
Histogram of NAME length in characters
| chars | count |
|---|---|
| 16 – 17 | 26 |
| 17 – 18 | 72 |
| 18 – 19 | 121 |
| 19 – 20 | 190 |
| 20 – 21 | 264 |
| 21 – 22 | 407 |
| 22 – 24 | 420 |
| 24 – 25 | 363 |
| 25 – 26 | 320 |
| 26 – 27 | 240 |
| 27 – 28 | 230 |
| 28 – 29 | 142 |
| 29 – 30 | 126 |
| 30 – 31 | 133 |
| 31 – 32 | 35 |
| 32 – 33 | 23 |
| 33 – 34 | 12 |
| 34 – 35 | 6 |
| 35 – 36 | 2 |
| 36 – 38 | 0 |
| 38 – 39 | 1 |
| 39 – 40 | 1 |
| 40 – 41 | 0 |
| 41 – 42 | 1 |
| 42 – 43 | 1 |
| 43 – 44 | 0 |
| 44 – 45 | 2 |
| 45 – 46 | 0 |
| 46 – 47 | 1 |
| 47 – 48 | 1 |
| 48 – 49 | 0 |
| 49 – 50 | 0 |
| 50 – 51 | 0 |
| 51 – 53 | 0 |
| 53 – 54 | 2 |
| 54 – 55 | 1 |
| 55 – 56 | 0 |
| 56 – 57 | 0 |
| 57 – 58 | 0 |
| 58 – 59 | 1 |
Schema
7 columns

| Column | Type | Null % | Unique | Alerts |
|---|---|---|---|---|
| NAME | text | 0.0% | 3,144 | near_unique |
| B23025_002E | numeric | 0.0% | 3,028 | high_skew, outliers |
| B21001_002E | numeric | 0.0% | 2,424 | high_skew, outliers |
| B01003_001E | numeric | 0.0% | 3,080 | high_skew, outliers |
| state | numeric | 0.0% | 51 | |
| county | numeric | 0.0% | 329 | high_skew, outliers |
| fips | numeric | 0.0% | 3,144 | |
NAME
text identifier near_unique
This column holds full U.S. county names with a state suffix (e.g. 'X County, Texas'): the token 'county,' appears in 2,999 of 3,144 rows, and the next most common tokens are state names such as Texas (256), Virginia (189), and Georgia (159). All 3,144 values are unique, with no nulls or duplicates. Lengths range from 16 to 59 characters but cluster in the low-to-mid 20s (median 24, 95th percentile 30.85). The 145 rows lacking the 'county,' token likely correspond to Louisiana parishes, Alaska boroughs, or independent cities, which is worth confirming. Treatment: Parse into separate county and state fields before using as a join key.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- len_min: 16
- len_max: 59
- len_mean: 24.16
- len_median: 24
- len_p95: 30.85
- word_mean: 3.224
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,910
- readability_flesch_mean: 6.826
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
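The parsing step suggested for NAME could look something like the sketch below. The helper names `split_county_name` and `has_county_token` are hypothetical, and the approach assumes every value uses ', ' before the state name, which matches the 'X County, Texas' pattern described above but has not been checked against all 3,144 rows.

```python
def split_county_name(name):
    """Split 'X County, Texas' into ('X County', 'Texas').

    Splits on the last ', ' so any comma inside the area name is preserved.
    """
    area, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return area, state


def has_county_token(name):
    """True when the area part ends with the word 'County'."""
    area, _ = split_county_name(name)
    return area.endswith(" County")


print(split_county_name("Autauga County, Alabama"))
print(has_county_token("Orleans Parish, Louisiana"))
```

Filtering on `has_county_token` is one way to list the rows without the 'county,' token and confirm what kind of county equivalents they are.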
B23025_002E
numeric feature high_skew outliers
This is the ACS variable B23025_002E, the count of people aged 16 and over in the labor force, reported per row (likely one row per U.S. county, given n=3144). The distribution is extremely right-skewed (skew 13.14, kurtosis 288.57), with a median of 11,698 against a max of 5,240,842, and 14.3% of rows (449) are flagged as outliers, consistent with a small number of very large metro counties dwarfing thousands of small ones. No nulls or zeros, and 3,028 of 3,144 values are unique. Treatment: Log-transform before modelling to tame the heavy right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,028
- min: 36
- max: 5.241e+06
- mean: 5.378e+04
- median: 11,698
- std: 1.763e+05
- q1: 4722
- q3: 3.259e+04
- iqr: 27,868
- skew: 13.14
- kurtosis: 288.6
- n_outliers: 449
- outlier_rate: 0.1428
- zero_rate: 0
B21001_002E
numeric feature high_skew outliers
This is the ACS variable B21001_002E, the count of civilian veterans aged 18 and over, reported per geographic unit (likely county, given n=3144). Values span 0 to 244,160 with a median of 1,547.5 against a mean of 5,419; skew of 8.01 and kurtosis of about 100 indicate a heavy right tail, with 408 rows (13.0%) flagged as outliers. There are no nulls, and the zero rate of 0.03% corresponds to a single row, so the count is populated nearly everywhere. Treatment: Log-transform or normalize per capita before modelling to tame the heavy right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 2,424
- min: 0
- max: 244,160
- mean: 5419
- median: 1548
- std: 1.311e+04
- q1: 634.8
- q3: 4428
- iqr: 3,793
- skew: 8.014
- kurtosis: 100
- n_outliers: 408
- outlier_rate: 0.1298
- zero_rate: 0.0003181
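The two treatments named for this column can be sketched as follows. Both helper names are hypothetical, and the input numbers are made up for illustration. Note one assumption: dividing by total population (B01003_001E) gives veterans per resident, not per adult, because the veteran count covers only civilians aged 18 and over.

```python
import math


def veterans_per_1000(veterans, population):
    """Veterans per 1,000 residents; None when the population is not positive."""
    if population <= 0:
        return None
    return 1000.0 * veterans / population


def log_count(count):
    """log(1 + count): defined at zero, which a plain log is not."""
    return math.log1p(count)


# Made-up county: 1,500 veterans among 25,000 residents.
print(veterans_per_1000(1_500, 25_000))  # 60.0
print(log_count(0))                      # 0.0
```

Using `log1p` rather than a plain logarithm matters here because the column minimum is 0.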
B01003_001E
numeric feature high_skew outliers
This is the ACS variable B01003_001E, total population, reported for 3,144 rows, consistent with U.S. counties. Values span 50 to 9,936,690 with a median of 25,784.5 versus a mean of 105,310.94; skew of 13.17 and kurtosis of 289.76 confirm an extreme right tail (440 outliers, 14.0%). No nulls or zeros, and 3,080 of 3,144 values are unique. Treatment: Log-transform before regression to reduce the skew of about 13.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,080
- min: 50
- max: 9.937e+06
- mean: 1.053e+05
- median: 2.578e+04
- std: 3.338e+05
- q1: 1.084e+04
- q3: 6.808e+04
- iqr: 57,244
- skew: 13.17
- kurtosis: 289.8
- n_outliers: 440
- outlier_rate: 0.1399
- zero_rate: 0
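To illustrate why a log transform helps with a tail like this, the sketch below compares skewness before and after taking log10 on a tiny sample. The `skewness` helper uses the plain population formula, which may differ from the estimator behind the profile's figures, and the five sample values are only the rounded min, quartiles, and max listed above, so the printed numbers will not match the reported 13.17.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative sample only, not actual rows.
sample = [50, 10_840, 25_784, 68_080, 9_936_690]
raw_skew = skewness(sample)
log_skew = skewness([math.log10(v) for v in sample])
print(f"raw: {raw_skew:.2f}, log10: {log_skew:.2f}")
```

On this sample the log-scaled values are far closer to symmetric than the raw ones, which is the effect the treatment is after.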
state
numeric foreign_key
This column holds 51 distinct integer codes ranging from 1 to 56 across 3,144 rows with no nulls, matching the FIPS state code scheme (50 states plus DC; unassigned codes in the sequence explain why the max is 56 rather than 51). The near-uniform spread (IQR 27, skew -0.08, kurtosis -1.10) is consistent with a categorical identifier rather than a true numeric quantity. The row count of 3,144 also aligns with U.S. county-level data keyed by state. Treatment: Treat as a categorical FIPS code; left-join to a state lookup rather than using it as a numeric feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 51
- min: 1
- max: 56
- mean: 30.26
- median: 29
- std: 15.15
- q1: 18
- q3: 45
- iqr: 27
- skew: -0.08128
- kurtosis: -1.099
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
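A left join to a state lookup might be sketched as below. The `STATE_NAMES` dictionary is deliberately partial (a complete table would have 51 entries), `attach_state_name` is a hypothetical helper, and rows are assumed to be plain dictionaries with a `state` key.

```python
# Partial FIPS state lookup for illustration; a full table has 51 entries.
STATE_NAMES = {
    1: "Alabama",
    2: "Alaska",
    4: "Arizona",
    6: "California",
    11: "District of Columbia",
    48: "Texas",
    51: "Virginia",
    56: "Wyoming",
}


def attach_state_name(rows, lookup=STATE_NAMES):
    """Left join: keep every row; state_name is None when the code is unmatched."""
    return [{**row, "state_name": lookup.get(row["state"])} for row in rows]


rows = [{"fips": 48001, "state": 48}, {"fips": 3001, "state": 3}]
for row in attach_state_name(rows):
    print(row)
```

Keeping unmatched rows with `None` rather than dropping them makes gaps in the lookup visible instead of silently shrinking the dataset.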
county
numeric identifier high_skew outliers
This column is named 'county' and contains integer codes ranging from 1 to 840 across 3,144 rows with only 329 unique values, consistent with FIPS county codes that restart within each state. The distribution is heavily right-skewed (skew 2.84, kurtosis 11.38) with 176 outliers (5.6%) above the upper fence, which is expected when a handful of states use higher county numbers. Although stored as numeric, the values are categorical identifiers, not measurements. Treatment: Treat as a categorical code (likely county FIPS); do not model as a numeric feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 329
- min: 1
- max: 840
- mean: 103.9
- median: 79
- std: 107.6
- q1: 35
- q3: 133.5
- iqr: 98.5
- skew: 2.841
- kurtosis: 11.38
- n_outliers: 176
- outlier_rate: 0.05598
- zero_rate: 0
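The "upper fence" mentioned for this column is not defined in the report. Assuming it follows the common Tukey rule (1.5 times the IQR beyond the quartiles), the fences can be reproduced from the listed quartiles as in this sketch; `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for the county column: q1 = 35, q3 = 133.5.
lower, upper = tukey_fences(35, 133.5)
print(lower, upper)  # -112.75 281.25
```

Under that assumption any county code above 281.25 counts as an outlier, and since the minimum is 1 no value can fall below the lower fence.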
fips
numeric identifier
This is the U.S. county FIPS code: a 3,144-row column with 3,144 unique integer values, no nulls, ranging from 1001 to 56045. The distribution is uniform-like (skew -0.08, kurtosis -1.10), which is what you would expect from state-prefixed county identifiers rather than a measured quantity. Because the column is stored as an integer, codes for states 1 through 9 have lost their leading zero (1001 rather than 01001), so zero-pad to five characters before joining to string-keyed tables. Treatment: Left-join on this id to county-level reference tables; do not feed it as a numeric feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 1,001
- max: 56,045
- mean: 3.037e+04
- median: 29,174
- std: 1.517e+04
- q1: 1.817e+04
- q3: 4.508e+04
- iqr: 26,905
- skew: -0.07923
- kurtosis: -1.099
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
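For joins against reference tables that key counties by a five-character string, a conversion along these lines may help. It assumes the `fips` column equals `state * 1000 + county`, which fits the reported range of 1001 to 56045 but has not been verified row by row; both function names are hypothetical.

```python
def fips_string(state, county):
    """Five-character FIPS: 2-digit state code followed by 3-digit county code."""
    if not (0 < state < 100 and 0 < county < 1000):
        raise ValueError("state or county code out of range")
    return f"{state:02d}{county:03d}"


def split_fips(fips):
    """Recover (state, county) from an integer FIPS such as 1001."""
    return divmod(fips, 1000)


print(fips_string(1, 1))    # 01001
print(split_fips(56045))    # (56, 45)
```

Zero-padding restores the leading zero that integer storage drops for state codes below 10.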