
acs 2022 county

source /home/coolhand/html/datavis/data_trove/demographic/veterans/cache/acs_2022_county.parquet · 3,144 rows · 7 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 3,144 U.S. counties from the 2022 American Community Survey. Each row is identified by FIPS code, state code, county code, and name, and carries three Census estimates: total population (B01003_001E), veterans in the civilian population aged 18 and over (B21001_002E), and population aged 16 and over in the labor force (B23025_002E). All three measures are extremely right-skewed (skew of 13.2, 8.0, and 13.1 respectively): total population, for example, ranges from 50 to 9.94 million while the median is just 25,784. About 13-14% of counties (408 to 449 rows) register as outliers on each measure, which reflects a long upper tail of large metro counties rather than data errors. Start by looking at the population and labor-force distributions on a log scale, and use the state field (51 unique values) to see how counties group by state.

citing: row_count · column_count · columns.B01003_001E.stats · columns.B21001_002E.stats · columns.B23025_002E.stats · columns.state.n_unique · columns.NAME.top_words
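
As a rough illustration of the log-scale view suggested above, the sketch below buckets counts by order of magnitude. The function name `decade_histogram` is hypothetical, and the sample list is not the dataset: it only reuses the rounded minimum, quartiles, median, and maximum reported for B01003_001E.

```python
import math
from collections import Counter

def decade_histogram(values):
    """Count positive values by order of magnitude (floor of log10)."""
    counts = Counter()
    for v in values:
        if v > 0:  # log10 is undefined for zero or negative counts
            counts[math.floor(math.log10(v))] += 1
    return dict(sorted(counts.items()))

# Illustrative values only: rounded min, q1, median, q3, and max of B01003_001E.
sample_population = [50, 10_840, 25_784, 68_080, 9_936_690]
print(decade_histogram(sample_population))
```

On the real column, the same bucketing would show how many counties fall in each power-of-ten band, which is hard to see on a linear axis.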

Schema

7 columns
Per-column summary.

Column        Type     Nulls  Unique  Alerts
NAME          text     0.0%   3,144   near_unique
B23025_002E   numeric  0.0%   3,028   high_skew outliers
B21001_002E   numeric  0.0%   2,424   high_skew outliers
B01003_001E   numeric  0.0%   3,080   high_skew outliers
state         numeric  0.0%   51
county        numeric  0.0%   329     high_skew outliers
fips          numeric  0.0%   3,144

NAME

text identifier near_unique
This column holds full U.S. county names with a state suffix (e.g. 'X County, Texas'): the token 'county,' appears in 2,999 of 3,144 rows, and the next most common tokens are state names such as Texas (256), Virginia (189), and Georgia (159). All 3,144 values are unique, with no nulls or duplicates. Lengths range from 16 to 59 characters, with most close to the median of 24 (95th percentile 30.85). The 145 rows lacking the 'county,' token likely correspond to Louisiana parishes, Alaska boroughs, or independent cities, which is worth confirming. Treatment: Parse into separate county and state fields before using as a join key. high · anthropic:claude-opus-4-7
n                        3,144
nulls                    0 (0.0%)
unique                   3,144
len_min                  16
len_max                  59
len_mean                 24.16
len_median               24
len_p95                  30.85
word_mean                3.224
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,910
readability_flesch_mean  6.826
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
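
The parsing step in the treatment could look something like the sketch below. It assumes every NAME value ends with ", <state>" and splits on the last comma; the helper names `split_county_name` and `has_county_token` are hypothetical, and the example strings are illustrative rather than read from the file.

```python
def split_county_name(name):
    """Split 'X County, Texas' into ('X County', 'Texas').

    Splits on the last comma, assuming the state name always follows it.
    """
    area, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma found in {name!r}")
    return area.strip(), state.strip()

def has_county_token(name):
    """True when the whitespace-separated token 'county,' appears in the name."""
    return "county," in name.lower().split()

print(split_county_name("Autauga County, Alabama"))
print(has_county_token("Orleans Parish, Louisiana"))
```

Filtering on `has_county_token` is one way to list the 145 rows without the token and confirm whether they are parishes, boroughs, or independent cities.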

B23025_002E

numeric feature high_skew outliers
This is the ACS variable B23025_002E, the count of people aged 16+ in the labor force, reported per row (likely one row per US county given n=3144). The distribution is extremely right-skewed (skew 13.14, kurtosis 288.57) with a median of 11,698 but a max of 5,240,842, and 14.3% of rows flagged as outliers — consistent with a few massive metros dwarfing thousands of small counties. No nulls or zeros, and 3028/3144 values are unique. Treatment: Log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n             3,144
nulls         0 (0.0%)
unique        3,028
min           36
max           5.241e+06
mean          5.378e+04
median        11,698
std           1.763e+05
q1            4722
q3            3.259e+04
iqr           27,868
skew          13.14
kurtosis      288.6
n_outliers    449
outlier_rate  0.1428
zero_rate     0
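
The report does not state how outliers are flagged. The sketch below assumes the common Tukey rule (values beyond 1.5 × IQR from the quartiles), which is only a guess at the profiler's method. The function names are hypothetical, and q3 is taken as 32,590 from the rounded 3.259e+04.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True when value lies outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return value < low or value > high

# Quartiles reported for B23025_002E (q3 rounded in the report).
q1, q3 = 4_722, 32_590
print(tukey_fences(q1, q3))
print(is_outlier(5_240_842, q1, q3))  # the reported maximum
print(is_outlier(11_698, q1, q3))     # the reported median
```

Under this assumption the lower fence is negative, so every flagged row would sit in the upper tail, which fits the description of large metros dominating the outliers.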

B21001_002E

numeric feature high_skew outliers
This is the ACS variable B21001_002E, the count of veterans in the civilian population aged 18 and over, reported per geographic unit (likely county, given n=3144). Values span 0 to 244,160 with a median of 1,547.5 but a mean of 5,419; a skew of 8.01 and kurtosis above 100 indicate a heavy right tail, with 408 rows (~13%) flagged as outliers. There are no nulls, and the zero rate of 0.0003 corresponds to a single row, so the count is populated nearly everywhere. Treatment: log-transform or normalize per capita before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n             3,144
nulls         0 (0.0%)
unique        2,424
min           0
max           244,160
mean          5419
median        1548
std           1.311e+04
q1            634.8
q3            4428
iqr           3,793
skew          8.014
kurtosis      100
n_outliers    408
outlier_rate  0.1298
zero_rate     0.0003181
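
A minimal sketch of the per-capita option, assuming total population (B01003_001E) is used as the denominator. That is an approximation, since the veteran count covers only civilians aged 18 and over. The function name and the sample figures are hypothetical.

```python
def veterans_per_1000(veterans, population):
    """Veterans per 1,000 residents, or None when population is not positive."""
    if population <= 0:
        return None  # avoid dividing by zero for empty geographies
    return 1000 * veterans / population

# Hypothetical rows: (B21001_002E, B01003_001E)
rows = [(50, 1_000), (4_000, 80_000), (0, 50)]
print([veterans_per_1000(v, p) for v, p in rows])
```

A rate like this removes most of the size effect, so large and small counties can be compared directly.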

B01003_001E

numeric feature high_skew outliers
This is the ACS variable B01003_001E, total population, reported for 3,144 rows, which is consistent with US counties. Values span 50 to 9,936,690 with median 25,784.5 versus mean 105,310.94, and a skew of 13.17 with kurtosis 289.76 confirms an extreme long right tail (440 outliers, 14.0%). No nulls or zeros, and 3,080 of 3,144 values are unique. Treatment: log-transform before regression to tame the extreme right skew (13.17). high · anthropic:claude-opus-4-7
n             3,144
nulls         0 (0.0%)
unique        3,080
min           50
max           9.937e+06
mean          1.053e+05
median        2.578e+04
std           3.338e+05
q1            1.084e+04
q3            6.808e+04
iqr           57,244
skew          13.17
kurtosis      289.8
n_outliers    440
outlier_rate  0.1399
zero_rate     0
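
To illustrate why a log transform helps, the sketch below computes a simple moment-based skewness before and after taking log10. The estimator here may differ from the one the profiler uses, the function name is hypothetical, and the five sample values are just the rounded summary statistics above, not the full column.

```python
import math

def skewness(values):
    """Population (moment-based) skewness: m3 / m2**1.5.

    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative values only: rounded min, q1, median, q3, and max of B01003_001E.
raw = [50, 10_840, 25_784, 68_080, 9_936_690]
logged = [math.log10(v) for v in raw]  # safe here because the column has no zeros

print(skewness(raw))
print(skewness(logged))
```

On this small sample the raw skewness is strongly positive, while the logged values are close to symmetric.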

state

numeric foreign_key
This column holds 51 distinct integer codes ranging from 1 to 56 across 3144 rows with no nulls, matching the FIPS state code scheme (50 states plus DC, with gaps explaining why the max is 56). The near-uniform spread (IQR 27, skew -0.08, kurtosis -1.10) is consistent with a categorical identifier rather than a true numeric quantity. Row count of 3144 also aligns with US county-level data keyed by state. Treatment: Treat as categorical FIPS code; left-join to a state lookup rather than using as a numeric feature. high · anthropic:claude-opus-4-7
n             3,144
nulls         0 (0.0%)
unique        51
min           1
max           56
mean          30.26
median        29
std           15.15
q1            18
q3            45
iqr           27
skew          -0.08128
kurtosis      -1.099
n_outliers    0
outlier_rate  0
zero_rate     0
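
One way to treat the code as categorical is to zero-pad it to the two-digit FIPS form and look it up, as sketched below. The lookup table is deliberately partial and the helper names are hypothetical. A real lookup would cover all 51 codes.

```python
# Partial, illustrative lookup of two-digit state FIPS codes.
STATE_NAMES = {
    "01": "Alabama",
    "11": "District of Columbia",
    "48": "Texas",
    "56": "Wyoming",
}

def state_fips(code):
    """Format an integer state code as a two-digit FIPS string."""
    return f"{int(code):02d}"

def state_name(code, lookup=STATE_NAMES):
    """Left-join style lookup: returns None when the code has no match."""
    return lookup.get(state_fips(code))

print(state_fips(1), state_name(1))
print(state_fips(56), state_name(56))
```

Returning None for unmatched codes mirrors a left join: rows are kept, and gaps in the lookup become visible instead of dropping data.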

county

numeric identifier high_skew outliers
This column is named 'county' and contains integer codes ranging from 1 to 840 across 3144 rows with only 329 unique values, consistent with FIPS-style county numbers that repeat across states. The distribution is heavily right-skewed (skew 2.84, kurtosis 11.38) with 176 outliers (5.6%) above the upper fence, which is expected when a handful of states use higher county numbers. Despite being stored as numeric, the values are categorical identifiers, not measurements. Treatment: Treat as a categorical code (likely county FIPS); do not model as a numeric feature. high · anthropic:claude-opus-4-7
n             3,144
nulls         0 (0.0%)
unique        329
min           1
max           840
mean          103.9
median        79
std           107.6
q1            35
q3            133.5
iqr           98.5
skew          2.841
kurtosis      11.38
n_outliers    176
outlier_rate  0.05598
zero_rate     0

fips

numeric identifier
This is the U.S. county FIPS code: a 3144-row column with 3144 unique integer values, no nulls, ranging from 1001 to 56045. The distribution is uniform-like (skew -0.08, kurtosis -1.10), which is what you'd expect from state-prefixed county identifiers rather than a measured quantity. Because the values are stored as integers, codes for states 1 through 9 have lost their leading zero (1001 rather than 01001). Treatment: Left-join on this id to county-level reference tables, zero-padding to five digits if the reference table stores FIPS as text; do not feed as a numeric feature. high · anthropic:claude-opus-4-7
n             3,144
nulls         0 (0.0%)
unique        3,144
min           1,001
max           56,045
mean          3.037e+04
median        29,174
std           1.517e+04
q1            1.817e+04
q3            4.508e+04
iqr           26,905
skew          -0.07923
kurtosis      -1.099
n_outliers    0
outlier_rate  0
zero_rate     0
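
Before joining, it may be worth checking that fips agrees with the state and county columns. The sketch below assumes the standard layout of a two-digit state code followed by a three-digit county code, which the report does not verify for this file. The helper names are hypothetical.

```python
def county_fips(state, county):
    """Build a five-character FIPS string from integer state and county codes."""
    return f"{int(state):02d}{int(county):03d}"

def fips_is_consistent(fips, state, county):
    """True when the zero-padded fips equals state + county codes combined."""
    return f"{int(fips):05d}" == county_fips(state, county)

# The reported min and max of the fips column, split into assumed components.
print(county_fips(1, 1))    # corresponds to fips 1001
print(county_fips(56, 45))  # corresponds to fips 56045
print(fips_is_consistent(1001, 1, 1))
```

Rows that fail the check would point to a mismatch between the three identifier columns and should be inspected before any join.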