economic unemployment by county

source /home/coolhand/datasets/us-inequality-atlas/economic/unemployment_by_county.csv · 3,222 rows · 8 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 US county-level labor market records with 8 columns: county identifiers (fips, county_name, state) and workforce statistics (total_16_plus, labor_force, unemployed, unemployment_rate, labor_force_participation_rate). The unemployment rate averages 5.13% with a median of 4.69% and a maximum of 31.99%, so the right tail is worth inspecting for distressed counties. The count columns (labor_force, total_16_plus, unemployed) are extremely right-skewed (skew > 13), with 417 to 459 flagged outliers each. That is expected when a few large metros sit alongside many small rural counties, but it argues for a log transform before modeling. Texas (254), Georgia (159), and Virginia (133) contribute the most rows, which reflects how many counties each state has rather than any sampling bias. County names have a 39% duplicate rate because names such as Washington, Jefferson, and Franklin County recur across states, so join on FIPS, not name.

citing: row_count · column_count · columns.unemployment_rate.stats · columns.labor_force.stats · columns.total_16_plus.stats · columns.unemployed.stats · columns.labor_force_participation_rate.stats · columns.state.top_values · columns.county_name.stats · columns.county_name.top_values

Charts the summary said to look at first

unemployment_rate · Check the right tail beyond ~10% to find counties in labor-market distress.

Histogram bins for unemployment_rate (median: 4.69).
bin	count
0 – 0.7997	60
0.7997 – 1.599	90
1.599 – 2.399	197
2.399 – 3.199	333
3.199 – 3.999	464
3.999 – 4.798	539
4.798 – 5.598	492
5.598 – 6.398	361
6.398 – 7.198	217
7.198 – 7.997	142
7.997 – 8.797	85
8.797 – 9.597	61
9.597 – 10.4	36
10.4 – 11.2	36
11.2 – 12	13
12 – 12.8	18
12.8 – 13.6	13
13.6 – 14.4	12
14.4 – 15.2	11
15.2 – 15.99	8
15.99 – 16.79	5
16.79 – 17.59	4
17.59 – 18.39	2
18.39 – 19.19	1
19.19 – 19.99	3
19.99 – 20.79	3
20.79 – 21.59	4
21.59 – 22.39	3
22.39 – 23.19	2
23.19 – 23.99	3
23.99 – 24.79	1
24.79 – 25.59	0
25.59 – 26.39	0
26.39 – 27.19	0
27.19 – 27.99	0
27.99 – 28.79	0
28.79 – 29.59	0
29.59 – 30.39	0
30.39 – 31.19	1
31.19 – 31.99	2
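
To put a number on that right tail, the bin counts above can be summed directly. The sketch below copies the 40 bins from the table as (lower edge, count) pairs and counts the rows in bins that start at or above 10.4%. The 10.4 cutoff is an assumption: it is simply the first bin edge past the ~10% mark, so counties between 10% and 10.4% are left out.

```python
# (lower_edge, count) pairs copied from the histogram table above.
BINS = [
    (0.0, 60), (0.7997, 90), (1.599, 197), (2.399, 333), (3.199, 464),
    (3.999, 539), (4.798, 492), (5.598, 361), (6.398, 217), (7.198, 142),
    (7.997, 85), (8.797, 61), (9.597, 36), (10.4, 36), (11.2, 13),
    (12.0, 18), (12.8, 13), (13.6, 12), (14.4, 11), (15.2, 8),
    (15.99, 5), (16.79, 4), (17.59, 2), (18.39, 1), (19.19, 3),
    (19.99, 3), (20.79, 4), (21.59, 3), (22.39, 2), (23.19, 3),
    (23.99, 1), (24.79, 0), (25.59, 0), (26.39, 0), (27.19, 0),
    (27.99, 0), (28.79, 0), (29.59, 0), (30.39, 1), (31.19, 2),
]


def tail_count(bins, cutoff):
    """Count rows in bins whose lower edge is at or above the cutoff."""
    return sum(count for lower, count in bins if lower >= cutoff)


total = sum(count for _, count in BINS)
tail = tail_count(BINS, 10.4)
print(f"{tail} of {total} counties ({tail / total:.1%}) sit at or above 10.4%")
```

With the raw CSV loaded, the same question is a one-line filter on unemployment_rate; the binned version is only useful when the table is all that is at hand.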

labor_force_participation_rate · Centered near 58% with a mild left skew; the few counties below 30% flag weak workforce engagement.

Histogram bins for labor_force_participation_rate (median: 58.725).
bin	count
18.63 – 20.27	1
20.27 – 21.9	0
21.9 – 23.54	0
23.54 – 25.17	1
25.17 – 26.81	1
26.81 – 28.44	1
28.44 – 30.08	0
30.08 – 31.71	4
31.71 – 33.35	6
33.35 – 34.98	8
34.98 – 36.62	10
36.62 – 38.25	24
38.25 – 39.89	30
39.89 – 41.52	37
41.52 – 43.16	51
43.16 – 44.79	61
44.79 – 46.43	60
46.43 – 48.06	76
48.06 – 49.7	109
49.7 – 51.34	141
51.34 – 52.97	186
52.97 – 54.61	174
54.61 – 56.24	235
56.24 – 57.88	245
57.88 – 59.51	272
59.51 – 61.15	270
61.15 – 62.78	277
62.78 – 64.42	252
64.42 – 66.05	221
66.05 – 67.69	187
67.69 – 69.32	118
69.32 – 70.96	82
70.96 – 72.59	41
72.59 – 74.23	19
74.23 – 75.86	10
75.86 – 77.5	4
77.5 – 79.13	6
79.13 – 80.77	1
80.77 – 82.4	0
82.4 – 84.04	1

state · TX, GA, and VA have the most rows simply because they have the most counties.

Top values for state (20 unique shown, of 52 total).
value	count	share
TX	254	7.9%
GA	159	4.9%
VA	133	4.1%
KY	120	3.7%
MO	115	3.6%
KS	105	3.3%
IL	102	3.2%
NC	100	3.1%
IA	99	3.1%
TN	95	2.9%
NE	93	2.9%
IN	92	2.9%
OH	88	2.7%
MN	87	2.7%
MI	83	2.6%
MS	82	2.5%
PR	78	2.4%
OK	77	2.4%
AR	75	2.3%
WI	72	2.2%

labor_force · Severe right skew (max 5.2M vs median 11.6K) — consider a log scale before any modeling.

Histogram bins for labor_force (median: 11608.5).
bin	count
36 – 1.311e+05	2944
1.311e+05 – 2.621e+05	136
2.621e+05 – 3.931e+05	52
3.931e+05 – 5.241e+05	41
5.241e+05 – 6.551e+05	14
6.551e+05 – 7.862e+05	10
7.862e+05 – 9.172e+05	5
9.172e+05 – 1.048e+06	5
1.048e+06 – 1.179e+06	4
1.179e+06 – 1.31e+06	2
1.31e+06 – 1.441e+06	3
1.441e+06 – 1.572e+06	0
1.572e+06 – 1.703e+06	1
1.703e+06 – 1.834e+06	1
1.834e+06 – 1.965e+06	0
1.965e+06 – 2.096e+06	0
2.096e+06 – 2.227e+06	0
2.227e+06 – 2.358e+06	1
2.358e+06 – 2.489e+06	1
2.489e+06 – 2.62e+06	0
2.62e+06 – 2.751e+06	0
2.751e+06 – 2.882e+06	1
2.882e+06 – 3.013e+06	0
3.013e+06 – 3.145e+06	0
3.145e+06 – 3.276e+06	0
3.276e+06 – 3.407e+06	0
3.407e+06 – 3.538e+06	0
3.538e+06 – 3.669e+06	0
3.669e+06 – 3.8e+06	0
3.8e+06 – 3.931e+06	0
3.931e+06 – 4.062e+06	0
4.062e+06 – 4.193e+06	0
4.193e+06 – 4.324e+06	0
4.324e+06 – 4.455e+06	0
4.455e+06 – 4.586e+06	0
4.586e+06 – 4.717e+06	0
4.717e+06 – 4.848e+06	0
4.848e+06 – 4.979e+06	0
4.979e+06 – 5.11e+06	0
5.11e+06 – 5.241e+06	1
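
With equal-width bins, 2,944 of the 3,222 rows (about 91%) land in the first bin, so the table says little about the bulk of counties. One option is to bin on a log scale instead. The sketch below builds log-spaced bin edges from the reported minimum (36) and maximum (5,240,842); the function name and the choice of 10 bins are illustrative assumptions, not something the profiler does.

```python
import math


def log_bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges spaced evenly in log10 between lo and hi."""
    if lo <= 0 or hi <= lo:
        raise ValueError("log-spaced bins need 0 < lo < hi")
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / n_bins
    return [10 ** (log_lo + i * step) for i in range(n_bins + 1)]


# min and max of labor_force as reported in the profile
edges = log_bin_edges(36, 5_240_842, 10)
for left, right in zip(edges, edges[1:]):
    print(f"{left:>12,.0f} to {right:>12,.0f}")
```

Each bin then covers the same ratio between its edges rather than the same absolute width, which spreads small and mid-sized counties across several bins.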

unemployed · Heavy-tailed with 417 outliers; a small number of large counties account for a disproportionate share of absolute unemployment.

Histogram bins for unemployed (median: 589.0).
bin	count
0 – 9139	3018
9139 – 1.828e+04	96
1.828e+04 – 2.742e+04	53
2.742e+04 – 3.655e+04	23
3.655e+04 – 4.569e+04	8
4.569e+04 – 5.483e+04	4
5.483e+04 – 6.397e+04	3
6.397e+04 – 7.311e+04	4
7.311e+04 – 8.225e+04	4
8.225e+04 – 9.139e+04	3
9.139e+04 – 1.005e+05	1
1.005e+05 – 1.097e+05	2
1.097e+05 – 1.188e+05	0
1.188e+05 – 1.279e+05	0
1.279e+05 – 1.371e+05	0
1.371e+05 – 1.462e+05	0
1.462e+05 – 1.554e+05	0
1.554e+05 – 1.645e+05	1
1.645e+05 – 1.736e+05	0
1.736e+05 – 1.828e+05	0
1.828e+05 – 1.919e+05	0
1.919e+05 – 2.01e+05	1
2.01e+05 – 2.102e+05	0
2.102e+05 – 2.193e+05	0
2.193e+05 – 2.285e+05	0
2.285e+05 – 2.376e+05	0
2.376e+05 – 2.467e+05	0
2.467e+05 – 2.559e+05	0
2.559e+05 – 2.65e+05	0
2.65e+05 – 2.742e+05	0
2.742e+05 – 2.833e+05	0
2.833e+05 – 2.924e+05	0
2.924e+05 – 3.016e+05	0
3.016e+05 – 3.107e+05	0
3.107e+05 – 3.199e+05	0
3.199e+05 – 3.29e+05	0
3.29e+05 – 3.381e+05	0
3.381e+05 – 3.473e+05	0
3.473e+05 – 3.564e+05	0
3.564e+05 – 3.655e+05	1

Schema

8 columns

Per-column summary.
column	type	null rate	unique	alerts
fips	numeric	0.0%	3,222
county_name	text	0.0%	1,960	short_text duplicates
state	categorical	0.0%	52
total_16_plus	numeric	0.0%	3,148	high_skew outliers
labor_force	numeric	0.0%	3,099	high_skew outliers
unemployed	numeric	0.0%	1,859	high_skew outliers
labor_force_participation_rate	numeric	0.0%	1,944
unemployment_rate	numeric	0.0%	950	high_skew
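
The schema suggests that the two rate columns are derived from the three count columns. Assuming the usual definitions (unemployment rate = unemployed / labor force, participation rate = labor force / population aged 16+, both in percent), a row-level consistency check could be sketched as follows. The definitions, the tolerance, and the example row are all assumptions made for illustration; the source does not state how the rates were computed.

```python
def derived_rates(total_16_plus, labor_force, unemployed):
    """Recompute both rates (in percent) from the raw counts."""
    participation = 100.0 * labor_force / total_16_plus
    unemployment = 100.0 * unemployed / labor_force
    return participation, unemployment


def rates_match(row, tol=0.01):
    """True if the stored rates agree with the recomputed ones within tol points."""
    participation, unemployment = derived_rates(
        row["total_16_plus"], row["labor_force"], row["unemployed"]
    )
    return (
        abs(participation - row["labor_force_participation_rate"]) <= tol
        and abs(unemployment - row["unemployment_rate"]) <= tol
    )


# Made-up row for illustration only; not taken from the dataset.
example = {
    "total_16_plus": 20000,
    "labor_force": 12000,
    "unemployed": 600,
    "labor_force_participation_rate": 60.0,
    "unemployment_rate": 5.0,
}
print(rates_match(example))
```

The profile reports minimums of 36 for labor_force and 50 for total_16_plus, so neither division hits zero on this file; other files would need a guard.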

fips

numeric identifier

This is the U.S. county FIPS code: all 3,222 rows are unique with no nulls, and values span 1001 to 72153, which matches the 5-digit state+county encoding once leading zeros are restored (the numeric type stores 01001 as 1001). The distribution is essentially uniform across the FIPS range (skew 0.16, kurtosis -0.63, no outliers), which is expected for an identifier rather than a measured quantity. Treatment: treat as a categorical county key and join on it rather than feeding it in as a numeric feature; zero-pad to 5 characters if the other table stores FIPS as text. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
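
A minimal sketch of turning the numeric fips column into a join key, assuming the table on the other side of the join stores FIPS as zero-padded 5-character strings. The helper names are illustrative.

```python
def fips_to_key(fips):
    """Convert a numeric FIPS value to its 5-character string form."""
    return str(int(fips)).zfill(5)


def state_prefix(fips):
    """First two characters of the padded code identify the state or territory."""
    return fips_to_key(fips)[:2]


# min and max of the fips column as reported above
print(fips_to_key(1001), state_prefix(1001))
print(fips_to_key(72153), state_prefix(72153))
```

Going through int() first also handles values that were read as floats (for example 1001.0), which is a common side effect of CSV type inference.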

county_name

text metadata short_text duplicates

This column holds US county-level place names: 3,222 rows with 1,960 unique values, all between 10 and 46 characters and averaging ~2 words. The vocabulary is dominated by 'county' (2,999 occurrences) but also includes 'municipio' (78, Puerto Rico), 'parish' (64, Louisiana), and 'city' (47), so the field mixes several jurisdiction types. Note the 39.2% duplicate rate: recurring names like Washington County (30), Jefferson County (25), and Franklin County (24) appear across many states, so the name alone does not uniquely identify a county. Treatment: join on fips where possible; if only the name is available, pair it with the state code to form a key before joining or aggregating. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 1,960
len_min: 10
len_max: 46
len_mean: 14.17
len_median: 14
len_p95: 18
word_mean: 2.083
word_median: 2
n_empty: 0
n_duplicates: 1,262
duplicate_rate: 0.3917
vocab_size: 1,963
readability_flesch_mean: 33.36
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
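
Before relying on a name-based key, it is worth checking that the key is actually unique. The sketch below counts repeated keys for any combination of fields; the function name and the three sample rows are illustrative and are not read from the file.

```python
from collections import Counter


def duplicated_keys(rows, key_fields):
    """Return key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative sample rows, typed by hand rather than loaded from the CSV.
rows = [
    {"county_name": "Washington County", "state": "TX"},
    {"county_name": "Washington County", "state": "GA"},
    {"county_name": "Jefferson County", "state": "TX"},
]

print(duplicated_keys(rows, ["county_name"]))
print(duplicated_keys(rows, ["county_name", "state"]))
```

An empty result for (county_name, state) on the full file would confirm that the pair is a usable key; a non-empty one lists exactly which pairs collide.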

state

categorical feature

Two-letter US state/territory codes across 3,222 rows with 52 distinct values and no nulls, consistent with the 50 states plus DC and Puerto Rico (PR appears with 78 rows). Distribution tracks county counts rather than population: TX leads at 254 (7.88%), followed by GA (159), VA (133) and KY (120), suggesting one row per county or county equivalent. The high entropy ratio (0.93) indicates a fairly even spread across states. Treatment: one-hot or target-encode for modelling; useful as a join key to state-level reference tables. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 52
top_value: TX
top_rate: 0.07883
cardinality: 52
entropy: 5.314
entropy_ratio: 0.9322
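
The entropy_ratio figure appears to be Shannon entropy divided by its maximum possible value, log2 of the number of categories. That reading is an assumption, but it reproduces the reported number: 5.314 bits over 52 categories gives about 0.9322. A small sketch of the calculation from raw counts:

```python
import math


def entropy_ratio(counts):
    """Shannon entropy in bits divided by log2 of the number of categories."""
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(counts))


# Check against the profile's own numbers: 5.314 bits, 52 categories.
print(round(5.314 / math.log2(52), 4))

# A perfectly even split scores 1.0; concentration pulls the ratio down.
print(entropy_ratio([10, 10, 10, 10]))
print(entropy_ratio([97, 1, 1, 1]))
```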

total_16_plus

numeric feature high_skew outliers

This is a numeric population-style count of people aged 16+, with 3222 non-null rows and 3148 unique values spanning 50 to 8,086,852. The distribution is severely right-skewed (skew 13.49, kurtosis 305.88): the median is 21,167.5 but the mean is 83,549.93 and 13.7% of rows (443) flag as outliers. The std of 265,514 dwarfs the IQR of 45,507.75, consistent with a long upper tail typical of geographic aggregates. Treatment: log-transform before regression to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,148
min: 50
max: 8.087e+06
mean: 8.355e+04
median: 2.117e+04
std: 2.655e+05
q1: 8986
q3: 5.449e+04
iqr: 4.551e+04
skew: 13.49
kurtosis: 305.9
n_outliers: 443
outlier_rate: 0.1375
zero_rate: 0
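
As a rough illustration of what the suggested log transform does to this column, the sketch below applies log10 to the five-number summary reported above (q3 is taken as displayed, 5.449e+04). The raw values span more than five orders of magnitude; after the transform they fall between roughly 1.7 and 6.9. The function name is illustrative, and base 10 is an arbitrary choice made for readability.

```python
import math


def log10_counts(values):
    """Apply log10 to strictly positive counts; raise on anything else."""
    out = []
    for v in values:
        if v <= 0:
            raise ValueError(f"log10 is undefined for {v!r}")
        out.append(math.log10(v))
    return out


# min, q1, median, q3 and max of total_16_plus from the stats block above
summary = [50, 8986, 21167.5, 54490, 8_086_852]
for raw, logged in zip(summary, log10_counts(summary)):
    print(f"{raw:>12,.1f} -> {logged:.2f}")
```

The explicit error on non-positive input is deliberate: this column has a minimum of 50, but the same helper would fail loudly on a column that contains zeros.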

labor_force

numeric feature high_skew outliers

This column is the labor force size for each county, with 3,222 rows and 3,099 unique values. The distribution is severely right-skewed (skew 13.29, kurtosis 295.22) with a median of 11,608.5 but a max of 5,240,842, and 14.2% of values (459) flagged as outliers. There are no nulls or zeros, but the gap between Q3 (31,930.5) and the maximum signals a long tail of very large jurisdictions. Treatment: log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 3,099
min: 36
max: 5.241e+06
mean: 5.287e+04
median: 1.161e+04
std: 1.742e+05
q1: 4777
q3: 3.193e+04
iqr: 2.715e+04
skew: 13.29
kurtosis: 295.2
n_outliers: 459
outlier_rate: 0.1425
zero_rate: 0

unemployed

numeric feature high_skew outliers

This is a numeric count of unemployed persons per county, with 3,222 rows, no nulls, and 1,859 unique values. The distribution is severely right-skewed (skew 16.82, kurtosis 450.4): the median is 589 while the mean is 2,827 and the max reaches 365,544, and 417 rows (12.9%) flag as outliers. Only 0.56% of records are zero, so sparsity is not the issue; a few extreme values are. Treatment: apply log1p (a plain log is undefined for the zero rows) or winsorize before any modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 1,859
min: 0
max: 365,544
mean: 2827
median: 589
std: 1.083e+04
q1: 223
q3: 1706
iqr: 1482
skew: 16.82
kurtosis: 450.4
n_outliers: 417
outlier_rate: 0.1294
zero_rate: 0.005587
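
Because this column has a minimum of 0, a plain logarithm would fail on some rows. Two common alternatives are sketched below: log1p, which is defined at zero, and a simple winsorizer that clips to empirical quantiles. Both helpers are illustrative; the quantile rule used here (floor index, no interpolation) is one of several conventions and is an assumption.

```python
import math


def log1p_counts(values):
    """log(1 + x): defined at zero, close to log(x) for large counts."""
    return [math.log1p(v) for v in values]


def winsorize(values, lower_q=0.0, upper_q=0.99):
    """Clip values to empirical quantiles picked by floor index."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


# min, q1, median, q3 and max of unemployed from the stats block above
summary = [0, 223, 589, 1706, 365544]
print([round(x, 2) for x in log1p_counts(summary)])
print(winsorize(summary, upper_q=0.75))
```

log1p keeps the ordering and every row; winsorizing keeps the original units but flattens the largest counties to a single value, so the choice depends on whether those counties matter to the analysis.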

labor_force_participation_rate

numeric feature

Numeric column capturing labor force participation rate, almost certainly expressed as a percentage given the 18.63 to 84.04 range and mean of 57.89. The distribution is moderately left-skewed (-0.58) with a tight IQR of 10.695 around a median of 58.72, and only 1.18% of values flagged as outliers. No nulls or zeros across 3,222 rows, and 1,944 unique values suggest a continuous measurement rather than a coded category. Treatment: Use as-is for modelling; optionally standardize since values are bounded percentages. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 1,944
min: 18.63
max: 84.04
mean: 57.89
median: 58.72
std: 8.041
q1: 52.97
q3: 63.66
iqr: 10.7
skew: -0.5766
kurtosis: 0.4502
n_outliers: 38
outlier_rate: 0.01179
zero_rate: 0
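
If the optional standardization is applied, a z-score using the reported mean and standard deviation is the simplest form. The sketch below hard-codes those two figures from the stats block; in practice they would be computed from the training split, which is an assumption about the downstream workflow.

```python
MEAN = 57.89  # mean from the stats block above
STD = 8.041   # std from the stats block above


def zscore(x, mean=MEAN, std=STD):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std


# reported min and max of labor_force_participation_rate
print(round(zscore(18.63), 2))
print(round(zscore(84.04), 2))
```

The minimum sits almost five standard deviations below the mean while the maximum is a little over three above it, which matches the mild left skew reported for this column.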

unemployment_rate

numeric feature high_skew

This is the county-level unemployment rate expressed as a percentage, with values from 0.0 to 31.99 and a median of 4.69. The distribution is heavily right-skewed (skew 2.55, kurtosis 12.81) with 154 outliers (4.78%) pulling the mean (5.13) above the median, and a small share of zero readings (0.56%). Treatment: winsorize, or apply log1p rather than a plain log because of the zero rows, before regression to tame the right tail. high · anthropic:claude-opus-4-7

n: 3,222
nulls: 0 (0.0%)
unique: 950
min: 0
max: 31.99
mean: 5.127
median: 4.69
std: 2.926
q1: 3.42
q3: 6.08
iqr: 2.66
skew: 2.545
kurtosis: 12.81
n_outliers: 154
outlier_rate: 0.0478
zero_rate: 0.005587
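
The profile does not say which rule produces n_outliers. The reported counts are at least consistent with Tukey's 1.5 × IQR fences, which is the assumption behind the sketch below. For unemployment_rate the upper fence lands near 10.07%; the histogram puts 145 rows at or above 10.4% plus some of the 36 rows in the 9.597 to 10.4 bin, which brackets the 154 flagged rows.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x falls outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return x < low or x > high


# q1 and q3 of unemployment_rate from the stats block above
low, high = tukey_fences(3.42, 6.08)
print(f"fences: {low:.2f} to {high:.2f}")
print(is_outlier(31.99, 3.42, 6.08))
print(is_outlier(4.69, 3.42, 6.08))
```

The lower fence is negative, so under this rule no low unemployment rate can be flagged; all 154 outliers would sit in the right tail.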