saturn

/home/coolhand/datasets/us-inequality-atlas/economic/unemployment_by_county.csv | 3,222 rows | sample n=3,222 | seed 42 | 2026-05-01T17:08:56+00:00

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/economic/unemployment_by_county.csv
Total rows: 3,222
Profiled sample: 3,222
Columns: 8
Generated: 2026-05-01T17:08:56+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 3,222 US county-level labor market records with 8 columns: county identifiers (fips, county_name, state) and workforce statistics (total_16_plus, labor_force, unemployed, labor_force_participation_rate, unemployment_rate). The unemployment rate averages 5.13% with a median of 4.69% and a maximum of 31.99%, so the right tail is worth inspecting for distressed counties. The population-based counts (labor_force, total_16_plus, unemployed) are extremely right-skewed (skew > 13) with 417 to 459 flagged outliers each. That is expected when a few large metros sit alongside small rural counties, but it suggests a log transform before modeling; because unemployed contains zeros, log(1 + x) is safer than a plain log. Texas (254), Georgia (159), and Virginia (133) contribute the most counties, reflecting how those states are subdivided rather than any sampling bias. County names show a 39% duplicate rate driven by repeated names such as Washington, Jefferson, and Franklin County across states, so join on fips, not county_name.
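A minimal sketch of the log transform suggested above, using only the standard library. The function name `log1p_transform` and the five counts are made up for illustration; they are not rows from the file.

```python
import math
import statistics


def log1p_transform(values):
    """Return log(1 + x) for each non-negative count (zero maps to 0.0)."""
    return [math.log1p(v) for v in values]


# Illustrative counts only (hypothetical, not taken from the dataset).
counts = [0, 36, 589, 11608, 365544]
logged = log1p_transform(counts)

# On the raw scale the mean sits far above the median; after the
# transform the two are close, which is the point of the exercise.
raw_ratio = statistics.mean(counts) / statistics.median(counts)
log_ratio = statistics.mean(logged) / statistics.median(logged)
print(f"mean/median raw: {raw_ratio:.1f}, logged: {log_ratio:.2f}")
```

`math.expm1` inverts the transform if predictions need to be reported back on the original count scale.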

fips high anthropic:claude-opus-4-7

This is the US county FIPS code: all 3,222 values are unique with no nulls, and they span 1001 to 72153. The column is stored as a number, so codes for states 01 through 09 have lost their leading zero (1001 is the 5-digit code 01001); zero-pad to five characters before joining against sources that store FIPS as text. The summary statistics (skew 0.16, kurtosis -0.63, no outliers) describe a roughly even spread of codes across that range, which is expected for an identifier and carries no analytic meaning.
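Because the column is numeric, a join against a source that stores FIPS as text needs the codes restored to five characters. A small sketch, with the hypothetical helper names `normalize_fips` and `state_prefix`:

```python
def normalize_fips(code):
    """Convert a numeric or string county FIPS code to 5-character text."""
    number = int(code)
    text = str(number).zfill(5)
    if number < 0 or len(text) != 5:
        raise ValueError(f"not a county FIPS code: {code!r}")
    return text


def state_prefix(code):
    """Return the 2-digit state part of a county FIPS code."""
    return normalize_fips(code)[:2]


# The minimum and maximum values reported for the fips column.
print(normalize_fips(1001), normalize_fips(72153))
```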

county_name high anthropic:claude-opus-4-7

This column holds US county-level place names: 3,222 rows with 1,960 unique values, each between 10 and 46 characters long and averaging about 2 words. The vocabulary is dominated by 'county' (2,999 occurrences) but also includes 'municipio' (78, Puerto Rico), 'parish' (64, Louisiana), and 'city' (47), so the field mixes several jurisdiction types. The duplicate rate is 39.2% (1,262 of 3,222 rows): recurring names such as Washington County (30), Jefferson County (25), and Franklin County (24) appear in many states, so the name alone does not uniquely identify a county.
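The reported figures are consistent with a duplicate rate defined as (rows - unique) / rows, since 3,222 - 1,960 = 1,262. That definition is an assumption about how saturn computes the metric, not something the report states. A sketch with a hypothetical `duplicate_rate` helper and made-up names:

```python
def duplicate_rate(values):
    """Share of rows that repeat an earlier value: (rows - unique) / rows."""
    if not values:
        return 0.0
    return (len(values) - len(set(values))) / len(values)


# Illustrative names only (hypothetical sample, not rows from the file).
names = ["Washington County", "Washington County", "Bibb County", "Jefferson County"]
print(duplicate_rate(names))

# The same arithmetic applied to the counts in the report.
reported = (3222 - 1960) / 3222
print(round(reported, 3))
```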

state high anthropic:claude-opus-4-7

Two-letter US state/territory codes across 3,222 rows with 52 distinct values and no nulls, consistent with the 50 states plus DC and Puerto Rico (PR appears in the top values with 78 rows). The distribution tracks county counts rather than population: TX leads at 254 (7.88%), followed by GA (159), VA (133), and KY (120), which suggests one row per county or county equivalent. The entropy ratio of 0.93 (5.314 bits against a maximum of log2(52), about 5.70) indicates that rows are spread fairly evenly across states despite the large gap between TX and the smallest entries.
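The entropy figures can be reproduced under the assumption that saturn reports Shannon entropy in bits and normalizes it by log2 of the cardinality; the reported 5.314 and 0.932 fit that reading. The helper names below are hypothetical:

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0
    return entropy_bits(counts) / math.log2(k)


# Check against the values in the report: 5.314 bits over 52 categories.
print(round(5.314 / math.log2(52), 3))
```

A ratio of 1.0 means every category has the same count; a single dominant category pushes it toward 0.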

total_16_plus high anthropic:claude-opus-4-7

This is a count of people aged 16 and over per county, with 3,222 non-null rows and 3,148 unique values spanning 50 to 8,086,852. The distribution is severely right-skewed (skew 13.49, kurtosis 305.88): the median is 21,167.5 but the mean is 83,549.93, and 13.7% of rows (443) are flagged as outliers. The standard deviation of 265,514 dwarfs the IQR of 45,507.75, consistent with the long upper tail typical of geographic aggregates.
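The outlier counts appear to follow the conventional 1.5 × IQR rule, going by the "rows beyond 1.5 IQR" label in the stats for this column; that reading is an assumption. A sketch with a hypothetical `tukey_fences` helper, fed the rounded quartiles from the report:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Rounded q1 and q3 for total_16_plus as listed in the report.
lower, upper = tukey_fences(8986, 54493)
print(lower, upper)
```

Under that assumption the lower fence is negative, below the column minimum of 50, so every flagged row would sit in the upper tail.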

labor_force high anthropic:claude-opus-4-7

This column is the size of the labor force per county, with 3,222 rows and 3,099 unique values. The distribution is severely right-skewed (skew 13.29, kurtosis 295.22), with a median of 11,608.5 against a maximum of 5,240,842, and 14.2% of values (459) are flagged as outliers. There are no nulls or zeros, but the gap between Q3 (31,930.5) and the maximum signals a long tail of very large jurisdictions.

unemployed high anthropic:claude-opus-4-7

This is a count of unemployed persons per county, with 3,222 rows, no nulls, and 1,859 unique values. The distribution is severely right-skewed (skew 16.82, kurtosis 450.4): the median is 589 while the mean is 2,827 and the maximum reaches 365,544, and 417 rows (12.9%) are flagged as outliers. Only 0.56% of records (about 18 rows) are zero, so sparsity is not the issue; a few extreme values are.

labor_force_participation_rate high anthropic:claude-opus-4-7

Numeric column capturing labor force participation rate, almost certainly expressed as a percentage given the 18.63 to 84.04 range and mean of 57.89. The distribution is moderately left-skewed (-0.58) with a tight IQR of 10.695 around a median of 58.72, and only 1.18% of values flagged as outliers. No nulls or zeros across 3,222 rows, and 1,944 unique values suggest a continuous measurement rather than a coded category.

unemployment_rate high anthropic:claude-opus-4-7

This is the county-level unemployment rate expressed as a percentage, with values from 0.0 to 31.99 and a median of 4.69. The distribution is heavily right-skewed (skew 2.55, kurtosis 12.81), with 154 outliers (4.78%) pulling the mean (5.13) above the median, and a small share of zero readings (0.56%, the same share as in unemployed).
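One way to sanity-check the two rate columns against the three count columns is to recompute them from the standard definitions: unemployment rate = unemployed / labor force × 100, and participation rate = labor force / population 16+ × 100. Whether this file was built with exactly those formulas is an assumption, and the helper names and sample figures below are made up for illustration.

```python
def recompute_rates(total_16_plus, labor_force, unemployed):
    """Recompute both rates (in percent) from the raw counts."""
    if total_16_plus <= 0 or labor_force <= 0:
        raise ValueError("counts must be positive to compute a rate")
    return {
        "unemployment_rate": 100 * unemployed / labor_force,
        "labor_force_participation_rate": 100 * labor_force / total_16_plus,
    }


def rates_consistent(row, tolerance=0.01):
    """True if the stored rates match the recomputed ones within tolerance."""
    expected = recompute_rates(
        row["total_16_plus"], row["labor_force"], row["unemployed"]
    )
    return all(abs(row[name] - value) <= tolerance for name, value in expected.items())


# Hypothetical row for illustration (not taken from the dataset).
row = {
    "total_16_plus": 20000,
    "labor_force": 12000,
    "unemployed": 600,
    "unemployment_rate": 5.0,
    "labor_force_participation_rate": 60.0,
}
print(rates_consistent(row))
```

Rows that fail the check would point to rounding, a different denominator, or a data error worth inspecting.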

Numeric correlation

fips numeric

rows: 3,222
null: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 31,378
median: 30,022
std: 16,300
q1: 19,030
q3: 46,104
iqr: 27,075
skew: 0.157
kurtosis: -0.631
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_name text

95th-percentile length under 20 chars | 39.2% duplicate strings
rows: 3,222
null: 0 (0.0%)
unique: 1,960
len_min: 10
len_max: 46
len_mean: 14.172
len_median: 14.000
len_p95: 18.000
word_mean: 2.083
word_median: 2.000
n_empty: 0
n_duplicates: 1,262
duplicate_rate: 0.392
vocab_size: 1,963
readability_flesch_mean: 33.359
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County
  2. Cheatham County
  3. Piute County
  4. Lamb County
  5. Martin County
  6. Sheridan County
  7. Chickasaw County
  8. Rockingham County
  9. Liberty County
  10. Clark County

state categorical

rows: 3,222
null: 0 (0.0%)
unique: 52
top_value: TX
top_rate: 0.079
cardinality: 52
entropy: 5.314
entropy_ratio: 0.932
Top values (rank 1–20)
  1. TX — 254
  2. GA — 159
  3. VA — 133
  4. KY — 120
  5. MO — 115
  6. KS — 105
  7. IL — 102
  8. NC — 100
  9. IA — 99
  10. TN — 95
  11. NE — 93
  12. IN — 92
  13. OH — 88
  14. MN — 87
  15. MI — 83
  16. MS — 82
  17. PR — 78
  18. OK — 77
  19. AR — 75
  20. WI — 72

total_16_plus numeric

skew=+13.49 | 13.7% rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 3,148
min: 50.000
max: 8,086,852
mean: 83,550
median: 21,168
std: 265,514
q1: 8,986
q3: 54,493
iqr: 45,508
skew: 13.494
kurtosis: 305.881
n_outliers: 443
outlier_rate: 0.137
zero_rate: 0.000

labor_force numeric

skew=+13.29 | 14.2% rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 3,099
min: 36.000
max: 5,240,842
mean: 52,869
median: 11,608
std: 174,241
q1: 4,777
q3: 31,930
iqr: 27,153
skew: 13.286
kurtosis: 295.221
n_outliers: 459
outlier_rate: 0.142
zero_rate: 0.000

unemployed numeric

skew=+16.82 | 12.9% rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 1,859
min: 0.000
max: 365,544
mean: 2,827
median: 589.000
std: 10,828
q1: 223.000
q3: 1,706
iqr: 1,482
skew: 16.825
kurtosis: 450.417
n_outliers: 417
outlier_rate: 0.129
zero_rate: 5.59e-03

labor_force_participation_rate numeric

rows: 3,222
null: 0 (0.0%)
unique: 1,944
min: 18.630
max: 84.040
mean: 57.891
median: 58.725
std: 8.041
q1: 52.970
q3: 63.665
iqr: 10.695
skew: -0.577
kurtosis: 0.450
n_outliers: 38
outlier_rate: 0.012
zero_rate: 0.000

unemployment_rate numeric

skew=+2.55
rows: 3,222
null: 0 (0.0%)
unique: 950
min: 0.000
max: 31.990
mean: 5.127
median: 4.690
std: 2.926
q1: 3.420
q3: 6.080
iqr: 2.660
skew: 2.545
kurtosis: 12.812
n_outliers: 154
outlier_rate: 0.048
zero_rate: 5.59e-03