saturn·

education education by county

source /home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv · 3,222 rows · 6 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 US county-level education records with 6 columns: county identifiers (county_name, fips, state), two educational attainment metrics (pct_hs_or_higher, pct_bachelors_or_higher), and their population base (total_25_plus). The bachelor's degree rate has an unweighted county mean of 23.5% but ranges from 0% to nearly 79%, suggesting wide regional disparities worth investigating. The total_25_plus population column is heavily skewed (skew=13.5) with 440 outliers and a max of nearly 6.9 million, so any analysis using it should consider a log transform or per-capita normalization. The state column has 52 distinct codes with a fairly even spread; TX, GA, and VA contribute the most counties.

citing: row_count · column_count · columns.pct_bachelors_or_higher.stats · columns.pct_hs_or_higher.stats · columns.total_25_plus.stats · columns.state.top_values · columns.county_name.top_values

Schema

6 columns
Per-column summary.

column                   type         nulls  unique  alerts
fips                     numeric      0.0%   3,222
county_name              text         0.0%   1,960   short_text, duplicates
state                    categorical  0.0%   52
total_25_plus            numeric      0.0%   3,140   high_skew, outliers
pct_hs_or_higher         numeric      0.0%   1,612
pct_bachelors_or_higher  numeric      0.0%   1,982

fips

numeric identifier
This column holds 5-digit U.S. county FIPS codes (a 2-digit state prefix followed by a 3-digit county code) stored as integers: all 3,222 values are unique, there are no nulls, and values span 1001 to 72153. The distribution is roughly symmetric (skew 0.16, kurtosis -0.63) with no outliers, which is expected for an enumerated geographic key rather than a measured quantity. Treat the numeric stats as incidental; these are categorical identifiers, not magnitudes. Treatment: Cast to a zero-padded 5-character string (1001 becomes 01001) and use as a join key to county/state geographies; do not model as a number. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
3,222
min
1,001
max
72,153
mean
3.138e+04
median
30,022
std
1.63e+04
q1
1.903e+04
q3
4.61e+04
iqr
27,075
skew
0.1574
kurtosis
-0.6314
n_outliers
0
outlier_rate
0
zero_rate
0
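
The treatment suggested for fips can be sketched in a few lines. This is a minimal illustration, assuming the codes were read as integers and lost their leading zeros; the helper name `to_fips5` is hypothetical and not part of the profiler.

```python
def to_fips5(code: int) -> str:
    """Return a county FIPS code as a zero-padded 5-character string."""
    return str(code).zfill(5)

# Minimum and maximum fips values reported in the profile.
fips_min = to_fips5(1001)
fips_max = to_fips5(72153)

# The first two characters are the state (or territory) prefix.
state_prefix_min = fips_min[:2]
state_prefix_max = fips_max[:2]

print(fips_min, state_prefix_min)
print(fips_max, state_prefix_max)
```

Keeping the key as a string preserves the leading zero, so joins against reference tables that store FIPS codes as text will match.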

county_name

text metadata short_text duplicates
This is a US county-level place-name field: 2-word entries averaging 14 characters, with 'County' appearing 2,999 times alongside 'Parish' (64, Louisiana) and 'Municipio' (78, Puerto Rico). Duplication is heavy at 39.2% (1,262 rows, i.e. 3,222 rows minus 1,960 distinct names) because common names like Washington County (30), Jefferson County (25), and Franklin County (24) recur across states, so this column is not unique on its own and needs a state qualifier to act as a key. Treatment: Concatenate with a state/territory code before using as a join key; do not treat as unique. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
1,960
len_min
10
len_max
46
len_mean
14.17
len_median
14
len_p95
18
word_mean
2.083
word_median
2
n_empty
0
n_duplicates
1,262
duplicate_rate
0.3917
vocab_size
1,963
readability_flesch_mean
33.36
emoji_rate
0
url_rate
0
one_word_rate
0
allcaps_rate
0
boilerplate_rate
0
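
The composite-key treatment for county_name can be illustrated as follows. The rows below are hypothetical examples, not records from the dataset, and the `|` separator is an arbitrary choice. The duplicate count is computed as rows minus distinct values, which is what the reported n_duplicates appears to be (3,222 - 1,960 = 1,262).

```python
from collections import Counter

# Hypothetical (county_name, state) rows, for illustration only.
rows = [
    ("Washington County", "AL"),
    ("Washington County", "GA"),
    ("Jefferson County", "AL"),
    ("Franklin County", "GA"),
]

# Count occurrences of the bare name and of the name qualified by state.
name_counts = Counter(name for name, _ in rows)
key_counts = Counter(f"{name}|{state}" for name, state in rows)

# Duplicates = rows beyond the first occurrence of each distinct value.
dup_by_name = len(rows) - len(name_counts)
dup_by_key = len(rows) - len(key_counts)

print(dup_by_name, dup_by_key)
```

Where fips is available it is the safer key, since the profile reports it as unique across all rows.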

state

categorical feature
This is a US state code column with 52 distinct values, matching the 50 states plus, most likely, DC and Puerto Rico (county_name includes Municipio entries and fips values reach 72153). Distribution is fairly even (entropy ratio 0.93), with TX leading at 254 rows (7.88%) followed by GA, VA, and KY, consistent with a county-level table in which states with more counties contribute more rows. No nulls. Treatment: one-hot or target-encode for modelling; usable as a join key to state-level reference tables. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
52
top_value
TX
top_rate
0.07883
cardinality
52
entropy
5.314
entropy_ratio
0.9322
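
The entropy_ratio for state can be checked from the other reported figures. This sketch assumes the profiler measures entropy in bits (base 2) and normalizes by the maximum entropy for the observed cardinality; the report does not state this, but the arithmetic reproduces the reported value.

```python
import math

# Values reported in the state column stats.
n_rows = 3222
cardinality = 52
entropy_bits = 5.314
top_rate = 0.07883

# Maximum entropy occurs when all 52 codes are equally frequent.
max_entropy = math.log2(cardinality)

# Assumed definition: observed entropy divided by maximum entropy.
entropy_ratio = entropy_bits / max_entropy

# Row count implied by the top value's share of rows.
top_count = round(top_rate * n_rows)

print(round(entropy_ratio, 4), top_count)
```

A ratio near 1 means rows are spread close to evenly across codes; the implied top count matches the 254 TX rows cited in the description.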

total_25_plus

numeric feature high_skew outliers
A heavily right-skewed count of the population aged 25 and over in each county, ranging from 50 to 6,909,650 with a median of 18,313.5 but a mean of 71,074.3. Skew of 13.5 and kurtosis of 306.9 indicate a long heavy tail, and 440 rows (13.7%) flag as outliers. No nulls or zeros, and 3,140 of 3,222 values are unique, consistent with geographic aggregates of varying size. Treatment: log-transform before any regression or distance-based modelling. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
3,140
min
50
max
6.91e+06
mean
7.107e+04
median
1.831e+04
std
2.266e+05
q1
7696
q3
4.649e+04
iqr
3.879e+04
skew
13.51
kurtosis
306.9
n_outliers
440
outlier_rate
0.1366
zero_rate
0
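
The effect of the log transform suggested for total_25_plus can be sketched on the reported five-number summary. The five values stand in for the real column purely for illustration (q3 is rounded as shown in the stats), so the skewness figures will not match the profile's; the `skewness` helper is a plain population-moment formula written for this example.

```python
import math

def skewness(values):
    """Population skewness: third central moment over std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# min, q1, median, q3, max from the profile, used as a tiny stand-in sample.
raw = [50, 7696, 18313.5, 46490, 6909650]

# No zeros are reported for this column, so a plain log is defined everywhere.
logged = [math.log10(v) for v in raw]

print(skewness(raw), skewness(logged))
```

On the log scale the values span roughly five orders of magnitude evenly instead of being dominated by the single largest county.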

pct_hs_or_higher

numeric feature
This column reports the percentage of a county's population with at least a high school education, ranging from 33.33 to 99.69 with a mean of 88.08 and median of 89.39. The distribution is left-skewed (skew -1.33) with heavy tails (kurtosis 3.74); 86 low-end outliers (2.67%) fall well below the bulk of values, which sit between Q1 84.9 and Q3 92.47. With 1,612 unique values across 3,222 rows and no nulls, it is a clean county-level feature. Treatment: Use as-is or apply a reflected log/Box-Cox to address the left skew before linear modelling. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
1,612
min
33.33
max
99.69
mean
88.08
median
89.39
std
5.97
q1
84.9
q3
92.47
iqr
7.567
skew
-1.328
kurtosis
3.742
n_outliers
86
outlier_rate
0.02669
zero_rate
0
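
A reflected log, one of the treatments suggested for pct_hs_or_higher, flips the left skew into a right skew and then compresses it. The sketch below uses one common form, log(max + 1 - x), applied to the reported five-number summary as a stand-in sample; the function name and the choice of offset are assumptions for illustration.

```python
import math

def reflected_log(x, x_max):
    """Reflect x around the maximum, then take the natural log."""
    return math.log(x_max + 1 - x)

# min, q1, median, q3, max from the profile, used as a stand-in sample.
summary = [33.33, 84.9, 89.39, 92.47, 99.69]
x_max = max(summary)

transformed = [reflected_log(x, x_max) for x in summary]

print(transformed)
```

The reflection reverses the ordering, so a larger transformed value means a lower high-school completion rate; coefficient signs in any downstream model need to be read with that in mind.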

pct_bachelors_or_higher

numeric feature
This column reports the percentage of residents with a bachelor's degree or higher across 3,222 rows, ranging from 0.0 to 78.87 with a median of 21.07 and mean of 23.50. The distribution is right-skewed (skew 1.36, kurtosis 2.31) with 141 high-end outliers (4.4%) reflecting a long tail of highly-educated areas. The zero_rate of 0.0003 corresponds to a single row with a value of exactly 0, and there are no nulls, so coverage is clean. Treatment: Consider a sqrt or log(1 + x) transform to tame the right skew before linear modelling; a plain log is undefined for the zero value. high · anthropic:claude-opus-4-7
n
3,222
nulls
0 (0.0%)
unique
1,982
min
0
max
78.87
mean
23.5
median
21.07
std
9.983
q1
16.59
q3
27.85
iqr
11.26
skew
1.357
kurtosis
2.306
n_outliers
141
outlier_rate
0.04376
zero_rate
0.0003104
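
The zero_rate for pct_bachelors_or_higher can be converted back to a row count, and the two zero-safe transforms compared, with the short sketch below. The five-number summary is used as a stand-in sample; it is not the real column.

```python
import math

# Values reported in the column stats.
n_rows = 3222
zero_rate = 0.0003104

# Number of rows whose value is exactly zero.
zero_rows = round(zero_rate * n_rows)

# min, q1, median, q3, max from the profile, used as a stand-in sample.
summary = [0.0, 16.59, 21.07, 27.85, 78.87]

# Both transforms are defined at zero, unlike a plain log.
sqrt_vals = [math.sqrt(v) for v in summary]
log1p_vals = [math.log1p(v) for v in summary]

print(zero_rows, sqrt_vals, log1p_vals)
```

The square root is the milder of the two; log1p compresses the upper tail more strongly, which may suit a column whose skew is driven by a handful of very high values.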