This is the FIPS county/region code, stored numerically, so leading zeros are dropped (01001 appears as 1001). All 3,222 values are unique and there are no nulls. The range (1001 to 72153), the mild skew (0.16), and the absence of outliers are consistent with standard US FIPS codes spanning states and territories.
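Because the code is stored as a number, the conventional five-character form has to be rebuilt before joining against sources that keep FIPS as text. A minimal Python sketch, assuming the usual layout of a two-digit state prefix followed by a three-digit county suffix; the helper names `fips_to_str` and `split_fips` are hypothetical and not part of saturn:

```python
def fips_to_str(code: int) -> str:
    """Render a numeric county FIPS code as a zero-padded 5-character string."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a 5-digit county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int):
    """Split a county FIPS code into (state prefix, county suffix) strings."""
    text = fips_to_str(code)
    return text[:2], text[2:]


# The two endpoints of the range reported for this column.
low = fips_to_str(1001)    # "01001"
high = fips_to_str(72153)  # "72153"
state_part, county_part = split_fips(1001)
```

Keeping the padded string alongside the integer avoids silent mismatches when another table stores the same code as text.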
saturn
/home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv · 3,222 rows · sample n=3,222 · seed 42 · 2026-05-01T17:05:21+00:00
Overview
| Field | Value |
| --- | --- |
| Source | /home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv |
| Total rows | 3,222 |
| Profiled sample | 3,222 |
| Columns | 6 |
| Generated | 2026-05-01T17:05:21+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
This column holds US county-level place names. Most values end in 'County' (2,999 of 3,222, about 93%), with smaller groups of Puerto Rico 'municipio' (78), Louisiana 'parish' (64), and 'city' (47) designations. The 3,222 rows contain only 1,960 unique strings, a 39.2% duplicate rate, because common names such as 'Washington County' (30) and 'Jefferson County' (25) recur across states. Lengths are tight (min 10, max 46, mean 14.2), and the string carries no state qualifier, so the name alone will not uniquely identify a county; it needs the state column alongside it.
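The duplicate rate quoted above is reproduced by `1 - unique / total`, although the report does not say that this is the formula saturn uses. The sketch below checks that arithmetic and shows how pairing the name with the state removes the collisions. The four sample rows are illustrative, not read from the CSV, and `duplicate_rate` is a hypothetical helper:

```python
def duplicate_rate(values):
    """Share of rows that repeat an earlier value: 1 - unique / total."""
    values = list(values)
    return 1 - len(set(values)) / len(values)


# Arithmetic check against the figures in the report (1,960 unique of 3,222).
report_rate = 1 - 1960 / 3222

# Illustrative (county_name, state) pairs, not taken from the dataset.
sample = [
    ("Washington County", "AL"),
    ("Washington County", "AR"),
    ("Jefferson County", "AL"),
    ("Bibb County", "GA"),
]

name_rate = duplicate_rate(name for name, _ in sample)  # names alone collide
key_rate = duplicate_rate(sample)                       # (name, state) pairs do not
```

The same check can be run on the real file to confirm whether the name and state pair is unique before using it as a join key.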
This is a US state code column with 52 distinct values, consistent with the 50 states plus Puerto Rico (PR, 78 rows) and most likely DC. The distribution is fairly even (entropy ratio 0.93): TX leads with 254 rows (7.9%), followed by GA, VA, and KY. That pattern fits a county-level dataset, since Texas has 254 counties. No nulls.
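The report does not define "entropy ratio". One common reading, assumed here, is Shannon entropy divided by its maximum for the number of distinct values, so that 1.0 means perfectly even counts. A short Python sketch of that reading, using made-up counts rather than the full 52-state tally:

```python
import math


def entropy_ratio(counts):
    """Shannon entropy of the counts divided by log(number of categories)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single category carries no uncertainty
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


# Made-up counts: an even split scores 1.0, a lopsided one scores lower.
uniform = entropy_ratio([10, 10, 10, 10])
lopsided = entropy_ratio([254, 3, 3, 1])

# TX share of rows, from the figures in the report.
tx_share = 254 / 3222
```

Under this definition a value of 0.93 would mean the state counts are close to even, which is plausible when the largest state holds under 8% of rows.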
A heavily right-skewed count of the population aged 25 and over, ranging from 50 to 6,909,650 with a median of 18,313.5 but a mean of 71,074.3. A skew of 13.51 and kurtosis of 306.9 indicate an extremely heavy right tail, and 440 rows (13.66%) are flagged as outliers. The standard deviation of 226,586 dwarfs the IQR of 38,789.5, consistent with a few very large geographies dominating.
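The report does not state which rule saturn uses to flag outliers. As an illustration only, the sketch below applies the common Tukey fence (1.5 × IQR beyond the quartiles) to a handful of made-up counts and shows how a log scale compresses the reported range. The function `tukey_outliers` and the sample values are assumptions, not saturn's method or data:

```python
import math
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Made-up counts with one very large geography, mimicking the shape described.
values = [50, 4_000, 9_500, 18_000, 19_000, 31_000, 52_000, 6_909_650]
flagged = tukey_outliers(values)

# On a log10 scale the reported min and max span about five orders of magnitude.
log_min = math.log10(50)
log_max = math.log10(6_909_650)
```

With a tail this heavy, a fence rule on the raw scale flags a large share of rows, which is why a log transform is often applied before plotting or modeling.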
This column appears to capture the percentage of adults with at least a high-school education, likely at a county or similar geographic level given the 3222 rows. Values are tightly concentrated in the high 80s (median 89.39, IQR 7.57) and left-skewed (skew -1.33, kurtosis 3.74), with a long lower tail reaching 33.33 and 86 outliers (2.7%) flagged. No nulls or zeros, and 1612 unique values suggest reasonable granularity.
This column reports the percent of adults with a bachelor's degree or higher across 3,222 rows, ranging from 0.0 to 78.87 with a mean of 23.50 and median of 21.07. The distribution is right-skewed (skew 1.36, kurtosis 2.31) with 141 high-end outliers (4.4%) reflecting a long tail of highly-educated areas. Nulls are absent and only one row sits at zero, so coverage is effectively complete.
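The opposite skew signs of the two percentage columns can be read from the sign of the third standardized moment. The sketch below uses the plain moment-based estimator. Saturn may apply a bias-corrected variant, so this is an assumption, and the two sample lists are made up to show the sign convention only:

```python
import statistics


def skewness(xs):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    return m3 / m2 ** 1.5


# Made-up samples: a long upper tail and its mirror image.
right_tailed = [1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]

right_skew = skewness(right_tailed)  # positive
left_skew = skewness(left_tailed)    # negative

# A mean above the median is the usual companion of right skew.
mean_above_median = statistics.mean(right_tailed) > statistics.median(right_tailed)
```

The bachelor's column shows the same pattern as the first sample: its mean (23.50) sits above its median (21.07).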
Errors during insight pass (1)
dataset:__global__:anthropic:claude-opus-4-7: Json5EOF — ("Unclosed b'array' starting near 1663", {'narrative': "This dataset covers educational attainment for 3,222 US counties, with six columns spanning identifiers (FIPS, county name, state) and three numeric measures: total population aged 25+, percent with a high school diploma or higher, and percent with a bachelor's degree or higher. The county-level population (total_25_plus) is extremely skewed (skew 13.5, max ~6.9M vs. median ~18.3K) with 440 outliers, so any analysis using it should consider a log transform or per-capita framing. Educational attainment is the more interesting signal: pct_hs_or_higher clusters tightly near 88% (left-skewed), while pct_bachelors_or_higher is right-skewed with a mean of ~23.5% and a long tail up to ~79%, pointing to large disparities in higher-ed attainment across counties. Texas, Georgia, and Virginia contribute the most counties, and 'Washington County' is the most common name (30 occurrences), reflecting the ~39% duplicate rate in county_name where state context is needed for uniqueness.", 'confidence': 'high', 'evidence_keys': ['row_count', 'column_count', 'columns.county_name.stats.duplicate_rate', 'columns.county_name.top_values', 'columns.total_25_plus.stats.skew', 'columns.total_25_plus.stats.median', 'columns.total_25_plus.stats.max', 'columns.total_25_plus.stats.n_outliers', 'columns.pct_bachelors_or_higher.stats.mean', 'columns.pct_bachelors_or_higher.stats.max', 'columns.pct_bachelors_or_higher.stats.skew', 'columns.pct_hs_or_higher.stats.mean', 'columns.pct_hs_or_higher.stats.skew', 'columns.state.top_values', 'columns.state.n_unique'], 'featured_charts': [{'column': 'pct_bachelors_or_higher', 'kind': 'histogram', 'caption': 'Right-skewed distribution showing most counties below 28% but a long tail of highly educated counties up to ~79%.'}, {'column': 'pct_hs_or_higher', 'kind': 'histogram', 'caption': 'Left-skewed distribution clustered near 88-92%, with a tail 
of counties as low as 33% worth flagging.'}, {'column': 'total_25_plus', 'kind': 'histogram', 'caption': 'Extreme right skew (skew ~13.5) with 440 outliers — consider a log scale before plotting.'}, {'column': 'state', 'kind': 'bar', 'caption': 'County counts per state, with TX (254), GA (159), and VA (133) contributing the most rows.'}, {'column': 'county_name', 'kind': 'bar', 'caption': "Most common county names like Washington, Jefferson, and Franklin — a reminder that names alone aren't unique keys."}]}, None)
Numeric correlation
fips numeric
county_name text
Sample values (first 10)
- Bibb County
- Cheatham County
- Piute County
- Lamb County
- Martin County
- Sheridan County
- Chickasaw County
- Rockingham County
- Liberty County
- Clark County
state categorical
Top values (rank 1–20)
- TX — 254
- GA — 159
- VA — 133
- KY — 120
- MO — 115
- KS — 105
- IL — 102
- NC — 100
- IA — 99
- TN — 95
- NE — 93
- IN — 92
- OH — 88
- MN — 87
- MI — 83
- MS — 82
- PR — 78
- OK — 77
- AR — 75
- WI — 72