saturn

/home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv · 3,222 rows · sample n=3,222 · seed 42 · 2026-05-01T17:05:21+00:00

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv
Total rows: 3,222
Profiled sample: 3,222
Columns: 6
Generated: 2026-05-01T17:05:21+00:00

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

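fips (confidence: high; model: anthropic:claude-opus-4-7)

This is the five-digit FIPS county code (a two-digit state code followed by a three-digit county code), stored as an integer, so codes for states numbered below 10 lose their leading zero (1001 rather than 01001). All 3,222 values are unique and non-null, and the range (1001 to 72153), the mild skew (0.16), and the absence of outliers are consistent with standard FIPS numbering across states and territories; codes beginning with 72 belong to Puerto Rico.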
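Because the column is stored as an integer, restoring the canonical five-digit string is usually the first step before joining to other FIPS-keyed tables. A minimal sketch, assuming the values are plain integers as profiled; `split_fips` is a hypothetical helper, not part of saturn:

```python
def split_fips(fips: int) -> tuple[str, str, str]:
    """Return (zero-padded 5-digit code, 2-digit state code, 3-digit county code).

    Hypothetical helper: assumes `fips` is an integer county code as profiled.
    """
    if not 1000 <= fips <= 99999:
        raise ValueError(f"not a 5-digit county FIPS code: {fips}")
    code = f"{fips:05d}"  # restore the leading zero lost by numeric storage
    return code, code[:2], code[2:]


# Values taken from the min and max reported for this column.
print(split_fips(1001))   # ('01001', '01', '001')
print(split_fips(72153))  # ('72153', '72', '153')
```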

county_name (confidence: high; model: anthropic:claude-opus-4-7)

This column holds US county-level place names. Most values end in 'County' (2999 of 3222), with smaller groups of Louisiana 'parish' (64), Puerto Rico 'municipio' (78), and 'city' (47) designations. Although there are 3222 rows, there are only 1960 unique strings (a 39.2% duplicate rate), because common names such as 'Washington County' (30) and 'Jefferson County' (25) recur across states. Most names are short (median length 14, 95th percentile 18, mean 14.2), though lengths range from 10 to 46 characters. The string carries no state qualifier, so the name alone does not uniquely identify a county; it needs to be paired with state, or replaced by fips, when used as a key.
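One way to check whether (county_name, state) is actually unique in a given file before relying on it as a key. This is a sketch: `name_key_report` is a hypothetical helper and the toy rows are illustrative, not taken from the CSV; fips remains the safer join key.

```python
from collections import Counter


def name_key_report(rows):
    """Compare uniqueness of county_name alone vs. the (county_name, state) pair.

    Returns two dicts mapping each repeated key to its count.
    """
    names = Counter(r["county_name"] for r in rows)
    pairs = Counter((r["county_name"], r["state"]) for r in rows)
    repeated_names = {n: c for n, c in names.items() if c > 1}
    repeated_pairs = {p: c for p, c in pairs.items() if c > 1}
    return repeated_names, repeated_pairs


# Hypothetical toy rows, not drawn from the file.
rows = [
    {"county_name": "Washington County", "state": "PA"},
    {"county_name": "Washington County", "state": "OR"},
    {"county_name": "Jefferson County", "state": "KY"},
]
names, pairs = name_key_report(rows)
print(names)  # {'Washington County': 2}
print(pairs)  # {}
```

Against the real file, `rows` could come from `csv.DictReader`, which yields dicts keyed by the header names.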

state (confidence: high; model: anthropic:claude-opus-4-7)

This is a two-letter US state code column with 52 distinct values. PR appears among the top values, so the set is most likely the 50 states plus DC and Puerto Rico. The distribution is fairly even (entropy ratio 0.93), with TX leading at 254 rows (7.9%), followed by GA, VA, and KY. That ordering fits a county-level dataset, since Texas has the most counties of any state (254). No nulls.
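The reported entropy and entropy ratio are consistent with Shannon entropy in bits, normalized by its maximum possible value, log2 of the number of distinct values (5.314 / log2(52) ≈ 0.932). saturn's exact definition is not shown in the export, so treat this as an assumed reconstruction; `entropy_ratio` is a hypothetical helper:

```python
import math
from collections import Counter


def entropy_ratio(values):
    """Return (Shannon entropy in bits, entropy / log2(k)) for k distinct values."""
    counts = Counter(values)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    k = len(counts)
    ratio = h / math.log2(k) if k > 1 else 0.0  # a single category has no spread
    return h, ratio


# Consistency check against the state column's reported figures.
print(round(5.314 / math.log2(52), 3))  # 0.932

# Toy example: TX twice, GA and VA once each (h = 1.5 bits).
print(entropy_ratio(["TX", "TX", "GA", "VA"]))
```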

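total_25_plus (confidence: high; model: anthropic:claude-opus-4-7)

A heavily right-skewed count, presumably each county's population aged 25 and over, ranging from 50 to 6,909,650 with a median of 18,313.5 but a mean of 71,074.3. A skew of 13.51 and kurtosis of 306.9 indicate extreme tail behavior, and 440 rows (13.66%) are flagged as outliers. Because the lower 1.5 × IQR fence (7,696 - 1.5 × 38,789.5) is negative, every flagged row lies in the upper tail, above roughly 104,700. The standard deviation (226,586) dwarfs the IQR (38,789.5), consistent with a few very populous counties dominating.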
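A sketch of the 1.5 × IQR (Tukey) rule behind the outlier flag, plus a log10 view of the range. The quartile method saturn uses is not stated; `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as NumPy's default, which is an assumption about how the flags were computed. `iqr_fences` and `count_outliers` are hypothetical helpers:

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Tukey fences from linearly interpolated quartiles (same rule as NumPy's default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, k=1.5):
    """Return (count below the lower fence, count above the upper fence)."""
    lo, hi = iqr_fences(values, k)
    below = sum(v < lo for v in values)
    above = sum(v > hi for v in values)
    return below, above


# Fences from the quartiles reported for total_25_plus (rounded in the export).
q1, q3 = 7_696, 46_485
iqr = q3 - q1
print(q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # -50487.5 104668.5

# log10 compresses the reported 50..6,909,650 range to roughly 1.7..6.84.
print(round(math.log10(50), 2), round(math.log10(6_909_650), 2))  # 1.7 6.84
```

Since the smallest value is 50, a plain log is safe here; a column containing zeros would need `math.log1p` or an offset instead.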

pct_hs_or_higher (confidence: high; model: anthropic:claude-opus-4-7)

This column appears to give the percentage of adults (presumably the same 25-and-over population counted in total_25_plus) with at least a high-school education, most likely per county given the 3222 rows. Values cluster in the mid-80s to low 90s (median 89.39; middle half between 84.90 and 92.47, IQR 7.57) and are left-skewed (skew -1.33, kurtosis 3.74), with a long lower tail reaching 33.33. All 86 flagged outliers (2.7%) sit in that lower tail, because the upper fence (Q3 + 1.5 × IQR, about 103.8) lies above 100%. There are no nulls or zeros, and 1612 unique values suggest reasonable granularity.

pct_bachelors_or_higher (confidence: high; model: anthropic:claude-opus-4-7)

This column reports the percent of adults with a bachelor's degree or higher across 3,222 rows, ranging from 0.0 to 78.87 with a mean of 23.50 and median of 21.07. The distribution is right-skewed (skew 1.36, kurtosis 2.31) with 141 high-end outliers (4.4%) reflecting a long tail of highly-educated areas. Nulls are absent and only one row sits at zero, so coverage is effectively complete.
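The mean of 23.50 is an average over counties, each counted once. If the percentage is a share of total_25_plus (an assumption; the export does not state the denominator), weighting by that column gives the share of the 25-and-over population instead, which can differ sharply given how skewed total_25_plus is. A sketch with hypothetical toy rows and helper names:

```python
def unweighted_mean(rows, pct_col):
    """Plain mean over counties: every county counts once."""
    return sum(r[pct_col] for r in rows) / len(rows)


def weighted_share(rows, pct_col, weight_col="total_25_plus"):
    """Population-weighted percentage: sum(w * pct) / sum(w).

    Assumes pct_col is a percentage of weight_col (not confirmed by the export).
    """
    num = sum(r[weight_col] * r[pct_col] for r in rows)
    den = sum(r[weight_col] for r in rows)
    return num / den


# Hypothetical toy rows: one large county and two small ones.
rows = [
    {"total_25_plus": 1_000_000, "pct_bachelors_or_higher": 40.0},
    {"total_25_plus": 10_000, "pct_bachelors_or_higher": 15.0},
    {"total_25_plus": 10_000, "pct_bachelors_or_higher": 14.0},
]
print(unweighted_mean(rows, "pct_bachelors_or_higher"))  # 23.0
print(weighted_share(rows, "pct_bachelors_or_higher"))   # 39.5
```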

Errors during insight pass (1)
  • dataset:__global__:anthropic:claude-opus-4-7: Json5EOF — ("Unclosed b'array' starting near 1663", {'narrative': "This dataset covers educational attainment for 3,222 US counties, with six columns spanning identifiers (FIPS, county name, state) and three numeric measures: total population aged 25+, percent with a high school diploma or higher, and percent with a bachelor's degree or higher. The county-level population (total_25_plus) is extremely skewed (skew 13.5, max ~6.9M vs. median ~18.3K) with 440 outliers, so any analysis using it should consider a log transform or per-capita framing. Educational attainment is the more interesting signal: pct_hs_or_higher clusters tightly near 88% (left-skewed), while pct_bachelors_or_higher is right-skewed with a mean of ~23.5% and a long tail up to ~79%, pointing to large disparities in higher-ed attainment across counties. Texas, Georgia, and Virginia contribute the most counties, and 'Washington County' is the most common name (30 occurrences), reflecting the ~39% duplicate rate in county_name where state context is needed for uniqueness.", 'confidence': 'high', 'evidence_keys': ['row_count', 'column_count', 'columns.county_name.stats.duplicate_rate', 'columns.county_name.top_values', 'columns.total_25_plus.stats.skew', 'columns.total_25_plus.stats.median', 'columns.total_25_plus.stats.max', 'columns.total_25_plus.stats.n_outliers', 'columns.pct_bachelors_or_higher.stats.mean', 'columns.pct_bachelors_or_higher.stats.max', 'columns.pct_bachelors_or_higher.stats.skew', 'columns.pct_hs_or_higher.stats.mean', 'columns.pct_hs_or_higher.stats.skew', 'columns.state.top_values', 'columns.state.n_unique'], 'featured_charts': [{'column': 'pct_bachelors_or_higher', 'kind': 'histogram', 'caption': 'Right-skewed distribution showing most counties below 28% but a long tail of highly educated counties up to ~79%.'}, {'column': 'pct_hs_or_higher', 'kind': 'histogram', 'caption': 'Left-skewed distribution clustered near 88-92%, with a tail of counties as low as 33% worth flagging.'}, {'column': 'total_25_plus', 'kind': 'histogram', 'caption': 'Extreme right skew (skew ~13.5) with 440 outliers — consider a log scale before plotting.'}, {'column': 'state', 'kind': 'bar', 'caption': 'County counts per state, with TX (254), GA (159), and VA (133) contributing the most rows.'}, {'column': 'county_name', 'kind': 'bar', 'caption': "Most common county names like Washington, Jefferson, and Franklin — a reminder that names alone aren't unique keys."}]}, None)

Numeric correlation (chart not included in this text export)

fips (numeric)

rows: 3,222
null: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 31,378
median: 30,022
std: 16,300
q1: 19,030
q3: 46,104
iqr: 27,075
skew: 0.157
kurtosis: -0.631
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_name (text)

95th-percentile length under 20 chars; 39.2% duplicate strings
rows: 3,222
null: 0 (0.0%)
unique: 1,960
len_min: 10
len_max: 46
len_mean: 14.172
len_median: 14.000
len_p95: 18.000
word_mean: 2.083
word_median: 2.000
n_empty: 0
n_duplicates: 1,262
duplicate_rate: 0.392
vocab_size: 1,963
readability_flesch_mean: 33.359
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County
  2. Cheatham County
  3. Piute County
  4. Lamb County
  5. Martin County
  6. Sheridan County
  7. Chickasaw County
  8. Rockingham County
  9. Liberty County
  10. Clark County
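The duplicate figures above agree arithmetically if n_duplicates counts every row beyond the first occurrence of each string; that definition is an assumption, since the export does not state it:

```latex
n_{\text{dup}} = n_{\text{rows}} - n_{\text{unique}} = 3{,}222 - 1{,}960 = 1{,}262,
\qquad
\text{duplicate\_rate} = \frac{n_{\text{dup}}}{n_{\text{rows}}} = \frac{1{,}262}{3{,}222} \approx 0.392
```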

state (categorical)

rows: 3,222
null: 0 (0.0%)
unique: 52
top_value: TX
top_rate: 0.079
cardinality: 52
entropy: 5.314
entropy_ratio: 0.932
Top values (rank 1–20)
  1. TX — 254
  2. GA — 159
  3. VA — 133
  4. KY — 120
  5. MO — 115
  6. KS — 105
  7. IL — 102
  8. NC — 100
  9. IA — 99
  10. TN — 95
  11. NE — 93
  12. IN — 92
  13. OH — 88
  14. MN — 87
  15. MI — 83
  16. MS — 82
  17. PR — 78
  18. OK — 77
  19. AR — 75
  20. WI — 72

total_25_plus (numeric)

skew=+13.51; 13.7% of rows beyond 1.5 × IQR
rows: 3,222
null: 0 (0.0%)
unique: 3,140
min: 50
max: 6,909,650
mean: 71,074
median: 18,314
std: 226,586
q1: 7,696
q3: 46,485
iqr: 38,790
skew: 13.515
kurtosis: 306.897
n_outliers: 440
outlier_rate: 0.137
zero_rate: 0.000

pct_hs_or_higher (numeric)

rows: 3,222
null: 0 (0.0%)
unique: 1,612
min: 33.330
max: 99.690
mean: 88.078
median: 89.390
std: 5.970
q1: 84.900
q3: 92.468
iqr: 7.567
skew: -1.328
kurtosis: 3.742
n_outliers: 86
outlier_rate: 0.027
zero_rate: 0.000
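Assuming the same 1.5 × IQR rule that the total_25_plus flag names, the fences for this column work out as below. Under that rule the upper fence exceeds 100%, so all 86 flagged values fall below the lower fence of about 73.55:

```latex
\begin{aligned}
\text{lower} &= Q_1 - 1.5\,\text{IQR} = 84.900 - 1.5(7.567) \approx 84.900 - 11.35 = 73.55 \\
\text{upper} &= Q_3 + 1.5\,\text{IQR} \approx 92.468 + 11.35 \approx 103.82 > 100
\end{aligned}
```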

pct_bachelors_or_higher (numeric)

rows: 3,222
null: 0 (0.0%)
unique: 1,982
min: 0.000
max: 78.870
mean: 23.499
median: 21.070
std: 9.983
q1: 16.590
q3: 27.848
iqr: 11.258
skew: 1.357
kurtosis: 2.306
n_outliers: 141
outlier_rate: 0.044
zero_rate: 3.10e-04
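Under the same assumed 1.5 × IQR rule, the lower fence for this column is slightly negative, so the single 0% row is not flagged and all 141 outliers lie above roughly 44.7%. The zero rate also converts back to one row:

```latex
\begin{aligned}
\text{lower} &= Q_1 - 1.5\,\text{IQR} = 16.590 - 1.5(11.258) = 16.590 - 16.887 = -0.297 \\
\text{upper} &= Q_3 + 1.5\,\text{IQR} = 27.848 + 16.887 = 44.735 \\
n_{\text{zero}} &= 3.10 \times 10^{-4} \times 3{,}222 \approx 1
\end{aligned}
```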