education education by county

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv

Saturn profiled 3,222 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
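# Install the profiler (IPython shell escape; run this line inside a notebook).
!pip install saturn-dissect
import subprocess

# Re-run the analysis behind this report via the saturn CLI.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv",
    "--findings", "education-education_by_county.json",
    "--llm", "anthropic:claude-opus-4-7",
])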

Summary confidence: high

This dataset contains 3,222 US county-level education records with 6 columns covering county identifiers (county_name, fips, state) and educational attainment metrics (pct_hs_or_higher, pct_bachelors_or_higher, total_25_plus). The bachelor's degree rate averages 23.5% but ranges from 0% to nearly 79%, suggesting wide regional disparities worth investigating. The total_25_plus population column is heavily skewed (skew=13.5) with 440 outliers and a max of nearly 6.9 million, so any analysis using it should consider log transforms or per-capita normalization. State coverage is fairly even across 52 entries, with TX, GA, and VA contributing the most counties.

citing: row_count · column_count · columns.pct_bachelors_or_higher.stats · columns.pct_hs_or_higher.stats · columns.total_25_plus.stats · columns.state.top_values · columns.county_name.top_values
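The citation paths above read like dotted keys into the findings JSON. A minimal sketch of resolving such a path, assuming the findings file is a nested JSON object keyed the same way; the `lookup` helper and the trimmed `findings` dict are hypothetical, and the real layout written by `--findings` may differ:

```python
from functools import reduce
from typing import Any


def lookup(findings: dict, path: str) -> Any:
    """Resolve a dotted citation path such as 'columns.state.top_values'."""
    return reduce(lambda node, key: node[key], path.split("."), findings)


# Hypothetical, heavily trimmed findings document. The numbers come from this
# report; the nesting is an assumption about the JSON layout.
findings = {
    "row_count": 3222,
    "column_count": 6,
    "columns": {"state": {"top_values": [["TX", 254], ["GA", 159], ["VA", 133]]}},
}

print(lookup(findings, "row_count"))
print(lookup(findings, "columns.state.top_values")[0])
```

With a real findings file, `json.load` would supply the `findings` dict in place of the literal above.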

Fig 1.
pct_bachelors_or_higher · Shows the wide spread of college attainment across counties, from 0% to nearly 79%.
Histogram bins for pct_bachelors_or_higher (median: 21.07).
bin              count
0 – 1.972        1
1.972 – 3.944    0
3.944 – 5.915    4
5.915 – 7.887    9
7.887 – 9.859    32
9.859 – 11.83    135
11.83 – 13.8     169
13.8 – 15.77     317
15.77 – 17.75    328
17.75 – 19.72    376
19.72 – 21.69    345
21.69 – 23.66    262
23.66 – 25.63    232
25.63 – 27.6     189
27.6 – 29.58     123
29.58 – 31.55    116
31.55 – 33.52    118
33.52 – 35.49    96
35.49 – 37.46    60
37.46 – 39.44    68
39.44 – 41.41    40
41.41 – 43.38    34
43.38 – 45.35    34
45.35 – 47.32    24
47.32 – 49.29    21
49.29 – 51.27    19
51.27 – 53.24    15
53.24 – 55.21    10
55.21 – 57.18    11
57.18 – 59.15    10
59.15 – 61.12    9
61.12 – 63.1     6
63.1 – 65.07     5
65.07 – 67.04    1
67.04 – 69.01    0
69.01 – 70.98    1
70.98 – 72.95    0
72.95 – 74.93    0
74.93 – 76.9     1
76.9 – 78.87     1
Fig 2.
pct_hs_or_higher · Half of all counties fall between 84.9% and 92.47% high school completion (median 89.39%), with a long left tail of low-attainment counties.
Histogram bins for pct_hs_or_higher (median: 89.39).
bin              count
33.33 – 34.99    1
34.99 – 36.65    0
36.65 – 38.31    0
38.31 – 39.97    0
39.97 – 41.62    0
41.62 – 43.28    0
43.28 – 44.94    0
44.94 – 46.6     0
46.6 – 48.26     0
48.26 – 49.92    0
49.92 – 51.58    0
51.58 – 53.24    0
53.24 – 54.9     0
54.9 – 56.56     1
56.56 – 58.22    1
58.22 – 59.87    1
59.87 – 61.53    3
61.53 – 63.19    3
63.19 – 64.85    3
64.85 – 66.51    2
66.51 – 68.17    6
68.17 – 69.83    7
69.83 – 71.49    15
71.49 – 73.15    30
73.15 – 74.81    30
74.81 – 76.46    46
76.46 – 78.12    60
78.12 – 79.78    88
79.78 – 81.44    131
81.44 – 83.1     174
83.1 – 84.76     189
84.76 – 86.42    256
86.42 – 88.08    289
88.08 – 89.74    360
89.74 – 91.39    429
91.39 – 93.05    460
93.05 – 94.71    389
94.71 – 96.37    192
96.37 – 98.03    47
98.03 – 99.69    9
Fig 3.
total_25_plus · Highly skewed population distribution; consider log-scaling to see the bulk of small/medium counties.
Histogram bins for total_25_plus (median: 18313.5).
bin                        count
50 – 1.728e+05             2948
1.728e+05 – 3.455e+05      135
3.455e+05 – 5.183e+05      53
5.183e+05 – 6.91e+05       37
6.91e+05 – 8.638e+05       13
8.638e+05 – 1.036e+06      10
1.036e+06 – 1.209e+06      7
1.209e+06 – 1.382e+06      4
1.382e+06 – 1.555e+06      2
1.555e+06 – 1.727e+06      5
1.727e+06 – 1.9e+06        1
1.9e+06 – 2.073e+06        1
2.073e+06 – 2.246e+06      1
2.246e+06 – 2.418e+06      1
2.418e+06 – 2.591e+06      0
2.591e+06 – 2.764e+06      0
2.764e+06 – 2.937e+06      0
2.937e+06 – 3.109e+06      2
3.109e+06 – 3.282e+06      0
3.282e+06 – 3.455e+06      0
3.455e+06 – 3.628e+06      0
3.628e+06 – 3.8e+06        1
3.8e+06 – 3.973e+06        0
3.973e+06 – 4.146e+06      0
4.146e+06 – 4.319e+06      0
4.319e+06 – 4.491e+06      0
4.491e+06 – 4.664e+06      0
4.664e+06 – 4.837e+06      0
4.837e+06 – 5.01e+06       0
5.01e+06 – 5.182e+06       0
5.182e+06 – 5.355e+06      0
5.355e+06 – 5.528e+06      0
5.528e+06 – 5.7e+06        0
5.7e+06 – 5.873e+06        0
5.873e+06 – 6.046e+06      0
6.046e+06 – 6.219e+06      0
6.219e+06 – 6.391e+06      0
6.391e+06 – 6.564e+06      0
6.564e+06 – 6.737e+06      0
6.737e+06 – 6.91e+06       1
Fig 4.
state · Counties per state — TX, GA, and VA contribute the highest counts.
Top values for state (20 unique shown, of 52 total).
value  count  share
TX     254    7.9%
GA     159    4.9%
VA     133    4.1%
KY     120    3.7%
MO     115    3.6%
KS     105    3.3%
IL     102    3.2%
NC     100    3.1%
IA     99     3.1%
TN     95     2.9%
NE     93     2.9%
IN     92     2.9%
OH     88     2.7%
MN     87     2.7%
MI     83     2.6%
MS     82     2.5%
PR     78     2.4%
OK     77     2.4%
AR     75     2.3%
WI     72     2.2%
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column                     kind          null %
fips                       numeric       0.0%
county_name                text          0.0%
state                      categorical   0.0%
total_25_plus              numeric       0.0%
pct_hs_or_higher           numeric       0.0%
pct_bachelors_or_higher    numeric       0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
                           fips    total_25_plus   pct_hs_or_higher   pct_bachelors_or_higher
fips                       +1.00   -0.07           -0.09              +0.03
total_25_plus              -0.07   +1.00           +0.08              +0.34
pct_hs_or_higher           -0.09   +0.08           +1.00              +0.57
pct_bachelors_or_higher    +0.03   +0.34           +0.57              +1.00
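For readers who want to recompute an entry of the matrix, here is a minimal sketch of the standard Pearson coefficient. It is not Saturn's implementation (the report only says the values are sampled and bounded), and the two short lists are made-up toy values, not rows from the dataset:

```python
import math


def pearson(xs, ys):
    """Pearson r: co-deviation divided by the product of the deviations' norms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy values for illustration only (hypothetical, not taken from the CSV).
hs = [80.0, 85.0, 90.0, 95.0]
ba = [12.0, 18.0, 20.0, 35.0]
r = pearson(hs, ba)
print(f"{r:+.2f}")  # signed, two decimals, like the table above
```

Run on the full pct_hs_or_higher and pct_bachelors_or_higher columns, the same function should land near the reported +0.57, up to whatever sampling Saturn applies.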

fips numeric identifier

This column holds U.S. FIPS county/state codes: every one of the 3222 rows is unique, no nulls, and values span 1001 to 72153, consistent with state-prefixed county identifiers. The distribution is roughly symmetric (skew 0.16, kurtosis -0.63) with no outliers, which is expected for an enumerated geographic key rather than a measured quantity. Treat the numeric stats as incidental — these are categorical identifiers, not magnitudes.

Treatment: Cast to zero-padded string and use as a join key to county/state geographies; do not model as a number.

anthropic:claude-opus-4-7 · confidence high
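The treatment above can be sketched in a few lines. County FIPS codes are five characters (a two-digit state prefix plus a three-digit county code), so numeric storage drops the leading zero for states numbered below 10. The helper names are hypothetical:

```python
def fips_to_key(value: int) -> str:
    """Render a numeric county FIPS code as a 5-character, zero-padded string."""
    return f"{int(value):05d}"


def state_prefix(value: int) -> str:
    """First two characters of the padded code: the state FIPS prefix."""
    return fips_to_key(value)[:2]


# The reported min and max of the fips column.
print(fips_to_key(1001), state_prefix(1001))    # 01001 01
print(fips_to_key(72153), state_prefix(72153))  # 72153 72
```

Padding before the join matters because most county reference tables store the code as text, where "1001" and "01001" do not match.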
Out[12]:

saturn.columns["fips"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
min 1,001
max 72,153
mean 3.138e+04
median 30,022
std 1.63e+04
q1 1.903e+04
q3 4.61e+04
iqr 27,075
skew 0.1574
kurtosis -0.6314
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 7.
Distribution of fips. Vertical dash marks the median.
Histogram bins for fips (median: 30022.0).
bin                        count
1001 – 2780                97
2780 – 4559                15
4559 – 6337                133
6337 – 8116                59
8116 – 9895                14
9895 – 1.167e+04           4
1.167e+04 – 1.345e+04      226
1.345e+04 – 1.523e+04      5
1.523e+04 – 1.701e+04      49
1.701e+04 – 1.879e+04      189
1.879e+04 – 2.057e+04      204
2.057e+04 – 2.235e+04      184
2.235e+04 – 2.413e+04      39
2.413e+04 – 2.59e+04       15
2.59e+04 – 2.768e+04       170
2.768e+04 – 2.946e+04      196
2.946e+04 – 3.124e+04      150
3.124e+04 – 3.302e+04      27
3.302e+04 – 3.48e+04       21
3.48e+04 – 3.658e+04       95
3.658e+04 – 3.836e+04      153
3.836e+04 – 4.013e+04      155
4.013e+04 – 4.191e+04      46
4.191e+04 – 4.369e+04      67
4.369e+04 – 4.547e+04      51
4.547e+04 – 4.725e+04      161
4.725e+04 – 4.903e+04      268
4.903e+04 – 5.081e+04      29
5.081e+04 – 5.259e+04      133
5.259e+04 – 5.436e+04      94
5.436e+04 – 5.614e+04      95
5.614e+04 – 5.792e+04      0
5.792e+04 – 5.97e+04       0
5.97e+04 – 6.148e+04       0
6.148e+04 – 6.326e+04      0
6.326e+04 – 6.504e+04      0
6.504e+04 – 6.682e+04      0
6.682e+04 – 6.86e+04       0
6.86e+04 – 7.037e+04       0
7.037e+04 – 7.215e+04      78

county_name text metadata

This is a US county-level place-name field: 2-word entries averaging 14 characters, with 'County' appearing 2999 times alongside 'Parish' (64, Louisiana) and 'Municipio' (78, Puerto Rico). Duplication is heavy at 39.2% (1262 rows) because common names like Washington County (30), Jefferson County (25), and Franklin County (24) recur across states — so this column is not unique on its own. With 1960 distinct values across 3222 rows, it needs a state qualifier to act as a key.

Treatment: Concatenate with a state/territory code before using as a join key; do not treat as unique.

anthropic:claude-opus-4-7 · confidence high
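A minimal sketch of the composite key suggested above. The `composite_key` helper and the `|` separator are hypothetical choices, and the three rows are illustrative examples of a name that recurs across states rather than an extract of the CSV:

```python
from collections import Counter

# Illustrative (county_name, state) pairs; hypothetical sample, not read from the file.
rows = [
    ("Washington County", "TX"),
    ("Washington County", "GA"),
    ("Jefferson County", "KY"),
]


def composite_key(county_name: str, state: str) -> str:
    """Join name and state code so that same-named counties stay distinct."""
    return f"{county_name.strip()}|{state.strip().upper()}"


name_counts = Counter(name for name, _ in rows)
key_counts = Counter(composite_key(name, state) for name, state in rows)

duplicated_names = [name for name, c in name_counts.items() if c > 1]
duplicated_keys = [key for key, c in key_counts.items() if c > 1]

print(duplicated_names)  # ['Washington County']
print(duplicated_keys)   # []
```

It is still worth asserting that the composite key is unique on the full file before joining on it; where the fips column is available, it is the safer key.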
Out[15]:

saturn.columns["county_name"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 1,960
len_min 10
len_max 46
len_mean 14.17
len_median 14
len_p95 18
word_mean 2.083
word_median 2
n_empty 0
n_duplicates 1,262
duplicate_rate 0.3917
vocab_size 1,963
readability_flesch_mean 33.36
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: short_text · 95th-percentile length under 20 chars
alert: duplicates · 39.2% duplicate strings
Fig 8.
Character-length distribution for county_name.
Character-length distribution for county_name (mean: 14.17).
chars      count
10 – 11    29
11 – 12    255
12 – 13    465
13 – 14    682
14 – 14    588
14 – 15    493
15 – 16    291
16 – 17    219
17 – 18    67
18 – 19    0
19 – 20    49
20 – 21    23
21 – 22    16
22 – 23    14
23 – 24    8
24 – 24    4
24 – 25    5
25 – 26    2
26 – 27    1
27 – 28    0
28 – 29    1
29 – 30    0
30 – 31    0
31 – 32    2
32 – 32    1
32 – 33    1
33 – 34    1
34 – 35    1
35 – 36    0
36 – 37    0
37 – 38    0
38 – 39    0
39 – 40    0
40 – 41    2
41 – 42    1
42 – 42    0
42 – 43    0
43 – 44    0
44 – 45    0
45 – 46    1

state categorical feature

This is a US state code column with 52 distinct values: the 50 states plus, most likely, DC and Puerto Rico (PR appears among the top values with 78 rows). Distribution is fairly even (entropy ratio 0.93), with TX leading at 254 rows (7.88%) followed by GA, VA, and KY, which is consistent with county-level data, where states with more counties contribute more rows. No nulls.

Treatment: one-hot or target-encode for modelling; usable as a join key to state-level reference tables.

anthropic:claude-opus-4-7 · confidence high
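The entropy ratio quoted above appears to be Shannon entropy in bits divided by its maximum, log2 of the cardinality. That formula is an assumption about how Saturn derives the figure, but it reproduces the reported numbers. A minimal sketch, with hypothetical function names:

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k non-empty categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Consistency check against the reported stats (entropy 5.314, cardinality 52).
reported_ratio = round(5.314 / math.log2(52), 4)
print(reported_ratio)  # 0.9322

# A perfectly even split across categories gives a ratio of 1.0.
print(entropy_ratio([10, 10, 10, 10]))
```

A ratio of 0.93 therefore means the county counts per state are close to, but not quite, evenly spread; TX's 254 rows are the main departure.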
Out[18]:

saturn.columns["state"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 52
top_value TX
top_rate 0.07883
cardinality 52
entropy 5.314
entropy_ratio 0.9322
Fig 9.
Top values for state.

total_25_plus numeric feature

A heavily right-skewed count of the population aged 25 and over per county, ranging from 50 to 6,909,650 with a median of 18,313.5 but a mean of 71,074.3. Skew of 13.5 and kurtosis of 306.9 indicate a long heavy tail, and 440 rows (13.7%) flag as outliers. No nulls or zeros, and 3,140 of 3,222 values are unique, consistent with geographic aggregates of varying size.

Treatment: log-transform before any regression or distance-based modelling.

anthropic:claude-opus-4-7 · confidence high
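A short worked example of the outlier rule and the suggested transform, using only the reported quartiles and extremes. It assumes the outlier alert uses the usual Tukey fences (1.5 × IQR beyond Q1 and Q3); the quartiles are rounded in the report, so the fences are approximate:

```python
import math

# Reported quartiles (q3 is shown as 4.649e+04, so it is rounded).
q1, q3 = 7696.0, 46490.0
iqr = q3 - q1  # 38794, in line with the reported 3.879e+04

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(lower_fence, upper_fence)  # -50495.0 104681.0

# log10 of the reported min, median, and max: the 5-orders-of-magnitude
# range collapses to roughly 1.7 through 6.8.
for label, value in [("min", 50), ("median", 18313.5), ("max", 6909650)]:
    print(label, round(math.log10(value), 2))  # 1.7, 4.26, 6.84
```

Because the lower fence is negative and the minimum is 50, every flagged row under this rule sits on the high side, roughly counties with more than 105,000 residents aged 25 and over. That is a property of the skew rather than a data-quality problem, which is why a log scale is the more useful response than dropping rows.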
Out[21]:

saturn.columns["total_25_plus"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,140
min 50
max 6.91e+06
mean 7.107e+04
median 1.831e+04
std 2.266e+05
q1 7696
q3 4.649e+04
iqr 3.879e+04
skew 13.51
kurtosis 306.9
n_outliers 440
outlier_rate 0.1366
zero_rate 0
alert: high_skew · skew=+13.51
alert: outliers · 13.7% rows beyond 1.5 IQR
Fig 10.
Distribution of total_25_plus. Vertical dash marks the median.

pct_hs_or_higher numeric feature

This column reports the percentage of a population with at least a high school education, ranging from 33.33 to 99.69 with a mean of 88.08 and median of 89.39. The distribution is left-skewed (skew -1.33) with heavy tails (kurtosis 3.74), and 86 low-end outliers (2.67%) pull below the bulk concentrated between Q1 84.9 and Q3 92.47. With 1612 unique values across 3222 rows and no nulls, it looks like a clean county-level feature.

Treatment: Use as-is or apply a reflected log/Box-Cox to address the left skew before linear modelling.

anthropic:claude-opus-4-7 · confidence high
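One way to read "reflected log" is log(max + 1 - x): the reflection turns the long left tail into a right tail, and the log then compresses it. A minimal sketch under that assumption, using the reported maximum as the reflection point; the function name is hypothetical:

```python
import math

HS_MAX = 99.69  # reported maximum of pct_hs_or_higher


def reflected_log(x: float, upper: float = HS_MAX) -> float:
    """log(upper + 1 - x); defined for every x <= upper."""
    return math.log(upper + 1.0 - x)


# Reported min, median, and max of the column.
for label, value in [("min", 33.33), ("median", 89.39), ("max", 99.69)]:
    print(label, round(reflected_log(value), 3))
```

The transform reverses the ordering (the highest-attainment county maps to about 0, the lowest to the largest value), so the sign of any fitted coefficient flips relative to the raw column.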
Out[24]:

saturn.columns["pct_hs_or_higher"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 1,612
min 33.33
max 99.69
mean 88.08
median 89.39
std 5.97
q1 84.9
q3 92.47
iqr 7.567
skew -1.328
kurtosis 3.742
n_outliers 86
outlier_rate 0.02669
zero_rate 0
Fig 11.
Distribution of pct_hs_or_higher. Vertical dash marks the median.

pct_bachelors_or_higher numeric feature

This column reports the percentage of residents with a bachelor's degree or higher across 3,222 rows, ranging from 0.0 to 78.87 with a median of 21.07 and mean of 23.50. The distribution is right-skewed (skew 1.36, kurtosis 2.31) with 141 high-end outliers (4.4%) reflecting a long tail of highly-educated areas. Only one row is exactly zero (zero_rate 0.0003 of 3,222 rows) and there are no nulls, which suggests clean coverage.

Treatment: Consider a sqrt or log1p transform to tame the right skew before linear modelling; the minimum is 0, so a plain log is undefined for that row.

anthropic:claude-opus-4-7 · confidence high
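Because the column's minimum is 0, a plain logarithm fails on at least one row (`math.log(0.0)` raises `ValueError` in Python). A minimal sketch of two zero-safe alternatives, applied to the reported min, median, and max; the function names are hypothetical:

```python
import math


def sqrt_transform(pct: float) -> float:
    """Square root: mild compression of the right tail, defined at 0."""
    return math.sqrt(pct)


def log1p_transform(pct: float) -> float:
    """log(1 + pct): stronger compression, also defined at 0."""
    return math.log1p(pct)


for label, value in [("min", 0.0), ("median", 21.07), ("max", 78.87)]:
    print(label, round(sqrt_transform(value), 3), round(log1p_transform(value), 3))
```

With a skew of only 1.36, the square root may already be enough; which of the two is preferable depends on the residuals of the downstream model.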
Out[27]:

saturn.columns["pct_bachelors_or_higher"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 1,982
min 0
max 78.87
mean 23.5
median 21.07
std 9.983
q1 16.59
q3 27.85
iqr 11.26
skew 1.357
kurtosis 2.306
n_outliers 141
outlier_rate 0.04376
zero_rate 0.0003104
Fig 12.
Distribution of pct_bachelors_or_higher. Vertical dash marks the median.

How to cite

BibTeX
@misc{saturn-education-education-by-county-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: education education by county},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/education-education_by_county}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: education education by county. Source: /home/coolhand/datasets/us-inequality-atlas/education/education_by_county.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/education-education_by_county