saturn·

data trove us presidential election results by county

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/geographic/election/2016_election.csv

Saturn profiled 3,141 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/geographic/election/2016_election.csv",
    "--findings", "data-trove-us-presidential-election-results-by-county.json",
    "--llm", "anthropic:default",
])

Summary confidence: high

This dataset captures 2016 US presidential election results at the county level, covering 3,141 counties and county-equivalents across 51 state abbreviations (50 states plus DC). The most striking pattern is the strong Republican lean of the median county: the median GOP vote share is 66.5% versus 28.6% for Democrats. Total votes are heavily right-skewed: a small number of large urban counties (max 2.65 million votes) dominate raw vote totals, while most counties are small (median 11,144 votes). The per-point difference column ranges widely (e.g., 63% margins appear in the top values), suggesting many counties were not competitive at all. Texas leads with 254 counties, so state-level aggregation is worth examining to see which states contribute the most records and vote volume.

citing: per_gop.stats.median · per_dem.stats.median · total_votes.stats.max · total_votes.stats.median · state_abbr.stats.top_value · row_count · per_gop.stats.skew · votes_dem.stats.n_outliers

Out[4]:

saturn.schema() · 11 columns

column kind n null% unique alerts
(unnamed) numeric 3,141 0.0% 3,141
votes_dem numeric 3,141 0.0% 2,688 high_skew outliers
votes_gop numeric 3,141 0.0% 2,901 high_skew outliers
total_votes numeric 3,141 0.0% 2,966 high_skew outliers
per_dem numeric 3,141 0.0% 3,112
per_gop numeric 3,141 0.0% 3,112
diff text 3,141 0.0% 2,738 one_word allcaps short_text
per_point_diff text 3,141 0.0% 2,555 one_word allcaps short_text
state_abbr categorical 3,141 0.0% 51
county_name text 3,141 0.0% 1,848 short_text duplicates
combined_fips numeric 3,141 0.0% 3,141
Fig 1.
per_gop · Look for the left skew (skew = -0.82): most counties voted heavily Republican, so the mass sits at high GOP shares, with a long left tail of Democratic-leaning counties.
Histogram bins for per_gop (median: 0.665352643757).
bin count
0.04122 – 0.06401 1
0.06401 – 0.0868 2
0.0868 – 0.1096 5
0.1096 – 0.1324 2
0.1324 – 0.1552 6
0.1552 – 0.1779 11
0.1779 – 0.2007 7
0.2007 – 0.2235 17
0.2235 – 0.2463 17
0.2463 – 0.2691 17
0.2691 – 0.2919 23
0.2919 – 0.3147 32
0.3147 – 0.3375 34
0.3375 – 0.3602 30
0.3602 – 0.383 43
0.383 – 0.4058 41
0.4058 – 0.4286 64
0.4286 – 0.4514 71
0.4514 – 0.4742 63
0.4742 – 0.497 89
0.497 – 0.5198 78
0.5198 – 0.5425 117
0.5425 – 0.5653 116
0.5653 – 0.5881 147
0.5881 – 0.6109 147
0.6109 – 0.6337 156
0.6337 – 0.6565 165
0.6565 – 0.6793 193
0.6793 – 0.7021 190
0.7021 – 0.7249 215
0.7249 – 0.7476 223
0.7476 – 0.7704 213
0.7704 – 0.7932 187
0.7932 – 0.816 142
0.816 – 0.8388 113
0.8388 – 0.8616 74
0.8616 – 0.8844 49
0.8844 – 0.9072 29
0.9072 – 0.9299 9
0.9299 – 0.9527 3
Fig 2.
per_dem · Compare against the GOP histogram — the Democratic distribution clusters below 40%, highlighting how few counties lean Democrat at the county level.
Histogram bins for per_dem (median: 0.2864).
bin count
0.03145 – 0.05387 8
0.05387 – 0.0763 16
0.0763 – 0.09872 52
0.09872 – 0.1211 74
0.1211 – 0.1436 116
0.1436 – 0.166 146
0.166 – 0.1884 203
0.1884 – 0.2109 226
0.2109 – 0.2333 240
0.2333 – 0.2557 218
0.2557 – 0.2781 200
0.2781 – 0.3006 205
0.3006 – 0.323 153
0.323 – 0.3454 147
0.3454 – 0.3678 153
0.3678 – 0.3903 152
0.3903 – 0.4127 106
0.4127 – 0.4351 111
0.4351 – 0.4575 77
0.4575 – 0.48 78
0.48 – 0.5024 56
0.5024 – 0.5248 72
0.5248 – 0.5472 45
0.5472 – 0.5697 42
0.5697 – 0.5921 36
0.5921 – 0.6145 36
0.6145 – 0.6369 34
0.6369 – 0.6594 25
0.6594 – 0.6818 30
0.6818 – 0.7042 16
0.7042 – 0.7266 12
0.7266 – 0.7491 12
0.7491 – 0.7715 14
0.7715 – 0.7939 9
0.7939 – 0.8163 6
0.8163 – 0.8388 4
0.8388 – 0.8612 4
0.8612 – 0.8836 4
0.8836 – 0.906 2
0.906 – 0.9285 1
Fig 3.
state_abbr · Texas dominates with 254 counties; this chart reveals which states contribute the most county-level records and thus the most weight in any aggregation.
Top values for state_abbr (20 unique shown, of 51 total).
value count share
TX 254 8.1%
GA 159 5.1%
VA 133 4.2%
KY 120 3.8%
MO 115 3.7%
KS 105 3.3%
IL 102 3.2%
NC 100 3.2%
IA 99 3.2%
TN 95 3.0%
NE 93 3.0%
IN 92 2.9%
OH 88 2.8%
MN 87 2.8%
MI 83 2.6%
MS 82 2.6%
OK 77 2.5%
AR 75 2.4%
WI 72 2.3%
AL 67 2.1%
Fig 4.
total_votes · The extreme right skew (max 2.65M vs median 11,144) shows a handful of populous counties vastly outvote the typical county.
Histogram bins for total_votes (median: 11144.0).
bin count
64 – 6.636e+04 2699
6.636e+04 – 1.327e+05 192
1.327e+05 – 1.99e+05 77
1.99e+05 – 2.653e+05 70
2.653e+05 – 3.316e+05 33
3.316e+05 – 3.979e+05 20
3.979e+05 – 4.642e+05 12
4.642e+05 – 5.305e+05 4
5.305e+05 – 5.968e+05 7
5.968e+05 – 6.631e+05 9
6.631e+05 – 7.294e+05 4
7.294e+05 – 7.957e+05 5
7.957e+05 – 8.62e+05 1
8.62e+05 – 9.283e+05 1
9.283e+05 – 9.946e+05 1
9.946e+05 – 1.061e+06 1
1.061e+06 – 1.127e+06 1
1.127e+06 – 1.193e+06 0
1.193e+06 – 1.26e+06 1
1.26e+06 – 1.326e+06 1
1.326e+06 – 1.392e+06 0
1.392e+06 – 1.459e+06 0
1.459e+06 – 1.525e+06 0
1.525e+06 – 1.591e+06 0
1.591e+06 – 1.658e+06 0
1.658e+06 – 1.724e+06 0
1.724e+06 – 1.79e+06 0
1.79e+06 – 1.856e+06 0
1.856e+06 – 1.923e+06 0
1.923e+06 – 1.989e+06 0
1.989e+06 – 2.055e+06 1
2.055e+06 – 2.122e+06 0
2.122e+06 – 2.188e+06 0
2.188e+06 – 2.254e+06 0
2.254e+06 – 2.321e+06 0
2.321e+06 – 2.387e+06 0
2.387e+06 – 2.453e+06 0
2.453e+06 – 2.519e+06 0
2.519e+06 – 2.586e+06 0
2.586e+06 – 2.652e+06 1
Fig 5.
county_name · Washington County appears 30 times across different states — check for common county names that could cause confusion in county-level joins or lookups.
Character-length distribution for county_name (mean: 13.869149952244507).
chars count
6 – 7 29
10 – 10 29
11 – 11 255
12 – 12 465
13 – 13 683
14 – 14 585
15 – 15 485
16 – 16 280
16 – 17 202
18 – 18 52
19 – 19 40
20 – 20 15
21 – 21 11
22 – 22 6
23 – 23 2
24 – 24 1
26 – 27 1
Bins with a count of 0 are omitted; bin edges appear rounded to whole characters.
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column kind null %
(unnamed) numeric 0.0%
votes_dem numeric 0.0%
votes_gop numeric 0.0%
total_votes numeric 0.0%
per_dem numeric 0.0%
per_gop numeric 0.0%
diff text 0.0%
per_point_diff text 0.0%
state_abbr categorical 0.0%
county_name text 0.0%
combined_fips numeric 0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 7 numeric columns (values clipped to 2 decimals).
column (unnamed) votes_dem votes_gop total_votes per_dem per_gop combined_fips
(unnamed) +1.00 -0.11 -0.12 -0.11 -0.10 +0.08 +0.99
votes_dem -0.11 +1.00 +0.86 +0.98 +0.37 -0.37 -0.11
votes_gop -0.12 +0.86 +1.00 +0.93 +0.36 -0.37 -0.12
total_votes -0.11 +0.98 +0.93 +1.00 +0.37 -0.38 -0.12
per_dem -0.10 +0.37 +0.36 +0.37 +1.00 -0.98 -0.09
per_gop +0.08 -0.37 -0.37 -0.38 -0.98 +1.00 +0.07
combined_fips +0.99 -0.11 -0.12 -0.12 -0.09 +0.07 +1.00

(unnamed) numeric identifier

This column is almost certainly a row index or sequential integer ID, running from 0 to 3140 with every value unique and no nulls. The distribution is perfectly uniform: mean equals median at 1570.0, skew is exactly 0.0, kurtosis is –1.2 (consistent with a flat/uniform distribution), and there are zero outliers. The single surprising note is that zero_rate is non-zero (one zero present), which is simply the first index value (0) rather than a missing-data signal.

Treatment: Drop before modelling; if row order matters, retain as an explicit sort key only.

anthropic:default · confidence high
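
A minimal sketch of the "drop before modelling" treatment, using only the standard library. It assumes the index column is the one whose CSV header cell is empty, as the schema suggests. The three sample rows and the helper name `read_without_index` are hypothetical; only the header names come from the profile.

```python
import csv
import io

# Hypothetical three-row sample; only the header mirrors the profiled schema
# (an empty first header cell followed by named columns).
SAMPLE = """,votes_dem,votes_gop,combined_fips
0,100,200,1001
1,300,150,1003
2,50,75,1005
"""

def read_without_index(handle):
    """Read CSV rows as dicts, dropping the column whose header is empty."""
    reader = csv.DictReader(handle)
    return [
        {name: value for name, value in row.items() if name != ""}
        for row in reader
    ]

rows = read_without_index(io.StringIO(SAMPLE))
print(rows[0])
```

If row order matters, the dropped value could instead be kept under an explicit name and used only as a sort key.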
Out[13]:

saturn.columns[""].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 3,141
min 0
max 3,140
mean 1,570
median 1,570
std 906.9
q1 785
q3 2,355
iqr 1,570
skew 0
kurtosis -1.2
n_outliers 0
outlier_rate 0
zero_rate 0.0003184
Fig 8.
Distribution of the unnamed column. Vertical dash marks the median.
Histogram bins for the unnamed column (median: 1570.0).
bin count
0 – 78.5 79
78.5 – 157 78
157 – 235.5 79
235.5 – 314 78
314 – 392.5 79
392.5 – 471 78
471 – 549.5 79
549.5 – 628 78
628 – 706.5 79
706.5 – 785 78
785 – 863.5 79
863.5 – 942 78
942 – 1020 79
1020 – 1099 78
1099 – 1178 79
1178 – 1256 78
1256 – 1334 79
1334 – 1413 78
1413 – 1492 79
1492 – 1570 78
1570 – 1648 79
1648 – 1727 78
1727 – 1806 79
1806 – 1884 78
1884 – 1962 79
1962 – 2041 78
2041 – 2120 79
2120 – 2198 78
2198 – 2276 79
2276 – 2355 78
2355 – 2434 79
2434 – 2512 78
2512 – 2590 79
2590 – 2669 78
2669 – 2748 79
2748 – 2826 78
2826 – 2904 79
2904 – 2983 78
2983 – 3062 79
3062 – 3140 79

votes_dem numeric feature

This column holds raw Democratic vote counts per county (n=3,141, close to the roughly 3,143 U.S. counties and county-equivalents). The distribution is extremely right-skewed (skew=11.65, kurtosis=224.36): the median is only 3,194 but the mean is 20,734 and the max reaches 1,893,770, reflecting the enormous disparity between rural and urban counties. Nearly 15% of rows (468) are flagged as outliers, driven by large metropolitan counties. The min of 4 votes is plausible for the least-populated counties.

Treatment: Log-transform (log1p) before regression or clustering to reduce skew; consider deriving vote share alongside raw count.

anthropic:default · confidence high
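
The log1p treatment can be sketched with the standard library alone. The five input values are the min, q1, median, q3 and max reported for votes_dem; the helper name `log1p_all` is hypothetical, and this is an illustration rather than Saturn's own preprocessing.

```python
import math

# Summary values reported for votes_dem: min, q1, median, q3, max.
votes_dem_summary = [4, 1175, 3194, 10047, 1893770]

def log1p_all(values):
    """Apply log(1 + x) elementwise; defined at x = 0, unlike a plain log."""
    return [math.log1p(v) for v in values]

transformed = log1p_all(votes_dem_summary)

# The raw values span more than five orders of magnitude; the transformed
# values fall in a much narrower range. math.expm1 inverts the transform.
for raw, t in zip(votes_dem_summary, transformed):
    print(f"{raw:>9,} -> {t:6.3f}")
```

Because `math.expm1` undoes `math.log1p`, predictions made on the transformed scale can be mapped back to vote counts.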
Out[16]:

saturn.columns["votes_dem"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 2,688
min 4
max 1.894e+06
mean 2.073e+04
median 3,194
std 7.2e+04
q1 1,175
q3 10,047
iqr 8,872
skew 11.65
kurtosis 224.4
n_outliers 468
outlier_rate 0.149
zero_rate 0
alert: high_skew · skew=+11.65
alert: outliers · 14.9% rows beyond 1.5 IQR
Fig 9.
Distribution of votes_dem. Vertical dash marks the median.
Histogram bins for votes_dem (median: 3194.0).
bin count
4 – 4.735e+04 2844
4.735e+04 – 9.469e+04 145
9.469e+04 – 1.42e+05 52
1.42e+05 – 1.894e+05 28
1.894e+05 – 2.367e+05 19
2.367e+05 – 2.841e+05 11
2.841e+05 – 3.314e+05 15
3.314e+05 – 3.788e+05 6
3.788e+05 – 4.261e+05 2
4.261e+05 – 4.734e+05 3
4.734e+05 – 5.208e+05 5
5.208e+05 – 5.681e+05 5
5.681e+05 – 6.155e+05 1
6.155e+05 – 6.628e+05 2
6.628e+05 – 7.102e+05 1
7.102e+05 – 7.575e+05 0
7.575e+05 – 8.049e+05 0
8.049e+05 – 8.522e+05 0
8.522e+05 – 8.995e+05 0
8.995e+05 – 9.469e+05 0
9.469e+05 – 9.942e+05 0
9.942e+05 – 1.042e+06 0
1.042e+06 – 1.089e+06 0
1.089e+06 – 1.136e+06 0
1.136e+06 – 1.184e+06 0
1.184e+06 – 1.231e+06 0
1.231e+06 – 1.278e+06 0
1.278e+06 – 1.326e+06 0
1.326e+06 – 1.373e+06 0
1.373e+06 – 1.42e+06 0
1.42e+06 – 1.468e+06 0
1.468e+06 – 1.515e+06 0
1.515e+06 – 1.562e+06 1
1.562e+06 – 1.61e+06 0
1.61e+06 – 1.657e+06 0
1.657e+06 – 1.704e+06 0
1.704e+06 – 1.752e+06 0
1.752e+06 – 1.799e+06 0
1.799e+06 – 1.846e+06 0
1.846e+06 – 1.894e+06 1

votes_gop numeric feature

This column records the raw count of Republican (GOP) votes per geographic unit, almost certainly at the U.S. county level given n=3141 (matching the ~3,143 U.S. counties). The distribution is extremely right-skewed (skew=5.78, kurtosis=51.78): the median is only 7,268 yet the mean is 20,645 and the max reaches 620,285, reflecting the massive population disparity between rural and urban/suburban counties. A notable 12.5% of rows (394) are flagged as outliers, corresponding to the largest-population counties that dwarf the typical small rural county.

Treatment: Log-transform (log1p) before any regression or distance-based modelling to reduce skew; consider per-capita or vote-share normalisation if comparing across counties.

anthropic:default · confidence high
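
The vote-share normalisation mentioned in the treatment amounts to a guarded division. The sketch below uses a hypothetical helper, `vote_share`, and illustrative numbers that are not taken from any row of the file. Whether per_gop was computed exactly this way is an assumption that would need checking against the raw rows.

```python
def vote_share(party_votes, total_votes):
    """Return party_votes / total_votes, or None when no votes were cast."""
    if total_votes <= 0:
        return None
    return party_votes / total_votes

# Hypothetical county; the numbers are illustrative only.
share = vote_share(7268, 11144)
print(round(share, 4))
```

Returning `None` for an empty denominator keeps a missing share distinct from a genuine 0% share.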
Out[19]:

saturn.columns["votes_gop"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 2,901
min 57
max 620,285
mean 2.065e+04
median 7,268
std 4.163e+04
q1 3,241
q3 18,130
iqr 14,889
skew 5.78
kurtosis 51.78
n_outliers 394
outlier_rate 0.1254
zero_rate 0
alert: high_skew · skew=+5.78
alert: outliers · 12.5% rows beyond 1.5 IQR
Fig 10.
Distribution of votes_gop. Vertical dash marks the median.
Histogram bins for votes_gop (median: 7268.0).
bin count
57 – 1.556e+04 2260
1.556e+04 – 3.107e+04 381
3.107e+04 – 4.657e+04 153
4.657e+04 – 6.208e+04 100
6.208e+04 – 7.759e+04 54
7.759e+04 – 9.309e+04 32
9.309e+04 – 1.086e+05 27
1.086e+05 – 1.241e+05 23
1.241e+05 – 1.396e+05 47
1.396e+05 – 1.551e+05 13
1.551e+05 – 1.706e+05 14
1.706e+05 – 1.861e+05 4
1.861e+05 – 2.016e+05 8
2.016e+05 – 2.171e+05 2
2.171e+05 – 2.326e+05 2
2.326e+05 – 2.481e+05 2
2.481e+05 – 2.637e+05 4
2.637e+05 – 2.792e+05 3
2.792e+05 – 2.947e+05 1
2.947e+05 – 3.102e+05 1
3.102e+05 – 3.257e+05 1
3.257e+05 – 3.412e+05 2
3.412e+05 – 3.567e+05 1
3.567e+05 – 3.722e+05 0
3.722e+05 – 3.877e+05 1
3.877e+05 – 4.032e+05 0
4.032e+05 – 4.187e+05 0
4.187e+05 – 4.342e+05 0
4.342e+05 – 4.497e+05 1
4.497e+05 – 4.652e+05 0
4.652e+05 – 4.807e+05 1
4.807e+05 – 4.962e+05 0
4.962e+05 – 5.117e+05 0
5.117e+05 – 5.273e+05 0
5.273e+05 – 5.428e+05 0
5.428e+05 – 5.583e+05 1
5.583e+05 – 5.738e+05 0
5.738e+05 – 5.893e+05 0
5.893e+05 – 6.048e+05 1
6.048e+05 – 6.203e+05 1

total_votes numeric feature

This column holds the total votes cast in each county, with values ranging from 64 to 2,652,072. The distribution is severely right-skewed (skew = 8.89, kurtosis = 136.17): the median is only 11,144 while the mean is 43,637, indicating a long upper tail. 442 rows (14.1%) are flagged as outliers; since Q3 is 29,799 and the IQR is 24,929, these lie above the upper fence of Q3 + 1.5 × IQR ≈ 67,193. The spread (std = 114,568) is more than 2.5× the mean, confirming that a small number of counties account for disproportionately large vote counts.

Treatment: Log-transform before modelling to compress the extreme right tail and reduce outlier leverage.

anthropic:default · confidence high
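
The "rows beyond 1.5 IQR" alert is read here as Tukey's fences, which is an assumption about how Saturn flags outliers. Under that reading, the thresholds for total_votes follow directly from the reported quartiles; `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for total_votes in the stats table.
lower, upper = tukey_fences(4870, 29799)
print(lower, upper)  # -32523.5 67192.5
```

Since vote counts cannot be negative, no row can fall below the lower fence, so every flagged row would sit above roughly 67,193 votes.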
Out[22]:

saturn.columns["total_votes"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 2,966
min 64
max 2.652e+06
mean 4.364e+04
median 11,144
std 1.146e+05
q1 4,870
q3 29,799
iqr 24,929
skew 8.894
kurtosis 136.2
n_outliers 442
outlier_rate 0.1407
zero_rate 0
alert: high_skew · skew=+8.89
alert: outliers · 14.1% rows beyond 1.5 IQR
Fig 11.
Distribution of total_votes. Vertical dash marks the median.

per_dem numeric numeric_target

This column almost certainly represents the Democratic vote share (as a proportion) at the county level: the 3,141 rows are close to the roughly 3,143 U.S. counties and county-equivalents, and values are bounded between 0.031 and 0.928 with a mean of 0.318 and median of 0.286, consistent with Democratic vote shares falling below 50% in most counties. The positive skew (0.942) reflects a long right tail of heavily Democratic urban counties pulling the mean above the median, while the bulk of counties are Republican-leaning. Near-uniqueness (3,112 of 3,141 values distinct) and a zero null rate point to clean, continuous proportional data with no structural issues.

Treatment: Use directly as a regression target or feature; consider logit-transform (log-odds) to map the bounded [0,1] proportion to an unbounded scale before modelling.

anthropic:default · confidence high
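
A minimal sketch of the logit (log-odds) transform suggested above, using only the standard library. The clipping constant `eps` and the function names are hypothetical choices for illustration, not part of Saturn's output.

```python
import math

def logit(p, eps=1e-6):
    """Map a proportion in [0, 1] to log-odds, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map log-odds back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Reported per_dem min, median, and max.
for p in (0.03145, 0.2864, 0.9285):
    print(p, round(logit(p), 3))
```

A share of 0.5 maps to 0, shares below 0.5 map to negative values, and `inv_logit` brings model outputs back to the [0, 1] scale.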
Out[25]:

saturn.columns["per_dem"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 3,112
min 0.03145
max 0.9285
mean 0.3176
median 0.2864
std 0.153
q1 0.2054
q3 0.3982
iqr 0.1929
skew 0.9422
kurtosis 0.6859
n_outliers 76
outlier_rate 0.0242
zero_rate 0
Fig 12.
Distribution of per_dem. Vertical dash marks the median.

per_gop numeric numeric_target

This column represents the Republican (GOP) vote share as a proportion (0–1 scale), almost certainly at the U.S. county level given n=3,141, which closely matches the total number of U.S. counties. The distribution is left-skewed (skew = -0.82) with a median of 0.665, indicating most counties lean Republican, a well-known feature of county-level electoral geography where rural counties are numerous and heavily GOP. The range (0.041 to 0.953) is plausible for partisan vote shares. Assuming the 1.5 × IQR rule used in the outlier alerts, the upper fence (0.7503 + 1.5 × 0.2045 ≈ 1.06) exceeds the maximum, so all 63 outliers (2%) fall below the lower fence of about 0.24 and correspond to strongly Democratic counties rather than to both extremes.

Treatment: Use directly as a regression target or feature; consider logit-transforming the proportion to unbound it from [0,1] for linear models.

anthropic:default · confidence high
Out[28]:

saturn.columns["per_gop"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 3,112
min 0.04122
max 0.9527
mean 0.6351
median 0.6654
std 0.1561
q1 0.5458
q3 0.7503
iqr 0.2045
skew -0.8193
kurtosis 0.376
n_outliers 63
outlier_rate 0.02006
zero_rate 0
Fig 13.
Distribution of per_gop. Vertical dash marks the median.

diff text feature

This column contains formatted integers (comma thousands separators) stored as text, representing some kind of difference, most likely a vote-count differential. Despite the text classification, all 3,141 values are single tokens with a mean length of 4.9 characters; the 99.2% 'all-caps' rate is a quirk of how the profiler scores digit strings. The most frequent value, '37,410', appears 29 times, roughly 7× more often than any other value. That count equals the 29 bare 'Alaska' entries reported for county_name, so these rows may share one repeated statewide figure; this is a hypothesis to check against the raw rows, alongside data-entry repetition or a sentinel/default value.

Treatment: Strip commas, cast to integer, investigate the 29 occurrences of '37,410' as a potential sentinel before modelling.

anthropic:default · confidence medium
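
The cast described in the treatment can be sketched as follows. `parse_count` is a hypothetical helper, and apart from '37,410' the sample strings are made up to show how a repeated value would surface.

```python
from collections import Counter

def parse_count(text):
    """Convert a string such as '37,410' to the integer 37410."""
    return int(text.replace(",", "").strip())

# Hypothetical sample; only '37,410' is a value named in the profile.
sample = ["37,410", "1,205", "37,410", "88"]
counts = Counter(parse_count(s) for s in sample)

print(parse_count("37,410"))   # 37410
print(counts.most_common(1))   # [(37410, 2)]
```

Counting the parsed values is a quick way to list which rows share the suspicious figure before deciding whether it is a sentinel.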
Out[31]:

saturn.columns["diff"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 2,738
len_min 1
len_max 9
len_mean 4.935
len_median 5
len_p95 6
word_mean 1
word_median 1
n_empty 0
n_duplicates 403
duplicate_rate 0.1283
vocab_size 2,738
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 0.9924
boilerplate_rate 0
alert: one_word · 100.0% rows are a single word
alert: allcaps · 99.2% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
Fig 14.
Character-length distribution for diff.
Character-length distribution for diff (mean: 4.935370900986947).
chars count
1 – 1 2
2 – 2 22
3 – 3 440
5 – 5 1978
6 – 6 651
7 – 7 46
9 – 9 2
Bins with a count of 0 are omitted; bin edges appear rounded to whole characters.

per_point_diff text feature

This column stores percentage values representing a per-point differential (likely a margin or rate metric), encoded as strings with a '%' suffix rather than as numeric floats; all 3,141 values are single tokens 5 or 6 characters long. The allcaps_rate of 1.0 is a classifier artifact of strings that contain no lowercase letters, not actual uppercase text. 18.7% of rows (586) are duplicates. Some repetition is expected when 3,141 values are rounded to two decimals, but '15.17%' alone appears 31 times, which points to grouped records sharing the same differential, possibly the same group of rows behind the 29 repeats of '37,410' in diff. The column should be numeric but was ingested as text.

Treatment: Strip '%' suffix and cast to float before modelling; investigate the 31 occurrences of '15.17%' for data quality or grouping issues.

anthropic:default · confidence high
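
A minimal sketch of the suffix-stripping cast, with a hypothetical helper name `parse_percent`. It returns percentage points, so dividing by 100 would be needed to compare against the 0–1 scale used by per_dem and per_gop.

```python
def parse_percent(text):
    """Convert a string such as '15.17%' to the float 15.17 (percentage points)."""
    return float(text.strip().rstrip("%"))

print(parse_percent("15.17%"))  # 15.17
```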
Out[34]:

saturn.columns["per_point_diff"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 2,555
len_min 5
len_max 6
len_mean 5.896
len_median 6
len_p95 6
word_mean 1
word_median 1
n_empty 0
n_duplicates 586
duplicate_rate 0.1866
vocab_size 2,555
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: one_word · 100.0% rows are a single word
alert: allcaps · 100.0% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
Fig 15.
Character-length distribution for per_point_diff.
Character-length distribution for per_point_diff (mean: 5.895893027698185).
chars count
5 – 5 327
6 – 6 2814
Bins with a count of 0 are omitted; bin edges appear rounded to whole characters.

state_abbr categorical label

This column contains US state abbreviations covering all 51 values (50 states + DC), with zero nulls across 3,141 rows — consistent with a county-level dataset where n≈3,141 matches the known US county count. TX dominates with 254 rows (8.09% of records), aligning exactly with Texas's 254 counties, confirming county-level granularity. The entropy ratio of 0.93 indicates near-uniform distribution across states, which is expected given that state representation is proportional to county count rather than population.

Treatment: Use as a grouping/aggregation key for state-level rollups; one-hot encode or target-encode if used as a model feature.

anthropic:default · confidence high
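
The entropy ratio can be reproduced under one assumption: that Saturn defines it as Shannon entropy in bits divided by log2 of the number of distinct values. The reported figures are consistent with that reading. The function name `entropy_ratio` and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy_ratio(labels):
    """Shannon entropy in bits divided by log2(number of distinct labels)."""
    counts = Counter(labels)
    n = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(len(counts))

# Hypothetical toy labels, not the real column.
print(entropy_ratio(["TX", "TX", "GA", "VA"]))

# Consistency check against the reported stats: 5.275 bits over 51 categories.
print(round(5.275 / math.log2(51), 4))  # 0.9299
```

A ratio of 1.0 would mean every state contributes the same number of rows; 0.93 reflects the spread between Texas (254 rows) and the smallest states.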
Out[37]:

saturn.columns["state_abbr"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 51
top_value TX
top_rate 0.08087
cardinality 51
entropy 5.275
entropy_ratio 0.9299
Fig 16.
Top values for state_abbr.

county_name text label

This column contains US county (and equivalent) names across all 3,141 rows with zero nulls, giving a complete or near-complete national roster. The 41.2% duplicate rate (1,293 duplicates across 1,848 unique values) is expected and not anomalous: common names like 'Washington County' appear 30 times and 'Jefferson County' 25 times because the same county name exists in multiple states. Notably, 'Alaska' appears 29 times as a bare state name rather than a borough/census area name, which may signal inconsistent formatting for Alaska's county-equivalents. The words 'parish' (64 occurrences) and 'city' (43 occurrences) confirm that Louisiana parishes and independent cities are included alongside standard counties.

Treatment: Use as a grouping/join key only when paired with state_abbr. Because the 29 'Alaska' entries would likely still collide under that pair, prefer combined_fips as the unique key, and investigate and standardize the 'Alaska' entries before aggregation.

anthropic:default · confidence high
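
The uniqueness concern can be illustrated with a small check. The four rows below are hypothetical (state, name) pairs chosen to mimic the repeated names mentioned above, and `duplicated_keys` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical rows as (state_abbr, county_name); names repeat across states.
rows = [
    ("AL", "Washington County"),
    ("GA", "Washington County"),
    ("GA", "Jefferson County"),
    ("KY", "Jefferson County"),
]

def duplicated_keys(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, c in Counter(keys).items() if c > 1)

print(duplicated_keys(name for _, name in rows))  # names alone collide
print(duplicated_keys(rows))                      # (state, name) pairs do not
```

Running the same check on the real file would show whether any (state_abbr, county_name) pairs still repeat, which is the case to expect if the 'Alaska' entries all belong to one state.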
Out[40]:

saturn.columns["county_name"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 1,848
len_min 6
len_max 27
len_mean 13.87
len_median 14
len_p95 17
word_mean 2.054
word_median 2
n_empty 0
n_duplicates 1,293
duplicate_rate 0.4117
vocab_size 1,840
readability_flesch_mean 38.38
emoji_rate 0
url_rate 0
one_word_rate 0.009233
allcaps_rate 0
boilerplate_rate 0
alert: short_text · 95th-percentile length under 20 chars
alert: duplicates · 41.2% duplicate strings
Fig 17.
Character-length distribution for county_name.

combined_fips numeric identifier

This column contains US county-level FIPS codes, 5-digit numeric identifiers where the first 2 digits encode the state and the last 3 encode the county. The column is perfectly unique across all 3,141 rows with zero nulls, in line with the roughly 3,143 US counties and county-equivalents, which suggests a complete or near-complete national county dataset. The near-zero skew (−0.08) and platykurtic distribution (kurtosis −1.10) indicate values are spread broadly and fairly uniformly across the numeric range, which is expected since FIPS codes are administratively assigned rather than naturally distributed. Because the values are stored as numbers, leading zeros are lost (the minimum appears as 1,001 rather than 01001). FIPS codes are identifiers and must not be treated as continuous values.

Treatment: Cast to zero-padded 5-character string and use as a join key; never use as a numeric feature.

anthropic:default · confidence high
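
The zero-padding step can be sketched as below. Both helper names are hypothetical; the two example codes are the reported min and max of the column.

```python
def fips_to_str(code):
    """Render a numeric county FIPS code as a zero-padded 5-character string."""
    return str(int(code)).zfill(5)

def split_fips(fips):
    """Split a 5-character FIPS string into (state, county) parts."""
    return fips[:2], fips[2:]

print(fips_to_str(1001))               # 01001
print(split_fips(fips_to_str(56045)))  # ('56', '045')
```

Keeping the key as a string restores the leading zero for states with codes below 10 and prevents accidental arithmetic on the identifier.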
Out[43]:

saturn.columns["combined_fips"].stats

stat value
n 3,141
nulls 0 (0.0%)
unique 3,141
min 1,001
max 56,045
mean 3.039e+04
median 29,177
std 1.516e+04
q1 18,179
q3 45,081
iqr 26,902
skew -0.08027
kurtosis -1.098
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 18.
Distribution of combined_fips. Vertical dash marks the median.
Histogram bins for combined_fips (median: 29177.0).
bin count
1001 – 2377 96
2377 – 3753 0
3753 – 5129 80
5129 – 6505 68
6505 – 7882 0
7882 – 9258 72
9258 – 1.063e+04 3
1.063e+04 – 1.201e+04 6
1.201e+04 – 1.339e+04 221
1.339e+04 – 1.476e+04 0
1.476e+04 – 1.614e+04 48
1.614e+04 – 1.751e+04 102
1.751e+04 – 1.889e+04 92
1.889e+04 – 2.027e+04 204
2.027e+04 – 2.164e+04 120
2.164e+04 – 2.302e+04 73
2.302e+04 – 2.439e+04 30
2.439e+04 – 2.577e+04 15
2.577e+04 – 2.715e+04 156
2.715e+04 – 2.852e+04 96
2.852e+04 – 2.99e+04 115
2.99e+04 – 3.128e+04 149
3.128e+04 – 3.265e+04 17
3.265e+04 – 3.403e+04 24
3.403e+04 – 3.54e+04 40
3.54e+04 – 3.678e+04 62
3.678e+04 – 3.816e+04 153
3.816e+04 – 3.953e+04 88
3.953e+04 – 4.091e+04 77
4.091e+04 – 4.228e+04 103
4.228e+04 – 4.366e+04 0
4.366e+04 – 4.504e+04 23
4.504e+04 – 4.641e+04 94
4.641e+04 – 4.779e+04 95
4.779e+04 – 4.916e+04 283
4.916e+04 – 5.054e+04 14
5.054e+04 – 5.192e+04 133
5.192e+04 – 5.329e+04 39
5.329e+04 – 5.467e+04 55
5.467e+04 – 5.604e+04 95

How to cite


BibTeX
@misc{saturn-data-trove-us-presidential-election-results-by-county-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove us presidential election results by county},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-us-presidential-election-results-by-county}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove us presidential election results by county. Source: /home/coolhand/html/datavis/data_trove/geographic/election/2016_election.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-us-presidential-election-results-by-county