Data trove: US presidential election results by county
Reading
This dataset captures 2016 US presidential election results at the county level, covering 3,141 counties and county-equivalents across 51 state abbreviations (50 states plus DC). The most striking pattern is the strong Republican lean of the median county: the median GOP vote share is 66.5% versus 28.6% for Democrats. Total votes are heavily right-skewed: a small number of large urban counties (max 2.65 million votes) dominate raw vote totals, while most counties are small (median 11,144 votes). The per_point_diff column ranges widely (e.g., 63% margins appear among the top values), suggesting many counties were not competitive at all. Texas contributes the most rows, with 254 counties, so state-level aggregation is worth examining to see which states drive the most records and vote volume.
citing: per_gop.stats.median · per_dem.stats.median · total_votes.stats.max · total_votes.stats.median · state_abbr.stats.top_value · row_count · per_gop.stats.skew · votes_dem.stats.n_outliers
Charts the summary said to look at first
Data table: per_gop distribution (histogram bins)
| bin | count |
|---|---|
| 0.04122 – 0.06401 | 1 |
| 0.06401 – 0.0868 | 2 |
| 0.0868 – 0.1096 | 5 |
| 0.1096 – 0.1324 | 2 |
| 0.1324 – 0.1552 | 6 |
| 0.1552 – 0.1779 | 11 |
| 0.1779 – 0.2007 | 7 |
| 0.2007 – 0.2235 | 17 |
| 0.2235 – 0.2463 | 17 |
| 0.2463 – 0.2691 | 17 |
| 0.2691 – 0.2919 | 23 |
| 0.2919 – 0.3147 | 32 |
| 0.3147 – 0.3375 | 34 |
| 0.3375 – 0.3602 | 30 |
| 0.3602 – 0.383 | 43 |
| 0.383 – 0.4058 | 41 |
| 0.4058 – 0.4286 | 64 |
| 0.4286 – 0.4514 | 71 |
| 0.4514 – 0.4742 | 63 |
| 0.4742 – 0.497 | 89 |
| 0.497 – 0.5198 | 78 |
| 0.5198 – 0.5425 | 117 |
| 0.5425 – 0.5653 | 116 |
| 0.5653 – 0.5881 | 147 |
| 0.5881 – 0.6109 | 147 |
| 0.6109 – 0.6337 | 156 |
| 0.6337 – 0.6565 | 165 |
| 0.6565 – 0.6793 | 193 |
| 0.6793 – 0.7021 | 190 |
| 0.7021 – 0.7249 | 215 |
| 0.7249 – 0.7476 | 223 |
| 0.7476 – 0.7704 | 213 |
| 0.7704 – 0.7932 | 187 |
| 0.7932 – 0.816 | 142 |
| 0.816 – 0.8388 | 113 |
| 0.8388 – 0.8616 | 74 |
| 0.8616 – 0.8844 | 49 |
| 0.8844 – 0.9072 | 29 |
| 0.9072 – 0.9299 | 9 |
| 0.9299 – 0.9527 | 3 |
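The bin edges in the table above look like equal-width bins between the column's min and max. The sketch below reproduces the edges under that assumption; the bin count of 40 is inferred from the number of table rows, not stated by the report, and `equal_width_edges` is a hypothetical helper name.

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 equally spaced bin edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# 0.04122 and 0.9527 are the first and last edges shown in the table;
# 40 bins is an assumption based on the table having 40 rows.
edges = equal_width_edges(0.04122, 0.9527, 40)

# Upper edge of the first bin, for comparison with the table's first row.
first_upper = round(edges[1], 5)
print(first_upper)
```

If the computed edges match the table, the counts can be read as a plain fixed-width histogram, so bin heights are directly comparable across the range.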
Data table: per_dem distribution (histogram bins)
| bin | count |
|---|---|
| 0.03145 – 0.05387 | 8 |
| 0.05387 – 0.0763 | 16 |
| 0.0763 – 0.09872 | 52 |
| 0.09872 – 0.1211 | 74 |
| 0.1211 – 0.1436 | 116 |
| 0.1436 – 0.166 | 146 |
| 0.166 – 0.1884 | 203 |
| 0.1884 – 0.2109 | 226 |
| 0.2109 – 0.2333 | 240 |
| 0.2333 – 0.2557 | 218 |
| 0.2557 – 0.2781 | 200 |
| 0.2781 – 0.3006 | 205 |
| 0.3006 – 0.323 | 153 |
| 0.323 – 0.3454 | 147 |
| 0.3454 – 0.3678 | 153 |
| 0.3678 – 0.3903 | 152 |
| 0.3903 – 0.4127 | 106 |
| 0.4127 – 0.4351 | 111 |
| 0.4351 – 0.4575 | 77 |
| 0.4575 – 0.48 | 78 |
| 0.48 – 0.5024 | 56 |
| 0.5024 – 0.5248 | 72 |
| 0.5248 – 0.5472 | 45 |
| 0.5472 – 0.5697 | 42 |
| 0.5697 – 0.5921 | 36 |
| 0.5921 – 0.6145 | 36 |
| 0.6145 – 0.6369 | 34 |
| 0.6369 – 0.6594 | 25 |
| 0.6594 – 0.6818 | 30 |
| 0.6818 – 0.7042 | 16 |
| 0.7042 – 0.7266 | 12 |
| 0.7266 – 0.7491 | 12 |
| 0.7491 – 0.7715 | 14 |
| 0.7715 – 0.7939 | 9 |
| 0.7939 – 0.8163 | 6 |
| 0.8163 – 0.8388 | 4 |
| 0.8388 – 0.8612 | 4 |
| 0.8612 – 0.8836 | 4 |
| 0.8836 – 0.906 | 2 |
| 0.906 – 0.9285 | 1 |
Data table: state_abbr, top 20 values by row count
| value | count | share |
|---|---|---|
| TX | 254 | 8.1% |
| GA | 159 | 5.1% |
| VA | 133 | 4.2% |
| KY | 120 | 3.8% |
| MO | 115 | 3.7% |
| KS | 105 | 3.3% |
| IL | 102 | 3.2% |
| NC | 100 | 3.2% |
| IA | 99 | 3.2% |
| TN | 95 | 3.0% |
| NE | 93 | 3.0% |
| IN | 92 | 2.9% |
| OH | 88 | 2.8% |
| MN | 87 | 2.8% |
| MI | 83 | 2.6% |
| MS | 82 | 2.6% |
| OK | 77 | 2.5% |
| AR | 75 | 2.4% |
| WI | 72 | 2.3% |
| AL | 67 | 2.1% |
Data table: total_votes distribution (histogram bins)
| bin | count |
|---|---|
| 64 – 6.636e+04 | 2699 |
| 6.636e+04 – 1.327e+05 | 192 |
| 1.327e+05 – 1.99e+05 | 77 |
| 1.99e+05 – 2.653e+05 | 70 |
| 2.653e+05 – 3.316e+05 | 33 |
| 3.316e+05 – 3.979e+05 | 20 |
| 3.979e+05 – 4.642e+05 | 12 |
| 4.642e+05 – 5.305e+05 | 4 |
| 5.305e+05 – 5.968e+05 | 7 |
| 5.968e+05 – 6.631e+05 | 9 |
| 6.631e+05 – 7.294e+05 | 4 |
| 7.294e+05 – 7.957e+05 | 5 |
| 7.957e+05 – 8.62e+05 | 1 |
| 8.62e+05 – 9.283e+05 | 1 |
| 9.283e+05 – 9.946e+05 | 1 |
| 9.946e+05 – 1.061e+06 | 1 |
| 1.061e+06 – 1.127e+06 | 1 |
| 1.127e+06 – 1.193e+06 | 0 |
| 1.193e+06 – 1.26e+06 | 1 |
| 1.26e+06 – 1.326e+06 | 1 |
| 1.326e+06 – 1.392e+06 | 0 |
| 1.392e+06 – 1.459e+06 | 0 |
| 1.459e+06 – 1.525e+06 | 0 |
| 1.525e+06 – 1.591e+06 | 0 |
| 1.591e+06 – 1.658e+06 | 0 |
| 1.658e+06 – 1.724e+06 | 0 |
| 1.724e+06 – 1.79e+06 | 0 |
| 1.79e+06 – 1.856e+06 | 0 |
| 1.856e+06 – 1.923e+06 | 0 |
| 1.923e+06 – 1.989e+06 | 0 |
| 1.989e+06 – 2.055e+06 | 1 |
| 2.055e+06 – 2.122e+06 | 0 |
| 2.122e+06 – 2.188e+06 | 0 |
| 2.188e+06 – 2.254e+06 | 0 |
| 2.254e+06 – 2.321e+06 | 0 |
| 2.321e+06 – 2.387e+06 | 0 |
| 2.387e+06 – 2.453e+06 | 0 |
| 2.453e+06 – 2.519e+06 | 0 |
| 2.519e+06 – 2.586e+06 | 0 |
| 2.586e+06 – 2.652e+06 | 1 |
Data table: county_name length in characters (histogram bins)
| chars | count |
|---|---|
| 6 – 7 | 29 |
| 7 – 7 | 0 |
| 7 – 8 | 0 |
| 8 – 8 | 0 |
| 8 – 9 | 0 |
| 9 – 9 | 0 |
| 9 – 10 | 0 |
| 10 – 10 | 29 |
| 10 – 11 | 0 |
| 11 – 11 | 255 |
| 11 – 12 | 0 |
| 12 – 12 | 465 |
| 12 – 13 | 0 |
| 13 – 13 | 683 |
| 13 – 14 | 0 |
| 14 – 14 | 585 |
| 14 – 15 | 0 |
| 15 – 15 | 485 |
| 15 – 16 | 0 |
| 16 – 16 | 280 |
| 16 – 17 | 202 |
| 17 – 18 | 0 |
| 18 – 18 | 52 |
| 18 – 19 | 0 |
| 19 – 19 | 40 |
| 19 – 20 | 0 |
| 20 – 20 | 15 |
| 20 – 21 | 0 |
| 21 – 21 | 11 |
| 21 – 22 | 0 |
| 22 – 22 | 6 |
| 22 – 23 | 0 |
| 23 – 23 | 2 |
| 23 – 24 | 0 |
| 24 – 24 | 1 |
| 24 – 25 | 0 |
| 25 – 25 | 0 |
| 25 – 26 | 0 |
| 26 – 26 | 0 |
| 26 – 27 | 1 |
Schema
11 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| (unnamed) | numeric | 0.0% | 3,141 | |
| votes_dem | numeric | 0.0% | 2,688 | high_skew, outliers |
| votes_gop | numeric | 0.0% | 2,901 | high_skew, outliers |
| total_votes | numeric | 0.0% | 2,966 | high_skew, outliers |
| per_dem | numeric | 0.0% | 3,112 | |
| per_gop | numeric | 0.0% | 3,112 | |
| diff | text | 0.0% | 2,738 | one_word, allcaps, short_text |
| per_point_diff | text | 0.0% | 2,555 | one_word, allcaps, short_text |
| state_abbr | categorical | 0.0% | 51 | |
| county_name | text | 0.0% | 1,848 | short_text, duplicates |
| combined_fips | numeric | 0.0% | 3,141 | |
This column is almost certainly a row index or sequential integer ID, running from 0 to 3140 with every value unique and no nulls. The distribution is perfectly uniform: mean equals median at 1570.0, skew is exactly 0.0, kurtosis is –1.2 (consistent with a flat/uniform distribution), and there are zero outliers. The single surprising note is that zero_rate is non-zero (one zero present), which is simply the first index value (0) rather than a missing-data signal. Treatment: Drop before modelling; if row order matters, retain as an explicit sort key only.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,141
- min: 0
- max: 3,140
- mean: 1,570
- median: 1,570
- std: 906.9
- q1: 785
- q3: 2,355
- iqr: 1,570
- skew: 0
- kurtosis: -1.2
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.0003184
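The row-index reading can be checked directly rather than inferred from summary statistics. The sketch below tests whether a column is exactly 0 to n-1, and computes the excess kurtosis a discrete uniform distribution would have, for comparison with the reported -1.2. Both function names are hypothetical.

```python
def is_sequential_index(values: list[int]) -> bool:
    """True when values are exactly 0, 1, ..., n-1 in some order."""
    return sorted(values) == list(range(len(values)))


def uniform_excess_kurtosis(n: int) -> float:
    """Excess kurtosis of a discrete uniform distribution on n consecutive integers."""
    return -6 * (n * n + 1) / (5 * (n * n - 1))


# A stand-in for the unnamed column, built from the reported min, max and n.
candidate = list(range(3141))
looks_like_index = is_sequential_index(candidate)
expected_kurtosis = round(uniform_excess_kurtosis(3141), 2)
```

A column that passes this check carries no information beyond row order, which supports dropping it before modelling.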
votes_dem
numeric · feature · high_skew · outliers
This column holds raw Democratic vote counts per county (n = 3,141, matching the number of U.S. counties). The distribution is extremely right-skewed (skew = 11.65, kurtosis = 224.36): the median is only 3,194, but the mean is 20,734 and the max reaches 1,893,770, reflecting the enormous disparity between rural and urban counties. Nearly 15% of rows (468) are flagged as outliers, driven by large metropolitan counties. The min of 4 votes is plausible for the least-populated counties. Treatment: Log-transform (log1p) before regression or clustering to reduce skew; consider deriving vote share alongside the raw count.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,688
- min: 4
- max: 1.894e+06
- mean: 2.073e+04
- median: 3,194
- std: 7.2e+04
- q1: 1,175
- q3: 10,047
- iqr: 8,872
- skew: 11.65
- kurtosis: 224.4
- n_outliers: 468
- outlier_rate: 0.149
- zero_rate: 0
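To see what the suggested log1p transform does to this column, the sketch below applies it to the five summary values reported above and compares how far the max sits from the median before and after. It uses only the standard library and only summary statistics, not the full column.

```python
import math

# Summary values for votes_dem as reported in the profile.
summary = {"min": 4, "q1": 1175, "median": 3194, "q3": 10047, "max": 1893770}

# log1p(x) = log(1 + x); it is defined at zero and reversible with expm1.
logged = {name: math.log1p(value) for name, value in summary.items()}

# How many times larger the max is than the median, on each scale.
raw_ratio = summary["max"] / summary["median"]
log_ratio = logged["max"] / logged["median"]

print(f"raw max/median: {raw_ratio:.1f}")
print(f"log max/median: {log_ratio:.2f}")
```

On the raw scale the largest county is several hundred times the median; on the log scale the two are within a factor of two, which is why the transform reduces the leverage of metropolitan counties.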
votes_gop
numeric · feature · high_skew · outliers
This column records the raw count of Republican (GOP) votes per geographic unit, almost certainly the U.S. county given n = 3,141 (close to the roughly 3,143 U.S. counties and county-equivalents). The distribution is extremely right-skewed (skew = 5.78, kurtosis = 51.78): the median is only 7,268, yet the mean is 20,645 and the max reaches 620,285, reflecting the large population disparity between rural and urban or suburban counties. A notable 12.5% of rows (394) are flagged as outliers, corresponding to the largest-population counties that dwarf the typical small rural county. Treatment: Log-transform (log1p) before any regression or distance-based modelling to reduce skew; consider per-capita or vote-share normalisation when comparing across counties.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,901
- min: 57
- max: 620,285
- mean: 2.065e+04
- median: 7,268
- std: 4.163e+04
- q1: 3,241
- q3: 18,130
- iqr: 14,889
- skew: 5.78
- kurtosis: 51.78
- n_outliers: 394
- outlier_rate: 0.1254
- zero_rate: 0
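Vote-share normalisation, mentioned in the treatment above, divides a party's count by the total cast in the same county so that small and large counties become comparable. The sketch below is a minimal version; `vote_share` is a hypothetical helper and the two example counties are invented for illustration, not rows from the dataset.

```python
def vote_share(party_votes: int, total_votes: int) -> float:
    """Party votes as a fraction of all votes cast in the same county."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return party_votes / total_votes


# Hypothetical counties given as (votes_gop, total_votes).
small_rural = vote_share(750, 1_000)
large_urban = vote_share(300_000, 1_000_000)

print(small_rural, large_urban)
```

The urban county contributes 400 times more GOP votes in raw terms but has the lower share, which is the distinction the raw count alone hides.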
total_votes
numeric · feature · high_skew · outliers
This column holds the total vote count per row (per county, given the rest of the dataset), with values ranging from 64 to 2,652,072. The distribution is severely right-skewed (skew = 8.89, kurtosis = 136.17): the median is only 11,144 while the mean is 43,637, indicating a long upper tail. The profiler flags 442 rows (14.1%) as outliers, all of which necessarily lie well above the third quartile of 29,799. The spread (std = 114,568) is more than 2.5× the mean, confirming that a small number of counties account for disproportionately large vote counts. Treatment: Log-transform before modelling to compress the extreme right tail and reduce outlier leverage.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,966
- min: 64
- max: 2.652e+06
- mean: 4.364e+04
- median: 11,144
- std: 1.146e+05
- q1: 4,870
- q3: 29,799
- iqr: 24,929
- skew: 8.894
- kurtosis: 136.2
- n_outliers: 442
- outlier_rate: 0.1407
- zero_rate: 0
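The report does not say which rule produced the outlier count. A common convention is Tukey's fences at 1.5 × IQR beyond the quartiles; the sketch below applies that rule to the reported q1 and q3 as an assumption, to show where the cut-off would fall if the profiler used it.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# q1 and q3 for total_votes as reported; k = 1.5 is an assumed convention.
lower, upper = tukey_fences(4870, 29799)

print(lower, upper)  # -32523.5 67192.5
```

Under this assumption any county with more than about 67,000 total votes would be flagged, and the lower fence is negative, so no small county could ever be flagged. Whether this reproduces the reported 442 outliers would need to be checked against the full column.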
per_dem
numeric · numeric_target
This column almost certainly represents the Democratic vote share (as a proportion) at the county level: the 3,141 rows match the number of U.S. counties, and values are bounded between 0.031 and 0.928 with a mean of 0.318 and median of 0.286, consistent with Democratic vote shares falling below 50% in most counties. The positive skew (0.942) reflects a long right tail of heavily Democratic urban counties pulling the mean above the median, while the bulk of counties are Republican-leaning. Near-uniqueness (3,112 of 3,141 values distinct) and a zero null rate indicate clean, continuous proportional data with no structural issues. Treatment: Use directly as a regression target or feature; consider a logit transform (log-odds) to map the bounded [0,1] proportion to an unbounded scale before modelling.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,112
- min: 0.03145
- max: 0.9285
- mean: 0.3176
- median: 0.2864
- std: 0.153
- q1: 0.2054
- q3: 0.3982
- iqr: 0.1929
- skew: 0.9422
- kurtosis: 0.6859
- n_outliers: 76
- outlier_rate: 0.0242
- zero_rate: 0
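The logit transform suggested above maps a proportion p to log(p / (1 - p)). The sketch below is a minimal standard-library version with its inverse; the clipping constant `eps` is an assumption added to guard against proportions of exactly 0 or 1, which do not occur in this column according to the reported min and max.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of a proportion, clipped away from 0 and 1 by eps."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))


# Reported median of per_dem; a share below 0.5 gives negative log-odds.
median_logit = logit(0.2864)
round_trip = inv_logit(median_logit)
```

Predictions made on the logit scale can be converted back with `inv_logit`, which guarantees they stay inside the valid range for a vote share.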
per_gop
numeric · numeric_target
This column represents the Republican (GOP) vote share as a proportion (0–1 scale), almost certainly at the U.S. county level given n = 3,141, which closely matches the total number of U.S. counties. The distribution is left-skewed (skew = -0.82) with a median of 0.665, indicating that most counties lean Republican. This is a well-known feature of county-level electoral geography, where rural counties are numerous and heavily GOP. The range (0.041 to 0.953) is plausible for partisan vote shares, and the 63 outliers (2%) likely correspond to counties at the extremes of the distribution. Treatment: Use directly as a regression target or feature; consider logit-transforming the proportion to unbound it from [0,1] for linear models.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,112
- min: 0.04122
- max: 0.9527
- mean: 0.6351
- median: 0.6654
- std: 0.1561
- q1: 0.5458
- q3: 0.7503
- iqr: 0.2045
- skew: -0.8193
- kurtosis: 0.376
- n_outliers: 63
- outlier_rate: 0.02006
- zero_rate: 0
diff
text · feature · one_word · allcaps · short_text
This column contains formatted numeric values (integers with comma thousands separators) stored as text, representing some kind of difference, most likely a vote-count differential. Despite being classified as text, all 3,141 values are single tokens with a mean length of 4.9 characters and a 99.2% "all-caps" rate (a quirk of how the profiler scores digit strings). The dominant value '37,410' appears 29 times, roughly 7× more often than any other value. That count equals the 29 rows whose county_name is the bare string 'Alaska', which suggests (but does not prove) that one statewide figure is repeated on each Alaska row rather than being a sentinel or data-entry error. Treatment: Strip commas and cast to integer; check whether the 29 occurrences of '37,410' coincide with the Alaska rows before modelling.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,738
- len_min: 1
- len_max: 9
- len_mean: 4.935
- len_median: 5
- len_p95: 6
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 403
- duplicate_rate: 0.1283
- vocab_size: 2,738
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 0.9924
- boilerplate_rate: 0
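The comma-stripping step in the treatment can be sketched in a few lines. `parse_count` is a hypothetical helper, and apart from '37,410' the sample strings are invented to cover the reported length range of 1 to 9 characters.

```python
def parse_count(text: str) -> int:
    """Convert a string such as '37,410' to the integer 37410."""
    return int(text.strip().replace(",", ""))


# '37,410' is the most frequent value in the profile; the rest are hypothetical.
sample = ["37,410", "7", "1,234,567"]
parsed = [parse_count(s) for s in sample]

print(parsed)  # [37410, 7, 1234567]
```

`int()` raises `ValueError` on anything that is not a plain integer after comma removal, which is a useful early signal if the column contains unexpected formats.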
per_point_diff
text · feature · one_word · allcaps · short_text
This column stores percentage values representing a per-point differential (likely a margin), encoded as strings with a '%' suffix rather than as numeric floats. All 3,141 values are single tokens 5 or 6 characters long. The allcaps_rate of 1.0 is a classifier artifact (the strings contain no lowercase letters), not actual uppercase text. Surprisingly, 18.7% of rows (586) are duplicates, with '15.17%' alone appearing 31 times, suggesting repeated measurements or grouped records sharing the same differential. The column should be numeric but was ingested as text. Treatment: Strip the '%' suffix and cast to float before modelling; investigate the 31 occurrences of '15.17%' for data quality or grouping issues.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,555
- len_min: 5
- len_max: 6
- len_mean: 5.896
- len_median: 6
- len_p95: 6
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 586
- duplicate_rate: 0.1866
- vocab_size: 2,555
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
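Both parts of the treatment, casting to float and finding repeated values, can be sketched with the standard library. `parse_percent` is a hypothetical helper; the sample list is invented except for '15.17%', which is the repeated value the profile reports.

```python
from collections import Counter


def parse_percent(text: str) -> float:
    """Convert a string such as '15.17%' to the float 15.17 (percentage points)."""
    return float(text.strip().rstrip("%"))


# Hypothetical sample; only '15.17%' is a value named in the profile.
sample = ["15.17%", "15.17%", "63.00%", "2.50%"]
values = [parse_percent(s) for s in sample]

# Values occurring more than once, most frequent first.
repeats = [(v, c) for v, c in Counter(sample).most_common() if c > 1]

print(values)
print(repeats)
```

Counting on the original strings rather than the parsed floats avoids any floating-point comparison issues when looking for exact repeats.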
state_abbr
categorical · label
This column contains US state abbreviations covering 51 values (50 states plus DC), with zero nulls across 3,141 rows, consistent with a county-level dataset. TX dominates with 254 rows (8.09% of records), which matches Texas's 254 counties and confirms county-level granularity. The entropy ratio of 0.93 indicates that rows are spread fairly evenly across states, which is expected because each state's row count reflects its number of counties rather than its population. Treatment: Use as a grouping/aggregation key for state-level rollups; one-hot encode or target-encode if used as a model feature.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 51
- top_value: TX
- top_rate: 0.08087
- cardinality: 51
- entropy: 5.275
- entropy_ratio: 0.9299
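The entropy ratio is the Shannon entropy of the category frequencies divided by the maximum possible entropy for that many categories. The sketch below assumes base-2 logarithms, which is consistent with the reported figures (5.275 divided by log2 of 51 gives the reported ratio); `entropy_ratio` is a hypothetical helper name.

```python
import math


def entropy_ratio(counts: list[int]) -> float:
    """Shannon entropy of the counts divided by log2 of the number of categories."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(probs))
    # A single category has no spread; report 0.0 instead of dividing by zero.
    return entropy / max_entropy if max_entropy > 0 else 0.0


# Check the reported figures against each other (base 2 is an assumption).
reported_ratio = round(5.275 / math.log2(51), 2)

# Two invented distributions for contrast: perfectly even, and one dominant value.
even = entropy_ratio([10, 10, 10, 10])
lopsided = entropy_ratio([97, 1, 1, 1])
```

A ratio near 1 means no single state dominates the row count, which matters when choosing between one-hot and target encoding.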
county_name
text · label · short_text · duplicates
This column contains the names of US counties and county-equivalents, one per row across all 3,141 rows, with zero nulls. The 41.2% duplicate rate (1,293 duplicates across 1,848 unique values) is expected and not anomalous: common names like 'Washington County' appear 30 times and 'Jefferson County' 25 times because the same county name exists in multiple states. Notably, 'Alaska' appears 29 times as a bare state name rather than a borough or census area name, which may signal inconsistent formatting for Alaska's county-equivalents. The word 'parish' (64 occurrences) and 'city' (43 occurrences) confirm that Louisiana parishes and independent cities are included alongside standard counties. Treatment: Use as a grouping/join key paired with state to ensure uniqueness; investigate and standardize the 29 bare 'Alaska' entries before aggregation.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 1,848
- len_min: 6
- len_max: 27
- len_mean: 13.87
- len_median: 14
- len_p95: 17
- word_mean: 2.054
- word_median: 2
- n_empty: 0
- n_duplicates: 1,293
- duplicate_rate: 0.4117
- vocab_size: 1,840
- readability_flesch_mean: 38.38
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.009233
- allcaps_rate: 0
- boilerplate_rate: 0
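Pairing the county name with the state, as the treatment suggests, can be checked with a simple count of composite keys. The rows below are invented for illustration; only the county names themselves are taken from the profile.

```python
from collections import Counter

# Hypothetical (state_abbr, county_name) rows, not taken from the dataset.
rows = [
    ("GA", "Washington County"),
    ("TX", "Washington County"),
    ("GA", "Jefferson County"),
    ("TX", "Jefferson County"),
]

name_counts = Counter(name for _, name in rows)
key_counts = Counter(rows)

# Names alone collide across states; the (state, name) pair does not.
names_unique = all(count == 1 for count in name_counts.values())
keys_unique = all(count == 1 for count in key_counts.values())

print(names_unique, keys_unique)  # False True
```

Note that this composite key would still collide on the 29 bare 'Alaska' rows if they all share one state abbreviation, so combined_fips is likely the safer join key until those rows are standardized.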
combined_fips
numeric · identifier
This column contains US county-level FIPS codes, 5-digit numeric identifiers where the first 2 digits encode the state and the last 3 encode the county. The column is perfectly unique across all 3,141 rows with zero nulls, matching the canonical count of US counties and county-equivalents and confirming that this is a complete national county dataset. The near-zero skew (-0.08) and platykurtic distribution (kurtosis -1.10) indicate that values are spread broadly and fairly uniformly across the numeric range, which is expected since FIPS codes are administratively assigned rather than naturally distributed. Despite being stored as numeric, FIPS codes are identifiers and must not be treated as continuous values. Because they are stored as numbers, leading zeros are lost (the reported min of 1,001 corresponds to the code 01001). Treatment: Cast to a zero-padded 5-character string and use as a join key; never use as a numeric feature.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,141
- min: 1,001
- max: 56,045
- mean: 3.039e+04
- median: 29,177
- std: 1.516e+04
- q1: 18,179
- q3: 45,081
- iqr: 26,902
- skew: -0.08027
- kurtosis: -1.098
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
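The zero-padding step can be sketched as follows, using the reported min and max of the column as examples. `fips_to_str` and `split_fips` are hypothetical helper names.

```python
def fips_to_str(code: int) -> str:
    """Render a numeric county FIPS code as a zero-padded 5-character string."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a 5-digit FIPS code: {code}")
    return str(code).zfill(5)


def split_fips(fips: str) -> tuple[str, str]:
    """Split a 5-character FIPS string into (state part, county part)."""
    return fips[:2], fips[2:]


# 1001 and 56045 are the reported min and max of combined_fips.
lowest = fips_to_str(1001)
highest = fips_to_str(56045)

print(lowest, split_fips(lowest))    # 01001 ('01', '001')
print(highest, split_fips(highest))  # 56045 ('56', '045')
```

Keeping the result as a string prevents accidental arithmetic on the identifier and preserves the leading zero needed to join against other FIPS-keyed tables.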