nationwide 2016 election
Reading
This dataset contains 3,141 rows and 11 columns covering 2016 U.S. presidential election results at the county level: total votes, Democratic and Republican vote counts and shares, and county/state identifiers. The vote-count columns (total_votes, votes_dem, votes_gop) are extremely right-skewed, with high kurtosis and many outliers, because a few very populous counties dominate the totals; view them on a log scale or after filtering. The per_gop and per_dem share columns tell a clearer story: per_gop averages about 0.64 against 0.32 for per_dem, so the Republican share exceeded the Democratic share in most counties. These are unweighted county means, not national vote shares. State coverage is broad (51 categories, i.e. 50 states plus DC), with Texas (254 counties) and Georgia (159) most represented, so any state-level aggregation should account for that imbalance.
citing: row_count · column_count · total_votes · votes_dem · votes_gop · per_dem · per_gop · state_abbr · county_name
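The difference between a county-level mean share and a vote-weighted share can be illustrated with a small sketch. The three rows below are made-up values, not records from this dataset, and the variable names are hypothetical; the point is only that averaging per-county shares treats every county equally, while pooling votes treats every vote equally.

```python
# Hypothetical rows: (county label, total_votes, votes_gop). Not real data.
rows = [
    ("small_a", 1_000, 800),
    ("small_b", 2_000, 1_400),
    ("large_c", 500_000, 150_000),
]

# Unweighted: average of each county's GOP share (every county counts equally).
shares = [gop / total for _, total, gop in rows]
unweighted_mean = sum(shares) / len(shares)

# Weighted: pooled GOP votes over pooled total votes (every vote counts equally).
weighted_share = sum(gop for _, _, gop in rows) / sum(total for _, total, _ in rows)

print(f"unweighted mean share: {unweighted_mean:.3f}")
print(f"vote-weighted share:   {weighted_share:.3f}")
```

With skewed county sizes the two figures can differ widely, which is why the 0.64 mean of per_gop should not be read as a national vote share.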
Charts the summary said to look at first
Histogram of per_gop (bin range, row count)
| bin | count |
|---|---|
| 0.04122 – 0.06401 | 1 |
| 0.06401 – 0.0868 | 2 |
| 0.0868 – 0.1096 | 5 |
| 0.1096 – 0.1324 | 2 |
| 0.1324 – 0.1552 | 6 |
| 0.1552 – 0.1779 | 11 |
| 0.1779 – 0.2007 | 7 |
| 0.2007 – 0.2235 | 17 |
| 0.2235 – 0.2463 | 17 |
| 0.2463 – 0.2691 | 17 |
| 0.2691 – 0.2919 | 23 |
| 0.2919 – 0.3147 | 32 |
| 0.3147 – 0.3375 | 34 |
| 0.3375 – 0.3602 | 30 |
| 0.3602 – 0.383 | 43 |
| 0.383 – 0.4058 | 41 |
| 0.4058 – 0.4286 | 64 |
| 0.4286 – 0.4514 | 71 |
| 0.4514 – 0.4742 | 63 |
| 0.4742 – 0.497 | 89 |
| 0.497 – 0.5198 | 78 |
| 0.5198 – 0.5425 | 117 |
| 0.5425 – 0.5653 | 116 |
| 0.5653 – 0.5881 | 147 |
| 0.5881 – 0.6109 | 147 |
| 0.6109 – 0.6337 | 156 |
| 0.6337 – 0.6565 | 165 |
| 0.6565 – 0.6793 | 193 |
| 0.6793 – 0.7021 | 190 |
| 0.7021 – 0.7249 | 215 |
| 0.7249 – 0.7476 | 223 |
| 0.7476 – 0.7704 | 213 |
| 0.7704 – 0.7932 | 187 |
| 0.7932 – 0.816 | 142 |
| 0.816 – 0.8388 | 113 |
| 0.8388 – 0.8616 | 74 |
| 0.8616 – 0.8844 | 49 |
| 0.8844 – 0.9072 | 29 |
| 0.9072 – 0.9299 | 9 |
| 0.9299 – 0.9527 | 3 |
Histogram of per_dem (bin range, row count)
| bin | count |
|---|---|
| 0.03145 – 0.05387 | 8 |
| 0.05387 – 0.0763 | 16 |
| 0.0763 – 0.09872 | 52 |
| 0.09872 – 0.1211 | 74 |
| 0.1211 – 0.1436 | 116 |
| 0.1436 – 0.166 | 146 |
| 0.166 – 0.1884 | 203 |
| 0.1884 – 0.2109 | 226 |
| 0.2109 – 0.2333 | 240 |
| 0.2333 – 0.2557 | 218 |
| 0.2557 – 0.2781 | 200 |
| 0.2781 – 0.3006 | 205 |
| 0.3006 – 0.323 | 153 |
| 0.323 – 0.3454 | 147 |
| 0.3454 – 0.3678 | 153 |
| 0.3678 – 0.3903 | 152 |
| 0.3903 – 0.4127 | 106 |
| 0.4127 – 0.4351 | 111 |
| 0.4351 – 0.4575 | 77 |
| 0.4575 – 0.48 | 78 |
| 0.48 – 0.5024 | 56 |
| 0.5024 – 0.5248 | 72 |
| 0.5248 – 0.5472 | 45 |
| 0.5472 – 0.5697 | 42 |
| 0.5697 – 0.5921 | 36 |
| 0.5921 – 0.6145 | 36 |
| 0.6145 – 0.6369 | 34 |
| 0.6369 – 0.6594 | 25 |
| 0.6594 – 0.6818 | 30 |
| 0.6818 – 0.7042 | 16 |
| 0.7042 – 0.7266 | 12 |
| 0.7266 – 0.7491 | 12 |
| 0.7491 – 0.7715 | 14 |
| 0.7715 – 0.7939 | 9 |
| 0.7939 – 0.8163 | 6 |
| 0.8163 – 0.8388 | 4 |
| 0.8388 – 0.8612 | 4 |
| 0.8612 – 0.8836 | 4 |
| 0.8836 – 0.906 | 2 |
| 0.906 – 0.9285 | 1 |
Histogram of total_votes (bin range, row count)
| bin | count |
|---|---|
| 64 – 6.636e+04 | 2699 |
| 6.636e+04 – 1.327e+05 | 192 |
| 1.327e+05 – 1.99e+05 | 77 |
| 1.99e+05 – 2.653e+05 | 70 |
| 2.653e+05 – 3.316e+05 | 33 |
| 3.316e+05 – 3.979e+05 | 20 |
| 3.979e+05 – 4.642e+05 | 12 |
| 4.642e+05 – 5.305e+05 | 4 |
| 5.305e+05 – 5.968e+05 | 7 |
| 5.968e+05 – 6.631e+05 | 9 |
| 6.631e+05 – 7.294e+05 | 4 |
| 7.294e+05 – 7.957e+05 | 5 |
| 7.957e+05 – 8.62e+05 | 1 |
| 8.62e+05 – 9.283e+05 | 1 |
| 9.283e+05 – 9.946e+05 | 1 |
| 9.946e+05 – 1.061e+06 | 1 |
| 1.061e+06 – 1.127e+06 | 1 |
| 1.127e+06 – 1.193e+06 | 0 |
| 1.193e+06 – 1.26e+06 | 1 |
| 1.26e+06 – 1.326e+06 | 1 |
| 1.326e+06 – 1.392e+06 | 0 |
| 1.392e+06 – 1.459e+06 | 0 |
| 1.459e+06 – 1.525e+06 | 0 |
| 1.525e+06 – 1.591e+06 | 0 |
| 1.591e+06 – 1.658e+06 | 0 |
| 1.658e+06 – 1.724e+06 | 0 |
| 1.724e+06 – 1.79e+06 | 0 |
| 1.79e+06 – 1.856e+06 | 0 |
| 1.856e+06 – 1.923e+06 | 0 |
| 1.923e+06 – 1.989e+06 | 0 |
| 1.989e+06 – 2.055e+06 | 1 |
| 2.055e+06 – 2.122e+06 | 0 |
| 2.122e+06 – 2.188e+06 | 0 |
| 2.188e+06 – 2.254e+06 | 0 |
| 2.254e+06 – 2.321e+06 | 0 |
| 2.321e+06 – 2.387e+06 | 0 |
| 2.387e+06 – 2.453e+06 | 0 |
| 2.453e+06 – 2.519e+06 | 0 |
| 2.519e+06 – 2.586e+06 | 0 |
| 2.586e+06 – 2.652e+06 | 1 |
Top 20 values of state_abbr (row count, share of rows)
| value | count | share |
|---|---|---|
| TX | 254 | 8.1% |
| GA | 159 | 5.1% |
| VA | 133 | 4.2% |
| KY | 120 | 3.8% |
| MO | 115 | 3.7% |
| KS | 105 | 3.3% |
| IL | 102 | 3.2% |
| NC | 100 | 3.2% |
| IA | 99 | 3.2% |
| TN | 95 | 3.0% |
| NE | 93 | 3.0% |
| IN | 92 | 2.9% |
| OH | 88 | 2.8% |
| MN | 87 | 2.8% |
| MI | 83 | 2.6% |
| MS | 82 | 2.6% |
| OK | 77 | 2.5% |
| AR | 75 | 2.4% |
| WI | 72 | 2.3% |
| AL | 67 | 2.1% |
Histogram of votes_gop (bin range, row count)
| bin | count |
|---|---|
| 57 – 1.556e+04 | 2260 |
| 1.556e+04 – 3.107e+04 | 381 |
| 3.107e+04 – 4.657e+04 | 153 |
| 4.657e+04 – 6.208e+04 | 100 |
| 6.208e+04 – 7.759e+04 | 54 |
| 7.759e+04 – 9.309e+04 | 32 |
| 9.309e+04 – 1.086e+05 | 27 |
| 1.086e+05 – 1.241e+05 | 23 |
| 1.241e+05 – 1.396e+05 | 47 |
| 1.396e+05 – 1.551e+05 | 13 |
| 1.551e+05 – 1.706e+05 | 14 |
| 1.706e+05 – 1.861e+05 | 4 |
| 1.861e+05 – 2.016e+05 | 8 |
| 2.016e+05 – 2.171e+05 | 2 |
| 2.171e+05 – 2.326e+05 | 2 |
| 2.326e+05 – 2.481e+05 | 2 |
| 2.481e+05 – 2.637e+05 | 4 |
| 2.637e+05 – 2.792e+05 | 3 |
| 2.792e+05 – 2.947e+05 | 1 |
| 2.947e+05 – 3.102e+05 | 1 |
| 3.102e+05 – 3.257e+05 | 1 |
| 3.257e+05 – 3.412e+05 | 2 |
| 3.412e+05 – 3.567e+05 | 1 |
| 3.567e+05 – 3.722e+05 | 0 |
| 3.722e+05 – 3.877e+05 | 1 |
| 3.877e+05 – 4.032e+05 | 0 |
| 4.032e+05 – 4.187e+05 | 0 |
| 4.187e+05 – 4.342e+05 | 0 |
| 4.342e+05 – 4.497e+05 | 1 |
| 4.497e+05 – 4.652e+05 | 0 |
| 4.652e+05 – 4.807e+05 | 1 |
| 4.807e+05 – 4.962e+05 | 0 |
| 4.962e+05 – 5.117e+05 | 0 |
| 5.117e+05 – 5.273e+05 | 0 |
| 5.273e+05 – 5.428e+05 | 0 |
| 5.428e+05 – 5.583e+05 | 1 |
| 5.583e+05 – 5.738e+05 | 0 |
| 5.738e+05 – 5.893e+05 | 0 |
| 5.893e+05 – 6.048e+05 | 1 |
| 6.048e+05 – 6.203e+05 | 1 |
Schema
11 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| (unnamed) | numeric | 0.0% | 3,141 | |
| votes_dem | numeric | 0.0% | 2,688 | high_skew, outliers |
| votes_gop | numeric | 0.0% | 2,901 | high_skew, outliers |
| total_votes | numeric | 0.0% | 2,966 | high_skew, outliers |
| per_dem | numeric | 0.0% | 3,112 | |
| per_gop | numeric | 0.0% | 3,112 | |
| diff | text | 0.0% | 2,738 | one_word, allcaps, short_text |
| per_point_diff | text | 0.0% | 2,555 | one_word, allcaps, short_text |
| state_abbr | categorical | 0.0% | 51 | |
| county_name | text | 0.0% | 1,848 | short_text, duplicates |
| combined_fips | numeric | 0.0% | 3,141 | |
This unnamed numeric column runs from 0 to 3140 with exactly 3141 unique values across 3141 rows, mean and median both 1570, and zero skew — the hallmarks of a row index rather than a measured feature. There are no nulls and no outliers, and the only zero is the single index-0 row (zero_rate ≈ 0.00032). Treatment: Drop before modelling; it is a sequential row index.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,141
- min: 0
- max: 3,140
- mean: 1,570
- median: 1,570
- std: 906.9
- q1: 785
- q3: 2,355
- iqr: 1,570
- skew: 0
- kurtosis: -1.2
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.0003184
votes_dem
numeric feature · high_skew · outliers
Counts of Democratic votes per row (each row is a U.S. county, per the dataset summary), ranging from 4 to 1,893,770 with a median of 3,194 but a mean of 20,734. The distribution is extremely right-skewed (skew 11.65, kurtosis 224.4), and 468 rows (14.9%) are flagged as outliers, consistent with a few large urban jurisdictions dwarfing the rest. There are no nulls or zeros, and 2,688 values are unique across 3,141 rows. Treatment: Log-transform before regression, or convert to a share or per-capita rate to tame the skew.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,688
- min: 4
- max: 1.894e+06
- mean: 2.073e+04
- median: 3,194
- std: 7.2e+04
- q1: 1,175
- q3: 10,047
- iqr: 8,872
- skew: 11.65
- kurtosis: 224.4
- n_outliers: 468
- outlier_rate: 0.149
- zero_rate: 0
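The log transform suggested for votes_dem can be sketched with the summary values quoted above. This is only an illustration on five summary statistics, not on the column itself, and the choice of base 10 is an assumption made for readability.

```python
import math

# Summary values quoted in the votes_dem profile: min, q1, median, q3, max.
summary = {"min": 4, "q1": 1_175, "median": 3_194, "q3": 10_047, "max": 1_893_770}

# log10 maps a range spanning more than five orders of magnitude
# onto a scale of roughly 0.6 to 6.3.
log_summary = {name: math.log10(value) for name, value in summary.items()}

for name, value in log_summary.items():
    print(f"{name:>6}: {value:.2f}")

raw_ratio = summary["max"] / summary["median"]        # max relative to median, raw scale
log_gap = log_summary["max"] - log_summary["median"]  # the same gap on the log scale
```

On the raw scale the maximum is several hundred times the median; on the log scale the gap is under three units, which is what makes the transformed column easier to plot and model.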
votes_gop
numeric feature · high_skew · outliers
Per-county GOP vote totals across 3,141 rows, mostly distinct (2,901 unique) and never null or zero. The distribution is heavily right-skewed (skew 5.78, kurtosis 51.78), with a median of 7,268 but a max of 620,285, and 394 rows (12.5%) are flagged as outliers, consistent with a few very populous counties dwarfing the rest. Treatment: Log-transform, or normalize by county population (which is not among the 11 columns and would have to be joined in), before modelling.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,901
- min: 57
- max: 620,285
- mean: 2.065e+04
- median: 7,268
- std: 4.163e+04
- q1: 3,241
- q3: 18,130
- iqr: 14,889
- skew: 5.78
- kurtosis: 51.78
- n_outliers: 394
- outlier_rate: 0.1254
- zero_rate: 0
total_votes
numeric feature · high_skew · outliers
Per-row vote totals across 3,141 records, mostly distinct (2,966 unique) with no nulls or zeros. The distribution is severely right-skewed (skew 8.89, kurtosis 136.17): the median is 11,144, yet the mean is 43,636 and the max reaches 2,652,072, far above Q3 of 29,799. About 14% of rows (442) are flagged as outliers, consistent with a few very high-vote jurisdictions dominating the tail. Treatment: Log-transform before regression or plotting to tame the heavy right tail; totals should still be summed from the raw counts.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,966
- min: 64
- max: 2.652e+06
- mean: 4.364e+04
- median: 11,144
- std: 1.146e+05
- q1: 4,870
- q3: 29,799
- iqr: 24,929
- skew: 8.894
- kurtosis: 136.2
- n_outliers: 442
- outlier_rate: 0.1407
- zero_rate: 0
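The profile does not state which rule produced the outlier flags for total_votes. A minimal sketch, assuming the common Tukey rule with a 1.5 × IQR multiplier (an assumption, as is the helper name `iqr_fences`), shows where the fences would fall for the quartiles quoted above.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) Tukey fences; k=1.5 is an assumed multiplier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles quoted in the total_votes profile.
lower, upper = iqr_fences(q1=4_870, q3=29_799)
print(lower, upper)  # -32523.5 67192.5
```

Because the lower fence is negative and vote totals cannot be, every outlier under this rule would sit in the upper tail, above roughly 67,000 votes.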
per_dem
numeric feature
Values are continuous proportions between 0.03145 and 0.9285 with mean 0.318 and median 0.286, consistent with a per-county Democratic vote share across 3,141 rows (close to the number of U.S. counties and county equivalents). The distribution is right-skewed (skew 0.94), with 76 outliers (2.4%) on the upper tail reflecting a minority of heavily Democratic counties. Near-unique values (3,112 of 3,141) and zero null and zero rates indicate a clean, fully populated feature. Treatment: Use as-is as a bounded proportion; consider a logit transform before feeding it to a linear model, because of the right skew and the bounded range.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,112
- min: 0.03145
- max: 0.9285
- mean: 0.3176
- median: 0.2864
- std: 0.153
- q1: 0.2054
- q3: 0.3982
- iqr: 0.1929
- skew: 0.9422
- kurtosis: 0.6859
- n_outliers: 76
- outlier_rate: 0.0242
- zero_rate: 0
per_gop
numeric feature
Likely the Republican (GOP) vote share per county, bounded between 0.041 and 0.953 with no nulls and 3,112 unique values across 3,141 rows. The distribution is left-skewed (skew -0.82), with a median of 0.665 above the mean of 0.635, indicating that most counties lean Republican while a tail of low-GOP counties pulls the mean down. The 63 outliers (2.0%) sit below the lower IQR fence (about 0.24 if the usual 1.5 × IQR rule applies), consistent with strongly Democratic enclaves. Treatment: Use as-is as a proportion feature; consider a logit transform before feeding it to a linear model.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,112
- min: 0.04122
- max: 0.9527
- mean: 0.6351
- median: 0.6654
- std: 0.1561
- q1: 0.5458
- q3: 0.7503
- iqr: 0.2045
- skew: -0.8193
- kurtosis: 0.376
- n_outliers: 63
- outlier_rate: 0.02006
- zero_rate: 0
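The logit transform mentioned for per_dem and per_gop maps a proportion in (0, 1) to an unbounded log-odds scale. The sketch below is one possible way to write it; the `eps` clipping value is an assumption added to keep the result finite if a share of exactly 0 or 1 ever appears.

```python
import math

def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of a proportion, clipped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

# Values quoted in the per_gop profile: min, median, max.
for label, share in [("min", 0.04122), ("median", 0.6654), ("max", 0.9527)]:
    print(f"{label:>6}: share={share:.4f}  logit={logit(share):+.3f}")
```

A share of 0.5 maps to 0, shares above 0.5 map to positive values, and the scale is symmetric, so logit(0.25) is the negative of logit(0.75).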
diff
text feature · one_word · allcaps · short_text
Despite being typed as text, `diff` is a single-token numeric field stored as comma-formatted strings (one_word_rate 1.0, len_mean about 4.9, max length 9). All 3,141 rows are populated, with 2,738 unique values and 403 duplicates (12.8%). The value '37,410' appears 29 times, far more than any other, which suggests either a sentinel or one figure repeated across many rows. The count equals the 29 rows whose county_name is the bare value 'Alaska', so a statewide figure copied onto each Alaska row is one possibility to check. The allcaps and short_text alerts are artefacts of digits-only content rather than real prose. Treatment: Strip commas and cast to numeric before modelling, and investigate the spike at 37,410.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,738
- len_min: 1
- len_max: 9
- len_mean: 4.935
- len_median: 5
- len_p95: 6
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 403
- duplicate_rate: 0.1283
- vocab_size: 2,738
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 0.9924
- boilerplate_rate: 0
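The comma-stripping treatment for `diff` could look like the sketch below. The helper name `parse_count` is hypothetical, and only '37,410' comes from the profile; the other sample strings are made up to cover the shortest (1 character) and longest (9 characters) lengths reported.

```python
def parse_count(text: str) -> int:
    """Convert a comma-formatted count such as '37,410' to an int."""
    return int(text.strip().replace(",", ""))

# '37,410' is quoted in the profile; the other two strings are made up.
samples = ["37,410", "5", "1,234,567"]
parsed = [parse_count(s) for s in samples]
print(parsed)  # [37410, 5, 1234567]
```

`int` raises `ValueError` on anything that is not a plain integer after the commas are removed, which surfaces unexpected cell contents instead of silently coercing them.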
per_point_diff
text feature · one_word · allcaps · short_text
This column stores a percentage-point difference (presumably between the two parties' vote shares) as a string such as '15.17%' or '63.21%', with lengths of 5 or 6 characters and exactly one token per cell. There are 2,555 unique values across 3,141 rows, so 586 rows (18.7%) are duplicates, and '15.17%' alone appears 31 times, far more than any other value, which is worth checking. The values are stored as text with a trailing '%', not as numbers. Treatment: Strip the '%' and cast to float before any numeric modelling.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 2,555
- len_min: 5
- len_max: 6
- len_mean: 5.896
- len_median: 6
- len_p95: 6
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 586
- duplicate_rate: 0.1866
- vocab_size: 2,555
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
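The '%'-stripping treatment for per_point_diff can be sketched as follows. The helper name `parse_percent` is hypothetical, and dividing by 100 is a design choice made here so the result sits on the same 0 to 1 scale as per_dem and per_gop; keeping the value in percentage points would be equally valid.

```python
def parse_percent(text: str) -> float:
    """Convert a percentage string such as '15.17%' to a fraction near 0.1517."""
    return float(text.strip().rstrip("%")) / 100.0

# Both sample strings are quoted in the profile.
samples = ["15.17%", "63.21%"]
fractions = [parse_percent(s) for s in samples]
print(fractions)
```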
state_abbr
categorical · foreign_key
This column holds U.S. state abbreviations, with 51 unique values across 3,141 rows and no nulls, consistent with roughly one row per county across the 50 states plus DC. The distribution tracks county counts rather than population: TX leads at 254 (8.1%), followed by GA (159), VA (133), and KY (120). An entropy ratio of 0.93 indicates a fairly even spread across states. Treatment: Use as a categorical grouping key, or left-join to a state-level reference table.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 51
- top_value: TX
- top_rate: 0.08087
- cardinality: 51
- entropy: 5.275
- entropy_ratio: 0.9299
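The entropy ratio can be reproduced from the two numbers above, assuming it is defined as the observed entropy divided by the maximum possible entropy for 51 categories and that the entropy is measured in bits. Both points are assumptions about how the profiler works, although the arithmetic agrees with the reported 0.9299.

```python
import math

entropy_bits = 5.275   # entropy quoted in the profile (assumed to be in bits)
cardinality = 51       # number of distinct state_abbr values

max_entropy = math.log2(cardinality)          # entropy of a perfectly uniform spread
entropy_ratio = entropy_bits / max_entropy    # 1.0 would mean all states equally frequent

print(f"max entropy: {max_entropy:.3f} bits, ratio: {entropy_ratio:.4f}")
```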
county_name
text feature · short_text · duplicates
This column holds U.S. county names: 3,006 of 3,141 rows contain the word 'county', with 'parish' (64) and 'city' (43) covering the Louisiana and Virginia county equivalents. Names repeat heavily across states: 1,293 duplicates (41.2%) leave only 1,848 unique values, with 'Washington County' (30), 'Jefferson County' (25), and 'Franklin County' (24) leading. One oddity: 'Alaska' appears 29 times as a bare state name, breaking the county/parish/city pattern. Treatment: Pair with state_abbr before joining or grouping; the name alone is not unique.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 1,848
- len_min: 6
- len_max: 27
- len_mean: 13.87
- len_median: 14
- len_p95: 17
- word_mean: 2.054
- word_median: 2
- n_empty: 0
- n_duplicates: 1,293
- duplicate_rate: 0.4117
- vocab_size: 1,840
- readability_flesch_mean: 38.38
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.009233
- allcaps_rate: 0
- boilerplate_rate: 0
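Why the county name needs a state alongside it can be shown with a few illustrative rows. The rows below are a hand-written sample, not an extract from the dataset, and the variable names are hypothetical.

```python
from collections import Counter

# Illustrative rows: (state_abbr, county_name). The same name recurs across states.
rows = [
    ("AL", "Washington County"),
    ("GA", "Washington County"),
    ("TX", "Washington County"),
    ("TX", "Jefferson County"),
]

by_name = Counter(name for _, name in rows)
by_state_and_name = Counter(rows)   # keys are (state_abbr, county_name) tuples

print(max(by_name.values()))            # 3: the name alone collides across states
print(max(by_state_and_name.values()))  # 1: the composite key is unique in this sample
```

A composite key would not separate the 29 bare 'Alaska' rows if they share one state_abbr value, so combined_fips is likely the safer join key for those rows.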
combined_fips
numeric identifier
This is almost certainly the 5-digit combined state and county FIPS code (state × 1000 + county), with all 3,141 values unique and no nulls. The range 1001 to 56045 spans Alabama (01) through Wyoming (56), and the near-zero skew reflects roughly uniform numeric county codes across states rather than a meaningful distribution. Because the column is stored as a number, codes for states 01 to 09 have lost their leading zero (1001 rather than 01001). Treatment: Treat as a categorical key; left-join on this code rather than using it as a numeric feature.
- n: 3,141
- nulls: 0 (0.0%)
- unique: 3,141
- min: 1,001
- max: 56,045
- mean: 3.039e+04
- median: 29,177
- std: 1.516e+04
- q1: 18,179
- q3: 45,081
- iqr: 26,902
- skew: -0.08027
- kurtosis: -1.098
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
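Treating combined_fips as a key usually means restoring it to a fixed-width string first. The sketch below assumes the state × 1000 + county layout described above; the helper name `split_fips` is hypothetical.

```python
def split_fips(code: int) -> tuple[str, str]:
    """Zero-pad a combined FIPS code to 5 characters and split it into
    (2-digit state, 3-digit county) strings."""
    padded = f"{code:05d}"
    return padded[:2], padded[2:]

# Min and max quoted in the profile.
print(split_fips(1001))    # ('01', '001')
print(split_fips(56045))   # ('56', '045')
```

Storing the padded string rather than the integer keeps the key comparable with reference tables that publish FIPS codes as text.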