saturn

/home/coolhand/html/datavis/data_trove/data/geographic/nationwide/2016_election.csv | 3,141 rows | sample n=3,141 | seed 42 | 2026-05-01T17:34:18+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/geographic/nationwide/2016_election.csv
Total rows: 3,141
Profiled sample: 3,141
Columns: 11
Generated: 2026-05-01T17:34:18+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 3,141 rows and 11 columns covering 2016 U.S. presidential election results at the county level, including total votes, Democratic and Republican vote counts and shares, and county/state identifiers. Vote-count columns (total_votes, votes_dem, votes_gop) are extremely right-skewed with high kurtosis and many outliers, reflecting a few very populous counties dominating the totals, so a log-scale or filtered view is worth considering. The per_gop and per_dem share columns tell a clearer story: per_gop has a mean of about 0.64 versus per_dem at 0.32, indicating that the typical county leaned Republican. These are unweighted county averages, not vote-weighted shares. State coverage is broad (51 categories), with Texas (254 counties) and Georgia (159) most represented, so any state-level aggregation should account for that imbalance.
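The gap between county averages and vote-weighted shares can be checked from the reported column means alone, because the row count cancels when dividing one column total by another. The following is a minimal sketch under that assumption; the inputs are the rounded means from the stats section, so the results are approximate, and `pooled_share` is a hypothetical helper name.

```python
# Column means as reported in the per-column stats (rounded by the profiler).
mean_votes_dem = 20_734
mean_votes_gop = 20_645
mean_total_votes = 43_637


def pooled_share(mean_party_votes: float, mean_total: float) -> float:
    """Vote-weighted share: sum(party votes) / sum(total votes).

    Each sum equals mean * n_rows, so n_rows cancels and the ratio
    of the two means gives the same value.
    """
    return mean_party_votes / mean_total


dem_share = pooled_share(mean_votes_dem, mean_total_votes)
gop_share = pooled_share(mean_votes_gop, mean_total_votes)
print(f"pooled dem share ~ {dem_share:.3f}")
print(f"pooled gop share ~ {gop_share:.3f}")
```

With these rounded inputs both pooled shares come out near 0.47, well below the 0.64 county-average per_gop, which is why the two kinds of figure should not be used interchangeably.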

(unnamed column) high anthropic:claude-opus-4-7

This unnamed numeric column runs from 0 to 3,140 with exactly 3,141 unique values across 3,141 rows, mean and median both 1,570, and zero skew: the hallmarks of a row index rather than a measured feature. There are no nulls and no outliers, and the only zero is the single index-0 row (zero_rate ≈ 0.00032).

votes_dem high anthropic:claude-opus-4-7

Counts of Democratic votes per row (likely a US county or precinct), ranging from 4 to 1,893,770 with a median of 3,194 but a mean of 20,734. The distribution is extremely right-skewed (skew 11.65, kurtosis 224.4), and 468 rows (14.9%) flag as outliers — consistent with a few large urban jurisdictions dwarfing the rest. No nulls or zeros, and 2,688 unique values across 3,141 rows.
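The outlier count depends on where the fences sit. The sketch below assumes saturn applies the conventional Tukey rule (values beyond 1.5 times the IQR from the quartiles), which is what the "rows beyond 1.5 IQR" alert suggests but does not confirm; `tukey_fences` is a hypothetical helper, and the quartiles are the reported votes_dem values.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported votes_dem quartiles: q1 = 1,175 and q3 = 10,047.
lower, upper = tukey_fences(1_175, 10_047)
print(f"lower fence: {lower:,.0f}")
print(f"upper fence: {upper:,.0f}")
```

Under this assumption the lower fence is negative, so with a minimum of 4 every flagged row would sit in the upper tail.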

votes_gop high anthropic:claude-opus-4-7

Per-county GOP vote totals across 3,141 rows, almost all distinct (2,901 unique) and never null or zero. The distribution is heavily right-skewed (skew 5.78, kurtosis 51.78) with a median of 7,268 but a max of 620,285, and 394 rows (12.5%) flagged as outliers — consistent with a few very populous counties dwarfing the rest.

total_votes high anthropic:claude-opus-4-7

Per-row vote totals across 3,141 records, almost all distinct (2,966 unique) with no nulls or zeros. The distribution is severely right-skewed (skew 8.89, kurtosis 136.17): the median is 11,144 yet the mean is 43,637 and the max reaches 2,652,072, far above Q3 of 29,799. About 14% of rows (442) flag as outliers, consistent with a few very high-vote jurisdictions dominating the tail.
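To illustrate why a log scale suits a column this skewed, the sketch below applies log10 to the reported total_votes summary values only (not to the underlying rows). The column has no zeros, so the logarithm is defined throughout; the variable names are illustrative.

```python
import math

# Reported total_votes summary values.
summary = {"min": 64, "q1": 4_870, "median": 11_144, "q3": 29_799, "max": 2_652_072}

log_summary = {name: math.log10(value) for name, value in summary.items()}
for name, value in log_summary.items():
    print(f"{name:>6}: {value:.2f}")

# Distance from the median to each quartile, on the raw and log scales.
raw_ratio = (summary["q3"] - summary["median"]) / (summary["median"] - summary["q1"])
lower_gap = log_summary["median"] - log_summary["q1"]
upper_gap = log_summary["q3"] - log_summary["median"]
log_ratio = upper_gap / lower_gap
print(f"upper/lower quartile gap, raw scale: {raw_ratio:.2f}")
print(f"upper/lower quartile gap, log scale: {log_ratio:.2f}")
```

A gap ratio closer to 1 means the quartiles sit more symmetrically around the median, which is the practical benefit of plotting these counts on a log axis.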

per_dem high anthropic:claude-opus-4-7

Values are continuous proportions bounded between 0.031 and 0.928 with mean 0.318 and median 0.286, consistent with a per-unit Democratic vote share across 3,141 rows (matching the U.S. county count). Distribution is right-skewed (skew 0.94) with 76 outliers (2.4%) on the upper tail, reflecting a minority of heavily Democratic units. Near-unique values (3,112/3,141) and zero null/zero rates indicate a clean, fully-populated feature.

per_gop high anthropic:claude-opus-4-7

Likely the Republican (GOP) vote share per geographic unit (e.g., county), bounded between 0.041 and 0.953 with no nulls and 3,112 unique values across 3,141 rows. The distribution is left-skewed (skew -0.82) with a median of 0.665 above the mean of 0.635, indicating most units lean Republican while a tail of low-GOP units pulls the mean down. 63 outliers (2.0%) sit outside the IQR fence, consistent with strongly Democratic enclaves.

diff high anthropic:claude-opus-4-7

Despite being typed as text, `diff` is a single-token numeric field stored as comma-formatted strings (one_word_rate 1.0, len_mean ~4.9, max length 9). All 3,141 rows are populated with 2,738 unique values and 403 duplicates (12.8%); the value '37,410' appears 29 times, far above any other. That count equals the 29 bare 'Alaska' rows in county_name, and both values lead their sample lists, so this is probably a statewide figure repeated on each Alaska row rather than a sentinel, though it should be verified against the file. The allcaps and short_text alerts are artefacts of digits-only content rather than real prose.
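Converting the column to integers is a matter of removing the thousands separators. The sketch below assumes every value is a non-negative count grouped with commas, as in the sample values; `parse_count` is a hypothetical helper name.

```python
def parse_count(text: str) -> int:
    """Convert a comma-grouped count such as '37,410' to an int."""
    return int(text.replace(",", ""))


# First six sample values from the diff column.
samples = ["37,410", "653", "1,713", "862", "575", "63,321"]
parsed = [parse_count(s) for s in samples]
print(parsed)
```

A value that does not fit this pattern raises `ValueError`, which surfaces unexpected formats instead of silently coercing them.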

per_point_diff high anthropic:claude-opus-4-7

This column stores a per-point differential as a percentage string (e.g. '15.17%', '63.21%'), with lengths tightly bounded between 5 and 6 characters and exactly one token per cell. Despite 2,555 unique values across 3,141 rows, the duplicate rate is 18.7% and '15.17%' alone appears 31 times, far more than any other value, which is worth checking. The values are stored as text with a trailing '%', not as numbers.
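A numeric version of this column can be obtained by dropping the trailing '%' and dividing by 100. The sketch below assumes every value has the form shown in the samples (digits, a decimal point, then '%'); `parse_percent` is a hypothetical helper name.

```python
def parse_percent(text: str) -> float:
    """Convert a string like '15.17%' to a fraction (about 0.1517)."""
    return float(text.strip().rstrip("%")) / 100.0


# First five sample values from the per_point_diff column.
samples = ["15.17%", "23.77%", "48.08%", "50.06%", "1.98%"]
fractions = [parse_percent(s) for s in samples]
print([round(f, 4) for f in fractions])
```

Results are floating-point values, so comparisons against other columns should use a tolerance rather than exact equality.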

state_abbr high anthropic:claude-opus-4-7

This column holds US state abbreviations, with 51 unique values across 3,141 rows and no nulls, consistent with one row per US county (50 states plus DC). The distribution tracks county counts rather than population: TX leads at 254 (8.1%), followed by GA (159), VA (133), and KY (120). Entropy ratio of 0.93 (entropy 5.275 against a maximum of log2(51) ≈ 5.67) indicates a fairly even spread across states.
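The entropy ratio can be reproduced by dividing the reported entropy by the maximum entropy for 51 categories. The sketch below assumes entropy is measured in bits and that the ratio is entropy / log2(cardinality); the reported numbers are consistent with that definition, but the report does not state it. The function names are hypothetical.

```python
import math
from collections.abc import Iterable


def entropy_bits(counts: Iterable[int]) -> float:
    """Shannon entropy in bits of a categorical distribution given by counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    return -sum((c / total) * math.log2(c / total) for c in positive)


def entropy_ratio(entropy: float, cardinality: int) -> float:
    """Entropy as a fraction of the maximum possible, log2(cardinality)."""
    return entropy / math.log2(cardinality)


# Reported values for state_abbr: entropy 5.275, cardinality 51.
reported_ratio = entropy_ratio(5.275, 51)
print(f"ratio from reported entropy: {reported_ratio:.3f}")

# A perfectly even split across 51 categories gives a ratio of 1.
uniform_ratio = entropy_ratio(entropy_bits([10] * 51), 51)
print(f"ratio for a uniform split: {uniform_ratio:.3f}")
```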

county_name high anthropic:claude-opus-4-7

This column holds US county names: 3,006 of 3,141 rows contain the word 'county', with 'parish' (64) and 'city' (43) covering Louisiana parishes and independent cities (mostly in Virginia). Names repeat heavily across states: 1,293 duplicates (41.2%) leave only 1,848 unique values, with 'Washington County' (30), 'Jefferson County' (25), and 'Franklin County' (24) leading. One oddity: 'Alaska' appears 29 times as a bare state name, breaking the county/parish/city pattern.
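Because names repeat across states, county_name alone cannot identify a row. The sketch below shows the check on a handful of illustrative (state_abbr, county_name) pairs that are not taken from the profiled file; the variable names are hypothetical.

```python
from collections import Counter

# Illustrative rows only: (state_abbr, county_name).
rows = [
    ("AL", "Washington County"),
    ("GA", "Washington County"),
    ("GA", "Jefferson County"),
    ("AL", "Jefferson County"),
    ("TX", "Frio County"),
]

# Names that occur in more than one row are ambiguous on their own.
name_counts = Counter(name for _, name in rows)
ambiguous_names = sorted(name for name, n in name_counts.items() if n > 1)

# The (state, name) pair resolves these particular collisions.
key_counts = Counter(rows)
duplicate_keys = [key for key, n in key_counts.items() if n > 1]

print("ambiguous names:", ambiguous_names)
print("duplicate (state, name) keys:", duplicate_keys)
```

In the profiled data the 29 bare 'Alaska' rows would presumably still collide on a (state_abbr, county_name) key, so combined_fips, which is unique across all 3,141 rows, is the safer join key.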

combined_fips high anthropic:claude-opus-4-7

This is almost certainly the 5-digit combined state+county FIPS code (state*1000 + county), with all 3,141 values unique and no nulls, consistent with one row per US county. The range 1001 to 56045 spans Alabama (01) through Wyoming (56); because the column is stored as a number, leading zeros are dropped (1001 rather than 01001). The near-zero skew reflects roughly uniform numeric county codes across states rather than a meaningful distribution.
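If the column does follow the state*1000 + county layout described above, the state and county parts can be recovered with integer division, and the standard zero-padded text form restored. A minimal sketch under that assumption; `split_fips` is a hypothetical helper name.

```python
def split_fips(combined: int) -> tuple[str, str, str]:
    """Return (5-digit FIPS, 2-digit state code, 3-digit county code)."""
    state, county = divmod(combined, 1000)
    return f"{combined:05d}", f"{state:02d}", f"{county:03d}"


# The reported minimum and maximum of combined_fips.
for value in (1001, 56045):
    print(value, "->", split_fips(value))
```

Keeping the zero-padded string form is usually preferable when joining against other FIPS-keyed tables, which often store the code as text.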

Numeric correlation

(unnamed column) numeric

rows: 3,141
null: 0 (0.0%)
unique: 3,141
min: 0.000
max: 3,140
mean: 1,570
median: 1,570
std: 906.873
q1: 785.000
q3: 2,355
iqr: 1,570
skew: 0.000
kurtosis: -1.200
n_outliers: 0
outlier_rate: 0.000
zero_rate: 3.18e-04

votes_dem numeric

Alerts: skew=+11.65; 14.9% of rows beyond 1.5 IQR
rows: 3,141
null: 0 (0.0%)
unique: 2,688
min: 4.000
max: 1,893,770
mean: 20,734
median: 3,194
std: 72,004
q1: 1,175
q3: 10,047
iqr: 8,872
skew: 11.652
kurtosis: 224.356
n_outliers: 468
outlier_rate: 0.149
zero_rate: 0.000

votes_gop numeric

Alerts: skew=+5.78; 12.5% of rows beyond 1.5 IQR
rows: 3,141
null: 0 (0.0%)
unique: 2,901
min: 57.000
max: 620,285
mean: 20,645
median: 7,268
std: 41,627
q1: 3,241
q3: 18,130
iqr: 14,889
skew: 5.780
kurtosis: 51.776
n_outliers: 394
outlier_rate: 0.125
zero_rate: 0.000

total_votes numeric

Alerts: skew=+8.89; 14.1% of rows beyond 1.5 IQR
rows: 3,141
null: 0 (0.0%)
unique: 2,966
min: 64.000
max: 2,652,072
mean: 43,637
median: 11,144
std: 114,568
q1: 4,870
q3: 29,799
iqr: 24,929
skew: 8.894
kurtosis: 136.168
n_outliers: 442
outlier_rate: 0.141
zero_rate: 0.000

per_dem numeric

rows: 3,141
null: 0 (0.0%)
unique: 3,112
min: 0.031
max: 0.928
mean: 0.318
median: 0.286
std: 0.153
q1: 0.205
q3: 0.398
iqr: 0.193
skew: 0.942
kurtosis: 0.686
n_outliers: 76
outlier_rate: 0.024
zero_rate: 0.000

per_gop numeric

rows: 3,141
null: 0 (0.0%)
unique: 3,112
min: 0.041
max: 0.953
mean: 0.635
median: 0.665
std: 0.156
q1: 0.546
q3: 0.750
iqr: 0.205
skew: -0.819
kurtosis: 0.376
n_outliers: 63
outlier_rate: 0.020
zero_rate: 0.000

diff text

Alerts: 100.0% of rows are a single word; 99.2% of rows are all-caps; 95th-percentile length under 20 chars
rows: 3,141
null: 0 (0.0%)
unique: 2,738
len_min: 1
len_max: 9
len_mean: 4.935
len_median: 5.000
len_p95: 6.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 403
duplicate_rate: 0.128
vocab_size: 2,738
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.992
boilerplate_rate: 0.000
Sample values (first 10)
  1. 37,410
  2. 653
  3. 1,713
  4. 862
  5. 575
  6. 63,321
  7. 601
  8. 1,326
  9. 169
  10. 3,658

per_point_diff text

Alerts: 100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars
rows: 3,141
null: 0 (0.0%)
unique: 2,555
len_min: 5
len_max: 6
len_mean: 5.896
len_median: 6.000
len_p95: 6.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 586
duplicate_rate: 0.187
vocab_size: 2,555
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 15.17%
  2. 23.77%
  3. 48.08%
  4. 50.06%
  5. 1.98%
  6. 27.14%
  7. 32.12%
  8. 22.89%
  9. 6.30%
  10. 54.61%

state_abbr categorical

rows: 3,141
null: 0 (0.0%)
unique: 51
top_value: TX
top_rate: 0.081
cardinality: 51
entropy: 5.275
entropy_ratio: 0.930
Top values (rank 1–20)
  1. TX — 254
  2. GA — 159
  3. VA — 133
  4. KY — 120
  5. MO — 115
  6. KS — 105
  7. IL — 102
  8. NC — 100
  9. IA — 99
  10. TN — 95
  11. NE — 93
  12. IN — 92
  13. OH — 88
  14. MN — 87
  15. MI — 83
  16. MS — 82
  17. OK — 77
  18. AR — 75
  19. WI — 72
  20. AL — 67

county_name text

Alerts: 95th-percentile length under 20 chars; 41.2% duplicate strings
rows: 3,141
null: 0 (0.0%)
unique: 1,848
len_min: 6
len_max: 27
len_mean: 13.869
len_median: 14.000
len_p95: 17.000
word_mean: 2.054
word_median: 2.000
n_empty: 0
n_duplicates: 1,293
duplicate_rate: 0.412
vocab_size: 1,840
readability_flesch_mean: 38.384
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 9.23e-03
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Alaska
  2. Day County
  3. San Augustine County
  4. Fisher County
  5. Clay County
  6. Waukesha County
  7. Red Lake County
  8. Charlotte County
  9. Frio County
  10. Franklin County

combined_fips numeric

rows: 3,141
null: 0 (0.0%)
unique: 3,141
min: 1,001
max: 56,045
mean: 30,389
median: 29,177
std: 15,162
q1: 18,179
q3: 45,081
iqr: 26,902
skew: -0.080
kurtosis: -1.098
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000