saturn

/home/coolhand/html/datavis/data_trove/data/geographic/nationwide/2016_election.csv 3,141 rows sample n=3,141 seed 42 2026-05-01T17:34:18+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/data/geographic/nationwide/2016_election.csv
Total rows	3,141
Profiled sample	3,141
Columns	11
Generated	2026-05-01T17:34:18+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 3,141 rows and 11 columns covering 2016 U.S. presidential election results at the county level, including total votes, Democratic and Republican vote counts and shares, and county/state identifiers. Vote-count columns (total_votes, votes_dem, votes_gop) are extremely right-skewed with high kurtosis and many outliers, reflecting a few very populous counties dominating the totals — worth a log-scale or filtered view. The per_gop and per_dem share columns tell a clearer story: per_gop has a mean of about 0.64 versus per_dem at 0.32, indicating Republican margins were larger across most counties. State coverage is broad (51 categories) with Texas (254 counties) and Georgia (159) most represented, so any state-level aggregation should account for that imbalance.
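The gap between the county-level share means and the overall vote split can be checked from the reported statistics alone. The following is a minimal sketch, assuming only the column means saturn reports further down; the constant and function names are illustrative. Because every column has the same row count, the ratio of two column means equals the ratio of their sums, which is the vote-weighted share.

```python
# Column means as reported in the stats tables (row counts cancel in the ratio).
MEAN_VOTES_DEM = 20_734
MEAN_VOTES_GOP = 20_645
MEAN_TOTAL_VOTES = 43_637

# Unweighted means of the county-level share columns, as reported.
MEAN_PER_DEM = 0.318
MEAN_PER_GOP = 0.635


def weighted_share(mean_party_votes: float, mean_total_votes: float) -> float:
    """Vote-weighted share: sum(party) / sum(total) == mean(party) / mean(total)."""
    return mean_party_votes / mean_total_votes


dem_weighted = weighted_share(MEAN_VOTES_DEM, MEAN_TOTAL_VOTES)
gop_weighted = weighted_share(MEAN_VOTES_GOP, MEAN_TOTAL_VOTES)

print(f"unweighted county mean: dem={MEAN_PER_DEM:.3f} gop={MEAN_PER_GOP:.3f}")
print(f"vote-weighted share:    dem={dem_weighted:.3f} gop={gop_weighted:.3f}")
```

Under these assumptions the vote-weighted shares come out close to each other (roughly 0.475 and 0.473), so the per_dem and per_gop means describe the typical county rather than the overall vote.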

(unnamed) high anthropic:claude-opus-4-7

This unnamed numeric column runs from 0 to 3140 with exactly 3141 unique values across 3141 rows, mean and median both 1570, and zero skew — the hallmarks of a row index rather than a measured feature. There are no nulls and no outliers, and the only zero is the single index-0 row (zero_rate ≈ 0.00032).
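The row-index reading can be tested directly. This is a small sketch, assuming the column is available as a list of integers; `looks_like_row_index` is a hypothetical helper, and `range(3141)` stands in for the real column.

```python
import statistics

N_ROWS = 3_141  # row count reported by saturn


def looks_like_row_index(values: list[int]) -> bool:
    """True when the values are exactly 0, 1, ..., n-1 in some order."""
    return sorted(values) == list(range(len(values)))


index = list(range(N_ROWS))  # stand-in for the unnamed column

print(looks_like_row_index(index))
print(statistics.mean(index), statistics.median(index))
print(round(statistics.stdev(index), 3))  # sample standard deviation
```

A pure 0..3140 sequence reproduces the reported mean, median, and standard deviation, which supports treating the column as an index and dropping it before modelling.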

votes_dem high anthropic:claude-opus-4-7

Counts of Democratic votes per row (likely a US county or precinct), ranging from 4 to 1,893,770 with a median of 3,194 but a mean of 20,734. The distribution is extremely right-skewed (skew 11.65, kurtosis 224.4), and 468 rows (14.9%) flag as outliers — consistent with a few large urban jurisdictions dwarfing the rest. No nulls or zeros, and 2,688 unique values across 3,141 rows.
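The outlier count rests on the 1.5 IQR rule named in the column's alert. A minimal sketch of the fence calculation, using the reported q1 and q3 for votes_dem; the function name is illustrative, and how saturn treats values exactly on a fence is not stated in the report.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported quartiles for votes_dem.
low, high = iqr_fences(1_175, 10_047)
print(low, high)
```

The lower fence is negative, so no county can fall below it; every flagged row sits above the upper fence of 23,355 votes.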

votes_gop high anthropic:claude-opus-4-7

Per-county GOP vote totals across 3,141 rows, almost all distinct (2,901 unique) and never null or zero. The distribution is heavily right-skewed (skew 5.78, kurtosis 51.78) with a median of 7,268 but a max of 620,285, and 394 rows (12.5%) flagged as outliers — consistent with a few very populous counties dwarfing the rest.

total_votes high anthropic:claude-opus-4-7

Per-row vote totals across 3,141 records, almost all distinct (2,966 unique) with no nulls or zeros. The distribution is severely right-skewed (skew 8.89, kurtosis 136.17): the median is 11,144 yet the mean is 43,637 and the max reaches 2,652,072, far above Q3 of 29,799. About 14% of rows (442) flag as outliers, consistent with a few very high-vote jurisdictions dominating the tail.
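A log scale is one way to make a distribution this skewed readable. The sketch below applies `math.log10` to the reported summary values for total_votes; the dictionary is illustrative and holds only numbers taken from the stats table.

```python
import math

# Reported summary statistics for total_votes.
TOTAL_VOTES = {"min": 64, "median": 11_144, "q3": 29_799, "max": 2_652_072}

# Base-10 logarithm of each statistic.
log_votes = {name: math.log10(value) for name, value in TOTAL_VOTES.items()}

# Orders of magnitude between the smallest and largest county.
span = log_votes["max"] - log_votes["min"]

for name, value in log_votes.items():
    print(f"{name:>6}: {value:.2f}")
print(f"span: {span:.2f} orders of magnitude")
```

On the log scale the median sits between the extremes instead of being crushed against the left edge, which is why a log axis is a reasonable default for the three vote-count columns.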

per_dem high anthropic:claude-opus-4-7

Values are continuous proportions bounded between 0.031 and 0.928 with mean 0.318 and median 0.286, consistent with a per-unit Democratic vote share across 3,141 rows (matching the U.S. county count). Distribution is right-skewed (skew 0.94) with 76 outliers (2.4%) on the upper tail, reflecting a minority of heavily Democratic units. Near-unique values (3,112/3,141) and zero null/zero rates indicate a clean, fully-populated feature.

per_gop high anthropic:claude-opus-4-7

Likely the Republican (GOP) vote share per geographic unit (e.g., county), bounded between 0.041 and 0.953 with no nulls and 3112 unique values across 3141 rows. The distribution is left-skewed (skew -0.82) with a median of 0.665 above the mean of 0.635, indicating most units lean Republican while a tail of low-GOP units pulls the mean down. 63 outliers (2.0%) sit outside the IQR fence, consistent with strongly Democratic enclaves.

diff high anthropic:claude-opus-4-7

Despite being typed as text, `diff` is a single-token numeric field stored as comma-formatted strings (one_word_rate 1.0, len_mean ~4.9, max length 9). All 3,141 rows are populated with 2,738 unique values and 403 duplicates (12.8%); the value '37,410' appears 29 times, far above any other, suggesting either a sentinel or a heavily repeated magnitude. That count equals the 29 rows where county_name is the bare string 'Alaska', so the repeats may be one statewide figure copied onto each of those rows; this is worth confirming against the data. The allcaps and short_text alerts are artefacts of digits-only content rather than real prose.
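Converting `diff` to integers only requires dropping the thousands separators. A minimal sketch, using the sample values shown in the stats section; `parse_count` is a hypothetical helper name.

```python
def parse_count(text: str) -> int:
    """Convert a comma-formatted count such as '37,410' to an int."""
    return int(text.replace(",", ""))


# First few sample values from the diff column.
samples = ["37,410", "653", "1,713", "862", "575", "63,321"]
parsed = [parse_count(s) for s in samples]
print(parsed)
```

`int` raises `ValueError` on anything that is not digits after the commas are removed, which surfaces malformed cells rather than silently coercing them.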

per_point_diff high anthropic:claude-opus-4-7

This column stores a per-point differential as a percentage string (e.g. '15.17%', '63.21%'), with lengths tightly bounded between 5 and 6 characters and exactly one token per cell. There are 2,555 unique values across 3,141 rows, giving a duplicate rate of 18.7%, and '15.17%' alone appears 31 times, far more than any other value, which is worth checking. The values are stored as text with a trailing '%', not as numbers.
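To compare per_point_diff with per_dem and per_gop, the strings need to become fractions on the same 0 to 1 scale. A minimal sketch, assuming every cell ends in '%' as the samples do; `parse_percent` is a hypothetical helper.

```python
def parse_percent(text: str) -> float:
    """Convert a string such as '15.17%' to the fraction 0.1517."""
    stripped = text.strip()
    if not stripped.endswith("%"):
        raise ValueError(f"expected a trailing '%': {text!r}")
    return float(stripped[:-1]) / 100


# First few sample values from the per_point_diff column.
samples = ["15.17%", "23.77%", "48.08%", "50.06%", "1.98%"]
fractions = [parse_percent(s) for s in samples]
print(fractions)
```

Rejecting cells without the trailing '%' guards against mixing percent strings with values that are already fractions.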

state_abbr high anthropic:claude-opus-4-7

This column holds US state abbreviations, with 51 unique values across 3141 rows and no nulls — consistent with one row per US county (50 states plus DC). The distribution tracks county counts rather than population: TX leads at 254 (8.1%), followed by GA (159), VA (133), and KY (120). Entropy ratio of 0.93 indicates a fairly even spread across states.
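The entropy ratio can be reproduced from value counts. The sketch below assumes saturn uses Shannon entropy in bits divided by log2 of the cardinality; the report does not state the formula, but the reported figures (5.275 and 0.930 for 51 categories) are consistent with it. Function names are illustrative.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(k), for k non-empty categories."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0  # a single category carries no uncertainty
    return entropy_bits(counts) / math.log2(k)


# Consistency check against the reported state_abbr figures.
REPORTED_ENTROPY = 5.275
CARDINALITY = 51
implied_ratio = REPORTED_ENTROPY / math.log2(CARDINALITY)
print(round(implied_ratio, 3))
```

A ratio of 1.0 would mean every state has the same number of rows; values below 1.0 measure how far the county counts depart from that.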

county_name high anthropic:claude-opus-4-7

This column holds US county names — 3,006 of 3,141 rows contain the word 'county', with 'parish' (64) and 'city' (43) covering Louisiana and Virginia equivalents. Names repeat heavily across states: 1,293 duplicates (41.2%) leave only 1,848 unique values, with 'Washington County' (30), 'Jefferson County' (25), and 'Franklin County' (24) leading. One oddity: 'Alaska' appears 29 times as a bare state name, breaking the county/parish/city pattern.
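The duplicate figures follow from the row and unique counts. A minimal sketch, assuming saturn defines n_duplicates as rows minus unique values, which matches the reported numbers (3,141 - 1,848 = 1,293); the sample list is made up for illustration.

```python
def duplicate_stats(values: list[str]) -> tuple[int, float]:
    """Return (n_duplicates, duplicate_rate), counting rows beyond the first
    occurrence of each distinct value."""
    n_rows = len(values)
    n_duplicates = n_rows - len(set(values))
    return n_duplicates, n_duplicates / n_rows


# Hypothetical mini-column: names repeat across states.
names = ["Washington County", "Jefferson County", "Washington County",
         "Alaska", "Alaska"]
print(duplicate_stats(names))

# The same arithmetic applied to the reported county_name counts.
reported_duplicates = 3_141 - 1_848
reported_rate = reported_duplicates / 3_141
print(reported_duplicates, round(reported_rate, 3))
```

Because names repeat across states, county_name alone is not a usable key; combined_fips, or the pair of state_abbr and county_name, is a safer join column.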

combined_fips high anthropic:claude-opus-4-7

This is almost certainly the 5-digit combined state+county FIPS code (state*1000 + county), stored as a number so leading zeros are dropped (Alabama's 01001 appears as 1001). All 3,141 values are unique with no nulls, matching the count of US counties. The range 1001 to 56045 spans Alabama (01) through Wyoming (56), and the near-zero skew reflects roughly uniform numeric county codes across states rather than a meaningful distribution.
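The state*1000 + county layout can be inverted with integer division. A minimal sketch, assuming the column holds plain integers; `split_fips` is a hypothetical helper that restores the zero padding.

```python
def split_fips(combined: int) -> tuple[str, str]:
    """Split a combined FIPS integer into zero-padded (state, county) codes."""
    state, county = divmod(combined, 1000)
    return f"{state:02d}", f"{county:03d}"


# The reported minimum and maximum of combined_fips.
print(split_fips(1001))
print(split_fips(56045))
```

Keeping the two parts as strings preserves the leading zeros that the numeric column drops.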

Numeric correlation

(unnamed) numeric

rows	3,141
null	0 (0.0%)
unique	3,141
min	0.000
max	3,140
mean	1,570
median	1,570
std	906.873
q1	785.000
q3	2,355
iqr	1,570
skew	0.000
kurtosis	-1.200
n_outliers	0
outlier_rate	0.000
zero_rate	3.18e-04

votes_dem numeric

skew=+11.65; 14.9% rows beyond 1.5 IQR

rows	3,141
null	0 (0.0%)
unique	2,688
min	4.000
max	1,893,770
mean	20,734
median	3,194
std	72,004
q1	1,175
q3	10,047
iqr	8,872
skew	11.652
kurtosis	224.356
n_outliers	468
outlier_rate	0.149
zero_rate	0.000

votes_gop numeric

skew=+5.78; 12.5% rows beyond 1.5 IQR

rows	3,141
null	0 (0.0%)
unique	2,901
min	57.000
max	620,285
mean	20,645
median	7,268
std	41,627
q1	3,241
q3	18,130
iqr	14,889
skew	5.780
kurtosis	51.776
n_outliers	394
outlier_rate	0.125
zero_rate	0.000

total_votes numeric

skew=+8.89; 14.1% rows beyond 1.5 IQR

rows	3,141
null	0 (0.0%)
unique	2,966
min	64.000
max	2,652,072
mean	43,637
median	11,144
std	114,568
q1	4,870
q3	29,799
iqr	24,929
skew	8.894
kurtosis	136.168
n_outliers	442
outlier_rate	0.141
zero_rate	0.000

per_dem numeric

rows	3,141
null	0 (0.0%)
unique	3,112
min	0.031
max	0.928
mean	0.318
median	0.286
std	0.153
q1	0.205
q3	0.398
iqr	0.193
skew	0.942
kurtosis	0.686
n_outliers	76
outlier_rate	0.024
zero_rate	0.000

per_gop numeric

rows	3,141
null	0 (0.0%)
unique	3,112
min	0.041
max	0.953
mean	0.635
median	0.665
std	0.156
q1	0.546
q3	0.750
iqr	0.205
skew	-0.819
kurtosis	0.376
n_outliers	63
outlier_rate	0.020
zero_rate	0.000

diff text

100.0% rows are a single word; 99.2% rows are all-caps; 95th-percentile length under 20 chars

rows	3,141
null	0 (0.0%)
unique	2,738
len_min	1
len_max	9
len_mean	4.935
len_median	5.000
len_p95	6.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	403
duplicate_rate	0.128
vocab_size	2,738
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	0.992
boilerplate_rate	0.000

Sample values (first 10)

37,410
653
1,713
862
575
63,321
601
1,326
169
3,658

per_point_diff text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars

rows	3,141
null	0 (0.0%)
unique	2,555
len_min	5
len_max	6
len_mean	5.896
len_median	6.000
len_p95	6.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	586
duplicate_rate	0.187
vocab_size	2,555
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	1.000
boilerplate_rate	0.000

Sample values (first 10)

15.17%
23.77%
48.08%
50.06%
1.98%
27.14%
32.12%
22.89%
6.30%
54.61%

state_abbr categorical

rows	3,141
null	0 (0.0%)
unique	51
top_value	TX
top_rate	0.081
cardinality	51
entropy	5.275
entropy_ratio	0.930

Top values (rank 1–20)

TX — 254
GA — 159
VA — 133
KY — 120
MO — 115
KS — 105
IL — 102
NC — 100
IA — 99
TN — 95
NE — 93
IN — 92
OH — 88
MN — 87
MI — 83
MS — 82
OK — 77
AR — 75
WI — 72
AL — 67

county_name text

95th-percentile length under 20 chars; 41.2% duplicate strings

rows	3,141
null	0 (0.0%)
unique	1,848
len_min	6
len_max	27
len_mean	13.869
len_median	14.000
len_p95	17.000
word_mean	2.054
word_median	2.000
n_empty	0
n_duplicates	1,293
duplicate_rate	0.412
vocab_size	1,840
readability_flesch_mean	38.384
emoji_rate	0.000
url_rate	0.000
one_word_rate	9.23e-03
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

Alaska
Day County
San Augustine County
Fisher County
Clay County
Waukesha County
Red Lake County
Charlotte County
Frio County
Franklin County

combined_fips numeric

rows	3,141
null	0 (0.0%)
unique	3,141
min	1,001
max	56,045
mean	30,389
median	29,177
std	15,162
q1	18,179
q3	45,081
iqr	26,902
skew	-0.080
kurtosis	-1.098
n_outliers	0
outlier_rate	0.000
zero_rate	0.000