saturn·

data trove scars standardized county analysis research system

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/data/geographic/scars/master_dataset.csv

Saturn profiled 3,221 rows across 20 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
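import subprocess

# Run the saturn CLI on the CSV (path and flags exactly as in the original cell).
# check=True raises subprocess.CalledProcessError if the command exits non-zero.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/geographic/scars/master_dataset.csv",
        "--findings", "data-trove-scars-standardized-county-analysis-research-system.json",
        "--llm", "anthropic:default",
    ],
    check=True,
)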

Summary confidence: high

This dataset covers 3,221 U.S. counties and county-equivalents (including Puerto Rico, state code 72) with demographic, economic, and electoral variables for the 2016 and 2020 presidential elections. The most striking finding is that Republican candidates carried most counties in both cycles: the median Republican share was roughly 67% in 2016 and 68% in 2020, while the Democratic median hovered near 29–30%, reflecting the well-known rural-county skew in U.S. politics (note that republican_pct_2020 is stored as a fraction, median 0.683, despite its _pct name). A data quality issue worth flagging immediately is the median_household_income column, whose minimum value of -666,666,666 is almost certainly a sentinel/error value; it drags the column mean to -$152,820 despite a plausible median of $52,380. Poverty rate averages about 15% across counties but reaches as high as 66%, and the racial composition variables (pct_white, pct_black, pct_hispanic) are highly skewed, with a small number of majority-minority counties at the extremes.

citing: republican_pct_2016.stats.median · republican_pct_2020.stats.median · democratic_pct_2016.stats.median · democratic_pct_2020.stats.median · median_household_income.stats.min · median_household_income.stats.median · poverty_rate.stats.mean · poverty_rate.stats.max · pct_white.stats.mean · pct_white.stats.skew · row_count
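The sentinel in median_household_income can be neutralized before any summary statistic is computed. Below is a minimal sketch in plain Python, assuming -666,666,666 is the only sentinel and that it should be treated as missing; the function names are illustrative and not part of saturn. In pandas, `Series.replace(-666666666, float("nan"))` does the same job.

```python
import math
import statistics

# Assumption: -666,666,666 marks "no estimate", not a real household income.
INCOME_SENTINEL = -666_666_666


def clean_income(values, sentinel=INCOME_SENTINEL):
    """Return a copy of `values` with the sentinel replaced by NaN."""
    return [math.nan if v == sentinel else float(v) for v in values]


def mean_and_median(values):
    """Mean and median over the non-missing (non-NaN) values only."""
    present = [v for v in values if not math.isnan(v)]
    return statistics.mean(present), statistics.median(present)


# Hypothetical incomes, one of them carrying the sentinel.
raw = [45_000, 52_380, 61_000, INCOME_SENTINEL]
print(statistics.mean(raw))  # the sentinel drags the raw mean far below zero
print(mean_and_median(clean_income(raw)))
```

Dropping the sentinel rather than imputing it keeps the median honest; whether to impute afterwards is a separate modelling decision.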

Out[4]:

saturn.schema() · 20 columns

column kind n null% unique alerts
NAME text 3,221 0.0% 3,221 near_unique
total_population numeric 3,221 0.0% 3,160 high_skew outliers
black_population numeric 3,221 0.0% 2,066 high_skew outliers
white_population numeric 3,221 0.0% 3,143 high_skew outliers
hispanic_population numeric 3,221 0.0% 2,331 high_skew outliers
state numeric 3,221 0.0% 52
county numeric 3,221 0.0% 326 high_skew outliers
FIPS numeric 3,221 0.0% 3,221
pct_black numeric 3,221 0.0% 3,128 high_skew outliers
pct_white numeric 3,221 0.0% 3,218
pct_hispanic numeric 3,221 0.0% 3,205 high_skew outliers
poverty_rate numeric 3,221 0.0% 3,219 high_skew
below_poverty_level numeric 3,221 0.0% 2,824 high_skew outliers
median_household_income numeric 3,221 0.0% 3,099 high_skew outliers
margin_2020 numeric 3,221 3.4% 3,112
democratic_pct_2020 numeric 3,221 3.4% 3,111
republican_pct_2020 numeric 3,221 3.4% 3,111
margin_2016 text 3,221 2.6% 2,554 one_word allcaps short_text
democratic_pct_2016 numeric 3,221 2.6% 3,111
republican_pct_2016 numeric 3,221 2.6% 3,111
Fig 1.
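republican_pct_2020 · Look for the peak between roughly 0.6 and 0.8, well above 0.5: the distribution is left-skewed, with a long tail toward low values, confirming Republican dominance across most U.S. counties in 2020. Values are fractions (0–1), not percentages.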
Histogram bins for republican_pct_2020 (median: 0.6829120557612961).
bin · count
0.05397 – 0.07667 · 1
0.07667 – 0.09937 · 2
0.09937 – 0.1221 · 2
0.1221 – 0.1448 · 6
0.1448 – 0.1675 · 6
0.1675 – 0.1901 · 15
0.1901 – 0.2128 · 5
0.2128 – 0.2355 · 13
0.2355 – 0.2582 · 12
0.2582 – 0.2809 · 25
0.2809 – 0.3036 · 26
0.3036 – 0.3263 · 32
0.3263 – 0.349 · 32
0.349 – 0.3717 · 40
0.3717 – 0.3944 · 46
0.3944 – 0.4171 · 52
0.4171 – 0.4398 · 64
0.4398 – 0.4625 · 78
0.4625 – 0.4852 · 61
0.4852 – 0.5079 · 78
0.5079 – 0.5306 · 66
0.5306 – 0.5533 · 97
0.5533 – 0.576 · 126
0.576 – 0.5987 · 122
0.5987 – 0.6214 · 139
0.6214 – 0.6441 · 143
0.6441 – 0.6668 · 154
0.6668 – 0.6895 · 173
0.6895 – 0.7122 · 176
0.7122 – 0.7349 · 200
0.7349 – 0.7576 · 213
0.7576 – 0.7802 · 195
0.7802 – 0.8029 · 203
0.8029 – 0.8256 · 169
0.8256 – 0.8483 · 132
0.8483 – 0.871 · 103
0.871 – 0.8937 · 65
0.8937 – 0.9164 · 27
0.9164 – 0.9391 · 9
0.9391 – 0.9618 · 4
Fig 2.
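margin_2020 · The distribution of victory margins shows how lopsided most county-level results are, with relatively few competitive counties near zero. Positive values appear to favor the Republican candidate: the median margin of +0.38 roughly equals the gap between the Republican median (0.68) and the Democratic median (about 0.30).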
Histogram bins for margin_2020 (median: 0.3843813151543954).
bin · count
-0.8675 – -0.8226 · 1
-0.8226 – -0.7776 · 2
-0.7776 – -0.7326 · 3
-0.7326 – -0.6877 · 5
-0.6877 – -0.6427 · 8
-0.6427 – -0.5978 · 12
-0.5978 – -0.5528 · 6
-0.5528 – -0.5078 · 12
-0.5078 – -0.4629 · 14
-0.4629 – -0.4179 · 22
-0.4179 – -0.373 · 28
-0.373 – -0.328 · 33
-0.328 – -0.283 · 29
-0.283 – -0.2381 · 46
-0.2381 – -0.1931 · 41
-0.1931 – -0.1482 · 54
-0.1482 – -0.1032 · 63
-0.1032 – -0.05823 · 67
-0.05823 – -0.01327 · 69
-0.01327 – 0.03169 · 73
0.03169 – 0.07665 · 70
0.07665 – 0.1216 · 90
0.1216 – 0.1666 · 131
0.1666 – 0.2115 · 117
0.2115 – 0.2565 · 129
0.2565 – 0.3015 · 141
0.3015 – 0.3464 · 159
0.3464 – 0.3914 · 165
0.3914 – 0.4363 · 181
0.4363 – 0.4813 · 206
0.4813 – 0.5263 · 195
0.5263 – 0.5712 · 197
0.5712 – 0.6162 · 213
0.6162 – 0.6611 · 175
0.6611 – 0.7061 · 140
0.7061 – 0.7511 · 102
0.7511 – 0.796 · 69
0.796 – 0.841 · 29
0.841 – 0.8859 · 11
0.8859 – 0.9309 · 4
Fig 3.
poverty_rate · Poverty rate peaks between about 11.6% and 13.2% (median 13.8%) but has a long right tail extending to 66%; watch for the outlier counties driving the extreme values.
Histogram bins for poverty_rate (median: 13.807805224676027).
bin · count
0 – 1.655 · 2
1.655 – 3.31 · 9
3.31 – 4.964 · 48
4.964 – 6.619 · 123
6.619 – 8.274 · 212
8.274 – 9.929 · 317
9.929 – 11.58 · 378
11.58 – 13.24 · 392
13.24 – 14.89 · 353
14.89 – 16.55 · 338
16.55 – 18.2 · 235
18.2 – 19.86 · 192
19.86 – 21.51 · 157
21.51 – 23.17 · 108
23.17 – 24.82 · 77
24.82 – 26.48 · 53
26.48 – 28.13 · 41
28.13 – 29.79 · 37
29.79 – 31.44 · 29
31.44 – 33.1 · 13
33.1 – 34.75 · 10
34.75 – 36.41 · 11
36.41 – 38.06 · 7
38.06 – 39.72 · 2
39.72 – 41.37 · 8
41.37 – 43.03 · 6
43.03 – 44.68 · 7
44.68 – 46.33 · 11
46.33 – 47.99 · 6
47.99 – 49.64 · 9
49.64 – 51.3 · 8
51.3 – 52.95 · 5
52.95 – 54.61 · 6
54.61 – 56.26 · 2
56.26 – 57.92 · 2
57.92 – 59.57 · 3
59.57 – 61.23 · 1
61.23 – 62.88 · 1
62.88 – 64.54 · 0
64.54 – 66.19 · 2
Fig 4.
pct_white · Most counties are majority-white and the distribution is left-skewed; the long lower tail (214 counties below 49.23% white) holds the majority-minority counties.
Histogram bins for pct_white (median: 87.65979926043318).
bin · count
3.29 – 5.708 · 2
5.708 – 8.125 · 1
8.125 – 10.54 · 4
10.54 – 12.96 · 2
12.96 – 15.38 · 10
15.38 – 17.8 · 6
17.8 – 20.21 · 4
20.21 – 22.63 · 7
22.63 – 25.05 · 12
25.05 – 27.47 · 12
27.47 – 29.89 · 7
29.89 – 32.3 · 4
32.3 – 34.72 · 10
34.72 – 37.14 · 13
37.14 – 39.56 · 24
39.56 – 41.97 · 18
41.97 – 44.39 · 24
44.39 – 46.81 · 30
46.81 – 49.23 · 24
49.23 – 51.64 · 34
51.64 – 54.06 · 33
54.06 – 56.48 · 37
56.48 – 58.9 · 58
58.9 – 61.32 · 43
61.32 – 63.73 · 69
63.73 – 66.15 · 72
66.15 – 68.57 · 77
68.57 – 70.99 · 76
70.99 – 73.4 · 88
73.4 – 75.82 · 100
75.82 – 78.24 · 98
78.24 – 80.66 · 143
80.66 – 83.08 · 132
83.08 – 85.49 · 163
85.49 – 87.91 · 191
87.91 – 90.33 · 246
90.33 – 92.75 · 302
92.75 – 95.16 · 453
95.16 – 97.58 · 525
97.58 – 100 · 67
Fig 5.
pct_black · Heavily right-skewed, with a median near 2.4% but a maximum of 88%: most counties have small Black population shares, while a small number have very large ones.
Histogram bins for pct_black (median: 2.382606279522089).
bin · count
0 – 2.195 · 1568
2.195 – 4.39 · 402
4.39 – 6.584 · 218
6.584 – 8.779 · 153
8.779 – 10.97 · 112
10.97 – 13.17 · 86
13.17 – 15.36 · 70
15.36 – 17.56 · 54
17.56 – 19.75 · 46
19.75 – 21.95 · 48
21.95 – 24.14 · 36
24.14 – 26.34 · 42
26.34 – 28.53 · 36
28.53 – 30.73 · 41
30.73 – 32.92 · 34
32.92 – 35.12 · 32
35.12 – 37.31 · 28
37.31 – 39.51 · 19
39.51 – 41.7 · 25
41.7 – 43.9 · 25
43.9 – 46.09 · 18
46.09 – 48.28 · 17
48.28 – 50.48 · 14
50.48 – 52.67 · 9
52.67 – 54.87 · 13
54.87 – 57.06 · 11
57.06 – 59.26 · 13
59.26 – 61.45 · 7
61.45 – 63.65 · 8
63.65 – 65.84 · 4
65.84 – 68.04 · 1
68.04 – 70.23 · 6
70.23 – 72.43 · 8
72.43 – 74.62 · 5
74.62 – 76.82 · 2
76.82 – 79.01 · 5
79.01 – 81.21 · 1
81.21 – 83.4 · 1
83.4 – 85.6 · 1
85.6 – 87.79 · 2
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column · kind · null %
NAME · text · 0.0%
total_population · numeric · 0.0%
black_population · numeric · 0.0%
white_population · numeric · 0.0%
hispanic_population · numeric · 0.0%
state · numeric · 0.0%
county · numeric · 0.0%
FIPS · numeric · 0.0%
pct_black · numeric · 0.0%
pct_white · numeric · 0.0%
pct_hispanic · numeric · 0.0%
poverty_rate · numeric · 0.0%
below_poverty_level · numeric · 0.0%
median_household_income · numeric · 0.0%
margin_2020 · numeric · 3.4%
democratic_pct_2020 · numeric · 3.4%
republican_pct_2020 · numeric · 3.4%
margin_2016 · text · 2.6%
democratic_pct_2016 · numeric · 2.6%
republican_pct_2016 · numeric · 2.6%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values rounded to 2 decimals). Columns appear in the same order as the rows.
columns · total_population black_population white_population hispanic_population state county FIPS pct_black pct_white pct_hispanic poverty_rate below_poverty_level
total_population · +1.00 +0.69 +0.98 +0.88 -0.06 -0.10 -0.06 +0.04 -0.19 +0.08 -0.15 +0.93
black_population · +0.69 +1.00 +0.63 +0.44 -0.01 -0.03 -0.01 +0.27 -0.29 +0.01 -0.04 +0.77
white_population · +0.98 +0.63 +1.00 +0.86 -0.06 -0.11 -0.06 +0.01 -0.12 +0.07 -0.17 +0.89
hispanic_population · +0.88 +0.44 +0.86 +1.00 -0.06 -0.07 -0.06 -0.01 -0.15 +0.22 -0.02 +0.86
state · -0.06 -0.01 -0.06 -0.06 +1.00 +0.07 +1.00 -0.07 +0.02 +0.37 +0.22 -0.05
county · -0.10 -0.03 -0.11 -0.07 +0.07 +1.00 +0.07 +0.09 -0.07 +0.07 +0.11 -0.08
FIPS · -0.06 -0.01 -0.06 -0.06 +1.00 +0.07 +1.00 -0.07 +0.02 +0.37 +0.22 -0.05
pct_black · +0.04 +0.27 +0.01 -0.01 -0.07 +0.09 -0.07 +1.00 -0.79 -0.02 +0.39 +0.10
pct_white · -0.19 -0.29 -0.12 -0.15 +0.02 -0.07 +0.02 -0.79 +1.00 -0.24 -0.48 -0.23
pct_hispanic · +0.08 +0.01 +0.07 +0.22 +0.37 +0.07 +0.37 -0.02 -0.24 +1.00 +0.53 +0.14
poverty_rate · -0.15 -0.04 -0.17 -0.02 +0.22 +0.11 +0.22 +0.39 -0.48 +0.53 +1.00 -0.01
below_poverty_level · +0.93 +0.77 +0.89 +0.86 -0.05 -0.08 -0.05 +0.10 -0.23 +0.14 -0.01 +1.00
Note that state, county, and FIPS are identifier codes, so their coefficients reflect numbering rather than any real relationship; the +1.00 between state and FIPS follows from FIPS being the state code followed by the county code.

NAME text label

This column contains U.S. county names that include the county-type word and the state (e.g., 'Jefferson County, Texas'): 'county,' appears in 3,007 of 3,221 rows, and state names dominate the top words; the remaining 214 rows presumably use other county-equivalent terms such as parish, borough, city, or municipio. Every value is unique (3,221 distinct entries, 0 duplicates, 0 nulls), so the column works as a human-readable identifier for county-level records. The mean length of ~24 characters and mean word count of ~3.2 fit a 'Name County, State' pattern. A vocabulary of only 1,983 distinct words across 3,221 rows reflects heavy reuse of state names and county-type words, which points to structured, standardized naming rather than free text.

Treatment: Use as a human-readable label. For joins, prefer FIPS; if you must join on names, split each value into its county and state parts and normalize casing, because the county part alone is not unique across states.

anthropic:default · confidence high
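Because the county part of NAME repeats across states, a name-based join needs both halves. Below is a minimal sketch, assuming every value follows the '<county part>, <state>' pattern suggested by the example above; `split_name` and `join_key` are hypothetical helpers, and values without a comma raise an error instead of being guessed at.

```python
def split_name(name):
    """Split 'Jefferson County, Texas' into ('Jefferson County', 'Texas').

    Splits on the last ', ' so that only the final segment is treated as the state.
    """
    county, sep, state = name.rpartition(", ")
    if not sep or not county or not state:
        raise ValueError(f"expected '<county>, <state>': {name!r}")
    return county.strip(), state.strip()


def join_key(name):
    """Case-insensitive (county, state) key for matching against another county table."""
    county, state = split_name(name)
    return county.casefold(), state.casefold()


print(join_key("Jefferson County, Texas"))  # ('jefferson county', 'texas')
```

Keeping the state in the key is the point of the design: the county part alone would collapse same-named counties from different states into one match.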
Out[13]:

saturn.columns["NAME"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,221
len_min 16
len_max 42
len_mean 24.27
len_median 24
len_p95 31
word_mean 3.243
word_median 3
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 1,983
readability_flesch_mean 7.581
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: near_unique (100.0% of rows are unique strings)
Fig 8.
Character-length distribution for NAME.
Character-length distribution for NAME (mean: 24.26637690158336).
chars · count
16 – 17 · 3
17 – 17 · 23
17 – 18 · 0
18 – 19 · 72
19 – 19 · 121
19 – 20 · 0
20 – 21 · 190
21 – 21 · 264
21 – 22 · 0
22 – 22 · 407
22 – 23 · 420
23 – 24 · 0
24 – 24 · 363
24 – 25 · 320
25 – 26 · 0
26 – 26 · 240
26 – 27 · 233
27 – 28 · 0
28 – 28 · 153
28 – 29 · 0
29 – 30 · 142
30 – 30 · 86
30 – 31 · 0
31 – 32 · 81
32 – 32 · 41
32 – 33 · 0
33 – 34 · 28
34 – 34 · 16
34 – 35 · 0
35 – 36 · 10
36 – 36 · 4
36 – 37 · 0
37 – 37 · 0
37 – 38 · 1
38 – 39 · 0
39 – 39 · 1
39 – 40 · 0
40 – 41 · 0
41 – 41 · 1
41 – 42 · 1

total_population numeric feature

This column holds the total population of each county or county-equivalent, ranging from 117 to 10,040,682. The distribution is severely right-skewed (skew = 13.67, kurtosis = 311.91): the median of 25,981 is only about a quarter of the mean of 102,398, so a long tail of a few very large population centers drives the mean. An outlier rate of 13.7% (441 of 3,221 rows) is unusually high and shows that large urban counties coexist with many small rural ones in the same dataset.

Treatment: Log-transform before regression or distance-based modelling to reduce skew and outlier influence.

anthropic:default · confidence high
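The outlier alert counts rows beyond 1.5 IQR. Assuming saturn applies the standard Tukey fences to the q1 and q3 reported in the stats table below (the report does not spell out the rule, so treat this as an assumption), the fences for total_population can be computed directly. They show that every flagged county is on the high side: the lower fence is negative, while the minimum population is 117. The helper names are hypothetical.

```python
import math


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# q1 and q3 for total_population, taken from the stats table in this section.
lower, upper = tukey_fences(11_125, 66_969)
print(lower, upper)  # -72641.0 150735.0

# The same rule on the log1p scale is far less aggressive once mapped back to counts.
_, log_upper = tukey_fences(math.log1p(11_125), math.log1p(66_969))
print(round(math.expm1(log_upper)))
```

On the raw scale any county above about 150,000 people is flagged; after the recommended log transform the upper fence moves to roughly a million, which is why the transform reduces outlier influence.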
Out[16]:

saturn.columns["total_population"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,160
min 117
max 1.004e+07
mean 1.024e+05
median 25,981
std 3.283e+05
q1 11,125
q3 66,969
iqr 55,844
skew 13.67
kurtosis 311.9
n_outliers 441
outlier_rate 0.1369
zero_rate 0
alert: high_skew (skew=+13.67)
alert: outliers (13.7% of rows beyond 1.5 IQR)
Fig 9.
Distribution of total_population. Vertical dash marks the median.
Histogram bins for total_population (median: 25981.0).
bin · count
117 – 2.511e+05 · 2946
2.511e+05 – 5.021e+05 · 135
5.021e+05 – 7.532e+05 · 53
7.532e+05 – 1.004e+06 · 42
1.004e+06 – 1.255e+06 · 12
1.255e+06 – 1.506e+06 · 9
1.506e+06 – 1.757e+06 · 6
1.757e+06 – 2.008e+06 · 3
2.008e+06 – 2.259e+06 · 4
2.259e+06 – 2.51e+06 · 2
2.51e+06 – 2.761e+06 · 3
2.761e+06 – 3.012e+06 · 0
3.012e+06 – 3.263e+06 · 1
3.263e+06 – 3.514e+06 · 1
3.514e+06 – 3.765e+06 · 0
3.765e+06 – 4.016e+06 · 0
4.016e+06 – 4.267e+06 · 0
4.267e+06 – 4.518e+06 · 1
4.518e+06 – 4.769e+06 · 1
4.769e+06 – 5.02e+06 · 0
5.02e+06 – 5.271e+06 · 1
5.271e+06 – 5.522e+06 · 0
5.522e+06 – 5.773e+06 · 0
5.773e+06 – 6.024e+06 · 0
6.024e+06 – 6.275e+06 · 0
6.275e+06 – 6.526e+06 · 0
6.526e+06 – 6.777e+06 · 0
6.777e+06 – 7.029e+06 · 0
7.029e+06 – 7.28e+06 · 0
7.28e+06 – 7.531e+06 · 0
7.531e+06 – 7.782e+06 · 0
7.782e+06 – 8.033e+06 · 0
8.033e+06 – 8.284e+06 · 0
8.284e+06 – 8.535e+06 · 0
8.535e+06 – 8.786e+06 · 0
8.786e+06 – 9.037e+06 · 0
9.037e+06 – 9.288e+06 · 0
9.288e+06 – 9.539e+06 · 0
9.539e+06 – 9.79e+06 · 0
9.79e+06 – 1.004e+07 · 1

black_population numeric feature

This column holds the count of Black residents in each county or county-equivalent. The distribution is extremely right-skewed (skew=10.46, kurtosis=148.22), with a median of just 859 versus a mean of 12,913 and a maximum of 1,202,260, so a small number of high-population urban counties dominate the tail. The 438 outliers (13.6% of rows) and a std of 54,951 against a median of 859 confirm that most counties are small while a few are very large; 2.8% of records are zero, likely rural or sparsely populated counties.

Treatment: Log-transform (log1p) before modelling to compress the extreme right tail; for per-capita comparisons, use pct_black or divide by total_population, both of which are already in the dataset.

anthropic:default · confidence high
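Since the dataset carries both the counts and the percentage columns, per-capita normalization can double as a consistency check. Below is a minimal sketch, assuming pct_black is defined as 100 × black_population / total_population; the report does not say how the percentage columns were derived, so a different denominator would make this check fail by design. Rows are plain dicts keyed by column name, and the helper names are hypothetical.

```python
def share_pct(part, total):
    """Percentage of `total` made up by `part`; None when the total is zero."""
    if total == 0:
        return None
    return 100.0 * part / total


def consistent(part, total, reported_pct, tol=0.01):
    """True if the recomputed share matches the reported percentage within `tol` points."""
    share = share_pct(part, total)
    return share is not None and abs(share - reported_pct) <= tol


def inconsistent_rows(rows, tol=0.01):
    """Rows whose pct_black disagrees with black_population / total_population."""
    return [
        r for r in rows
        if not consistent(r["black_population"], r["total_population"], r["pct_black"], tol)
    ]
```

An empty result would support using the existing pct_black column directly; a long list would mean the percentages come from a different source or denominator.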
Out[19]:

saturn.columns["black_population"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 2,066
min 0
max 1.202e+06
mean 1.291e+04
median 859
std 5.495e+04
q1 114
q3 5,553
iqr 5,439
skew 10.46
kurtosis 148.2
n_outliers 438
outlier_rate 0.136
zero_rate 0.02825
alert: high_skew (skew=+10.46)
alert: outliers (13.6% of rows beyond 1.5 IQR)
Fig 10.
Distribution of black_population. Vertical dash marks the median.
Histogram bins for black_population (median: 859.0).
bin · count
0 – 3.006e+04 · 2968
3.006e+04 – 6.011e+04 · 108
6.011e+04 – 9.017e+04 · 41
9.017e+04 – 1.202e+05 · 33
1.202e+05 – 1.503e+05 · 12
1.503e+05 – 1.803e+05 · 14
1.803e+05 – 2.104e+05 · 8
2.104e+05 – 2.405e+05 · 3
2.405e+05 – 2.705e+05 · 7
2.705e+05 – 3.006e+05 · 6
3.006e+05 – 3.306e+05 · 2
3.306e+05 – 3.607e+05 · 2
3.607e+05 – 3.907e+05 · 2
3.907e+05 – 4.208e+05 · 2
4.208e+05 – 4.508e+05 · 0
4.508e+05 – 4.809e+05 · 2
4.809e+05 – 5.11e+05 · 2
5.11e+05 – 5.41e+05 · 0
5.41e+05 – 5.711e+05 · 2
5.711e+05 – 6.011e+05 · 1
6.011e+05 – 6.312e+05 · 0
6.312e+05 – 6.612e+05 · 1
6.612e+05 – 6.913e+05 · 1
6.913e+05 – 7.214e+05 · 0
7.214e+05 – 7.514e+05 · 0
7.514e+05 – 7.815e+05 · 0
7.815e+05 – 8.115e+05 · 2
8.115e+05 – 8.416e+05 · 0
8.416e+05 – 8.716e+05 · 0
8.716e+05 – 9.017e+05 · 1
9.017e+05 – 9.318e+05 · 0
9.318e+05 – 9.618e+05 · 0
9.618e+05 – 9.919e+05 · 0
9.919e+05 – 1.022e+06 · 0
1.022e+06 – 1.052e+06 · 0
1.052e+06 – 1.082e+06 · 0
1.082e+06 – 1.112e+06 · 0
1.112e+06 – 1.142e+06 · 0
1.142e+06 – 1.172e+06 · 0
1.172e+06 – 1.202e+06 · 1

white_population numeric feature

This column holds the white population count for each county or county-equivalent, with 3,221 non-null records ranging from 58 to 4,795,186. The distribution is severely right-skewed (skew = 10.35, kurtosis = 175.65): the median is only 21,282 while the mean is 72,000, so most counties are small and a long tail of large urban counties dominates; 407 records (12.6%) are flagged as outliers. The high number of distinct values (3,143 of 3,221) is consistent with a raw count feature rather than a category or ID.

Treatment: Log-transform (log1p) before modelling to reduce skew and compress the extreme outlier range.

anthropic:default · confidence high
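To see what the log1p treatment buys, compare skewness before and after the transform. Below is a minimal sketch that uses the five summary values from the stats table below (min, q1, median, q3, max) purely as a toy input, not as a sample of the data. The skewness here is the population-moment (Fisher-Pearson) estimator, which may differ slightly from the one saturn reports; the helper names are hypothetical.

```python
import math
import statistics


def skewness(values):
    """Fisher-Pearson skewness from population moments: m3 / m2 ** 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log1p_all(values):
    """log(1 + x) for each count; defined at zero, so zero counts need no special case."""
    return [math.log1p(v) for v in values]


# Toy input: white_population min, q1, median, q3 and max from the table below.
toy = [58, 8_855, 21_282, 56_553, 4_795_186]
print(round(skewness(toy), 2), round(skewness(log1p_all(toy)), 2))
```

log1p rather than a plain log keeps the transform defined for zero counts, which matters for the sibling count columns that do contain zeros.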
Out[22]:

saturn.columns["white_population"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,143
min 58
max 4.795e+06
mean 7.2e+04
median 21,282
std 1.918e+05
q1 8,855
q3 56,553
iqr 47,698
skew 10.35
kurtosis 175.7
n_outliers 407
outlier_rate 0.1264
zero_rate 0
alert: high_skew (skew=+10.35)
alert: outliers (12.6% of rows beyond 1.5 IQR)
Fig 11.
Distribution of white_population. Vertical dash marks the median.
Histogram bins for white_population (median: 21282.0).
bin · count
58 – 1.199e+05 · 2795
1.199e+05 – 2.398e+05 · 216
2.398e+05 – 3.597e+05 · 74
3.597e+05 – 4.796e+05 · 47
4.796e+05 – 5.994e+05 · 29
5.994e+05 – 7.193e+05 · 24
7.193e+05 – 8.392e+05 · 6
8.392e+05 – 9.591e+05 · 9
9.591e+05 – 1.079e+06 · 3
1.079e+06 – 1.199e+06 · 3
1.199e+06 – 1.319e+06 · 4
1.319e+06 – 1.439e+06 · 3
1.439e+06 – 1.558e+06 · 1
1.558e+06 – 1.678e+06 · 0
1.678e+06 – 1.798e+06 · 1
1.798e+06 – 1.918e+06 · 1
1.918e+06 – 2.038e+06 · 0
2.038e+06 – 2.158e+06 · 0
2.158e+06 – 2.278e+06 · 1
2.278e+06 – 2.398e+06 · 0
2.398e+06 – 2.518e+06 · 0
2.518e+06 – 2.637e+06 · 0
2.637e+06 – 2.757e+06 · 1
2.757e+06 – 2.877e+06 · 1
2.877e+06 – 2.997e+06 · 0
2.997e+06 – 3.117e+06 · 0
3.117e+06 – 3.237e+06 · 0
3.237e+06 – 3.357e+06 · 1
3.357e+06 – 3.477e+06 · 0
3.477e+06 – 3.596e+06 · 0
3.596e+06 – 3.716e+06 · 0
3.716e+06 – 3.836e+06 · 0
3.836e+06 – 3.956e+06 · 0
3.956e+06 – 4.076e+06 · 0
4.076e+06 – 4.196e+06 · 0
4.196e+06 – 4.316e+06 · 0
4.316e+06 – 4.436e+06 · 0
4.436e+06 – 4.555e+06 · 0
4.555e+06 – 4.675e+06 · 0
4.675e+06 – 4.795e+06 · 1

hispanic_population numeric feature

This column holds the Hispanic population count for each of the 3,221 counties and county-equivalents. The distribution is extremely right-skewed (skew = 22.75, kurtosis = 744.79), with a median of only 1,209 but a mean of 19,427 and a maximum of 4,851,344, so a small number of large urban counties dominate. 15.3% of records (492 rows) are flagged as outliers; the interquartile range spans just 377–5,875 while the std is 125,108, confirming that values concentrate at the low end with a long, heavy tail.

Treatment: Log-transform (log1p) before modelling to reduce the extreme skew; for per-capita comparisons, use pct_hispanic or divide by total_population, both of which are already in the dataset.

anthropic:default · confidence high
Out[25]:

saturn.columns["hispanic_population"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 2,331
min 0
max 4.851e+06
mean 1.943e+04
median 1,209
std 1.251e+05
q1 377
q3 5,875
iqr 5,498
skew 22.75
kurtosis 744.8
n_outliers 492
outlier_rate 0.1527
zero_rate 0.004967
alert: high_skew (skew=+22.75)
alert: outliers (15.3% of rows beyond 1.5 IQR)
Fig 12.
Distribution of hispanic_population. Vertical dash marks the median.
Histogram bins for hispanic_population (median: 1209.0).
bin · count
0 – 1.213e+05 · 3124
1.213e+05 – 2.426e+05 · 55
2.426e+05 – 3.639e+05 · 13
3.639e+05 – 4.851e+05 · 9
4.851e+05 – 6.064e+05 · 4
6.064e+05 – 7.277e+05 · 3
7.277e+05 – 8.49e+05 · 2
8.49e+05 – 9.703e+05 · 0
9.703e+05 – 1.092e+06 · 2
1.092e+06 – 1.213e+06 · 4
1.213e+06 – 1.334e+06 · 1
1.334e+06 – 1.455e+06 · 1
1.455e+06 – 1.577e+06 · 0
1.577e+06 – 1.698e+06 · 0
1.698e+06 – 1.819e+06 · 0
1.819e+06 – 1.941e+06 · 1
1.941e+06 – 2.062e+06 · 1
2.062e+06 – 2.183e+06 · 0
2.183e+06 – 2.304e+06 · 0
2.304e+06 – 2.426e+06 · 0
2.426e+06 – 2.547e+06 · 0
2.547e+06 – 2.668e+06 · 0
2.668e+06 – 2.79e+06 · 0
2.79e+06 – 2.911e+06 · 0
2.911e+06 – 3.032e+06 · 0
3.032e+06 – 3.153e+06 · 0
3.153e+06 – 3.275e+06 · 0
3.275e+06 – 3.396e+06 · 0
3.396e+06 – 3.517e+06 · 0
3.517e+06 – 3.639e+06 · 0
3.639e+06 – 3.76e+06 · 0
3.76e+06 – 3.881e+06 · 0
3.881e+06 – 4.002e+06 · 0
4.002e+06 – 4.124e+06 · 0
4.124e+06 – 4.245e+06 · 0
4.245e+06 – 4.366e+06 · 0
4.366e+06 – 4.487e+06 · 0
4.487e+06 – 4.609e+06 · 0
4.609e+06 – 4.73e+06 · 0
4.73e+06 – 4.851e+06 · 1

state numeric foreign_key

This column is a numeric state FIPS code, not a measurement. Its 52 distinct integer values, ranging from 1 to 72, correspond to the 50 states, the District of Columbia (11), and Puerto Rico (72); the 78 rows coded 72 match the number of Puerto Rico's municipios. Moments such as skew (0.16) and kurtosis (-0.63) only describe how the codes happen to be numbered and carry no analytical meaning. The field is clean and fully populated (zero nulls, zero outliers), but analysts expecting only the 50 states should note that DC and Puerto Rico are included.

Treatment: Treat as a categorical nominal code; do not use raw numeric value in regression — one-hot encode or left-join to a state reference table for geographic attributes.

anthropic:default · confidence high
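Below is a minimal sketch of treating state as a nominal code: format each value as the zero-padded 2-character string that state FIPS tables use as a key, then build one-hot indicators over the observed codes. The helper names are hypothetical. In pandas, `pd.get_dummies(df["state"].astype(str).str.zfill(2), prefix="state")` produces equivalent indicator columns.

```python
def state_key(code):
    """Render a numeric state FIPS code as a 2-character string, e.g. 6 -> '06'."""
    code = int(code)
    if not 1 <= code <= 99:
        raise ValueError(f"not a 2-digit state code: {code}")
    return f"{code:02d}"


def one_hot(codes):
    """Return (levels, rows): one dict of 0/1 indicators per input code."""
    keys = [state_key(c) for c in codes]
    levels = sorted(set(keys))
    return levels, [{lvl: int(k == lvl) for lvl in levels} for k in keys]


levels, rows = one_hot([1, 72, 6, 1])
print(levels)   # ['01', '06', '72']
print(rows[1])  # {'01': 0, '06': 0, '72': 1}
```

Using the string form also keeps "06" from ever being compared numerically with "72", which is exactly the mistake the treatment warns against.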
Out[28]:

saturn.columns["state"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 52
min 1
max 72
mean 31.28
median 30
std 16.28
q1 19
q3 46
iqr 27
skew 0.157
kurtosis -0.6261
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 13.
Distribution of state. Vertical dash marks the median.
Histogram bins for state (median: 30.0).
bin · count
1 – 2.775 · 97
2.775 – 4.55 · 15
4.55 – 6.325 · 133
6.325 – 8.1 · 64
8.1 – 9.875 · 8
9.875 – 11.65 · 4
11.65 – 13.42 · 226
13.42 – 15.2 · 5
15.2 – 16.98 · 44
16.98 – 18.75 · 194
18.75 – 20.52 · 204
20.52 – 22.3 · 184
22.3 – 24.07 · 40
24.07 – 25.85 · 14
25.85 – 27.62 · 170
27.62 – 29.4 · 197
29.4 – 31.17 · 149
31.17 – 32.95 · 17
32.95 – 34.73 · 31
34.73 – 36.5 · 95
36.5 – 38.27 · 153
38.27 – 40.05 · 165
40.05 – 41.82 · 36
41.82 – 43.6 · 67
43.6 – 45.38 · 51
45.38 – 47.15 · 161
47.15 – 48.92 · 254
48.92 – 50.7 · 43
50.7 – 52.47 · 133
52.47 – 54.25 · 94
54.25 – 56.02 · 95
56.02 – 57.8 · 0
57.8 – 59.57 · 0
59.57 – 61.35 · 0
61.35 – 63.12 · 0
63.12 – 64.9 · 0
64.9 – 66.67 · 0
66.67 – 68.45 · 0
68.45 – 70.22 · 0
70.22 – 72 · 78

county numeric foreign_key

This column is the 3-digit county part of the FIPS code, a categorical geographic code rather than a continuous measure. County codes are unique only within a state, so the same values recur across states, which is why there are only 326 distinct values in 3,221 rows. The right skew (skew 2.87, kurtosis 11.64), the range of 1 to 840, and the 178 flagged outliers (5.5%) reflect numbering conventions rather than any meaningful magnitude: most counties have low codes, and only states with many counties, or with special ranges such as Virginia's independent cities, use codes in the hundreds. The same numbering explains why the mean (102.85) sits above the median (79.0).

Treatment: Cast to categorical/string and use only together with state (state × 1000 + county reproduces FIPS); on its own it is not a unique geographic key, and its raw numeric value should not be used in any regression or distance-based model.

anthropic:default · confidence high
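The relationship between the three code columns can be checked directly. Below is a minimal sketch, assuming FIPS is stored as state × 1000 + county; that is consistent with the 1,001 to 72,153 range and the +1.00 state–FIPS correlation in Fig 7, but the report does not state it. Rows are plain dicts keyed by column name, and the helper names are hypothetical.

```python
def fips_from_parts(state, county):
    """Combine a 2-digit state code and a 3-digit county code into a FIPS integer."""
    if not (1 <= state <= 99 and 1 <= county <= 999):
        raise ValueError(f"out-of-range codes: state={state}, county={county}")
    return state * 1000 + county


def fips_mismatches(rows):
    """Rows whose FIPS column disagrees with state * 1000 + county."""
    return [r for r in rows if r["FIPS"] != fips_from_parts(r["state"], r["county"])]


def county_key(row):
    """(state, county) pair: unlike county alone, unique per county-equivalent."""
    return row["state"], row["county"]
```

If the mismatch list is empty for the whole table, county adds no information beyond FIPS and can be dropped from models entirely.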
Out[31]:

saturn.columns["county"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 326
min 1
max 840
mean 102.8
median 79
std 106.6
q1 35
q3 133
iqr 98
skew 2.868
kurtosis 11.64
n_outliers 178
outlier_rate 0.05526
zero_rate 0
alert: high_skew (skew=+2.87)
alert: outliers (5.5% of rows beyond 1.5 IQR)
Fig 14.
Distribution of county. Vertical dash marks the median.
Histogram bins for county (median: 79.0).
bin · count
1 – 21.98 · 531
21.98 – 42.95 · 418
42.95 – 63.93 · 411
63.93 – 84.9 · 345
84.9 – 105.9 · 352
105.9 – 126.9 · 279
126.9 – 147.8 · 234
147.8 – 168.8 · 166
168.8 – 189.8 · 138
189.8 – 210.8 · 70
210.8 – 231.7 · 45
231.7 – 252.7 · 25
252.7 – 273.7 · 22
273.7 – 294.7 · 23
294.7 – 315.6 · 22
315.6 – 336.6 · 13
336.6 – 357.6 · 11
357.6 – 378.6 · 10
378.6 – 399.5 · 11
399.5 – 420.5 · 10
420.5 – 441.5 · 11
441.5 – 462.5 · 10
462.5 – 483.4 · 11
483.4 – 504.4 · 10
504.4 – 525.4 · 7
525.4 – 546.4 · 2
546.4 – 567.3 · 1
567.3 – 588.3 · 2
588.3 – 609.3 · 3
609.3 – 630.2 · 3
630.2 – 651.2 · 2
651.2 – 672.2 · 2
672.2 – 693.2 · 5
693.2 – 714.2 · 2
714.2 – 735.1 · 3
735.1 – 756.1 · 2
756.1 – 777.1 · 3
777.1 – 798.1 · 1
798.1 – 819 · 2
819 – 840 · 3

FIPS numeric identifier

This column contains U.S. FIPS (Federal Information Processing Standards) county codes: a 2-digit state code followed by a 3-digit county code. Because the column is stored as an integer, codes for states 01–09 lose their leading zero and appear as 4 digits (e.g., 1001 for 01001). Every row has a distinct value (3,221 unique values, matching n exactly) with no nulls, so this is the dataset's primary key; the 3,221 rows are the 3,143 rows for the 50 states and DC plus the 78 rows with state code 72 (Puerto Rico). The low skew (0.157) and mild platykurtosis (-0.63) say little about a code: the histogram shows values clustered by state prefix, with an empty stretch between roughly 56,140 and 70,370 because no rows fall between Wyoming (56) and Puerto Rico (72). The range of 1001 to 72153 is correct for U.S. county FIPS codes (Alabama's first county to Puerto Rico's last).

Treatment: Treat as a categorical geographic identifier; do not use numerically — left-join to FIPS reference tables for geographic enrichment or spatial analysis.

anthropic:default · confidence high
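Most external county tables key on the 5-character FIPS string, so the integer column needs its leading zero restored before a join. Below is a minimal sketch, assuming the values arrive as integers or integral floats; the helper names are hypothetical. In pandas the same formatting is `df["FIPS"].astype(int).astype(str).str.zfill(5)`.

```python
def fips_key(fips):
    """Format a county FIPS code as a zero-padded 5-character string, e.g. 1001 -> '01001'."""
    value = int(fips)
    if value != fips or not 1000 <= value <= 99999:
        raise ValueError(f"not a county FIPS code: {fips!r}")
    return f"{value:05d}"


def split_fips(key):
    """Split a 5-character FIPS key into its (state, county) string parts."""
    if len(key) != 5 or not key.isdigit():
        raise ValueError(f"expected 5 digits: {key!r}")
    return key[:2], key[2:]


print(fips_key(1001), split_fips(fips_key(72153)))  # 01001 ('72', '153')
```

Rejecting non-integral values (such as 1001.5) rather than truncating them keeps a silently corrupted code from joining to the wrong county.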
Out[34]:

saturn.columns["FIPS"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,221
min 1,001
max 72,153
mean 3.138e+04
median 30,023
std 1.63e+04
q1 19,031
q3 46,105
iqr 27,074
skew 0.1569
kurtosis -0.6308
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 15.
Distribution of FIPS. Vertical dash marks the median.
Histogram bins for FIPS (median: 30023.0).
bin · count
1001 – 2780 · 97
2780 – 4559 · 15
4559 – 6337 · 133
6337 – 8116 · 59
8116 – 9895 · 13
9895 – 1.167e+04 · 4
1.167e+04 – 1.345e+04 · 226
1.345e+04 – 1.523e+04 · 5
1.523e+04 – 1.701e+04 · 49
1.701e+04 – 1.879e+04 · 189
1.879e+04 – 2.057e+04 · 204
2.057e+04 – 2.235e+04 · 184
2.235e+04 – 2.413e+04 · 39
2.413e+04 – 2.59e+04 · 15
2.59e+04 – 2.768e+04 · 170
2.768e+04 – 2.946e+04 · 196
2.946e+04 – 3.124e+04 · 150
3.124e+04 – 3.302e+04 · 27
3.302e+04 – 3.48e+04 · 21
3.48e+04 – 3.658e+04 · 95
3.658e+04 – 3.836e+04 · 153
3.836e+04 – 4.013e+04 · 155
4.013e+04 – 4.191e+04 · 46
4.191e+04 – 4.369e+04 · 67
4.369e+04 – 4.547e+04 · 51
4.547e+04 – 4.725e+04 · 161
4.725e+04 – 4.903e+04 · 268
4.903e+04 – 5.081e+04 · 29
5.081e+04 – 5.259e+04 · 133
5.259e+04 – 5.436e+04 · 94
5.436e+04 – 5.614e+04 · 95
5.614e+04 – 5.792e+04 · 0
5.792e+04 – 5.97e+04 · 0
5.97e+04 – 6.148e+04 · 0
6.148e+04 – 6.326e+04 · 0
6.326e+04 – 6.504e+04 · 0
6.504e+04 – 6.682e+04 · 0
6.682e+04 – 6.86e+04 · 0
6.86e+04 – 7.037e+04 · 0
7.037e+04 – 7.215e+04 · 78

pct_black numeric feature

This column is the percentage of Black residents (0–100 scale) in each county or county-equivalent, with 3,221 rows and no nulls. The distribution is heavily right-skewed (skew=2.33, kurtosis=5.45): the median is just 2.38% while the mean is pulled up to 9.08%, and 422 rows (13.1%) are flagged as outliers, reaching up to 87.79%. The IQR spans only 0.69–10.21%, so most counties have small Black shares, with a long tail that includes about a hundred majority-Black counties.

Treatment: Apply log1p or quantile transformation before regression to address severe right skew and outlier influence.

anthropic:default · confidence high
Out[37]:

saturn.columns["pct_black"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,128
min 0
max 87.79
mean 9.085
median 2.383
std 14.5
q1 0.6919
q3 10.21
iqr 9.513
skew 2.326
kurtosis 5.451
n_outliers 422
outlier_rate 0.131
zero_rate 0.02825
alert: high_skew (skew=+2.33)
alert: outliers (13.1% of rows beyond 1.5 IQR)
Fig 16.
Distribution of pct_black. Vertical dash marks the median.
Histogram bins for pct_black: identical to the Fig 5 table.

pct_white numeric feature

This column is the percentage of white residents in each county or county-equivalent, ranging from 3.29% to 100% across 3,221 records. The distribution is strongly left-skewed (skew = -1.56), with a mean of 81.2% and a median of 87.7%, so majority-white counties dominate. The gap between mean and median reflects a long lower tail of more diverse counties. All 145 outliers (4.5%) sit in that lower tail: with q1 = 73.62, q3 = 93.99, and IQR = 20.37, the upper 1.5 IQR fence (about 124.5) lies above 100%, while the lower fence is about 43.1%. Near-perfect uniqueness (3,218 of 3,221 values) confirms this is a continuous ratio measure, not a binned or rounded variable.

Treatment: Use as-is or apply a reflection-log transform to address left skew before regression; consider interactions with other demographic features.

anthropic:default · confidence high
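The reflection-log transform flips the left-skewed percentage into a right-skewed one and then takes its log. Below is a minimal sketch, assuming the 0–100 scale used by pct_white and an offset of 1 (log1p) so that counties at exactly 100% map to 0. The reflection reverses direction, so larger transformed values mean a smaller white share. The function name is hypothetical.

```python
import math


def reflect_log(pct, ceiling=100.0):
    """Reflection-log transform: log(1 + (ceiling - pct)) for a left-skewed percentage."""
    if not 0.0 <= pct <= ceiling:
        raise ValueError(f"percentage outside [0, {ceiling}]: {pct}")
    return math.log1p(ceiling - pct)


# min, q1, median, q3 and max of pct_white from the stats table below.
for pct in (3.29, 73.62, 87.66, 93.99, 100.0):
    print(pct, round(reflect_log(pct), 3))
```

Remember to flip the sign of any fitted coefficient when interpreting it back in terms of pct_white.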
Out[40]:

saturn.columns["pct_white"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,218
min 3.29
max 100
mean 81.2
median 87.66
std 17.35
q1 73.62
q3 93.99
iqr 20.37
skew -1.562
kurtosis 2.301
n_outliers 145
outlier_rate 0.04502
zero_rate 0
Fig 17.
Distribution of pct_white. Vertical dash marks the median.
Histogram bins for pct_white: identical to the Fig 4 table.

pct_hispanic numeric feature

This column is the percentage of Hispanic residents in each county or county-equivalent, ranging from 0% to 100%. The distribution is severely right-skewed (skew=3.11, kurtosis=9.89): the median is only 4.52% while the mean is 11.74%, so most counties have low Hispanic shares and a long tail of high-concentration counties pulls the average up. 13% of rows (420 of 3,221) are flagged as outliers, consistent with areas of heavy Hispanic concentration; the spike of 70 rows in the 97.5–100% bin likely includes Puerto Rico's municipios (state code 72). The low zero rate (0.5%) and the absence of nulls indicate good data completeness.

Treatment: Log-transform or apply a square-root transformation before regression/modelling to reduce skew and diminish outlier leverage.

anthropic:default · confidence high
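The treatment offers two transforms of different strength. One quick way to compare them is to ask how far the maximum sits above q3 after each transform. Below is a minimal sketch using q3 = 10.66 and max = 100 from the stats table below; `tail_ratio` is a hypothetical helper, and a smaller ratio means a stronger compression of the upper tail.

```python
import math

TRANSFORMS = {
    "raw": lambda x: x,
    "sqrt": math.sqrt,
    "log1p": math.log1p,
}


def tail_ratio(q3, maximum, transform):
    """Ratio of the transformed maximum to the transformed upper quartile."""
    return transform(maximum) / transform(q3)


for name, fn in TRANSFORMS.items():
    # Roughly 9.4 for raw, 3.1 for sqrt and 1.9 for log1p.
    print(name, round(tail_ratio(10.66, 100.0, fn), 2))
```

The square root is the gentler choice if the high-concentration counties should keep some leverage; log1p pulls them in much harder.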
Out[43]:

saturn.columns["pct_hispanic"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,205
min 0
max 100
mean 11.74
median 4.516
std 19.4
q1 2.363
q3 10.66
iqr 8.294
skew 3.113
kurtosis 9.888
n_outliers 420
outlier_rate 0.1304
zero_rate 0.004967
alert: high_skew (skew=+3.11)
alert: outliers (13.0% of rows beyond 1.5 IQR)
Fig 18.
Distribution of pct_hispanic. Vertical dash marks the median.
Histogram bins for pct_hispanic (median: 4.51638689048761).
bin · count
0 – 2.5 · 882
2.5 – 5 · 850
5 – 7.5 · 412
7.5 – 10 · 213
10 – 12.5 · 148
12.5 – 15 · 102
15 – 17.5 · 75
17.5 – 20 · 64
20 – 22.5 · 45
22.5 – 25 · 49
25 – 27.5 · 39
27.5 – 30 · 30
30 – 32.5 · 26
32.5 – 35 · 18
35 – 37.5 · 17
37.5 – 40 · 15
40 – 42.5 · 17
42.5 – 45 · 15
45 – 47.5 · 12
47.5 – 50 · 10
50 – 52.5 · 12
52.5 – 55 · 12
55 – 57.5 · 9
57.5 – 60 · 12
60 – 62.5 · 10
62.5 – 65 · 9
65 – 67.5 · 3
67.5 – 70 · 6
70 – 72.5 · 3
72.5 – 75 · 3
75 – 77.5 · 0
77.5 – 80 · 3
80 – 82.5 · 4
82.5 – 85 · 5
85 – 87.5 · 1
87.5 – 90 · 3
90 – 92.5 · 4
92.5 – 95 · 5
95 – 97.5 · 8
97.5 – 100 · 70

poverty_rate numeric feature

This column is a poverty rate (percentage) for each of the 3,221 counties and county-equivalents, with complete coverage (no nulls) and near-unique values (3,219 distinct). The distribution is right-skewed (skew 2.11, kurtosis 6.92), with a median of 13.8% and a mean pulled up to 15.4% by a long upper tail reaching 66.2%; the 143 outliers (4.4% of records) make up this tail, a minority of counties with very high poverty that will disproportionately influence linear models.

Treatment: Apply log or square-root transform to reduce right skew before regression; investigate the 143 outlier units separately for data quality or structural differences.

anthropic:default · confidence high
Out[46]:

saturn.columns["poverty_rate"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,219
min 0
max 66.19
mean 15.38
median 13.81
std 7.97
q1 10.34
q3 18.25
iqr 7.91
skew 2.111
kurtosis 6.922
n_outliers 143
outlier_rate 0.0444
zero_rate 0.0003105
alert: high_skew (skew=+2.11)
Fig 19.
Distribution of poverty_rate. Vertical dash marks the median.
Histogram bins for poverty_rate (median: 13.807805224676027).
bin range: count
0 – 1.655: 2
1.655 – 3.31: 9
3.31 – 4.964: 48
4.964 – 6.619: 123
6.619 – 8.274: 212
8.274 – 9.929: 317
9.929 – 11.58: 378
11.58 – 13.24: 392
13.24 – 14.89: 353
14.89 – 16.55: 338
16.55 – 18.2: 235
18.2 – 19.86: 192
19.86 – 21.51: 157
21.51 – 23.17: 108
23.17 – 24.82: 77
24.82 – 26.48: 53
26.48 – 28.13: 41
28.13 – 29.79: 37
29.79 – 31.44: 29
31.44 – 33.1: 13
33.1 – 34.75: 10
34.75 – 36.41: 11
36.41 – 38.06: 7
38.06 – 39.72: 2
39.72 – 41.37: 8
41.37 – 43.03: 6
43.03 – 44.68: 7
44.68 – 46.33: 11
46.33 – 47.99: 6
47.99 – 49.64: 9
49.64 – 51.3: 8
51.3 – 52.95: 5
52.95 – 54.61: 6
54.61 – 56.26: 2
56.26 – 57.92: 2
57.92 – 59.57: 3
59.57 – 61.23: 1
61.23 – 62.88: 1
62.88 – 64.54: 0
64.54 – 66.19: 2

below_poverty_level numeric feature

This column is a count of people living below the poverty level in each geographic unit, most likely one row per county given the 3,221 rows. The distribution is extremely right-skewed (skew=15.1, kurtosis=360.7): the median is 3,831 but the mean is 13,136. The maximum reaches 1,401,656, most likely a single very populous county (the top histogram bin holds exactly one unit) pulling the tail hard. With 351 outliers (~10.9% of rows) and a standard deviation of 44,284 against a median of 3,831, a small number of high-population jurisdictions dominate the raw counts.

Treatment: Log-transform (log1p) before regression or clustering. For comparisons across units, normalize by total population; the dataset already includes poverty_rate, which likely serves that purpose.

anthropic:default · confidence high
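A minimal log1p sketch with the standard library. log1p(x) = log(1 + x) keeps the single zero-count unit finite, whereas a plain log would fail on it. On this scale the reported median (3,831) and maximum (1,401,656) end up about 5.9 units apart instead of differing by a factor of roughly 366. The helper name is illustrative.

```python
import math

def log1p_counts(counts):
    """Apply log(1 + x) to non-negative counts; None passes through unchanged."""
    return [None if c is None else math.log1p(c) for c in counts]

median_log, max_log = log1p_counts([3831, 1401656])
print(round(max_log - median_log, 2))
```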
Out[49]:

saturn.columns["below_poverty_level"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 2,824
min 0
max 1.402e+06
mean 1.314e+04
median 3,831
std 4.428e+04
q1 1,547
q3 9,937
iqr 8,390
skew 15.11
kurtosis 360.7
n_outliers 351
outlier_rate 0.109
zero_rate 0.0003105
alert: high_skew (skew=+15.11)
alert: outliers (10.9% of rows beyond 1.5 IQR)
Fig 20.
Distribution of below_poverty_level. Vertical dash marks the median.
Histogram bins for below_poverty_level (median: 3831.0).
bin range: count
0 – 3.504e+04: 2980
3.504e+04 – 7.008e+04: 136
7.008e+04 – 1.051e+05: 50
1.051e+05 – 1.402e+05: 19
1.402e+05 – 1.752e+05: 7
1.752e+05 – 2.102e+05: 8
2.102e+05 – 2.453e+05: 3
2.453e+05 – 2.803e+05: 2
2.803e+05 – 3.154e+05: 3
3.154e+05 – 3.504e+05: 2
3.504e+05 – 3.855e+05: 5
3.855e+05 – 4.205e+05: 0
4.205e+05 – 4.555e+05: 1
4.555e+05 – 4.906e+05: 1
4.906e+05 – 5.256e+05: 0
5.256e+05 – 5.607e+05: 1
5.607e+05 – 5.957e+05: 0
5.957e+05 – 6.307e+05: 0
6.307e+05 – 6.658e+05: 0
6.658e+05 – 7.008e+05: 1
7.008e+05 – 7.359e+05: 1
7.359e+05 – 1.367e+06: 0 (18 empty bins)
1.367e+06 – 1.402e+06: 1

median_household_income numeric feature

This column represents median household income, likely sourced from census data for each geographic unit. The median of 52,380 and IQR of 16,300 look plausible, but a sentinel/error value compromises the column. A minimum of -666,666,666 drags the mean to -152,820 and produces a skew of -56.73 (flagged) and a kurtosis of about 3,216; the std of 11,747,597 is another symptom. The histogram places exactly one row in the lowest bin and the other 3,220 in the bin that contains the plausible incomes. That pattern is consistent with a single coded null-substitute rather than widespread corruption. The 182 outliers (5.65% of rows) are counted against 1.5 IQR fences of roughly 20,489 and 85,689, so apart from the sentinel they need not be errors.

Treatment: Replace -666666666 and any other negative values with NaN, review the outliers beyond the upper 1.5 IQR fence, then consider a log transform after cleaning and before modelling.

anthropic:default · confidence high
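A minimal sketch of the cleaning step, assuming the values are already numeric. The sentinel is negative, so a single v < 0 test catches it along with any other negative income. In pandas, series.mask(series < 0) is an equivalent alternative, since Series.mask fills matching positions with NaN by default. The helper names below are illustrative.

```python
import math
import statistics

SENTINEL = -666666666  # coded value reported as the column minimum

def clean_income(values):
    """Map missing values, the sentinel and any other negative income to NaN."""
    # The sentinel is negative, so the v < 0 test already covers it.
    return [math.nan if v is None or v < 0 else float(v) for v in values]

def nan_mean(values):
    """Mean of the non-NaN entries, or NaN if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return statistics.fmean(kept) if kept else math.nan

raw = [52380, 61239, SENTINEL]
print(statistics.fmean(raw))        # dominated by the sentinel
print(nan_mean(clean_income(raw)))  # mean of the two real incomes
```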
Out[52]:

saturn.columns["median_household_income"].stats

stat value
n 3,221
nulls 0 (0.0%)
unique 3,099
min -6.667e+08
max 147,111
mean -1.528e+05
median 52,380
std 1.175e+07
q1 44,939
q3 61,239
iqr 16,300
skew -56.73
kurtosis 3216
n_outliers 182
outlier_rate 0.0565
zero_rate 0
alert: high_skew (skew=-56.73)
alert: outliers (5.7% of rows beyond 1.5 IQR)
Fig 21.
Distribution of median_household_income. Vertical dash marks the median.
Histogram bins for median_household_income (median: 52380.0).
bin range: count
-6.667e+08 – -6.5e+08: 1
-6.5e+08 – -1.652e+07: 0 (38 empty bins)
-1.652e+07 – 1.471e+05: 3220

margin_2020 numeric feature

This column holds the 2020 presidential vote margin as a proportion (about -0.87 to +0.93). Three facts point to its definition. Its mean (0.317) equals the mean republican_pct_2020 (0.6497) minus the mean democratic_pct_2020 (0.3327). Its median (0.384) is close to the difference of their medians (0.6829 - 0.2998 = 0.3831). All three columns report 109 nulls. So it is most likely the Republican share minus the Democratic share: positive values mark Republican-won units and negative values Democratic-won units. The distribution is moderately left-skewed (skew -0.82), with the mean below the median because a tail of strongly Democratic units pulls it down. The 48 outliers (1.54%) sit at the distributional extremes. The 3.38% null rate is modest but worth investigating for systematic missingness.

Treatment: Use as-is for modelling; consider investigating left-tail outliers and whether nulls are structurally missing before imputation.

anthropic:default · confidence high
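The Republican-minus-Democratic reading of margin_2020 is an inference from summary stats; a row-level check would confirm or reject it. A minimal sketch, assuming the three columns are aligned lists drawn from the same rows. The function name and tolerance are illustrative.

```python
import math

def _missing(x):
    return x is None or (isinstance(x, float) and math.isnan(x))

def margin_mismatches(rep, dem, margin, tol=1e-6):
    """Row indices where margin differs from rep - dem by more than tol.

    Rows with any missing value are skipped. An empty result supports the
    hypothesis that margin_2020 = republican_pct_2020 - democratic_pct_2020.
    """
    bad = []
    for i, (r, d, m) in enumerate(zip(rep, dem, margin)):
        if _missing(r) or _missing(d) or _missing(m):
            continue
        if abs((r - d) - m) > tol:
            bad.append(i)
    return bad
```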
Out[55]:

saturn.columns["margin_2020"].stats

stat value
n 3,221
nulls 109 (3.4%)
unique 3,112
min -0.8675
max 0.9309
mean 0.317
median 0.3844
std 0.321
q1 0.1348
q3 0.5662
iqr 0.4314
skew -0.8212
kurtosis 0.2286
n_outliers 48
outlier_rate 0.01542
zero_rate 0
Fig 22.
Distribution of margin_2020. Vertical dash marks the median.
Histogram bins for margin_2020 (median: 0.3843813151543954).
bin range: count
-0.8675 – -0.8226: 1
-0.8226 – -0.7776: 2
-0.7776 – -0.7326: 3
-0.7326 – -0.6877: 5
-0.6877 – -0.6427: 8
-0.6427 – -0.5978: 12
-0.5978 – -0.5528: 6
-0.5528 – -0.5078: 12
-0.5078 – -0.4629: 14
-0.4629 – -0.4179: 22
-0.4179 – -0.373: 28
-0.373 – -0.328: 33
-0.328 – -0.283: 29
-0.283 – -0.2381: 46
-0.2381 – -0.1931: 41
-0.1931 – -0.1482: 54
-0.1482 – -0.1032: 63
-0.1032 – -0.05823: 67
-0.05823 – -0.01327: 69
-0.01327 – 0.03169: 73
0.03169 – 0.07665: 70
0.07665 – 0.1216: 90
0.1216 – 0.1666: 131
0.1666 – 0.2115: 117
0.2115 – 0.2565: 129
0.2565 – 0.3015: 141
0.3015 – 0.3464: 159
0.3464 – 0.3914: 165
0.3914 – 0.4363: 181
0.4363 – 0.4813: 206
0.4813 – 0.5263: 195
0.5263 – 0.5712: 197
0.5712 – 0.6162: 213
0.6162 – 0.6611: 175
0.6611 – 0.7061: 140
0.7061 – 0.7511: 102
0.7511 – 0.796: 69
0.796 – 0.841: 29
0.841 – 0.8859: 11
0.8859 – 0.9309: 4

democratic_pct_2020 numeric feature

This column represents the Democratic vote share (as a proportion, 0–1) in the 2020 presidential election, most likely at the county level across 3,221 geographic units. The distribution is right-skewed (skew=0.83) with a mean of 0.333 and a median of 0.300. The typical unit gave Democrats roughly 30% of the vote, so most units lean Republican, while a tail of strongly Democratic units lifts the mean. The range spans 0.031 to 0.921, capturing both deep-red and deep-blue areas. Only 49 values (1.57%) are outliers and the 3.38% null rate is modest, so the feature is clean and well populated.

Treatment: Use as-is or apply a logit transform to stretch the bounded 0–1 proportion before regression or clustering.

anthropic:default · confidence high
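A minimal logit sketch. The clipping constant eps is an assumption: it keeps shares of exactly 0 or 1 finite, although the reported range (0.031 to 0.921) contains neither. inv_logit maps transformed values back to shares. Both names are illustrative.

```python
import math

def logit(p, eps=1e-6):
    """log(p / (1 - p)), with p clipped to [eps, 1 - eps] so 0 and 1 stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: maps a real number back to a share in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The reported median Democratic share, 0.2998, is below one half,
# so its logit is negative.
x = logit(0.2998)
print(x, inv_logit(x))
```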
Out[58]:

saturn.columns["democratic_pct_2020"].stats

stat value
n 3,221
nulls 109 (3.4%)
unique 3,111
min 0.03091
max 0.9215
mean 0.3327
median 0.2998
std 0.1598
q1 0.2091
q3 0.4236
iqr 0.2145
skew 0.8326
kurtosis 0.2523
n_outliers 49
outlier_rate 0.01575
zero_rate 0
Fig 23.
Distribution of democratic_pct_2020. Vertical dash marks the median.
Histogram bins for democratic_pct_2020 (median: 0.29977253358402933).
bin range: count
0.03091 – 0.05317: 5
0.05317 – 0.07544: 11
0.07544 – 0.0977: 31
0.0977 – 0.12: 76
0.12 – 0.1422: 104
0.1422 – 0.1645: 145
0.1645 – 0.1868: 182
0.1868 – 0.209: 224
0.209 – 0.2313: 192
0.2313 – 0.2536: 199
0.2536 – 0.2758: 200
0.2758 – 0.2981: 174
0.2981 – 0.3204: 164
0.3204 – 0.3426: 158
0.3426 – 0.3649: 130
0.3649 – 0.3871: 132
0.3871 – 0.4094: 121
0.4094 – 0.4317: 125
0.4317 – 0.4539: 84
0.4539 – 0.4762: 79
0.4762 – 0.4985: 66
0.4985 – 0.5207: 73
0.5207 – 0.543: 67
0.543 – 0.5653: 58
0.5653 – 0.5875: 50
0.5875 – 0.6098: 42
0.6098 – 0.6321: 42
0.6321 – 0.6543: 34
0.6543 – 0.6766: 28
0.6766 – 0.6988: 29
0.6988 – 0.7211: 22
0.7211 – 0.7434: 14
0.7434 – 0.7656: 13
0.7656 – 0.7879: 7
0.7879 – 0.8102: 8
0.8102 – 0.8324: 11
0.8324 – 0.8547: 5
0.8547 – 0.877: 3
0.877 – 0.8992: 3
0.8992 – 0.9215: 1

republican_pct_2020 numeric feature

This column represents the Republican vote share (as a proportion, 0–1) in the 2020 U.S. presidential election, most likely at the county level. The mean of 0.650 and median of 0.683 indicate a right-leaning dataset: most geographic units recorded Republican majorities, which is consistent with county-level data where rural areas outnumber urban ones by count. The distribution is notably left-skewed (skew = -0.809), meaning a tail of strongly Democratic units pulls the mean below the median. The excess kurtosis near zero (0.206) and only 47 outliers suggest no extreme concentration at the tails. The 3.38% null rate warrants investigation to confirm whether missing values reflect unreported results or data gaps.

Treatment: Use as-is for modeling after imputing or flagging the 3.38% nulls; consider logit-transform if used as a continuous predictor in a linear model.

anthropic:default · confidence high
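Before imputing, check whether the 109 missing rows are the same rows in every 2020 column; margin_2020, democratic_pct_2020 and republican_pct_2020 each report 109 nulls. A minimal sketch with hypothetical helper names: it builds a missing-value mask per column and compares the masks. A mask can also serve as the flag column the treatment suggests.

```python
import math

def missing_mask(values):
    """True where a value is None or NaN."""
    return [v is None or (isinstance(v, float) and math.isnan(v)) for v in values]

def nulls_align(*columns):
    """True if every column is missing on exactly the same rows."""
    masks = [missing_mask(c) for c in columns]
    return all(mask == masks[0] for mask in masks[1:])

rep = [0.68, None, 0.55]
dem = [0.30, math.nan, 0.42]
print(missing_mask(rep))
print(nulls_align(rep, dem))
```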
Out[61]:

saturn.columns["republican_pct_2020"].stats

stat value
n 3,221
nulls 109 (3.4%)
unique 3,111
min 0.05397
max 0.9618
mean 0.6497
median 0.6829
std 0.1613
q1 0.5576
q3 0.7747
iqr 0.2171
skew -0.8091
kurtosis 0.2063
n_outliers 47
outlier_rate 0.0151
zero_rate 0
Fig 24.
Distribution of republican_pct_2020. Vertical dash marks the median.
Histogram bins for republican_pct_2020 (median: 0.6829120557612961).
bin range: count
0.05397 – 0.07667: 1
0.07667 – 0.09937: 2
0.09937 – 0.1221: 2
0.1221 – 0.1448: 6
0.1448 – 0.1675: 6
0.1675 – 0.1901: 15
0.1901 – 0.2128: 5
0.2128 – 0.2355: 13
0.2355 – 0.2582: 12
0.2582 – 0.2809: 25
0.2809 – 0.3036: 26
0.3036 – 0.3263: 32
0.3263 – 0.349: 32
0.349 – 0.3717: 40
0.3717 – 0.3944: 46
0.3944 – 0.4171: 52
0.4171 – 0.4398: 64
0.4398 – 0.4625: 78
0.4625 – 0.4852: 61
0.4852 – 0.5079: 78
0.5079 – 0.5306: 66
0.5306 – 0.5533: 97
0.5533 – 0.576: 126
0.576 – 0.5987: 122
0.5987 – 0.6214: 139
0.6214 – 0.6441: 143
0.6441 – 0.6668: 154
0.6668 – 0.6895: 173
0.6895 – 0.7122: 176
0.7122 – 0.7349: 200
0.7349 – 0.7576: 213
0.7576 – 0.7802: 195
0.7802 – 0.8029: 203
0.8029 – 0.8256: 169
0.8256 – 0.8483: 132
0.8483 – 0.871: 103
0.871 – 0.8937: 65
0.8937 – 0.9164: 27
0.9164 – 0.9391: 9
0.9391 – 0.9618: 4

margin_2016 text feature

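This column stores the 2016 electoral margin as a percentage string (e.g., '15.17%') rather than a numeric type. All 3,138 non-null values are single tokens of 5–6 characters, confirming a uniform percentage format. The all-caps alert is most likely an artifact, since the strings contain no letters. No value exceeds six characters, so if every value carries two decimals, no margin of ten points or more can carry a minus sign (as in '-12.34%'). Unlike margin_2020, this column may therefore record an unsigned margin; confirm this before comparing the two years. '15.17%' appears 29 times, far more than any other value, which suggests it may be a default, imputed, or boundary value worth investigating. The duplicate rate of 18.6% (584 duplicates across 2,554 unique values) is notable, although some repetition is expected because values are rounded to two decimal places.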

Treatment: Strip the '%' suffix, cast to float, and divide by 100 to match the proportion scale of margin_2020; investigate the 29 occurrences of '15.17%' for data quality issues before modelling.

anthropic:default · confidence high
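A minimal parsing sketch; the helper name is hypothetical. Dividing by 100 puts the values on the same proportion scale as margin_2020. For a whole pandas column, series.str.rstrip("%").astype(float) / 100 does the same conversion.

```python
def parse_percent(text):
    """Convert a percentage string such as '15.17%' to a proportion (0.1517).

    Returns None for missing or empty input.
    """
    if text is None:
        return None
    text = text.strip()
    if not text:
        return None
    if text.endswith("%"):
        text = text[:-1]
    return float(text) / 100
```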
Out[64]:

saturn.columns["margin_2016"].stats

stat value
n 3,221
nulls 83 (2.6%)
unique 2,554
len_min 5
len_max 6
len_mean 5.896
len_median 6
len_p95 6
word_mean 1
word_median 1
n_empty 0
n_duplicates 584
duplicate_rate 0.1861
vocab_size 2,554
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: one_word (100.0% of rows are a single word)
alert: allcaps (100.0% of rows are all-caps)
alert: short_text (95th-percentile length under 20 chars)
Fig 25.
Character-length distribution for margin_2016.
Character-length distribution for margin_2016 (mean: 5.895793499043977).
length (chars): count
5: 327
6: 2811
(the remaining 38 bins of the 40-bin layout are empty)

democratic_pct_2016 numeric feature

This column represents the Democratic party vote share (as a proportion, 0–1) in the 2016 U.S. presidential election, most likely aggregated at the county level given 3,221 rows. The distribution is right-skewed (skew=0.94) with a mean of 0.317 and median of 0.286. Most geographic units lean Republican, and a long tail of heavily Democratic areas reaches up to 0.928. The spread is moderate (IQR=0.193, std=0.153). All 75 outliers sit on the high end, since the lower 1.5 IQR fence is below zero and the upper fence is near 0.687; these are likely dense urban counties.

Treatment: Use as-is or apply logit-transform to unbound the [0,1] proportion before linear modelling.

anthropic:default · confidence high
Out[67]:

saturn.columns["democratic_pct_2016"].stats

stat value
n 3,221
nulls 83 (2.6%)
unique 3,111
min 0.03145
max 0.9285
mean 0.3174
median 0.2861
std 0.1527
q1 0.2054
q3 0.3982
iqr 0.1928
skew 0.9371
kurtosis 0.666
n_outliers 75
outlier_rate 0.0239
zero_rate 0
Fig 26.
Distribution of democratic_pct_2016. Vertical dash marks the median.
Histogram bins for democratic_pct_2016 (median: 0.2861345852895).
bin range: count
0.03145 – 0.05387: 8
0.05387 – 0.0763: 16
0.0763 – 0.09872: 52
0.09872 – 0.1211: 74
0.1211 – 0.1436: 116
0.1436 – 0.166: 146
0.166 – 0.1884: 203
0.1884 – 0.2109: 226
0.2109 – 0.2333: 240
0.2333 – 0.2557: 218
0.2557 – 0.2781: 200
0.2781 – 0.3006: 205
0.3006 – 0.323: 153
0.323 – 0.3454: 147
0.3454 – 0.3678: 153
0.3678 – 0.3903: 150
0.3903 – 0.4127: 106
0.4127 – 0.4351: 111
0.4351 – 0.4575: 77
0.4575 – 0.48: 78
0.48 – 0.5024: 56
0.5024 – 0.5248: 72
0.5248 – 0.5472: 45
0.5472 – 0.5697: 42
0.5697 – 0.5921: 36
0.5921 – 0.6145: 36
0.6145 – 0.6369: 34
0.6369 – 0.6594: 25
0.6594 – 0.6818: 30
0.6818 – 0.7042: 16
0.7042 – 0.7266: 12
0.7266 – 0.7491: 12
0.7491 – 0.7715: 14
0.7715 – 0.7939: 9
0.7939 – 0.8163: 6
0.8163 – 0.8388: 4
0.8388 – 0.8612: 4
0.8612 – 0.8836: 3
0.8836 – 0.906: 2
0.906 – 0.9285: 1

republican_pct_2016 numeric feature

This column represents the Republican vote share (as a proportion, 0–1) in the 2016 U.S. presidential election, likely at the county level across 3,221 geographic units. The median of 0.666 and mean of 0.635 show that most units leaned heavily Republican in 2016. That is consistent with county-level data, where Republicans dominate by count even if not by population. The left skew (skew = -0.81) reflects a tail of strongly Democratic units pulling the mean below the median. The range spans 0.041 to 0.953, from overwhelmingly Democratic to overwhelmingly Republican areas. With only 62 outliers (1.98%) and a modest null rate (2.58%), the field is clean and well populated.

Treatment: Use directly as a continuous feature; consider pairing with democratic_pct_2016 or computing a two-party margin; the mild left skew does not require transformation for most models.

anthropic:default · confidence high
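One common way to build the two-party margin mentioned in the treatment is to drop third-party votes and divide the Republican-minus-Democratic gap by the two-party total. This is a standard definition, not one confirmed by the dataset's documentation, and the helper name is hypothetical. Third-party votes keep the two shares from summing to 1, so the two-party margin is somewhat larger in magnitude than the raw difference.

```python
def two_party_margin(rep, dem):
    """(rep - dem) / (rep + dem): the margin after removing third-party votes.

    Returns None when either share is missing or both are zero.
    """
    if rep is None or dem is None or rep + dem == 0:
        return None
    return (rep - dem) / (rep + dem)

# A unit with 60% Republican and 35% Democratic votes:
# raw difference 0.25, two-party margin 0.25 / 0.95, about 0.263.
print(two_party_margin(0.60, 0.35))
```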
Out[70]:

saturn.columns["republican_pct_2016"].stats

stat value
n 3,221
nulls 83 (2.6%)
unique 3,111
min 0.04122
max 0.9527
mean 0.6354
median 0.6656
std 0.1559
q1 0.5463
q3 0.7503
iqr 0.2041
skew -0.8145
kurtosis 0.3566
n_outliers 62
outlier_rate 0.01976
zero_rate 0
Fig 27.
Distribution of republican_pct_2016. Vertical dash marks the median.
Histogram bins for republican_pct_2016 (median: 0.6655515136155).
bin range: count
0.04122 – 0.06401: 1
0.06401 – 0.0868: 1
0.0868 – 0.1096: 5
0.1096 – 0.1324: 2
0.1324 – 0.1552: 6
0.1552 – 0.1779: 11
0.1779 – 0.2007: 7
0.2007 – 0.2235: 17
0.2235 – 0.2463: 17
0.2463 – 0.2691: 17
0.2691 – 0.2919: 23
0.2919 – 0.3147: 32
0.3147 – 0.3375: 34
0.3375 – 0.3602: 30
0.3602 – 0.383: 43
0.383 – 0.4058: 41
0.4058 – 0.4286: 64
0.4286 – 0.4514: 71
0.4514 – 0.4742: 63
0.4742 – 0.497: 89
0.497 – 0.5198: 78
0.5198 – 0.5425: 115
0.5425 – 0.5653: 116
0.5653 – 0.5881: 147
0.5881 – 0.6109: 147
0.6109 – 0.6337: 156
0.6337 – 0.6565: 165
0.6565 – 0.6793: 193
0.6793 – 0.7021: 190
0.7021 – 0.7249: 215
0.7249 – 0.7476: 223
0.7476 – 0.7704: 213
0.7704 – 0.7932: 187
0.7932 – 0.816: 142
0.816 – 0.8388: 113
0.8388 – 0.8616: 74
0.8616 – 0.8844: 49
0.8844 – 0.9072: 29
0.9072 – 0.9299: 9
0.9299 – 0.9527: 3

How to cite


BibTeX
@misc{saturn-data-trove-scars-standardized-county-analysis-research-system-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove scars standardized county analysis research system},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-scars-standardized-county-analysis-research-system}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove scars standardized county analysis research system. Source: /home/coolhand/html/datavis/data_trove/data/geographic/scars/master_dataset.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-scars-standardized-county-analysis-research-system