median income

saturn notebook · generated 2026-05-01 Report Notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/median_income.parquet

Saturn profiled 3,222 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
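# Notebook cell: the "!" line is IPython shell syntax and runs only in Jupyter/IPython.
!pip install saturn-dissect
import subprocess

# Flag meanings are inferred from their names: --findings names the JSON
# output, --llm selects the model for the opt-in prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/median_income.parquet",
    "--findings", "median_income.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if the CLI exits non-zero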

Summary confidence: high

This dataset contains 3,222 rows covering U.S. counties, with three columns: a county name, a FIPS code, and median household income. The income column is the headline issue: it has a minimum of -666,666,666 and a mean of roughly -144,603 against a median of 60,458, indicating a sentinel value (likely a missing-data placeholder) that makes the mean and the skew (-56.7) meaningless. The histogram places a single row in the sentinel's bin, so filter that value before computing summary stats. Separately, about 5.8% of records (188 rows) are flagged as outliers by the 1.5 IQR rule; these are not all sentinels and should be inspected on their own. County names are essentially unique row labels, while FIPS codes look clean and span the expected national range.

citing: row_count · column_count · median_household_income.stats.min · median_household_income.stats.max · median_household_income.stats.mean · median_household_income.stats.median · median_household_income.stats.skew · median_household_income.stats.n_outliers · median_household_income.stats.outlier_rate · fips.stats.min · fips.stats.max · county_name.top_words
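
The pull of the sentinel on the mean can be checked by arithmetic from the reported stats alone. The sketch below assumes exactly one row carries -666,666,666 (the lowest histogram bin holds a single row) and uses the rounded mean of -144,603, so the result is approximate. The variable names are illustrative and are not Saturn output.

```python
# Reported stats, rounded as printed in the report.
n_rows = 3222
reported_mean = -144_603.0
sentinel = -666_666_666

# Assumption: exactly one row holds the sentinel (lowest histogram bin count = 1).
n_sentinel = 1

total = reported_mean * n_rows                    # sum over all rows
clean_total = total - n_sentinel * sentinel       # remove the sentinel's contribution
clean_mean = clean_total / (n_rows - n_sentinel)  # mean of the remaining rows

print(f"mean without the sentinel: about {clean_mean:,.0f}")
```

Under that assumption the remaining rows average a little over 62,000, which sits close to the reported median of 60,458.5.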

Out[4]:

saturn.schema() · 3 columns

column                   kind     n      null%  unique  alerts
fips                     numeric  3,222  0.0%   3,222
county_name              text     3,222  0.0%   3,222   near_unique
median_household_income  numeric  3,222  0.0%   3,099   high_skew, outliers
Fig 1.
median_household_income · Look for the long left tail caused by sentinel values like -666,666,666 that distort the mean.
Histogram bins for median_household_income (median: 60458.5).

bin                      count
-6.667e+08 – -6.5e+08    1
-6.5e+08 – -1.65e+07     0 (38 consecutive empty bins)
-1.65e+07 – 1.705e+05    3,221
Fig 2.
fips · FIPS codes run from 1,001 to 72,153 with no outliers. Coverage is not even: the bins from about 56,140 to 70,370 are empty, and the final bin holds 78 rows (state prefix 72, Puerto Rico).
Histogram bins for fips (median: 30022.0).

bin                      count
1001 – 2780              97
2780 – 4559              15
4559 – 6337              133
6337 – 8116              59
8116 – 9895              14
9895 – 1.167e+04         4
1.167e+04 – 1.345e+04    226
1.345e+04 – 1.523e+04    5
1.523e+04 – 1.701e+04    49
1.701e+04 – 1.879e+04    189
1.879e+04 – 2.057e+04    204
2.057e+04 – 2.235e+04    184
2.235e+04 – 2.413e+04    39
2.413e+04 – 2.59e+04     15
2.59e+04 – 2.768e+04     170
2.768e+04 – 2.946e+04    196
2.946e+04 – 3.124e+04    150
3.124e+04 – 3.302e+04    27
3.302e+04 – 3.48e+04     21
3.48e+04 – 3.658e+04     95
3.658e+04 – 3.836e+04    153
3.836e+04 – 4.013e+04    155
4.013e+04 – 4.191e+04    46
4.191e+04 – 4.369e+04    67
4.369e+04 – 4.547e+04    51
4.547e+04 – 4.725e+04    161
4.725e+04 – 4.903e+04    268
4.903e+04 – 5.081e+04    29
5.081e+04 – 5.259e+04    133
5.259e+04 – 5.436e+04    94
5.436e+04 – 5.614e+04    95
5.614e+04 – 7.037e+04    0 (8 consecutive empty bins)
7.037e+04 – 7.215e+04    78
Fig 3.
county_name · County name lengths cluster tightly around 24 characters, reflecting the consistent 'X County, State' format.
Character-length distribution for county_name (mean: 24.32).

chars      count
16 – 17    26
17 – 18    72
18 – 19    121
19 – 20    190
20 – 21    264
21 – 22    407
22 – 24    420
24 – 25    363
25 – 26    320
26 – 27    240
27 – 28    231
28 – 29    152
29 – 30    139
30 – 31    165
31 – 32    41
32 – 33    28
33 – 34    16
34 – 35    10
35 – 36    5
36 – 38    0
38 – 39    1
39 – 40    1
40 – 41    0
41 – 42    1
42 – 43    1
43 – 44    0
44 – 45    2
45 – 46    0
46 – 47    1
47 – 48    1
48 – 49    0
49 – 50    0
50 – 51    0
51 – 53    0
53 – 54    2
54 – 55    1
55 – 56    0
56 – 57    0
57 – 58    0
58 – 59    1
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.

column                   kind     null %
fips                     numeric  0.0%
county_name              text     0.0%
median_household_income  numeric  0.0%
Fig 5.
Pearson correlation across 2 numeric columns (sampled, bounded; values clipped to 2 decimals).

                         fips   median_household_income
fips                     +1.00  -0.07
median_household_income  -0.07  +1.00

fips numeric identifier

This column is a FIPS county/area code: all 3,222 rows are unique with no nulls, and the values span 1,001 to 72,153, the canonical FIPS range covering U.S. states and territories. The distribution is nearly symmetric (skew 0.157, kurtosis -0.631) with no outliers, consistent with a structured geographic identifier rather than a measured quantity. Treat it as a key, not a numeric feature.

Treatment: Use as a categorical join key on county-level data; do not feed as a numeric feature.
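
Because the column is stored as a number, codes whose state prefix begins with zero lose that digit (1,001 rather than 01001). A minimal sketch of turning the values into five-character string keys before a join follows; `fips_key` is a hypothetical helper, not part of Saturn.

```python
def fips_key(value: int) -> str:
    """Format a numeric county FIPS code as a zero-padded five-character key."""
    code = int(value)
    if not 0 < code <= 99_999:
        raise ValueError(f"not a five-digit county FIPS code: {value!r}")
    return f"{code:05d}"


# The report's minimum and maximum values.
print(fips_key(1001), fips_key(72153))
```

String keys also keep a downstream model from reading the code as a magnitude.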

anthropic:claude-opus-4-7 · confidence high
Out[11]:

saturn.columns["fips"].stats

stat          value
n             3,222
nulls         0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          3.138e+04
median        30,022
std           1.63e+04
q1            1.903e+04
q3            4.61e+04
iqr           27,075
skew          0.1574
kurtosis      -0.6314
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 6.
Distribution of fips. Vertical dash marks the median.

county_name text identifier

This column holds full US county identifiers (e.g., 'X County, State'), with all 3,222 rows unique and zero nulls. The token 'county,' appears 2,999 times, suggesting ~223 rows use a different suffix (likely 'Parish' in Louisiana, 'Borough'/'Census Area' in Alaska, 'Municipio' in Puerto Rico, or independent cities). State-name frequencies match the expected US distribution, with the token 'Texas' (256 occurrences) leading.

Treatment: Use as a join key after splitting into county and state components; do not treat as a feature.
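
A minimal sketch of the split, assuming every value follows the 'X County, State' pattern with the state after the last comma. `split_county_name` is a hypothetical helper, and the sample strings are made up rather than quoted from the dataset.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' into (county, state) on the last comma."""
    county, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma found in {name!r}")
    return county.strip(), state.strip()


# Illustrative values only; they are not rows quoted from the dataset.
print(split_county_name("Example County, Texas"))
print(split_county_name("Example Parish, Louisiana"))
```

Splitting on the last comma rather than the first keeps the state intact if a county label itself contains a comma.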

anthropic:claude-opus-4-7 · confidence high
Out[14]:

saturn.columns["county_name"].stats

stat                     value
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.32
len_median               24
len_p95                  31
word_mean                3.248
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,990
readability_flesch_mean  10.28
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
Fig 7.
Character-length distribution for county_name.

median_household_income numeric feature

County-level median household income in dollars, with 3,099 distinct values across 3,222 rows and no nulls. The minimum of -666,666,666 is a clear sentinel for missing data, dragging the mean to -144,603 even though the median is 60,458.5 and Q1 to Q3 span 51,814.75 to 70,376.25. The histogram shows one row in the sentinel's bin, which is enough to produce the extreme skew (-56.74) and kurtosis (3,216.99). The 188 outliers (5.83%) are a separate flag: they count rows beyond 1.5 IQR, not sentinel rows.

Treatment: Replace the -666666666 sentinel with null, then consider a log or robust scaler before modelling.
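
One way the treatment could look, as a sketch using only the standard library: the sentinel becomes `None`, and the remaining values are scaled by (x - median) / IQR. The function name and the sample incomes are made up for illustration; a log transform, the other option named above, is not shown.

```python
import statistics

SENTINEL = -666_666_666  # placeholder value reported as the column minimum


def clean_and_scale(values):
    """Map the sentinel to None, then scale the rest by (x - median) / IQR."""
    cleaned = [None if v == SENTINEL else v for v in values]
    present = [v for v in cleaned if v is not None]
    median = statistics.median(present)
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [None if v is None else (v - median) / iqr for v in cleaned]


# Made-up incomes for illustration, plus one sentinel row.
sample = [40_000, 50_000, 60_000, 70_000, 80_000, SENTINEL]
scaled = clean_and_scale(sample)
print(scaled)
```

Computing the median and IQR only after the sentinel is removed matters: both are fairly robust to one extreme row, but the null count and any later mean are not.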

anthropic:claude-opus-4-7 · confidence high
Out[17]:

saturn.columns["median_household_income"].stats

stat          value
n             3,222
nulls         0 (0.0%)
unique        3,099
min           -6.667e+08
max           170,463
mean          -1.446e+05
median        6.046e+04
std           1.175e+07
q1            5.181e+04
q3            7.038e+04
iqr           1.856e+04
skew          -56.74
kurtosis      3217
n_outliers    188
outlier_rate  0.05835
zero_rate     0
alert: high_skew · skew=-56.74
alert: outliers · 5.8% rows beyond 1.5 IQR
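
The outlier alert can be made concrete by working out the fences from the reported quartiles. This sketch assumes the alert uses the common rule of Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; the report says only "beyond 1.5 IQR", so the exact rule is an assumption.

```python
# Quartiles as reported in the column description.
q1 = 51_814.75
q3 = 70_376.25
reported_max = 170_463

iqr = q3 - q1                  # 18,561.5, matching the reported iqr of 1.856e+04
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"fences: {lower_fence:,.2f} to {upper_fence:,.2f}")
print("reported max is beyond the upper fence:", reported_max > upper_fence)
```

Under this rule a county with a median income above roughly 98,000 is flagged even though the value is plausible, which is why the 188 outliers should not be read as 188 sentinel rows.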
Fig 8.
Distribution of median_household_income. Vertical dash marks the median.

How to cite

BibTeX
@misc{saturn-median-income-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: median income},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/median_income}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: median income. Source: /home/coolhand/html/datavis/data_trove/cache/median_income.parquet. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/median_income