saturn

median income

source: /home/coolhand/html/datavis/data_trove/cache/median_income.parquet · 3,222 rows · 3 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset contains 3,222 rows covering U.S. counties and county equivalents, with three columns: a FIPS code, a county name, and median household income. The income column is the main problem: its minimum is -666,666,666 and its mean is roughly -144,603 against a median of 60,458, which points to a sentinel value (likely a missing-data placeholder) that distorts every moment-based statistic. Given the reported maximum of 170,463, that mean is consistent with exactly one row holding the sentinel. About 5.8% of records (188 rows) are flagged as outliers and skew is extreme (-56.7), so replace the sentinel with a null before computing summary statistics; most of the flagged rows are likely genuine tail values rather than sentinels. County names are unique row labels, and FIPS codes are unique, null-free, and span the expected national range (1001 to 72153).

citing: row_count · column_count · median_household_income.stats.min · median_household_income.stats.max · median_household_income.stats.mean · median_household_income.stats.median · median_household_income.stats.skew · median_household_income.stats.n_outliers · median_household_income.stats.outlier_rate · fips.stats.min · fips.stats.max · county_name.top_words
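The claim that a single sentinel row explains the negative mean can be checked from the summary statistics alone. The sketch below is illustrative and not part of the profiler: it asks, for each candidate sentinel count k, what the remaining rows would have to average, and keeps only the counts where that average fits under the reported maximum. The function name and the search range of 1 to 10 are assumptions made for this example.

```python
# Summary statistics quoted in the reading above (the mean is rounded).
N_ROWS = 3222
MEAN = -144_603
SENTINEL = -666_666_666   # reported minimum
MAX_VALID = 170_463       # reported maximum


def implied_valid_mean(k: int) -> float:
    """Average the non-sentinel rows must have if exactly k rows hold the sentinel."""
    total = MEAN * N_ROWS
    return (total - k * SENTINEL) / (N_ROWS - k)


# The minimum equals the sentinel, so at least one row holds it (k >= 1).
# A count is feasible only if the remaining rows could average at or below
# the reported maximum; the search range 1..10 is an arbitrary illustration.
feasible_counts = [k for k in range(1, 11) if implied_valid_mean(k) <= MAX_VALID]

print(feasible_counts)
print(round(implied_valid_mean(1)))
```

With one sentinel row the remaining 3,221 rows would average about 62,300, which sits close to the reported median. Two or more sentinel rows would require the remaining rows to average more than the reported maximum.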

Schema

3 columns
Per-column summary.

column                   type     nulls  unique  alerts
fips                     numeric  0.0%   3,222   none
county_name              text     0.0%   3,222   near_unique
median_household_income  numeric  0.0%   3,099   high_skew, outliers

fips

numeric identifier
This column holds FIPS county codes. All 3,222 values are unique with no nulls, and they span 1001 to 72153, which covers U.S. states and territories (state prefix 72 is Puerto Rico). The distribution is nearly symmetric (skew 0.157, kurtosis -0.631) with no outliers, consistent with a structured geographic identifier rather than a measured quantity. Because the values are stored as numbers, codes with a state prefix below 10 have lost their leading zero (1001 stands for 01001). Treatment: Use as a categorical join key on county-level data, zero-padded to five digits where the other table stores FIPS as text; do not feed as a numeric feature. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
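Using the code as a join key usually means converting it back to the five-digit text form, since a numeric column drops leading zeros. The following is a minimal sketch, assuming the other table stores FIPS as zero-padded strings; `fips_key` and `state_prefix` are hypothetical helper names, not functions from the profiler.

```python
def fips_key(code: int) -> str:
    """Render a numeric county FIPS code as a five-digit, zero-padded string."""
    if not 0 < code <= 99_999:
        raise ValueError(f"not a five-digit county FIPS code: {code!r}")
    return f"{code:05d}"


def state_prefix(code: int) -> str:
    """First two digits of the padded code, which identify the state or territory."""
    return fips_key(code)[:2]


# The two extremes reported for this column.
print(fips_key(1001))       # 01001
print(fips_key(72153))      # 72153
print(state_prefix(1001))   # 01
```

Keeping the key as a string also prevents it from being scaled or averaged by accident in a later modelling step.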

county_name

text identifier near_unique
This column holds full U.S. county labels in the form 'X County, State', with all 3,222 rows unique and zero nulls. The token 'county,' appears 2,999 times, suggesting about 223 rows use a different suffix (likely 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, 'Municipio' in Puerto Rico, or independent cities). State-name token frequencies match the expected U.S. distribution, with 'Texas' (256 occurrences) leading; these are word counts, so they can include counties named after a state. Treatment: Use as a join key after splitting into county and state components; do not treat as a feature. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
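The split into county and state components recommended above can be sketched as follows. This assumes every label ends in a comma, a space, and the state name, which the profile suggests but does not confirm row by row; `split_county_state` is a hypothetical helper and the sample labels are illustrative.

```python
def split_county_state(label: str) -> tuple[str, str]:
    """Split 'X County, State' on the last ', ' into (county, state)."""
    county, sep, state = label.rpartition(", ")
    if not sep:
        # No separator found: surface the row instead of guessing.
        raise ValueError(f"label has no ', ' separator: {label!r}")
    return county.strip(), state.strip()


# Illustrative labels in the assumed format.
for label in ["Autauga County, Alabama", "Acadia Parish, Louisiana"]:
    print(split_county_state(label))
```

Splitting on the last separator rather than the first keeps any comma inside the county part intact, and raising on a missing separator makes malformed rows visible before a join.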

median_household_income

numeric feature high_skew outliers
County-level median household income in dollars, with 3,099 distinct values across 3,222 rows and no nulls. The minimum of -666,666,666 is a clear sentinel for missing data; it drags the mean to -144,603 even though the median is 60,458.5 and the interquartile range runs from 51,814.75 (Q1) to 70,376.25 (Q3). The sentinel also produces the extreme skew (-56.74) and kurtosis (3,216.99). Given the maximum of 170,463, the mean is consistent with exactly one sentinel row, so most of the 188 flagged outliers (5.83%) are likely genuine tail values rather than missing-data codes. Treatment: Replace the -666,666,666 sentinel with null, then consider a log transform or a robust scaler before modelling. high · anthropic:claude-opus-4-7
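The first step of the treatment, replacing the sentinel with a null before summarizing, can be sketched in plain Python. This is a dependency-free illustration rather than the profiler's own code: `mask_sentinel` and `summarize` are hypothetical names, and the sample values are made up for the example.

```python
import statistics

SENTINEL = -666_666_666  # value reported as the column minimum


def mask_sentinel(values, sentinel=SENTINEL):
    """Replace the sentinel with None so it is treated as missing."""
    return [None if v == sentinel else v for v in values]


def summarize(values):
    """Mean and median over the non-missing entries, plus counts."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "mean": statistics.fmean(valid),
        "median": statistics.median(valid),
    }


# Made-up incomes with one sentinel row, for illustration only.
raw = [51_000, 60_000, 72_000, SENTINEL]
print(summarize(mask_sentinel(raw)))
```

Masking rather than dropping the row keeps the county in the table, so it can still be joined on and reported as missing.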
n: 3,222
nulls: 0 (0.0%)
unique: 3,099
min: -666,666,666
max: 170,463
mean: -144,603
median: 60,458.5
std: 1.175e+07
q1: 51,814.75
q3: 70,376.25
iqr: 18,561.5
skew: -56.74
kurtosis: 3,216.99
n_outliers: 188
outlier_rate: 0.05835
zero_rate: 0
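The report does not state how outliers are flagged. If it uses the common Tukey rule (values beyond 1.5 × IQR from the quartiles), which is an assumption here, the fences implied by the quartiles above can be computed as in this sketch; `tukey_fences` and `is_outlier` are hypothetical names.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, fences: tuple[float, float]) -> bool:
    """True if the value falls outside the fences."""
    lower, upper = fences
    return value < lower or value > upper


# Quartiles reported for median_household_income.
fences = tukey_fences(51_814.75, 70_376.25)
print(fences)

# The sentinel and the reported maximum both fall outside these fences,
# while the reported median falls inside.
print(is_outlier(-666_666_666, fences), is_outlier(170_463, fences), is_outlier(60_458.5, fences))
```

Under that assumption, a county would be flagged below roughly 24,000 or above roughly 98,000 dollars. Because the quartiles barely move when a single sentinel row is removed, the fences would stay almost the same after cleaning.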