saturn

/home/coolhand/html/datavis/data_trove/cache/median_income.parquet | 3,222 rows | sample n=3,222 | seed 42 | 2026-05-01T16:55:05+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/median_income.parquet
Total rows: 3,222
Profiled sample: 3,222
Columns: 3
Generated: 2026-05-01T16:55:05+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 3,222 rows covering U.S. counties, with three columns: a county name, a FIPS code, and median household income. The income column is the headline issue: it has a minimum of -666,666,666 and a mean of roughly -144,603 against a median of 60,458, indicating a sentinel value (likely a missing-data placeholder) that distorts the mean, standard deviation, and skew (-56.7). About 5.8% of records (188 rows) fall outside the 1.5 IQR fences, but that count is not the number of sentinel rows: the reported mean is consistent with a single sentinel row, so most flagged rows are probably genuine low or high incomes. Any analysis should filter the sentinel before computing summary stats. County names are essentially unique row labels, while FIPS codes look clean and span the expected national range.
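A minimal sketch of the filtering step this narrative recommends, assuming the sentinel is exactly -666,666,666 (the reported minimum) and that it marks missing data. The toy incomes are hypothetical, not rows from the dataset, and `drop_sentinel` is an illustrative helper, not part of saturn.

```python
from statistics import mean, median

# Reported minimum of median_household_income; assumed to be a missing-data placeholder.
SENTINEL = -666_666_666


def drop_sentinel(values, sentinel=SENTINEL):
    """Return the values with every occurrence of the sentinel removed."""
    return [v for v in values if v != sentinel]


# Hypothetical incomes plus one sentinel, to show how one bad row distorts the mean.
incomes = [51_000, 60_000, 70_000, 82_000, SENTINEL]

clean = drop_sentinel(incomes)
raw_mean = mean(incomes)       # dominated by the sentinel, far below zero
clean_mean = mean(clean)       # mean of the four real values
clean_median = median(clean)   # median of the four real values
```

The median barely moves when the sentinel is dropped, which is why the report's median (60,458) stays usable while its mean does not.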

fips high anthropic:claude-opus-4-7

This column is a FIPS county/area code: all 3,222 rows are unique with no nulls, and the values span 1001 to 72153, which matches the FIPS range covering U.S. states and territories. The distribution is nearly symmetric (skew 0.157, kurtosis -0.631) with no outliers, consistent with a structured geographic identifier rather than a measured quantity. Treat it as a key, not a numeric feature; stored as integers, codes for low-numbered states lose their leading zero (1001 rather than 01001).
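One way to treat the column as a key is to restore the standard five-character form, where the first two digits are the state code and the last three the county code. This is a sketch only; `fips_key` and `split_fips` are hypothetical helper names.

```python
def fips_key(code: int) -> str:
    """Zero-pad an integer FIPS code to the five-character string form."""
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Split a county FIPS code into (state code, county code)."""
    key = fips_key(code)
    return key[:2], key[2:]


# The minimum and maximum values reported for the fips column.
low = fips_key(1001)
high = fips_key(72153)
low_parts = split_fips(1001)
high_parts = split_fips(72153)
```

Using strings also prevents accidental arithmetic on the codes, such as averaging them.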

county_name high anthropic:claude-opus-4-7

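This column holds full U.S. county labels in the form 'X County, State', with all 3,222 rows unique and zero nulls. The token 'county,' appears 2,999 times, suggesting ~223 rows use a different suffix (likely 'Parish' in Louisiana, 'Borough' or 'Census Area' in Alaska, 'Municipio' in Puerto Rico given the FIPS maximum of 72153, or independent cities). State-name token frequencies look consistent with the expected U.S. distribution, with Texas (256) leading; because this is a token count, it can also include county names that contain a state name.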
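The suffix guess can be checked directly by splitting each label on its last comma. The sketch below uses three of the report's sample values plus one illustrative non-county label that is assumed, not confirmed, to resemble rows in the dataset; `split_label` is a hypothetical helper.

```python
from collections import Counter


def split_label(label):
    """Split 'Area Name, State' on the last ', ' into (area, state)."""
    area, _, state = label.rpartition(", ")
    return area, state


labels = [
    "Bibb County, Alabama",       # from the report's sample values
    "Lamb County, Texas",         # from the report's sample values
    "Liberty County, Texas",      # from the report's sample values
    "Orleans Parish, Louisiana",  # illustrative; not taken from the report
]

# Labels whose area part does not end in "County".
non_county = [lab for lab in labels if not split_label(lab)[0].endswith(" County")]

# Rows per state, counted from the part after the last comma (not from tokens).
state_counts = Counter(split_label(lab)[1] for lab in labels)
```

Counting states from the parsed suffix rather than from tokens avoids picking up state names that appear inside county names.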

median_household_income high anthropic:claude-opus-4-7

County-level median household income in dollars, with 3,099 distinct values across 3,222 rows and no nulls. The minimum of -666666666 is a clear sentinel for missing data, dragging the mean to -144603 even though the median is 60458.5 and Q1-Q3 sit between 51814.75 and 70376.25. This sentinel produces the extreme skew (-56.74) and kurtosis (3216.99). Separately, 188 rows (5.83%) are flagged as outliers by the 1.5 IQR rule; the sentinel is one of them, but the reported mean is consistent with only a single sentinel row, so the rest are most likely real incomes in the tails.
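To make the outlier flag concrete, the 1.5 IQR fences can be computed from the quartiles quoted above. This is a sketch of the standard rule, not saturn's actual code; whether saturn treats values exactly on a fence as outliers is an assumption here, and `tukey_fences` and `is_outlier` are hypothetical names.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, lower, upper):
    """Flag values strictly outside the fences (boundary handling is assumed)."""
    return value < lower or value > upper


# Quartiles as quoted in the narrative for median_household_income.
lower, upper = tukey_fences(51_814.75, 70_376.25)

sentinel_flagged = is_outlier(-666_666_666, lower, upper)  # reported minimum
max_flagged = is_outlier(170_463, lower, upper)            # reported maximum
median_flagged = is_outlier(60_458.5, lower, upper)        # reported median
```

Both the sentinel and the reported maximum fall outside the fences, which illustrates that the outlier count mixes the placeholder with real high-income values.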

Numeric correlation

fips numeric

rows: 3,222
null: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 31,378
median: 30,022
std: 16,300
q1: 19,030
q3: 46,104
iqr: 27,075
skew: 0.157
kurtosis: -0.631
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_name text

100.0% of rows are unique strings
rows: 3,222
null: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.324
len_median: 24.000
len_p95: 31.000
word_mean: 3.248
word_median: 3.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,990
readability_flesch_mean: 10.284
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County, Alabama
  2. Cheatham County, Tennessee
  3. Piute County, Utah
  4. Lamb County, Texas
  5. Martin County, Minnesota
  6. Sheridan County, Wyoming
  7. Chickasaw County, Mississippi
  8. Rockingham County, Virginia
  9. Liberty County, Texas
  10. Clark County, Arkansas

median_household_income numeric

skew=-56.74 | 5.8% of rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 3,099
min: -666,666,666
max: 170,463
mean: -144,603
median: 60,458
std: 11,745,921
q1: 51,815
q3: 70,376
iqr: 18,562
skew: -56.736
kurtosis: 3,217
n_outliers: 188
outlier_rate: 0.058
zero_rate: 0.000
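The stats above also allow a rough estimate of how many rows carry the sentinel. The sketch below assumes every sentinel row equals the reported minimum exactly and that the rounded mean is accurate enough for the arithmetic; `implied_clean_mean` is a hypothetical helper. For each candidate count k, it backs out the mean the remaining rows would need and keeps only counts where that mean is positive and no larger than the reported maximum.

```python
N_ROWS = 3_222
REPORTED_MEAN = -144_603   # rounded value from the report
REPORTED_MAX = 170_463
SENTINEL = -666_666_666    # reported minimum, assumed to be the placeholder


def implied_clean_mean(k):
    """Mean of the non-sentinel rows if exactly k rows hold the sentinel."""
    total = REPORTED_MEAN * N_ROWS
    return (total - k * SENTINEL) / (N_ROWS - k)


# Candidate counts whose implied clean mean is a feasible income level.
plausible = [
    k for k in range(0, 6)
    if 0 < implied_clean_mean(k) <= REPORTED_MAX
]
one_sentinel_mean = implied_clean_mean(1)
```

Under these assumptions only k = 1 is feasible, giving a clean mean of roughly 62,000, slightly above the median as expected for a right-skewed income distribution. With two or more sentinel rows, the remaining rows would need a mean above the reported maximum.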