saturn

/home/coolhand/html/datavis/data_trove/data/geographic/nationwide/census_counties_nationwide.csv 3,144 rows sample n=3,144 seed 42 2026-05-01T17:35:35+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/geographic/nationwide/census_counties_nationwide.csv
Total rows: 3,144
Profiled sample: 3,144
Columns: 8
Generated: 2026-05-01T17:35:35+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset covers 3,144 U.S. counties with demographic and socioeconomic indicators (population, median income, college attainment rate, and poverty rate), identified by FIPS codes and state. The most urgent issue is median_income: its minimum of -666,666,666 is a sentinel for missing data stored as a number, and it drags the mean to -148,752 even though the median is 60,931. Sentinel rows must be masked before any analysis. Population is also extremely right-skewed (skew ~13, max ~9.9M vs median ~25,785), so a log scale is advisable for visualization and modeling. State coverage is uneven, with Texas (254 counties), Georgia (159), and Virginia (133) contributing the most rows. College and poverty rates are the cleanest fields and behave roughly as expected for county-level distributions.
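A minimal sketch of the sentinel cleanup recommended above, using only the Python standard library. It assumes -666,666,666 is the only sentinel present in median_income; the helper names (`mask_sentinel`, `summarize`) and the sample values are hypothetical and are not rows from the file.

```python
import statistics

# Assumption: -666,666,666 is the only sentinel used in median_income.
SENTINEL = -666_666_666

def mask_sentinel(values, sentinel=SENTINEL):
    """Replace sentinel entries with None so they are treated as missing."""
    return [None if v == sentinel else v for v in values]

def summarize(values):
    """Count missing entries and compute mean and median over the rest."""
    present = [v for v in values if v is not None]
    return {
        "n_missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
    }

# Illustrative values only, not taken from the dataset.
incomes = [52_000, 61_000, SENTINEL, 70_000]
print(summarize(mask_sentinel(incomes)))
```

Masking to a missing marker, instead of dropping the row, keeps the county available for the other columns.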

name high anthropic:claude-opus-4-7

This is the full name of a US county-state pair: 2999 of 3144 rows contain the word 'county,' and the remaining top tokens are state names (Texas 256, Virginia 189, Georgia 159). Every value is unique (n_unique=3144, duplicate_rate=0.0) with no nulls and a tight length band (min 16, mean 24.2, max 59). It functions as a row identifier rather than a modelling feature.
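If the county and state parts of name are needed separately, a split on the last comma is one option. This is a sketch that assumes every value follows the "County, State" pattern seen in the sample values; `split_name` is a hypothetical helper.

```python
def split_name(name):
    """Split 'Bibb County, Alabama' into ('Bibb County', 'Alabama').

    Splits on the last ', ' so a comma inside the county part is preserved.
    """
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county, state

print(split_name("Bibb County, Alabama"))
```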

state_fips high anthropic:claude-opus-4-7

Numeric column with exactly 51 unique values across 3144 rows, ranging from 1 to 56 with no nulls. This matches U.S. state FIPS codes (50 states plus DC; the code sequence has gaps, which is why the maximum is 56 rather than 51), and 3144 rows is in line with the number of U.S. counties and county equivalents. The mean (30.26), median (29), and near-zero skew (-0.08) describe code numbers weighted by each state's county count, so they say nothing about how evenly states are covered. Despite being stored as numeric, the values are categorical identifiers, not measurements.

county_fips high anthropic:claude-opus-4-7

This is the county-level component of a FIPS code stored as an integer, with 3144 rows and only 329 unique values: county codes restart within each state, so many counties share the same numeric suffix and the column identifies a county only in combination with state_fips. Values run from 1 to 840 with a median of 79. The high skew (2.84) and 176 outliers (5.6%) reflect the long tail of larger county codes used in a few states, not a property of any measured quantity. There are no nulls or zeros.
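Because state_fips and county_fips are identifiers stored as numbers, one common treatment is to combine them into the standard five-character county code (two-digit state plus three-digit county, both zero-padded). The sketch below assumes both columns hold whole numbers; `county_geoid` is a hypothetical helper and the input pairs are illustrative.

```python
def county_geoid(state_fips, county_fips):
    """Build a 5-character county code: 2-digit state + 3-digit county."""
    state, county = int(state_fips), int(county_fips)
    if not (0 < state < 100 and 0 < county < 1000):
        raise ValueError(f"code out of range: {state_fips!r}, {county_fips!r}")
    return f"{state:02d}{county:03d}"

# Illustrative pairs; floats such as 1.0 are accepted because of int().
print(county_geoid(1, 7))
print(county_geoid(56.0, 45.0))
```

Keeping the result as a string preserves the leading zero that an integer column would drop.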

state_name high anthropic:claude-opus-4-7

This column holds US state names, with 51 distinct values across 3,144 rows and no nulls — consistent with a county-level dataset covering all states plus DC. Distribution mirrors county counts: Texas leads at 254 (8.08%), followed by Georgia (159) and Virginia (133), and entropy ratio of 0.93 indicates a fairly even spread across states. No anomalies flagged.
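The reported entropy and entropy_ratio are consistent with Shannon entropy in bits divided by its maximum, log2 of the cardinality. The sketch below shows that calculation; whether saturn computes it exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Shannon entropy of the label distribution, in bits."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_ratio(labels):
    """Entropy divided by its maximum, log2(number of distinct labels)."""
    k = len(set(labels))
    return entropy_bits(labels) / math.log2(k) if k > 1 else 0.0

# Check the reported figures: 5.277 bits over 51 categories.
print(round(5.277 / math.log2(51), 3))
```

A ratio of 1.0 would mean every state contributes the same number of rows; 0.93 indicates a spread that is close to even but pulled down by large states such as Texas.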

median_income high anthropic:claude-opus-4-7

This column appears to be county-level median household income in dollars, with a median of 60931 and IQR spanning 52544.5 to 70605.25. The minimum of -666666666 is a sentinel value masquerading as data, dragging the mean to -148752.33 and producing a skew of -56.04 and kurtosis of 3138.99. Aside from that contamination, 3021 unique values across 3144 rows and 135 outliers (4.29%) suggest an otherwise plausible distribution capped at 170463.
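The reported statistics can be used to estimate how many rows carry the sentinel. The arithmetic below, a sketch built only on the figures in this report, asks what the mean of the remaining rows would have to be if k rows held -666,666,666. The constant and function names are hypothetical.

```python
# Figures taken from the report above.
N_ROWS = 3144
REPORTED_MEAN = -148_752.33
SENTINEL = -666_666_666
REPORTED_MAX = 170_463

def implied_mean_of_rest(k):
    """Mean of the non-sentinel rows if exactly k rows hold the sentinel."""
    total = REPORTED_MEAN * N_ROWS
    return (total - k * SENTINEL) / (N_ROWS - k)

for k in (1, 2):
    print(k, round(implied_mean_of_rest(k)))
```

With one sentinel row, the remaining rows would average roughly 63,000, which sits close to the median of 60,931. With two, they would need to average roughly 275,000, which exceeds the reported maximum of 170,463. The figures are therefore consistent with a single contaminated row, though only inspecting the data can confirm that.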

poverty_rate high anthropic:claude-opus-4-7

Numeric poverty_rate spanning 1.60 to 55.10 with mean 13.82 and median 12.95, suggesting a percentage-style measure across 3144 rows (no nulls, no zeros). Distribution is right-skewed (skew 1.15, kurtosis 2.90) with 74 high-end outliers (2.35%) stretching the tail well past the Q3 of 16.77. Every one of the 3144 values is unique, consistent with a per-geography rate (e.g., one row per US county).

college_rate high anthropic:claude-opus-4-7

Likely a percentage of college-educated residents per row (probably a US county-level rate given n=3144). Values range from 0.0 to 56.35 with mean 16.26 and median 14.60, right-skewed (skew 1.42) with 134 outliers (4.26%) on the high tail. Near-unique (3143/3144) and no nulls, with only a single zero observation.

population high anthropic:claude-opus-4-7

This column reports a population count for 3,144 rows with no nulls and 3,080 unique values, consistent with one row per US county. The distribution is extremely right-skewed (skew 13.17, kurtosis 289.76): the median is 25,784.5 yet the mean is 105,310.94 and the max reaches 9,936,690, with 440 rows (14.0%) flagged as outliers. The std of 333,792 dwarfs the IQR of 57,244, confirming a heavy upper tail driven by a few very large jurisdictions.
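A base-10 log transform is one common way to handle a tail like this. The sketch below applies it to the reported minimum, median, and maximum to show how the range compresses; `log10_scale` is a hypothetical helper, and the choice of base 10 is an assumption made for readability.

```python
import math

def log10_scale(values):
    """Return log10 of each value; all values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log scale requires strictly positive values")
    return [math.log10(v) for v in values]

# Summary figures from the report, not individual rows.
reported = {"min": 50, "median": 25_784.5, "max": 9_936_690}
for label, scaled in zip(reported, log10_scale(list(reported.values()))):
    print(label, round(scaled, 2))
```

On this scale the values run from about 1.7 to about 7.0, with the median near 4.4, so the largest counties no longer dominate an axis. The reported minimum of 50 and zero_rate of 0 mean the transform is defined for every row.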

Numeric correlation

name text

100.0% of rows are unique strings
rows: 3,144
null: 0 (0.0%)
unique: 3,144
len_min: 16
len_max: 59
len_mean: 24.165
len_median: 24.000
len_p95: 30.850
word_mean: 3.224
word_median: 3.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,910
readability_flesch_mean: 6.826
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County, Alabama
  2. Day County, South Dakota
  3. Sabine County, Texas
  4. Fayette County, Texas
  5. Chisago County, Minnesota
  6. Dane County, Wisconsin
  7. Ramsey County, Minnesota
  8. Bath County, Virginia
  9. Freestone County, Texas
  10. Carroll County, Arkansas

state_fips numeric

rows: 3,144
null: 0 (0.0%)
unique: 51
min: 1.000
max: 56.000
mean: 30.264
median: 29.000
std: 15.153
q1: 18.000
q3: 45.000
iqr: 27.000
skew: -0.081
kurtosis: -1.099
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_fips numeric

skew=+2.84 5.6% rows beyond 1.5 IQR
rows: 3,144
null: 0 (0.0%)
unique: 329
min: 1.000
max: 840.000
mean: 103.874
median: 79.000
std: 107.567
q1: 35.000
q3: 133.500
iqr: 98.500
skew: 2.841
kurtosis: 11.377
n_outliers: 176
outlier_rate: 0.056
zero_rate: 0.000

state_name categorical

rows: 3,144
null: 0 (0.0%)
unique: 51
top_value: Texas
top_rate: 0.081
cardinality: 51
entropy: 5.277
entropy_ratio: 0.930
Top values (rank 1–20)
  1. Texas — 254
  2. Georgia — 159
  3. Virginia — 133
  4. Kentucky — 120
  5. Missouri — 115
  6. Kansas — 105
  7. Illinois — 102
  8. North Carolina — 100
  9. Iowa — 99
  10. Tennessee — 95
  11. Nebraska — 93
  12. Indiana — 92
  13. Ohio — 88
  14. Minnesota — 87
  15. Michigan — 83
  16. Mississippi — 82
  17. Oklahoma — 77
  18. Arkansas — 75
  19. Wisconsin — 72
  20. Alabama — 67

median_income numeric

skew=-56.04
rows: 3,144
null: 0 (0.0%)
unique: 3,021
min: -666,666,666
max: 170,463
mean: -148,752
median: 60,931
std: 11,890,747
q1: 52,544
q3: 70,605
iqr: 18,061
skew: -56.044
kurtosis: 3,139
n_outliers: 135
outlier_rate: 0.043
zero_rate: 0.000

poverty_rate numeric

rows: 3,144
null: 0 (0.0%)
unique: 3,144
min: 1.603
max: 55.100
mean: 13.815
median: 12.952
std: 5.702
q1: 9.699
q3: 16.774
iqr: 7.074
skew: 1.150
kurtosis: 2.901
n_outliers: 74
outlier_rate: 0.024
zero_rate: 0.000

college_rate numeric

rows: 3,144
null: 0 (0.0%)
unique: 3,143
min: 0.000
max: 56.346
mean: 16.264
median: 14.596
std: 7.005
q1: 11.483
q3: 19.374
iqr: 7.892
skew: 1.422
kurtosis: 2.751
n_outliers: 134
outlier_rate: 0.043
zero_rate: 3.18e-04

population numeric

skew=+13.17 14.0% rows beyond 1.5 IQR
rows: 3,144
null: 0 (0.0%)
unique: 3,080
min: 50.000
max: 9,936,690
mean: 105,311
median: 25,784
std: 333,792
q1: 10,836
q3: 68,080
iqr: 57,244
skew: 13.175
kurtosis: 289.761
n_outliers: 440
outlier_rate: 0.140
zero_rate: 0.000
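
The "rows beyond 1.5 IQR" flag reads like the usual Tukey fence rule, where a value is an outlier if it falls below q1 - 1.5 * iqr or above q3 + 1.5 * iqr. That saturn uses exactly this rule is an assumption. The sketch below applies it to the population quartiles reported above; `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k interquartile ranges from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Population quartiles as reported (rounded to whole persons).
low, high = tukey_fences(10_836, 68_080)
print(low, high)
```

Under this rule the lower fence is negative, so with a minimum of 50 all 440 flagged rows would lie on the high side, above roughly 154,000 residents.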