This dataset covers 3,144 U.S. counties with demographic and socioeconomic indicators (population, median income, college attainment rate, and poverty rate), identified by FIPS codes and state. The most urgent issue is median_income: its minimum of -666,666,666 is almost certainly a sentinel for missing data rather than a real value, it drags the mean to -148,752, and it must be cleaned before any analysis. Population is also extremely right-skewed (skew ~13, max ~9.9M vs. median ~25,785), so a log scale will likely be needed for visualization or modeling. Row counts vary widely by state, led by Texas (254 counties), Georgia (159), and Virginia (133). College and poverty rates are the cleanest fields and behave roughly as expected for county-level distributions.
saturn
/home/coolhand/html/datavis/data_trove/data/geographic/nationwide/census_counties_nationwide.csv 3,144 rows sample n=3,144 seed 42 2026-05-01T17:35:35+00:00
Overview
| Field | Value |
| --- | --- |
| Source | /home/coolhand/html/datavis/data_trove/data/geographic/nationwide/census_counties_nationwide.csv |
| Total rows | 3,144 |
| Profiled sample | 3,144 |
| Columns | 8 |
| Generated | 2026-05-01T17:35:35+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
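This is the full name of a U.S. county-state pair: 2,999 of 3,144 rows contain the word 'county', and the remaining top tokens are state names (Texas 256, Virginia 189, Georgia 159). These are token counts rather than per-state row counts, so they can exceed the state totals when a state name also appears inside a county name or inside a longer state name such as West Virginia. Every value is unique (n_unique=3144, duplicate_rate=0.0) with no nulls and a tight length band (min 16, mean 24.2, max 59). It functions as a row identifier rather than a modelling feature.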
Numeric column with exactly 51 unique values across 3,144 rows, ranging from 1 to 56 with no nulls: this is the U.S. state FIPS code (50 states plus DC), and 3,144 matches the U.S. county count. The mean (30.26), median (29), and near-zero skew (-0.08) describe the code numbers only and say little about how counties are spread across states. Despite being stored as numeric, the values are categorical identifiers, not measurements.
This is the county-level component of a FIPS code stored as an integer, with 3,144 rows and only 329 unique values, because counties in different states reuse the same within-state code. Values run from 1 to 840 with a median of 79; the high skew (2.84) and 176 outliers (5.6%) reflect the long tail of larger county codes used in a few states, not a meaningful numeric distribution. There are no nulls or zeros.
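Because both FIPS columns are stored as numbers, their leading zeros are lost. A minimal sketch of rebuilding the conventional five-digit county identifier is shown here. The function name `make_geoid` is hypothetical, and the example codes (state 1, county 7 for Bibb County, Alabama) come from the standard FIPS assignment rather than from this profile.

```python
def make_geoid(state_fips: int, county_fips: int) -> str:
    """Zero-pad the state code to 2 digits and the county code to 3, then join."""
    if not (0 < state_fips <= 99 and 0 < county_fips <= 999):
        raise ValueError("FIPS components out of range")
    return f"{state_fips:02d}{county_fips:03d}"


# Bibb County, Alabama appears in the sample values above.
bibb = make_geoid(1, 7)
print(bibb)  # "01007"
```

Keeping the result as a string avoids the identifier being treated as a measurement again downstream.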
This column holds US state names, with 51 distinct values across 3,144 rows and no nulls — consistent with a county-level dataset covering all states plus DC. Distribution mirrors county counts: Texas leads at 254 (8.08%), followed by Georgia (159) and Virginia (133), and entropy ratio of 0.93 indicates a fairly even spread across states. No anomalies flagged.
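The profile does not say how saturn computes its entropy ratio. One common definition is Shannon entropy divided by the log of the number of categories, which gives 1.0 for a perfectly even split. The sketch below assumes that definition; `entropy_ratio` is a hypothetical helper and the counts are toy values, not the state counts from this file.

```python
import math


def entropy_ratio(counts):
    """Normalized Shannon entropy: H / log(k), where k is the number of non-empty categories."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    k = len(probs)
    if k < 2:
        return 0.0  # a single category carries no spread
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(k)


even = entropy_ratio([10, 10, 10, 10])   # perfectly even split
lopsided = entropy_ratio([97, 1, 1, 1])  # one category holds almost everything
print(even, lopsided)
```

Under this definition, a value of 0.93 would sit much closer to the even case than to the lopsided one, even though the largest state has several times as many rows as most others.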
This column appears to be county-level median household income in dollars, with a median of 60,931 and an IQR spanning 52,544.5 to 70,605.25. The minimum of -666,666,666 is a sentinel value masquerading as data, dragging the mean to -148,752.33 and producing a skew of -56.04 and kurtosis of 3,138.99. Aside from that contamination, 3,021 unique values across 3,144 rows and 135 outliers (4.29%) suggest an otherwise plausible distribution capped at 170,463.
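A minimal sketch of the cleanup, together with a back-of-envelope check on the reported mean. It assumes that -666,666,666 means "missing" and, for the arithmetic check only, that exactly one row carries it; the profile reports neither directly. The helper names and the toy list are illustrative and not taken from the dataset.

```python
import math

SENTINEL = -666_666_666  # minimum reported in the profile; assumed to mean "missing"


def clean_income(value: float) -> float:
    """Map the sentinel to NaN so it cannot leak into arithmetic."""
    return math.nan if value == SENTINEL else value


def mean_ignoring_nan(values):
    """Mean over the values that survived cleaning."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


# Check using only figures from the profile, ASSUMING a single sentinel row:
# remove the sentinel from the implied column total and average the rest.
n_rows = 3144
reported_mean = -148_752.33
implied_clean_mean = (reported_mean * n_rows - SENTINEL) / (n_rows - 1)
print(round(implied_clean_mean))

# Toy values (not from the dataset) showing the cleaning step itself.
toy = [52_000.0, 61_000.0, float(SENTINEL), 70_000.0]
cleaned = [clean_income(v) for v in toy]
toy_mean = mean_ignoring_nan(cleaned)
print(toy_mean)
```

Under the one-sentinel assumption the implied mean comes out at roughly 63,300, close to the reported median of 60,931, which is consistent with a single contaminated row rather than widespread damage.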
Numeric poverty_rate spanning 1.60 to 55.10 with mean 13.82 and median 12.95, suggesting a percentage-style measure across 3144 rows (no nulls, no zeros). Distribution is right-skewed (skew 1.15, kurtosis 2.90) with 74 high-end outliers (2.35%) stretching the tail well past the Q3 of 16.77. Every one of the 3144 values is unique, consistent with a per-geography rate (e.g., one row per US county).
Likely a percentage of college-educated residents per row (probably a US county-level rate given n=3144). Values range from 0.0 to 56.35 with mean 16.26 and median 14.60, right-skewed (skew 1.42) with 134 outliers (4.26%) on the high tail. Near-unique (3143/3144) and no nulls, with only a single zero observation.
This column reports a population count for 3,144 rows with no nulls and 3,080 unique values, consistent with one row per US county. The distribution is extremely right-skewed (skew 13.17, kurtosis 289.76): the median is 25,784.5 yet the mean is 105,310.94 and the max reaches 9,936,690, with 440 rows (14.0%) flagged as outliers. The std of 333,792 dwarfs the IQR of 57,244, confirming a heavy upper tail driven by a few very large jurisdictions.
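To illustrate why a log scale helps here, the sketch below compares the reported median and maximum on the raw and log10 scales. Only those two figures come from the profile; `log10_population` is a hypothetical helper, and the guard against non-positive values is an assumption, since the profile does not state the column minimum.

```python
import math


def log10_population(pop: float) -> float:
    """Base-10 log of a population count; rejects non-positive input."""
    if pop <= 0:
        raise ValueError("population must be positive for a log transform")
    return math.log10(pop)


median_pop = 25_784.5  # reported median
max_pop = 9_936_690    # reported maximum

ratio_raw = max_pop / median_pop  # how many times larger the max is on the raw scale
gap_log = log10_population(max_pop) - log10_population(median_pop)  # the same gap in log10 units
print(ratio_raw, gap_log)
```

A gap of several hundred times on the raw scale shrinks to under three log10 units, so typical counties are no longer squeezed against the axis by the largest ones.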
Numeric correlation
name text
Sample values (first 10)
- Bibb County, Alabama
- Day County, South Dakota
- Sabine County, Texas
- Fayette County, Texas
- Chisago County, Minnesota
- Dane County, Wisconsin
- Ramsey County, Minnesota
- Bath County, Virginia
- Freestone County, Texas
- Carroll County, Arkansas
state_fips numeric
county_fips numeric
state_name categorical
Top values (rank 1–20)
- Texas — 254
- Georgia — 159
- Virginia — 133
- Kentucky — 120
- Missouri — 115
- Kansas — 105
- Illinois — 102
- North Carolina — 100
- Iowa — 99
- Tennessee — 95
- Nebraska — 93
- Indiana — 92
- Ohio — 88
- Minnesota — 87
- Michigan — 83
- Mississippi — 82
- Oklahoma — 77
- Arkansas — 75
- Wisconsin — 72
- Alabama — 67