saturn

/home/coolhand/html/datavis/data_trove/cache/median_rents.parquet 3,222 rows sample n=3,222 seed 42 2026-05-01T16:53:40+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/median_rents.parquet
Total rows: 3,222
Profiled sample: 3,222
Columns: 3
Generated: 2026-05-01T16:53:40+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 3,222 rows of U.S. county-level median gross rent figures, keyed by county name and FIPS code. The standout issue is the median_gross_rent column: the median is a plausible $817.50 and the interquartile range runs from $718 to $978, but the minimum is -666,666,666, which drags the mean to roughly -2.07M and produces extreme skew (-17.87) and kurtosis (317.2). That sentinel-style negative value should be masked or filtered before any analysis, and the 235 flagged outliers (7.3%) should be reviewed rather than dropped wholesale, since rows beyond 1.5 IQR can also be legitimately high rents. The fips column is well-behaved and unique per row, and county_name is essentially an identifier (3,222 unique values), so neither needs deep inspection beyond confirming coverage.
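A minimal sketch of that filtering step, using only the Python standard library. The sentinel constant is taken from the reported minimum; the `rents` list is made-up illustrative data, not rows from the file, and `drop_sentinel` is a hypothetical helper name.

```python
from statistics import mean, median

# Sentinel taken from the reported minimum of median_gross_rent.
SENTINEL = -666_666_666


def drop_sentinel(values, sentinel=SENTINEL):
    """Return the values with sentinel entries removed (hypothetical helper)."""
    return [v for v in values if v != sentinel]


# Illustrative values only; these are not rows from median_rents.parquet.
rents = [718, 817, 818, 978, SENTINEL]

clean = drop_sentinel(rents)
raw_mean = mean(rents)        # dominated by the single sentinel entry
clean_mean = mean(clean)      # computed over plausible rents only
clean_median = median(clean)  # the median barely moves, the mean does
```

Removing the rows is only one option; replacing the sentinel with a missing-value marker keeps the row count intact for joins on fips.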

fips high anthropic:claude-opus-4-7

This is the county-level FIPS code: an integer geographic identifier that is unique and non-null in all 3,222 rows. The range (1,001 to 72,153) and distribution (mean 31,378, median 30,022, low skew 0.16) are consistent with the standard 5-digit state+county FIPS scheme covering US states and territories. Because the column is stored as an integer, codes whose state part is below 10 lose their leading zero (1001 stands for 01001). Nothing here is anomalous: the column behaves as a clean primary key rather than a numeric feature.
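Since the column is numeric, joining it against sources that store FIPS codes as text usually needs the 5-digit string form. The sketch below assumes that is the goal; `format_fips` and `split_fips` are hypothetical helper names, not part of saturn.

```python
def format_fips(code: int) -> str:
    """Zero-pad an integer county FIPS code to its 5-digit string form."""
    if not 0 <= code <= 99_999:
        raise ValueError(f"not a 5-digit FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Split a county FIPS code into (2-digit state part, 3-digit county part)."""
    padded = format_fips(code)
    return padded[:2], padded[2:]


# The reported minimum and maximum of the fips column.
low = split_fips(1001)    # ("01", "001")
high = split_fips(72153)  # ("72", "153")
```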

county_name high anthropic:claude-opus-4-7

This column holds fully-qualified US county names (e.g. 'X County, Texas'). The token 'county,' appears in 2,999 of 3,222 rows, and state names such as Texas (256), Virginia (189) and Georgia (159) dominate the remaining tokens; these are token counts, so a state name that also occurs inside another name (for example 'Virginia' in 'West Virginia') is counted there as well. All 3,222 values are unique with zero nulls or duplicates, so the column functions as a row identifier rather than a categorical feature. String lengths are tight (16-59 chars, median 24) and there is no boilerplate, URL or emoji noise.
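If the state is needed as a separate field, the name can be split on its last ", " separator. This is a sketch that assumes every value follows the "<area>, <state>" pattern seen in the sample values; that has not been checked against all 3,222 rows, and `split_county_name` is a hypothetical helper.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Bibb County, Alabama' into ('Bibb County', 'Alabama')."""
    # rpartition splits on the LAST separator, so a comma inside the
    # area part would stay with the area rather than the state.
    area, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return area, state


# Two of the sample values listed in the report.
samples = ["Bibb County, Alabama", "Lamb County, Texas"]
parsed = [split_county_name(s) for s in samples]
```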

median_gross_rent high anthropic:claude-opus-4-7

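Median gross rent in dollars, with an interquartile range of 718 to 978 around a median of 817.5. The minimum of -666,666,666 is almost certainly a sentinel for missing data rather than a real rent; it drags the mean to -2,068,220 and produces extreme skew (-17.87) and kurtosis (317.20). The 235 outliers (7.3%) are rows beyond 1.5 IQR, so they include the sentinel rows but can also include legitimately high rents up to the maximum of 2,805. The contamination is invisible to null_rate, which is 0.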
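The size of the contamination can be estimated from the reported statistics alone. The sketch below assumes that -666,666,666 is the only anomalous value and that every other rent lies between 0 and the reported maximum of 2,805; neither assumption has been verified against the file. `implied_clean_mean` is a hypothetical helper.

```python
# Figures copied from the median_gross_rent statistics in this report.
SENTINEL = -666_666_666
N_ROWS = 3_222
REPORTED_MEAN = -2_068_220
MAX_RENT = 2_805


def implied_clean_mean(k: int) -> float:
    """Mean the non-sentinel rows would need if exactly k rows held the sentinel."""
    total = REPORTED_MEAN * N_ROWS  # approximate column sum
    return (total - k * SENTINEL) / (N_ROWS - k)


# Keep only sentinel counts whose implied clean mean is a possible rent.
feasible = [k for k in range(1, 50) if 0 <= implied_clean_mean(k) <= MAX_RENT]
# → [10]
```

Under those assumptions only 10 sentinel rows are consistent with the reported mean, which would leave the remaining rows with a mean of about 891. Most of the 235 flagged outliers would then be real values rather than sentinels.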

Numeric correlation

fips numeric

rows: 3,222
null: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 31,378
median: 30,022
std: 16,300
q1: 19,030
q3: 46,104
iqr: 27,075
skew: 0.157
kurtosis: -0.631
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_name text

100.0% of rows are unique strings
rows: 3,222
null: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.324
len_median: 24.000
len_p95: 31.000
word_mean: 3.248
word_median: 3.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,990
readability_flesch_mean: 10.284
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County, Alabama
  2. Cheatham County, Tennessee
  3. Piute County, Utah
  4. Lamb County, Texas
  5. Martin County, Minnesota
  6. Sheridan County, Wyoming
  7. Chickasaw County, Mississippi
  8. Rockingham County, Virginia
  9. Liberty County, Texas
  10. Clark County, Arkansas

median_gross_rent numeric

skew=-17.87; 7.3% of rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 984
min: -666,666,666
max: 2,805
mean: -2,068,220
median: 817.500
std: 37,088,473
q1: 718.000
q3: 978.000
iqr: 260.000
skew: -17.866
kurtosis: 317.203
n_outliers: 235
outlier_rate: 0.073
zero_rate: 0.000
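
The outlier count depends on where the 1.5 IQR cut-offs fall. The sketch below computes them from the reported quartiles, assuming saturn uses the conventional Tukey fences (q1 - 1.5 * iqr and q3 + 1.5 * iqr); the report itself only says "beyond 1.5 IQR". `tukey_fences` is a hypothetical helper.

```python
# Quartiles copied from the median_gross_rent statistics above.
Q1, Q3 = 718.0, 978.0


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences at k times the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(Q1, Q3)  # (328.0, 1368.0)
```

With fences at 328 and 1,368, the reported maximum of 2,805 sits above the upper fence, so high-rent counties are counted as outliers alongside the sentinel rows.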