saturn·

median rents

source /home/coolhand/html/datavis/data_trove/cache/median_rents.parquet · 3,222 rows · 3 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 rows of U.S. county-level median gross rent figures, keyed by county name and FIPS code. The standout issue is the median_gross_rent column: the median is a plausible $817.50 and the IQR runs from $718 to $978, but the minimum is -666,666,666, which drags the mean to roughly -2.07M and produces extreme skew (-17.87) and kurtosis (317.2). That sentinel-style negative value should be replaced with a missing-value marker, and the 235 flagged outliers (7.3%) reviewed, before any analysis. The fips column is well-behaved and unique per row, and county_name is effectively an identifier (3,222 unique values), so neither needs inspection beyond confirming coverage.

citing: median_gross_rent.stats.median · median_gross_rent.stats.mean · median_gross_rent.stats.min · median_gross_rent.stats.max · median_gross_rent.stats.q1 · median_gross_rent.stats.q3 · median_gross_rent.stats.skew · median_gross_rent.stats.kurtosis · median_gross_rent.stats.n_outliers · median_gross_rent.stats.outlier_rate · fips.stats.min · fips.stats.max · row_count · column_count
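The summary statistics alone are enough to estimate how many rows carry the sentinel. The sketch below is a back-of-the-envelope check, not part of the profiler: it assumes every non-sentinel rent lies between 0 and the reported maximum, and the function name `sentinel_count_bounds` is hypothetical.

```python
import math

# Figures copied from the profile; nothing here reads the parquet file.
N_ROWS = 3_222
REPORTED_MEAN = -2_068_220   # rounded mean of median_gross_rent
SENTINEL = -666_666_666      # reported minimum
MAX_RENT = 2_805             # reported maximum


def sentinel_count_bounds(n_rows, mean, sentinel, max_valid):
    """Bound the number of sentinel rows k.

    Assumption: all other values lie in [0, max_valid], so
    n_rows * mean = k * sentinel + V with 0 <= V <= n_rows * max_valid.
    """
    total = abs(mean * n_rows)
    low = total / abs(sentinel)                           # V = 0
    high = (total + n_rows * max_valid) / abs(sentinel)   # V at its upper bound
    return math.ceil(low), math.floor(high)


def implied_valid_mean(n_rows, mean, sentinel, k):
    """Mean of the remaining rows once k sentinel rows are removed."""
    return (mean * n_rows - k * sentinel) / (n_rows - k)


print(sentinel_count_bounds(N_ROWS, REPORTED_MEAN, SENTINEL, MAX_RENT))  # → (10, 10)
print(implied_valid_mean(N_ROWS, REPORTED_MEAN, SENTINEL, 10))
```

Under that assumption the only integer count consistent with the reported mean is 10, and the implied mean of the remaining rows lands near $891, which sits comfortably with the reported median and IQR.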

Schema

3 columns

column             type     nulls  unique  alerts
fips               numeric  0.0%   3,222
county_name        text     0.0%   3,222   near_unique
median_gross_rent  numeric  0.0%   984     high_skew, outliers

fips

numeric identifier
This is the county-level FIPS code: an integer geographic identifier that is unique and non-null in all 3,222 rows. The range (1001 to 72153) and distribution (mean 31377, median 30022, low skew 0.16) are consistent with the standard 5-digit state+county FIPS scheme covering US states and territories. Because the codes are stored as integers, leading zeros are dropped (01001 appears as 1001). Nothing here is anomalous; the column behaves as a clean primary key rather than a numeric feature. Treatment: Treat as a categorical key for joins, zero-padding to five characters when the other side of the join uses string codes; do not use as a numeric feature. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          3.138e+04
median        30,022
std           1.63e+04
q1            1.903e+04
q3            4.61e+04
iqr           27,075
skew          0.1574
kurtosis      -0.6314
n_outliers    0
outlier_rate  0
zero_rate     0
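Treating fips as a join key usually means converting it from an integer to a fixed-width string. The sketch below shows one way to do that; the helper names `fips_to_key` and `split_fips` are hypothetical, and the 2+3 split follows the standard state+county layout mentioned in the reading above.

```python
def fips_to_key(fips: int) -> str:
    """Render an integer county FIPS code as a zero-padded 5-character string."""
    if not 0 < fips <= 99_999:
        raise ValueError(f"not a 5-digit county FIPS code: {fips}")
    return f"{fips:05d}"


def split_fips(key: str) -> tuple[str, str]:
    """Split a 5-character key into (state code, county code)."""
    if len(key) != 5 or not key.isdigit():
        raise ValueError(f"expected 5 digits, got {key!r}")
    return key[:2], key[2:]


# The profile's reported min and max, used here only as sample inputs.
print(fips_to_key(1001))                 # → 01001
print(split_fips(fips_to_key(72153)))    # → ('72', '153')
```

Keeping the key as a string also stops downstream tools from computing means or correlations on it by accident.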

county_name

text identifier near_unique
This column holds fully-qualified US county names (e.g. 'X County, Texas'). The token 'county,' appears in 2,999 of 3,222 rows, so the remaining 223 rows presumably use another designation (such as parish, borough or municipio) or none at all. State names like Texas (256), Virginia (189) and Georgia (159) dominate the remaining tokens. All 3,222 values are unique with zero nulls or duplicates, so the column functions as a row identifier rather than a categorical feature. String lengths are tight (16-59 chars, median 24) and there is no boilerplate, URL or emoji noise. Treatment: Split into county and state fields, or use as a join key against county-level reference data. high · anthropic:claude-opus-4-7
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.32
len_median               24
len_p95                  31
word_mean                3.248
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,990
readability_flesch_mean  10.28
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
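The suggested split into county and state fields can be sketched as follows. This assumes every value follows the 'X County, Texas' pattern described above, with the state after the last comma; the function name `split_county_name` is hypothetical and the sample strings are illustrative, not rows from the file.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'Some County, State' into (county, state) at the last ', '."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county.strip(), state.strip()


# Illustrative inputs only.
print(split_county_name("Example County, Texas"))
print(split_county_name("Example Parish, Louisiana"))
```

Splitting at the last separator rather than the first keeps the state intact even if a county label itself contains a comma, and it does not depend on the word "County" being present.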

median_gross_rent

numeric feature high_skew outliers
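Median gross rent in dollars, with an interquartile range of 718 to 978 around a median of 817.5. The minimum of -666,666,666 is almost certainly a sentinel for missing data rather than a real rent; it drags the mean to -2,068,220 and produces extreme skew (-17.87) and kurtosis (317.20). The reported mean is consistent with about 10 sentinel rows (10 × -666,666,666 ≈ -2,068,220 × 3,222), so the sentinel accounts for only a small share of the 235 flagged outliers (7.3%), and null_rate is 0 only because the missing values are encoded as a number. Treatment: Replace the -666666666 sentinel with NA before any modelling or aggregation, then recompute the summary statistics. high · anthropic:claude-opus-4-7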
n             3,222
nulls         0 (0.0%)
unique        984
min           -6.667e+08
max           2,805
mean          -2.068e+06
median        817.5
std           3.709e+07
q1            718
q3            978
iqr           260
skew          -17.87
kurtosis      317.2
n_outliers    235
outlier_rate  0.07294
zero_rate     0
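The recommended treatment, replacing the sentinel with NA, might look like the pandas sketch below. The helper name `mask_sentinel` is hypothetical, and the sample values are synthetic so the block runs without the parquet file.

```python
import pandas as pd

SENTINEL = -666_666_666  # minimum reported in the profile


def mask_sentinel(rent: pd.Series, sentinel: int = SENTINEL) -> pd.Series:
    """Return a float copy of `rent` with sentinel entries replaced by NaN."""
    return rent.astype("float64").mask(rent == sentinel)


# Synthetic sample, not real rows from the dataset.
sample = pd.Series([718, 817, 978, SENTINEL], name="median_gross_rent")
clean = mask_sentinel(sample)

print(int(clean.isna().sum()))  # rows that held the sentinel
print(clean.median())           # NaN is skipped by default
print(clean.mean())
```

On the real file, the same function could be applied to `pd.read_parquet(path)["median_gross_rent"]`, assuming a parquet engine such as pyarrow is installed. Re-running the profile on the masked column should then show a non-zero null count in place of the extreme minimum.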