saturn·

housing units

source /home/coolhand/html/datavis/data_trove/cache/housing_units.parquet · 3,222 rows · 6 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 3,222 U.S. counties and county equivalents (FIPS codes run through 72153, so Puerto Rico appears to be included) with housing-unit counts (owner-occupied, renter-occupied, total) plus a FIPS code, county name, and the percent of renters. The three count columns are extremely right-skewed (skew between 9.5 and 15.8, kurtosis above 140), with 13–14% of rows flagged as outliers: a handful of huge urban counties (max total_housing_units of about 3.36M against a median of 10,021) dominate the distribution. The pct_renter field is far better behaved, centered near 26% (median 26.07, IQR 21.64 to 31.66), which makes it the most useful metric for comparing counties. Start by inspecting the long tail of total_housing_units, then use pct_renter to compare counties on a normalized basis.

citing: owner_occupied.stats · renter_occupied.stats · total_housing_units.stats · pct_renter.stats · fips.stats · county_name.top_words · row_count
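
The figures in the summary can be cross-checked with simple arithmetic. The sketch below uses only values reported on this page. It assumes, without confirmation from the source, that total_housing_units is the sum of the two occupancy counts and that pct_renter is 100 × renter_occupied / total_housing_units. The function name `renter_share` is hypothetical.

```python
# Cross-check of figures reported in this profile (standard library only).
# Assumption: total_housing_units = owner_occupied + renter_occupied, and
# pct_renter = 100 * renter_occupied / total_housing_units. Neither formula
# is stated in the source; the reported maxima are merely consistent with them.

def renter_share(renter_occupied: int, total_housing_units: int) -> float:
    """Percent of housing units that are renter-occupied."""
    if total_housing_units <= 0:
        raise ValueError("total_housing_units must be positive")
    return 100.0 * renter_occupied / total_housing_units

# Reported column maxima (not confirmed to come from the same county).
max_owner = 1_552_164
max_renter = 1_810_929
max_total = 3_363_093

# Do the two occupancy maxima add up to the reported total maximum?
maxima_add_up = (max_owner + max_renter == max_total)

# Renter share implied by those maxima, if they describe one county.
share_at_max = renter_share(max_renter, max_total)

print(maxima_add_up, round(share_at_max, 2))
```

If the identity holds on every row of the real file, total_housing_units counts occupied units only, and pct_renter adds no information beyond the two count columns.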

Schema

6 columns
Per-column summary.

column               type     nulls  unique  alerts
fips                 numeric  0.0%   3,222
county_name          text     0.0%   3,222   near_unique
total_housing_units  numeric  0.0%   3,074   high_skew, outliers
owner_occupied       numeric  0.0%   3,001   high_skew, outliers
renter_occupied      numeric  0.0%   2,709   high_skew, outliers
pct_renter           numeric  0.0%   1,925

fips

numeric identifier
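This column is the FIPS code for U.S. counties: all 3,222 values are unique with no nulls, consistent with one row per county or county equivalent. Values span 1001 to 72153, consistent with state-prefixed county FIPS identifiers (72 is the state code for Puerto Rico). Because the column is stored as a number, codes for states 01 through 09 have lost their leading zero (1001 stands for 01001). The codes are spread fairly evenly across that range (skew 0.157, kurtosis -0.63, no outliers), though these moments carry no meaning for an identifier. Treatment: treat as a categorical key; zero-pad to a five-character string, then left-join on it to county-level reference data rather than using it as a numeric feature. high · anthropic:claude-opus-4-7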
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
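
Because the codes are stored as numbers, joining them to a reference table keyed on five-character FIPS strings needs a padding step first. A minimal sketch, assuming the values are plain integers as the min and max above suggest; `fips_to_str` and `split_fips` are hypothetical helper names, not part of the profiler.

```python
def fips_to_str(code: int) -> str:
    """Zero-pad a numeric county FIPS code to the standard five characters."""
    if not 0 < code <= 99_999:
        raise ValueError(f"not a county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Return (state_code, county_code) as two- and three-character strings."""
    padded = fips_to_str(code)
    return padded[:2], padded[2:]


# The reported min and max of the column.
print(fips_to_str(1001))
print(split_fips(72153))
```

In pandas the same padding is usually written as `df["fips"].astype(str).str.zfill(5)`, assuming the column holds integers rather than floats.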

county_name

text identifier near_unique
This column holds fully qualified U.S. county names (e.g. 'X County, State'), with 3,222 rows, all unique, and zero nulls. The token 'county,' appears 2,999 times, so 223 rows use a different administrative suffix (such as parish, borough, or census area). Texas (256), Virginia (189), and Georgia (159) are the most frequent state tokens; these are token counts from top_words rather than per-state row counts, so they can also pick up names such as 'West Virginia' or a county named after a state. Treatment: use as a join key after splitting into county and state components. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
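
Splitting the name into county and state parts could look like the sketch below. It assumes every value follows the 'X County, State' pattern with the state after the last comma, which the profile suggests but does not prove for all 3,222 rows. The helper names and the two sample strings are illustrative only.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' into ('X County', 'State') at the last comma."""
    county, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma in county name: {name!r}")
    return county.strip(), state.strip()


def has_county_suffix(name: str) -> bool:
    """True when the county part ends in the word 'County'."""
    county, _ = split_county_name(name)
    return county.lower().endswith(" county")


# Illustrative inputs in the assumed format, not rows read from the file.
print(split_county_name("Los Angeles County, California"))
print(has_county_suffix("Orleans Parish, Louisiana"))
```

Splitting at the last comma rather than the first keeps any comma inside the county part intact.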

total_housing_units

numeric feature high_skew outliers
Counts of total housing units per record, almost certainly at a county or similar geographic level given 3,222 rows with 3,074 unique values and no nulls. The distribution is severely right-skewed (skew 12.05, kurtosis 240.5), with a median of 10,021 but a max of 3,363,093, and 443 rows (13.7%) are flagged as outliers in the upper tail, well beyond the Q3 of 25,939. The mean of 39,402 is almost four times the median, confirming a long heavy tail driven by a few very large geographies. Treatment: log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,074
min: 32
max: 3.363e+06
mean: 3.94e+04
median: 10,021
std: 1.201e+05
q1: 4,211
q3: 25,939
iqr: 2.173e+04
skew: 12.05
kurtosis: 240.5
n_outliers: 443
outlier_rate: 0.1375
zero_rate: 0
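
The outlier count depends on where the cutoff sits. The sketch below computes Tukey fences from the reported quartiles. It assumes the profiler uses the common 1.5 × IQR rule, which the source does not state, and it treats the displayed q1 and q3 as exact although they may be rounded. `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as displayed for total_housing_units (possibly rounded).
lower, upper = tukey_fences(4211, 25939)
print(lower, upper)
```

Under these assumptions the lower fence is negative, so no count can fall below it and every flagged row sits in the upper tail.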

owner_occupied

numeric feature high_skew outliers
This appears to be a count of owner-occupied housing units per geographic area, with 3,001 unique values across 3,222 rows, no nulls, and a single zero (zero_rate 0.0003, i.e. 1 of 3,222 rows). The distribution is severely right-skewed (skew 9.52, kurtosis 146.9): the median is 7,325.5 but the mean is 25,551.7 and the max reaches 1,552,164, producing 429 outliers (13.3% outlier rate). The interquartile range (3,147.75 to 18,863.5, a width of about 15,716) is dwarfed by the standard deviation of 67,553, indicating a long tail of large jurisdictions. Treatment: log-transform before modelling to tame the heavy right tail, using log(1 + x) because the column contains a zero. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,001
min: 0
max: 1.552e+06
mean: 2.555e+04
median: 7,326
std: 6.755e+04
q1: 3,148
q3: 1.886e+04
iqr: 1.572e+04
skew: 9.516
kurtosis: 146.9
n_outliers: 429
outlier_rate: 0.1331
zero_rate: 0.0003104
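
A plain logarithm is undefined at zero, and this column has a minimum of 0. One common workaround, shown here as a sketch rather than the profiler's own method, is log(1 + x). The helper name `log1p_counts` is hypothetical; the three inputs are the min, median, and max reported in the description above.

```python
import math


def log1p_counts(values):
    """Apply log(1 + x) to non-negative counts; the result is 0.0 at x = 0."""
    transformed = []
    for v in values:
        if v < 0:
            raise ValueError(f"counts must be non-negative, got {v}")
        transformed.append(math.log1p(v))
    return transformed


# Reported min, median, and max of owner_occupied.
result = log1p_counts([0, 7325.5, 1_552_164])
print([round(x, 2) for x in result])
```

The transform preserves order, so rankings of counties are unchanged while the gap between the median and the maximum shrinks from several orders of magnitude to a few log units.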

renter_occupied

numeric feature high_skew outliers
Counts of renter-occupied housing units per record, ranging from 28 to 1,810,929 with a median of 2,579.5, consistent with a geographic rollup (likely county or similar). The distribution is extremely right-skewed (skew 15.82, kurtosis 398.15), and 449 rows (13.9%) fall outside the IQR fences, reflecting a few very large metros at the top of a long tail of small areas. There are no nulls or zeros, and 2,709 unique values across 3,222 rows. Treatment: log-transform before modelling to tame the skew and heavy outliers. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 2,709
min: 28
max: 1.811e+06
mean: 1.385e+04
median: 2,580
std: 5.535e+04
q1: 1,004
q3: 7,396
iqr: 6,392
skew: 15.82
kurtosis: 398.2
n_outliers: 449
outlier_rate: 0.1394
zero_rate: 0

pct_renter

numeric feature
This is a numeric feature representing the percentage of renters per record, ranging from 3.01 to 100.0 with a mean of 27.35 and median of 26.07. The distribution is moderately right-skewed (skew 1.32, kurtosis 4.41) with 88 outliers (2.7%), presumably mostly on the high end given the positive skew: a small set of records with renter shares far above the typical IQR of 21.64 to 31.66. These may be dense urban areas, although the maximum of 100.0 may instead belong to the single record with zero owner-occupied units. No nulls or zeros, and 1,925 unique values across 3,222 rows indicate well-populated continuous data. Treatment: use as-is, or apply a mild transform (e.g. log or winsorizing) before regression to dampen the right skew. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,925
min: 3.01
max: 100
mean: 27.35
median: 26.07
std: 8.564
q1: 21.64
q3: 31.66
iqr: 10.02
skew: 1.317
kurtosis: 4.412
n_outliers: 88
outlier_rate: 0.02731
zero_rate: 0
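
One way to apply the suggested mild treatment is to clip extreme shares to outlier fences. This is a winsorizing-style variant chosen for illustration, not percentile-based winsorization and not necessarily what the profiler intends. The sketch assumes 1.5 × IQR fences and takes the displayed q1 and q3 at face value; `clip_to_fences` is a hypothetical helper name.

```python
def clip_to_fences(value: float, q1: float, q3: float, k: float = 1.5) -> float:
    """Clip a value into [q1 - k*IQR, q3 + k*IQR]."""
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return max(lower, min(upper, value))


Q1, Q3 = 21.64, 31.66  # quartiles as displayed for pct_renter

# Reported min, median, and max of the column.
clipped = [clip_to_fences(v, Q1, Q3) for v in (3.01, 26.07, 100.0)]
print([round(v, 2) for v in clipped])
```

Under these assumptions the reported minimum of 3.01 lies below the lower fence, so at least one flagged record sits on the low end rather than the high end.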