saturn·

housing crisis merged

source: /home/coolhand/datasets/us-inequality-atlas/housing/housing_crisis_merged.csv · 3,222 rows · 16 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 3,222 U.S. counties (one row per county, identified by FIPS code) with 16 columns spanning housing stock, rent burden, income, and affordability metrics. The headline finding is that the affordability_category field is overwhelmingly imbalanced: 'Affordable' covers 3,192 of 3,222 counties (top_rate 0.99), with only 29 'Moderately Burdened' and 1 'Extremely Burdened', so the label likely needs reworking before it is useful. The rent-burden percentages tell a richer story: pct_rent_burdened_30plus has a mean of 36.4% and pct_rent_burdened_50plus a mean of 17.4%, suggesting real stress that the categorical label hides. The housing-count columns (owner_occupied, renter_occupied, total_housing_units) are extremely right-skewed (skew 9.5 to 15.8), each with more than 400 flagged outliers, reflecting a few very large urban counties; log scales are recommended. Finally, rent_to_income_ratio has an extreme max of 1,200 against a median of 17.06 (skew ~54), which hints at data-quality issues worth checking.

citing: row_count · column_count · affordability_category.top_rate · affordability_category.top_values · pct_rent_burdened_30plus.mean · pct_rent_burdened_50plus.mean · owner_occupied.skew · renter_occupied.skew · total_housing_units.skew · rent_to_income_ratio.max · rent_to_income_ratio.skew · median_household_income.mean

Schema

16 columns
Per-column summary.

column                        type          nulls   unique   alerts
fips                          numeric       0.0%    3,222
county_name                   text          0.0%    3,222    near_unique
total_renters                 numeric       0.0%    2,709    high_skew, outliers
pct_rent_burdened_30plus      numeric       0.0%    2,146
pct_rent_burdened_50plus      numeric       0.0%    1,769
median_gross_rent             numeric       0.3%    983      outliers
median_household_income       numeric       0.0%    3,098    outliers
total_housing_units           numeric       0.0%    3,074    high_skew, outliers
owner_occupied                numeric       0.0%    3,001    high_skew, outliers
renter_occupied               numeric       0.0%    2,709    high_skew, outliers
pct_renter                    numeric       0.0%    1,925
annual_rent                   numeric       0.3%    983      outliers
rent_to_income_ratio          numeric       0.3%    1,269    high_skew
affordability_category        categorical   0.0%    3        imbalance
hours_at_min_wage_for_rent    numeric       0.3%    229      outliers
weeks_at_min_wage_for_rent    numeric       0.3%    71       outliers

fips

numeric identifier
This is the US FIPS county code: all 3,222 values are unique, there are no nulls, and the value range (1001 to 72153) matches the standard 2-digit state + 3-digit county encoding once leading zeros are restored (stored as a number, 01001 appears as 1001). Distribution stats such as mean 31377.89 and skew 0.157 are not meaningful here, since the integers are categorical identifiers rather than quantities. Treatment: Treat as a categorical key, zero-padded to five characters; left-join on this code to bring in county/state attributes rather than using it as a numeric feature. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
min: 1,001 · max: 72,153 · mean: 3.138e+04 · median: 30,022 · std: 1.63e+04
q1: 1.903e+04 · q3: 4.61e+04 · iqr: 27,075
skew: 0.1574 · kurtosis: -0.6314 · n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
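Because the code is stored as a number, leading zeros are lost (state 01 shows up as 1001 rather than 01001). A minimal sketch of restoring the 5-character key before a join follows; the helper names `format_fips` and `split_fips` are illustrative and not part of the report.

```python
def format_fips(code: int) -> str:
    """Restore the 5-character FIPS string from its integer form."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Return (state_fips, county_fips) as zero-padded strings."""
    padded = format_fips(code)
    return padded[:2], padded[2:]


# The min and max values reported in the profile above.
print(split_fips(1001))   # ('01', '001')
print(split_fips(72153))  # ('72', '153')
```

Joining on the padded string avoids silent mismatches against lookup tables that keep FIPS codes as text.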

county_name

text identifier near_unique
This column holds fully-qualified US county names (e.g., 'X County, State'), with the token 'county,' appearing in 2,999 of 3,222 rows and state names like Texas (256), Virginia (189), and Georgia (159) topping the word frequencies. All 3,222 values are unique with zero nulls or duplicates; lengths run from 16 to 59 characters, with 95% at 31 or fewer (mean 24.3). The 223 rows missing the 'county,' token likely correspond to parishes (Louisiana), boroughs and census areas (Alaska), municipios (Puerto Rico, state code 72), or independent cities, and should not be treated as data-quality issues. Treatment: Split into county and state fields and left-join on a county FIPS lookup. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
len_min: 16 · len_max: 59 · len_mean: 24.32 · len_median: 24 · len_p95: 31
word_mean: 3.248 · word_median: 3 · vocab_size: 1,990 · readability_flesch_mean: 10.28
n_empty: 0 · n_duplicates: 0 · duplicate_rate: 0
emoji_rate: 0 · url_rate: 0 · one_word_rate: 0 · allcaps_rate: 0 · boilerplate_rate: 0
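The suggested split into county and state fields can be sketched as below, assuming every value follows the 'Area, State' pattern described above. The function names and the sample strings are illustrative; the sample strings are not rows taken from the report.

```python
def split_county_name(full_name: str) -> tuple[str, str]:
    """Split 'X County, State' into (area, state) at the last comma."""
    area, sep, state = full_name.rpartition(",")
    if not sep:
        raise ValueError(f"no state suffix in {full_name!r}")
    return area.strip(), state.strip()


def has_county_token(full_name: str) -> bool:
    """True when the area part ends in 'County' (false for parishes, boroughs, etc.)."""
    area, _ = split_county_name(full_name)
    return area.lower().endswith(" county")


# Illustrative inputs only.
print(split_county_name("Autauga County, Alabama"))  # ('Autauga County', 'Alabama')
print(has_county_token("Orleans Parish, Louisiana"))  # False
```

Splitting at the last comma keeps any commas inside the area name intact.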

total_renters

numeric feature high_skew outliers
This column reports a count of renters per county, ranging from 28 to 1,810,929 with a median of 2,579.5 and a mean of 13,851.1. The distribution is severely right-skewed (skew 15.82, kurtosis 398.15) and 449 of 3,222 rows (13.9%) flag as outliers, with the std (55,351.6) dwarfing the IQR (6,392). No nulls or zeros are present, and 2,709 of 3,222 values are unique. Every summary statistic matches renter_occupied exactly, so the two columns are probably duplicates and only one should enter a model. Treatment: Log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,709
min: 28 · max: 1.811e+06 · mean: 1.385e+04 · median: 2580 · std: 5.535e+04
q1: 1004 · q3: 7396 · iqr: 6,392
skew: 15.82 · kurtosis: 398.2 · n_outliers: 449 · outlier_rate: 0.1394 · zero_rate: 0
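The report does not say which rule produces n_outliers. As a sketch under the assumption that it is Tukey's 1.5 × IQR fences (a common default, not confirmed here), the fences for this column can be computed from the q1 and q3 values above.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# q1 and q3 for total_renters, taken from the profile above.
lower, upper = tukey_fences(1004, 7396)
print(lower, upper)  # -8584.0 16984.0
```

Under that assumption the lower fence is negative while the column minimum is 28, so all flagged rows would sit in the upper tail, above roughly 17,000 renters.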

pct_rent_burdened_30plus

numeric feature
Percentage of renter households spending 30%+ of income on rent, reported per record (n=3222). Distribution is roughly centered with median 37.36 and IQR 30.67–43.48, mildly left-skewed (-0.57) and ranging 0 to 64.96, with 58 outliers (1.8%) and a small zero_rate of 0.25%. With 2146 unique values out of 3222, granularity is high but not near-unique. Treatment: Use as-is as a numeric feature; no transform needed given near-symmetric, bounded percentage scale. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,146
min: 0 · max: 64.96 · mean: 36.44 · median: 37.36 · std: 10.01
q1: 30.67 · q3: 43.48 · iqr: 12.81
skew: -0.5673 · kurtosis: 0.5032 · n_outliers: 58 · outlier_rate: 0.018 · zero_rate: 0.002483

pct_rent_burdened_50plus

numeric feature
County-level percentage of renter households spending 50%+ of income on rent (severely rent-burdened). Values span 0 to 64.96 with mean 17.35 and median 17.62, and the distribution is nearly symmetric (skew 0.05, kurtosis 0.98) with only 1.5% outliers. About 0.9% of rows are exactly zero and there are no nulls across 3,222 records. The maximum equals the pct_rent_burdened_30plus maximum (64.96); since the 50%+ group is a subset of the 30%+ group, it is worth confirming that no row reports a larger 50%+ share than 30%+ share. Treatment: Use as-is in modelling; no transform needed given the near-symmetric distribution. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,769
min: 0 · max: 64.96 · mean: 17.35 · median: 17.62 · std: 6.577
q1: 13.07 · q3: 21.63 · iqr: 8.557
skew: 0.05436 · kurtosis: 0.9823 · n_outliers: 47 · outlier_rate: 0.01459 · zero_rate: 0.009311
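Households paying 50%+ of income on rent are by definition also paying 30%+, so the 50%+ share should never exceed the 30%+ share in the same county. A minimal sketch of that sanity check follows; the function name and the sample rows are invented for illustration and are not data from the file.

```python
def inconsistent_burden_rows(rows):
    """Return FIPS codes where the 50%+ share exceeds the 30%+ share."""
    return [
        row["fips"]
        for row in rows
        if row["pct_rent_burdened_50plus"] > row["pct_rent_burdened_30plus"]
    ]


# Made-up rows: the second one violates the subset relationship.
sample = [
    {"fips": 1001, "pct_rent_burdened_30plus": 37.4, "pct_rent_burdened_50plus": 17.6},
    {"fips": 1003, "pct_rent_burdened_30plus": 20.0, "pct_rent_burdened_50plus": 25.0},
]
print(inconsistent_burden_rows(sample))  # [1003]
```

An empty result on the real file would confirm the two columns are internally consistent.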

median_gross_rent

numeric feature outliers
Numeric column capturing the median gross rent (presumably USD per month) across 3,222 rows with only 0.31% missing and no zeros. The distribution is right-skewed (skew 1.76, kurtosis 4.55) with median 818 and mean 890.9, and 225 values (7.0%) flagged as outliers stretching up to 2,805 against a Q3 of 978. Treatment: Log-transform or winsorize before regression to tame the right-skew and high-rent outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 10 (0.3%) · unique: 983
min: 297 · max: 2,805 · mean: 890.9 · median: 818 · std: 283.4
q1: 718 · q3: 978 · iqr: 260
skew: 1.763 · kurtosis: 4.55 · n_outliers: 225 · outlier_rate: 0.07005 · zero_rate: 0

median_household_income

numeric feature outliers
Median household income in dollars, almost certainly at a US county or similar geography given n=3222 and the typical 14525-170463 range. Distribution is right-skewed (skew 0.95, kurtosis 2.96) with 187 high-side outliers (5.8%) pulling the mean (62327) above the median (60461). Near-complete coverage with only a 0.03% null rate and no zeros. Treatment: Log-transform before regression to tame the right skew and high-income outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 1 (0.0%) · unique: 3,098
min: 14,525 · max: 170,463 · mean: 6.233e+04 · median: 60,461 · std: 1.777e+04
q1: 51,823 · q3: 70,379 · iqr: 18,556
skew: 0.9478 · kurtosis: 2.962 · n_outliers: 187 · outlier_rate: 0.05806 · zero_rate: 0

total_housing_units

numeric feature high_skew outliers
Counts of total housing units per record, almost certainly aggregated to a geography (likely US counties given n=3222). The distribution is severely right-skewed (skew 12.05, kurtosis 240.5) with a median of 10,021 but a max of 3,363,093, and 443 rows (13.7%) flag as outliers — consistent with a few massive metros dwarfing thousands of small areas. No nulls or zeros, and 3,074 of 3,222 values are unique. Treatment: log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,074
min: 32 · max: 3.363e+06 · mean: 3.94e+04 · median: 10,021 · std: 1.201e+05
q1: 4211 · q3: 25,939 · iqr: 2.173e+04
skew: 12.05 · kurtosis: 240.5 · n_outliers: 443 · outlier_rate: 0.1375 · zero_rate: 0
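The rounded means in this report suggest that total_housing_units may simply be owner_occupied plus renter_occupied (that is, occupied units only): 2.555e+04 + 1.385e+04 = 3.94e+04. The maxima add up the same way, which would follow if one county tops all three columns. This is an inference from summary statistics, not something the report states, and the sketch below only checks the rounded figures.

```python
import math

# Rounded summary statistics copied from the per-column profiles in this report.
profile = {
    "owner_occupied": {"mean": 2.555e4, "max": 1.552e6},
    "renter_occupied": {"mean": 1.385e4, "max": 1.811e6},
    "total_housing_units": {"mean": 3.94e4, "max": 3.363e6},
}


def sums_match(stat: str, rel_tol: float = 1e-3) -> bool:
    """Check whether owner + renter reproduces the total for one statistic."""
    combined = profile["owner_occupied"][stat] + profile["renter_occupied"][stat]
    return math.isclose(combined, profile["total_housing_units"][stat], rel_tol=rel_tol)


print(sums_match("mean"), sums_match("max"))  # True True
```

A row-level check on the file itself (owner + renter == total for every county) would settle whether the column is redundant.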

owner_occupied

numeric feature high_skew outliers
Count of owner-occupied housing units per county, with 3,001 unique values across 3,222 rows, no nulls, and a single zero (zero_rate 0.0003, i.e. 1 row). The distribution is severely right-skewed (skew 9.52, kurtosis 146.9): the median is 7,325.5 but the mean is 25,551.7 and the max reaches 1,552,164, producing 429 outliers (13.3%). Treatment: Log-transform before modelling to tame the heavy right tail, using log(1 + x) because the zero row makes a plain log undefined. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,001
min: 0 · max: 1.552e+06 · mean: 2.555e+04 · median: 7326 · std: 6.755e+04
q1: 3148 · q3: 1.886e+04 · iqr: 1.572e+04
skew: 9.516 · kurtosis: 146.9 · n_outliers: 429 · outlier_rate: 0.1331 · zero_rate: 0.0003104
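Since the column minimum is 0, a plain logarithm would fail on at least one row. A small sketch of the log(1 + x) variant, applied to the five-number summary above rather than to real rows:

```python
import math


def log1p_counts(values):
    """Apply log(1 + x) so that zero counts map to 0.0 instead of raising."""
    return [math.log1p(v) for v in values]


# min, q1, median, q3, max for owner_occupied, from the profile above.
summary = [0, 3148, 7326, 1.886e4, 1.552e6]
transformed = log1p_counts(summary)

print(transformed[0])  # 0.0
print(all(a < b for a, b in zip(transformed, transformed[1:])))  # True
```

The transform preserves ordering while compressing a range of six orders of magnitude into a span a linear model can handle more easily.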

renter_occupied

numeric feature high_skew outliers
Counts of renter-occupied housing units per county, ranging from 28 to 1,810,929 with a median of just 2,579.5. The distribution is severely right-skewed (skew 15.82, kurtosis 398.15) with 449 outliers (13.9% of rows), consistent with a few very large counties dominating an otherwise small-county distribution. There are no nulls or zeros, and 2,709 of 3,222 values are unique. All summary statistics are identical to those of total_renters, so the two columns appear to be duplicates. Treatment: Log-transform before regression to tame the extreme right skew. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,709
min: 28 · max: 1.811e+06 · mean: 1.385e+04 · median: 2580 · std: 5.535e+04
q1: 1004 · q3: 7396 · iqr: 6,392
skew: 15.82 · kurtosis: 398.2 · n_outliers: 449 · outlier_rate: 0.1394 · zero_rate: 0

pct_renter

numeric feature
Percentage of renter-occupied housing units across 3,222 records, ranging from 3.01 to 100.0 with a mean of 27.35 and median of 26.07. The distribution is right-skewed (skew 1.32, kurtosis 4.41) with 88 outliers (2.7%); the 100.0 maximum stands out against a Q3 of 31.66 and suggests at least one all-renter locality. Treatment: Use as-is, or apply a mild transform (e.g., logit on the 0-100 scale) before linear models given the right skew; a logit needs values clipped away from 0 and 100 first, because it is undefined at the 100.0 maximum. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,925
min: 3.01 · max: 100 · mean: 27.35 · median: 26.07 · std: 8.564
q1: 21.64 · q3: 31.66 · iqr: 10.02
skew: 1.317 · kurtosis: 4.412 · n_outliers: 88 · outlier_rate: 0.02731 · zero_rate: 0
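A logit on the 0-100 scale is undefined at exactly 0 or 100, and this column's maximum is 100. One way to handle that, sketched here, is to clip by a small margin first; the margin `eps=0.5` is an arbitrary illustrative choice, not a value from the report.

```python
import math


def logit_pct(pct: float, eps: float = 0.5) -> float:
    """Logit of a 0-100 percentage, clipped to [eps, 100 - eps] to stay finite."""
    clipped = min(max(pct, eps), 100.0 - eps)
    return math.log(clipped / (100.0 - clipped))


print(logit_pct(50.0))                       # 0.0
print(math.isfinite(logit_pct(100.0)))       # True
print(logit_pct(100.0) == logit_pct(99.5))   # True
```

The choice of margin changes how far the clipped extremes sit from the bulk of the data, so it is worth testing model sensitivity to it.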

annual_rent

numeric feature outliers
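Annual rent in the same currency units as median_gross_rent, with 983 distinct values ranging from 3,564 to 33,660 and a median of 9,816. The min, quartiles, median, and max are each exactly 12 times the corresponding median_gross_rent statistic, and the null count, unique count, skew, and kurtosis are identical, so this column appears to be derived as 12 × median_gross_rent and adds no independent information. The distribution is right-skewed (skew 1.76, kurtosis 4.55) and 225 rows (7.0%) sit beyond the outlier fences, with a long tail of high-rent cases above the Q3 of 11,736. Nulls are negligible (0.31%) and there are no zero values. Treatment: Log-transform before regression to dampen the right skew and high-rent outliers, and avoid using it alongside median_gross_rent. high · anthropic:claude-opus-4-7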
n: 3,222 · nulls: 10 (0.3%) · unique: 983
min: 3,564 · max: 33,660 · mean: 1.069e+04 · median: 9,816 · std: 3400
q1: 8,616 · q3: 11,736 · iqr: 3,120
skew: 1.763 · kurtosis: 4.55 · n_outliers: 225 · outlier_rate: 0.07005 · zero_rate: 0
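The order statistics of this column line up with those of median_gross_rent multiplied by 12, which suggests (but does not prove) that annual_rent is a derived column. A quick arithmetic check on the figures reported in the two profiles:

```python
# Order statistics copied from the median_gross_rent and annual_rent profiles.
monthly = {"min": 297, "q1": 718, "median": 818, "q3": 978, "max": 2805}
annual = {"min": 3564, "q1": 8616, "median": 9816, "q3": 11736, "max": 33660}

# Collect any statistic where 12 x monthly does not reproduce the annual value.
mismatches = {
    name: (12 * monthly[name], annual[name])
    for name in monthly
    if 12 * monthly[name] != annual[name]
}
print(mismatches)  # {}
```

If a row-level comparison on the file agrees, one of the two columns can be dropped to avoid perfect collinearity.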

rent_to_income_ratio

numeric feature high_skew
Likely a rent-to-income ratio expressed as a percentage, with a tight interquartile band between 15.1 and 19.39 and median 17.06. The distribution is severely contaminated: skew of 53.98 and kurtosis of 3007 are driven by a max of 1200.0 against a mean of 17.89, and 107 outliers (3.33%) sit far outside the IQR of 4.29. Nulls are negligible at 0.28% and there are no zeros, but the extreme tail suggests data-entry errors or unit inconsistencies. Treatment: Winsorize or cap extreme values and log-transform before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 9 (0.3%) · unique: 1,269
min: 6.1 · max: 1,200 · mean: 17.89 · median: 17.06 · std: 21.2
q1: 15.1 · q3: 19.39 · iqr: 4.29
skew: 53.98 · kurtosis: 3007 · n_outliers: 107 · outlier_rate: 0.0333 · zero_rate: 0
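The winsorizing treatment mentioned above can be sketched with a simple quantile clamp. The cut-offs (1st and 99th percentile) and the sample values are illustrative assumptions: the sample is built from the quartile figures in this profile plus one extreme value, and is not a set of real rows.

```python
def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clamp values to the empirical lower_q and upper_q quantiles (floor-index rule)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


# Synthetic sample: 99 typical ratios and one value at the reported max of 1200.
sample = [15.1] * 25 + [17.06] * 49 + [19.39] * 25 + [1200.0]
capped = winsorize(sample)

print(max(sample), max(capped))  # 1200.0 19.39
```

Capping keeps the row in the dataset; if the extreme value turns out to be a data-entry error, setting it to missing may be the more honest fix.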

affordability_category

categorical label imbalance
A 3-level categorical flag bucketing counties into housing affordability tiers. The distribution is extremely degenerate: 'Affordable' covers 3,192 of 3,222 rows (top_rate 0.9907), 'Moderately Burdened' has 29, and 'Extremely Burdened' has just 1, yielding an entropy_ratio of 0.049. As a predictor it carries almost no information, and the single 'Extremely Burdened' row cannot appear in both the training and test partitions of any split. Treatment: Collapse to a binary Affordable vs. Burdened flag or drop; near-constant as-is. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3
top_value: Affordable · top_rate: 0.9907 · cardinality: 3
entropy: 0.07815 · entropy_ratio: 0.04931
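The entropy figures can be reproduced from the three class counts, assuming entropy is measured in bits and entropy_ratio is entropy divided by log2 of the number of classes. The report does not define either term, but this reading matches its numbers. The `collapse` helper illustrates the suggested binary recoding and is a hypothetical name.

```python
import math

# Class counts as stated in the column summary above.
counts = {"Affordable": 3192, "Moderately Burdened": 29, "Extremely Burdened": 1}


def entropy_bits(class_counts):
    """Shannon entropy in bits of a categorical distribution given by counts."""
    total = sum(class_counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in class_counts.values() if c > 0
    )


def collapse(label: str) -> str:
    """Merge the two burdened tiers into a single 'Burdened' class."""
    return "Affordable" if label == "Affordable" else "Burdened"


h = entropy_bits(counts)
ratio = h / math.log2(len(counts))
print(round(h, 5), round(ratio, 5))  # 0.07815 0.04931
```

Even after collapsing, the minority class holds only 30 of 3,222 rows, so the binary flag remains heavily imbalanced.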

hours_at_min_wage_for_rent

numeric feature outliers
This column reports the number of minimum-wage work hours required to afford rent, with values ranging from 41 to 387 (median 113, mean 122.9). The distribution is right-skewed (skew 1.76, kurtosis 4.55) and 222 rows (6.9%) flag as outliers in the upper tail, suggesting a subset of high-cost areas where rent demands far more hours than typical. Nulls are negligible (0.31%) and there are no zeros, so coverage is essentially complete. Treatment: Log-transform or winsorize before regression to dampen the right-tail outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 10 (0.3%) · unique: 229
min: 41 · max: 387 · mean: 122.9 · median: 113 · std: 39.09
q1: 99 · q3: 135 · iqr: 36
skew: 1.763 · kurtosis: 4.546 · n_outliers: 222 · outlier_rate: 0.06912 · zero_rate: 0
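The report does not state which minimum wage was used. As a sketch, assuming the US federal rate of $7.25 per hour and rounding to whole hours, the order statistics of median_gross_rent reproduce the ones shown for this column.

```python
ASSUMED_MIN_WAGE = 7.25  # assumption: US federal minimum wage, not stated in the report


def hours_for_rent(monthly_rent: float, wage: float = ASSUMED_MIN_WAGE) -> int:
    """Whole hours of work at the given wage needed to cover one month's rent."""
    return round(monthly_rent / wage)


# Order statistics copied from the median_gross_rent and hours profiles.
rent_stats = {"min": 297, "q1": 718, "median": 818, "q3": 978, "max": 2805}
hours_stats = {"min": 41, "q1": 99, "median": 113, "q3": 135, "max": 387}

derived = {name: hours_for_rent(rent) for name, rent in rent_stats.items()}
print(derived == hours_stats)  # True
```

If the same relationship holds row by row, this column is a rescaled copy of median_gross_rent and shares its skew and outliers.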

weeks_at_min_wage_for_rent

numeric feature outliers
This column reports the number of weeks of minimum-wage work needed to cover rent, ranging from 1.0 to 9.7 with a median of 2.8 and IQR of 0.9. The distribution is right-skewed (skew 1.76, kurtosis 4.57) and 222 rows (6.9%) flag as outliers on the high end, pointing to localities where rent dramatically outpaces minimum wage. Nulls are negligible (0.31%) and only 71 unique values appear across 3222 rows, suggesting rounded or coarsely binned figures. Treatment: Log-transform or winsorize before regression to dampen the right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 10 (0.3%) · unique: 71
min: 1 · max: 9.7 · mean: 3.072 · median: 2.8 · std: 0.9775
q1: 2.5 · q3: 3.4 · iqr: 0.9
skew: 1.763 · kurtosis: 4.567 · n_outliers: 222 · outlier_rate: 0.06912 · zero_rate: 0
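The low unique count is consistent with this column being hours_at_min_wage_for_rent divided by a 40-hour week and rounded to one decimal. That derivation is an assumption, not something the report states, but it reproduces the order statistics shown above.

```python
HOURS_PER_WEEK = 40  # assumption: a standard 40-hour work week


def weeks_from_hours(hours: float) -> float:
    """Convert hours of work into weeks, rounded to one decimal place."""
    return round(hours / HOURS_PER_WEEK, 1)


# Order statistics copied from the hours and weeks profiles.
hours_stats = {"min": 41, "q1": 99, "median": 113, "q3": 135, "max": 387}
weeks_stats = {"min": 1.0, "q1": 2.5, "median": 2.8, "q3": 3.4, "max": 9.7}

derived = {name: weeks_from_hours(h) for name, h in hours_stats.items()}
print(derived == weeks_stats)  # True
```

Because of the rounding, this column carries strictly less information than the hours column, so a model needs only one of the two.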