saturn·

housing housing crisis counties

source /home/coolhand/html/datavis/data_trove/demographic/housing/housing_crisis_counties.csv 3,222 rows 16 columns profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 3,222 US counties with 16 columns describing housing affordability: rents, incomes, renter shares, and rent-burden percentages. Several core numeric fields (annual_rent, median_gross_rent, median_household_income, rent_to_income_ratio) contain extreme negative sentinel values such as -666666666 and -7999999992, which drag the means deeply negative and produce skew of -17 to -56; recode these to null or filter them out before any analysis. The same contamination appears in hours_at_min_wage_for_rent and weeks_at_min_wage_for_rent. The affordability_category field is heavily imbalanced: 'Affordable' covers 99.1% of counties and only 1 county is labeled 'Extremely Burdened', which suggests the categorization rule may be miscalibrated. The rent-burden percentage columns (pct_rent_burdened_30plus, median 37.4%; pct_rent_burdened_50plus, median 17.6%) show no sentinel values and look like the cleanest signals to start with.

citing: annual_rent · median_gross_rent · median_household_income · rent_to_income_ratio · affordability_category · pct_rent_burdened_30plus · pct_rent_burdened_50plus · pct_renter
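A minimal sketch of the sentinel cleanup described in the summary, using only the Python standard library. It assumes that none of the listed columns can legitimately be negative; the helper name `null_negatives` and the two sample rows are made up for illustration and are not taken from the file.

```python
# Assumption: rents, incomes, and hours can never be negative, so any
# negative value in these columns is treated as a missing-data sentinel.
SENTINEL_COLUMNS = ["median_gross_rent", "annual_rent", "median_household_income"]

def null_negatives(rows, columns=SENTINEL_COLUMNS):
    """Return copies of the rows with negative values in `columns` set to None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            value = new_row.get(col)
            if value is not None and value < 0:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned

# Illustrative rows only: one valid record, one carrying the reported sentinels.
sample = [
    {"fips": 1001, "median_gross_rent": 1000, "annual_rent": 12000},
    {"fips": 1003, "median_gross_rent": -666666666, "annual_rent": -7999999992},
]
cleaned = null_negatives(sample)
print(cleaned[1])
```

Nulling rather than dropping rows keeps the county available for columns that are not contaminated, such as the rent-burden percentages.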

Schema

16 columns
Per-column summary.
column · type · nulls · unique · alerts
fips · numeric · 0.0% · 3,222 · none
county_name · text · 0.0% · 3,222 · near_unique
total_renters · numeric · 0.0% · 2,709 · high_skew, outliers
pct_rent_burdened_30plus · numeric · 0.0% · 2,146 · none
pct_rent_burdened_50plus · numeric · 0.0% · 1,769 · none
median_gross_rent · numeric · 0.0% · 984 · high_skew, outliers
median_household_income · numeric · 0.0% · 3,099 · high_skew, outliers
total_housing_units · numeric · 0.0% · 3,074 · high_skew, outliers
owner_occupied · numeric · 0.0% · 3,001 · high_skew, outliers
renter_occupied · numeric · 0.0% · 2,709 · high_skew, outliers
pct_renter · numeric · 0.0% · 1,925 · none
annual_rent · numeric · 0.0% · 984 · high_skew, outliers
rent_to_income_ratio · numeric · 0.0% · 1,278 · high_skew
affordability_category · categorical · 0.0% · 3 · imbalance
hours_at_min_wage_for_rent · numeric · 0.0% · 230 · high_skew, outliers
weeks_at_min_wage_for_rent · numeric · 0.0% · 72 · high_skew, outliers

fips

numeric identifier
This column is the US county FIPS code, a geographic identifier that is unique per row (n=3222, n_unique=3222). Values span 1001 to 72153 with no nulls or zeros. Because the column is stored as a number, leading zeros are lost: Alabama's 01001 appears as 1001, which is why codes show up with 4 or 5 digits, and the 72xxx range is Puerto Rico. The numeric statistics (mean 31377, skew 0.16) are not meaningful here since the digits encode geography, not magnitude. Treatment: Treat as a categorical geographic key, zero-padded to 5 characters; left-join on it to bring in county-level attributes rather than using it as a numeric feature. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
min: 1,001 · max: 72,153 · mean: 3.138e+04 · median: 30,022 · std: 1.63e+04 · q1: 1.903e+04 · q3: 4.61e+04 · iqr: 27,075
skew: 0.1574 · kurtosis: -0.6314 · n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
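Since the codes are stored as numbers, a join against tables that keep FIPS as text needs the leading zero restored. The sketch below shows one way to do that; the function names `to_fips5` and `split_fips` are illustrative, not part of any profiler API.

```python
def to_fips5(code):
    """Format an integer county FIPS code as a zero-padded 5-character string."""
    return f"{int(code):05d}"

def split_fips(code):
    """Return (state_fips, county_fips) as 2- and 3-character strings."""
    fips5 = to_fips5(code)
    return fips5[:2], fips5[2:]

print(to_fips5(1001))     # 01001
print(split_fips(72153))  # ('72', '153')
```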

county_name

text identifier near_unique
This column holds fully qualified US county names in the form 'X County, State'. The token 'county,' appears in 2999 of 3222 rows, and the most frequent state tokens, which follow the comma, are Texas (256), Virginia (189), and Georgia (159). All 3222 values are unique, with no nulls or duplicates. Lengths run from 16 to 59 characters, with a median of 24 and a 95th percentile of 31. The 223 rows missing the 'county,' token likely correspond to parishes (Louisiana), boroughs (Alaska), independent cities, or Puerto Rico municipios; this is worth confirming before any string join. Treatment: Split into county and state fields, then use as a join key against FIPS or census tables. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
len_min: 16 · len_max: 59 · len_mean: 24.32 · len_median: 24 · len_p95: 31 · word_mean: 3.248 · word_median: 3
n_empty: 0 · n_duplicates: 0 · duplicate_rate: 0 · vocab_size: 1,990 · readability_flesch_mean: 10.28
emoji_rate: 0 · url_rate: 0 · one_word_rate: 0 · allcaps_rate: 0 · boilerplate_rate: 0
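The split suggested in the treatment can be sketched as follows. Splitting on the last comma, rather than on the word 'County', means parishes, boroughs, and independent cities are handled the same way. The function name and the example strings are illustrative; they follow the 'X County, State' pattern described above but were not read from the file.

```python
def split_county_name(full_name):
    """Split 'X County, State' on its last comma into (county, state)."""
    county, sep, state = full_name.rpartition(",")
    if not sep:
        raise ValueError(f"no comma found in {full_name!r}")
    return county.strip(), state.strip()

print(split_county_name("Autauga County, Alabama"))
print(split_county_name("Orleans Parish, Louisiana"))
```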

total_renters

numeric feature high_skew outliers
A count of renters per county, ranging from 28 to 1,810,929 with a median of 2,579.5 but a mean of 13,851, which is typical of right-tailed population or household data. The distribution is severely skewed (skew 15.82, kurtosis 398.15) with 449 outliers (13.9% of rows) and a standard deviation (55,351) far exceeding the IQR (6,392). There are no nulls or zeros, and 2,709 unique values across 3,222 rows are consistent with county-level aggregates. Every reported statistic matches renter_occupied exactly, so the two columns are probably duplicates. Treatment: log-transform before regression to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,709
min: 28 · max: 1.811e+06 · mean: 1.385e+04 · median: 2580 · std: 5.535e+04 · q1: 1004 · q3: 7396 · iqr: 6,392
skew: 15.82 · kurtosis: 398.2 · n_outliers: 449 · outlier_rate: 0.1394 · zero_rate: 0
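To illustrate what the log transform does to this tail, the sketch below applies a base-10 logarithm to three of the reported summary values. The choice of base 10 is an assumption made for readability; a natural log would serve equally well for modelling.

```python
import math

# Reported summary values for total_renters (min, median, max).
reported = {"min": 28, "median": 2579.5, "max": 1_810_929}

# On the raw scale the max is roughly 700 times the median; on the log10
# scale the three values sit within about five units of each other.
logged = {name: math.log10(value) for name, value in reported.items()}

for name, value in logged.items():
    print(f"{name}: {value:.2f}")
```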

pct_rent_burdened_30plus

numeric feature
This appears to be the share of renter households spending 30% or more of income on rent, expressed as a percentage (0 to 64.96, mean 36.44, median 37.36). The distribution is mildly left-skewed (-0.57) and tightly concentrated, with an IQR of 12.81 (Q1 30.67, Q3 43.48). Only 0.25% of rows (8) are zero and 1.8% (58) are flagged as outliers, suggesting the metric is well populated and behaves consistently across the 3,222 rows. Treatment: Use as-is as a continuous feature; no transformation is needed given the bounded, close-to-symmetric distribution. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,146
min: 0 · max: 64.96 · mean: 36.44 · median: 37.36 · std: 10.01 · q1: 30.67 · q3: 43.48 · iqr: 12.81
skew: -0.5673 · kurtosis: 0.5032 · n_outliers: 58 · outlier_rate: 0.018 · zero_rate: 0.002483

pct_rent_burdened_50plus

numeric feature
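This column reports the percentage of households spending 50% or more of income on rent, observed for 3,222 counties with no nulls. The distribution is roughly symmetric (skew 0.054, kurtosis 0.98) and centered near a 17.35% mean and 17.62% median, with an IQR of 8.56 points. The 47 flagged outliers (1.46%) are probably not all high values: if the profiler uses 1.5×IQR fences, the lower fence is about 0.23, so the roughly 30 rows (0.93%) that are exactly zero fall below it. Those zeros may reflect very small counties. The maximum of 64.96% equals the maximum of pct_rent_burdened_30plus, which is worth verifying, since it implies a county where every rent-burdened household is severely burdened. Treatment: Use as-is for modelling; no transform is needed given the near-symmetric distribution, but inspect the zero rows and consider winsorizing the high-end outliers. high · anthropic:claude-opus-4-7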
n: 3,222 · nulls: 0 (0.0%) · unique: 1,769
min: 0 · max: 64.96 · mean: 17.35 · median: 17.62 · std: 6.577 · q1: 13.07 · q3: 21.63 · iqr: 8.557
skew: 0.05436 · kurtosis: 0.9823 · n_outliers: 47 · outlier_rate: 0.01459 · zero_rate: 0.009311
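The reported quartiles are enough to check where outlier fences would fall. The sketch below assumes the common 1.5×IQR (Tukey) rule; the report does not state which multiplier the profiler uses, so `k=1.5` is an assumption.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Reported quartiles for pct_rent_burdened_50plus.
low, high = iqr_fences(13.07, 21.63)

# Number of rows that are exactly zero, from zero_rate and n.
zero_rows = round(0.009311 * 3222)

print(f"lower fence {low:.2f}, upper fence {high:.2f}, zero rows {zero_rows}")
```

Under that assumption the lower fence is slightly above zero, so the zero rows would be counted as low-side outliers alongside the high-side ones.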

median_gross_rent

numeric feature high_skew outliers
This column reports median gross rent in dollars, with a typical value near the median of 817.5 and an interquartile range of 718 to 978. The data is corrupted by sentinel values: the minimum is -666666666 and the mean is -2068220 with std 37088473, producing extreme negative skew (-17.87) and kurtosis (317.20). The mean implies only about 10 sentinel rows (3,222 × -2068220 is roughly 10 × -666666666), so most of the 235 flagged outliers (7.3%) are probably genuine high or low rents rather than sentinel codes. Treatment: Replace negative sentinel values with nulls before any modelling or aggregation. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 984
min: -6.667e+08 · max: 2,805 · mean: -2.068e+06 · median: 817.5 · std: 3.709e+07 · q1: 718 · q3: 978 · iqr: 260
skew: -17.87 · kurtosis: 317.2 · n_outliers: 235 · outlier_rate: 0.07294 · zero_rate: 0
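The number of sentinel rows can be estimated from the reported mean alone. The sketch below is back-of-the-envelope arithmetic, not a count taken from the file: it ignores the contribution of valid rents, which sum to only a few million dollars and barely move the result.

```python
n = 3222
reported_mean = -2068220   # mean of median_gross_rent from the profile
sentinel = -666666666      # reported minimum, assumed to be the sentinel code

# Total of the column, then how many sentinel values would produce it.
column_total = n * reported_mean
estimated_sentinel_rows = column_total / sentinel

print(round(estimated_sentinel_rows))
```

An actual count on the raw file (rows where the value equals the sentinel) should be used to confirm the estimate.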

median_household_income

numeric feature high_skew outliers
This column reports median household income per county, with 3099 unique values and no nulls. The minimum of -666666666 is a classic sentinel for missing data; the mean of -144603, against a median of 60458.5, is consistent with a single sentinel row, and skew of -56.7 and kurtosis of 3216 confirm the contamination. The quartiles are barely affected by one bad row: the IQR of 18561.5 between 51814.75 and 70376.25 looks like a plausible income distribution, with 188 flagged outliers (5.8%). Treatment: Recode the -666666666 sentinel to null, then consider a log or robust scaler before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,099
min: -6.667e+08 · max: 170,463 · mean: -1.446e+05 · median: 6.046e+04 · std: 1.175e+07 · q1: 5.181e+04 · q3: 7.038e+04 · iqr: 1.856e+04
skew: -56.74 · kurtosis: 3217 · n_outliers: 188 · outlier_rate: 0.05835 · zero_rate: 0
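One reading of the "robust scaler" mentioned in the treatment is to center on the median and divide by the IQR, both of which are insensitive to a single sentinel row. The sketch below uses the reported median and IQR as constants; in practice they would be recomputed after the sentinel is nulled. The function name is illustrative.

```python
MEDIAN_INCOME = 60458.5   # reported median
IQR_INCOME = 18561.5      # reported interquartile range

def robust_scale(value):
    """Scale an income as (value - median) / IQR; pass missing values through."""
    if value is None:
        return None
    return (value - MEDIAN_INCOME) / IQR_INCOME

print(robust_scale(60458.5))            # 0.0 at the median
print(round(robust_scale(170463), 2))   # the reported maximum
```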

total_housing_units

numeric feature high_skew outliers
Counts of housing units per county, with a median of 10,021 units across 3,222 rows. The distribution is severely right-skewed (skew 12.05, kurtosis 240.5) with a max of 3,363,093 against a Q3 of just 25,939, and 13.7% of rows (443) are flagged as outliers. There are no nulls or zeros, and 3,074 unique values out of 3,222 mean that totals are nearly distinct per county. Treatment: Log-transform before regression to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,074
min: 32 · max: 3.363e+06 · mean: 3.94e+04 · median: 10,021 · std: 1.201e+05 · q1: 4211 · q3: 25,939 · iqr: 2.173e+04
skew: 12.05 · kurtosis: 240.5 · n_outliers: 443 · outlier_rate: 0.1375 · zero_rate: 0

owner_occupied

numeric feature high_skew outliers
Likely a count of owner-occupied housing units per county, given the integer-like range from 0 to 1,552,164 and median of 7,325.5. The distribution is severely right-skewed (skew 9.52, kurtosis 146.9) with 429 outliers (13.3% rate) and a mean (25,551.7) far above the median, indicating a long tail of high-population areas. Near-unique values (3,001 of 3,222) and a single zero row (zero_rate 0.03%) are consistent with a per-county count rather than a categorical flag. Treatment: log-transform before regression to tame the heavy right tail, using log(1 + x) so the zero row stays defined. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,001
min: 0 · max: 1.552e+06 · mean: 2.555e+04 · median: 7326 · std: 6.755e+04 · q1: 3148 · q3: 1.886e+04 · iqr: 1.572e+04
skew: 9.516 · kurtosis: 146.9 · n_outliers: 429 · outlier_rate: 0.1331 · zero_rate: 0.0003104
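Because this column has a minimum of 0, a plain logarithm would fail on that row. A small sketch using `math.log1p`, which computes log(1 + x), is shown below on the reported five-number summary (values rounded as in the profile).

```python
import math

# Reported min, q1, median, q3, max for owner_occupied (rounded).
summary = [0, 3148, 7326, 18860, 1552164]

# log1p(0) is 0.0, so the zero row stays defined instead of raising an error.
logged = [math.log1p(value) for value in summary]

print([round(v, 2) for v in logged])
```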

renter_occupied

numeric feature high_skew outliers
Counts of renter-occupied housing units per county, ranging from 28 to 1,810,929 with a median of 2,579.5 but a mean of 13,851. The distribution is severely right-skewed (skew 15.82, kurtosis 398.15) and 13.9% of rows fall outside the IQR fence, consistent with a small number of very large counties dominating the tail. Every reported statistic is identical to total_renters, so one of the two columns is probably redundant. Treatment: Log-transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,709
min: 28 · max: 1.811e+06 · mean: 1.385e+04 · median: 2580 · std: 5.535e+04 · q1: 1004 · q3: 7396 · iqr: 6,392
skew: 15.82 · kurtosis: 398.2 · n_outliers: 449 · outlier_rate: 0.1394 · zero_rate: 0
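The statistics reported for renter_occupied and total_renters are identical, which suggests the columns hold the same values. A quick row-by-row check could look like the sketch below; the helper name and the two sample rows (built from the reported min and max) are illustrative.

```python
def columns_identical(rows, first, second):
    """Return True if two columns hold the same value in every row."""
    return all(row[first] == row[second] for row in rows)

# Illustrative rows only, using the reported min and max.
rows = [
    {"total_renters": 28, "renter_occupied": 28},
    {"total_renters": 1810929, "renter_occupied": 1810929},
]

print(columns_identical(rows, "total_renters", "renter_occupied"))
```

If the check passes on the real file, dropping one column avoids perfectly collinear features.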

pct_renter

numeric feature
Percent of renter-occupied housing units, reported per county across 3222 records with no nulls and no zeros. Values span 3.01 to 100.0 with a mean of 27.35 and median of 26.07, and the distribution is right-skewed (skew 1.32, kurtosis 4.41) with 88 outliers (2.7%), mostly on the high side. The 100.0 maximum is worth checking: owner_occupied has exactly one zero row, which would produce a fully renter county. Treatment: Mild right skew; consider a log1p or sqrt transform before linear modelling and inspect the 100.0 cases. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,925
min: 3.01 · max: 100 · mean: 27.35 · median: 26.07 · std: 8.564 · q1: 21.64 · q3: 31.66 · iqr: 10.02
skew: 1.317 · kurtosis: 4.412 · n_outliers: 88 · outlier_rate: 0.02731 · zero_rate: 0
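A sketch of how a 100.0 value can arise, assuming pct_renter is computed as renter-occupied units divided by all occupied units. That formula is an assumption, since the report does not state the derivation, and the inputs below are made up.

```python
def pct_renter(owner_occupied, renter_occupied):
    """Share of occupied units that are renter-occupied, as a percentage."""
    occupied = owner_occupied + renter_occupied
    if occupied == 0:
        return None  # undefined when there are no occupied units
    return 100 * renter_occupied / occupied

print(pct_renter(0, 28))       # no owner-occupied units at all
print(pct_renter(7500, 2500))  # a more typical mix
```

Recomputing the column this way and comparing it with the stored values would confirm both the formula and the 100.0 row.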

annual_rent

numeric feature high_skew outliers
Likely an annual rent amount in dollars, with a typical value near the median of 9810 and an interquartile band from 8616 to 11736. Every reported statistic is 12 times the corresponding median_gross_rent value (median 817.5, quartiles 718 and 978, max 2805), so the column appears to be derived as 12 × median_gross_rent. It inherits the same sentinel: the min of -7999999992 equals 12 × -666666666, and the mean of -24818640.7 is impossible for rent, driving extreme skew (-17.87) and kurtosis (317.2). About 7.3% of rows (235) are flagged as outliers, while 0% are null or zero, suggesting missing values were encoded as large negatives rather than NaN. Treatment: Replace large-magnitude negatives with NaN, or recompute from a cleaned median_gross_rent, then winsorize or log-transform before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 984
min: -8e+09 · max: 33,660 · mean: -2.482e+07 · median: 9,810 · std: 4.451e+08 · q1: 8,616 · q3: 11,736 · iqr: 3,120
skew: -17.87 · kurtosis: 317.2 · n_outliers: 235 · outlier_rate: 0.07294 · zero_rate: 0
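The reported figures are consistent with annual_rent being 12 times median_gross_rent, which also explains the odd sentinel value. The sketch below checks that arithmetic and shows a derivation that propagates missing values instead of multiplying the sentinel; the function name is hypothetical.

```python
RENT_SENTINEL = -666666666  # reported minimum of median_gross_rent

def annual_rent(monthly_rent):
    """Return 12 x monthly rent, or None when the input is missing or negative."""
    if monthly_rent is None or monthly_rent < 0:
        return None
    return 12 * monthly_rent

# Multiplying the sentinel by 12 reproduces the reported annual_rent minimum.
print(12 * RENT_SENTINEL)          # -7999999992
print(annual_rent(817.5))          # 9810.0, the reported annual_rent median
print(annual_rent(RENT_SENTINEL))  # None
```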

rent_to_income_ratio

numeric feature high_skew
Likely a rent-to-income ratio expressed as a percentage, with a tight interquartile range between 15.07 and 19.3875 and a median of 17.05 that suggests typical values are well behaved. However, the column is severely corrupted: the minimum is -24357569.09, the mean is -37244.13, std is 752361.7, skew is -22.74 and kurtosis is 570.21, indicating extreme negative values that are implausible for a ratio. These are consistent with the ratio being computed from sentinel inputs: a sentinel annual_rent divided by a valid income gives a huge negative, and the suspicious max of 1200.0 is exactly what results when both annual_rent (-7999999992) and median_household_income (-666666666) carry the sentinel. 114 outliers (3.54%) are flagged. Treatment: Null the ratio wherever either input is a sentinel rather than clipping, recompute from cleaned inputs, then consider a robust scaler or log-transform before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,278
min: -2.436e+07 · max: 1,200 · mean: -3.724e+04 · median: 17.05 · std: 7.524e+05 · q1: 15.07 · q3: 19.39 · iqr: 4.317
skew: -22.74 · kurtosis: 570.2 · n_outliers: 114 · outlier_rate: 0.03538 · zero_rate: 0
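The extreme values in this column can be reproduced from the sentinels in its likely inputs. The sketch below assumes the ratio is 100 × annual_rent / median_household_income; that formula is an inference from the reported numbers, not something the report states, and the income of 32,844 is a value chosen because it reproduces the reported minimum.

```python
RENT_SENTINEL = -7999999992    # reported minimum of annual_rent
INCOME_SENTINEL = -666666666   # reported minimum of median_household_income

def rent_to_income_pct(annual_rent, income):
    """Assumed derivation: annual rent as a percentage of household income."""
    return 100 * annual_rent / income

both_bad = rent_to_income_pct(RENT_SENTINEL, INCOME_SENTINEL)  # matches the max
rent_bad = rent_to_income_pct(RENT_SENTINEL, 32844)            # matches the min
income_bad = rent_to_income_pct(9810, INCOME_SENTINEL)         # tiny negative

print(both_bad, round(rent_bad, 2), income_bad)
```

The third case matters for cleaning: a row with only the income sentinel yields a ratio just below zero, which a large-magnitude cutoff would miss, so flagging rows by their inputs is safer than thresholding the ratio.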

affordability_category

categorical label imbalance
A 3-level categorical bucket classifying affordability, almost certainly derived from a rent or income ratio. The distribution is severely degenerate: 'Affordable' covers 3192 of 3222 rows (top_rate 0.9907), 'Moderately Burdened' has 29, and 'Extremely Burdened' has just 1, yielding an entropy of 0.078 and an entropy ratio of 0.049. With almost all rows in one class, this column carries little discriminative signal. Treatment: Drop or collapse to binary (Affordable vs. Burdened); too imbalanced for direct modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3
top_value: Affordable · top_rate: 0.9907 · cardinality: 3
entropy: 0.07815 · entropy_ratio: 0.04931
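The entropy figures can be reproduced from the three class counts. The sketch below assumes Shannon entropy in bits (base-2 logarithm) and an entropy ratio defined as entropy divided by the maximum for three classes; both definitions are assumptions that happen to match the reported values.

```python
import math

# Class counts as given in the column description.
counts = {"Affordable": 3192, "Moderately Burdened": 29, "Extremely Burdened": 1}
total = sum(counts.values())

# Shannon entropy in bits: -sum(p * log2(p)) over the class proportions.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

# Normalise by the maximum possible entropy for this many classes.
entropy_ratio = entropy / math.log2(len(counts))

print(f"entropy {entropy:.5f}, ratio {entropy_ratio:.5f}")
```

A ratio near 0 means the column is close to constant; a ratio of 1 would mean the three classes are evenly populated.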

hours_at_min_wage_for_rent

numeric feature high_skew outliers
This column appears to be the number of minimum-wage hours required to afford rent, with a typical value around 113 hours (median) and an interquartile range of 99-135. However, the data is severely corrupted by extreme negative values: the minimum of -91,954,023 drags the mean to -285,271 despite a sensible median, and produces extreme skew (-17.87) and kurtosis (317.20). The minimum is consistent with the median_gross_rent sentinel (-666666666) divided by a $7.25 hourly wage, and the skew and kurtosis match median_gross_rent exactly, so the negatives are inherited sentinel codes rather than data-entry errors or real measurements. 232 outliers (7.2%) are flagged. Treatment: Filter or null out negative sentinel values, then consider a log or robust scaling before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 230
min: -9.195e+07 · max: 387 · mean: -2.853e+05 · median: 113 · std: 5.116e+06 · q1: 99 · q3: 135 · iqr: 36
skew: -17.87 · kurtosis: 317.2 · n_outliers: 232 · outlier_rate: 0.072 · zero_rate: 0
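The reported minimum here lines up with the rent sentinel divided by an hourly wage of $7.25, and the weeks column with that result divided by a 40-hour week. Both the wage and the week length are assumptions inferred from the numbers, not stated in the report; the function names are illustrative.

```python
MIN_WAGE = 7.25      # assumed hourly wage in dollars
HOURS_PER_WEEK = 40  # assumed length of a working week

def hours_for_rent(rent):
    """Hours of work at the assumed wage needed to cover the rent."""
    return rent / MIN_WAGE

def weeks_for_rent(rent):
    """Working weeks at the assumed wage needed to cover the rent."""
    return hours_for_rent(rent) / HOURS_PER_WEEK

sentinel_rent = -666666666  # reported minimum of median_gross_rent

print(round(hours_for_rent(sentinel_rent)))     # -91954023, the reported min
print(round(weeks_for_rent(sentinel_rent), 1))  # -2298850.6, the weeks min
print(round(hours_for_rent(817.5)))             # 113, the reported median
```

If this derivation holds, cleaning median_gross_rent first and recomputing both wage columns removes the sentinels from all three at once.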

weeks_at_min_wage_for_rent

numeric feature high_skew outliers
Likely the number of weeks of minimum-wage labor required to cover rent, with a typical value near 2.8 weeks and an interquartile range of 2.5–3.4. The distribution is corrupted by extreme negatives: the minimum is -2,298,850.6 and the mean is -7,131.79, driving skew of -17.87 and kurtosis of 317.2. The minimum equals the hours_at_min_wage_for_rent minimum (-91,954,023) divided by 40, so the negatives look like the same sentinel carried through a derived column rather than unit or sign errors. 7.2% of rows (232) are flagged as outliers. Treatment: Null the negative sentinel values, or recompute from cleaned inputs, before any modelling or aggregation. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 72
min: -2.299e+06 · max: 9.7 · mean: -7132 · median: 2.8 · std: 1.279e+05 · q1: 2.5 · q3: 3.4 · iqr: 0.9
skew: -17.87 · kurtosis: 317.2 · n_outliers: 232 · outlier_rate: 0.072 · zero_rate: 0