saturn·

housing housing crisis merged

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/housing/housing_crisis_merged.csv

Saturn profiled 3,222 rows across 16 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/us-inequality-atlas/housing/housing_crisis_merged.csv",
    "--findings", "housing-housing_crisis_merged.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset covers 3,222 U.S. counties (one row per county, identified by FIPS code) with 16 columns spanning housing stock, rent burden, income, and affordability metrics. The headline finding is that the affordability_category field is overwhelmingly imbalanced — 'Affordable' covers 3,192 of 3,222 counties (top_rate 0.99), with only 29 'Moderately Burdened' and 1 'Extremely Burdened', so this label likely needs reworking before it's useful. The rent-burden percentages tell a richer story: pct_rent_burdened_30plus has a mean of 36.4% and pct_rent_burdened_50plus a mean of 17.4%, suggesting real stress that the categorical label hides. Housing-count columns (owner_occupied, renter_occupied, total_housing_units) are extremely right-skewed (skew 9.5–15.8) with hundreds of outliers, reflecting a few very large urban counties — log scales recommended. Also note rent_to_income_ratio has an extreme max of 1200 with skew ~54, hinting at data-quality issues worth checking.

citing: row_count · column_count · affordability_category.top_rate · affordability_category.top_values · pct_rent_burdened_30plus.mean · pct_rent_burdened_50plus.mean · owner_occupied.skew · renter_occupied.skew · total_housing_units.skew · rent_to_income_ratio.max · rent_to_income_ratio.skew · median_household_income.mean

Out[4]:

saturn.schema() · 16 columns

column kind n null% unique alerts
fips numeric 3,222 0.0% 3,222
county_name text 3,222 0.0% 3,222 near_unique
total_renters numeric 3,222 0.0% 2,709 high_skew outliers
pct_rent_burdened_30plus numeric 3,222 0.0% 2,146
pct_rent_burdened_50plus numeric 3,222 0.0% 1,769
median_gross_rent numeric 3,222 0.3% 983 outliers
median_household_income numeric 3,222 0.0% 3,098 outliers
total_housing_units numeric 3,222 0.0% 3,074 high_skew outliers
owner_occupied numeric 3,222 0.0% 3,001 high_skew outliers
renter_occupied numeric 3,222 0.0% 2,709 high_skew outliers
pct_renter numeric 3,222 0.0% 1,925
annual_rent numeric 3,222 0.3% 983 outliers
rent_to_income_ratio numeric 3,222 0.3% 1,269 high_skew
affordability_category categorical 3,222 0.0% 3 imbalance
hours_at_min_wage_for_rent numeric 3,222 0.3% 229 outliers
weeks_at_min_wage_for_rent numeric 3,222 0.3% 71 outliers
Fig 1.
affordability_category · Shows how nearly every county falls into 'Affordable', exposing how unbalanced this label is.
Top values for affordability_category (3 unique shown, of 3 total).
value  count  share
Affordable  3,192  99.1%
Moderately Burdened  29  0.9%
Extremely Burdened  1  0.0%
Fig 2.
pct_rent_burdened_30plus · Distribution of the share of renters paying 30%+ of income on rent — a more honest view of burden than the categorical field.
Histogram bins for pct_rent_burdened_30plus (median: 37.36): identical to the bin table under Fig 11.
Fig 3.
pct_rent_burdened_50plus · Severe rent burden at the county level; watch the right tail for the most stressed places.
Histogram bins for pct_rent_burdened_50plus (median: 17.62): identical to the bin table under Fig 12.
Fig 4.
median_household_income · County income distribution to contextualise rent-burden numbers; mildly right-skewed with a long upper tail.
Histogram bins for median_household_income (median: 60461.0): identical to the bin table under Fig 14.
Fig 5.
pct_renter · Share of renter households per county; most counties cluster near 27%, but one county reaches 100%, which is worth inspecting.
Histogram bins for pct_renter (median: 26.07): identical to the bin table under Fig 18.
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column kind null%
fips numeric 0.0%
county_name text 0.0%
total_renters numeric 0.0%
pct_rent_burdened_30plus numeric 0.0%
pct_rent_burdened_50plus numeric 0.0%
median_gross_rent numeric 0.3%
median_household_income numeric 0.0%
total_housing_units numeric 0.0%
owner_occupied numeric 0.0%
renter_occupied numeric 0.0%
pct_renter numeric 0.0%
annual_rent numeric 0.3%
rent_to_income_ratio numeric 0.3%
affordability_category categorical 0.0%
hours_at_min_wage_for_rent numeric 0.3%
weeks_at_min_wage_for_rent numeric 0.3%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
column fips total_renters pct_rent_burdened_30plus pct_rent_burdened_50plus median_gross_rent median_household_income total_housing_units owner_occupied renter_occupied pct_renter annual_rent rent_to_income_ratio
fips +1.00 -0.06 -0.16 -0.10 -0.12 -0.11 -0.06 -0.06 -0.06 -0.10 -0.12 +0.05
total_renters -0.06 +1.00 +0.23 +0.20 +0.17 +0.12 +0.99 +0.96 +1.00 +0.22 +0.17 +0.06
pct_rent_burdened_30plus -0.16 +0.23 +1.00 +0.82 +0.18 +0.07 +0.26 +0.28 +0.23 +0.19 +0.18 +0.08
pct_rent_burdened_50plus -0.10 +0.20 +0.82 +1.00 +0.13 +0.00 +0.23 +0.25 +0.20 +0.22 +0.13 +0.07
median_gross_rent -0.12 +0.17 +0.18 +0.13 +1.00 +0.35 +0.16 +0.15 +0.17 +0.12 +1.00 +0.18
median_household_income -0.11 +0.12 +0.07 +0.00 +0.35 +1.00 +0.13 +0.14 +0.12 +0.02 +0.35 -0.21
total_housing_units -0.06 +0.99 +0.26 +0.23 +0.16 +0.13 +1.00 +0.99 +0.99 +0.19 +0.16 +0.05
owner_occupied -0.06 +0.96 +0.28 +0.25 +0.15 +0.14 +0.99 +1.00 +0.96 +0.16 +0.15 +0.05
renter_occupied -0.06 +1.00 +0.23 +0.20 +0.17 +0.12 +0.99 +0.96 +1.00 +0.22 +0.17 +0.06
pct_renter -0.10 +0.22 +0.19 +0.22 +0.12 +0.02 +0.19 +0.16 +0.22 +1.00 +0.12 +0.09
annual_rent -0.12 +0.17 +0.18 +0.13 +1.00 +0.35 +0.16 +0.15 +0.17 +0.12 +1.00 +0.18
rent_to_income_ratio +0.05 +0.06 +0.08 +0.07 +0.18 -0.21 +0.05 +0.05 +0.06 +0.09 +0.18 +1.00
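
Two pairs in the matrix sit at +1.00 (total_renters with renter_occupied, median_gross_rent with annual_rent), which usually means one column is derived from the other. A minimal sketch of listing such pairs from raw columns follows. `pearson` and `redundant_pairs` are illustrative helpers, not part of Saturn; the 0.995 threshold is an arbitrary choice; and the sample values are made up for the demonstration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.995):
    """Return column-name pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Hypothetical sample rows, not taken from the dataset.
sample = {
    "median_gross_rent": [297, 818, 978, 2805],
    "annual_rent": [3564, 9816, 11736, 33660],
    "median_household_income": [60000, 52000, 70000, 61000],
}
print(redundant_pairs(sample))
```

Only one column from each flagged pair needs to survive into a model; which one to keep is a naming choice rather than a statistical one.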

fips numeric identifier

This is the US county FIPS code: all 3,222 values are unique, there are no nulls, and the range (1001 to 72153) matches the standard 2-digit state + 3-digit county encoding. Because the column is stored as a number, leading zeros are dropped (Alabama's 01001 appears as 1001), so codes should be zero-padded to five characters before they are used as keys. Distribution stats such as mean 31377.89 and skew 0.157 are not meaningful here because the integers are categorical identifiers, not quantities.

Treatment: Treat as a categorical key: zero-pad to a five-character string, then left-join on it to bring in county and state attributes rather than using it as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
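
Since the profile reports fips as numeric with a minimum of 1,001, the leading zero of two-digit state codes below 10 is missing. The sketch below restores the five-character key and splits it into state and county parts. The function names are illustrative and not part of Saturn.

```python
def normalize_fips(value) -> str:
    """Return a 5-character county FIPS string from an int-like value."""
    code = str(int(value)).zfill(5)  # 1001 -> "01001"
    if len(code) != 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return code


def split_fips(value) -> tuple[str, str]:
    """Split a county FIPS code into (state, county) parts."""
    code = normalize_fips(value)
    return code[:2], code[2:]


print(normalize_fips(1001))  # smallest code in the profile
print(split_fips(72153))     # largest code in the profile
```

Joining on the padded string rather than the integer avoids silent mismatches against lookup tables that store FIPS codes as text.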
Out[13]:

saturn.columns["fips"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
min 1,001
max 72,153
mean 3.138e+04
median 30,022
std 1.63e+04
q1 1.903e+04
q3 4.61e+04
iqr 27,075
skew 0.1574
kurtosis -0.6314
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 8.
Distribution of fips. Vertical dash marks the median.
Histogram bins for fips (median: 30022.0).
bin  count
1001 – 2780  97
2780 – 4559  15
4559 – 6337  133
6337 – 8116  59
8116 – 9895  14
9895 – 1.167e+04  4
1.167e+04 – 1.345e+04  226
1.345e+04 – 1.523e+04  5
1.523e+04 – 1.701e+04  49
1.701e+04 – 1.879e+04  189
1.879e+04 – 2.057e+04  204
2.057e+04 – 2.235e+04  184
2.235e+04 – 2.413e+04  39
2.413e+04 – 2.59e+04  15
2.59e+04 – 2.768e+04  170
2.768e+04 – 2.946e+04  196
2.946e+04 – 3.124e+04  150
3.124e+04 – 3.302e+04  27
3.302e+04 – 3.48e+04  21
3.48e+04 – 3.658e+04  95
3.658e+04 – 3.836e+04  153
3.836e+04 – 4.013e+04  155
4.013e+04 – 4.191e+04  46
4.191e+04 – 4.369e+04  67
4.369e+04 – 4.547e+04  51
4.547e+04 – 4.725e+04  161
4.725e+04 – 4.903e+04  268
4.903e+04 – 5.081e+04  29
5.081e+04 – 5.259e+04  133
5.259e+04 – 5.436e+04  94
5.436e+04 – 5.614e+04  95
5.614e+04 – 5.792e+04  0
5.792e+04 – 5.97e+04  0
5.97e+04 – 6.148e+04  0
6.148e+04 – 6.326e+04  0
6.326e+04 – 6.504e+04  0
6.504e+04 – 6.682e+04  0
6.682e+04 – 6.86e+04  0
6.86e+04 – 7.037e+04  0
7.037e+04 – 7.215e+04  78

county_name text identifier

This column holds fully-qualified US county names (e.g., 'X County, State'), with the token 'county,' appearing in 2999 of 3222 rows and state names like Texas (256), Virginia (189), and Georgia (159) topping the word frequencies. Every one of the 3222 values is unique with zero nulls or duplicates, and 95% of lengths fall between 16 and 31 characters (mean 24.3, max 59). The 223 rows missing the 'county,' token likely correspond to parishes (Louisiana), boroughs and census areas (Alaska), independent cities, or Puerto Rico municipios (the FIPS range extends to 72153), none of which an analyst should treat as data-quality issues.

Treatment: Split into county and state fields and left-join on a county FIPS lookup.

anthropic:claude-opus-4-7 · confidence high
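
The split suggested in the treatment can be sketched as follows, assuming every value follows the 'Name, State' pattern described above. Splitting on the last comma keeps any commas inside the county part intact. `split_county_name` is an illustrative helper, and the example names are well-known counties used for demonstration rather than rows read from the file.

```python
def split_county_name(name: str) -> tuple[str, str]:
    """Split 'X County, State' into (county, state) on the last ', '."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        raise ValueError(f"no ', ' separator in {name!r}")
    return county.strip(), state.strip()


def has_county_token(name: str) -> bool:
    """True when the county part ends in the word 'County'."""
    county, _ = split_county_name(name)
    return county.lower().endswith(" county")


for example in ["Autauga County, Alabama", "Orleans Parish, Louisiana"]:
    print(split_county_name(example), has_county_token(example))
```

Rows where `has_county_token` is false are the county equivalents (parishes, boroughs, independent cities, municipios) and can be kept as they are.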
Out[16]:

saturn.columns["county_name"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,222
len_min 16
len_max 59
len_mean 24.32
len_median 24
len_p95 31
word_mean 3.248
word_median 3
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 1,990
readability_flesch_mean 10.28
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
Fig 9.
Character-length distribution for county_name.
Character-length distribution for county_name (mean: 24.32).
chars  count
16 – 17  26
17 – 18  72
18 – 19  121
19 – 20  190
20 – 21  264
21 – 22  407
22 – 24  420
24 – 25  363
25 – 26  320
26 – 27  240
27 – 28  231
28 – 29  152
29 – 30  139
30 – 31  165
31 – 32  41
32 – 33  28
33 – 34  16
34 – 35  10
35 – 36  5
36 – 38  0
38 – 39  1
39 – 40  1
40 – 41  0
41 – 42  1
42 – 43  1
43 – 44  0
44 – 45  2
45 – 46  0
46 – 47  1
47 – 48  1
48 – 49  0
49 – 50  0
50 – 51  0
51 – 53  0
53 – 54  2
54 – 55  1
55 – 56  0
56 – 57  0
57 – 58  0
58 – 59  1

total_renters numeric feature

This column reports a count of renters per county, ranging from 28 to 1,810,929 with a median of 2,579.5 and a mean of 13,851.1. The distribution is severely right-skewed (skew 15.82, kurtosis 398.15) and 449 of 3,222 rows (13.9%) flag as outliers, with the std (55,351.6) dwarfing the IQR (6,392). No nulls or zeros are present, and 2,709 of 3,222 values are unique. Every statistic matches renter_occupied, and the two columns correlate at +1.00 in Fig 7, so one of them is redundant.

Treatment: Log-transform before modelling to tame the heavy right tail, and keep only one of total_renters and renter_occupied.

anthropic:claude-opus-4-7 · confidence high
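
To illustrate what the log transform does to a range this wide, the sketch below applies log(1 + x) to the published minimum, median and maximum. Using `log1p` rather than a plain logarithm is a choice made here so that the same helper also works on count columns that contain a zero; `log_counts` is an illustrative name.

```python
import math


def log_counts(values):
    """Apply log(1 + x) so that a zero count maps to 0 instead of failing."""
    return [math.log1p(v) for v in values]


# min, median and max of total_renters as published in the stats table
raw = [28, 2579.5, 1810929]
transformed = log_counts(raw)
print([round(t, 2) for t in transformed])

# The transform is invertible with expm1, so predictions can be mapped back.
print(round(math.expm1(transformed[1]), 1))
```

On the raw scale the maximum is tens of thousands of times the minimum; after the transform the three values are only a few units apart, which is what makes linear models and plots usable.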
Out[19]:

saturn.columns["total_renters"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2,709
min 28
max 1.811e+06
mean 1.385e+04
median 2580
std 5.535e+04
q1 1004
q3 7396
iqr 6,392
skew 15.82
kurtosis 398.2
n_outliers 449
outlier_rate 0.1394
zero_rate 0
alert: high_skew skew=+15.82
alert: outliers 13.9% rows beyond 1.5 IQR
Fig 10.
Distribution of total_renters. Vertical dash marks the median.
Histogram bins for total_renters (median: 2579.5).
bin  count
28 – 4.53e+04  3019
4.53e+04 – 9.057e+04  109
9.057e+04 – 1.358e+05  38
1.358e+05 – 1.811e+05  17
1.811e+05 – 2.264e+05  11
2.264e+05 – 2.717e+05  9
2.717e+05 – 3.169e+05  5
3.169e+05 – 3.622e+05  0
3.622e+05 – 4.075e+05  2
4.075e+05 – 4.528e+05  2
4.528e+05 – 4.98e+05  3
4.98e+05 – 5.433e+05  1
5.433e+05 – 5.886e+05  1
5.886e+05 – 6.338e+05  1
6.338e+05 – 6.791e+05  0
6.791e+05 – 7.244e+05  1
7.244e+05 – 7.697e+05  1
7.697e+05 – 8.149e+05  0
8.149e+05 – 8.602e+05  0
8.602e+05 – 9.055e+05  1
9.055e+05 – 9.508e+05  0
9.508e+05 – 9.96e+05  0
9.96e+05 – 1.041e+06  0
1.041e+06 – 1.087e+06  0
1.087e+06 – 1.132e+06  0
1.132e+06 – 1.177e+06  0
1.177e+06 – 1.222e+06  0
1.222e+06 – 1.268e+06  0
1.268e+06 – 1.313e+06  0
1.313e+06 – 1.358e+06  0
1.358e+06 – 1.403e+06  0
1.403e+06 – 1.449e+06  0
1.449e+06 – 1.494e+06  0
1.494e+06 – 1.539e+06  0
1.539e+06 – 1.585e+06  0
1.585e+06 – 1.63e+06  0
1.63e+06 – 1.675e+06  0
1.675e+06 – 1.72e+06  0
1.72e+06 – 1.766e+06  0
1.766e+06 – 1.811e+06  1

pct_rent_burdened_30plus numeric feature

Percentage of renter households spending 30%+ of income on rent, reported per record (n=3222). Distribution is roughly centered with median 37.36 and IQR 30.67–43.48, mildly left-skewed (-0.57) and ranging 0 to 64.96, with 58 outliers (1.8%) and a small zero_rate of 0.25%. With 2146 unique values out of 3222, granularity is high but not near-unique.

Treatment: Use as-is as a numeric feature; no transform needed given near-symmetric, bounded percentage scale.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["pct_rent_burdened_30plus"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2,146
min 0
max 64.96
mean 36.44
median 37.36
std 10.01
q1 30.67
q3 43.48
iqr 12.81
skew -0.5673
kurtosis 0.5032
n_outliers 58
outlier_rate 0.018
zero_rate 0.002483
Fig 11.
Distribution of pct_rent_burdened_30plus. Vertical dash marks the median.
Histogram bins for pct_rent_burdened_30plus (median: 37.36).
bin  count
0 – 1.624  9
1.624 – 3.248  5
3.248 – 4.872  3
4.872 – 6.496  5
6.496 – 8.12  9
8.12 – 9.744  13
9.744 – 11.37  11
11.37 – 12.99  16
12.99 – 14.62  26
14.62 – 16.24  19
16.24 – 17.86  35
17.86 – 19.49  43
19.49 – 21.11  52
21.11 – 22.74  52
22.74 – 24.36  73
24.36 – 25.98  99
25.98 – 27.61  109
27.61 – 29.23  116
29.23 – 30.86  132
30.86 – 32.48  159
32.48 – 34.1  189
34.1 – 35.73  209
35.73 – 37.35  227
37.35 – 38.98  239
38.98 – 40.6  205
40.6 – 42.22  209
42.22 – 43.85  210
43.85 – 45.47  190
45.47 – 47.1  131
47.1 – 48.72  114
48.72 – 50.34  118
50.34 – 51.97  69
51.97 – 53.59  51
53.59 – 55.22  34
55.22 – 56.84  24
56.84 – 58.46  6
58.46 – 60.09  3
60.09 – 61.71  2
61.71 – 63.34  3
63.34 – 64.96  3

pct_rent_burdened_50plus numeric feature

County-level percentage of renter households spending 50%+ of income on rent (severely rent-burdened). Values span 0 to 64.96 with mean 17.35 and median 17.62, and the distribution is nearly symmetric (skew 0.05, kurtosis 0.98) with only 1.5% outliers. About 0.9% of rows are exactly zero and there are no nulls across 3,222 records.

Treatment: Use as-is in modelling; no transform needed given near-symmetric distribution.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["pct_rent_burdened_50plus"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 1,769
min 0
max 64.96
mean 17.35
median 17.62
std 6.577
q1 13.07
q3 21.63
iqr 8.557
skew 0.05436
kurtosis 0.9823
n_outliers 47
outlier_rate 0.01459
zero_rate 0.009311
Fig 12.
Distribution of pct_rent_burdened_50plus. Vertical dash marks the median.
Histogram bins for pct_rent_burdened_50plus (median: 17.62).
bin  count
0 – 1.624  42
1.624 – 3.248  27
3.248 – 4.872  34
4.872 – 6.496  63
6.496 – 8.12  102
8.12 – 9.744  148
9.744 – 11.37  163
11.37 – 12.99  214
12.99 – 14.62  242
14.62 – 16.24  310
16.24 – 17.86  315
17.86 – 19.49  332
19.49 – 21.11  335
21.11 – 22.74  264
22.74 – 24.36  219
24.36 – 25.98  150
25.98 – 27.61  99
27.61 – 29.23  64
29.23 – 30.86  39
30.86 – 32.48  20
32.48 – 34.1  21
34.1 – 35.73  9
35.73 – 37.35  2
37.35 – 38.98  3
38.98 – 40.6  1
40.6 – 42.22  1
42.22 – 43.85  1
43.85 – 45.47  0
45.47 – 47.1  1
47.1 – 48.72  0
48.72 – 50.34  0
50.34 – 51.97  0
51.97 – 53.59  0
53.59 – 55.22  0
55.22 – 56.84  0
56.84 – 58.46  0
58.46 – 60.09  0
60.09 – 61.71  0
61.71 – 63.34  0
63.34 – 64.96  1

median_gross_rent numeric feature

Numeric column capturing the median gross rent (presumably USD per month) across 3,222 rows with only 0.31% missing and no zeros. The distribution is right-skewed (skew 1.76, kurtosis 4.55) with median 818 and mean 890.9, and 225 values (7.0%) flagged as outliers stretching up to 2,805 against a Q3 of 978.

Treatment: Log-transform or winsorize before regression to tame the right-skew and high-rent outliers.

anthropic:claude-opus-4-7 · confidence high
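
Winsorizing, the alternative named in the treatment, clamps extreme values to chosen quantiles instead of rescaling the whole column. The sketch below is a plain-Python version under two assumptions: nulls (10 in this column) are dropped first, and cut points come from a simple floor-rank quantile, so a library that interpolates between ranks would give slightly different limits. Both helper names are illustrative.

```python
def drop_nulls(values):
    """Remove missing entries before computing quantiles."""
    return [v for v in values if v is not None]


def winsorize(values, lower=0.01, upper=0.99):
    """Clamp values to the [lower, upper] quantile range (floor-rank quantiles)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[int(lower * last)]
    hi = ordered[int(upper * last)]
    return [min(max(v, lo), hi) for v in values]


# min, q1, median, q3 and max of median_gross_rent from the stats table,
# plus one missing value to show the null handling
rents = [297, 718, 818, None, 978, 2805]
print(winsorize(drop_nulls(rents), lower=0.2, upper=0.8))
```

Unlike a log transform, this keeps the column in dollars, at the cost of flattening the information carried by the highest-rent counties.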
Out[28]:

saturn.columns["median_gross_rent"].stats

stat value
n 3,222
nulls 10 (0.3%)
unique 983
min 297
max 2,805
mean 890.9
median 818
std 283.4
q1 718
q3 978
iqr 260
skew 1.763
kurtosis 4.55
n_outliers 225
outlier_rate 0.07005
zero_rate 0
alert: outliers 7.0% rows beyond 1.5 IQR
Fig 13.
Distribution of median_gross_rent. Vertical dash marks the median.
Histogram bins for median_gross_rent (median: 818.0).
bin  count
297 – 359.7  5
359.7 – 422.4  14
422.4 – 485.1  32
485.1 – 547.8  69
547.8 – 610.5  128
610.5 – 673.2  242
673.2 – 735.9  457
735.9 – 798.6  515
798.6 – 861.3  423
861.3 – 924  306
924 – 986.7  251
986.7 – 1049  140
1049 – 1112  105
1112 – 1175  98
1175 – 1238  79
1238 – 1300  71
1300 – 1363  52
1363 – 1426  48
1426 – 1488  26
1488 – 1551  22
1551 – 1614  33
1614 – 1676  13
1676 – 1739  19
1739 – 1802  10
1802 – 1864  13
1864 – 1927  8
1927 – 1990  11
1990 – 2053  6
2053 – 2115  4
2115 – 2178  3
2178 – 2241  4
2241 – 2303  1
2303 – 2366  1
2366 – 2429  0
2429 – 2492  1
2492 – 2554  0
2554 – 2617  0
2617 – 2680  0
2680 – 2742  1
2742 – 2805  1

median_household_income numeric feature

Median household income in dollars at the county level, ranging from 14,525 to 170,463. The distribution is right-skewed (skew 0.95, kurtosis 2.96) and 187 rows (5.8%) flag as outliers. These fall in both tails: the lower fence (Q1 - 1.5 × IQR = 23,989) sits above the 14,525 minimum, while the long upper tail pulls the mean (62,327) above the median (60,461). Coverage is near-complete, with a single null (0.03%) and no zeros.

Treatment: Log-transform before regression to tame the right skew and high-income outliers.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["median_household_income"].stats

statvalue
n3,222
nulls1 (0.0%)
unique3,098
min 14,525
max 170,463
mean 6.233e+04
median 60,461
std 1.777e+04
q1 51,823
q3 70,379
iqr 18,556
skew 0.9478
kurtosis 2.962
n_outliers 187
outlier_rate 0.05806
zero_rate 0
alert: outliers 5.8% rows beyond 1.5 IQR
Fig 14.
Distribution of median_household_income. Vertical dash marks the median.
Histogram bins for median_household_income (median: 60461.0).
bin  count
1.452e+04 – 1.842e+04  14
1.842e+04 – 2.232e+04  30
2.232e+04 – 2.622e+04  26
2.622e+04 – 3.012e+04  10
3.012e+04 – 3.402e+04  28
3.402e+04 – 3.792e+04  52
3.792e+04 – 4.181e+04  102
4.181e+04 – 4.571e+04  154
4.571e+04 – 4.961e+04  237
4.961e+04 – 5.351e+04  286
5.351e+04 – 5.741e+04  352
5.741e+04 – 6.131e+04  400
6.131e+04 – 6.52e+04  342
6.52e+04 – 6.91e+04  294
6.91e+04 – 7.3e+04  231
7.3e+04 – 7.69e+04  150
7.69e+04 – 8.08e+04  136
8.08e+04 – 8.47e+04  99
8.47e+04 – 8.86e+04  52
8.86e+04 – 9.249e+04  43
9.249e+04 – 9.639e+04  44
9.639e+04 – 1.003e+05  26
1.003e+05 – 1.042e+05  23
1.042e+05 – 1.081e+05  23
1.081e+05 – 1.12e+05  11
1.12e+05 – 1.159e+05  9
1.159e+05 – 1.198e+05  12
1.198e+05 – 1.237e+05  10
1.237e+05 – 1.276e+05  5
1.276e+05 – 1.315e+05  4
1.315e+05 – 1.354e+05  3
1.354e+05 – 1.393e+05  6
1.393e+05 – 1.432e+05  2
1.432e+05 – 1.471e+05  1
1.471e+05 – 1.51e+05  1
1.51e+05 – 1.549e+05  1
1.549e+05 – 1.588e+05  0
1.588e+05 – 1.627e+05  0
1.627e+05 – 1.666e+05  1
1.666e+05 – 1.705e+05  1

total_housing_units numeric feature

Counts of total housing units per record, almost certainly aggregated to a geography (likely US counties given n=3222). The distribution is severely right-skewed (skew 12.05, kurtosis 240.5) with a median of 10,021 but a max of 3,363,093, and 443 rows (13.7%) flag as outliers — consistent with a few massive metros dwarfing thousands of small areas. No nulls or zeros, and 3,074 of 3,222 values are unique.

Treatment: Log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["total_housing_units"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,074
min 32
max 3.363e+06
mean 3.94e+04
median 10,021
std 1.201e+05
q1 4211
q3 25,939
iqr 2.173e+04
skew 12.05
kurtosis 240.5
n_outliers 443
outlier_rate 0.1375
zero_rate 0
alert: high_skew skew=+12.05
alert: outliers 13.7% rows beyond 1.5 IQR
Fig 15.
Distribution of total_housing_units. Vertical dash marks the median.
Histogram bins for total_housing_units (median: 10021.0).
bin  count
32 – 8.411e+04  2907
8.411e+04 – 1.682e+05  153
1.682e+05 – 2.523e+05  62
2.523e+05 – 3.363e+05  38
3.363e+05 – 4.204e+05  22
4.204e+05 – 5.045e+05  6
5.045e+05 – 5.886e+05  11
5.886e+05 – 6.726e+05  5
6.726e+05 – 7.567e+05  5
7.567e+05 – 8.408e+05  3
8.408e+05 – 9.249e+05  1
9.249e+05 – 1.009e+06  3
1.009e+06 – 1.093e+06  1
1.093e+06 – 1.177e+06  1
1.177e+06 – 1.261e+06  0
1.261e+06 – 1.345e+06  0
1.345e+06 – 1.429e+06  0
1.429e+06 – 1.513e+06  0
1.513e+06 – 1.597e+06  0
1.597e+06 – 1.682e+06  1
1.682e+06 – 1.766e+06  1
1.766e+06 – 1.85e+06  0
1.85e+06 – 1.934e+06  0
1.934e+06 – 2.018e+06  0
2.018e+06 – 2.102e+06  1
2.102e+06 – 2.186e+06  0
2.186e+06 – 2.27e+06  0
2.27e+06 – 2.354e+06  0
2.354e+06 – 2.438e+06  0
2.438e+06 – 2.522e+06  0
2.522e+06 – 2.606e+06  0
2.606e+06 – 2.69e+06  0
2.69e+06 – 2.775e+06  0
2.775e+06 – 2.859e+06  0
2.859e+06 – 2.943e+06  0
2.943e+06 – 3.027e+06  0
3.027e+06 – 3.111e+06  0
3.111e+06 – 3.195e+06  0
3.195e+06 – 3.279e+06  0
3.279e+06 – 3.363e+06  1

owner_occupied numeric feature

Likely a count of owner-occupied housing units per geographic area, with 3001 unique values across 3222 rows and effectively no zeros (zero_rate 0.0003) or nulls. The distribution is severely right-skewed (skew 9.52, kurtosis 146.9): median is 7325.5 but the mean is 25551.7 and the max reaches 1,552,164, producing 429 outliers (13.3%).

Treatment: Log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["owner_occupied"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 3,001
min 0
max 1.552e+06
mean 2.555e+04
median 7326
std 6.755e+04
q1 3148
q3 1.886e+04
iqr 1.572e+04
skew 9.516
kurtosis 146.9
n_outliers 429
outlier_rate 0.1331
zero_rate 0.0003104
alert: high_skew skew=+9.52
alert: outliers 13.3% rows beyond 1.5 IQR
Fig 16.
Distribution of owner_occupied. Vertical dash marks the median.
Histogram bins for owner_occupied (median: 7325.5).
bin  count
0 – 3.88e+04  2761
3.88e+04 – 7.761e+04  225
7.761e+04 – 1.164e+05  78
1.164e+05 – 1.552e+05  52
1.552e+05 – 1.94e+05  36
1.94e+05 – 2.328e+05  20
2.328e+05 – 2.716e+05  10
2.716e+05 – 3.104e+05  10
3.104e+05 – 3.492e+05  6
3.492e+05 – 3.88e+05  6
3.88e+05 – 4.268e+05  3
4.268e+05 – 4.656e+05  3
4.656e+05 – 5.045e+05  4
5.045e+05 – 5.433e+05  2
5.433e+05 – 5.821e+05  0
5.821e+05 – 6.209e+05  1
6.209e+05 – 6.597e+05  1
6.597e+05 – 6.985e+05  0
6.985e+05 – 7.373e+05  0
7.373e+05 – 7.761e+05  0
7.761e+05 – 8.149e+05  0
8.149e+05 – 8.537e+05  0
8.537e+05 – 8.925e+05  0
8.925e+05 – 9.313e+05  1
9.313e+05 – 9.701e+05  0
9.701e+05 – 1.009e+06  0
1.009e+06 – 1.048e+06  0
1.048e+06 – 1.087e+06  1
1.087e+06 – 1.125e+06  0
1.125e+06 – 1.164e+06  0
1.164e+06 – 1.203e+06  1
1.203e+06 – 1.242e+06  0
1.242e+06 – 1.281e+06  0
1.281e+06 – 1.319e+06  0
1.319e+06 – 1.358e+06  0
1.358e+06 – 1.397e+06  0
1.397e+06 – 1.436e+06  0
1.436e+06 – 1.475e+06  0
1.475e+06 – 1.513e+06  0
1.513e+06 – 1.552e+06  1

renter_occupied numeric feature

Counts of renter-occupied housing units per county, ranging from 28 to 1,810,929 with a median of just 2,579.5. The distribution is severely right-skewed (skew 15.82, kurtosis 398.15) with 449 outliers (13.9% of rows), consistent with a few very large counties dominating an otherwise small-county distribution. There are no nulls or zeros, and 2,709 of 3,222 values are unique. These statistics are identical to those of total_renters (correlation +1.00 in Fig 7), so the two columns carry the same information.

Treatment: Log-transform before regression to tame the extreme right skew.

anthropic:claude-opus-4-7 · confidence high
Out[40]:

saturn.columns["renter_occupied"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 2,709
min 28
max 1.811e+06
mean 1.385e+04
median 2580
std 5.535e+04
q1 1004
q3 7396
iqr 6,392
skew 15.82
kurtosis 398.2
n_outliers 449
outlier_rate 0.1394
zero_rate 0
alert: high_skew skew=+15.82
alert: outliers 13.9% rows beyond 1.5 IQR
Fig 17.
Distribution of renter_occupied. Vertical dash marks the median.
Histogram bins for renter_occupied (median: 2579.5).
bin  count
28 – 4.53e+04  3019
4.53e+04 – 9.057e+04  109
9.057e+04 – 1.358e+05  38
1.358e+05 – 1.811e+05  17
1.811e+05 – 2.264e+05  11
2.264e+05 – 2.717e+05  9
2.717e+05 – 3.169e+05  5
3.169e+05 – 3.622e+05  0
3.622e+05 – 4.075e+05  2
4.075e+05 – 4.528e+05  2
4.528e+05 – 4.98e+05  3
4.98e+05 – 5.433e+05  1
5.433e+05 – 5.886e+05  1
5.886e+05 – 6.338e+05  1
6.338e+05 – 6.791e+05  0
6.791e+05 – 7.244e+05  1
7.244e+05 – 7.697e+05  1
7.697e+05 – 8.149e+05  0
8.149e+05 – 8.602e+05  0
8.602e+05 – 9.055e+05  1
9.055e+05 – 9.508e+05  0
9.508e+05 – 9.96e+05  0
9.96e+05 – 1.041e+06  0
1.041e+06 – 1.087e+06  0
1.087e+06 – 1.132e+06  0
1.132e+06 – 1.177e+06  0
1.177e+06 – 1.222e+06  0
1.222e+06 – 1.268e+06  0
1.268e+06 – 1.313e+06  0
1.313e+06 – 1.358e+06  0
1.358e+06 – 1.403e+06  0
1.403e+06 – 1.449e+06  0
1.449e+06 – 1.494e+06  0
1.494e+06 – 1.539e+06  0
1.539e+06 – 1.585e+06  0
1.585e+06 – 1.63e+06  0
1.63e+06 – 1.675e+06  0
1.675e+06 – 1.72e+06  0
1.72e+06 – 1.766e+06  0
1.766e+06 – 1.811e+06  1

pct_renter numeric feature

Percentage of renter-occupied housing units across 3,222 records, ranging from 3.01 to 100.0 with a mean of 27.35 and median of 26.07. The distribution is right-skewed (skew 1.32, kurtosis 4.41) with 88 outliers (2.7%), mostly on the high side. The 100.0 maximum stands out against a Q3 of 31.66, and the histogram shows a single county in the top bin (97.58 to 100), so this looks like one all-renter locality rather than several.

Treatment: Use as-is or apply a mild transform before linear models given the right skew. A logit on the 0-100 scale is one option, but it is undefined at exactly 100, so values must be clipped slightly inside the bounds first.

anthropic:claude-opus-4-7 · confidence high
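
A logit on a percentage column needs care here because the maximum is exactly 100, where the transform diverges. The sketch below clips values slightly inside the bounds before transforming. The helper name and the clipping margin `eps` of 0.5 percentage points are assumptions made for illustration, not values taken from the report.

```python
import math


def logit_percent(pct, eps=0.5):
    """Logit of a 0-100 percentage, clipped to [eps, 100 - eps] first."""
    clipped = min(max(pct, eps), 100.0 - eps)
    p = clipped / 100.0
    return math.log(p / (1.0 - p))


# min, median and max of pct_renter from the stats table
for pct in (3.01, 26.07, 100.0):
    print(pct, round(logit_percent(pct), 3))
```

The choice of `eps` directly sets how extreme the clipped rows become after the transform, so it is worth checking model results for sensitivity to it.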
Out[43]:

saturn.columns["pct_renter"].stats

stat value
n 3,222
nulls 0 (0.0%)
unique 1,925
min 3.01
max 100
mean 27.35
median 26.07
std 8.564
q1 21.64
q3 31.66
iqr 10.02
skew 1.317
kurtosis 4.412
n_outliers 88
outlier_rate 0.02731
zero_rate 0
Fig 18.
Distribution of pct_renter. Vertical dash marks the median.
Histogram bins for pct_renter (median: 26.07).
bin  count
3.01 – 5.435  1
5.435 – 7.859  3
7.859 – 10.28  9
10.28 – 12.71  26
12.71 – 15.13  63
15.13 – 17.56  156
17.56 – 19.98  316
19.98 – 22.41  371
22.41 – 24.83  450
24.83 – 27.26  419
27.26 – 29.68  357
29.68 – 32.11  301
32.11 – 34.53  203
34.53 – 36.96  169
36.96 – 39.38  115
39.38 – 41.81  75
41.81 – 44.23  56
44.23 – 46.66  45
46.66 – 49.08  25
49.08 – 51.5  15
51.5 – 53.93  11
53.93 – 56.35  10
56.35 – 58.78  8
58.78 – 61.2  4
61.2 – 63.63  4
63.63 – 66.05  1
66.05 – 68.48  1
68.48 – 70.9  3
70.9 – 73.33  1
73.33 – 75.75  1
75.75 – 78.18  0
78.18 – 80.6  1
80.6 – 83.03  0
83.03 – 85.45  1
85.45 – 87.88  0
87.88 – 90.3  0
90.3 – 92.73  0
92.73 – 95.15  0
95.15 – 97.58  0
97.58 – 100  1

annual_rent numeric feature

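Annual rent in dollars, with 3222 records and 983 distinct values ranging from 3564 to 33660 and a median of 9816. The published min, quartiles, median and max are each exactly 12 times their median_gross_rent counterparts (297 × 12 = 3564, 818 × 12 = 9816, 2805 × 12 = 33660), and the two columns correlate at +1.00, so this column is monthly rent annualised and adds no independent information. The distribution is right-skewed (skew 1.76, kurtosis 4.55) and 225 rows (7.0%) sit beyond the outlier fences, with a long tail of high-rent cases above the Q3 of 11736. Nulls are negligible (0.31%) and there are no zero values.

Treatment: Log-transform before regression to dampen the right-skew and high-rent outliers, and avoid using it alongside median_gross_rent in the same model.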

anthropic:claude-opus-4-7 · confidence high
Out[46]:

saturn.columns["annual_rent"].stats

stat value
n 3,222
nulls 10 (0.3%)
unique 983
min 3,564
max 33,660
mean 1.069e+04
median 9,816
std 3400
q1 8,616
q3 11,736
iqr 3,120
skew 1.763
kurtosis 4.55
n_outliers 225
outlier_rate 0.07005
zero_rate 0
alert: outliers 7.0% rows beyond 1.5 IQR
Fig 19.
Distribution of annual_rent. Vertical dash marks the median.
Histogram bins for annual_rent (median: 9816.0).
bin  count
3564 – 4316  5
4316 – 5069  14
5069 – 5821  32
5821 – 6574  69
6574 – 7326  128
7326 – 8078  242
8078 – 8831  457
8831 – 9583  515
9583 – 1.034e+04  423
1.034e+04 – 1.109e+04  306
1.109e+04 – 1.184e+04  251
1.184e+04 – 1.259e+04  140
1.259e+04 – 1.335e+04  105
1.335e+04 – 1.41e+04  98
1.41e+04 – 1.485e+04  79
1.485e+04 – 1.56e+04  71
1.56e+04 – 1.635e+04  52
1.635e+04 – 1.711e+04  48
1.711e+04 – 1.786e+04  26
1.786e+04 – 1.861e+04  22
1.861e+04 – 1.936e+04  33
1.936e+04 – 2.012e+04  13
2.012e+04 – 2.087e+04  19
2.087e+04 – 2.162e+04  10
2.162e+04 – 2.237e+04  13
2.237e+04 – 2.313e+04  8
2.313e+04 – 2.388e+04  11
2.388e+04 – 2.463e+04  6
2.463e+04 – 2.538e+04  4
2.538e+04 – 2.614e+04  3
2.614e+04 – 2.689e+04  4
2.689e+04 – 2.764e+04  1
2.764e+04 – 2.839e+04  1
2.839e+04 – 2.915e+04  0
2.915e+04 – 2.99e+04  1
2.99e+04 – 3.065e+04  0
3.065e+04 – 3.14e+04  0
3.14e+04 – 3.216e+04  0
3.216e+04 – 3.291e+04  1
3.291e+04 – 3.366e+04  1

rent_to_income_ratio numeric feature

Likely a rent-to-income ratio expressed as a percentage, with a tight interquartile band between 15.1 and 19.39 and median 17.06. The distribution is severely contaminated: skew of 53.98 and kurtosis of 3007 are driven by a max of 1200.0 against a mean of 17.89, and 107 outliers (3.33%) sit far outside the IQR of 4.29. Nulls are negligible at 0.28% and there are no zeros, but the extreme tail suggests data-entry errors or unit inconsistencies.

Treatment: Winsorize or cap extreme values and log-transform before modelling.

anthropic:claude-opus-4-7 · confidence high
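
Before capping anything, it helps to see where the outlier fences fall. The sketch below assumes the outlier count follows the standard Tukey rule suggested by the "rows beyond 1.5 IQR" wording used in other columns' alerts; the report does not spell out the exact rule, and the helper names are illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Mark each value that falls outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v < lower or v > upper for v in values]


# q1 and q3 of rent_to_income_ratio from the stats table
lower, upper = tukey_fences(15.1, 19.39)
print(round(lower, 3), round(upper, 3))

# published min, median and max
print(flag_outliers([6.1, 17.06, 1200.0], 15.1, 19.39))
```

With fences near 8.7 and 25.8, the 1,200 maximum is not a borderline case, which supports inspecting that row individually before deciding between capping and correcting it.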
Out[49]:

saturn.columns["rent_to_income_ratio"].stats

stat value
n 3,222
nulls 9 (0.3%)
unique 1,269
min 6.1
max 1,200
mean 17.89
median 17.06
std 21.2
q1 15.1
q3 19.39
iqr 4.29
skew 53.98
kurtosis 3007
n_outliers 107
outlier_rate 0.0333
zero_rate 0
alert: high_skew skew=+53.98
Fig 20.
Distribution of rent_to_income_ratio. Vertical dash marks the median.
Histogram bins for rent_to_income_ratio (median: 17.06).
bin  count
6.1 – 35.95  3207
35.95 – 65.8  5
65.8 – 95.64  0
95.64 – 125.5  0
125.5 – 155.3  0
155.3 – 185.2  0
185.2 – 215  0
215 – 244.9  0
244.9 – 274.7  0
274.7 – 304.6  0
304.6 – 334.4  0
334.4 – 364.3  0
364.3 – 394.1  0
394.1 – 424  0
424 – 453.8  0
453.8 – 483.7  0
483.7 – 513.5  0
513.5 – 543.4  0
543.4 – 573.2  0
573.2 – 603.1  0
603.1 – 632.9  0
632.9 – 662.7  0
662.7 – 692.6  0
692.6 – 722.4  0
722.4 – 752.3  0
752.3 – 782.1  0
782.1 – 812  0
812 – 841.8  0
841.8 – 871.7  0
871.7 – 901.5  0
901.5 – 931.4  0
931.4 – 961.2  0
961.2 – 991.1  0
991.1 – 1021  0
1021 – 1051  0
1051 – 1081  0
1081 – 1110  0
1110 – 1140  0
1140 – 1170  0
1170 – 1200  1

affordability_category categorical label

A 3-level categorical flag bucketing counties into housing affordability tiers. The distribution is close to degenerate: 'Affordable' covers 3192 of 3222 rows (top_rate 0.9907), 'Moderately Burdened' has 29, and 'Extremely Burdened' has just 1, yielding an entropy_ratio of 0.049. As a predictor it carries almost no information, and with a single 'Extremely Burdened' row that class can land in either the training set or the test set but never both, so it can never be both learned and evaluated.

Treatment: Collapse to a binary Affordable vs. Burdened flag or drop; near-constant as-is.

anthropic:claude-opus-4-7 · confidence high
Out[52]:

saturn.columns["affordability_category"].stats

stat           value
n              3,222
nulls          0 (0.0%)
unique         3
top_value      Affordable
top_rate       0.9907
cardinality    3
entropy        0.07815
entropy_ratio  0.04931
alert: imbalance · top value is 99.1% of rows
Fig 21.
Top values for affordability_category.
Top values for affordability_category (3 unique shown, of 3 total).
value                count  share
Affordable           3192   99.1%
Moderately Burdened  29     0.9%
Extremely Burdened   1      0.0%
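
The entropy figures for affordability_category can be recomputed from the three counts in the table above. The sketch below assumes Shannon entropy in bits, normalized by log2 of the number of levels. That convention is an assumption about how Saturn defines entropy_ratio, although it is consistent with the reported 0.07815 and 0.04931. The `binary` mapping is a hypothetical illustration of the suggested collapse.

```python
import math

# Counts copied from the top-values table above.
counts = {"Affordable": 3192, "Moderately Burdened": 29, "Extremely Burdened": 1}
total = sum(counts.values())

# Shannon entropy in bits, then normalized by the maximum for k levels (log2 k).
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
entropy_ratio = entropy / math.log2(len(counts))

# Collapse the two burdened tiers into one class, as the treatment suggests.
binary = {
    "Affordable": counts["Affordable"],
    "Burdened": counts["Moderately Burdened"] + counts["Extremely Burdened"],
}
burdened_share = binary["Burdened"] / total

print(f"entropy={entropy:.5f} bits, entropy_ratio={entropy_ratio:.5f}")
print(f"Burdened share after collapsing: {burdened_share:.4f}")
```

The collapsed flag removes the one-row class, but it remains heavily imbalanced, so any model using it as a target would still need stratified splitting or resampling.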

hours_at_min_wage_for_rent numeric feature

This column reports the number of minimum-wage work hours required to afford rent, with values ranging from 41 to 387 (median 113, mean 122.9). The distribution is right-skewed (skew 1.76, kurtosis 4.55) and 222 rows (6.9%) flag as outliers beyond the 1.5 × IQR fences of 45 and 189. These sit mostly in the upper tail, a subset of high-cost areas where rent demands far more hours than typical, though the minimum of 41 also falls below the lower fence. Nulls are negligible (0.31%) and there are no zeros, so coverage is essentially complete.

Treatment: Log-transform or winsorize before regression to dampen the right-tail outliers.

anthropic:claude-opus-4-7 · confidence high
Out[55]:

saturn.columns["hours_at_min_wage_for_rent"].stats

stat          value
n             3,222
nulls         10 (0.3%)
unique        229
min           41
max           387
mean          122.9
median        113
std           39.09
q1            99
q3            135
iqr           36
skew          1.763
kurtosis      4.546
n_outliers    222
outlier_rate  0.06912
zero_rate     0
alert: outliers · 6.9% rows beyond 1.5 IQR
Fig 22.
Distribution of hours_at_min_wage_for_rent. Vertical dash marks the median.
Histogram bins for hours_at_min_wage_for_rent (median: 113.0).
bin              count
41 – 49.65       5
49.65 – 58.3     14
58.3 – 66.95     29
66.95 – 75.6     72
75.6 – 84.25     132
84.25 – 92.9     232
92.9 – 101.6     463
101.6 – 110.2    536
110.2 – 118.9    392
118.9 – 127.5    319
127.5 – 136.2    251
136.2 – 144.8    131
144.8 – 153.4    111
153.4 – 162.1    103
162.1 – 170.8    74
170.8 – 179.4    73
179.4 – 188.1    53
188.1 – 196.7    44
196.7 – 205.3    27
205.3 – 214      20
214 – 222.7      35
222.7 – 231.3    14
231.3 – 240      17
240 – 248.6      11
248.6 – 257.2    13
257.2 – 265.9    8
265.9 – 274.6    11
274.6 – 283.2    6
283.2 – 291.9    4
291.9 – 300.5    3
300.5 – 309.2    4
309.2 – 317.8    1
317.8 – 326.4    1
326.4 – 335.1    0
335.1 – 343.8    1
343.8 – 352.4    0
352.4 – 361.1    0
361.1 – 369.7    0
369.7 – 378.4    1
378.4 – 387      1
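
The outlier alert for hours_at_min_wage_for_rent refers to rows "beyond 1.5 IQR". A minimal sketch of that rule, using only the quartiles and extremes from the stats table, is shown here. The `tukey_fences` helper is a hypothetical name, and whether Saturn treats the fence values themselves as outliers is not stated in the report.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and extremes copied from the stats table above.
q1, q3, col_min, col_max = 99, 135, 41, 387
lower, upper = tukey_fences(q1, q3)

print(f"fences: {lower} to {upper}")
print("min below lower fence:", col_min < lower)
print("max above upper fence:", col_max > upper)
```

Both extremes lie outside the fences, so the flagged rows are not confined to the upper tail, even though the histogram shows that is where nearly all of them sit.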

weeks_at_min_wage_for_rent numeric feature

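This column reports the number of weeks of minimum-wage work needed to cover rent, ranging from 1.0 to 9.7 with a median of 2.8 and IQR of 0.9. The distribution is right-skewed (skew 1.76, kurtosis 4.57) and 222 rows (6.9%) flag as outliers, mostly on the high end where rent dramatically outpaces minimum wage (the minimum of 1.0 also falls below the lower 1.5 × IQR fence of 1.15). Nulls are negligible (0.31%) and only 71 unique values appear across 3222 rows, consistent with figures rounded to one decimal place. The summary statistics match hours_at_min_wage_for_rent divided by 40 (for example, a mean of 3.072 against 122.9 / 40 = 3.0725, with identical skew, null count, and outlier count), so the two columns appear to carry the same information and only one is needed as a model feature.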

Treatment: Log-transform or winsorize before regression to dampen the right tail.

anthropic:claude-opus-4-7 · confidence high
Out[58]:

saturn.columns["weeks_at_min_wage_for_rent"].stats

stat          value
n             3,222
nulls         10 (0.3%)
unique        71
min           1
max           9.7
mean          3.072
median        2.8
std           0.9775
q1            2.5
q3            3.4
iqr           0.9
skew          1.763
kurtosis      4.567
n_outliers    222
outlier_rate  0.06912
zero_rate     0
alert: outliers · 6.9% rows beyond 1.5 IQR
Fig 23.
Distribution of weeks_at_min_wage_for_rent. Vertical dash marks the median.
Histogram bins for weeks_at_min_wage_for_rent (median: 2.8).
bin              count
1 – 1.218        5
1.218 – 1.435    14
1.435 – 1.652    29
1.652 – 1.87     61
1.87 – 2.087     107
2.087 – 2.305    294
2.305 – 2.522    437
2.522 – 2.74     483
2.74 – 2.957     408
2.957 – 3.175    298
3.175 – 3.392    230
3.392 – 3.61     240
3.61 – 3.827     95
3.827 – 4.045    89
4.045 – 4.262    74
4.262 – 4.48     68
4.48 – 4.697     48
4.697 – 4.915    56
4.915 – 5.132    25
5.132 – 5.35     20
5.35 – 5.567     34
5.567 – 5.785    11
5.785 – 6.002    23
6.002 – 6.22     13
6.22 – 6.437     12
6.437 – 6.655    5
6.655 – 6.872    11
6.872 – 7.09     5
7.09 – 7.307     5
7.307 – 7.525    3
7.525 – 7.742    4
7.742 – 7.96     1
7.96 – 8.177     1
8.177 – 8.395    0
8.395 – 8.612    1
8.612 – 8.83     0
8.83 – 9.047     0
9.047 – 9.265    0
9.265 – 9.482    1
9.482 – 9.7      1
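
The weeks and hours columns share the same null count, skew, and outlier count, which hints that one is a rescaling of the other. The sketch below tests that idea against the reported summary statistics only, under the assumption of a 40-hour work week and rounding to one decimal place. Neither detail is documented in the report, so `HOURS_PER_WEEK` and `ROUNDING_SLACK` are hypothetical.

```python
# Summary statistics copied from the two stats tables above.
hours = {"min": 41, "q1": 99, "median": 113, "mean": 122.9, "q3": 135, "max": 387}
weeks = {"min": 1.0, "q1": 2.5, "median": 2.8, "mean": 3.072, "q3": 3.4, "max": 9.7}

HOURS_PER_WEEK = 40    # assumption: a standard full-time week
ROUNDING_SLACK = 0.05  # assumption: weeks are rounded to one decimal place

# Compare each weeks statistic with the value implied by the hours column.
gaps = {}
for stat, hours_value in hours.items():
    implied_weeks = hours_value / HOURS_PER_WEEK
    gaps[stat] = abs(implied_weeks - weeks[stat])
    print(f"{stat:>6}: implied {implied_weeks:.4f}, reported {weeks[stat]}")

consistent = all(gap <= ROUNDING_SLACK for gap in gaps.values())
print("consistent with hours / 40:", consistent)
```

Agreement at the level of summary statistics is suggestive rather than conclusive. A row-level comparison on the raw CSV would be needed to confirm that the columns are redundant.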

How to cite

BibTeX
@misc{saturn-housing-housing-crisis-merged-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: housing housing crisis merged},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/housing-housing_crisis_merged}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: housing housing crisis merged. Source: /home/coolhand/datasets/us-inequality-atlas/housing/housing_crisis_merged.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/housing-housing_crisis_merged