
nyc housing · nyc housing metrics merged

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv

Saturn profiled 2,327 rows across 23 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

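%pip install saturn-dissect
import subprocess

CSV_PATH = "/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv"
# Profile the CSV, write the machine-readable findings to JSON, and opt in to LLM prose
subprocess.run(
    [
        "saturn", "analyze", CSV_PATH,
        "--findings", "nyc_housing-nyc_housing_metrics_merged.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if saturn exits non-zero
)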

Summary confidence: high

This dataset covers 2,327 NYC census tracts with 23 columns describing housing tenure, rent burden, income, and rent levels across the five boroughs. The most urgent issue is data hygiene: median_gross_rent and median_household_income both contain a sentinel value of -666666666, which drags their means to roughly -41.5M and -36M respectively despite sensible medians ($1,735 rent, $76,833 income); these values need to be nulled or filtered before any analysis. Beyond that, the substantive story is rent burden: pct_rent_burdened has a median of 50% with an IQR of 40.9–58.8, meaning that in about half of NYC tracts, at least half of renter households pay 30% or more of their income on rent. Brooklyn (Kings) leads the tract count at 34.6%, followed by Queens (31.2%) and the Bronx (15.5%), so pooled citywide figures lean toward those boroughs and borough-level comparisons should account for the imbalance. The state column is constant (all 36, the FIPS code for New York) and can be dropped.

citing: median_gross_rent · median_household_income · pct_rent_burdened · pct_severe_burden · pct_owner_occupied · county_name · state · total_households
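A minimal pandas sketch of the sentinel cleanup, assuming the CSV has been read into a DataFrame with the column names profiled here. `SENTINEL`, `SENTINEL_COLUMNS`, and `clean_sentinels` are illustrative names, not part of saturn, and the sketch assumes -666666666 is the only missing-value code in play.

```python
import pandas as pd

# Sentinel seen in both dollar columns; assumption: it is the only missing-value code in play
SENTINEL = -666666666
SENTINEL_COLUMNS = ["median_gross_rent", "median_household_income"]


def clean_sentinels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the sentinel replaced by NaN in the dollar columns."""
    out = df.copy()
    for col in SENTINEL_COLUMNS:
        # Series.mask swaps values where the condition holds for NaN (the column becomes float)
        out[col] = out[col].mask(out[col] == SENTINEL)
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({
        "median_gross_rent": [1735, SENTINEL, 2049],
        "median_household_income": [76833, 50000, SENTINEL],
    })
    cleaned = clean_sentinels(demo)
    print(cleaned["median_gross_rent"].mean())  # 1892.0
```

On the real file, `clean_sentinels(pd.read_csv(path))` should leave both columns with positive means in a plausible dollar range; any remaining negative values would point to a second sentinel code.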

saturn.schema() · 23 columns

column kind n null% unique alerts
total_renter_households numeric 2,327 0.0% 1,418
rent_30_to_34_9_pct numeric 2,327 0.0% 355 high_skew outliers
rent_35_to_39_9_pct numeric 2,327 0.0% 270 high_skew
rent_40_to_49_9_pct numeric 2,327 0.0% 322 high_skew
rent_50_pct_or_more numeric 2,327 0.0% 706
NAME text 2,327 0.0% 2,327 near_unique
state numeric 2,327 0.0% 1 constant
county numeric 2,327 0.0% 5
tract numeric 2,327 0.0% 1,530 high_skew
county_name categorical 2,327 0.0% 5
moderate_burden numeric 2,327 0.0% 639
severe_burden numeric 2,327 0.0% 706
pct_moderate_burden numeric 2,327 4.4% 461
pct_severe_burden numeric 2,327 4.4% 518
rent_burdened numeric 2,327 0.0% 1,013
pct_rent_burdened numeric 2,327 4.4% 596
median_gross_rent numeric 2,327 0.0% 1,232 high_skew outliers
median_household_income numeric 2,327 0.0% 2,106 high_skew outliers
total_households numeric 2,327 0.0% 1,495
owner_occupied numeric 2,327 0.0% 1,001 outliers
renter_occupied numeric 2,327 0.0% 1,418
pct_owner_occupied numeric 2,327 4.1% 823
pct_renter_occupied numeric 2,327 4.1% 823
Fig 1.
county_name · Tract counts by borough; Brooklyn and Queens together hold 65.8% of the sample.
Top values for county_name (5 unique shown, of 5 total).
value                     count  share
Brooklyn (Kings)          805    34.6%
Queens                    725    31.2%
Bronx                     361    15.5%
Manhattan (New York)      310    13.3%
Staten Island (Richmond)  126    5.4%
Fig 2.
pct_rent_burdened · Share of renters paying 30%+ of income; median is 50%, showing widespread burden.
Histogram bins for pct_rent_burdened (median: 50.0).
bin         count
0 – 2.5     8
2.5 – 5     3
5 – 7.5     1
7.5 – 10    5
10 – 12.5   7
12.5 – 15   8
15 – 17.5   12
17.5 – 20   14
20 – 22.5   14
22.5 – 25   24
25 – 27.5   35
27.5 – 30   42
30 – 32.5   53
32.5 – 35   80
35 – 37.5   91
37.5 – 40   119
40 – 42.5   129
42.5 – 45   144
45 – 47.5   146
47.5 – 50   177
50 – 52.5   139
52.5 – 55   178
55 – 57.5   162
57.5 – 60   131
60 – 62.5   117
62.5 – 65   97
65 – 67.5   60
67.5 – 70   57
70 – 72.5   54
72.5 – 75   28
75 – 77.5   20
77.5 – 80   25
80 – 82.5   8
82.5 – 85   4
85 – 87.5   12
87.5 – 90   5
90 – 92.5   5
92.5 – 95   3
95 – 97.5   0
97.5 – 100  8
Fig 3.
pct_severe_burden · Severe burden (50%+ of income on rent); the right tail is moderate (skew 0.57) but reaches 100%.
Histogram bins for pct_severe_burden (median: 26.2).
bin         count
0 – 2.5     45
2.5 – 5     14
5 – 7.5     41
7.5 – 10    53
10 – 12.5   94
12.5 – 15   115
15 – 17.5   131
17.5 – 20   160
20 – 22.5   170
22.5 – 25   188
25 – 27.5   188
27.5 – 30   168
30 – 32.5   173
32.5 – 35   157
35 – 37.5   115
37.5 – 40   97
40 – 42.5   73
42.5 – 45   62
45 – 47.5   44
47.5 – 50   35
50 – 52.5   29
52.5 – 55   19
55 – 57.5   18
57.5 – 60   12
60 – 62.5   6
62.5 – 65   4
65 – 67.5   4
67.5 – 70   2
70 – 72.5   1
72.5 – 75   1
75 – 77.5   1
77.5 – 80   1
80 – 82.5   0
82.5 – 85   1
85 – 87.5   1
87.5 – 90   0
90 – 92.5   1
92.5 – 95   0
95 – 97.5   0
97.5 – 100  1
Fig 4.
pct_owner_occupied · Owner-occupancy share varies widely (IQR 16–56%); useful for segmenting tracts.
Histogram bins for pct_owner_occupied (median: 34.4).
bin         count
0 – 2.5     141
2.5 – 5     86
5 – 7.5     71
7.5 – 10    63
10 – 12.5   86
12.5 – 15   65
15 – 17.5   72
17.5 – 20   88
20 – 22.5   98
22.5 – 25   88
25 – 27.5   79
27.5 – 30   67
30 – 32.5   70
32.5 – 35   56
35 – 37.5   76
37.5 – 40   68
40 – 42.5   59
42.5 – 45   72
45 – 47.5   58
47.5 – 50   54
50 – 52.5   70
52.5 – 55   59
55 – 57.5   55
57.5 – 60   39
60 – 62.5   44
62.5 – 65   40
65 – 67.5   52
67.5 – 70   34
70 – 72.5   40
72.5 – 75   47
75 – 77.5   32
77.5 – 80   48
80 – 82.5   31
82.5 – 85   27
85 – 87.5   26
87.5 – 90   24
90 – 92.5   23
92.5 – 95   8
95 – 97.5   6
97.5 – 100  9
Fig 5.
total_households · Tract size distribution; right-skewed, so compare tracts using rates or shares rather than raw counts.
Histogram bins for total_households (median: 1252.0).
bin            count
0 – 205.2      123
205.2 – 410.4  41
410.4 – 615.7  203
615.7 – 820.9  272
820.9 – 1026   269
1026 – 1231    237
1231 – 1437    215
1437 – 1642    221
1642 – 1847    162
1847 – 2052    134
2052 – 2257    94
2257 – 2463    101
2463 – 2668    66
2668 – 2873    39
2873 – 3078    35
3078 – 3284    24
3284 – 3489    22
3489 – 3694    8
3694 – 3899    7
3899 – 4104    9
4104 – 4310    13
4310 – 4515    9
4515 – 4720    5
4720 – 4925    5
4925 – 5131    3
5131 – 5336    2
5336 – 5541    2
5541 – 5746    0
5746 – 5952    1
5952 – 6157    1
6157 – 6362    0
6362 – 6567    0
6567 – 6772    1
6772 – 6978    2
6978 – 7183    0
7183 – 7388    0
7388 – 7593    0
7593 – 7799    0
7799 – 8004    0
8004 – 8209    1
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column                   kind         null %
total_renter_households  numeric      0.0%
rent_30_to_34_9_pct      numeric      0.0%
rent_35_to_39_9_pct      numeric      0.0%
rent_40_to_49_9_pct      numeric      0.0%
rent_50_pct_or_more      numeric      0.0%
NAME                     text         0.0%
state                    numeric      0.0%
county                   numeric      0.0%
tract                    numeric      0.0%
county_name              categorical  0.0%
moderate_burden          numeric      0.0%
severe_burden            numeric      0.0%
pct_moderate_burden      numeric      4.4%
pct_severe_burden        numeric      4.4%
rent_burdened            numeric      0.0%
pct_rent_burdened        numeric      4.4%
median_gross_rent        numeric      0.0%
median_household_income  numeric      0.0%
total_households         numeric      0.0%
owner_occupied           numeric      0.0%
renter_occupied          numeric      0.0%
pct_owner_occupied       numeric      4.1%
pct_renter_occupied      numeric      4.1%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
Columns, in row order: (1) total_renter_households, (2) rent_30_to_34_9_pct, (3) rent_35_to_39_9_pct, (4) rent_40_to_49_9_pct, (5) rent_50_pct_or_more, (6) state, (7) county, (8) tract, (9) moderate_burden, (10) severe_burden, (11) pct_moderate_burden, (12) pct_severe_burden. The state row and column are nan because a constant column has zero variance, which leaves Pearson correlation undefined.
       (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)
(1)  +1.00 +0.76 +0.73 +0.76 +0.84   nan -0.18 -0.23 +0.89 +0.84 -0.03 +0.07
(2)  +0.76 +1.00 +0.55 +0.57 +0.56   nan -0.14 -0.18 +0.87 +0.56 -0.04 +0.09
(3)  +0.73 +0.55 +1.00 +0.60 +0.61   nan -0.13 -0.12 +0.81 +0.61 -0.01 +0.04
(4)  +0.76 +0.57 +0.60 +1.00 +0.62   nan -0.15 -0.15 +0.85 +0.62 -0.02 +0.03
(5)  +0.84 +0.56 +0.61 +0.62 +1.00   nan -0.32 -0.21 +0.70 +1.00 +0.03 +0.11
(6)    nan   nan   nan   nan   nan   nan   nan   nan   nan   nan   nan   nan
(7)  -0.18 -0.14 -0.13 -0.15 -0.32   nan +1.00 +0.18 -0.16 -0.32 -0.14 -0.07
(8)  -0.23 -0.18 -0.12 -0.15 -0.21   nan +0.18 +1.00 -0.18 -0.21 -0.04 -0.04
(9)  +0.89 +0.87 +0.81 +0.85 +0.70   nan -0.16 -0.18 +1.00 +0.70 -0.03 +0.06
(10) +0.84 +0.56 +0.61 +0.62 +1.00   nan -0.32 -0.21 +0.70 +1.00 +0.03 +0.11
(11) -0.03 -0.04 -0.01 -0.02 +0.03   nan -0.14 -0.04 -0.03 +0.03 +1.00 -0.26
(12) +0.07 +0.09 +0.04 +0.03 +0.11   nan -0.07 -0.04 +0.06 +0.11 -0.26 +1.00

total_renter_households numeric feature

This column counts renter households per census tract, ranging from 0 to 8,209 with a median of 726 and a mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63), with 69 outliers (2.97%) on the high end and 4.38% zero values (102 tracts with no renter households). There are no nulls.

Treatment: log1p-transform before regression to tame the right skew (a plain log is undefined at the zero rows), and decide whether zero-count tracts should be modelled separately.

anthropic:claude-opus-4-7 · confidence high
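A sketch of the suggested treatment for this and the other right-skewed count columns, assuming pandas and NumPy. It uses `numpy.log1p` rather than a plain log because log(0) is undefined for the 4.4% zero rows, and it keeps a zero indicator so a model can treat zero-renter tracts separately. The helper name `log1p_with_zero_flag` and the output column suffixes are hypothetical.

```python
import numpy as np
import pandas as pd


def log1p_with_zero_flag(s: pd.Series) -> pd.DataFrame:
    """log1p-transform a non-negative count column and keep a 0/1 flag for exact zeros."""
    if (s.dropna() < 0).any():
        raise ValueError("log1p expects non-negative counts")
    return pd.DataFrame({
        f"{s.name}_log1p": np.log1p(s),             # log(1 + x), defined at x = 0
        f"{s.name}_is_zero": (s == 0).astype(int),  # marks tracts with a zero count
    })


if __name__ == "__main__":
    renters = pd.Series([0, 346, 726, 1357, 8209], name="total_renter_households")
    print(log1p_with_zero_flag(renters).round(2))
```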

saturn.columns["total_renter_households"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,418
min 0
max 8,209
mean 946.1
median 726
std 815.4
q1 346
q3 1,357
iqr 1,011
skew 1.595
kurtosis 4.627
n_outliers 69
outlier_rate 0.02965
zero_rate 0.04383
Fig 8.
Distribution of total_renter_households. Vertical dash marks the median.
Histogram bins for total_renter_households (median: 726.0).
bin            count
0 – 205.2      349
205.2 – 410.4  358
410.4 – 615.7  292
615.7 – 820.9  268
820.9 – 1026   207
1026 – 1231    175
1231 – 1437    168
1437 – 1642    110
1642 – 1847    100
1847 – 2052    68
2052 – 2257    63
2257 – 2463    42
2463 – 2668    36
2668 – 2873    22
2873 – 3078    19
3078 – 3284    17
3284 – 3489    6
3489 – 3694    5
3694 – 3899    4
3899 – 4104    6
4104 – 4310    5
4310 – 4515    3
4515 – 4720    1
4720 – 4925    0
4925 – 5131    1
5131 – 5336    1
5336 – 5541    0
5541 – 5746    0
5746 – 5952    0
5952 – 6157    0
6157 – 6362    0
6362 – 6567    0
6567 – 6772    0
6772 – 6978    0
6978 – 7183    0
7183 – 7388    0
7388 – 7593    0
7593 – 7799    0
7799 – 8004    0
8004 – 8209    1

rent_30_to_34_9_pct numeric feature

Despite the _pct suffix, this is almost certainly a count of renter households per tract paying 30–34.9% of income on rent: values are integer-like with a zero floor and a max of 1,205, which rules out a percentage. The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86), with a median of 51 against a mean of 83.05, and 16.2% of rows are exactly zero. 124 outliers (5.33%) extend far above the Q3 of 116, consistent with a few very large tracts dominating the tail.

Treatment: log1p-transform before modelling to tame the skew and zero inflation.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["rent_30_to_34_9_pct"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 355
min 0
max 1,205
mean 83.05
median 51
std 100.3
q1 15
q3 116
iqr 101
skew 2.755
kurtosis 13.86
n_outliers 124
outlier_rate 0.05329
zero_rate 0.1616
alert: high_skew (skew = +2.76)
alert: outliers (5.3% of rows beyond 1.5 × IQR)
Fig 9.
Distribution of rent_30_to_34_9_pct. Vertical dash marks the median.
Histogram bins for rent_30_to_34_9_pct (median: 51.0).
bin            count
0 – 30.12      836
30.12 – 60.25  444
60.25 – 90.38  275
90.38 – 120.5  217
120.5 – 150.6  152
150.6 – 180.8  105
180.8 – 210.9  90
210.9 – 241    48
241 – 271.1    41
271.1 – 301.2  25
301.2 – 331.4  17
331.4 – 361.5  21
361.5 – 391.6  15
391.6 – 421.8  9
421.8 – 451.9  11
451.9 – 482    5
482 – 512.1    2
512.1 – 542.2  2
542.2 – 572.4  4
572.4 – 602.5  1
602.5 – 632.6  0
632.6 – 662.8  1
662.8 – 692.9  1
692.9 – 723    0
723 – 753.1    1
753.1 – 783.2  2
783.2 – 813.4  0
813.4 – 843.5  0
843.5 – 873.6  1
873.6 – 903.8  0
903.8 – 933.9  0
933.9 – 964    0
964 – 994.1    0
994.1 – 1024   0
1024 – 1054    0
1054 – 1084    0
1084 – 1115    0
1115 – 1145    0
1145 – 1175    0
1175 – 1205    1

rent_35_to_39_9_pct numeric feature

Despite the _pct suffix, this is a count of renter households per tract paying 35–39.9% of income on rent (the max of 633 rules out a percentage). The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with a median of 35, and nearly 20% of rows are zero (zero_rate 0.196), so many tracts have no households in this rent-burden bracket. 110 outliers (4.7%) sit well above the Q3 of 83.

Treatment: Log1p-transform before regression to tame the right skew and zero inflation.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["rent_35_to_39_9_pct"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 270
min 0
max 633
mean 58.35
median 35
std 69.85
q1 10
q3 83
iqr 73
skew 2.395
kurtosis 9.275
n_outliers 110
outlier_rate 0.04727
zero_rate 0.1964
alert: high_skew (skew = +2.40)
Fig 10.
Distribution of rent_35_to_39_9_pct. Vertical dash marks the median.
Histogram bins for rent_35_to_39_9_pct (median: 35.0).
bin            count
0 – 15.82      719
15.82 – 31.65  371
31.65 – 47.47  261
47.47 – 63.3   212
63.3 – 79.12   163
79.12 – 94.95  103
94.95 – 110.8  97
110.8 – 126.6  77
126.6 – 142.4  77
142.4 – 158.2  58
158.2 – 174.1  32
174.1 – 189.9  41
189.9 – 205.7  20
205.7 – 221.5  18
221.5 – 237.4  15
237.4 – 253.2  10
253.2 – 269    14
269 – 284.8    5
284.8 – 300.7  3
300.7 – 316.5  8
316.5 – 332.3  6
332.3 – 348.1  2
348.1 – 364    2
364 – 379.8    2
379.8 – 395.6  1
395.6 – 411.4  1
411.4 – 427.3  2
427.3 – 443.1  2
443.1 – 458.9  0
458.9 – 474.8  0
474.8 – 490.6  1
490.6 – 506.4  0
506.4 – 522.2  0
522.2 – 538    0
538 – 553.9    0
553.9 – 569.7  2
569.7 – 585.5  0
585.5 – 601.4  1
601.4 – 617.2  0
617.2 – 633    1

rent_40_to_49_9_pct numeric feature

Despite the _pct suffix, this is a count of renter households per tract paying 40–49.9% of income on rent. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14), with a median of 49, a max of 740, and 111 outliers (4.77%); 15.6% of rows are zero, consistent with small tracts sitting alongside dense ones.

Treatment: log1p-transform before modelling to tame the right skew and zero mass.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["rent_40_to_49_9_pct"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 322
min 0
max 740
mean 74.68
median 49
std 83.79
q1 14
q3 106
iqr 92
skew 2.137
kurtosis 7.139
n_outliers 111
outlier_rate 0.0477
zero_rate 0.1556
alert: high_skew (skew = +2.14)
Fig 11.
Distribution of rent_40_to_49_9_pct. Vertical dash marks the median.
Histogram bins for rent_40_to_49_9_pct (median: 49.0).
bin           count
0 – 18.5      671
18.5 – 37     306
37 – 55.5     270
55.5 – 74     201
74 – 92.5     196
92.5 – 111    134
111 – 129.5   102
129.5 – 148   86
148 – 166.5   78
166.5 – 185   63
185 – 203.5   39
203.5 – 222   38
222 – 240.5   26
240.5 – 259   27
259 – 277.5   16
277.5 – 296   16
296 – 314.5   13
314.5 – 333   11
333 – 351.5   5
351.5 – 370   3
370 – 388.5   5
388.5 – 407   3
407 – 425.5   4
425.5 – 444   1
444 – 462.5   4
462.5 – 481   0
481 – 499.5   1
499.5 – 518   2
518 – 536.5   0
536.5 – 555   0
555 – 573.5   2
573.5 – 592   0
592 – 610.5   1
610.5 – 629   1
629 – 647.5   1
647.5 – 666   0
666 – 684.5   0
684.5 – 703   0
703 – 721.5   0
721.5 – 740   1

rent_50_pct_or_more numeric feature

Counts of renter households spending 50% or more of income on rent, one row per tract, across 2,327 rows with no nulls. The distribution is right-skewed (skew 1.60, kurtosis 3.44), with a median of 184 well below the mean of 253.18, a max of 1,918, and 6.27% zeros. 87 values (3.74%) fall outside the Tukey fence (1.5 × IQR).

Treatment: Log1p-transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["rent_50_pct_or_more"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 706
min 0
max 1,918
mean 253.2
median 184
std 236.6
q1 82
q3 360
iqr 278
skew 1.603
kurtosis 3.435
n_outliers 87
outlier_rate 0.03739
zero_rate 0.06274
Fig 12.
Distribution of rent_50_pct_or_more. Vertical dash marks the median.
Histogram bins for rent_50_pct_or_more (median: 184.0).
bin            count
0 – 47.95      368
47.95 – 95.9   293
95.9 – 143.9   290
143.9 – 191.8  249
191.8 – 239.8  186
239.8 – 287.7  175
287.7 – 335.7  122
335.7 – 383.6  114
383.6 – 431.6  101
431.6 – 479.5  83
479.5 – 527.5  62
527.5 – 575.4  45
575.4 – 623.4  48
623.4 – 671.3  31
671.3 – 719.2  41
719.2 – 767.2  28
767.2 – 815.2  15
815.2 – 863.1  12
863.1 – 911.1  14
911.1 – 959    7
959 – 1007     9
1007 – 1055    8
1055 – 1103    9
1103 – 1151    3
1151 – 1199    4
1199 – 1247    5
1247 – 1295    1
1295 – 1343    2
1343 – 1391    0
1391 – 1438    0
1438 – 1486    0
1486 – 1534    0
1534 – 1582    0
1582 – 1630    1
1630 – 1678    0
1678 – 1726    0
1726 – 1774    0
1774 – 1822    0
1822 – 1870    0
1870 – 1918    1

NAME text identifier

This column holds fully-qualified census tract names for New York City: all 2,327 values are unique, there are no nulls, and lengths are tightly bounded (38–46 characters, median 41). The vocabulary is formulaic: 'new', 'york', 'census', 'tract', and 'county;' appear in essentially every row, and the borough tokens split as Kings (805), Queens (725), Bronx (361), and Richmond (126); Manhattan's 310 tracts appear as 'New York County', whose tokens overlap the state name. Because each value labels exactly one tract, it functions as a geographic key rather than a modelling feature.

Treatment: Treat as a tract-level key; parse out borough or join to a geo table rather than feeding the raw string to a model.

anthropic:claude-opus-4-7 · confidence high
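A sketch of parsing NAME into its parts with a regular expression. The layout `Census Tract <id>; <County> County; New York` is an assumption inferred from the 'county;' token; a single-digit tract in Bronx or Kings County would then give a 38-character string, which matches the observed len_min. `NAME_PATTERN` and `parse_tract_name` are hypothetical, and the helper raises rather than guessing when a row does not fit.

```python
import re

# Assumed layout, inferred from the 'county;' token: "Census Tract <id>; <County> County; <State>"
NAME_PATTERN = re.compile(
    r"^Census Tract (?P<tract>[\d.]+); (?P<county>.+?) County; (?P<state>.+)$"
)


def parse_tract_name(name: str) -> dict:
    """Split a tract NAME into tract label, county, and state; raise if the layout differs."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected NAME layout: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_tract_name("Census Tract 1; Bronx County; New York"))
    # {'tract': '1', 'county': 'Bronx', 'state': 'New York'}
```

Run it over the whole column first: any ValueError means the assumed layout is wrong for some rows.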

saturn.columns["NAME"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 2,327
len_min 38
len_max 46
len_mean 41.65
len_median 41
len_p95 46
word_mean 7.133
word_median 7
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 1,539
readability_flesch_mean 91.45
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0
alert: near_unique (100.0% of rows are unique strings)
Fig 13.
Character-length distribution for NAME.
Character-length distribution for NAME (mean: 41.65). The 40 bins are 0.2 characters wide, so only one bin per whole length is non-empty:
chars  count
38     7
39     104
40     785
41     447
42     200
43     378
44     190
45     82
46     134

state numeric metadata

The column 'state' is numeric but holds the single value 36, the FIPS code for New York State, in all 2,327 rows, with zero variance and no nulls. It carries no information for modelling beyond serving as the state prefix of a full tract GEOID.

Treatment: Drop; it is a constant column.

anthropic:claude-opus-4-7 · confidence high
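A small sketch for catching constant columns like state before modelling, assuming pandas; `constant_columns` is a hypothetical helper. It counts distinct non-null values, so an all-null column is flagged too.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Names of columns with at most one distinct non-null value (all-null columns included)."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]


if __name__ == "__main__":
    demo = pd.DataFrame({"state": [36, 36, 36], "county": [5, 47, 81]})
    to_drop = constant_columns(demo)
    print(to_drop)                                       # ['state']
    print(demo.drop(columns=to_drop).columns.tolist())  # ['county']
```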

saturn.columns["state"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1
min 36
max 36
mean 36
median 36
std 0
q1 36
q3 36
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant (only one distinct value)
Fig 14.
Distribution of state. Vertical dash marks the median.
Histogram bins for state (median: 36.0).
bin                     count
35.5 – 36 (20 bins)     0
36 – 36.02              2327
36.02 – 36.5 (19 bins)  0

county numeric feature

County identifier stored as a number, with only 5 distinct values across 2,327 rows and no nulls. The values are the county FIPS codes of the five boroughs (5 Bronx, 47 Kings, 61 New York, 81 Queens, 85 Richmond), and their frequencies in Fig 15 match the county_name counts, so they are categorical codes rather than a measurement; the mean (55) and skew (-0.72) only reflect how the tracts are spread across the 5 codes.

Treatment: Cast to categorical and one-hot or target-encode before modelling.

anthropic:claude-opus-4-7 · confidence high
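A sketch of the cast-and-encode step, assuming pandas. The mapping uses the standard county FIPS codes for the five boroughs, with labels copied from county_name; `BOROUGH_BY_FIPS` and `encode_county` are hypothetical names. Fixing the category list keeps all five dummy columns even when a subset of rows lacks some borough.

```python
import pandas as pd

# Standard county FIPS codes for the five boroughs; labels copied from county_name
BOROUGH_BY_FIPS = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}


def encode_county(df: pd.DataFrame) -> pd.DataFrame:
    """Map numeric county codes to borough labels, then one-hot encode all five boroughs."""
    labels = df["county"].map(BOROUGH_BY_FIPS)
    if labels.isna().any():
        raise ValueError("county code outside the five NYC boroughs")
    # A fixed category list keeps every borough column, even if a subset lacks one
    labels = labels.astype(pd.CategoricalDtype(categories=list(BOROUGH_BY_FIPS.values())))
    return pd.get_dummies(labels, prefix="borough", dtype=int)


if __name__ == "__main__":
    demo = pd.DataFrame({"county": [5, 47, 47, 85]})
    print(encode_county(demo))
```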

saturn.columns["county"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 5
min 5
max 85
mean 55
median 47
std 25.97
q1 47
q3 81
iqr 34
skew -0.72
kurtosis -0.4531
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 15.
Distribution of county. Vertical dash marks the median.
Histogram bins for county (median: 47.0).
bin (width 2)     count
5 – 7             361
7 – 47 (20 bins)  0
47 – 49           805
49 – 61 (6 bins)  0
61 – 63           310
63 – 81 (9 bins)  0
81 – 83           725
83 – 85           126

tract numeric identifier

This is a U.S. Census tract code stored as a number, with 1,530 unique values across 2,327 rows (the same tract number recurs in different counties) and no nulls. Tract codes are six digits with an implied decimal point, so 30100 is tract 301 and 990100 is tract 9901; the severe right skew (skew 10.14, kurtosis 189.8, median 30,100, max 990,100) is an artifact of the numbering, not a magnitude. The 63 flagged outliers (2.7%) are most likely tracts in counties whose numbering runs high, plus the three codes near 990100, which are likely water-area tracts (the Census Bureau numbers these in the 9900 series), not data errors.

Treatment: Treat as a categorical geographic code; cast to zero-padded string and join to tract-level reference data rather than using as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
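A sketch of the zero-padding step, assuming pandas. It builds the 11-digit tract GEOID used by Census reference files (2-digit state + 3-digit county + 6-digit tract), which also makes each key unique even though the same tract number recurs across counties; `tract_geoid` is a hypothetical helper name.

```python
import pandas as pd


def tract_geoid(df: pd.DataFrame) -> pd.Series:
    """11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract, zero-padded."""
    state = df["state"].astype(int).astype(str).str.zfill(2)
    county = df["county"].astype(int).astype(str).str.zfill(3)
    tract = df["tract"].astype(int).astype(str).str.zfill(6)
    return state + county + tract


if __name__ == "__main__":
    demo = pd.DataFrame({"state": [36, 36], "county": [5, 47], "tract": [100, 990100]})
    print(tract_geoid(demo).tolist())  # ['36005000100', '36047990100']
```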

saturn.columns["tract"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,530
min 100
max 990,100
mean 4.225e+04
median 30,100
std 4.827e+04
q1 15,200
q3 5.79e+04
iqr 4.27e+04
skew 10.14
kurtosis 189.8
n_outliers 63
outlier_rate 0.02707
zero_rate 0
alert: high_skew (skew = +10.14)
Fig 16.
Distribution of tract. Vertical dash marks the median.
Histogram bins for tract (median: 30100.0).
bin                              count
100 – 2.485e+04                  982
2.485e+04 – 4.96e+04             617
4.96e+04 – 7.435e+04             329
7.435e+04 – 9.91e+04             197
9.91e+04 – 1.238e+05             145
1.238e+05 – 1.486e+05            37
1.486e+05 – 1.734e+05            17
1.734e+05 – 9.654e+05 (32 bins)  0
9.654e+05 – 9.901e+05            3

county_name categorical feature

This column lists New York City borough/county names across 2,327 rows, with exactly 5 unique values and no nulls. The counts reflect how many census tracts each borough contains: Brooklyn (Kings) leads with 805 (34.6%), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). An entropy ratio of 0.90 (2.086 bits out of a possible log2(5) ≈ 2.32) indicates a fairly balanced spread with no extreme concentration.

Treatment: One-hot or target-encode for modelling.

anthropic:claude-opus-4-7 · confidence high
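The reported entropy and entropy_ratio can be reproduced from the Fig 1 counts, assuming saturn measures entropy in bits and divides by log2 of the number of categories (this notebook does not state its formula). `entropy_bits` is a hypothetical helper.

```python
import math


def entropy_bits(counts: list) -> tuple:
    """Shannon entropy in bits, and its ratio to the maximum log2(number of categories)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # zero-count categories contribute nothing
    h = -sum(p * math.log2(p) for p in probs)
    return h, h / math.log2(len(counts))


if __name__ == "__main__":
    h, ratio = entropy_bits([805, 725, 361, 310, 126])
    print(round(h, 3), round(ratio, 4))  # 2.086 0.8985
```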

saturn.columns["county_name"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 5
top_value Brooklyn (Kings)
top_rate 0.3459
cardinality 5
entropy 2.086
entropy_ratio 0.8985
Fig 17.
Top values for county_name.
Top values for county_name (5 unique shown, of 5 total).
Same values, counts, and shares as the Fig 1 table.

moderate_burden numeric feature

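A non-negative integer count of renter households per tract under moderate burden, spanning 0 to 1,732 with a median of 159 and a mean of 216 across 2,327 rows, no nulls. Its mean equals the sum of the three bracket means (83.05 + 58.35 + 74.68 = 216.08), which suggests it is the row-wise sum of the 30–49.9% brackets. The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% zeros: a long tail of high-burden tracts above a bulk that sits below about 311 (Q3).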

Treatment: Apply a log1p transform before regression to tame the right-skew and outliers.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["moderate_burden"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 639
min 0
max 1,732
mean 216.1
median 159
std 210.4
q1 64
q3 311
iqr 247
skew 1.934
kurtosis 6.052
n_outliers 86
outlier_rate 0.03696
zero_rate 0.06403
Fig 18.
Distribution of moderate_burden. Vertical dash marks the median.
Histogram bins for moderate_burden (median: 159.0).
bin            count
0 – 43.3       431
43.3 – 86.6    317
86.6 – 129.9   256
129.9 – 173.2  245
173.2 – 216.5  190
216.5 – 259.8  149
259.8 – 303.1  137
303.1 – 346.4  109
346.4 – 389.7  105
389.7 – 433    89
433 – 476.3    61
476.3 – 519.6  53
519.6 – 562.9  28
562.9 – 606.2  33
606.2 – 649.5  28
649.5 – 692.8  17
692.8 – 736.1  11
736.1 – 779.4  16
779.4 – 822.7  7
822.7 – 866    10
866 – 909.3    6
909.3 – 952.6  9
952.6 – 995.9  2
995.9 – 1039   0
1039 – 1082    3
1082 – 1126    6
1126 – 1169    1
1169 – 1212    0
1212 – 1256    1
1256 – 1299    1
1299 – 1342    1
1342 – 1386    0
1386 – 1429    0
1429 – 1472    0
1472 – 1516    0
1516 – 1559    3
1559 – 1602    0
1602 – 1645    1
1645 – 1689    0
1689 – 1732    1

severe_burden numeric feature

Count of renter households per tract under severe burden, with 2,327 rows, no nulls, and 706 unique integer values ranging from 0 to 1,918 (median 184, mean 253.18). The distribution is right-skewed (skew 1.60, kurtosis 3.44) with 6.27% zeros and 87 outliers (3.74%) above the upper whisker. Every statistic and histogram bin matches rent_50_pct_or_more exactly, and the two correlate at +1.00 in Fig 7, so this column is almost certainly a renamed copy of it.

Treatment: Keep only one of severe_burden and rent_50_pct_or_more, then apply a log1p transform before regression to tame the right skew and outliers.

anthropic:claude-opus-4-7 · confidence high
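A sketch for confirming the duplication with pandas before dropping a column; `identical_column_pairs` is a hypothetical helper. `Series.equals` requires matching dtypes and treats NaN in the same position as equal, so it is a stricter test than a +1.00 correlation.

```python
import pandas as pd


def identical_column_pairs(df: pd.DataFrame) -> list:
    """Pairs of columns whose values (and dtypes) match row for row."""
    cols = list(df.columns)
    return [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if df[a].equals(df[b])  # NaN in the same position counts as equal
    ]


if __name__ == "__main__":
    demo = pd.DataFrame({
        "rent_50_pct_or_more": [82, 184, 360],
        "severe_burden": [82, 184, 360],
        "moderate_burden": [64, 159, 311],
    })
    print(identical_column_pairs(demo))  # [('rent_50_pct_or_more', 'severe_burden')]
```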

saturn.columns["severe_burden"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 706
min 0
max 1,918
mean 253.2
median 184
std 236.6
q1 82
q3 360
iqr 278
skew 1.603
kurtosis 3.435
n_outliers 87
outlier_rate 0.03739
zero_rate 0.06274
Fig 19.
Distribution of severe_burden. Vertical dash marks the median.
Histogram bins for severe_burden (median: 184.0).
Identical, bin for bin, to the Fig 12 bins for rent_50_pct_or_more.

pct_moderate_burden numeric feature

This is a percentage feature, most likely the share of renter households under moderate burden, ranging 0–100 with a mean of 22.74 and a median of 21.8. The distribution is right-skewed (skew 1.51, kurtosis 6.70) with 59 outliers (2.65%) and a 4.38% null rate. About 2.1% of rows are exact zeros, and while the IQR is tight at 12.3, the upper tail stretches from q3 = 28.2 all the way to 100.

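Treatment: The 102 nulls match the 102 tracts with zero renter households (total_renter_households zero_rate 0.04383 × 2,327 ≈ 102), so they are most likely undefined 0/0 shares; flag or exclude those tracts rather than impute. Consider a mild transform or winsorization to tame the right tail before modelling.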

anthropic:claude-opus-4-7 · confidence high
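A sketch of recomputing a burden share so that tracts with no renters come out as explicit NaN rather than a divide-by-zero. It assumes pct_moderate_burden is 100 × moderate_burden / total_renter_households; that denominator is a hypothesis (the source may exclude households whose burden was not computed), so compare the result with the published column before relying on it. `burden_share` is a hypothetical name.

```python
import pandas as pd


def burden_share(count: pd.Series, renters: pd.Series) -> pd.Series:
    """Percent of renter households in a burden group; NaN where a tract has no renters.

    Hypothesis: pct_* = 100 * count / total_renter_households (check against the real column).
    """
    # where() turns zero denominators into NaN, so x/0 gives NaN instead of inf
    return 100 * count / renters.where(renters > 0)


if __name__ == "__main__":
    moderate = pd.Series([159, 0, 5])
    renters = pd.Series([726, 0, 10])
    print(burden_share(moderate, renters).round(1).tolist())  # [21.9, nan, 50.0]
```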

saturn.columns["pct_moderate_burden"].stats

stat value
n 2,327
nulls 102 (4.4%)
unique 461
min 0
max 100
mean 22.74
median 21.8
std 11.36
q1 15.9
q3 28.2
iqr 12.3
skew 1.509
kurtosis 6.704
n_outliers 59
outlier_rate 0.02652
zero_rate 0.02112
Fig 20.
Distribution of pct_moderate_burden. Vertical dash marks the median.
Histogram bins for pct_moderate_burden (median: 21.8).
bin         count
0 – 2.5     55
2.5 – 5     24
5 – 7.5     49
7.5 – 10    78
10 – 12.5   108
12.5 – 15   160
15 – 17.5   213
17.5 – 20   251
20 – 22.5   238
22.5 – 25   240
25 – 27.5   193
27.5 – 30   172
30 – 32.5   129
32.5 – 35   79
35 – 37.5   71
37.5 – 40   41
40 – 42.5   31
42.5 – 45   24
45 – 47.5   14
47.5 – 50   12
50 – 52.5   6
52.5 – 55   4
55 – 57.5   4
57.5 – 60   6
60 – 62.5   0
62.5 – 65   3
65 – 67.5   3
67.5 – 70   5
70 – 72.5   0
72.5 – 75   1
75 – 77.5   1
77.5 – 80   1
80 – 82.5   1
82.5 – 85   1
85 – 87.5   0
87.5 – 90   1
90 – 92.5   2
92.5 – 95   1
95 – 97.5   0
97.5 – 100  3

pct_severe_burden numeric feature

A percentage (0–100) capturing the share of renter households under severe burden, with a mean of 27.12, a median of 26.2, and a mild right skew (0.57). Spread is moderate (std 12.68, IQR 15.9) and only 1.35% of rows are flagged as outliers, though the max of 100.0 and the 1.98% zero rate suggest a few extreme records, possibly tracts with very few renter households. Note the 4.38% null rate, which will need handling.

Treatment: Handle the 4.4% nulls (likely tracts with no renter households, where the share is undefined) and use as-is; the mild skew does not require a transform.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["pct_severe_burden"].stats

stat value
n 2,327
nulls 102 (4.4%)
unique 518
min 0
max 100
mean 27.12
median 26.2
std 12.68
q1 18.7
q3 34.6
iqr 15.9
skew 0.5663
kurtosis 1.222
n_outliers 30
outlier_rate 0.01348
zero_rate 0.01978
Fig 21.
Distribution of pct_severe_burden. Vertical dash marks the median.
Histogram bins for pct_severe_burden (median: 26.2).
Identical to the Fig 3 bins (same column).

rent_burdened numeric feature

Count of rent-burdened renter households (paying 30% or more of income on rent) per tract, ranging from 0 to 3,153 with a median of 358 and a mean of 469.26. Its mean equals moderate_burden plus severe_burden (216.1 + 253.2 = 469.3), which suggests it is their row-wise sum. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 outliers (3.5%) and 4.7% exact zeros, so a long tail dominates the upper end.

Treatment: Apply a log1p transform before regression to tame the right skew and zero mass.

anthropic:claude-opus-4-7 · confidence medium
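A sketch for testing the additive relationships suggested by the column means, assuming pandas. The two identities in `IDENTITIES` are hypotheses inferred from the summary statistics, not documented facts, and `identity_mismatches` is a hypothetical helper; a zero mismatch count on the real data would confirm them.

```python
import pandas as pd

# Hypothesised identities, inferred from the means (216.1 ≈ 83.05 + 58.35 + 74.68; 469.3 ≈ 216.1 + 253.2)
IDENTITIES = {
    "moderate_burden": ["rent_30_to_34_9_pct", "rent_35_to_39_9_pct", "rent_40_to_49_9_pct"],
    "rent_burdened": ["moderate_burden", "severe_burden"],
}


def identity_mismatches(df: pd.DataFrame, tol: float = 0.0) -> dict:
    """For each hypothesised total, count rows where it differs from the sum of its parts."""
    return {
        total: int(((df[total] - df[parts].sum(axis=1)).abs() > tol).sum())
        for total, parts in IDENTITIES.items()
    }


if __name__ == "__main__":
    demo = pd.DataFrame({
        "rent_30_to_34_9_pct": [15, 51], "rent_35_to_39_9_pct": [10, 35],
        "rent_40_to_49_9_pct": [14, 49], "moderate_burden": [39, 135],
        "severe_burden": [82, 184], "rent_burdened": [121, 319],
    })
    print(identity_mismatches(demo))  # {'moderate_burden': 0, 'rent_burdened': 0}
```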

saturn.columns["rent_burdened"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,013
min 0
max 3,153
mean 469.3
median 358
std 415.3
q1 164.5
q3 670
iqr 505.5
skew 1.494
kurtosis 3.005
n_outliers 82
outlier_rate 0.03524
zero_rate 0.04727
Fig 22.
Distribution of rent_burdened. Vertical dash marks the median.
Histogram bins for rent_burdened (median: 358.0).
bin            count
0 – 78.83      310
78.83 – 157.7  256
157.7 – 236.5  264
236.5 – 315.3  231
315.3 – 394.1  190
394.1 – 473    180
473 – 551.8    147
551.8 – 630.6  113
630.6 – 709.4  108
709.4 – 788.2  75
788.2 – 867.1  91
867.1 – 945.9  73
945.9 – 1025   57
1025 – 1104    39
1104 – 1182    41
1182 – 1261    23
1261 – 1340    26
1340 – 1419    20
1419 – 1498    19
1498 – 1576    11
1576 – 1655    16
1655 – 1734    6
1734 – 1813    5
1813 – 1892    6
1892 – 1971    4
1971 – 2049    2
2049 – 2128    3
2128 – 2207    5
2207 – 2286    0
2286 – 2365    0
2365 – 2444    1
2444 – 2522    1
2522 – 2601    2
2601 – 2680    1
2680 – 2759    0
2759 – 2838    0
2838 – 2917    0
2917 – 2995    0
2995 – 3074    0
3074 – 3153    1

pct_rent_burdened numeric feature

This is the percentage of rent-burdened renter households per tract, ranging from 0 to 100 with a mean of 49.87 and a median of 50.0; the mean matches pct_moderate_burden + pct_severe_burden (22.74 + 27.12 ≈ 49.86). The distribution is nearly symmetric (skew -0.04) and reasonably tight around the middle (IQR 17.9, std 14.6), with 4.38% nulls and only 0.36% zeros. 62 outliers (2.79%) sit beyond the whiskers, but there is no heavy tail.

Treatment: Handle the ~4% nulls (likely tracts with no renter households, where the share is undefined) and use as-is; no transform is needed for a near-symmetric bounded percentage.

anthropic:claude-opus-4-7 · confidence high

saturn.columns["pct_rent_burdened"].stats

stat value
n 2,327
nulls 102 (4.4%)
unique 596
min 0
max 100
mean 49.87
median 50
std 14.62
q1 40.9
q3 58.8
iqr 17.9
skew -0.03839
kurtosis 0.7849
n_outliers 62
outlier_rate 0.02787
zero_rate 0.003596
Fig 23.
Distribution of pct_rent_burdened. Vertical dash marks the median.
Histogram bins for pct_rent_burdened (median: 50.0).
Identical to the Fig 2 bins (same column).

median_gross_rent numeric feature

This is a numeric feature for median gross rent, with 2,327 non-null values and 1,232 unique levels. The middle of the distribution looks plausible (median 1,735, IQR 1,441.5–2,049, max 3,501), but the minimum is -666666666, and Fig 24 shows 145 rows (6.2%) at that sentinel. Those sentinel values masquerade as numbers, dragging the mean to -41,539,608.8 (std 161,182,638.7) and producing a severe negative skew (-3.62) and 289 outliers (12.4%).

Treatment: Replace the -666666666 sentinel with null before any modelling or aggregation.

anthropic:claude-opus-4-7 · confidence high
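A worked check of how the sentinel drives the mean, assuming all 145 rows in the lowest bin of Fig 24 hold exactly -666666666 and writing V for the remaining 2,182 valid rows:

```latex
\bar{x} = \frac{145 \times (-666{,}666{,}666) + \sum_{i \in V} x_i}{2327}
        \approx -41{,}541{,}326.4 + \frac{\sum_{i \in V} x_i}{2327}

\bar{x} = -41{,}539{,}608.8
  \;\Rightarrow\; \frac{\sum_{i \in V} x_i}{2327} \approx 1{,}717.6
  \;\Rightarrow\; \frac{\sum_{i \in V} x_i}{2182} \approx 1{,}832
```

Under that assumption the valid rows average roughly 1,832, in line with the median and IQR, and the 145 sentinel rows alone account for the -41.5M mean.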

saturn.columns["median_gross_rent"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,232
min -6.667e+08
max 3,501
mean -4.154e+07
median 1,735
std 1.612e+08
q1 1442
q3 2,049
iqr 607.5
skew -3.621
kurtosis 11.11
n_outliers 289
outlier_rate 0.1242
zero_rate 0
alert: high_skew skew=-3.62
alert: outliers 12.4% rows beyond 1.5 IQR
Fig 24.
Distribution of median_gross_rent. Vertical dash marks the median.
Histogram bins for median_gross_rent (median: 1735.0).
bin                       count
-6.667e+08 – -6.5e+08     145
-6.5e+08 – -6.333e+08     0
-6.333e+08 – -6.167e+08   0
-6.167e+08 – -6e+08       0
-6e+08 – -5.833e+08       0
-5.833e+08 – -5.667e+08   0
-5.667e+08 – -5.5e+08     0
-5.5e+08 – -5.333e+08     0
-5.333e+08 – -5.167e+08   0
-5.167e+08 – -5e+08       0
-5e+08 – -4.833e+08       0
-4.833e+08 – -4.667e+08   0
-4.667e+08 – -4.5e+08     0
-4.5e+08 – -4.333e+08     0
-4.333e+08 – -4.167e+08   0
-4.167e+08 – -4e+08       0
-4e+08 – -3.833e+08       0
-3.833e+08 – -3.667e+08   0
-3.667e+08 – -3.5e+08     0
-3.5e+08 – -3.333e+08     0
-3.333e+08 – -3.167e+08   0
-3.167e+08 – -3e+08       0
-3e+08 – -2.833e+08       0
-2.833e+08 – -2.667e+08   0
-2.667e+08 – -2.5e+08     0
-2.5e+08 – -2.333e+08     0
-2.333e+08 – -2.167e+08   0
-2.167e+08 – -2e+08       0
-2e+08 – -1.833e+08       0
-1.833e+08 – -1.667e+08   0
-1.667e+08 – -1.5e+08     0
-1.5e+08 – -1.333e+08     0
-1.333e+08 – -1.167e+08   0
-1.167e+08 – -1e+08       0
-1e+08 – -8.333e+07       0
-8.333e+07 – -6.666e+07   0
-6.666e+07 – -5e+07       0
-5e+07 – -3.333e+07       0
-3.333e+07 – -1.666e+07   0
-1.666e+07 – 3501         2182

median_household_income numeric feature

Median household income per tract in dollars, with no literal nulls across 2,327 rows, 2,106 unique values, a sensible median of 76,833, and an IQR of 49,117. The minimum of -666,666,666 is a sentinel-coded missing value masquerading as a number.

- 126 rows (5.4%, the lowest histogram bin) carry the sentinel.
- The sentinel drags the mean to -36,017,397 and pushes skew to -3.94 and kurtosis to 13.53.
- The 126 sentinel rows account for most, but not all, of the 208 flagged outliers (8.9%). The other 82 are non-sentinel values beyond the fences.

Treatment: Replace -666666666 sentinel with null, then consider log-transform or winsorisation before modelling.

anthropic:claude-opus-4-7 · confidence high
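Order matters for the two-step treatment. With 126 sentinel rows (about 5.4% of the column), any percentile at or below roughly the 5th would land on -666,666,666 if the sentinel were still present. The sentinel therefore has to be removed before winsorisation cut points are computed.

The profile's maximum of exactly 250,001 also looks like a top-code. The ACS API reports medians above $250,000 this way, so clipping the upper tail may change little. Treat that reading as an assumption.

The sketch below uses the stdlib only. `winsorise` is a hypothetical helper, the 1st/99th percentile limits are an illustrative choice, and the incomes are invented.

```python
import math
import statistics

SENTINEL = -666666666  # missing-value code seen in median_household_income


def winsorise(values, lower_pct=1, upper_pct=99):
    """Clip values to their own lower/upper percentiles."""
    # quantiles(n=100) returns 99 cut points: cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


# Invented toy incomes: 1,000 to 100,000 in steps of 1,000, plus one sentinel row.
raw = list(range(1_000, 100_001, 1_000)) + [SENTINEL]

# Step 1: drop the sentinel before computing any percentile.
valid = [v for v in raw if v != SENTINEL]

# Step 2a: winsorise the cleaned values.
clipped = winsorise(valid)
print(min(clipped), max(clipped))  # 1990.0 99010.0

# Step 2b (alternative): log-transform; safe because cleaned incomes are positive.
logged = [math.log(v) for v in valid]
```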
Out[64]:

saturn.columns["median_household_income"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 2,106
min -6.667e+08
max 250,001
mean -3.602e+07
median 76,833
std 1.509e+08
q1 5.324e+04
q3 1.024e+05
iqr 49,117
skew -3.94
kurtosis 13.53
n_outliers 208
outlier_rate 0.08939
zero_rate 0
alert: high_skew skew=-3.94
alert: outliers 8.9% rows beyond 1.5 IQR
Fig 25.
Distribution of median_household_income. Vertical dash marks the median.
Histogram bins for median_household_income (median: 76833.0).
bin                       count
-6.667e+08 – -6.5e+08     126
-6.5e+08 – -6.333e+08     0
-6.333e+08 – -6.166e+08   0
-6.166e+08 – -6e+08       0
-6e+08 – -5.833e+08       0
-5.833e+08 – -5.666e+08   0
-5.666e+08 – -5.5e+08     0
-5.5e+08 – -5.333e+08     0
-5.333e+08 – -5.166e+08   0
-5.166e+08 – -4.999e+08   0
-4.999e+08 – -4.833e+08   0
-4.833e+08 – -4.666e+08   0
-4.666e+08 – -4.499e+08   0
-4.499e+08 – -4.332e+08   0
-4.332e+08 – -4.166e+08   0
-4.166e+08 – -3.999e+08   0
-3.999e+08 – -3.832e+08   0
-3.832e+08 – -3.666e+08   0
-3.666e+08 – -3.499e+08   0
-3.499e+08 – -3.332e+08   0
-3.332e+08 – -3.165e+08   0
-3.165e+08 – -2.999e+08   0
-2.999e+08 – -2.832e+08   0
-2.832e+08 – -2.665e+08   0
-2.665e+08 – -2.498e+08   0
-2.498e+08 – -2.332e+08   0
-2.332e+08 – -2.165e+08   0
-2.165e+08 – -1.998e+08   0
-1.998e+08 – -1.832e+08   0
-1.832e+08 – -1.665e+08   0
-1.665e+08 – -1.498e+08   0
-1.498e+08 – -1.331e+08   0
-1.331e+08 – -1.165e+08   0
-1.165e+08 – -9.979e+07   0
-9.979e+07 – -8.311e+07   0
-8.311e+07 – -6.644e+07   0
-6.644e+07 – -4.977e+07   0
-4.977e+07 – -3.31e+07    0
-3.31e+07 – -1.642e+07    0
-1.642e+07 – 2.5e+05      2201

total_households numeric feature

Number of households per tract, ranging from 0 to 8,209 with a median of 1,252 and a mean of 1,410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 outliers (3.0%). All of them are on the high end, because the lower fence is negative. 96 rows (4.1%) are zero, which may indicate unpopulated or placeholder areas.

Treatment: Log-transform or winsorize before modelling and decide whether zero-household rows should be filtered.

anthropic:claude-opus-4-7 · confidence high
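Both parts of this treatment fit in one short preprocessing step: the zero-row decision and the log transform. Whether to drop the 96 zero-household rows depends on the question being asked. Keeping them is harmless for a log1p feature, since log1p(0) is 0.

A minimal sketch, assuming rows are dicts with numeric counts. The function `add_log_households` and its `drop_zero` flag are hypothetical names, and the toy rows are invented.

```python
import math


def add_log_households(rows, drop_zero=True):
    """Hypothetical prep step for total_households.

    Optionally drops zero-household tracts, then adds a log1p feature.
    Returns new dicts and leaves the input rows unchanged.
    """
    out = []
    for row in rows:
        count = row["total_households"]
        if drop_zero and count == 0:
            continue
        # log1p(x) = log(1 + x): a count of 0 maps to 0.0 instead of -inf.
        out.append({**row, "log_total_households": math.log1p(count)})
    return out


# Invented toy rows.
rows = [{"total_households": 0}, {"total_households": 1252}, {"total_households": 8209}]

print(len(add_log_households(rows)))  # 2
print(add_log_households(rows, drop_zero=False)[0]["log_total_households"])  # 0.0
```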
Out[67]:

saturn.columns["total_households"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,495
min 0
max 8,209
mean 1411
median 1,252
std 923.3
q1 773.5
q3 1,850
iqr 1076
skew 1.479
kurtosis 4.377
n_outliers 70
outlier_rate 0.03008
zero_rate 0.04125
Fig 26.
Distribution of total_households. Vertical dash marks the median.
Histogram bins for total_households (median: 1252.0).
bin             count
0 – 205.2       123
205.2 – 410.4   41
410.4 – 615.7   203
615.7 – 820.9   272
820.9 – 1026    269
1026 – 1231     237
1231 – 1437     215
1437 – 1642     221
1642 – 1847     162
1847 – 2052     134
2052 – 2257     94
2257 – 2463     101
2463 – 2668     66
2668 – 2873     39
2873 – 3078     35
3078 – 3284     24
3284 – 3489     22
3489 – 3694     8
3694 – 3899     7
3899 – 4104     9
4104 – 4310     13
4310 – 4515     9
4515 – 4720     5
4720 – 4925     5
4925 – 5131     3
5131 – 5336     2
5336 – 5541     2
5541 – 5746     0
5746 – 5952     1
5952 – 6157     1
6157 – 6362     0
6362 – 6567     0
6567 – 6772     1
6772 – 6978     2
6978 – 7183     0
7183 – 7388     0
7388 – 7593     0
7593 – 7799     0
7799 – 8004     0
8004 – 8209     1

owner_occupied numeric feature

Despite the boolean-sounding name, owner_occupied is a numeric count with 1,001 unique values ranging from 0 to 3,052 and a mean of 464.6. It is most likely the number of owner-occupied units per record (for example, per tract or block). The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 168 zero rows (7.2%). No nulls are present.

Treatment: Log-transform (log1p to handle the 7% zeros) before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence medium
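owner_occupied is presumably the numerator of pct_owner_occupied. The 96 nulls in that percentage match the 96 zero-household rows in total_households, which is exactly what dividing by zero households would produce.

The sketch below spells out that assumed relationship so it can be checked against the file. Three things are assumptions:

- the function name;
- the use of total_households as the denominator;
- the one-decimal rounding.

The inputs 371 and 1,252 are just the two column medians used as an example, not a real tract.

```python
def pct_owner(owner_occupied, total_households, digits=1):
    """Hypothetical reconstruction of pct_owner_occupied from two counts.

    Assumes the share is 100 * owner / total rounded to `digits` places;
    returns None when the tract has no households (share undefined).
    """
    if total_households == 0:
        return None
    return round(100 * owner_occupied / total_households, digits)


print(pct_owner(371, 1252))  # 29.6
print(pct_owner(0, 500))     # 0.0
print(pct_owner(0, 0))       # None
```

If recomputed values match the stored column, the percentage carries no information beyond the two counts.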
Out[70]:

saturn.columns["owner_occupied"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,001
min 0
max 3,052
mean 464.6
median 371
std 422.6
q1 177
q3 608
iqr 431
skew 1.761
kurtosis 4.254
n_outliers 143
outlier_rate 0.06145
zero_rate 0.0722
alert: outliers 6.1% rows beyond 1.5 IQR
Fig 27.
Distribution of owner_occupied. Vertical dash marks the median.
Histogram bins for owner_occupied (median: 371.0).
bin             count
0 – 76.3        343
76.3 – 152.6    175
152.6 – 228.9   191
228.9 – 305.2   236
305.2 – 381.5   258
381.5 – 457.8   245
457.8 – 534.1   167
534.1 – 610.4   134
610.4 – 686.7   98
686.7 – 763     69
763 – 839.3     61
839.3 – 915.6   49
915.6 – 991.9   53
991.9 – 1068    43
1068 – 1144     33
1144 – 1221     21
1221 – 1297     20
1297 – 1373     28
1373 – 1450     16
1450 – 1526     18
1526 – 1602     9
1602 – 1679     13
1679 – 1755     9
1755 – 1831     8
1831 – 1908     5
1908 – 1984     2
1984 – 2060     4
2060 – 2136     3
2136 – 2213     3
2213 – 2289     1
2289 – 2365     2
2365 – 2442     2
2442 – 2518     3
2518 – 2594     0
2594 – 2670     3
2670 – 2747     0
2747 – 2823     1
2823 – 2899     0
2899 – 2976     0
2976 – 3052     1

renter_occupied numeric feature

Counts of renter-occupied units per record, ranging from 0 to 8209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) and 4.38% zeros, consistent with area-level housing tallies rather than a per-household flag.

Treatment: Log-transform (log1p, because of the zeros) before regression to tame the right skew. Rescaling alone would not change the skew.

anthropic:claude-opus-4-7 · confidence high
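The profiled means add up exactly: 464.6 (owner_occupied) + 946.1 (renter_occupied) = 1,410.7, the mean of total_households. That suggests each tract's owner and renter counts sum to its household total. An identity in the means does not prove it row by row, though, so a direct check is worth running.

If the identity holds, the three counts are perfectly collinear, and a regression should include at most two of them. A sketch with a hypothetical helper, assuming rows are dicts of integer counts (toy rows invented):

```python
def tenure_mismatches(rows):
    """Return rows whose owner and renter counts do not sum to total_households."""
    return [
        row
        for row in rows
        if row["owner_occupied"] + row["renter_occupied"] != row["total_households"]
    ]


# Invented toy rows: the second one breaks the identity on purpose.
rows = [
    {"owner_occupied": 371, "renter_occupied": 881, "total_households": 1252},
    {"owner_occupied": 100, "renter_occupied": 200, "total_households": 350},
    {"owner_occupied": 0, "renter_occupied": 0, "total_households": 0},
]
bad = tenure_mismatches(rows)
print(len(bad))  # 1
```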
Out[73]:

saturn.columns["renter_occupied"].stats

stat value
n 2,327
nulls 0 (0.0%)
unique 1,418
min 0
max 8,209
mean 946.1
median 726
std 815.4
q1 346
q3 1,357
iqr 1,011
skew 1.595
kurtosis 4.627
n_outliers 69
outlier_rate 0.02965
zero_rate 0.04383
Fig 28.
Distribution of renter_occupied. Vertical dash marks the median.
Histogram bins for renter_occupied (median: 726.0).
bin             count
0 – 205.2       349
205.2 – 410.4   358
410.4 – 615.7   292
615.7 – 820.9   268
820.9 – 1026    207
1026 – 1231     175
1231 – 1437     168
1437 – 1642     110
1642 – 1847     100
1847 – 2052     68
2052 – 2257     63
2257 – 2463     42
2463 – 2668     36
2668 – 2873     22
2873 – 3078     19
3078 – 3284     17
3284 – 3489     6
3489 – 3694     5
3694 – 3899     4
3899 – 4104     6
4104 – 4310     5
4310 – 4515     3
4515 – 4720     1
4720 – 4925     0
4925 – 5131     1
5131 – 5336     1
5336 – 5541     0
5541 – 5746     0
5746 – 5952     0
5952 – 6157     0
6157 – 6362     0
6362 – 6567     0
6567 – 6772     0
6772 – 6978     0
6978 – 7183     0
7183 – 7388     0
7388 – 7593     0
7593 – 7799     0
7799 – 8004     0
8004 – 8209     1

pct_owner_occupied numeric feature

Percentage of owner-occupied housing per tract, spanning the full 0–100 scale with a mean of 37.5 and a median of 34.4. The distribution is wide (std 25.7, IQR 39.7), slightly right-skewed (0.39), and flatter than a normal distribution (excess kurtosis -0.85). Rather than a tight central mass, the histogram is broad: densest at low ownership and thinning toward 100. 96 rows (4.1%) are null and 72 of the non-null values (3.2%) are exactly zero. No statistical outliers were flagged.

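Treatment: Use as-is as a bounded percentage feature. The 96 nulls match the 96 zero-household rows in total_households, so they are most likely undefined shares rather than missing data. Drop those tracts or add a missingness flag rather than imputing the median.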

anthropic:claude-opus-4-7 · confidence high
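Here, having no outliers is a property of the bounds rather than of the data. With quartiles of 16.4 and 56.1, the 1.5 × IQR fences land at about -43.15 and 115.65. Both lie outside the 0–100 range, so no value could ever be flagged.

A small sketch that makes this check explicit for any bounded percentage. The helper names are hypothetical, and the quartiles come from the stats tables in this report.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) k x IQR fences from two quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers_possible(q1, q3, lower_bound=0.0, upper_bound=100.0, k=1.5):
    """True if at least one fence lies strictly inside the variable's range."""
    lo, hi = tukey_fences(q1, q3, k)
    return lo > lower_bound or hi < upper_bound


lo, hi = tukey_fences(16.4, 56.1)     # pct_owner_occupied quartiles
print(round(lo, 2), round(hi, 2))     # -43.15 115.65
print(outliers_possible(16.4, 56.1))  # False
print(outliers_possible(40.9, 58.8))  # True (pct_rent_burdened quartiles)
```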
Out[76]:

saturn.columns["pct_owner_occupied"].stats

stat value
n 2,327
nulls 96 (4.1%)
unique 823
min 0
max 100
mean 37.51
median 34.4
std 25.65
q1 16.4
q3 56.1
iqr 39.7
skew 0.3948
kurtosis -0.854
n_outliers 0
outlier_rate 0
zero_rate 0.03227
Fig 29.
Distribution of pct_owner_occupied. Vertical dash marks the median.
Histogram bins for pct_owner_occupied (median: 34.4).
bin           count
0 – 2.5       141
2.5 – 5       86
5 – 7.5       71
7.5 – 10      63
10 – 12.5     86
12.5 – 15     65
15 – 17.5     72
17.5 – 20     88
20 – 22.5     98
22.5 – 25     88
25 – 27.5     79
27.5 – 30     67
30 – 32.5     70
32.5 – 35     56
35 – 37.5     76
37.5 – 40     68
40 – 42.5     59
42.5 – 45     72
45 – 47.5     58
47.5 – 50     54
50 – 52.5     70
52.5 – 55     59
55 – 57.5     55
57.5 – 60     39
60 – 62.5     44
62.5 – 65     40
65 – 67.5     52
67.5 – 70     34
70 – 72.5     40
72.5 – 75     47
75 – 77.5     32
77.5 – 80     48
80 – 82.5     31
82.5 – 85     27
85 – 87.5     26
87.5 – 90     24
90 – 92.5     23
92.5 – 95     8
95 – 97.5     6
97.5 – 100    9

pct_renter_occupied numeric feature

Percentage of renter-occupied units per tract, spanning the full 0–100 scale with a mean of 62.5 and a median of 65.6, so most tracts are rental-majority. Spread is wide (std 25.7, IQR 39.7), and the distribution is mildly left-skewed (-0.39) and flat (excess kurtosis -0.85). No outliers were flagged because the IQR is so wide that the 1.5 × IQR fences (-15.65 and 143.15) fall outside the 0–100 range. 96 rows (4.1%) are null and only 6 non-null values (0.27%) are exactly zero, with 823 distinct values across 2,327 rows.

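Treatment: This column appears to be the complement of pct_owner_occupied: its mean, median, quartiles, spread, and null count mirror that column around 100. Keep only one of the two in a model. Flag or drop the 96 nulls rather than imputing them, since they match the zero-household rows.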

anthropic:claude-opus-4-7 · confidence high
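The complement relationship can be confirmed row by row, not only in the summary statistics, by checking each pair of shares against 100. The sketch below uses a hypothetical helper. The 0.11 tolerance assumes each share is rounded to one decimal, which is a guess about how the file was produced. The first two pairs reuse quartiles that mirror each other across the two stats tables.

```python
def is_complement(owner_pct, renter_pct, tol=0.11):
    """True if the two shares sum to 100 within rounding tolerance.

    Assumes each share is rounded to one decimal, so the sum can be off by
    up to 0.1; a pair of nulls also counts as consistent.
    """
    if owner_pct is None or renter_pct is None:
        return owner_pct is None and renter_pct is None
    return abs(owner_pct + renter_pct - 100) <= tol


pairs = [(34.4, 65.6), (16.4, 83.6), (None, None), (30.0, 60.0), (None, 50.0)]
print([is_complement(o, r) for o, r in pairs])  # [True, True, True, False, False]
```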
Out[79]:

saturn.columns["pct_renter_occupied"].stats

stat value
n 2,327
nulls 96 (4.1%)
unique 823
min 0
max 100
mean 62.49
median 65.6
std 25.65
q1 43.9
q3 83.6
iqr 39.7
skew -0.3948
kurtosis -0.854
n_outliers 0
outlier_rate 0
zero_rate 0.002689
Fig 30.
Distribution of pct_renter_occupied. Vertical dash marks the median.
Histogram bins for pct_renter_occupied (median: 65.6).
bin           count
0 – 2.5       9
2.5 – 5       6
5 – 7.5       7
7.5 – 10      23
10 – 12.5     24
12.5 – 15     26
15 – 17.5     27
17.5 – 20     32
20 – 22.5     47
22.5 – 25     31
25 – 27.5     48
27.5 – 30     40
30 – 32.5     34
32.5 – 35     50
35 – 37.5     39
37.5 – 40     46
40 – 42.5     41
42.5 – 45     53
45 – 47.5     57
47.5 – 50     72
50 – 52.5     54
52.5 – 55     54
55 – 57.5     73
57.5 – 60     62
60 – 62.5     69
62.5 – 65     72
65 – 67.5     60
67.5 – 70     65
70 – 72.5     72
72.5 – 75     75
75 – 77.5     91
77.5 – 80     94
80 – 82.5     92
82.5 – 85     73
85 – 87.5     64
87.5 – 90     83
90 – 92.5     65
92.5 – 95     71
95 – 97.5     83
97.5 – 100    147

How to cite

BibTeX
@misc{saturn-nyc-housing-nyc-housing-metrics-merged-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: nyc housing nyc housing metrics merged},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/nyc_housing-nyc_housing_metrics_merged}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: nyc housing nyc housing metrics merged. Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/nyc_housing-nyc_housing_metrics_merged