nyc_housing-nyc_housing_metrics_merged

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv

Saturn profiled 2,327 rows across 23 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

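# Install the profiler, then run it against the source CSV from Python.
!pip install saturn-dissect
import subprocess

subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv",
        "--findings", "nyc_housing-nyc_housing_metrics_merged.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the CLI exits with a non-zero status
)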

Summary confidence: high

This dataset covers 2,327 NYC census tracts with 23 columns describing housing tenure, rent burden, income, and rent levels across the five boroughs. The most urgent issue is data hygiene: median_gross_rent and median_household_income both contain the sentinel value -666666666, which drags their means to roughly -41.5M and -36M respectively despite sensible medians (~$1,735 rent, ~$76,833 income). Replace the sentinel with null before any analysis. Beyond that, the substantive story is rent burden: pct_rent_burdened has a median of 50% with an IQR of 40.9–58.8, meaning that in half of the tracts with a defined rate, at least half of renters pay 30%+ of income on rent. Brooklyn (Kings) holds the most tracts at 34.6%, followed by Queens (31.2%), the Bronx (15.5%), Manhattan (13.3%), and Staten Island (5.4%), so any borough-level comparison should weight accordingly. The state column is constant (all 36, New York) and can be dropped.

citing: median_gross_rent · median_household_income · pct_rent_burdened · pct_severe_burden · pct_owner_occupied · county_name · state · total_households
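
One way to act on the weighting advice is to pool counts before dividing, instead of averaging tract-level percentages. The sketch below is plain Python and assumes each row is a dict keyed by this file's column names, and that pct_rent_burdened is rent_burdened divided by total_renter_households (not confirmed by the source). The helper name and the two sample rows are made up for illustration.

```python
def pooled_burden_rate(rows):
    """Percent of renter households that are rent-burdened, pooled over rows.

    Sums the counts first, so large tracts weigh more than small ones.
    Returns None when the rows contain no renter households at all.
    """
    burdened = sum(r["rent_burdened"] for r in rows)
    renters = sum(r["total_renter_households"] for r in rows)
    return 100.0 * burdened / renters if renters else None


# Hypothetical rows, not taken from the dataset.
sample = [
    {"rent_burdened": 50, "total_renter_households": 100},    # 50% burdened
    {"rent_burdened": 900, "total_renter_households": 1000},  # 90% burdened
]

# Simple average of the two tract percentages, ignoring tract size.
tract_rates = [
    100.0 * r["rent_burdened"] / r["total_renter_households"] for r in sample
]
unweighted = sum(tract_rates) / len(tract_rates)

pooled = pooled_burden_rate(sample)
print(round(unweighted, 1), round(pooled, 1))  # 70.0 86.4
```

The gap between the two numbers is the reason a mean of tract percentages can misstate a borough-wide rate when tract sizes differ widely.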

Out[4]:

saturn.schema() · 23 columns

column	kind	n	null%	unique	alerts
total_renter_households	numeric	2,327	0.0%	1,418
rent_30_to_34_9_pct	numeric	2,327	0.0%	355	high_skew outliers
rent_35_to_39_9_pct	numeric	2,327	0.0%	270	high_skew
rent_40_to_49_9_pct	numeric	2,327	0.0%	322	high_skew
rent_50_pct_or_more	numeric	2,327	0.0%	706
NAME	text	2,327	0.0%	2,327	near_unique
state	numeric	2,327	0.0%	1	constant
county	numeric	2,327	0.0%	5
tract	numeric	2,327	0.0%	1,530	high_skew
county_name	categorical	2,327	0.0%	5
moderate_burden	numeric	2,327	0.0%	639
severe_burden	numeric	2,327	0.0%	706
pct_moderate_burden	numeric	2,327	4.4%	461
pct_severe_burden	numeric	2,327	4.4%	518
rent_burdened	numeric	2,327	0.0%	1,013
pct_rent_burdened	numeric	2,327	4.4%	596
median_gross_rent	numeric	2,327	0.0%	1,232	high_skew outliers
median_household_income	numeric	2,327	0.0%	2,106	high_skew outliers
total_households	numeric	2,327	0.0%	1,495
owner_occupied	numeric	2,327	0.0%	1,001	outliers
renter_occupied	numeric	2,327	0.0%	1,418
pct_owner_occupied	numeric	2,327	4.1%	823
pct_renter_occupied	numeric	2,327	4.1%	823

Fig 1.

county_name · Tract counts by borough — Brooklyn and Queens dominate the sample.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

Fig 2.

pct_rent_burdened · Share of renters paying 30%+ of income; median is 50%, showing widespread burden.

Histogram bins for pct_rent_burdened (median: 50.0).
bin	count
0 – 2.5	8
2.5 – 5	3
5 – 7.5	1
7.5 – 10	5
10 – 12.5	7
12.5 – 15	8
15 – 17.5	12
17.5 – 20	14
20 – 22.5	14
22.5 – 25	24
25 – 27.5	35
27.5 – 30	42
30 – 32.5	53
32.5 – 35	80
35 – 37.5	91
37.5 – 40	119
40 – 42.5	129
42.5 – 45	144
45 – 47.5	146
47.5 – 50	177
50 – 52.5	139
52.5 – 55	178
55 – 57.5	162
57.5 – 60	131
60 – 62.5	117
62.5 – 65	97
65 – 67.5	60
67.5 – 70	57
70 – 72.5	54
72.5 – 75	28
75 – 77.5	20
77.5 – 80	25
80 – 82.5	8
82.5 – 85	4
85 – 87.5	12
87.5 – 90	5
90 – 92.5	5
92.5 – 95	3
95 – 97.5	0
97.5 – 100	8

Fig 3.

pct_severe_burden · Severe burden (50%+ of income on rent) — see how thick the right tail runs.

Histogram bins for pct_severe_burden (median: 26.2).
bin	count
0 – 2.5	45
2.5 – 5	14
5 – 7.5	41
7.5 – 10	53
10 – 12.5	94
12.5 – 15	115
15 – 17.5	131
17.5 – 20	160
20 – 22.5	170
22.5 – 25	188
25 – 27.5	188
27.5 – 30	168
30 – 32.5	173
32.5 – 35	157
35 – 37.5	115
37.5 – 40	97
40 – 42.5	73
42.5 – 45	62
45 – 47.5	44
47.5 – 50	35
50 – 52.5	29
52.5 – 55	19
55 – 57.5	18
57.5 – 60	12
60 – 62.5	6
62.5 – 65	4
65 – 67.5	4
67.5 – 70	2
70 – 72.5	1
72.5 – 75	1
75 – 77.5	1
77.5 – 80	1
80 – 82.5	0
82.5 – 85	1
85 – 87.5	1
87.5 – 90	0
90 – 92.5	1
92.5 – 95	0
95 – 97.5	0
97.5 – 100	1

Fig 4.

pct_owner_occupied · Owner-occupancy share varies widely (IQR 16–56%); useful for segmenting tracts.

Histogram bins for pct_owner_occupied (median: 34.4).
bin	count
0 – 2.5	141
2.5 – 5	86
5 – 7.5	71
7.5 – 10	63
10 – 12.5	86
12.5 – 15	65
15 – 17.5	72
17.5 – 20	88
20 – 22.5	98
22.5 – 25	88
25 – 27.5	79
27.5 – 30	67
30 – 32.5	70
32.5 – 35	56
35 – 37.5	76
37.5 – 40	68
40 – 42.5	59
42.5 – 45	72
45 – 47.5	58
47.5 – 50	54
50 – 52.5	70
52.5 – 55	59
55 – 57.5	55
57.5 – 60	39
60 – 62.5	44
62.5 – 65	40
65 – 67.5	52
67.5 – 70	34
70 – 72.5	40
72.5 – 75	47
75 – 77.5	32
77.5 – 80	48
80 – 82.5	31
82.5 – 85	27
85 – 87.5	26
87.5 – 90	24
90 – 92.5	23
92.5 – 95	8
95 – 97.5	6
97.5 – 100	9

Fig 5.

total_households · Tract size distribution — right-skewed, so per-capita rates beat raw counts.

Histogram bins for total_households (median: 1252.0).
bin	count
0 – 205.2	123
205.2 – 410.4	41
410.4 – 615.7	203
615.7 – 820.9	272
820.9 – 1026	269
1026 – 1231	237
1231 – 1437	215
1437 – 1642	221
1642 – 1847	162
1847 – 2052	134
2052 – 2257	94
2257 – 2463	101
2463 – 2668	66
2668 – 2873	39
2873 – 3078	35
3078 – 3284	24
3284 – 3489	22
3489 – 3694	8
3694 – 3899	7
3899 – 4104	9
4104 – 4310	13
4310 – 4515	9
4515 – 4720	5
4720 – 4925	5
4925 – 5131	3
5131 – 5336	2
5336 – 5541	2
5541 – 5746	0
5746 – 5952	1
5952 – 6157	1
6157 – 6362	0
6362 – 6567	0
6567 – 6772	1
6772 – 6978	2
6978 – 7183	0
7183 – 7388	0
7388 – 7593	0
7593 – 7799	0
7799 – 8004	0
8004 – 8209	1

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
total_renter_households	numeric	0.0%
rent_30_to_34_9_pct	numeric	0.0%
rent_35_to_39_9_pct	numeric	0.0%
rent_40_to_49_9_pct	numeric	0.0%
rent_50_pct_or_more	numeric	0.0%
NAME	text	0.0%
state	numeric	0.0%
county	numeric	0.0%
tract	numeric	0.0%
county_name	categorical	0.0%
moderate_burden	numeric	0.0%
severe_burden	numeric	0.0%
pct_moderate_burden	numeric	4.4%
pct_severe_burden	numeric	4.4%
rent_burdened	numeric	0.0%
pct_rent_burdened	numeric	4.4%
median_gross_rent	numeric	0.0%
median_household_income	numeric	0.0%
total_households	numeric	0.0%
owner_occupied	numeric	0.0%
renter_occupied	numeric	0.0%
pct_owner_occupied	numeric	4.1%
pct_renter_occupied	numeric	4.1%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
	total_renter_households	rent_30_to_34_9_pct	rent_35_to_39_9_pct	rent_40_to_49_9_pct	rent_50_pct_or_more	state	county	tract	moderate_burden	severe_burden	pct_moderate_burden	pct_severe_burden
total_renter_households	+1.00	+0.76	+0.73	+0.76	+0.84	+nan	-0.18	-0.23	+0.89	+0.84	-0.03	+0.07
rent_30_to_34_9_pct	+0.76	+1.00	+0.55	+0.57	+0.56	+nan	-0.14	-0.18	+0.87	+0.56	-0.04	+0.09
rent_35_to_39_9_pct	+0.73	+0.55	+1.00	+0.60	+0.61	+nan	-0.13	-0.12	+0.81	+0.61	-0.01	+0.04
rent_40_to_49_9_pct	+0.76	+0.57	+0.60	+1.00	+0.62	+nan	-0.15	-0.15	+0.85	+0.62	-0.02	+0.03
rent_50_pct_or_more	+0.84	+0.56	+0.61	+0.62	+1.00	+nan	-0.32	-0.21	+0.70	+1.00	+0.03	+0.11
state	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan
county	-0.18	-0.14	-0.13	-0.15	-0.32	+nan	+1.00	+0.18	-0.16	-0.32	-0.14	-0.07
tract	-0.23	-0.18	-0.12	-0.15	-0.21	+nan	+0.18	+1.00	-0.18	-0.21	-0.04	-0.04
moderate_burden	+0.89	+0.87	+0.81	+0.85	+0.70	+nan	-0.16	-0.18	+1.00	+0.70	-0.03	+0.06
severe_burden	+0.84	+0.56	+0.61	+0.62	+1.00	+nan	-0.32	-0.21	+0.70	+1.00	+0.03	+0.11
pct_moderate_burden	-0.03	-0.04	-0.01	-0.02	+0.03	+nan	-0.14	-0.04	-0.03	+0.03	+1.00	-0.26
pct_severe_burden	+0.07	+0.09	+0.04	+0.03	+0.11	+nan	-0.07	-0.04	+0.06	+0.11	-0.26	+1.00
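
The +nan entries in the state row and column follow from the definition of Pearson's r: the denominator is the product of the two standard deviations, and state has a standard deviation of 0. A small self-contained sketch of that behaviour (the helper name pearson is ours, not part of Saturn, and the renter values are illustrative):

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; nan if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


state = [36, 36, 36, 36]         # constant, as in this dataset
renters = [120, 726, 946, 1357]  # illustrative values only
print(pearson(state, renters))   # nan
```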

total_renter_households numeric feature

This column counts renter households per census tract, ranging from 0 to 8209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) on the high end. No values are null, but 4.38% are zero (about 102 tracts), which matches the 102 nulls in the pct_* burden columns and suggests those percentages are undefined where a tract has no renters. The 1418 unique values across 2327 rows are consistent with tract-level counts.

Treatment: Log-transform before regression to tame the right skew, and decide whether zero-count rows should be modelled separately.

anthropic:claude-opus-4-7 · confidence high
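
A minimal sketch of that treatment in plain Python: log1p keeps zero-count tracts defined (log1p(0) = 0), and a separate indicator lets a model treat them differently if needed. The function name is illustrative; the inputs are the min, q1, median, q3, and max reported for this column.

```python
import math


def log1p_with_zero_flag(values):
    """Return (log1p(v), is_zero) pairs for non-negative counts."""
    out = []
    for v in values:
        if v < 0:
            raise ValueError(f"expected a non-negative count, got {v}")
        out.append((math.log1p(v), v == 0))
    return out


pairs = log1p_with_zero_flag([0, 346, 726, 1357, 8209])
for transformed, is_zero in pairs:
    print(f"{transformed:.3f}", is_zero)
```

Whether zero-renter tracts belong in the same model at all is a separate modelling decision; the flag only makes that choice easy to revisit.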

Out[13]:

saturn.columns["total_renter_households"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,418
min	0
max	8,209
mean	946.1
median	726
std	815.4
q1	346
q3	1,357
iqr	1,011
skew	1.595
kurtosis	4.627
n_outliers	69
outlier_rate	0.02965
zero_rate	0.04383

Fig 8.

Distribution of total_renter_households. Vertical dash marks the median.

Histogram bins for total_renter_households (median: 726.0).
bin	count
0 – 205.2	349
205.2 – 410.4	358
410.4 – 615.7	292
615.7 – 820.9	268
820.9 – 1026	207
1026 – 1231	175
1231 – 1437	168
1437 – 1642	110
1642 – 1847	100
1847 – 2052	68
2052 – 2257	63
2257 – 2463	42
2463 – 2668	36
2668 – 2873	22
2873 – 3078	19
3078 – 3284	17
3284 – 3489	6
3489 – 3694	5
3694 – 3899	4
3899 – 4104	6
4104 – 4310	5
4310 – 4515	3
4515 – 4720	1
4720 – 4925	0
4925 – 5131	1
5131 – 5336	1
5336 – 5541	0
5541 – 5746	0
5746 – 5952	0
5952 – 6157	0
6157 – 6362	0
6362 – 6567	0
6567 – 6772	0
6772 – 6978	0
6978 – 7183	0
7183 – 7388	0
7388 – 7593	0
7593 – 7799	0
7799 – 8004	0
8004 – 8209	1

rent_30_to_34_9_pct numeric feature

Despite the _pct suffix, this looks like a count of renter households paying 30%-34.9% of income on rent in each tract: the values are integer-like with a floor of zero and a max of 1205. The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86) with a median of 51 against a mean of 83.05, and 16.2% of rows are exactly zero. 124 outliers (5.33%) extend far above the Q3 of 116, consistent with a few large tracts dominating.

Treatment: log1p-transform before modelling to tame the skew and zero inflation.

anthropic:claude-opus-4-7 · confidence high

Out[16]:

saturn.columns["rent_30_to_34_9_pct"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	355
min	0
max	1,205
mean	83.05
median	51
std	100.3
q1	15
q3	116
iqr	101
skew	2.755
kurtosis	13.86
n_outliers	124
outlier_rate	0.05329
zero_rate	0.1616
alert: high_skew	skew=+2.76
alert: outliers	5.3% rows beyond 1.5 IQR

Fig 9.

Distribution of rent_30_to_34_9_pct. Vertical dash marks the median.

Histogram bins for rent_30_to_34_9_pct (median: 51.0).
bin	count
0 – 30.12	836
30.12 – 60.25	444
60.25 – 90.38	275
90.38 – 120.5	217
120.5 – 150.6	152
150.6 – 180.8	105
180.8 – 210.9	90
210.9 – 241	48
241 – 271.1	41
271.1 – 301.2	25
301.2 – 331.4	17
331.4 – 361.5	21
361.5 – 391.6	15
391.6 – 421.8	9
421.8 – 451.9	11
451.9 – 482	5
482 – 512.1	2
512.1 – 542.2	2
542.2 – 572.4	4
572.4 – 602.5	1
602.5 – 632.6	0
632.6 – 662.8	1
662.8 – 692.9	1
692.9 – 723	0
723 – 753.1	1
753.1 – 783.2	2
783.2 – 813.4	0
813.4 – 843.5	0
843.5 – 873.6	1
873.6 – 903.8	0
903.8 – 933.9	0
933.9 – 964	0
964 – 994.1	0
994.1 – 1024	0
1024 – 1054	0
1054 – 1084	0
1084 – 1115	0
1115 – 1145	0
1145 – 1175	0
1175 – 1205	1

rent_35_to_39_9_pct numeric feature

Likely a count of households (or housing units) paying 35% to 39.9% of income on rent within some geographic unit. The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with a median of 35 but a max of 633, and nearly 20% of rows are zero (zero_rate 0.196), suggesting many small areas have no households in this rent burden bracket. 110 outliers (4.7%) sit well above the Q3 of 83.

Treatment: Log1p-transform before regression to tame the right skew and zero inflation.

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["rent_35_to_39_9_pct"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	270
min	0
max	633
mean	58.35
median	35
std	69.85
q1	10
q3	83
iqr	73
skew	2.395
kurtosis	9.275
n_outliers	110
outlier_rate	0.04727
zero_rate	0.1964
alert: high_skew	skew=+2.40

Fig 10.

Distribution of rent_35_to_39_9_pct. Vertical dash marks the median.

Histogram bins for rent_35_to_39_9_pct (median: 35.0).
bin	count
0 – 15.82	719
15.82 – 31.65	371
31.65 – 47.47	261
47.47 – 63.3	212
63.3 – 79.12	163
79.12 – 94.95	103
94.95 – 110.8	97
110.8 – 126.6	77
126.6 – 142.4	77
142.4 – 158.2	58
158.2 – 174.1	32
174.1 – 189.9	41
189.9 – 205.7	20
205.7 – 221.5	18
221.5 – 237.4	15
237.4 – 253.2	10
253.2 – 269	14
269 – 284.8	5
284.8 – 300.7	3
300.7 – 316.5	8
316.5 – 332.3	6
332.3 – 348.1	2
348.1 – 364	2
364 – 379.8	2
379.8 – 395.6	1
395.6 – 411.4	1
411.4 – 427.3	2
427.3 – 443.1	2
443.1 – 458.9	0
458.9 – 474.8	0
474.8 – 490.6	1
490.6 – 506.4	0
506.4 – 522.2	0
522.2 – 538	0
538 – 553.9	0
553.9 – 569.7	2
569.7 – 585.5	0
585.5 – 601.4	1
601.4 – 617.2	0
617.2 – 633	1

rent_40_to_49_9_pct numeric feature

Likely a count of housing units paying rent in the 40-49.9% income bracket per geographic area. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14) with a median of 49 but a max of 740 and 111 outliers (4.77%), and 15.6% of rows are zero — consistent with small geographies sitting alongside dense ones.

Treatment: log1p-transform before modelling to tame the right skew and zero mass.

anthropic:claude-opus-4-7 · confidence high

Out[22]:

saturn.columns["rent_40_to_49_9_pct"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	322
min	0
max	740
mean	74.68
median	49
std	83.79
q1	14
q3	106
iqr	92
skew	2.137
kurtosis	7.139
n_outliers	111
outlier_rate	0.0477
zero_rate	0.1556
alert: high_skew	skew=+2.14

Fig 11.

Distribution of rent_40_to_49_9_pct. Vertical dash marks the median.

Histogram bins for rent_40_to_49_9_pct (median: 49.0).
bin	count
0 – 18.5	671
18.5 – 37	306
37 – 55.5	270
55.5 – 74	201
74 – 92.5	196
92.5 – 111	134
111 – 129.5	102
129.5 – 148	86
148 – 166.5	78
166.5 – 185	63
185 – 203.5	39
203.5 – 222	38
222 – 240.5	26
240.5 – 259	27
259 – 277.5	16
277.5 – 296	16
296 – 314.5	13
314.5 – 333	11
333 – 351.5	5
351.5 – 370	3
370 – 388.5	5
388.5 – 407	3
407 – 425.5	4
425.5 – 444	1
444 – 462.5	4
462.5 – 481	0
481 – 499.5	1
499.5 – 518	2
518 – 536.5	0
536.5 – 555	0
555 – 573.5	2
573.5 – 592	0
592 – 610.5	1
610.5 – 629	1
629 – 647.5	1
647.5 – 666	0
666 – 684.5	0
684.5 – 703	0
703 – 721.5	0
721.5 – 740	1

rent_50_pct_or_more numeric feature

Counts of households spending 50% or more of income on rent, aggregated per geographic unit across 2327 rows with no nulls. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with a median of 184 well below the mean of 253.18 and a max of 1918, and 6.27% of rows are zero. About 3.74% of values fall outside the Tukey fence.

Treatment: Log1p-transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
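
For reference, the Tukey fence sits 1.5 IQR beyond each quartile. A short worked sketch using the q1 = 82 and q3 = 360 reported for this column (the function name is ours):

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(82, 360)
print(lower, upper)  # -335.0 777.0
```

Because the counts cannot be negative, only the upper fence (777) can flag anything here, so every outlier in this column is a high-count tract.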

Out[25]:

saturn.columns["rent_50_pct_or_more"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	706
min	0
max	1,918
mean	253.2
median	184
std	236.6
q1	82
q3	360
iqr	278
skew	1.603
kurtosis	3.435
n_outliers	87
outlier_rate	0.03739
zero_rate	0.06274

Fig 12.

Distribution of rent_50_pct_or_more. Vertical dash marks the median.

Histogram bins for rent_50_pct_or_more (median: 184.0).
bin	count
0 – 47.95	368
47.95 – 95.9	293
95.9 – 143.9	290
143.9 – 191.8	249
191.8 – 239.8	186
239.8 – 287.7	175
287.7 – 335.7	122
335.7 – 383.6	114
383.6 – 431.6	101
431.6 – 479.5	83
479.5 – 527.5	62
527.5 – 575.4	45
575.4 – 623.4	48
623.4 – 671.3	31
671.3 – 719.2	41
719.2 – 767.2	28
767.2 – 815.2	15
815.2 – 863.1	12
863.1 – 911.1	14
911.1 – 959	7
959 – 1007	9
1007 – 1055	8
1055 – 1103	9
1103 – 1151	3
1151 – 1199	4
1199 – 1247	5
1247 – 1295	1
1295 – 1343	2
1343 – 1391	0
1391 – 1438	0
1438 – 1486	0
1486 – 1534	0
1534 – 1582	0
1582 – 1630	1
1630 – 1678	0
1678 – 1726	0
1726 – 1774	0
1774 – 1822	0
1822 – 1870	0
1870 – 1918	1

NAME text identifier

This column holds fully-qualified Census Tract names for New York City. All 2327 values are unique, none are null, and length is tightly bounded (38-46 chars, median 41). The vocabulary is formulaic: 'new', 'york', 'census', 'tract', 'county;' appear in essentially every row. The borough split, per county_name, is Kings (805), Queens (725), Bronx (361), New York (310), and Richmond (126); New York County is easy to miss in a token count because 'new' and 'york' also appear in the state name. Because each value is a one-to-one tract label, it functions as a geographic key rather than a modelling feature.

Treatment: Treat as a tract-level key; parse out borough or join to a geo table rather than feeding the raw string to a model.

anthropic:claude-opus-4-7 · confidence high
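
Parsing the borough out of NAME could look like the sketch below. The exact label layout is an assumption: the 'county;' token and the 38-character minimum length are consistent with strings such as "Census Tract 1; Bronx County; New York", but the raw rows are not shown in this report, so the sample string and the helper name are hypothetical.

```python
def parse_tract_name(name):
    """Split a tract label into (tract_label, county, state).

    Assumes three semicolon-separated parts, e.g.
    "Census Tract 1; Bronx County; New York". Raises ValueError otherwise.
    Requires Python 3.9+ for str.removeprefix / str.removesuffix.
    """
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected NAME format: {name!r}")
    tract_part, county_part, state = parts
    tract_label = tract_part.removeprefix("Census Tract").strip()
    county = county_part.removesuffix("County").strip()
    return tract_label, county, state


# Hypothetical label in the assumed format.
parsed = parse_tract_name("Census Tract 1; Bronx County; New York")
print(parsed)  # ('1', 'Bronx', 'New York')
```

Since county and county_name already carry the borough, parsing NAME is mainly useful as a cross-check on those columns.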

Out[28]:

saturn.columns["NAME"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	2,327
len_min	38
len_max	46
len_mean	41.65
len_median	41
len_p95	46
word_mean	7.133
word_median	7
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,539
readability_flesch_mean	91.45
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 13.

Character-length distribution for NAME.

Character-length distribution for NAME (mean: 41.65).
chars	count
38	7
39	104
40	785
41	447
42	200
43	378
44	190
45	82
46	134

state numeric metadata

The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and no nulls. 36 is the state FIPS code for New York, so the column only records that every tract is in New York State and carries no information for modelling.

Treatment: Drop as a feature (constant column); keep the code only if it is needed to build full tract identifiers.

anthropic:claude-opus-4-7 · confidence high

Out[31]:

saturn.columns["state"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1
min	36
max	36
mean	36
median	36
std	0
q1	36
q3	36
iqr	0
skew	0
kurtosis	0
n_outliers	0
outlier_rate	0
zero_rate	0
alert: constant	only one distinct value

Fig 14.

Distribution of state. Vertical dash marks the median.

Histogram bins for state (median: 36.0).
bin	count
35.5 – 35.52	0
35.52 – 35.55	0
35.55 – 35.58	0
35.58 – 35.6	0
35.6 – 35.62	0
35.62 – 35.65	0
35.65 – 35.67	0
35.67 – 35.7	0
35.7 – 35.73	0
35.73 – 35.75	0
35.75 – 35.77	0
35.77 – 35.8	0
35.8 – 35.83	0
35.83 – 35.85	0
35.85 – 35.88	0
35.88 – 35.9	0
35.9 – 35.92	0
35.92 – 35.95	0
35.95 – 35.98	0
35.98 – 36	0
36 – 36.02	2327
36.02 – 36.05	0
36.05 – 36.08	0
36.08 – 36.1	0
36.1 – 36.12	0
36.12 – 36.15	0
36.15 – 36.17	0
36.17 – 36.2	0
36.2 – 36.23	0
36.23 – 36.25	0
36.25 – 36.27	0
36.27 – 36.3	0
36.3 – 36.33	0
36.33 – 36.35	0
36.35 – 36.38	0
36.38 – 36.4	0
36.4 – 36.42	0
36.42 – 36.45	0
36.45 – 36.48	0
36.48 – 36.5	0

county numeric feature

County identifier stored as a numeric code, with only 5 distinct values across 2327 rows and no nulls. The values are county FIPS codes: 5 (Bronx, 361 tracts), 47 (Kings, 805), 61 (New York, 310), 81 (Queens, 725), and 85 (Richmond, 126), so the column duplicates county_name. Summary statistics such as the median (47) and skew (-0.72) only reflect how many tracts carry each code and have no numeric meaning.

Treatment: Cast to categorical and one-hot or target-encode before modelling; use either county or county_name, not both.

anthropic:claude-opus-4-7 · confidence high
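
A sketch of the categorical treatment in plain Python. The mapping uses the standard county FIPS codes for the five boroughs, and the tract counts in the county histogram line up with the county_name table under it. The indicator names (county_005 and so on) are our own convention.

```python
# County FIPS codes for the five NYC boroughs, labelled as in county_name.
COUNTY_FIPS_TO_BOROUGH = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}


def one_hot_county(code):
    """Return a dict of 0/1 indicators, one per borough, for a county code."""
    if code not in COUNTY_FIPS_TO_BOROUGH:
        raise KeyError(f"unknown county code: {code}")
    return {f"county_{c:03d}": int(c == code) for c in COUNTY_FIPS_TO_BOROUGH}


encoded = one_hot_county(47)
print(COUNTY_FIPS_TO_BOROUGH[47], encoded)
```

For a linear model, one of the five indicators would normally be dropped to avoid perfect collinearity with the intercept.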

Out[34]:

saturn.columns["county"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	5
min	5
max	85
mean	55
median	47
std	25.97
q1	47
q3	81
iqr	34
skew	-0.72
kurtosis	-0.4531
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 15.

Distribution of county. Vertical dash marks the median.

Histogram bins for county (median: 47.0).
bin	count
5 – 7	361
7 – 9	0
9 – 11	0
11 – 13	0
13 – 15	0
15 – 17	0
17 – 19	0
19 – 21	0
21 – 23	0
23 – 25	0
25 – 27	0
27 – 29	0
29 – 31	0
31 – 33	0
33 – 35	0
35 – 37	0
37 – 39	0
39 – 41	0
41 – 43	0
43 – 45	0
45 – 47	0
47 – 49	805
49 – 51	0
51 – 53	0
53 – 55	0
55 – 57	0
57 – 59	0
59 – 61	0
61 – 63	310
63 – 65	0
65 – 67	0
67 – 69	0
69 – 71	0
71 – 73	0
73 – 75	0
75 – 77	0
77 – 79	0
79 – 81	0
81 – 83	725
83 – 85	126

tract numeric identifier

This is almost certainly a U.S. Census tract code stored as a numeric, with 1530 unique values across 2327 rows and no nulls. Because tract codes are only unique within a county, the code alone does not identify a row; the key is county plus tract. The skew (10.14) and kurtosis (189.8) are artifacts of treating an identifier as a magnitude: the median is 30100, but three rows fall in the top histogram bin near the max of 990100. The 63 flagged outliers (2.7%) are simply codes above the upper fence, not data errors; codes in the 9900 range are conventionally reserved by the Census Bureau for tracts that cover only water.

Treatment: Treat as a categorical geographic code; cast to a zero-padded 6-digit string, combine it with the state and county codes, and join to tract-level reference data rather than using it as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
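
The zero-padded key can be sketched as follows. A Census tract GEOID is 11 characters: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The function name is ours, and the county and tract pairings in the example are illustrative rather than rows confirmed from the file.

```python
def tract_geoid(state, county, tract):
    """Build the 11-character tract GEOID from numeric state, county, and tract codes."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"


print(tract_geoid(36, 5, 100))      # 36005000100
print(tract_geoid(36, 61, 990100))  # 36061990100
```

Storing the result as a string preserves leading zeros, which a numeric column silently drops.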

Out[37]:

saturn.columns["tract"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,530
min	100
max	990,100
mean	4.225e+04
median	30,100
std	4.827e+04
q1	15,200
q3	5.79e+04
iqr	4.27e+04
skew	10.14
kurtosis	189.8
n_outliers	63
outlier_rate	0.02707
zero_rate	0
alert: high_skew	skew=+10.14

Fig 16.

Distribution of tract. Vertical dash marks the median.

Histogram bins for tract (median: 30100.0).
bin	count
100 – 2.485e+04	982
2.485e+04 – 4.96e+04	617
4.96e+04 – 7.435e+04	329
7.435e+04 – 9.91e+04	197
9.91e+04 – 1.238e+05	145
1.238e+05 – 1.486e+05	37
1.486e+05 – 1.734e+05	17
1.734e+05 – 1.981e+05	0
1.981e+05 – 2.228e+05	0
2.228e+05 – 2.476e+05	0
2.476e+05 – 2.724e+05	0
2.724e+05 – 2.971e+05	0
2.971e+05 – 3.218e+05	0
3.218e+05 – 3.466e+05	0
3.466e+05 – 3.714e+05	0
3.714e+05 – 3.961e+05	0
3.961e+05 – 4.208e+05	0
4.208e+05 – 4.456e+05	0
4.456e+05 – 4.704e+05	0
4.704e+05 – 4.951e+05	0
4.951e+05 – 5.198e+05	0
5.198e+05 – 5.446e+05	0
5.446e+05 – 5.694e+05	0
5.694e+05 – 5.941e+05	0
5.941e+05 – 6.188e+05	0
6.188e+05 – 6.436e+05	0
6.436e+05 – 6.684e+05	0
6.684e+05 – 6.931e+05	0
6.931e+05 – 7.178e+05	0
7.178e+05 – 7.426e+05	0
7.426e+05 – 7.674e+05	0
7.674e+05 – 7.921e+05	0
7.921e+05 – 8.168e+05	0
8.168e+05 – 8.416e+05	0
8.416e+05 – 8.664e+05	0
8.664e+05 – 8.911e+05	0
8.911e+05 – 9.158e+05	0
9.158e+05 – 9.406e+05	0
9.406e+05 – 9.654e+05	0
9.654e+05 – 9.901e+05	3

county_name categorical feature

This column lists New York City borough/county names across 2327 rows, with exactly 5 unique values and no nulls. Distribution mirrors NYC borough sizes: Brooklyn (Kings) leads at 805 (34.6%), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). Entropy ratio of 0.90 indicates a fairly balanced spread across the five categories with no extreme concentration.

Treatment: One-hot or target-encode for modelling.

anthropic:claude-opus-4-7 · confidence high
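
The entropy figures can be reproduced from the five borough counts. The sketch below assumes Saturn reports Shannon entropy in bits and normalises by log2 of the cardinality; that is not documented here, but the results should land close to the reported 2.086 and 0.8985.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


counts = [805, 725, 361, 310, 126]  # borough counts from the county_name table
h = entropy_bits(counts)
ratio = h / math.log2(len(counts))  # 1.0 would mean perfectly even categories
print(round(h, 3), round(ratio, 4))
```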

Out[40]:

saturn.columns["county_name"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	5
top_value	Brooklyn (Kings)
top_rate	0.3459
cardinality	5
entropy	2.086
entropy_ratio	0.8985

Fig 17.

Top values for county_name.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

moderate_burden numeric feature

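A non-negative integer count named 'moderate_burden', spanning 0 to 1732 with a median of 159 and mean of 216 across 2327 rows, no nulls. Its mean equals the sum of the means of the three 30-49.9% rent brackets (83.05 + 58.35 + 74.68 ≈ 216.1), so it most likely counts renter households paying 30-49.9% of income on rent. The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% zeros: the middle half of tracts fall between 64 and 311, with a long tail reaching 1732.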

Treatment: Apply a log1p transform before regression to tame the right-skew and outliers.

anthropic:claude-opus-4-7 · confidence high
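
The column means suggest, but do not prove, three row-level identities: moderate_burden is the sum of the three 30-49.9% brackets (83.05 + 58.35 + 74.68 = 216.08 against a reported mean of 216.1), severe_burden equals rent_50_pct_or_more (identical statistics), and rent_burdened is moderate plus severe (216.1 + 253.2 = 469.3). A sketch of a per-row check, assuming rows are dicts keyed by column name; the function name and sample row are hypothetical.

```python
def check_burden_identities(row, tol=0):
    """Return the names of derived columns whose assumed identity fails for one row."""
    failures = []
    moderate = (
        row["rent_30_to_34_9_pct"]
        + row["rent_35_to_39_9_pct"]
        + row["rent_40_to_49_9_pct"]
    )
    if abs(moderate - row["moderate_burden"]) > tol:
        failures.append("moderate_burden")
    if abs(row["rent_50_pct_or_more"] - row["severe_burden"]) > tol:
        failures.append("severe_burden")
    if abs(row["moderate_burden"] + row["severe_burden"] - row["rent_burdened"]) > tol:
        failures.append("rent_burdened")
    return failures


# Hypothetical row that satisfies all three identities.
good_row = {
    "rent_30_to_34_9_pct": 51, "rent_35_to_39_9_pct": 35, "rent_40_to_49_9_pct": 49,
    "rent_50_pct_or_more": 184, "moderate_burden": 135, "severe_burden": 184,
    "rent_burdened": 319,
}
print(check_burden_identities(good_row))  # []
```

If every row passes, the derived columns add no information beyond the brackets and should not enter a model alongside them.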

Out[43]:

saturn.columns["moderate_burden"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	639
min	0
max	1,732
mean	216.1
median	159
std	210.4
q1	64
q3	311
iqr	247
skew	1.934
kurtosis	6.052
n_outliers	86
outlier_rate	0.03696
zero_rate	0.06403

Fig 18.

Distribution of moderate_burden. Vertical dash marks the median.

Histogram bins for moderate_burden (median: 159.0).
bin	count
0 – 43.3	431
43.3 – 86.6	317
86.6 – 129.9	256
129.9 – 173.2	245
173.2 – 216.5	190
216.5 – 259.8	149
259.8 – 303.1	137
303.1 – 346.4	109
346.4 – 389.7	105
389.7 – 433	89
433 – 476.3	61
476.3 – 519.6	53
519.6 – 562.9	28
562.9 – 606.2	33
606.2 – 649.5	28
649.5 – 692.8	17
692.8 – 736.1	11
736.1 – 779.4	16
779.4 – 822.7	7
822.7 – 866	10
866 – 909.3	6
909.3 – 952.6	9
952.6 – 995.9	2
995.9 – 1039	0
1039 – 1082	3
1082 – 1126	6
1126 – 1169	1
1169 – 1212	0
1212 – 1256	1
1256 – 1299	1
1299 – 1342	1
1342 – 1386	0
1386 – 1429	0
1429 – 1472	0
1472 – 1516	0
1516 – 1559	3
1559 – 1602	0
1602 – 1645	1
1645 – 1689	0
1689 – 1732	1

severe_burden numeric feature

Count column 'severe_burden' with 2327 rows, no nulls, and 706 unique integer values ranging from 0 to 1918 (median 184, mean 253.18). Every reported statistic matches rent_50_pct_or_more and the two correlate at +1.00, so this appears to be a copy of that column; keep only one. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with 6.27% zeros and 87 outliers (3.74%) above the upper whisker. The IQR (278) and std (236.60) are large relative to the median, indicating substantial dispersion across tracts.

Treatment: Apply a log1p transform before regression to tame the right skew and outliers.

anthropic:claude-opus-4-7 · confidence high

Out[46]:

saturn.columns["severe_burden"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	706
min	0
max	1,918
mean	253.2
median	184
std	236.6
q1	82
q3	360
iqr	278
skew	1.603
kurtosis	3.435
n_outliers	87
outlier_rate	0.03739
zero_rate	0.06274

Fig 19.

Distribution of severe_burden. Vertical dash marks the median.

Histogram bins for severe_burden (median: 184.0).
bin	count
0 – 47.95	368
47.95 – 95.9	293
95.9 – 143.9	290
143.9 – 191.8	249
191.8 – 239.8	186
239.8 – 287.7	175
287.7 – 335.7	122
335.7 – 383.6	114
383.6 – 431.6	101
431.6 – 479.5	83
479.5 – 527.5	62
527.5 – 575.4	45
575.4 – 623.4	48
623.4 – 671.3	31
671.3 – 719.2	41
719.2 – 767.2	28
767.2 – 815.2	15
815.2 – 863.1	12
863.1 – 911.1	14
911.1 – 959	7
959 – 1007	9
1007 – 1055	8
1055 – 1103	9
1103 – 1151	3
1151 – 1199	4
1199 – 1247	5
1247 – 1295	1
1295 – 1343	2
1343 – 1391	0
1391 – 1438	0
1438 – 1486	0
1486 – 1534	0
1534 – 1582	0
1582 – 1630	1
1630 – 1678	0
1678 – 1726	0
1726 – 1774	0
1774 – 1822	0
1822 – 1870	0
1870 – 1918	1

pct_moderate_burden numeric feature

This is a percentage feature, most likely the share of renter households under moderate rent burden (30-49.9% of income), ranging 0–100 with mean 22.74 and median 21.8. The distribution is right-skewed (skew 1.51, kurtosis 6.70) with 59 outliers (2.65%) and a 4.38% null rate. About 2.1% of non-null rows are exact zeros. The IQR is tight at 12.3, yet the upper tail past q3=28.2 stretches all the way to 100.

Treatment: The 102 nulls equal the number of tracts with zero renter households, so check whether the share is simply undefined there before imputing. Consider a mild transform or winsorization to tame the right tail before modelling.

anthropic:claude-opus-4-7 · confidence high
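
The 102 nulls here equal the number of tracts with zero renter households (zero_rate 0.04383 of 2,327 rows), which is what a divide-by-zero guard would produce. A sketch of such a guard, assuming the percentage is the burden count over total_renter_households; the source does not confirm how the column was derived, and the function name is ours.

```python
def pct_of_renters(count, total_renter_households):
    """Percentage of renter households, or None when the tract has no renters."""
    if total_renter_households == 0:
        return None  # the share is undefined, not a missing measurement
    return 100.0 * count / total_renter_households


print(pct_of_renters(159, 726))  # about 21.9
print(pct_of_renters(0, 0))      # None
```

If the nulls do arise this way, imputing them would assign a burden rate to tracts that have no renters, so flagging or excluding those rows may be the safer choice.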

Out[49]:

saturn.columns["pct_moderate_burden"].stats

stat	value
n	2,327
nulls	102 (4.4%)
unique	461
min	0
max	100
mean	22.74
median	21.8
std	11.36
q1	15.9
q3	28.2
iqr	12.3
skew	1.509
kurtosis	6.704
n_outliers	59
outlier_rate	0.02652
zero_rate	0.02112

Fig 20.

Distribution of pct_moderate_burden. Vertical dash marks the median.

Histogram bins for pct_moderate_burden (median: 21.8).
bin	count
0 – 2.5	55
2.5 – 5	24
5 – 7.5	49
7.5 – 10	78
10 – 12.5	108
12.5 – 15	160
15 – 17.5	213
17.5 – 20	251
20 – 22.5	238
22.5 – 25	240
25 – 27.5	193
27.5 – 30	172
30 – 32.5	129
32.5 – 35	79
35 – 37.5	71
37.5 – 40	41
40 – 42.5	31
42.5 – 45	24
45 – 47.5	14
47.5 – 50	12
50 – 52.5	6
52.5 – 55	4
55 – 57.5	4
57.5 – 60	6
60 – 62.5	0
62.5 – 65	3
65 – 67.5	3
67.5 – 70	5
70 – 72.5	0
72.5 – 75	1
75 – 77.5	1
77.5 – 80	1
80 – 82.5	1
82.5 – 85	1
85 – 87.5	0
87.5 – 90	1
90 – 92.5	2
92.5 – 95	1
95 – 97.5	0
97.5 – 100	3

pct_severe_burden numeric feature

A percentage metric (0–100 range), most likely the share of renter households under severe rent burden (50%+ of income), with a mean of 27.12, a median of 26.2, and only mild right skew (0.57). Spread is moderate (std 12.68, IQR 15.9) and only 1.35% of rows are flagged as outliers, though a max of 100.0 alongside a 1.98% zero rate points to a few extreme tracts worth inspecting. Note the 4.38% null rate, which will need handling.

Treatment: Handle the 4.4% missing values first (they coincide in number with tracts that have zero renter households, so flagging or dropping may suit better than imputing), then use as-is; the mild skew does not require transformation.

anthropic:claude-opus-4-7 · confidence high

Out[52]:

saturn.columns["pct_severe_burden"].stats

stat	value
n	2,327
nulls	102 (4.4%)
unique	518
min	0
max	100
mean	27.12
median	26.2
std	12.68
q1	18.7
q3	34.6
iqr	15.9
skew	0.5663
kurtosis	1.222
n_outliers	30
outlier_rate	0.01348
zero_rate	0.01978

Fig 21.

Distribution of pct_severe_burden. Vertical dash marks the median.

Histogram bins for pct_severe_burden (median: 26.2).
bin	count
0 – 2.5	45
2.5 – 5	14
5 – 7.5	41
7.5 – 10	53
10 – 12.5	94
12.5 – 15	115
15 – 17.5	131
17.5 – 20	160
20 – 22.5	170
22.5 – 25	188
25 – 27.5	188
27.5 – 30	168
30 – 32.5	173
32.5 – 35	157
35 – 37.5	115
37.5 – 40	97
40 – 42.5	73
42.5 – 45	62
45 – 47.5	44
47.5 – 50	35
50 – 52.5	29
52.5 – 55	19
55 – 57.5	18
57.5 – 60	12
60 – 62.5	6
62.5 – 65	4
65 – 67.5	4
67.5 – 70	2
70 – 72.5	1
72.5 – 75	1
75 – 77.5	1
77.5 – 80	1
80 – 82.5	0
82.5 – 85	1
85 – 87.5	1
87.5 – 90	0
90 – 92.5	1
92.5 – 95	0
95 – 97.5	0
97.5 – 100	1

rent_burdened numeric feature

Most likely a count of rent-burdened households per tract, ranging from 0 to 3153 with a median of 358 and mean of 469.26. The mean equals moderate_burden plus severe_burden (216.1 + 253.2 = 469.3), which points to a household count rather than a dollar amount. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 outliers (3.5%) and 4.7% exact zeros, so a long upper tail is present.

Treatment: Apply a log1p transform before regression to tame the right skew and zero mass.

anthropic:claude-opus-4-7 · confidence medium

Out[55]:

saturn.columns["rent_burdened"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,013
min	0
max	3,153
mean	469.3
median	358
std	415.3
q1	164.5
q3	670
iqr	505.5
skew	1.494
kurtosis	3.005
n_outliers	82
outlier_rate	0.03524
zero_rate	0.04727

Fig 22.

Distribution of rent_burdened. Vertical dash marks the median.

Histogram bins for rent_burdened (median: 358.0).
bin	count
0 – 78.83	310
78.83 – 157.7	256
157.7 – 236.5	264
236.5 – 315.3	231
315.3 – 394.1	190
394.1 – 473	180
473 – 551.8	147
551.8 – 630.6	113
630.6 – 709.4	108
709.4 – 788.2	75
788.2 – 867.1	91
867.1 – 945.9	73
945.9 – 1025	57
1025 – 1104	39
1104 – 1182	41
1182 – 1261	23
1261 – 1340	26
1340 – 1419	20
1419 – 1498	19
1498 – 1576	11
1576 – 1655	16
1655 – 1734	6
1734 – 1813	5
1813 – 1892	6
1892 – 1971	4
1971 – 2049	2
2049 – 2128	3
2128 – 2207	5
2207 – 2286	0
2286 – 2365	0
2365 – 2444	1
2444 – 2522	1
2522 – 2601	2
2601 – 2680	1
2680 – 2759	0
2759 – 2838	0
2838 – 2917	0
2917 – 2995	0
2995 – 3074	0
3074 – 3153	1

pct_rent_burdened numeric feature

This is a numeric percentage indicating the share of rent-burdened renter households per tract, ranging from 0 to 100 with a mean of 49.87 and median of 50.0. The distribution is nearly symmetric (skew -0.04) and reasonably tight around the middle (IQR 17.9, std 14.6), with 4.38% nulls and only 0.36% zeros. 62 outliers (2.79%) sit beyond the whiskers, but no severe tail is evident.

Treatment: Resolve the ~4% nulls (102 rows, the same number as tracts with zero renter households) by flagging or dropping rather than imputing blindly, then use as-is; no transform is needed for a near-symmetric bounded percentage.

anthropic:claude-opus-4-7 · confidence high

Out[58]:

saturn.columns["pct_rent_burdened"].stats

stat	value
n	2,327
nulls	102 (4.4%)
unique	596
min	0
max	100
mean	49.87
median	50
std	14.62
q1	40.9
q3	58.8
iqr	17.9
skew	-0.03839
kurtosis	0.7849
n_outliers	62
outlier_rate	0.02787
zero_rate	0.003596

Fig 23.

Distribution of pct_rent_burdened. Vertical dash marks the median.

Histogram bins for pct_rent_burdened (median: 50.0).
bin	count
0 – 2.5	8
2.5 – 5	3
5 – 7.5	1
7.5 – 10	5
10 – 12.5	7
12.5 – 15	8
15 – 17.5	12
17.5 – 20	14
20 – 22.5	14
22.5 – 25	24
25 – 27.5	35
27.5 – 30	42
30 – 32.5	53
32.5 – 35	80
35 – 37.5	91
37.5 – 40	119
40 – 42.5	129
42.5 – 45	144
45 – 47.5	146
47.5 – 50	177
50 – 52.5	139
52.5 – 55	178
55 – 57.5	162
57.5 – 60	131
60 – 62.5	117
62.5 – 65	97
65 – 67.5	60
67.5 – 70	57
70 – 72.5	54
72.5 – 75	28
75 – 77.5	20
77.5 – 80	25
80 – 82.5	8
82.5 – 85	4
85 – 87.5	12
87.5 – 90	5
90 – 92.5	5
92.5 – 95	3
95 – 97.5	0
97.5 – 100	8

median_gross_rent numeric feature

This is a numeric feature for median gross rent, with no nulls reported across 2,327 rows and 1,232 unique values. The middle of the distribution looks plausible (median 1735, quartiles 1441.5 to 2049, max 3501), but the minimum is -666666666 and the mean is -41539608.8 with std 161182638.7, indicating sentinel values masquerading as numbers. The histogram places 145 rows in the sentinel bin, which produces the severe negative skew (-3.62); the 289 flagged outliers (12.4%) therefore include rows other than the sentinel.

Treatment: Replace the -666666666 sentinel with null before any modelling or aggregation.

anthropic:claude-opus-4-7 · confidence high
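
A minimal sketch of the sentinel replacement recommended for median_gross_rent, assuming the data is held in a pandas Series. The toy values are invented for illustration; only the sentinel value and the column name come from the profile.

```python
import numpy as np
import pandas as pd

SENTINEL = -666666666  # value reported in the profile

# Toy values for illustration only.
rent = pd.Series([1500, SENTINEL, 2100, 1735, SENTINEL], name="median_gross_rent")

# Turn the sentinel into a proper missing value.
rent_clean = rent.replace(SENTINEL, np.nan)

n_sentinel = int((rent == SENTINEL).sum())
mean_before = rent.mean()        # dominated by the sentinel
mean_after = rent_clean.mean()   # NaNs are ignored
```

Counting the sentinel rows before replacing them gives a quick check against the 145 rows that the histogram places in the lowest bin.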

Out[61]:

saturn.columns["median_gross_rent"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,232
min	-6.667e+08
max	3,501
mean	-4.154e+07
median	1,735
std	1.612e+08
q1	1,442
q3	2,049
iqr	607.5
skew	-3.621
kurtosis	11.11
n_outliers	289
outlier_rate	0.1242
zero_rate	0
alert: high_skew	skew=-3.62
alert: outliers	12.4% rows beyond 1.5 IQR
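
The "1.5 IQR" wording in the alert matches the usual Tukey fence rule. Assuming that is the rule Saturn applies (the report does not spell it out), the fences for median_gross_rent can be reproduced from the quartiles:

```python
# Quartiles for median_gross_rent; q1 is the value quoted in the prose summary.
q1, q3 = 1441.5, 2049.0
iqr = q3 - q1                    # 607.5, matching the iqr row

lower_fence = q1 - 1.5 * iqr     # 530.25
upper_fence = q3 + 1.5 * iqr     # 2960.25

def is_outlier(value):
    """True when a value lies outside the assumed Tukey fences."""
    return value < lower_fence or value > upper_fence
```

Under this assumption every sentinel row falls below the lower fence, and genuine rents above about 2,960 (the maximum is 3,501) or below about 530 are flagged as well. That would account for the outlier count (289) exceeding the 145 rows in the sentinel bin.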

Fig 24.

Distribution of median_gross_rent. Vertical dash marks the median.

Histogram bins for median_gross_rent (median: 1735.0).
bin	count
-6.667e+08 – -6.5e+08	145
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.167e+08	0
-6.167e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.667e+08	0
-5.667e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.167e+08	0
-5.167e+08 – -5e+08	0
-5e+08 – -4.833e+08	0
-4.833e+08 – -4.667e+08	0
-4.667e+08 – -4.5e+08	0
-4.5e+08 – -4.333e+08	0
-4.333e+08 – -4.167e+08	0
-4.167e+08 – -4e+08	0
-4e+08 – -3.833e+08	0
-3.833e+08 – -3.667e+08	0
-3.667e+08 – -3.5e+08	0
-3.5e+08 – -3.333e+08	0
-3.333e+08 – -3.167e+08	0
-3.167e+08 – -3e+08	0
-3e+08 – -2.833e+08	0
-2.833e+08 – -2.667e+08	0
-2.667e+08 – -2.5e+08	0
-2.5e+08 – -2.333e+08	0
-2.333e+08 – -2.167e+08	0
-2.167e+08 – -2e+08	0
-2e+08 – -1.833e+08	0
-1.833e+08 – -1.667e+08	0
-1.667e+08 – -1.5e+08	0
-1.5e+08 – -1.333e+08	0
-1.333e+08 – -1.167e+08	0
-1.167e+08 – -1e+08	0
-1e+08 – -8.333e+07	0
-8.333e+07 – -6.666e+07	0
-6.666e+07 – -5e+07	0
-5e+07 – -3.333e+07	0
-3.333e+07 – -1.666e+07	0
-1.666e+07 – 3501	2182

median_household_income numeric feature

Median household income in dollars per record, with no nulls reported across 2,327 rows, 2,106 unique values, a sensible median of 76,833, and an IQR of 49,117. The mean of -36,017,397 and minimum of -666,666,666 reflect sentinel-coded missing values masquerading as numbers, which drag skew to -3.94 and kurtosis to 13.53. Roughly 8.9% of rows (208) are flagged as outliers; the histogram puts 126 rows in the sentinel bin, so sentinel contamination accounts for most but not all of them.

Treatment: Replace -666666666 sentinel with null, then consider log-transform or winsorisation before modelling.

anthropic:claude-opus-4-7 · confidence high
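
The order of operations matters for median_household_income: if the sentinel is still present when percentiles are computed, the winsorisation cut-offs are contaminated. A minimal sketch of both suggested options follows. The toy values and the 5th/95th percentile cut-offs are assumptions chosen for illustration, not settings taken from the report.

```python
import numpy as np
import pandas as pd

SENTINEL = -666666666

# Toy values for illustration only.
income = pd.Series([SENTINEL, 30000, 52000, 76833, 98000, 250001], dtype="float64")
income = income.replace(SENTINEL, np.nan)   # step 1: sentinel becomes NaN

# Option A: winsorise by clipping to percentiles of the valid values.
lo, hi = income.quantile([0.05, 0.95])
income_wins = income.clip(lower=lo, upper=hi)

# Option B: natural log; valid here because the cleaned incomes are positive.
income_log = np.log(income)
```

Both options leave the NaN rows untouched, so missing income still has to be handled separately before modelling.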

Out[64]:

saturn.columns["median_household_income"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	2,106
min	-6.667e+08
max	250,001
mean	-3.602e+07
median	76,833
std	1.509e+08
q1	5.324e+04
q3	1.024e+05
iqr	49,117
skew	-3.94
kurtosis	13.53
n_outliers	208
outlier_rate	0.08939
zero_rate	0
alert: high_skew	skew=-3.94
alert: outliers	8.9% rows beyond 1.5 IQR

Fig 25.

Distribution of median_household_income. Vertical dash marks the median.

Histogram bins for median_household_income (median: 76833.0).
bin	count
-6.667e+08 – -6.5e+08	126
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.166e+08	0
-6.166e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.666e+08	0
-5.666e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.166e+08	0
-5.166e+08 – -4.999e+08	0
-4.999e+08 – -4.833e+08	0
-4.833e+08 – -4.666e+08	0
-4.666e+08 – -4.499e+08	0
-4.499e+08 – -4.332e+08	0
-4.332e+08 – -4.166e+08	0
-4.166e+08 – -3.999e+08	0
-3.999e+08 – -3.832e+08	0
-3.832e+08 – -3.666e+08	0
-3.666e+08 – -3.499e+08	0
-3.499e+08 – -3.332e+08	0
-3.332e+08 – -3.165e+08	0
-3.165e+08 – -2.999e+08	0
-2.999e+08 – -2.832e+08	0
-2.832e+08 – -2.665e+08	0
-2.665e+08 – -2.498e+08	0
-2.498e+08 – -2.332e+08	0
-2.332e+08 – -2.165e+08	0
-2.165e+08 – -1.998e+08	0
-1.998e+08 – -1.832e+08	0
-1.832e+08 – -1.665e+08	0
-1.665e+08 – -1.498e+08	0
-1.498e+08 – -1.331e+08	0
-1.331e+08 – -1.165e+08	0
-1.165e+08 – -9.979e+07	0
-9.979e+07 – -8.311e+07	0
-8.311e+07 – -6.644e+07	0
-6.644e+07 – -4.977e+07	0
-4.977e+07 – -3.31e+07	0
-3.31e+07 – -1.642e+07	0
-1.642e+07 – 2.5e+05	2201

total_households numeric feature

Counts of households per record, ranging from 0 to 8209 with a median of 1252 and mean of 1410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 outliers (3.0%) on the high end, and 4.1% of rows are zero, which may indicate unpopulated or placeholder areas.

Treatment: Log-transform (log1p, given the 4.1% zeros) or winsorise before modelling, and decide whether zero-household rows should be filtered.

anthropic:claude-opus-4-7 · confidence high
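
The zero_rate of 0.04125 for total_households corresponds to roughly 96 of the 2,327 rows. A minimal sketch of isolating those rows so the filtering decision can be made explicitly; the frame and its values are toy data, and `tract` stands in for whichever identifier column is used.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "tract": [100, 200, 300, 400],
    "total_households": [0, 1252, 773, 0],
})

zero_mask = df["total_households"] == 0
zero_share = zero_mask.mean()            # fraction of rows with no households

# Rows kept for household-level analysis; the zero rows stay in df for inspection.
populated = df.loc[~zero_mask].copy()
```

Keeping the mask rather than dropping rows in place makes it easy to check which tracts were excluded before committing to the filter.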

Out[67]:

saturn.columns["total_households"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,495
min	0
max	8,209
mean	1,411
median	1,252
std	923.3
q1	773.5
q3	1,850
iqr	1,076
skew	1.479
kurtosis	4.377
n_outliers	70
outlier_rate	0.03008
zero_rate	0.04125

Fig 26.

Distribution of total_households. Vertical dash marks the median.

Histogram bins for total_households (median: 1252.0).
bin	count
0 – 205.2	123
205.2 – 410.4	41
410.4 – 615.7	203
615.7 – 820.9	272
820.9 – 1026	269
1026 – 1231	237
1231 – 1437	215
1437 – 1642	221
1642 – 1847	162
1847 – 2052	134
2052 – 2257	94
2257 – 2463	101
2463 – 2668	66
2668 – 2873	39
2873 – 3078	35
3078 – 3284	24
3284 – 3489	22
3489 – 3694	8
3694 – 3899	7
3899 – 4104	9
4104 – 4310	13
4310 – 4515	9
4515 – 4720	5
4720 – 4925	5
4925 – 5131	3
5131 – 5336	2
5336 – 5541	2
5541 – 5746	0
5746 – 5952	1
5952 – 6157	1
6157 – 6362	0
6362 – 6567	0
6567 – 6772	1
6772 – 6978	2
6978 – 7183	0
7183 – 7388	0
7388 – 7593	0
7593 – 7799	0
7799 – 8004	0
8004 – 8209	1

owner_occupied numeric feature

Despite the boolean-sounding name 'owner_occupied', this is a numeric count column with 1001 unique values ranging from 0 to 3052 and a mean of 464.6 — likely a tally of owner-occupied units per record (e.g., per tract or block). The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 7.2% zeros. No nulls are present.

Treatment: Log-transform (log1p to handle the 7% zeros) before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence medium
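
A plain logarithm is undefined at zero, which is why log1p is suggested for owner_occupied. A minimal NumPy sketch of the transform and its inverse; the array holds toy counts chosen to include a zero.

```python
import numpy as np

# Toy counts for illustration only; includes a zero, as about 7% of real rows do.
owner_occupied = np.array([0, 177, 371, 608, 3052], dtype=float)

owner_log = np.log1p(owner_occupied)   # log(1 + x), defined at x = 0
restored = np.expm1(owner_log)         # inverse transform, exp(y) - 1
```

Because log1p is monotonic, the ordering of tracts is preserved, and expm1 maps model outputs back to the count scale.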

Out[70]:

saturn.columns["owner_occupied"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,001
min	0
max	3,052
mean	464.6
median	371
std	422.6
q1	177
q3	608
iqr	431
skew	1.761
kurtosis	4.254
n_outliers	143
outlier_rate	0.06145
zero_rate	0.0722
alert: outliers	6.1% rows beyond 1.5 IQR

Fig 27.

Distribution of owner_occupied. Vertical dash marks the median.

Histogram bins for owner_occupied (median: 371.0).
bin	count
0 – 76.3	343
76.3 – 152.6	175
152.6 – 228.9	191
228.9 – 305.2	236
305.2 – 381.5	258
381.5 – 457.8	245
457.8 – 534.1	167
534.1 – 610.4	134
610.4 – 686.7	98
686.7 – 763	69
763 – 839.3	61
839.3 – 915.6	49
915.6 – 991.9	53
991.9 – 1068	43
1068 – 1144	33
1144 – 1221	21
1221 – 1297	20
1297 – 1373	28
1373 – 1450	16
1450 – 1526	18
1526 – 1602	9
1602 – 1679	13
1679 – 1755	9
1755 – 1831	8
1831 – 1908	5
1908 – 1984	2
1984 – 2060	4
2060 – 2136	3
2136 – 2213	3
2213 – 2289	1
2289 – 2365	2
2365 – 2442	2
2442 – 2518	3
2518 – 2594	0
2594 – 2670	3
2670 – 2747	0
2747 – 2823	1
2823 – 2899	0
2899 – 2976	0
2976 – 3052	1

renter_occupied numeric feature

Counts of renter-occupied units per record, ranging from 0 to 8209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) and 4.38% zeros, consistent with area-level housing tallies rather than a per-household flag.

Treatment: Log-transform (log1p, given the 4.4% zeros) or scale before regression to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
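
The treatment for renter_occupied does not say which scaling is meant. One option for a skewed count is robust scaling, which centres on the median and divides by the IQR so that the extreme maximum has less influence on the scale than it would under standardisation. The sketch below hard-codes the median and IQR from the stats table purely for illustration; in practice they would be computed from the training data.

```python
import numpy as np

MEDIAN, IQR = 726.0, 1011.0   # from the renter_occupied stats table

def robust_scale(x):
    """Centre on the median and divide by the IQR."""
    return (np.asarray(x, dtype=float) - MEDIAN) / IQR

# q1, median, q3 and max from the same table.
scaled = robust_scale([346, 726, 1357, 8209])
```

After scaling, the median maps to 0 and the quartiles sit exactly one unit apart, while the maximum of 8,209 remains visibly extreme at more than 7 units.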

Out[73]:

saturn.columns["renter_occupied"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,418
min	0
max	8,209
mean	946.1
median	726
std	815.4
q1	346
q3	1,357
iqr	1,011
skew	1.595
kurtosis	4.627
n_outliers	69
outlier_rate	0.02965
zero_rate	0.04383

Fig 28.

Distribution of renter_occupied. Vertical dash marks the median.

Histogram bins for renter_occupied (median: 726.0).
bin	count
0 – 205.2	349
205.2 – 410.4	358
410.4 – 615.7	292
615.7 – 820.9	268
820.9 – 1026	207
1026 – 1231	175
1231 – 1437	168
1437 – 1642	110
1642 – 1847	100
1847 – 2052	68
2052 – 2257	63
2257 – 2463	42
2463 – 2668	36
2668 – 2873	22
2873 – 3078	19
3078 – 3284	17
3284 – 3489	6
3489 – 3694	5
3694 – 3899	4
3899 – 4104	6
4104 – 4310	5
4310 – 4515	3
4515 – 4720	1
4720 – 4925	0
4925 – 5131	1
5131 – 5336	1
5336 – 5541	0
5541 – 5746	0
5746 – 5952	0
5952 – 6157	0
6157 – 6362	0
6362 – 6567	0
6567 – 6772	0
6772 – 6978	0
6978 – 7183	0
7183 – 7388	0
7388 – 7593	0
7593 – 7799	0
7799 – 8004	0
8004 – 8209	1

pct_owner_occupied numeric feature

Percentage of owner-occupied housing per record, spanning the full 0–100 scale with a mean of 37.5 and median of 34.4. The distribution is wide (std 25.7, IQR 39.7) and slightly right-skewed (0.39) with negative kurtosis (-0.85), indicating a flat, near-uniform spread rather than a tight central mass. About 3.2% of rows are exactly zero and 4.1% are null, but no statistical outliers were flagged.

Treatment: Use as-is as a bounded percentage feature; impute the 4% nulls with the median or a missingness flag.

anthropic:claude-opus-4-7 · confidence high
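
The 96 nulls in pct_owner_occupied match the number of zero rows implied by total_households (0.04125 × 2,327 ≈ 96), which suggests the percentage is undefined where a tract has no households. The sketch below recomputes the percentage from the counts while leaving zero-denominator rows as NaN. It assumes pct_owner_occupied is 100 × owner_occupied / total_households, which the report does not state; the rows are toy data.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "owner_occupied": [0, 430, 177, 0],
    "total_households": [0, 1250, 1770, 500],
})

# Swap zero denominators for NaN so the division yields NaN rather than inf.
denominator = df["total_households"].replace(0, np.nan)
df["pct_owner_recomputed"] = 100 * df["owner_occupied"] / denominator
```

The first and last toy rows show the distinction that median imputation would blur: NaN means there are no households at all, whereas 0.0 means there are households but none are owner-occupied.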

Out[76]:

saturn.columns["pct_owner_occupied"].stats

stat	value
n	2,327
nulls	96 (4.1%)
unique	823
min	0
max	100
mean	37.51
median	34.4
std	25.65
q1	16.4
q3	56.1
iqr	39.7
skew	0.3948
kurtosis	-0.854
n_outliers	0
outlier_rate	0
zero_rate	0.03227

Fig 29.

Distribution of pct_owner_occupied. Vertical dash marks the median.

Histogram bins for pct_owner_occupied (median: 34.4).
bin	count
0 – 2.5	141
2.5 – 5	86
5 – 7.5	71
7.5 – 10	63
10 – 12.5	86
12.5 – 15	65
15 – 17.5	72
17.5 – 20	88
20 – 22.5	98
22.5 – 25	88
25 – 27.5	79
27.5 – 30	67
30 – 32.5	70
32.5 – 35	56
35 – 37.5	76
37.5 – 40	68
40 – 42.5	59
42.5 – 45	72
45 – 47.5	58
47.5 – 50	54
50 – 52.5	70
52.5 – 55	59
55 – 57.5	55
57.5 – 60	39
60 – 62.5	44
62.5 – 65	40
65 – 67.5	52
67.5 – 70	34
70 – 72.5	40
72.5 – 75	47
75 – 77.5	32
77.5 – 80	48
80 – 82.5	31
82.5 – 85	27
85 – 87.5	26
87.5 – 90	24
90 – 92.5	23
92.5 – 95	8
95 – 97.5	6
97.5 – 100	9

pct_renter_occupied numeric feature

Numeric percentage of renter-occupied units, spanning the full 0–100 range with mean 62.5 and median 65.6, suggesting these records skew toward rental-heavy geographies. Spread is wide (std 25.7, IQR 39.7) and the distribution is mildly left-skewed (-0.39) and flat (kurtosis -0.85); no outliers were flagged. About 4.1% of rows are null and only 0.27% are exact zeros, with 823 distinct values across 2,327 rows.

Treatment: Use as-is as a bounded percentage feature; impute the 4.1% nulls before modelling.

anthropic:claude-opus-4-7 · confidence high
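
The stats for pct_renter_occupied mirror those of pct_owner_occupied: the std, IQR, kurtosis, null count and unique count are identical, the skew has the opposite sign, and the two means sum to 100. That pattern suggests the columns are complements. A minimal sketch of a row-wise check follows, using toy rows and a tolerance that is an assumption intended to absorb one-decimal rounding.

```python
import numpy as np
import pandas as pd

cols = ["pct_owner_occupied", "pct_renter_occupied"]

# Toy rows for illustration only.
df = pd.DataFrame({
    "pct_owner_occupied": [34.4, 16.4, np.nan, 100.0],
    "pct_renter_occupied": [65.6, 83.6, np.nan, 0.0],
})

# Only compare rows where both percentages are present.
both_present = df[cols].notna().all(axis=1)
row_sums = df.loc[both_present, cols].sum(axis=1)

# Small tolerance for rounding in the source columns.
is_complement = bool(np.isclose(row_sums, 100.0, atol=0.11).all())
```

If the check passes on the real data, keeping both columns in a linear model adds no information and introduces perfect collinearity, so one of them can usually be dropped.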

Out[79]:

saturn.columns["pct_renter_occupied"].stats

stat	value
n	2,327
nulls	96 (4.1%)
unique	823
min	0
max	100
mean	62.49
median	65.6
std	25.65
q1	43.9
q3	83.6
iqr	39.7
skew	-0.3948
kurtosis	-0.854
n_outliers	0
outlier_rate	0
zero_rate	0.002689

Fig 30.

Distribution of pct_renter_occupied. Vertical dash marks the median.

Histogram bins for pct_renter_occupied (median: 65.6).
bin	count
0 – 2.5	9
2.5 – 5	6
5 – 7.5	7
7.5 – 10	23
10 – 12.5	24
12.5 – 15	26
15 – 17.5	27
17.5 – 20	32
20 – 22.5	47
22.5 – 25	31
25 – 27.5	48
27.5 – 30	40
30 – 32.5	34
32.5 – 35	50
35 – 37.5	39
37.5 – 40	46
40 – 42.5	41
42.5 – 45	53
45 – 47.5	57
47.5 – 50	72
50 – 52.5	54
52.5 – 55	54
55 – 57.5	73
57.5 – 60	62
60 – 62.5	69
62.5 – 65	72
65 – 67.5	60
67.5 – 70	65
70 – 72.5	72
72.5 – 75	75
75 – 77.5	91
77.5 – 80	94
80 – 82.5	92
82.5 – 85	73
85 – 87.5	64
87.5 – 90	83
90 – 92.5	65
92.5 – 95	71
95 – 97.5	83
97.5 – 100	147

How to cite