saturn·

nyc housing nyc rent burden by tract

source /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_rent_burden_by_tract.csv · 2,327 rows · 16 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 2,327 NYC census tracts with 16 columns describing renter households and rent-burden levels across the five boroughs. All tracts are in New York State (state is constant at FIPS code 36) and are split across five counties, with Brooklyn (Kings) holding the largest share at about 34.6% of tracts (805) and Staten Island the smallest at 126 tracts. The headline housing-affordability metric, pct_rent_burdened, is roughly symmetric around a median of 50%, with the middle half of tracts falling between 40.9% and 58.8% (IQR 17.9), so in a typical tract about half of renter households spend 30% or more of income on rent. The raw count columns (rent_burdened, rent_50_pct_or_more, total_renter_households) are right-skewed with notable outliers, so use the burden percentages for cross-tract comparison and reserve the count fields for identifying the highest-volume tracts.

citing: row_count · column_count · columns · kinds

Schema

16 columns
Per-column summary.

column                    kind          nulls   unique   alerts
total_renter_households   numeric       0.0%    1,418    -
rent_30_to_34_9_pct       numeric       0.0%    355      high_skew, outliers
rent_35_to_39_9_pct       numeric       0.0%    270      high_skew
rent_40_to_49_9_pct       numeric       0.0%    322      high_skew
rent_50_pct_or_more       numeric       0.0%    706      -
NAME                      text          0.0%    2,327    near_unique
state                     numeric       0.0%    1        constant
county                    numeric       0.0%    5        -
tract                     numeric       0.0%    1,530    high_skew
county_name               categorical   0.0%    5        -
moderate_burden           numeric       0.0%    639      -
severe_burden             numeric       0.0%    706      -
pct_moderate_burden       numeric       4.4%    461      -
pct_severe_burden         numeric       4.4%    518      -
rent_burdened             numeric       0.0%    1,013    -
pct_rent_burdened         numeric       4.4%    596      -

total_renter_households

numeric feature
This column counts renter households per census tract, ranging from 0 to 8,209 with a median of 726 and a mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) on the high end, and 4.38% of rows (about 102 tracts) are zero. There are no nulls, and 1,418 unique values across 2,327 rows indicate that some counts repeat. Treatment: Log-transform before regression because of the right skew, using log1p (or excluding the zero rows) so that zeros do not break the transform. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 1,418
min: 0
max: 8,209
mean: 946.1
median: 726
std: 815.4
q1: 346
q3: 1,357
iqr: 1,011
skew: 1.595
kurtosis: 4.627
n_outliers: 69
outlier_rate: 0.02965
zero_rate: 0.04383
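
The log-transform treatment with zero handling can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper name `log1p_counts` is hypothetical, and the input is the five summary values reported above (min, q1, median, q3, max), not the raw column.

```python
import math

def log1p_counts(values):
    """Return log(1 + x) for each non-negative count; zeros map to 0.0."""
    out = []
    for x in values:
        if x < 0:
            raise ValueError(f"counts must be non-negative, got {x}")
        out.append(math.log1p(x))
    return out

# Summary values reported for total_renter_households: min, q1, median, q3, max.
reported = [0, 346, 726, 1357, 8209]
transformed = log1p_counts(reported)
```

Using log1p rather than a plain log keeps the 4.38% of zero-count tracts in the data, since log(0) is undefined while log1p(0) is 0.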

rent_30_to_34_9_pct

numeric feature high_skew outliers
This appears to be a count of renter households paying 30% to 34.9% of income on rent in each census tract. Despite the _pct suffix, the values are counts rather than percentages (max 1,205). The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86) with a median of 51 but a mean of 83 and a max of 1,205, and 16.2% of rows are zero. About 5.3% of values (124 rows) are flagged as outliers, suggesting that a few tracts with many renters dominate the tail. Treatment: Log1p-transform or convert to a share of total_renter_households before modelling. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 355
min: 0
max: 1,205
mean: 83.05
median: 51
std: 100.3
q1: 15
q3: 116
iqr: 101
skew: 2.755
kurtosis: 13.86
n_outliers: 124
outlier_rate: 0.05329
zero_rate: 0.1616
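
Converting a bracket count to a share of renter households might look like the sketch below. The helper `share_pct` and the two example rows are hypothetical (the rows are not taken from the dataset); the column names are those listed in the schema.

```python
def share_pct(count, total):
    """Percentage share, or None when the denominator is zero (share undefined)."""
    if total == 0:
        return None
    return round(100.0 * count / total, 1)

# Hypothetical rows for illustration only.
rows = [
    {"total_renter_households": 726, "rent_30_to_34_9_pct": 51},
    {"total_renter_households": 0, "rent_30_to_34_9_pct": 0},
]
shares = [
    share_pct(r["rent_30_to_34_9_pct"], r["total_renter_households"])
    for r in rows
]
```

Returning None for a zero denominator keeps tracts without renters distinguishable from tracts where the share is genuinely 0%.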

rent_35_to_39_9_pct

numeric feature high_skew
This column appears to be a count of renter households paying 35% to 39.9% of income on rent in each census tract. The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with a median of 35 but a max of 633, and nearly 20% of rows are zero (zero_rate 0.196), pointing to many small or sparsely populated tracts alongside a long tail of larger ones. The interquartile range is 10–83, and 110 outliers (4.7%) sit above the upper fence, which is 192.5 under the conventional 1.5 × IQR rule (83 + 1.5 × 73). Treatment: Log1p-transform before modelling to tame the skew and zero inflation. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 270
min: 0
max: 633
mean: 58.35
median: 35
std: 69.85
q1: 10
q3: 83
iqr: 73
skew: 2.395
kurtosis: 9.275
n_outliers: 110
outlier_rate: 0.04727
zero_rate: 0.1964
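
The outlier fences implied by the quartiles above can be worked out as in this sketch. It assumes the profiler uses the conventional Tukey rule with a multiplier of 1.5, which the report does not state explicitly for this column; `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for rent_35_to_39_9_pct.
lower, upper = tukey_fences(10, 83)
```

Because the column is a non-negative count, the negative lower fence can never be crossed, so all flagged outliers are on the high side.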

rent_40_to_49_9_pct

numeric feature high_skew
Likely a count of renter households paying 40% to 49.9% of income on rent in each census tract. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14) with a median of 49 but a max of 740, and 15.6% of rows are zero, suggesting many small tracts with no such households alongside a long tail of large ones. About 4.8% of values (111 rows) are flagged as outliers. Treatment: Log1p-transform before regression to tame the right skew. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 322
min: 0
max: 740
mean: 74.68
median: 49
std: 83.79
q1: 14
q3: 106
iqr: 92
skew: 2.137
kurtosis: 7.139
n_outliers: 111
outlier_rate: 0.0477
zero_rate: 0.1556

rent_50_pct_or_more

numeric feature
This column likely counts renter households spending 50% or more of income on rent in each census tract. Values span 0 to 1,918 with a median of 184 and a mean of 253.2, and the distribution is right-skewed (skew 1.60, kurtosis 3.44) with 87 high-end outliers (3.7%). About 6.3% of rows are zero and there are no nulls across 2,327 rows. Every reported statistic matches severe_burden, so the two columns are probably duplicates. Treatment: Log1p-transform before regression to tame the right skew, and consider normalizing by total_renter_households for cross-tract comparability. medium · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 706
min: 0
max: 1,918
mean: 253.2
median: 184
std: 236.6
q1: 82
q3: 360
iqr: 278
skew: 1.603
kurtosis: 3.435
n_outliers: 87
outlier_rate: 0.03739
zero_rate: 0.06274

NAME

text identifier near_unique
This column holds fully qualified Census tract names for New York City, with every one of the 2,327 rows unique and non-null. Lengths cluster tightly between 38 and 46 characters, and every record contains the boilerplate tokens 'new', 'york', 'census', 'tract', and 'county;' plus a county name (Kings 805, Queens 725, Bronx 361, Richmond 126). The remaining 310 tracts belong to New York County (Manhattan), which is presumably not listed separately because its tokens 'new' and 'york' already appear in every record. The column functions as a row identifier rather than a feature; the embedded county token is the only varying signal worth extracting, and county_name already carries it. Treatment: Treat as a unique key; if needed, parse out the county token as a categorical feature and drop the rest. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 2,327
len_min: 38
len_max: 46
len_mean: 41.65
len_median: 41
len_p95: 46
word_mean: 7.133
word_median: 7
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,539
readability_flesch_mean: 91.45
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
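
Parsing the county token out of NAME could be sketched as below. The exact string layout is an assumption: the profile only confirms the tokens and the 38 to 46 character length, so the semicolon-separated form 'Census Tract N; X County; New York' and the example strings are hypothetical, as is the helper name `county_from_name`.

```python
def county_from_name(name):
    """Extract the county from a name like 'Census Tract 1; Kings County; New York'.

    The three-part, semicolon-separated layout is an assumed format.
    """
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 3 or not parts[1].endswith(" County"):
        raise ValueError(f"unexpected NAME format: {name!r}")
    return parts[1][: -len(" County")]

example = "Census Tract 1; Kings County; New York"  # hypothetical value
county = county_from_name(example)
```

Raising on an unexpected layout is deliberate: if the real NAME values differ from the assumed format, the mismatch surfaces immediately instead of producing wrong county labels.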

state

numeric metadata constant
The column 'state' is a numeric field that holds the single value 36, the FIPS code for New York State, across all 2,327 rows with no nulls. It carries a 'constant' alert and contributes zero variance (std 0.0, n_unique 1), which suggests it is a leftover filter key rather than a usable feature. Treatment: Drop before modelling; a constant column adds no signal. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 1
min: 36
max: 36
mean: 36
median: 36
std: 0
q1: 36
q3: 36
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0
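
Dropping constant columns such as state can be sketched generically. The rows below are hypothetical stand-ins for records read with csv.DictReader, and `constant_columns` is an illustrative helper, not part of the profiler.

```python
def constant_columns(rows):
    """Return names of columns whose value is identical across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({r[col] for r in rows}) == 1]

# Hypothetical records for illustration only.
rows = [
    {"state": 36, "county": 5, "tract": 100},
    {"state": 36, "county": 47, "tract": 100},
    {"state": 36, "county": 47, "tract": 200},
]
to_drop = constant_columns(rows)
cleaned = [{k: v for k, v in r.items() if k not in to_drop} for r in rows]
```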

county

numeric identifier
Despite being typed numeric, `county` takes only 5 distinct values across 2,327 rows (min 5, max 85), so these integers are almost certainly county FIPS codes rather than measurements. The reported skew (-0.72), mean (55), and median (47) carry no meaning for codes, and the quartiles landing exactly on observed codes (Q1=47, Q3=81) simply reflect the small categorical support. No nulls or outliers are reported. Treatment: Cast to categorical and one-hot encode (or treat as a lookup key) rather than using as a continuous feature. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 5
min: 5
max: 85
mean: 55
median: 47
std: 25.97
q1: 47
q3: 81
iqr: 34
skew: -0.72
kurtosis: -0.4531
n_outliers: 0
outlier_rate: 0
zero_rate: 0
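
Treating county as a code rather than a number might look like the sketch below. The mapping uses the standard Census FIPS county codes for the five NYC counties; code 61 (New York County) does not appear in the summary statistics above and is assumed from that standard. The per-county tract counts are the ones reported for NAME and county_name, with Manhattan taken as New York County and Staten Island as Richmond County.

```python
# Standard FIPS county codes for the five NYC counties.
COUNTY_FIPS = {5: "Bronx", 47: "Kings", 61: "New York", 81: "Queens", 85: "Richmond"}

# Tract counts per county as reported elsewhere in this profile.
TRACTS_PER_COUNTY = {5: 361, 47: 805, 61: 310, 81: 725, 85: 126}

def one_hot_county(code):
    """One-hot encode a county code as a dict of 0/1 indicator columns."""
    if code not in COUNTY_FIPS:
        raise ValueError(f"unknown county code: {code}")
    return {
        "county_" + name.lower().replace(" ", "_"): int(code == fips)
        for fips, name in COUNTY_FIPS.items()
    }

encoded = one_hot_county(47)

# Sanity check: the count-weighted mean of the codes should match the reported mean.
mean_code = sum(c * n for c, n in TRACTS_PER_COUNTY.items()) / sum(TRACTS_PER_COUNTY.values())
```

The weighted mean of the codes comes out at 55, the same as the reported mean, which is consistent with the assumed code-to-county mapping.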

tract

numeric identifier high_skew
Census tract codes stored as integers, with 1,530 unique values across 2,327 rows and no nulls. Tract codes are unique only within a county, so the repeats presumably come from the same code appearing in different counties; the row key is the combination of county and tract. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82) with a max of 990,100 against a median of 30,100, which is characteristic of identifiers rather than a measurable quantity. The 63 flagged outliers and the heavy tail are artifacts of the coding scheme, not anomalies to clean. Treatment: Treat as a categorical geographic key; do not use as a numeric feature. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 1,530
min: 100
max: 990,100
mean: 42,250
median: 30,100
std: 48,270
q1: 15,200
q3: 57,900
iqr: 42,700
skew: 10.14
kurtosis: 189.8
n_outliers: 63
outlier_rate: 0.02707
zero_rate: 0
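
Building a geographic key from the three code columns could be sketched as follows. Census tract GEOIDs are 11 digits: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The sketch assumes the tract column holds that 6-digit code with leading zeros lost in the integer cast; `tract_geoid` is a hypothetical helper.

```python
def tract_geoid(state, county, tract):
    """Zero-pad state (2), county (3), and tract (6) into an 11-digit GEOID string."""
    return f"{state:02d}{county:03d}{tract:06d}"

# The same tract code in two different counties yields two distinct keys.
geoid_kings = tract_geoid(36, 47, 100)
geoid_bronx = tract_geoid(36, 5, 100)
```

Keeping the key as a string preserves the leading zeros that make it joinable against Census boundary files.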

county_name

categorical feature
This column is the NYC borough/county name, with exactly 5 unique values matching the city's five boroughs and no nulls across 2,327 rows. Brooklyn (Kings) leads at 34.6% (805), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126); the entropy ratio of 0.898 indicates a fairly even spread even though Staten Island has far fewer tracts than the other boroughs. Treatment: One-hot encode as a low-cardinality categorical feature. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 5
top_value: Brooklyn (Kings)
top_rate: 0.3459
cardinality: 5
entropy: 2.086
entropy_ratio: 0.8985
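
The entropy figures can be reproduced from the borough counts quoted above. This sketch assumes the profiler reports Shannon entropy in bits and normalizes by log2 of the cardinality; only 'Brooklyn (Kings)' is confirmed as an exact label, and the other keys are illustrative since labels do not affect the result.

```python
import math

# Tract counts per borough as quoted in the column description.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan": 310,
    "Staten Island": 126,
}

total = sum(counts.values())
probs = [n / total for n in counts.values()]

# Shannon entropy in bits, and its ratio to the maximum for 5 categories.
entropy = -sum(p * math.log2(p) for p in probs)
entropy_ratio = entropy / math.log2(len(counts))
```

A ratio of 1.0 would mean all five boroughs hold the same number of tracts, so a value near 0.9 reflects a moderately uneven split.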

moderate_burden

numeric feature
A non-negative integer count column, probably the number of renter households paying 30% to 49.9% of income on rent: its mean (216.1) matches the sum of the means of the three bracket columns (83.05 + 58.35 + 74.68 = 216.08). It has 2,327 rows, no nulls, and 639 distinct values ranging from 0 to 1,732 (median 159). The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% exact zeros, so a long tail of high-count tracts sits above a typical value of roughly 160. Treatment: Apply a log1p transform before regression to tame the right skew. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 639
min: 0
max: 1,732
mean: 216.1
median: 159
std: 210.4
q1: 64
q3: 311
iqr: 247
skew: 1.934
kurtosis: 6.052
n_outliers: 86
outlier_rate: 0.03696
zero_rate: 0.06403
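
The column means in this profile suggest that the burden columns are derived from the bracket counts: the three bracket means sum to the moderate_burden mean, and rent_50_pct_or_more shares every statistic with severe_burden. That relationship is an inference from the summary statistics, not something the profile states, so a row-level check like the sketch below would be needed to confirm it. The example row is hypothetical.

```python
def check_burden_sums(row):
    """Check the assumed relationships between bracket counts and burden columns."""
    moderate = (
        row["rent_30_to_34_9_pct"]
        + row["rent_35_to_39_9_pct"]
        + row["rent_40_to_49_9_pct"]
    )
    return {
        "moderate_ok": row["moderate_burden"] == moderate,
        "severe_ok": row["severe_burden"] == row["rent_50_pct_or_more"],
        "total_ok": row["rent_burdened"] == row["moderate_burden"] + row["severe_burden"],
    }

# Hypothetical row for illustration only.
row = {
    "rent_30_to_34_9_pct": 51,
    "rent_35_to_39_9_pct": 35,
    "rent_40_to_49_9_pct": 49,
    "rent_50_pct_or_more": 184,
    "moderate_burden": 135,
    "severe_burden": 184,
    "rent_burdened": 319,
}
checks = check_burden_sums(row)
```

If the checks pass on every row, the derived columns are redundant with the brackets and should not be fed into the same linear model together.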

severe_burden

numeric feature
A count column spanning 0 to 1,918 across 2,327 rows with no nulls and 706 distinct values. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with a median of 184 well below the mean of 253.18, an IQR of 278, and 87 outliers (3.7%); 6.3% of rows are exactly zero. Every reported statistic matches rent_50_pct_or_more, so the two columns are probably identical and one of them can be dropped. Treatment: Apply a log1p transform before regression to tame the right skew and zero inflation. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 706
min: 0
max: 1,918
mean: 253.2
median: 184
std: 236.6
q1: 82
q3: 360
iqr: 278
skew: 1.603
kurtosis: 3.435
n_outliers: 87
outlier_rate: 0.03739
zero_rate: 0.06274

pct_moderate_burden

numeric feature
Percentage of renter households with a moderate rent burden, expressed on a 0-100 scale (min 0.0, max 100.0, mean 22.74, median 21.8). The distribution is right-skewed (skew 1.51, kurtosis 6.70) with a tight IQR of 12.3 around the median but a long upper tail producing 59 outliers (2.65% of non-null rows). About 4.38% of rows (102) are null, which matches the number of tracts reporting zero renter households, so the nulls are probably undefined ratios rather than missing measurements; 2.11% are exact zeros. Both are worth checking before modelling. Treatment: Impute or exclude the 4.38% nulls, bearing in mind that an imputed rate for a tract with no renters may not be meaningful, and consider a mild transform or winsorisation given the right skew before regression. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 102 (4.4%)
unique: 461
min: 0
max: 100
mean: 22.74
median: 21.8
std: 11.36
q1: 15.9
q3: 28.2
iqr: 12.3
skew: 1.509
kurtosis: 6.704
n_outliers: 59
outlier_rate: 0.02652
zero_rate: 0.02112
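
The link between the nulls in this column and the zero-renter tracts can be checked with a little arithmetic on the reported figures. The sketch below only uses numbers from this profile; the conclusion that the two sets of rows coincide remains an inference until verified row by row.

```python
n_rows = 2327
null_count_pct = 102            # nulls reported for pct_moderate_burden
zero_rate_total_renters = 0.04383  # zero_rate reported for total_renter_households
n_outliers = 59                 # outliers reported for pct_moderate_burden

# Number of tracts with zero renter households implied by the zero rate.
zero_renter_tracts = round(n_rows * zero_rate_total_renters)

# The reported outlier_rate is reproduced only with non-null rows as the base.
non_null_rows = n_rows - null_count_pct
outlier_rate_non_null = round(n_outliers / non_null_rows, 5)
outlier_rate_all_rows = round(n_outliers / n_rows, 5)
```

The implied count of zero-renter tracts equals the null count, and the reported outlier_rate of 0.02652 is reproduced only when nulls are excluded from the denominator.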

pct_severe_burden

numeric feature
A percentage feature (0–100 range) capturing the share of renter households under severe rent burden, averaging 27.1% with a median of 26.2 and an IQR of 15.9. The distribution is mildly right-skewed (skew 0.57) with 30 outliers (1.35% of non-null rows) reaching up to 100, and 4.38% of rows (102) are null. With 518 unique values across 2,327 rows and a 1.98% zero rate, it behaves as a continuous rate rather than a categorical bucket. Treatment: Impute the ~4% missing values and use as-is in modelling; mild skew rarely needs transformation. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 102 (4.4%)
unique: 518
min: 0
max: 100
mean: 27.12
median: 26.2
std: 12.68
q1: 18.7
q3: 34.6
iqr: 15.9
skew: 0.5663
kurtosis: 1.222
n_outliers: 30
outlier_rate: 0.01348
zero_rate: 0.01978

rent_burdened

numeric feature
Likely a per-tract count of rent-burdened households, ranging from 0 to 3,153 with a median of 358 and a mean of 469.26. The mean matches the sum of the moderate_burden and severe_burden means (216.1 + 253.2 = 469.3), which points to a household count rather than a dollar amount. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 high outliers (3.5%) and 4.7% zeros. With 1,013 unique values across 2,327 rows and no nulls, it behaves like a continuous feature rather than a category. Treatment: Log1p-transform before regression to tame the right skew and outliers. medium · anthropic:claude-opus-4-7
n: 2,327
nulls: 0 (0.0%)
unique: 1,013
min: 0
max: 3,153
mean: 469.3
median: 358
std: 415.3
q1: 164.5
q3: 670
iqr: 505.5
skew: 1.494
kurtosis: 3.005
n_outliers: 82
outlier_rate: 0.03524
zero_rate: 0.04727

pct_rent_burdened

numeric feature
Likely the share of renter households that are rent-burdened, expressed as a percentage from 0 to 100. The distribution is roughly symmetric (skew -0.04) and centered near 50 (mean 49.87, median 50.0) with an IQR of 17.9, suggesting a wide spread across tracts. About 4.4% of rows (102) are null, and 62 values (2.8% of non-null rows) fall outside the Tukey fences, which sit at about 14.05 and 85.65 under the conventional 1.5 × IQR rule; the outliers include some values at the 0 and 100 extremes. Treatment: Use as-is for modelling; impute the ~4% missing and consider clipping the 0/100 extremes if they represent small denominators. high · anthropic:claude-opus-4-7
n: 2,327
nulls: 102 (4.4%)
unique: 596
min: 0
max: 100
mean: 49.87
median: 50
std: 14.62
q1: 40.9
q3: 58.8
iqr: 17.9
skew: -0.03839
kurtosis: 0.7849
n_outliers: 62
outlier_rate: 0.02787
zero_rate: 0.003596
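
Flagging 0% and 100% values that rest on small denominators, as the treatment note suggests, might be sketched like this. The threshold of 50 renter households is a hypothetical choice for illustration, not a value from the profile, and `is_unstable_extreme` is an illustrative helper name.

```python
def is_unstable_extreme(pct, total_renters, min_renters=50):
    """True when pct is exactly 0 or 100 and the denominator is below min_renters.

    min_renters=50 is an arbitrary illustrative threshold.
    """
    if pct is None:
        return False  # undefined rate: handled separately as a null
    return pct in (0.0, 100.0) and total_renters < min_renters

# Hypothetical (pct_rent_burdened, total_renter_households) pairs.
examples = [(100.0, 12), (100.0, 800), (50.0, 12), (None, 0)]
flags = [is_unstable_extreme(pct, total) for pct, total in examples]
```

Flagging rather than clipping keeps the original values available, so the choice of threshold can be revisited after inspecting the affected tracts.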