nyc housing nyc rent burden by tract
Reading
This dataset covers 2,327 NYC census tracts with 16 columns describing renter households and rent-burden levels across the five boroughs. All tracts are in New York State (state is constant at 36) and split across five counties: Brooklyn (Kings) has the largest share at 805 tracts (34.6%) and Staten Island the smallest at 126 tracts (5.4%). The headline housing-affordability metric, pct_rent_burdened, is roughly symmetric around a median of 50% with an interquartile range running from 40.9 to 58.8, so in a typical tract about half of renter households spend 30% or more of income on rent. The raw count columns (rent_burdened, rent_50_pct_or_more, total_renter_households) are right-skewed with notable outliers, so look at the burden percentages first for cross-tract comparison and reserve the count fields for identifying the highest-volume tracts.
citing: row_count · column_count · columns · kinds
Charts the summary said to look at first
county_name: tract count and share by borough
| value | count | share |
|---|---|---|
| Brooklyn (Kings) | 805 | 34.6% |
| Queens | 725 | 31.2% |
| Bronx | 361 | 15.5% |
| Manhattan (New York) | 310 | 13.3% |
| Staten Island (Richmond) | 126 | 5.4% |
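The shares in the table above can be reproduced from the counts alone. The following is a minimal sketch using only the numbers shown in the table; the variable names are illustrative and not part of the dataset.

```python
# Tract counts per borough, copied from the table above.
tract_counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

total_tracts = sum(tract_counts.values())

# Share of all tracts, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total_tracts, 1) for name, n in tract_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts sum to the 2,327 rows reported in the summary, which is a quick sanity check that no borough is missing from the table.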
pct_rent_burdened: histogram, 2.5-point bins
| bin | count |
|---|---|
| 0 – 2.5 | 8 |
| 2.5 – 5 | 3 |
| 5 – 7.5 | 1 |
| 7.5 – 10 | 5 |
| 10 – 12.5 | 7 |
| 12.5 – 15 | 8 |
| 15 – 17.5 | 12 |
| 17.5 – 20 | 14 |
| 20 – 22.5 | 14 |
| 22.5 – 25 | 24 |
| 25 – 27.5 | 35 |
| 27.5 – 30 | 42 |
| 30 – 32.5 | 53 |
| 32.5 – 35 | 80 |
| 35 – 37.5 | 91 |
| 37.5 – 40 | 119 |
| 40 – 42.5 | 129 |
| 42.5 – 45 | 144 |
| 45 – 47.5 | 146 |
| 47.5 – 50 | 177 |
| 50 – 52.5 | 139 |
| 52.5 – 55 | 178 |
| 55 – 57.5 | 162 |
| 57.5 – 60 | 131 |
| 60 – 62.5 | 117 |
| 62.5 – 65 | 97 |
| 65 – 67.5 | 60 |
| 67.5 – 70 | 57 |
| 70 – 72.5 | 54 |
| 72.5 – 75 | 28 |
| 75 – 77.5 | 20 |
| 77.5 – 80 | 25 |
| 80 – 82.5 | 8 |
| 82.5 – 85 | 4 |
| 85 – 87.5 | 12 |
| 87.5 – 90 | 5 |
| 90 – 92.5 | 5 |
| 92.5 – 95 | 3 |
| 95 – 97.5 | 0 |
| 97.5 – 100 | 8 |
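One way to check which column a binned table belongs to is to total its counts and locate the bin that contains the median. The sketch below does this for the table directly above, assuming the 40 bins are equal-width (2.5 points each) and listed in order; `median_bin` is a hypothetical helper written for this example.

```python
# Bin counts for the 40 bins of width 2.5 listed above (0-2.5 up to 97.5-100).
counts = [8, 3, 1, 5, 7, 8, 12, 14, 14, 24, 35, 42, 53, 80, 91, 119, 129, 144, 146, 177,
          139, 178, 162, 131, 117, 97, 60, 57, 54, 28, 20, 25, 8, 4, 12, 5, 5, 3, 0, 8]
BIN_WIDTH = 2.5


def median_bin(bin_counts, bin_width):
    """Return (low, high) edges of the first bin whose cumulative count reaches half the total."""
    half = sum(bin_counts) / 2
    running = 0
    for i, c in enumerate(bin_counts):
        running += c
        if running >= half:
            return (i * bin_width, (i + 1) * bin_width)
    raise ValueError("empty histogram")


total = sum(counts)
print(total)                          # number of non-null values in the histogram
print(median_bin(counts, BIN_WIDTH))  # edges of the bin containing the median
```

The total comes to 2,225, which equals the 2,327 rows minus the 102 nulls reported for the percentage columns, and the median bin starts at 50, in line with the median of 50 reported for pct_rent_burdened.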
pct_severe_burden: histogram, 2.5-point bins
| bin | count |
|---|---|
| 0 – 2.5 | 45 |
| 2.5 – 5 | 14 |
| 5 – 7.5 | 41 |
| 7.5 – 10 | 53 |
| 10 – 12.5 | 94 |
| 12.5 – 15 | 115 |
| 15 – 17.5 | 131 |
| 17.5 – 20 | 160 |
| 20 – 22.5 | 170 |
| 22.5 – 25 | 188 |
| 25 – 27.5 | 188 |
| 27.5 – 30 | 168 |
| 30 – 32.5 | 173 |
| 32.5 – 35 | 157 |
| 35 – 37.5 | 115 |
| 37.5 – 40 | 97 |
| 40 – 42.5 | 73 |
| 42.5 – 45 | 62 |
| 45 – 47.5 | 44 |
| 47.5 – 50 | 35 |
| 50 – 52.5 | 29 |
| 52.5 – 55 | 19 |
| 55 – 57.5 | 18 |
| 57.5 – 60 | 12 |
| 60 – 62.5 | 6 |
| 62.5 – 65 | 4 |
| 65 – 67.5 | 4 |
| 67.5 – 70 | 2 |
| 70 – 72.5 | 1 |
| 72.5 – 75 | 1 |
| 75 – 77.5 | 1 |
| 77.5 – 80 | 1 |
| 80 – 82.5 | 0 |
| 82.5 – 85 | 1 |
| 85 – 87.5 | 1 |
| 87.5 – 90 | 0 |
| 90 – 92.5 | 1 |
| 92.5 – 95 | 0 |
| 95 – 97.5 | 0 |
| 97.5 – 100 | 1 |
total_renter_households: histogram, 40 equal-width bins
| bin | count |
|---|---|
| 0 – 205.2 | 349 |
| 205.2 – 410.4 | 358 |
| 410.4 – 615.7 | 292 |
| 615.7 – 820.9 | 268 |
| 820.9 – 1026 | 207 |
| 1026 – 1231 | 175 |
| 1231 – 1437 | 168 |
| 1437 – 1642 | 110 |
| 1642 – 1847 | 100 |
| 1847 – 2052 | 68 |
| 2052 – 2257 | 63 |
| 2257 – 2463 | 42 |
| 2463 – 2668 | 36 |
| 2668 – 2873 | 22 |
| 2873 – 3078 | 19 |
| 3078 – 3284 | 17 |
| 3284 – 3489 | 6 |
| 3489 – 3694 | 5 |
| 3694 – 3899 | 4 |
| 3899 – 4104 | 6 |
| 4104 – 4310 | 5 |
| 4310 – 4515 | 3 |
| 4515 – 4720 | 1 |
| 4720 – 4925 | 0 |
| 4925 – 5131 | 1 |
| 5131 – 5336 | 1 |
| 5336 – 5541 | 0 |
| 5541 – 5746 | 0 |
| 5746 – 5952 | 0 |
| 5952 – 6157 | 0 |
| 6157 – 6362 | 0 |
| 6362 – 6567 | 0 |
| 6567 – 6772 | 0 |
| 6772 – 6978 | 0 |
| 6978 – 7183 | 0 |
| 7183 – 7388 | 0 |
| 7388 – 7593 | 0 |
| 7593 – 7799 | 0 |
| 7799 – 8004 | 0 |
| 8004 – 8209 | 1 |
rent_burdened: histogram, 40 equal-width bins
| bin | count |
|---|---|
| 0 – 78.83 | 310 |
| 78.83 – 157.7 | 256 |
| 157.7 – 236.5 | 264 |
| 236.5 – 315.3 | 231 |
| 315.3 – 394.1 | 190 |
| 394.1 – 473 | 180 |
| 473 – 551.8 | 147 |
| 551.8 – 630.6 | 113 |
| 630.6 – 709.4 | 108 |
| 709.4 – 788.2 | 75 |
| 788.2 – 867.1 | 91 |
| 867.1 – 945.9 | 73 |
| 945.9 – 1025 | 57 |
| 1025 – 1104 | 39 |
| 1104 – 1182 | 41 |
| 1182 – 1261 | 23 |
| 1261 – 1340 | 26 |
| 1340 – 1419 | 20 |
| 1419 – 1498 | 19 |
| 1498 – 1576 | 11 |
| 1576 – 1655 | 16 |
| 1655 – 1734 | 6 |
| 1734 – 1813 | 5 |
| 1813 – 1892 | 6 |
| 1892 – 1971 | 4 |
| 1971 – 2049 | 2 |
| 2049 – 2128 | 3 |
| 2128 – 2207 | 5 |
| 2207 – 2286 | 0 |
| 2286 – 2365 | 0 |
| 2365 – 2444 | 1 |
| 2444 – 2522 | 1 |
| 2522 – 2601 | 2 |
| 2601 – 2680 | 1 |
| 2680 – 2759 | 0 |
| 2759 – 2838 | 0 |
| 2838 – 2917 | 0 |
| 2917 – 2995 | 0 |
| 2995 – 3074 | 0 |
| 3074 – 3153 | 1 |
Schema
16 columns
| column | kind | nulls | unique | alerts |
|---|---|---|---|---|
| total_renter_households | numeric | 0.0% | 1,418 | |
| rent_30_to_34_9_pct | numeric | 0.0% | 355 | high_skew, outliers |
| rent_35_to_39_9_pct | numeric | 0.0% | 270 | high_skew |
| rent_40_to_49_9_pct | numeric | 0.0% | 322 | high_skew |
| rent_50_pct_or_more | numeric | 0.0% | 706 | |
| NAME | text | 0.0% | 2,327 | near_unique |
| state | numeric | 0.0% | 1 | constant |
| county | numeric | 0.0% | 5 | |
| tract | numeric | 0.0% | 1,530 | high_skew |
| county_name | categorical | 0.0% | 5 | |
| moderate_burden | numeric | 0.0% | 639 | |
| severe_burden | numeric | 0.0% | 706 | |
| pct_moderate_burden | numeric | 4.4% | 461 | |
| pct_severe_burden | numeric | 4.4% | 518 | |
| rent_burdened | numeric | 0.0% | 1,013 | |
| pct_rent_burdened | numeric | 4.4% | 596 | |
total_renter_households
numeric feature
This column counts renter households per census tract, ranging from 0 to 8209 with a median of 726 and a mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 high-end outliers (2.97%), and 4.38% of rows are zero. There are no nulls, and the 1418 unique values across 2327 rows mean that some counts repeat. Treatment: Log-transform (with zero handling) before regression because of the right skew.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,418
- min: 0
- max: 8,209
- mean: 946.1
- median: 726
- std: 815.4
- q1: 346
- q3: 1,357
- iqr: 1,011
- skew: 1.595
- kurtosis: 4.627
- n_outliers: 69
- outlier_rate: 0.02965
- zero_rate: 0.04383
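The treatment note calls for a log transform "with zero handling" without naming a method. One common choice, shown here purely as an assumption, is `log1p` (the natural log of 1 + x), which maps zero to zero and is undone by `expm1`. The sample values are the min, quartiles, median, and max from the stats above.

```python
import math

# Five-number summary of total_renter_households from the stats above.
households = [0, 346, 726, 1357, 8209]

# log1p(x) = ln(1 + x): defined at zero, unlike a plain log.
logged = [math.log1p(x) for x in households]

# expm1 is the exact inverse, so the original counts can be recovered.
recovered = [round(math.expm1(v)) for v in logged]

for raw, value in zip(households, logged):
    print(f"{raw:>5} -> {value:.3f}")
```

Because `log1p` is monotonic, tract rankings are preserved; only the spacing of the long upper tail is compressed.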
rent_30_to_34_9_pct
numeric feature high_skew outliers
Despite the `_pct` suffix, this appears to be a count of renter households paying 30% to 34.9% of income on rent in each census tract. The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86) with median 51 but mean 83 and max 1205, and 16.2% of rows are zero. About 5.3% of values (124 rows) are flagged as outliers, suggesting a few large tracts dominate the tail. Treatment: Log1p-transform or convert to a share of total renter households before modelling.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 355
- min: 0
- max: 1,205
- mean: 83.05
- median: 51
- std: 100.3
- q1: 15
- q3: 116
- iqr: 101
- skew: 2.755
- kurtosis: 13.86
- n_outliers: 124
- outlier_rate: 0.05329
- zero_rate: 0.1616
rent_35_to_39_9_pct
numeric feature high_skew
This column appears to be a count of renter households paying 35% to 39.9% of income on rent in each census tract. The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with median 35 but max 633, and nearly 20% of rows are zero (zero_rate 0.196), pointing to many small or sparsely populated tracts alongside a long tail of larger ones. The interquartile range runs from 10 to 83, so under the usual 1.5 × IQR rule the upper fence is 83 + 1.5 × 73 = 192.5; the 110 outliers (4.7%) sit above it. Treatment: Log1p-transform before modelling to tame the skew and zero inflation.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 270
- min: 0
- max: 633
- mean: 58.35
- median: 35
- std: 69.85
- q1: 10
- q3: 83
- iqr: 73
- skew: 2.395
- kurtosis: 9.275
- n_outliers: 110
- outlier_rate: 0.04727
- zero_rate: 0.1964
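Outlier counts in this report appear to rest on Tukey fences (the pct_rent_burdened card names them explicitly). The sketch below computes the fences from the quartiles, assuming the conventional multiplier k = 1.5; the report does not state which multiplier it used, and `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)


# Quartiles of rent_35_to_39_9_pct from the stats above.
lower, upper = tukey_fences(10, 83)
print(lower, upper)  # -99.5 192.5
```

Since the column is a non-negative count, the negative lower fence can never be crossed, so every flagged outlier lies on the high side.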
rent_40_to_49_9_pct
numeric feature high_skew
Likely a count of renter households paying 40% to 49.9% of income on rent in each census tract. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14) with a median of 49 but a max of 740, and 15.6% of rows are zero, suggesting many small tracts with no such households alongside a long tail of large ones. About 4.8% of values are flagged as outliers. Treatment: Log1p-transform before regression to tame the right skew.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 322
- min: 0
- max: 740
- mean: 74.68
- median: 49
- std: 83.79
- q1: 14
- q3: 106
- iqr: 92
- skew: 2.137
- kurtosis: 7.139
- n_outliers: 111
- outlier_rate: 0.0477
- zero_rate: 0.1556
rent_50_pct_or_more
numeric feature
This column likely counts renter households spending 50% or more of income on rent in each census tract. Values span 0 to 1918 with a median of 184 and a mean of 253.2, and the distribution is right-skewed (skew 1.60, kurtosis 3.44) with 87 high-end outliers (3.7%). About 6.3% of rows are zero and there are no nulls across 2327 rows. Treatment: Log1p-transform before regression to tame the right skew, and consider normalizing by total renter households for cross-tract comparability.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 706
- min: 0
- max: 1,918
- mean: 253.2
- median: 184
- std: 236.6
- q1: 82
- q3: 360
- iqr: 278
- skew: 1.603
- kurtosis: 3.435
- n_outliers: 87
- outlier_rate: 0.03739
- zero_rate: 0.06274
NAME
text identifier near_unique
This column holds fully qualified Census tract names for New York City; all 2327 rows are unique and non-null. Lengths fall between 38 and 46 characters, and every record contains the boilerplate tokens 'new', 'york', 'census', 'tract', and 'county;' plus a county name (Kings 805, Queens 725, Bronx 361, Richmond 126). The remaining 310 tracts are Manhattan's, whose county name (New York) presumably coincides with the state tokens already counted as boilerplate. The column functions as a row identifier rather than a feature; apart from the tract number, the embedded county token is the only varying signal worth extracting. Treatment: Treat as a unique key; if needed, parse out the county token as a categorical feature and drop the rest.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 2,327
- len_min: 38
- len_max: 46
- len_mean: 41.65
- len_median: 41
- len_p95: 46
- word_mean: 7.133
- word_median: 7
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,539
- readability_flesch_mean: 91.45
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
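The treatment for NAME suggests parsing out the county token. The report does not show a sample value, so the sketch below assumes a semicolon-separated layout of the form `Census Tract <number>; <County> County; New York`. That layout is a guess, though it is consistent with the 'county;' token and with the 38-character minimum length. `parse_tract_name` is a hypothetical helper and requires Python 3.9 or later.

```python
def parse_tract_name(name: str) -> dict:
    """Split an assumed 'Census Tract N; X County; State' string into its parts."""
    tract_part, county_part, state_part = [p.strip() for p in name.split(";")]
    return {
        "tract_label": tract_part.removeprefix("Census Tract "),
        "county": county_part.removesuffix(" County"),
        "state": state_part,
    }


# Illustrative strings in the assumed format, not rows taken from the dataset.
example = "Census Tract 1; Kings County; New York"
parsed = parse_tract_name(example)
print(len(example), parsed)

manhattan = parse_tract_name("Census Tract 2.01; New York County; New York")
print(manhattan["county"])
```

Splitting on the separator rather than matching tokens keeps "New York" the county distinct from "New York" the state, which token-based counting cannot do.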
state
numeric metadata constant
The column 'state' is a numeric field that holds the single value 36, the FIPS code for New York State, across all 2327 rows with no nulls. It carries a 'constant' alert and contributes zero variance (std 0.0, n_unique 1), so it is a leftover geographic filter key rather than a usable feature. Treatment: Drop before modelling; a constant column adds no signal.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1
- min: 36
- max: 36
- mean: 36
- median: 36
- std: 0
- q1: 36
- q3: 36
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
county
numeric identifier
Despite being typed numeric, `county` takes only 5 distinct values across 2327 rows (min 5, max 85). These are county FIPS codes (005 Bronx, 047 Kings, 061 New York, 081 Queens, 085 Richmond), not measurements. The quartiles land exactly on observed codes (Q1=47, Q3=81), confirming the small categorical support; the reported skew (-0.72) and the gap between median 47 and mean 55 are artifacts of the code values and carry no substantive meaning. No nulls or outliers are reported. Treatment: Cast to categorical and one-hot encode (or treat as a lookup key) rather than using as a continuous feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- min: 5
- max: 85
- mean: 55
- median: 47
- std: 25.97
- q1: 47
- q3: 81
- iqr: 34
- skew: -0.72
- kurtosis: -0.4531
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
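To illustrate why the summary statistics of `county` describe codes rather than a quantity, the sketch below rebuilds the column from the borough counts in the county_name table and the standard FIPS codes of the five NYC counties. Pairing each count with its code assumes the borough labels map to counties as the table states (Brooklyn = Kings, and so on).

```python
from statistics import median

# Standard county FIPS codes for the five NYC counties.
COUNTY_NAMES = {5: "Bronx", 47: "Kings", 61: "New York", 81: "Queens", 85: "Richmond"}

# Tract counts per county, taken from the county_name table.
TRACTS_PER_COUNTY = {5: 361, 47: 805, 61: 310, 81: 725, 85: 126}

# Expand to one code per tract, as the column would appear in the data.
codes = [code for code, n in TRACTS_PER_COUNTY.items() for _ in range(n)]

mean_code = sum(codes) / len(codes)
median_code = median(codes)
print(mean_code, median_code)

# Treating the codes as labels instead: map each to its county name.
labels = [COUNTY_NAMES[c] for c in codes]
print(sorted(set(labels)))
```

The reconstructed mean and median match the reported 55 and 47, which shows those figures follow mechanically from how many tracts carry each code.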
tract
numeric identifier high_skew
Census tract codes stored as integers, with 1530 unique values across 2327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82) with a max of 990100 against a median of 30100, which is characteristic of tract identifiers rather than a measurable quantity. The 63 flagged outliers and the heavy tail are artifacts of the coding scheme, not anomalies to clean. Because only 1530 of the 2327 values are distinct, tract codes repeat across counties, so `tract` must be combined with `county` to form a unique key. Treatment: Treat as a categorical geographic key; do not use as a numeric feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,530
- min: 100
- max: 990,100
- mean: 4.225e+04
- median: 30,100
- std: 4.827e+04
- q1: 15,200
- q3: 5.79e+04
- iqr: 4.27e+04
- skew: 10.14
- kurtosis: 189.8
- n_outliers: 63
- outlier_rate: 0.02707
- zero_rate: 0
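Since tract codes are only unique within a county, a full key needs the state and county codes as well. The sketch below builds the standard 11-digit Census GEOID (2-digit state, 3-digit county, 6-digit tract), assuming the integer columns simply dropped their leading zeros. `make_geoid` is a hypothetical helper, and the example code combinations are illustrative rather than rows confirmed to exist in the data.

```python
def make_geoid(state: int, county: int, tract: int) -> str:
    """Zero-pad and concatenate state (2), county (3) and tract (6) codes."""
    return f"{state:02d}{county:03d}{tract:06d}"


# Illustrative combinations using the min and max tract codes from the stats above.
print(make_geoid(36, 47, 100))
print(make_geoid(36, 85, 990100))
```

Keeping the result as a string preserves the zero padding and discourages accidental numeric use of the identifier.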
county_name
categorical feature
This column is the NYC borough/county name, with exactly 5 unique values matching the city's five boroughs and no nulls across 2327 rows. Brooklyn (Kings) leads at 34.6% (805), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126); an entropy ratio of 0.898 indicates a fairly even spread despite Staten Island being noticeably underrepresented. Treatment: One-hot encode as a low-cardinality categorical feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- top_value: Brooklyn (Kings)
- top_rate: 0.3459
- cardinality: 5
- entropy: 2.086
- entropy_ratio: 0.8985
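The entropy figures above can be reproduced from the borough counts. This sketch assumes Shannon entropy in bits (base-2 logarithm) and an entropy ratio defined as entropy divided by its maximum, log2 of the cardinality. The report does not spell out either definition, but the numbers agree with this reading.

```python
import math

# Tract counts per borough from the county_name table.
counts = [805, 725, 361, 310, 126]
total = sum(counts)

# Shannon entropy in bits: -sum(p * log2(p)) over the category shares.
entropy = -sum((c / total) * math.log2(c / total) for c in counts)

# Ratio to the maximum possible entropy for 5 equally likely categories.
entropy_ratio = entropy / math.log2(len(counts))

print(round(entropy, 3), round(entropy_ratio, 4))
```

A ratio of 1.0 would mean all five boroughs hold the same number of tracts, so a value near 0.9 reflects the moderate imbalance driven mainly by Staten Island.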
moderate_burden
numeric feature
A non-negative integer count column named 'moderate_burden', with 2327 rows, no nulls, and 639 distinct values ranging from 0 to 1732 (median 159, mean 216). The reported means are consistent with this column being the sum of the three 30% to 49.9% bracket counts (83.05 + 58.35 + 74.68 ≈ 216.1), though that should be confirmed row by row. The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% exact zeros, suggesting a long tail of large tracts above a typical count in the low hundreds. Treatment: Apply a log1p transform before regression to tame the right skew.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 639
- min: 0
- max: 1,732
- mean: 216.1
- median: 159
- std: 210.4
- q1: 64
- q3: 311
- iqr: 247
- skew: 1.934
- kurtosis: 6.052
- n_outliers: 86
- outlier_rate: 0.03696
- zero_rate: 0.06403
severe_burden
numeric feature
Numeric count-like column 'severe_burden' spanning 0 to 1918 across 2327 rows with no nulls and 706 distinct values. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with median 184 well below mean 253.18, an IQR of 278, and 87 outliers (3.7%); 6.3% of rows are exactly zero. Every summary statistic matches rent_50_pct_or_more, so this is most likely a copy of that column and only one of the two should enter a model. Treatment: Apply a log1p transform before regression to tame the right skew and zero inflation.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 706
- min: 0
- max: 1,918
- mean: 253.2
- median: 184
- std: 236.6
- q1: 82
- q3: 360
- iqr: 278
- skew: 1.603
- kurtosis: 3.435
- n_outliers: 87
- outlier_rate: 0.03739
- zero_rate: 0.06274
pct_moderate_burden
numeric feature
Percentage of renter households with a moderate rent burden, expressed on a 0-100 scale (min 0.0, max 100.0, mean 22.74, median 21.8). The distribution is right-skewed (skew 1.51, kurtosis 6.70) with a tight IQR of 12.3 around the median but a long upper tail producing 59 outliers (2.65%). About 4.38% of rows (102) are null, which matches the number of tracts with zero renter households, so the nulls most likely mark an undefined ratio rather than missing data; another 2.11% are exact zeros. Treatment: Confirm whether the nulls are zero-denominator tracts before imputing them (excluding them may be more appropriate), and consider a mild transform or winsorisation given the right skew before regression.
- n: 2,327
- nulls: 102 (4.4%)
- unique: 461
- min: 0
- max: 100
- mean: 22.74
- median: 21.8
- std: 11.36
- q1: 15.9
- q3: 28.2
- iqr: 12.3
- skew: 1.509
- kurtosis: 6.704
- n_outliers: 59
- outlier_rate: 0.02652
- zero_rate: 0.02112
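The 102 nulls in the percentage columns line up with the zero rate of total_renter_households, which suggests they arise from dividing by zero. The sketch below shows one way such a percentage could be computed so that a zero denominator yields a missing value. The `pct` helper and its one-decimal rounding are assumptions, not the dataset's documented derivation.

```python
from typing import Optional


def pct(numerator: float, denominator: float) -> Optional[float]:
    """Percentage rounded to one decimal; None when the denominator is zero."""
    if denominator == 0:
        return None
    return round(100 * numerator / denominator, 1)


# Rows with zero renter households implied by the reported zero_rate.
N_ROWS = 2327
ZERO_RATE_TOTAL_RENTERS = 0.04383
expected_zero_tracts = round(N_ROWS * ZERO_RATE_TOTAL_RENTERS)

print(expected_zero_tracts)  # compare with the 102 nulls reported above
print(pct(0, 0))             # undefined ratio, returned as None
print(pct(159, 726))         # illustrative inputs: the two column medians
```

If the nulls really are undefined ratios, imputing them would invent a burden rate for tracts that have no renters at all.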
pct_severe_burden
numeric feature
A percentage feature (0–100 range) capturing the share of renter households under severe rent burden, averaging 27.1% with a median of 26.2 and an IQR of 15.9. The distribution is mildly right-skewed (0.57) with 30 outliers (1.35%) reaching up to 100, and 4.38% of rows (102) are null, the same count as tracts with zero renter households. With 518 unique values across 2327 rows and a 1.98% zero rate, it behaves as a continuous rate rather than a categorical bucket. Treatment: Impute or exclude the ~4% missing values and use as-is in modelling; mild skew rarely needs transformation.
- n: 2,327
- nulls: 102 (4.4%)
- unique: 518
- min: 0
- max: 100
- mean: 27.12
- median: 26.2
- std: 12.68
- q1: 18.7
- q3: 34.6
- iqr: 15.9
- skew: 0.5663
- kurtosis: 1.222
- n_outliers: 30
- outlier_rate: 0.01348
- zero_rate: 0.01978
rent_burdened
numeric feature
A per-tract count of rent-burdened renter households, ranging from 0 to 3153 with a median of 358 and a mean of 469.26. The mean is consistent with the column being the sum of moderate_burden and severe_burden (216.1 + 253.2 = 469.3), though that should be confirmed row by row. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 high outliers (3.5%) and 4.7% zeros. With 1013 unique values across 2327 rows and no nulls, it behaves like a continuous feature rather than a category. Treatment: Log1p-transform before regression to tame the right skew and outliers.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,013
- min: 0
- max: 3,153
- mean: 469.3
- median: 358
- std: 415.3
- q1: 164.5
- q3: 670
- iqr: 505.5
- skew: 1.494
- kurtosis: 3.005
- n_outliers: 82
- outlier_rate: 0.03524
- zero_rate: 0.04727
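The reported means hint at how the count columns relate to one another. The sketch below checks two candidate identities using only the means printed in this report: that moderate_burden is the sum of the three 30% to 49.9% brackets, and that rent_burdened is moderate_burden plus severe_burden. Agreement of means is necessary but not sufficient; a row-level check against the data itself would be needed to confirm either identity. The tolerance is an assumption chosen to absorb rounding in the displayed figures.

```python
# Column means as printed in the stat cards of this report.
bracket_means = {
    "rent_30_to_34_9_pct": 83.05,
    "rent_35_to_39_9_pct": 58.35,
    "rent_40_to_49_9_pct": 74.68,
}
moderate_mean = 216.1
severe_mean = 253.2
rent_burdened_mean = 469.3

TOLERANCE = 0.1  # allowance for rounding in the displayed means

bracket_sum = sum(bracket_means.values())
moderate_matches = abs(bracket_sum - moderate_mean) < TOLERANCE
total_matches = abs(moderate_mean + severe_mean - rent_burdened_mean) < TOLERANCE

print(round(bracket_sum, 2), moderate_matches)
print(round(moderate_mean + severe_mean, 1), total_matches)
```

If the identities hold row by row, the derived columns are linear combinations of the brackets and should not all be used together as regression features.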
pct_rent_burdened
numeric feature
Likely the share of renter households that are rent-burdened, expressed as a percentage from 0 to 100. The distribution is roughly symmetric (skew -0.04) and centered near 50 (mean 49.87, median 50.0) with an IQR of 17.9, suggesting a wide spread across tracts. About 4.4% of rows (102) are null, the same count as tracts with zero renter households, and 62 values (2.8%) fall outside the Tukey fences, including some at the 0 and 100 extremes. Treatment: Use as-is for modelling; impute or exclude the ~4% missing and consider clipping the 0/100 extremes if they represent small denominators.
- n: 2,327
- nulls: 102 (4.4%)
- unique: 596
- min: 0
- max: 100
- mean: 49.87
- median: 50
- std: 14.62
- q1: 40.9
- q3: 58.8
- iqr: 17.9
- skew: -0.03839
- kurtosis: 0.7849
- n_outliers: 62
- outlier_rate: 0.02787
- zero_rate: 0.003596
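The treatment note raises the concern that values of exactly 0 or 100 may come from tracts with very few renter households. A minimal sketch of how such rows could be flagged follows. The threshold of 50 households is hypothetical, as are the `is_unstable` helper and the sample rows, which are made up for illustration and not drawn from the dataset.

```python
from typing import Optional

MIN_HOUSEHOLDS = 50  # hypothetical cutoff for a "small denominator"


def is_unstable(pct: Optional[float], households: int,
                min_households: int = MIN_HOUSEHOLDS) -> bool:
    """True when the percentage is missing, or is exactly 0 or 100 on a small base."""
    if pct is None:
        return True
    return pct in (0.0, 100.0) and households < min_households


# Made-up (pct_rent_burdened, total_renter_households) pairs for illustration.
sample_rows = [(100.0, 12), (0.0, 8), (100.0, 900), (50.0, 726), (None, 0)]
flags = [is_unstable(p, h) for p, h in sample_rows]
print(flags)  # [True, True, False, False, True]
```

Flagging rather than clipping keeps the original values available, so the decision to drop, clip, or down-weight those tracts can be made later.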