nyc housing metrics merged
Reading
This dataset covers 2,327 NYC census tracts with 23 columns describing housing tenure, rent burden, income, and rent levels across the five boroughs. The most urgent issue is data hygiene: median_gross_rent and median_household_income both contain a sentinel value of -666666666, which drags their means to roughly -41.5M and -36M respectively despite sensible medians (~$1,735 rent, ~$76,833 income). Replace the sentinel with null before any analysis. Beyond that, the substantive story is rent burden: pct_rent_burdened has a median of 50% with an interquartile range of 40.9 to 58.8, meaning that in half of the tracts with data, at least half of renter households pay 30% or more of income on rent. Brooklyn (Kings) has the most tracts at 35%, followed by Queens (31%) and the Bronx (15%), so borough-level comparisons should account for the unequal tract counts. The state column is constant (all 36, the FIPS code for New York) and can be dropped.
citing: median_gross_rent · median_household_income · pct_rent_burdened · pct_severe_burden · pct_owner_occupied · county_name · state · total_households
Charts the summary said to look at first
county_name: tract count and share by borough
| value | count | share |
|---|---|---|
| Brooklyn (Kings) | 805 | 34.6% |
| Queens | 725 | 31.2% |
| Bronx | 361 | 15.5% |
| Manhattan (New York) | 310 | 13.3% |
| Staten Island (Richmond) | 126 | 5.4% |
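
One way to account for unequal tract counts and tract sizes in a borough comparison is a household-weighted mean. The sketch below assumes that weighting by total_renter_households is what is wanted; `weighted_mean_by_group` is a hypothetical helper, and the rows are made-up placeholders, not values from the dataset.

```python
from collections import defaultdict

def weighted_mean_by_group(rows, group_key, value_key, weight_key):
    """Return {group: weighted mean}, skipping rows with a missing value or zero weight."""
    num = defaultdict(float)
    den = defaultdict(float)
    for row in rows:
        value, weight = row[value_key], row[weight_key]
        if value is None or not weight:
            continue  # undefined percentage or empty tract
        num[row[group_key]] += value * weight
        den[row[group_key]] += weight
    return {group: num[group] / den[group] for group in den}

# Placeholder tracts (illustrative numbers only).
tracts = [
    {"county_name": "Queens", "pct_rent_burdened": 40.0, "total_renter_households": 100},
    {"county_name": "Queens", "pct_rent_burdened": 60.0, "total_renter_households": 300},
    {"county_name": "Bronx", "pct_rent_burdened": None, "total_renter_households": 0},
    {"county_name": "Bronx", "pct_rent_burdened": 50.0, "total_renter_households": 200},
]
result = weighted_mean_by_group(
    tracts, "county_name", "pct_rent_burdened", "total_renter_households"
)
```

An unweighted mean would give Queens 50.0 here; the weighted version pulls it toward the larger tract.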
pct_rent_burdened: histogram, 2.5-point bins (column inferred from the summary statistics in the Schema section)
| bin | count |
|---|---|
| 0 – 2.5 | 8 |
| 2.5 – 5 | 3 |
| 5 – 7.5 | 1 |
| 7.5 – 10 | 5 |
| 10 – 12.5 | 7 |
| 12.5 – 15 | 8 |
| 15 – 17.5 | 12 |
| 17.5 – 20 | 14 |
| 20 – 22.5 | 14 |
| 22.5 – 25 | 24 |
| 25 – 27.5 | 35 |
| 27.5 – 30 | 42 |
| 30 – 32.5 | 53 |
| 32.5 – 35 | 80 |
| 35 – 37.5 | 91 |
| 37.5 – 40 | 119 |
| 40 – 42.5 | 129 |
| 42.5 – 45 | 144 |
| 45 – 47.5 | 146 |
| 47.5 – 50 | 177 |
| 50 – 52.5 | 139 |
| 52.5 – 55 | 178 |
| 55 – 57.5 | 162 |
| 57.5 – 60 | 131 |
| 60 – 62.5 | 117 |
| 62.5 – 65 | 97 |
| 65 – 67.5 | 60 |
| 67.5 – 70 | 57 |
| 70 – 72.5 | 54 |
| 72.5 – 75 | 28 |
| 75 – 77.5 | 20 |
| 77.5 – 80 | 25 |
| 80 – 82.5 | 8 |
| 82.5 – 85 | 4 |
| 85 – 87.5 | 12 |
| 87.5 – 90 | 5 |
| 90 – 92.5 | 5 |
| 92.5 – 95 | 3 |
| 95 – 97.5 | 0 |
| 97.5 – 100 | 8 |
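
Equal-width bin counts like those in the table above can be reproduced with a few lines of code. This is a minimal sketch: the left-closed bin convention (with the top edge folded into the last bin) is an assumption, since the report does not say how edge values are assigned, and `histogram` is a hypothetical helper.

```python
def histogram(values, lo, hi, n_bins):
    """Count values into n_bins equal-width bins over [lo, hi], ignoring None."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v is None or v < lo or v > hi:
            continue
        # Bins are [edge, next_edge); the maximum value goes into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# 40 bins over 0-100 gives the 2.5-point bins used for the percentage columns.
counts = histogram([0, 2.4, 2.5, 50.0, 100.0, None], 0, 100, 40)
```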
pct_severe_burden: histogram, 2.5-point bins (column inferred from the summary statistics in the Schema section)
| bin | count |
|---|---|
| 0 – 2.5 | 45 |
| 2.5 – 5 | 14 |
| 5 – 7.5 | 41 |
| 7.5 – 10 | 53 |
| 10 – 12.5 | 94 |
| 12.5 – 15 | 115 |
| 15 – 17.5 | 131 |
| 17.5 – 20 | 160 |
| 20 – 22.5 | 170 |
| 22.5 – 25 | 188 |
| 25 – 27.5 | 188 |
| 27.5 – 30 | 168 |
| 30 – 32.5 | 173 |
| 32.5 – 35 | 157 |
| 35 – 37.5 | 115 |
| 37.5 – 40 | 97 |
| 40 – 42.5 | 73 |
| 42.5 – 45 | 62 |
| 45 – 47.5 | 44 |
| 47.5 – 50 | 35 |
| 50 – 52.5 | 29 |
| 52.5 – 55 | 19 |
| 55 – 57.5 | 18 |
| 57.5 – 60 | 12 |
| 60 – 62.5 | 6 |
| 62.5 – 65 | 4 |
| 65 – 67.5 | 4 |
| 67.5 – 70 | 2 |
| 70 – 72.5 | 1 |
| 72.5 – 75 | 1 |
| 75 – 77.5 | 1 |
| 77.5 – 80 | 1 |
| 80 – 82.5 | 0 |
| 82.5 – 85 | 1 |
| 85 – 87.5 | 1 |
| 87.5 – 90 | 0 |
| 90 – 92.5 | 1 |
| 92.5 – 95 | 0 |
| 95 – 97.5 | 0 |
| 97.5 – 100 | 1 |
pct_owner_occupied: histogram, 2.5-point bins (column inferred from the summary statistics in the Schema section)
| bin | count |
|---|---|
| 0 – 2.5 | 141 |
| 2.5 – 5 | 86 |
| 5 – 7.5 | 71 |
| 7.5 – 10 | 63 |
| 10 – 12.5 | 86 |
| 12.5 – 15 | 65 |
| 15 – 17.5 | 72 |
| 17.5 – 20 | 88 |
| 20 – 22.5 | 98 |
| 22.5 – 25 | 88 |
| 25 – 27.5 | 79 |
| 27.5 – 30 | 67 |
| 30 – 32.5 | 70 |
| 32.5 – 35 | 56 |
| 35 – 37.5 | 76 |
| 37.5 – 40 | 68 |
| 40 – 42.5 | 59 |
| 42.5 – 45 | 72 |
| 45 – 47.5 | 58 |
| 47.5 – 50 | 54 |
| 50 – 52.5 | 70 |
| 52.5 – 55 | 59 |
| 55 – 57.5 | 55 |
| 57.5 – 60 | 39 |
| 60 – 62.5 | 44 |
| 62.5 – 65 | 40 |
| 65 – 67.5 | 52 |
| 67.5 – 70 | 34 |
| 70 – 72.5 | 40 |
| 72.5 – 75 | 47 |
| 75 – 77.5 | 32 |
| 77.5 – 80 | 48 |
| 80 – 82.5 | 31 |
| 82.5 – 85 | 27 |
| 85 – 87.5 | 26 |
| 87.5 – 90 | 24 |
| 90 – 92.5 | 23 |
| 92.5 – 95 | 8 |
| 95 – 97.5 | 6 |
| 97.5 – 100 | 9 |
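total_households: histogram, 40 equal-width bins from 0 to 8,209 (column inferred from the summary statistics in the Schema section)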
| bin | count |
|---|---|
| 0 – 205.2 | 123 |
| 205.2 – 410.4 | 41 |
| 410.4 – 615.7 | 203 |
| 615.7 – 820.9 | 272 |
| 820.9 – 1026 | 269 |
| 1026 – 1231 | 237 |
| 1231 – 1437 | 215 |
| 1437 – 1642 | 221 |
| 1642 – 1847 | 162 |
| 1847 – 2052 | 134 |
| 2052 – 2257 | 94 |
| 2257 – 2463 | 101 |
| 2463 – 2668 | 66 |
| 2668 – 2873 | 39 |
| 2873 – 3078 | 35 |
| 3078 – 3284 | 24 |
| 3284 – 3489 | 22 |
| 3489 – 3694 | 8 |
| 3694 – 3899 | 7 |
| 3899 – 4104 | 9 |
| 4104 – 4310 | 13 |
| 4310 – 4515 | 9 |
| 4515 – 4720 | 5 |
| 4720 – 4925 | 5 |
| 4925 – 5131 | 3 |
| 5131 – 5336 | 2 |
| 5336 – 5541 | 2 |
| 5541 – 5746 | 0 |
| 5746 – 5952 | 1 |
| 5952 – 6157 | 1 |
| 6157 – 6362 | 0 |
| 6362 – 6567 | 0 |
| 6567 – 6772 | 1 |
| 6772 – 6978 | 2 |
| 6978 – 7183 | 0 |
| 7183 – 7388 | 0 |
| 7388 – 7593 | 0 |
| 7593 – 7799 | 0 |
| 7799 – 8004 | 0 |
| 8004 – 8209 | 1 |
Schema
23 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| total_renter_households | numeric | 0.0% | 1,418 | |
| rent_30_to_34_9_pct | numeric | 0.0% | 355 | high_skew, outliers |
| rent_35_to_39_9_pct | numeric | 0.0% | 270 | high_skew |
| rent_40_to_49_9_pct | numeric | 0.0% | 322 | high_skew |
| rent_50_pct_or_more | numeric | 0.0% | 706 | |
| NAME | text | 0.0% | 2,327 | near_unique |
| state | numeric | 0.0% | 1 | constant |
| county | numeric | 0.0% | 5 | |
| tract | numeric | 0.0% | 1,530 | high_skew |
| county_name | categorical | 0.0% | 5 | |
| moderate_burden | numeric | 0.0% | 639 | |
| severe_burden | numeric | 0.0% | 706 | |
| pct_moderate_burden | numeric | 4.4% | 461 | |
| pct_severe_burden | numeric | 4.4% | 518 | |
| rent_burdened | numeric | 0.0% | 1,013 | |
| pct_rent_burdened | numeric | 4.4% | 596 | |
| median_gross_rent | numeric | 0.0% | 1,232 | high_skew, outliers |
| median_household_income | numeric | 0.0% | 2,106 | high_skew, outliers |
| total_households | numeric | 0.0% | 1,495 | |
| owner_occupied | numeric | 0.0% | 1,001 | outliers |
| renter_occupied | numeric | 0.0% | 1,418 | |
| pct_owner_occupied | numeric | 4.1% | 823 | |
| pct_renter_occupied | numeric | 4.1% | 823 | |
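
The per-column null share, unique count, and alerts in the schema can be approximated with a small profiling function. This is only a sketch: `profile_column` is a hypothetical helper, and the `near_unique_ratio` threshold is an assumption, not the rule the profiler actually uses.

```python
def profile_column(values, near_unique_ratio=0.95):
    """Summarise one column: null share, distinct count, and simple alerts."""
    n = len(values)
    non_null = [v for v in values if v is not None]
    unique = len(set(non_null))
    alerts = []
    if non_null and unique == 1:
        alerts.append("constant")
    elif n and unique / n >= near_unique_ratio:  # assumed threshold
        alerts.append("near_unique")
    null_pct = round(100 * (n - len(non_null)) / n, 1) if n else 0.0
    return {"null_pct": null_pct, "unique": unique, "alerts": alerts}

state_profile = profile_column([36, 36, 36, 36])
name_profile = profile_column(["Tract 1", "Tract 2", "Tract 3", "Tract 4"])
```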
total_renter_households
numeric feature
This column counts renter households per tract, ranging from 0 to 8,209 with a median of 726 and a mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63), with 69 high-end outliers (2.97%) and 4.38% zero values (102 tracts). There are no nulls, and the 1,418 unique values across 2,327 rows are consistent with area-level aggregation. Every summary statistic matches renter_occupied, so the two columns are probably duplicates.
Treatment: Apply log1p rather than a plain log before regression, since the column contains zeros, and decide whether zero-count rows should be modelled separately.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,418
- min
- 0
- max
- 8,209
- mean
- 946.1
- median
- 726
- std
- 815.4
- q1
- 346
- q3
- 1,357
- iqr
- 1,011
- skew
- 1.595
- kurtosis
- 4.627
- n_outliers
- 69
- outlier_rate
- 0.02965
- zero_rate
- 0.04383
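The outlier counts in this report appear to follow Tukey's fences (one description names the Tukey fence explicitly). Assuming the usual 1.5 × IQR multiplier, the fences for this column can be worked out from the quartiles above; `tukey_fences` and `count_outliers` are hypothetical helpers.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (low, high) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(values, q1, q3, k=1.5):
    """Count non-null values strictly outside the fences."""
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v is not None and (v < low or v > high))

# Quartiles reported for total_renter_households: q1 = 346, q3 = 1,357.
low, high = tukey_fences(346, 1357)
```

With these quartiles the upper fence is 2,873.5, and the lower fence is negative, so only high values can be flagged.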
rent_30_to_34_9_pct
numeric feature high_skew outliers
Despite the _pct suffix, this is likely a count of households paying 30-34.9% of income on rent per tract, given the integer-like values, zero floor, and max of 1,205. The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86) with a median of 51 against a mean of 83.05, and 16.2% of rows are exactly zero. 124 outliers (5.33%) extend far above the Q3 of 116, consistent with a few large tracts dominating.
Treatment: log1p-transform before modelling to tame the skew and zero inflation.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 355
- min
- 0
- max
- 1,205
- mean
- 83.05
- median
- 51
- std
- 100.3
- q1
- 15
- q3
- 116
- iqr
- 101
- skew
- 2.755
- kurtosis
- 13.86
- n_outliers
- 124
- outlier_rate
- 0.05329
- zero_rate
- 0.1616
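The log1p treatment recommended for the skewed count columns can be sketched with the standard library. log1p(x) is log(1 + x), so zero counts map to 0 instead of failing; the input values below are the reported min, median, and max of this column, used purely for illustration.

```python
import math

def log1p_transform(values):
    """Apply log(1 + x) element-wise; safe for zero counts."""
    return [math.log1p(v) for v in values]

def inverse_log1p(values):
    """Undo log1p_transform, e.g. to report predictions on the count scale."""
    return [math.expm1(v) for v in values]

transformed = log1p_transform([0, 51, 1205])
restored = inverse_log1p(transformed)
```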
rent_35_to_39_9_pct
numeric feature high_skew
Likely a count of households (or housing units) paying 35% to 39.9% of income on rent within some geographic unit. The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with a median of 35 but a max of 633, and nearly 20% of rows are zero (zero_rate 0.196), suggesting many small areas have no households in this rent burden bracket. 110 outliers (4.7%) sit well above the Q3 of 83.
Treatment: log1p-transform before regression to tame the right skew and zero inflation.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 270
- min
- 0
- max
- 633
- mean
- 58.35
- median
- 35
- std
- 69.85
- q1
- 10
- q3
- 83
- iqr
- 73
- skew
- 2.395
- kurtosis
- 9.275
- n_outliers
- 110
- outlier_rate
- 0.04727
- zero_rate
- 0.1964
rent_40_to_49_9_pct
numeric feature high_skew
Likely a count of housing units paying rent in the 40-49.9% income bracket per geographic area. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14) with a median of 49 but a max of 740 and 111 outliers (4.77%). 15.6% of rows are zero, consistent with small geographies sitting alongside dense ones.
Treatment: log1p-transform before modelling to tame the right skew and zero mass.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 322
- min
- 0
- max
- 740
- mean
- 74.68
- median
- 49
- std
- 83.79
- q1
- 14
- q3
- 106
- iqr
- 92
- skew
- 2.137
- kurtosis
- 7.139
- n_outliers
- 111
- outlier_rate
- 0.0477
- zero_rate
- 0.1556
rent_50_pct_or_more
numeric feature
Counts of households spending 50% or more of income on rent, aggregated per geographic unit across 2,327 rows with no nulls. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with a median of 184 well below the mean of 253.18 and a max of 1,918, and 6.27% of rows are zero. About 3.74% of values (87 rows) fall outside the Tukey fence. Every summary statistic matches severe_burden, so the two columns are probably duplicates.
Treatment: log1p-transform before modelling to tame the right skew.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 706
- min
- 0
- max
- 1,918
- mean
- 253.2
- median
- 184
- std
- 236.6
- q1
- 82
- q3
- 360
- iqr
- 278
- skew
- 1.603
- kurtosis
- 3.435
- n_outliers
- 87
- outlier_rate
- 0.03739
- zero_rate
- 0.06274
NAME
text identifier near_unique
This column holds fully qualified census tract names for New York City. All 2,327 rows are unique, with zero nulls and tightly bounded length (38-46 chars, median 41). The vocabulary is formulaic: 'new', 'york', 'census', 'tract', 'county;' appear in essentially every row. The county split is Kings (805), Queens (725), Bronx (361), New York (310), and Richmond (126), matching county_name. Because each value is a one-to-one tract label, it functions as a geographic key rather than a modelling feature.
Treatment: Treat as a tract-level key; parse out borough or join to a geo table rather than feeding the raw string to a model.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 2,327
- len_min
- 38
- len_max
- 46
- len_mean
- 41.65
- len_median
- 41
- len_p95
- 46
- word_mean
- 7.133
- word_median
- 7
- n_empty
- 0
- n_duplicates
- 0
- duplicate_rate
- 0
- vocab_size
- 1,539
- readability_flesch_mean
- 91.45
- emoji_rate
- 0
- url_rate
- 0
- one_word_rate
- 0
- allcaps_rate
- 0
- boilerplate_rate
- 0
state
numeric metadata constant
The column 'state' is numeric but holds the single value 36, the FIPS code for New York State, across all 2,327 rows, with zero variance and no nulls. It carries no information for modelling.
Treatment: Drop, constant column.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1
- min
- 36
- max
- 36
- mean
- 36
- median
- 36
- std
- 0
- q1
- 36
- q3
- 36
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
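Dropping constant columns such as state can be done generically instead of by name. A minimal sketch, assuming the data is held as a list of row dicts with identical keys; `drop_constant_columns` is a hypothetical helper and the rows are placeholders.

```python
def drop_constant_columns(rows):
    """Remove every column whose value is identical across all rows."""
    if len(rows) < 2:
        return rows  # with fewer than two rows, every column looks constant
    constant = {key for key in rows[0] if len({row[key] for row in rows}) == 1}
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]

# Placeholder rows (illustrative values only).
rows = [
    {"state": 36, "county": 47, "tract": 30100},
    {"state": 36, "county": 81, "tract": 15200},
]
cleaned = drop_constant_columns(rows)
```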
county
numeric feature
Numeric county identifier with only 5 distinct values across 2,327 rows and no nulls. The values (min 5, max 85, median 47) are consistent with county FIPS codes for the five boroughs, not a continuous measurement, so the mean, skew (-0.72), and kurtosis only reflect how many tracts carry each code. The column appears to carry the same information as county_name.
Treatment: Cast to categorical and one-hot or target-encode before modelling, and consider keeping only one of county and county_name.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 5
- min
- 5
- max
- 85
- mean
- 55
- median
- 47
- std
- 25.97
- q1
- 47
- q3
- 81
- iqr
- 34
- skew
- -0.72
- kurtosis
- -0.4531
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
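Treating county as a category can start from a code-to-borough lookup. The sketch below uses the standard county FIPS codes for the five boroughs (005, 047, 061, 081, 085) with labels in the style of county_name. Code 61 does not appear in the statistics above, so its presence in the column is an assumption, although the tract counts from county_name reproduce the reported mean of 55.

```python
# County FIPS code -> borough label; labels follow the county_name column.
NYC_COUNTY_FIPS = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}

def county_label(code):
    """Map a numeric county code to its borough label, or None if unknown."""
    return NYC_COUNTY_FIPS.get(int(code))

# Sanity check: tract counts per borough (from county_name) weighted by code.
tract_counts = {5: 361, 47: 805, 61: 310, 81: 725, 85: 126}
mean_code = sum(code * n for code, n in tract_counts.items()) / sum(tract_counts.values())
```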
tract
numeric identifier high_skew
This is almost certainly a U.S. Census tract code stored as a number, with 1,530 unique values across 2,327 rows and no nulls. Because there are fewer unique codes than rows, codes repeat across counties, so tract alone does not identify a row. The severe right skew (skew 10.14, kurtosis 189.8), with a median of 30,100 against a max of 990,100, is the expected pattern for identifier codes and says nothing about magnitude. The 63 flagged outliers (2.7%) are simply high-numbered codes, not data errors.
Treatment: Treat as a categorical geographic code; cast to a zero-padded 6-digit string and combine with state and county before joining to tract-level reference data, rather than using it as a numeric feature.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,530
- min
- 100
- max
- 990,100
- mean
- 4.225e+04
- median
- 30,100
- std
- 4.827e+04
- q1
- 15,200
- q3
- 5.79e+04
- iqr
- 4.27e+04
- skew
- 10.14
- kurtosis
- 189.8
- n_outliers
- 63
- outlier_rate
- 0.02707
- zero_rate
- 0
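A tract-level join key can be built by zero-padding and concatenating the three code columns. Census tract GEOIDs are 11 characters: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. `tract_geoid` is a hypothetical helper, and the example inputs reuse values reported in this profile (state 36, county 47, tract median 30100).

```python
def tract_geoid(state, county, tract):
    """Build an 11-character tract GEOID from numeric state, county, tract codes."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"

geoid = tract_geoid(36, 47, 30100)
```

Keeping the result as a string preserves leading zeros, which a numeric column would drop.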
county_name
categorical feature
This column lists New York City borough/county names across 2,327 rows, with exactly 5 unique values and no nulls. Brooklyn (Kings) leads at 805 (34.6%), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). An entropy ratio of 0.90 indicates a fairly balanced spread across the five categories with no extreme concentration.
Treatment: One-hot or target-encode for modelling.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 5
- top_value
- Brooklyn (Kings)
- top_rate
- 0.3459
- cardinality
- 5
- entropy
- 2.086
- entropy_ratio
- 0.8985
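The entropy figures above can be reproduced from the borough counts. The sketch assumes Shannon entropy in bits (base 2) and an entropy ratio defined as entropy divided by log2 of the number of categories, which is consistent with the reported 2.086 and 0.8985.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Tracts per borough: Brooklyn, Queens, Bronx, Manhattan, Staten Island.
borough_counts = [805, 725, 361, 310, 126]
entropy = entropy_bits(borough_counts)
entropy_ratio = entropy / math.log2(len(borough_counts))
```

A ratio of 1.0 would mean all five boroughs had the same number of tracts.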
moderate_burden
numeric feature
A non-negative integer count spanning 0 to 1,732, with a median of 159 and mean of 216.1 across 2,327 rows and no nulls. The mean equals the sum of the three 30-49.9% bracket means (83.05 + 58.35 + 74.68 = 216.08), so this is probably the number of households paying 30% to 49.9% of income on rent. The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% zeros.
Treatment: Apply a log1p transform before regression to tame the right skew and outliers.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 639
- min
- 0
- max
- 1,732
- mean
- 216.1
- median
- 159
- std
- 210.4
- q1
- 64
- q3
- 311
- iqr
- 247
- skew
- 1.934
- kurtosis
- 6.052
- n_outliers
- 86
- outlier_rate
- 0.03696
- zero_rate
- 0.06403
severe_burden
numeric feature
Count column with 2,327 rows, no nulls, and 706 unique integer values ranging from 0 to 1,918 (median 184, mean 253.18). The distribution is right-skewed (skew 1.60, kurtosis 3.44) with 6.27% zeros and 87 outliers (3.74%) above the upper fence. The IQR (278) and std (236.60) are large relative to the median, indicating substantial dispersion across tracts. Every summary statistic matches rent_50_pct_or_more, so the two columns are probably duplicates.
Treatment: Apply a log1p transform before regression to tame the right skew and outliers.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 706
- min
- 0
- max
- 1,918
- mean
- 253.2
- median
- 184
- std
- 236.6
- q1
- 82
- q3
- 360
- iqr
- 278
- skew
- 1.603
- kurtosis
- 3.435
- n_outliers
- 87
- outlier_rate
- 0.03739
- zero_rate
- 0.06274
pct_moderate_burden
numeric feature
This is a percentage feature, most likely the share of renter households under moderate rent burden, ranging 0-100 with mean 22.74 and median 21.8. The distribution is right-skewed (skew 1.51, kurtosis 6.70) with 59 outliers (2.65% of non-null rows) and a 4.38% null rate. About 2.1% of rows are exact zeros and the IQR is tight at 12.3, so the upper tail past q3 = 28.2 stretches all the way to 100. The 102 nulls equal the number of tracts with zero renter households (4.38% of 2,327), which suggests the percentage is undefined there, not missing at random.
Treatment: Check whether the nulls are zero-renter tracts before imputing; if so, exclude or flag them. Consider a mild transform or winsorization to tame the right tail before modelling.
- n
- 2,327
- nulls
- 102 (4.4%)
- unique
- 461
- min
- 0
- max
- 100
- mean
- 22.74
- median
- 21.8
- std
- 11.36
- q1
- 15.9
- q3
- 28.2
- iqr
- 12.3
- skew
- 1.509
- kurtosis
- 6.704
- n_outliers
- 59
- outlier_rate
- 0.02652
- zero_rate
- 0.02112
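If the nulls in the percentage columns come from zero denominators, the null pattern can be reproduced when the percentage is derived. This is a sketch under that assumption; `pct` is a hypothetical helper, the one-decimal rounding mirrors the values shown in this report, and the example numbers are placeholders.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal, or None when the denominator is zero."""
    if not denominator:
        return None  # undefined share, e.g. a tract with no renter households
    return round(100 * numerator / denominator, 1)

# zero_rate of total_renter_households times n gives the reported null count.
implied_zero_renter_tracts = round(0.04383 * 2327)

share = pct(50, 200)
undefined = pct(0, 0)
```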
pct_severe_burden
numeric feature
A percentage metric (0-100 range), most likely the share of renter households under severe rent burden, with a mean of 27.12 and median of 26.2 and a mildly right-skewed distribution (skew 0.57). Spread is moderate (std 12.68, IQR 15.9) and only 1.35% of rows are flagged as outliers, though a max of 100.0 alongside a 1.98% zero rate points to a few extreme records worth inspecting. The 4.38% null rate (102 rows) matches the share of tracts with zero renter households.
Treatment: Handle the 4.4% missing values (exclude or flag them if they are zero-renter tracts, otherwise impute) and use as-is; the mild skew does not require transformation.
- n
- 2,327
- nulls
- 102 (4.4%)
- unique
- 518
- min
- 0
- max
- 100
- mean
- 27.12
- median
- 26.2
- std
- 12.68
- q1
- 18.7
- q3
- 34.6
- iqr
- 15.9
- skew
- 0.5663
- kurtosis
- 1.222
- n_outliers
- 30
- outlier_rate
- 0.01348
- zero_rate
- 0.01978
rent_burdened
numeric feature
A count of rent-burdened households per tract, ranging from 0 to 3,153 with a median of 358 and mean of 469.26. The mean equals the sum of the moderate_burden and severe_burden means (216.1 + 253.2 = 469.3), so the column is probably their total. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 outliers (3.5%) and 4.7% exact zeros.
Treatment: Apply a log1p transform before regression to tame the right skew and zero mass.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,013
- min
- 0
- max
- 3,153
- mean
- 469.3
- median
- 358
- std
- 415.3
- q1
- 164.5
- q3
- 670
- iqr
- 505.5
- skew
- 1.494
- kurtosis
- 3.005
- n_outliers
- 82
- outlier_rate
- 0.03524
- zero_rate
- 0.04727
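The reported means suggest that the burden counts are derived from the bracket columns: moderate_burden as the sum of the three 30-49.9% brackets, severe_burden as rent_50_pct_or_more, and rent_burdened as their total. The sketch below encodes that assumed relationship so it can be checked row by row; `burden_counts` is a hypothetical helper and the example row is made up.

```python
def burden_counts(row):
    """Derive burden totals from the bracket counts (assumed relationship)."""
    moderate = (
        row["rent_30_to_34_9_pct"]
        + row["rent_35_to_39_9_pct"]
        + row["rent_40_to_49_9_pct"]
    )
    severe = row["rent_50_pct_or_more"]
    return {
        "moderate_burden": moderate,
        "severe_burden": severe,
        "rent_burdened": moderate + severe,
    }

# Placeholder tract (illustrative numbers only).
example = {
    "rent_30_to_34_9_pct": 51,
    "rent_35_to_39_9_pct": 35,
    "rent_40_to_49_9_pct": 49,
    "rent_50_pct_or_more": 184,
}
derived = burden_counts(example)

# The reported column means are consistent with the same identities.
mean_moderate = 83.05 + 58.35 + 74.68
mean_burdened = 216.1 + 253.2
```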
pct_rent_burdened
numeric feature
This is the percentage of rent-burdened households per tract, ranging from 0 to 100 with a mean of 49.87 and median of 50.0. The distribution is nearly symmetric (skew -0.04) and reasonably tight around the middle (IQR 17.9, std 14.6), with 4.38% nulls and only 0.36% zeros. 62 outliers (2.79% of non-null rows) sit beyond the whiskers, but there is no severe tail. The 102 nulls match the number of tracts with zero renter households.
Treatment: Handle the ~4% nulls (exclude or flag them if they are zero-renter tracts, otherwise impute) and use as-is; no transform is needed for a near-symmetric bounded percentage.
- n
- 2,327
- nulls
- 102 (4.4%)
- unique
- 596
- min
- 0
- max
- 100
- mean
- 49.87
- median
- 50
- std
- 14.62
- q1
- 40.9
- q3
- 58.8
- iqr
- 17.9
- skew
- -0.03839
- kurtosis
- 0.7849
- n_outliers
- 62
- outlier_rate
- 0.02787
- zero_rate
- 0.003596
median_gross_rent
numeric feature high_skew outliers
This is a numeric feature for median gross rent, with 2,327 non-null values and 1,232 unique levels. The middle of the distribution looks plausible (median 1,735, Q1-Q3 range 1,441.5 to 2,049, max 3,501), but the minimum is -666666666 and the mean is -41539608.8 with std 161182638.7. These are sentinel values masquerading as numbers, and they produce the severe negative skew (-3.62). Back-calculating from the mean implies 145 sentinel rows (about 6.2%), so the 289 flagged outliers (12.4%) also include genuinely low or high rents.
Treatment: Replace the -666666666 sentinel with null before any modelling or aggregation, then recompute the summary statistics.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,232
- min
- -6.667e+08
- max
- 3,501
- mean
- -4.154e+07
- median
- 1,735
- std
- 1.612e+08
- q1
- 1,441.5
- q3
- 2,049
- iqr
- 607.5
- skew
- -3.621
- kurtosis
- 11.11
- n_outliers
- 289
- outlier_rate
- 0.1242
- zero_rate
- 0
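The sentinel treatment can be sketched as a replace-then-summarise step. The list below is a placeholder built from values quoted in this profile plus one sentinel; `replace_sentinel` is a hypothetical helper.

```python
import statistics

SENTINEL = -666666666  # value reported as the column minimum

def replace_sentinel(values, sentinel=SENTINEL):
    """Return a copy of values with the sentinel replaced by None."""
    return [None if v == sentinel else v for v in values]

raw = [1735, SENTINEL, 2049, 3501]
cleaned = replace_sentinel(raw)

# Summaries should only ever see the valid values.
valid = [v for v in cleaned if v is not None]
valid_median = statistics.median(valid)
```

Computing the mean on `raw` instead of `valid` reproduces the kind of large negative average seen in the statistics above.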
median_household_income
numeric feature high_skew outliers
Median household income in dollars per tract, fully populated across 2,327 rows with 2,106 unique values, a sensible median of 76,833, and an IQR of 49,117. The mean of -36,017,397 and minimum of -666,666,666 come from sentinel-coded missing values masquerading as numbers, which drag skew to -3.94 and kurtosis to 13.53. Back-calculating from the mean implies 126 sentinel rows (about 5.4%), so the 208 flagged outliers (8.9%) are only partly sentinel contamination; the rest are presumably high-income tracts up to the max of 250,001.
Treatment: Replace the -666666666 sentinel with null, then consider a log transform or winsorisation before modelling.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 2,106
- min
- -6.667e+08
- max
- 250,001
- mean
- -3.602e+07
- median
- 76,833
- std
- 1.509e+08
- q1
- 5.324e+04
- q3
- 1.024e+05
- iqr
- 49,117
- skew
- -3.94
- kurtosis
- 13.53
- n_outliers
- 208
- outlier_rate
- 0.08939
- zero_rate
- 0
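The number of sentinel rows is not reported directly, but it can be bounded from the published mean. If k rows hold the sentinel, the remaining n - k rows must average to something inside the valid range. The sketch below enumerates the feasible k. The lower bound of 0 for valid values is an assumption; the upper bounds are the reported maxima, and `feasible_sentinel_counts` is a hypothetical helper.

```python
def feasible_sentinel_counts(mean, n, sentinel, valid_min, valid_max):
    """Return every sentinel count k for which the valid rows' mean is in range."""
    feasible = []
    for k in range(n):  # keep at least one valid row
        valid_sum = mean * n - k * sentinel
        valid_mean = valid_sum / (n - k)
        if valid_min <= valid_mean <= valid_max:
            feasible.append(k)
    return feasible

income_counts = feasible_sentinel_counts(-36_017_397, 2327, -666_666_666, 0, 250_001)
rent_counts = feasible_sentinel_counts(-41_539_608.8, 2327, -666_666_666, 0, 3_501)
```

Because the sentinel is so large relative to real values, only one count is feasible for each column, and rounding in the reported means does not change it.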
total_households
numeric feature
Counts of households per tract, ranging from 0 to 8,209 with a median of 1,252 and mean of 1,410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 outliers (3.0%) on the high end. 4.1% of rows (96 tracts) are zero, which may indicate unpopulated or placeholder areas.
Treatment: Apply log1p or winsorize before modelling, and decide whether zero-household rows should be filtered.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,495
- min
- 0
- max
- 8,209
- mean
- 1411
- median
- 1,252
- std
- 923.3
- q1
- 773.5
- q3
- 1,850
- iqr
- 1076
- skew
- 1.479
- kurtosis
- 4.377
- n_outliers
- 70
- outlier_rate
- 0.03008
- zero_rate
- 0.04125
owner_occupied
numeric feature outliers
Despite the boolean-sounding name 'owner_occupied', this is a numeric count column with 1,001 unique values ranging from 0 to 3,052 and a mean of 464.6, likely a tally of owner-occupied units per record (e.g., per tract or block). The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 7.2% zeros. No nulls are present.
Treatment: Log-transform (log1p to handle the 7% zeros) before modelling to tame the right skew.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,001
- min
- 0
- max
- 3,052
- mean
- 464.6
- median
- 371
- std
- 422.6
- q1
- 177
- q3
- 608
- iqr
- 431
- skew
- 1.761
- kurtosis
- 4.254
- n_outliers
- 143
- outlier_rate
- 0.06145
- zero_rate
- 0.0722
renter_occupied
numeric feature
Counts of renter-occupied units per tract, ranging from 0 to 8,209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) and 4.38% zeros, consistent with area-level housing tallies and not a per-household flag. Every summary statistic matches total_renter_households, so the two columns are probably duplicates.
Treatment: Apply log1p or scale before regression to tame the right skew, and keep only one of the two duplicate columns.
- n
- 2,327
- nulls
- 0 (0.0%)
- unique
- 1,418
- min
- 0
- max
- 8,209
- mean
- 946.1
- median
- 726
- std
- 815.4
- q1
- 346
- q3
- 1,357
- iqr
- 1,011
- skew
- 1.595
- kurtosis
- 4.627
- n_outliers
- 69
- outlier_rate
- 0.02965
- zero_rate
- 0.04383
pct_owner_occupied
numeric feature
Percentage of owner-occupied housing per tract, spanning the full 0-100 scale with a mean of 37.5 and median of 34.4. The distribution is wide (std 25.7, IQR 39.7) and slightly right-skewed (0.39) with negative kurtosis (-0.85), indicating a flat, near-uniform spread instead of a tight central mass. About 3.2% of rows are exactly zero and 4.1% are null, and no statistical outliers were flagged. The 96 nulls equal the number of tracts with zero total households (4.1% of 2,327), which suggests the percentage is undefined there.
Treatment: Use as-is as a bounded percentage feature; if the nulls are not simply empty tracts, impute them with the median or add a missingness flag.
- n
- 2,327
- nulls
- 96 (4.1%)
- unique
- 823
- min
- 0
- max
- 100
- mean
- 37.51
- median
- 34.4
- std
- 25.65
- q1
- 16.4
- q3
- 56.1
- iqr
- 39.7
- skew
- 0.3948
- kurtosis
- -0.854
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.03227
pct_renter_occupied
numeric feature
Percentage of renter-occupied units, spanning the full 0-100 scale with mean 62.5 and median 65.6, so these tracts lean toward rental-heavy housing. Spread is wide (std 25.7, IQR 39.7) and the distribution is mildly left-skewed (-0.39) and flat (kurtosis -0.85), with no outliers flagged. About 4.1% of rows are null and only 0.27% are exact zeros, with 823 distinct values across 2,327 rows. The statistics mirror pct_owner_occupied (means sum to 100, equal std and IQR, opposite skew), so the column is its complement and adds no independent information.
Treatment: Use as-is as a bounded percentage feature, but include only one of pct_renter_occupied and pct_owner_occupied in a model; handle the 4.1% nulls before modelling.
- n
- 2,327
- nulls
- 96 (4.1%)
- unique
- 823
- min
- 0
- max
- 100
- mean
- 62.49
- median
- 65.6
- std
- 25.65
- q1
- 43.9
- q3
- 83.6
- iqr
- 39.7
- skew
- -0.3948
- kurtosis
- -0.854
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.002689