nyc housing nyc tenure by tract
Reading
This dataset contains 2,327 New York City census tracts with housing tenure breakdowns across 10 columns, covering owner- and renter-occupied household counts and percentages by county. Brooklyn (Kings) leads with 805 tracts (34.6% of rows), followed by Queens (725), Bronx (361), and Manhattan (310), while Staten Island has just 126. Renting dominates citywide: the unweighted mean of the tract-level renter share is 62.5% versus 37.5% owner-occupied, and the household-weighted share implied by the column means is somewhat higher, at about 67% (946.1 of 1,410.7 households per tract). Renter counts are right-skewed with a long tail up to 8,209 per tract. Worth a closer look: the strong skew in raw household counts (owner_occupied skew 1.76, renter_occupied skew 1.59) and the ~4% null rate in the percentage columns, which matches the share of tracts with zero total households. Note that 'state' is constant (all 36) and can be ignored.
citing: row_count · column_count · county_name.top_values · county_name.top_rate · pct_owner_occupied.mean · pct_renter_occupied.mean · owner_occupied.skew · renter_occupied.skew · renter_occupied.max · pct_owner_occupied.null_rate · state.n_unique
Charts the summary said to look at first
county_name (tracts per county)
| value | count | share |
|---|---|---|
| Brooklyn (Kings) | 805 | 34.6% |
| Queens | 725 | 31.2% |
| Bronx | 361 | 15.5% |
| Manhattan (New York) | 310 | 13.3% |
| Staten Island (Richmond) | 126 | 5.4% |
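The share column above can be reproduced from the counts alone. The following is a minimal sketch that hard-codes the five counts from the table; the variable names are illustrative and not part of the report.

```python
# Tract counts per county, copied from the table above.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

# The counts should add up to the reported row count of 2,327.
total = sum(counts.values())

# Share of rows per county, as a percentage rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

Summing the counts is a quick consistency check that no county is missing from the table.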
pct_owner_occupied (histogram, 40 bins)
| bin | count |
|---|---|
| 0 – 2.5 | 141 |
| 2.5 – 5 | 86 |
| 5 – 7.5 | 71 |
| 7.5 – 10 | 63 |
| 10 – 12.5 | 86 |
| 12.5 – 15 | 65 |
| 15 – 17.5 | 72 |
| 17.5 – 20 | 88 |
| 20 – 22.5 | 98 |
| 22.5 – 25 | 88 |
| 25 – 27.5 | 79 |
| 27.5 – 30 | 67 |
| 30 – 32.5 | 70 |
| 32.5 – 35 | 56 |
| 35 – 37.5 | 76 |
| 37.5 – 40 | 68 |
| 40 – 42.5 | 59 |
| 42.5 – 45 | 72 |
| 45 – 47.5 | 58 |
| 47.5 – 50 | 54 |
| 50 – 52.5 | 70 |
| 52.5 – 55 | 59 |
| 55 – 57.5 | 55 |
| 57.5 – 60 | 39 |
| 60 – 62.5 | 44 |
| 62.5 – 65 | 40 |
| 65 – 67.5 | 52 |
| 67.5 – 70 | 34 |
| 70 – 72.5 | 40 |
| 72.5 – 75 | 47 |
| 75 – 77.5 | 32 |
| 77.5 – 80 | 48 |
| 80 – 82.5 | 31 |
| 82.5 – 85 | 27 |
| 85 – 87.5 | 26 |
| 87.5 – 90 | 24 |
| 90 – 92.5 | 23 |
| 92.5 – 95 | 8 |
| 95 – 97.5 | 6 |
| 97.5 – 100 | 9 |
pct_renter_occupied (histogram, 40 bins)
| bin | count |
|---|---|
| 0 – 2.5 | 9 |
| 2.5 – 5 | 6 |
| 5 – 7.5 | 7 |
| 7.5 – 10 | 23 |
| 10 – 12.5 | 24 |
| 12.5 – 15 | 26 |
| 15 – 17.5 | 27 |
| 17.5 – 20 | 32 |
| 20 – 22.5 | 47 |
| 22.5 – 25 | 31 |
| 25 – 27.5 | 48 |
| 27.5 – 30 | 40 |
| 30 – 32.5 | 34 |
| 32.5 – 35 | 50 |
| 35 – 37.5 | 39 |
| 37.5 – 40 | 46 |
| 40 – 42.5 | 41 |
| 42.5 – 45 | 53 |
| 45 – 47.5 | 57 |
| 47.5 – 50 | 72 |
| 50 – 52.5 | 54 |
| 52.5 – 55 | 54 |
| 55 – 57.5 | 73 |
| 57.5 – 60 | 62 |
| 60 – 62.5 | 69 |
| 62.5 – 65 | 72 |
| 65 – 67.5 | 60 |
| 67.5 – 70 | 65 |
| 70 – 72.5 | 72 |
| 72.5 – 75 | 75 |
| 75 – 77.5 | 91 |
| 77.5 – 80 | 94 |
| 80 – 82.5 | 92 |
| 82.5 – 85 | 73 |
| 85 – 87.5 | 64 |
| 87.5 – 90 | 83 |
| 90 – 92.5 | 65 |
| 92.5 – 95 | 71 |
| 95 – 97.5 | 83 |
| 97.5 – 100 | 147 |
total_households (histogram, 40 bins)
| bin | count |
|---|---|
| 0 – 205.2 | 123 |
| 205.2 – 410.4 | 41 |
| 410.4 – 615.7 | 203 |
| 615.7 – 820.9 | 272 |
| 820.9 – 1026 | 269 |
| 1026 – 1231 | 237 |
| 1231 – 1437 | 215 |
| 1437 – 1642 | 221 |
| 1642 – 1847 | 162 |
| 1847 – 2052 | 134 |
| 2052 – 2257 | 94 |
| 2257 – 2463 | 101 |
| 2463 – 2668 | 66 |
| 2668 – 2873 | 39 |
| 2873 – 3078 | 35 |
| 3078 – 3284 | 24 |
| 3284 – 3489 | 22 |
| 3489 – 3694 | 8 |
| 3694 – 3899 | 7 |
| 3899 – 4104 | 9 |
| 4104 – 4310 | 13 |
| 4310 – 4515 | 9 |
| 4515 – 4720 | 5 |
| 4720 – 4925 | 5 |
| 4925 – 5131 | 3 |
| 5131 – 5336 | 2 |
| 5336 – 5541 | 2 |
| 5541 – 5746 | 0 |
| 5746 – 5952 | 1 |
| 5952 – 6157 | 1 |
| 6157 – 6362 | 0 |
| 6362 – 6567 | 0 |
| 6567 – 6772 | 1 |
| 6772 – 6978 | 2 |
| 6978 – 7183 | 0 |
| 7183 – 7388 | 0 |
| 7388 – 7593 | 0 |
| 7593 – 7799 | 0 |
| 7799 – 8004 | 0 |
| 8004 – 8209 | 1 |
renter_occupied (histogram, 40 bins)
| bin | count |
|---|---|
| 0 – 205.2 | 349 |
| 205.2 – 410.4 | 358 |
| 410.4 – 615.7 | 292 |
| 615.7 – 820.9 | 268 |
| 820.9 – 1026 | 207 |
| 1026 – 1231 | 175 |
| 1231 – 1437 | 168 |
| 1437 – 1642 | 110 |
| 1642 – 1847 | 100 |
| 1847 – 2052 | 68 |
| 2052 – 2257 | 63 |
| 2257 – 2463 | 42 |
| 2463 – 2668 | 36 |
| 2668 – 2873 | 22 |
| 2873 – 3078 | 19 |
| 3078 – 3284 | 17 |
| 3284 – 3489 | 6 |
| 3489 – 3694 | 5 |
| 3694 – 3899 | 4 |
| 3899 – 4104 | 6 |
| 4104 – 4310 | 5 |
| 4310 – 4515 | 3 |
| 4515 – 4720 | 1 |
| 4720 – 4925 | 0 |
| 4925 – 5131 | 1 |
| 5131 – 5336 | 1 |
| 5336 – 5541 | 0 |
| 5541 – 5746 | 0 |
| 5746 – 5952 | 0 |
| 5952 – 6157 | 0 |
| 6157 – 6362 | 0 |
| 6362 – 6567 | 0 |
| 6567 – 6772 | 0 |
| 6772 – 6978 | 0 |
| 6978 – 7183 | 0 |
| 7183 – 7388 | 0 |
| 7388 – 7593 | 0 |
| 7593 – 7799 | 0 |
| 7799 – 8004 | 0 |
| 8004 – 8209 | 1 |
Schema
10 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| total_households | numeric | 0.0% | 1,495 | |
| owner_occupied | numeric | 0.0% | 1,001 | outliers |
| renter_occupied | numeric | 0.0% | 1,418 | |
| NAME | text | 0.0% | 2,327 | near_unique |
| state | numeric | 0.0% | 1 | constant |
| county | numeric | 0.0% | 5 | |
| tract | numeric | 0.0% | 1,530 | high_skew |
| county_name | categorical | 0.0% | 5 | |
| pct_owner_occupied | numeric | 4.1% | 823 | |
| pct_renter_occupied | numeric | 4.1% | 823 | |
total_households
numeric · feature
Household counts per census tract, ranging from 0 to 8,209 with a median of 1,252 and a mean of 1,410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 outliers (3.0%) on the high end. Zeros make up 4.1% of rows (about 96 tracts), which may indicate uninhabited or unreported areas. Treatment: consider a log1p transform before regression to tame the right skew while keeping the zeros defined.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,495
- min: 0
- max: 8,209
- mean: 1,411
- median: 1,252
- std: 923.3
- q1: 773.5
- q3: 1,850
- iqr: 1,076
- skew: 1.479
- kurtosis: 4.377
- n_outliers: 70
- outlier_rate: 0.03008
- zero_rate: 0.04125
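The log1p treatment and the outlier count can be sketched as follows. The report does not say how outliers are defined; the Tukey 1.5 × IQR fences used here are an assumption, and the sample values are illustrative rather than rows from the dataset.

```python
import math

# Quartiles reported for total_households.
q1, q3 = 773.5, 1850.0
iqr = q3 - q1

# Tukey fences (assumed rule; the report does not state its outlier definition).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def log1p_transform(values):
    """Return log(1 + x) for each count, so zeros map to 0.0 instead of -inf."""
    return [math.log1p(v) for v in values]


# Illustrative values only: the min, quartile-like values, median and max.
sample = [0, 773, 1252, 1850, 8209]
transformed = log1p_transform(sample)
flagged = [v for v in sample if v < lower_fence or v > upper_fence]

print(upper_fence)  # 3464.75
print(flagged)      # [8209]
```

Under this assumed rule only the upper fence matters, because the lower fence is negative and counts cannot fall below zero.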
owner_occupied
numeric · feature · outliers
Despite the boolean-sounding name, owner_occupied is an integer count ranging from 0 to 3,052 with 1,001 distinct values and a mean of 464.6 versus a median of 371, consistent with a per-tract tally of owner-occupied households rather than a flag. The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 7.2% exact zeros. No nulls are present. Treatment: apply log1p (a plain log is undefined for the zeros) or winsorize before modelling to tame the right tail.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,001
- min: 0
- max: 3,052
- mean: 464.6
- median: 371
- std: 422.6
- q1: 177
- q3: 608
- iqr: 431
- skew: 1.761
- kurtosis: 4.254
- n_outliers: 143
- outlier_rate: 0.06145
- zero_rate: 0.0722
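Winsorizing, one of the two treatments suggested for this column, clamps extreme values to a chosen quantile instead of dropping them. The sketch below is a minimal version under stated assumptions: the 95th-percentile cap, the simple index-based quantile, and the sample values are all illustrative choices, not taken from the report.

```python
def winsorize(values, lower_pct=0.0, upper_pct=0.95):
    """Clamp values to empirical quantiles.

    Uses a simple floor-index quantile on the sorted data (no interpolation),
    which is enough for an illustration.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


# Illustrative counts only, including the reported max of 3,052.
sample = [0, 0, 177, 371, 608, 900, 3052]
capped = winsorize(sample)

print(capped)  # [0, 0, 177, 371, 608, 900, 900]
```

Unlike a log transform, this keeps the original units, which can make coefficients easier to read at the cost of flattening the upper tail.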
renter_occupied
numeric · feature
This column reports the count of renter-occupied households per tract, ranging from 0 to 8,209 with a mean of 946 and a median of 726. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 4.4% zeros and 69 outliers (2.97%) in the upper tail. No nulls and 1,418 unique values across 2,327 rows are consistent with a per-tract aggregate count rather than a per-unit flag. Treatment: apply log1p (to handle the zeros) before regression to tame the right skew.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,418
- min: 0
- max: 8,209
- mean: 946.1
- median: 726
- std: 815.4
- q1: 346
- q3: 1,357
- iqr: 1,011
- skew: 1.595
- kurtosis: 4.627
- n_outliers: 69
- outlier_rate: 0.02965
- zero_rate: 0.04383
NAME
text · identifier · near_unique
This column holds fully qualified Census tract names for New York City, with every one of the 2,327 rows unique and non-null. Lengths cluster tightly between 38 and 46 characters, and every record contains the tokens 'new', 'york', 'census', 'tract', and 'county;'. The borough tokens are skewed toward Kings (805) and Queens (725) over Bronx (361) and Richmond (126). New York County (Manhattan, 310 tracts according to county_name) does not stand out in the top words, most likely because 'new' and 'york' already appear in every record as part of the state name; this is worth confirming rather than treating as missing data. With n_unique == n, the column is effectively a row identifier rather than a feature. Treatment: treat as a row label; parse out the county token if a geographic feature is needed, otherwise drop from modelling.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 2,327
- len_min: 38
- len_max: 46
- len_mean: 41.65
- len_median: 41
- len_p95: 46
- word_mean: 7.133
- word_median: 7
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,539
- readability_flesch_mean: 91.45
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
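Parsing the county out of NAME could look like the sketch below. The exact string layout is an assumption: the report only shows that every record contains the token 'county;' and is 38 to 46 characters long, which is consistent with a semicolon-separated form such as "Census Tract 1; Bronx County; New York". The function name and the example strings are hypothetical.

```python
def parse_tract_name(name):
    """Split an assumed 'Census Tract <n>; <X> County; <State>' label into parts."""
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected NAME layout: {name!r}")
    tract_label, county_label, state_label = parts

    # Strip the fixed prefix and suffix only when they are actually present.
    prefix, suffix = "Census Tract ", " County"
    tract = tract_label[len(prefix):] if tract_label.startswith(prefix) else tract_label
    county = county_label[:-len(suffix)] if county_label.endswith(suffix) else county_label
    return {"tract": tract, "county": county, "state": state_label}


# Hypothetical examples in the assumed layout.
bronx = parse_tract_name("Census Tract 1; Bronx County; New York")
manhattan = parse_tract_name("Census Tract 2.01; New York County; New York")

print(bronx)      # {'tract': '1', 'county': 'Bronx', 'state': 'New York'}
print(manhattan)  # {'tract': '2.01', 'county': 'New York', 'state': 'New York'}
```

Splitting on the separator rather than counting word tokens keeps New York County distinguishable from the state name, which a bag-of-words view cannot do.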
state
numeric · metadata · constant
The column 'state' is numeric but holds the single value 36 (the FIPS code for New York State) across all 2,327 rows, with zero variance and only one unique value. It carries no information for analysis and is flagged constant. Treatment: drop, constant column.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1
- min: 36
- max: 36
- mean: 36
- median: 36
- std: 0
- q1: 36
- q3: 36
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
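Detecting constant columns such as this one is a one-pass check on the number of distinct values. A minimal sketch, assuming the rows are available as a list of dictionaries (the helper name and the two sample rows are illustrative):

```python
def constant_columns(rows):
    """Return the names of columns that hold one distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


# Two illustrative rows; only 'state' is constant.
rows = [
    {"state": 36, "county": 5, "tract": 100},
    {"state": 36, "county": 47, "tract": 200},
]

to_drop = constant_columns(rows)
cleaned = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

print(to_drop)  # ['state']
```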
county
numeric · feature
Encoded as numeric but with only 5 distinct values across 2,327 rows (min 5, max 85, median 47), this is almost certainly a categorical county code stored as an integer. The reported moments (mean 55, skew -0.72) describe the arbitrary numbering of the codes rather than any real quantity, so they should not be read as a distributional shape. No nulls or outliers are reported. Treatment: cast to categorical and one-hot or target-encode rather than treating as continuous.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- min: 5
- max: 85
- mean: 55
- median: 47
- std: 25.97
- q1: 47
- q3: 81
- iqr: 34
- skew: -0.72
- kurtosis: -0.4531
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
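The sketch below shows why the mean of this column is an artefact of the coding. The mapping from codes to names is an assumption based on the standard county FIPS codes for New York City (005, 047, 061, 081, 085); the report itself does not state it. The tract counts come from the county_name table.

```python
# Assumed mapping from county FIPS code to the labels used in county_name.
COUNTY_LABELS = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}

# Tract counts per label, as reported for county_name.
TRACTS_PER_LABEL = {
    "Bronx": 361,
    "Brooklyn (Kings)": 805,
    "Manhattan (New York)": 310,
    "Queens": 725,
    "Staten Island (Richmond)": 126,
}

n_rows = sum(TRACTS_PER_LABEL.values())

# Mean of the integer codes, weighted by how many tracts carry each code.
mean_code = sum(
    code * TRACTS_PER_LABEL[label] for code, label in COUNTY_LABELS.items()
) / n_rows

print(mean_code)  # 55.0, matching the reported mean
```

Under this mapping the reported mean of 55 is fully explained by the code values and tract counts, which supports treating the column as a label rather than a measurement.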
tract
numeric · identifier · high_skew
Census tract codes stored as integers, ranging from 100 to 990,100 across 1,530 distinct values in 2,327 rows. Because there are fewer distinct codes than rows, the same tract code must recur in more than one county, so the code identifies a tract only in combination with county. The skew of 10.14 and kurtosis of 189.8 are artefacts of the tract numbering scheme rather than a real distribution; these are categorical identifiers, not measurements. The 63 outliers (2.7%) reflect tracts with unusually high numeric codes, not anomalous data. Treatment: treat as a categorical geographic key; do not use as a numeric feature or apply transforms.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,530
- min: 100
- max: 990,100
- mean: 4.225e+04
- median: 30,100
- std: 4.827e+04
- q1: 15,200
- q3: 5.79e+04
- iqr: 4.27e+04
- skew: 10.14
- kurtosis: 189.8
- n_outliers: 63
- outlier_rate: 0.02707
- zero_rate: 0
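A unique geographic key can be built by combining state, county and tract. The sketch below assumes the standard 11-digit Census tract GEOID layout (2-digit state, 3-digit county, 6-digit tract, zero-padded); the helper name and the example rows are illustrative.

```python
def make_geoid(state, county, tract):
    """Build an 11-character tract key: 2-digit state + 3-digit county + 6-digit tract."""
    return f"{state:02d}{county:03d}{tract:06d}"


# The same tract code in two different counties yields two distinct keys.
bronx_key = make_geoid(36, 5, 100)
kings_key = make_geoid(36, 47, 100)

print(bronx_key)  # 36005000100
print(kings_key)  # 36047000100
```

Keeping the key as a string preserves the leading zeros, which are lost when the components are stored as integers as they are in this dataset.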
county_name
categorical · feature
This column lists New York City borough/county names across 2,327 rows, with all 5 NYC boroughs represented and no nulls. The distribution is fairly even (entropy ratio 0.898), though Brooklyn (Kings) leads at 34.6% (805 tracts) and Staten Island (Richmond) trails at 5.4% (126). The parenthetical county names suggest the source schema pairs borough names with formal county labels. Treatment: one-hot or target-encode for modelling.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- top_value: Brooklyn (Kings)
- top_rate: 0.3459
- cardinality: 5
- entropy: 2.086
- entropy_ratio: 0.8985
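The entropy figures can be checked from the five tract counts. The sketch assumes the report uses Shannon entropy in bits, normalised by the log of the cardinality; that convention is not stated in the report, but it reproduces both reported values.

```python
import math

# Tract counts per county, from the county_name table.
counts = [805, 725, 361, 310, 126]


def shannon_entropy_bits(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


entropy = shannon_entropy_bits(counts)

# Normalise by the maximum possible entropy for 5 categories (a uniform split).
entropy_ratio = entropy / math.log2(len(counts))

print(round(entropy, 3))        # 2.086
print(round(entropy_ratio, 4))  # 0.8985
```

A ratio of 1.0 would mean all five counties have the same number of tracts, so 0.8985 quantifies the "fairly even" description.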
pct_owner_occupied
numeric · feature
Numeric column on a 0-100 scale (min 0.0, max 100.0) capturing the percentage of owner-occupied households per tract. The distribution is wide and flattish (std 25.65, kurtosis -0.85) with mean 37.51 just above median 34.4 and a broad IQR from 16.4 to 56.1, indicating that owners are a minority in most tracts. About 4.13% of rows (96) are null, the same number of rows as tracts with zero total households, which suggests the percentage is simply undefined there. A further 3.23% are exactly zero, which may represent fully rental tracts worth flagging. Treatment: if the nulls do coincide with zero-household tracts, exclude or flag them rather than impute; otherwise impute. No transform is needed given the mild skew (0.39) and absence of outliers.
- n: 2,327
- nulls: 96 (4.1%)
- unique: 823
- min: 0
- max: 100
- mean: 37.51
- median: 34.4
- std: 25.65
- q1: 16.4
- q3: 56.1
- iqr: 39.7
- skew: 0.3948
- kurtosis: -0.854
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.03227
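The 96 nulls line up with the zero rate of total_households (0.04125 × 2,327 ≈ 96), which would follow if the percentage is derived by division. The sketch below assumes that derivation (100 × owner / total, rounded to one decimal); the report does not confirm how the column was computed, and the function name is hypothetical.

```python
def pct_owner(owner_occupied, total_households):
    """Owner share in percent, or None when the tract has no households."""
    if total_households == 0:
        return None  # undefined: nothing to divide by
    return round(100 * owner_occupied / total_households, 1)


# Rows with zero total households implied by the reported zero_rate.
expected_null_rows = round(0.04125 * 2327)

print(expected_null_rows)    # 96, the reported null count
print(pct_owner(0, 0))       # None
print(pct_owner(371, 1252))  # 29.6
```

If this reading is right, imputing a mean into those rows would invent a tenure split for tracts that have no households at all.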
pct_renter_occupied
numeric · feature
Numeric share variable bounded between 0 and 100 (mean 62.49, median 65.6), almost certainly the percentage of renter-occupied households in each tract. The distribution is wide (std 25.65, IQR 39.7) and slightly left-skewed (skew -0.39, kurtosis -0.85), so values cluster toward the high end with a long tail of owner-dominated tracts. Its statistics mirror pct_owner_occupied exactly (same std and IQR, opposite skew, means summing to 100), so the two columns appear to be complements and only one is needed in a model. About 4.13% of rows (96) are null and only 0.27% are exact zeros; no outliers were flagged given the natural 0-100 bounds. Treatment: handle the ~4% nulls and use as-is, or rescale to 0-1 before modelling.
- n: 2,327
- nulls: 96 (4.1%)
- unique: 823
- min: 0
- max: 100
- mean: 62.49
- median: 65.6
- std: 25.65
- q1: 43.9
- q3: 83.6
- iqr: 39.7
- skew: -0.3948
- kurtosis: -0.854
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.002689
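The reported means of the two percentage columns sum to 100, and their quartiles mirror each other, which is what one would expect if renter share is the complement of owner share. The sketch below checks that relationship on the summary statistics and shows the 0-1 rescaling mentioned in the treatment; the helper names are illustrative, and the complement relationship is an inference from the statistics rather than something the report states.

```python
# Summary statistics reported for the two percentage columns.
owner = {"mean": 37.51, "q1": 16.4, "q3": 56.1}
renter = {"mean": 62.49, "q1": 43.9, "q3": 83.6}


def is_complement(a, b, tol=1e-6):
    """True when two percentages sum to 100 within a small tolerance."""
    return abs(a + b - 100.0) <= tol


def to_fraction(pct):
    """Rescale a 0-100 percentage to 0-1, passing missing values through."""
    return None if pct is None else pct / 100.0


# Means mirror directly; quartiles mirror crosswise (q1 of one vs q3 of the other).
means_mirror = is_complement(owner["mean"], renter["mean"])
quartiles_mirror = is_complement(owner["q1"], renter["q3"]) and is_complement(
    owner["q3"], renter["q1"]
)

print(means_mirror, quartiles_mirror)  # True True
```

When both columns are kept in a linear model alongside an intercept, they are perfectly collinear wherever both are defined, so dropping one is usually the simpler choice.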