nyc housing nyc median rent by tract
Reading
This dataset contains 2,327 New York City census tracts with median gross rent values across the five boroughs. The most important issue to investigate is median_gross_rent: it has a minimum of -666,666,666 and a mean of about -41.5 million, which indicates sentinel values standing in for missing data. These must be filtered before any analysis. The median of $1,735 and IQR of $1,441.5–$2,049 are far less affected by the sentinels than the mean and are the realistic figures, though they were computed with the sentinel rows included and will shift slightly once those rows are removed. The county_name field is spread across all five boroughs, with Brooklyn (Kings) the largest at 805 tracts (34.6%) and Staten Island the smallest at 126 (5.4%). Note that 'state' is constant (always 36, the code for New York) and can be ignored, and 'NAME' is a unique tract label rather than an analytical field.
citing: median_gross_rent.stats.min · median_gross_rent.stats.mean · median_gross_rent.stats.median · median_gross_rent.stats.q1 · median_gross_rent.stats.q3 · median_gross_rent.alerts · county_name.top_values · county_name.stats.top_rate · county_name.stats.cardinality · state.stats.min · state.stats.max · row_count
Charts the summary said to look at first
county_name: tract count and share by borough
| value | count | share |
|---|---|---|
| Brooklyn (Kings) | 805 | 34.6% |
| Queens | 725 | 31.2% |
| Bronx | 361 | 15.5% |
| Manhattan (New York) | 310 | 13.3% |
| Staten Island (Richmond) | 126 | 5.4% |
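The share column above, and the entropy figures reported for county_name in its column summary, can be reproduced from the five counts. The following is a minimal Python sketch, assuming entropy is measured in bits and that entropy_ratio is entropy divided by log2 of the number of categories. The profile does not state its formulas; these assumptions are chosen because they agree with the reported 2.086 and 0.8985.

```python
import math

# Tract counts per borough, copied from the table above.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Shannon entropy in bits (base 2 is an assumption).
entropy = -sum(p * math.log2(p) for p in shares.values())

# Ratio to the maximum entropy possible with this many categories
# (assumed definition of entropy_ratio).
entropy_ratio = entropy / math.log2(len(counts))

for name, p in shares.items():
    print(f"{name}: {p:.1%}")
print(f"entropy={entropy:.3f} bits, ratio={entropy_ratio:.4f}")
```

A ratio close to 1 would mean the boroughs are nearly equally represented; the value here reflects the smaller Staten Island and Manhattan groups.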
median_gross_rent: histogram (40 equal-width bins)
| bin | count |
|---|---|
| -6.667e+08 – -6.5e+08 | 145 |
| -6.5e+08 – -1.666e+07 (38 empty bins) | 0 |
| -1.666e+07 – 3501 | 2182 |
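The histogram explains the implausible mean: a small block of rows at the sentinel value outweighs everything else. The arithmetic can be sketched in Python as follows. It assumes that all 145 rows in the lowest bin are exactly -666,666,666, which the profile suggests but does not confirm, so the implied mean of the valid rows is an estimate rather than a reported statistic.

```python
SENTINEL = -666_666_666         # column minimum reported by the profile
N_ROWS = 2327                   # total rows
N_SENTINEL = 145                # assumption: every row in the lowest bin is the sentinel
REPORTED_MEAN = -41_539_608.82  # mean reported for median_gross_rent

# Fraction of rows carrying the sentinel.
sentinel_share = N_SENTINEL / N_ROWS

# How much the sentinel rows alone contribute to the column mean.
sentinel_contribution = SENTINEL * N_SENTINEL / N_ROWS

# Back out the sum, and then the mean, of the remaining rows.
valid_sum = REPORTED_MEAN * N_ROWS - SENTINEL * N_SENTINEL
implied_valid_mean = valid_sum / (N_ROWS - N_SENTINEL)

print(f"sentinel share: {sentinel_share:.1%}")
print(f"sentinel contribution to mean: {sentinel_contribution:,.0f}")
print(f"implied mean of valid rows: {implied_valid_mean:,.0f}")
```

Under that assumption the sentinel rows account for almost all of the reported mean, and the implied mean of the remaining rows lands in the same range as the reported median.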
tract: histogram (40 equal-width bins)
| bin | count |
|---|---|
| 100 – 2.485e+04 | 982 |
| 2.485e+04 – 4.96e+04 | 617 |
| 4.96e+04 – 7.435e+04 | 329 |
| 7.435e+04 – 9.91e+04 | 197 |
| 9.91e+04 – 1.238e+05 | 145 |
| 1.238e+05 – 1.486e+05 | 37 |
| 1.486e+05 – 1.734e+05 | 17 |
| 1.734e+05 – 9.654e+05 (32 empty bins) | 0 |
| 9.654e+05 – 9.901e+05 | 3 |
NAME: length in characters
| chars | count |
|---|---|
| 38 | 7 |
| 39 | 104 |
| 40 | 785 |
| 41 | 447 |
| 42 | 200 |
| 43 | 378 |
| 44 | 190 |
| 45 | 82 |
| 46 | 134 |
Schema
6 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| median_gross_rent | numeric | 0.0% | 1,232 | high_skew, outliers |
| NAME | text | 0.0% | 2,327 | near_unique |
| state | numeric | 0.0% | 1 | constant |
| county | numeric | 0.0% | 5 | |
| tract | numeric | 0.0% | 1,530 | high_skew |
| county_name | categorical | 0.0% | 5 | |
median_gross_rent
numeric · feature · high_skew · outliers
Median gross rent per census tract, with a typical value around $1,735 (IQR $1,441.5–$2,049). The column is contaminated by sentinel values: the minimum of -666,666,666 drags the mean to -41,539,608.82 and inflates the standard deviation to about 1.6e8, producing a skew of -3.62 and flagging 12.4% of rows as outliers. Once sentinels are removed, the real distribution looks tight and plausible for US rents, which appear capped near $3,501. Treatment: Replace the -666666666 sentinel with null, then consider winsorizing or log-transforming before modelling.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,232
- min: -6.667e+08
- max: 3,501
- mean: -4.154e+07
- median: 1,735
- std: 1.612e+08
- q1: 1,442
- q3: 2,049
- iqr: 607.5
- skew: -3.621
- kurtosis: 11.11
- n_outliers: 289
- outlier_rate: 0.1242
- zero_rate: 0
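The recommended treatment for median_gross_rent, replacing the sentinel with a null before computing anything, might look like the sketch below. It uses only the Python standard library; `clean_rents` and `summarize` are hypothetical helper names, and the sample rows are made up for illustration rather than taken from the dataset.

```python
import statistics

SENTINEL = -666_666_666  # value reported as the column minimum in this profile

def clean_rents(values, sentinel=SENTINEL):
    """Replace the sentinel with None so it is treated as missing."""
    return [None if v == sentinel else v for v in values]

def summarize(values):
    """Count, median, and quartiles over the non-missing values."""
    present = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {"n": len(present), "q1": q1, "median": median, "q3": q3}

# Illustrative rows only; these are not real tract values.
sample = [1500, SENTINEL, 1800, 2100, SENTINEL, 1650, 3501]

cleaned = clean_rents(sample)
summary = summarize(cleaned)
print(summary)
```

Keeping the missing rows as None, rather than dropping them, preserves row alignment with the other columns so the tracts can still be joined or counted by borough.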
NAME
text · identifier · near_unique
This column holds fully-qualified names of New York City census tracts, one per row (e.g. 'Census Tract ...; Kings County; New York'). Every one of the 2,327 values is unique with zero nulls and tightly bounded length (38–46 characters, mean 41.6; about 7 words). The top words confirm the five NYC boroughs: Kings (805), Queens (725), Bronx (361), and Richmond (126), with New York County (Manhattan) making up the remaining 310. It is effectively a row identifier rather than a modelling feature. Treatment: Drop from modelling; retain as a join key or parse out the borough/tract components if geography is needed.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 2,327
- len_min: 38
- len_max: 46
- len_mean: 41.65
- len_median: 41
- len_p95: 46
- word_mean: 7.133
- word_median: 7
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,539
- readability_flesch_mean: 91.45
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
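If the geography in NAME is needed, the components can be split out. The sketch below assumes every value follows the semicolon-separated pattern shown in the example ('Census Tract ...; Kings County; New York'); `parse_tract_name` is a hypothetical helper, and the input string is an illustrative value built to match that pattern.

```python
TRACT_PREFIX = "Census Tract "
COUNTY_SUFFIX = " County"

def parse_tract_name(name):
    """Split a NAME value into tract label, county, and state.

    Assumes the form '<Census Tract X>; <Y County>; <State>'.
    """
    parts = [part.strip() for part in name.split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected NAME format: {name!r}")
    tract, county, state = parts
    if tract.startswith(TRACT_PREFIX):
        tract = tract[len(TRACT_PREFIX):]
    if county.endswith(COUNTY_SUFFIX):
        county = county[:-len(COUNTY_SUFFIX)]
    return {"tract": tract, "county": county, "state": state}

# Illustrative input following the pattern in the profile's example.
parsed = parse_tract_name("Census Tract 1; Kings County; New York")
print(parsed)
```

Raising on an unexpected shape, instead of guessing, makes any rows that deviate from the pattern visible early.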
state
numeric · metadata · constant
The column 'state' is numeric but holds the single value 36 across all 2,327 rows, with zero variance and zero nulls. This is a constant field carrying no information for modelling, likely a leftover state code from an upstream filter or partition. Treatment: Drop; constant column provides no signal.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1
- min: 36
- max: 36
- mean: 36
- median: 36
- std: 0
- q1: 36
- q3: 36
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
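Constant columns such as state can be detected and dropped generically rather than by name. A small sketch, assuming the data is held as a list of row dictionaries; `constant_columns` and `drop_columns` are hypothetical helpers and the two rows are illustrative.

```python
def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]

def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

# Illustrative rows; only the state value 36 is taken from the profile.
rows = [
    {"state": 36, "county": 47, "tract": 100},
    {"state": 36, "county": 81, "tract": 200},
]

constants = constant_columns(rows)
slim = drop_columns(rows, constants)
print(constants, slim)
```

If the state code is later needed to build a full geographic identifier, it may be worth recording the dropped constant somewhere instead of discarding it entirely.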
county
numeric · identifier
This column holds numeric county codes (likely FIPS-style identifiers), with only 5 unique values across 2,327 rows and no nulls. Despite being labelled numeric, values such as 5, 47, 81, and 85 are categorical labels, so the reported mean of 55.0 and std of 25.97 are not meaningful. The same goes for the negative skew of -0.72, which only reflects that more rows carry the higher codes (median 47, Q3 81). Treatment: Cast to categorical and one-hot or target-encode; do not treat as a continuous feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- min: 5
- max: 85
- mean: 55
- median: 47
- std: 25.97
- q1: 47
- q3: 81
- iqr: 34
- skew: -0.72
- kurtosis: -0.4531
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
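Casting county to a categorical label and one-hot encoding it could be sketched as follows. The zero-padding to three digits assumes these are county FIPS codes, which the profile describes only as likely; `to_county_label` and `one_hot` are hypothetical helpers.

```python
def to_county_label(code):
    """Turn a numeric county code into a 3-character string label.

    Assumption: the codes are county FIPS codes, which are 3 digits wide.
    """
    return f"{int(code):03d}"

def one_hot(labels):
    """Return (categories, rows), with one 0/1 indicator per category."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        rows.append(row)
    return categories, rows

# Codes mentioned in the column summary; the order is illustrative.
codes = [5, 47, 81, 85, 47]
labels = [to_county_label(c) for c in codes]
categories, encoded = one_hot(labels)
print(categories)
print(encoded)
```

Once the codes are strings, numeric summaries such as mean and skew no longer apply to them, which is the intended effect.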
tract
numeric · identifier · high_skew
This is almost certainly a U.S. Census tract code rather than a true numeric measurement, with 1,530 unique values across 2,327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8) with a max of 990,100 sitting far above the median of 30,100, which is expected behavior for tract identifiers and triggered the high_skew alert. The 63 flagged outliers (2.7%) reflect tract-numbering conventions, not data errors. Treatment: Cast to string and treat as a categorical/geographic key; do not use as a continuous numeric feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,530
- min: 100
- max: 990,100
- mean: 4.225e+04
- median: 30,100
- std: 4.827e+04
- q1: 15,200
- q3: 5.79e+04
- iqr: 4.27e+04
- skew: 10.14
- kurtosis: 189.8
- n_outliers: 63
- outlier_rate: 0.02707
- zero_rate: 0
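Because tract repeats across counties (1,530 unique values over 2,327 rows), it only works as a key in combination with the state and county codes. One way to cast it to a string key is to build the standard 11-character Census tract GEOID: 2-digit state, 3-digit county, 6-digit tract. The sketch assumes the tract column is the 6-digit tract code stored as an integer with leading zeros lost; `tract_geoid` is a hypothetical helper and the example combination is illustrative.

```python
def tract_geoid(state, county, tract):
    """Build an 11-character tract GEOID: SS + CCC + TTTTTT.

    Assumption: inputs are the numeric state, county, and tract codes
    with leading zeros stripped, as they appear in this dataset.
    """
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"

# Illustrative combination of values that appear in the profile.
geoid = tract_geoid(36, 47, 100)
print(geoid)
```

A string key of fixed width sorts and joins predictably, and it prevents the identifier from being fed to a model as a magnitude.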
county_name
categorical · feature
This column records NYC borough/county names across 2,327 rows with no nulls and only 5 distinct values, matching the five boroughs of New York City. Distribution is uneven but balanced enough to be informative: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126), giving a high entropy_ratio of 0.898. Notably, three of the five labels embed parenthetical legal county names (e.g., 'Brooklyn (Kings)'), which will need normalization if joining to standard county tables. Treatment: One-hot or target-encode after stripping the parenthetical county aliases for clean joins.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- top_value: Brooklyn (Kings)
- top_rate: 0.3459
- cardinality: 5
- entropy: 2.086
- entropy_ratio: 0.8985
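Stripping the parenthetical aliases from county_name can be done with a small regular expression. The sketch below splits each label into a borough name and a county name; `split_label` is a hypothetical helper, and it assumes that labels without parentheses (Queens, Bronx) use the same name for borough and county.

```python
import re

# Borough name, optionally followed by a county alias in parentheses.
_LABEL = re.compile(r"^(?P<borough>[^()]+?)\s*(?:\((?P<county>[^()]+)\))?$")

def split_label(label):
    """Return (borough, county) for a county_name label."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected county_name label: {label!r}")
    borough = match.group("borough").strip()
    # Assumption: with no alias given, the county shares the borough's name.
    county = match.group("county") or borough
    return borough, county

labels = [
    "Brooklyn (Kings)",
    "Queens",
    "Bronx",
    "Manhattan (New York)",
    "Staten Island (Richmond)",
]
pairs = {label: split_label(label) for label in labels}
for label, (borough, county) in pairs.items():
    print(f"{label!r} -> borough={borough!r}, county={county!r}")
```

Keeping both parts lets the borough name serve for display while the county name is used to join against standard county tables.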