NYC housing: NYC median income by tract
Reading
This dataset contains 2,327 New York City census tracts with median household income, geographic identifiers (state, county, tract), and tract names. The headline issue is median_household_income: it has a minimum of -666,666,666 and a mean of about -36 million, which indicates a sentinel missing-value code that must be filtered out before any analysis. Until that is done, the median of $76,833 is the more trustworthy central value. County coverage is uneven, with Brooklyn (Kings) holding 805 tracts (34.6%) and Staten Island only 126 (5.4%), so per-borough comparisons should use shares or per-tract statistics rather than raw counts. The state column is constant (36 = New York) and can be dropped.
citing: row_count · column_count · median_household_income.stats.min · median_household_income.stats.mean · median_household_income.stats.median · median_household_income.stats.skew · median_household_income.alerts · county_name.top_values · county_name.stats.top_rate · state.alerts · tract.stats.skew
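The effect of the sentinel on the mean can be checked with simple arithmetic. The sketch below backs out the mean of the valid rows from the reported figures. It assumes that all 126 rows in the lowest histogram bin hold exactly -666,666,666, which the profile does not state outright; the variable names are illustrative.

```python
# Reported figures from the profile. N_SENTINEL is an assumption read off
# the lowest histogram bin, not a value the profile reports directly.
N_ROWS = 2327
REPORTED_MEAN = -36_017_397.46
SENTINEL = -666_666_666
N_SENTINEL = 126  # assumed: every row in the lowest bin equals SENTINEL

total = REPORTED_MEAN * N_ROWS               # sum over all rows, sentinels included
valid_total = total - SENTINEL * N_SENTINEL  # remove the sentinel contribution
n_valid = N_ROWS - N_SENTINEL
valid_mean = valid_total / n_valid

print(f"valid rows: {n_valid}")
print(f"implied mean of valid rows: {valid_mean:,.0f}")
```

Under that assumption the implied mean of the valid rows comes out at roughly 85,000, somewhat above the reported median of 76,833.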
Charts the summary said to look at first
median_household_income: histogram bin counts
| bin | count |
|---|---|
| -6.667e+08 – -6.5e+08 | 126 |
| -6.5e+08 – -6.333e+08 | 0 |
| -6.333e+08 – -6.166e+08 | 0 |
| -6.166e+08 – -6e+08 | 0 |
| -6e+08 – -5.833e+08 | 0 |
| -5.833e+08 – -5.666e+08 | 0 |
| -5.666e+08 – -5.5e+08 | 0 |
| -5.5e+08 – -5.333e+08 | 0 |
| -5.333e+08 – -5.166e+08 | 0 |
| -5.166e+08 – -4.999e+08 | 0 |
| -4.999e+08 – -4.833e+08 | 0 |
| -4.833e+08 – -4.666e+08 | 0 |
| -4.666e+08 – -4.499e+08 | 0 |
| -4.499e+08 – -4.332e+08 | 0 |
| -4.332e+08 – -4.166e+08 | 0 |
| -4.166e+08 – -3.999e+08 | 0 |
| -3.999e+08 – -3.832e+08 | 0 |
| -3.832e+08 – -3.666e+08 | 0 |
| -3.666e+08 – -3.499e+08 | 0 |
| -3.499e+08 – -3.332e+08 | 0 |
| -3.332e+08 – -3.165e+08 | 0 |
| -3.165e+08 – -2.999e+08 | 0 |
| -2.999e+08 – -2.832e+08 | 0 |
| -2.832e+08 – -2.665e+08 | 0 |
| -2.665e+08 – -2.498e+08 | 0 |
| -2.498e+08 – -2.332e+08 | 0 |
| -2.332e+08 – -2.165e+08 | 0 |
| -2.165e+08 – -1.998e+08 | 0 |
| -1.998e+08 – -1.832e+08 | 0 |
| -1.832e+08 – -1.665e+08 | 0 |
| -1.665e+08 – -1.498e+08 | 0 |
| -1.498e+08 – -1.331e+08 | 0 |
| -1.331e+08 – -1.165e+08 | 0 |
| -1.165e+08 – -9.979e+07 | 0 |
| -9.979e+07 – -8.311e+07 | 0 |
| -8.311e+07 – -6.644e+07 | 0 |
| -6.644e+07 – -4.977e+07 | 0 |
| -4.977e+07 – -3.31e+07 | 0 |
| -3.31e+07 – -1.642e+07 | 0 |
| -1.642e+07 – 2.5e+05 | 2201 |
county_name: value counts
| value | count | share |
|---|---|---|
| Brooklyn (Kings) | 805 | 34.6% |
| Queens | 725 | 31.2% |
| Bronx | 361 | 15.5% |
| Manhattan (New York) | 310 | 13.3% |
| Staten Island (Richmond) | 126 | 5.4% |
tract: histogram bin counts
| bin | count |
|---|---|
| 100 – 2.485e+04 | 982 |
| 2.485e+04 – 4.96e+04 | 617 |
| 4.96e+04 – 7.435e+04 | 329 |
| 7.435e+04 – 9.91e+04 | 197 |
| 9.91e+04 – 1.238e+05 | 145 |
| 1.238e+05 – 1.486e+05 | 37 |
| 1.486e+05 – 1.734e+05 | 17 |
| 1.734e+05 – 1.981e+05 | 0 |
| 1.981e+05 – 2.228e+05 | 0 |
| 2.228e+05 – 2.476e+05 | 0 |
| 2.476e+05 – 2.724e+05 | 0 |
| 2.724e+05 – 2.971e+05 | 0 |
| 2.971e+05 – 3.218e+05 | 0 |
| 3.218e+05 – 3.466e+05 | 0 |
| 3.466e+05 – 3.714e+05 | 0 |
| 3.714e+05 – 3.961e+05 | 0 |
| 3.961e+05 – 4.208e+05 | 0 |
| 4.208e+05 – 4.456e+05 | 0 |
| 4.456e+05 – 4.704e+05 | 0 |
| 4.704e+05 – 4.951e+05 | 0 |
| 4.951e+05 – 5.198e+05 | 0 |
| 5.198e+05 – 5.446e+05 | 0 |
| 5.446e+05 – 5.694e+05 | 0 |
| 5.694e+05 – 5.941e+05 | 0 |
| 5.941e+05 – 6.188e+05 | 0 |
| 6.188e+05 – 6.436e+05 | 0 |
| 6.436e+05 – 6.684e+05 | 0 |
| 6.684e+05 – 6.931e+05 | 0 |
| 6.931e+05 – 7.178e+05 | 0 |
| 7.178e+05 – 7.426e+05 | 0 |
| 7.426e+05 – 7.674e+05 | 0 |
| 7.674e+05 – 7.921e+05 | 0 |
| 7.921e+05 – 8.168e+05 | 0 |
| 8.168e+05 – 8.416e+05 | 0 |
| 8.416e+05 – 8.664e+05 | 0 |
| 8.664e+05 – 8.911e+05 | 0 |
| 8.911e+05 – 9.158e+05 | 0 |
| 9.158e+05 – 9.406e+05 | 0 |
| 9.406e+05 – 9.654e+05 | 0 |
| 9.654e+05 – 9.901e+05 | 3 |
Schema
6 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| median_household_income | numeric | 0.0% | 2,106 | high_skew, outliers |
| NAME | text | 0.0% | 2,327 | near_unique |
| state | numeric | 0.0% | 1 | constant |
| county | numeric | 0.0% | 5 | |
| tract | numeric | 0.0% | 1,530 | high_skew |
| county_name | categorical | 0.0% | 5 | |
median_household_income
numeric feature high_skew outliers
Likely U.S. median household income in dollars, with a median of 76,833 and an IQR spanning 53,242.5 to 102,359.5. The minimum of -666,666,666 is a sentinel null code that distorts the mean (-36,017,397.46) and standard deviation (150,923,371.88), and 208 rows (8.94%) are flagged as outliers. The skew of -3.94 and kurtosis of 13.53 reflect that sentinel rather than the real income distribution. Treatment: replace -666666666 with NaN, then optionally cap at the 250,001 top-code (which is also the column maximum) before modelling.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 2,106
- min: -6.667e+08
- max: 250,001
- mean: -3.602e+07
- median: 76,833
- std: 1.509e+08
- q1: 5.324e+04
- q3: 1.024e+05
- iqr: 49,117
- skew: -3.94
- kurtosis: 13.53
- n_outliers: 208
- outlier_rate: 0.08939
- zero_rate: 0
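The treatment described for median_household_income can be sketched in plain Python. This is a minimal illustration, not the profiler's own method: the function names are hypothetical, the sample rows are toy values rather than rows from the dataset, and 250,001 is treated as a top-code because it is the reported column maximum.

```python
import math
import statistics

SENTINEL = -666_666_666   # missing-value code named in the profile
TOP_CODE = 250_001        # reported column maximum, treated here as a top-code

def clean_income(values):
    """Map the sentinel to NaN and leave every other value unchanged."""
    return [math.nan if v == SENTINEL else float(v) for v in values]

def summarize(values):
    """Summary statistics over the non-missing values only."""
    valid = [v for v in values if not math.isnan(v)]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "n_top_coded": sum(v >= TOP_CODE for v in valid),
        "median": statistics.median(valid),
        "mean": statistics.fmean(valid),
    }

# Toy rows for illustration only.
sample = [53_000, 76_833, SENTINEL, 102_000, 250_001, SENTINEL]
cleaned = clean_income(sample)
stats = summarize(cleaned)
print(stats)
```

Counting the top-coded rows separately keeps them visible, since a value of 250,001 is a censored reading rather than an exact income.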
NAME
text identifier near_unique
This column holds fully qualified Census tract names for New York City; all 2,327 rows are unique and non-null. Lengths fall between 38 and 46 characters, and every record contains the tokens 'new', 'york', 'census', 'tract', and 'county;', which confirms a rigid template. Apart from the tract number, the county token (Kings 805, Queens 725, Bronx 361, New York 310, Richmond 126) is the only meaningful variation. The column is effectively a row identifier, not a feature. Treatment: drop from modelling; optionally parse the county token out as a categorical feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 2,327
- len_min: 38
- len_max: 46
- len_mean: 41.65
- len_median: 41
- len_p95: 46
- word_mean: 7.133
- word_median: 7
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,539
- readability_flesch_mean: 91.45
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
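Parsing the county token out of NAME could look like the sketch below. The exact template is an assumption: the profile only reports the token 'county;' and a minimum length of 38, which is consistent with strings such as "Census Tract 1; Bronx County; New York". The pattern and the function name are illustrative.

```python
import re

# Assumed template: "Census Tract <number>; <County> County; New York".
# A comma separator is accepted as well, in case an extract uses commas.
NAME_PATTERN = re.compile(
    r"^Census Tract (?P<tract>[\d.]+)[;,]\s*(?P<county>.+?) County[;,]\s*New York$"
)

def parse_name(name):
    """Return (tract_label, county), or None when the template does not match."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    return match.group("tract"), match.group("county")

print(parse_name("Census Tract 1; Bronx County; New York"))
print(parse_name("Census Tract 31; New York County; New York"))
```

Returning None for non-matching rows makes it easy to count how many names fall outside the assumed template before trusting the parsed column.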
state
numeric metadata constant
The column 'state' is numeric but holds the single value 36 across all 2,327 rows, with zero variance and no nulls. It is the state FIPS code (36 = New York) and is constant because the extract covers a single state. Treatment: drop; a constant column carries no predictive signal.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1
- min: 36
- max: 36
- mean: 36
- median: 36
- std: 0
- q1: 36
- q3: 36
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
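Detecting and dropping constant columns such as state can be done generically rather than by name. The sketch below assumes rows are held as a list of dictionaries; the helper names and the two sample rows are hypothetical.

```python
def constant_columns(rows):
    """Names of columns whose value is identical in every row."""
    if not rows:
        return []
    return [
        col for col in rows[0]
        if len({row[col] for row in rows}) == 1
    ]

def drop_columns(rows, to_drop):
    """Copy each row without the listed columns."""
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

# Toy rows for illustration only.
rows = [
    {"state": 36, "county": 5, "tract": 100},
    {"state": 36, "county": 47, "tract": 200},
]
constants = constant_columns(rows)
trimmed = drop_columns(rows, constants)
print(constants)
print(trimmed)
```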
county
numeric feature
Despite being typed as numeric, `county` has only 5 unique values across 2,327 rows and no nulls. The summary statistics expose 5, 47, 81, and 85; the fifth value is not visible in the quartiles, but the reported mean of 55.0 is consistent with 61, the FIPS code for New York County (Manhattan). These are FIPS county codes rather than a measured quantity. The distribution is left-skewed (skew -0.72) with the median at 47 and Q1 also at 47, meaning at least a quarter of rows share that single code. Treating the mean (55.0) or std (25.97) as meaningful would be misleading given the categorical nature of the column. Treatment: cast to categorical and one-hot encode before modelling.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- min: 5
- max: 85
- mean: 55
- median: 47
- std: 25.97
- q1: 47
- q3: 81
- iqr: 34
- skew: -0.72
- kurtosis: -0.4531
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
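Casting county to a categorical and one-hot encoding it might be sketched as follows. The code-to-label mapping uses the standard FIPS county codes for the five boroughs (including 61 for New York County) with labels borrowed from county_name; the indicator naming scheme and the function name are assumptions for illustration.

```python
# Standard FIPS county codes for the five boroughs within state 36.
# Labels follow the county_name column of the profile.
COUNTY_LABELS = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}

def one_hot(code):
    """One indicator per county; unknown codes raise instead of encoding as all zeros."""
    if code not in COUNTY_LABELS:
        raise ValueError(f"unexpected county code: {code!r}")
    return {f"county_{c:03d}": int(c == code) for c in COUNTY_LABELS}

print(COUNTY_LABELS[47])
print(one_hot(47))
```

Raising on an unknown code is a deliberate choice here: a silent all-zero row would be indistinguishable from a valid encoding once a reference category is dropped.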
tract
numeric identifier high_skew
Almost certainly U.S. Census tract codes stored as integers, with 1,530 distinct values across 2,327 rows and no nulls. Tract codes are only unique within a county, which is why the same code can appear in more than one row. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82), with a max of 990,100 sitting far above the q3 of 57,900.5 and the median of 30,100, producing 63 outliers (2.7%). This is an artifact of tract numbering conventions, not a true numeric magnitude. Treatment: treat as a categorical geographic key (zero-pad and join with the state and county FIPS codes); do not use as a numeric feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 1,530
- min: 100
- max: 990,100
- mean: 4.225e+04
- median: 30,100
- std: 4.827e+04
- q1: 15,200
- q3: 5.79e+04
- iqr: 4.27e+04
- skew: 10.14
- kurtosis: 189.8
- n_outliers: 63
- outlier_rate: 0.02707
- zero_rate: 0
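The zero-pad-and-join step can be sketched as building the standard 11-character tract GEOID: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The function name is hypothetical and the inputs are illustrative combinations, not rows confirmed to exist in the dataset.

```python
def tract_geoid(state, county, tract):
    """11-character tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"

# Illustrative inputs only.
print(tract_geoid(36, 5, 100))      # 36005000100
print(tract_geoid(36, 47, 30100))   # 36047030100
```

Keeping the result as a string preserves the leading zeros, which are lost whenever the key is stored as an integer.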
county_name
categorical feature
This column lists the NYC borough/county for each record, with all 5 expected values present across 2,327 rows and no nulls. The distribution roughly tracks borough population: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). An entropy ratio of 0.898 indicates the categories are fairly evenly spread rather than dominated by one value. Treatment: one-hot encode as a low-cardinality categorical feature.
- n: 2,327
- nulls: 0 (0.0%)
- unique: 5
- top_value: Brooklyn (Kings)
- top_rate: 0.3459
- cardinality: 5
- entropy: 2.086
- entropy_ratio: 0.8985
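The entropy figures can be reproduced from the value counts. The sketch below assumes the profiler uses Shannon entropy in bits and divides by the maximum entropy for five categories; that definition is inferred from the reported numbers rather than stated in the profile.

```python
import math

# Value counts from the county_name table.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Shannon entropy in bits; the ratio divides by log2 of the category count,
# which is the entropy of a perfectly even split.
entropy = -sum(p * math.log2(p) for p in shares.values())
entropy_ratio = entropy / math.log2(len(counts))

print(f"entropy: {entropy:.3f} bits")
print(f"entropy_ratio: {entropy_ratio:.4f}")
```

A ratio of 1.0 would mean all five boroughs hold the same number of tracts, so a value near 0.9 fits a spread that is uneven but not dominated by a single category.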