This dataset covers 2,327 NYC census tracts with 23 columns describing housing tenure, rent burden, income, and rent levels across the five boroughs. The most urgent issue is data hygiene: median_gross_rent and median_household_income both contain a sentinel value of -666666666, which drags their means to roughly -41.5M and -36M respectively despite sensible medians (~$1,735 rent, ~$76,833 income) — these need to be filtered before any analysis. Beyond that, the substantive story is rent burden: pct_rent_burdened has a median of 50% with an IQR of 40.9–58.8, meaning half of NYC tracts have a majority of renters paying 30%+ of income on rent. Brooklyn (Kings) dominates the tract count at 35%, followed by Queens (31%) and the Bronx (15%), so any borough-level comparison should weight accordingly. The state column is constant (all 36, New York) and can be dropped.
saturn
/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv 2,327 rows sample n=2,327 seed 42 2026-05-01T17:39:37+00:00
Overview
| Source | /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv |
| Total rows | 2,327 |
| Profiled sample | 2,327 |
| Columns | 23 |
| Generated | 2026-05-01T17:39:37+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
This column counts renter households per record, ranging from 0 to 8209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) on the high end and 4.38% zero values. No nulls, and 1418 unique values across 2327 rows suggests aggregation at a geographic or administrative unit.
Likely a count of households paying 30%-34.9% of income on rent within some geographic unit, given the integer-like values, zero floor, and max of 1205. The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86) with a median of 51 against a mean of 83.05, and 16.2% of rows are exactly zero. 124 outliers (5.33%) extend far above the Q3 of 116, consistent with a few large areas dominating.
Likely a count of households (or housing units) paying 35% to 39.9% of income on rent within some geographic unit. The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with a median of 35 but a max of 633, and nearly 20% of rows are zero (zero_rate 0.196), suggesting many small areas have no households in this rent burden bracket. 110 outliers (4.7%) sit well above the Q3 of 83.
Likely a count of housing units paying rent in the 40-49.9% income bracket per geographic area. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14) with a median of 49 but a max of 740 and 111 outliers (4.77%), and 15.6% of rows are zero — consistent with small geographies sitting alongside dense ones.
Counts of households spending 50% or more of income on rent, aggregated per geographic unit across 2327 rows with no nulls. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with a median of 184 well below the mean of 253.18 and a max of 1918, and 6.27% of rows are zero. About 3.74% of values fall outside the Tukey fence.
This column holds fully-qualified Census Tract names for New York City, every one of 2327 rows unique with zero nulls and tightly bounded length (38-46 chars, median 41). The vocabulary is formulaic: 'new', 'york', 'census', 'tract', 'county;' appear in essentially every row, with the borough split dominated by Kings (805), Queens (725), Bronx (361), and Richmond (126). Because each value is a one-to-one tract label, it functions as a geographic key rather than a modelling feature.
The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and no nulls. It carries no information for modelling and likely encodes a fixed jurisdiction or pipeline stage code that was filtered upstream.
Encoded county identifier stored as a numeric code, with only 5 distinct values across 2327 rows and no nulls. The values (min 5, max 85, median 47) look like sparse categorical codes rather than a continuous measurement, and the negative skew (-0.72) reflects uneven frequency across those 5 codes.
This is almost certainly a U.S. Census tract code stored as a numeric, with 1530 unique values across 2327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8) with a median of 30100 but a max of 990100, which is the expected pattern for tract codes rather than a true magnitude — values are categorical identifiers padded into a numeric range. The 63 flagged outliers (2.7%) are likely just tracts in higher-numbered county/state ranges, not data errors.
This column lists New York City borough/county names across 2327 rows, with exactly 5 unique values and no nulls. Distribution mirrors NYC borough sizes: Brooklyn (Kings) leads at 805 (34.6%), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). Entropy ratio of 0.90 indicates a fairly balanced spread across the five categories with no extreme concentration.
A non-negative integer count named 'moderate_burden', spanning 0 to 1732 with a median of 159 and mean of 216 across 2327 rows, no nulls. The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% zeros, suggesting a long tail of high-burden cases over a typical mid-hundreds bulk.
Numeric count-like column 'severe_burden' with 2327 rows, no nulls, and 706 unique integer values ranging from 0 to 1918 (median 184, mean 253.18). The distribution is right-skewed (skew 1.60, kurtosis 3.44) with 6.27% zeros and 87 outliers (3.74%) above the upper whisker. The wide IQR (278) and std (236.60) relative to the median suggest substantial dispersion across units.
This is a percentage feature measuring the share of some population under moderate housing burden, ranging 0–100 with mean 22.74 and median 21.8. The distribution is right-skewed (skew 1.51, kurtosis 6.70) with 59 outliers (2.65%) and a 4.38% null rate. About 2.1% of rows are exact zeros and the IQR is tight at 12.3, so the upper tail past q3=28.2 stretches all the way to 100.
A percentage metric (0–100 range) capturing the share of some population under severe burden, with a mean of 27.12 and median of 26.2 suggesting a fairly typical right-skewed distribution (skew 0.57). Spread is moderate (std 12.68, IQR 15.9) and only 1.35% of rows are flagged as outliers, though a max of 100.0 alongside a 1.98% zero rate hints at a few extreme records worth inspecting. Note the 4.38% null rate, which will need handling.
Likely a count or dollar measure of rent-burdened households (or burden amount) per record, ranging from 0 to 3153 with a median of 358 and mean of 469.26. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 outliers (3.5%) and 4.7% exact zeros, so a long tail dominates the upper end.
This is a numeric percentage indicating the share of rent-burdened households per record, ranging from 0 to 100 with a mean of 49.87 and median of 50.0. The distribution is nearly symmetric (skew -0.04) and reasonably tight around the middle (IQR 17.9, std 14.6), with 4.38% nulls and only 0.36% zeros. 62 outliers (2.79%) sit beyond the whiskers, but no severe tail or drift is evident.
This is a numeric feature for median gross rent, with 2327 non-null values and 1232 unique levels. The middle of the distribution looks plausible (median 1735, IQR 1441.5–2049, max 3501), but the minimum is -666666666 and the mean is -41539608.8 with std 161182638.7, indicating sentinel values masquerading as numbers and producing severe negative skew (-3.62) and 289 outliers (12.4%).
Median household income in dollars per record, fully populated across 2,327 rows with 2,106 unique values and a sensible median of 76,833 and IQR of 49,117. The mean of -36,017,397 and minimum of -666,666,666 are sentinel-coded missing values masquerading as numbers, which drag skew to -3.94 and kurtosis to 13.53. Roughly 8.9% of rows (208) are flagged as outliers, almost certainly the same sentinel contamination.
Counts of households per record, ranging from 0 to 8209 with a median of 1252 and mean of 1410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 outliers (3.0%) on the high end, and 4.1% of rows are zero, which may indicate unpopulated or placeholder areas.
Despite the boolean-sounding name 'owner_occupied', this is a numeric count column with 1001 unique values ranging from 0 to 3052 and a mean of 464.6 — likely a tally of owner-occupied units per record (e.g., per tract or block). The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 7.2% zeros. No nulls are present.
Counts of renter-occupied units per record, ranging from 0 to 8209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) and 4.38% zeros, consistent with area-level housing tallies rather than a per-household flag.
Percentage of owner-occupied housing per record, ranging the full 0-100 scale with a mean of 37.5 and median of 34.4. The distribution is wide (std 25.7, IQR 39.7) and slightly right-skewed (0.39) with negative kurtosis (-0.85), indicating a flat, near-uniform spread rather than a tight central mass. About 3.2% of rows are exactly zero and 4.1% are null, but no statistical outliers were flagged.
Numeric percentage of renter-occupied units, ranging the full 0–100 with mean 62.5 and median 65.6, suggesting these records skew toward rental-heavy geographies. Spread is wide (std 25.7, IQR 39.7) and the distribution is mildly left-skewed (-0.39) and flat (kurtosis -0.85), so no outliers were flagged. About 4.1% of rows are null and only 0.27% are exact zeros, with 823 distinct values across 2,327 rows.
Numeric correlation
total_renter_households numeric
rent_30_to_34_9_pct numeric
rent_35_to_39_9_pct numeric
rent_40_to_49_9_pct numeric
rent_50_pct_or_more numeric
NAME text
Sample values (first 10)
- Census Tract 4; Bronx County; New York
- Census Tract 399.01; Queens County; New York
- Census Tract 779.08; Queens County; New York
- Census Tract 613.02; Queens County; New York
- Census Tract 780; Kings County; New York
- Census Tract 156.02; Richmond County; New York
- Census Tract 848; Kings County; New York
- Census Tract 1008.04; Queens County; New York
- Census Tract 618; Queens County; New York
- Census Tract 145; Bronx County; New York
state numeric
county numeric
tract numeric
county_name categorical
Top values (rank 1–20)
- Brooklyn (Kings) — 805
- Queens — 725
- Bronx — 361
- Manhattan (New York) — 310
- Staten Island (Richmond) — 126