saturn

/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv 2,327 rows sample n=2,327 seed 42 2026-05-01T17:39:37+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_housing_metrics_merged.csv
Total rows: 2,327
Profiled sample: 2,327
Columns: 23
Generated: 2026-05-01T17:39:37+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset covers 2,327 NYC census tracts with 23 columns describing housing tenure, rent burden, income, and rent levels across the five boroughs. The most urgent issue is data hygiene: median_gross_rent and median_household_income both contain a sentinel value of -666666666, which drags their means to roughly -41.5M and -36M respectively despite sensible medians (~$1,735 rent, ~$76,833 income). These rows need to be masked or filtered before any analysis. Beyond that, the substantive story is rent burden: pct_rent_burdened has a median of 50% with an IQR of 40.9–58.8, meaning that in about half of the tracts with data, at least half of renter households pay 30%+ of income on rent. Brooklyn (Kings) has the largest tract count at 35%, followed by Queens (31%), the Bronx (16%), Manhattan (13%), and Staten Island (5%), so any borough-level comparison should weight accordingly. The state column is constant (all 36, New York) and can be dropped.
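A minimal sketch of the filtering step described above, using only the Python standard library. The sample values are hypothetical and are not rows from the file; the helper names `mask_sentinel` and `summarize` are illustrative assumptions, not part of saturn.

```python
from statistics import mean, median

# Sentinel value reported in the profile for both affected columns.
SENTINEL = -666666666

def mask_sentinel(values, sentinel=SENTINEL):
    """Replace the sentinel with None so it is treated as missing."""
    return [None if v == sentinel else v for v in values]

def summarize(values):
    """Mean and median over the non-missing values only."""
    present = [v for v in values if v is not None]
    return {"n": len(present), "mean": mean(present), "median": median(present)}

# Hypothetical rent values for illustration only.
raw_rent = [1500, 1735, SENTINEL, 2049, 3501]
clean_rent = mask_sentinel(raw_rent)
summary = summarize(clean_rent)
print(summary)
```

Masking rather than dropping rows keeps the tract available for the other columns, which are unaffected by the sentinel.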

total_renter_households high anthropic:claude-opus-4-7

This column counts renter households per record, ranging from 0 to 8209 with a median of 726 and a mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) on the high end and 4.38% zero values. There are no nulls, and 1418 unique values across 2327 rows is consistent with aggregation at a geographic unit such as a census tract. Every summary statistic matches renter_occupied exactly, so the two columns appear to be duplicates.

rent_30_to_34_9_pct high anthropic:claude-opus-4-7

Likely a count of households paying 30%-34.9% of income on rent within some geographic unit, given the integer-like values, zero floor, and max of 1205. The distribution is heavily right-skewed (skew 2.76, kurtosis 13.86) with a median of 51 against a mean of 83.05, and 16.2% of rows are exactly zero. 124 outliers (5.33%) extend far above the Q3 of 116, consistent with a few large areas dominating.

rent_35_to_39_9_pct high anthropic:claude-opus-4-7

Likely a count of households (or housing units) paying 35% to 39.9% of income on rent within some geographic unit. The distribution is heavily right-skewed (skew 2.40, kurtosis 9.27) with a median of 35 but a max of 633, and nearly 20% of rows are zero (zero_rate 0.196), suggesting many small areas have no households in this rent burden bracket. 110 outliers (4.7%) sit well above the Q3 of 83.

rent_40_to_49_9_pct high anthropic:claude-opus-4-7

Likely a count of renter households paying 40% to 49.9% of income on rent per geographic area, despite the _pct suffix. The distribution is heavily right-skewed (skew 2.14, kurtosis 7.14) with a median of 49 but a max of 740 and 111 outliers (4.77%), and 15.6% of rows are zero, consistent with small geographies sitting alongside dense ones.

rent_50_pct_or_more high anthropic:claude-opus-4-7

Counts of households spending 50% or more of income on rent, aggregated per geographic unit across 2327 rows with no nulls. The distribution is right-skewed (skew 1.60, kurtosis 3.44) with a median of 184 well below the mean of 253.18 and a max of 1918, and 6.27% of rows are zero. About 3.74% of values fall outside the Tukey fence.
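For reference, the Tukey fence mentioned above can be reproduced from the reported quartiles. This sketch assumes saturn uses the conventional 1.5 × IQR multiplier, which matches the "rows beyond 1.5 IQR" flags shown on other columns; the function name is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for rent_50_pct_or_more: q1 = 82, q3 = 360.
lower, upper = tukey_fences(82.0, 360.0)
print(lower, upper)  # -335.0 777.0
```

Because the column has a floor of zero, the lower fence is never reached here, so all flagged values sit above 777.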

NAME high anthropic:claude-opus-4-7

This column holds fully-qualified Census Tract names for New York City, every one of 2327 rows unique with zero nulls and tightly bounded length (38-46 chars, median 41). The vocabulary is formulaic: 'new', 'york', 'census', 'tract', 'county;' appear in essentially every row, with the borough split led by Kings (805) and Queens (725), followed by Bronx (361), New York (310), and Richmond (126). Because each value is a one-to-one tract label, it functions as a geographic key rather than a modelling feature.
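Since the names follow a fixed pattern, they can be split into their parts with a regular expression. This is a sketch that assumes every value has the "Census Tract N; X County; State" shape seen in the sample values; the pattern and function name are illustrative.

```python
import re

# Pattern inferred from the sample values; an assumption, not a documented format.
NAME_PATTERN = re.compile(
    r"^Census Tract (?P<tract>\d+(?:\.\d+)?); (?P<county>.+) County; (?P<state>.+)$"
)

def parse_name(name):
    """Split a tract NAME into tract number, county, and state; None if it does not match."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None

parsed = parse_name("Census Tract 399.01; Queens County; New York")
print(parsed)
```

Returning None for non-matching values makes any row that breaks the pattern easy to find.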

state high anthropic:claude-opus-4-7

The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and no nulls. 36 is the FIPS state code for New York, which matches the NAME column, so the column carries no information for modelling.

county high anthropic:claude-opus-4-7

Encoded county identifier stored as a numeric code, with only 5 distinct values across 2327 rows and no nulls. The range (min 5, max 85, median 47) is consistent with the county FIPS codes of the five boroughs: 5 (Bronx), 47 (Kings), 61 (New York), 81 (Queens), and 85 (Richmond). These are categorical codes rather than a continuous measurement, so the mean (55.0) and the negative skew (-0.72) only reflect how many tracts fall under each code.

tract high anthropic:claude-opus-4-7

This is almost certainly a U.S. Census tract code stored as a numeric, with 1530 unique values across 2327 rows and no nulls. Because the same tract code can recur in different counties, the column is not a unique key on its own and has to be combined with state and county. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8) with a median of 30100 but a max of 990100, which is the expected pattern for identifier codes rather than a true magnitude. The 63 flagged outliers (2.7%) are simply high-numbered tract codes, not data errors.
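One way to get a unique key is to combine the three code columns into an 11-digit Census GEOID (2-digit state, 3-digit county, 6-digit tract). The sketch below assumes the numeric tract column holds the 6-digit code with its leading zeros stripped, so that tract 399.01 is stored as 39901; that mapping is an inference from the profile, not something the report states.

```python
def tract_geoid(state, county, tract):
    """Build an 11-digit tract GEOID by zero-padding state (2), county (3), tract (6)."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"

# Hypothetical rows: the same tract code under two different county codes.
queens = tract_geoid(36, 81, 39901)
kings = tract_geoid(36, 47, 39901)
print(queens, kings)  # 36081039901 36047039901
```

Storing the result as a string preserves the leading zeros that the numeric columns lose.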

county_name high anthropic:claude-opus-4-7

This column lists New York City borough/county names across 2327 rows, with exactly 5 unique values and no nulls. Distribution mirrors NYC borough sizes: Brooklyn (Kings) leads at 805 (34.6%), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). Entropy ratio of 0.90 indicates a fairly balanced spread across the five categories with no extreme concentration.
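The entropy figures can be re-derived from the five counts. This sketch assumes saturn reports Shannon entropy in bits and normalizes by log2 of the cardinality; that assumption is consistent with the reported values of 2.086 and 0.898.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Counts from the county_name top-values list.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}
entropy = entropy_bits(list(counts.values()))
entropy_ratio = entropy / math.log2(len(counts))  # 1.0 would be a perfectly even split
print(round(entropy, 3), round(entropy_ratio, 3))
```

A ratio near 0.9 means the split is fairly even overall, even though Staten Island holds only about 5% of tracts.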

moderate_burden high anthropic:claude-opus-4-7

A non-negative integer count named 'moderate_burden', spanning 0 to 1732 with a median of 159 and mean of 216 across 2327 rows, no nulls. The distribution is right-skewed (skew 1.93, kurtosis 6.05) with 86 outliers (3.7%) and 6.4% zeros. Its mean equals the sum of the means of the three 30–49.9% bracket columns (83.05 + 58.35 + 74.68 ≈ 216.08), so it appears to be the count of households paying 30% to 49.9% of income on rent.

severe_burden high anthropic:claude-opus-4-7

Numeric count-like column 'severe_burden' with 2327 rows, no nulls, and 706 unique integer values ranging from 0 to 1918 (median 184, mean 253.18). The distribution is right-skewed (skew 1.60, kurtosis 3.44) with 6.27% zeros and 87 outliers (3.74%) above the upper whisker. The wide IQR (278) and std (236.60) relative to the median suggest substantial dispersion across units. Every summary statistic matches rent_50_pct_or_more exactly, so the two columns appear to be duplicates.

pct_moderate_burden high anthropic:claude-opus-4-7

This is a percentage feature, most likely the share of renter households under moderate rent burden, ranging 0–100 with mean 22.74 and median 21.8. The distribution is right-skewed (skew 1.51, kurtosis 6.70) with 59 outliers (2.65% of non-null rows) and a 4.38% null rate. That null rate matches the share of rows with zero total_renter_households, which suggests the percentage is left null when the denominator is zero. About 2.1% of rows are exact zeros and the IQR is tight at 12.3, so the upper tail past q3=28.2 stretches all the way to 100.

pct_severe_burden high anthropic:claude-opus-4-7

A percentage metric (0–100 range), most likely the share of renter households under severe rent burden, with a mean of 27.12 and median of 26.2 and a mildly right-skewed distribution (skew 0.57). Spread is moderate (std 12.68, IQR 15.9) and only 1.35% of non-null rows are flagged as outliers, though a max of 100.0 alongside a 1.98% zero rate points to a few extreme records worth inspecting. Note the 4.38% null rate, which will need handling.

rent_burdened medium anthropic:claude-opus-4-7

Likely a count of rent-burdened households per record rather than a dollar amount, ranging from 0 to 3153 with a median of 358 and mean of 469.26. The mean equals the sum of the moderate_burden and severe_burden means (216.08 + 253.18 ≈ 469.26), which supports reading it as the total of those two counts. The distribution is right-skewed (skew 1.49, kurtosis 3.00) with 82 outliers (3.5%) and 4.7% exact zeros, so a long tail dominates the upper end.
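The relationship between the bracket counts and the burden columns can be checked against the reported means and, on the real data, row by row. The sketch below assumes moderate_burden is the sum of the three 30–49.9% brackets and severe_burden equals rent_50_pct_or_more. That is an inference from the matching statistics, not something the report documents, and the example row is hypothetical.

```python
import math

# Means copied from the stat blocks in this report.
means = {
    "rent_30_to_34_9_pct": 83.050,
    "rent_35_to_39_9_pct": 58.351,
    "rent_40_to_49_9_pct": 74.676,
    "rent_50_pct_or_more": 253.181,
    "moderate_burden": 216.076,
    "severe_burden": 253.181,
    "rent_burdened": 469.258,
}

bracket_sum = (
    means["rent_30_to_34_9_pct"]
    + means["rent_35_to_39_9_pct"]
    + means["rent_40_to_49_9_pct"]
)
# Tolerance allows for the three-decimal rounding of the reported means.
moderate_ok = math.isclose(bracket_sum, means["moderate_burden"], abs_tol=0.002)
total_ok = math.isclose(
    means["moderate_burden"] + means["severe_burden"],
    means["rent_burdened"],
    abs_tol=0.002,
)

def row_is_consistent(row):
    """Row-level version of the same check (assumed derivation)."""
    moderate = (
        row["rent_30_to_34_9_pct"]
        + row["rent_35_to_39_9_pct"]
        + row["rent_40_to_49_9_pct"]
    )
    severe = row["rent_50_pct_or_more"]
    return (
        row["moderate_burden"] == moderate
        and row["severe_burden"] == severe
        and row["rent_burdened"] == moderate + severe
    )

# Hypothetical row built from the column medians, for illustration only.
example = {
    "rent_30_to_34_9_pct": 51, "rent_35_to_39_9_pct": 35, "rent_40_to_49_9_pct": 49,
    "rent_50_pct_or_more": 184, "moderate_burden": 135, "severe_burden": 184,
    "rent_burdened": 319,
}
print(moderate_ok, total_ok, row_is_consistent(example))
```

If the row-level check holds on the full file, the three derived columns are redundant with the bracket counts.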

pct_rent_burdened high anthropic:claude-opus-4-7

This is a numeric percentage indicating the share of rent-burdened households per record, ranging from 0 to 100 with a mean of 49.87 and median of 50.0. The distribution is nearly symmetric (skew -0.04) and reasonably tight around the middle (IQR 17.9, std 14.6), with 4.38% nulls and only 0.36% zeros. 62 outliers (2.79%) sit beyond the whiskers, but no severe tail or drift is evident.
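The 4.38% null rate matches the zero rate of total_renter_households, which is what a division guarded against a zero denominator would produce. The sketch below assumes that derivation; the function name and the one-decimal rounding are illustrative.

```python
def pct(numerator, denominator, ndigits=1):
    """Percentage rounded to ndigits, or None when the denominator is zero or missing."""
    if not denominator:
        return None
    return round(100.0 * numerator / denominator, ndigits)

# Column medians used as an illustrative tract: 358 burdened of 726 renter households.
typical = pct(358, 726)
empty = pct(0, 0)  # a tract with no renter households
# Rows expected to be null if 4.38% of 2,327 tracts have no renter households.
expected_nulls = round(0.0438 * 2327)
print(typical, empty, expected_nulls)  # 49.3 None 102
```

The expected count of 102 agrees with the null count reported for this column.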

median_gross_rent high anthropic:claude-opus-4-7

This is a numeric feature for median gross rent, with 2327 non-null values and 1232 unique levels. The middle of the distribution looks plausible (median 1735, q1 1441.5, q3 2049, max 3501), but the minimum is -666666666 and the mean is -41539608.8 with std 161182638.7, indicating sentinel values masquerading as numbers and producing severe negative skew (-3.62). The mean implies roughly 145 sentinel rows (2327 × 41539608.8 / 666666666 ≈ 145), so the 289 flagged outliers (12.4%) probably include genuinely low- and high-rent tracts as well.

median_household_income high anthropic:claude-opus-4-7

Median household income in dollars per record, fully populated across 2,327 rows with 2,106 unique values and a sensible median of 76,833 and IQR of 49,117. The mean of -36,017,397 and minimum of -666,666,666 come from sentinel-coded missing values masquerading as numbers, which drag skew to -3.94 and kurtosis to 13.53. The mean implies about 126 sentinel rows (2,327 × 36,017,397 / 666,666,666 ≈ 126), so the 208 rows (8.9%) flagged as outliers probably include genuinely high-income tracts as well as the sentinel contamination.
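The number of sentinel rows can be estimated from the reported mean alone. If k of n rows hold the sentinel s and the rest average v, then n × mean = k × s + (n − k) × v, so k = n × (mean − v) / (s − v). The sketch below uses the reported median as a stand-in for v, which is an assumption; the result is insensitive to it because s is so large.

```python
def estimate_sentinel_rows(n_rows, reported_mean, sentinel, typical_valid):
    """Solve n*mean = k*sentinel + (n - k)*typical_valid for k."""
    return n_rows * (reported_mean - typical_valid) / (sentinel - typical_valid)

N_ROWS = 2327
SENTINEL = -666_666_666

# Reported mean and median for each affected column.
rent_k = estimate_sentinel_rows(N_ROWS, -41_539_609, SENTINEL, 1_735)
income_k = estimate_sentinel_rows(N_ROWS, -36_017_397, SENTINEL, 76_833)
print(round(rent_k), round(income_k))  # 145 126
```

Both estimates are well below the flagged outlier counts (289 and 208), so the outlier flags cover more than the sentinel rows.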

total_households high anthropic:claude-opus-4-7

Counts of households per record, ranging from 0 to 8209 with a median of 1252 and mean of 1410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 outliers (3.0%) on the high end, and 4.1% of rows are zero, which may indicate unpopulated or placeholder areas.

owner_occupied medium anthropic:claude-opus-4-7

Despite the boolean-sounding name 'owner_occupied', this is a numeric count column with 1001 unique values ranging from 0 to 3052 and a mean of 464.6 — likely a tally of owner-occupied units per record (e.g., per tract or block). The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 7.2% zeros. No nulls are present.

renter_occupied high anthropic:claude-opus-4-7

Counts of renter-occupied units per record, ranging from 0 to 8209 with a median of 726 and mean of 946. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 69 outliers (2.97%) and 4.38% zeros, consistent with area-level housing tallies rather than a per-household flag.

pct_owner_occupied high anthropic:claude-opus-4-7

Percentage of owner-occupied housing per record, spanning the full 0–100 scale with a mean of 37.5 and median of 34.4. The distribution is wide (std 25.7, IQR 39.7) and slightly right-skewed (0.39) with negative kurtosis (-0.85), indicating a flat, near-uniform spread rather than a tight central mass. About 3.2% of rows are exactly zero and 4.1% are null, but no statistical outliers were flagged.

pct_renter_occupied high anthropic:claude-opus-4-7

Numeric percentage of renter-occupied units, spanning the full 0–100 range with mean 62.5 and median 65.6, so the typical tract is renter-majority. Spread is wide (std 25.7, IQR 39.7) and the distribution is mildly left-skewed (-0.39) and flat (kurtosis -0.85), so no outliers were flagged. About 4.1% of rows are null and only 0.27% are exact zeros, with 823 distinct values across 2,327 rows. The column is the complement of pct_owner_occupied: the two means sum to 100, and the std, IQR, and null count are identical.

Numeric correlation

total_renter_households numeric

rows: 2,327
null: 0 (0.0%)
unique: 1,418
min: 0.000
max: 8,209
mean: 946.145
median: 726.000
std: 815.372
q1: 346.000
q3: 1,357
iqr: 1,011
skew: 1.595
kurtosis: 4.627
n_outliers: 69
outlier_rate: 0.030
zero_rate: 0.044

rent_30_to_34_9_pct numeric

skew=+2.76; 5.3% rows beyond 1.5 IQR
rows: 2,327
null: 0 (0.0%)
unique: 355
min: 0.000
max: 1,205
mean: 83.050
median: 51.000
std: 100.320
q1: 15.000
q3: 116.000
iqr: 101.000
skew: 2.755
kurtosis: 13.860
n_outliers: 124
outlier_rate: 0.053
zero_rate: 0.162

rent_35_to_39_9_pct numeric

skew=+2.40
rows: 2,327
null: 0 (0.0%)
unique: 270
min: 0.000
max: 633.000
mean: 58.351
median: 35.000
std: 69.848
q1: 10.000
q3: 83.000
iqr: 73.000
skew: 2.395
kurtosis: 9.275
n_outliers: 110
outlier_rate: 0.047
zero_rate: 0.196

rent_40_to_49_9_pct numeric

skew=+2.14
rows: 2,327
null: 0 (0.0%)
unique: 322
min: 0.000
max: 740.000
mean: 74.676
median: 49.000
std: 83.794
q1: 14.000
q3: 106.000
iqr: 92.000
skew: 2.137
kurtosis: 7.139
n_outliers: 111
outlier_rate: 0.048
zero_rate: 0.156

rent_50_pct_or_more numeric

rows: 2,327
null: 0 (0.0%)
unique: 706
min: 0.000
max: 1,918
mean: 253.181
median: 184.000
std: 236.597
q1: 82.000
q3: 360.000
iqr: 278.000
skew: 1.603
kurtosis: 3.435
n_outliers: 87
outlier_rate: 0.037
zero_rate: 0.063

NAME text

100.0% of rows are unique strings
rows: 2,327
null: 0 (0.0%)
unique: 2,327
len_min: 38
len_max: 46
len_mean: 41.649
len_median: 41.000
len_p95: 46.000
word_mean: 7.133
word_median: 7.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,539
readability_flesch_mean: 91.451
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Census Tract 4; Bronx County; New York
  2. Census Tract 399.01; Queens County; New York
  3. Census Tract 779.08; Queens County; New York
  4. Census Tract 613.02; Queens County; New York
  5. Census Tract 780; Kings County; New York
  6. Census Tract 156.02; Richmond County; New York
  7. Census Tract 848; Kings County; New York
  8. Census Tract 1008.04; Queens County; New York
  9. Census Tract 618; Queens County; New York
  10. Census Tract 145; Bronx County; New York

state numeric

only one distinct value
rows: 2,327
null: 0 (0.0%)
unique: 1
min: 36.000
max: 36.000
mean: 36.000
median: 36.000
std: 0.000
q1: 36.000
q3: 36.000
iqr: 0.000
skew: 0.000
kurtosis: 0.000
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county numeric

rows: 2,327
null: 0 (0.0%)
unique: 5
min: 5.000
max: 85.000
mean: 55.000
median: 47.000
std: 25.969
q1: 47.000
q3: 81.000
iqr: 34.000
skew: -0.720
kurtosis: -0.453
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

tract numeric

skew=+10.14
rows: 2,327
null: 0 (0.0%)
unique: 1,530
min: 100.000
max: 990,100
mean: 42,252
median: 30,100
std: 48,265
q1: 15,200
q3: 57,900
iqr: 42,700
skew: 10.143
kurtosis: 189.824
n_outliers: 63
outlier_rate: 0.027
zero_rate: 0.000

county_name categorical

rows: 2,327
null: 0 (0.0%)
unique: 5
top_value: Brooklyn (Kings)
top_rate: 0.346
cardinality: 5
entropy: 2.086
entropy_ratio: 0.898
Top values (rank 1–20)
  1. Brooklyn (Kings) — 805
  2. Queens — 725
  3. Bronx — 361
  4. Manhattan (New York) — 310
  5. Staten Island (Richmond) — 126

moderate_burden numeric

rows: 2,327
null: 0 (0.0%)
unique: 639
min: 0.000
max: 1,732
mean: 216.076
median: 159.000
std: 210.384
q1: 64.000
q3: 311.000
iqr: 247.000
skew: 1.934
kurtosis: 6.052
n_outliers: 86
outlier_rate: 0.037
zero_rate: 0.064

severe_burden numeric

rows: 2,327
null: 0 (0.0%)
unique: 706
min: 0.000
max: 1,918
mean: 253.181
median: 184.000
std: 236.597
q1: 82.000
q3: 360.000
iqr: 278.000
skew: 1.603
kurtosis: 3.435
n_outliers: 87
outlier_rate: 0.037
zero_rate: 0.063

pct_moderate_burden numeric

rows: 2,327
null: 102 (4.4%)
unique: 461
min: 0.000
max: 100.000
mean: 22.744
median: 21.800
std: 11.359
q1: 15.900
q3: 28.200
iqr: 12.300
skew: 1.509
kurtosis: 6.704
n_outliers: 59
outlier_rate: 0.027
zero_rate: 0.021

pct_severe_burden numeric

rows: 2,327
null: 102 (4.4%)
unique: 518
min: 0.000
max: 100.000
mean: 27.124
median: 26.200
std: 12.677
q1: 18.700
q3: 34.600
iqr: 15.900
skew: 0.566
kurtosis: 1.222
n_outliers: 30
outlier_rate: 0.013
zero_rate: 0.020

rent_burdened numeric

rows: 2,327
null: 0 (0.0%)
unique: 1,013
min: 0.000
max: 3,153
mean: 469.258
median: 358.000
std: 415.279
q1: 164.500
q3: 670.000
iqr: 505.500
skew: 1.494
kurtosis: 3.005
n_outliers: 82
outlier_rate: 0.035
zero_rate: 0.047

pct_rent_burdened numeric

rows: 2,327
null: 102 (4.4%)
unique: 596
min: 0.000
max: 100.000
mean: 49.867
median: 50.000
std: 14.615
q1: 40.900
q3: 58.800
iqr: 17.900
skew: -0.038
kurtosis: 0.785
n_outliers: 62
outlier_rate: 0.028
zero_rate: 3.60e-03

median_gross_rent numeric

skew=-3.62; 12.4% rows beyond 1.5 IQR
rows: 2,327
null: 0 (0.0%)
unique: 1,232
min: -666,666,666
max: 3,501
mean: -41,539,609
median: 1,735
std: 161,182,639
q1: 1,442
q3: 2,049
iqr: 607.500
skew: -3.621
kurtosis: 11.115
n_outliers: 289
outlier_rate: 0.124
zero_rate: 0.000

median_household_income numeric

skew=-3.94; 8.9% rows beyond 1.5 IQR
rows: 2,327
null: 0 (0.0%)
unique: 2,106
min: -666,666,666
max: 250,001
mean: -36,017,397
median: 76,833
std: 150,923,372
q1: 53,242
q3: 102,360
iqr: 49,117
skew: -3.940
kurtosis: 13.525
n_outliers: 208
outlier_rate: 0.089
zero_rate: 0.000

total_households numeric

rows: 2,327
null: 0 (0.0%)
unique: 1,495
min: 0.000
max: 8,209
mean: 1,411
median: 1,252
std: 923.255
q1: 773.500
q3: 1,850
iqr: 1,076
skew: 1.479
kurtosis: 4.377
n_outliers: 70
outlier_rate: 0.030
zero_rate: 0.041

owner_occupied numeric

6.1% rows beyond 1.5 IQR
rows: 2,327
null: 0 (0.0%)
unique: 1,001
min: 0.000
max: 3,052
mean: 464.600
median: 371.000
std: 422.558
q1: 177.000
q3: 608.000
iqr: 431.000
skew: 1.761
kurtosis: 4.254
n_outliers: 143
outlier_rate: 0.061
zero_rate: 0.072

renter_occupied numeric

rows: 2,327
null: 0 (0.0%)
unique: 1,418
min: 0.000
max: 8,209
mean: 946.145
median: 726.000
std: 815.372
q1: 346.000
q3: 1,357
iqr: 1,011
skew: 1.595
kurtosis: 4.627
n_outliers: 69
outlier_rate: 0.030
zero_rate: 0.044

pct_owner_occupied numeric

rows: 2,327
null: 96 (4.1%)
unique: 823
min: 0.000
max: 100.000
mean: 37.513
median: 34.400
std: 25.651
q1: 16.400
q3: 56.100
iqr: 39.700
skew: 0.395
kurtosis: -0.854
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.032

pct_renter_occupied numeric

rows: 2,327
null: 96 (4.1%)
unique: 823
min: 0.000
max: 100.000
mean: 62.487
median: 65.600
std: 25.651
q1: 43.900
q3: 83.600
iqr: 39.700
skew: -0.395
kurtosis: -0.854
n_outliers: 0
outlier_rate: 0.000
zero_rate: 2.69e-03