
NYC housing · NYC median income by tract

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv · 2,327 rows · 6 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 2,327 New York City census tracts with median household income, geographic identifiers (state, county, tract), and tract names. The headline issue is median_household_income: its minimum is -666,666,666 and its mean is about -36 million, which points to a sentinel missing-value code that must be filtered out before any analysis. Until it is, the median of $76,833 is the more trustworthy central value. County coverage is uneven: Brooklyn (Kings) holds 805 tracts (34.6%) while Staten Island (Richmond) has only 126 (5.4%), so per-borough comparisons should be normalized by tract count (shares or per-borough medians) rather than made on raw totals. The state column is constant (36, the FIPS code for New York) and can be dropped.

citing: row_count · column_count · median_household_income.stats.min · median_household_income.stats.mean · median_household_income.stats.median · median_household_income.stats.skew · median_household_income.alerts · county_name.top_values · county_name.stats.top_rate · state.alerts · tract.stats.skew
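A minimal sketch of a normalized per-borough comparison, using only the standard library. It assumes the CSV headers match the profiled column names (`county_name`, `median_household_income`) and that -666666666 is the only sentinel; the function name `borough_summary` and the toy rows are hypothetical, for illustration only.

```python
import statistics
from collections import Counter

SENTINEL = -666_666_666  # sentinel code reported as the income column's minimum


def borough_summary(rows):
    """Per-borough tract count, share of all tracts, and median income.

    `rows` is any iterable of dicts keyed by column name (for example a
    csv.DictReader). Rows carrying the sentinel still count as tracts but
    are excluded from the income median.
    """
    rows = list(rows)
    counts = Counter(r["county_name"] for r in rows)
    total = sum(counts.values())
    incomes = {borough: [] for borough in counts}
    for r in rows:
        value = float(r["median_household_income"])
        if value != SENTINEL:
            incomes[r["county_name"]].append(value)
    return {
        borough: {
            "tracts": n,
            "share": n / total,
            "median_income": statistics.median(incomes[borough]) if incomes[borough] else None,
        }
        for borough, n in counts.items()
    }


# Toy rows (hypothetical values, not taken from the dataset).
toy = [
    {"county_name": "Bronx", "median_household_income": "40000"},
    {"county_name": "Bronx", "median_household_income": "-666666666"},
    {"county_name": "Bronx", "median_household_income": "50000"},
    {"county_name": "Brooklyn (Kings)", "median_household_income": "80000"},
]
summary = borough_summary(toy)
print(summary["Bronx"])  # 3 tracts, share 0.75, median of the two valid incomes
```

On the real file, the same function can be fed directly from `csv.DictReader` opened on the source path; comparing `share` and `median_income` rather than raw sums keeps Staten Island's 126 tracts comparable with Brooklyn's 805.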

Schema

6 columns
Per-column summary.
column                   type         nulls  unique  alerts
median_household_income  numeric      0.0%   2,106   high_skew, outliers
NAME                     text         0.0%   2,327   near_unique
state                    numeric      0.0%   1       constant
county                   numeric      0.0%   5
tract                    numeric      0.0%   1,530   high_skew
county_name              categorical  0.0%   5
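The nulls and unique figures in this table can be approximated with a few lines of standard-library Python. This is a sketch, not the profiler's own code: it assumes nulls are empty CSV fields, and the helper name `schema_summary` and the toy rows are hypothetical. A numeric sentinel such as -666666666 is a non-empty value under this rule, which is why median_household_income can show 0.0% nulls.

```python
def schema_summary(rows, columns):
    """Null rate and distinct non-null count per column, plus a 'constant' alert."""
    n = len(rows)
    summary = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v not in ("", None)]
        distinct = len(set(present))
        summary[col] = {
            "null_rate": (n - len(present)) / n if n else 0.0,
            "unique": distinct,
            "alerts": ["constant"] if distinct == 1 else [],
        }
    return summary


# Toy rows (hypothetical, shaped like the CSV).
toy = [
    {"state": "36", "tract": "100", "median_household_income": "-666666666"},
    {"state": "36", "tract": "100", "median_household_income": "50000"},
    {"state": "36", "tract": "200", "median_household_income": ""},
]
result = schema_summary(toy, ["state", "tract", "median_household_income"])
print(result["state"])  # unique 1, flagged constant
```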

median_household_income

numeric feature high_skew outliers
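Likely U.S. median household income in dollars, with a median of 76,833 and an IQR spanning 53,242.5 to 102,359.5. The minimum of -666,666,666 is a sentinel missing-value code; because it is stored as a number, the profile counts 0 nulls, yet it drags the mean down to -36,017,397.46 and inflates the standard deviation to 150,923,371.88. In addition, 208 rows (8.94%) are flagged as outliers. The negative skew (-3.94) and the kurtosis of 13.53 are driven by that sentinel rather than by the shape of real incomes. Treatment: replace -666666666 with NaN before computing any statistic. The maximum of 250,001 is the top-code, so capping there changes nothing; flag those rows as top-coded (right-censored at $250,000 or more) before modelling instead.
high · anthropic:claude-opus-4-7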
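A minimal sketch of the sentinel treatment in standard-library Python. It assumes -666666666 is the only sentinel and that 250001 marks the top-code; the helper names and toy values are hypothetical. Because 250001 is already the column maximum, the sketch flags top-coded rows rather than capping them.

```python
import math
import statistics

SENTINEL = -666_666_666  # column minimum, treated as "missing"
TOP_CODE = 250_001       # column maximum, treated as "250,000 or more"


def clean_income(values):
    """Map the sentinel to NaN and flag top-coded rows.

    Returns (cleaned, top_coded): floats with NaN for missing values, and a
    parallel list of booleans marking top-coded rows.
    """
    cleaned = [math.nan if v == SENTINEL else float(v) for v in values]
    top_coded = [v == TOP_CODE for v in values]
    return cleaned, top_coded


def nan_median(values):
    """Median of the non-NaN entries."""
    return statistics.median([v for v in values if not math.isnan(v)])


raw = [50_000, SENTINEL, 90_000, TOP_CODE]  # toy values, not from the dataset
cleaned, top_coded = clean_income(raw)
print(statistics.mean(raw))  # hugely negative: one sentinel dominates the mean
print(nan_median(cleaned))   # 90000.0
print(top_coded)             # [False, False, False, True]
```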
n: 2,327 · nulls: 0 (0.0%) · unique: 2,106
min: -666,666,666 · max: 250,001
mean: -36,017,397.46 · median: 76,833 · std: 150,923,371.88
q1: 53,242.5 · q3: 102,359.5 · iqr: 49,117
skew: -3.94 · kurtosis: 13.53
n_outliers: 208 · outlier_rate: 0.08939 · zero_rate: 0
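The report does not say which rule produced n_outliers, so what follows is a hedged consistency check that assumes Tukey's 1.5 × IQR fences. Under that assumption, the fences computed from the reported quartiles place every sentinel row far below the lower fence, and any other outliers would be incomes above the upper fence. The same rule applied to the quartiles reported for county gives fences outside that column's reported min (5) and max (85), which agrees with its n_outliers of 0. The helper names are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper Tukey fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Number of values strictly outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < low or v > high)


# Quartiles as reported for median_household_income.
income_low, income_high = tukey_fences(53_242.5, 102_359.5)
print(income_low, income_high)  # -20433.0 176035.0

# Quartiles as reported for county; its min and max both sit inside the fences.
county_low, county_high = tukey_fences(47, 81)
print(county_low, county_high)  # -4.0 132.0
print(count_outliers([5, 85], 47, 81))  # 0
```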

NAME

text identifier near_unique
This column holds fully qualified Census tract names for New York City; all 2,327 values are unique and non-null. Lengths cluster tightly between 38 and 46 characters, and every record contains the tokens 'new', 'york', 'census', 'tract', and 'county;', confirming a rigid template. Apart from the tract number, the only variation is the county token (Kings 805, Queens 725, Bronx 361, New York 310, Richmond 126), which repeats the information already in county_name. The column is effectively a row identifier, not a feature. Treatment: drop from modelling; parsing out the county token would only duplicate county_name.
high · anthropic:claude-opus-4-7
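If the county or tract number ever needs to be recovered from NAME, a regular expression is enough. The template used here, `Census Tract <number>; <County> County; New York`, is an assumption inferred from the tokens listed above rather than confirmed against the raw file, and the example strings are illustrative; `parse_name` is a hypothetical helper.

```python
import re

# Assumed template: "Census Tract <number>; <County> County; New York"
NAME_RE = re.compile(
    r"^Census Tract (?P<tract>\d+(?:\.\d+)?); (?P<county>.+) County; New York$"
)


def parse_name(name):
    """Split a tract NAME into (tract_number, county); raise if the template does not match."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected NAME format: {name!r}")
    return match.group("tract"), match.group("county")


# Illustrative strings following the assumed template.
print(parse_name("Census Tract 1; Bronx County; New York"))        # ('1', 'Bronx')
print(parse_name("Census Tract 1.01; New York County; New York"))  # ('1.01', 'New York')
```

Raising on a mismatch, rather than returning None, surfaces any record that breaks the assumed template instead of silently dropping it.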
n: 2,327 · nulls: 0 (0.0%) · unique: 2,327
len_min: 38 · len_max: 46 · len_mean: 41.65 · len_median: 41 · len_p95: 46
word_mean: 7.133 · word_median: 7
n_empty: 0 · n_duplicates: 0 · duplicate_rate: 0
vocab_size: 1,539 · readability_flesch_mean: 91.45
emoji_rate: 0 · url_rate: 0 · one_word_rate: 0 · allcaps_rate: 0 · boilerplate_rate: 0
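Assuming the names follow the template `Census Tract <number>; <County> County; New York` (inferred from the tokens above, not confirmed), several NAME statistics can be reproduced from counts reported elsewhere in this profile. Every name then has 7 whitespace-separated words except Manhattan's, whose county token `New York County` adds an eighth; the shortest possible name is 38 characters; and the vocabulary is the 1,530 distinct tract-number tokens plus 9 fixed words. This is a back-of-the-envelope check, not the profiler's code, and the tokenization (lowercased, split on whitespace, punctuation kept) is an assumption.

```python
N_ROWS = 2_327         # n, as reported
N_MANHATTAN = 310      # Manhattan tract count, from county_name
N_TRACT_CODES = 1_530  # distinct tract codes, from tract

# Average word count: 7 words per name, 8 for Manhattan ("New York County").
word_mean = (7 * (N_ROWS - N_MANHATTAN) + 8 * N_MANHATTAN) / N_ROWS
print(round(word_mean, 3))  # 7.133

# Shortest name under the template: one-digit tract, five-letter county.
shortest = "Census Tract 1; Bronx County; New York"
print(len(shortest))  # 38

# Vocabulary: one token per distinct tract number plus the fixed words.
fixed_words = {"census", "tract", "county;", "new", "york",
               "bronx", "kings", "queens", "richmond"}
print(N_TRACT_CODES + len(fixed_words))  # 1539
```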

state

numeric metadata constant
The column 'state' is numeric but holds the single value 36 in all 2,327 rows, with zero variance and no nulls. 36 is the FIPS code for New York State, as expected for a single-state extract, so the column carries no information for modelling. Treatment: drop; constant column with no predictive signal.
high · anthropic:claude-opus-4-7
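A minimal sketch of dropping constant columns from rows read with `csv.DictReader`; the helper name `drop_constant_columns` and the toy rows are hypothetical. One design note: if you plan to build the 11-digit tract GEOID, do it before dropping `state`, because the state FIPS code (36) supplies its first two digits.

```python
def drop_constant_columns(rows):
    """Return (rows_without_constants, dropped_column_names).

    A column is constant when every row holds the same value for it.
    """
    if not rows:
        return rows, []
    constant = [col for col in rows[0] if len({r[col] for r in rows}) == 1]
    kept = [{k: v for k, v in r.items() if k not in constant} for r in rows]
    return kept, constant


toy = [  # hypothetical rows shaped like the CSV
    {"state": "36", "county": "5", "tract": "100"},
    {"state": "36", "county": "47", "tract": "200"},
]
kept, dropped = drop_constant_columns(toy)
print(dropped)  # ['state']
```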
n: 2,327 · nulls: 0 (0.0%) · unique: 1
min: 36 · max: 36 · mean: 36 · median: 36 · std: 0
q1: 36 · q3: 36 · iqr: 0
skew: 0 · kurtosis: 0
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0

county

numeric feature
Despite being typed as numeric, `county` has only 5 unique values across 2,327 rows (5, 47, 61, 81, 85) and no nulls. These are the county FIPS codes of the five boroughs (5 Bronx, 47 Kings/Brooklyn, 61 New York/Manhattan, 81 Queens, 85 Richmond/Staten Island), not a measured quantity. The distribution is left-skewed (skew -0.72), with Q1 and the median both at 47 because Brooklyn alone accounts for 805 rows (34.6%). Treating the mean (55.0) or std (25.97) as meaningful would be misleading given the categorical nature of the codes. Treatment: cast to categorical; since it carries the same information as county_name, encode only one of the two before modelling.
high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 5
min: 5 · max: 85 · mean: 55 · median: 47 · std: 25.97
q1: 47 · q3: 81 · iqr: 34
skew: -0.72 · kurtosis: -0.4531
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
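The quartile pattern can be checked directly. The sketch below assumes the standard county FIPS codes of the five boroughs (Bronx 5, Kings 47, New York 61, Queens 81, Richmond 85) and pairs each with the tract count reported for that borough in the county_name section. Expanding those counts back into a column reproduces the reported mean, median, quartiles, and (sample) standard deviation. The pairing is an inference, not something the profile states.

```python
import statistics

# County FIPS code -> tract count (borough counts from county_name).
TRACTS_PER_CODE = {
    5: 361,   # Bronx
    47: 805,  # Kings (Brooklyn)
    61: 310,  # New York (Manhattan)
    81: 725,  # Queens
    85: 126,  # Richmond (Staten Island)
}

codes = [code for code, n in TRACTS_PER_CODE.items() for _ in range(n)]

print(len(codes))                         # 2327
print(sum(codes) / len(codes))            # 55.0
print(statistics.median(codes))           # 47
print(statistics.quantiles(codes, n=4))   # [47.0, 47.0, 81.0]
print(round(statistics.stdev(codes), 2))  # 25.97 (sample standard deviation)
```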

tract

numeric identifier high_skew
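Almost certainly U.S. Census tract codes stored as integers: 6-digit codes with an implied two-digit decimal suffix (100 is tract 1, 990100 is tract 9901), with 1,530 distinct values across 2,327 rows and no nulls. Because there are fewer distinct codes than rows while every NAME is unique, the same tract code recurs in different counties, so tract alone is not a unique key. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82), with a max of 990,100 far above the q3 of 57,900.5 and the median of 30,100, producing 63 outliers (2.7%); this is an artifact of tract numbering conventions, not a true numeric magnitude. Treatment: treat as a categorical geographic key (zero-pad to 6 digits and combine with the state and county FIPS codes); do not use as a numeric feature.
high · anthropic:claude-opus-4-7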
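A minimal sketch of that treatment: zero-pad the three codes into the standard 11-digit tract GEOID (2-digit state, 3-digit county, 6-digit tract) and, if needed, render the tract code in its human-readable form, where the last two digits are an implied decimal suffix. The helper names are hypothetical, and the example inputs are illustrative rather than checked rows.

```python
def tract_geoid(state, county, tract):
    """11-digit GEOID: 2-digit state + 3-digit county + 6-digit tract, zero-padded."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"


def tract_label(tract):
    """Readable tract number: the last two digits of the code are a decimal suffix."""
    whole, suffix = divmod(int(tract), 100)
    return str(whole) if suffix == 0 else f"{whole}.{suffix:02d}"


print(tract_geoid(36, 47, 100))  # 36047000100
print(tract_label(100))          # 1
print(tract_label(990100))       # 9901
print(tract_label(10101))        # 101.01
```

Keeping the GEOID as a string preserves the leading zeros; stored as an integer it would lose them and stop matching other Census tables.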
n: 2,327 · nulls: 0 (0.0%) · unique: 1,530
min: 100 · max: 990,100 · mean: ≈42,250 · median: 30,100 · std: ≈48,270
q1: 15,200 · q3: 57,900.5 · iqr: 42,700.5
skew: 10.14 · kurtosis: 189.8
n_outliers: 63 · outlier_rate: 0.02707 · zero_rate: 0

county_name

categorical feature
This column lists the NYC borough/county for each record, with all 5 expected values present across 2,327 rows and no nulls. The distribution roughly tracks borough population: Brooklyn (Kings) leads with 805 tracts (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). An entropy ratio of 0.898 (entropy divided by its maximum, log2 5) indicates the categories are fairly evenly spread rather than dominated by one value. Treatment: one-hot encode as a low-cardinality categorical feature.
high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 5
top_value: Brooklyn (Kings) · top_rate: 0.3459
cardinality: 5 · entropy: 2.086 · entropy_ratio: 0.8985
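The entropy figures can be reproduced from the five tract counts, assuming Shannon entropy in bits and an entropy_ratio normalized by the maximum possible entropy, log2 of the number of categories. Both assumptions are consistent with the reported values but are not stated by the profiler; the borough labels other than "Brooklyn (Kings)" are assumed spellings.

```python
import math

# Tract counts per borough, as reported above.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan": 310,
    "Staten Island": 126,
}


def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution implied by category counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


entropy = shannon_entropy(counts)
entropy_ratio = entropy / math.log2(len(counts))  # 1.0 would mean a perfectly even split
top_rate = max(counts.values()) / sum(counts.values())
# Compare with the reported entropy, entropy_ratio, and top_rate.
print(f"entropy={entropy:.3f} entropy_ratio={entropy_ratio:.4f} top_rate={top_rate:.4f}")
```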