nyc housing nyc median income by tract

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv

Saturn profiled 2,327 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

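# Install the profiler CLI (notebook shell escape).
!pip install saturn-dissect
import subprocess

# Profile the CSV, write the findings to JSON, and request the opt-in LLM prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv",
    "--findings", "nyc_housing-nyc_median_income_by_tract.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise if the CLI exits with a non-zero status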

Summary confidence: high

This dataset contains 2,327 New York City census tracts with median household income, geographic identifiers (state, county, tract), and tract names. The headline issue is median_household_income: its minimum is -666,666,666 and its mean is about -36 million, which indicates sentinel/missing-value codes (126 rows sit in the lowest histogram bin, where the sentinel falls) that must be filtered before any analysis. Until then, the median of $76,833 is the more trustworthy central value. County coverage is uneven: Brooklyn (Kings) holds 34.6% of tracts (805) and Staten Island only 5.4% (126), so per-borough comparisons should be normalized. The state column is constant (36 = New York) and can be dropped.

citing: row_count · column_count · median_household_income.stats.min · median_household_income.stats.mean · median_household_income.stats.median · median_household_income.stats.skew · median_household_income.alerts · county_name.top_values · county_name.stats.top_rate · state.alerts · tract.stats.skew
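
One way to act on the per-borough caveat is to report the number of valid tracts next to each borough's median, after dropping the sentinel. The sketch below uses only the standard library; the `rows` list is made up for illustration and does not come from the profiled file.

```python
import statistics
from collections import defaultdict

SENTINEL = -666_666_666  # missing-value code reported as the column minimum

# Illustrative (county_name, median_household_income) pairs, not real rows.
rows = [
    ("Brooklyn (Kings)", 61_000),
    ("Brooklyn (Kings)", SENTINEL),
    ("Brooklyn (Kings)", 95_000),
    ("Staten Island (Richmond)", 88_000),
    ("Staten Island (Richmond)", 104_000),
]

by_borough = defaultdict(list)
for borough, income in rows:
    if income != SENTINEL:  # drop missing-value codes before aggregating
        by_borough[borough].append(income)

# Keep the valid tract count beside each median so unequal sample sizes stay visible.
summary = {
    borough: (len(values), statistics.median(values))
    for borough, values in by_borough.items()
}
print(summary)
```

Carrying the count alongside the median makes it harder to read a 126-tract borough and an 805-tract borough as equally well supported.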

Out[4]:

saturn.schema() · 6 columns

column	kind	n	null%	unique	alerts
median_household_income	numeric	2,327	0.0%	2,106	high_skew outliers
NAME	text	2,327	0.0%	2,327	near_unique
state	numeric	2,327	0.0%	1	constant
county	numeric	2,327	0.0%	5
tract	numeric	2,327	0.0%	1,530	high_skew
county_name	categorical	2,327	0.0%	5

Fig 1.

median_household_income · Look for an extreme negative tail caused by sentinel codes (e.g. -666666666) that distort the mean — filter these before interpreting the distribution.

Histogram bins for median_household_income (median: 76833.0).
bin	count
-6.667e+08 – -6.5e+08	126
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.166e+08	0
-6.166e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.666e+08	0
-5.666e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.166e+08	0
-5.166e+08 – -4.999e+08	0
-4.999e+08 – -4.833e+08	0
-4.833e+08 – -4.666e+08	0
-4.666e+08 – -4.499e+08	0
-4.499e+08 – -4.332e+08	0
-4.332e+08 – -4.166e+08	0
-4.166e+08 – -3.999e+08	0
-3.999e+08 – -3.832e+08	0
-3.832e+08 – -3.666e+08	0
-3.666e+08 – -3.499e+08	0
-3.499e+08 – -3.332e+08	0
-3.332e+08 – -3.165e+08	0
-3.165e+08 – -2.999e+08	0
-2.999e+08 – -2.832e+08	0
-2.832e+08 – -2.665e+08	0
-2.665e+08 – -2.498e+08	0
-2.498e+08 – -2.332e+08	0
-2.332e+08 – -2.165e+08	0
-2.165e+08 – -1.998e+08	0
-1.998e+08 – -1.832e+08	0
-1.832e+08 – -1.665e+08	0
-1.665e+08 – -1.498e+08	0
-1.498e+08 – -1.331e+08	0
-1.331e+08 – -1.165e+08	0
-1.165e+08 – -9.979e+07	0
-9.979e+07 – -8.311e+07	0
-8.311e+07 – -6.644e+07	0
-6.644e+07 – -4.977e+07	0
-4.977e+07 – -3.31e+07	0
-3.31e+07 – -1.642e+07	0
-1.642e+07 – 2.5e+05	2201

Fig 2.

county_name · Shows tract counts per borough; Brooklyn dominates while Staten Island has the fewest, which matters when aggregating income by area.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

Fig 3.

county_name · Share of census tracts across the five NYC boroughs at a glance.

Fig 4.

tract · Tract numbers are highly skewed (skew ≈ 10) — useful as IDs but confirm they aren't being treated as a quantitative measure.

Histogram bins for tract (median: 30100.0).
bin	count
100 – 2.485e+04	982
2.485e+04 – 4.96e+04	617
4.96e+04 – 7.435e+04	329
7.435e+04 – 9.91e+04	197
9.91e+04 – 1.238e+05	145
1.238e+05 – 1.486e+05	37
1.486e+05 – 1.734e+05	17
1.734e+05 – 1.981e+05	0
1.981e+05 – 2.228e+05	0
2.228e+05 – 2.476e+05	0
2.476e+05 – 2.724e+05	0
2.724e+05 – 2.971e+05	0
2.971e+05 – 3.218e+05	0
3.218e+05 – 3.466e+05	0
3.466e+05 – 3.714e+05	0
3.714e+05 – 3.961e+05	0
3.961e+05 – 4.208e+05	0
4.208e+05 – 4.456e+05	0
4.456e+05 – 4.704e+05	0
4.704e+05 – 4.951e+05	0
4.951e+05 – 5.198e+05	0
5.198e+05 – 5.446e+05	0
5.446e+05 – 5.694e+05	0
5.694e+05 – 5.941e+05	0
5.941e+05 – 6.188e+05	0
6.188e+05 – 6.436e+05	0
6.436e+05 – 6.684e+05	0
6.684e+05 – 6.931e+05	0
6.931e+05 – 7.178e+05	0
7.178e+05 – 7.426e+05	0
7.426e+05 – 7.674e+05	0
7.674e+05 – 7.921e+05	0
7.921e+05 – 8.168e+05	0
8.168e+05 – 8.416e+05	0
8.416e+05 – 8.664e+05	0
8.664e+05 – 8.911e+05	0
8.911e+05 – 9.158e+05	0
9.158e+05 – 9.406e+05	0
9.406e+05 – 9.654e+05	0
9.654e+05 – 9.901e+05	3

Fig 5.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
median_household_income	numeric	0.0%
NAME	text	0.0%
state	numeric	0.0%
county	numeric	0.0%
tract	numeric	0.0%
county_name	categorical	0.0%

Fig 6.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
	median_household_income	state	county	tract
median_household_income	+1.00	+nan	-0.00	-0.01
state	+nan	+nan	+nan	+nan
county	-0.00	+nan	+1.00	+0.18
tract	-0.01	+nan	+0.18	+1.00
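
The `+nan` entries for `state` follow from the definition of Pearson correlation: a constant column has zero variance, so the denominator is zero. A minimal standard-library sketch of that behaviour, with illustrative `tract` values rather than rows from the file:

```python
import math

def pearson(xs, ys):
    """Pearson r; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # 0/0: correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

state = [36, 36, 36, 36]                # constant, as in the profile
tract = [100, 15_200, 30_100, 57_900]   # illustrative values only

print(pearson(state, tract))
print(pearson(tract, tract))
```

Dropping the constant column before computing the matrix removes the undefined row and column.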

median_household_income numeric feature

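Likely U.S. median household income in dollars, with median 76833 and quartiles at 53242.5 and 102359.5 (IQR 49117). The minimum of -666666666 is a sentinel null code that is poisoning the mean (-36017397.46) and standard deviation (150923371.88); 126 rows sit in the lowest histogram bin, where the sentinel falls. In total, 208 rows (8.94%) flag as outliers. Skew of -3.94 and kurtosis of 13.53 are largely artifacts of that sentinel. The quartiles were also computed with the sentinel rows included, so they will shift upward slightly once those rows are removed.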

Treatment: Replace -666666666 with NaN, then optionally flag or cap values at the 250001 maximum, which looks like a top-code, before modelling.
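
A minimal sketch of that treatment, using only the standard library. The `incomes` list is illustrative, not data from the file, and the code assumes that -666666666 is the only sentinel and that 250001 marks a top-coded value.

```python
import math
import statistics

SENTINEL = -666_666_666  # missing-value code reported as the column minimum
TOP_CODE = 250_001       # column maximum; assumed to mark top-coded tracts

# Illustrative values only, not rows from the profiled file.
incomes = [52_000, SENTINEL, 76_833, 250_001, 101_500, SENTINEL]

# Replace the sentinel with NaN so it cannot leak into arithmetic.
cleaned = [math.nan if v == SENTINEL else float(v) for v in incomes]

# Compute summaries on the valid values only.
valid = [v for v in cleaned if not math.isnan(v)]
top_coded = [v == TOP_CODE for v in valid]

print(statistics.mean(incomes))   # dominated by the sentinel
print(statistics.mean(valid))     # plausible dollar amount
print(statistics.median(valid))
print(sum(top_coded), "top-coded value(s)")
```

Flagging top-coded rows rather than silently keeping them leaves the choice between capping, censoring, or excluding them to the modelling step.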

anthropic:claude-opus-4-7 · confidence high

Out[12]:

saturn.columns["median_household_income"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	2,106
min	-6.667e+08
max	250,001
mean	-3.602e+07
median	76,833
std	1.509e+08
q1	5.324e+04
q3	1.024e+05
iqr	49,117
skew	-3.94
kurtosis	13.53
n_outliers	208
outlier_rate	0.08939
zero_rate	0
alert: high_skew	skew=-3.94
alert: outliers	8.9% rows beyond 1.5 IQR
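
The outlier alert can be reproduced from the reported quartiles. The sketch below assumes the alert uses the common Tukey rule (flag anything more than 1.5 × IQR outside the quartiles); the profile does not state the exact rule, so treat this as an approximation.

```python
# Quartiles as reported in the profile for median_household_income.
q1 = 53_242.5
q3 = 102_359.5

iqr = q3 - q1                  # matches the reported iqr of 49,117
lower_fence = q1 - 1.5 * iqr   # values below this are flagged
upper_fence = q3 + 1.5 * iqr   # values above this are flagged

def is_outlier(value: float) -> bool:
    """Tukey-style 1.5 * IQR rule (assumed to be what the alert uses)."""
    return value < lower_fence or value > upper_fence

print(iqr, lower_fence, upper_fence)
print(is_outlier(-666_666_666), is_outlier(250_001), is_outlier(76_833))
```

Under that assumption, the sentinel falls below the lower fence and the 250001 maximum falls above the upper fence, so the 208 flagged rows likely mix missing-value codes with genuinely high-income tracts.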

Fig 7.

Distribution of median_household_income. Vertical dash marks the median.

Histogram bins for median_household_income (median: 76833.0).
bin	count
-6.667e+08 – -6.5e+08	126
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.166e+08	0
-6.166e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.666e+08	0
-5.666e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.166e+08	0
-5.166e+08 – -4.999e+08	0
-4.999e+08 – -4.833e+08	0
-4.833e+08 – -4.666e+08	0
-4.666e+08 – -4.499e+08	0
-4.499e+08 – -4.332e+08	0
-4.332e+08 – -4.166e+08	0
-4.166e+08 – -3.999e+08	0
-3.999e+08 – -3.832e+08	0
-3.832e+08 – -3.666e+08	0
-3.666e+08 – -3.499e+08	0
-3.499e+08 – -3.332e+08	0
-3.332e+08 – -3.165e+08	0
-3.165e+08 – -2.999e+08	0
-2.999e+08 – -2.832e+08	0
-2.832e+08 – -2.665e+08	0
-2.665e+08 – -2.498e+08	0
-2.498e+08 – -2.332e+08	0
-2.332e+08 – -2.165e+08	0
-2.165e+08 – -1.998e+08	0
-1.998e+08 – -1.832e+08	0
-1.832e+08 – -1.665e+08	0
-1.665e+08 – -1.498e+08	0
-1.498e+08 – -1.331e+08	0
-1.331e+08 – -1.165e+08	0
-1.165e+08 – -9.979e+07	0
-9.979e+07 – -8.311e+07	0
-8.311e+07 – -6.644e+07	0
-6.644e+07 – -4.977e+07	0
-4.977e+07 – -3.31e+07	0
-3.31e+07 – -1.642e+07	0
-1.642e+07 – 2.5e+05	2201

NAME text identifier

This column holds fully-qualified Census tract names for New York City, with every one of the 2327 rows unique and non-null. Lengths cluster tightly between 38 and 46 characters and every record contains the words 'new', 'york', 'census', 'tract', and 'county;', confirming a rigid template. Apart from the tract number, the county token is the only meaningful variation (Kings 805, Queens 725, Bronx 361, Richmond 126; the remaining 310 rows are presumably New York County, matching the Manhattan count in county_name). It is effectively a row identifier, not a feature.

Treatment: Drop from modelling; optionally parse the borough token out as a categorical feature.
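
A sketch of how the county token might be parsed out. The template "Census Tract <number>; <County> County; New York" is an assumption inferred from the length range and the 'county;' token, not read from the file, so the pattern accepts either ";" or "," as the separator. The example strings are hypothetical.

```python
import re
from typing import Optional

# Assumed template: "Census Tract <number>; <County> County; New York".
NAME_PATTERN = re.compile(
    r"^Census Tract (?P<tract>[\d.]+)[;,] (?P<county>.+?) County[;,] (?P<state>.+)$"
)

def parse_name(name: str) -> Optional[dict]:
    """Return the tract, county, and state tokens, or None if the template fails."""
    match = NAME_PATTERN.match(name.strip())
    return match.groupdict() if match else None

# Hypothetical example strings following the assumed template.
print(parse_name("Census Tract 1; Bronx County; New York"))
print(parse_name("Census Tract 303.01; New York County; New York"))
print(parse_name("not a tract name"))
```

Returning `None` for non-matching strings makes it easy to count how many rows deviate from the template before relying on the parsed values.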

anthropic:claude-opus-4-7 · confidence high

Out[15]:

saturn.columns["NAME"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	2,327
len_min	38
len_max	46
len_mean	41.65
len_median	41
len_p95	46
word_mean	7.133
word_median	7
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,539
readability_flesch_mean	91.45
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 8.

Character-length distribution for NAME.

Character-length distribution for NAME (mean: 41.65).
chars	count
38	7
39	104
40	785
41	447
42	200
43	378
44	190
45	82
46	134

state numeric metadata

The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and no nulls. It carries no information for modelling; 36 is the state FIPS code for New York, consistent with a single-state extract.

Treatment: Drop; constant column with no predictive signal.

anthropic:claude-opus-4-7 · confidence high

Out[18]:

saturn.columns["state"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1
min	36
max	36
mean	36
median	36
std	0
q1	36
q3	36
iqr	0
skew	0
kurtosis	0
n_outliers	0
outlier_rate	0
zero_rate	0
alert: constant	only one distinct value

Fig 9.

Distribution of state. Vertical dash marks the median.

Histogram bins for state (median: 36.0).
bin	count
35.5 – 35.52	0
35.52 – 35.55	0
35.55 – 35.58	0
35.58 – 35.6	0
35.6 – 35.62	0
35.62 – 35.65	0
35.65 – 35.67	0
35.67 – 35.7	0
35.7 – 35.73	0
35.73 – 35.75	0
35.75 – 35.77	0
35.77 – 35.8	0
35.8 – 35.83	0
35.83 – 35.85	0
35.85 – 35.88	0
35.88 – 35.9	0
35.9 – 35.92	0
35.92 – 35.95	0
35.95 – 35.98	0
35.98 – 36	0
36 – 36.02	2327
36.02 – 36.05	0
36.05 – 36.08	0
36.08 – 36.1	0
36.1 – 36.12	0
36.12 – 36.15	0
36.15 – 36.17	0
36.17 – 36.2	0
36.2 – 36.23	0
36.23 – 36.25	0
36.25 – 36.27	0
36.27 – 36.3	0
36.3 – 36.33	0
36.33 – 36.35	0
36.35 – 36.38	0
36.38 – 36.4	0
36.4 – 36.42	0
36.42 – 36.45	0
36.45 – 36.48	0
36.48 – 36.5	0

county numeric feature

Despite being typed as numeric, `county` has only 5 unique values across 2327 rows (5, 47, 61, 81, and 85, as the histogram shows) with no nulls. These are almost certainly FIPS-style county codes rather than a measured quantity. The distribution is left-skewed (skew -0.72) with the median at 47 and Q1 also at 47, meaning at least a quarter of rows share that single code. Treating mean (55.0) or std (25.97) as meaningful would be misleading given the categorical nature.

Treatment: Cast to categorical and one-hot encode before modelling.
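
A standard-library sketch of the recode and one-hot step. The code-to-label mapping is inferred by matching the histogram counts for `county` against the `county_name` counts (361, 805, 310, 725, 126); it is an assumption worth verifying against the data.

```python
# County codes from the histogram, paired with the county_name labels whose
# tract counts match (an inferred mapping, not stated in the profile).
COUNTY_LABELS = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}

def one_hot(code: int) -> dict[str, int]:
    """Encode one county code as indicator columns; unknown codes raise KeyError."""
    label = COUNTY_LABELS[code]
    return {f"county_{name}": int(name == label) for name in COUNTY_LABELS.values()}

row = one_hot(47)
print(row)
```

With pandas, `Series.astype("category")` followed by `pandas.get_dummies` would do the same job on the whole column.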

anthropic:claude-opus-4-7 · confidence high

Out[21]:

saturn.columns["county"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	5
min	5
max	85
mean	55
median	47
std	25.97
q1	47
q3	81
iqr	34
skew	-0.72
kurtosis	-0.4531
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 10.

Distribution of county. Vertical dash marks the median.

Histogram bins for county (median: 47.0).
bin	count
5 – 7	361
7 – 9	0
9 – 11	0
11 – 13	0
13 – 15	0
15 – 17	0
17 – 19	0
19 – 21	0
21 – 23	0
23 – 25	0
25 – 27	0
27 – 29	0
29 – 31	0
31 – 33	0
33 – 35	0
35 – 37	0
37 – 39	0
39 – 41	0
41 – 43	0
43 – 45	0
45 – 47	0
47 – 49	805
49 – 51	0
51 – 53	0
53 – 55	0
55 – 57	0
57 – 59	0
59 – 61	0
61 – 63	310
63 – 65	0
65 – 67	0
67 – 69	0
69 – 71	0
71 – 73	0
73 – 75	0
75 – 77	0
77 – 79	0
79 – 81	0
81 – 83	725
83 – 85	126

tract numeric identifier

Almost certainly U.S. Census tract codes stored as integers, with 1530 distinct values across 2327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82) with a max of 990100 sitting far above the q3 of 57900.5 and median of 30100, producing 63 outliers (2.7%); this is an artifact of tract numbering conventions, not a true numeric magnitude.

Treatment: Treat as a categorical geographic key (zero-pad and join with state/county FIPS); do not use as a numeric feature.
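
The zero-pad-and-join step can be sketched as below, following the standard 11-character Census tract GEOID layout (2-digit state, 3-digit county, 6-digit tract). The function name `tract_geoid` is hypothetical, and the county and tract pairings in the example are illustrative combinations of values seen in the profile, not confirmed rows.

```python
def tract_geoid(state: int, county: int, tract: int) -> str:
    """Build an 11-character tract key: 2-digit state, 3-digit county, 6-digit tract."""
    return f"{state:02d}{county:03d}{tract:06d}"

# State 36 with county codes and the tract minimum and maximum from the profile.
print(tract_geoid(36, 47, 100))
print(tract_geoid(36, 5, 990100))
```

Because `tract` has only 1,530 distinct values across 2,327 rows, the tract code alone is not a unique key; the county prefix is what disambiguates it.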

anthropic:claude-opus-4-7 · confidence high

Out[24]:

saturn.columns["tract"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,530
min	100
max	990,100
mean	4.225e+04
median	30,100
std	4.827e+04
q1	15,200
q3	5.79e+04
iqr	4.27e+04
skew	10.14
kurtosis	189.8
n_outliers	63
outlier_rate	0.02707
zero_rate	0
alert: high_skew	skew=+10.14

Fig 11.

Distribution of tract. Vertical dash marks the median.

Histogram bins for tract (median: 30100.0).
bin	count
100 – 2.485e+04	982
2.485e+04 – 4.96e+04	617
4.96e+04 – 7.435e+04	329
7.435e+04 – 9.91e+04	197
9.91e+04 – 1.238e+05	145
1.238e+05 – 1.486e+05	37
1.486e+05 – 1.734e+05	17
1.734e+05 – 1.981e+05	0
1.981e+05 – 2.228e+05	0
2.228e+05 – 2.476e+05	0
2.476e+05 – 2.724e+05	0
2.724e+05 – 2.971e+05	0
2.971e+05 – 3.218e+05	0
3.218e+05 – 3.466e+05	0
3.466e+05 – 3.714e+05	0
3.714e+05 – 3.961e+05	0
3.961e+05 – 4.208e+05	0
4.208e+05 – 4.456e+05	0
4.456e+05 – 4.704e+05	0
4.704e+05 – 4.951e+05	0
4.951e+05 – 5.198e+05	0
5.198e+05 – 5.446e+05	0
5.446e+05 – 5.694e+05	0
5.694e+05 – 5.941e+05	0
5.941e+05 – 6.188e+05	0
6.188e+05 – 6.436e+05	0
6.436e+05 – 6.684e+05	0
6.684e+05 – 6.931e+05	0
6.931e+05 – 7.178e+05	0
7.178e+05 – 7.426e+05	0
7.426e+05 – 7.674e+05	0
7.674e+05 – 7.921e+05	0
7.921e+05 – 8.168e+05	0
8.168e+05 – 8.416e+05	0
8.416e+05 – 8.664e+05	0
8.664e+05 – 8.911e+05	0
8.911e+05 – 9.158e+05	0
9.158e+05 – 9.406e+05	0
9.406e+05 – 9.654e+05	0
9.654e+05 – 9.901e+05	3

county_name categorical feature

This column lists the NYC borough/county for each record, with all 5 expected values present across 2327 rows and no nulls. Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). The ordering looks broadly consistent with borough population, though the profile contains no population figures to confirm this. Entropy ratio of 0.898 indicates that no single value dominates, although Brooklyn and Queens together account for 65.8% of tracts.

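Treatment: One-hot encode as a low-cardinality categorical feature. The counts match those in the `county` histogram, so the two columns appear to encode the same information; keep only one.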

anthropic:claude-opus-4-7 · confidence high

Out[27]:

saturn.columns["county_name"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	5
top_value	Brooklyn (Kings)
top_rate	0.3459
cardinality	5
entropy	2.086
entropy_ratio	0.8985
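
The entropy figures can be checked from the borough counts. The sketch below assumes `entropy` is Shannon entropy in bits and `entropy_ratio` is that value divided by the maximum possible for 5 categories; the profile does not document the formula, but the results agree with the reported 2.086 and 0.8985 to rounding.

```python
import math

# Tract counts per borough, copied from the county_name top-values table.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

total = sum(counts.values())
shares = [n / total for n in counts.values()]

# Shannon entropy in bits, and its ratio to the maximum for 5 categories.
entropy = -sum(p * math.log2(p) for p in shares)
entropy_ratio = entropy / math.log2(len(counts))

print(total, round(entropy, 3), round(entropy_ratio, 4))
```

A ratio of 1.0 would mean all five boroughs hold equal shares, so a value near 0.9 reflects a moderate imbalance rather than a uniform split.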

Fig 12.

Top values for county_name.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

How to cite

BibTeX

@misc{saturn-nyc-housing-nyc-median-income-by-tract-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: nyc housing nyc median income by tract},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_income_by_tract}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: nyc housing nyc median income by tract. Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_income_by_tract