nyc housing nyc median income by tract

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv

Saturn profiled 2,327 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess

# Run the profiler; check=True raises if the saturn CLI exits non-zero.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv",
    "--findings", "nyc_housing-nyc_median_income_by_tract.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)

Summary confidence: high

This dataset contains 2,327 New York City census tracts with median household income, geographic identifiers (state, county, tract), and tract names. The headline issue is median_household_income: its minimum is -666,666,666 and its mean is about -36 million, which indicates a sentinel missing-value code that must be filtered before any analysis. The median of $76,833 is the more trustworthy central value. County coverage is uneven: Brooklyn (Kings) holds 805 tracts (34.6%) and Staten Island only 126 (5.4%), so per-borough comparisons should account for the differing tract counts. The state column is constant (36 = New York) and can be dropped.

citing: row_count · column_count · median_household_income.stats.min · median_household_income.stats.mean · median_household_income.stats.median · median_household_income.stats.skew · median_household_income.alerts · county_name.top_values · county_name.stats.top_rate · state.alerts · tract.stats.skew
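
The gap between the mean and the median can be checked by hand from the reported figures. The sketch below assumes that all 126 rows in the lowest histogram bin carry exactly the sentinel -666,666,666 (the report shows only the bin, not the raw values) and backs out the mean of the remaining rows. The function and constant names are illustrative.

```python
# Back out the mean of the non-sentinel rows from the reported aggregates.
# Assumption: every row in the lowest histogram bin equals the sentinel.
N_ROWS = 2327
REPORTED_MEAN = -36_017_397.46
SENTINEL = -666_666_666
N_SENTINEL = 126  # count in the lowest histogram bin


def implied_valid_mean(n_rows, reported_mean, sentinel, n_sentinel):
    """Mean of the rows that are not sentinels, from aggregate stats only."""
    total = reported_mean * n_rows
    valid_total = total - sentinel * n_sentinel
    return valid_total / (n_rows - n_sentinel)


valid_mean = implied_valid_mean(N_ROWS, REPORTED_MEAN, SENTINEL, N_SENTINEL)
print(f"implied mean of valid rows: {valid_mean:,.0f}")
```

Under that assumption the implied mean of the valid rows comes out near $85,000, a plausible figure above the $76,833 median, which supports reading the -36 million mean as an artifact of the sentinel rather than a property of the data.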

Out[4]:

saturn.schema() · 6 columns

column                   kind         n      null%  unique  alerts
median_household_income  numeric      2,327  0.0%   2,106   high_skew, outliers
NAME                     text         2,327  0.0%   2,327   near_unique
state                    numeric      2,327  0.0%   1       constant
county                   numeric      2,327  0.0%   5
tract                    numeric      2,327  0.0%   1,530   high_skew
county_name              categorical  2,327  0.0%   5
Fig 1.
median_household_income · Look for an extreme negative tail caused by sentinel codes (e.g. -666666666) that distort the mean — filter these before interpreting the distribution.
Histogram bins for median_household_income (median: 76833.0).
bin                      count
-6.667e+08 – -6.5e+08    126
-6.5e+08 – -1.642e+07    0 (38 consecutive empty bins)
-1.642e+07 – 2.5e+05     2201
Fig 2.
county_name · Shows tract counts per borough; Brooklyn dominates while Staten Island has the fewest, which matters when aggregating income by area.
Top values for county_name (5 unique shown, of 5 total).
value                     count  share
Brooklyn (Kings)          805    34.6%
Queens                    725    31.2%
Bronx                     361    15.5%
Manhattan (New York)      310    13.3%
Staten Island (Richmond)  126    5.4%
Fig 3.
county_name · Share of census tracts across the five NYC boroughs at a glance.
Data table: same counts and shares as Fig 2.
Fig 4.
tract · Tract numbers are highly skewed (skew ≈ 10) — useful as IDs but confirm they aren't being treated as a quantitative measure.
Histogram bins for tract (median: 30100.0).
bin                      count
100 – 2.485e+04          982
2.485e+04 – 4.96e+04     617
4.96e+04 – 7.435e+04     329
7.435e+04 – 9.91e+04     197
9.91e+04 – 1.238e+05     145
1.238e+05 – 1.486e+05    37
1.486e+05 – 1.734e+05    17
1.734e+05 – 9.654e+05    0 (32 consecutive empty bins)
9.654e+05 – 9.901e+05    3
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column                   kind         null %
median_household_income  numeric      0.0%
NAME                     text         0.0%
state                    numeric      0.0%
county                   numeric      0.0%
tract                    numeric      0.0%
county_name              categorical  0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
                         median_household_income  state  county  tract
median_household_income  +1.00                    +nan   -0.00   -0.01
state                    +nan                     +nan   +nan    +nan
county                   -0.00                    +nan   +1.00   +0.18
tract                    -0.01                    +nan   +0.18   +1.00
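
The +nan entries in the state row and column follow from the definition of Pearson correlation, which divides by each column's standard deviation: a constant column has zero deviation, so the ratio is undefined. A minimal sketch with a hypothetical `pearson` helper and made-up toy values (not rows from the dataset):

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined: the denominator would be zero
    return cov / math.sqrt(var_x * var_y)


state = [36, 36, 36, 36]                      # constant, like the state column
income = [52_000, 76_000, 98_000, 120_000]    # toy values
tract = [100, 200, 300, 400]                  # toy values

print(pearson(state, income))  # nan
print(pearson(income, tract))
```

Note also that the near-zero income correlations in the matrix were computed with the sentinel rows still present, so they say little about the valid incomes.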

median_household_income numeric feature

Likely U.S. median household income in dollars, with median 76833 and IQR spanning 53242.5 to 102359.5. The minimum of -666666666 is a sentinel null code that is poisoning the mean (-36017397.46) and standard deviation (150923371.88), and 208 rows (8.94%) flag as outliers. Skew of -3.94 and kurtosis of 13.53 are entirely artifacts of that sentinel.

Treatment: Replace -666666666 with NaN. The maximum of 250,001 looks like a top-code, so capping there changes nothing; optionally flag those rows as censored before modelling.

anthropic:claude-opus-4-7 · confidence high
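
A minimal sketch of this treatment, using only the standard library and made-up toy values in place of rows from the file. `SENTINEL`, `TOP_CODE`, and `clean_income` are illustrative names, and treating 250,001 as a top-code is an assumption based on the reported maximum. Because the reported maximum already equals 250,001, capping there would change nothing, so the sketch flags those rows instead.

```python
import math
import statistics

SENTINEL = -666_666_666   # missing-value code seen as the column minimum
TOP_CODE = 250_001        # reported maximum; assumed to be a top-code


def clean_income(values):
    """Replace the sentinel with NaN and flag top-coded rows."""
    cleaned = [math.nan if v == SENTINEL else float(v) for v in values]
    top_coded = [v == TOP_CODE for v in values]
    return cleaned, top_coded


# Toy input, not real tract values.
raw = [53_000, -666_666_666, 76_000, 250_001, 102_000]
cleaned, top_coded = clean_income(raw)

# Drop NaN before computing statistics; statistics.median does not skip NaN.
valid = [v for v in cleaned if not math.isnan(v)]
print(statistics.mean(raw))      # dragged far negative by the sentinel
print(statistics.mean(valid))    # mean over valid rows only
print(statistics.median(valid))
print(sum(top_coded), "top-coded row(s)")
```

Keeping the top-code flag as a separate column leaves the choice between capping, dropping, or modelling the censoring to the downstream analysis.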
Out[12]:

saturn.columns["median_household_income"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        2,106
min           -6.667e+08
max           250,001
mean          -3.602e+07
median        76,833
std           1.509e+08
q1            5.324e+04
q3            1.024e+05
iqr           49,117
skew          -3.94
kurtosis      13.53
n_outliers    208
outlier_rate  0.08939
zero_rate     0
alert: high_skew (skew=-3.94)
alert: outliers (8.9% of rows beyond 1.5 IQR)
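
The outlier alert refers to the 1.5 × IQR rule. Assuming Saturn applies the standard Tukey fences (the report does not spell out the exact formula), the cut-offs follow directly from the quartiles quoted in the prose above (53242.5 and 102359.5):

```python
# Tukey fences from the reported quartiles.
# Assumption: the "beyond 1.5 IQR" alert uses the standard rule.
Q1 = 53_242.5
Q3 = 102_359.5


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) cut-offs; values outside them are flagged."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(Q1, Q3)
print(lower, upper)  # -20433.0 176035.0
```

Every sentinel row falls below the lower fence. If the 126 rows in the lowest histogram bin are all sentinels and no valid income is negative, the remaining 208 - 126 = 82 flagged rows would be tracts with incomes above roughly $176,000, which are genuine high values and not data errors.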
Fig 7.
Distribution of median_household_income. Vertical dash marks the median.
Data table: same 40 bins as Fig 1 (126 rows in the lowest bin at -6.667e+08, 2,201 rows in the top bin from -1.642e+07 to 2.5e+05, all bins in between empty).

NAME text identifier

This column holds fully qualified Census tract names for New York City; all 2,327 rows are unique and non-null. Lengths fall between 38 and 46 characters, and every record contains the words 'new', 'york', 'census', 'tract', and 'county;', confirming a rigid template. Only the tract number and the county token vary (Kings 805, Queens 725, Bronx 361, New York 310, Richmond 126, matching the county_name counts). It is effectively a row identifier, not a feature.

Treatment: Drop from modelling; optionally parse the borough token out as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
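
A sketch of parsing the county token out of NAME. The exact template is an assumption inferred from the reported word list and the 38-character minimum length ("Census Tract <number>; <county> County; New York"); the sample strings are constructed for illustration, so check the pattern against real rows before relying on it.

```python
import re

# Assumed template: "Census Tract <number>; <county> County; <state>"
NAME_PATTERN = re.compile(
    r"^Census Tract (?P<tract>[\d.]+); (?P<county>.+) County; (?P<state>.+)$"
)


def parse_name(name):
    """Return (tract, county, state), or None if the template does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("tract"), match.group("county"), match.group("state")


# Constructed examples, not copied from the file.
samples = [
    "Census Tract 1; Bronx County; New York",
    "Census Tract 12.01; New York County; New York",
    "not a tract name",
]
for s in samples:
    print(parse_name(s))
```

Returning None on a mismatch makes it easy to count how many rows deviate from the assumed template. Since county_name already carries the borough, the parsed token is mainly useful as a consistency check.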
Out[15]:

saturn.columns["NAME"].stats

stat                     value
n                        2,327
nulls                    0 (0.0%)
unique                   2,327
len_min                  38
len_max                  46
len_mean                 41.65
len_median               41
len_p95                  46
word_mean                7.133
word_median              7
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,539
readability_flesch_mean  91.45
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: near_unique (100.0% of rows are unique strings)
Fig 8.
Character-length distribution for NAME.
Character-length distribution for NAME (mean: 41.65). Empty fractional bins omitted.
chars  count
38     7
39     104
40     785
41     447
42     200
43     378
44     190
45     82
46     134

state numeric metadata

The column 'state' is numeric but holds the single value 36 across all 2,327 rows, with zero variance and no nulls. The value is the FIPS state code for New York, so within this single-state extract it carries no information for modelling.

Treatment: Drop; constant column with no predictive signal.

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["state"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        1
min           36
max           36
mean          36
median        36
std           0
q1            36
q3            36
iqr           0
skew          0
kurtosis      0
n_outliers    0
outlier_rate  0
zero_rate     0
alert: constant (only one distinct value)
Fig 9.
Distribution of state. Vertical dash marks the median.
Histogram bins for state (median: 36.0).
bin            count
35.5 – 36      0 (20 consecutive empty bins)
36 – 36.02     2327
36.02 – 36.5   0 (19 consecutive empty bins)

county numeric feature

Despite being typed as numeric, `county` has only 5 unique values across 2,327 rows and no nulls. The histogram places them at 5, 47, 61, 81, and 85, which are the FIPS county codes for Bronx, Kings, New York, Queens, and Richmond; the per-code counts (361, 805, 310, 725, 126) match county_name. The distribution is left-skewed (skew -0.72) with both the median and Q1 at 47, meaning at least a quarter of rows share that single code. Treating the mean (55.0) or std (25.97) as meaningful would be misleading given the categorical nature.

Treatment: Cast to categorical and one-hot encode before modelling.

anthropic:claude-opus-4-7 · confidence high
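
A minimal sketch of this treatment. The code-to-county mapping uses the standard FIPS county codes for the five boroughs; `one_hot` is a hypothetical helper written for illustration, and the input list is toy data.

```python
# FIPS county codes for New York City's five counties (boroughs).
COUNTY_FIPS = {
    5: "Bronx",
    47: "Kings",
    61: "New York",
    81: "Queens",
    85: "Richmond",
}


def one_hot(codes, categories):
    """Encode each code as a 0/1 vector with one slot per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for code in codes:
        row = [0] * len(categories)
        row[index[code]] = 1  # raises KeyError on an unexpected code
        rows.append(row)
    return rows


codes = [47, 5, 85, 47]  # toy values
labels = [COUNTY_FIPS[c] for c in codes]
encoded = one_hot(codes, sorted(COUNTY_FIPS))

print(labels)
print(encoded)
```

Because `county` and `county_name` encode the same five categories, only one of the two needs to be encoded for a model.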
Out[21]:

saturn.columns["county"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        5
min           5
max           85
mean          55
median        47
std           25.97
q1            47
q3            81
iqr           34
skew          -0.72
kurtosis      -0.4531
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 10.
Distribution of county. Vertical dash marks the median.
Histogram bins for county (median: 47.0). The other 35 bins are empty.
bin        count
5 – 7      361
47 – 49    805
61 – 63    310
81 – 83    725
83 – 85    126

tract numeric identifier

Almost certainly U.S. Census tract codes stored as integers, with 1530 distinct values across 2327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82) with a max of 990100 sitting far above the q3 of 57900.5 and median of 30100, producing 63 outliers (2.7%); this is an artifact of tract numbering conventions, not a true numeric magnitude.

Treatment: Treat as a categorical geographic key (zero-pad and join with state/county FIPS); do not use as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
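
One way to build the key, assuming the usual 11-digit Census GEOID layout (2-digit state, 3-digit county, 6-digit tract) and that `tract` stores the six-digit code as an integer, so 100 stands for tract 1.00. `make_geoid` is an illustrative name, and the county paired with each tract below is arbitrary.

```python
def make_geoid(state, county, tract):
    """Zero-pad state (2), county (3), and tract (6) into an 11-character key."""
    geoid = f"{int(state):02d}{int(county):03d}{int(tract):06d}"
    if len(geoid) != 11:
        raise ValueError(f"component out of range: {state}, {county}, {tract}")
    return geoid


# Tract values are the reported min and max; the county pairing is arbitrary.
print(make_geoid(36, 47, 100))     # 36047000100
print(make_geoid(36, 5, 990100))   # 36005990100
```

Storing the key as a string preserves the leading zeros in the county and tract parts, which an integer column would silently drop.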
Out[24]:

saturn.columns["tract"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        1,530
min           100
max           990,100
mean          4.225e+04
median        30,100
std           4.827e+04
q1            15,200
q3            5.79e+04
iqr           4.27e+04
skew          10.14
kurtosis      189.8
n_outliers    63
outlier_rate  0.02707
zero_rate     0
alert: high_skew (skew=+10.14)
Fig 11.
Distribution of tract. Vertical dash marks the median.
Data table: same 40 bins as Fig 4 (2,324 rows between 100 and 1.734e+05, 3 rows in the top bin from 9.654e+05 to 9.901e+05, all bins in between empty).

county_name categorical feature

This column lists the NYC borough/county for each record, with all 5 expected values present across 2327 rows and no nulls. Distribution roughly tracks borough population: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). Entropy ratio of 0.898 indicates the categories are fairly evenly spread rather than dominated by one value.

Treatment: One-hot encode as a low-cardinality categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[27]:

saturn.columns["county_name"].stats

stat           value
n              2,327
nulls          0 (0.0%)
unique         5
top_value      Brooklyn (Kings)
top_rate       0.3459
cardinality    5
entropy        2.086
entropy_ratio  0.8985
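
The entropy figures can be reproduced from the category counts. The sketch assumes Saturn reports Shannon entropy in bits and normalizes by log2 of the cardinality; that reading is an inference from the numbers, not something the report states.

```python
import math

# Tract counts per borough, taken from the top-values table.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}


def shannon_entropy_bits(counts):
    """Shannon entropy of a count table, in bits."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)


entropy = shannon_entropy_bits(counts)
ratio = entropy / math.log2(len(counts))  # 1.0 would mean a uniform split
print(round(entropy, 3), round(ratio, 4))
```

Under that reading the computed values agree with the reported entropy and entropy_ratio to the displayed precision. A ratio near 0.9 means the split is close to uniform even though Brooklyn has more than six times as many tracts as Staten Island.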
Fig 12.
Top values for county_name.
Data table: same counts and shares as Fig 2.

How to cite

BibTeX
@misc{saturn-nyc-housing-nyc-median-income-by-tract-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: nyc housing nyc median income by tract},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_income_by_tract}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: nyc housing nyc median income by tract. Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_income_by_tract