
nyc housing nyc median rent by tract

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv

Saturn profiled 2,327 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

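[2]:
# Install the profiler (notebook shell escape; run once per environment).
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source CSV with the flags used for this report.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv",
        "--findings", "nyc_housing-nyc_median_rent_by_tract.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the CLI exits non-zero
)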

Summary confidence: high

This dataset contains 2,327 New York City census tracts with median gross rent values across the five boroughs. The most important issue is median_gross_rent: it has a minimum of -666,666,666 and a mean of about -41.5 million, which indicates a sentinel value standing in for missing data. Those rows must be filtered before any analysis. The median of $1,735 and IQR of $1,441.5–$2,049 are rank-based and therefore far less distorted than the mean, but they were computed with the sentinel rows included, so expect them to shift upward slightly after cleaning. The county_name field covers all five boroughs, with Brooklyn (Kings) the largest at 805 tracts (34.6%) and Staten Island the smallest at 126 (5.4%). Note that 'state' is constant (all 36, New York) and can be ignored, and 'NAME' is a unique tract label rather than an analytical field.
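The size of the distortion can be checked with back-of-the-envelope arithmetic from the reported stats alone. The sketch below assumes that all 145 rows in the lowest histogram bin (Fig 2) carry exactly the sentinel -666,666,666; that count and that assumption are read off the histogram, not stated by Saturn. The function name is hypothetical.

```python
# Values copied from the profile. SENTINEL_ROWS is an assumption taken from
# the lowest histogram bin, not a figure Saturn reports directly.
N_ROWS = 2327
REPORTED_MEAN = -41539608.82
SENTINEL = -666_666_666
SENTINEL_ROWS = 145


def implied_valid_mean(n_rows, reported_mean, sentinel, sentinel_rows):
    """Back out the mean of the non-sentinel rows from the contaminated mean."""
    total = reported_mean * n_rows                  # sum over all rows
    valid_total = total - sentinel * sentinel_rows  # remove the sentinel mass
    return valid_total / (n_rows - sentinel_rows)


sentinel_share = SENTINEL_ROWS / N_ROWS
valid_mean = implied_valid_mean(N_ROWS, REPORTED_MEAN, SENTINEL, SENTINEL_ROWS)
print(f"sentinel share: {sentinel_share:.1%}")
print(f"implied mean of remaining rows: {valid_mean:,.0f}")
```

Under these assumptions roughly 6% of rows are sentinels and the remaining rows average near $1,830, which sits comfortably alongside the reported median of $1,735.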

citing: median_gross_rent.stats.min · median_gross_rent.stats.mean · median_gross_rent.stats.median · median_gross_rent.stats.q1 · median_gross_rent.stats.q3 · median_gross_rent.alerts · county_name.top_values · county_name.stats.top_rate · county_name.stats.cardinality · state.stats.min · state.stats.max · row_count

Out[4]:

saturn.schema() · 6 columns

column             kind         n      null%  unique  alerts
median_gross_rent  numeric      2,327  0.0%   1,232   high_skew, outliers
NAME               text         2,327  0.0%   2,327   near_unique
state              numeric      2,327  0.0%   1       constant
county             numeric      2,327  0.0%   5       -
tract              numeric      2,327  0.0%   1,530   high_skew
county_name        categorical  2,327  0.0%   5       -
Fig 1.
county_name · Distribution of census tracts across the five NYC boroughs, with Brooklyn leading.
Top values for county_name (5 unique shown, of 5 total).
value                     count  share
Brooklyn (Kings)            805  34.6%
Queens                      725  31.2%
Bronx                       361  15.5%
Manhattan (New York)        310  13.3%
Staten Island (Richmond)    126   5.4%
Fig 2.
median_gross_rent · Rent distribution — expect to see extreme negative sentinel values that need filtering before the real $1,400–$2,000 range becomes visible.
Histogram bins for median_gross_rent (median: 1735.0). 40 equal-width bins; only non-empty bins are listed.
bin                    count
-6.667e+08 – -6.5e+08    145
-1.666e+07 – 3501      2,182
(the 38 bins between these two are empty)
Fig 3.
county_name · Share of tracts per borough; Brooklyn and Queens together account for over 65%.
Top values for county_name (5 unique shown, of 5 total).
value                     count  share
Brooklyn (Kings)            805  34.6%
Queens                      725  31.2%
Bronx                       361  15.5%
Manhattan (New York)        310  13.3%
Staten Island (Richmond)    126   5.4%
Fig 4.
tract · Tract ID distribution is highly skewed (skew ≈ 10) due to a few very large tract numbers.
Histogram bins for tract (median: 30100.0). 40 equal-width bins; only non-empty bins are listed.
bin                    count
100 – 2.485e+04          982
2.485e+04 – 4.96e+04     617
4.96e+04 – 7.435e+04     329
7.435e+04 – 9.91e+04     197
9.91e+04 – 1.238e+05     145
1.238e+05 – 1.486e+05     37
1.486e+05 – 1.734e+05     17
9.654e+05 – 9.901e+05      3
(the 32 bins between 1.734e+05 and 9.654e+05 are empty)
Fig 5.
NAME · Tract name lengths cluster tightly between 38 and 46 characters, confirming a uniform naming format.
Character-length distribution for NAME (mean: 41.65). Each non-empty bin covers a single length; empty bins are omitted.
chars  count
38         7
39       104
40       785
41       447
42       200
43       378
44       190
45        82
46       134
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column             kind         null %
median_gross_rent  numeric      0.0%
NAME               text         0.0%
state              numeric      0.0%
county             numeric      0.0%
tract              numeric      0.0%
county_name        categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
                   median_gross_rent  state  county  tract
median_gross_rent              +1.00    nan   -0.03  -0.03
state                            nan    nan     nan    nan
county                         -0.03    nan   +1.00  +0.18
tract                          -0.03    nan   +0.18  +1.00
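The nan entries for state follow from the definition of Pearson correlation: the denominator contains each column's standard deviation, and a constant column has a standard deviation of zero. A minimal sketch with made-up values (not rows from the dataset) shows the effect. The helper name pearson is hypothetical and this is not Saturn's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only.
rent = [1500.0, 1800.0, 2100.0, 1650.0]
state = [36, 36, 36, 36]   # constant, like the real column
county = [5, 47, 61, 81]

print(pearson(rent, state))   # nan
print(pearson(rent, county))
```

The near-zero correlations between median_gross_rent and the other columns were computed with the sentinel rows included, so they are worth recomputing after cleaning.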

median_gross_rent numeric feature

Median gross rent per tract, with a typical value around $1,735 (IQR $1,441.5–$2,049). The column is contaminated by a sentinel value: the minimum of -666666666 drags the mean to -41539608.82 and inflates the std to 1.6e8, producing a skew of -3.62 and 12.4% flagged outliers. The lowest histogram bin holds 145 rows (6.2% of the column), which is consistent with the reported mean if all of them carry the sentinel. Once sentinels are removed, the remaining distribution looks tight and plausible for US rents, topping out at $3,501.

Treatment: Replace -666666666 sentinel with null, then consider winsorizing or log-transforming before modelling.
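The first step of this treatment can be sketched in a few lines of standard-library Python. The rents below are hypothetical values chosen for illustration, not rows from the CSV, and the helper names replace_sentinel and summarize are made up for this sketch.

```python
import statistics

SENTINEL = -666666666  # value named in the treatment note


def replace_sentinel(values, sentinel=SENTINEL):
    """Return a copy of values with the sentinel replaced by None (null)."""
    return [None if v == sentinel else v for v in values]


def summarize(values):
    """Count nulls and compute median and mean over the non-null values."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_null": len(values) - len(valid),
        "median": statistics.median(valid),
        "mean": statistics.fmean(valid),
    }


# Hypothetical rents for illustration only.
raw = [1450, -666666666, 1735, 2050, -666666666, 3501]
cleaned = replace_sentinel(raw)
print(summarize(cleaned))
```

Replacing the sentinel with a null rather than dropping the row keeps the tract available for joins while excluding it from rent statistics. Any winsorizing or log transform would then be applied to the non-null values only.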

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["median_gross_rent"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        1,232
min           -6.667e+08
max           3,501
mean          -4.154e+07
median        1,735
std           1.612e+08
q1            1,441.5
q3            2,049
iqr           607.5
skew          -3.621
kurtosis      11.11
n_outliers    289
outlier_rate  0.1242
zero_rate     0
alert: high_skew (skew=-3.62)
alert: outliers (12.4% of rows beyond 1.5 IQR)
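The outliers alert refers to a 1.5 IQR rule. Assuming Saturn applies the standard Tukey fences (an assumption; the report only says "beyond 1.5 IQR"), the cut-offs implied by the quartiles can be worked out as follows. The function name is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# q1 = q3 - iqr = 2049 - 607.5 = 1441.5, matching the IQR quoted in the prose.
lower, upper = tukey_fences(1441.5, 2049.0)
print(lower, upper)  # 530.25 2960.25
```

If that assumption holds, every sentinel row falls below the lower fence and any tract with a median rent above about $2,960 is flagged as well, so the 12.4% outlier rate mixes missing-data markers with genuinely high rents.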
Fig 8.
Distribution of median_gross_rent. Vertical dash marks the median.
Histogram bins for median_gross_rent (median: 1735.0). 40 equal-width bins; only non-empty bins are listed.
bin                    count
-6.667e+08 – -6.5e+08    145
-1.666e+07 – 3501      2,182
(the 38 bins between these two are empty)

NAME text identifier

This column holds fully-qualified names of New York City census tracts, one per row (e.g. 'Census Tract ...; Kings County; New York'). Every one of the 2,327 values is unique, with zero nulls and tightly bounded length (38–46 characters, mean 41.6; about 7 words). The top words confirm the five NYC boroughs: Kings (805), Queens (725), Bronx (361), and Richmond (126), with Manhattan/New York making up the remaining 310. It is effectively a row identifier rather than a modelling feature.

Treatment: Drop from modelling; retain as a join key or parse out the borough/tract components if geography is needed.
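Parsing the components could look like the sketch below. It assumes the 'Census Tract <id>; <county> County; <state>' layout quoted in the column description; the example label is hypothetical and not copied from the dataset, and parse_tract_name is a made-up helper name.

```python
def parse_tract_name(name):
    """Split a NAME label into tract, county, and state parts.

    Returns None for labels that do not match the assumed layout.
    """
    prefix = "Census Tract "
    suffix = " County"
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 3 or not parts[0].startswith(prefix):
        return None
    county = parts[1]
    if county.endswith(suffix):
        county = county[: -len(suffix)]
    return {
        "tract": parts[0][len(prefix):],
        "county": county,
        "state": parts[2],
    }


# Hypothetical label in the assumed format.
print(parse_tract_name("Census Tract 1.02; Kings County; New York"))
```

Returning None for non-matching labels makes it easy to count how many rows deviate from the expected format before relying on the parsed fields.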

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["NAME"].stats

stat                     value
n                        2,327
nulls                    0 (0.0%)
unique                   2,327
len_min                  38
len_max                  46
len_mean                 41.65
len_median               41
len_p95                  46
word_mean                7.133
word_median              7
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,539
readability_flesch_mean  91.45
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
alert: near_unique (100.0% of rows are unique strings)
Fig 9.
Character-length distribution for NAME.
Character-length distribution for NAME (mean: 41.65). Each non-empty bin covers a single length; empty bins are omitted.
chars  count
38         7
39       104
40       785
41       447
42       200
43       378
44       190
45        82
46       134

state numeric metadata

The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and zero nulls. This is a constant field carrying no information for modelling, likely a leftover state code from an upstream filter or partition.

Treatment: Drop; constant column provides no signal.
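A constant column such as this one can be detected generically by counting distinct values. The sketch below uses hypothetical rows that mirror the column names in this report; constant_columns is a made-up helper name.

```python
def constant_columns(rows):
    """Names of columns that hold a single distinct value across all rows."""
    if not rows:
        return []
    return [
        col for col in rows[0]
        if len({row[col] for row in rows}) == 1
    ]


# Hypothetical rows for illustration only.
rows = [
    {"state": 36, "county": 47, "median_gross_rent": 1735},
    {"state": 36, "county": 81, "median_gross_rent": 2049},
    {"state": 36, "county": 5, "median_gross_rent": 1450},
]
to_drop = constant_columns(rows)
kept = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
print(to_drop)  # ['state']
```

The state code is still needed if a full geographic identifier is assembled later, so it may be worth recording the constant value before dropping the column.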

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["state"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        1
min           36
max           36
mean          36
median        36
std           0
q1            36
q3            36
iqr           0
skew          0
kurtosis      0
n_outliers    0
outlier_rate  0
zero_rate     0
alert: constant (only one distinct value)
Fig 10.
Distribution of state. Vertical dash marks the median.
Histogram bins for state (median: 36.0). 40 equal-width bins; only non-empty bins are listed.
bin         count
36 – 36.02  2,327
(the other 39 bins, spanning 35.5 to 36.5, are empty)

county numeric identifier

This column holds numeric county codes (likely FIPS-style identifiers), with only 5 unique values across 2,327 rows and no nulls. Despite being labelled numeric, the values (5, 47, 61, 81, and 85 if these are the standard county FIPS codes) are categorical labels, so the reported mean of 55.0 and std of 25.97 are not meaningful. The same applies to the skew of -0.72: it only reflects that most tracts happen to carry the higher codes (median 47, Q3 81).

Treatment: Cast to categorical and one-hot or target-encode; do not treat as a continuous feature.
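A minimal one-hot encoding sketch follows. It assumes the five codes are the standard county FIPS codes for the NYC boroughs (5, 47, 61, 81, 85), which is consistent with the histogram bins but is not confirmed by the report. The function and feature names are hypothetical.

```python
# Assumed county FIPS codes for the five boroughs.
COUNTY_CODES = [5, 47, 61, 81, 85]


def one_hot_county(code, codes=COUNTY_CODES):
    """Encode a county code as a dict of 0/1 indicator features."""
    if code not in codes:
        raise ValueError(f"unexpected county code: {code}")
    return {f"county_{c:03d}": int(c == code) for c in codes}


print(one_hot_county(47))
```

Raising on an unknown code keeps a stray value from silently producing an all-zero row. Since county and county_name carry the same information, only one of the two needs to be encoded.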

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["county"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        5
min           5
max           85
mean          55
median        47
std           25.97
q1            47
q3            81
iqr           34
skew          -0.72
kurtosis      -0.4531
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 11.
Distribution of county. Vertical dash marks the median.
Histogram bins for county (median: 47.0). 40 equal-width bins; only non-empty bins are listed.
bin      count
5 – 7      361
47 – 49    805
61 – 63    310
81 – 83    725
83 – 85    126
(the other 35 bins are empty)

tract numeric identifier

This is almost certainly a U.S. Census tract code rather than a true numeric measurement, with 1530 unique values across 2327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8) with a max of 990100 sitting far above the median of 30100, which is expected behavior for tract identifiers and triggered the high_skew alert. The 63 flagged outliers (2.7%) reflect tract-numbering conventions, not data errors.

Treatment: Cast to string and treat as a categorical/geographic key; do not use as a continuous numeric feature.
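One way to cast the code to a string key is to rebuild the 11-character Census tract GEOID (2-digit state, 3-digit county, 6-digit tract). The sketch assumes that the tract column is the 6-digit tract code with leading zeros lost during numeric parsing, which fits the observed range of 100 to 990,100 but is not confirmed by the report. The function name and the example inputs are illustrative.

```python
def tract_geoid(state, county, tract):
    """Build an 11-character tract GEOID: state (2) + county (3) + tract (6)."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"


# Illustrative combinations of codes, not rows taken from the CSV.
print(tract_geoid(36, 47, 100))     # 36047000100
print(tract_geoid(36, 85, 990100))  # 36085990100
```

Tract codes repeat across counties (1,530 unique values over 2,327 rows), so the tract code alone is not a unique key; the combined identifier is.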

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["tract"].stats

stat          value
n             2,327
nulls         0 (0.0%)
unique        1,530
min           100
max           990,100
mean          4.225e+04
median        30,100
std           4.827e+04
q1            15,200
q3            5.79e+04
iqr           4.27e+04
skew          10.14
kurtosis      189.8
n_outliers    63
outlier_rate  0.02707
zero_rate     0
alert: high_skew (skew=+10.14)
Fig 12.
Distribution of tract. Vertical dash marks the median.
Histogram bins for tract (median: 30100.0). 40 equal-width bins; only non-empty bins are listed.
bin                    count
100 – 2.485e+04          982
2.485e+04 – 4.96e+04     617
4.96e+04 – 7.435e+04     329
7.435e+04 – 9.91e+04     197
9.91e+04 – 1.238e+05     145
1.238e+05 – 1.486e+05     37
1.486e+05 – 1.734e+05     17
9.654e+05 – 9.901e+05      3
(the 32 bins between 1.734e+05 and 9.654e+05 are empty)

county_name categorical feature

This column records NYC borough/county names across 2,327 rows with no nulls and only 5 distinct values, matching the five boroughs of New York City. The distribution is uneven but balanced enough to be informative: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126), giving a high entropy_ratio of 0.8985. Notably, three of the five labels embed parenthetical legal county names (e.g., 'Brooklyn (Kings)'), which will need normalization if joining to standard county tables.

Treatment: One-hot or target-encode after stripping the parenthetical county aliases for clean joins.
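Stripping the parenthetical aliases is a one-line regular expression. The sketch below uses the five labels from the top-values table; strip_alias is a hypothetical helper name.

```python
import re

# Matches a trailing parenthetical such as " (Kings)".
_PAREN_ALIAS = re.compile(r"\s*\([^)]*\)\s*$")


def strip_alias(label):
    """Drop a trailing parenthetical, e.g. 'Brooklyn (Kings)' -> 'Brooklyn'."""
    return _PAREN_ALIAS.sub("", label).strip()


labels = [
    "Brooklyn (Kings)",
    "Queens",
    "Bronx",
    "Manhattan (New York)",
    "Staten Island (Richmond)",
]
print([strip_alias(x) for x in labels])
```

This keeps the borough name. If the join target is keyed on legal county names instead, the text inside the parentheses is the part to keep, and labels without a parenthetical (Queens, Bronx) already match.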

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["county_name"].stats

stat           value
n              2,327
nulls          0 (0.0%)
unique         5
top_value      Brooklyn (Kings)
top_rate       0.3459
cardinality    5
entropy        2.086
entropy_ratio  0.8985
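The entropy figures can be recomputed from the borough counts. The sketch assumes Shannon entropy in bits, with entropy_ratio defined as entropy divided by log2 of the cardinality; Saturn's exact definition is not stated in the report, though this reading appears to agree with the reported values to rounding.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


counts = [805, 725, 361, 310, 126]   # borough counts from the top-values table
h = entropy_bits(counts)
ratio = h / math.log2(len(counts))   # normalize by the maximum, log2(5)
print(round(h, 3), round(ratio, 4))
```

A ratio of 1.0 would mean all five boroughs hold equal shares, so a value near 0.9 reflects a moderately uneven split dominated by Brooklyn and Queens.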
Fig 13.
Top values for county_name.
Top values for county_name (5 unique shown, of 5 total).
value                     count  share
Brooklyn (Kings)            805  34.6%
Queens                      725  31.2%
Bronx                       361  15.5%
Manhattan (New York)        310  13.3%
Staten Island (Richmond)    126   5.4%

How to cite


BibTeX
@misc{saturn-nyc-housing-nyc-median-rent-by-tract-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: nyc housing nyc median rent by tract},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_rent_by_tract}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: nyc housing nyc median rent by tract. Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_rent_by_tract