nyc housing nyc median rent by tract

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv

Saturn profiled 2,327 rows across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

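# Install the profiler (notebook shell escape), then run it on the source CSV.
!pip install saturn-dissect
import subprocess
# check=True raises CalledProcessError if the saturn CLI exits with a non-zero status.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv",
    "--findings", "nyc_housing-nyc_median_rent_by_tract.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)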

Summary confidence: high

This dataset contains 2,327 New York City census tracts with median gross rent values across the five boroughs. The most important issue is median_gross_rent: its minimum of -666,666,666 and mean of about -41.5 million indicate sentinel values standing in for missing data, which must be filtered before any analysis. The median rent of $1,735 and IQR of $1,441.50–$2,049 are far more realistic, but they were computed with the sentinels still in the column, so expect them to shift upward somewhat after cleaning. The county_name field is well distributed across the five boroughs, with Brooklyn (Kings) the largest at 805 tracts (34.6%) and Staten Island (Richmond) the smallest at 126 (5.4%). Note that 'state' is constant (all 36, New York) and can be ignored, and 'NAME' is a unique tract label rather than an analytical field.

citing: median_gross_rent.stats.min · median_gross_rent.stats.mean · median_gross_rent.stats.median · median_gross_rent.stats.q1 · median_gross_rent.stats.q3 · median_gross_rent.alerts · county_name.top_values · county_name.stats.top_rate · county_name.stats.cardinality · state.stats.min · state.stats.max · row_count

Out[4]:

saturn.schema() · 6 columns

column	kind	n	null%	unique	alerts
median_gross_rent	numeric	2,327	0.0%	1,232	high_skew outliers
NAME	text	2,327	0.0%	2,327	near_unique
state	numeric	2,327	0.0%	1	constant
county	numeric	2,327	0.0%	5
tract	numeric	2,327	0.0%	1,530	high_skew
county_name	categorical	2,327	0.0%	5

Fig 1.

county_name · Distribution of census tracts across the five NYC boroughs, with Brooklyn leading.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

Fig 2.

median_gross_rent · Rent distribution. Extreme negative sentinel values (145 rows in the lowest bin) must be filtered out before the real $1,400–$2,000 range becomes visible.

Histogram bins for median_gross_rent (median: 1735.0).
bin	count
-6.667e+08 – -6.5e+08	145
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.167e+08	0
-6.167e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.667e+08	0
-5.667e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.167e+08	0
-5.167e+08 – -5e+08	0
-5e+08 – -4.833e+08	0
-4.833e+08 – -4.667e+08	0
-4.667e+08 – -4.5e+08	0
-4.5e+08 – -4.333e+08	0
-4.333e+08 – -4.167e+08	0
-4.167e+08 – -4e+08	0
-4e+08 – -3.833e+08	0
-3.833e+08 – -3.667e+08	0
-3.667e+08 – -3.5e+08	0
-3.5e+08 – -3.333e+08	0
-3.333e+08 – -3.167e+08	0
-3.167e+08 – -3e+08	0
-3e+08 – -2.833e+08	0
-2.833e+08 – -2.667e+08	0
-2.667e+08 – -2.5e+08	0
-2.5e+08 – -2.333e+08	0
-2.333e+08 – -2.167e+08	0
-2.167e+08 – -2e+08	0
-2e+08 – -1.833e+08	0
-1.833e+08 – -1.667e+08	0
-1.667e+08 – -1.5e+08	0
-1.5e+08 – -1.333e+08	0
-1.333e+08 – -1.167e+08	0
-1.167e+08 – -1e+08	0
-1e+08 – -8.333e+07	0
-8.333e+07 – -6.666e+07	0
-6.666e+07 – -5e+07	0
-5e+07 – -3.333e+07	0
-3.333e+07 – -1.666e+07	0
-1.666e+07 – 3501	2182

Fig 3.

county_name · Share of tracts per borough; Brooklyn and Queens together account for over 65%.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

Fig 4.

tract · Tract ID distribution is highly skewed (skew ≈ 10) due to a few very large tract numbers.

Histogram bins for tract (median: 30100.0).
bin	count
100 – 2.485e+04	982
2.485e+04 – 4.96e+04	617
4.96e+04 – 7.435e+04	329
7.435e+04 – 9.91e+04	197
9.91e+04 – 1.238e+05	145
1.238e+05 – 1.486e+05	37
1.486e+05 – 1.734e+05	17
1.734e+05 – 1.981e+05	0
1.981e+05 – 2.228e+05	0
2.228e+05 – 2.476e+05	0
2.476e+05 – 2.724e+05	0
2.724e+05 – 2.971e+05	0
2.971e+05 – 3.218e+05	0
3.218e+05 – 3.466e+05	0
3.466e+05 – 3.714e+05	0
3.714e+05 – 3.961e+05	0
3.961e+05 – 4.208e+05	0
4.208e+05 – 4.456e+05	0
4.456e+05 – 4.704e+05	0
4.704e+05 – 4.951e+05	0
4.951e+05 – 5.198e+05	0
5.198e+05 – 5.446e+05	0
5.446e+05 – 5.694e+05	0
5.694e+05 – 5.941e+05	0
5.941e+05 – 6.188e+05	0
6.188e+05 – 6.436e+05	0
6.436e+05 – 6.684e+05	0
6.684e+05 – 6.931e+05	0
6.931e+05 – 7.178e+05	0
7.178e+05 – 7.426e+05	0
7.426e+05 – 7.674e+05	0
7.674e+05 – 7.921e+05	0
7.921e+05 – 8.168e+05	0
8.168e+05 – 8.416e+05	0
8.416e+05 – 8.664e+05	0
8.664e+05 – 8.911e+05	0
8.911e+05 – 9.158e+05	0
9.158e+05 – 9.406e+05	0
9.406e+05 – 9.654e+05	0
9.654e+05 – 9.901e+05	3

Fig 5.

NAME · Tract name lengths cluster tightly between 38 and 46 characters, confirming a uniform naming format.

Character-length distribution for NAME (mean: 41.65).
chars	count
38	7
39	104
40	785
41	447
42	200
43	378
44	190
45	82
46	134

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
median_gross_rent	numeric	0.0%
NAME	text	0.0%
state	numeric	0.0%
county	numeric	0.0%
tract	numeric	0.0%
county_name	categorical	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
	median_gross_rent	state	county	tract
median_gross_rent	+1.00	+nan	-0.03	-0.03
state	+nan	+nan	+nan	+nan
county	-0.03	+nan	+1.00	+0.18
tract	-0.03	+nan	+0.18	+1.00
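
The +nan entries in the state row and column follow from state being constant: Pearson's r divides by each column's standard deviation, which is zero for a column with a single value. A minimal sketch in plain Python illustrates this. The helper name `pearson` and the sample values are illustrative and are not Saturn's implementation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # 0/0: correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only: a constant column like state, and a varying one.
state = [36, 36, 36, 36]
county = [5, 47, 61, 81]

print(pearson(state, county))   # nan
print(pearson(county, county))
```

Dropping the constant column before computing correlations avoids the undefined cells altogether.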

median_gross_rent numeric feature

Median gross rent per census tract, with a typical value around $1,735 (IQR $1,441.50–$2,049). The column is contaminated by a sentinel value: the minimum of -666,666,666 drags the mean to -41,539,608.82 and inflates the standard deviation to 1.6e8, producing a skew of -3.62. The histogram places 145 rows (6.2%) in the sentinel bin, so about half of the 289 flagged outliers (12.4%) are sentinels rather than real rents. Once the sentinels are removed, the remaining values span a tight, plausible range for US rents, capped near $3,501.

Treatment: Replace the -666666666 sentinel with null and recompute the summary statistics, then consider winsorizing or log-transforming before modelling.

anthropic:claude-opus-4-7 · confidence high
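
One way to apply the sentinel treatment is sketched below in plain Python. The sentinel constant is the minimum reported for this column; the function name `clean_rents` and the sample rents are hypothetical and are not rows from the dataset.

```python
import statistics

SENTINEL = -666666666  # minimum value reported for median_gross_rent

def clean_rents(values, sentinel=SENTINEL):
    """Replace the sentinel with None so it cannot leak into summary stats."""
    return [None if v == sentinel else v for v in values]

# Hypothetical sample, not rows from the dataset.
raw = [1500, SENTINEL, 2100, 1735, SENTINEL, 3501]
cleaned = clean_rents(raw)
valid = [v for v in cleaned if v is not None]

print(statistics.mean(raw))      # dominated by the sentinel
print(statistics.mean(valid))    # mean of the real rents only
print(statistics.median(valid))
```

Keeping the nulls in place (rather than dropping rows) preserves the row count, so the tracts with missing rent can still be counted or mapped.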

Out[13]:

saturn.columns["median_gross_rent"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,232
min	-6.667e+08
max	3,501
mean	-4.154e+07
median	1,735
std	1.612e+08
q1	1,441.5
q3	2,049
iqr	607.5
skew	-3.621
kurtosis	11.11
n_outliers	289
outlier_rate	0.1242
zero_rate	0
alert: high_skew	skew=-3.62
alert: outliers	12.4% rows beyond 1.5 IQR
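
The outlier alert and the mean can be cross-checked from the figures in this table. The sketch below computes the 1.5 × IQR fences and then backs out the mean of the non-sentinel rows. It takes q1 as 1441.5 (q3 minus iqr, 2049 - 607.5) and assumes that all 145 rows in the lowest histogram bin equal the sentinel exactly; the stats do not confirm that second assumption.

```python
# Values copied from the stats table and histogram for median_gross_rent.
n = 2327
q1, q3 = 1441.5, 2049.0       # q1 unrounded: q3 - iqr = 2049 - 607.5
mean_all = -41539608.82
sentinel = -666666666
n_sentinel = 145              # assumption: every row in the lowest bin is the sentinel

# Tukey fences: values outside [lower_fence, upper_fence] are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Remove the sentinels' contribution from the total to estimate the clean mean.
total = mean_all * n
clean_total = total - sentinel * n_sentinel
clean_mean = clean_total / (n - n_sentinel)

print(lower_fence, upper_fence)   # 530.25 2960.25
print(round(clean_mean, 1))
```

Under these assumptions the mean of the remaining rows comes out at roughly $1,830, and any real rent above about $2,960 is also flagged, which would account for the flagged rows beyond the 145 sentinels.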

Fig 8.

Distribution of median_gross_rent. Vertical dash marks the median.

Histogram bins for median_gross_rent (median: 1735.0).
bin	count
-6.667e+08 – -6.5e+08	145
-6.5e+08 – -6.333e+08	0
-6.333e+08 – -6.167e+08	0
-6.167e+08 – -6e+08	0
-6e+08 – -5.833e+08	0
-5.833e+08 – -5.667e+08	0
-5.667e+08 – -5.5e+08	0
-5.5e+08 – -5.333e+08	0
-5.333e+08 – -5.167e+08	0
-5.167e+08 – -5e+08	0
-5e+08 – -4.833e+08	0
-4.833e+08 – -4.667e+08	0
-4.667e+08 – -4.5e+08	0
-4.5e+08 – -4.333e+08	0
-4.333e+08 – -4.167e+08	0
-4.167e+08 – -4e+08	0
-4e+08 – -3.833e+08	0
-3.833e+08 – -3.667e+08	0
-3.667e+08 – -3.5e+08	0
-3.5e+08 – -3.333e+08	0
-3.333e+08 – -3.167e+08	0
-3.167e+08 – -3e+08	0
-3e+08 – -2.833e+08	0
-2.833e+08 – -2.667e+08	0
-2.667e+08 – -2.5e+08	0
-2.5e+08 – -2.333e+08	0
-2.333e+08 – -2.167e+08	0
-2.167e+08 – -2e+08	0
-2e+08 – -1.833e+08	0
-1.833e+08 – -1.667e+08	0
-1.667e+08 – -1.5e+08	0
-1.5e+08 – -1.333e+08	0
-1.333e+08 – -1.167e+08	0
-1.167e+08 – -1e+08	0
-1e+08 – -8.333e+07	0
-8.333e+07 – -6.666e+07	0
-6.666e+07 – -5e+07	0
-5e+07 – -3.333e+07	0
-3.333e+07 – -1.666e+07	0
-1.666e+07 – 3501	2182

NAME text identifier

This column holds fully qualified names of New York City census tracts, one per row (e.g. 'Census Tract ...; Kings County; New York'). All 2,327 values are unique, with zero nulls and a tightly bounded length (38–46 characters, mean 41.6; about 7 words each). The top words confirm the five NYC boroughs: Kings (805), Queens (725), Bronx (361), and Richmond (126), with Manhattan/New York making up the remaining 310. It is effectively a row identifier rather than a modelling feature.

Treatment: Drop from modelling; retain as a join key or parse out the borough/tract components if geography is needed.

anthropic:claude-opus-4-7 · confidence high
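
Parsing out the components could look like the sketch below, assuming the three fields are separated by semicolons as in the example above. The sample string and the function name `parse_tract_name` are hypothetical; the sample is not a value taken from the dataset.

```python
def parse_tract_name(name):
    """Split a 'tract; county; state' label into its three components."""
    parts = [part.strip() for part in name.split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected NAME format: {name!r}")
    tract_label, county_label, state_label = parts
    return {"tract": tract_label, "county": county_label, "state": state_label}

# Hypothetical label in the format described above.
sample = "Census Tract 1; Kings County; New York"
print(parse_tract_name(sample))
```

Raising on an unexpected field count surfaces any rows that break the naming format instead of silently mis-parsing them.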

Out[16]:

saturn.columns["NAME"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	2,327
len_min	38
len_max	46
len_mean	41.65
len_median	41
len_p95	46
word_mean	7.133
word_median	7
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,539
readability_flesch_mean	91.45
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 9.

Character-length distribution for NAME.

Character-length distribution for NAME (mean: 41.65).
chars	count
38	7
39	104
40	785
41	447
42	200
43	378
44	190
45	82
46	134

state numeric metadata

The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and zero nulls. This is a constant field carrying no information for modelling, likely a leftover state code from an upstream filter or partition.

Treatment: Drop; constant column provides no signal.

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["state"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1
min	36
max	36
mean	36
median	36
std	0
q1	36
q3	36
iqr	0
skew	0
kurtosis	0
n_outliers	0
outlier_rate	0
zero_rate	0
alert: constant	only one distinct value

Fig 10.

Distribution of state. Vertical dash marks the median.

Histogram bins for state (median: 36.0).
bin	count
35.5 – 35.52	0
35.52 – 35.55	0
35.55 – 35.58	0
35.58 – 35.6	0
35.6 – 35.62	0
35.62 – 35.65	0
35.65 – 35.67	0
35.67 – 35.7	0
35.7 – 35.73	0
35.73 – 35.75	0
35.75 – 35.77	0
35.77 – 35.8	0
35.8 – 35.83	0
35.83 – 35.85	0
35.85 – 35.88	0
35.88 – 35.9	0
35.9 – 35.92	0
35.92 – 35.95	0
35.95 – 35.98	0
35.98 – 36	0
36 – 36.02	2327
36.02 – 36.05	0
36.05 – 36.08	0
36.08 – 36.1	0
36.1 – 36.12	0
36.12 – 36.15	0
36.15 – 36.17	0
36.17 – 36.2	0
36.2 – 36.23	0
36.23 – 36.25	0
36.25 – 36.27	0
36.27 – 36.3	0
36.3 – 36.33	0
36.33 – 36.35	0
36.35 – 36.38	0
36.38 – 36.4	0
36.4 – 36.42	0
36.42 – 36.45	0
36.45 – 36.48	0
36.48 – 36.5	0

county numeric identifier

This column holds numeric county codes (likely FIPS-style identifiers), with only 5 unique values across 2,327 rows and no nulls. Despite being labelled numeric, the values (5, 47, 61, 81, and 85) are categorical labels, so the reported mean of 55.0 and std of 25.97 are not meaningful. The rows are concentrated at the upper codes (median 47, Q3 81), giving a negative skew of -0.72 that reflects only how the codes happen to be numbered.

Treatment: Cast to categorical and one-hot or target-encode; do not treat as a continuous feature.

anthropic:claude-opus-4-7 · confidence high
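
The per-code counts in the county histogram (361, 805, 310, 725, 126) match the county_name counts one for one, which suggests the two columns encode the same information. A sketch of the mapping under that assumption follows. The dictionary is inferred from the matching counts and is not read from the data; the function name is hypothetical.

```python
# County code -> borough label, inferred by matching the histogram counts
# for county against the top-value counts for county_name (an assumption).
COUNTY_TO_BOROUGH = {
    5: "Bronx",
    47: "Brooklyn (Kings)",
    61: "Manhattan (New York)",
    81: "Queens",
    85: "Staten Island (Richmond)",
}

def county_label(code):
    """Return the borough label for a county code, treating the code as categorical."""
    return COUNTY_TO_BOROUGH[int(code)]

print(county_label(47))
```

If the mapping holds across every row, one of the two columns is redundant for modelling and only one needs to be encoded.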

Out[22]:

saturn.columns["county"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	5
min	5
max	85
mean	55
median	47
std	25.97
q1	47
q3	81
iqr	34
skew	-0.72
kurtosis	-0.4531
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 11.

Distribution of county. Vertical dash marks the median.

Histogram bins for county (median: 47.0).
bin	count
5 – 7	361
7 – 9	0
9 – 11	0
11 – 13	0
13 – 15	0
15 – 17	0
17 – 19	0
19 – 21	0
21 – 23	0
23 – 25	0
25 – 27	0
27 – 29	0
29 – 31	0
31 – 33	0
33 – 35	0
35 – 37	0
37 – 39	0
39 – 41	0
41 – 43	0
43 – 45	0
45 – 47	0
47 – 49	805
49 – 51	0
51 – 53	0
53 – 55	0
55 – 57	0
57 – 59	0
59 – 61	0
61 – 63	310
63 – 65	0
65 – 67	0
67 – 69	0
69 – 71	0
71 – 73	0
73 – 75	0
75 – 77	0
77 – 79	0
79 – 81	0
81 – 83	725
83 – 85	126

tract numeric identifier

This is almost certainly a U.S. Census tract code rather than a true numeric measurement, with 1,530 unique values across 2,327 rows and no nulls. Because there are fewer unique codes than rows while every NAME is unique, the same tract code presumably recurs in more than one county, so the column identifies a tract only in combination with county. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8): the maximum of 990,100 sits far above the median of 30,100, which is expected for tract identifiers and triggered the high_skew alert. The 63 flagged outliers (2.7%) reflect tract-numbering conventions, not data errors.

Treatment: Cast to string and treat as a categorical/geographic key, combined with county; do not use as a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence high
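
If state, county, and tract are FIPS codes stored as integers (an assumption; the profile only suggests it), their leading zeros have been lost. Census tract GEOIDs concatenate a 2-digit state, a 3-digit county, and a 6-digit tract code, so a string key could be rebuilt as sketched below. The function name `tract_geoid` is hypothetical.

```python
def tract_geoid(state, county, tract):
    """Build an 11-character key: 2-digit state + 3-digit county + 6-digit tract."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"

# Inputs use values seen in the stats (state 36, county 47 and 85, tract min and max).
# Whether these particular combinations occur in the data is not confirmed.
print(tract_geoid(36, 47, 100))     # 36047000100
print(tract_geoid(36, 85, 990100))
```

A zero-padded string key of this kind avoids the numeric skew entirely and joins directly to other tract-level tables that use the same convention.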

Out[25]:

saturn.columns["tract"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	1,530
min	100
max	990,100
mean	4.225e+04
median	30,100
std	4.827e+04
q1	15,200
q3	5.79e+04
iqr	4.27e+04
skew	10.14
kurtosis	189.8
n_outliers	63
outlier_rate	0.02707
zero_rate	0
alert: high_skew	skew=+10.14

Fig 12.

Distribution of tract. Vertical dash marks the median.

Histogram bins for tract (median: 30100.0).
bin	count
100 – 2.485e+04	982
2.485e+04 – 4.96e+04	617
4.96e+04 – 7.435e+04	329
7.435e+04 – 9.91e+04	197
9.91e+04 – 1.238e+05	145
1.238e+05 – 1.486e+05	37
1.486e+05 – 1.734e+05	17
1.734e+05 – 1.981e+05	0
1.981e+05 – 2.228e+05	0
2.228e+05 – 2.476e+05	0
2.476e+05 – 2.724e+05	0
2.724e+05 – 2.971e+05	0
2.971e+05 – 3.218e+05	0
3.218e+05 – 3.466e+05	0
3.466e+05 – 3.714e+05	0
3.714e+05 – 3.961e+05	0
3.961e+05 – 4.208e+05	0
4.208e+05 – 4.456e+05	0
4.456e+05 – 4.704e+05	0
4.704e+05 – 4.951e+05	0
4.951e+05 – 5.198e+05	0
5.198e+05 – 5.446e+05	0
5.446e+05 – 5.694e+05	0
5.694e+05 – 5.941e+05	0
5.941e+05 – 6.188e+05	0
6.188e+05 – 6.436e+05	0
6.436e+05 – 6.684e+05	0
6.684e+05 – 6.931e+05	0
6.931e+05 – 7.178e+05	0
7.178e+05 – 7.426e+05	0
7.426e+05 – 7.674e+05	0
7.674e+05 – 7.921e+05	0
7.921e+05 – 8.168e+05	0
8.168e+05 – 8.416e+05	0
8.416e+05 – 8.664e+05	0
8.664e+05 – 8.911e+05	0
8.911e+05 – 9.158e+05	0
9.158e+05 – 9.406e+05	0
9.406e+05 – 9.654e+05	0
9.654e+05 – 9.901e+05	3

county_name categorical feature

This column records NYC borough/county names across 2,327 rows with no nulls and only 5 distinct values, matching the five boroughs of New York City. The distribution is uneven but balanced enough to be informative: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126), giving a high entropy_ratio of 0.898. Notably, three of the five labels embed parenthetical legal county names (e.g., 'Brooklyn (Kings)'), which will need normalization before joining to standard county tables.

Treatment: One-hot or target-encode after stripping the parenthetical county aliases for clean joins.

anthropic:claude-opus-4-7 · confidence high
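
Stripping the parenthetical aliases can be done with a single regular expression. The sketch below uses the five labels listed for this column; the function name `strip_alias` is hypothetical.

```python
import re

def strip_alias(label):
    """Drop a trailing parenthetical, e.g. 'Brooklyn (Kings)' -> 'Brooklyn'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", label)

labels = [
    "Brooklyn (Kings)",
    "Queens",
    "Bronx",
    "Manhattan (New York)",
    "Staten Island (Richmond)",
]
print([strip_alias(label) for label in labels])
```

This keeps the borough name. If the target table is keyed on legal county names instead, the parenthetical would need to be captured rather than discarded.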

Out[28]:

saturn.columns["county_name"].stats

stat	value
n	2,327
nulls	0 (0.0%)
unique	5
top_value	Brooklyn (Kings)
top_rate	0.3459
cardinality	5
entropy	2.086
entropy_ratio	0.8985
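
The entropy figures can be recomputed from the top-value counts. The sketch below assumes entropy is Shannon entropy in bits and entropy_ratio is that entropy divided by its maximum, log2 of the cardinality; Saturn's exact definitions are not stated in this report.

```python
import math

# Counts from the county_name top-values table.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

total = sum(counts.values())
# Shannon entropy in bits: -sum(p * log2(p)) over the category shares.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
# Maximum entropy is reached when all categories are equally likely.
max_entropy = math.log2(len(counts))
entropy_ratio = entropy / max_entropy

print(round(entropy, 3), round(entropy_ratio, 4))
```

Under those assumptions the result agrees with the reported entropy of 2.086 and entropy_ratio of 0.8985 up to rounding.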

Fig 13.

Top values for county_name.

Top values for county_name (5 unique shown, of 5 total).
value	count	share
Brooklyn (Kings)	805	34.6%
Queens	725	31.2%
Bronx	361	15.5%
Manhattan (New York)	310	13.3%
Staten Island (Richmond)	126	5.4%

How to cite

BibTeX

@misc{saturn-nyc-housing-nyc-median-rent-by-tract-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: nyc housing nyc median rent by tract},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_rent_by_tract}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: nyc housing nyc median rent by tract. Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/nyc_housing-nyc_median_rent_by_tract