saturn·

nyc housing nyc median rent by tract

source /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv · 2,327 rows · 6 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 2,327 New York City census tracts with median gross rent values across the five boroughs. The most important issue is median_gross_rent: its minimum of -666,666,666 and mean of about -41.5 million point to sentinel values standing in for missing data, which must be filtered before any analysis. The median of $1,735 and IQR of $1,441.5–$2,049 are far less affected and give the realistic range, although they were computed with the sentinel rows included and can only stay the same or rise once those rows are removed. The county_name field covers all five boroughs, with Brooklyn (Kings) the largest at 805 tracts (34.6%) and Staten Island the smallest at 126 (5.4%). The 'state' column is constant (36, the code for New York) and can be ignored, and 'NAME' is a unique tract label rather than an analytical field.

citing: median_gross_rent.stats.min · median_gross_rent.stats.mean · median_gross_rent.stats.median · median_gross_rent.stats.q1 · median_gross_rent.stats.q3 · median_gross_rent.alerts · county_name.top_values · county_name.stats.top_rate · county_name.stats.cardinality · state.stats.min · state.stats.max · row_count
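
The filtering step called for in the summary can be sketched as follows. This is a minimal illustration using only the Python standard library, not the profiler's own code. The column name median_gross_rent and the sentinel value come from this report; the function name load_rents and the three sample rows are hypothetical.

```python
import csv
import io
import statistics

SENTINEL = -666666666  # minimum reported for median_gross_rent in this profile


def load_rents(csv_file):
    """Return the median_gross_rent column with the sentinel mapped to None."""
    rents = []
    for row in csv.DictReader(csv_file):
        value = float(row["median_gross_rent"])
        rents.append(None if value == SENTINEL else value)
    return rents


# Illustrative rows only; they are not taken from the real file.
sample = io.StringIO(
    "median_gross_rent,county_name\n"
    "1500,Queens\n"
    "-666666666,Bronx\n"
    "2100,Brooklyn (Kings)\n"
)

rents = load_rents(sample)
valid = [r for r in rents if r is not None]
print("sentinel rows:", len(rents) - len(valid))
print("median of remaining rows:", statistics.median(valid))
```

To run it against the real file, pass an open file handle for the source path listed above instead of the in-memory sample.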

Schema

6 columns
column · type · nulls · unique · alerts
median_gross_rent · numeric · 0.0% · 1,232 · high_skew, outliers
NAME · text · 0.0% · 2,327 · near_unique
state · numeric · 0.0% · 1 · constant
county · numeric · 0.0% · 5 · (none)
tract · numeric · 0.0% · 1,530 · high_skew
county_name · categorical · 0.0% · 5 · (none)

median_gross_rent

numeric feature high_skew outliers
Median gross rent per tract, with a typical value around $1,735 (IQR $1,441.5–$2,049). The column is contaminated by sentinel values: the minimum of -666,666,666 drags the mean to -41,539,608.82 and inflates the standard deviation to 1.6e8, producing a skew of -3.62 and contributing to the 12.4% of rows flagged as outliers. Once sentinels are removed, the remaining distribution looks tight and plausible for US rents, with values capped at $3,501. Treatment: replace the -666666666 sentinel with null, then consider winsorizing or log-transforming before modelling. high · anthropic:claude-opus-4-7
n
2,327
nulls
0 (0.0%)
unique
1,232
min
-6.667e+08
max
3,501
mean
-4.154e+07
median
1,735
std
1.612e+08
q1
1,441.5
q3
2,049
iqr
607.5
skew
-3.621
kurtosis
11.11
n_outliers
289
outlier_rate
0.1242
zero_rate
0
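
The reported statistics are enough for a rough estimate of how many rows carry the sentinel. The sketch below assumes that every contaminated row holds exactly -666666666 and that all other values lie between 0 and the reported max of 3,501, so the real rents contribute almost nothing to the sum. The variable names are illustrative; the result is an estimate derived from rounded summary figures, not a count read from the file.

```python
# Figures reported above for median_gross_rent.
n_rows = 2327
mean_all = -41539608.82
sentinel = -666666666

# Sum of the column, recovered from the mean.
total = n_rows * mean_all

# Real rents are at most a few thousand dollars, so the sum is dominated
# by the sentinel rows; rounding absorbs the small remainder.
n_sentinel = round(total / sentinel)

# Mean of the remaining rows implied by that count.
clean_mean = (total - n_sentinel * sentinel) / (n_rows - n_sentinel)

print("estimated sentinel rows:", n_sentinel)
print("share of rows:", round(n_sentinel / n_rows, 4))
print("implied mean of remaining rows:", round(clean_mean, 1))
```

If the estimate holds, the sentinel rows are only part of the 289 flagged outliers; the rest would be genuine values that the profiler's outlier rule (not stated in this report) treats as extreme.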

NAME

text identifier near_unique
This column holds fully qualified names of New York City census tracts, one per row (e.g. 'Census Tract ...; Kings County; New York'). All 2,327 values are unique, with zero nulls and a tightly bounded length (38–46 characters, mean 41.6, about 7 words each). The most frequent words confirm the five NYC counties: Kings (805), Queens (725), Bronx (361) and Richmond (126), with New York County (Manhattan) accounting for the remaining 310. It is effectively a row identifier rather than a modelling feature. Treatment: drop from modelling; retain as a join key, or parse out the county and tract components if geography is needed. high · anthropic:claude-opus-4-7
n
2,327
nulls
0 (0.0%)
unique
2,327
len_min
38
len_max
46
len_mean
41.65
len_median
41
len_p95
46
word_mean
7.133
word_median
7
n_empty
0
n_duplicates
0
duplicate_rate
0
vocab_size
1,539
readability_flesch_mean
91.45
emoji_rate
0
url_rate
0
one_word_rate
0
allcaps_rate
0
boilerplate_rate
0
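
Parsing the components out of NAME, as the treatment suggests, could look like the sketch below. It assumes the three-part 'Census Tract ...; <county> County; <state>' layout shown in the example above. The function name parse_tract_name and the tract number 1.01 in the usage line are hypothetical.

```python
def parse_tract_name(name):
    """Split a tract NAME into tract label, county and state.

    Assumes three semicolon-separated parts, as in the example quoted above.
    """
    parts = [part.strip() for part in name.split(";")]
    if len(parts) != 3:
        raise ValueError("unexpected NAME layout: %r" % name)
    tract_label, county, state = parts

    prefix = "Census Tract "
    if tract_label.startswith(prefix):
        tract_label = tract_label[len(prefix):]

    suffix = " County"
    if county.endswith(suffix):
        county = county[:-len(suffix)]

    return {"tract_label": tract_label, "county": county, "state": state}


# Hypothetical value in the documented layout.
parsed = parse_tract_name("Census Tract 1.01; Kings County; New York")
print(parsed)
```

Raising on an unexpected layout, rather than guessing, keeps malformed names from silently producing wrong county assignments.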

state

numeric metadata constant
The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and zero nulls. This is a constant field carrying no information for modelling, likely a leftover state code from an upstream filter or partition. Treatment: Drop; constant column provides no signal. high · anthropic:claude-opus-4-7
n
2,327
nulls
0 (0.0%)
unique
1
min
36
max
36
mean
36
median
36
std
0
q1
36
q3
36
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0

county

numeric identifier
This column holds numeric county codes (likely FIPS-style identifiers), with only 5 unique values across 2,327 rows and no nulls. Despite the numeric type, the values 5, 47, 61, 81 and 85 are categorical labels, so the reported mean of 55.0, std of 25.97 and skew of -0.72 are not meaningful; they only reflect how many tracts fall under each code (median 47, Q3 81). If these are FIPS codes, storing them as integers has also dropped the leading zeros of the three-digit form (005, 047 and so on). Treatment: cast to categorical and one-hot or target-encode; do not treat as a continuous feature. high · anthropic:claude-opus-4-7
n
2,327
nulls
0 (0.0%)
unique
5
min
5
max
85
mean
55
median
47
std
25.97
q1
47
q3
81
iqr
34
skew
-0.72
kurtosis
-0.4531
n_outliers
0
outlier_rate
0
zero_rate
0
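
A short sketch of the categorical treatment, assuming the codes are the standard county FIPS codes for the five boroughs. The pairing of each code with a tract count relies on the county_name counts reported elsewhere in this profile and is an assumption, as are the names BOROUGH_BY_FIPS and county_label. The last lines check that the reported mean of 55 is simply the count-weighted average of the codes.

```python
# Assumed mapping from county FIPS code to borough.
BOROUGH_BY_FIPS = {
    5: "Bronx",
    47: "Brooklyn",
    61: "Manhattan",
    81: "Queens",
    85: "Staten Island",
}

# Tract counts per borough as reported in the county_name section.
TRACTS_BY_FIPS = {5: 361, 47: 805, 61: 310, 81: 725, 85: 126}


def county_label(code):
    """Return the zero-padded three-digit code and the borough name."""
    code = int(code)
    return "%03d" % code, BOROUGH_BY_FIPS[code]


print(county_label(5))
print(county_label(47))

# The "mean" of the codes is an artefact of how many tracts share each code.
total_tracts = sum(TRACTS_BY_FIPS.values())
weighted_mean = sum(c * k for c, k in TRACTS_BY_FIPS.items()) / total_tracts
print(total_tracts, weighted_mean)
```

Because the weighted average reproduces the reported mean, the numeric statistics carry no information beyond the borough counts themselves.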

tract

numeric identifier high_skew
This is almost certainly a U.S. Census tract code rather than a true numeric measurement, with 1,530 unique values across 2,327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8), with a max of 990,100 sitting far above the median of 30,100; this is expected for tract identifiers and is what triggered the high_skew alert. The 63 flagged outliers (2.7%) reflect tract-numbering conventions, not data errors. Because there are only 1,530 distinct codes for 2,327 rows, the same code recurs in more than one county, so tract identifies a row only in combination with county. Treatment: cast to string and treat as a categorical/geographic key together with county; do not use as a continuous numeric feature. high · anthropic:claude-opus-4-7
n
2,327
nulls
0 (0.0%)
unique
1,530
min
100
max
990,100
mean
4.225e+04
median
30,100
std
4.827e+04
q1
15,200
q3
5.79e+04
iqr
4.27e+04
skew
10.14
kurtosis
189.8
n_outliers
63
outlier_rate
0.02707
zero_rate
0
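
One common way to turn these codes into a usable geographic key is the 11-digit Census tract GEOID: a 2-digit state code, a 3-digit county code and a 6-digit tract code, each zero-padded. The sketch below assumes the state, county and tract columns follow that convention. The function name tract_geoid is hypothetical, and the pairing of tract 100 with counties 47 and 81 is for illustration only.

```python
def tract_geoid(state, county, tract):
    """Build an 11-digit tract GEOID from integer state, county and tract codes."""
    return "%02d%03d%06d" % (int(state), int(county), int(tract))


# The minimum tract code in this profile is 100; the county pairings are illustrative.
first = tract_geoid(36, 47, 100)
second = tract_geoid(36, 81, 100)

print(first)
print(second)
print("same tract code, distinct keys:", first != second)
```

Keeping the key as a string preserves the leading zeros that the integer columns have lost.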

county_name

categorical feature
This column records NYC borough/county names across 2,327 rows with no nulls and only 5 distinct values, matching the five boroughs of New York City. The distribution is uneven, but no borough is negligible: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310) and Staten Island (126), giving a high entropy_ratio of 0.8985. Three of the five labels embed parenthetical legal county names (e.g. 'Brooklyn (Kings)'), which will need normalization before joining to standard county tables. Treatment: one-hot or target-encode, after stripping the parenthetical county aliases for clean joins. high · anthropic:claude-opus-4-7
n
2,327
nulls
0 (0.0%)
unique
5
top_value
Brooklyn (Kings)
top_rate
0.3459
cardinality
5
entropy
2.086
entropy_ratio
0.8985
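
The entropy figures can be reproduced from the borough counts, and the alias stripping is a one-line regular expression. This sketch assumes the profiler reports Shannon entropy in bits and defines entropy_ratio as entropy divided by log2 of the cardinality, which is consistent with the numbers above but not stated in the report. The function names are hypothetical.

```python
import math
import re

# Borough counts as reported in this section.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan": 310,
    "Staten Island": 126,
}


def entropy_bits(counts):
    """Shannon entropy, in bits, of a categorical distribution given its counts."""
    total = sum(counts.values())
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


def strip_alias(label):
    """Remove a trailing parenthetical alias, e.g. 'Brooklyn (Kings)' -> 'Brooklyn'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", label)


entropy = entropy_bits(counts)
entropy_ratio = entropy / math.log2(len(counts))

print(round(entropy, 3), round(entropy_ratio, 4))
print(strip_alias("Brooklyn (Kings)"), "|", strip_alias("Queens"))
```

Labels without a parenthetical pass through unchanged, so the same function can be applied to the whole column.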