saturn·

nyc housing nyc tenure by tract

source /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_tenure_by_tract.csv · 2,327 rows · 10 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers 2,327 New York City census tracts in 10 columns: owner- and renter-occupied household counts and percentages, plus county identifiers. Brooklyn (Kings) leads with 805 tracts (34.6% of rows), followed by Queens (725) and the Bronx (361); Staten Island has just 126, which leaves 310 for Manhattan. Renting dominates: the mean tract-level share of renter-occupied households is 62.5% versus 37.5% owner-occupied (an unweighted average across tracts, not a citywide household share), and renter counts are right-skewed with a long tail up to 8,209 per tract. Worth a closer look: the strong skew in the raw household counts (owner_occupied skew 1.76, renter_occupied skew 1.59) and the ~4% null rate in the percentage columns, whose 96 nulls match the 96 tracts with zero total households. Note that 'state' is constant (all 36, the FIPS code for New York) and can be ignored.

citing: row_count · column_count · county_name.top_values · county_name.top_rate · pct_owner_occupied.mean · pct_renter_occupied.mean · owner_occupied.skew · renter_occupied.skew · renter_occupied.max · pct_owner_occupied.null_rate · state.n_unique
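
The 62.5% figure is an unweighted mean of tract-level percentages. A household-weighted share can be recovered from the reported column means, because all three count columns cover the same 2,327 rows. The sketch below only redoes that arithmetic with numbers quoted in this report; the variable names are illustrative.

```python
# Column means as reported in the per-column statistics of this profile.
mean_total = 1410.7    # total_households mean
mean_owner = 464.6     # owner_occupied mean
mean_renter = 946.1    # renter_occupied mean

# A ratio of means equals a ratio of sums when every column has the same n.
weighted_renter_share = 100 * mean_renter / mean_total
weighted_owner_share = 100 * mean_owner / mean_total

print(f"household-weighted renter share: {weighted_renter_share:.1f}%")
print(f"household-weighted owner share:  {weighted_owner_share:.1f}%")
```

The weighted renter share comes out a few points above the unweighted 62.5%, which would be consistent with larger tracts leaning more toward renting.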

Schema

10 columns
Per-column summary.

column               type         nulls  unique  alerts
total_households     numeric      0.0%   1,495
owner_occupied       numeric      0.0%   1,001   outliers
renter_occupied      numeric      0.0%   1,418
NAME                 text         0.0%   2,327   near_unique
state                numeric      0.0%   1       constant
county               numeric      0.0%   5
tract                numeric      0.0%   1,530   high_skew
county_name          categorical  0.0%   5
pct_owner_occupied   numeric      4.1%   823
pct_renter_occupied  numeric      4.1%   823

total_households

numeric feature
Household counts per census tract, ranging from 0 to 8,209 with a median of 1,252 and a mean of 1,410.7. The distribution is right-skewed (skew 1.48, kurtosis 4.38) with 70 high-end outliers (3.0%); 4.1% of rows (96 tracts) are zero, which may indicate uninhabited or unreported areas. Treatment: Consider a log1p transform before regression to tame the right skew; log1p stays defined at the zeros but does not remove the zero inflation, so flag or model empty tracts separately if they matter. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 1,495
min: 0 · max: 8,209 · mean: 1,411 · median: 1,252 · std: 923.3
q1: 773.5 · q3: 1,850 · iqr: 1,076 · skew: 1.479 · kurtosis: 4.377
n_outliers: 70 · outlier_rate: 0.03008 · zero_rate: 0.04125
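
A minimal sketch of the log1p treatment suggested for total_households, using only the standard library. The sample values are hypothetical and merely lie within the reported 0 to 8,209 range.

```python
import math

# Hypothetical tract-level household counts, including one empty tract.
total_households = [0, 774, 1252, 1850, 8209]

# log1p(x) = ln(1 + x): defined at zero and compresses the long right tail.
transformed = [math.log1p(x) for x in total_households]

# expm1 inverts the transform, e.g. to report predictions in household units.
restored = [round(math.expm1(t)) for t in transformed]

for raw, t in zip(total_households, transformed):
    print(f"{raw:>5} -> {t:.3f}")
```

The empty tract maps to exactly 0.0, so zeros remain a visible spike after the transform and may still need their own indicator.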

owner_occupied

numeric feature outliers
Despite the boolean-sounding name, owner_occupied is an integer count ranging from 0 to 3,052, with 1,001 distinct values and a mean of 464.6 versus a median of 371: a per-tract tally of owner-occupied households rather than a flag. The distribution is right-skewed (skew 1.76, kurtosis 4.25) with 143 outliers (6.1%) and 7.2% exact zeros. No nulls are present. Treatment: Apply log1p (a plain log is undefined at the zeros) or winsorize before modelling to tame the right tail. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 1,001
min: 0 · max: 3,052 · mean: 464.6 · median: 371 · std: 422.6
q1: 177 · q3: 608 · iqr: 431 · skew: 1.761 · kurtosis: 4.254
n_outliers: 143 · outlier_rate: 0.06145 · zero_rate: 0.0722
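
The report does not state which rule produced the outlier count. The sketch below assumes the common Tukey 1.5 × IQR fences and caps values at the upper fence, a simple clipping variant of winsorizing (percentile-based winsorizing would work similarly). The quartiles are the ones reported above; the sample values and function names are hypothetical.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip(values, lower, upper):
    """Cap each value to the [lower, upper] range."""
    return [min(max(v, lower), upper) for v in values]


# Quartiles reported for owner_occupied in this profile.
lower, upper = tukey_fences(q1=177, q3=608)

# Counts cannot be negative, so the lower fence is floored at zero.
lower = max(lower, 0)

sample = [0, 177, 371, 608, 3052]  # hypothetical values within the reported range
clipped = clip(sample, lower, upper)
print(upper)
print(clipped)
```

Under this assumption the upper fence is 608 + 1.5 × 431 = 1254.5, so the reported maximum of 3,052 would be capped at that value.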

renter_occupied

numeric feature
This column reports the count of renter-occupied households per tract, ranging from 0 to 8,209 with a mean of 946 and a median of 726. The distribution is right-skewed (skew 1.59, kurtosis 4.63) with 4.4% zeros and 69 outliers (2.97%) in the upper tail. There are no nulls, and 1,418 unique values across 2,327 rows point to a per-tract aggregate count rather than a per-unit flag. Treatment: Apply log1p (which stays defined at the zeros) before regression to tame the right skew. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 1,418
min: 0 · max: 8,209 · mean: 946.1 · median: 726 · std: 815.4
q1: 346 · q3: 1,357 · iqr: 1,011 · skew: 1.595 · kurtosis: 4.627
n_outliers: 69 · outlier_rate: 0.02965 · zero_rate: 0.04383
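
To check whether log1p actually helps, skewness can be compared before and after the transform. The sketch below uses the population (biased) skewness formula, which may differ from the estimator this profile uses, and a hypothetical sample rather than the real column.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the 1.5 power of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


renter_occupied = [0, 120, 346, 726, 1357, 2500, 8209]  # hypothetical sample
raw_skew = skewness(renter_occupied)
log_skew = skewness([math.log1p(v) for v in renter_occupied])

print(f"raw skew:   {raw_skew:.2f}")
print(f"log1p skew: {log_skew:.2f}")
```

With zeros present, the transformed sample can overshoot into left skew, because log1p pulls the zero far from the rest of the values; that is worth checking on the real column before committing to the transform.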

NAME

text identifier near_unique
This column holds fully qualified Census tract names for New York City; all 2,327 rows are unique and non-null. Lengths fall between 38 and 46 characters, and every record contains the tokens 'new', 'york', 'census', 'tract', and 'county;'. The county tokens are skewed toward Kings (805) and Queens (725) over Bronx (361) and Richmond (126). Manhattan does not show up as a separate top word because its county name is New York, whose tokens occur in every record; the 310 rows not accounted for by the other four counties are consistent with New York County, though this is worth confirming against county_name. With n_unique == n, this is effectively a row identifier rather than a feature. Treatment: Treat as a row label; parse out the county token if a geographic feature is needed, otherwise drop from modelling. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 2,327
len_min: 38 · len_max: 46 · len_mean: 41.65 · len_median: 41 · len_p95: 46
word_mean: 7.133 · word_median: 7 · vocab_size: 1,539 · readability_flesch_mean: 91.45
n_empty: 0 · n_duplicates: 0 · duplicate_rate: 0
emoji_rate: 0 · url_rate: 0 · one_word_rate: 0 · allcaps_rate: 0 · boilerplate_rate: 0
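
A sketch of parsing the county out of NAME. It assumes the layout "Census Tract <number>; <County> County; New York", which is inferred from the 'county;' token and the 7-word median rather than confirmed by the report; the example strings and the function name are hypothetical.

```python
def parse_tract_name(name: str) -> dict:
    """Split a tract name into tract label, county, and state.

    Assumes three semicolon-separated parts; raises ValueError otherwise.
    """
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected NAME format: {name!r}")
    tract_label, county, state = parts
    return {
        "tract_label": tract_label.removeprefix("Census Tract ").strip(),
        "county": county.removesuffix(" County"),
        "state": state,
    }


example = "Census Tract 1; Kings County; New York"  # hypothetical value
parsed = parse_tract_name(example)
print(parsed)
```

The hypothetical example is 38 characters long, matching the reported len_min, which lends some support to the assumed layout. Note that Manhattan tracts would parse to the county "New York".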

state

numeric metadata constant
The 'state' column is numeric but holds the single value 36 (the FIPS code for New York State) in all 2,327 rows, so it has zero variance. It carries no information for analysis and is flagged constant. Treatment: Drop; constant column. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 1
min: 36 · max: 36 · mean: 36 · median: 36 · std: 0
q1: 36 · q3: 36 · iqr: 0 · skew: 0 · kurtosis: 0
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
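
Constant columns such as state can be detected generically before modelling. The following is a dependency-free sketch over rows held as dictionaries; the rows shown are hypothetical and include only three of the ten columns.

```python
def constant_columns(rows: list[dict]) -> list[str]:
    """Return names of columns whose value is identical in every row."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


# Hypothetical rows shaped like this dataset.
rows = [
    {"state": 36, "county": 5, "tract": 100},
    {"state": 36, "county": 47, "tract": 100},
    {"state": 36, "county": 81, "tract": 200},
]

to_drop = constant_columns(rows)
cleaned = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
print(to_drop)
```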

county

numeric feature
Stored as numeric but with only 5 distinct values across 2,327 rows (min 5, max 85, median 47), this is a categorical county code, consistent with the county FIPS codes of the five boroughs. The reported skew (-0.72) and mean (55) are arithmetic on arbitrary code numbers and carry no distributional meaning; they only reflect how many tracts fall under each code. No nulls or outliers are reported. Treatment: Cast to categorical and one-hot or target-encode rather than treating as continuous. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 5
min: 5 · max: 85 · mean: 55 · median: 47 · std: 25.97
q1: 47 · q3: 81 · iqr: 34 · skew: -0.72 · kurtosis: -0.4531
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
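
A sketch of treating county as a category. The mapping uses the county FIPS codes for the five boroughs; only 5, 47, 81, and 85 appear in the statistics above, so 61 (New York County) is an assumption, as is its tract count of 310, which is derived by subtracting the other four counts from 2,327. Function and key names are illustrative.

```python
# County FIPS codes for the five New York City counties (state FIPS 36).
COUNTY_FIPS = {5: "Bronx", 47: "Kings", 61: "New York", 81: "Queens", 85: "Richmond"}

# Tract counts per code; the 310 for code 61 is derived, not reported directly.
TRACTS_PER_COUNTY = {5: 361, 47: 805, 61: 310, 81: 725, 85: 126}


def one_hot(code: int) -> dict:
    """Encode a county code as five 0/1 indicator columns."""
    if code not in COUNTY_FIPS:
        raise KeyError(f"unknown county code: {code}")
    return {
        "county_" + name.lower().replace(" ", "_"): int(code == c)
        for c, name in COUNTY_FIPS.items()
    }


# The reported "mean" is just a tract-weighted average of the code numbers.
n_rows = sum(TRACTS_PER_COUNTY.values())
mean_code = sum(c * n for c, n in TRACTS_PER_COUNTY.items()) / n_rows

print(one_hot(47))
print(n_rows, mean_code)
```

Under these assumptions the weighted average of the codes reproduces the reported mean of 55, which supports reading the column as FIPS codes rather than a measurement.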

tract

numeric identifier high_skew
Census tract codes stored as integers, ranging from 100 to 990100 across 1,530 distinct values in 2,327 rows. The skew of 10.14 and kurtosis of 189.8 are artefacts of the tract numbering scheme rather than a real distribution: these are categorical identifiers, not measurements. The 63 outliers (2.7%) reflect tracts with unusually high numeric codes, not anomalous data. Because only 1,530 of the 2,327 values are distinct, tract codes repeat across counties and identify a row only in combination with county. Treatment: Treat as a categorical geographic key, joined with county; do not use as a numeric feature or apply transforms. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 1,530
min: 100 · max: 990,100 · mean: 4.225e+04 · median: 30,100 · std: 4.827e+04
q1: 15,200 · q3: 5.79e+04 · iqr: 4.27e+04 · skew: 10.14 · kurtosis: 189.8
n_outliers: 63 · outlier_rate: 0.02707 · zero_rate: 0
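
Since tract codes repeat across counties, a unique geographic key needs state, county, and tract together. The sketch below builds the standard 11-digit Census tract GEOID (2-digit state, 3-digit county, 6-digit tract), restoring the leading zeros lost by integer storage. The specific county and tract pairings are hypothetical.

```python
def tract_geoid(state: int, county: int, tract: int) -> str:
    """Build an 11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return f"{state:02d}{county:03d}{tract:06d}"


# The same tract code in two counties yields two distinct keys.
bronx_key = tract_geoid(36, 5, 100)
kings_key = tract_geoid(36, 47, 100)
high_key = tract_geoid(36, 85, 990100)

print(bronx_key, kings_key, high_key)
```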

county_name

categorical feature
This column lists New York City borough/county names across 2327 rows, with all 5 NYC boroughs represented and no nulls. Distribution is fairly even (entropy ratio 0.898), though Brooklyn (Kings) leads at 34.6% (805) and Staten Island (Richmond) trails at 126. The parenthetical county names suggest the source schema uses formal county labels rather than borough-only naming. Treatment: one-hot or target-encode for modelling. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 0 (0.0%) · unique: 5 · cardinality: 5
top_value: Brooklyn (Kings) · top_rate: 0.3459
entropy: 2.086 · entropy_ratio: 0.8985
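
The entropy figures can be reproduced from the per-county tract counts, assuming Shannon entropy in bits normalised by log2 of the number of categories (the report does not state the base). The New York County count of 310 is derived as 2,327 minus the four reported counts, and the short county labels stand in for the actual column values.

```python
import math

# Tract counts per county; the New York County figure is derived, not reported.
counts = {"Kings": 805, "Queens": 725, "Bronx": 361, "New York": 310, "Richmond": 126}

n = sum(counts.values())
probs = [c / n for c in counts.values()]

# Shannon entropy in bits, and its ratio to the maximum for five categories.
entropy_bits = -sum(p * math.log2(p) for p in probs)
entropy_ratio = entropy_bits / math.log2(len(counts))

print(f"entropy: {entropy_bits:.3f} bits, ratio: {entropy_ratio:.4f}")
```

Under these assumptions the result agrees with the reported entropy of 2.086 and ratio of 0.8985 to the precision shown.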

pct_owner_occupied

numeric feature
Numeric column on a 0–100 scale (min 0.0, max 100.0) capturing the percentage of owner-occupied households per tract. The distribution is wide and fairly flat (std 25.65, kurtosis -0.85) with mean 37.51 just above median 34.4 and a broad IQR from 16.4 to 56.1, so owners are a minority in most tracts. About 4.13% of rows (96) are null, the same count as tracts with zero total_households, which suggests the nulls are undefined 0/0 shares rather than missing measurements. A further 3.23% of the non-null values are exactly zero and may represent fully rental tracts worth flagging. Treatment: No transform is needed given the mild skew (0.39) and absence of outliers; rather than imputing, consider dropping or flagging the 96 null rows, since an imputed share for an empty tract has no meaning. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 96 (4.1%) · unique: 823
min: 0 · max: 100 · mean: 37.51 · median: 34.4 · std: 25.65
q1: 16.4 · q3: 56.1 · iqr: 39.7 · skew: 0.3948 · kurtosis: -0.854
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0.03227
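
The null count (96) equals the number of tracts with zero total_households (zero_rate 0.04125 × 2,327), which points to division by zero as the source. The sketch below assumes the percentage was derived as 100 × owner_occupied / total_households and rounded to one decimal; the report does not confirm this, and the input pairs are hypothetical.

```python
from typing import Optional


def pct_owner(owner: int, total: int) -> Optional[float]:
    """Owner-occupied share in percent, or None when the tract has no households."""
    if total == 0:
        return None  # 0/0 is undefined; keep it missing rather than impute
    return round(100 * owner / total, 1)


tracts = [(371, 1252), (0, 900), (0, 0)]  # hypothetical (owner, total) pairs
shares = [pct_owner(o, t) for o, t in tracts]
print(shares)
```

The second tract yields a genuine 0.0 (households exist, none owner-occupied) while the third yields a missing value, which is why zeros and nulls in this column should be handled differently.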

pct_renter_occupied

numeric feature
Numeric share bounded between 0 and 100 (mean 62.49, median 65.6): the percentage of renter-occupied households in each tract and the complement of pct_owner_occupied (the two means sum to 100, and std, IQR, and kurtosis are identical with the skew sign flipped). The distribution is wide (std 25.65, IQR 39.7) and slightly left-skewed (skew -0.39, kurtosis -0.85), so values cluster toward the high end with a tail of owner-dominated tracts. About 4.13% of rows are null and only 0.27% of the non-null values are exact zeros; no outliers were flagged given the natural 0–100 bounds. Treatment: Keep only one of the two percentage columns in a model to avoid perfect collinearity. The ~4% nulls match the count of zero-household tracts, so flag or drop them rather than impute; optionally rescale to 0–1 before modelling. high · anthropic:claude-opus-4-7
n: 2,327 · nulls: 96 (4.1%) · unique: 823
min: 0 · max: 100 · mean: 62.49 · median: 65.6 · std: 25.65
q1: 43.9 · q3: 83.6 · iqr: 39.7 · skew: -0.3948 · kurtosis: -0.854
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0.002689
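
The summary statistics of the two percentage columns mirror each other, which suggests each row sums to 100. A sketch of checking that row by row and rescaling the retained column to 0–1 follows; the tolerance allows for one-decimal rounding, and the row values and function name are hypothetical.

```python
import math


def is_complement(pct_owner, pct_renter, tol=0.11):
    """True when both shares are missing, or when they sum to 100 within tol."""
    if pct_owner is None or pct_renter is None:
        return pct_owner is None and pct_renter is None
    return math.isclose(pct_owner + pct_renter, 100.0, abs_tol=tol)


rows = [(34.4, 65.6), (0.0, 100.0), (None, None)]  # hypothetical (owner, renter) pairs
all_complementary = all(is_complement(o, r) for o, r in rows)

# Rescale the retained column to the 0-1 range, leaving missing values untouched.
renter_share = [None if r is None else r / 100 for _, r in rows]

print(all_complementary)
print(renter_share)
```

If the check holds on the real data, one of the two columns is redundant and can be dropped before fitting a linear model.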