saturn

/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv 2,327 rows sample n=2,327 seed 42 2026-05-01T17:09:55+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_rent_by_tract.csv
Total rows: 2,327
Profiled sample: 2,327
Columns: 6
Generated: 2026-05-01T17:09:55+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 2,327 New York City census tracts with median gross rent values across the five boroughs. The most important issue is median_gross_rent: its minimum of -666,666,666 and mean of about -41.5 million indicate sentinel values for missing data, which must be filtered before any analysis. The reported median of $1,735 and IQR of $1,441.5–$2,049 are much closer to realistic figures, although they were computed with the sentinel rows included and should shift upward slightly once those rows are removed. The county_name field covers all five boroughs, with Brooklyn (Kings) the largest at 805 tracts (34.6%) and Staten Island (Richmond) the smallest at 126 (5.4%). 'state' is constant (36 in every row, the FIPS code for New York) and can be ignored, and 'NAME' is a unique tract label rather than an analytical field.

median_gross_rent high anthropic:claude-opus-4-7

Median gross rent per census tract, with a typical value around $1,735 (IQR $1,441.5–$2,049). The column is contaminated by sentinel values: the minimum of -666,666,666 drags the mean to -41,539,608.82 and inflates the standard deviation to about 1.6e8, producing a skew of -3.62 and flagging 12.4% of rows as outliers. Once the sentinels are removed, the remaining distribution looks tight and plausible for US rents, and the maximum of $3,501 suggests the values are capped at that level.
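The filtering step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not saturn's own method: the helper name `clean_rents` and the sample values are hypothetical, and the only figure taken from the report is the sentinel value itself.

```python
from statistics import median, quantiles

SENTINEL = -666_666_666  # minimum reported for median_gross_rent


def clean_rents(values, sentinel=SENTINEL):
    """Return the rents with sentinel-coded rows dropped."""
    return [v for v in values if v != sentinel]


# Hypothetical rents for illustration only; these are not rows from the file.
sample = [1200, 1441, 1735, SENTINEL, 2049, 2500, SENTINEL, 3501]
cleaned = clean_rents(sample)

raw_median = median(sample)        # computed with sentinel rows included
clean_median = median(cleaned)     # computed on valid rents only
q1, _, q3 = quantiles(cleaned, n=4, method="inclusive")

print(f"median with sentinels: {raw_median}, without: {clean_median}")
print(f"cleaned quartiles: q1={q1}, q3={q3}")
```

Because every sentinel sits below every valid rent, dropping them can only move the median and quartiles upward, which is why the profiled figures are best treated as approximate until the column is cleaned.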

NAME high anthropic:claude-opus-4-7

This column holds the fully qualified names of New York City census tracts, one per row (e.g. 'Census Tract ...; Kings County; New York'). All 2,327 values are unique, with zero nulls and tightly bounded length (38–46 characters, mean 41.6, about 7 words each). The top words confirm the five NYC boroughs: Kings (805), Queens (725), Bronx (361), Richmond (126), with Manhattan/New York making up the remainder. It is effectively a row identifier rather than a modelling feature.
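Since NAME follows a fixed 'Census Tract N; X County; State' pattern in the sample values, it can be split into its parts if the tract number or county is needed separately. The sketch below assumes every row follows that pattern; `parse_name` is a hypothetical helper, not part of saturn.

```python
TRACT_PREFIX = "Census Tract "
COUNTY_SUFFIX = " County"


def parse_name(name):
    """Split 'Census Tract N; X County; State' into its three fields."""
    tract_part, county_part, state = [p.strip() for p in name.split(";")]
    if not tract_part.startswith(TRACT_PREFIX):
        raise ValueError(f"unexpected tract field: {tract_part!r}")
    if not county_part.endswith(COUNTY_SUFFIX):
        raise ValueError(f"unexpected county field: {county_part!r}")
    return {
        "tract": tract_part[len(TRACT_PREFIX):],
        "county": county_part[: -len(COUNTY_SUFFIX)],
        "state": state,
    }


# Sample value 2 from the NAME column of this report.
parsed = parse_name("Census Tract 399.01; Queens County; New York")
print(parsed)
```

Raising on an unexpected shape, rather than guessing, keeps malformed rows visible instead of silently producing wrong county names.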

state high anthropic:claude-opus-4-7

The column 'state' is numeric but holds the single value 36 across all 2327 rows, with zero variance and zero nulls. This is a constant field carrying no information for modelling, likely a leftover state code from an upstream filter or partition.

county high anthropic:claude-opus-4-7

This column holds numeric county codes (likely FIPS-style identifiers), with only 5 unique values across 2,327 rows and no nulls. Although typed as numeric, the values (5, 47, 61, 81 and 85, consistent with the FIPS codes of the five NYC counties) are categorical labels, so the reported mean of 55.0 and std of 25.97 are not meaningful. The distribution is concentrated in the upper end (Q1 and median 47, Q3 81), giving a negative skew of -0.72.
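One way to see that the mean of this column is an artifact is to rebuild it from the county_name counts reported further down. The sketch assumes the five codes are the standard FIPS county codes for the NYC counties (Bronx 5, Kings 47, New York 61, Queens 81, Richmond 85); the report itself only shows the minimum, quartiles and maximum.

```python
# Assumed FIPS county codes for the five NYC counties.
COUNTY_FIPS = {"Bronx": 5, "Kings": 47, "New York": 61, "Queens": 81, "Richmond": 85}

# Tract counts per county, taken from the county_name top values in this report.
TRACTS_PER_COUNTY = {"Kings": 805, "Queens": 725, "Bronx": 361, "New York": 310, "Richmond": 126}

total_rows = sum(TRACTS_PER_COUNTY.values())
mean_code = sum(COUNTY_FIPS[c] * n for c, n in TRACTS_PER_COUNTY.items()) / total_rows

# Store the codes as zero-padded strings so they are treated as categories.
county_categories = {code: f"{code:03d}" for code in COUNTY_FIPS.values()}

print(total_rows, mean_code, county_categories)
```

Under that assumption the count-weighted average of the codes matches the reported mean of 55.0, which simply reflects how many tracts each county has rather than any quantity worth modelling.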

tract high anthropic:claude-opus-4-7

This is almost certainly a U.S. Census tract code rather than a true numeric measurement, with 1530 unique values across 2327 rows and no nulls. The distribution is severely right-skewed (skew 10.14, kurtosis 189.8) with a max of 990100 sitting far above the median of 30100, which is expected behavior for tract identifiers and triggered the high_skew alert. The 63 flagged outliers (2.7%) reflect tract-numbering conventions, not data errors.
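If the column is indeed a Census tract code, it is more useful as part of a join key than as a number. The sketch below assumes the usual Census layout, which the report does not confirm: a 6-digit tract code with two implied decimal places, combined with a 2-digit state and 3-digit county code into an 11-character GEOID. Both function names are hypothetical.

```python
def make_geoid(state, county, tract):
    """Build an 11-character tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return f"{state:02d}{county:03d}{tract:06d}"


def tract_label(tract):
    """Render an integer tract code the way it appears in NAME (e.g. 39901 -> '399.01')."""
    whole, frac = divmod(tract, 100)
    return f"{whole}" if frac == 0 else f"{whole}.{frac:02d}"


# State 36 and the column min/max are from this report; county 81 is the
# assumed FIPS code for Queens.
print(make_geoid(36, 81, 39901))   # key for a tract labelled 399.01 in Queens
print(tract_label(100))            # smallest code in the column
print(tract_label(990100))         # largest code in the column
```

Read this way, the maximum of 990100 corresponds to a tract numbered 9901, so the extreme skew comes from the numbering scheme rather than from any measured quantity.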

county_name high anthropic:claude-opus-4-7

This column records NYC borough/county names across 2,327 rows with no nulls and only 5 distinct values, matching the five boroughs of New York City. The distribution is uneven but balanced enough to be informative: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126), giving a high entropy_ratio of 0.898. Notably, three of the five labels embed the legal county name in parentheses (e.g., 'Brooklyn (Kings)'), so they will need normalization before joining to standard county tables.
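The normalization mentioned above could look like the following sketch, which splits each label into a borough name and a legal county name. It assumes that a label without parentheses (Queens, Bronx) uses the same name for both; `split_label` is a hypothetical helper.

```python
import re

# Borough name, optionally followed by the county name in parentheses.
LABEL_RE = re.compile(r"^(?P<borough>[^()]+?)\s*(?:\((?P<county>[^()]+)\))?$")


def split_label(label):
    """Return (borough, county) for labels like 'Brooklyn (Kings)' or 'Queens'."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised county_name label: {label!r}")
    borough = match.group("borough").strip()
    # Assumption: with no parenthetical, borough and county share one name.
    county = (match.group("county") or borough).strip()
    return borough, county


labels = ["Brooklyn (Kings)", "Queens", "Bronx",
          "Manhattan (New York)", "Staten Island (Richmond)"]
normalized = {label: split_label(label) for label in labels}
print(normalized)
```

Keeping both names makes it possible to join on the county name while still displaying the more familiar borough name.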

Numeric correlation

median_gross_rent numeric

skew=-3.62; 12.4% of rows beyond 1.5 IQR
rows: 2,327
null: 0 (0.0%)
unique: 1,232
min: -666,666,666
max: 3,501
mean: -41,539,609
median: 1,735
std: 161,182,639
q1: 1,442
q3: 2,049
iqr: 607.500
skew: -3.621
kurtosis: 11.115
n_outliers: 289
outlier_rate: 0.124
zero_rate: 0.000
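The summary statistics are enough to estimate how many rows carry the sentinel. The sketch below assumes that every invalid row uses the single value -666,666,666 and that every valid rent lies between 0 and the reported maximum; it then searches for the sentinel counts that make the implied mean of the valid rows plausible. The mean is the two-decimal figure quoted in the median_gross_rent insight, and the function name is hypothetical.

```python
ROWS = 2_327
MEAN = -41_539_608.82     # mean quoted in the median_gross_rent insight
SENTINEL = -666_666_666   # reported minimum
MAX_RENT = 3_501          # reported maximum


def plausible_sentinel_counts(rows, mean, sentinel, max_valid):
    """Return (k, implied mean of valid rows) for every feasible sentinel count k."""
    total = mean * rows
    hits = []
    for k in range(rows):
        valid_mean = (total - k * sentinel) / (rows - k)
        if 0 <= valid_mean <= max_valid:
            hits.append((k, valid_mean))
    return hits


candidates = plausible_sentinel_counts(ROWS, MEAN, SENTINEL, MAX_RENT)
for k, valid_mean in candidates:
    print(f"{k} sentinel rows -> implied mean of valid rents ~ {valid_mean:,.0f}")
```

Under these assumptions the search leaves a single candidate of 145 sentinel rows, about 6.2% of the file, with an implied mean of roughly $1,832 for the valid rents. That would account for only about half of the 289 flagged outliers, so the rest are presumably genuine rents outside the 1.5 IQR fences.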

NAME text

100.0% of rows are unique strings
rows: 2,327
null: 0 (0.0%)
unique: 2,327
len_min: 38
len_max: 46
len_mean: 41.649
len_median: 41.000
len_p95: 46.000
word_mean: 7.133
word_median: 7.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,539
readability_flesch_mean: 91.451
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Census Tract 4; Bronx County; New York
  2. Census Tract 399.01; Queens County; New York
  3. Census Tract 779.08; Queens County; New York
  4. Census Tract 613.02; Queens County; New York
  5. Census Tract 780; Kings County; New York
  6. Census Tract 156.02; Richmond County; New York
  7. Census Tract 848; Kings County; New York
  8. Census Tract 1008.04; Queens County; New York
  9. Census Tract 618; Queens County; New York
  10. Census Tract 145; Bronx County; New York

state numeric

only one distinct value
rows: 2,327
null: 0 (0.0%)
unique: 1
min: 36.000
max: 36.000
mean: 36.000
median: 36.000
std: 0.000
q1: 36.000
q3: 36.000
iqr: 0.000
skew: 0.000
kurtosis: 0.000
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county numeric

rows: 2,327
null: 0 (0.0%)
unique: 5
min: 5.000
max: 85.000
mean: 55.000
median: 47.000
std: 25.969
q1: 47.000
q3: 81.000
iqr: 34.000
skew: -0.720
kurtosis: -0.453
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

tract numeric

skew=+10.14
rows: 2,327
null: 0 (0.0%)
unique: 1,530
min: 100.000
max: 990,100
mean: 42,252
median: 30,100
std: 48,265
q1: 15,200
q3: 57,900
iqr: 42,700
skew: 10.143
kurtosis: 189.824
n_outliers: 63
outlier_rate: 0.027
zero_rate: 0.000

county_name categorical

rows: 2,327
null: 0 (0.0%)
unique: 5
top_value: Brooklyn (Kings)
top_rate: 0.346
cardinality: 5
entropy: 2.086
entropy_ratio: 0.898
Top values (rank 1–20)
  1. Brooklyn (Kings) — 805
  2. Queens — 725
  3. Bronx — 361
  4. Manhattan (New York) — 310
  5. Staten Island (Richmond) — 126
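The top_rate, entropy and entropy_ratio figures can be recomputed from the five counts above. The sketch assumes Shannon entropy in bits, normalized by the log2 of the cardinality; the report does not state saturn's exact formula, but the reported values are consistent with that reading.

```python
from math import log2

# Counts from the "Top values" list above.
counts = {
    "Brooklyn (Kings)": 805,
    "Queens": 725,
    "Bronx": 361,
    "Manhattan (New York)": 310,
    "Staten Island (Richmond)": 126,
}

total = sum(counts.values())
probs = [n / total for n in counts.values()]

top_rate = max(probs)                         # share of the most common label
entropy = -sum(p * log2(p) for p in probs)    # Shannon entropy in bits
entropy_ratio = entropy / log2(len(counts))   # 1.0 would mean perfectly even

print(f"top_rate={top_rate:.3f} entropy={entropy:.3f} entropy_ratio={entropy_ratio:.3f}")
```

A ratio close to 0.9 means the boroughs are far from evenly sized but no single one dominates, which fits the narrative's description of the column as uneven yet informative.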