saturn

/home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv | 2,327 rows | sample n=2,327 | seed 42 | 2026-05-01T17:10:42+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/urban/nyc_housing/nyc_median_income_by_tract.csv
Total rows: 2,327
Profiled sample: 2,327
Columns: 6
Generated: 2026-05-01T17:10:42+00:00

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 2,327 New York City census tracts with median household income, geographic identifiers (state, county, tract), and tract names. The headline issue is median_household_income: its minimum of -666,666,666 drags the mean to about -36 million, which indicates a sentinel (missing-value) code that must be filtered out before any analysis. The median of $76,833 is the more trustworthy central value, though it is still computed with the sentinel rows included and may shift slightly upward once they are removed. County coverage is uneven, with Brooklyn (Kings) holding 34.6% of tracts and Staten Island only 126, so per-borough comparisons should be normalized. The state column is constant (36 = New York) and can be dropped.
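One way to normalize per-borough comparisons is to report each borough's share of tracts next to a median computed only over valid incomes. This is a minimal stdlib sketch, assuming the column names shown in this profile and treating any negative income (the -666,666,666 sentinel included) as missing; `borough_summary` and the sample rows are hypothetical, for illustration only.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def borough_summary(rows):
    """Tract count, share of all tracts, and median income per borough.

    `rows` is any iterable of dicts keyed by the profiled column names, for
    example a csv.DictReader over the source file. Negative incomes (the
    -666,666,666 sentinel included) are treated as missing.
    """
    counts = defaultdict(int)
    incomes = defaultdict(list)
    for row in rows:
        borough = row["county_name"]
        counts[borough] += 1
        income = int(float(row["median_household_income"]))
        if income >= 0:  # skip sentinel / negative codes
            incomes[borough].append(income)
    total = sum(counts.values())
    return {
        borough: {
            "tracts": n,
            "share": n / total,  # normalizes for uneven tract counts
            "valid": len(incomes[borough]),
            "median_income": median(incomes[borough]) if incomes[borough] else None,
        }
        for borough, n in counts.items()
    }


# Tiny hypothetical extract, for illustration only.
sample = io.StringIO(
    "county_name,median_household_income\n"
    "Bronx,40000\n"
    "Bronx,-666666666\n"
    "Queens,80000\n"
    "Queens,90000\n"
)
summary = borough_summary(csv.DictReader(sample))
print(summary["Bronx"])
# {'tracts': 2, 'share': 0.5, 'valid': 1, 'median_income': 40000}
```

Reporting `valid` alongside `tracts` makes it visible when a borough's median rests on noticeably fewer usable rows than its tract count suggests.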

median_household_income high anthropic:claude-opus-4-7

Likely U.S. median household income in dollars, with a median of 76,833 and an IQR spanning 53,242.5 to 102,359.5. The minimum of -666,666,666 is a sentinel null code that poisons the mean (-36,017,397.46) and standard deviation (150,923,371.88), and 208 rows (8.94%) are flagged as outliers. The skew of -3.94 and kurtosis of 13.53 are driven by that sentinel rather than by the real income distribution.
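The "1.5 IQR" outlier rule in the profile appears to be the usual Tukey fence. The sketch below assumes that rule and treats every negative income as missing. With the profiled quartiles, any negative value (the sentinel included) falls below the lower fence, while tracts above roughly $176,035 are flagged at the top, so the 208 outliers need not all be sentinel rows. `clean_quartiles` uses the standard library's "inclusive" (linear interpolation) method. saturn's quantile method is not stated, so recomputed values may differ slightly; the helper names are hypothetical.

```python
from statistics import quantiles


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_invalid(incomes):
    """Treat negative incomes (including the -666,666,666 sentinel) as missing."""
    return [v for v in incomes if v >= 0]


def clean_quartiles(incomes):
    """Q1, median, Q3 of the valid incomes, via linear interpolation."""
    return quantiles(drop_invalid(incomes), n=4, method="inclusive")


# Fences implied by the profiled quartiles (Q1 = 53,242.5, Q3 = 102,359.5).
lower, upper = tukey_fences(53_242.5, 102_359.5)
print(lower, upper)  # -20433.0 176035.0
```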

NAME high anthropic:claude-opus-4-7

This column holds fully qualified Census tract names for New York City; all 2,327 values are unique and non-null. Lengths fall between 38 and 46 characters, and every record contains the words 'new', 'york', 'census', 'tract', and 'county;', confirming a rigid template ('Census Tract <number>; <name> County; New York'). Apart from the tract number, the only variation is the county token (Kings 805, Queens 725, Bronx 361, Richmond 126; Manhattan's 310 rows read 'New York County', which overlaps the constant 'New York' suffix). It is effectively a row identifier, not a feature.
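Because NAME follows a fixed template, its parts can be pulled apart with a regular expression. The pattern below is inferred from the ten sample values; it is an assumption, not a documented format. `parse_name` is a hypothetical helper that fails loudly on any row that does not match.

```python
import re

# Template inferred from the sample values, e.g.
# "Census Tract 399.01; Queens County; New York".
NAME_RE = re.compile(
    r"^Census Tract (?P<tract>\d+(?:\.\d+)?); (?P<county>.+) County; (?P<state>.+)$"
)


def parse_name(name):
    """Split a NAME value into (tract number, county name, state name)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected NAME format: {name!r}")
    return match.group("tract"), match.group("county"), match.group("state")


print(parse_name("Census Tract 399.01; Queens County; New York"))
# ('399.01', 'Queens', 'New York')
```

Counting rows that raise `ValueError` is a cheap way to confirm that the template really holds across the whole file.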

state high anthropic:claude-opus-4-7

The column 'state' is numeric but holds the single value 36 across all 2,327 rows, with zero variance and no nulls. 36 is the FIPS code for New York State, so the column only records that this is a single-state extract; it carries no information for modelling.
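Zero-variance columns like this one can be found and dropped generically. The sketch below is stdlib-only and assumes rows arrive as dicts (as from `csv.DictReader`); the function names are illustrative, not part of saturn.

```python
def constant_columns(rows):
    """Names of columns whose value is identical in every row (e.g. state == 36)."""
    rows = list(rows)
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


def drop_columns(rows, cols):
    """Return copies of the rows without the given columns."""
    drop = set(cols)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


rows = [{"state": "36", "county": "5"}, {"state": "36", "county": "47"}]
print(drop_columns(rows, constant_columns(rows)))
# [{'county': '5'}, {'county': '47'}]
```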

county high anthropic:claude-opus-4-7

Despite being typed as numeric, `county` has only 5 unique values across 2,327 rows (5, 47, 61, 81, 85) and no nulls. These are the FIPS county codes for the Bronx (005), Kings (047), New York (061), Queens (081), and Richmond (085), not a measured quantity. The distribution is left-skewed (skew -0.72) with the median at 47 and Q1 also at 47, meaning at least a quarter of rows share that single code. Treating the mean (55.0) or std (25.97) as meaningful would be misleading given the categorical nature.
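The FIPS reading can be checked against the county_name counts: mapping each county to its code and expanding by tract count should reproduce the profiled mean, median, and standard deviation. In the sketch below, the county-to-FIPS mapping is standard Census coding rather than something stated in the profile. saturn appears to report the sample (n - 1) standard deviation, since that is the variant that matches 25.969.

```python
from statistics import mean, median, stdev

# Standard county FIPS codes for the five NYC counties (state FIPS 36).
FIPS = {"Bronx": 5, "Kings": 47, "New York": 61, "Queens": 81, "Richmond": 85}

# Tract counts per county, taken from the county_name profile.
COUNTS = {"Bronx": 361, "Kings": 805, "New York": 310, "Queens": 725, "Richmond": 126}

# Expand to one code per tract and recompute the profiled statistics.
codes = [FIPS[county] for county, n in COUNTS.items() for _ in range(n)]
print(len(codes), float(mean(codes)), median(codes), round(stdev(codes), 3))
# 2327 55.0 47 25.969
```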

tract high anthropic:claude-opus-4-7

Almost certainly U.S. Census tract codes stored as integers, with 1,530 distinct values across 2,327 rows and no nulls. Tract codes are unique only within a county, so the same code recurs across boroughs and a row key needs the (county, tract) pair. The distribution is severely right-skewed (skew 10.14, kurtosis 189.82) with a max of 990,100 sitting far above the Q3 of 57,900.5 and the median of 30,100, producing 63 outliers (2.7%); this is an artifact of tract numbering conventions, not a true numeric magnitude.
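This sketch assumes the column holds the standard six-digit Census tract code with its leading zeros lost to integer storage: a 4-digit base plus a 2-digit suffix, so 'Census Tract 399.01' would be 39901 and 'Census Tract 4' would be 400. Under that assumption, the code can be turned back into the display form used in NAME and combined with state and county into the 11-digit tract GEOID, which is unique nationwide. The helper names are hypothetical.

```python
def tract_label(tract_code: int) -> str:
    """Six-digit tract code stored as an int -> display number used in NAME."""
    base, suffix = divmod(tract_code, 100)
    return str(base) if suffix == 0 else f"{base}.{suffix:02d}"


def tract_geoid(state: int, county: int, tract_code: int) -> str:
    """11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return f"{state:02d}{county:03d}{tract_code:06d}"


print(tract_label(39901), tract_geoid(36, 81, 39901))  # 399.01 36081039901
```

Joining on the GEOID, or on the (county, tract) pair, avoids collisions caused by tract codes being reused across the five counties.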

county_name high anthropic:claude-opus-4-7

This column lists the NYC borough (county) for each record, with all 5 expected values present across 2,327 rows and no nulls. The distribution roughly tracks borough population: Brooklyn (Kings) leads at 805 (top_rate 0.346), followed by Queens (725), Bronx (361), Manhattan (310), and Staten Island (126). An entropy of 2.086 bits against a maximum of log2(5) ≈ 2.32 bits gives the entropy ratio of 0.898, so the categories are fairly evenly spread rather than dominated by one value.
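The entropy figures can be reproduced from the five counts. This sketch assumes saturn uses Shannon entropy in bits and divides by log2 of the cardinality to get entropy_ratio. That assumption is consistent with the reported 2.086 and 0.898 but is not documented in the profile.

```python
from math import log2


def shannon_entropy_bits(counts):
    """Shannon entropy, in bits, of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


# Tract counts per borough, from the top-values list.
counts = [805, 725, 361, 310, 126]
h = shannon_entropy_bits(counts)
ratio = h / log2(len(counts))  # 1.0 would mean a perfectly even split
print(f"entropy={h:.3f} bits, ratio={ratio:.3f}")
```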

Numeric correlation (figure not reproduced in this export)

median_household_income numeric

skew=-3.94; 8.9% of rows beyond 1.5 × IQR
rows: 2,327
null: 0 (0.0%)
unique: 2,106
min: -666,666,666
max: 250,001
mean: -36,017,397
median: 76,833
std: 150,923,372
q1: 53,242
q3: 102,360
iqr: 49,117
skew: -3.940
kurtosis: 13.525
n_outliers: 208
outlier_rate: 0.089
zero_rate: 0.000

NAME text

100.0% of rows are unique strings
rows: 2,327
null: 0 (0.0%)
unique: 2,327
len_min: 38
len_max: 46
len_mean: 41.649
len_median: 41.000
len_p95: 46.000
word_mean: 7.133
word_median: 7.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,539
readability_flesch_mean: 91.451
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Census Tract 4; Bronx County; New York
  2. Census Tract 399.01; Queens County; New York
  3. Census Tract 779.08; Queens County; New York
  4. Census Tract 613.02; Queens County; New York
  5. Census Tract 780; Kings County; New York
  6. Census Tract 156.02; Richmond County; New York
  7. Census Tract 848; Kings County; New York
  8. Census Tract 1008.04; Queens County; New York
  9. Census Tract 618; Queens County; New York
  10. Census Tract 145; Bronx County; New York

state numeric

only one distinct value
rows: 2,327
null: 0 (0.0%)
unique: 1
min: 36.000
max: 36.000
mean: 36.000
median: 36.000
std: 0.000
q1: 36.000
q3: 36.000
iqr: 0.000
skew: 0.000
kurtosis: 0.000
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county numeric

rows: 2,327
null: 0 (0.0%)
unique: 5
min: 5.000
max: 85.000
mean: 55.000
median: 47.000
std: 25.969
q1: 47.000
q3: 81.000
iqr: 34.000
skew: -0.720
kurtosis: -0.453
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

tract numeric

skew=+10.14
rows: 2,327
null: 0 (0.0%)
unique: 1,530
min: 100.000
max: 990,100
mean: 42,252
median: 30,100
std: 48,265
q1: 15,200
q3: 57,900
iqr: 42,700
skew: 10.143
kurtosis: 189.824
n_outliers: 63
outlier_rate: 0.027
zero_rate: 0.000

county_name categorical

rows: 2,327
null: 0 (0.0%)
unique: 5
top_value: Brooklyn (Kings)
top_rate: 0.346
cardinality: 5
entropy: 2.086
entropy_ratio: 0.898
Top values (rank 1–20)
  1. Brooklyn (Kings) — 805
  2. Queens — 725
  3. Bronx — 361
  4. Manhattan (New York) — 310
  5. Staten Island (Richmond) — 126