saturn

/home/coolhand/html/datavis/data_trove/cache/housing_units.parquet 3,222 rows sample n=3,222 seed 42 2026-05-01T17:03:19+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/housing_units.parquet
Total rows: 3,222
Profiled sample: 3,222
Columns: 6
Generated: 2026-05-01T17:03:19+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset covers 3,222 U.S. counties with housing-unit counts (owner-occupied, renter-occupied, total) plus a FIPS code, county name, and the percent of renters. The three count columns are extremely right-skewed (skew between 9.5 and 15.8, kurtosis above 140) with 13–14% of rows flagged as outliers — a handful of huge urban counties (max total_housing_units of about 3.36M vs a median of roughly 10,021) dominate the distribution. The pct_renter field is far better behaved, centered near 26% with a much tighter spread, making it the most useful comparable metric across counties. Start by inspecting the long tail of total_housing_units, then use pct_renter to compare counties on a normalized basis.
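The two-step inspection suggested above can be sketched as follows. This is a minimal illustration, not saturn's own workflow: the four rows are hypothetical placeholders (not values from the dataset), and `top_by` is a helper name introduced here.

```python
from heapq import nlargest

# Hypothetical rows for illustration only; these are NOT values from the dataset.
rows = [
    {"county_name": "County A", "total_housing_units": 3_000_000, "pct_renter": 54.0},
    {"county_name": "County B", "total_housing_units": 10_000, "pct_renter": 26.0},
    {"county_name": "County C", "total_housing_units": 4_000, "pct_renter": 61.5},
    {"county_name": "County D", "total_housing_units": 26_000, "pct_renter": 18.2},
]

def top_by(rows, column, k=2):
    """Return the names of the k rows with the largest value in `column`."""
    return [r["county_name"] for r in nlargest(k, rows, key=lambda r: r[column])]

largest = top_by(rows, "total_housing_units")  # long tail by raw size
most_renters = top_by(rows, "pct_renter")      # comparison on a normalized basis
```

Ranking by raw counts surfaces the biggest geographies, while ranking by pct_renter can surface small areas that a size-based view would bury.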

fips high anthropic:claude-opus-4-7

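This column is the county FIPS code: all 3,222 rows are unique with no nulls. Values span 1001 to 72153, consistent with state-prefixed county FIPS identifiers stored as integers, so leading zeros are dropped (01001 appears as 1001). The 72 prefix at the top of the range indicates that Puerto Rico is included, which fits a row count of 3,222 counties and county equivalents. The codes are spread fairly evenly across that range (skew 0.157, kurtosis -0.63, no outliers), though shape statistics carry little meaning for an identifier.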
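Because the codes are profiled as integers, the two-digit state prefix is only visible after zero-padding to five digits. A minimal sketch, assuming the standard layout of a county FIPS code (two-digit state code followed by a three-digit county code); `split_fips` is a hypothetical helper name:

```python
def split_fips(code: int) -> tuple[str, str]:
    """Zero-pad an integer county FIPS code to 5 digits and split it
    into the 2-digit state prefix and the 3-digit county code."""
    if not 0 < code <= 99_999:
        raise ValueError(f"not a 5-digit county FIPS code: {code}")
    padded = f"{code:05d}"
    return padded[:2], padded[2:]

low_state, low_county = split_fips(1001)     # reported minimum
high_state, high_county = split_fips(72153)  # reported maximum
```

Here the reported minimum resolves to state prefix 01 (Alabama) and the reported maximum to state prefix 72 (Puerto Rico). Keeping the padded string form is usually safer for joins against other county-level tables.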

county_name high anthropic:claude-opus-4-7

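This column holds fully qualified U.S. county names (e.g. 'X County, State'), with 3,222 rows, all unique, and zero nulls. The token 'county,' appears 2,999 times, so roughly 223 rows use a different administrative suffix (for example parish, borough, or census area). The most frequent state-name tokens are Texas (256), Virginia (189), and Georgia (159). These appear to be token counts rather than per-state row counts: 'texas' also matches Texas County rows in other states (Texas itself has 254 counties) and 'virginia' also matches West Virginia rows, so the first two figures overstate the number of rows in those states.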
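Token frequencies can conflate a state with a county that shares its name, so per-state counts are more reliably taken from the part after the last comma. A minimal sketch using the first 10 sample values listed in the county_name profile; `split_name` is a hypothetical helper, and the 'X County, State' layout is assumed to hold for every row:

```python
from collections import Counter

# First 10 sample values from the county_name profile.
names = [
    "Bibb County, Alabama", "Cheatham County, Tennessee", "Piute County, Utah",
    "Lamb County, Texas", "Martin County, Minnesota", "Sheridan County, Wyoming",
    "Chickasaw County, Mississippi", "Rockingham County, Virginia",
    "Liberty County, Texas", "Clark County, Arkansas",
]

def split_name(full_name):
    """Split 'X County, State' into (area, state) on the last comma."""
    area, state = full_name.rsplit(", ", 1)
    return area, state

# Count rows per state from the parsed state part, not from raw tokens.
states = Counter(split_name(n)[1] for n in names)

# Rows whose area part does not end in "County" (parish, borough, ...).
non_county = [n for n in names if not split_name(n)[0].endswith(" County")]
```

With this parse, a name such as "Texas County, Missouri" is attributed to Missouri rather than to Texas.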

total_housing_units high anthropic:claude-opus-4-7

Counts of total housing units per record, almost certainly at a county or similar geographic level given 3,222 rows with 3,074 unique values and no nulls. The distribution is severely right-skewed (skew 12.05, kurtosis 240.5) with a median of 10,021 but a max of 3,363,093, and 443 rows (13.7%) flagged as outliers beyond 1.5 IQR, well above the Q3 of 25,939. The mean of 39,403 sits far above the median, confirming a long heavy tail driven by a few very large geographies.
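The outlier count depends on where the 1.5 IQR fences fall. A minimal sketch, assuming saturn uses the conventional Tukey rule (Q1 - 1.5·IQR and Q3 + 1.5·IQR), which the report's "beyond 1.5 IQR" label suggests but does not spell out; `tukey_fences` is a hypothetical helper name:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for total_housing_units.
lower, upper = tukey_fences(4_211, 25_939)

# Share of rows flagged, from the reported counts.
outlier_rate = 443 / 3_222
```

Under that assumption the upper fence is 58,531 and the lower fence is negative, so with a minimum of 32 every flagged row would sit on the high side.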

owner_occupied high anthropic:claude-opus-4-7

This appears to be a count of owner-occupied housing units per geographic area, with 3,001 unique values across 3,222 rows, no nulls, and a single zero (zero_rate 0.0003, about 1 row in 3,222, matching the minimum of 0). The distribution is severely right-skewed (skew 9.52, kurtosis 146.9): the median is 7,325.5 but the mean is 25,551.7 and the max reaches 1,552,164, producing 429 outliers (13.3% outlier rate). The interquartile range (3,147.75 to 18,863.5) is dwarfed by the standard deviation of 67,553, indicating a long tail of large jurisdictions.
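A very small rate is easier to read as a row count. A minimal sketch converting the reported zero_rate back to rows; `rate_to_count` is a hypothetical helper name, and the result is only as precise as the three significant digits the report prints:

```python
def rate_to_count(rate: float, n_rows: int) -> int:
    """Convert a reported rate back to the nearest whole number of rows."""
    return round(rate * n_rows)

# zero_rate and row count as reported for owner_occupied.
zero_rows = rate_to_count(3.10e-04, 3_222)
```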

renter_occupied high anthropic:claude-opus-4-7

Counts of renter-occupied housing units per record, ranging from 28 to 1,810,929 with a median of 2,579.5 — consistent with a geographic rollup (likely county or similar). The distribution is extremely right-skewed (skew 15.82, kurtosis 398.15) and 13.9% of rows fall outside the IQR fences, reflecting a few very large metros dominating a long tail of small areas. No nulls or zeros, and 2,709 unique values across 3,222 rows.
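To make the skew figure concrete, the sketch below computes the moment-based sample skewness g1 = m3 / m2^1.5. This is one common definition; the report does not say which estimator saturn uses (a bias-corrected variant would differ slightly), and both input lists are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # hypothetical: balanced around the mean
long_tail = [1, 1, 2, 2, 3, 100]   # hypothetical: one very large value

skew_symmetric = skewness(symmetric)
skew_long_tail = skewness(long_tail)
```

A single very large value is enough to push the statistic well above zero, which is the pattern a few large metros produce in the count columns.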

pct_renter high anthropic:claude-opus-4-7

This is a numeric feature representing the percentage of renters per record, ranging from 3.01 to 100.0 with a mean of 27.35 and median of 26.07. The distribution is right-skewed (skew 1.32, kurtosis 4.41) with 88 outliers (2.7%). Most are presumably on the high end: a small set of records, likely dense urban areas, with renter shares far above the typical 21.64 to 31.66 IQR. The minimum of 3.01 also falls below the lower 1.5 IQR fence of about 6.61, so at least one outlier is on the low end. No nulls or zeros, and 1,925 unique values across 3,222 rows indicate well-populated continuous data.
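The report does not state how pct_renter was derived. The reported column means add up exactly (25,552 owner-occupied + 13,851 renter-occupied = 39,403 total), which suggests the total is the sum of the two occupancy counts, and a row with zero owner-occupied units would then yield the reported maximum of 100.0. A minimal sketch under that assumption; the function name and the example inputs are hypothetical:

```python
def pct_renter(owner_occupied: int, renter_occupied: int) -> float:
    """Renter share of occupied units, in percent.

    Assumes the denominator is owner_occupied + renter_occupied; the report
    does not confirm how the source column was computed.
    """
    occupied = owner_occupied + renter_occupied
    if occupied == 0:
        raise ValueError("no occupied units, so the share is undefined")
    return 100.0 * renter_occupied / occupied

all_renters = pct_renter(0, 28)   # hypothetical row with no owner-occupied units
one_quarter = pct_renter(75, 25)  # hypothetical row
```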

Numeric correlation

fips numeric

rows: 3,222
null: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 31,378
median: 30,022
std: 16,300
q1: 19,030
q3: 46,104
iqr: 27,075
skew: 0.157
kurtosis: -0.631
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_name text

100.0% of rows are unique strings
rows: 3,222
null: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.324
len_median: 24.000
len_p95: 31.000
word_mean: 3.248
word_median: 3.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 1,990
readability_flesch_mean: 10.284
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Bibb County, Alabama
  2. Cheatham County, Tennessee
  3. Piute County, Utah
  4. Lamb County, Texas
  5. Martin County, Minnesota
  6. Sheridan County, Wyoming
  7. Chickasaw County, Mississippi
  8. Rockingham County, Virginia
  9. Liberty County, Texas
  10. Clark County, Arkansas

total_housing_units numeric

skew=+12.05; 13.7% of rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 3,074
min: 32.000
max: 3,363,093
mean: 39,403
median: 10,021
std: 120,103
q1: 4,211
q3: 25,939
iqr: 21,728
skew: 12.048
kurtosis: 240.507
n_outliers: 443
outlier_rate: 0.137
zero_rate: 0.000

owner_occupied numeric

skew=+9.52; 13.3% of rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 3,001
min: 0.000
max: 1,552,164
mean: 25,552
median: 7,326
std: 67,553
q1: 3,148
q3: 18,864
iqr: 15,716
skew: 9.516
kurtosis: 146.904
n_outliers: 429
outlier_rate: 0.133
zero_rate: 3.10e-04

renter_occupied numeric

skew=+15.82; 13.9% of rows beyond 1.5 IQR
rows: 3,222
null: 0 (0.0%)
unique: 2,709
min: 28.000
max: 1,810,929
mean: 13,851
median: 2,580
std: 55,352
q1: 1,004
q3: 7,396
iqr: 6,392
skew: 15.822
kurtosis: 398.150
n_outliers: 449
outlier_rate: 0.139
zero_rate: 0.000

pct_renter numeric

rows: 3,222
null: 0 (0.0%)
unique: 1,925
min: 3.010
max: 100.000
mean: 27.349
median: 26.070
std: 8.564
q1: 21.640
q3: 31.657
iqr: 10.017
skew: 1.317
kurtosis: 4.412
n_outliers: 88
outlier_rate: 0.027
zero_rate: 0.000