housing units
Reading
This dataset covers 3,222 U.S. counties and county equivalents (including Puerto Rico's municipios), with housing-unit counts (owner-occupied, renter-occupied, total) plus a FIPS code, county name, and the percent of renters. The three count columns are extremely right-skewed (skew between 9.5 and 15.8, kurtosis above 140), with 13–14% of rows flagged as outliers: a handful of huge urban counties (max total_housing_units of about 3.36M vs a median of 10,021) dominate the distribution. The pct_renter field is far better behaved, centered near 26% with a much tighter spread, which makes it the most useful metric for comparing counties. Start by inspecting the long tail of total_housing_units, then use pct_renter to compare counties on a normalized basis.
citing: owner_occupied.stats · renter_occupied.stats · total_housing_units.stats · pct_renter.stats · fips.stats · county_name.top_words · row_count
Charts the summary said to look at first
Histogram: total_housing_units (40 equal-width bins; rows per bin)
| bin | count |
|---|---|
| 32 – 8.411e+04 | 2907 |
| 8.411e+04 – 1.682e+05 | 153 |
| 1.682e+05 – 2.523e+05 | 62 |
| 2.523e+05 – 3.363e+05 | 38 |
| 3.363e+05 – 4.204e+05 | 22 |
| 4.204e+05 – 5.045e+05 | 6 |
| 5.045e+05 – 5.886e+05 | 11 |
| 5.886e+05 – 6.726e+05 | 5 |
| 6.726e+05 – 7.567e+05 | 5 |
| 7.567e+05 – 8.408e+05 | 3 |
| 8.408e+05 – 9.249e+05 | 1 |
| 9.249e+05 – 1.009e+06 | 3 |
| 1.009e+06 – 1.093e+06 | 1 |
| 1.093e+06 – 1.177e+06 | 1 |
| 1.177e+06 – 1.261e+06 | 0 |
| 1.261e+06 – 1.345e+06 | 0 |
| 1.345e+06 – 1.429e+06 | 0 |
| 1.429e+06 – 1.513e+06 | 0 |
| 1.513e+06 – 1.597e+06 | 0 |
| 1.597e+06 – 1.682e+06 | 1 |
| 1.682e+06 – 1.766e+06 | 1 |
| 1.766e+06 – 1.85e+06 | 0 |
| 1.85e+06 – 1.934e+06 | 0 |
| 1.934e+06 – 2.018e+06 | 0 |
| 2.018e+06 – 2.102e+06 | 1 |
| 2.102e+06 – 2.186e+06 | 0 |
| 2.186e+06 – 2.27e+06 | 0 |
| 2.27e+06 – 2.354e+06 | 0 |
| 2.354e+06 – 2.438e+06 | 0 |
| 2.438e+06 – 2.522e+06 | 0 |
| 2.522e+06 – 2.606e+06 | 0 |
| 2.606e+06 – 2.69e+06 | 0 |
| 2.69e+06 – 2.775e+06 | 0 |
| 2.775e+06 – 2.859e+06 | 0 |
| 2.859e+06 – 2.943e+06 | 0 |
| 2.943e+06 – 3.027e+06 | 0 |
| 3.027e+06 – 3.111e+06 | 0 |
| 3.111e+06 – 3.195e+06 | 0 |
| 3.195e+06 – 3.279e+06 | 0 |
| 3.279e+06 – 3.363e+06 | 1 |
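The long tail is easier to quantify from the bin counts directly. A minimal sketch in plain Python, assuming the 40 bins are equal-width and span the column's min (32) and max (3,363,093), as the edges in the table suggest; the counts are copied from the table.

```python
# Rows per bin, copied from the total_housing_units table above (40 equal-width bins).
COUNTS = [
    2907, 153, 62, 38, 22, 6, 11, 5, 5, 3,
    1, 3, 1, 1, 0, 0, 0, 0, 0, 1,
    1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
]
LO, HI = 32, 3_363_093           # column min and max from the stats below
WIDTH = (HI - LO) / len(COUNTS)  # about 84,077 housing units per bin

total_rows = sum(COUNTS)
first_bin_share = COUNTS[0] / total_rows

# Rows in bins whose lower edge is at or above 1,000,000 housing units.
# This is a lower bound: the bin straddling 1M may hold more such counties.
million_plus = sum(
    count for i, count in enumerate(COUNTS) if LO + i * WIDTH >= 1_000_000
)

print(f"rows: {total_rows}")
print(f"first bin (< {LO + WIDTH:,.0f} units): {first_bin_share:.1%}")
print(f"counties in bins starting at >= 1M units: {million_plus}")
```

About 90% of counties fall in the first bin, while six sit in bins that start above one million units.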
Histogram: pct_renter (40 equal-width bins; rows per bin)
| bin | count |
|---|---|
| 3.01 – 5.435 | 1 |
| 5.435 – 7.859 | 3 |
| 7.859 – 10.28 | 9 |
| 10.28 – 12.71 | 26 |
| 12.71 – 15.13 | 63 |
| 15.13 – 17.56 | 156 |
| 17.56 – 19.98 | 316 |
| 19.98 – 22.41 | 371 |
| 22.41 – 24.83 | 450 |
| 24.83 – 27.26 | 419 |
| 27.26 – 29.68 | 357 |
| 29.68 – 32.11 | 301 |
| 32.11 – 34.53 | 203 |
| 34.53 – 36.96 | 169 |
| 36.96 – 39.38 | 115 |
| 39.38 – 41.81 | 75 |
| 41.81 – 44.23 | 56 |
| 44.23 – 46.66 | 45 |
| 46.66 – 49.08 | 25 |
| 49.08 – 51.5 | 15 |
| 51.5 – 53.93 | 11 |
| 53.93 – 56.35 | 10 |
| 56.35 – 58.78 | 8 |
| 58.78 – 61.2 | 4 |
| 61.2 – 63.63 | 4 |
| 63.63 – 66.05 | 1 |
| 66.05 – 68.48 | 1 |
| 68.48 – 70.9 | 3 |
| 70.9 – 73.33 | 1 |
| 73.33 – 75.75 | 1 |
| 75.75 – 78.18 | 0 |
| 78.18 – 80.6 | 1 |
| 80.6 – 83.03 | 0 |
| 83.03 – 85.45 | 1 |
| 85.45 – 87.88 | 0 |
| 87.88 – 90.3 | 0 |
| 90.3 – 92.73 | 0 |
| 92.73 – 95.15 | 0 |
| 95.15 – 97.58 | 0 |
| 97.58 – 100 | 1 |
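As a consistency check, the median can be estimated from the binned counts alone by grouped-median interpolation. A minimal sketch, assuming equal-width bins from 3.01 to 100 and rows spread evenly within the bin that contains the median; the counts are copied from the table.

```python
# Rows per bin, copied from the pct_renter table above (40 equal-width bins).
COUNTS = [
    1, 3, 9, 26, 63, 156, 316, 371, 450, 419,
    357, 301, 203, 169, 115, 75, 56, 45, 25, 15,
    11, 10, 8, 4, 4, 1, 1, 3, 1, 1,
    0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
]
LO, HI = 3.01, 100.0
WIDTH = (HI - LO) / len(COUNTS)


def grouped_median(counts, lo, width):
    """Median of binned data, interpolating linearly inside the median bin."""
    half = sum(counts) / 2
    if half == 0:
        raise ValueError("histogram is empty")
    cumulative = 0
    for i, count in enumerate(counts):
        if cumulative + count >= half:
            # Assume rows are spread evenly across the bin that contains the median.
            return lo + i * width + (half - cumulative) / count * width
        cumulative += count
    raise RuntimeError("unreachable for a non-empty histogram")


estimate = grouped_median(COUNTS, LO, WIDTH)
print(f"grouped median: {estimate:.2f}")  # compare with the reported median of 26.07
```

The estimate comes out at about 26.08, within a few hundredths of the reported 26.07, so the binned table and the summary statistics agree.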
Histogram: owner_occupied (40 equal-width bins; rows per bin)
| bin | count |
|---|---|
| 0 – 3.88e+04 | 2761 |
| 3.88e+04 – 7.761e+04 | 225 |
| 7.761e+04 – 1.164e+05 | 78 |
| 1.164e+05 – 1.552e+05 | 52 |
| 1.552e+05 – 1.94e+05 | 36 |
| 1.94e+05 – 2.328e+05 | 20 |
| 2.328e+05 – 2.716e+05 | 10 |
| 2.716e+05 – 3.104e+05 | 10 |
| 3.104e+05 – 3.492e+05 | 6 |
| 3.492e+05 – 3.88e+05 | 6 |
| 3.88e+05 – 4.268e+05 | 3 |
| 4.268e+05 – 4.656e+05 | 3 |
| 4.656e+05 – 5.045e+05 | 4 |
| 5.045e+05 – 5.433e+05 | 2 |
| 5.433e+05 – 5.821e+05 | 0 |
| 5.821e+05 – 6.209e+05 | 1 |
| 6.209e+05 – 6.597e+05 | 1 |
| 6.597e+05 – 6.985e+05 | 0 |
| 6.985e+05 – 7.373e+05 | 0 |
| 7.373e+05 – 7.761e+05 | 0 |
| 7.761e+05 – 8.149e+05 | 0 |
| 8.149e+05 – 8.537e+05 | 0 |
| 8.537e+05 – 8.925e+05 | 0 |
| 8.925e+05 – 9.313e+05 | 1 |
| 9.313e+05 – 9.701e+05 | 0 |
| 9.701e+05 – 1.009e+06 | 0 |
| 1.009e+06 – 1.048e+06 | 0 |
| 1.048e+06 – 1.087e+06 | 1 |
| 1.087e+06 – 1.125e+06 | 0 |
| 1.125e+06 – 1.164e+06 | 0 |
| 1.164e+06 – 1.203e+06 | 1 |
| 1.203e+06 – 1.242e+06 | 0 |
| 1.242e+06 – 1.281e+06 | 0 |
| 1.281e+06 – 1.319e+06 | 0 |
| 1.319e+06 – 1.358e+06 | 0 |
| 1.358e+06 – 1.397e+06 | 0 |
| 1.397e+06 – 1.436e+06 | 0 |
| 1.436e+06 – 1.475e+06 | 0 |
| 1.475e+06 – 1.513e+06 | 0 |
| 1.513e+06 – 1.552e+06 | 1 |
Histogram: renter_occupied (40 equal-width bins; rows per bin)
| bin | count |
|---|---|
| 28 – 4.53e+04 | 3019 |
| 4.53e+04 – 9.057e+04 | 109 |
| 9.057e+04 – 1.358e+05 | 38 |
| 1.358e+05 – 1.811e+05 | 17 |
| 1.811e+05 – 2.264e+05 | 11 |
| 2.264e+05 – 2.717e+05 | 9 |
| 2.717e+05 – 3.169e+05 | 5 |
| 3.169e+05 – 3.622e+05 | 0 |
| 3.622e+05 – 4.075e+05 | 2 |
| 4.075e+05 – 4.528e+05 | 2 |
| 4.528e+05 – 4.98e+05 | 3 |
| 4.98e+05 – 5.433e+05 | 1 |
| 5.433e+05 – 5.886e+05 | 1 |
| 5.886e+05 – 6.338e+05 | 1 |
| 6.338e+05 – 6.791e+05 | 0 |
| 6.791e+05 – 7.244e+05 | 1 |
| 7.244e+05 – 7.697e+05 | 1 |
| 7.697e+05 – 8.149e+05 | 0 |
| 8.149e+05 – 8.602e+05 | 0 |
| 8.602e+05 – 9.055e+05 | 1 |
| 9.055e+05 – 9.508e+05 | 0 |
| 9.508e+05 – 9.96e+05 | 0 |
| 9.96e+05 – 1.041e+06 | 0 |
| 1.041e+06 – 1.087e+06 | 0 |
| 1.087e+06 – 1.132e+06 | 0 |
| 1.132e+06 – 1.177e+06 | 0 |
| 1.177e+06 – 1.222e+06 | 0 |
| 1.222e+06 – 1.268e+06 | 0 |
| 1.268e+06 – 1.313e+06 | 0 |
| 1.313e+06 – 1.358e+06 | 0 |
| 1.358e+06 – 1.403e+06 | 0 |
| 1.403e+06 – 1.449e+06 | 0 |
| 1.449e+06 – 1.494e+06 | 0 |
| 1.494e+06 – 1.539e+06 | 0 |
| 1.539e+06 – 1.585e+06 | 0 |
| 1.585e+06 – 1.63e+06 | 0 |
| 1.63e+06 – 1.675e+06 | 0 |
| 1.675e+06 – 1.72e+06 | 0 |
| 1.72e+06 – 1.766e+06 | 0 |
| 1.766e+06 – 1.811e+06 | 1 |
Histogram: fips (40 equal-width bins; rows per bin)
| bin | count |
|---|---|
| 1001 – 2780 | 97 |
| 2780 – 4559 | 15 |
| 4559 – 6337 | 133 |
| 6337 – 8116 | 59 |
| 8116 – 9895 | 14 |
| 9895 – 1.167e+04 | 4 |
| 1.167e+04 – 1.345e+04 | 226 |
| 1.345e+04 – 1.523e+04 | 5 |
| 1.523e+04 – 1.701e+04 | 49 |
| 1.701e+04 – 1.879e+04 | 189 |
| 1.879e+04 – 2.057e+04 | 204 |
| 2.057e+04 – 2.235e+04 | 184 |
| 2.235e+04 – 2.413e+04 | 39 |
| 2.413e+04 – 2.59e+04 | 15 |
| 2.59e+04 – 2.768e+04 | 170 |
| 2.768e+04 – 2.946e+04 | 196 |
| 2.946e+04 – 3.124e+04 | 150 |
| 3.124e+04 – 3.302e+04 | 27 |
| 3.302e+04 – 3.48e+04 | 21 |
| 3.48e+04 – 3.658e+04 | 95 |
| 3.658e+04 – 3.836e+04 | 153 |
| 3.836e+04 – 4.013e+04 | 155 |
| 4.013e+04 – 4.191e+04 | 46 |
| 4.191e+04 – 4.369e+04 | 67 |
| 4.369e+04 – 4.547e+04 | 51 |
| 4.547e+04 – 4.725e+04 | 161 |
| 4.725e+04 – 4.903e+04 | 268 |
| 4.903e+04 – 5.081e+04 | 29 |
| 5.081e+04 – 5.259e+04 | 133 |
| 5.259e+04 – 5.436e+04 | 94 |
| 5.436e+04 – 5.614e+04 | 95 |
| 5.614e+04 – 5.792e+04 | 0 |
| 5.792e+04 – 5.97e+04 | 0 |
| 5.97e+04 – 6.148e+04 | 0 |
| 6.148e+04 – 6.326e+04 | 0 |
| 6.326e+04 – 6.504e+04 | 0 |
| 6.504e+04 – 6.682e+04 | 0 |
| 6.682e+04 – 6.86e+04 | 0 |
| 6.86e+04 – 7.037e+04 | 0 |
| 7.037e+04 – 7.215e+04 | 78 |
Schema
6 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| total_housing_units | numeric | 0.0% | 3,074 | high_skew, outliers |
| owner_occupied | numeric | 0.0% | 3,001 | high_skew, outliers |
| renter_occupied | numeric | 0.0% | 2,709 | high_skew, outliers |
| pct_renter | numeric | 0.0% | 1,925 | |
fips
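numeric · identifier
This column holds county FIPS codes: all 3,222 rows are unique and non-null. Values span 1001 to 72153, consistent with 5-digit codes (2-digit state prefix plus 3-digit county) stored as integers, so codes below 10000 have lost their leading zero. The 78 rows in the top histogram bin (70,370 to 72,150) carry Puerto Rico's state code 72, so the table covers Puerto Rico's municipios in addition to the 3,144 rows with state codes 01 to 56. The values show no tail (skew 0.157, kurtosis -0.63, no outliers), although the histogram is clumpy because codes cluster by state prefix; for an identifier these numeric statistics carry no real meaning. Treatment: treat as a categorical key, zero-padded to 5 characters, and left-join on it to county-level reference data rather than using it as a numeric feature.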
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
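Because the codes are stored as integers, a string join against reference tables keyed on 5-character codes would miss every code below 10000. A minimal sketch of restoring the standard form, assuming the reference side uses zero-padded 5-character strings; the function names are illustrative.

```python
def to_fips_code(fips: int) -> str:
    """Render an integer FIPS value as the standard 5-character string."""
    if not 1 <= fips <= 99_999:
        raise ValueError(f"not a 5-digit FIPS value: {fips}")
    return f"{fips:05d}"


def split_fips(fips: int) -> tuple[str, str]:
    """Split into the 2-digit state code and the 3-digit county code."""
    code = to_fips_code(fips)
    return code[:2], code[2:]


print(to_fips_code(1001))  # smallest value in the column -> 01001
print(split_fips(72153))   # largest value, state 72 (Puerto Rico) -> ('72', '153')
```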
county_name
text · identifier · near_unique
This column holds fully qualified U.S. county names (e.g. 'X County, State'): all 3,222 rows are unique and non-null. The token 'county,' appears 2,999 times, so about 223 rows use a different administrative suffix (e.g. parish, borough, census area, city, or municipio). The leading state tokens, Texas (256), Virginia (189), and Georgia (159), are word counts rather than state counts: 'virginia' also matches every West Virginia row, and 'texas' also matches county names that contain the word. Treatment: use as a join key after splitting into county and state components.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
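The split into county and state, and the reason word counts overstate some states, can be sketched as follows. The names are illustrative examples in the same 'X County, State' shape, not rows confirmed from this dataset, and the tokenization (whitespace split, lowercased) is an assumption about how the top-words counts were produced.

```python
from collections import Counter

# Illustrative names in the 'X County, State' shape (not taken from this dataset).
NAMES = [
    "Fairfax County, Virginia",
    "Kanawha County, West Virginia",
    "Harris County, Texas",
    "Texas County, Missouri",
    "Orleans Parish, Louisiana",
]


def split_county_name(name: str) -> tuple[str, str]:
    """Split 'County part, State' on the last comma."""
    if "," not in name:
        raise ValueError(f"no state component in {name!r}")
    county, state = name.rsplit(",", 1)
    return county.strip(), state.strip()


# Counting whole state components...
state_counts = Counter(split_county_name(n)[1] for n in NAMES)
# ...versus counting lowercased word tokens, as a top-words summary would.
token_counts = Counter(
    word.strip(",").lower() for n in NAMES for word in n.split()
)

print(state_counts["Virginia"], token_counts["virginia"])
print(state_counts["Texas"], token_counts["texas"])
```

Each state appears once as a state component, but twice as a token, which is the same inflation seen in the Texas and Virginia counts above.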
total_housing_units
numeric · feature · high_skew · outliers
Counts of total housing units per record, almost certainly at county level given 3,222 rows (3,074 unique values, no nulls). The distribution is severely right-skewed (skew 12.05, kurtosis 240.5): the median is 10,021 but the max is 3,363,093. Assuming the usual 1.5 × IQR fences, Q1 = 4,211 and Q3 = 25,939 put the upper fence near 58,500, and the 443 outlier rows (13.7%) lie above it; the lower fence is negative, so every outlier is on the high side. The mean of 39,402 sits far above the median, confirming a long heavy tail driven by a few very large counties. Treatment: log-transform before modelling to tame the heavy right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,074
- min: 32
- max: 3.363e+06
- mean: 3.94e+04
- median: 10,021
- std: 1.201e+05
- q1: 4211
- q3: 25,939
- iqr: 2.173e+04
- skew: 12.05
- kurtosis: 240.5
- n_outliers: 443
- outlier_rate: 0.1375
- zero_rate: 0
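The fence arithmetic behind the outlier count can be sketched as below, assuming the profiler uses the common 1.5 × IQR (Tukey) rule; the profile does not name its rule, and the quartiles it shows are rounded.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, lower: float, upper: float) -> bool:
    """A value is flagged when it falls outside the fences."""
    return value < lower or value > upper


# Quartiles as reported for total_housing_units.
lower, upper = tukey_fences(4211, 25_939)
print(f"lower fence: {lower:,.0f}")  # negative, so no low outliers are possible
print(f"upper fence: {upper:,.0f}")

print(is_outlier(10_021, lower, upper))     # the median
print(is_outlier(3_363_093, lower, upper))  # the maximum
```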
owner_occupied
numeric · feature · high_skew · outliers
This appears to be a count of owner-occupied housing units per county, with 3,001 unique values across 3,222 rows, no nulls, and exactly one zero (zero_rate 0.0003, i.e. 1 of 3,222 rows). The distribution is severely right-skewed (skew 9.52, kurtosis 146.9): the median is 7,325.5 but the mean is 25,551.7 and the max reaches 1,552,164, producing 429 outliers (13.3% outlier rate). The interquartile range (3,147.75 to 18,863.5, a width of about 15,716) is dwarfed by the standard deviation of 67,553, indicating a long tail of large jurisdictions. Treatment: log-transform before modelling to tame the heavy right tail, using log(1 + x) because of the zero row.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,001
- min: 0
- max: 1.552e+06
- mean: 2.555e+04
- median: 7326
- std: 6.755e+04
- q1: 3148
- q3: 1.886e+04
- iqr: 1.572e+04
- skew: 9.516
- kurtosis: 146.9
- n_outliers: 429
- outlier_rate: 0.1331
- zero_rate: 0.0003104
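Why log(1 + x) rather than plain log for this column, shown on the reported min, median, and max. A minimal sketch in plain Python; a real pipeline would apply the same function to every row.

```python
import math

# Reported summary points for owner_occupied.
POINTS = {"min": 0, "median": 7_325.5, "max": 1_552_164}

# math.log is undefined at zero, which is why plain log fails on this column.
try:
    math.log(POINTS["min"])
except ValueError as err:
    print(f"log(0) fails: {err}")

# log1p(x) = log(1 + x) maps 0 to 0 and is nearly log(x) for large counts.
transformed = {name: math.log1p(value) for name, value in POINTS.items()}
for name, value in transformed.items():
    print(f"{name:>6}: {value:.2f}")

raw_ratio = POINTS["max"] / POINTS["median"]
log_gap = transformed["max"] - transformed["median"]
print(f"max/median on the raw scale: {raw_ratio:.0f}x; gap on the log1p scale: {log_gap:.2f}")
```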
renter_occupied
numeric · feature · high_skew · outliers
Counts of renter-occupied housing units per record, ranging from 28 to 1,810,929 with a median of 2,579.5, consistent with a county-level rollup. The distribution is extremely right-skewed (skew 15.82, kurtosis 398.2), and 449 rows (13.9%) fall outside the IQR fences, reflecting a few very large metro counties at the end of a long tail of small areas. There are no nulls or zeros, and 2,709 unique values across 3,222 rows. Treatment: log-transform before modelling to tame the skew and heavy outliers.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2,709
- min: 28
- max: 1.811e+06
- mean: 1.385e+04
- median: 2580
- std: 5.535e+04
- q1: 1004
- q3: 7396
- iqr: 6,392
- skew: 15.82
- kurtosis: 398.2
- n_outliers: 449
- outlier_rate: 0.1394
- zero_rate: 0
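If total_housing_units is simply owner_occupied plus renter_occupied (an assumption the profile does not state), the column means must add up, because the mean of a sum is the sum of the means. A quick check against the reported, rounded figures:

```python
# Reported means. renter_occupied's mean is shown only as 1.385e+04,
# so its unrounded value lies somewhere in [13,845, 13,855].
OWNER_MEAN = 25_551.7
TOTAL_MEAN = 39_402
RENTER_MEAN_LOW, RENTER_MEAN_HIGH = 13_845, 13_855

# If total = owner + renter on every row, the total's mean must fall inside this interval.
low = OWNER_MEAN + RENTER_MEAN_LOW
high = OWNER_MEAN + RENTER_MEAN_HIGH
consistent = low <= TOTAL_MEAN <= high
print(f"owner + renter mean in [{low:,.1f}, {high:,.1f}]; total mean {TOTAL_MEAN:,}: {consistent}")
```

The check passes, which supports (but does not prove) the additive relationship; a row-level comparison on the raw table would settle it.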
pct_renter
numeric · feature
This is the percentage of renter-occupied units per record, ranging from 3.01 to 100.0 with a mean of 27.35 and a median of 26.07. The distribution is right-skewed (skew 1.32, kurtosis 4.41), with 88 outliers (2.7%), nearly all on the high end: with 1.5 × IQR fences at about 6.61 and 46.69, only a handful of rows such as the minimum of 3.01 can fall below the lower fence. The high outliers suggest a small set of records, likely dense urban areas, with renter shares far above the typical 21.64 to 31.66 interquartile range; the maximum of 100 likely corresponds to the single row with zero owner-occupied units. No nulls or zeros, and 1,925 unique values across 3,222 rows indicate well-populated continuous data. Treatment: use as-is, or apply a mild log transform or winsorize before regression to dampen the right skew.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,925
- min: 3.01
- max: 100
- mean: 27.35
- median: 26.07
- std: 8.564
- q1: 21.64
- q3: 31.66
- iqr: 10.02
- skew: 1.317
- kurtosis: 4.412
- n_outliers: 88
- outlier_rate: 0.02731
- zero_rate: 0
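Both ideas in the pct_renter note can be sketched together: recomputing the share from the count columns (assuming pct_renter = 100 × renter / (owner + renter), which the profile does not confirm) and winsorizing at 1.5 × IQR fences derived from the reported quartiles. Under that assumption, the one row with zero owner-occupied units yields exactly 100. Function names are illustrative.

```python
def pct_renter(owner: int, renter: int) -> float:
    """Hypothetical derivation: renter share of owner + renter units, in percent."""
    total = owner + renter
    if total == 0:
        raise ValueError("no occupied units")
    return 100 * renter / total


Q1, Q3 = 21.64, 31.66        # reported quartiles of pct_renter
IQR = Q3 - Q1
LOW_FENCE = Q1 - 1.5 * IQR   # about 6.61
HIGH_FENCE = Q3 + 1.5 * IQR  # about 46.69


def winsorize(value: float, low: float = LOW_FENCE, high: float = HIGH_FENCE) -> float:
    """Clip a value into [low, high]; values inside the fences pass through unchanged."""
    return min(max(value, low), high)


print(pct_renter(0, 500))           # a county with no owner-occupied units
for value in (3.01, 26.07, 100.0):  # reported min, median, max
    print(value, "->", round(winsorize(value), 2))
```

Winsorizing keeps every row but caps the high-renter counties at the upper fence, so a handful of extreme shares cannot dominate a regression fit.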