merged inequality master
Reading
This dataset profiles 3,222 U.S. counties and county equivalents across 28 columns of socioeconomic indicators, including poverty, rent burden, education, healthcare, and a composite inequality index. Two things stand out for closer inspection. First, rent_to_income_ratio shows extreme skew (53.98), with a max of 1200 against a median of 17.06. Its histogram places all but six non-null values below 35.95 and a single record in the top bin (1170–1200), which points to a data-entry anomaly or an isolated severe outlier worth investigating. Second, total_pop is also highly skewed (skew 13.36, max ~9.78M vs median 25,174), so any statistic aggregated across counties should be population-weighted. The composite_index and the *_score columns are well-behaved and centered near 50, making them good candidates for cross-county comparison. Texas (254 counties), Georgia (159), and Virginia (133) are the largest groups in the state distribution, though no state accounts for more than 7.9% of rows.
citing: rent_to_income_ratio · total_pop · composite_index · pct_poverty · state · pct_rent_burdened_30 · uninsured_rate
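The population-weighting recommendation in the summary can be sketched as follows. This is a minimal illustration in plain Python, not the report's own method; the function name `weighted_mean` and the three toy counties are hypothetical.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values, skipping pairs with a missing value."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in pairs) / total_weight

# Hypothetical toy rows: a poverty percentage and a population for three counties.
poverty = [30.0, 10.0, 20.0]
population = [1_000, 9_000, 10_000]

unweighted = sum(poverty) / len(poverty)       # treats every county equally
weighted = weighted_mean(poverty, population)  # treats every resident equally
```

In this toy case the small, high-poverty county pulls the unweighted figure above the weighted one, which is the distortion the summary warns about.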
Charts the summary said to look at first
composite_index (histogram)
| bin | count |
|---|---|
| 10.1 – 12.1 | 1 |
| 12.1 – 14.1 | 2 |
| 14.1 – 16.1 | 6 |
| 16.1 – 18.1 | 15 |
| 18.1 – 20.1 | 13 |
| 20.1 – 22.1 | 27 |
| 22.1 – 24.1 | 50 |
| 24.1 – 26.1 | 46 |
| 26.1 – 28.1 | 78 |
| 28.1 – 30.1 | 81 |
| 30.1 – 32.1 | 101 |
| 32.1 – 34.1 | 107 |
| 34.1 – 36.1 | 118 |
| 36.1 – 38.1 | 139 |
| 38.1 – 40.1 | 142 |
| 40.1 – 42.1 | 128 |
| 42.1 – 44.1 | 155 |
| 44.1 – 46.1 | 160 |
| 46.1 – 48.1 | 151 |
| 48.1 – 50.1 | 142 |
| 50.1 – 52.1 | 162 |
| 52.1 – 54.1 | 165 |
| 54.1 – 56.1 | 115 |
| 56.1 – 58.1 | 122 |
| 58.1 – 60.1 | 113 |
| 60.1 – 62.1 | 116 |
| 62.1 – 64.1 | 108 |
| 64.1 – 66.1 | 109 |
| 66.1 – 68.1 | 81 |
| 68.1 – 70.1 | 103 |
| 70.1 – 72.1 | 65 |
| 72.1 – 74.1 | 81 |
| 74.1 – 76.1 | 70 |
| 76.1 – 78.1 | 47 |
| 78.1 – 80.1 | 38 |
| 80.1 – 82.1 | 24 |
| 82.1 – 84.1 | 18 |
| 84.1 – 86.1 | 13 |
| 86.1 – 88.1 | 3 |
| 88.1 – 90.1 | 7 |
pct_poverty (histogram)
| bin | count |
|---|---|
| 1.6 – 3.218 | 7 |
| 3.218 – 4.836 | 34 |
| 4.836 – 6.454 | 106 |
| 6.454 – 8.072 | 246 |
| 8.072 – 9.69 | 320 |
| 9.69 – 11.31 | 354 |
| 11.31 – 12.93 | 393 |
| 12.93 – 14.54 | 364 |
| 14.54 – 16.16 | 306 |
| 16.16 – 17.78 | 262 |
| 17.78 – 19.4 | 192 |
| 19.4 – 21.02 | 149 |
| 21.02 – 22.63 | 123 |
| 22.63 – 24.25 | 91 |
| 24.25 – 25.87 | 52 |
| 25.87 – 27.49 | 44 |
| 27.49 – 29.11 | 34 |
| 29.11 – 30.72 | 23 |
| 30.72 – 32.34 | 18 |
| 32.34 – 33.96 | 14 |
| 33.96 – 35.58 | 6 |
| 35.58 – 37.2 | 8 |
| 37.2 – 38.81 | 3 |
| 38.81 – 40.43 | 8 |
| 40.43 – 42.05 | 5 |
| 42.05 – 43.67 | 9 |
| 43.67 – 45.29 | 4 |
| 45.29 – 46.9 | 11 |
| 46.9 – 48.52 | 7 |
| 48.52 – 50.14 | 8 |
| 50.14 – 51.76 | 2 |
| 51.76 – 53.38 | 6 |
| 53.38 – 54.99 | 5 |
| 54.99 – 56.61 | 5 |
| 56.61 – 58.23 | 1 |
| 58.23 – 59.85 | 0 |
| 59.85 – 61.47 | 0 |
| 61.47 – 63.08 | 0 |
| 63.08 – 64.7 | 1 |
| 64.7 – 66.32 | 1 |
rent_to_income_ratio (histogram)
| bin | count |
|---|---|
| 6.1 – 35.95 | 3207 |
| 35.95 – 65.8 | 5 |
| 65.8 – 95.64 | 0 |
| 95.64 – 125.5 | 0 |
| 125.5 – 155.3 | 0 |
| 155.3 – 185.2 | 0 |
| 185.2 – 215 | 0 |
| 215 – 244.9 | 0 |
| 244.9 – 274.7 | 0 |
| 274.7 – 304.6 | 0 |
| 304.6 – 334.4 | 0 |
| 334.4 – 364.3 | 0 |
| 364.3 – 394.1 | 0 |
| 394.1 – 424 | 0 |
| 424 – 453.8 | 0 |
| 453.8 – 483.7 | 0 |
| 483.7 – 513.5 | 0 |
| 513.5 – 543.4 | 0 |
| 543.4 – 573.2 | 0 |
| 573.2 – 603.1 | 0 |
| 603.1 – 632.9 | 0 |
| 632.9 – 662.7 | 0 |
| 662.7 – 692.6 | 0 |
| 692.6 – 722.4 | 0 |
| 722.4 – 752.3 | 0 |
| 752.3 – 782.1 | 0 |
| 782.1 – 812 | 0 |
| 812 – 841.8 | 0 |
| 841.8 – 871.7 | 0 |
| 871.7 – 901.5 | 0 |
| 901.5 – 931.4 | 0 |
| 931.4 – 961.2 | 0 |
| 961.2 – 991.1 | 0 |
| 991.1 – 1021 | 0 |
| 1021 – 1051 | 0 |
| 1051 – 1081 | 0 |
| 1081 – 1110 | 0 |
| 1110 – 1140 | 0 |
| 1140 – 1170 | 0 |
| 1170 – 1200 | 1 |
state (top 20 values)
| value | count | share |
|---|---|---|
| TX | 254 | 7.9% |
| GA | 159 | 4.9% |
| VA | 133 | 4.1% |
| KY | 120 | 3.7% |
| MO | 115 | 3.6% |
| KS | 105 | 3.3% |
| IL | 102 | 3.2% |
| NC | 100 | 3.1% |
| IA | 99 | 3.1% |
| TN | 95 | 2.9% |
| NE | 93 | 2.9% |
| IN | 92 | 2.9% |
| OH | 88 | 2.7% |
| MN | 87 | 2.7% |
| MI | 83 | 2.6% |
| MS | 82 | 2.5% |
| PR | 78 | 2.4% |
| OK | 77 | 2.4% |
| AR | 75 | 2.3% |
| WI | 72 | 2.2% |
pct_rent_burdened_30 (histogram)
| bin | count |
|---|---|
| 0 – 1.624 | 9 |
| 1.624 – 3.248 | 5 |
| 3.248 – 4.872 | 3 |
| 4.872 – 6.496 | 5 |
| 6.496 – 8.12 | 9 |
| 8.12 – 9.744 | 13 |
| 9.744 – 11.37 | 11 |
| 11.37 – 12.99 | 16 |
| 12.99 – 14.62 | 26 |
| 14.62 – 16.24 | 19 |
| 16.24 – 17.86 | 35 |
| 17.86 – 19.49 | 43 |
| 19.49 – 21.11 | 52 |
| 21.11 – 22.74 | 52 |
| 22.74 – 24.36 | 73 |
| 24.36 – 25.98 | 99 |
| 25.98 – 27.61 | 109 |
| 27.61 – 29.23 | 116 |
| 29.23 – 30.86 | 132 |
| 30.86 – 32.48 | 159 |
| 32.48 – 34.1 | 189 |
| 34.1 – 35.73 | 209 |
| 35.73 – 37.35 | 227 |
| 37.35 – 38.98 | 239 |
| 38.98 – 40.6 | 205 |
| 40.6 – 42.22 | 209 |
| 42.22 – 43.85 | 210 |
| 43.85 – 45.47 | 190 |
| 45.47 – 47.1 | 131 |
| 47.1 – 48.72 | 114 |
| 48.72 – 50.34 | 118 |
| 50.34 – 51.97 | 69 |
| 51.97 – 53.59 | 51 |
| 53.59 – 55.22 | 34 |
| 55.22 – 56.84 | 24 |
| 56.84 – 58.46 | 6 |
| 58.46 – 60.09 | 3 |
| 60.09 – 61.71 | 2 |
| 61.71 – 63.34 | 3 |
| 63.34 – 64.96 | 3 |
Schema
28 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| state | categorical | 0.0% | 52 | |
| total_pop | numeric | 0.0% | 3,173 | high_skew, outliers |
| composite_index | numeric | 0.0% | 650 | |
| economic_score | numeric | 0.0% | 908 | |
| education_score | numeric | 0.0% | 1,001 | |
| healthcare_score | numeric | 0.0% | 808 | |
| housing_score | numeric | 0.0% | 937 | |
| food_score | numeric | 0.0% | 941 | |
| disability_score | numeric | 0.0% | 1,001 | |
| poverty_rate | numeric | 0.0% | 1,719 | high_skew |
| no_vehicle_pct | numeric | 0.0% | 1,065 | high_skew |
| uninsured_rate | numeric | 0.0% | 152 | high_skew, outliers |
| hospital_closure_risk | numeric | 0.0% | 3 | |
| pct_rent_burdened_30 | numeric | 0.0% | 2,146 | |
| pct_rent_burdened_50 | numeric | 0.0% | 1,769 | |
| median_gross_rent | numeric | 0.3% | 983 | outliers |
| rent_to_income_ratio | numeric | 0.3% | 1,269 | high_skew |
| gini_index | numeric | 0.0% | 1,317 | |
| unemployment_rate | numeric | 0.0% | 950 | high_skew |
| labor_force_participation | numeric | 0.0% | 1,944 | |
| pct_deep_poverty | numeric | 0.0% | 1,131 | high_skew, outliers |
| pct_poverty | numeric | 0.0% | 1,719 | high_skew |
| pct_near_poverty | numeric | 0.0% | 1,237 | |
| pct_hs_or_higher | numeric | 0.0% | 1,612 | |
| pct_bachelors_or_higher | numeric | 0.0% | 1,982 | |
| disability_rate | numeric | 0.0% | 305 | high_skew |
fips
numeric identifier
This is the FIPS code identifying U.S. counties (or equivalents), with values spanning 1001 to 72153 and exactly one row per code (3222 unique out of 3222). The distribution is roughly symmetric (skew 0.16, kurtosis -0.63) with no nulls or outliers, consistent with a structured geographic key rather than a measured quantity. Treat the numeric stats as incidental; the magnitude has no analytic meaning. Treatment: Cast to a zero-padded five-character string and use as a join key to county-level reference data.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
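The zero-padding treatment could look like the sketch below. It assumes the standard five-digit county FIPS layout (two-digit state prefix plus three-digit county code); the helper name `fips_to_key` is hypothetical.

```python
def fips_to_key(code):
    """Convert a numeric county FIPS code to a 5-character zero-padded string."""
    key = str(int(code)).zfill(5)
    if len(key) != 5:
        raise ValueError(f"unexpected FIPS code: {code!r}")
    return key

# The minimum and maximum codes reported in the profile.
keys = [fips_to_key(c) for c in (1001, 72153)]

# The first two characters are the state-level prefix.
state_prefixes = [k[:2] for k in keys]
```

Stored as numbers, codes below 10000 lose their leading zero, so a join against a string-keyed reference table would silently miss those rows.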
county_name
text identifier · near_unique
This column appears to be a fully qualified US county name (e.g., 'X County, State'), with all 3222 values unique and zero nulls. The token 'county,' appears in 2999 of 3222 rows, suggesting ~223 entries use a different administrative suffix (parish, borough, census area). State-name token frequencies (Texas 256, Virginia 189, Georgia 159) are close to the row counts in the state column (TX 254, VA 133, GA 159), but they are token counts rather than county counts: a token can also match a county's own name or another state's name (for example, 'Virginia' presumably also matches West Virginia rows). Length is bounded between 16 and 59 characters. Treatment: Use as a join key to county-level reference tables; do not feed as a feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
state
categorical feature
This is a US state code column with 52 distinct values, consistent with the 50 states plus DC and Puerto Rico (PR appears in the value table with 78 rows). The distribution is broad and close to uniform by entropy (entropy_ratio 0.932), with TX leading at just 254 of 3222 rows (7.88%), followed by GA, VA, KY, and MO. This suggests one row per US county or similar geographic unit rather than a population-weighted sample. No nulls. Treatment: Use as a categorical grouping key; one-hot or target-encode for modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 52
- top_value: TX
- top_rate: 0.07883
- cardinality: 52
- entropy: 5.314
- entropy_ratio: 0.9322
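The reported entropy_ratio appears consistent with dividing the entropy (in bits) by the maximum possible entropy for the given cardinality, log2(52). The profiler's exact definition is not stated, so the sketch below is an assumption, and both function names are hypothetical.

```python
import math

def shannon_entropy_bits(counts):
    """Shannon entropy, in bits, of a categorical distribution given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(entropy_bits, cardinality):
    """Entropy as a fraction of the maximum (uniform) entropy for this cardinality."""
    if cardinality < 2:
        raise ValueError("ratio is undefined for fewer than two categories")
    return entropy_bits / math.log2(cardinality)

# Reported values: entropy 5.314 bits across 52 categories.
reported_ratio = entropy_ratio(5.314, 52)

# A perfectly uniform 52-way split reaches the maximum, so its ratio is 1.
uniform_ratio = entropy_ratio(shannon_entropy_bits([1] * 52), 52)
```

Under this assumed definition the reported figures reproduce the ratio of roughly 0.932.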
total_pop
numeric feature · high_skew · outliers
This is a population count column with 3222 records and 3173 unique values, no nulls or zeros, ranging from 47 to 9,782,602. The distribution is extremely right-skewed (skew 13.36, kurtosis 297.59) with the mean (101,340) about four times the median (25,174), and 449 outliers (13.9%) sit beyond the IQR fence. The shape is consistent with US county- or municipality-level populations where a few large metros dominate. Treatment: Log-transform before modelling to tame the heavy right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,173
- min: 47
- max: 9.783e+06
- mean: 1.013e+05
- median: 25,174
- std: 3.246e+05
- q1: 1.059e+04
- q3: 6.501e+04
- iqr: 5.442e+04
- skew: 13.36
- kurtosis: 297.6
- n_outliers: 449
- outlier_rate: 0.1394
- zero_rate: 0
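A minimal sketch of the log transform, applied to the minimum, median, and maximum reported above. The choice of base 10 and the helper name `log10_population` are assumptions made for readability; any log base compresses the tail in the same way.

```python
import math

def log10_population(pop):
    """Log-transform a strictly positive population count."""
    if pop <= 0:
        raise ValueError("population must be positive for a log transform")
    return math.log10(pop)

# Minimum, median, and maximum reported in the profile.
reported = {"min": 47, "median": 25_174, "max": 9_782_602}
transformed = {name: log10_population(value) for name, value in reported.items()}
```

On the raw scale the maximum is several hundred times the median; on the log10 scale the three values sit a few units apart, which is what makes the column easier to model.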
composite_index
numeric feature
A numeric composite_index spanning 10.1 to 90.1 with mean 49.99 and median 49.5, suggesting a deliberately scaled or normalized index centered near 50. The distribution is nearly symmetric (skew 0.13) and slightly platykurtic (kurtosis -0.67), with no nulls, no zeros, and no outliers flagged. Only 650 unique values across 3222 rows point to rounding to one decimal rather than continuous measurement. Treatment: Use as-is for modelling; already well-scaled and clean, no transform needed.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 650
- min: 10.1
- max: 90.1
- mean: 49.99
- median: 49.5
- std: 15.29
- q1: 38.4
- q3: 61.5
- iqr: 23.1
- skew: 0.1295
- kurtosis: -0.6661
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
economic_score
numeric feature
A bounded numeric feature ranging from 0.3 to 99.9 with mean 50.00 and median 49.6, consistent with a 0-100 economic index or score. The distribution is nearly symmetric (skew 0.084) and platykurtic (kurtosis -0.826), with no nulls, no zeros, and no outliers flagged across 3222 rows. With 908 unique values and an IQR of 35.47, the spread is wide and uniform-leaning rather than concentrated. Treatment: Use as-is or min-max scale to [0,1]; no transformation needed given symmetric bounded distribution.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 908
- min: 0.3
- max: 99.9
- mean: 50
- median: 49.6
- std: 23.15
- q1: 32.2
- q3: 67.67
- iqr: 35.47
- skew: 0.084
- kurtosis: -0.8261
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
education_score
numeric feature
This column is a numeric education score bounded between 0 and 100 with a perfectly symmetric distribution (mean and median both 50.0, skew effectively zero). The negative kurtosis of -1.20 and quartiles at exactly 25 and 75 suggest a near-uniform spread rather than a bell curve, which is unusual for a real-world score and hints at synthetic or rank-transformed data. With 1001 unique values across 3222 rows (as many as a 0.1-step grid from 0 to 100 would give), no nulls, and no outliers, the column is clean but suspiciously well-behaved. Treatment: Use as-is or scale to [0,1]; verify it isn't a synthetic/rank feature before modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,001
- min: 0
- max: 100
- mean: 50
- median: 50
- std: 28.88
- q1: 25
- q3: 75
- iqr: 50
- skew: 1.2e-17
- kurtosis: -1.2
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.0006207
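The near-uniform reading can be checked against what a uniform spread would produce. For a continuous uniform distribution on [0, 100] the standard deviation is 100/√12 ≈ 28.87 and the excess kurtosis is -1.2, close to the reported 28.88 and -1.2. The sketch below computes the same quantities for an evenly spaced grid; the grid is an illustrative stand-in, not the actual column, and `excess_kurtosis` is a hypothetical helper.

```python
import statistics

def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    mean = statistics.fmean(values)
    var = statistics.fmean([(v - mean) ** 2 for v in values])
    fourth = statistics.fmean([(v - mean) ** 4 for v in values])
    return fourth / var ** 2 - 3

# Evenly spaced grid 0.0, 0.1, ..., 100.0 (1,001 points), standing in for a
# rank-transformed score. This is an illustration, not the real data.
grid = [i / 10 for i in range(1001)]

grid_std = statistics.pstdev(grid)
grid_kurtosis = excess_kurtosis(grid)
```

If the real column reproduces these values this closely, a percentile-rank or synthetic origin is a reasonable hypothesis to verify with the data owner.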
healthcare_score
numeric feature
A continuous healthcare quality or performance score for 3222 rows, ranging from 4.3 to 98.2 with mean 50.0 and median 48.6. The distribution is mildly right-skewed (0.24) with negative kurtosis (-0.75), suggesting a broad, near-uniform spread rather than a tight bell, and no outliers were flagged. With 808 unique values, no nulls, and no zeros, the column looks clean and ready to use. Treatment: Use as-is as a numeric feature; standardize if combining with other scaled features.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 808
- min: 4.3
- max: 98.2
- mean: 50
- median: 48.6
- std: 20.19
- q1: 33.9
- q3: 64.57
- iqr: 30.67
- skew: 0.2381
- kurtosis: -0.7521
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
housing_score
numeric feature
A continuous housing_score ranging from 0.0 to 99.9 with mean 49.93 and median 49.85, suggesting a 0-100 index. The distribution is nearly symmetric (skew 0.01) and platykurtic (kurtosis -0.88), with a wide IQR of 37.98 and no detected outliers, consistent with a near-uniform spread rather than a peaked score. No nulls and only one zero across 3222 rows. Treatment: Use as-is or min-max scale to [0,1]; no transform needed given symmetry and absence of outliers.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 937
- min: 0
- max: 99.9
- mean: 49.93
- median: 49.85
- std: 24.47
- q1: 30.73
- q3: 68.7
- iqr: 37.98
- skew: 0.01353
- kurtosis: -0.8807
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.0003104
food_score
numeric feature
A numeric feature called food_score that ranges from 0.1 to 99.5 with mean 49.9997 and median 50.0, suggesting a percentile-style or normalised rating bounded near [0,100]. The distribution is essentially symmetric (skew 0.029) and platykurtic (kurtosis -0.96), with no nulls, no zeros, and no outliers across 3222 rows, which is consistent with a synthetic or uniformly distributed score rather than an organic measurement. Treatment: Use as-is; already on a bounded 0–100 scale with no transformation needed.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 941
- min: 0.1
- max: 99.5
- mean: 50
- median: 50
- std: 25.48
- q1: 29.6
- q3: 69.8
- iqr: 40.2
- skew: 0.02926
- kurtosis: -0.9648
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
disability_score
numeric feature
A numeric disability score bounded between 0 and 100 with mean and median both exactly 50.0 and zero skew, indicating a perfectly symmetric distribution. The negative kurtosis (-1.20) and quartiles at exactly 25 and 75 suggest a near-uniform spread rather than a bell curve, which is unusual for a real-world severity metric and hints at synthetic or rank-based generation. No nulls and no outliers across 3222 rows with 1001 distinct values. Treatment: Use as-is or bin into quartiles; no transformation needed given symmetric bounded range.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,001
- min: 0
- max: 100
- mean: 50
- median: 50
- std: 28.88
- q1: 25
- q3: 75
- iqr: 50
- skew: 1.2e-17
- kurtosis: -1.2
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.0006207
poverty_rate
numeric feature · high_skew
Numeric poverty rate (likely percent of population below the poverty line) across 3,222 rows with no nulls and 1,719 distinct values. The distribution is right-skewed (skew 2.10, kurtosis 6.89): median is 13.55 and Q3 is 17.91, but the max reaches 66.32, producing 137 outliers (4.25%). Minimum is 1.6 and there are no zeros, consistent with a county- or area-level rate rather than individual records. Treatment: Consider a log or winsorizing transform before regression to tame the right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,719
- min: 1.6
- max: 66.32
- mean: 15.1
- median: 13.55
- std: 7.706
- q1: 10.16
- q3: 17.91
- iqr: 7.75
- skew: 2.096
- kurtosis: 6.891
- n_outliers: 137
- outlier_rate: 0.04252
- zero_rate: 0
no_vehicle_pct
numeric feature · high_skew
Percentage of households with no vehicle, reported per row (likely a geographic unit like county or tract). The distribution is tightly clustered with a median of 5.41 and IQR of 3.38, but a long right tail pushes the max to 85.94, yielding skew of 6.98 and kurtosis of 86.23. About 4.3% of rows are flagged as outliers, and 0.37% are exact zeros; no nulls. Treatment: Apply log1p or winsorize before modelling to tame the heavy right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,065
- min: 0
- max: 85.94
- mean: 6.197
- median: 5.41
- std: 4.538
- q1: 3.98
- q3: 7.36
- iqr: 3.38
- skew: 6.976
- kurtosis: 86.23
- n_outliers: 140
- outlier_rate: 0.04345
- zero_rate: 0.003724
uninsured_rate
numeric feature · high_skew · outliers
Likely a per-record uninsured rate (probably the proportion of the population without insurance), ranging from 0.0 to 3.7 with a median of 0.12 and IQR of 0.21. The distribution is heavily right-skewed (skew 4.10, kurtosis 27.7) with 230 outliers (7.1%) and 17.5% exact zeros; the max of 3.7 is implausible for a proportion and suggests mixed units or data-entry errors. Treatment: Investigate values >1 for unit errors, then winsorize or log1p-transform before modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 152
- min: 0
- max: 3.7
- mean: 0.2002
- median: 0.12
- std: 0.2829
- q1: 0.04
- q3: 0.25
- iqr: 0.21
- skew: 4.095
- kurtosis: 27.7
- n_outliers: 230
- outlier_rate: 0.07138
- zero_rate: 0.1754
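The "investigate values >1" step might start with a simple split like the one below. It assumes the column is meant to be a proportion in [0, 1], which the profile suggests but does not confirm; the function name and the sample values other than 3.7 are hypothetical.

```python
def split_by_plausibility(rates, upper=1.0):
    """Separate values a proportion could take (0..upper) from those it could not."""
    plausible = [r for r in rates if 0.0 <= r <= upper]
    suspect = [r for r in rates if not (0.0 <= r <= upper)]
    return plausible, suspect

# Hypothetical sample mixing typical values with the reported maximum of 3.7.
sample = [0.0, 0.04, 0.12, 0.25, 1.4, 3.7]
plausible, suspect = split_by_plausibility(sample)
```

Reviewing the suspect rows by hand before winsorizing matters here: if they turn out to be percentages mixed into a proportion column, rescaling is the right fix, not capping.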
hospital_closure_risk
numeric feature
A coarse risk score for hospital closure taking only 3 distinct values across 3222 rows, bounded between 0.0 and 50.0 with a median of 25.0. Despite being stored as numeric, the column behaves categorically: 28.8% of rows are zero and quartiles collapse to 0.0 and 25.0, suggesting the three buckets are roughly {0, 25, 50}. No outliers and no nulls. Treatment: Treat as an ordinal category with three levels rather than a continuous variable.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3
- min: 0
- max: 50
- mean: 21.69
- median: 25
- std: 16.34
- q1: 0
- q3: 25
- iqr: 25
- skew: 0.1414
- kurtosis: -0.6949
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.2883
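One way to treat the column as ordinal is to replace each value with its rank among the distinct levels, as in this sketch. The levels {0, 25, 50} are an assumption (the profile confirms only the minimum, median, and maximum), and `to_ordinal` is a hypothetical helper.

```python
def to_ordinal(values):
    """Map each distinct value to its rank (0 = lowest) among the distinct values."""
    levels = sorted(set(values))
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values], levels

# Assumed levels {0, 25, 50}; the profile only confirms min 0, median 25, max 50.
risk = [0, 25, 50, 25, 0]
codes, levels = to_ordinal(risk)
```

Rank codes keep the ordering without implying that the gap between levels is a meaningful distance.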
pct_rent_burdened_30
numeric feature
This appears to be the percentage of renter households spending at least 30% of income on rent, reported per row (likely a county or tract). Values span 0 to 64.96 with a median of 37.36 and quartiles at 30.67 and 43.48 (IQR 12.81), so the middle half of areas falls in roughly the 30–45% range, with a mild left skew (-0.57). About 0.25% of rows are exact zeros and 58 outliers (1.8%) sit outside the whiskers, worth checking for small-population geographies. Treatment: Use as-is for modelling; optionally winsorize the 58 outliers and verify zero-valued rows.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2,146
- min: 0
- max: 64.96
- mean: 36.44
- median: 37.36
- std: 10.01
- q1: 30.67
- q3: 43.48
- iqr: 12.81
- skew: -0.5673
- kurtosis: 0.5032
- n_outliers: 58
- outlier_rate: 0.018
- zero_rate: 0.002483
pct_rent_burdened_50
numeric feature
This column reports the percentage of households that are severely rent-burdened (spending 50%+ of income on rent), with values ranging from 0.0 to 64.96 and a mean of 17.35 closely matching the median of 17.62. The distribution is nearly symmetric (skew 0.054) with somewhat heavier tails than a normal curve (kurtosis 0.98), only 47 outliers (1.46%), and a small zero rate of 0.93%. The IQR of 8.56 around a median near 17.6 suggests most geographies cluster in a narrow band of severe rent burden. The maximum equals that of pct_rent_burdened_30 (64.96), which can only hold if every rent-burdened household in that geography is severely burdened, so that row is worth checking. Treatment: Use directly as a numeric feature; no transform needed given near-symmetric distribution.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,769
- min: 0
- max: 64.96
- mean: 17.35
- median: 17.62
- std: 6.577
- q1: 13.07
- q3: 21.63
- iqr: 8.557
- skew: 0.05436
- kurtosis: 0.9823
- n_outliers: 47
- outlier_rate: 0.01459
- zero_rate: 0.009311
median_gross_rent
numeric feature · outliers
Numeric column capturing median gross rent in dollars, with 3,222 rows, 983 unique values, and a trivial 0.31% null rate. The distribution is right-skewed (skew 1.76, kurtosis 4.55), running from 297 to 2,805 around a median of 818 and mean of 891, and 225 values (7.0%) flag as outliers on the high end. No zeros are present, so missingness isn't being encoded as 0. Treatment: Log-transform before regression to tame the right skew and high-rent outliers.
- n: 3,222
- nulls: 10 (0.3%)
- unique: 983
- min: 297
- max: 2,805
- mean: 890.9
- median: 818
- std: 283.4
- q1: 718
- q3: 978
- iqr: 260
- skew: 1.763
- kurtosis: 4.55
- n_outliers: 225
- outlier_rate: 0.07005
- zero_rate: 0
rent_to_income_ratio
numeric feature · high_skew
This column reports a rent-to-income ratio, with a typical row sitting near 17.06 and an interquartile range of just 4.29 between 15.1 and 19.39. However, the maximum of 1200.0 against a median of 17.06 produces extreme skew (53.98) and kurtosis (3007.07), and 107 values (3.33%) are flagged as outliers. The tight IQR alongside a 21.2 standard deviation indicates that very few records are orders of magnitude beyond the bulk of the distribution; the histogram places a single record in the top bin (1170–1200) and only five between 35.95 and 65.8. Treatment: Inspect the extreme records first, then cap or winsorize them and log-transform before modelling.
- n: 3,222
- nulls: 9 (0.3%)
- unique: 1,269
- min: 6.1
- max: 1,200
- mean: 17.89
- median: 17.06
- std: 21.2
- q1: 15.1
- q3: 19.39
- iqr: 4.29
- skew: 53.98
- kurtosis: 3007
- n_outliers: 107
- outlier_rate: 0.0333
- zero_rate: 0
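A sketch of capping extreme values, using Tukey fences computed from the reported quartiles. The 1.5 × IQR multiplier is the conventional choice and is an assumption here, since the report does not state which fence its outlier counts use. Capping at the fences is a simple stand-in for percentile-based winsorizing, and both helper names are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clip(value, lower, upper):
    """Cap a single value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))

# Quartiles reported for rent_to_income_ratio.
lower, upper = tukey_fences(q1=15.1, q3=19.39)

# Reported min, median, and max, before and after capping.
raw = [6.1, 17.06, 1200.0]
clipped = [clip(v, lower, upper) for v in raw]
```

With these quartiles the fences land near 8.67 and 25.83, so the reported minimum and maximum are both pulled in while the median is unchanged.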
gini_index
numeric feature
Numeric column holding Gini index values for 3,222 records, all populated and bounded between 0.2744 and 0.721 with a mean of 0.4481 and median 0.4457. Distribution is tight (IQR 0.049375, std 0.0384) with mild right skew (0.4999) and 56 high-side outliers (1.74%) stretching toward 0.721. Values fall in the expected 0–1 range for an inequality coefficient, suggesting a clean, ready-to-use feature. Treatment: Use as-is as a numeric feature; optionally winsorize the 56 upper outliers.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,317
- min: 0.2744
- max: 0.721
- mean: 0.4481
- median: 0.4457
- std: 0.03841
- q1: 0.422
- q3: 0.4714
- iqr: 0.04938
- skew: 0.4999
- kurtosis: 1.634
- n_outliers: 56
- outlier_rate: 0.01738
- zero_rate: 0
unemployment_rate
numeric feature · high_skew
Likely a county/region-level unemployment rate in percent, with values ranging from 0.0 to 31.99 and a median of 4.69. The distribution is heavily right-skewed (skew 2.55, kurtosis 12.81) with 154 outliers (4.78%) pulling the mean (5.13) above the median. A small zero_rate (0.56%) suggests a handful of suspiciously perfect-zero readings worth verifying. Treatment: Apply a log1p or Yeo-Johnson transform before regression to tame the right skew (a plain log is undefined at the zero values), and inspect the zero values.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 950
- min: 0
- max: 31.99
- mean: 5.127
- median: 4.69
- std: 2.926
- q1: 3.42
- q3: 6.08
- iqr: 2.66
- skew: 2.545
- kurtosis: 12.81
- n_outliers: 154
- outlier_rate: 0.0478
- zero_rate: 0.005587
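Because this column contains exact zeros, a plain logarithm cannot be applied to every row. The sketch below, using the reported minimum, quartiles, median, and maximum, shows the failure and the log1p alternative; it illustrates the general technique and is not the profiler's own procedure.

```python
import math

rates = [0.0, 3.42, 4.69, 6.08, 31.99]  # reported min, q1, median, q3, max

# math.log(0.0) raises ValueError, so a plain log cannot be applied to this column.
try:
    math.log(rates[0])
    plain_log_ok = True
except ValueError:
    plain_log_ok = False

# log1p(x) = log(1 + x) is defined at zero and maps 0 to 0.
transformed = [math.log1p(r) for r in rates]
```

The same consideration applies to any other column in this profile whose minimum is 0.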
labor_force_participation
numeric feature
Numeric labor force participation rate, almost certainly expressed as a percentage given the range of 18.63 to 84.04 and mean of 57.89. Distribution is moderately left-skewed (-0.58) with a tight interquartile band of 52.97 to 63.67, and only 38 outliers (1.18%) sit outside the whiskers. No nulls or zeros across 3,222 rows, and 1,944 unique values suggest fine-grained measurements rather than rounded buckets. Treatment: Use as-is in modelling; mild left skew does not require transformation.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,944
- min: 18.63
- max: 84.04
- mean: 57.89
- median: 58.72
- std: 8.041
- q1: 52.97
- q3: 63.66
- iqr: 10.7
- skew: -0.5766
- kurtosis: 0.4502
- n_outliers: 38
- outlier_rate: 0.01179
- zero_rate: 0
pct_deep_poverty
numeric feature · high_skew · outliers
Percentage of population in deep poverty across 3,222 rows, with no nulls and values bounded between 0.0 and 34.7. The distribution is right-skewed (skew 2.67, kurtosis 10.40) with median 5.82 trailing the mean 6.74, and 176 rows (5.5%) flagged as upper-tail outliers. Only 0.09% of rows are zero, so floor effects are minimal despite the long tail. Treatment: Apply log1p (the column contains zeros) or winsorize before linear modelling to dampen the heavy right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,131
- min: 0
- max: 34.7
- mean: 6.743
- median: 5.82
- std: 4.154
- q1: 4.27
- q3: 7.918
- iqr: 3.648
- skew: 2.665
- kurtosis: 10.4
- n_outliers: 176
- outlier_rate: 0.05462
- zero_rate: 0.0009311
pct_poverty
numeric feature · high_skew
Likely a county- or area-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55 and mean of 15.10. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with 137 outliers (4.25%) in the heavy upper tail, consistent with a small set of high-poverty areas pulling the mean above the median. No nulls or zeros, and 1719 unique values across 3222 rows suggest fine-grained but repeated measurements. Every summary statistic here matches the one reported for poverty_rate, so the two columns are probably duplicates; confirm and keep only one. Treatment: Consider a log or sqrt transform before linear modelling to tame the right skew.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,719
- min: 1.6
- max: 66.32
- mean: 15.1
- median: 13.55
- std: 7.706
- q1: 10.16
- q3: 17.91
- iqr: 7.75
- skew: 2.096
- kurtosis: 6.891
- n_outliers: 137
- outlier_rate: 0.04252
- zero_rate: 0
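The statistics listed for this column are identical to those listed for poverty_rate, which raises the question of whether one column is a copy of the other. Matching summaries do not prove row-level equality, so a direct comparison is needed. The sketch below shows one way to run it; the helper name and the sample values are hypothetical stand-ins for the two real columns.

```python
import math

def columns_match(a, b, tol=1e-9):
    """True when two equal-length columns agree element-wise within tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))

# Hypothetical values standing in for poverty_rate and pct_poverty.
poverty_rate = [1.6, 13.55, 66.32]
pct_poverty = [1.6, 13.55, 66.32]
is_duplicate = columns_match(poverty_rate, pct_poverty)
```

If the check passes on the real data, dropping one of the two columns avoids perfectly collinear features in a regression.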
pct_near_poverty
numeric feature
Percentage of population near the poverty line (likely between 100-200% of the federal poverty threshold), reported per record across 3222 rows with no nulls. The distribution centers around a median of 9.38 with an IQR of 4.43, but a right tail pushes the max to 49.14, yielding skew of 1.19 and kurtosis of 5.73. About 2.5% of values (82 rows) fall outside the outlier fence, suggesting a handful of high-poverty areas worth inspecting separately. Treatment: Consider a log or sqrt transform before regression to tame the right skew.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,237
- min: 0.58
- max: 49.14
- mean: 9.813
- median: 9.38
- std: 3.644
- q1: 7.33
- q3: 11.76
- iqr: 4.43
- skew: 1.19
- kurtosis: 5.729
- n_outliers: 82
- outlier_rate: 0.02545
- zero_rate: 0
pct_hs_or_higher
numeric feature
Percentage of population (likely adults 25+) with a high school diploma or higher, reported per row across 3,222 records. Values are tightly clustered high (mean 88.08, median 89.39, quartiles 84.9 and 92.47) with a left tail reaching down to 33.33, producing skew of -1.33 and 86 low-end outliers (2.67%). No nulls or zeros, and 1,612 unique values suggest a county- or tract-level rate. Treatment: Use as-is for modelling, but consider a reflected log or winsorisation given the left skew and low-end outliers.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,612
- min: 33.33
- max: 99.69
- mean: 88.08
- median: 89.39
- std: 5.97
- q1: 84.9
- q3: 92.47
- iqr: 7.567
- skew: -1.328
- kurtosis: 3.742
- n_outliers: 86
- outlier_rate: 0.02669
- zero_rate: 0
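A reflected log first flips a left-skewed variable so that its long tail points right, then applies a log. The sketch below reflects around 100, the natural ceiling of a percentage; that choice is an assumption (reflecting around the observed maximum plus one is another common option), and `reflected_log` is a hypothetical helper.

```python
import math

def reflected_log(value, upper=100.0):
    """Reflect a left-skewed percentage around `upper`, then apply log1p.

    Larger outputs correspond to smaller inputs, so the direction flips.
    """
    if value > upper:
        raise ValueError("value exceeds the assumed upper bound")
    return math.log1p(upper - value)

# Reported min, median, and max of pct_hs_or_higher.
reported = [33.33, 89.39, 99.69]
reflected = [reflected_log(v) for v in reported]
```

Because the ordering is reversed, coefficient signs in a downstream model need to be read accordingly.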
pct_bachelors_or_higher
numeric feature
Percent of adults with a bachelor's degree or higher, almost certainly at the county or similar geographic level given n=3222 with no nulls. Values range from 0.0 to 78.87 with median 21.07 and mean 23.50, and the distribution is right-skewed (skew 1.36, kurtosis 2.31) with 141 outliers (4.4%) on the high end, consistent with a long tail of highly educated metros above the typical county. Treatment: Consider a log1p or sqrt transform before linear modelling to tame the right skew (the minimum is 0, so a plain log is undefined for that row).
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,982
- min: 0
- max: 78.87
- mean: 23.5
- median: 21.07
- std: 9.983
- q1: 16.59
- q3: 27.85
- iqr: 11.26
- skew: 1.357
- kurtosis: 2.306
- n_outliers: 141
- outlier_rate: 0.04376
- zero_rate: 0.0003104
disability_rate
numeric feature · high_skew
This is a numeric disability rate per record, ranging from 0.0 to 9.17 with a median of 1.07 and IQR of 0.65. The distribution is heavily right-skewed (skew 2.17, kurtosis 15.24) with 117 outliers (3.6%) and a small but non-trivial 1.7% zeros. Only 305 unique values across 3,222 rows suggest the rate is reported at coarse precision or aggregated to a small set of geographies. Treatment: Apply log1p (the column contains zeros) or winsorize before regression to tame the right tail.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 305
- min: 0
- max: 9.17
- mean: 1.145
- median: 1.07
- std: 0.6215
- q1: 0.77
- q3: 1.42
- iqr: 0.65
- skew: 2.167
- kurtosis: 15.24
- n_outliers: 117
- outlier_rate: 0.03631
- zero_rate: 0.01676