Summary confidence: high
This dataset covers 3,222 US counties with 16 columns describing housing affordability: rents, incomes, renter shares, and rent-burden percentages. Several core numeric fields (annual_rent, median_gross_rent, median_household_income, rent_to_income_ratio) contain extreme negative sentinel values such as -666666666 and -7999999992. The second value is exactly 12 × -666666666, which would be consistent with annual_rent having been computed from a monthly rent that already held the sentinel, so the placeholder appears to have propagated into derived fields. These values pull the column means deeply negative and produce skewness of -17 to -56, so they need to be recoded as missing or filtered out before any analysis. The affordability_category field is heavily imbalanced: 'Affordable' covers 99.1% of counties and only 1 county is labeled 'Extremely Burdened', which suggests the categorization rule may be miscalibrated. Once the sentinel values are removed, the rent-burden percentage columns (pct_rent_burdened_30plus, median about 37.4%; pct_rent_burdened_50plus, about 17.6%) look like the cleanest signals to start with.
citing: annual_rent · median_gross_rent · median_household_income · rent_to_income_ratio · affordability_category · pct_rent_burdened_30plus · pct_rent_burdened_50plus · pct_renter
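
The sentinel cleanup recommended above could be sketched as follows. This is a minimal illustration using only the Python standard library, not the method used to produce this summary. The column names and the two sentinel codes come from the summary; the helper names (clean_row, median_ignoring_missing), the sample rows, and the rule that any negative value in these four columns counts as missing are assumptions made for the example.

```python
from statistics import median

# Sentinel codes named in the summary; -7999999992 equals 12 * -666666666.
SENTINELS = {-666666666, -7999999992}

# Columns the summary flags as contaminated.
FLAGGED = [
    "annual_rent",
    "median_gross_rent",
    "median_household_income",
    "rent_to_income_ratio",
]


def clean_row(row):
    """Return a copy of row with invalid values in flagged columns set to None."""
    cleaned = dict(row)
    for col in FLAGGED:
        value = cleaned.get(col)
        # Assumption: rents, incomes, and their ratio cannot be negative, so any
        # negative value (including ratios derived from a sentinel) is missing.
        if value is not None and (value in SENTINELS or value < 0):
            cleaned[col] = None
    return cleaned


def median_ignoring_missing(rows, col):
    """Median of col over rows, skipping None; returns None if nothing is left."""
    values = [r[col] for r in rows if r.get(col) is not None]
    return median(values) if values else None


# Hypothetical rows for illustration only; these are not taken from the dataset.
rows = [
    {"county": "A", "median_gross_rent": 900, "annual_rent": 10800,
     "median_household_income": 54000, "rent_to_income_ratio": 0.2},
    {"county": "B", "median_gross_rent": -666666666, "annual_rent": -7999999992,
     "median_household_income": 48000, "rent_to_income_ratio": -166666.6665},
    {"county": "C", "median_gross_rent": 1100, "annual_rent": 13200,
     "median_household_income": -666666666, "rent_to_income_ratio": -0.0000198},
]

cleaned = [clean_row(r) for r in rows]
print(median_ignoring_missing(cleaned, "median_gross_rent"))
```

Recoding to missing rather than dropping whole rows keeps a county's valid fields (for example the rent-burden percentages) available even when one of its rent or income fields held a sentinel. Whether that is preferable to row-level filtering depends on the analysis.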