saturn

merged inequality master

source: /home/coolhand/datasets/us-inequality-atlas/merged/inequality_master.csv · 3,222 rows · 28 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset profiles 3,222 U.S. counties and county equivalents across 28 columns of socioeconomic indicators, including poverty, rent burden, education, healthcare, and a composite inequality index. Two things stand out for closer inspection. First, rent_to_income_ratio is extremely skewed (skew 53.98), with a maximum of 1,200 against a median of 17.06. That points to either data-entry anomalies or a handful of severe outliers. Second, total population is also highly skewed (skew 13.36; max ~9.78M vs median 25,174). Any aggregation of per-county rates to state or national level should therefore be population-weighted rather than a simple average of counties. The composite_index and the *_score columns are well-behaved and centered near 50, making them good candidates for cross-county comparison. Texas (254 counties), Georgia, and Virginia are the most frequent states, although no single state dominates: Texas accounts for only 7.9% of rows.

citing: rent_to_income_ratio · total_pop · composite_index · pct_poverty · state · pct_rent_burdened_30 · uninsured_rate
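The population-weighting advice above can be sketched with the standard library alone. This is a minimal illustration, not part of the profiler. The helper `weighted_state_means` and the three counties are hypothetical. Weighting by `total_pop` only approximates a population-level rate, because each rate's true denominator (for example, the population for whom poverty status is determined) may differ from total population.

```python
from collections import defaultdict

def weighted_state_means(rows, value_key, weight_key="total_pop"):
    """Population-weighted mean of a per-county rate, grouped by state.

    rows: iterable of dicts with 'state', value_key and weight_key fields.
    """
    num = defaultdict(float)  # sum of value * weight per state
    den = defaultdict(float)  # sum of weights per state
    for r in rows:
        w = float(r[weight_key])
        num[r["state"]] += float(r[value_key]) * w
        den[r["state"]] += w
    return {s: num[s] / den[s] for s in num if den[s] > 0}

# Hypothetical counties in a made-up state: one large, two small.
rows = [
    {"state": "XX", "pct_poverty": 10.0, "total_pop": 1_000_000},
    {"state": "XX", "pct_poverty": 30.0, "total_pop": 10_000},
    {"state": "XX", "pct_poverty": 30.0, "total_pop": 10_000},
]
print(weighted_state_means(rows, "pct_poverty"))
```

Here the weighted mean is about 10.4. A simple average of the three county rates would be about 23.3, because it counts the two small counties as much as the large one.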

Schema

28 columns
Per-column summary.

column                     type         nulls  unique  alerts
fips                       numeric      0.0%   3,222
county_name                text         0.0%   3,222   near_unique
state                      categorical  0.0%   52
total_pop                  numeric      0.0%   3,173   high_skew outliers
composite_index            numeric      0.0%   650
economic_score             numeric      0.0%   908
education_score            numeric      0.0%   1,001
healthcare_score           numeric      0.0%   808
housing_score              numeric      0.0%   937
food_score                 numeric      0.0%   941
disability_score           numeric      0.0%   1,001
poverty_rate               numeric      0.0%   1,719   high_skew
no_vehicle_pct             numeric      0.0%   1,065   high_skew
uninsured_rate             numeric      0.0%   152     high_skew outliers
hospital_closure_risk      numeric      0.0%   3
pct_rent_burdened_30       numeric      0.0%   2,146
pct_rent_burdened_50       numeric      0.0%   1,769
median_gross_rent          numeric      0.3%   983     outliers
rent_to_income_ratio       numeric      0.3%   1,269   high_skew
gini_index                 numeric      0.0%   1,317
unemployment_rate          numeric      0.0%   950     high_skew
labor_force_participation  numeric      0.0%   1,944
pct_deep_poverty           numeric      0.0%   1,131   high_skew outliers
pct_poverty                numeric      0.0%   1,719   high_skew
pct_near_poverty           numeric      0.0%   1,237
pct_hs_or_higher           numeric      0.0%   1,612
pct_bachelors_or_higher    numeric      0.0%   1,982
disability_rate            numeric      0.0%   305     high_skew

fips

numeric identifier
This is the FIPS code identifying U.S. counties (or equivalents). Values span 1001 to 72153, with exactly one row per code (3,222 unique out of 3,222). The leading two digits of the five-digit code are the state code, so the maximum of 72153 (state code 72) indicates that Puerto Rico's municipios are included. The distribution is roughly symmetric (skew 0.16, kurtosis -0.63) with no nulls or outliers. That is consistent with a structured geographic key rather than a measured quantity, so treat the numeric statistics as incidental: the magnitude has no analytic meaning. Treatment: Cast to a five-character zero-padded string (1001 → '01001') and use as the join key to county-level reference data. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
min: 1,001 · max: 72,153 · mean: 3.138e+04 · median: 30,022 · std: 1.63e+04
q1: 1.903e+04 · q3: 4.61e+04 · iqr: 27,075
skew: 0.1574 · kurtosis: -0.6314
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
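Here is a minimal sketch of the zero-padding treatment with the standard library; the helper names are illustrative. Padding matters because codes below 10000 (Alabama's 01xxx, for example) lose their leading zero when stored as numbers. They then fail to match string-keyed reference tables. With pandas, `df["fips"].astype(int).astype(str).str.zfill(5)` is a common equivalent.

```python
def fips_to_str(code) -> str:
    """Return a county FIPS code as a five-character, zero-padded string.

    Accepts ints, floats such as 1001.0 (common after a CSV round trip), or digit strings.
    """
    return f"{int(float(code)):05d}"

def state_fips(code) -> str:
    """The first two digits of the padded county code are the state FIPS code."""
    return fips_to_str(code)[:2]

for raw in (1001, "30022", 72153.0):
    print(raw, "->", fips_to_str(raw), "state", state_fips(raw))
```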

county_name

text identifier near_unique
This column holds fully qualified U.S. county names (e.g., 'X County, State'), with all 3,222 values unique and zero nulls. The token 'county,' appears in 2,999 of 3,222 rows, so 223 entries use a different administrative suffix, such as parish, borough, census area, independent city, or municipio. State-name token frequencies (Texas 256, Virginia 189, Georgia 159) roughly track each state's county count. A token can exceed a state's row count, because names such as 'West Virginia' or 'Texas County, Missouri' also contain it. Lengths range from 16 to 59 characters (median 24, 95th percentile 31). Treatment: Use for display and as a fallback join key (prefer fips, which avoids spelling and suffix mismatches); do not feed as a feature. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,222
len_min: 16 · len_max: 59 · len_mean: 24.32 · len_median: 24 · len_p95: 31
word_mean: 3.248 · word_median: 3
n_empty: 0 · n_duplicates: 0 · duplicate_rate: 0 · vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0 · url_rate: 0 · one_word_rate: 0 · allcaps_rate: 0 · boilerplate_rate: 0
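A more direct check than the token count is to tally the suffix of each name. The sketch below assumes Census-style names of the form 'Name Suffix, State'. The suffix list and the `admin_suffix` helper are illustrative assumptions, not taken from this dataset. Extend the list after looking at the names that come back as 'other'.

```python
from collections import Counter

# Assumed suffix list; longer suffixes come first so "City and Borough" wins over "Borough".
SUFFIXES = ["City and Borough", "Census Area", "Municipality", "Municipio",
            "Borough", "Parish", "County", "city"]

def admin_suffix(name: str) -> str:
    """Return the administrative suffix of a 'Name Suffix, State' string, or 'other'."""
    county_part = name.rsplit(",", 1)[0].strip().lower()
    for suffix in SUFFIXES:
        if county_part.endswith(suffix.lower()):
            return suffix
    return "other"

names = [
    "Autauga County, Alabama",
    "Orleans Parish, Louisiana",
    "Bethel Census Area, Alaska",
    "Alexandria city, Virginia",
    "Yauco Municipio, Puerto Rico",
]
print(Counter(admin_suffix(n) for n in names))
```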

state

categorical feature
This is a U.S. state code column with 52 distinct values: the 50 states plus DC and one territory. The fips maximum of 72153 (state code 72) suggests that territory is Puerto Rico. The distribution is broad, with an entropy ratio of 0.932 (1.0 would be perfectly uniform across the 52 codes). TX leads with just 254 of 3,222 rows (7.88%), followed by GA, VA, KY, and MO. This pattern reflects one row per county or county equivalent, so state frequencies track how many counties each state has rather than its population. No nulls. Treatment: Use as a categorical grouping key; one-hot or target-encode for modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 52
top_value: TX · top_rate: 0.07883 · cardinality: 52
entropy: 5.314 · entropy_ratio: 0.9322
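For reference, the entropy ratio is Shannon entropy divided by its maximum, log2(k), for k distinct categories. The reported figures are consistent with entropy measured in bits: log2(52) ≈ 5.70, and 5.314 / 5.70 ≈ 0.932. Here is a small standard-library sketch; the helper name is hypothetical.

```python
import math
from collections import Counter

def entropy_ratio(values):
    """Return (entropy in bits, entropy / log2(k)) for the k distinct categories."""
    counts = Counter(values)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    k = len(counts)
    return h, (h / math.log2(k) if k > 1 else 0.0)

# Reported state figures: entropy 5.314 bits over 52 codes.
print(round(5.314 / math.log2(52), 4))

# Toy check: four equally common codes are perfectly uniform.
print(entropy_ratio(["TX", "GA", "VA", "KY"]))
```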

total_pop

numeric feature high_skew outliers
This is a population count column with 3,222 records and 3,173 unique values, no nulls or zeros, ranging from 47 to 9,782,602. The distribution is extremely right-skewed (skew 13.36, kurtosis 297.59), with the mean (101,340) about four times the median (25,174). 449 counties (13.9%) lie above the upper IQR fence; none can fall below it, because the lower fence is negative. The shape is typical of U.S. county populations, where a few large metros dominate. Treatment: Log-transform before modelling to tame the heavy right tail, and use as the weight when aggregating per-county rates. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3,173
min: 47 · max: 9.783e+06 · mean: 1.013e+05 · median: 25,174 · std: 3.246e+05
q1: 1.059e+04 · q3: 6.501e+04 · iqr: 5.442e+04
skew: 13.36 · kurtosis: 297.6
n_outliers: 449 · outlier_rate: 0.1394 · zero_rate: 0
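The outlier count above is consistent with Tukey's rule, assuming the profiler uses the conventional 1.5×IQR fences (the report does not state its rule). The sketch below applies the rule to the reported, rounded quartiles, then shows a log10 transform for the modelling step. The helper name is hypothetical.

```python
import math

def tukey_fences(q1: float, q3: float, k: float = 1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Reported (rounded) quartiles for total_pop.
low, high = tukey_fences(1.059e4, 6.501e4)
print(f"fences: {low:,.0f} to {high:,.0f}")

# log10 compresses populations from 47 to ~9.78 million into roughly 1.7 to 7.0.
for pop in (47, 25_174, 9_782_602):
    print(pop, round(math.log10(pop), 2))
```

With these quartiles the fences are about -71,040 and 146,640, so every flagged county is a large one.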

composite_index

numeric feature
A numeric composite_index spanning 10.1 to 90.1 with mean 49.99 and median 49.5, suggesting a deliberately scaled or normalized index centered near 50. The distribution is nearly symmetric (skew 0.13) and slightly platykurtic (kurtosis -0.67), with no nulls, no zeros, and no outliers flagged. The 650 unique values across 3,222 rows are consistent with rounding to one decimal place: the 10.1–90.1 range contains only 801 possible one-decimal values. Treatment: Use as-is for modelling; already well-scaled and clean, no transform needed. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 650
min: 10.1 · max: 90.1 · mean: 49.99 · median: 49.5 · std: 15.29
q1: 38.4 · q3: 61.5 · iqr: 23.1
skew: 0.1295 · kurtosis: -0.6661
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0

economic_score

numeric feature
A bounded numeric feature ranging from 0.3 to 99.9 with mean 50.00 and median 49.6, consistent with a 0-100 economic index or score. The distribution is nearly symmetric (skew 0.084) and platykurtic (kurtosis -0.826), with no nulls, no zeros, and no outliers flagged across 3,222 rows. With 908 unique values and an IQR of 35.47, the spread is wide and uniform-leaning rather than concentrated. Treatment: Use as-is or min-max scale to [0,1]; no further transformation is needed given the symmetric, bounded distribution. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 908
min: 0.3 · max: 99.9 · mean: 50 · median: 49.6 · std: 23.15
q1: 32.2 · q3: 67.67 · iqr: 35.47
skew: 0.084 · kurtosis: -0.8261
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0
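Here is a minimal sketch of the min-max option, assuming the nominal 0–100 bounds of the score columns; the helper name is hypothetical. Scaling by the fixed bounds rather than the observed minimum and maximum (0.3 and 99.9 here) keeps scaled values comparable across the *_score columns and across data refreshes. Pass the observed extremes as `lo` and `hi` if you prefer a strict [0, 1] fit to this sample.

```python
def min_max_scale(values, lo=0.0, hi=100.0):
    """Map values from the range [lo, hi] onto [0, 1].

    Defaults assume the nominal 0-100 score scale rather than the observed extremes.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo
    return [(v - lo) / span for v in values]

print(min_max_scale([0.3, 49.6, 99.9]))             # nominal 0-100 bounds
print(min_max_scale([0.3, 49.6, 99.9], 0.3, 99.9))  # observed bounds: exact [0, 1] fit
```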

education_score

numeric feature
This column is a numeric education score bounded between 0 and 100, with a perfectly symmetric distribution (mean and median both 50.0, skew effectively zero). Its statistics match a uniform distribution on [0, 100] almost exactly. A continuous uniform has excess kurtosis of -1.2, a standard deviation of 100/√12 ≈ 28.87, and quartiles at 25 and 75; the observed values are -1.2, 28.88, 25, and 75. There are also exactly 1,001 distinct values, which, if values are rounded to 0.1, means every possible value from 0.0 to 100.0 occurs. Together these are the signature of a percentile rank rather than a raw score. No nulls and no outliers; the column is clean but almost certainly rank-transformed. Treatment: Use as-is or scale to [0,1]. First confirm against the source whether it is a percentile rank, since ranks discard the spacing between counties. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,001
min: 0 · max: 100 · mean: 50 · median: 50 · std: 28.88
q1: 25 · q3: 75 · iqr: 50
skew: 1.2e-17 · kurtosis: -1.2
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0.0006207
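The percentile-rank hypothesis can be checked by simulation. The sketch below ranks deliberately skewed synthetic data (lognormal, 3,222 draws) onto 0–100, rounds to one decimal, and measures the result. The ranking scheme (rank / (n - 1) × 100, ties ignored) is an assumption about how such a score might be built, not a description of this dataset's pipeline.

```python
import random
import statistics

def percentile_rank(values):
    """Rank-based percentile on 0-100, rounded to 0.1.

    The smallest value maps to 0.0 and the largest to 100.0; ties are broken by
    position for simplicity (a production version would average tied ranks).
    """
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = round(100 * r / (n - 1), 1)
    return ranks

def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (a uniform distribution gives -1.2)."""
    m = statistics.fmean(xs)
    m2 = statistics.pvariance(xs, m)
    m4 = statistics.fmean((x - m) ** 4 for x in xs)
    return m4 / m2 ** 2 - 3

random.seed(0)
raw = [random.lognormvariate(0, 1) for _ in range(3222)]  # deliberately skewed input
pr = percentile_rank(raw)
print("distinct:", len(set(pr)))
print("mean:", round(statistics.fmean(pr), 2), "std:", round(statistics.pstdev(pr), 2))
print("excess kurtosis:", round(excess_kurtosis(pr), 3), "zeros:", pr.count(0.0))
```

With n = 3,222 this scheme gives 1,001 distinct values, a mean of 50, a standard deviation near 28.88, and excess kurtosis near -1.2. It also puts exactly two rows at 0.0 (2 / 3,222 ≈ 0.0006207). All of these match the profile, which is consistent with, but does not prove, a percentile-rank origin.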

healthcare_score

numeric feature
A continuous healthcare quality or performance score for 3222 rows, ranging from 4.3 to 98.2 with mean 50.0 and median 48.6. The distribution is mildly right-skewed (0.24) with negative kurtosis (-0.75), suggesting a broad, near-uniform spread rather than a tight bell, and no outliers were flagged. With 808 unique values, no nulls, and no zeros, the column looks clean and ready to use. Treatment: Use as-is as a numeric feature; standardize if combining with other scaled features. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 808
min: 4.3 · max: 98.2 · mean: 50 · median: 48.6 · std: 20.19
q1: 33.9 · q3: 64.57 · iqr: 30.67
skew: 0.2381 · kurtosis: -0.7521
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0

housing_score

numeric feature
A continuous housing_score ranging from 0.0 to 99.9 with mean 49.93 and median 49.85, suggesting a 0-100 index. The distribution is nearly symmetric (skew 0.01) and platykurtic (kurtosis -0.88), with a wide IQR of 37.98 and no detected outliers, consistent with a near-uniform spread rather than a peaked score. No nulls and only one zero across 3222 rows. Treatment: Use as-is or min-max scale to [0,1]; no transform needed given symmetry and absence of outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 937
min: 0 · max: 99.9 · mean: 49.93 · median: 49.85 · std: 24.47
q1: 30.73 · q3: 68.7 · iqr: 37.98
skew: 0.01353 · kurtosis: -0.8807
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0.0003104

food_score

numeric feature
food_score ranges from 0.1 to 99.5 with mean 49.9997 and median 50.0, suggesting a percentile-style or normalized rating bounded near [0, 100]. The distribution is essentially symmetric (skew 0.029) and platykurtic (kurtosis -0.96), with no nulls, no zeros, and no outliers across 3,222 rows. That shape is consistent with a rescaled or near-uniform score rather than a raw measurement, though it is less flat than an exact uniform distribution (kurtosis -1.2). Treatment: Use as-is; already on a bounded 0–100 scale with no transformation needed. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 941
min: 0.1 · max: 99.5 · mean: 50 · median: 50 · std: 25.48
q1: 29.6 · q3: 69.8 · iqr: 40.2
skew: 0.02926 · kurtosis: -0.9648
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0

disability_score

numeric feature
A numeric disability score bounded between 0 and 100, with mean and median both exactly 50.0 and zero skew. Every reported statistic is identical to education_score's. Like it, the column matches a uniform distribution on [0, 100] (excess kurtosis -1.2, standard deviation ≈ 100/√12, quartiles 25 and 75), which points to a percentile rank rather than a raw severity metric. Identical marginal statistics are expected for two percentile-rank columns and do not by themselves mean the columns hold the same values. No nulls and no outliers across 3,222 rows, with 1,001 distinct values. Treatment: Use as-is or bin into quartiles; no transformation is needed. Before interpreting the column, confirm the rank-based construction and the direction of the score (whether higher means more or less disability). high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,001
min: 0 · max: 100 · mean: 50 · median: 50 · std: 28.88
q1: 25 · q3: 75 · iqr: 50
skew: 1.2e-17 · kurtosis: -1.2
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0.0006207

poverty_rate

numeric feature high_skew
Numeric poverty rate (likely percent of population below the poverty line) across 3,222 rows with no nulls and 1,719 distinct values. The distribution is right-skewed (skew 2.10, kurtosis 6.89): median is 13.55 and Q3 is 17.91, but the max reaches 66.32, producing 137 outliers (4.25%). Minimum is 1.6 and there are no zeros, consistent with a county- or area-level rate rather than individual records. Treatment: Consider a log or winsorizing transform before regression to tame the right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,719
min: 1.6 · max: 66.32 · mean: 15.1 · median: 13.55 · std: 7.706
q1: 10.16 · q3: 17.91 · iqr: 7.75
skew: 2.096 · kurtosis: 6.891
n_outliers: 137 · outlier_rate: 0.04252 · zero_rate: 0

no_vehicle_pct

numeric feature high_skew
Percentage of households with no vehicle, one row per county. The bulk of the distribution is tight (median 5.41, IQR 3.38), but a long right tail pushes the max to 85.94, yielding skew of 6.98 and kurtosis of 86.23. About 4.3% of rows (140) are flagged as outliers, and 0.37% (12 rows) are exact zeros; no nulls. Treatment: Apply log1p (which handles the zeros) or winsorize before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,065
min: 0 · max: 85.94 · mean: 6.197 · median: 5.41 · std: 4.538
q1: 3.98 · q3: 7.36 · iqr: 3.38
skew: 6.976 · kurtosis: 86.23
n_outliers: 140 · outlier_rate: 0.04345 · zero_rate: 0.003724

uninsured_rate

numeric feature high_skew outliers
Likely the share of the population without health insurance, ranging from 0.0 to 3.7 with a median of 0.12 and IQR of 0.21. The distribution is heavily right-skewed (skew 4.10, kurtosis 27.7), with 230 outliers (7.1%) and 17.5% exact zeros (about 565 counties). Neither unit reading fits well. As a proportion, the maximum of 3.7 is impossible; as a percentage, a median of 0.12% is far below typical county uninsured rates. The large share of exact zeros adds to the doubt, so confirm the units and definition against the source. Treatment: Investigate values above 1 and the exact zeros for unit or coding errors, then winsorize or log1p-transform before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 152
min: 0 · max: 3.7 · mean: 0.2002 · median: 0.12 · std: 0.2829
q1: 0.04 · q3: 0.25 · iqr: 0.21
skew: 4.095 · kurtosis: 27.7
n_outliers: 230 · outlier_rate: 0.07138 · zero_rate: 0.1754
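Here is a minimal sketch of the investigation step: list the rows that fall outside a proportion reading so they can be traced back to the source. The helper and the rows are hypothetical; the 99xxx codes are deliberately not real FIPS codes. The "proportion in [0, 1]" reading is itself an assumption to be confirmed.

```python
def flag_uninsured(rows, key="uninsured_rate"):
    """Yield (fips, value, reason) for rows whose uninsured value needs review.

    Assumption: the column is intended as a proportion in [0, 1], so values above
    1 are suspect. Exact zeros are flagged too, since a true county uninsured rate
    of zero is very unlikely.
    """
    for row in rows:
        value = float(row[key])
        if value > 1:
            yield row["fips"], value, "above 1"
        elif value == 0:
            yield row["fips"], value, "exact zero"

# Made-up rows for illustration only.
rows = [
    {"fips": "99001", "uninsured_rate": "0.12"},
    {"fips": "99002", "uninsured_rate": "3.7"},
    {"fips": "99003", "uninsured_rate": "0"},
]
for flag in flag_uninsured(rows):
    print(flag)
```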

hospital_closure_risk

numeric feature
A coarse risk score for hospital closure taking only 3 distinct values across 3,222 rows. The minimum is 0, the median 25, and the maximum 50, and only 28.8% of rows are zero, so the median is a genuine value and the three levels are exactly {0, 25, 50}. Despite being stored as numeric, the column behaves categorically, with quartiles collapsing to 0 and 25. No outliers and no nulls. Treatment: Treat as an ordinal category with three levels rather than a continuous variable. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 3
min: 0 · max: 50 · mean: 21.69 · median: 25 · std: 16.34
q1: 0 · q3: 25 · iqr: 25
skew: 0.1414 · kurtosis: -0.6949
n_outliers: 0 · outlier_rate: 0 · zero_rate: 0.2883
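Because the column takes only the values 0, 25, and 50, its reported mean and zero rate are enough to recover the share of counties at each level. The small sketch below solves the two linear equations, then checks the implied standard deviation against the reported 16.34. The helper names are hypothetical.

```python
def three_level_shares(mean, zero_rate, levels=(0.0, 25.0, 50.0)):
    """Recover (p0, p1, p2) for a column taking levels (0, l1, l2).

    Uses p0 = zero_rate, p0 + p1 + p2 = 1 and l1*p1 + l2*p2 = mean.
    """
    _, l1, l2 = levels
    p0 = zero_rate
    # Substitute p1 = 1 - p0 - p2 into the mean equation and solve for p2.
    p2 = (mean - l1 * (1 - p0)) / (l2 - l1)
    p1 = 1 - p0 - p2
    return p0, p1, p2

def std_from_shares(shares, levels=(0.0, 25.0, 50.0)):
    """Population standard deviation of a discrete distribution."""
    mu = sum(p * x for p, x in zip(shares, levels))
    return sum(p * (x - mu) ** 2 for p, x in zip(shares, levels)) ** 0.5

shares = three_level_shares(mean=21.69, zero_rate=0.2883)
print([round(p, 4) for p in shares], round(std_from_shares(shares), 2))
```

The recovered shares come out near 28.8% at 0, 55.6% at 25, and 15.6% at 50. They imply a standard deviation of about 16.33; rounding in the reported mean and zero rate can explain the small gap from 16.34. These are also the frequencies to check after recoding the column as a three-level ordinal.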

pct_rent_burdened_30

numeric feature
This appears to be the percentage of renter households spending at least 30% of income on rent, one row per county. Values span 0 to 64.96 with a median of 37.36. The middle half of counties fall between 30.67% and 43.48%, with a mild left skew (-0.57). About 0.25% of rows (8) are exact zeros, and 58 outliers (1.8%) sit outside the whiskers. Check these for small-population counties, where few renter households make the rate noisy. Treatment: Use as-is for modelling; optionally winsorize the 58 outliers and verify zero-valued rows. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 2,146
min: 0 · max: 64.96 · mean: 36.44 · median: 37.36 · std: 10.01
q1: 30.67 · q3: 43.48 · iqr: 12.81
skew: -0.5673 · kurtosis: 0.5032
n_outliers: 58 · outlier_rate: 0.018 · zero_rate: 0.002483

pct_rent_burdened_50

numeric feature
This column reports the percentage of renter households that are severely rent-burdened (spending 50% or more of income on rent). Values range from 0.0 to 64.96, with a mean of 17.35 close to the median of 17.62. The distribution is nearly symmetric (skew 0.054), with somewhat heavier tails than a normal distribution (excess kurtosis 0.98), 47 outliers (1.46%), and a small zero rate of 0.93% (30 rows). The middle half of counties falls within an IQR of 8.56 around that median, a narrow band of severe rent burden. Treatment: Use directly as a numeric feature; no transform needed given the near-symmetric distribution. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,769
min: 0 · max: 64.96 · mean: 17.35 · median: 17.62 · std: 6.577
q1: 13.07 · q3: 21.63 · iqr: 8.557
skew: 0.05436 · kurtosis: 0.9823
n_outliers: 47 · outlier_rate: 0.01459 · zero_rate: 0.009311

median_gross_rent

numeric feature outliers
Numeric column capturing median gross rent in dollars, with 3,222 rows, 983 unique values, and 10 nulls (0.31%). The distribution is right-skewed (skew 1.76, kurtosis 4.55), running from $297 to $2,805 around a median of $818 and mean of $891. The profiler flags 225 values (7.0% of non-null rows) as outliers, mostly on the high end. Assuming the conventional 1.5×IQR fences (about $328 and $1,368), the $297 minimum also falls below the lower fence. No zeros are present, so missingness isn't being encoded as 0. Treatment: Log-transform before regression to tame the right skew and high-rent outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 10 (0.3%) · unique: 983
min: 297 · max: 2,805 · mean: 890.9 · median: 818 · std: 283.4
q1: 718 · q3: 978 · iqr: 260
skew: 1.763 · kurtosis: 4.55
n_outliers: 225 · outlier_rate: 0.07005 · zero_rate: 0

rent_to_income_ratio

numeric feature high_skew
This column reports a rent-to-income ratio, with a typical county near the median of 17.06 and an interquartile range of just 4.29 (15.1 to 19.39). However, the maximum of 1,200 is about 70 times the median, producing extreme skew (53.98) and kurtosis (3007.07). The profiler flags 107 values (3.33% of non-null rows) as outliers. A standard deviation of 21.2, roughly five times the IQR, shows that a small number of records sit far beyond the bulk of the distribution. Treatment: First inspect the most extreme rows for entry or unit errors. Then cap or winsorize extreme values and log-transform before modelling. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 9 (0.3%) · unique: 1,269
min: 6.1 · max: 1,200 · mean: 17.89 · median: 17.06 · std: 21.2
q1: 15.1 · q3: 19.39 · iqr: 4.29
skew: 53.98 · kurtosis: 3007
n_outliers: 107 · outlier_rate: 0.0333 · zero_rate: 0
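Here is a minimal sketch of the cap-then-log treatment using only the standard library. The 1st/99th percentile bounds are an assumption, since the profile does not prescribe them, and the data are made up. In practice, inspect the most extreme rows, such as the 1,200, before clipping.

```python
import math
import statistics

def winsorize(values, lower_pct=1, upper_pct=99):
    """Clip values to the lower_pct-th and upper_pct-th percentiles.

    statistics.quantiles(..., n=100) returns 99 cut points; cut i-1 is the i-th percentile.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]

# Made-up ratios: a tight bulk between 15 and 20 plus one extreme value like the reported max.
ratios = [15.0 + 0.05 * i for i in range(100)] + [1200.0]
clipped = winsorize(ratios)
logged = [math.log(v) for v in clipped]  # safe: all clipped values are positive
print(max(ratios), round(max(clipped), 2), round(max(logged), 3))
```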

gini_index

numeric feature
Numeric column holding Gini index values for 3,222 records, all populated and bounded between 0.2744 and 0.721, with a mean of 0.4481 and median of 0.4457. The distribution is tight (IQR 0.0494, std 0.0384), with mild right skew (0.50) and 56 outliers (1.74%). The outliers sit on both sides: assuming 1.5×IQR fences of about 0.348 and 0.546, values stretch up toward 0.721, and the 0.2744 minimum falls below the lower fence. Values fall within the expected 0–1 range for an inequality coefficient, suggesting a clean, ready-to-use feature. Treatment: Use as-is as a numeric feature; optionally winsorize the 56 outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,317
min: 0.2744 · max: 0.721 · mean: 0.4481 · median: 0.4457 · std: 0.03841
q1: 0.422 · q3: 0.4714 · iqr: 0.04938
skew: 0.4999 · kurtosis: 1.634
n_outliers: 56 · outlier_rate: 0.01738 · zero_rate: 0

unemployment_rate

numeric feature high_skew
Likely a county-level unemployment rate in percent, with values ranging from 0.0 to 31.99 and a median of 4.69. The distribution is heavily right-skewed (skew 2.55, kurtosis 12.81), with 154 outliers (4.78%) and a long upper tail that pulls the mean (5.13) above the median. The zero rate of 0.56% corresponds to 18 counties reporting exactly zero unemployment. These are worth verifying; small-sample estimates in very small counties are one plausible cause. Treatment: Apply log1p or a Yeo-Johnson transform before regression (a plain log is undefined at the zeros), and inspect the zero values. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 950
min: 0 · max: 31.99 · mean: 5.127 · median: 4.69 · std: 2.926
q1: 3.42 · q3: 6.08 · iqr: 2.66
skew: 2.545 · kurtosis: 12.81
n_outliers: 154 · outlier_rate: 0.0478 · zero_rate: 0.005587
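For reference, here is a sketch of the Yeo-Johnson transform for a single value and a given λ. In practice λ is estimated by maximum likelihood, for example with `scipy.stats.yeojohnson` or scikit-learn's `PowerTransformer(method="yeo-johnson")`. The λ used below is an arbitrary placeholder. For non-negative inputs and λ = 0 the transform reduces to log1p, which is why it copes with the zero readings.

```python
import math

def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson power transform of one value (defined for any real y)."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)  # limit as lambda -> 0
        return ((y + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-y)  # limit as lambda -> 2
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)

# Placeholder lambda; fit it to the data in a real pipeline.
for rate in (0.0, 4.69, 31.99):
    print(rate, round(yeo_johnson(rate, 0.3), 3))
```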

labor_force_participation

numeric feature
Numeric labor force participation rate, almost certainly expressed as a percentage given the range of 18.63 to 84.04 and mean of 57.89. Distribution is moderately left-skewed (-0.58) with a tight interquartile band of 52.97 to 63.67, and only 38 outliers (1.18%) sit outside the whiskers. No nulls or zeros across 3,222 rows, and 1,944 unique values suggest fine-grained measurements rather than rounded buckets. Treatment: Use as-is in modelling; mild left skew does not require transformation. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,944
min: 18.63 · max: 84.04 · mean: 57.89 · median: 58.72 · std: 8.041
q1: 52.97 · q3: 63.66 · iqr: 10.7
skew: -0.5766 · kurtosis: 0.4502
n_outliers: 38 · outlier_rate: 0.01179 · zero_rate: 0

pct_deep_poverty

numeric feature high_skew outliers
Percentage of population in deep poverty (commonly defined as income below half the poverty threshold) across 3,222 rows, with no nulls and values bounded between 0.0 and 34.7. The distribution is right-skewed (skew 2.67, kurtosis 10.40), with the median of 5.82 trailing the mean of 6.74 and 176 rows (5.5%) flagged as upper-tail outliers. Only 0.09% of rows (3) are zero, so floor effects are minimal despite the long tail. Treatment: Apply log1p (a plain log is undefined at the zeros) or winsorize before linear modelling to dampen the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,131
min: 0 · max: 34.7 · mean: 6.743 · median: 5.82 · std: 4.154
q1: 4.27 · q3: 7.918 · iqr: 3.648
skew: 2.665 · kurtosis: 10.4
n_outliers: 176 · outlier_rate: 0.05462 · zero_rate: 0.0009311

pct_poverty

numeric feature high_skew
Likely a county-level poverty rate expressed as a percentage, ranging from 1.6 to 66.32 with a median of 13.55 and mean of 15.10. The distribution is right-skewed (skew 2.10, kurtosis 6.89), with 137 outliers (4.25%) in the heavy upper tail, consistent with a small set of high-poverty counties pulling the mean above the median. No nulls or zeros. The 1,719 unique values across 3,222 rows are consistent with values reported to two decimals, with many ties. Every reported statistic (range, quartiles, moments, outlier count, unique count) is identical to poverty_rate's, so the two columns are very likely duplicates. Verify row-by-row equality and drop one before modelling to avoid perfect collinearity. Treatment: Consider a log or sqrt transform before linear modelling to tame the right skew. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,719
min: 1.6 · max: 66.32 · mean: 15.1 · median: 13.55 · std: 7.706
q1: 10.16 · q3: 17.91 · iqr: 7.75
skew: 2.096 · kurtosis: 6.891
n_outliers: 137 · outlier_rate: 0.04252 · zero_rate: 0
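Here is a minimal check of the duplicate hypothesis, reading the CSV row by row with the standard library. The helper is hypothetical and assumes both columns are fully populated, as the profile reports. With pandas, `df["poverty_rate"].equals(df["pct_poverty"])` performs the same exact comparison.

```python
import csv
import io

def compare_columns(fileobj, a, b, tol=0.0):
    """Return (n_rows, n_mismatches) comparing numeric columns a and b row by row.

    Assumes both columns are populated in every row (float('') would raise).
    """
    n = mismatches = 0
    for row in csv.DictReader(fileobj):
        n += 1
        if abs(float(row[a]) - float(row[b])) > tol:
            mismatches += 1
    return n, mismatches

# In-memory stand-in for inequality_master.csv; the rows are made up.
sample = io.StringIO(
    "fips,poverty_rate,pct_poverty\n"
    "99001,13.55,13.55\n"
    "99002,10.16,10.16\n"
)
print(compare_columns(sample, "poverty_rate", "pct_poverty"))
```

Zero mismatches over all 3,222 rows would confirm the duplication. With the real file, open it with `open(path, newline="")` and pass the handle in.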

pct_near_poverty

numeric feature
Percentage of population near the poverty line, meaning income just above the federal poverty threshold. The exact band (for example 100–150% or 100–200% of the threshold) should be confirmed in the source. The column covers 3,222 rows, one per county, with no nulls. The distribution centers on a median of 9.38 with an IQR of 4.43, but a right tail pushes the max to 49.14, yielding skew of 1.19 and kurtosis of 5.73. About 2.5% of values (82 rows) fall outside the outlier fences. These are mostly high-poverty counties worth inspecting separately. Treatment: Consider a log or sqrt transform before regression to tame the right skew. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,237
min: 0.58 · max: 49.14 · mean: 9.813 · median: 9.38 · std: 3.644
q1: 7.33 · q3: 11.76 · iqr: 4.43
skew: 1.19 · kurtosis: 5.729
n_outliers: 82 · outlier_rate: 0.02545 · zero_rate: 0

pct_hs_or_higher

numeric feature
Percentage of the population (likely adults 25 and over) with a high school diploma or higher, one row per county. Values cluster high (mean 88.08, median 89.39, IQR 84.9–92.47), with a left tail reaching down to 33.33. That tail produces a skew of -1.33 and 86 outliers (2.67%), all on the low end, since the upper fence lies above 100. No nulls or zeros; 1,612 unique values. Treatment: Use as-is for modelling, but consider a reflected log or winsorization given the left skew and low-end outliers. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,612
min: 33.33 · max: 99.69 · mean: 88.08 · median: 89.39 · std: 5.97
q1: 84.9 · q3: 92.47 · iqr: 7.567
skew: -1.328 · kurtosis: 3.742
n_outliers: 86 · outlier_rate: 0.02669 · zero_rate: 0
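Here is a sketch of the reflected log, assuming the column is a percentage bounded by 100. Reflecting with K = 101 maps the largest possible value to log(1) = 0 before the log is taken. The helper name is hypothetical. Reflection reverses direction: a higher transformed value means a lower completion rate, so coefficient signs flip accordingly.

```python
import math

def reflected_log(values, upper=None):
    """Return log(K - x) with K = upper + 1 (upper defaults to max(values)).

    Reflection turns a left-skewed variable into a right-skewed one, and the log
    then compresses that tail. Large raw values map to small outputs.
    """
    k = (max(values) if upper is None else upper) + 1
    return [math.log(k - v) for v in values]

# Illustrative inputs: the reported min, median and max of pct_hs_or_higher.
print([round(v, 3) for v in reflected_log([33.33, 89.39, 99.69], upper=100.0)])
```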

pct_bachelors_or_higher

numeric feature
Percent of adults with a bachelor's degree or higher, one row per county, with no nulls. Values range from 0.0 to 78.87, with median 21.07 and mean 23.50. The distribution is right-skewed (skew 1.36, kurtosis 2.31), with 141 outliers (4.4%) on the high end, consistent with a long tail of highly educated metro counties above the typical county. A single county reports exactly 0%, which is worth verifying. Treatment: Consider a log or sqrt transform before linear modelling to tame the right skew. high · anthropic:claude-opus-4-7
n: 3,222 · nulls: 0 (0.0%) · unique: 1,982
min: 0 · max: 78.87 · mean: 23.5 · median: 21.07 · std: 9.983
q1: 16.59 · q3: 27.85 · iqr: 11.26
skew: 1.357 · kurtosis: 2.306
n_outliers: 141 · outlier_rate: 0.04376 · zero_rate: 0.0003104

disability_rate

numeric feature high_skew
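This is a numeric disability rate per county, ranging from 0.0 to 9.17 with a median of 1.07 and IQR of 0.65. The distribution is heavily right-skewed (skew 2.17, kurtosis 15.24), with 117 outliers (3.6%) and a small but non-trivial 1.7% of rows (54) at exactly zero. The 305 unique values across 3,222 rows reflect coarse two-decimal precision combined with a narrow central range: 0.77 to 1.42 spans only 66 possible values at 0.01 resolution. Aggregation to fewer geographies cannot explain them, since every row is a distinct county. A median near 1 is also low if this is the share of the total population with a disability, so confirm the definition and units against the source. Treatment: Apply log1p (which handles the zeros) or winsorize before regression to tame the right tail. high · anthropic:claude-opus-4-7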
n: 3,222 · nulls: 0 (0.0%) · unique: 305
min: 0 · max: 9.17 · mean: 1.145 · median: 1.07 · std: 0.6215
q1: 0.77 · q3: 1.42 · iqr: 0.65
skew: 2.167 · kurtosis: 15.24
n_outliers: 117 · outlier_rate: 0.03631 · zero_rate: 0.01676