dataset summary · high confidence · anthropic:claude-opus-4-7
This dataset contains 10,302 state-level records covering 51 jurisdictions (the 50 states plus DC) and 102 distinct monthly reporting periods between 201309 and 202510. Each jurisdiction contributes exactly 202 rows, roughly two per reporting period, which matches the even split of the revision flags noted at the end of this summary. The records track Medicaid and CHIP enrollment, application processing, eligibility determinations, and call center performance. The headline metric, Total Medicaid and CHIP Enrollment, is nearly complete and ranges from 0 to about 14.46M with a median of roughly 1.03M, while most other operational metrics are heavily right-skewed with substantial outliers. Two things deserve a closer look first. Missingness is very uneven: Total Adult Medicaid Enrollment is 84.7% null and the call center metrics are ~69% null, while core enrollment fields are essentially complete. The 'State Expanded Medicaid' flag splits the panel roughly 73%/27% (Y/N), which is a natural cut for comparison. The Final Report and Preliminary or Updated flags are each split exactly 50/50 (5,151 rows per value), suggesting most state-period records appear in both a preliminary and a final or updated form, so deduplication may be needed before aggregation.
citing: row_count · column_count · Total Medicaid and CHIP Enrollment · Total Adult Medicaid Enrollment · Average Call Center Wait Time (Minutes) · Average Call Center Abandonment Rate · State Expanded Medicaid · Final Report · Preliminary or Updated · State Name · Reporting Period
Charts the summary said to look at first
Total Medicaid and CHIP Enrollment
· Headline enrollment metric — note the strong right skew and very large maximum dragging the mean above the median.
Histogram bins for Total Medicaid and CHIP Enrollment (median: 1031822.0).
bin
count
0 – 3.616e+05
2640
3.616e+05 – 7.231e+05
1192
7.231e+05 – 1.085e+06
1581
1.085e+06 – 1.446e+06
1149
1.446e+06 – 1.808e+06
1171
1.808e+06 – 2.169e+06
726
2.169e+06 – 2.531e+06
291
2.531e+06 – 2.893e+06
207
2.893e+06 – 3.254e+06
331
3.254e+06 – 3.616e+06
143
3.616e+06 – 3.977e+06
175
3.977e+06 – 4.339e+06
106
4.339e+06 – 4.7e+06
84
4.7e+06 – 5.062e+06
42
5.062e+06 – 5.423e+06
26
5.423e+06 – 5.785e+06
19
5.785e+06 – 6.147e+06
75
6.147e+06 – 6.508e+06
20
6.508e+06 – 6.87e+06
51
6.87e+06 – 7.231e+06
35
7.231e+06 – 7.593e+06
32
7.593e+06 – 7.954e+06
3
7.954e+06 – 8.316e+06
0
8.316e+06 – 8.678e+06
0
8.678e+06 – 9.039e+06
0
9.039e+06 – 9.401e+06
0
9.401e+06 – 9.762e+06
0
9.762e+06 – 1.012e+07
0
1.012e+07 – 1.049e+07
0
1.049e+07 – 1.085e+07
0
1.085e+07 – 1.121e+07
0
1.121e+07 – 1.157e+07
6
1.157e+07 – 1.193e+07
35
1.193e+07 – 1.229e+07
38
1.229e+07 – 1.265e+07
11
1.265e+07 – 1.302e+07
10
1.302e+07 – 1.338e+07
23
1.338e+07 – 1.374e+07
39
1.374e+07 – 1.41e+07
21
1.41e+07 – 1.446e+07
18
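The bin labels above are consistent with 40 equal-width bins between the column minimum (0) and maximum (14,462,560). The sketch below reproduces that binning; the function names are illustrative, and equal-width binning is an assumption inferred from the printed edges rather than something the report states.

```python
def equal_width_edges(lo, hi, bins):
    """Return bins + 1 evenly spaced edges from lo to hi."""
    return [lo + (hi - lo) * i / bins for i in range(bins + 1)]


def bin_index(x, lo, hi, bins):
    """Index of the bin holding x; the maximum falls in the last bin."""
    if x == hi:
        return bins - 1
    return int((x - lo) / (hi - lo) * bins)


LO, HI, BINS = 0, 14_462_560, 40  # min, max, and bin count read off the table
edges = equal_width_edges(LO, HI, BINS)
median_bin = bin_index(1_031_822, LO, HI, BINS)  # the median lands in the third bin
print(edges[1], median_bin)
```

The first edge after zero comes out at 361,564, which the table prints as 3.616e+05.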
State Expanded Medicaid
· About 73% of records come from states that expanded Medicaid, a useful split for any comparative analysis.
Top values for State Expanded Medicaid (2 unique shown, of 2 total).
value
count
share
Y
7475
72.6%
N
2827
27.4%
State Name
· All 51 jurisdictions appear exactly 202 times, confirming the panel is balanced across states.
Top values for State Name (20 unique shown, of 51 total).
value
count
share
Alaska
202
2.0%
Alabama
202
2.0%
Arkansas
202
2.0%
Arizona
202
2.0%
California
202
2.0%
Colorado
202
2.0%
Connecticut
202
2.0%
District of Columbia
202
2.0%
Delaware
202
2.0%
Florida
202
2.0%
Georgia
202
2.0%
Hawaii
202
2.0%
Iowa
202
2.0%
Idaho
202
2.0%
Illinois
202
2.0%
Indiana
202
2.0%
Kansas
202
2.0%
Kentucky
202
2.0%
Louisiana
202
2.0%
Massachusetts
202
2.0%
Average Call Center Wait Time (Minutes)
· Wait times cluster low (median 5 min) but stretch to 72 minutes; check the long tail and the ~69% missingness before using.
Histogram bins for Average Call Center Wait Time (Minutes) (median: 5.0).
bin
count
0 – 1.8
942
1.8 – 3.6
417
3.6 – 5.4
329
5.4 – 7.2
188
7.2 – 9
80
9 – 10.8
116
10.8 – 12.6
168
12.6 – 14.4
103
14.4 – 16.2
106
16.2 – 18
43
18 – 19.8
86
19.8 – 21.6
75
21.6 – 23.4
73
23.4 – 25.2
50
25.2 – 27
20
27 – 28.8
44
28.8 – 30.6
48
30.6 – 32.4
47
32.4 – 34.2
38
34.2 – 36
15
36 – 37.8
28
37.8 – 39.6
28
39.6 – 41.4
6
41.4 – 43.2
15
43.2 – 45
12
45 – 46.8
14
46.8 – 48.6
10
48.6 – 50.4
2
50.4 – 52.2
4
52.2 – 54
2
54 – 55.8
0
55.8 – 57.6
10
57.6 – 59.4
12
59.4 – 61.2
3
61.2 – 63
2
63 – 64.8
2
64.8 – 66.6
4
66.6 – 68.4
0
68.4 – 70.2
0
70.2 – 72
3
Reporting Period
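· Coverage spans 201309–202510, but not evenly: the first bin holds 51 rows, the next 12 bins (covering 2014 through 2016) are empty, and rows resume only in the bin labelled 2.017e+05.
Histogram bins for Reporting Period (median: 202107.5). Bin edges are printed to four significant figures, so the two ends of a bin often look identical. Some empty bins between later years are an artefact of the YYYYMM encoding, which has no values between month 12 of one year and month 01 of the next.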
bin
count
2.013e+05 – 2.013e+05
51
2.013e+05 – 2.014e+05
0
2.014e+05 – 2.014e+05
0
2.014e+05 – 2.014e+05
0
2.014e+05 – 2.015e+05
0
2.015e+05 – 2.015e+05
0
2.015e+05 – 2.015e+05
0
2.015e+05 – 2.015e+05
0
2.015e+05 – 2.016e+05
0
2.016e+05 – 2.016e+05
0
2.016e+05 – 2.016e+05
0
2.016e+05 – 2.017e+05
0
2.017e+05 – 2.017e+05
0
2.017e+05 – 2.017e+05
714
2.017e+05 – 2.018e+05
0
2.018e+05 – 2.018e+05
0
2.018e+05 – 2.018e+05
1224
2.018e+05 – 2.018e+05
0
2.018e+05 – 2.019e+05
0
2.019e+05 – 2.019e+05
918
2.019e+05 – 2.019e+05
306
2.019e+05 – 2.02e+05
0
2.02e+05 – 2.02e+05
0
2.02e+05 – 2.02e+05
1224
2.02e+05 – 2.021e+05
0
2.021e+05 – 2.021e+05
0
2.021e+05 – 2.021e+05
1224
2.021e+05 – 2.021e+05
0
2.021e+05 – 2.022e+05
0
2.022e+05 – 2.022e+05
918
2.022e+05 – 2.022e+05
306
2.022e+05 – 2.023e+05
0
2.023e+05 – 2.023e+05
0
2.023e+05 – 2.023e+05
1224
2.023e+05 – 2.024e+05
0
2.024e+05 – 2.024e+05
0
2.024e+05 – 2.024e+05
1224
2.024e+05 – 2.024e+05
0
2.024e+05 – 2.025e+05
0
2.025e+05 – 2.025e+05
969
Schema
44 columns
Per-column summary.
Two-letter US state abbreviation, with 51 distinct values (the 50 states plus DC) covering all 10,302 rows without nulls. The distribution is perfectly uniform: every state appears exactly 202 times, and entropy_ratio is 1.0, indicating the dataset was constructed as a balanced panel across states rather than sampled from a real-world population.
Treatment: One-hot or target-encode for modelling; can also serve as a join key to state-level reference tables. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
0 (0.0%)
unique
51
top_value
AK
top_rate
0.01961
cardinality
51
entropy
5.672
entropy_ratio
1
State Name
categorical · feature
This column lists US state names, with 51 unique values (the 50 states plus District of Columbia) and zero nulls across 10,302 rows. The distribution is perfectly uniform: every state appears exactly 202 times, yielding a top_rate of 0.0196 and entropy_ratio of 1.0. That balance suggests the dataset was constructed as a state-by-period grid rather than sampled from real-world activity.
Treatment: Use as a categorical grouping key; one-hot or target-encode for modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
0 (0.0%)
unique
51
top_value
Alaska
top_rate
0.01961
cardinality
51
entropy
5.672
entropy_ratio
1
Reporting Period
numeric · timestamp
Values like 202510, 202107, 201309 in YYYYMM form indicate this is a reporting period encoded as an integer year-month, not a true numeric measure. The range spans 201309 to 202510 with 102 unique periods across 10,302 rows and no nulls. That span covers 146 calendar months, so the snapshots are monthly but do not cover every month in the range. Summary stats like mean 202112.5 and std 249.6 are arithmetic artefacts of the YYYYMM encoding and should not be interpreted as a distribution.
Treatment: Parse YYYYMM into a proper date and treat as a temporal key, not a numeric feature. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
0 (0.0%)
unique
102
min
201,309
max
202,510
mean
2.021e+05
median
2.021e+05
std
249.6
q1
201,906
q3
202,309
iqr
403
skew
-0.1301
kurtosis
-0.8155
n_outliers
0
outlier_rate
0
zero_rate
0
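A minimal sketch of the YYYYMM parsing suggested above, using only the standard library. The helper names `parse_period` and `months_between`, and the choice to map each period to the first day of its month, are illustrative assumptions rather than part of the dataset's documentation.

```python
from datetime import date


def parse_period(yyyymm: int) -> date:
    """Convert an integer such as 202510 into date(2025, 10, 1)."""
    year, month = divmod(yyyymm, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a YYYYMM value: {yyyymm}")
    return date(year, month, 1)


def months_between(start: date, end: date) -> int:
    """Count calendar months from start to end, inclusive of both."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


first = parse_period(201309)  # min from the stats above
last = parse_period(202510)   # max from the stats above
span = months_between(first, last)
print(first, last, span)
```

Comparing `span` with the 102 unique periods reported for this column gives a quick check for gaps in the series.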
State Expanded Medicaid
categorical · feature
Binary Y/N flag indicating whether the state expanded Medicaid, fully populated across all 10,302 rows. The distribution is moderately imbalanced, with 'Y' covering 72.6% (7,475) of rows versus 2,827 'N'. With only 2 categories and no nulls, this is a clean feature.
Treatment: Encode as a 0/1 binary indicator for modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
0 (0.0%)
unique
2
top_value
Y
top_rate
0.7256
cardinality
2
entropy
0.8477
entropy_ratio
0.8477
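The entropy and entropy_ratio figures in these cards can be reproduced from the value counts. The sketch below assumes base-2 Shannon entropy normalised by log2 of the number of categories; the report does not state its formula, but this choice matches the printed values for this column and for the state columns.

```python
import math


def shannon_entropy(counts):
    """Base-2 Shannon entropy of a list of category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0
    return shannon_entropy(counts) / math.log2(k)


expanded = [7475, 2827]   # Y and N counts for State Expanded Medicaid
states = [202] * 51       # every jurisdiction appears 202 times

print(round(shannon_entropy(expanded), 4), round(entropy_ratio(expanded), 4))
print(round(shannon_entropy(states), 3), round(entropy_ratio(states), 3))
```

For a two-category column the maximum entropy is 1 bit, which is why entropy and entropy_ratio coincide here.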
Preliminary or Updated
categorical · metadata
Binary flag distinguishing preliminary versus updated records, taking values 'P' and 'U'. The split is exactly even at 5,151 each, yielding maximum entropy (1.0) and a top_rate of 0.5. This perfect balance suggests that most state-period records appear once in each revision state (once as 'P' and once as 'U') rather than reflecting a natural distribution.
Treatment: Use to filter to one revision state (likely 'U') before analysis to avoid double-counting. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
0 (0.0%)
unique
2
top_value
U
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
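One way to avoid double-counting is to keep a single row per state and reporting period, preferring the updated revision when both exist. The sketch below assumes that 'U' supersedes 'P', which the report suggests but does not confirm; the `enrollment` key and its values are made up for illustration.

```python
FLAG = "Preliminary or Updated"
REVISION_RANK = {"P": 0, "U": 1}  # assumption: an updated row supersedes a preliminary one


def latest_revision(rows):
    """Keep one row per (state, period), preferring 'U' over 'P'."""
    best = {}
    for row in rows:
        key = (row["State Name"], row["Reporting Period"])
        kept = best.get(key)
        if kept is None or REVISION_RANK[row[FLAG]] > REVISION_RANK[kept[FLAG]]:
            best[key] = row
    return list(best.values())


# Toy rows; the enrollment figures are invented, not taken from the dataset.
rows = [
    {"State Name": "Alaska", "Reporting Period": 202401, FLAG: "P", "enrollment": 100},
    {"State Name": "Alaska", "Reporting Period": 202401, FLAG: "U", "enrollment": 110},
    {"State Name": "Alaska", "Reporting Period": 202402, FLAG: "P", "enrollment": 120},
]
deduped = latest_revision(rows)
print(deduped)
```

Periods that only have a preliminary row are kept as they are, so the most recent month is not dropped.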
Final Report
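categorical · label
Binary Y/N flag indicating whether a final report exists, with the 10,302 rows split evenly into 5,151 Y and 5,151 N. The perfect 50/50 balance (entropy 1.0, top_rate 0.5) is unusual for organic data. It mirrors the 5,151/5,151 split of Preliminary or Updated, so it may reflect paired preliminary and final versions of the same state-period record rather than deliberate sampling or stratification.
Treatment: If used as a binary target, encode Y/N to 1/0; given the exact 50/50 split, first check whether the flag simply mirrors Preliminary or Updated and should instead be used to filter to final records. (high · anthropic:claude-opus-4-7)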
n
10,302
nulls
0 (0.0%)
unique
2
top_value
Y
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
New Applications Submitted to Medicaid and CHIP Agencies
numeric · feature · high_skew · outliers
Counts of new Medicaid/CHIP applications submitted to state agencies, reported across 10,302 rows. The distribution is heavily right-skewed (skew 4.08, kurtosis 23.5) with a median of 14,644 but a max of 733,651, and 11.5% of values flagged as outliers. About 4% of rows are zero and 0.5% are null, suggesting either non-reporting periods or genuinely inactive agencies.
Treatment: log1p-transform before modelling to tame the heavy right tail. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
5,378
min
0
max
733,651
mean
2.99e+04
median
14,644
std
4.911e+04
q1
4508
q3
29,970
iqr
2.546e+04
skew
4.077
kurtosis
23.55
n_outliers
1,175
outlier_rate
0.1146
zero_rate
0.04019
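A minimal sketch of the log1p treatment, applied to the summary statistics listed above plus one null. `math.log1p(x)` computes log(1 + x), so zeros map to 0 instead of raising an error; passing nulls through unchanged is an illustrative choice, not a recommendation from the report.

```python
import math

# min, q1, median, q3, and max from the stats above, plus a missing value
values = [0, 4508, 14644, 29970, 733651, None]


def log1p_or_none(x):
    """Apply log(1 + x), leaving missing values untouched."""
    return None if x is None else math.log1p(x)


transformed = [log1p_or_none(v) for v in values]
print(transformed)
```

On the raw scale the maximum is about 50 times the median; after the transform the two values are much closer together, which is the point of taming the tail.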
New Applications Submitted to Medicaid and CHIP Agencies - footnotes
categorical · metadata · null_rate
This is a footnote/qualifier column annotating the 'New Applications Submitted to Medicaid and CHIP Agencies' metric, explaining caveats like inclusion of renewals, administrative data transfers, or excluded application types. It is sparse with a 76.6% null rate, leaving only 18 distinct footnote strings across 10,302 rows; the most common, 'Includes Renewals and/or Redeterminations', covers 27% of non-null entries. Entropy ratio of 0.84 across just 18 categories indicates the present footnotes are spread fairly evenly, and several values are concatenations of multiple caveats separated by semicolons.
Treatment: Treat as qualifier metadata: split semicolon-separated flags into binary indicators and use to caveat or filter the main metric rather than as a feature. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
7,891 (76.6%)
unique
18
top_value
Includes Renewals and/or Redeterminations
top_rate
0.27
cardinality
18
entropy
3.502
entropy_ratio
0.8399
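The footnote columns can be turned into binary caveat indicators by splitting on semicolons. In the sketch below the function names are illustrative, and the compound example cell is an invented combination of two caveat strings that appear elsewhere in this report, not a value confirmed to occur in this column.

```python
def split_footnotes(cell):
    """Split a semicolon-joined footnote cell into a set of caveat strings."""
    if not cell:
        return set()
    return {part.strip() for part in cell.split(";") if part.strip()}


def caveat_flags(cells):
    """Return the caveat vocabulary and one 0/1 row per cell."""
    vocab = sorted(set().union(*(split_footnotes(c) for c in cells)))
    flags = [[int(v in split_footnotes(c)) for v in vocab] for c in cells]
    return vocab, flags


cells = [
    None,  # most rows carry no footnote
    "Includes Renewals and/or Redeterminations",
    # Illustrative compound value (combination assumed for the example):
    "Includes Renewals and/or Redeterminations; "
    "Does Not Include All Medicaid Determinations Made At Application",
]
vocab, flags = caveat_flags(cells)
print(vocab)
print(flags)
```

Splitting first means compound strings share indicators with their base caveats instead of each counting as a separate category.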
Applications for Financial Assistance Submitted to the State Based Marketplace
numeric · feature · high_skew · outliers
This column counts applications for financial assistance submitted to a State Based Marketplace. The distribution is dominated by zeros (zero_rate 0.77, median 0, IQR 0), yet the mean is 11,229 and the max reaches 762,069, producing extreme skew (8.41) and kurtosis (82.64). Because the IQR is 0, every non-zero value is flagged as an outlier: the outlier rate (0.2299, 2,357 rows) is exactly the non-zero share (1 − 0.7701). The outlier flag therefore adds nothing beyond a zero-versus-non-zero split, and the non-zero tail carries nearly all the signal.
Treatment: Log1p-transform and consider a zero-vs-nonzero indicator before modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
1,373
min
0
max
762,069
mean
1.123e+04
median
0
std
5.539e+04
q1
0
q3
0
iqr
0
skew
8.415
kurtosis
82.64
n_outliers
2,357
outlier_rate
0.2299
zero_rate
0.7701
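The stats above show an IQR of 0 alongside an outlier rate that equals the non-zero share. The sketch below illustrates why, assuming the profiler flags outliers with the common 1.5 × IQR fence rule (the report does not say which rule it uses). The data are a toy sample with the same 77%/23% zero split, not real values.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (iqr, values lying outside the q1 - k*iqr .. q3 + k*iqr fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return iqr, [v for v in values if v < lo or v > hi]


# Toy sample: 77 zeros and 23 non-zero counts (invented numbers).
sample = [0] * 77 + list(range(1000, 24000, 1000))
iqr, flagged = tukey_outliers(sample)
print(iqr, len(flagged))
```

With more than three quarters of the values at zero, both fences collapse to zero, so a separate zero-versus-non-zero indicator is more informative than the outlier flag.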
Applications for Financial Assistance Submitted to the State Based Marketplace - footnotes
categorical · metadata · null_rate
This is a footnotes/qualifier column annotating SBM application counts, with only 3 distinct caveat strings observed. It is 97.43% null, and among the 265 populated rows, 83.0% carry the single note "Includes Renewals and/or Redeterminations". The other two values layer on duplicates and Medicaid-coverage caveats that materially affect comparability of the underlying counts.
Treatment: Keep as a data-quality flag joined to the metric; do not model directly. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
10,037 (97.4%)
unique
3
top_value
Includes Renewals and/or Redeterminations
top_rate
0.8302
cardinality
3
entropy
0.8241
entropy_ratio
0.52
Total Applications for Financial Assistance Submitted at State Level
numeric · feature · high_skew · outliers
This is a numeric count of financial-assistance applications submitted at the state level, with 10,302 records and 5,591 unique values. The distribution is heavily right-skewed (skew 4.40, kurtosis 26.40): the median is 18,257 but the mean is 41,125 and the max reaches 762,069, with 12.2% of rows flagged as outliers. About 2% of values are zero and only 0.5% are null.
Treatment: Log-transform (log1p) before modelling to tame the heavy right skew and outliers. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
5,591
min
0
max
762,069
mean
4.113e+04
median
18,257
std
7.208e+04
q1
6,490
q3
40,107
iqr
33,617
skew
4.396
kurtosis
26.4
n_outliers
1,249
outlier_rate
0.1218
zero_rate
0.02039
Total Applications for Financial Assistance Submitted at State Level - footnotes
categorical · metadata · null_rate
Footnote annotations qualifying state-level financial assistance application counts, drawn from a small vocabulary of 17 caveat strings (often concatenated with semicolons). 73.44% of rows are null, and among the 2,736 populated rows the single value "Includes Renewals and/or Redeterminations" covers 42.5% (1,163 rows), indicating these are methodology caveats rather than data values. Entropy ratio of 0.73 shows the non-null distribution is moderately spread across the 17 caveat combinations.
Treatment: Keep as a qualifier for interpreting the paired count column; split on ';' into caveat flags rather than modelling as a single category. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
7,566 (73.4%)
unique
17
top_value
Includes Renewals and/or Redeterminations
top_rate
0.4251
cardinality
17
entropy
2.977
entropy_ratio
0.7283
Individuals Determined Eligible for Medicaid at Application
numeric · feature · high_skew · outliers
This is a numeric count of individuals deemed Medicaid-eligible at application, reported per state and reporting period like the rest of the panel (10,302 rows, 5,568 unique values). The distribution is heavily right-skewed (skew 2.93, kurtosis 11.06): the median is 11,008 but the mean is 27,437 and the max reaches 435,560, with 10.25% of rows flagged as outliers. About 5.86% of values are zero and 0.5% are null, so empty reporting periods are present but rare.
Treatment: Apply log1p (a plain log fails on the zeros) before any regression or distance-based modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
5,568
min
0
max
435,560
mean
2.744e+04
median
11,008
std
4.159e+04
q1
4,344
q3
32,631
iqr
28,287
skew
2.932
kurtosis
11.06
n_outliers
1,051
outlier_rate
0.1025
zero_rate
0.05863
Individuals Determined Eligible for Medicaid at Application - footnotes
categorical · metadata · null_rate
Footnote/caveat column annotating the Medicaid eligibility count, with 18 distinct semicolon-joined qualifier strings such as 'Includes Renewals and/or Redeterminations' (28.1% of non-nulls) and 'Does Not Include All Medicaid Determinations Made At Application'. 72.8% of rows are null, triggering the null_rate alert, meaning most reporters submitted no caveat. Entropy ratio of 0.82 across only 18 categories shows the qualifiers are spread fairly evenly rather than dominated by one note, and several values are compound flags that would need splitting to analyze cleanly.
Treatment: Split on '; ' into binary caveat flags and use to qualify or filter the paired count column; do not model directly. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
7,500 (72.8%)
unique
18
top_value
Includes Renewals and/or Redeterminations
top_rate
0.2812
cardinality
18
entropy
3.41
entropy_ratio
0.8178
Individuals Determined Eligible for CHIP at Application
numeric · high_skew · outliers
n
10,302
nulls
51 (0.5%)
unique
3,064
min
0
max
44,881
mean
2375
median
679
std
4296
q1
142
q3
2,418
iqr
2,276
skew
3.31
kurtosis
13.83
n_outliers
1,295
outlier_rate
0.1263
zero_rate
0.1137
Individuals Determined Eligible for CHIP at Application - footnotes
categorical · metadata · null_rate
Free-text footnote annotations qualifying CHIP eligibility counts, populated only when a state needs to caveat its figures. The column is 90.99% null and only 7 distinct notes appear across 10,302 rows, with 'Includes Renewals and/or Redeterminations' covering 46.55% of the non-null cases. These caveats materially change comparability — some rows count households not individuals, others include presumptive or conditional determinations.
Treatment: Retain as a caveat flag joined to the eligibility count column; do not use as a modelling feature. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,374 (91.0%)
unique
7
top_value
Includes Renewals and/or Redeterminations
top_rate
0.4655
cardinality
7
entropy
1.939
entropy_ratio
0.6905
Total Medicaid and CHIP Determinations
numeric · feature · high_skew · outliers
Numeric counts of total Medicaid and CHIP determinations per record, likely a state-month or state-period rollup. The distribution is heavily right-skewed (skew 2.92, kurtosis 10.77) with a median of 11,977 sitting far below the mean of 29,811 and a max of 467,780, and 10.5% of values flagged as outliers. About 5.5% of rows are zeros and 0.5% are null, which is worth checking against reporting periods or non-reporting states.
Treatment: Log-transform (after handling zeros) before any regression or distance-based modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
5,587
min
0
max
467,780
mean
2.981e+04
median
11,977
std
4.535e+04
q1
4738
q3
35,059
iqr
3.032e+04
skew
2.922
kurtosis
10.77
n_outliers
1,076
outlier_rate
0.105
zero_rate
0.05492
Total Medicaid and CHIP Determinations - footnotes
categorical · metadata · null_rate
Free-text footnote field qualifying the Medicaid/CHIP determinations metric, with 12 distinct semicolon-delimited caveat strings drawn from a small controlled vocabulary. 81.58% of rows are null, and among the 18% populated, 47.10% carry the single caveat 'Includes Renewals and/or Redeterminations'. Most other values are compound caveats concatenated with semicolons, signalling these are flags about how each state reported the count rather than analytic content.
Treatment: Split on '; ' into boolean caveat flags and use to qualify or filter the paired determinations metric. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
8,404 (81.6%)
unique
12
top_value
Includes Renewals and/or Redeterminations
top_rate
0.471
cardinality
12
entropy
2.42
entropy_ratio
0.675
Medicaid and CHIP Child Enrollment
numeric · feature · high_skew · outliers
This column reports Medicaid and CHIP child enrollment counts, ranging from 0 to 5,339,904 with a mean of 740,683 and median of 511,370. The distribution is severely right-skewed (skew 2.80, kurtosis 8.85) with 804 outliers (7.84%) and a 2.13% zero rate, consistent with a few large states dominating while smaller jurisdictions cluster low. Null rate is negligible (0.5%).
Treatment: Log-transform (log1p, since 2.13% of values are zero) before regression to tame the heavy right tail. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
8,094
min
0
max
5.34e+06
mean
7.407e+05
median
511,370
std
9.294e+05
q1
156,319
q3
836,511
iqr
680,192
skew
2.799
kurtosis
8.851
n_outliers
804
outlier_rate
0.07843
zero_rate
0.02127
Medicaid and CHIP Child Enrollment - footnotes
categorical · metadata · null_rate
This is a footnotes/caveat column annotating Medicaid and CHIP child enrollment figures, with only 7 distinct values across 10,302 rows. It is overwhelmingly null (92.19%), and among the populated rows nearly half (49.57%) carry the same caveat about counting individuals enrolled at any time in the month rather than a point-in-time. The second most common note flags states 'Unable to provide data due to system limitations' (218 rows), which is a meaningful data-quality signal worth surfacing alongside any enrollment analysis.
Treatment: Keep as a qualitative flag joined to enrollment rows; do not model directly. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,497 (92.2%)
unique
7
top_value
Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)
top_rate
0.4957
cardinality
7
entropy
1.854
entropy_ratio
0.6605
Total Medicaid and CHIP Enrollment
numeric · feature · high_skew · outliers
State-level Medicaid and CHIP enrollment counts per reporting period, with a median of 1,031,822, a mean of 1,567,204, and a max of 14,462,560. Skew of 3.67 and kurtosis of 16.57 confirm a long right tail, and 692 rows (6.7%) flag as outliers, consistent with a few very large states dominating. With only 2 nulls and a zero_rate of 0.0003, the column is essentially fully populated.
Treatment: Log-transform (log1p, since a few zeros are present) before regression to tame the heavy right tail. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
2 (0.0%)
unique
8,309
min
0
max
1.446e+07
mean
1.567e+06
median
1.032e+06
std
2.054e+06
q1
349,321
q3
1.805e+06
iqr
1.456e+06
skew
3.67
kurtosis
16.57
n_outliers
692
outlier_rate
0.06718
zero_rate
0.0002913
Total Medicaid and CHIP Enrollment - footnotes
categorical · metadata · null_rate
Free-text footnote qualifiers attached to state-month Medicaid/CHIP enrollment figures, explaining methodology caveats. The column is 93.11% null and only 710 rows carry any of 9 distinct notes; among those, 56.19% read 'Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)', flagging that the affected counts are not point-in-time and so are not directly comparable with unflagged rows.
Treatment: Keep as a methodology flag; binarize or one-hot the few recurring notes when filtering comparable enrollment values. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,592 (93.1%)
unique
9
top_value
Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)
top_rate
0.562
cardinality
9
entropy
1.832
entropy_ratio
0.578
Total Medicaid Enrollment
numeric · feature · high_skew · outliers
Numeric tally of Medicaid enrollees, almost certainly aggregated by state-month or similar jurisdiction-period grain given 10302 rows and 8221 unique values. The distribution is severely right-skewed (skew 3.61, kurtosis 16.1) with the mean (1,432,519) sitting well above the median (949,244) and a max of 13,160,563 dwarfing Q3 of 1,646,917. About 6.7% of rows (691) flag as outliers, while nulls are negligible (0.5%) and zeros essentially absent.
Treatment: Log-transform (log1p, since a few zeros are present) before any regression or distance-based modelling to tame the heavy right tail. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
8,221
min
0
max
1.316e+07
mean
1.433e+06
median
949,244
std
1.865e+06
q1
319,105
q3
1.647e+06
iqr
1.328e+06
skew
3.613
kurtosis
16.11
n_outliers
691
outlier_rate
0.06741
zero_rate
0.0002927
Total Medicaid Enrollment - footnotes
categorical · metadata · null_rate
This is a footnote/annotation field qualifying Total Medicaid Enrollment figures, with only 7 distinct caveat strings across 10,302 rows. It is empty 93.21% of the time, and when populated, 57% of values are the single caveat 'Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)'. The remaining notes flag methodology variations like retroactive enrollments, limited-benefit inclusions, or system limitations preventing reporting.
Treatment: Keep as a qualitative caveat flag; binarize as 'has_footnote' or one-hot the 7 categories rather than treating the raw text as a feature. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,602 (93.2%)
unique
7
top_value
Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)
top_rate
0.57
cardinality
7
entropy
1.725
entropy_ratio
0.6144
Total CHIP Enrollment
numeric · feature · high_skew · outliers
Numeric column capturing total CHIP (Children's Health Insurance Program) enrollment counts, likely per state-month or similar administrative unit. The distribution is heavily right-skewed (skew 3.90, kurtosis 18.21) with the mean (136,663) far above the median (78,933) and a max of 1,317,347 versus a Q3 of 171,960. About 5.3% of rows (542) are flagged outliers, and nulls and zeros are negligible.
Treatment: Log-transform (log1p, since a few zeros are present) before regression to tame the heavy right tail. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
51 (0.5%)
unique
7,918
min
0
max
1.317e+06
mean
1.367e+05
median
78,933
std
2.026e+05
q1
2.605e+04
q3
171,960
iqr
1.459e+05
skew
3.897
kurtosis
18.21
n_outliers
542
outlier_rate
0.05287
zero_rate
0.0002927
Total CHIP Enrollment - footnotes
categorical · metadata · null_rate
This is a sparse footnote/qualifier column accompanying CHIP enrollment counts, with only 4 distinct annotation strings across 10,302 rows. It is null 95.92% of the time, and when present is dominated (76.0%) by "Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)". The remaining notes flag retroactive enrollments, incomplete coverage, or system-limitation gaps — caveats that materially affect comparability of the enrollment figures.
Treatment: Keep as a caveat flag joined to the enrollment metric; do not use as a model feature. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,882 (95.9%)
unique
4
top_value
Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)
top_rate
0.7595
cardinality
4
entropy
1.033
entropy_ratio
0.5167
Total Adult Medicaid Enrollment
numeric · feature · null_rate · high_skew
Numeric tally of total adult Medicaid enrollees, almost certainly aggregated by state and reporting period given the multi-million maximum (8,497,290) and median of 551,447. The column is sparsely populated (84.65% null, leaving 1,581 populated rows) and heavily right-skewed (skew 4.56, kurtosis 23.4), with 62 outliers and a standard deviation (1,268,819) larger than the mean (810,066). The 1,415 unique values should be read against those 1,581 populated rows rather than all 10,302: nearly every reported total is distinct, as expected for state-period aggregates rather than per-person records.
Treatment: Filter to the populated rows, or impute the 84.65% nulls only with care since imputing that large a share is unlikely to be reliable, and log-transform (log1p) before any regression to tame the skew. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
8,721 (84.7%)
unique
1,415
min
0
max
8.497e+06
mean
8.101e+05
median
551,447
std
1.269e+06
q1
168,141
q3
951,232
iqr
783,091
skew
4.556
kurtosis
23.38
n_outliers
62
outlier_rate
0.03922
zero_rate
0.001898
Total Adult Medicaid Enrollment - footnotes
categorical · metadata · null_rate
This is a footnote/qualifier column for the Total Adult Medicaid Enrollment metric, carrying free-text caveats about how a state's count was constructed. It is overwhelmingly empty (98.79% null) with only 4 distinct notes across 10,302 rows; the dominant note (56% of non-null) flags that counts include anyone enrolled at any time in the month rather than a point-in-time snapshot. A small but notable 3 rows admit the data could not be provided due to system limitations.
Treatment: Keep as a qualitative caveat flag joined to the enrollment column; do not use as a model feature. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
10,177 (98.8%)
unique
4
top_value
Includes Individuals Enrolled At Any Time in Month (Not a Point-in-Time Count)
top_rate
0.56
cardinality
4
entropy
1.382
entropy_ratio
0.6908
Total Medicaid and CHIP Determinations Processed in Less than 24 Hours
numeric · feature · null_rate · high_skew · outliers
Counts of Medicaid/CHIP determinations processed in under 24 hours, reported per state-month or similar reporting unit. The distribution is severely right-skewed (skew 6.44, kurtosis 47.28) with a median of 3,470 but a max of 791,175 and 699 outliers (11.9%). Notably, 43.1% of rows are null, suggesting many reporters did not submit this metric.
Treatment: Log-transform (log1p, since zeros are present) and impute or flag missingness before modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
4,437 (43.1%)
unique
2,701
min
0
max
791,175
mean
2.224e+04
median
3,470
std
7.333e+04
q1
932
q3
12,172
iqr
11,240
skew
6.441
kurtosis
47.28
n_outliers
699
outlier_rate
0.1192
zero_rate
0.02148
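A minimal sketch of one way to "impute or flag missingness" for a column like this: fill nulls with the median of the observed values, keep a separate indicator so the imputation stays visible, then apply log1p. Median imputation is an illustrative choice here, not something the report prescribes; the sample reuses q1, median, and q3 from the stats above.

```python
import math
import statistics


def impute_and_flag(values):
    """Return (log1p of median-imputed values, was_missing indicators)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    filled = [math.log1p(fill if v is None else v) for v in values]
    missing = [v is None for v in values]
    return filled, missing


sample = [932, None, 3470, 12172, None]  # q1, median, q3 from the stats, plus two nulls
filled, missing = impute_and_flag(sample)
print(filled)
print(missing)
```

With 43.1% of rows null, the indicator itself may carry signal, for example if non-reporting is concentrated in particular states or periods.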
Total Medicaid and CHIP Determinations Processed in Less than 24 Hours - footnotes
categorical · metadata · null_rate
Footnote/caveat field annotating the '<24 hour Medicaid/CHIP determinations' metric with state-reported caveats. It is mostly empty (89.21% null), and only 20 distinct strings appear among the 1,112 populated rows; the dominant caveat is 'Incorrectly reporting processing time at application level, as opposed to the individual level' at 37.77% of non-nulls. Several values are semicolon-concatenated combinations of base caveats, so the 20 categories are not independent.
Treatment: Keep as a data-quality flag joined to the metric; split on ';' into boolean caveat indicators rather than modelling as a single category. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,190 (89.2%)
unique
20
top_value
Incorrectly reporting processing time at application level, as opposed to the individual level
top_rate
0.3777
cardinality
20
entropy
2.968
entropy_ratio
0.6867
Total Medicaid and CHIP Determinations Processed Between 24 Hours and 7 Days
numeric · feature · null_rate · high_skew · outliers
This is a state/period-level count of Medicaid and CHIP eligibility determinations completed within a 24-hour to 7-day window. The distribution is heavily right-skewed (skew 4.39, kurtosis 29.5) with a median of 2,312 but a max of 133,996, and 10.3% of values flag as outliers. Notably, 43.1% of rows are null, suggesting many reporting periods or jurisdictions did not submit this metric.
Treatment: Log-transform (log1p, since zeros are present) and impute or flag missingness before modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
4,437 (43.1%)
unique
2,559
min
0
max
133,996
mean
5465
median
2,312
std
9481
q1
723
q3
5,802
iqr
5,079
skew
4.392
kurtosis
29.55
n_outliers
602
outlier_rate
0.1026
zero_rate
0.02677
Total Medicaid and CHIP Determinations Processed Between 24 Hours and 7 Days - footnotes
categorical · metadata · null_rate
This is a footnotes/caveats column accompanying a Medicaid/CHIP determinations metric, holding free-form annotations about reporting irregularities. It is empty 89.21% of the time, and among the 1,112 populated rows just 21 distinct messages appear, with 'Incorrectly reporting processing time at application level...' covering 37.77% of non-nulls. Notably, several entries are semicolon-joined composites of multiple caveats, suggesting the field is a concatenation rather than a clean code.
Treatment: Keep as a qualitative caveat flag; split on ';' and one-hot the distinct footnote codes if used as features. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
9,190 (89.2%)
unique
21
top_value
Incorrectly reporting processing time at application level, as opposed to the individual level
top_rate
0.3777
cardinality
21
entropy
3.026
entropy_ratio
0.689
Total Medicaid and CHIP Determinations Processed Between 8 Days and 30 Days
numeric · feature · null_rate · high_skew · outliers
Numeric count of Medicaid/CHIP determinations processed within an 8-30 day window, likely reported per state-month or similar reporting unit. The distribution is heavily right-skewed (skew 3.98, kurtosis 19.86) with a median of 2,528 but a max of 155,529, and 10.2% of values flagged as outliers. Note that 43.1% of rows are null and 2.6% are exactly zero, so coverage is partial and should be handled before any modelling.
Treatment: Impute or filter the 43% nulls, then log-transform (log1p, since zeros are present) to tame the heavy right tail before modelling. (high · anthropic:claude-opus-4-7)
n
10,302
nulls
4,437 (43.1%)
unique
2,608
min
0
max
155,529
mean
7967
median
2,528
std
1.528e+04
q1
624
q3
7,866
iqr
7,242
skew
3.975
kurtosis
19.86
n_outliers
601
outlier_rate
0.1025
zero_rate
0.02643
Total Medicaid and CHIP Determinations Processed Between 8 Days and 30 Days - footnotes
categorical · metadata · null_rate
Free-text footnote field recording caveats on the 8-30 day Medicaid/CHIP determination counts; 89.26% of the 10,302 rows are null, so footnotes are the exception, not the rule. Among the 1,106 populated rows there are only 16 distinct values (entropy 2.82, ratio 0.70), with 39.60% being the single caveat 'Incorrectly reporting processing time at application level, as opposed to the individual level', a sizeable data-quality warning on the underlying metric. Several top values are semicolon-concatenated combinations, indicating the field stacks multiple caveats per row.
Treatment: Keep as a caveat flag on the paired metric; split on ';' to derive boolean indicators rather than modelling directly.
high · anthropic:claude-opus-4-7
n
10,302
nulls
9,196 (89.3%)
unique
16
top_value
Incorrectly reporting processing time at application level, as opposed to the individual level
top_rate
0.396
cardinality
16
entropy
2.818
entropy_ratio
0.7045
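The reported `entropy_ratio` values are numerically consistent with Shannon entropy in bits divided by log2 of the cardinality. That reading is inferred from the figures, not documented by the report, and the function names below are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def entropy_bits(values: Iterable[Hashable]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(entropy: float, cardinality: int) -> float:
    """Entropy normalised by its maximum, log2(cardinality)."""
    if cardinality < 2:
        return 0.0  # a single category carries no uncertainty
    return entropy / math.log2(cardinality)


# Check against the figures reported for this column: entropy 2.818, 16 values.
reported = entropy_ratio(2.818, 16)  # 2.818 / 4
```

A ratio near 1 means the caveats are spread almost evenly across the distinct values; a ratio near 0 means one value dominates.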
Total Medicaid and CHIP Determinations Processed between 31 days and 45 days
numeric · feature · null_rate · high_skew · outliers
Numeric operational metric counting Medicaid and CHIP determinations processed in a 31-45 day window, reported per state-month. The distribution is heavily right-skewed (skew 4.90, kurtosis 31.95) with a median of 693 but a mean of 2,917 and a max of 81,475, indicating a few state-months with very large counts. Notable concerns: 43.07% of values are null, 6.63% are zero, and 12.75% of non-null values are flagged as outliers.
Treatment: Investigate the 43% nulls (likely non-reporters) and apply a log1p transform (the column contains zeros) before modelling to tame the heavy right tail.
high · anthropic:claude-opus-4-7
n
10,302
nulls
4,437 (43.1%)
unique
1,923
min
0
max
81,475
mean
2917
median
693
std
6725
q1
106
q3
2,322
iqr
2,216
skew
4.899
kurtosis
31.95
n_outliers
748
outlier_rate
0.1275
zero_rate
0.06633
Total Medicaid and CHIP Determinations Processed between 31 days and 45 days - footnotes
categorical · metadata · null_rate
Footnote/caveat column annotating data-quality issues for the 31-45 day Medicaid/CHIP determinations metric. It is null 89.26% of the time, so footnotes apply to only ~11% of rows, but among those the top caveat, 'Incorrectly reporting processing time at application level, as opposed to the individual level', covers 46.11% (510 records), suggesting a systemic reporting inconsistency. The 15 unique values include compound entries joined by semicolons, indicating multiple caveats can co-occur on a single row.
Treatment: Keep as a qualitative caveat flag; split on '; ' into boolean indicators if you need to filter unreliable rows from the main metric.
high · anthropic:claude-opus-4-7
n
10,302
nulls
9,196 (89.3%)
unique
15
top_value
Incorrectly reporting processing time at application level, as opposed to the individual level
top_rate
0.4611
cardinality
15
entropy
2.547
entropy_ratio
0.652
Total Medicaid and CHIP Determinations Processed in More than 45 Days
numeric · feature · null_rate · high_skew · outliers
Counts of Medicaid/CHIP determinations that took more than 45 days to process, reported per state-month. The distribution is severely right-skewed (skew 5.26, kurtosis 37.4): the median is 395 but the mean is 3,028 and the max reaches 106,943, with 15.4% of non-null values flagged as outliers. 43% of rows are null and 10% are exact zeros, so coverage is uneven across reporters.
Treatment: Log1p-transform and impute or flag the 43% nulls before any modelling or aggregation.
high · anthropic:claude-opus-4-7
n
10,302
nulls
4,437 (43.1%)
unique
1,691
min
0
max
106,943
mean
3028
median
395
std
8135
q1
90
q3
1,545
iqr
1,455
skew
5.264
kurtosis
37.42
n_outliers
902
outlier_rate
0.1538
zero_rate
0.1001
Total Medicaid and CHIP Determinations Processed in More than 45 Days - footnotes
categorical · metadata · null_rate
Footnote/caveat field annotating data-quality issues with the '>45 days' Medicaid/CHIP determinations metric. It is sparse (89.26% null) and dominated by one caveat ('Incorrectly reporting processing time at application level...') at 46.11% of the 1,106 non-null rows, with only 15 distinct values that include semicolon-concatenated combinations. The entropy ratio of 0.652 indicates a skewed but multi-category caveat vocabulary worth preserving for data-quality filtering.
Treatment: Keep as a data-quality flag; split on '; ' into a multi-hot caveat indicator and exclude or down-weight affected rows in downstream metric analysis.
high · anthropic:claude-opus-4-7
n
10,302
nulls
9,196 (89.3%)
unique
15
top_value
Incorrectly reporting processing time at application level, as opposed to the individual level
top_rate
0.4611
cardinality
15
entropy
2.547
entropy_ratio
0.652
Total Call Center Volume (Number of Calls)
numeric · feature · null_rate · high_skew · outliers
Numeric volume metric capturing total calls handled by call centers, reported per state-month. The column is sparsely populated, with a 69.49% null rate and 1,592 unique values across 10,302 rows. The distribution is heavily right-skewed (skew 4.66, kurtosis 25.11), with the mean (172,294.63) more than double the median (73,754) and a max of 2,615,575 against a Q3 of 180,553, plus 223 flagged outliers.
Treatment: Impute or flag the 69% missing rows and log-transform before modelling to tame the heavy right tail.
high · anthropic:claude-opus-4-7
n
10,302
nulls
7,159 (69.5%)
unique
1,592
min
5,750
max
2.616e+06
mean
1.723e+05
median
73,754
std
3.191e+05
q1
3.132e+04
q3
180,553
iqr
1.492e+05
skew
4.656
kurtosis
25.11
n_outliers
223
outlier_rate
0.07095
zero_rate
0
Total Call Center Volume (Number of Calls) - footnotes
categorical · metadata · null_rate
This column holds footnotes qualifying the Total Call Center Volume metric, with 27 distinct values, many of them semicolon-joined caveats describing what each call count does or does not include. 70.96% of rows are null, meaning most rows carry no caveat, but among the 2,992 annotated rows the disclaimers are spread broadly (entropy ratio 0.787), with the most common note ('Does not include all calls received after business hours; Includes calls for other benefit programs') covering only 20.09% of non-nulls. The recurring themes (excluded after-hours calls, bundled benefit programs, and live-agent-only counts) signal that the underlying volume metric is not comparable across rows without parsing these flags.
Treatment: Split on '; ' into binary caveat flags and use them to gate or adjust comparisons of the call-volume metric.
high · anthropic:claude-opus-4-7
n
10,302
nulls
7,310 (71.0%)
unique
27
top_value
Does not include all calls received after business hours; Includes calls for other benefit programs
top_rate
0.2009
cardinality
27
entropy
3.742
entropy_ratio
0.787
Average Call Center Wait Time (Minutes)
numeric · feature · null_rate
This column captures the average call-center wait time in minutes per record, but it is populated for only ~30% of rows (null_rate 0.6947). Values range from 0 to 72 minutes with a median of 5 and a mean of 9.95, showing a right skew (1.78) and 11.6% zeros. 145 records (4.6% of non-null values) are flagged as outliers, consistent with a long tail of unusually long waits.
Treatment: Impute or flag missingness, then apply a log1p transform (11.6% of values are zero, so a plain log is undefined) to tame the right skew before modelling.
high · anthropic:claude-opus-4-7
n
10,302
nulls
7,157 (69.5%)
unique
63
min
0
max
72
mean
9.945
median
5
std
12.2
q1
1
q3
15
iqr
14
skew
1.78
kurtosis
3.29
n_outliers
145
outlier_rate
0.0461
zero_rate
0.1161
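The report does not say how outliers are flagged. Assuming the common Tukey rule (values beyond 1.5 × IQR from the quartiles), which is an assumption rather than something the profile confirms, the fences for this column work out as below; `tukey_fences` and `is_outlier` are hypothetical helper names.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, q1: float, q3: float, k: float = 1.5) -> bool:
    """True when value falls outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return value < low or value > high


# q1 = 1 and q3 = 15 minutes, taken from the summary above.
low, high = tukey_fences(1, 15)  # IQR = 14, so fences are 1 - 21 and 15 + 21
```

Under that assumption only waits longer than 36 minutes would be flagged, since the lower fence is negative and wait times cannot be.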
Average Call Center Wait Time (Minutes) - footnotes
categorical · metadata · null_rate
Footnote/caveat field qualifying the 'Average Call Center Wait Time (Minutes)' metric, holding semicolon-joined methodological notes (e.g., callbacks included, after-hours exclusions, other benefit programs counted). 69.73% of rows are null, yet across the 30.27% that are populated there are 44 distinct combinations with a very high entropy ratio (0.918) and no dominant value: the top combination covers only 7.89%. The notes reveal non-comparable measurement methods across rows, which undermines straight comparisons of the wait-time metric.
Treatment: Split on '; ' into boolean flags per caveat and use them to stratify or adjust comparisons of the wait-time metric.
high · anthropic:claude-opus-4-7
n
10,302
nulls
7,184 (69.7%)
unique
44
top_value
Call centers offer callbacks; Includes calls for other benefit programs; Includes only calls transferred to a live agent
top_rate
0.0789
cardinality
44
entropy
5.013
entropy_ratio
0.9183
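One way to act on the stratification advice is to compare wait times only within groups that share the same set of caveats. The sketch below is illustrative: `median_by_caveats` is a hypothetical helper, the wait values are invented, and only the caveat strings come from the report.

```python
from collections import defaultdict
from statistics import median
from typing import Optional

Record = tuple[Optional[float], Optional[str]]  # (wait minutes, footnote cell)


def median_by_caveats(records: list[Record]) -> dict[frozenset[str], float]:
    """Median wait time per distinct caveat set, ignoring caveat order."""
    groups: dict[frozenset[str], list[float]] = defaultdict(list)
    for wait, footnote in records:
        if wait is None:
            continue  # no metric to compare
        parts = (footnote or "").split(";")
        key = frozenset(p.strip() for p in parts if p.strip())
        groups[key].append(wait)
    return {key: median(vals) for key, vals in groups.items()}


A = "Call centers offer callbacks"
B = "Includes calls for other benefit programs"

# Wait values are invented for illustration only.
records = [(5, f"{A}; {B}"), (15, f"{B}; {A}"), (1, None), (None, A), (3, None)]
medians = median_by_caveats(records)
```

Using a `frozenset` as the key treats 'A; B' and 'B; A' as the same stratum, which matters if the concatenation order is not stable across rows.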
Average Call Center Abandonment Rate
numeric · feature · null_rate
Numeric operational metric capturing the call center abandonment rate, ranging from 0.0 to 0.652 with a median of 0.088 and a mean of 0.132. The distribution is right-skewed (skew 1.22) with 54 outliers (1.7% of non-null values) and a heavy null rate of 69.49%, meaning only about 30% of rows carry a value.
Treatment: Impute or add a missingness flag given 69% nulls, then consider a sqrt or log1p transform before modelling (a plain log is undefined for the few exact zeros).
high · anthropic:claude-opus-4-7
n
10,302
nulls
7,159 (69.5%)
unique
408
min
0
max
0.652
mean
0.1321
median
0.088
std
0.1291
q1
0.024
q3
0.2115
iqr
0.1875
skew
1.222
kurtosis
1.146
n_outliers
54
outlier_rate
0.01718
zero_rate
0.0009545
Average Call Center Abandonment Rate - footnotes
categorical · metadata · null_rate
Free-text footnotes qualifying the call-center abandonment-rate metric, listing methodological caveats (e.g., excluded after-hours calls, included other-benefit calls, live-agent-only transfers), often concatenated with semicolons. 69.73% of rows are null, meaning most records carry no caveat, while the 30% that do are spread across 36 distinct combinations with a high entropy ratio (0.833) and no dominant value (top share only 12.54%). The combinatorial pattern suggests these are composed from a small set of base caveats rather than free prose.
Treatment: Split on '; ' into a multi-hot caveat flag set rather than treating as a single category.
high · anthropic:claude-opus-4-7
n
10,302
nulls
7,184 (69.7%)
unique
36
top_value
Does not include all calls received by call centers; Does not include all calls received after business hours; Includes only calls transferred to a live agent