data trove snap participation benefits
Reading
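This dataset tracks SNAP (food stamp) program enrollment across 20 jurisdictions (19 U.S. states plus the District of Columbia, Alabama through Maine alphabetically). It captures estimated eligible populations, actual participants, and the resulting coverage gap. Two things stand out immediately. First, the enrollment rate and gap percentage are constant across all rows (67% enrolled, 33% gap). This suggests a single summary-level rate was applied to every state rather than calculated state by state, so these columns should not be used for cross-state comparison. It also means the participant and gap counts are simply 67% and 33% of the eligible estimate: the median eligible count of 507,917, for example, pairs with a median of 340,304 participants and a median gap of 167,613. The three count columns therefore carry the same information at different scales. Second, those three count columns (eligible, participants, gap) are all heavily right-skewed (skew = 2.42) with 2 outliers each. This points to a small number of very large states, likely California and/or Florida, that dwarf the rest and will dominate any totals-based analysis.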
citing: snap_enrollment_rate.stats.mean · snap_gap_pct.stats.mean · snap_eligible_est.stats.max · snap_eligible_est.stats.median · snap_eligible_est.n_outliers · snap_gap.stats.max · snap_gap.stats.median · snap_participants_est.n_outliers · snap_eligible_est.stats.skew
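The 67%/33% split can be checked against the reported statistics with a few lines of Python. This is a minimal sketch that assumes, as a constant rate implies, that the same state sits at the min, median, and max of all three count columns, so the reported order statistics can be compared directly. The dictionaries are transcribed from the column profiles.

```python
# Reported (min, median, max) for each count column, copied from the column
# profiles. Assumption: with a constant rate, each statistic refers to the
# same state in all three columns.
eligible = {"min": 75_227, "median": 507_917, "max": 4_685_272}
participants = {"min": 50_403, "median": 340_304, "max": 3_139_134}
gap = {"min": 24_824, "median": 167_613, "max": 1_546_138}

rates = {}
for stat in ("min", "median", "max"):
    # The gap equals eligible minus participants at every reported statistic.
    assert eligible[stat] - participants[stat] == gap[stat]
    # Participation share in percent, rounded like snap_enrollment_rate.
    rates[stat] = round(100 * participants[stat] / eligible[stat], 1)

print(rates)  # {'min': 67.0, 'median': 67.0, 'max': 67.0}
```

All three shares round to 67.0, matching snap_enrollment_rate, and the subtraction holds exactly at each statistic.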
Charts recommended by the summary
Histogram of snap_eligible_est (5 equal-width bins)
| bin (snap_eligible_est) | count |
|---|---|
| 7.523e+04 – 9.972e+05 | 16 |
| 9.972e+05 – 1.919e+06 | 2 |
| 1.919e+06 – 2.841e+06 | 1 |
| 2.841e+06 – 3.763e+06 | 0 |
| 3.763e+06 – 4.685e+06 | 1 |
Histogram of snap_gap (5 equal-width bins)
| bin (snap_gap) | count |
|---|---|
| 2.482e+04 – 3.291e+05 | 16 |
| 3.291e+05 – 6.333e+05 | 2 |
| 6.333e+05 – 9.376e+05 | 1 |
| 9.376e+05 – 1.242e+06 | 0 |
| 1.242e+06 – 1.546e+06 | 1 |
Histogram of snap_participants_est (5 equal-width bins)
| bin (snap_participants_est) | count |
|---|---|
| 5.04e+04 – 6.681e+05 | 16 |
| 6.681e+05 – 1.286e+06 | 2 |
| 1.286e+06 – 1.904e+06 | 1 |
| 1.904e+06 – 2.521e+06 | 0 |
| 2.521e+06 – 3.139e+06 | 1 |
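The bin boundaries in these three tables match five equal-width bins spanning each column's min to max. A short Python sketch, assuming that binning rule (the profiler's own implementation is not shown), reproduces the edges; `equal_width_edges` is a hypothetical helper, and the `.4g` format mirrors how the tables print numbers.

```python
def equal_width_edges(lo, hi, bins=5):
    """Return bins + 1 evenly spaced edges from lo to hi inclusive."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]

def fmt(edges):
    # Four significant digits, matching the tables (e.g. 7.523e+04).
    return [f"{e:.4g}" for e in edges]

# Min and max of snap_eligible_est, from its column profile.
print(fmt(equal_width_edges(75_227, 4_685_272)))
# ['7.523e+04', '9.972e+05', '1.919e+06', '2.841e+06', '3.763e+06', '4.685e+06']
```

The same call with the min and max of snap_gap (24,824 and 1,546,138) or snap_participants_est (50,403 and 3,139,134) yields the other two tables' edges. Because the bins are equal-width on the raw scale, 16 of 20 states land in the first bin; log-spaced bins would spread the smaller states out more.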
Value counts of state_name
| state_name | count | share |
|---|---|---|
| Alabama | 1 | 5.0% |
| Alaska | 1 | 5.0% |
| Arizona | 1 | 5.0% |
| Arkansas | 1 | 5.0% |
| California | 1 | 5.0% |
| Colorado | 1 | 5.0% |
| Connecticut | 1 | 5.0% |
| Delaware | 1 | 5.0% |
| District of Columbia | 1 | 5.0% |
| Florida | 1 | 5.0% |
| Georgia | 1 | 5.0% |
| Hawaii | 1 | 5.0% |
| Idaho | 1 | 5.0% |
| Illinois | 1 | 5.0% |
| Indiana | 1 | 5.0% |
| Iowa | 1 | 5.0% |
| Kansas | 1 | 5.0% |
| Kentucky | 1 | 5.0% |
| Louisiana | 1 | 5.0% |
| Maine | 1 | 5.0% |
Schema
6 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| state_name | categorical | 0.0% | 20 | long_tail |
| snap_eligible_est | numeric | 0.0% | 20 | high_skew, outliers |
| snap_participants_est | numeric | 0.0% | 20 | high_skew, outliers |
| snap_gap | numeric | 0.0% | 20 | high_skew, outliers |
| snap_gap_pct | numeric | 0.0% | 1 | constant |
| snap_enrollment_rate | numeric | 0.0% | 1 | constant |
state_name
categorical · label · long_tail
This column contains U.S. state names plus the District of Columbia, with exactly 20 unique values across 20 rows. Every row holds a distinct name, which yields a perfect entropy ratio of 1.0 and a top_rate of 0.05 (1 in 20). Because all values are tied at one occurrence, the top_value of Alabama is most likely just the first name in the table rather than a true mode. The dataset appears to be a small, deduplicated lookup or summary table with one row per jurisdiction rather than a transactional dataset; since the names run alphabetically from Alabama to Maine, it may also be a partial extract of a longer state list. The 'long_tail' alert is a statistical artifact of the perfectly uniform distribution, not a genuine skew concern. Treatment: use as a join key or display label; consider mapping to standard FIPS codes if merging with other geographic datasets.
- n: 20
- nulls: 0 (0.0%)
- unique: 20
- top_value: Alabama
- top_rate: 0.05
- cardinality: 20
- entropy: 4.322
- entropy_ratio: 1
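The state_name statistics can be reproduced with a short Python sketch. `entropy_profile` is a hypothetical helper (the profiler's own code is not shown): it computes Shannon entropy in bits and divides by log2 of the cardinality to get the entropy ratio. `STATE_FIPS` holds the standard two-digit state FIPS codes for the 20 names in the table, as one way to apply the join-key treatment.

```python
import math
from collections import Counter

def entropy_profile(values):
    """Return (entropy in bits, entropy ratio, top value, top rate)."""
    counts = Counter(values)
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(len(counts))  # entropy of a uniform distribution
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    # most_common breaks ties by first appearance, so when every value is
    # unique the "top value" is simply the first row.
    top_value, top_count = counts.most_common(1)[0]
    return round(entropy, 3), round(ratio, 3), top_value, top_count / n

# Standard two-digit state FIPS codes for the 20 names in this table.
STATE_FIPS = {
    "Alabama": "01", "Alaska": "02", "Arizona": "04", "Arkansas": "05",
    "California": "06", "Colorado": "08", "Connecticut": "09",
    "Delaware": "10", "District of Columbia": "11", "Florida": "12",
    "Georgia": "13", "Hawaii": "15", "Idaho": "16", "Illinois": "17",
    "Indiana": "18", "Iowa": "19", "Kansas": "20", "Kentucky": "21",
    "Louisiana": "22", "Maine": "23",
}

states = list(STATE_FIPS)  # the 20 state_name values, in table order
print(entropy_profile(states))  # (4.322, 1.0, 'Alabama', 0.05)
```

Codes are kept as strings so leading zeros survive a join against other FIPS-keyed tables.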
snap_eligible_est
numeric · feature · high_skew · outliers
This column holds the estimated number of SNAP-eligible people in each jurisdiction (one row per state or DC). Values span a wide range, from 75,227 to 4,685,272, and the median of 507,917 sits well below the mean of 857,172.75, a classic signature of population-size data. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58) with 2 outliers (10% of rows), almost certainly the most populous states (likely including California) pulling the tail sharply upward. Treatment: log-transform before regression or clustering to reduce right skew and dampen outlier influence.
- n: 20
- nulls: 0 (0.0%)
- unique: 20
- min: 75,227
- max: 4,685,272
- mean: 857,172.75
- median: 507,917
- std: 1.103e+06
- q1: 1.855e+05
- q3: 8.607e+05
- iqr: 6.753e+05
- skew: 2.421
- kurtosis: 5.577
- n_outliers: 2
- outlier_rate: 0.1
- zero_rate: 0
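The outlier count and the log-transform advice can be sketched in Python, assuming the profiler flags outliers with Tukey's 1.5 × IQR rule (the report does not say which rule it uses). With the rounded quartiles above, the upper fence lands near 1.87 million, which is consistent with the histogram's two values above about 1.92 million being the 2 reported outliers. `tukey_fences` is a hypothetical helper name.

```python
import math

def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences using the IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Rounded q1/q3 for snap_eligible_est, from the profile above.
low, high = tukey_fences(1.855e5, 8.607e5)
print(round(high))  # 1873500

# A log transform compresses the roughly 62x spread between min and max
# into a difference of under 2 on the log10 scale.
lo_val, hi_val = 75_227, 4_685_272
print(round(math.log10(hi_val) - math.log10(lo_val), 2))  # 1.79
```

The lower fence is negative, so with counts that are always positive only the upper tail can produce outliers.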
snap_participants_est
numeric · feature · high_skew · outliers
This column holds estimated SNAP (Supplemental Nutrition Assistance Program) participant counts at the state level, ranging from 50,403 to 3,139,134. With 20 rows and 20 unique values, each record represents a distinct jurisdiction (a U.S. state or DC). Each value is 67% of the matching snap_eligible_est (to rounding), so the distribution has exactly the same shape as that column: heavily right-skewed (skew = 2.42, kurtosis = 5.58) with 2 outliers (10% of rows). A small number of very large states, likely California and/or Florida, dominate the upper tail, while the median (340,304) sits well below the mean (574,306). Treatment: log-transform before regression or clustering to reduce right skew and outlier influence; because this column is a fixed multiple of snap_eligible_est, do not use both as predictors in the same model.
- n: 20
- nulls: 0 (0.0%)
- unique: 20
- min: 50,403
- max: 3,139,134
- mean: 574,306
- median: 340,304
- std: 7.39e+05
- q1: 1.243e+05
- q3: 5.767e+05
- iqr: 4.524e+05
- skew: 2.421
- kurtosis: 5.577
- n_outliers: 2
- outlier_rate: 0.1
- zero_rate: 0
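The three count columns report identical skew (2.421) and kurtosis (5.577) because both statistics are standardized by the variance: multiplying every value by a constant such as 0.67 leaves them unchanged. A minimal Python sketch using the population (biased) moment formulas; the profiler may apply a small-sample correction, but that correction depends only on n and is equally scale-invariant. The sample values are hypothetical.

```python
def moments(xs):
    """Population skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical eligible counts (illustrative only, not the dataset's values).
eligible = [80_000, 150_000, 300_000, 500_000, 900_000, 4_000_000]
participants = [0.67 * x for x in eligible]  # fixed 67% share

print(moments(eligible))
print(moments(participants))  # same values up to floating-point error
```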
snap_gap
numeric · feature · high_skew · outliers
snap_gap is the estimated number of SNAP-eligible people who are not enrolled, i.e. snap_eligible_est minus snap_participants_est (at the maximum, 4,685,272 - 3,139,134 = 1,546,138). With 20 rows and 20 unique values, every observation is distinct. Because the gap is a fixed 33% of eligibility, its shape matches the other two count columns: heavily right-skewed (skew = 2.42, kurtosis = 5.58), with a median of 167,613 and a mean pulled up to 282,866 by a long upper tail. The 2 outliers (10% of rows) reach a maximum of 1,546,138, roughly 62× the minimum of 24,824, and the IQR of 222,841 is more than three times Q1 (61,204), signalling high dispersion. Treatment: log-transform before regression or modelling to reduce skew; check the 2 outliers (the maximum of 1,546,138 and, most likely, the single value between about 633,000 and 938,000 in the histogram) for validity before inclusion; and avoid using this column alongside snap_eligible_est or snap_participants_est as predictors, since the three are effectively perfectly collinear.
- n: 20
- nulls: 0 (0.0%)
- unique: 20
- min: 24,824
- max: 1,546,138
- mean: 282,866
- median: 167,613
- std: 3.64e+05
- q1: 61,204
- q3: 2.84e+05
- iqr: 222,841
- skew: 2.421
- kurtosis: 5.577
- n_outliers: 2
- outlier_rate: 0.1
- zero_rate: 0
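snap_gap and the two percentage columns can all be derived from the eligible and participant counts. A minimal Python sketch; `coverage` is a hypothetical helper, and the example pairs the column maxima on the assumption (implied by the constant rate) that both belong to the same state.

```python
def coverage(eligible, participants):
    """Return (gap, gap_pct, enrollment_rate) for one state."""
    gap = eligible - participants
    gap_pct = round(100 * gap / eligible, 1)
    enrollment_rate = round(100 * participants / eligible, 1)
    return gap, gap_pct, enrollment_rate

# Max eligible and max participants from the column profiles.
print(coverage(4_685_272, 3_139_134))  # (1546138, 33.0, 67.0)
```

Recomputing the derived columns this way is a quick validity check on the outliers: if a row's stored gap or percentages disagree with the result, the row was edited or imputed inconsistently.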
snap_gap_pct
numeric · metadata · constant
This column is the SNAP coverage gap expressed as a percentage of the eligible population (snap_gap ÷ snap_eligible_est × 100), the complement of snap_enrollment_rate (33 + 67 = 100). Across all 20 rows the value is identically 33.0, with zero variance, zero nulls, and a single unique value, making it a degenerate constant with no discriminative power. Treatment: drop before modelling; a constant column provides zero information gain and will cause issues in variance-sensitive methods.
- n: 20
- nulls: 0 (0.0%)
- unique: 1
- min: 33
- max: 33
- mean: 33
- median: 33
- std: 0
- q1: 33
- q3: 33
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
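Dropping constant columns amounts to keeping only columns with more than one distinct value. A minimal standard-library sketch; `drop_constant_columns` is a hypothetical helper, and the three-row table is illustrative rather than taken from the dataset.

```python
def drop_constant_columns(table):
    """Split a {name: values} table into non-constant columns and dropped names."""
    kept = {name: vals for name, vals in table.items() if len(set(vals)) > 1}
    dropped = sorted(set(table) - set(kept))
    return kept, dropped

# Hypothetical three-row excerpt shaped like this dataset.
table = {
    "snap_eligible_est": [100_000, 500_000, 2_000_000],
    "snap_gap_pct": [33.0, 33.0, 33.0],
    "snap_enrollment_rate": [67.0, 67.0, 67.0],
}
kept, dropped = drop_constant_columns(table)
print(dropped)  # ['snap_enrollment_rate', 'snap_gap_pct']
```

In pandas, an equivalent filter is `df.loc[:, df.nunique() > 1]`.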
snap_enrollment_rate
numeric · feature · constant
This column is the SNAP enrollment rate, expressed as a percentage (participants ÷ eligible × 100). All 20 rows hold the identical value of 67.0, with zero variance, zero IQR, and a standard deviation of 0.0; the column is perfectly constant. This strongly suggests that a single rate was hardcoded or imputed for every state, or applied uniformly across one geographic/temporal stratum, and that snap_participants_est was likely derived from it rather than measured state by state. It carries no predictive or descriptive information across rows. Treatment: drop before modelling; a zero-variance constant column provides no information and will cause issues with many estimators.
- n: 20
- nulls: 0 (0.0%)
- unique: 1
- min: 67
- max: 67
- mean: 67
- median: 67
- std: 0
- q1: 67
- q3: 67
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
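One concrete way a zero-variance column breaks estimators is standardization (z-scoring), a common preprocessing step that divides by the standard deviation. A minimal standard-library sketch; `zscores` is a hypothetical helper. Library implementations vary: some return NaN with a warning, others special-case zero variance.

```python
import statistics

def zscores(values):
    """Standardize values; raises ZeroDivisionError for a constant column."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

rates = [67.0] * 20  # snap_enrollment_rate: the same value in all 20 rows

try:
    zscores(rates)
except ZeroDivisionError:
    print("cannot standardize a zero-variance column")
```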