data trove snap participation benefits

source: /home/coolhand/html/datavis/data_trove/data/urban/food_deserts/snap_gap_states.json · 20 rows · 6 columns · profiled 2026-06-21

Reading

dataset summary · high confidence anthropic:default

This dataset tracks SNAP (food stamp) program enrollment across 20 U.S. states, capturing estimated eligible populations, actual participants, and the resulting coverage gap. Two things stand out immediately: first, the enrollment rate and gap percentage are constant across all states (67% enrolled, 33% gap), suggesting these are summary-level figures rather than state-specific calculations — they should not be used for cross-state comparison. Second, the three population-count columns (eligible, participants, gap) are all heavily right-skewed with 2 outliers each, pointing to a small number of very large states — likely California and/or Florida — that dwarf the rest and will dominate any totals-based analysis.

citing: snap_enrollment_rate.stats.mean · snap_gap_pct.stats.mean · snap_eligible_est.stats.max · snap_eligible_est.stats.median · snap_eligible_est.n_outliers · snap_gap.stats.max · snap_gap.stats.median · snap_participants_est.n_outliers · snap_eligible_est.stats.skew
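
The constant 67% / 33% figures can be cross-checked against the raw counts. The sketch below uses only the minima and maxima reported in the column details of this profile; it assumes (this is not stated by the profiler) that the minimum of each count column comes from the same state, and likewise for the maximum. The function name `implied_rates` is made up for illustration.

```python
# Extremes reported in the per-column stats of this profile.
# Assumption: the min (and the max) of each column belong to the same state,
# which holds if the three count columns are proportional to one another.
reported = {
    "min": {"eligible": 75_227, "participants": 50_403, "gap": 24_824},
    "max": {"eligible": 4_685_272, "participants": 3_139_134, "gap": 1_546_138},
}

def implied_rates(row):
    """Return (enrollment %, gap %) implied by the raw counts, to 1 decimal."""
    enrollment = 100 * row["participants"] / row["eligible"]
    gap = 100 * row["gap"] / row["eligible"]
    return round(enrollment, 1), round(gap, 1)

for label, row in reported.items():
    # The gap should equal eligible minus participants.
    assert row["eligible"] - row["participants"] == row["gap"]
    print(label, implied_rates(row))
```

If both extremes reproduce the same rates, that supports reading the two percentage columns as a single rate applied uniformly to every state rather than as state-specific measurements.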

Charts the summary said to look at first

snap_eligible_est · Look for the 2 outlier states whose eligible populations (up to 4,685,272) vastly exceed the median of 507,917 — these will skew any aggregate analysis.

Histogram bins for snap_eligible_est (median: 507917.0).
bin	count
7.523e+04 – 9.972e+05	16
9.972e+05 – 1.919e+06	2
1.919e+06 – 2.841e+06	1
2.841e+06 – 3.763e+06	0
3.763e+06 – 4.685e+06	1
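
The bin edges above are consistent with five equal-width bins spanning the column's minimum and maximum. The following sketch reproduces them under that assumption; the binning rule is inferred from the numbers, not documented by the profiler, and `equal_width_edges` is a hypothetical helper.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return the n_bins + 1 edges of equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# Min and max of snap_eligible_est as reported in this profile.
edges = equal_width_edges(75_227, 4_685_272, 5)
for left, right in zip(edges, edges[1:]):
    print(f"{left:.3e} to {right:.3e}")
```

With only five bins over such a skewed range, 16 of the 20 states fall into the first bin, so a finer or logarithmic binning may show more structure.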

snap_gap · The unenrolled gap ranges from 24,824 to 1,546,138 across states; identify which states carry the largest absolute burden of unenrolled eligible residents.

Histogram bins for snap_gap (median: 167613.0).
bin	count
2.482e+04 – 3.291e+05	16
3.291e+05 – 6.333e+05	2
6.333e+05 – 9.376e+05	1
9.376e+05 – 1.242e+06	0
1.242e+06 – 1.546e+06	1

snap_participants_est · The right-skewed distribution of actual participants shows most states cluster at lower counts, with a long tail driven by a few large states.

Histogram bins for snap_participants_est (median: 340304.0).
bin	count
5.04e+04 – 6.681e+05	16
6.681e+05 – 1.286e+06	2
1.286e+06 – 1.904e+06	1
1.904e+06 – 2.521e+06	0
2.521e+06 – 3.139e+06	1

state_name · Use this as the index axis when comparing states — each of the 20 states appears exactly once, confirming one row per state.

Top values for state_name (20 unique shown, of 20 total).
value	count	share
Alabama	1	5.0%
Alaska	1	5.0%
Arizona	1	5.0%
Arkansas	1	5.0%
California	1	5.0%
Colorado	1	5.0%
Connecticut	1	5.0%
Delaware	1	5.0%
District of Columbia	1	5.0%
Florida	1	5.0%
Georgia	1	5.0%
Hawaii	1	5.0%
Idaho	1	5.0%
Illinois	1	5.0%
Indiana	1	5.0%
Iowa	1	5.0%
Kansas	1	5.0%
Kentucky	1	5.0%
Louisiana	1	5.0%
Maine	1	5.0%

Schema

6 columns

Per-column summary.
column	type	nulls	unique	alerts
state_name	categorical	0.0%	20	long_tail
snap_eligible_est	numeric	0.0%	20	high_skew outliers
snap_participants_est	numeric	0.0%	20	high_skew outliers
snap_gap	numeric	0.0%	20	high_skew outliers
snap_gap_pct	numeric	0.0%	1	constant
snap_enrollment_rate	numeric	0.0%	1	constant

state_name

categorical label long_tail

This column contains U.S. state names (plus the District of Columbia), with exactly 20 unique values across 20 rows: every row holds a distinct name, yielding an entropy ratio of 1.0 and a top_rate of 0.05. The dataset appears to be a small, deduplicated lookup or summary table with one row per geographic unit rather than a transactional dataset. The 'long_tail' alert is a statistical artifact of the perfectly uniform distribution, not a genuine skew concern. Treatment: Use as a join key or display label; consider mapping to standard FIPS codes if merging with other geographic datasets. high · anthropic:default

n: 20
nulls: 0 (0.0%)
unique: 20
top_value: Alabama
top_rate: 0.05
cardinality: 20
entropy: 4.322
entropy_ratio: 1
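
The entropy figures follow directly from the uniform distribution: with 20 values each appearing once, Shannon entropy in bits is log2(20). A minimal sketch, assuming the profiler reports entropy in bits and defines entropy_ratio as entropy divided by the maximum possible entropy for the observed cardinality:

```python
import math

def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given its counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

counts = [1] * 20                                   # each state_name appears once
entropy = shannon_entropy_bits(counts)
entropy_ratio = entropy / math.log2(len(counts))    # 1.0 means perfectly uniform

print(round(entropy, 3), round(entropy_ratio, 3))
```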

snap_eligible_est

numeric feature high_skew outliers

This column holds the estimated number of SNAP-eligible individuals in each state (one row per state, matching state_name). Values range from 75,227 to 4,685,272, with a median of 507,917 well below the mean of 857,172.75, a classic signature of population-size data. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58) with 2 outliers (10% of rows), most likely the most populous states in the file pulling the tail sharply upward. Treatment: Log-transform before regression or clustering to reduce right-skew and dampen outlier influence. high · anthropic:default

n: 20
nulls: 0 (0.0%)
unique: 20
min: 75,227
max: 4.685e+06
mean: 8.572e+05
median: 507,917
std: 1.103e+06
q1: 1.855e+05
q3: 8.607e+05
iqr: 6.753e+05
skew: 2.421
kurtosis: 5.577
n_outliers: 2
outlier_rate: 0.1
zero_rate: 0

snap_participants_est

numeric feature high_skew outliers

This column represents estimated SNAP (Supplemental Nutrition Assistance Program) participant counts at the state level, with values ranging from 50,403 to 3,139,134. With 20 rows and 20 unique values, each record corresponds to one of the entries in state_name. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58) with 2 outliers (10% of the dataset), suggesting that a small number of very large states, likely California or Florida, dominate the upper tail, while the median (340,304) sits well below the mean (574,306). Treatment: Log-transform before regression or clustering to reduce right skew and outlier influence. high · anthropic:default

n: 20
nulls: 0 (0.0%)
unique: 20
min: 50,403
max: 3.139e+06
mean: 5.743e+05
median: 340,304
std: 7.39e+05
q1: 1.243e+05
q3: 5.767e+05
iqr: 4.524e+05
skew: 2.421
kurtosis: 5.577
n_outliers: 2
outlier_rate: 0.1
zero_rate: 0
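
To illustrate why a log transform is the suggested treatment, the sketch below compares how far the maximum sits above the median with how far the minimum sits below it, before and after taking log10. It uses only the three summary statistics reported above; `tail_ratio` is an ad hoc measure chosen for this illustration, not something the profiler computes.

```python
import math

# Summary statistics for snap_participants_est as reported in this profile.
reported = {"min": 50_403, "median": 340_304, "max": 3_139_134}

# The same three points after a base-10 log transform.
logged = {name: math.log10(value) for name, value in reported.items()}

def tail_ratio(stats):
    """Median-to-max distance divided by min-to-median distance."""
    return (stats["max"] - stats["median"]) / (stats["median"] - stats["min"])

raw_ratio = tail_ratio(reported)
log_ratio = tail_ratio(logged)
print(f"raw upper/lower spread ratio:   {raw_ratio:.2f}")
print(f"log10 upper/lower spread ratio: {log_ratio:.2f}")
```

A ratio near 1 indicates a roughly symmetric spread around the median; on the raw scale the upper tail is many times longer than the lower one, while on the log scale the two are comparable.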

snap_gap

numeric feature high_skew outliers

snap_gap is the estimated number of SNAP-eligible residents who are not enrolled, i.e., snap_eligible_est minus snap_participants_est (the reported extremes agree: 75,227 − 50,403 = 24,824 and 4,685,272 − 3,139,134 = 1,546,138). With 20 rows and 20 unique values, every observation is distinct. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58): the median is 167,613 while the mean is pulled to 282,866 by a long upper tail, and 2 outliers (10% of rows) push up to a maximum of 1,546,138, roughly 62× the minimum of 24,824. The IQR of 222,841 is wide relative to Q1 of 61,204, signalling high dispersion. Treatment: log-transform before regression or modelling to reduce skew; check the 2 outliers (including the maximum of 1,546,138) for validity before inclusion. medium · anthropic:default

n: 20
nulls: 0 (0.0%)
unique: 20
min: 24,824
max: 1.546e+06
mean: 2.829e+05
median: 167,613
std: 3.64e+05
q1: 6.12e+04
q3: 2.84e+05
iqr: 222,841
skew: 2.421
kurtosis: 5.577
n_outliers: 2
outlier_rate: 0.1
zero_rate: 0

snap_gap_pct

numeric metadata constant

This column is the SNAP gap percentage: the share of estimated eligible residents who are not enrolled (snap_gap as a percentage of snap_eligible_est), and the complement of snap_enrollment_rate (33 + 67 = 100). Across all 20 rows the value is identically 33.0 with zero variance, zero nulls, and a single unique value, so it has no discriminative power. Treatment: Drop before modelling; a constant column provides zero information gain and will cause issues in variance-sensitive methods. high · anthropic:default

n: 20
nulls: 0 (0.0%)
unique: 1
min: 33
max: 33
mean: 33
median: 33
std: 0
q1: 33
q3: 33
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0

snap_enrollment_rate

numeric feature constant

This column represents a SNAP (Supplemental Nutrition Assistance Program) enrollment rate, likely expressed as a percentage. Every single one of the 20 rows holds the identical value of 67.0, with zero variance, zero IQR, and a standard deviation of 0.0 — the column is perfectly constant. This is a strong signal that the value was hardcoded, imputed with a single figure, or the dataset captures a single geographic/temporal stratum where this rate was applied uniformly. It carries no predictive or descriptive information across rows. Treatment: Drop before modelling — zero-variance constant column provides no information and will cause issues with many estimators. high · anthropic:default

n: 20
nulls: 0 (0.0%)
unique: 1
min: 67
max: 67
mean: 67
median: 67
std: 0
q1: 67
q3: 67
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0
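
The "drop before modelling" treatment for the two constant columns can be sketched as follows. The structure of the source JSON is not shown in this profile, so the example assumes a list of row dictionaries keyed by the column names above. The two rows use the reported minimum and maximum counts with placeholder state labels, since the profile does not say which states they belong to.

```python
def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]

def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

# Illustrative rows only: counts are the reported extremes, labels are placeholders.
rows = [
    {"state_name": "placeholder_smallest", "snap_eligible_est": 75_227,
     "snap_participants_est": 50_403, "snap_gap": 24_824,
     "snap_gap_pct": 33.0, "snap_enrollment_rate": 67.0},
    {"state_name": "placeholder_largest", "snap_eligible_est": 4_685_272,
     "snap_participants_est": 3_139_134, "snap_gap": 1_546_138,
     "snap_gap_pct": 33.0, "snap_enrollment_rate": 67.0},
]

constants = constant_columns(rows)
cleaned = drop_columns(rows, constants)
print(constants)
print(list(cleaned[0]))
```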