saturn·

data trove snap participation benefits

source /home/coolhand/html/datavis/data_trove/data/urban/food_deserts/snap_gap_states.json · 20 rows · 6 columns · profiled 2026-06-21

Reading

dataset summary · high confidence anthropic:default

This dataset tracks SNAP (Supplemental Nutrition Assistance Program, formerly food stamps) enrollment across 20 U.S. states, capturing the estimated eligible population, the estimated participants, and the resulting coverage gap. Two things stand out. First, the enrollment rate and gap percentage are constant across all states (67% enrolled, 33% gap), which suggests a single summary-level rate was applied to every state rather than computed per state; neither column should be used for cross-state comparison. Because participants and gap are then fixed shares of the eligible estimate, the three count columns carry essentially the same information, which is why they report identical skew (2.42) and kurtosis (5.58). Second, those three count columns (eligible, participants, gap) are heavily right-skewed with 2 outliers each, pointing to a small number of very large states that dwarf the rest and will dominate any totals-based analysis. The profile itself does not identify which states these are.

citing: snap_enrollment_rate.stats.mean · snap_gap_pct.stats.mean · snap_eligible_est.stats.max · snap_eligible_est.stats.median · snap_eligible_est.n_outliers · snap_gap.stats.max · snap_gap.stats.median · snap_participants_est.n_outliers · snap_eligible_est.stats.skew
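
The relationships described in the summary can be checked against the reported statistics. The following is a minimal sketch, not part of the profiler. It assumes that the minimum, median, and maximum of the three count columns come from the same rows (or, for the median, the same pair of rows), which is plausible if participants and gap are fixed shares of the eligible estimate but is not confirmed by the report. The names `reported` and `rates` are illustrative.

```python
# Reported (eligible, participants, gap) values at the min, median and max of each column.
# Assumption: each triple describes the same underlying row(s) of the dataset.
reported = {
    "min": (75_227, 50_403, 24_824),
    "median": (507_917, 340_304, 167_613),
    "max": (4_685_272, 3_139_134, 1_546_138),
}

rates = {}
for label, (eligible, participants, gap) in reported.items():
    # The coverage gap should be the part of the eligible estimate that is not enrolled.
    assert eligible - participants == gap, label
    enrollment_rate = 100 * participants / eligible
    gap_pct = 100 * gap / eligible
    rates[label] = (enrollment_rate, gap_pct)
    print(f"{label}: enrollment {enrollment_rate:.1f}%, gap {gap_pct:.1f}%")
```

If the assumption holds, each line should show the same 67% / 33% split, consistent with the two constant columns.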

Schema

6 columns
Per-column summary.
column · type · nulls · unique · alerts
state_name · categorical · 0.0% · 20 · long_tail
snap_eligible_est · numeric · 0.0% · 20 · high_skew, outliers
snap_participants_est · numeric · 0.0% · 20 · high_skew, outliers
snap_gap · numeric · 0.0% · 20 · high_skew, outliers
snap_gap_pct · numeric · 0.0% · 1 · constant
snap_enrollment_rate · numeric · 0.0% · 1 · constant

state_name

categorical label long_tail
This column contains US state (and territory) names, with exactly 20 unique values across 20 rows — every row holds a distinct state name, yielding a perfect entropy ratio of 1.0 and a top_rate of 0.05. The dataset appears to be a small, deduplicated lookup or summary table with one row per geographic unit rather than a transactional dataset. The 'long_tail' alert is a statistical artifact of the perfectly uniform distribution, not a genuine skew concern. Treatment: Use as a join key or display label; consider mapping to standard FIPS codes if merging with other geographic datasets. high · anthropic:default
n: 20
nulls: 0 (0.0%)
unique: 20
top_value: Alabama
top_rate: 0.05
cardinality: 20
entropy: 4.322
entropy_ratio: 1
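
The entropy figures above follow directly from the column having 20 equally frequent values. The sketch below recomputes them; the use of a base-2 logarithm (entropy in bits) is an assumption, chosen because it reproduces the reported 4.322.

```python
import math

n_categories = 20  # one distinct state name per row
# Every name appears exactly once, so each has probability 1/20 (the reported top_rate of 0.05).
probabilities = [1 / n_categories] * n_categories

# Shannon entropy in bits (base-2 logarithm is an assumption about the profiler).
entropy = -sum(p * math.log2(p) for p in probabilities)
max_entropy = math.log2(n_categories)  # entropy of a perfectly uniform column
entropy_ratio = entropy / max_entropy

print(round(entropy, 3), round(entropy_ratio, 3))  # prints: 4.322 1.0
```

An entropy ratio of 1 means no value is more common than any other, so there is no tail for the `long_tail` alert to describe.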

snap_eligible_est

numeric feature high_skew outliers
This column holds the estimated number of SNAP-eligible individuals, one value per state. Values range from 75,227 to 4,685,272, with a median of 507,917 well below the mean of 857,172.75, a classic signature of population-size data. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58) with 2 outliers (10% of rows), which at this granularity are the most populous states pulling the tail sharply upward. Treatment: Log-transform before regression or clustering to reduce right-skew and dampen outlier influence. high · anthropic:default
n: 20
nulls: 0 (0.0%)
unique: 20
min: 75,227
max: 4.685e+06
mean: 8.572e+05
median: 507,917
std: 1.103e+06
q1: 1.855e+05
q3: 8.607e+05
iqr: 6.753e+05
skew: 2.421
kurtosis: 5.577
n_outliers: 2
outlier_rate: 0.1
zero_rate: 0
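
To see why the largest values are flagged and what a log transform would do, the sketch below works only from the rounded statistics printed above. The outlier rule used here is Tukey's 1.5 × IQR fence, which is an assumption: the report does not say which rule the profiler applies.

```python
import math

# Rounded quartiles as printed in the profile.
q1, q3 = 1.855e5, 8.607e5
iqr = q3 - q1

# Tukey's rule: values above q3 + 1.5 * IQR count as high outliers.
# Assumption: the profiler may use a different rule.
upper_fence = q3 + 1.5 * iqr

col_min, col_median, col_max = 75_227, 507_917, 4_685_272
print(f"upper fence ~ {upper_fence:,.0f}; max is an outlier: {col_max > upper_fence}")

# Effect of a log10 transform on the spread between the smallest and largest state.
raw_ratio = col_max / col_min
log_span = math.log10(col_max) - math.log10(col_min)
print(f"max/min = {raw_ratio:.1f}x raw, {log_span:.2f} orders of magnitude after log10")
```

On the log scale the whole column spans less than two units, so the largest states no longer dominate distance-based or least-squares methods.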

snap_participants_est

numeric feature high_skew outliers
This column holds estimated SNAP (Supplemental Nutrition Assistance Program) participant counts at the state level, with values ranging from 50,403 to 3,139,134. With 20 rows and 20 unique values, each record represents a distinct state. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58) with 2 outliers (10% of the dataset), so a small number of very large states dominate the upper tail, while the median (340,304) sits well below the mean (574,306). Treatment: Log-transform before regression or clustering to reduce right skew and outlier influence. high · anthropic:default
n: 20
nulls: 0 (0.0%)
unique: 20
min: 50,403
max: 3.139e+06
mean: 5.743e+05
median: 340,304
std: 7.39e+05
q1: 1.243e+05
q3: 5.767e+05
iqr: 4.524e+05
skew: 2.421
kurtosis: 5.577
n_outliers: 2
outlier_rate: 0.1
zero_rate: 0
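
The skew (2.421) and kurtosis (5.577) reported here are identical to those of snap_eligible_est. That is what one would expect if participants are a fixed share of the eligible estimate, because skewness does not change when every value is multiplied by the same positive constant. The sketch below illustrates this with hypothetical counts that are not taken from the dataset; the `skewness` helper uses the population formula, which may differ from the profiler's estimator.

```python
def skewness(values):
    """Population skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical eligible counts, for illustration only.
eligible = [80_000, 150_000, 300_000, 520_000, 900_000, 4_500_000]
participants = [0.67 * e for e in eligible]  # fixed 67% enrollment rate
gap = [0.33 * e for e in eligible]           # fixed 33% gap

# All three values agree up to floating-point error.
print(skewness(eligible), skewness(participants), skewness(gap))
```

In practice this means the three count columns are collinear, so including more than one of them in a model adds no information.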

snap_gap

numeric feature high_skew outliers
snap_gap is the SNAP coverage gap: the estimated number of eligible people who are not participating (eligible estimate minus participant estimate), one value per state. With 20 rows and 20 unique values, every observation is distinct. The distribution is heavily right-skewed (skew = 2.42, kurtosis = 5.58): the median is 167,613 while the mean is pulled to 282,866 by a long upper tail, and 2 outliers (10% of rows) reach a maximum of 1,546,138, roughly 62× the minimum of 24,824. The IQR of 222,841 is wide relative to Q1 of 61,204, signalling high dispersion. Treatment: log-transform before regression or modelling to reduce skew; check the 2 outliers (the largest being 1,546,138) for validity before inclusion, bearing in mind that they most likely reflect large state populations rather than data errors. medium · anthropic:default
n: 20
nulls: 0 (0.0%)
unique: 20
min: 24,824
max: 1.546e+06
mean: 2.829e+05
median: 167,613
std: 3.64e+05
q1: 6.12e+04
q3: 2.84e+05
iqr: 222,841
skew: 2.421
kurtosis: 5.577
n_outliers: 2
outlier_rate: 0.1
zero_rate: 0
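
The dispersion of this column can be recomputed from the reported statistics. The short sketch below uses only figures quoted in this section; the variable names are illustrative.

```python
gap_min, gap_max = 24_824, 1_546_138
gap_q1, gap_iqr = 61_204, 222_841

max_to_min = gap_max / gap_min  # how many times larger the biggest gap is than the smallest
iqr_to_q1 = gap_iqr / gap_q1    # width of the middle 50% relative to its lower edge

print(f"max/min = {max_to_min:.1f}, IQR/Q1 = {iqr_to_q1:.1f}")  # prints: max/min = 62.3, IQR/Q1 = 3.6
```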

snap_gap_pct

numeric metadata constant
This column holds the SNAP gap percentage: the share of the eligible population that is not enrolled, i.e. the complement of snap_enrollment_rate (33% + 67% = 100%). Across all 20 rows the value is identically 33.0 with zero variance, zero nulls, and a single unique value, so it has no discriminative power. Treatment: Drop before modelling; a constant column provides zero information gain and will cause issues in variance-sensitive methods. high · anthropic:default
n: 20
nulls: 0 (0.0%)
unique: 1
min: 33
max: 33
mean: 33
median: 33
std: 0
q1: 33
q3: 33
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0

snap_enrollment_rate

numeric feature constant
This column represents a SNAP (Supplemental Nutrition Assistance Program) enrollment rate, expressed as a percentage of the eligible population. All 20 rows hold the identical value of 67.0, with zero variance, zero IQR, and a standard deviation of 0.0. Since the rows cover 20 different states, this is a strong signal that the value was hardcoded or imputed from a single aggregate figure and applied uniformly, rather than measured per state. It carries no predictive or descriptive information across rows. Treatment: Drop before modelling; a zero-variance column provides no information and will cause issues with many estimators. high · anthropic:default
n: 20
nulls: 0 (0.0%)
unique: 1
min: 67
max: 67
mean: 67
median: 67
std: 0
q1: 67
q3: 67
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0
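
The "drop before modelling" treatment for the two constant columns can be sketched as follows. This is a minimal, dependency-free illustration rather than the profiler's own code: the three rows reuse the reported minimum, median, and maximum of snap_eligible_est, the state labels "A" to "C" are hypothetical placeholders, and `constant_columns` is an illustrative helper.

```python
# Illustrative rows; state labels are placeholders, not real rows from the dataset.
rows = [
    {"state_name": "A", "snap_eligible_est": 75_227, "snap_gap_pct": 33.0, "snap_enrollment_rate": 67.0},
    {"state_name": "B", "snap_eligible_est": 507_917, "snap_gap_pct": 33.0, "snap_enrollment_rate": 67.0},
    {"state_name": "C", "snap_eligible_est": 4_685_272, "snap_gap_pct": 33.0, "snap_enrollment_rate": 67.0},
]


def constant_columns(records):
    """Return the names of columns that hold a single distinct value across all records."""
    columns = records[0].keys()
    return [c for c in columns if len({r[c] for r in records}) == 1]


to_drop = constant_columns(rows)
# Keep only the columns that actually vary between rows.
features = [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

print(to_drop)  # prints: ['snap_gap_pct', 'snap_enrollment_rate']
```

Detecting constants by counting distinct values avoids relying on a computed variance, which can be affected by floating-point noise in larger datasets.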