
food deserts food desert merged

source: /home/coolhand/datasets/us-inequality-atlas/food_deserts/food_desert_merged.csv · 3,222 rows · 11 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 3,222 rows and 11 columns of US county-level indicators on poverty, SNAP eligibility and participation, vehicle access, and total population, keyed by FIPS and county/state codes. The population and program-count columns (total_pop, poverty_pop, snap_eligible_est, snap_participants_est, no_vehicle_total) are extremely right-skewed, with skew values from 13 to 20 and around 11-14% of rows flagged as outliers — a handful of very large counties dominate the raw totals. Note that snap_eligible_est and poverty_pop have identical statistics, suggesting one is a direct copy of the other and worth verifying before analysis. The rate-based columns are more tractable: poverty_rate has a moderate skew of 2.1 with a median of 13.55%, and no_vehicle_pct has a median of 5.41% but a long tail reaching 85.94%. Start with the rate columns for cross-county comparison and reserve the totals for absolute-magnitude questions.

citing: row_count · column_count · columns.total_pop.stats · columns.poverty_pop.stats · columns.snap_eligible_est.stats · columns.snap_participants_est.stats · columns.no_vehicle_total.stats · columns.poverty_rate.stats · columns.no_vehicle_pct.stats · columns.state.n_unique
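The suspected duplication between snap_eligible_est and poverty_pop can be checked row by row instead of being inferred from matching summary statistics. The following is a minimal sketch, assuming the CSV has been loaded as a list of dicts (for example with csv.DictReader). The helper name columns_identical and the sample rows are illustrative and do not come from the dataset.

```python
def columns_identical(rows, col_a, col_b, tol=0.0):
    """Return True if col_a and col_b agree on every row (within tol)."""
    return all(abs(float(r[col_a]) - float(r[col_b])) <= tol for r in rows)

# Illustrative rows only; real rows would come from csv.DictReader on the source file.
sample_rows = [
    {"poverty_pop": "1526", "snap_eligible_est": "1526"},
    {"poverty_pop": "9768", "snap_eligible_est": "9768"},
]
is_copy = columns_identical(sample_rows, "poverty_pop", "snap_eligible_est")
print(is_copy)
```

If the check returns True on the full file, one of the two columns can be dropped before modelling without losing information.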

Schema

11 columns
Per-column summary.
name                   type     nulls  unique  alerts
name                   text     0.0%   3,222   near_unique
total_pop              numeric  0.0%   3,173   high_skew, outliers
poverty_pop            numeric  0.0%   2,839   high_skew, outliers
state                  numeric  0.0%   52
county                 numeric  0.0%   330     high_skew, outliers
fips                   numeric  0.0%   3,222
poverty_rate           numeric  0.0%   1,719   high_skew
snap_eligible_est      numeric  0.0%   2,839   high_skew, outliers
snap_participants_est  numeric  0.0%   2,636   high_skew, outliers
no_vehicle_total       numeric  0.0%   1,823   high_skew, outliers
no_vehicle_pct         numeric  0.0%   1,065   high_skew

name

text identifier near_unique
This column holds full county names paired with state (e.g., "... County, Texas"), as evidenced by "county," appearing 2999 times out of 3222 rows alongside top state tokens like Texas (256), Virginia (189), and Georgia (159). Every value is unique (n_unique=3222, null_rate=0) and lengths are tightly clustered (mean 24.3, min 16, max 59, ~3 words), consistent with a canonical place-name label. The near_unique alert confirms it functions as a row identifier rather than a categorical feature. Treatment: Use as a join key on county-state; do not feed into models as a categorical feature. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
len_min: 16
len_max: 59
len_mean: 24.32
len_median: 24
len_p95: 31
word_mean: 3.248
word_median: 3
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,990
readability_flesch_mean: 10.28
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
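Using name as a county-state join key requires splitting each label into its two parts. A minimal sketch, assuming every label follows the "<area>, <state>" pattern suggested by the report's example; the function name split_place_name is hypothetical.

```python
def split_place_name(name):
    """Split 'Harris County, Texas' into ('Harris County', 'Texas')."""
    # rpartition splits on the LAST ', ' so commas inside the area name survive.
    county_part, sep, state_part = name.rpartition(", ")
    if not sep:
        # No ', ' separator found: return the whole label and no state.
        return name.strip(), None
    return county_part.strip(), state_part.strip()

parsed = split_place_name("Harris County, Texas")
print(parsed)
```

Labels that return None for the state part are worth inspecting by hand before any join, since they break the assumed pattern.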

total_pop

numeric feature high_skew outliers
Population counts per record, ranging from 47 to 9,782,602 with a median of 25,174 — consistent with US county-level totals. The distribution is extremely right-skewed (skew 13.36, kurtosis 297.59) and 13.9% of rows (449) flag as outliers, driven by a handful of mega-population entities pulling the mean (101,340) far above the median. Treatment: log-transform before regression or distance-based modelling. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,173
min: 47
max: 9.783e+06
mean: 1.013e+05
median: 25,174
std: 3.246e+05
q1: 1.059e+04
q3: 6.501e+04
iqr: 5.442e+04
skew: 13.36
kurtosis: 297.6
n_outliers: 449
outlier_rate: 0.1394
zero_rate: 0
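To illustrate why a log transform is recommended here, the sketch below compares skewness before and after taking log10. It uses the reported min, q1, median, q3 and max as five toy inputs (not actual rows), and the skewness helper is a plain moment-based estimate that may differ from whatever estimator the profiler used.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment divided by std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Toy input: the reported five-number summary of total_pop, not real rows.
populations = [47, 10_590, 25_174, 65_010, 9_782_602]
logged = [math.log10(p) for p in populations]  # safe: total_pop has no zeros

raw_skew = skewness(populations)
log_skew = skewness(logged)
print(raw_skew, log_skew)
```

On these toy values the logged series is close to symmetric while the raw series is strongly right-skewed, which is the effect the treatment note relies on.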

poverty_pop

numeric feature high_skew outliers
This is a count of population in poverty per county, ranging from 3 to 1,343,978 with a median of 3,799.5. The distribution is extremely right-skewed (skew 14.73, kurtosis 342.21) and 362 values (11.2%) are flagged as outliers, consistent with a few very large jurisdictions dwarfing the rest. No nulls or zeros, and 2,839 of 3,222 values are unique. Treatment: log-transform before regression to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 2,839
min: 3
max: 1.344e+06
mean: 1.3e+04
median: 3800
std: 4.326e+04
q1: 1526
q3: 9768
iqr: 8242
skew: 14.73
kurtosis: 342.2
n_outliers: 362
outlier_rate: 0.1124
zero_rate: 0
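One consistency check worth considering is whether poverty_rate can be reproduced from poverty_pop and total_pop. The sketch below assumes poverty_rate = 100 * poverty_pop / total_pop, which the report does not confirm. Poverty rates are often computed over the population for whom poverty status is determined rather than total population, so small mismatches would not necessarily indicate an error. The function names and the example row are hypothetical.

```python
def implied_poverty_rate(poverty_pop, total_pop):
    """Percentage of total_pop that poverty_pop represents."""
    if total_pop <= 0:
        raise ValueError("total_pop must be positive")
    return 100.0 * poverty_pop / total_pop

def rate_matches(row, tol=0.05):
    """True if the stored poverty_rate is within tol points of the recomputed one."""
    implied = implied_poverty_rate(float(row["poverty_pop"]), float(row["total_pop"]))
    return abs(implied - float(row["poverty_rate"])) <= tol

# Hypothetical row, not taken from the dataset.
example_row = {"total_pop": "20000", "poverty_pop": "2710", "poverty_rate": "13.55"}
ok = rate_matches(example_row)
print(ok)
```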

state

numeric identifier
Numeric codes ranging from 1 to 72 with 52 unique values across 3222 rows and no nulls strongly suggest US state/territory FIPS codes rather than a true measurement. The near-uniform spread (mean 31.27, median 30, std 16.29, skew 0.16) and absence of outliers are consistent with a categorical identifier encoded as integers. Treating these as a continuous feature would be misleading. Treatment: Cast to categorical and map FIPS codes to state names before modelling. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 52
min: 1
max: 72
mean: 31.27
median: 30
std: 16.29
q1: 19
q3: 46
iqr: 27
skew: 0.1574
kurtosis: -0.6267
n_outliers: 0
outlier_rate: 0
zero_rate: 0
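The recommended cast from integer code to a categorical label could look like the sketch below. The lookup table is deliberately partial (six well-known state FIPS codes); a complete table covering all 52 codes in the column would need to come from an authoritative source. The names STATE_FIPS_TO_NAME and state_label are illustrative.

```python
# Partial lookup for illustration; a full table would cover all 52 codes present.
STATE_FIPS_TO_NAME = {
    1: "Alabama",
    6: "California",
    13: "Georgia",
    48: "Texas",
    51: "Virginia",
    72: "Puerto Rico",
}

def state_label(code):
    """Return (zero-padded FIPS string, state name or None if not in the table)."""
    fips_str = f"{int(code):02d}"
    return fips_str, STATE_FIPS_TO_NAME.get(int(code))

labels = [state_label(c) for c in (1, 48, 72)]
print(labels)
```

Storing the code as a zero-padded string rather than an integer keeps it from being treated as a continuous feature by accident.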

county

numeric foreign_key high_skew outliers
Despite the name 'county', this column is stored as numeric with 330 unique integer values from 1 to 840 across 3,222 rows — consistent with a county FIPS or lookup code rather than a measured quantity. The distribution is heavily right-skewed (skew 2.87, kurtosis 11.6) with 178 outliers (5.5%), which is expected behavior for an ID-like code, not a meaningful statistical signal. No nulls or zeros are present. Treatment: Cast to categorical/string and treat as a county code; do not use as a continuous numeric feature. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 330
min: 1
max: 840
mean: 103.2
median: 79
std: 106.6
q1: 35
q3: 133
iqr: 98
skew: 2.866
kurtosis: 11.64
n_outliers: 178
outlier_rate: 0.05525
zero_rate: 0

fips

numeric identifier
This is the U.S. county FIPS code: every one of the 3222 rows is unique, with values spanning 1001 to 72153, consistent with state-prefixed county identifiers. The distribution is near-symmetric (skew 0.16, kurtosis -0.63) and has no outliers or nulls, as expected for a structured code rather than a measurement. Despite being numeric, the values are categorical labels and arithmetic on them is meaningless. Treatment: treat as a categorical key and left-join county-level attributes on it rather than using it as a numeric feature. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 3,222
min: 1,001
max: 72,153
mean: 3.138e+04
median: 30,022
std: 1.63e+04
q1: 1.903e+04
q3: 4.61e+04
iqr: 27,075
skew: 0.1574
kurtosis: -0.6314
n_outliers: 0
outlier_rate: 0
zero_rate: 0
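The state, county and fips columns appear to be related: the reported fips range (1001 to 72153) matches state * 1000 + county at both ends. The sketch below builds the conventional 5-character key and checks that relationship for a row. Whether every row in this file follows the layout is an assumption to verify, and both function names are hypothetical.

```python
def compose_fips(state_code, county_code):
    """Build a 5-character county FIPS string: 2-digit state + 3-digit county."""
    return f"{int(state_code):02d}{int(county_code):03d}"

def fips_consistent(row):
    """True if the fips column equals state * 1000 + county for this row."""
    return int(row["fips"]) == int(row["state"]) * 1000 + int(row["county"])

# Endpoints taken from the reported min and max of the fips column.
lowest = compose_fips(1, 1)
highest = compose_fips(72, 153)
row_ok = fips_consistent({"state": "72", "county": "153", "fips": "72153"})
print(lowest, highest, row_ok)
```

Zero-padded strings matter when joining against external county tables, many of which store "01001" rather than the integer 1001.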

poverty_rate

numeric feature high_skew
This column appears to be a county- or area-level poverty rate expressed as a percentage, with 3222 rows, 1719 unique values, and no nulls. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with a median of 13.55 and mean 15.10, but a long tail stretching to a max of 66.32 versus a min of 1.6. About 4.25% of rows (137) are flagged as outliers, consistent with a small set of severely impoverished areas. Treatment: Consider a log or sqrt transform before regression to tame the right skew. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,719
min: 1.6
max: 66.32
mean: 15.1
median: 13.55
std: 7.706
q1: 10.16
q3: 17.91
iqr: 7.75
skew: 2.096
kurtosis: 6.891
n_outliers: 137
outlier_rate: 0.04252
zero_rate: 0
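The report does not say which rule it uses to flag outliers. As one plausible reading, the sketch below applies the common 1.5 * IQR (Tukey) fences to the reported q1 and q3 for poverty_rate. The rule itself is an assumption here, and tukey_fences is an illustrative name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences under the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 as reported for poverty_rate.
lower, upper = tukey_fences(10.16, 17.91)
print(round(lower, 3), round(upper, 3))
```

Under this assumed rule the fences are about -1.465 and 29.535. Since the reported minimum is 1.6, no county could fall below the lower fence, so all flagged rows would sit in the high-poverty tail.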

snap_eligible_est

numeric feature high_skew outliers
A numeric estimate of SNAP-eligible counts per county, with 3,222 non-null rows and 2,839 unique values. The distribution is severely right-skewed (skew 14.73, kurtosis 342.21): the median is 3,799.5 but the max reaches 1,343,978, and 11.2% of rows (362) flag as outliers. No nulls or zeros are present, so the spread is real rather than a missingness artefact. Every summary statistic matches poverty_pop exactly, so this column may be a direct copy of it; verify before treating the two as independent features. Treatment: log-transform (or winsorize) before any distance- or variance-sensitive modelling. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 2,839
min: 3
max: 1.344e+06
mean: 1.3e+04
median: 3800
std: 4.326e+04
q1: 1526
q3: 9768
iqr: 8242
skew: 14.73
kurtosis: 342.2
n_outliers: 362
outlier_rate: 0.1124
zero_rate: 0

snap_participants_est

numeric feature high_skew outliers
Estimated SNAP participant counts per record, ranging from 2 to 900,465 with a median of 2,546 and mean of 8,711. The distribution is severely right-skewed (skew 14.73, kurtosis 342.21) with 362 outliers (11.2%) and a standard deviation (28,987) more than three times the mean, suggesting a few very large jurisdictions dominate. No nulls or zeros are present across 3,222 rows. Treatment: log-transform before regression to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 2,636
min: 2
max: 900,465
mean: 8711
median: 2,546
std: 2.899e+04
q1: 1022
q3: 6544
iqr: 5,522
skew: 14.73
kurtosis: 342.2
n_outliers: 362
outlier_rate: 0.1124
zero_rate: 0
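The skew, kurtosis and outlier count here are identical to those of snap_eligible_est, which is what a constant rescaling of that column would produce. The sketch below compares the reported summary statistics of the two columns. It is only a check on the published numbers and does not prove how the column was derived.

```python
# Summary statistics as reported for the two columns.
eligible = {"min": 3, "q1": 1526, "median": 3799.5, "q3": 9768, "max": 1_343_978}
participants = {"min": 2, "q1": 1022, "median": 2546, "q3": 6544, "max": 900_465}

# Ratio of participants to eligible at each reported quantile.
ratios = {key: participants[key] / eligible[key] for key in eligible}
near_constant = all(abs(r - 0.67) < 0.005 for r in ratios.values())
print(ratios, near_constant)
```

All five ratios land within 0.005 of 0.67, which suggests the participant estimate may be a fixed share of the eligible estimate rounded to whole persons. If so, the column adds no county-level information beyond snap_eligible_est, and that is worth confirming on the row-level data.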

no_vehicle_total

numeric feature high_skew outliers
This column appears to be a count of households without access to a vehicle per county, judging by the column name and its pairing with no_vehicle_pct; it is not a count of vehicles. The distribution is extremely heavy-tailed: the median is 580 but the mean is 3,304 and the maximum reaches 601,621, with skew of 20.26 and kurtosis of 501.27. About 12.6% of rows (407) flag as outliers, while only 0.37% are zeros and there are no nulls. Treatment: Log-transform with a zero-safe variant such as log1p (or winsorize) before any distance- or variance-based modelling. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,823
min: 0
max: 601,621
mean: 3304
median: 580
std: 2.005e+04
q1: 223
q3: 1555
iqr: 1332
skew: 20.26
kurtosis: 501.3
n_outliers: 407
outlier_rate: 0.1263
zero_rate: 0.003724
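Winsorizing, the alternative treatment mentioned for this column, clips extreme values to a chosen quantile instead of rescaling everything. A minimal sketch using simple floor-index quantiles; the cut-offs, the function name and the five toy inputs (the reported min, q1, median, q3 and max) are illustrative choices, not settings taken from the report.

```python
def winsorize(values, lower_pct=0.0, upper_pct=0.95):
    """Clip values to the range between two empirical quantiles (floor-index)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[min(n - 1, int(lower_pct * (n - 1)))]
    hi = ordered[min(n - 1, int(upper_pct * (n - 1)))]
    return [min(max(v, lo), hi) for v in values]

# Toy input: the reported five-number summary of no_vehicle_total, not real rows.
counts = [0, 223, 580, 1555, 601_621]
clipped = winsorize(counts, upper_pct=0.75)
print(clipped)
```

Unlike a log transform, this keeps the original units but discards the magnitude of the largest counties, so it suits distance-based methods better than questions about absolute totals.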

no_vehicle_pct

numeric feature high_skew
Likely a per-area percentage of households without a vehicle, given values bounded between 0.0 and 85.94 with a median of 5.41 and Q1-Q3 of 3.98-7.36. The distribution is severely right-skewed (skew 6.98, kurtosis 86.23) with 140 outliers (4.35%) stretching far above the typical range, while only 0.37% of rows are exactly zero. No nulls across 3,222 rows. Treatment: Apply a log1p or similar transform before regression to tame the heavy right tail. high · anthropic:claude-opus-4-7
n: 3,222
nulls: 0 (0.0%)
unique: 1,065
min: 0
max: 85.94
mean: 6.197
median: 5.41
std: 4.538
q1: 3.98
q3: 7.36
iqr: 3.38
skew: 6.976
kurtosis: 86.23
n_outliers: 140
outlier_rate: 0.04345
zero_rate: 0.003724
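The choice of log1p over a plain logarithm matters for this column because a zero_rate of 0.003724 over 3,222 rows means roughly 12 counties have a value of exactly 0, where log is undefined. The sketch below applies log1p to the reported min, q1, median, q3 and max as toy inputs; safe_log_transform is an illustrative name.

```python
import math

def safe_log_transform(values):
    """Apply log1p, which maps 0 to 0 and is defined for every value >= 0."""
    return [math.log1p(v) for v in values]

# Toy input: the reported five-number summary of no_vehicle_pct, not real rows.
pcts = [0.0, 3.98, 5.41, 7.36, 85.94]
transformed = safe_log_transform(pcts)
print(transformed)
```

The transform preserves ordering and compresses the gap between the upper quartile and the maximum, which is where most of the reported skew comes from.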