food_deserts-food_desert_merged

Overview

Source: /home/coolhand/datasets/us-inequality-atlas/food_deserts/food_desert_merged.csv

Saturn profiled 3,222 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

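# Notebook shell magic: install the profiler, then run it on the merged CSV.
!pip install saturn-dissect
import subprocess

subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/datasets/us-inequality-atlas/food_deserts/food_desert_merged.csv",
        "--findings", "food_deserts-food_desert_merged.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the CLI exits with a non-zero status
)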

Summary confidence: high

This dataset contains 3,222 rows and 11 columns of US county-level indicators on poverty, SNAP eligibility and participation, vehicle access, and total population, keyed by FIPS and county/state codes. The population and program-count columns (total_pop, poverty_pop, snap_eligible_est, snap_participants_est, no_vehicle_total) are extremely right-skewed, with skew values from 13 to 20 and around 11-14% of rows flagged as outliers: a handful of very large counties dominate the raw totals. Note that snap_eligible_est and poverty_pop have identical statistics, suggesting one is a direct copy of the other and worth verifying before analysis. snap_participants_est shares their skew, kurtosis, and outlier count and correlates with both at +1.00, which is consistent with a fixed-ratio rescaling of the same column rather than an independent estimate. The rate-based columns are more tractable: poverty_rate has a moderate skew of 2.1 with a median of 13.55%, and no_vehicle_pct has a median of 5.41% but a long tail reaching 85.94%. Start with the rate columns for cross-county comparison and reserve the totals for absolute-magnitude questions.

citing: row_count · column_count · columns.total_pop.stats · columns.poverty_pop.stats · columns.snap_eligible_est.stats · columns.snap_participants_est.stats · columns.no_vehicle_total.stats · columns.poverty_rate.stats · columns.no_vehicle_pct.stats · columns.state.n_unique
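The suspected copy between snap_eligible_est and poverty_pop can be checked row by row once the CSV is loaded. The following is a minimal sketch using only the standard library; the helper names `load_rows` and `columns_identical` are made up for illustration, and the two-row sample is invented so the snippet runs without the real file.

```python
import csv
import math


def load_rows(path):
    """Read a CSV into a list of dicts, one per county row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def columns_identical(rows, col_a, col_b, rel_tol=1e-9):
    """Return True if two numeric columns agree on every row."""
    return all(
        math.isclose(float(r[col_a]), float(r[col_b]), rel_tol=rel_tol)
        for r in rows
    )


# Invented two-row sample standing in for the real file (values are illustrative).
sample = [
    {"poverty_pop": "1200", "snap_eligible_est": "1200"},
    {"poverty_pop": "85", "snap_eligible_est": "85"},
]
print(columns_identical(sample, "poverty_pop", "snap_eligible_est"))
```

On the real data, `columns_identical(load_rows(path), "poverty_pop", "snap_eligible_est")` would confirm or refute the duplication that the summary statistics only hint at.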

Out[4]:

saturn.schema() · 11 columns

column	kind	n	null%	unique	alerts
name	text	3,222	0.0%	3,222	near_unique
total_pop	numeric	3,222	0.0%	3,173	high_skew outliers
poverty_pop	numeric	3,222	0.0%	2,839	high_skew outliers
state	numeric	3,222	0.0%	52
county	numeric	3,222	0.0%	330	high_skew outliers
fips	numeric	3,222	0.0%	3,222
poverty_rate	numeric	3,222	0.0%	1,719	high_skew
snap_eligible_est	numeric	3,222	0.0%	2,839	high_skew outliers
snap_participants_est	numeric	3,222	0.0%	2,636	high_skew outliers
no_vehicle_total	numeric	3,222	0.0%	1,823	high_skew outliers
no_vehicle_pct	numeric	3,222	0.0%	1,065	high_skew

Fig 1.

poverty_rate · Distribution of county poverty rates; median around 13.55% with a moderate right tail up to 66%.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

Fig 2.

no_vehicle_pct · Share of households without a vehicle per county — most cluster below 8% but a long tail reaches near 86%.

Histogram bins for no_vehicle_pct (median: 5.41).
bin	count
0 – 2.148	161
2.148 – 4.297	823
4.297 – 6.445	1091
6.445 – 8.594	630
8.594 – 10.74	283
10.74 – 12.89	111
12.89 – 15.04	61
15.04 – 17.19	23
17.19 – 19.34	8
19.34 – 21.48	3
21.48 – 23.63	4
23.63 – 25.78	2
25.78 – 27.93	3
27.93 – 30.08	2
30.08 – 32.23	2
32.23 – 34.38	2
34.38 – 36.52	2
36.52 – 38.67	2
38.67 – 40.82	0
40.82 – 42.97	1
42.97 – 45.12	0
45.12 – 47.27	1
47.27 – 49.42	0
49.42 – 51.56	0
51.56 – 53.71	0
53.71 – 55.86	1
55.86 – 58.01	1
58.01 – 60.16	0
60.16 – 62.31	2
62.31 – 64.45	0
64.45 – 66.6	1
66.6 – 68.75	0
68.75 – 70.9	0
70.9 – 73.05	0
73.05 – 75.2	0
75.2 – 77.35	0
77.35 – 79.49	1
79.49 – 81.64	0
81.64 – 83.79	0
83.79 – 85.94	1

Fig 3.

total_pop · County population is heavily right-skewed; expect to log-transform before modeling.

Histogram bins for total_pop (median: 25174.0).
bin	count
47 – 2.446e+05	2942
2.446e+05 – 4.892e+05	137
4.892e+05 – 7.337e+05	57
7.337e+05 – 9.783e+05	39
9.783e+05 – 1.223e+06	12
1.223e+06 – 1.467e+06	9
1.467e+06 – 1.712e+06	7
1.712e+06 – 1.957e+06	3
1.957e+06 – 2.201e+06	3
2.201e+06 – 2.446e+06	4
2.446e+06 – 2.69e+06	3
2.69e+06 – 2.935e+06	0
2.935e+06 – 3.179e+06	1
3.179e+06 – 3.424e+06	1
3.424e+06 – 3.669e+06	0
3.669e+06 – 3.913e+06	0
3.913e+06 – 4.158e+06	0
4.158e+06 – 4.402e+06	1
4.402e+06 – 4.647e+06	0
4.647e+06 – 4.891e+06	1
4.891e+06 – 5.136e+06	0
5.136e+06 – 5.38e+06	1
5.38e+06 – 5.625e+06	0
5.625e+06 – 5.87e+06	0
5.87e+06 – 6.114e+06	0
6.114e+06 – 6.359e+06	0
6.359e+06 – 6.603e+06	0
6.603e+06 – 6.848e+06	0
6.848e+06 – 7.092e+06	0
7.092e+06 – 7.337e+06	0
7.337e+06 – 7.582e+06	0
7.582e+06 – 7.826e+06	0
7.826e+06 – 8.071e+06	0
8.071e+06 – 8.315e+06	0
8.315e+06 – 8.56e+06	0
8.56e+06 – 8.804e+06	0
8.804e+06 – 9.049e+06	0
9.049e+06 – 9.293e+06	0
9.293e+06 – 9.538e+06	0
9.538e+06 – 9.783e+06	1

Fig 4.

snap_participants_est · SNAP participant counts span from 2 to 900k, with ~11% of counties flagged as high-end outliers.

Histogram bins for snap_participants_est (median: 2546.0).
bin	count
2 – 2.251e+04	2963
2.251e+04 – 4.503e+04	152
4.503e+04 – 6.754e+04	49
6.754e+04 – 9.005e+04	19
9.005e+04 – 1.126e+05	10
1.126e+05 – 1.351e+05	6
1.351e+05 – 1.576e+05	3
1.576e+05 – 1.801e+05	3
1.801e+05 – 2.026e+05	5
2.026e+05 – 2.251e+05	1
2.251e+05 – 2.476e+05	4
2.476e+05 – 2.701e+05	1
2.701e+05 – 2.927e+05	1
2.927e+05 – 3.152e+05	0
3.152e+05 – 3.377e+05	2
3.377e+05 – 3.602e+05	0
3.602e+05 – 3.827e+05	0
3.827e+05 – 4.052e+05	0
4.052e+05 – 4.277e+05	0
4.277e+05 – 4.502e+05	0
4.502e+05 – 4.727e+05	1
4.727e+05 – 4.953e+05	1
4.953e+05 – 5.178e+05	0
5.178e+05 – 5.403e+05	0
5.403e+05 – 5.628e+05	0
5.628e+05 – 5.853e+05	0
5.853e+05 – 6.078e+05	0
6.078e+05 – 6.303e+05	0
6.303e+05 – 6.528e+05	0
6.528e+05 – 6.753e+05	0
6.753e+05 – 6.979e+05	0
6.979e+05 – 7.204e+05	0
7.204e+05 – 7.429e+05	0
7.429e+05 – 7.654e+05	0
7.654e+05 – 7.879e+05	0
7.879e+05 – 8.104e+05	0
8.104e+05 – 8.329e+05	0
8.329e+05 – 8.554e+05	0
8.554e+05 – 8.78e+05	0
8.78e+05 – 9.005e+05	1

Fig 5.

state · Counts of counties per state code (52 unique values) to confirm geographic coverage.

Histogram bins for state (median: 30.0).
bin	count
1 – 2.775	97
2.775 – 4.55	15
4.55 – 6.325	133
6.325 – 8.1	64
8.1 – 9.875	9
9.875 – 11.65	4
11.65 – 13.42	226
13.42 – 15.2	5
15.2 – 16.98	44
16.98 – 18.75	194
18.75 – 20.52	204
20.52 – 22.3	184
22.3 – 24.07	40
24.07 – 25.85	14
25.85 – 27.62	170
27.62 – 29.4	197
29.4 – 31.17	149
31.17 – 32.95	17
32.95 – 34.73	31
34.73 – 36.5	95
36.5 – 38.27	153
38.27 – 40.05	165
40.05 – 41.82	36
41.82 – 43.6	67
43.6 – 45.38	51
45.38 – 47.15	161
47.15 – 48.92	254
48.92 – 50.7	43
50.7 – 52.47	133
52.47 – 54.25	94
54.25 – 56.02	95
56.02 – 57.8	0
57.8 – 59.57	0
59.57 – 61.35	0
61.35 – 63.12	0
63.12 – 64.9	0
64.9 – 66.67	0
66.67 – 68.45	0
68.45 – 70.22	0
70.22 – 72	78

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
name	text	0.0%
total_pop	numeric	0.0%
poverty_pop	numeric	0.0%
state	numeric	0.0%
county	numeric	0.0%
fips	numeric	0.0%
poverty_rate	numeric	0.0%
snap_eligible_est	numeric	0.0%
snap_participants_est	numeric	0.0%
no_vehicle_total	numeric	0.0%
no_vehicle_pct	numeric	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 10 numeric columns (values clipped to 2 decimals).
	total_pop	poverty_pop	state	county	fips	poverty_rate	snap_eligible_est	snap_participants_est	no_vehicle_total	no_vehicle_pct
total_pop	+1.00	+0.96	-0.07	-0.02	-0.07	-0.11	+0.96	+0.96	+0.90	+0.09
poverty_pop	+0.96	+1.00	-0.07	-0.01	-0.07	-0.03	+1.00	+1.00	+0.92	+0.14
state	-0.07	-0.07	+1.00	+0.14	+1.00	+0.16	-0.07	-0.07	-0.05	+0.04
county	-0.02	-0.01	+0.14	+1.00	+0.15	+0.05	-0.01	-0.01	-0.02	+0.11
fips	-0.07	-0.07	+1.00	+0.15	+1.00	+0.16	-0.07	-0.07	-0.05	+0.04
poverty_rate	-0.11	-0.03	+0.16	+0.05	+0.16	+1.00	-0.03	-0.03	-0.04	+0.45
snap_eligible_est	+0.96	+1.00	-0.07	-0.01	-0.07	-0.03	+1.00	+1.00	+0.92	+0.14
snap_participants_est	+0.96	+1.00	-0.07	-0.01	-0.07	-0.03	+1.00	+1.00	+0.92	+0.14
no_vehicle_total	+0.90	+0.92	-0.05	-0.02	-0.05	-0.04	+0.92	+0.92	+1.00	+0.15
no_vehicle_pct	+0.09	+0.14	+0.04	+0.11	+0.04	+0.45	+0.14	+0.14	+0.15	+1.00
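To make the redundancy in the matrix above easier to see, the sketch below lists every column pair whose absolute correlation meets a threshold. The values are copied from the table; the 0.9 cut-off and the function name `strong_pairs` are illustrative choices, not something the profiler reports.

```python
COLS = [
    "total_pop", "poverty_pop", "state", "county", "fips", "poverty_rate",
    "snap_eligible_est", "snap_participants_est", "no_vehicle_total", "no_vehicle_pct",
]

# Pearson correlations copied row by row from the table (2-decimal values).
R = [
    [1.00, 0.96, -0.07, -0.02, -0.07, -0.11, 0.96, 0.96, 0.90, 0.09],
    [0.96, 1.00, -0.07, -0.01, -0.07, -0.03, 1.00, 1.00, 0.92, 0.14],
    [-0.07, -0.07, 1.00, 0.14, 1.00, 0.16, -0.07, -0.07, -0.05, 0.04],
    [-0.02, -0.01, 0.14, 1.00, 0.15, 0.05, -0.01, -0.01, -0.02, 0.11],
    [-0.07, -0.07, 1.00, 0.15, 1.00, 0.16, -0.07, -0.07, -0.05, 0.04],
    [-0.11, -0.03, 0.16, 0.05, 0.16, 1.00, -0.03, -0.03, -0.04, 0.45],
    [0.96, 1.00, -0.07, -0.01, -0.07, -0.03, 1.00, 1.00, 0.92, 0.14],
    [0.96, 1.00, -0.07, -0.01, -0.07, -0.03, 1.00, 1.00, 0.92, 0.14],
    [0.90, 0.92, -0.05, -0.02, -0.05, -0.04, 0.92, 0.92, 1.00, 0.15],
    [0.09, 0.14, 0.04, 0.11, 0.04, 0.45, 0.14, 0.14, 0.15, 1.00],
]


def strong_pairs(cols, matrix, threshold=0.9):
    """Return (col_a, col_b, r) for each off-diagonal pair with |r| >= threshold."""
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((cols[i], cols[j], matrix[i][j]))
    return pairs


pairs = strong_pairs(COLS, R)
for a, b, r in pairs:
    print(f"{a:>22} ~ {b:<22} r={r:+.2f}")
```

The pairs at +1.00 (poverty_pop with both SNAP columns, the two SNAP columns with each other, and state with fips) are the ones most worth dropping or deduplicating before modelling.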

name text identifier

This column holds full county names paired with state (e.g., "... County, Texas"), as evidenced by "county," appearing 2999 times out of 3222 rows alongside top state tokens like Texas (256), Virginia (189), and Georgia (159). Every value is unique (n_unique=3222, null_rate=0) and lengths are tightly clustered (mean 24.3, min 16, max 59, ~3 words), consistent with a canonical place-name label. The near_unique alert confirms it functions as a row identifier rather than a categorical feature.

Treatment: Use as a join key on county-state; do not feed into models as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
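If name is used as a county-state join key, it usually helps to split it into its two parts first. This is a small sketch assuming the "<county>, <state>" pattern described above holds; the function name `split_name` and the sample value are illustrative, and rows without a comma separator are returned with an empty state.

```python
def split_name(name):
    """Split 'X County, State' into (county, state) on the last ', '."""
    county, sep, state = name.rpartition(", ")
    if not sep:
        # No separator found: keep the whole string as the county part.
        return name, ""
    return county, state


print(split_name("Harris County, Texas"))
```

Splitting on the last separator rather than the first keeps any commas inside the county part intact.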

Out[13]:

saturn.columns["name"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
len_min	16
len_max	59
len_mean	24.32
len_median	24
len_p95	31
word_mean	3.248
word_median	3
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,990
readability_flesch_mean	10.28
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 8.

Character-length distribution for name.

Character-length distribution for name (mean: 24.32).
chars	count
16 – 17	26
17 – 18	72
18 – 19	121
19 – 20	190
20 – 21	264
21 – 22	407
22 – 24	420
24 – 25	363
25 – 26	320
26 – 27	240
27 – 28	231
28 – 29	152
29 – 30	139
30 – 31	165
31 – 32	41
32 – 33	28
33 – 34	16
34 – 35	10
35 – 36	5
36 – 38	0
38 – 39	1
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1
43 – 44	0
44 – 45	2
45 – 46	0
46 – 47	1
47 – 48	1
48 – 49	0
49 – 50	0
50 – 51	0
51 – 53	0
53 – 54	2
54 – 55	1
55 – 56	0
56 – 57	0
57 – 58	0
58 – 59	1

total_pop numeric feature

Population counts per record, ranging from 47 to 9,782,602 with a median of 25,174, consistent with US county-level totals. The distribution is extremely right-skewed (skew 13.36, kurtosis 297.59) and 13.9% of rows (449) flag as outliers, driven by a handful of very large counties that pull the mean (101,340) far above the median.

Treatment: log-transform before regression or distance-based modelling.

anthropic:claude-opus-4-7 · confidence high
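The effect of the suggested log transform can be illustrated on a small sample. In the sketch below, five of the seven values are the min, q1, median, q3, and max from the total_pop stats (rounded as in the table) and the other two are invented fillers; the skewness formula is the plain moment-based one, which may not be the exact estimator Saturn uses.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative sample: 47, 10590, 25174, 65010, 9782602 come from the stats table;
# 1000 and 244600 are made-up fillers.
raw = [47, 1000, 10590, 25174, 65010, 244600, 9782602]

# log10 is safe here because total_pop has no zeros (zero_rate 0, min 47).
logged = [math.log10(v) for v in raw]

print(f"skew raw:   {skewness(raw):+.2f}")
print(f"skew log10: {skewness(logged):+.2f}")
```

On this sample the logged values are far closer to symmetric than the raw ones, which is the behaviour the treatment note is relying on.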

Out[16]:

saturn.columns["total_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,173
min	47
max	9.783e+06
mean	1.013e+05
median	25,174
std	3.246e+05
q1	1.059e+04
q3	6.501e+04
iqr	5.442e+04
skew	13.36
kurtosis	297.6
n_outliers	449
outlier_rate	0.1394
zero_rate	0
alert: high_skew	skew=+13.36
alert: outliers	13.9% rows beyond 1.5 IQR
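The outlier alert can be reproduced from the quartiles in the table, assuming "beyond 1.5 IQR" means Tukey's fences (q1 - 1.5·IQR and q3 + 1.5·IQR); that reading is an assumption about the profiler, not something the report states. The function name `tukey_fences` is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles for total_pop as printed in the stats table (rounded).
lower, upper = tukey_fences(q1=10590, q3=65010)
print(f"lower fence: {lower:,.0f}")
print(f"upper fence: {upper:,.0f}")
```

With these rounded inputs the lower fence is negative, so under this reading every flagged total_pop row sits on the high side, at roughly 146,640 residents or more.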

Fig 9.

Distribution of total_pop. Vertical dash marks the median.

Histogram bins for total_pop (median: 25174.0).
bin	count
47 – 2.446e+05	2942
2.446e+05 – 4.892e+05	137
4.892e+05 – 7.337e+05	57
7.337e+05 – 9.783e+05	39
9.783e+05 – 1.223e+06	12
1.223e+06 – 1.467e+06	9
1.467e+06 – 1.712e+06	7
1.712e+06 – 1.957e+06	3
1.957e+06 – 2.201e+06	3
2.201e+06 – 2.446e+06	4
2.446e+06 – 2.69e+06	3
2.69e+06 – 2.935e+06	0
2.935e+06 – 3.179e+06	1
3.179e+06 – 3.424e+06	1
3.424e+06 – 3.669e+06	0
3.669e+06 – 3.913e+06	0
3.913e+06 – 4.158e+06	0
4.158e+06 – 4.402e+06	1
4.402e+06 – 4.647e+06	0
4.647e+06 – 4.891e+06	1
4.891e+06 – 5.136e+06	0
5.136e+06 – 5.38e+06	1
5.38e+06 – 5.625e+06	0
5.625e+06 – 5.87e+06	0
5.87e+06 – 6.114e+06	0
6.114e+06 – 6.359e+06	0
6.359e+06 – 6.603e+06	0
6.603e+06 – 6.848e+06	0
6.848e+06 – 7.092e+06	0
7.092e+06 – 7.337e+06	0
7.337e+06 – 7.582e+06	0
7.582e+06 – 7.826e+06	0
7.826e+06 – 8.071e+06	0
8.071e+06 – 8.315e+06	0
8.315e+06 – 8.56e+06	0
8.56e+06 – 8.804e+06	0
8.804e+06 – 9.049e+06	0
9.049e+06 – 9.293e+06	0
9.293e+06 – 9.538e+06	0
9.538e+06 – 9.783e+06	1

poverty_pop numeric feature

This is a count of population in poverty per record (likely a county or similar geographic unit), ranging from 3 to 1,343,978 with a median of 3,799.5. The distribution is extremely right-skewed (skew 14.73, kurtosis 342.21) and 362 values (11.2%) are flagged as outliers, consistent with a few very large jurisdictions dwarfing the rest. No nulls or zeros, and 2,839 of 3,222 values are unique.

Treatment: log-transform before regression to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["poverty_pop"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2,839
min	3
max	1.344e+06
mean	1.3e+04
median	3800
std	4.326e+04
q1	1526
q3	9768
iqr	8242
skew	14.73
kurtosis	342.2
n_outliers	362
outlier_rate	0.1124
zero_rate	0
alert: high_skew	skew=+14.73
alert: outliers	11.2% rows beyond 1.5 IQR

Fig 10.

Distribution of poverty_pop. Vertical dash marks the median.

Histogram bins for poverty_pop (median: 3799.5).
bin	count
3 – 3.36e+04	2963
3.36e+04 – 6.72e+04	152
6.72e+04 – 1.008e+05	49
1.008e+05 – 1.344e+05	19
1.344e+05 – 1.68e+05	10
1.68e+05 – 2.016e+05	6
2.016e+05 – 2.352e+05	3
2.352e+05 – 2.688e+05	3
2.688e+05 – 3.024e+05	5
3.024e+05 – 3.36e+05	1
3.36e+05 – 3.696e+05	4
3.696e+05 – 4.032e+05	1
4.032e+05 – 4.368e+05	1
4.368e+05 – 4.704e+05	0
4.704e+05 – 5.04e+05	2
5.04e+05 – 5.376e+05	0
5.376e+05 – 5.712e+05	0
5.712e+05 – 6.048e+05	0
6.048e+05 – 6.384e+05	0
6.384e+05 – 6.72e+05	0
6.72e+05 – 7.056e+05	1
7.056e+05 – 7.392e+05	1
7.392e+05 – 7.728e+05	0
7.728e+05 – 8.064e+05	0
8.064e+05 – 8.4e+05	0
8.4e+05 – 8.736e+05	0
8.736e+05 – 9.072e+05	0
9.072e+05 – 9.408e+05	0
9.408e+05 – 9.744e+05	0
9.744e+05 – 1.008e+06	0
1.008e+06 – 1.042e+06	0
1.042e+06 – 1.075e+06	0
1.075e+06 – 1.109e+06	0
1.109e+06 – 1.142e+06	0
1.142e+06 – 1.176e+06	0
1.176e+06 – 1.21e+06	0
1.21e+06 – 1.243e+06	0
1.243e+06 – 1.277e+06	0
1.277e+06 – 1.31e+06	0
1.31e+06 – 1.344e+06	1

state numeric identifier

Numeric codes ranging from 1 to 72 with 52 unique values across 3,222 rows and no nulls strongly suggest US state/territory FIPS codes rather than a true measurement; 52 values would fit the 50 states, the District of Columbia, and Puerto Rico (FIPS 72, which matches the isolated final histogram bin of 78 rows). The summary statistics (mean 31.27, median 30, std 16.29, skew 0.16, no outliers) describe only how the codes are numbered, not a quantity. Treating these as a continuous feature would be misleading.

Treatment: Cast to categorical and map FIPS codes to state names before modelling.

anthropic:claude-opus-4-7 · confidence high
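Mapping the codes to names can be as simple as a dictionary lookup on the zero-padded code. The sketch below deliberately includes only a handful of state FIPS codes to show the shape; a complete table would need to be filled in from the official Census list, and the function name `state_label` is illustrative.

```python
# Partial lookup for illustration only; extend from the official FIPS state list.
STATE_NAMES = {
    "01": "Alabama",
    "02": "Alaska",
    "06": "California",
    "11": "District of Columbia",
    "48": "Texas",
    "72": "Puerto Rico",
}


def state_label(code):
    """Zero-pad a numeric state code and map it to a name if known."""
    key = f"{int(code):02d}"
    # Fall back to the padded code so unmapped states stay distinguishable.
    return STATE_NAMES.get(key, key)


for code in (1, 48, 72, 30):
    print(code, "->", state_label(code))
```

Returning the padded code for unmapped values keeps the column categorical even while the lookup table is incomplete.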

Out[22]:

saturn.columns["state"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	52
min	1
max	72
mean	31.27
median	30
std	16.29
q1	19
q3	46
iqr	27
skew	0.1574
kurtosis	-0.6267
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 11.

Distribution of state. Vertical dash marks the median.

Histogram bins for state (median: 30.0).
bin	count
1 – 2.775	97
2.775 – 4.55	15
4.55 – 6.325	133
6.325 – 8.1	64
8.1 – 9.875	9
9.875 – 11.65	4
11.65 – 13.42	226
13.42 – 15.2	5
15.2 – 16.98	44
16.98 – 18.75	194
18.75 – 20.52	204
20.52 – 22.3	184
22.3 – 24.07	40
24.07 – 25.85	14
25.85 – 27.62	170
27.62 – 29.4	197
29.4 – 31.17	149
31.17 – 32.95	17
32.95 – 34.73	31
34.73 – 36.5	95
36.5 – 38.27	153
38.27 – 40.05	165
40.05 – 41.82	36
41.82 – 43.6	67
43.6 – 45.38	51
45.38 – 47.15	161
47.15 – 48.92	254
48.92 – 50.7	43
50.7 – 52.47	133
52.47 – 54.25	94
54.25 – 56.02	95
56.02 – 57.8	0
57.8 – 59.57	0
59.57 – 61.35	0
61.35 – 63.12	0
63.12 – 64.9	0
64.9 – 66.67	0
66.67 – 68.45	0
68.45 – 70.22	0
70.22 – 72	78

county numeric foreign_key

Despite the name 'county', this column is stored as numeric with 330 unique integer values from 1 to 840 across 3,222 rows, consistent with the county portion of a FIPS code or a similar lookup code rather than a measured quantity. Because only 330 codes cover 3,222 rows, values repeat, so a code identifies a county only in combination with state. The distribution is heavily right-skewed (skew 2.87, kurtosis 11.6) with 178 outliers (5.5%), which is expected behavior for an ID-like code, not a meaningful statistical signal. No nulls or zeros are present.

Treatment: Cast to categorical/string and treat as a county code; do not use as a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence high
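If county is the three-digit county part of the FIPS code, it can be combined with state into the usual five-character key. That relationship is an assumption here, though it is consistent with the fips range (1001 to 72153) and the +1.00 correlation between state and fips; the function name `county_fips` is illustrative.

```python
def county_fips(state, county):
    """Build a 5-character FIPS string: 2-digit state + 3-digit county."""
    return f"{int(state):02d}{int(county):03d}"


# The smallest and largest codes reported for the fips column.
print(county_fips(1, 1))
print(county_fips(72, 153))
```

Comparing `int(county_fips(state, county))` against the fips column on every row would confirm whether the assumption holds for this file.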

Out[25]:

saturn.columns["county"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	330
min	1
max	840
mean	103.2
median	79
std	106.6
q1	35
q3	133
iqr	98
skew	2.866
kurtosis	11.64
n_outliers	178
outlier_rate	0.05525
zero_rate	0
alert: high_skew	skew=+2.87
alert: outliers	5.5% rows beyond 1.5 IQR

Fig 12.

Distribution of county. Vertical dash marks the median.

Histogram bins for county (median: 79.0).
bin	count
1 – 21.98	523
21.98 – 42.95	418
42.95 – 63.93	411
63.93 – 84.9	345
84.9 – 105.9	352
105.9 – 126.9	281
126.9 – 147.8	236
147.8 – 168.8	168
168.8 – 189.8	140
189.8 – 210.8	71
210.8 – 231.7	45
231.7 – 252.7	25
252.7 – 273.7	22
273.7 – 294.7	23
294.7 – 315.6	22
315.6 – 336.6	13
336.6 – 357.6	11
357.6 – 378.6	10
378.6 – 399.5	11
399.5 – 420.5	10
420.5 – 441.5	11
441.5 – 462.5	10
462.5 – 483.4	11
483.4 – 504.4	10
504.4 – 525.4	7
525.4 – 546.4	2
546.4 – 567.3	1
567.3 – 588.3	2
588.3 – 609.3	3
609.3 – 630.2	3
630.2 – 651.2	2
651.2 – 672.2	2
672.2 – 693.2	5
693.2 – 714.2	2
714.2 – 735.1	3
735.1 – 756.1	2
756.1 – 777.1	3
777.1 – 798.1	1
798.1 – 819	2
819 – 840	3

fips numeric identifier

This is the U.S. county FIPS code: every one of the 3222 rows is unique, with values spanning 1001 to 72153, consistent with state-prefixed county identifiers. The distribution is near-symmetric (skew 0.16, kurtosis -0.63) and has no outliers or nulls, as expected for a structured code rather than a measurement. Despite being numeric, the values are categorical labels and arithmetic on them is meaningless.

Treatment: treat as a categorical key, ideally a zero-padded five-character string so that codes such as 1001 keep their leading zero, and left-join county-level attributes on it rather than using it as a numeric feature.

anthropic:claude-opus-4-7 · confidence high
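A left join on fips can be sketched with plain dictionaries. In the example below the rows, the `example_attr` field, and the function names are all hypothetical; the point is only that the key is normalised to a five-character string before matching and that unmatched counties are kept.

```python
def fips_key(value):
    """Normalise a numeric or string FIPS code to a 5-character string."""
    return f"{int(value):05d}"


def left_join(rows, lookup):
    """Attach lookup attributes by FIPS; rows without a match are kept as-is."""
    joined = []
    for row in rows:
        key = fips_key(row["fips"])
        joined.append({**row, "fips": key, **lookup.get(key, {})})
    return joined


# Hypothetical inputs: two county rows and a lookup that covers only one of them.
counties = [
    {"fips": 1001, "poverty_rate": 13.55},
    {"fips": 72153, "poverty_rate": 40.0},
]
lookup = {"01001": {"example_attr": "A"}}

joined = left_join(counties, lookup)
for row in joined:
    print(row)
```

Keeping unmatched rows makes gaps in the lookup visible downstream instead of silently dropping counties.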

Out[28]:

saturn.columns["fips"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	3,222
min	1,001
max	72,153
mean	3.138e+04
median	30,022
std	1.63e+04
q1	1.903e+04
q3	4.61e+04
iqr	27,075
skew	0.1574
kurtosis	-0.6314
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 13.

Distribution of fips. Vertical dash marks the median.

Histogram bins for fips (median: 30022.0).
bin	count
1001 – 2780	97
2780 – 4559	15
4559 – 6337	133
6337 – 8116	59
8116 – 9895	14
9895 – 1.167e+04	4
1.167e+04 – 1.345e+04	226
1.345e+04 – 1.523e+04	5
1.523e+04 – 1.701e+04	49
1.701e+04 – 1.879e+04	189
1.879e+04 – 2.057e+04	204
2.057e+04 – 2.235e+04	184
2.235e+04 – 2.413e+04	39
2.413e+04 – 2.59e+04	15
2.59e+04 – 2.768e+04	170
2.768e+04 – 2.946e+04	196
2.946e+04 – 3.124e+04	150
3.124e+04 – 3.302e+04	27
3.302e+04 – 3.48e+04	21
3.48e+04 – 3.658e+04	95
3.658e+04 – 3.836e+04	153
3.836e+04 – 4.013e+04	155
4.013e+04 – 4.191e+04	46
4.191e+04 – 4.369e+04	67
4.369e+04 – 4.547e+04	51
4.547e+04 – 4.725e+04	161
4.725e+04 – 4.903e+04	268
4.903e+04 – 5.081e+04	29
5.081e+04 – 5.259e+04	133
5.259e+04 – 5.436e+04	94
5.436e+04 – 5.614e+04	95
5.614e+04 – 5.792e+04	0
5.792e+04 – 5.97e+04	0
5.97e+04 – 6.148e+04	0
6.148e+04 – 6.326e+04	0
6.326e+04 – 6.504e+04	0
6.504e+04 – 6.682e+04	0
6.682e+04 – 6.86e+04	0
6.86e+04 – 7.037e+04	0
7.037e+04 – 7.215e+04	78

poverty_rate numeric feature

This column appears to be a county- or area-level poverty rate expressed as a percentage, with 3222 rows, 1719 unique values, and no nulls. The distribution is right-skewed (skew 2.10, kurtosis 6.89) with a median of 13.55 and mean 15.10, but a long tail stretching to a max of 66.32 versus a min of 1.6. About 4.25% of rows (137) are flagged as outliers, consistent with a small set of severely impoverished areas.

Treatment: Consider a log or sqrt transform before regression to tame the right skew.

anthropic:claude-opus-4-7 · confidence high

Out[31]:

saturn.columns["poverty_rate"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	1,719
min	1.6
max	66.32
mean	15.1
median	13.55
std	7.706
q1	10.16
q3	17.91
iqr	7.75
skew	2.096
kurtosis	6.891
n_outliers	137
outlier_rate	0.04252
zero_rate	0
alert: high_skew	skew=+2.10

Fig 14.

Distribution of poverty_rate. Vertical dash marks the median.

Histogram bins for poverty_rate (median: 13.55).
bin	count
1.6 – 3.218	7
3.218 – 4.836	34
4.836 – 6.454	106
6.454 – 8.072	246
8.072 – 9.69	320
9.69 – 11.31	354
11.31 – 12.93	393
12.93 – 14.54	364
14.54 – 16.16	306
16.16 – 17.78	262
17.78 – 19.4	192
19.4 – 21.02	149
21.02 – 22.63	123
22.63 – 24.25	91
24.25 – 25.87	52
25.87 – 27.49	44
27.49 – 29.11	34
29.11 – 30.72	23
30.72 – 32.34	18
32.34 – 33.96	14
33.96 – 35.58	6
35.58 – 37.2	8
37.2 – 38.81	3
38.81 – 40.43	8
40.43 – 42.05	5
42.05 – 43.67	9
43.67 – 45.29	4
45.29 – 46.9	11
46.9 – 48.52	7
48.52 – 50.14	8
50.14 – 51.76	2
51.76 – 53.38	6
53.38 – 54.99	5
54.99 – 56.61	5
56.61 – 58.23	1
58.23 – 59.85	0
59.85 – 61.47	0
61.47 – 63.08	0
63.08 – 64.7	1
64.7 – 66.32	1

snap_eligible_est numeric feature

A numeric estimate of SNAP-eligible counts per record, with 3,222 non-null rows and 2,839 unique values. The distribution is severely right-skewed (skew 14.73, kurtosis 342.21): the median is 3,799.5 but the max reaches 1,343,978, and 11.2% of rows flag as outliers. No nulls or zeros are present, so the spread is real, not a missingness artefact. Every statistic here matches poverty_pop exactly and the two correlate at +1.00, so confirm whether this column is a copy before treating it as a separate estimate.

Treatment: log-transform (or winsorize) before any distance- or variance-sensitive modelling.

anthropic:claude-opus-4-7 · confidence high
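Winsorizing, the alternative mentioned in the treatment, caps extreme values instead of rescaling them. The sketch below caps only the upper tail at a chosen quantile using a simple floor-index rule; the function name, the quantile rule, and the sample values are illustrative assumptions rather than the profiler's method.

```python
def winsorize_upper(values, upper_q=0.99):
    """Cap values above the upper_q quantile (floor-index rule) at that quantile."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(upper_q * (len(ordered) - 1)))
    cap = ordered[idx]
    return [min(v, cap) for v in values]


# Invented sample with one extreme value.
sample = [1, 2, 3, 4, 1000]
print(winsorize_upper(sample, upper_q=0.75))
```

Unlike a log transform, this keeps the original units, at the cost of discarding how far the largest counties sit above the cap.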

Out[34]:

saturn.columns["snap_eligible_est"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2,839
min	3
max	1.344e+06
mean	1.3e+04
median	3800
std	4.326e+04
q1	1526
q3	9768
iqr	8242
skew	14.73
kurtosis	342.2
n_outliers	362
outlier_rate	0.1124
zero_rate	0
alert: high_skew	skew=+14.73
alert: outliers	11.2% rows beyond 1.5 IQR

Fig 15.

Distribution of snap_eligible_est. Vertical dash marks the median.

Histogram bins for snap_eligible_est (median: 3799.5).
bin	count
3 – 3.36e+04	2963
3.36e+04 – 6.72e+04	152
6.72e+04 – 1.008e+05	49
1.008e+05 – 1.344e+05	19
1.344e+05 – 1.68e+05	10
1.68e+05 – 2.016e+05	6
2.016e+05 – 2.352e+05	3
2.352e+05 – 2.688e+05	3
2.688e+05 – 3.024e+05	5
3.024e+05 – 3.36e+05	1
3.36e+05 – 3.696e+05	4
3.696e+05 – 4.032e+05	1
4.032e+05 – 4.368e+05	1
4.368e+05 – 4.704e+05	0
4.704e+05 – 5.04e+05	2
5.04e+05 – 5.376e+05	0
5.376e+05 – 5.712e+05	0
5.712e+05 – 6.048e+05	0
6.048e+05 – 6.384e+05	0
6.384e+05 – 6.72e+05	0
6.72e+05 – 7.056e+05	1
7.056e+05 – 7.392e+05	1
7.392e+05 – 7.728e+05	0
7.728e+05 – 8.064e+05	0
8.064e+05 – 8.4e+05	0
8.4e+05 – 8.736e+05	0
8.736e+05 – 9.072e+05	0
9.072e+05 – 9.408e+05	0
9.408e+05 – 9.744e+05	0
9.744e+05 – 1.008e+06	0
1.008e+06 – 1.042e+06	0
1.042e+06 – 1.075e+06	0
1.075e+06 – 1.109e+06	0
1.109e+06 – 1.142e+06	0
1.142e+06 – 1.176e+06	0
1.176e+06 – 1.21e+06	0
1.21e+06 – 1.243e+06	0
1.243e+06 – 1.277e+06	0
1.277e+06 – 1.31e+06	0
1.31e+06 – 1.344e+06	1

snap_participants_est numeric feature

Estimated SNAP participant counts per record, ranging from 2 to 900,465 with a median of 2,546 and mean of 8,711. The distribution is severely right-skewed (skew 14.73, kurtosis 342.21) with 362 outliers (11.2%) and a standard deviation (28,987) more than three times the mean, suggesting a few very large jurisdictions dominate. The skew, kurtosis, and outlier count are identical to those of poverty_pop and snap_eligible_est, which points to a constant-ratio rescaling of the same underlying counts. No nulls or zeros are present across 3,222 rows.

Treatment: log-transform before regression to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
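A quick arithmetic check on the published statistics shows how closely snap_participants_est tracks poverty_pop. The sketch below divides matching statistics from the two stats tables; the 0.001 tolerance is an arbitrary illustrative choice, and a row-level check on the real data would be needed to confirm the pattern.

```python
# Statistics as reported for each column in this profile.
poverty_pop = {"max": 1343978, "median": 3799.5, "q1": 1526, "q3": 9768, "iqr": 8242}
snap_participants = {"max": 900465, "median": 2546, "q1": 1022, "q3": 6544, "iqr": 5522}

# Ratio of each participants statistic to the matching poverty_pop statistic.
ratios = {name: snap_participants[name] / poverty_pop[name] for name in poverty_pop}

for name, ratio in ratios.items():
    print(f"{name:>6}: {ratio:.4f}")

# True if every ratio sits within 0.001 of a single constant (0.67).
near_constant = all(abs(r - 0.67) < 0.001 for r in ratios.values())
print("constant ratio near 0.67:", near_constant)
```

All five ratios land close to 0.67, which suggests the participant estimate may be a fixed share applied to the poverty count; if so, it adds no independent information to a model that already includes poverty_pop.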

Out[37]:

saturn.columns["snap_participants_est"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	2,636
min	2
max	900,465
mean	8711
median	2,546
std	2.899e+04
q1	1022
q3	6544
iqr	5,522
skew	14.73
kurtosis	342.2
n_outliers	362
outlier_rate	0.1124
zero_rate	0
alert: high_skew	skew=+14.73
alert: outliers	11.2% rows beyond 1.5 IQR

Fig 16.

Distribution of snap_participants_est. Vertical dash marks the median.

Histogram bins for snap_participants_est (median: 2546.0).
bin	count
2 – 2.251e+04	2963
2.251e+04 – 4.503e+04	152
4.503e+04 – 6.754e+04	49
6.754e+04 – 9.005e+04	19
9.005e+04 – 1.126e+05	10
1.126e+05 – 1.351e+05	6
1.351e+05 – 1.576e+05	3
1.576e+05 – 1.801e+05	3
1.801e+05 – 2.026e+05	5
2.026e+05 – 2.251e+05	1
2.251e+05 – 2.476e+05	4
2.476e+05 – 2.701e+05	1
2.701e+05 – 2.927e+05	1
2.927e+05 – 3.152e+05	0
3.152e+05 – 3.377e+05	2
3.377e+05 – 3.602e+05	0
3.602e+05 – 3.827e+05	0
3.827e+05 – 4.052e+05	0
4.052e+05 – 4.277e+05	0
4.277e+05 – 4.502e+05	0
4.502e+05 – 4.727e+05	1
4.727e+05 – 4.953e+05	1
4.953e+05 – 5.178e+05	0
5.178e+05 – 5.403e+05	0
5.403e+05 – 5.628e+05	0
5.628e+05 – 5.853e+05	0
5.853e+05 – 6.078e+05	0
6.078e+05 – 6.303e+05	0
6.303e+05 – 6.528e+05	0
6.528e+05 – 6.753e+05	0
6.753e+05 – 6.979e+05	0
6.979e+05 – 7.204e+05	0
7.204e+05 – 7.429e+05	0
7.429e+05 – 7.654e+05	0
7.654e+05 – 7.879e+05	0
7.879e+05 – 8.104e+05	0
8.104e+05 – 8.329e+05	0
8.329e+05 – 8.554e+05	0
8.554e+05 – 8.78e+05	0
8.78e+05 – 9.005e+05	1

no_vehicle_total numeric feature

Given the name and its pairing with no_vehicle_pct, this column most likely counts households without a vehicle in each county, not vehicles. The distribution is extremely heavy-tailed: the median is 580 but the mean is 3,304 and the maximum reaches 601,621, with skew of 20.26 and kurtosis of 501.27. About 12.6% of rows (407) flag as outliers, while only 0.37% are zeros (the same zero rate as no_vehicle_pct) and there are no nulls.

Treatment: Log-transform (or winsorize) before any distance- or variance-based modelling.

anthropic:claude-opus-4-7 · confidence high

Out[40]:

saturn.columns["no_vehicle_total"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	1,823
min	0
max	601,621
mean	3304
median	580
std	2.005e+04
q1	223
q3	1555
iqr	1332
skew	20.26
kurtosis	501.3
n_outliers	407
outlier_rate	0.1263
zero_rate	0.003724
alert: high_skew	skew=+20.26
alert: outliers	12.6% rows beyond 1.5 IQR

Fig 17.

Distribution of no_vehicle_total. Vertical dash marks the median.

Histogram bins for no_vehicle_total (median: 580.0).
bin	count
0 – 1.504e+04	3108
1.504e+04 – 3.008e+04	62
3.008e+04 – 4.512e+04	19
4.512e+04 – 6.016e+04	9
6.016e+04 – 7.52e+04	8
7.52e+04 – 9.024e+04	2
9.024e+04 – 1.053e+05	4
1.053e+05 – 1.203e+05	3
1.203e+05 – 1.354e+05	0
1.354e+05 – 1.504e+05	0
1.504e+05 – 1.654e+05	0
1.654e+05 – 1.805e+05	0
1.805e+05 – 1.955e+05	1
1.955e+05 – 2.106e+05	0
2.106e+05 – 2.256e+05	0
2.256e+05 – 2.406e+05	0
2.406e+05 – 2.557e+05	0
2.557e+05 – 2.707e+05	0
2.707e+05 – 2.858e+05	0
2.858e+05 – 3.008e+05	2
3.008e+05 – 3.159e+05	0
3.159e+05 – 3.309e+05	1
3.309e+05 – 3.459e+05	0
3.459e+05 – 3.61e+05	0
3.61e+05 – 3.76e+05	1
3.76e+05 – 3.911e+05	0
3.911e+05 – 4.061e+05	0
4.061e+05 – 4.211e+05	0
4.211e+05 – 4.362e+05	0
4.362e+05 – 4.512e+05	0
4.512e+05 – 4.663e+05	0
4.663e+05 – 4.813e+05	0
4.813e+05 – 4.963e+05	0
4.963e+05 – 5.114e+05	0
5.114e+05 – 5.264e+05	0
5.264e+05 – 5.415e+05	0
5.415e+05 – 5.565e+05	1
5.565e+05 – 5.715e+05	0
5.715e+05 – 5.866e+05	0
5.866e+05 – 6.016e+05	1

no_vehicle_pct numeric feature

Likely a per-area percentage of households without a vehicle, given values bounded between 0.0 and 85.94 with a median of 5.41 and Q1-Q3 of 3.98-7.36. The distribution is severely right-skewed (skew 6.98, kurtosis 86.23) with 140 outliers (4.35%) stretching far above the typical range, while only 0.37% of rows are exactly zero. No nulls across 3,222 rows.

Treatment: Apply a log1p or similar transform before regression to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
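log1p is suggested here rather than a plain log because the column contains exact zeros (zero_rate 0.0037), and log(0) is undefined. The sketch below applies `math.log1p` to a few values taken from the stats table (min, q1, median, q3, max) and inverts it with `math.expm1`; the variable names are illustrative.

```python
import math

# min, q1, median, q3, max of no_vehicle_pct from the stats table.
pct_values = [0.0, 3.98, 5.41, 7.36, 85.94]

# log1p(x) = log(1 + x), so zero maps to zero instead of raising an error.
transformed = [math.log1p(v) for v in pct_values]

# expm1 is the exact inverse, useful for reporting results in percent again.
restored = [math.expm1(t) for t in transformed]

for raw, t in zip(pct_values, transformed):
    print(f"{raw:6.2f} -> {t:.3f}")
```

The transform preserves ordering and compresses the long tail, so the largest value no longer dominates distance or variance calculations to the same degree.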

Out[43]:

saturn.columns["no_vehicle_pct"].stats

stat	value
n	3,222
nulls	0 (0.0%)
unique	1,065
min	0
max	85.94
mean	6.197
median	5.41
std	4.538
q1	3.98
q3	7.36
iqr	3.38
skew	6.976
kurtosis	86.23
n_outliers	140
outlier_rate	0.04345
zero_rate	0.003724
alert: high_skew	skew=+6.98

Fig 18.

Distribution of no_vehicle_pct. Vertical dash marks the median.

Histogram bins for no_vehicle_pct (median: 5.41).
bin	count
0 – 2.148	161
2.148 – 4.297	823
4.297 – 6.445	1091
6.445 – 8.594	630
8.594 – 10.74	283
10.74 – 12.89	111
12.89 – 15.04	61
15.04 – 17.19	23
17.19 – 19.34	8
19.34 – 21.48	3
21.48 – 23.63	4
23.63 – 25.78	2
25.78 – 27.93	3
27.93 – 30.08	2
30.08 – 32.23	2
32.23 – 34.38	2
34.38 – 36.52	2
36.52 – 38.67	2
38.67 – 40.82	0
40.82 – 42.97	1
42.97 – 45.12	0
45.12 – 47.27	1
47.27 – 49.42	0
49.42 – 51.56	0
51.56 – 53.71	0
53.71 – 55.86	1
55.86 – 58.01	1
58.01 – 60.16	0
60.16 – 62.31	2
62.31 – 64.45	0
64.45 – 66.6	1
66.6 – 68.75	0
68.75 – 70.9	0
70.9 – 73.05	0
73.05 – 75.2	0
75.2 – 77.35	0
77.35 – 79.49	1
79.49 – 81.64	0
81.64 – 83.79	0
83.79 – 85.94	1
