data-trove-food-desert-states-summary

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/food_desert_states.json

Saturn profiled 51 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/food_desert_states.json",
    "--findings", "data-trove-food-desert-states-summary.json",
    "--llm", "anthropic:default",
])

Summary confidence: high

This dataset contains one row per U.S. state plus D.C. (51 rows total), with figures on food desert populations, vehicle access, and poverty. The most striking feature is the extreme right skew in desert-exposed population counts: the raw median desertPop is just 21 while the max reaches 449 (read here as thousands, i.e. about 21,000 and 449,000 people), and 6 outlier states pull the distribution far above the norm. noVehicle counts mirror this pattern almost exactly (also 6 outliers; correlation with desertPop rounds to +1.00). Poverty rate, by contrast, is much closer to normally distributed (mean 12.4%, std 2.6%) and correlates only weakly with desertPop (+0.12), which suggests that food desert exposure tracks state size and car access more than poverty alone; this is worth cross-examining. The noVehiclePct column (max 17.37% vs. median 2.45%) flags a small handful of states with dramatically higher car-free household rates; its moderate correlation with desertPop (+0.44) suggests these overlap only partly with the desertPop outliers.

citing: desertPop.stats.median · desertPop.stats.max · desertPop.stats.n_outliers · noVehicle.stats.n_outliers · noVehiclePct.stats.max · noVehiclePct.stats.median · povertyRate.stats.mean · povertyRate.stats.std · pop.stats.skew

Out[4]:

saturn.schema() · 11 columns

column	kind	n	null%	unique	alerts
name	categorical	51	0.0%	51	long_tail
abbr	categorical	51	0.0%	51	long_tail
pop	numeric	51	0.0%	51	high_skew outliers
desertPop	numeric	51	0.0%	34	high_skew outliers
povertyPop	numeric	51	0.0%	49	high_skew outliers
noVehicle	numeric	51	0.0%	45	high_skew outliers
povertyRate	numeric	51	0.0%	50	outliers
noVehiclePct	numeric	51	0.0%	45	high_skew outliers
counties	numeric	51	0.0%	46
lat	numeric	51	0.0%	51
lon	numeric	51	0.0%	48

Fig 1.

desertPop · Look for the long right tail — a handful of states have food desert populations many times larger than the median of 21,000.

Histogram bins for desertPop (median: 21.0).
bin	count
1 – 65	44
65 – 129	5
129 – 193	1
193 – 257	0
257 – 321	0
321 – 385	0
385 – 449	1
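The bin edges above are consistent with seven equal-width bins spanning the column's min (1) to max (449). That is an inference from the table rather than documented Saturn behaviour, and the helper name below is hypothetical. A minimal sketch of that binning:

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 evenly spaced edges from lo to hi (inclusive)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# min and max of desertPop as reported in this profile;
# the choice of 7 bins is an assumption read off the table above.
edges = equal_width_edges(1, 449, 7)
print(edges)
```

With a width of 64, the edges land on 1, 65, 129, and so on up to 449, matching the table. Equal-width bins are why 44 of the 51 rows collapse into the first bin for a column this skewed.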

Fig 2.

noVehiclePct · Most states cluster below 4%, but outliers push past 17% — identify which states are extreme car-free outliers.

Histogram bins for noVehiclePct (median: 2.45).
bin	count
1.29 – 3.587	44
3.587 – 5.884	5
5.884 – 8.181	0
8.181 – 10.48	0
10.48 – 12.78	1
12.78 – 15.07	0
15.07 – 17.37	1

Fig 3.

povertyRate · Poverty rate is the most normally distributed metric here, centered around 12.4%, offering a useful contrast to the skewed desert and vehicle columns.

Histogram bins for povertyRate (median: 11.91).
bin	count
7.33 – 9.026	2
9.026 – 10.72	14
10.72 – 12.42	14
12.42 – 14.11	11
14.11 – 15.81	4
15.81 – 17.5	3
17.5 – 19.2	3

Fig 4.

counties · County counts vary widely from 1 to 254 — check whether states with more counties show higher aggregate desert or poverty exposure.

Histogram bins for counties (median: 62.0).
bin	count
1 – 37.14	18
37.14 – 73.29	15
73.29 – 109.4	13
109.4 – 145.6	3
145.6 – 181.7	1
181.7 – 217.9	0
217.9 – 254	1

Fig 5.

povertyPop · Like desertPop, poverty population is heavily right-skewed; compare these two distributions to see how well they track each other across states.

Histogram bins for povertyPop (median: 548.0).
bin	count
60 – 720.7	32
720.7 – 1381	11
1381 – 2042	4
2042 – 2703	1
2703 – 3364	1
3364 – 4024	1
4024 – 4685	1

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
name	categorical	0.0%
abbr	categorical	0.0%
pop	numeric	0.0%
desertPop	numeric	0.0%
povertyPop	numeric	0.0%
noVehicle	numeric	0.0%
povertyRate	numeric	0.0%
noVehiclePct	numeric	0.0%
counties	numeric	0.0%
lat	numeric	0.0%
lon	numeric	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 9 numeric columns (values rounded to 2 decimals, so an off-diagonal +1.00 means near-perfect, not necessarily exact, correlation).
	pop	desertPop	povertyPop	noVehicle	povertyRate	noVehiclePct	counties	lat	lon
pop	+1.00	+0.68	+0.99	+0.70	+0.06	+0.05	+0.45	-0.25	+0.03
desertPop	+0.68	+1.00	+0.69	+1.00	+0.12	+0.44	+0.20	-0.07	+0.16
povertyPop	+0.99	+0.69	+1.00	+0.70	+0.17	+0.06	+0.50	-0.30	+0.04
noVehicle	+0.70	+1.00	+0.70	+1.00	+0.06	+0.44	+0.19	-0.05	+0.17
povertyRate	+0.06	+0.12	+0.17	+0.06	+1.00	+0.14	+0.28	-0.43	+0.10
noVehiclePct	+0.05	+0.44	+0.06	+0.44	+0.14	+1.00	-0.22	+0.07	+0.25
counties	+0.45	+0.20	+0.50	+0.19	+0.28	-0.22	+1.00	-0.22	+0.07
lat	-0.25	-0.07	-0.30	-0.05	-0.43	+0.07	-0.22	+1.00	-0.09
lon	+0.03	+0.16	+0.04	+0.17	+0.10	+0.25	+0.07	-0.09	+1.00
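For readers who want to reproduce a cell of this matrix, here is a minimal standard-library sketch of the Pearson coefficient. The function name is hypothetical, and the sample values are the five-number summaries (min, q1, median, q3, max) of desertPop and noVehicle paired position by position. They are stand-ins taken from the stats tables, not actual rows.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Stand-in values (five-number summaries), NOT rows from the dataset.
desert_summary = [1, 6, 21, 35.5, 449]
no_vehicle_summary = [8, 40, 115, 203, 2202]

r = pearson(desert_summary, no_vehicle_summary)
print(f"{r:+.2f}")
```

Even for these stand-in values the coefficient is slightly below 1 yet prints as +1.00. The same caution applies to the desertPop / noVehicle cell in the matrix.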

name categorical label

This column contains US state names, with all 51 entries being unique (cardinality = 51, n = 51), consistent with a full list of US states plus Washington D.C. or a territory. Entropy ratio is exactly 1.0, meaning perfect uniformity — every value appears exactly once (top_rate = 0.0196, or 1/51). The 'long_tail' alert is technically correct but misleading here: the distribution is not skewed, it is perfectly flat.

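Treatment: Use as a categorical label or join key for state-level lookups. Because every value occurs exactly once, one-hot or ordinal encoding it as a model feature would only identify individual rows; map it to a coarser grouping (e.g., region) first if a feature is needed.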

anthropic:default · confidence high

Out[13]:

saturn.columns["name"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	51
top_value	New York
top_rate	0.01961
cardinality	51
entropy	5.672
entropy_ratio	1
alert: long_tail	51 singleton categories
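The entropy figures above can be checked by hand. The sketch assumes that entropy is Shannon entropy in bits and that entropy_ratio is entropy divided by log2(cardinality). Both are assumptions inferred from the reported numbers, not from Saturn's documentation, and the placeholder labels stand in for the 51 state names.

```python
import math
from collections import Counter

def shannon_entropy_bits(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# 51 distinct placeholder labels, each appearing once (as in the name column).
labels = [f"state_{i}" for i in range(51)]

entropy = shannon_entropy_bits(labels)
entropy_ratio = entropy / math.log2(len(set(labels)))
print(round(entropy, 3), round(entropy_ratio, 3))
```

For 51 equally likely values the entropy is log2(51), which rounds to the 5.672 shown in the table, and the ratio is 1. This supports reading the long_tail alert as a side effect of every category being a singleton.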

Fig 8.

Top values for name.

Top values for name (20 unique shown, of 51 total).
value	count	share
New York	1	2.0%
California	1	2.0%
Texas	1	2.0%
Florida	1	2.0%
Pennsylvania	1	2.0%
Illinois	1	2.0%
Ohio	1	2.0%
Michigan	1	2.0%
New Jersey	1	2.0%
Massachusetts	1	2.0%
Georgia	1	2.0%
North Carolina	1	2.0%
Louisiana	1	2.0%
Missouri	1	2.0%
Indiana	1	2.0%
Tennessee	1	2.0%
Washington	1	2.0%
Arizona	1	2.0%
Virginia	1	2.0%
Kentucky	1	2.0%

abbr categorical identifier

This column contains two-letter US state abbreviations, with exactly 51 unique values across 51 rows — covering all 50 states plus one additional entry (likely Washington D.C. or a territory). Every value appears exactly once (top_rate = 0.0196), yielding a perfect entropy ratio of 1.0, meaning this is a fully uniform identifier with zero redundancy. The 'long_tail' alert is a statistical artifact of perfect uniformity, not a genuine concern here.

Treatment: Use as a join key or primary identifier for state-level lookups; map to region groupings if used as a feature, since one-hot encoding 51 singleton values adds no usable signal.

anthropic:default · confidence high

Out[16]:

saturn.columns["abbr"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	51
top_value	NY
top_rate	0.01961
cardinality	51
entropy	5.672
entropy_ratio	1
alert: long_tail	51 singleton categories

Fig 9.

Top values for abbr.

Top values for abbr (20 unique shown, of 51 total).
value	count	share
NY	1	2.0%
CA	1	2.0%
TX	1	2.0%
FL	1	2.0%
PA	1	2.0%
IL	1	2.0%
OH	1	2.0%
MI	1	2.0%
NJ	1	2.0%
MA	1	2.0%
GA	1	2.0%
NC	1	2.0%
LA	1	2.0%
MO	1	2.0%
IN	1	2.0%
TN	1	2.0%
WA	1	2.0%
AZ	1	2.0%
VA	1	2.0%
KY	1	2.0%

pop numeric feature

This column likely represents population counts for 51 distinct geographic or administrative units (e.g., U.S. states or territories), given exactly 51 fully unique, non-null integer values. The distribution is heavily right-skewed (skew = 2.58, kurtosis = 7.61), with a median of 4,372 far below the mean of 6,338 and a maximum of 38,643 — suggesting a small number of very large-population entities pulling the tail; 4 outliers (~7.8% of rows) drive this effect. The std of 7,243 exceeds the mean, confirming high dispersion relative to the central tendency.

Treatment: Log-transform before regression or distance-based modelling to reduce skew and outlier influence.

anthropic:default · confidence medium

Out[19]:

saturn.columns["pop"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	51
min	564
max	38,643
mean	6,338
median	4,372
std	7,243
q1	1,770
q3	7,285
iqr	5,514
skew	2.583
kurtosis	7.608
n_outliers	4
outlier_rate	0.07843
zero_rate	0
alert: high_skew	skew=+2.58
alert: outliers	7.8% rows beyond 1.5 IQR
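The outliers alert refers to rows beyond 1.5 IQR. Assuming this is the usual Tukey fence rule (q1 - 1.5·IQR and q3 + 1.5·IQR), the cutoffs for pop can be sketched from the reported quartiles. The function name is hypothetical.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences at k times the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 for pop as displayed in the stats table above.
lower, upper = tukey_fences(1770, 7285)
print(lower, upper)
```

The displayed quartiles give an IQR of 5,515, one more than the table's 5,514, presumably because the quartiles are rounded for display. Under this rule the upper fence sits near 15,558, and the lower fence is negative, so only large values can be flagged. The histogram for pop shows exactly four rows in bins above roughly 16,880, which agrees with n_outliers = 4.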

Fig 10.

Distribution of pop. Vertical dash marks the median.

Histogram bins for pop (median: 4372.0).
bin	count
564 – 6004	33
6004 – 11440	11
11440 – 16880	3
16880 – 22320	2
22320 – 27760	0
27760 – 33200	1
33200 – 38640	1

desertPop numeric feature

This column most likely represents the population living in food deserts, with one value per U.S. state plus D.C. (n=51). The distribution is severely right-skewed (skew=4.73, kurtosis=25.51): the median is just 21.0 while the mean is 38.27 and the max reaches 449.0, so a small number of states dominate the total. With 6 outliers (≈11.8% of rows) and a standard deviation of 67.39 against a median of 21.0, those extreme values will heavily distort any linear model trained on raw values.

Treatment: Log-transform (or apply sqrt) before modelling to reduce skew; investigate and cap or flag the 6 outliers separately.
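To illustrate why a log transform is suggested, the sketch below compares skewness before and after `log1p` on a handful of made-up right-skewed values that span desertPop's reported range. The values are illustrative only, and the moment-based skewness formula is an assumption; Saturn may use a bias-corrected variant.

```python
import math

def skewness(xs):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Made-up values between the reported min (1) and max (449); NOT dataset rows.
raw = [1, 6, 21, 35, 120, 449]
logged = [math.log1p(x) for x in raw]  # log(1 + x), safe for small counts

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(round(skew_raw, 2), round(skew_logged, 2))
```

On these values the transform pulls the long upper tail in and leaves a much more symmetric shape. The outliers are compressed rather than removed, so flagging them separately remains a distinct step.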

anthropic:default · confidence medium

Out[22]:

saturn.columns["desertPop"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	34
min	1
max	449
mean	38.27
median	21
std	67.39
q1	6
q3	35.5
iqr	29.5
skew	4.734
kurtosis	25.51
n_outliers	6
outlier_rate	0.1176
zero_rate	0
alert: high_skew	skew=+4.73
alert: outliers	11.8% rows beyond 1.5 IQR

Fig 11.

Distribution of desertPop. Vertical dash marks the median.

Histogram bins for desertPop (median: 21.0).
bin	count
1 – 65	44
65 – 129	5
129 – 193	1
193 – 257	0
257 – 321	0
321 – 385	0
385 – 449	1

povertyPop numeric feature

This column likely represents a count of people living in poverty, measured per U.S. state (n=51, matching the 50 states plus DC). The distribution is heavily right-skewed (skew=2.53, kurtosis=6.80), with a median of 548 but a mean of 794 and a maximum of 4685, indicating a small number of high-population states pull the mean well above the typical value. Four outliers (~7.8% of rows) are flagged, likely corresponding to the most populous states with the largest absolute poverty populations. The near-uniqueness (49 of 51 distinct values) suggests this is a genuine count variable, not a derived category.

Treatment: Log-transform before regression to reduce skew and mitigate outlier influence.

anthropic:default · confidence high

Out[25]:

saturn.columns["povertyPop"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	49
min	60
max	4,685
mean	794
median	548
std	932.9
q1	198
q3	860.5
iqr	662.5
skew	2.526
kurtosis	6.8
n_outliers	4
outlier_rate	0.07843
zero_rate	0
alert: high_skew	skew=+2.53
alert: outliers	7.8% rows beyond 1.5 IQR

Fig 12.

Distribution of povertyPop. Vertical dash marks the median.

Histogram bins for povertyPop (median: 548.0).
bin	count
60 – 720.7	32
720.7 – 1381	11
1381 – 2042	4
2042 – 2703	1
2703 – 3364	1
3364 – 4024	1
4024 – 4685	1

noVehicle numeric feature

This column likely represents a count of households or individuals without access to a vehicle, with one value per U.S. state plus D.C. (51 observations). The distribution is severely right-skewed (skew = 4.41, kurtosis = 22.57), with a median of 115 but a mean pulled to 204.9 by a long upper tail reaching 2202. Six outliers (≈11.8% of rows) drive this extreme shape, suggesting that a small number of populous or car-deprived states dominate the upper end, while the middle half of values fall between 40 and 203 (IQR = 163).

Treatment: Log-transform or apply a robust scaler before modelling; investigate the 6 outlier units for data quality or genuine extreme values.

anthropic:default · confidence medium

Out[28]:

saturn.columns["noVehicle"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	45
min	8
max	2,202
mean	204.9
median	115
std	337.4
q1	40
q3	203
iqr	163
skew	4.408
kurtosis	22.57
n_outliers	6
outlier_rate	0.1176
zero_rate	0
alert: high_skew	skew=+4.41
alert: outliers	11.8% rows beyond 1.5 IQR

Fig 13.

Distribution of noVehicle. Vertical dash marks the median.

Histogram bins for noVehicle (median: 115.0).
bin	count
8 – 321.4	42
321.4 – 634.9	7
634.9 – 948.3	1
948.3 – 1262	0
1262 – 1575	0
1575 – 1889	0
1889 – 2202	1

povertyRate numeric feature

This column represents poverty rate (likely percentage of population below the poverty line) across 51 observations — almost certainly U.S. states plus DC. Values range from 7.33 to 19.2 with a mean of 12.35 and median of 11.91, indicating a modest right skew (skew=0.75) consistent with a handful of higher-poverty states pulling the tail. Three outliers (~5.9% of rows) at the upper end are flagged, likely representing the highest-poverty states; the near-zero kurtosis (0.20) suggests the distribution is otherwise fairly normal.

Treatment: Use as-is or apply mild log-transform to reduce right skew before regression; investigate 3 upper outliers for leverage effects.

anthropic:default · confidence high

Out[31]:

saturn.columns["povertyRate"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	50
min	7.33
max	19.2
mean	12.35
median	11.91
std	2.632
q1	10.46
q3	13.57
iqr	3.11
skew	0.753
kurtosis	0.1951
n_outliers	3
outlier_rate	0.05882
zero_rate	0
alert: outliers	5.9% rows beyond 1.5 IQR

Fig 14.

Distribution of povertyRate. Vertical dash marks the median.

Histogram bins for povertyRate (median: 11.91).
bin	count
7.33 – 9.026	2
9.026 – 10.72	14
10.72 – 12.42	14
12.42 – 14.11	11
14.11 – 15.81	4
15.81 – 17.5	3
17.5 – 19.2	3

noVehiclePct numeric feature

This column represents the percentage of households without a vehicle, likely a census- or survey-derived socioeconomic indicator, with one value per U.S. state plus D.C. (n=51). The distribution is heavily right-skewed (skew=4.53, kurtosis=21.72): the middle half of values sit between Q1=2.15% and Q3=3.05%, yet 3 outliers stretch the max to 17.37%, roughly 7× the median of 2.45%. That extreme upper tail most likely reflects a unit dominated by a dense urban core (e.g., New York City), where car-free households are far more common than elsewhere.

Treatment: Cap or Winsorize at the 95th percentile before modelling, or log-transform to compress the extreme upper tail.
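One way to read "cap at the 95th percentile" is one-sided winsorization: compute the percentile, then replace anything above it with that value. The sketch below uses linear interpolation between order statistics and made-up percentages. Only the endpoints 1.29 and 17.37 come from the stats table, and the function name is hypothetical.

```python
def winsorize_upper(xs, pct=95.0):
    """Cap values above the pct-th percentile; returns (capped values, cap)."""
    ordered = sorted(xs)
    pos = (len(ordered) - 1) * pct / 100.0   # fractional rank of the percentile
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    cap = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    return [min(x, cap) for x in xs], cap

# Made-up values: 20 evenly spaced low percentages plus one extreme value.
values = [1.29 + 0.1 * i for i in range(20)] + [17.37]

capped, cap = winsorize_upper(values, pct=95.0)
print(round(cap, 2), round(max(capped), 2))
```

Capping keeps the row count at 51 and preserves rank order below the cap, which makes it a gentler option than dropping the outlier states when each row is a distinct state.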

anthropic:default · confidence high

Out[34]:

saturn.columns["noVehiclePct"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	45
min	1.29
max	17.37
mean	3.092
median	2.45
std	2.484
q1	2.15
q3	3.05
iqr	0.9
skew	4.533
kurtosis	21.72
n_outliers	3
outlier_rate	0.05882
zero_rate	0
alert: high_skew	skew=+4.53
alert: outliers	5.9% rows beyond 1.5 IQR

Fig 15.

Distribution of noVehiclePct. Vertical dash marks the median.

Histogram bins for noVehiclePct (median: 2.45).
bin	count
1.29 – 3.587	44
3.587 – 5.884	5
5.884 – 8.181	0
8.181 – 10.48	0
10.48 – 12.78	1
12.78 – 15.07	0
15.07 – 17.37	1

counties numeric feature

This column most likely represents the number of counties per U.S. state (plus D.C.), matching the dataset's 51 rows exactly. The mean of ~62 and median of 62 are consistent with typical state county counts, while the maximum of 254 is almost certainly Texas (which has 254 counties). The distribution is right-skewed (skew 1.44) with high kurtosis (3.91), driven by that single outlier — Texas — which sits far above the rest of the distribution.

Treatment: Use as-is for regression/analysis; consider log-transform to reduce skew caused by the Texas outlier.

anthropic:default · confidence high

Out[37]:

saturn.columns["counties"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	46
min	1
max	254
mean	61.65
median	62
std	46.73
q1	23.5
q3	87.5
iqr	64
skew	1.442
kurtosis	3.907
n_outliers	1
outlier_rate	0.01961
zero_rate	0

Fig 16.

Distribution of counties. Vertical dash marks the median.

Histogram bins for counties (median: 62.0).
bin	count
1 – 37.14	18
37.14 – 73.29	15
73.29 – 109.4	13
109.4 – 145.6	3
145.6 – 181.7	1
181.7 – 217.9	0
217.9 – 254	1

lat numeric feature

This column contains latitude coordinates, almost certainly representing the 50 US states plus Washington D.C. (n=51, all unique). The mean of 39.57 and median of 39.55 are tightly aligned, indicating near-symmetric distribution centered on the mid-continental US, though a kurtosis of 3.94 flags heavier tails than normal — driven by the 2 outliers likely corresponding to Alaska (max 64.2) and Hawaii (min 19.9).

Treatment: Use as-is or pair with a longitude column for geospatial modelling; consider flagging Alaska and Hawaii as geographic outliers if contiguous-US analysis is intended.

anthropic:default · confidence high

Out[40]:

saturn.columns["lat"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	51
min	19.9
max	64.2
mean	39.57
median	39.55
std	6.418
q1	35.64
q3	43.13
iqr	7.495
skew	0.4074
kurtosis	3.94
n_outliers	2
outlier_rate	0.03922
zero_rate	0

Fig 17.

Distribution of lat. Vertical dash marks the median.

Histogram bins for lat (median: 39.55).
bin	count
19.9 – 26.23	1
26.23 – 32.56	6
32.56 – 38.89	13
38.89 – 45.21	25
45.21 – 51.54	5
51.54 – 57.87	0
57.87 – 64.2	1

lon numeric feature

This column contains longitude coordinates for the 51 entities (most likely the U.S. states plus D.C.), all negative and therefore in the Western Hemisphere. The range spans -155.58 to -69.45, consistent with the continental US plus Hawaii (≈-155°), and the left skew (skew = -1.27) reflects Hawaii and Alaska pulling the distribution westward. Only 48 of the 51 values are unique, so a few entities share a longitude, and the 2 outliers (~3.9%) sit in the westernmost histogram bin, most likely Hawaii and Alaska.

Treatment: Use as-is for spatial analysis or pair with latitude for geographic modelling; consider flagging the 2 outlier values (Hawaii/Alaska) if contiguous-US-only analysis is needed.
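Flagging rather than dropping the two non-contiguous states keeps the full table available for other analyses. A minimal sketch, assuming rows are dictionaries keyed by the report's column names; the coordinates below are rough placeholders, not values from the dataset, and the helper name is hypothetical.

```python
# Hypothetical rows using the report's column names (abbr, lat, lon).
# Coordinates are rough placeholders, NOT values taken from the dataset.
rows = [
    {"abbr": "AK", "lat": 64.0, "lon": -152.0},
    {"abbr": "HI", "lat": 20.0, "lon": -156.0},
    {"abbr": "TX", "lat": 31.0, "lon": -99.0},
    {"abbr": "NY", "lat": 43.0, "lon": -75.0},
]

NON_CONTIGUOUS = {"AK", "HI"}

def flag_contiguous(records):
    """Return copies of the records with a boolean 'contiguous' field added."""
    return [{**r, "contiguous": r["abbr"] not in NON_CONTIGUOUS} for r in records]

flagged = flag_contiguous(rows)
contiguous_only = [r for r in flagged if r["contiguous"]]
print([r["abbr"] for r in contiguous_only])
```

Keying the flag on abbr rather than on a longitude threshold avoids depending on the outlier cutoff, which would shift if the quartiles changed.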

anthropic:default · confidence high

Out[43]:

saturn.columns["lon"].stats

stat	value
n	51
nulls	0 (0.0%)
unique	48
min	-155.6
max	-69.45
mean	-93.36
median	-89.4
std	19.13
q1	-103.4
q3	-78.84
iqr	24.55
skew	-1.274
kurtosis	1.845
n_outliers	2
outlier_rate	0.03922
zero_rate	0

Fig 18.

Distribution of lon. Vertical dash marks the median.

Histogram bins for lon (median: -89.4).
bin	count
-155.6 – -143.3	2
-143.3 – -131	0
-131 – -118.7	3
-118.7 – -106.4	6
-106.4 – -94.06	9
-94.06 – -81.75	14
-81.75 – -69.45	17
