saturn

food deserts vehicle access

source /home/coolhand/html/datavis/data_trove/data/urban/food_deserts/vehicle_access.csv · 3,222 rows · 9 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset covers vehicle access for 3,222 US counties and county-equivalents, including Puerto Rico (state code 72). There is one row per county, identified by FIPS code and name, across 9 columns, with no missing values. The household and no-vehicle counts are extremely right-skewed: `no_vehicle_total` has a median of 580 but a max of 601,621, and `total_households` ranges from 32 up to roughly 3.36 million, so a handful of large urban counties dominate the absolute totals. The more comparable signal is `no_vehicle_pct`, which has a median of 5.41% but stretches up to 85.94%. This flags a small set of counties with unusually high shares of car-free households, and those are worth investigating first. State coverage looks complete (52 distinct state codes: the 50 states, DC, and Puerto Rico), so geographic breakdowns should be straightforward.

citing: row_count · column_count · columns.no_vehicle_total.stats · columns.no_vehicle_pct.stats · columns.total_households.stats · columns.state.n_unique · columns.name.n_unique · columns.fips.n_unique
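The summary points at the high end of `no_vehicle_pct` as the place to start. Below is a minimal stdlib sketch for pulling the top counties by that column. It assumes the CSV has a header row with the column names profiled here. `top_counties_by_pct` and the sample rows are hypothetical illustrations, not part of Saturn.

```python
import csv
import io
from typing import Iterable


def top_counties_by_pct(rows: Iterable[dict], k: int = 10) -> list[tuple[str, float]]:
    """Return up to k (name, no_vehicle_pct) pairs, highest percentage first."""
    scored = [(row["name"], float(row["no_vehicle_pct"])) for row in rows]
    # list.sort stays stable with reverse=True, so ties keep their file order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up rows for illustration; names are quoted because they contain commas.
SAMPLE_CSV = """name,fips,no_vehicle_pct
"County A, State X",1001,5.4
"County B, State X",1003,62.0
"County C, State Y",2001,3.9
"""

top = top_counties_by_pct(csv.DictReader(io.StringIO(SAMPLE_CSV)), k=2)
print(top)
```

To run this against the real data, open the source path from the header with `newline=""` and pass `csv.DictReader(f)` in place of the sample string.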

Schema

9 columns
Per-column summary.
column             type     nulls  unique  alerts
name               text     0.0%   3,222   near_unique
total_households   numeric  0.0%   3,074   high_skew, outliers
no_vehicle_owner   numeric  0.0%   1,176   high_skew, outliers
no_vehicle_renter  numeric  0.0%   1,517   high_skew, outliers
state              numeric  0.0%   52
county             numeric  0.0%   330     high_skew, outliers
fips               numeric  0.0%   3,222
no_vehicle_total   numeric  0.0%   1,823   high_skew, outliers
no_vehicle_pct     numeric  0.0%   1,065   high_skew

name

text identifier near_unique
This column appears to hold US county names with state suffixes. 2,999 of 3,222 rows contain the token 'county,' and the remaining top words are state names (texas, virginia, georgia, north carolina, dakota, kentucky, missouri). Every value is unique (n_unique=3222, duplicate_rate=0.0) with no nulls. Most lengths sit close to the mean (mean 24.3, p95 31, min 16, max 59), which is consistent with 'X County, State' formatting. The near_unique alert confirms that this column behaves as a row identifier rather than a categorical feature. Treatment: Use as a human-readable row label. Prefer `fips` as the join key, because county name strings are not guaranteed to match exactly across sources. Do not one-hot encode. high · anthropic:claude-opus-4-7
n                        3,222
nulls                    0 (0.0%)
unique                   3,222
len_min                  16
len_max                  59
len_mean                 24.32
len_median               24
len_p95                  31
word_mean                3.248
word_median              3
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,990
readability_flesch_mean  10.28
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0
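Because `name` is best kept as a display label, splitting it can recover the state name for presentation. Below is a sketch that assumes the 'X County, State' pattern described above. Rows that are not named '... County' should still split on their last comma. `split_name` is a hypothetical helper, not a Saturn function.

```python
from typing import Optional, Tuple


def split_name(name: str) -> Tuple[str, Optional[str]]:
    """Split an 'X County, State' label at its last ', ' into (place, state).

    Assumes the state name is everything after the final ', '.
    Values without a comma come back unchanged with state=None.
    """
    place, sep, state = name.rpartition(", ")
    if not sep:
        return name, None
    return place, state


print(split_name("Autauga County, Alabama"))  # ('Autauga County', 'Alabama')
```

Splitting on the last comma, rather than the first, keeps any comma inside the place part intact.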

total_households

numeric feature high_skew outliers
Likely the number of households in each county (3,222 rows, 3,074 unique values, no nulls or zeros). The distribution is extremely right-skewed (skew 12.05, kurtosis 240.5). The median is 10,021 while the mean is 39,402.86, and the max of 3,363,093 sits roughly 28 standard deviations above the mean. Saturn flags 443 outliers (13.7%), which is consistent with a long upper tail of very large counties. Treatment: Log-transform before regression to tame the skew and outliers. The minimum is 32, so a plain log is defined for every row. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        3,074
min           32
max           3.363e+06
mean          3.94e+04
median        10,021
std           1.201e+05
q1            4,211
q3            25,939
iqr           2.173e+04
skew          12.05
kurtosis      240.5
n_outliers    443
outlier_rate  0.1375
zero_rate     0
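The sketch below checks what the log-transform does to skew. It computes the Fisher-Pearson skewness g1 (population moments, with no bias correction). This is an assumption: Saturn may use an adjusted estimator, so the numbers will not match its reported skew exactly. The toy sample consists of the five summary values reported above (min, q1, median, q3, max), not real rows.

```python
import math
from statistics import fmean


def skewness(xs: list[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (population moments)."""
    mu = fmean(xs)
    m2 = fmean((x - mu) ** 2 for x in xs)
    m3 = fmean((x - mu) ** 3 for x in xs)
    return m3 / m2 ** 1.5


# Toy sample: the min, q1, median, q3 and max reported for total_households.
sample = [32, 4211, 10021, 25939, 3363093]
raw_skew = skewness(sample)
# The minimum is 32, so log is defined for every value.
log_skew = skewness([math.log(x) for x in sample])
print(f"raw {raw_skew:.2f}, log {log_skew:.2f}")
```

On the raw scale the single large value dominates. After the log, the five points are close to symmetric about their mean.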

no_vehicle_owner

numeric feature high_skew outliers
Likely the number of owner-occupied households with no vehicle available in each county. Values range from 0 to 113,473, with a median of just 214. The distribution is severely right-skewed (skew 18.55, kurtosis 433.5), with 360 outliers (11.2%) and a std (3,777.8) about 4.6x the mean (820.8). This signals a heavy tail dominated by a few extreme rows. Only 1.2% of rows are zero and there are no nulls, so the column is densely populated but spread across 1,176 unique values. Treatment: Apply a log1p transform before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        1,176
min           0
max           113,473
mean          820.8
median        214
std           3,778
q1            81
q3            548.8
iqr           467.8
skew          18.55
kurtosis      433.5
n_outliers    360
outlier_rate  0.1117
zero_rate     0.01179
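The profile reports n_outliers but does not say which rule produces it. A common choice is Tukey's 1.5 × IQR fence, and the sketch below assumes that rule. It also assumes linear-interpolation quartiles. The q1 of 125.2 for `no_vehicle_renter` looks interpolated, but that is not confirmed. `tukey_fences` is a hypothetical helper.

```python
from statistics import quantiles


def tukey_fences(xs: list[float], k: float = 1.5) -> tuple[float, float, list[float]]:
    """Return (lower fence, upper fence, values outside [lower, upper])."""
    # method="inclusive" interpolates linearly between order statistics,
    # like the default quantile method in NumPy and pandas.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [x for x in xs if x < lower or x > upper]


print(tukey_fences([1, 2, 3, 4, 100]))  # (-1.0, 7.0, [100])
```

Suppose Saturn does use this rule. The reported quartiles for `no_vehicle_owner` would then put the upper fence at 548.8 + 1.5 × 467.8 = 1,250.5 and the lower fence below zero. Every flagged row would therefore sit in the upper tail.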

no_vehicle_renter

numeric feature high_skew outliers
A heavily right-skewed count, plausibly the number of renter-occupied households with no vehicle in each county. The median is 351 but the mean is 2,483, and the max reaches 488,148. Skew is 20.69 and kurtosis is 517.5, and 436 rows (13.5%) are flagged as outliers. About 1.5% of rows are zero and there are no nulls, so the long tail, not missingness, dominates the distribution. Treatment: Apply a log1p transform before modelling to tame the extreme skew and outliers. medium · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        1,517
min           0
max           488,148
mean          2,483
median        351
std           1.646e+04
q1            125.2
q3            987.8
iqr           862.5
skew          20.69
kurtosis      517.5
n_outliers    436
outlier_rate  0.1353
zero_rate     0.01459
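The owner and renter counts look like a tenure split of `no_vehicle_total`. Their means add up (820.8 + 2,483 ≈ 3,304), and so do their maxima (113,473 + 488,148 = 601,621). Summary statistics cannot confirm this row by row, though. Below is a sketch of a row-level check under that assumption. `tenure_mismatches` and the sample rows are hypothetical.

```python
import csv
import io


def tenure_mismatches(rows, tol: float = 0.0) -> list[str]:
    """Return fips codes where owner + renter differs from total by more than tol."""
    mismatched = []
    for row in rows:
        owner = float(row["no_vehicle_owner"])
        renter = float(row["no_vehicle_renter"])
        total = float(row["no_vehicle_total"])
        if abs(owner + renter - total) > tol:
            mismatched.append(row["fips"])
    return mismatched


# Made-up rows: the second one deliberately does not add up.
SAMPLE_CSV = """fips,no_vehicle_owner,no_vehicle_renter,no_vehicle_total
1001,200,300,500
1003,80,120,300
"""

print(tenure_mismatches(csv.DictReader(io.StringIO(SAMPLE_CSV))))  # ['1003']
```

An empty result on the real file would support treating `no_vehicle_total` as derived. That would also mean it should not be used alongside both components in the same model.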

state

numeric feature
Despite being typed as numeric, this column holds 52 distinct integer values between 1 and 72 with no nulls or zeros. This matches the FIPS state code encoding: 50 states plus DC and Puerto Rico (code 72). The codes are not contiguous, which is why 52 values span 1 to 72. These are codes rather than true measurements. The roughly symmetric spread (mean 31.27, median 30, std 16.29, skew 0.16) and the absence of outliers only reflect how the codes are numbered, because these are categorical identifiers, not quantities. Treatment: Cast to categorical and one-hot or target-encode; do not use as a continuous variable. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        52
min           1
max           72
mean          31.27
median        30
std           16.29
q1            19
q3            46
iqr           27
skew          0.1574
kurtosis      -0.6267
n_outliers    0
outlier_rate  0
zero_rate     0
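Below is a stdlib sketch of the "cast to categorical and one-hot" treatment. It first normalises each numeric state code to a 2-character string so the categories read like FIPS codes. It uses a plain Python encoder instead of a library one. `state_code` and `one_hot` are hypothetical names. On this column the encoder would produce 52 indicator columns.

```python
def state_code(value) -> str:
    """Normalise a numeric state FIPS value (e.g. 1, 1.0, "6") to a 2-character string."""
    return f"{int(float(value)):02d}"


def one_hot(codes: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode codes; returns (sorted vocabulary, one indicator row per input)."""
    vocab = sorted(set(codes))
    position = {code: i for i, code in enumerate(vocab)}
    matrix = []
    for code in codes:
        row = [0] * len(vocab)
        row[position[code]] = 1  # exactly one indicator set per row
        matrix.append(row)
    return vocab, matrix


codes = [state_code(v) for v in [1, 72, "6", 1.0]]
vocab, matrix = one_hot(codes)
print(vocab)   # ['01', '06', '72']
print(matrix)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```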

county

numeric identifier high_skew outliers
Despite the name 'county', the column is stored numerically, with 330 unique values across 3,222 rows and no nulls. It appears to hold the 3-digit county FIPS code within each state rather than a measured quantity. The values are consistent with `fips` = 1000 × `state` + `county`; for example, the `fips` minimum of 1001 corresponds to state 1, county 1. The distribution is heavily right-skewed (skew 2.87, kurtosis 11.6), with a median of 79, a max of 840, and 178 outliers (5.5%). That shape is expected for code-like values but misleading if the column is treated as a continuous feature. Treatment: Treat as a categorical code that is only meaningful together with `state`, since the same county code repeats across states. Use `fips` as the unique county key rather than encoding `county` on its own. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        330
min           1
max           840
mean          103.2
median        79
std           106.6
q1            35
q3            133
iqr           98
skew          2.866
kurtosis      11.64
n_outliers    178
outlier_rate  0.05525
zero_rate     0
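If `fips` really is the 2-digit state code followed by the 3-digit county code, every row should satisfy `fips` = 1000 × `state` + `county`. Below is a sketch of that check. The decomposition is an assumption inferred from the ranges above, and the helper names and the bad sample row are hypothetical.

```python
def fips_matches_parts(state, county, fips) -> bool:
    """True if fips == 1000 * state + county (2-digit state then 3-digit county)."""
    return int(float(fips)) == 1000 * int(float(state)) + int(float(county))


def inconsistent_fips(rows) -> list[dict]:
    """Rows whose fips value does not decompose into their state and county values."""
    return [r for r in rows if not fips_matches_parts(r["state"], r["county"], r["fips"])]


rows = [
    {"state": "1", "county": "1", "fips": "1001"},      # profile min: state 1, county 1
    {"state": "72", "county": "153", "fips": "72153"},  # profile max fips
    {"state": "2", "county": "5", "fips": "2050"},      # made-up bad row
]
print(inconsistent_fips(rows))  # [{'state': '2', 'county': '5', 'fips': '2050'}]
```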

fips

numeric identifier
This is the FIPS county code: every one of the 3,222 rows is unique with no nulls, and the value range (1001 to 72153) matches the standard 5-digit state+county FIPS encoding once leading zeros are restored. Because the column is stored as an integer, codes for states 01 through 09 have lost their leading zero (1001 is 01001). Zero-pad to five characters before joining against sources that store FIPS as text. Distribution stats (mean 31,377, median 30,022, near-zero skew) are essentially meaningless here, since the codes are categorical identifiers, not quantities. No outliers are flagged, which is consistent with a clean geographic key. Treatment: Treat as a categorical geographic key and left-join on the zero-padded code rather than using it as a numeric feature. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        3,222
min           1,001
max           72,153
mean          3.138e+04
median        30,022
std           1.63e+04
q1            1.903e+04
q3            4.61e+04
iqr           27,075
skew          0.1574
kurtosis      -0.6314
n_outliers    0
outlier_rate  0
zero_rate     0
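Below is a minimal sketch of the recommended join, assuming the other source keys its rows by 5-character text FIPS codes. It restores the leading zero, then performs a left join that keeps every county even when there is no match. `fips5`, `left_join`, and the `supermarkets` column are hypothetical and for illustration only.

```python
def fips5(value) -> str:
    """Restore the 5-character county FIPS string from an integer-typed value (1001 -> "01001")."""
    code = int(float(value))
    if not 0 < code < 100_000:
        raise ValueError(f"not a 5-digit county FIPS code: {value!r}")
    return f"{code:05d}"


def left_join(rows: list[dict], other_by_fips: dict[str, dict]) -> list[dict]:
    """Keep every row; add columns from other_by_fips where the zero-padded fips matches."""
    joined = []
    for row in rows:
        extra = other_by_fips.get(fips5(row["fips"]), {})  # unmatched rows get no extra columns
        joined.append({**row, **extra})
    return joined


# Hypothetical second source keyed by text FIPS codes with leading zeros.
other = {"01001": {"supermarkets": 12}}
print(left_join([{"fips": 1001}, {"fips": 72153}], other))
# [{'fips': 1001, 'supermarkets': 12}, {'fips': 72153}]
```

Joining on the integer value would silently miss every state below 10 when the other side stores "01001"-style strings. That is why the padding happens inside the join rather than being left to the caller.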

no_vehicle_total

numeric feature high_skew outliers
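The total number of households with no vehicle available in each county. Values range from 0 to 601,621, with a median of 580 but a mean of 3,304. The summary stats are consistent with this column being `no_vehicle_owner` + `no_vehicle_renter`: the means add up (820.8 + 2,483 ≈ 3,304), and so do the maxima (113,473 + 488,148 = 601,621). The distribution is extremely right-skewed (skew 20.26, kurtosis 501.27), with 407 outliers (12.6%) and a std (20,050) far exceeding the IQR (1,331.75). This indicates that a few very large counties dominate. Treatment: Apply a log1p transform and consider winsorising before modelling, or use `no_vehicle_pct` for cross-county comparisons. high · anthropic:claude-opus-4-7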
n             3,222
nulls         0 (0.0%)
unique        1,823
min           0
max           601,621
mean          3,304
median        580
std           2.005e+04
q1            223
q3            1,555
iqr           1,332
skew          20.26
kurtosis      501.3
n_outliers    407
outlier_rate  0.1263
zero_rate     0.003724
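Below is a sketch of "log1p, optionally after winsorising". Only the upper tail is capped, because the column is bounded below by 0. The 99th-percentile cap is an assumption, since the profile does not suggest a threshold. `winsorise_upper` and `winsorised_log1p` are hypothetical helpers.

```python
import math
from statistics import quantiles


def winsorise_upper(xs: list[float], pct: int = 99) -> list[float]:
    """Cap values above the pct-th percentile (1..99) at that percentile."""
    cuts = quantiles(xs, n=100, method="inclusive")  # 99 cut points: percentiles 1..99
    cap = cuts[pct - 1]
    return [min(x, cap) for x in xs]


def winsorised_log1p(xs: list[float], pct: int = 99) -> list[float]:
    """Winsorise the upper tail, then apply log1p, which is defined at 0 unlike log."""
    return [math.log1p(x) for x in winsorise_upper(xs, pct)]


capped = winsorise_upper([0, 1, 2, 3, 1000])
print(capped)
```

log1p matters here because 0.37% of rows are exactly zero, where a plain log would fail.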

no_vehicle_pct

numeric feature high_skew
Likely the percentage of households with no vehicle, recorded for 3,222 rows with no nulls and values ranging from 0.0 to 85.94. The bulk of the distribution is tightly clustered (median 5.41, IQR 3.38), but it is extremely right-skewed (skew 6.98, kurtosis 86.23). 140 outliers (4.35%) pull the mean up to 6.20, above the median. Only 0.37% of values are exact zeros, the same rate as zeros in `no_vehicle_total`, so true absence is rare. The long tail is the notable feature here. Treatment: Apply a log1p transform or winsorise before modelling to tame the heavy right tail. high · anthropic:claude-opus-4-7
n             3,222
nulls         0 (0.0%)
unique        1,065
min           0
max           85.94
mean          6.197
median        5.41
std           4.538
q1            3.98
q3            7.36
iqr           3.38
skew          6.976
kurtosis      86.23
n_outliers    140
outlier_rate  0.04345
zero_rate     0.003724
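The profile does not state how `no_vehicle_pct` is computed. One plausible hypothesis is 100 × `no_vehicle_total` / `total_households`. It is only an assumption, supported weakly by the two columns sharing a 0.37% zero rate. Below is a row-level check of it. The tolerance of 0.01 assumes the reported percentages are rounded to two decimals, as profiled values such as 85.94 and 5.41 suggest. `pct_mismatches` and the sample rows are hypothetical.

```python
def pct_mismatches(rows, tol: float = 0.01) -> list[tuple[str, float, float]]:
    """Rows where no_vehicle_pct differs from 100 * no_vehicle_total / total_households by more than tol."""
    flagged = []
    for row in rows:
        reported = float(row["no_vehicle_pct"])
        # total_households has a profiled minimum of 32, so the division is safe.
        expected = 100.0 * float(row["no_vehicle_total"]) / float(row["total_households"])
        if abs(reported - expected) > tol:
            flagged.append((row["fips"], reported, round(expected, 2)))
    return flagged


rows = [
    {"fips": "1001", "no_vehicle_total": "50", "total_households": "1000", "no_vehicle_pct": "5.0"},
    {"fips": "1003", "no_vehicle_total": "50", "total_households": "1000", "no_vehicle_pct": "7.5"},
]
print(pct_mismatches(rows))  # [('1003', 7.5, 5.0)]
```

Widespread mismatches on the real file would suggest the percentage uses a different denominator. It would then be safer to recompute it from the counts than to mix the two.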