environmental desert data

source: /home/coolhand/html/datavis/data_trove/data/environmental/desert_data.json · 52,037 rows · 8 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset contains 52,037 records describing US Census-tract-level demographics, with a 10- or 11-character ID, county and state labels, and five numeric measures: distance/share, income, population, poverty rate, and SNAP counts. State coverage spans 51 values (likely the 50 states plus DC), led by Texas (4,010), California (3,727), and Florida (3,018), and the most frequent county names include Jefferson and Montgomery. The income distribution is right-skewed (mean $78,215 vs median $70,455, max $250,001) with about 4% flagged as outliers, and poverty rate shows a similar skew (mean 13.7%, median 10.8%, max 99.5%). Worth a closer look: the strong skew and outlier rates in inc, pov, and snap, plus the shape of dist_share. It is flat through the middle of its 0 to 10,000 range but piles up at both ends (about 21% of rows fall between 9,750 and 10,000, and about 10% between 0 and 250), and its kurtosis of -1.5 is below the -1.2 of a uniform distribution. A percentile rank would be flat everywhere, so this looks more like a bounded share stored on a 0 to 10,000 scale than a percentile-style metric or a raw count.

citing: row_count · column_count · columns.st.top_values · columns.inc.stats · columns.pov.stats · columns.snap.stats · columns.dist_share.stats · columns.cty.top_values
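A minimal loading sketch, assuming (the report does not say) that desert_data.json is a JSON array of row objects keyed by the eight column names in the schema. The function name `load_rows` is hypothetical. Converting id to a string keeps it from being handled as a number, although a leading zero already lost upstream is not restored here.

```python
import json
from pathlib import Path

# Column names as reported in this profile's schema.
EXPECTED_COLUMNS = {"id", "st", "cty", "pov", "inc", "dist_share", "pop", "snap"}


def load_rows(path):
    """Load a JSON array of row objects and check that each row has every column.

    The file layout (a list of dicts, one per row) is an assumption.
    """
    rows = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of row objects")
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")
        # Treat the identifier as text, never as a number.
        row["id"] = str(row["id"])
    return rows
```

If the layout assumption holds, `len(load_rows("desert_data.json"))` should come back as 52,037.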

Charts the summary said to look at first

st · State coverage is broad but tilted toward Texas, California, and Florida.

Top 20 values for st (of 51 unique).
value	count	share
Texas	4010	7.7%
California	3727	7.2%
Florida	3018	5.8%
Ohio	2302	4.4%
Pennsylvania	2242	4.3%
Michigan	2073	4.0%
New York	2059	4.0%
North Carolina	1935	3.7%
Illinois	1906	3.7%
Georgia	1751	3.4%
Virginia	1441	2.8%
Tennessee	1358	2.6%
Indiana	1328	2.6%
New Jersey	1210	2.3%
Missouri	1165	2.2%
Washington	1127	2.2%
Wisconsin	1092	2.1%
Minnesota	1085	2.1%
Alabama	1081	2.1%
Arizona	1076	2.1%
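The table's share column is each count divided by the 52,037 total rows (Texas: 4,010 / 52,037 ≈ 7.7%). A stdlib sketch that produces the same (value, count, share) layout; the helper name `top_values` is hypothetical:

```python
from collections import Counter


def top_values(values, k=20):
    """Return (value, count, share_percent) for the k most frequent values.

    share_percent is count / total rows, as a percent rounded to one decimal.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common(k)
    ]
```

In pandas, `Series.value_counts(normalize=True)` gives the same shares as fractions.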

inc · Income is right-skewed with a long tail, ~4% outliers, and a spike of 181 rows in the top bin at the $250K cap.

Histogram bins for inc, in dollars (median: 70,455).
bin	count
2,499 – 8,687	4
8,687 – 14,874	38
14,874 – 21,062	193
21,062 – 27,249	634
27,249 – 33,437	1297
33,437 – 39,624	2050
39,624 – 45,812	3155
45,812 – 51,999	4048
51,999 – 58,187	4797
58,187 – 64,375	5037
64,375 – 70,562	4829
70,562 – 76,750	4317
76,750 – 82,937	3593
82,937 – 89,125	2932
89,125 – 95,312	2448
95,312 – 101,500	2120
101,500 – 107,687	1859
107,687 – 113,875	1463
113,875 – 120,062	1241
120,062 – 126,250	1079
126,250 – 132,438	910
132,438 – 138,625	701
138,625 – 144,813	506
144,813 – 151,000	484
151,000 – 157,188	382
157,188 – 163,375	361
163,375 – 169,563	261
169,563 – 175,750	178
175,750 – 181,938	171
181,938 – 188,126	131
188,126 – 194,313	141
194,313 – 200,501	106
200,501 – 206,688	100
206,688 – 212,876	63
212,876 – 219,063	64
219,063 – 225,251	62
225,251 – 231,438	40
231,438 – 237,626	29
237,626 – 243,813	32
243,813 – 250,001	181
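The bin edges in these histogram tables are consistent with 40 equal-width bins spanning each column's min to max. For inc, (250,001 - 2,499) / 40 = 6,187.55, so the first edge falls at about 8,687. The stdlib sketch below implements that binning, assuming equal widths and numpy.histogram's convention that only the last bin includes its right edge. The profiler's own code is not shown, and `equal_width_histogram` is a hypothetical name.

```python
def equal_width_histogram(values, n_bins=40):
    """Count values into n_bins equal-width bins spanning [min, max].

    Bins are half-open [lo, hi) except the last, which also includes max.
    Values sitting exactly on an inner edge can land one bin low because of
    floating-point rounding.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        idx = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts
```

`numpy.histogram(values, bins=40)` uses the same edge layout and handles edge rounding more carefully.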

pov · Poverty rate clusters below 20% (about 78% of rows), thins out by about 85%, and reaches 99.5% in a single tract.

Histogram bins for pov (median: 10.8).
bin	count
0 – 2.487	2933
2.487 – 4.975	7061
4.975 – 7.462	7249
7.462 – 9.95	6599
9.95 – 12.44	5536
12.44 – 14.92	4615
14.92 – 17.41	3752
17.41 – 19.9	2876
19.9 – 22.39	2480
22.39 – 24.88	1844
24.88 – 27.36	1480
27.36 – 29.85	1171
29.85 – 32.34	1029
32.34 – 34.82	766
34.82 – 37.31	607
37.31 – 39.8	488
39.8 – 42.29	399
42.29 – 44.77	278
44.77 – 47.26	222
47.26 – 49.75	148
49.75 – 52.24	114
52.24 – 54.72	112
54.72 – 57.21	78
57.21 – 59.7	48
59.7 – 62.19	42
62.19 – 64.67	36
64.67 – 67.16	25
67.16 – 69.65	14
69.65 – 72.14	10
72.14 – 74.62	7
74.62 – 77.11	6
77.11 – 79.6	4
79.6 – 82.09	5
82.09 – 84.57	2
84.57 – 87.06	0
87.06 – 89.55	0
89.55 – 92.04	0
92.04 – 94.52	0
94.52 – 97.01	0
97.01 – 99.5	1

dist_share · Flat through the middle but piled up at both ends (21% of rows in 9,750 to 10,000, 10% in 0 to 250); check how the bounds are encoded.

Histogram bins for dist_share (median: 5503.2).
bin	count
0 – 250	5132
250 – 500	1829
500 – 750	1421
750 – 1000	1277
1000 – 1250	1119
1250 – 1500	1073
1500 – 1750	1017
1750 – 2000	973
2000 – 2250	976
2250 – 2500	907
2500 – 2750	894
2750 – 3000	893
3000 – 3250	857
3250 – 3500	862
3500 – 3750	840
3750 – 4000	865
4000 – 4250	822
4250 – 4500	829
4500 – 4750	876
4750 – 5000	844
5000 – 5250	871
5250 – 5500	834
5500 – 5750	839
5750 – 6000	845
6000 – 6250	767
6250 – 6500	798
6500 – 6750	779
6750 – 7000	878
7000 – 7250	872
7250 – 7500	803
7500 – 7750	788
7750 – 8000	843
8000 – 8250	810
8250 – 8500	861
8500 – 8750	787
8750 – 9000	871
9000 – 9250	952
9250 – 9500	1070
9500 – 9750	1401
9750 – 10000	11062
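The table shows 5,132 rows in the 0 to 250 bin and 11,062 in the 9,750 to 10,000 bin (about 9.9% and 21.3%). The profile reports 2.1% exact zeros but gives no count at exactly 10,000. This sketch separates exact-bound values from near-bound ones; the function name and the 250 margin (one bin width) are illustrative choices.

```python
def end_pileups(values, lo=0.0, hi=10_000.0, margin=250.0):
    """Share of rows exactly at each bound and within `margin` of it.

    near_lo uses [lo, lo + margin) and near_hi uses [hi - margin, hi],
    matching the first and last histogram bins.
    """
    n = len(values)
    return {
        "exactly_lo": sum(v == lo for v in values) / n,
        "exactly_hi": sum(v == hi for v in values) / n,
        "near_lo": sum(v < lo + margin for v in values) / n,
        "near_hi": sum(v >= hi - margin for v in values) / n,
    }
```

A large gap between `exactly_hi` and `near_hi` would mean the top pile-up is genuine values close to the bound, not a single sentinel.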

snap · SNAP counts are right-skewed with a median of 146 and a max near 1,900.

Histogram bins for snap (median: 146.0).
bin	count
0 – 47.2	9587
47.2 – 94.4	8464
94.4 – 141.6	7361
141.6 – 188.8	6046
188.8 – 236	4810
236 – 283.2	3869
283.2 – 330.4	2938
330.4 – 377.6	2328
377.6 – 424.8	1718
424.8 – 472	1286
472 – 519.2	980
519.2 – 566.4	673
566.4 – 613.6	492
613.6 – 660.8	417
660.8 – 708	288
708 – 755.2	213
755.2 – 802.4	139
802.4 – 849.6	119
849.6 – 896.8	77
896.8 – 944	68
944 – 991.2	45
991.2 – 1038.4	29
1038.4 – 1085.6	22
1085.6 – 1132.8	19
1132.8 – 1180	9
1180 – 1227.2	10
1227.2 – 1274.4	6
1274.4 – 1321.6	5
1321.6 – 1368.8	7
1368.8 – 1416	0
1416 – 1463.2	1
1463.2 – 1510.4	2
1510.4 – 1557.6	1
1557.6 – 1604.8	0
1604.8 – 1652	1
1652 – 1699.2	1
1699.2 – 1746.4	2
1746.4 – 1793.6	2
1793.6 – 1840.8	1
1840.8 – 1888	1

Schema

8 columns

Per-column summary.
column	type	nulls	unique	alerts
id	text	0.0%	52,037	near_unique one_word allcaps short_text
st	categorical	0.0%	51
cty	text	0.0%	1,870	short_text duplicates
pov	numeric	0.0%	693
inc	numeric	0.0%	31,375
dist_share	numeric	0.0%	33,000
pop	numeric	0.0%	8,732
snap	numeric	0.0%	1,034
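The nulls and unique columns of this table can be recomputed directly from the rows. The sketch below assumes rows are dicts sharing the same keys (as in a JSON array of records). It does not try to reproduce the profiler's semantic types or alerts, whose rules are not given here. `schema_summary` is a hypothetical name.

```python
def schema_summary(rows):
    """Per-column null share (as a percent string) and distinct non-null count.

    Columns are taken from the first row; all rows are assumed to share keys.
    """
    columns = list(rows[0].keys()) if rows else []
    n = len(rows)
    summary = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        summary[col] = {
            "nulls": f"{100 * (n - len(non_null)) / n:.1f}%",
            "unique": len(set(non_null)),
        }
    return summary
```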

id

text identifier near_unique one_word allcaps short_text

This is a unique row identifier: all 52,037 values are distinct (n_unique equals n), with no nulls or duplicates. Values are single-token digit strings of 10 or 11 characters (len_min 10, len_max 11, len_mean 10.84, so roughly 16% are 10 characters long). The 11-digit samples follow the census tract FIPS layout (e.g., 42069110300: state 42, county 069, tract 110300). The 10-character values are probably tract codes that lost a leading zero (states with FIPS codes 01 to 09), so zero-pad to 11 characters before joining to other tract data. Treatment: Use as the primary key for joins; exclude from modelling features. high · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 52,037
len_min: 10
len_max: 11
len_mean: 10.84
len_median: 11
len_p95: 11
word_mean: 1
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 20,000
readability_flesch_mean: 121.2
emoji_rate: 0
url_rate: 0
one_word_rate: 1
allcaps_rate: 1
boilerplate_rate: 0
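A sketch of the zero-padding step, assuming every id is an 11-digit tract GEOID (2-digit state, 3-digit county, 6-digit tract) that may have lost its leading zero. That reading is inferred from the lengths and the sample value, not confirmed by the profile, and `split_tract_fips` is a hypothetical helper.

```python
def split_tract_fips(raw_id):
    """Zero-pad an id to 11 digits and split it into FIPS parts.

    Raises ValueError for anything that is not 10 or 11 digits.
    """
    s = str(raw_id).strip()
    if not s.isdigit() or len(s) not in (10, 11):
        raise ValueError(f"unexpected id: {raw_id!r}")
    geoid = s.zfill(11)  # restore a dropped leading zero
    return {
        "geoid": geoid,
        "state_fips": geoid[:2],
        "county_fips": geoid[:5],  # state + county, usable as a county key
        "tract": geoid[5:],
    }
```

For example, `split_tract_fips("42069110300")["county_fips"]` is `"42069"`.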

st

categorical feature

This column holds US state names: 51 unique values (likely the 50 states plus DC) across 52,037 rows, with no nulls. Texas leads at 7.7%, followed by California, Florida, Ohio, and Pennsylvania. The ordering only loosely follows state population (California is more populous than Texas but has fewer rows here), so row counts should not be read as a population proxy. An entropy ratio of 0.915 (entropy relative to its maximum for 51 values) indicates a fairly even spread, with no single state dominating. Treatment: One-hot or target-encode for modelling; consider grouping low-frequency states. high · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 51
top_value: Texas
top_rate: 0.07706
cardinality: 51
entropy: 5.192
entropy_ratio: 0.9153
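The reported figures are consistent with Shannon entropy in bits divided by log2 of the cardinality: 5.192 / log2(51) ≈ 5.192 / 5.672 ≈ 0.915. Assuming that definition (the profiler does not state it), a stdlib sketch with a hypothetical function name:

```python
import math
from collections import Counter


def entropy_ratio(values):
    """Return (entropy_bits, ratio), where ratio = entropy / log2(k)
    for k distinct values; 1.0 means a perfectly even spread."""
    counts = Counter(values)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    k = len(counts)
    ratio = h / math.log2(k) if k > 1 else 0.0
    return h, ratio
```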

cty

text feature short_text duplicates

This column ('cty') holds US county and parish names: short text averaging 2 words and 14 characters, with 'county' appearing in 19,399 rows and recurring names like Jefferson, Montgomery, and Maricopa County topping the list. With only 1,870 unique values across 52,037 rows, the duplicate rate is 96.4%, which is expected for a geography field with many tracts per county rather than a data-quality issue. The same county name also occurs in several states (there are Jefferson and Montgomery counties in many), and 1,870 names is well below the roughly 3,100 US counties and equivalents, so cty alone does not identify a county; pair it with st or use the county FIPS prefix of id. No nulls, URLs, or emoji: clean categorical text. Treatment: Treat as a high-cardinality categorical; encode via target/frequency encoding or join to a county FIPS lookup. high · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 1,870
len_min: 10
len_max: 33
len_mean: 14.3
len_median: 14
len_p95: 18
word_mean: 2.099
word_median: 2
n_empty: 0
n_duplicates: 50,167
duplicate_rate: 0.9641
vocab_size: 1,651
readability_flesch_mean: 25.91
emoji_rate: 0
url_rate: 0
one_word_rate: 0
allcaps_rate: 0
boilerplate_rate: 0
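To see how often a county name is shared across states, group the states by name. This is a sketch over row dicts; `names_shared_across_states` and its defaults are hypothetical. Any name it returns needs the state (or the county FIPS) to disambiguate.

```python
from collections import defaultdict


def names_shared_across_states(rows, name_col="cty", state_col="st"):
    """Map each county name found in more than one state to its sorted states."""
    states_by_name = defaultdict(set)
    for r in rows:
        states_by_name[r[name_col]].add(r[state_col])
    return {
        name: sorted(states)
        for name, states in states_by_name.items()
        if len(states) > 1
    }
```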

pov

numeric feature

Likely a poverty rate (percent), ranging from 0 to 99.5 with a median of 10.8 and an interquartile range of 6.0 to 18.4 (IQR 12.4). The distribution is right-skewed (skew 1.54, kurtosis 3.08), with 2,116 outliers (4.07%), all in the upper tail, and only 0.09% zeros across 693 distinct values over 52,037 rows. Treatment: Consider a log1p or winsorising transform before modelling to dampen the right-skewed tail. high · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 693
min: 0
max: 99.5
mean: 13.7
median: 10.8
std: 10.58
q1: 6
q3: 18.4
iqr: 12.4
skew: 1.536
kurtosis: 3.077
n_outliers: 2,116
outlier_rate: 0.04066
zero_rate: 0.0008648
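The outlier counts appear consistent with Tukey's 1.5 × IQR rule. For pov the upper fence is 18.4 + 1.5 × 12.4 = 37.0, and the histogram already holds 2,039 rows in the bins from 37.31 upward. The profiler's exact rule and quantile method are not stated, so this sketch is an assumption; `tukey_outliers` is a hypothetical name.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, n_outliers) under Tukey's rule.

    Quartiles use statistics.quantiles with method="inclusive"
    (linear interpolation between order statistics).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_out = sum(v < lower or v > upper for v in values)
    return lower, upper, n_out
```

With q1 = 6.0 and q3 = 18.4 the lower fence is negative, which is why every pov outlier sits in the upper tail.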

inc

numeric feature

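Likely an annual income figure in dollars: values range from 2,499 to 250,001 with a mean of 78,215 and a median of 70,455. The distribution is right-skewed (skew 1.45) with about 3.9% outliers (2,047 rows). Both extremes sit just outside round thresholds (2,500 and 250,000), and the top histogram bin holds 181 rows against 32 in the bin below it, which points to bottom- and top-coding rather than true values. Treatment: log-transform, and flag rows at 2,499 or 250,001 as censored; clipping at the cap would change nothing, since no value exceeds it. high · anthropic:claude-opus-4-7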

n: 52,037
nulls: 0 (0.0%)
unique: 31,375
min: 2,499
max: 250,001
mean: 7.821e+04
median: 70,455
std: 3.573e+04
q1: 54,059
q3: 94,375
iqr: 40,316
skew: 1.451
kurtosis: 3.144
n_outliers: 2,047
outlier_rate: 0.03934
zero_rate: 0
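A sketch of that treatment, assuming 2,499 and 250,001 are bottom- and top-codes. The profile only shows that they are the min and max, and the constants below are taken from it, not from any codebook. The log is safe because the minimum is positive; `income_features` is a hypothetical name.

```python
import math

BOTTOM_CODE = 2_499   # observed min; assumed to stand for "2,500 or less"
TOP_CODE = 250_001    # observed max; assumed to stand for "250,000 or more"


def income_features(inc):
    """Return (log_income, is_bottom_coded, is_top_coded) for one value."""
    return (
        math.log(inc),
        inc <= BOTTOM_CODE,
        inc >= TOP_CODE,
    )
```

Keeping the flags lets a model (or a censored-regression method) treat those rows as bounds rather than exact incomes.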

dist_share

numeric feature

A numeric share or proportion field ranging from 0 to 10,000, suggesting basis-point (per-myriad) encoding rather than a 0-1 ratio. The distribution is nearly symmetric (skew -0.09) with strongly negative kurtosis (-1.51), but it is not uniform: the histogram is flat through the middle and piles up at both ends, with 11,062 rows (21.3%) between 9,750 and 10,000 and 5,132 rows (9.9%) between 0 and 250. The IQR, 1,785.6 to 9,377.5, is correspondingly wide. About 2.1% of rows are exactly zero, and there are no statistical outliers. Treatment: Rescale to [0, 1] by dividing by 10,000 before modelling; consider separate flags for exact zeros and for values at 10,000. high · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 33,000
min: 0
max: 10,000
mean: 5396
median: 5503
std: 3671
q1: 1786
q3: 9378
iqr: 7592
skew: -0.09003
kurtosis: -1.506
n_outliers: 0
outlier_rate: 0
zero_rate: 0.02056
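A sketch of the rescale-and-flag treatment, assuming 10,000 is the full-scale value (the observed max). The feature names `share`, `is_zero`, and `is_full` and the function name are hypothetical.

```python
def dist_share_features(raw, scale=10_000):
    """Rescale a 0-10,000 value to [0, 1] and flag the two bounds."""
    if not 0 <= raw <= scale:
        raise ValueError(f"dist_share out of range: {raw!r}")
    return {
        "share": raw / scale,
        "is_zero": raw == 0,
        "is_full": raw == scale,
    }
```

Flagging both bounds gives a model a direct handle on the two pile-ups instead of leaving them inside an otherwise flat distribution.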

pop

numeric feature

This is a numeric 'pop' column with 52,037 non-null values, likely a population count per tract, ranging from 102 to 37,452 with a median of 4,169. The distribution is right-skewed (skew 1.68, kurtosis 9.99), with 972 outliers (1.87%) above the upper fence and no zeros or nulls. There are only 8,732 unique values, but this does not imply repeated entities: every id is unique, and integer counts concentrated in a narrow range repeat by chance (the IQR, 3,022 to 5,523, spans only 2,502 possible integers yet contains about half the rows). Treatment: Log-transform before modelling to dampen the right skew and outliers. medium · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 8,732
min: 102
max: 37,452
mean: 4432
median: 4,169
std: 2025
q1: 3,022
q3: 5,523
iqr: 2,501
skew: 1.684
kurtosis: 9.986
n_outliers: 972
outlier_rate: 0.01868
zero_rate: 0
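A small simulation shows that independent units still produce many repeated integer counts. The normal distribution, its parameters (borrowed from the reported mean and std), and the floor at the reported min are assumptions for illustration, not a model of this column. `simulated_unique_count` is a hypothetical name.

```python
import random


def simulated_unique_count(n=52_037, mean=4_432, sd=2_025, lo=102, seed=0):
    """Draw n independent integer 'population counts'; return how many are distinct.

    Each draw stands for a separate unit, so any repeats come from chance alone.
    """
    rng = random.Random(seed)
    draws = [max(lo, round(rng.gauss(mean, sd))) for _ in range(n)]
    return len(set(draws))
```

The distinct count comes out far below n, the same pattern the profile shows, without any unit appearing twice.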

snap

numeric feature

A right-skewed numeric feature, likely a SNAP-participation count per tract, with values spanning 0 to 1,888 and a median of 146, well below the mean of 191. Skew of 1.71 and kurtosis of 4.70 confirm a long upper tail, with 1,918 outliers (3.69%) and 2.28% zeros. The 1,034 unique values across 52,037 rows are consistent with an integer count rather than a continuous measurement. Treatment: apply log1p (a plain log is undefined at the zeros) or winsorize before regression to tame the right tail. medium · anthropic:claude-opus-4-7

n: 52,037
nulls: 0 (0.0%)
unique: 1,034
min: 0
max: 1,888
mean: 191
median: 146
std: 170.2
q1: 66
q3: 268
iqr: 202
skew: 1.709
kurtosis: 4.702
n_outliers: 1,918
outlier_rate: 0.03686
zero_rate: 0.02279
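Both treatments can be sketched with the standard library. The 99th-percentile cap is an illustrative choice, not a value from the profile. Only the upper tail is capped, since the column starts at 0. Function names are hypothetical.

```python
import math
import statistics


def winsorize_upper(values, upper_pct=99):
    """Cap values above the given upper percentile (1-99); lower tail untouched."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    cap = cuts[upper_pct - 1]  # cuts[i] is the (i + 1)-th percentile
    return [min(v, cap) for v in values]


def log1p_all(values):
    """log(1 + x), which stays defined for the rows equal to 0."""
    return [math.log1p(v) for v in values]
```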