environmental desert data
Reading
This dataset contains 52,037 records of US Census-tract-level demographics across 8 columns: a 10- to 11-character ID, state and county labels, and five numeric measures (dist_share, income, population, poverty rate, and SNAP counts). State coverage spans 51 values (likely 50 states plus DC), led by Texas (4,010), California (3,727), and Florida (3,018); the most frequent county names include Jefferson and Montgomery. Income is right-skewed (mean $78,215 vs. median $70,455, max $250,001) with about 4% of rows flagged as outliers, and poverty rate shows a similar skew (mean 13.7%, median 10.8%, max 99.5%). Worth a closer look: the skew and outlier rates in inc, pov, and snap, and the shape of dist_share, which is roughly flat across the interior of its 0 to 10,000 range but piles up at both ends (kurtosis -1.5). That pattern suggests a bounded, possibly capped, share-style metric rather than a raw count.
citing: row_count · column_count · columns.st.top_values · columns.inc.stats · columns.pov.stats · columns.snap.stats · columns.dist_share.stats · columns.cty.top_values
Charts the summary said to look at first
st: top 20 values
| value | count | share |
|---|---|---|
| Texas | 4010 | 7.7% |
| California | 3727 | 7.2% |
| Florida | 3018 | 5.8% |
| Ohio | 2302 | 4.4% |
| Pennsylvania | 2242 | 4.3% |
| Michigan | 2073 | 4.0% |
| New York | 2059 | 4.0% |
| North Carolina | 1935 | 3.7% |
| Illinois | 1906 | 3.7% |
| Georgia | 1751 | 3.4% |
| Virginia | 1441 | 2.8% |
| Tennessee | 1358 | 2.6% |
| Indiana | 1328 | 2.6% |
| New Jersey | 1210 | 2.3% |
| Missouri | 1165 | 2.2% |
| Washington | 1127 | 2.2% |
| Wisconsin | 1092 | 2.1% |
| Minnesota | 1085 | 2.1% |
| Alabama | 1081 | 2.1% |
| Arizona | 1076 | 2.1% |

inc: histogram
| bin | count |
|---|---|
| 2499 – 8687 | 4 |
| 8687 – 1.487e+04 | 38 |
| 1.487e+04 – 2.106e+04 | 193 |
| 2.106e+04 – 2.725e+04 | 634 |
| 2.725e+04 – 3.344e+04 | 1297 |
| 3.344e+04 – 3.962e+04 | 2050 |
| 3.962e+04 – 4.581e+04 | 3155 |
| 4.581e+04 – 5.2e+04 | 4048 |
| 5.2e+04 – 5.819e+04 | 4797 |
| 5.819e+04 – 6.437e+04 | 5037 |
| 6.437e+04 – 7.056e+04 | 4829 |
| 7.056e+04 – 7.675e+04 | 4317 |
| 7.675e+04 – 8.294e+04 | 3593 |
| 8.294e+04 – 8.912e+04 | 2932 |
| 8.912e+04 – 9.531e+04 | 2448 |
| 9.531e+04 – 1.015e+05 | 2120 |
| 1.015e+05 – 1.077e+05 | 1859 |
| 1.077e+05 – 1.139e+05 | 1463 |
| 1.139e+05 – 1.201e+05 | 1241 |
| 1.201e+05 – 1.262e+05 | 1079 |
| 1.262e+05 – 1.324e+05 | 910 |
| 1.324e+05 – 1.386e+05 | 701 |
| 1.386e+05 – 1.448e+05 | 506 |
| 1.448e+05 – 1.51e+05 | 484 |
| 1.51e+05 – 1.572e+05 | 382 |
| 1.572e+05 – 1.634e+05 | 361 |
| 1.634e+05 – 1.696e+05 | 261 |
| 1.696e+05 – 1.758e+05 | 178 |
| 1.758e+05 – 1.819e+05 | 171 |
| 1.819e+05 – 1.881e+05 | 131 |
| 1.881e+05 – 1.943e+05 | 141 |
| 1.943e+05 – 2.005e+05 | 106 |
| 2.005e+05 – 2.067e+05 | 100 |
| 2.067e+05 – 2.129e+05 | 63 |
| 2.129e+05 – 2.191e+05 | 64 |
| 2.191e+05 – 2.253e+05 | 62 |
| 2.253e+05 – 2.314e+05 | 40 |
| 2.314e+05 – 2.376e+05 | 29 |
| 2.376e+05 – 2.438e+05 | 32 |
| 2.438e+05 – 2.5e+05 | 181 |

pov: histogram
| bin | count |
|---|---|
| 0 – 2.487 | 2933 |
| 2.487 – 4.975 | 7061 |
| 4.975 – 7.462 | 7249 |
| 7.462 – 9.95 | 6599 |
| 9.95 – 12.44 | 5536 |
| 12.44 – 14.92 | 4615 |
| 14.92 – 17.41 | 3752 |
| 17.41 – 19.9 | 2876 |
| 19.9 – 22.39 | 2480 |
| 22.39 – 24.88 | 1844 |
| 24.88 – 27.36 | 1480 |
| 27.36 – 29.85 | 1171 |
| 29.85 – 32.34 | 1029 |
| 32.34 – 34.82 | 766 |
| 34.82 – 37.31 | 607 |
| 37.31 – 39.8 | 488 |
| 39.8 – 42.29 | 399 |
| 42.29 – 44.77 | 278 |
| 44.77 – 47.26 | 222 |
| 47.26 – 49.75 | 148 |
| 49.75 – 52.24 | 114 |
| 52.24 – 54.72 | 112 |
| 54.72 – 57.21 | 78 |
| 57.21 – 59.7 | 48 |
| 59.7 – 62.19 | 42 |
| 62.19 – 64.67 | 36 |
| 64.67 – 67.16 | 25 |
| 67.16 – 69.65 | 14 |
| 69.65 – 72.14 | 10 |
| 72.14 – 74.62 | 7 |
| 74.62 – 77.11 | 6 |
| 77.11 – 79.6 | 4 |
| 79.6 – 82.09 | 5 |
| 82.09 – 84.57 | 2 |
| 84.57 – 87.06 | 0 |
| 87.06 – 89.55 | 0 |
| 89.55 – 92.04 | 0 |
| 92.04 – 94.52 | 0 |
| 94.52 – 97.01 | 0 |
| 97.01 – 99.5 | 1 |

dist_share: histogram
| bin | count |
|---|---|
| 0 – 250 | 5132 |
| 250 – 500 | 1829 |
| 500 – 750 | 1421 |
| 750 – 1000 | 1277 |
| 1000 – 1250 | 1119 |
| 1250 – 1500 | 1073 |
| 1500 – 1750 | 1017 |
| 1750 – 2000 | 973 |
| 2000 – 2250 | 976 |
| 2250 – 2500 | 907 |
| 2500 – 2750 | 894 |
| 2750 – 3000 | 893 |
| 3000 – 3250 | 857 |
| 3250 – 3500 | 862 |
| 3500 – 3750 | 840 |
| 3750 – 4000 | 865 |
| 4000 – 4250 | 822 |
| 4250 – 4500 | 829 |
| 4500 – 4750 | 876 |
| 4750 – 5000 | 844 |
| 5000 – 5250 | 871 |
| 5250 – 5500 | 834 |
| 5500 – 5750 | 839 |
| 5750 – 6000 | 845 |
| 6000 – 6250 | 767 |
| 6250 – 6500 | 798 |
| 6500 – 6750 | 779 |
| 6750 – 7000 | 878 |
| 7000 – 7250 | 872 |
| 7250 – 7500 | 803 |
| 7500 – 7750 | 788 |
| 7750 – 8000 | 843 |
| 8000 – 8250 | 810 |
| 8250 – 8500 | 861 |
| 8500 – 8750 | 787 |
| 8750 – 9000 | 871 |
| 9000 – 9250 | 952 |
| 9250 – 9500 | 1070 |
| 9500 – 9750 | 1401 |
| 9750 – 1e+04 | 11062 |
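
To put numbers on the pile-up at the two ends of this 0 to 10,000 histogram, the edge-bin counts can be divided by the row count. The following is a minimal sketch that uses only counts shown in this report; the variable and function names are illustrative and not part of the profiler.

```python
# Counts copied from the 0 - 10,000 histogram above and the reported row count.
N_ROWS = 52_037
FIRST_BIN = 5_132    # rows with values in 0 - 250
LAST_BIN = 11_062    # rows with values in 9,750 - 10,000
N_BINS = 40          # number of histogram bins in the table


def share(count: int, total: int = N_ROWS) -> float:
    """Fraction of all rows that fall in a bin."""
    return count / total


first_share = share(FIRST_BIN)
last_share = share(LAST_BIN)
uniform_share = 1 / N_BINS  # what each bin would hold under a uniform spread

print(f"first bin: {first_share:.1%}, last bin: {last_share:.1%}, "
      f"uniform expectation per bin: {uniform_share:.1%}")
```

Together the two edge bins hold roughly 31% of rows, against 5% expected from two bins under a uniform spread, so the "uniform" reading only holds for the interior of the range.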

snap: histogram
| bin | count |
|---|---|
| 0 – 47.2 | 9587 |
| 47.2 – 94.4 | 8464 |
| 94.4 – 141.6 | 7361 |
| 141.6 – 188.8 | 6046 |
| 188.8 – 236 | 4810 |
| 236 – 283.2 | 3869 |
| 283.2 – 330.4 | 2938 |
| 330.4 – 377.6 | 2328 |
| 377.6 – 424.8 | 1718 |
| 424.8 – 472 | 1286 |
| 472 – 519.2 | 980 |
| 519.2 – 566.4 | 673 |
| 566.4 – 613.6 | 492 |
| 613.6 – 660.8 | 417 |
| 660.8 – 708 | 288 |
| 708 – 755.2 | 213 |
| 755.2 – 802.4 | 139 |
| 802.4 – 849.6 | 119 |
| 849.6 – 896.8 | 77 |
| 896.8 – 944 | 68 |
| 944 – 991.2 | 45 |
| 991.2 – 1038 | 29 |
| 1038 – 1086 | 22 |
| 1086 – 1133 | 19 |
| 1133 – 1180 | 9 |
| 1180 – 1227 | 10 |
| 1227 – 1274 | 6 |
| 1274 – 1322 | 5 |
| 1322 – 1369 | 7 |
| 1369 – 1416 | 0 |
| 1416 – 1463 | 1 |
| 1463 – 1510 | 2 |
| 1510 – 1558 | 1 |
| 1558 – 1605 | 0 |
| 1605 – 1652 | 1 |
| 1652 – 1699 | 1 |
| 1699 – 1746 | 2 |
| 1746 – 1794 | 2 |
| 1794 – 1841 | 1 |
| 1841 – 1888 | 1 |
Schema
8 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| id | text | 0.0% | 52,037 | near_unique, one_word, allcaps, short_text |
| st | categorical | 0.0% | 51 | |
| cty | text | 0.0% | 1,870 | short_text, duplicates |
| pov | numeric | 0.0% | 693 | |
| inc | numeric | 0.0% | 31,375 | |
| dist_share | numeric | 0.0% | 33,000 | |
| pop | numeric | 0.0% | 8,732 | |
| snap | numeric | 0.0% | 1,034 | |

id
text identifier · near_unique · one_word · allcaps · short_text
This is a unique row identifier: all 52,037 values are distinct (n_unique equals n), with no nulls or duplicates. Values are single-token strings of 10 or 11 characters (len_min 10, len_max 11, one_word_rate 1.0, allcaps_rate 1.0), and the samples are numeric strings resembling FIPS-style geography IDs such as 42069110300. Census tract GEOIDs are 11 digits long, so the 10-character values are probably IDs that lost a leading zero when parsed as numbers; this is worth confirming before joining. Treatment: Use as the primary key for joins; exclude from modelling features.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 52,037
- len_min: 10
- len_max: 11
- len_mean: 10.84
- len_median: 11
- len_p95: 11
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 20,000
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
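
Since id has both 10- and 11-character values, a join on it may need normalising first. The sketch below assumes, without confirmation from the data, that the 10-character values are 11-digit tract GEOIDs missing a leading zero, and splits each ID into the standard 2-digit state, 3-digit county, and 6-digit tract parts. `TractId` and `parse_tract_id` are hypothetical names, and the 10-character example value is made up for illustration.

```python
from typing import NamedTuple


class TractId(NamedTuple):
    state_fips: str   # 2 digits
    county_fips: str  # 3 digits
    tract_code: str   # 6 digits


def parse_tract_id(raw: str) -> TractId:
    """Left-pad an ID to 11 digits and split it into state, county, and tract.

    Assumption: 10-character values are 11-digit GEOIDs that lost a leading zero.
    """
    if not raw.isdigit() or not 10 <= len(raw) <= 11:
        raise ValueError(f"unexpected id format: {raw!r}")
    padded = raw.zfill(11)
    return TractId(padded[:2], padded[2:5], padded[5:])


print(parse_tract_id("42069110300"))  # sample value quoted in this report
print(parse_tract_id("1001020100"))   # hypothetical 10-character value
```

If the assumption holds, the first five digits of the padded ID also give a county key that avoids relying on county names.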
st
categorical feature
This column holds US state names: 51 unique values (likely 50 states plus DC) across 52,037 rows with no nulls. The distribution roughly tracks population: Texas leads at 7.7%, followed by California, Florida, Ohio, and Pennsylvania. An entropy ratio of 0.915 (entropy of 5.192 bits against a maximum of log2(51) ≈ 5.67 bits) indicates a fairly even spread with no single state dominating. Treatment: One-hot or target-encode for modelling; consider grouping low-frequency states.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 51
- top_value: Texas
- top_rate: 0.07706
- cardinality: 51
- entropy: 5.192
- entropy_ratio: 0.9153
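
The entropy_ratio figure can be reproduced from the reported entropy and cardinality if it is defined as Shannon entropy in bits divided by log2 of the number of categories. That definition is an assumption about the profiler, although the reported numbers are consistent with it. The function names below are illustrative.

```python
import math
from collections import Counter
from typing import Iterable


def entropy_bits(values: Iterable[str]) -> float:
    """Shannon entropy (base 2) of the empirical category distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(values: Iterable[str]) -> float:
    """Entropy divided by its maximum, log2(number of distinct categories)."""
    values = list(values)
    k = len(set(values))
    if k <= 1:
        return 0.0  # a constant column carries no information
    return entropy_bits(values) / math.log2(k)


# Check against the reported figures for st: entropy 5.192, cardinality 51.
print(round(5.192 / math.log2(51), 4))  # 0.9153
```

A ratio of 1.0 would mean every state has the same number of rows; values near 0 would mean one state dominates.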
cty
text feature · short_text · duplicates
This column ('cty') holds US county/parish names: short text averaging 2 words and 14 characters, with 'county' appearing in 19,399 rows and frequent values such as Jefferson, Montgomery, and Maricopa County topping the list. With only 1,870 unique values across 52,037 rows, the duplicate rate is 96.4%, which is expected for a categorical geography field and is not a data-quality issue. There are no nulls, URLs, or emoji. Because the same county name can occur in more than one state, cty alone does not identify a county; pair it with st when grouping. Treatment: Treat as a high-cardinality categorical; encode via target/frequency encoding or join to a county FIPS lookup.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 1,870
- len_min: 10
- len_max: 33
- len_mean: 14.3
- len_median: 14
- len_p95: 18
- word_mean: 2.099
- word_median: 2
- n_empty: 0
- n_duplicates: 50,167
- duplicate_rate: 0.9641
- vocab_size: 1,651
- readability_flesch_mean: 25.91
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
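
One way to apply the frequency-encoding treatment is to key on the (st, cty) pair, so that identically named counties in different states are not merged. This is a minimal sketch with a hypothetical helper name and toy rows that are not taken from the dataset.

```python
from collections import Counter
from typing import List, Tuple


def frequency_encode(rows: List[Tuple[str, str]]) -> List[float]:
    """Replace each (state, county) pair with its share of all rows."""
    counts = Counter(rows)
    total = len(rows)
    return [counts[key] / total for key in rows]


# Hypothetical toy rows: the same county name appears in two states.
rows = [
    ("Alabama", "Jefferson County"),
    ("Alabama", "Jefferson County"),
    ("Colorado", "Jefferson County"),
    ("Arizona", "Maricopa County"),
]
print(frequency_encode(rows))  # [0.5, 0.5, 0.25, 0.25]
```

Encoding on cty alone would give all three Jefferson County rows the same value of 0.75.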
pov
numeric feature
Likely a poverty rate in percent, bounded between 0 and 99.5 with a median of 10.8 and an interquartile range from 6.0 to 18.4 (IQR 12.4). The distribution is right-skewed (skew 1.54, kurtosis 3.08) with 2,116 outliers (4.07%) in the upper tail, and only 0.09% zeros across 693 distinct values over 52,037 rows. Treatment: Consider a log1p or winsorising transform before modelling to dampen the right-skewed tail.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 693
- min: 0
- max: 99.5
- mean: 13.7
- median: 10.8
- std: 10.58
- q1: 6
- q3: 18.4
- iqr: 12.4
- skew: 1.536
- kurtosis: 3.077
- n_outliers: 2,116
- outlier_rate: 0.04066
- zero_rate: 0.0008648
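
The report does not state which rule produces n_outliers. A common choice is Tukey's 1.5 × IQR fences, and the sketch below assumes that rule in order to show where the cut-offs would fall for pov. The function name is illustrative.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for pov: q1 = 6, q3 = 18.4.
lower, upper = tukey_fences(6.0, 18.4)
print(f"lower fence: {lower:.1f}, upper fence: {upper:.1f}")
# lower fence: -12.6, upper fence: 37.0
```

The lower fence is negative, so under this rule every pov outlier sits in the upper tail. The pov histogram has 2,039 rows in bins starting at 37.31 or higher, which is in line with the reported 2,116 outliers if the cut-off is near 37.0.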
inc
numeric feature
Likely an income figure in dollars: values range from 2,499 to 250,001 with a mean of 78,215 and median of 70,455. The distribution is right-skewed (skew 1.45) with about 3.9% outliers (2,047 rows). The max of 250,001 and the min of 2,499 each sit one unit beyond a round number, which hints at top- and bottom-coding; the top bin of the income histogram (181 rows) is also far fuller than the bins just below it. Treatment: log-transform, and flag rows at the 250,001 cap as censored instead of treating them as exact values.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 31,375
- min: 2,499
- max: 250,001
- mean: 7.821e+04
- median: 70,455
- std: 3.573e+04
- q1: 54,059
- q3: 94,375
- iqr: 40,316
- skew: 1.451
- kurtosis: 3.144
- n_outliers: 2,047
- outlier_rate: 0.03934
- zero_rate: 0
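
A hedged sketch of one way to prepare inc for modelling: take the natural log and keep indicator flags for values at the reported extremes. Treating 250,001 and 2,499 as censoring markers is an assumption based on the min and max shown above; `prepare_income` and its output keys are hypothetical names.

```python
import math
from typing import Dict, Union

TOP_CODE = 250_001   # reported max; assumed to mark top-coded values
BOTTOM_CODE = 2_499  # reported min; assumed to mark bottom-coded values


def prepare_income(value: float) -> Dict[str, Union[float, bool]]:
    """Log-transform one income value and flag it if it sits at an assumed cap."""
    if value <= 0:
        raise ValueError("income must be positive for a log transform")
    return {
        "log_inc": math.log(value),
        "top_coded": value >= TOP_CODE,
        "bottom_coded": value <= BOTTOM_CODE,
    }


print(prepare_income(70_455))   # the reported median
print(prepare_income(250_001))  # the reported max
```

Keeping the flags lets a model or a later analysis treat capped rows differently without discarding them.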
pop
numeric feature
This is a numeric 'pop' column with 52,037 non-null values, likely a population count per tract, ranging from 102 to 37,452 with a median of 4,169. The distribution is right-skewed (skew 1.68, kurtosis 9.99) with 972 outliers (1.87%) in the upper tail, and there are no zeros or nulls. The 8,732 unique values across 52,037 rows mean that population figures repeat, but since id is unique per row this reflects integer counts concentrated in a narrow range (IQR 3,022 to 5,523), not the same entity appearing in multiple rows. Treatment: Log-transform before modelling to dampen the right skew and outliers.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 8,732
- min: 102
- max: 37,452
- mean: 4,432
- median: 4,169
- std: 2,025
- q1: 3,022
- q3: 5,523
- iqr: 2,501
- skew: 1.684
- kurtosis: 9.986
- n_outliers: 972
- outlier_rate: 0.01868
- zero_rate: 0
snap
numeric feature
A right-skewed numeric feature with values spanning 0 to 1,888 and a median of 146, well below the mean of 191. Skew of 1.71 and kurtosis of 4.70 confirm a long upper tail, with 1,918 outliers (3.69%) and 2.28% zeros. The 1,034 unique values across 52,037 rows point to an integer count (the summary describes the column as SNAP counts) rather than a continuous measurement. Treatment: apply log1p, which handles the zeros, or winsorize before regression to tame the right tail.
- n: 52,037
- nulls: 0 (0.0%)
- unique: 1,034
- min: 0
- max: 1,888
- mean: 191
- median: 146
- std: 170.2
- q1: 66
- q3: 268
- iqr: 202
- skew: 1.709
- kurtosis: 4.702
- n_outliers: 1,918
- outlier_rate: 0.03686
- zero_rate: 0.02279
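
The two treatments named for snap can be sketched as follows. The winsorising here is upper-tail only and uses a nearest-rank quantile, which is one of several reasonable definitions; the function names, the quantile level, and the five-value sample (the reported min, q1, median, q3, and max) are illustrative choices.

```python
import math
from typing import List


def winsorize_upper(values: List[float], upper_quantile: float = 0.99) -> List[float]:
    """Clip values above the empirical nearest-rank quantile."""
    if not values:
        return []
    ordered = sorted(values)
    rank = max(1, math.ceil(upper_quantile * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


def log1p_transform(values: List[float]) -> List[float]:
    """Apply log(1 + x), which is defined at zero unlike a plain log."""
    return [math.log1p(v) for v in values]


# Toy input built from the reported min, q1, median, q3, and max of snap.
sample = [0, 66, 146, 268, 1888]
clipped = winsorize_upper(sample, upper_quantile=0.75)
print(clipped)                    # [0, 66, 146, 268, 268]
print(log1p_transform(sample)[0])  # 0.0
```

On the full column a much higher quantile such as the default 0.99 would be more typical; 0.75 is used only so the effect is visible on five values.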