rural urban
Reading
This dataset is a county-level reference table covering 3,222 U.S. counties, with each row uniquely identified by a county name and FIPS code and labeled as either rural or urban/suburban. The headline finding is the rural skew: 2,212 counties (about 68.7%) are flagged Rural versus 1,010 (about 31.3%) Urban/Suburban. The `rural` and `rural_category` columns carry the same information (True corresponds to Rural, with identical counts), so one of them is redundant. The top words in `county_name` are led by the state names Texas (256), Virginia (189), and Georgia (159); these are word counts that reflect how many counties those states contain rather than any data quality issue.
citing: row_count · column_count · columns.rural.top_values · columns.rural.stats.top_rate · columns.rural_category.top_values · columns.county_name.top_words · columns.fips.stats.min · columns.fips.stats.max
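The redundancy claim rests on matching value counts, which is necessary but not sufficient: two columns can share a 2,212 / 1,010 split without agreeing row by row. A minimal sketch of a row-level check follows, assuming the rows are available as `(rural, rural_category)` pairs; the helper name `is_one_to_one` and the sample rows are hypothetical, not part of the profile.

```python
def is_one_to_one(pairs):
    """Return True if each value in one column always pairs with the same
    value in the other column, in both directions."""
    forward, backward = {}, {}
    for a, b in pairs:
        # setdefault stores the first partner seen and returns it on later calls
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


# Hypothetical sample rows as (rural, rural_category) pairs
rows = [(True, "Rural"), (False, "Urban/Suburban"), (True, "Rural")]
redundant = is_one_to_one(rows)
```

If the check passes on the full table, either column can be dropped without losing information.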
Charts the summary said to look at first
Value counts: rural_category
| value | count | share |
|---|---|---|
| Rural | 2212 | 68.7% |
| Urban/Suburban | 1010 | 31.3% |
Histogram: fips
| bin | count |
|---|---|
| 1001 – 2780 | 97 |
| 2780 – 4559 | 15 |
| 4559 – 6337 | 133 |
| 6337 – 8116 | 59 |
| 8116 – 9895 | 14 |
| 9895 – 1.167e+04 | 4 |
| 1.167e+04 – 1.345e+04 | 226 |
| 1.345e+04 – 1.523e+04 | 5 |
| 1.523e+04 – 1.701e+04 | 49 |
| 1.701e+04 – 1.879e+04 | 189 |
| 1.879e+04 – 2.057e+04 | 204 |
| 2.057e+04 – 2.235e+04 | 184 |
| 2.235e+04 – 2.413e+04 | 39 |
| 2.413e+04 – 2.59e+04 | 15 |
| 2.59e+04 – 2.768e+04 | 170 |
| 2.768e+04 – 2.946e+04 | 196 |
| 2.946e+04 – 3.124e+04 | 150 |
| 3.124e+04 – 3.302e+04 | 27 |
| 3.302e+04 – 3.48e+04 | 21 |
| 3.48e+04 – 3.658e+04 | 95 |
| 3.658e+04 – 3.836e+04 | 153 |
| 3.836e+04 – 4.013e+04 | 155 |
| 4.013e+04 – 4.191e+04 | 46 |
| 4.191e+04 – 4.369e+04 | 67 |
| 4.369e+04 – 4.547e+04 | 51 |
| 4.547e+04 – 4.725e+04 | 161 |
| 4.725e+04 – 4.903e+04 | 268 |
| 4.903e+04 – 5.081e+04 | 29 |
| 5.081e+04 – 5.259e+04 | 133 |
| 5.259e+04 – 5.436e+04 | 94 |
| 5.436e+04 – 5.614e+04 | 95 |
| 5.614e+04 – 5.792e+04 | 0 |
| 5.792e+04 – 5.97e+04 | 0 |
| 5.97e+04 – 6.148e+04 | 0 |
| 6.148e+04 – 6.326e+04 | 0 |
| 6.326e+04 – 6.504e+04 | 0 |
| 6.504e+04 – 6.682e+04 | 0 |
| 6.682e+04 – 6.86e+04 | 0 |
| 6.86e+04 – 7.037e+04 | 0 |
| 7.037e+04 – 7.215e+04 | 78 |
Value counts: rural
| value | count | share |
|---|---|---|
| True | 2212 | 68.7% |
| False | 1010 | 31.3% |
Length distribution (characters): county_name
| chars | count |
|---|---|
| 16 – 17 | 26 |
| 17 – 18 | 72 |
| 18 – 19 | 121 |
| 19 – 20 | 190 |
| 20 – 21 | 264 |
| 21 – 22 | 407 |
| 22 – 24 | 420 |
| 24 – 25 | 363 |
| 25 – 26 | 320 |
| 26 – 27 | 240 |
| 27 – 28 | 231 |
| 28 – 29 | 152 |
| 29 – 30 | 139 |
| 30 – 31 | 165 |
| 31 – 32 | 41 |
| 32 – 33 | 28 |
| 33 – 34 | 16 |
| 34 – 35 | 10 |
| 35 – 36 | 5 |
| 36 – 38 | 0 |
| 38 – 39 | 1 |
| 39 – 40 | 1 |
| 40 – 41 | 0 |
| 41 – 42 | 1 |
| 42 – 43 | 1 |
| 43 – 44 | 0 |
| 44 – 45 | 2 |
| 45 – 46 | 0 |
| 46 – 47 | 1 |
| 47 – 48 | 1 |
| 48 – 49 | 0 |
| 49 – 50 | 0 |
| 50 – 51 | 0 |
| 51 – 53 | 0 |
| 53 – 54 | 2 |
| 54 – 55 | 1 |
| 55 – 56 | 0 |
| 56 – 57 | 0 |
| 57 – 58 | 0 |
| 58 – 59 | 1 |
Schema
4 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 3,222 | near_unique |
| rural | categorical | 0.0% | 2 | |
| rural_category | categorical | 0.0% | 2 | |
fips
numeric identifier
This is the county FIPS code (state plus county), unique across all 3,222 rows with no nulls. Values span 1001 to 72153; because the column is stored as a number, codes that begin with 0 have lost their leading zero (1001 stands for 01001). The distribution statistics (skew 0.16, kurtosis -0.63) describe how the codes are spread, not a measured quantity, so treat the column as a categorical key rather than a number. Treatment: Cast to a zero-padded 5-character string and use it as a join key to geographic reference tables.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
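The zero-padding treatment for `fips` can be sketched as below. This is an illustration under the assumption that the codes arrive as integers like the reported min and max; the helper names `fips_to_key` and `split_fips` are hypothetical.

```python
def fips_to_key(fips):
    """Convert an integer FIPS code to a zero-padded 5-character string."""
    key = str(int(fips)).zfill(5)
    # Reject anything that does not end up as exactly five digits
    if len(key) != 5 or not key.isdigit():
        raise ValueError(f"unexpected FIPS code: {fips!r}")
    return key


def split_fips(key):
    """Split a 5-character key into its 2-digit state and 3-digit county parts."""
    return key[:2], key[2:]


# The min and max values reported for the column
keys = [fips_to_key(v) for v in (1001, 72153)]  # ['01001', '72153']
```

Storing the key as a string keeps the leading zero through later joins and exports, which an integer column cannot do.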
county_name
text identifier near_unique
Each of the 3,222 rows holds a unique county-plus-state string (e.g., 'X County, Texas'). The token 'county,' appears 2,999 times, and state names such as Texas (256), Virginia (189), and Georgia (159) lead the top words. Lengths run from 16 to 59 characters but cluster tightly (median 24, 95th percentile 31), and there are no nulls or duplicates, consistent with a complete US county roster. The near_unique alert is expected for an identifier column and does not indicate a data-quality problem. Treatment: Use as a join key to county-level reference tables; do not feed raw into a model.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- len_min: 16
- len_max: 59
- len_mean: 24.32
- len_median: 24
- len_p95: 31
- word_mean: 3.248
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,990
- readability_flesch_mean: 10.28
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
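If the state is needed as its own field, the combined string can be split on its last comma. The sketch below assumes every value follows the 'X County, Texas' pattern quoted in the description, which the profile shows only as an example; the helper name `split_county_state` is hypothetical, and values without a comma separator raise an error so they surface instead of being split silently wrong.

```python
def split_county_state(name):
    """Split 'X County, Texas' into ('X County', 'Texas') at the last ', '."""
    county, sep, state = name.rpartition(", ")
    if not sep or not county or not state:
        raise ValueError(f"unexpected county_name format: {name!r}")
    return county, state


county, state = split_county_state("X County, Texas")
```

Splitting at the last separator rather than the first keeps any comma inside the county part with the county.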
rural
categorical feature
Boolean flag indicating whether a county is rural, fully populated across all 3,222 rows. The split is roughly 69/31 in favour of True (2,212 vs 1,010), giving an entropy ratio of 0.90: imbalanced but far from degenerate. Treatment: Cast to 0/1 and use directly as a binary feature.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: True
- top_rate: 0.6865
- cardinality: 2
- entropy: 0.8971
- entropy_ratio: 0.8971
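The entropy figures can be reproduced from the two counts. The sketch below assumes the profiler reports Shannon entropy in bits and defines the entropy ratio as entropy divided by the log2 of the number of observed categories; both definitions are assumptions, though they agree with the reported values, since with two categories the divisor is 1 and the two numbers coincide.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2 of the number of observed categories."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0  # a single category carries no uncertainty
    return entropy_bits(counts) / math.log2(k)


# Counts reported for True and False
h = entropy_bits([2212, 1010])       # about 0.897 bits
ratio = entropy_ratio([2212, 1010])  # same value, because log2(2) == 1
```

A perfectly balanced flag would score 1.0 and a constant column 0.0, so 0.897 sits close to the balanced end.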
rural_category
categorical feature
Binary geographic classifier splitting counties into 'Rural' (2,212) and 'Urban/Suburban' (1,010) across all 3,222 rows with no nulls. The split is roughly 69/31 toward Rural, giving an entropy ratio of 0.897, so both classes are well represented despite the imbalance. Treatment: Encode as a binary indicator (e.g., is_rural) for modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 2
- top_value: Rural
- top_rate: 0.6865
- cardinality: 2
- entropy: 0.8971
- entropy_ratio: 0.8971
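The suggested `is_rural` encoding could look like the sketch below. It assumes the only labels are the two reported here, 'Rural' and 'Urban/Suburban'; the mapping name `LABEL_TO_FLAG` and the function `encode_is_rural` are hypothetical. Unknown labels raise an error so that a new category is noticed instead of being coded as 0.

```python
# Explicit mapping for the two labels reported in the profile
LABEL_TO_FLAG = {"Rural": 1, "Urban/Suburban": 0}


def encode_is_rural(label):
    """Map a rural_category label to a 0/1 indicator, rejecting unknown labels."""
    try:
        return LABEL_TO_FLAG[label]
    except KeyError:
        raise ValueError(f"unexpected rural_category label: {label!r}") from None


flags = [encode_is_rural(v) for v in ("Rural", "Urban/Suburban", "Rural")]  # [1, 0, 1]
```

An explicit mapping is used instead of a comparison such as `label == "Rural"` so that typos or unexpected categories fail loudly.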