education: education by county
Reading
This dataset contains 3,222 US county-level education records in 6 columns: three identifiers (county_name, fips, state), two educational attainment rates (pct_hs_or_higher, pct_bachelors_or_higher), and the count of residents aged 25 and over (total_25_plus). The bachelor's degree rate averages 23.5% but ranges from 0% to nearly 79%, a spread wide enough to make regional disparities worth investigating. The total_25_plus column is heavily right-skewed (skew 13.5), with 440 flagged outliers and a maximum of nearly 6.9 million, so any analysis that uses it should consider a log transform or per-capita normalization. State coverage is fairly even across 52 state codes (entropy ratio 0.93), with TX, GA, and VA contributing the most counties.
citing: row_count · column_count · columns.pct_bachelors_or_higher.stats · columns.pct_hs_or_higher.stats · columns.total_25_plus.stats · columns.state.top_values · columns.county_name.top_values
Charts the summary said to look at first
Histogram of pct_bachelors_or_higher
| bin | count |
|---|---|
| 0 – 1.972 | 1 |
| 1.972 – 3.944 | 0 |
| 3.944 – 5.915 | 4 |
| 5.915 – 7.887 | 9 |
| 7.887 – 9.859 | 32 |
| 9.859 – 11.83 | 135 |
| 11.83 – 13.8 | 169 |
| 13.8 – 15.77 | 317 |
| 15.77 – 17.75 | 328 |
| 17.75 – 19.72 | 376 |
| 19.72 – 21.69 | 345 |
| 21.69 – 23.66 | 262 |
| 23.66 – 25.63 | 232 |
| 25.63 – 27.6 | 189 |
| 27.6 – 29.58 | 123 |
| 29.58 – 31.55 | 116 |
| 31.55 – 33.52 | 118 |
| 33.52 – 35.49 | 96 |
| 35.49 – 37.46 | 60 |
| 37.46 – 39.44 | 68 |
| 39.44 – 41.41 | 40 |
| 41.41 – 43.38 | 34 |
| 43.38 – 45.35 | 34 |
| 45.35 – 47.32 | 24 |
| 47.32 – 49.29 | 21 |
| 49.29 – 51.27 | 19 |
| 51.27 – 53.24 | 15 |
| 53.24 – 55.21 | 10 |
| 55.21 – 57.18 | 11 |
| 57.18 – 59.15 | 10 |
| 59.15 – 61.12 | 9 |
| 61.12 – 63.1 | 6 |
| 63.1 – 65.07 | 5 |
| 65.07 – 67.04 | 1 |
| 67.04 – 69.01 | 0 |
| 69.01 – 70.98 | 1 |
| 70.98 – 72.95 | 0 |
| 72.95 – 74.93 | 0 |
| 74.93 – 76.9 | 1 |
| 76.9 – 78.87 | 1 |
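The table above has 40 rows running from 0 to 78.87, which is consistent with equal-width binning between the column's minimum and maximum. The following is a minimal sketch of how such edges could be computed, assuming that is how the profiler binned the data; the function name `equal_width_edges` is illustrative and not part of the report.

```python
def equal_width_edges(lo: float, hi: float, bins: int = 40) -> list[float]:
    """Return bins + 1 edges that split [lo, hi] into equal-width intervals."""
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# Minimum and maximum of pct_bachelors_or_higher as reported in the profile.
edges = equal_width_edges(0.0, 78.87)

# The first interior edge is the bin width, about 1.97 percentage points.
print(len(edges) - 1, edges[1])
```

With bins this narrow, the sparse counts above roughly 65% are individual counties rather than a stable tail shape.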
Histogram of pct_hs_or_higher
| bin | count |
|---|---|
| 33.33 – 34.99 | 1 |
| 34.99 – 36.65 | 0 |
| 36.65 – 38.31 | 0 |
| 38.31 – 39.97 | 0 |
| 39.97 – 41.62 | 0 |
| 41.62 – 43.28 | 0 |
| 43.28 – 44.94 | 0 |
| 44.94 – 46.6 | 0 |
| 46.6 – 48.26 | 0 |
| 48.26 – 49.92 | 0 |
| 49.92 – 51.58 | 0 |
| 51.58 – 53.24 | 0 |
| 53.24 – 54.9 | 0 |
| 54.9 – 56.56 | 1 |
| 56.56 – 58.22 | 1 |
| 58.22 – 59.87 | 1 |
| 59.87 – 61.53 | 3 |
| 61.53 – 63.19 | 3 |
| 63.19 – 64.85 | 3 |
| 64.85 – 66.51 | 2 |
| 66.51 – 68.17 | 6 |
| 68.17 – 69.83 | 7 |
| 69.83 – 71.49 | 15 |
| 71.49 – 73.15 | 30 |
| 73.15 – 74.81 | 30 |
| 74.81 – 76.46 | 46 |
| 76.46 – 78.12 | 60 |
| 78.12 – 79.78 | 88 |
| 79.78 – 81.44 | 131 |
| 81.44 – 83.1 | 174 |
| 83.1 – 84.76 | 189 |
| 84.76 – 86.42 | 256 |
| 86.42 – 88.08 | 289 |
| 88.08 – 89.74 | 360 |
| 89.74 – 91.39 | 429 |
| 91.39 – 93.05 | 460 |
| 93.05 – 94.71 | 389 |
| 94.71 – 96.37 | 192 |
| 96.37 – 98.03 | 47 |
| 98.03 – 99.69 | 9 |
Histogram of total_25_plus
| bin | count |
|---|---|
| 50 – 1.728e+05 | 2948 |
| 1.728e+05 – 3.455e+05 | 135 |
| 3.455e+05 – 5.183e+05 | 53 |
| 5.183e+05 – 6.91e+05 | 37 |
| 6.91e+05 – 8.638e+05 | 13 |
| 8.638e+05 – 1.036e+06 | 10 |
| 1.036e+06 – 1.209e+06 | 7 |
| 1.209e+06 – 1.382e+06 | 4 |
| 1.382e+06 – 1.555e+06 | 2 |
| 1.555e+06 – 1.727e+06 | 5 |
| 1.727e+06 – 1.9e+06 | 1 |
| 1.9e+06 – 2.073e+06 | 1 |
| 2.073e+06 – 2.246e+06 | 1 |
| 2.246e+06 – 2.418e+06 | 1 |
| 2.418e+06 – 2.591e+06 | 0 |
| 2.591e+06 – 2.764e+06 | 0 |
| 2.764e+06 – 2.937e+06 | 0 |
| 2.937e+06 – 3.109e+06 | 2 |
| 3.109e+06 – 3.282e+06 | 0 |
| 3.282e+06 – 3.455e+06 | 0 |
| 3.455e+06 – 3.628e+06 | 0 |
| 3.628e+06 – 3.8e+06 | 1 |
| 3.8e+06 – 3.973e+06 | 0 |
| 3.973e+06 – 4.146e+06 | 0 |
| 4.146e+06 – 4.319e+06 | 0 |
| 4.319e+06 – 4.491e+06 | 0 |
| 4.491e+06 – 4.664e+06 | 0 |
| 4.664e+06 – 4.837e+06 | 0 |
| 4.837e+06 – 5.01e+06 | 0 |
| 5.01e+06 – 5.182e+06 | 0 |
| 5.182e+06 – 5.355e+06 | 0 |
| 5.355e+06 – 5.528e+06 | 0 |
| 5.528e+06 – 5.7e+06 | 0 |
| 5.7e+06 – 5.873e+06 | 0 |
| 5.873e+06 – 6.046e+06 | 0 |
| 6.046e+06 – 6.219e+06 | 0 |
| 6.219e+06 – 6.391e+06 | 0 |
| 6.391e+06 – 6.564e+06 | 0 |
| 6.564e+06 – 6.737e+06 | 0 |
| 6.737e+06 – 6.91e+06 | 1 |
Top values of state (20 most frequent of 52)
| value | count | share |
|---|---|---|
| TX | 254 | 7.9% |
| GA | 159 | 4.9% |
| VA | 133 | 4.1% |
| KY | 120 | 3.7% |
| MO | 115 | 3.6% |
| KS | 105 | 3.3% |
| IL | 102 | 3.2% |
| NC | 100 | 3.1% |
| IA | 99 | 3.1% |
| TN | 95 | 2.9% |
| NE | 93 | 2.9% |
| IN | 92 | 2.9% |
| OH | 88 | 2.7% |
| MN | 87 | 2.7% |
| MI | 83 | 2.6% |
| MS | 82 | 2.5% |
| PR | 78 | 2.4% |
| OK | 77 | 2.4% |
| AR | 75 | 2.3% |
| WI | 72 | 2.2% |
Schema
6 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| fips | numeric | 0.0% | 3,222 | |
| county_name | text | 0.0% | 1,960 | short_text, duplicates |
| state | categorical | 0.0% | 52 | |
| total_25_plus | numeric | 0.0% | 3,140 | high_skew, outliers |
| pct_hs_or_higher | numeric | 0.0% | 1,612 | |
| pct_bachelors_or_higher | numeric | 0.0% | 1,982 | |
fips
numeric identifier
This column holds U.S. FIPS county codes: all 3,222 rows are unique, there are no nulls, and values span 1001 to 72153, consistent with state-prefixed county identifiers. The distribution is roughly symmetric (skew 0.16, kurtosis -0.63) with no outliers, as expected for an enumerated geographic key rather than a measured quantity. Treat the numeric stats as incidental: these are categorical identifiers, not magnitudes. Treatment: cast to a zero-padded string and use as a join key to county/state geographies; do not model as a number.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,222
- min: 1,001
- max: 72,153
- mean: 3.138e+04
- median: 30,022
- std: 1.63e+04
- q1: 1.903e+04
- q3: 4.61e+04
- iqr: 27,075
- skew: 0.1574
- kurtosis: -0.6314
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
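The zero-padding treatment for fips can be sketched as follows. County FIPS codes are five characters (two for the state, three for the county), so a value stored as the number 1001 has lost its leading zero. The helper names are illustrative and not part of the report.

```python
def fips_to_key(fips: int) -> str:
    """Render a numeric county FIPS code as a 5-character, zero-padded string."""
    return str(int(fips)).zfill(5)


def state_prefix(fips: int) -> str:
    """Return the 2-character state portion of a county FIPS code."""
    return fips_to_key(fips)[:2]


# Minimum and maximum values reported in the profile.
print(fips_to_key(1001))    # 01001
print(fips_to_key(72153))   # 72153
print(state_prefix(1001))   # 01
```

Keeping the key as a string also prevents accidental arithmetic or averaging on what is really a label.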
county_name
text metadata · short_text · duplicates
This is a US county-level place-name field: mostly two-word entries averaging 14 characters, with 'County' appearing 2,999 times alongside 'Parish' (64, Louisiana) and 'Municipio' (78, Puerto Rico). Duplication is heavy at 39.2% (1,262 rows) because common names such as Washington County (30), Jefferson County (25), and Franklin County (24) recur across states, so this column is not unique on its own. With 1,960 distinct values across 3,222 rows, it needs a state qualifier to act as a key. Treatment: concatenate with a state/territory code before using as a join key; do not treat as unique.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,960
- len_min: 10
- len_max: 46
- len_mean: 14.17
- len_median: 14
- len_p95: 18
- word_mean: 2.083
- word_median: 2
- n_empty: 0
- n_duplicates: 1,262
- duplicate_rate: 0.3917
- vocab_size: 1,963
- readability_flesch_mean: 33.36
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
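A minimal sketch of the state-qualified key suggested for county_name. The three rows are illustrative examples of real county names, not rows copied from the dataset, and `county_key` is a hypothetical helper.

```python
from collections import Counter


def county_key(county_name: str, state: str) -> str:
    """Build a state-qualified key such as 'Washington County, GA'."""
    return f"{county_name.strip()}, {state.strip().upper()}"


# Illustrative rows only; not copied from the dataset.
rows = [
    ("Washington County", "GA"),
    ("Washington County", "TX"),
    ("Jefferson County", "KY"),
]

name_counts = Counter(name for name, _ in rows)
key_counts = Counter(county_key(name, state) for name, state in rows)

print(name_counts["Washington County"])  # 2: the bare name collides
print(max(key_counts.values()))          # 1: every qualified key is unique
```

Where the fips column is available, it remains the safer join key, since it is already unique across all 3,222 rows.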
state
categorical feature
This is a US state code column with 52 distinct values, consistent with the 50 states, likely DC, and Puerto Rico (PR appears in the top values with 78 rows). The distribution is fairly even (entropy ratio 0.93), with TX leading at 254 rows (7.88%) followed by GA, VA, and KY. That pattern fits one row per county, so states with more counties contribute more records. No nulls. Treatment: one-hot or target-encode for modelling; usable as a join key to state-level reference tables.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 52
- top_value: TX
- top_rate: 0.07883
- cardinality: 52
- entropy: 5.314
- entropy_ratio: 0.9322
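The reported entropy and entropy_ratio are consistent with Shannon entropy in bits divided by its maximum for 52 categories, log2(52). The sketch below assumes that definition (the report does not state it) and uses an illustrative function name.

```python
import math
from collections import Counter


def entropy_ratio(values) -> tuple[float, float]:
    """Return (Shannon entropy in bits, entropy / log2(number of categories))."""
    counts = Counter(values)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    h_max = math.log2(len(counts)) if len(counts) > 1 else 0.0
    return h, (h / h_max if h_max > 0 else 0.0)


# Check against the profile: 5.314 bits over 52 categories.
reported_ratio = 5.314 / math.log2(52)
print(round(reported_ratio, 4))

# A perfectly even four-category column has ratio 1.0.
h, ratio = entropy_ratio(["a", "b", "c", "d"])
print(h, ratio)
```

A ratio near 1 means no single state dominates; here TX's 7.9% share is the main departure from uniform.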
total_25_plus
numeric feature · high_skew · outliers
A heavily right-skewed count of residents aged 25 and over per county, ranging from 50 to 6,909,650 with a median of 18,313.5 but a mean of 71,074.3. A skew of 13.5 and kurtosis of 306.9 indicate a long, heavy right tail, and 440 rows (13.7%) are flagged as outliers. There are no nulls or zeros, and 3,140 of 3,222 values are unique, consistent with geographic aggregates of varying size. Treatment: log-transform before any regression or distance-based modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 3,140
- min: 50
- max: 6.91e+06
- mean: 7.107e+04
- median: 1.831e+04
- std: 2.266e+05
- q1: 7696
- q3: 4.649e+04
- iqr: 3.879e+04
- skew: 13.51
- kurtosis: 306.9
- n_outliers: 440
- outlier_rate: 0.1366
- zero_rate: 0
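A short sketch of the two ideas behind this column's treatment: where the outlier fences would fall, and what a log transform does to the range. It assumes the profiler uses the common 1.5 × IQR (Tukey) rule, which the report does not state, and it uses the rounded quartiles shown above, so the fence values are approximate.

```python
import math

# Rounded quartiles as reported in the profile for total_25_plus.
Q1, Q3 = 7696.0, 46490.0


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(Q1, Q3)
# The lower fence is negative, so with a minimum of 50 only large
# counties can be flagged under this rule.
print(lower, upper)


def log_population(n: float) -> float:
    """Base-10 log of a strictly positive count."""
    if n <= 0:
        raise ValueError("log transform needs a positive count")
    return math.log10(n)


# Reported minimum and maximum: about five orders of magnitude apart on
# the raw scale, compressed to roughly 1.7 through 6.8 after the transform.
print(log_population(50), log_population(6_909_650))
```

Because the column has no zeros, a plain logarithm is safe here without an offset.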
pct_hs_or_higher
numeric feature
This column reports the percentage of each county's population with at least a high school education, ranging from 33.33 to 99.69 with a mean of 88.08 and median of 89.39. The distribution is left-skewed (skew -1.33) with heavy tails (kurtosis 3.74): the bulk sits between Q1 84.9 and Q3 92.47, and 86 outliers (2.67%) fall at the low end. With 1,612 unique values across 3,222 rows and no nulls, it is a clean county-level feature. Treatment: use as-is, or apply a reflected log or Box-Cox transform to address the left skew before linear modelling.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,612
- min: 33.33
- max: 99.69
- mean: 88.08
- median: 89.39
- std: 5.97
- q1: 84.9
- q3: 92.47
- iqr: 7.567
- skew: -1.328
- kurtosis: 3.742
- n_outliers: 86
- outlier_rate: 0.02669
- zero_rate: 0
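One way to read "reflected log" for a percentage is to take the log of the distance from the upper bound, so the long left tail becomes a right tail that the log can compress. The sketch below assumes reflection about 100 and uses log1p so that a value of exactly 100 maps to 0; both choices are illustrative and not taken from the report.

```python
import math


def reflected_log(pct: float, upper: float = 100.0) -> float:
    """log(1 + (upper - pct)); larger output means lower attainment."""
    if pct > upper:
        raise ValueError("value exceeds the reflection point")
    return math.log1p(upper - pct)


# Minimum, median and maximum of pct_hs_or_higher from the profile.
for pct in (33.33, 89.39, 99.69):
    print(pct, round(reflected_log(pct), 3))
```

Note that the transform reverses direction, so coefficients in a downstream model need their signs interpreted accordingly.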
pct_bachelors_or_higher
numeric feature
This column reports the percentage of residents with a bachelor's degree or higher across 3,222 rows, ranging from 0.0 to 78.87 with a median of 21.07 and mean of 23.50. The distribution is right-skewed (skew 1.36, kurtosis 2.31), with 141 high-end outliers (4.4%) reflecting a long tail of highly educated areas. A zero_rate of 0.0003 (a single row) and no nulls suggest clean coverage. Treatment: consider a sqrt or log1p transform to tame the right skew before linear modelling; a plain log is undefined for the row with a value of 0.
- n: 3,222
- nulls: 0 (0.0%)
- unique: 1,982
- min: 0
- max: 78.87
- mean: 23.5
- median: 21.07
- std: 9.983
- q1: 16.59
- q3: 27.85
- iqr: 11.26
- skew: 1.357
- kurtosis: 2.306
- n_outliers: 141
- outlier_rate: 0.04376
- zero_rate: 0.0003104
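Because the minimum of this column is 0, a plain logarithm would fail on at least one row. The sketch below compares two zero-safe alternatives, square root and log1p, on the reported minimum, median and maximum; the function names are illustrative.

```python
import math


def sqrt_pct(p: float) -> float:
    """Square-root transform; defined at 0."""
    return math.sqrt(p)


def log1p_pct(p: float) -> float:
    """log(1 + p); defined at 0, unlike a plain log."""
    return math.log1p(p)


# Minimum, median and maximum of pct_bachelors_or_higher from the profile.
for p in (0.0, 21.07, 78.87):
    print(p, round(sqrt_pct(p), 3), round(log1p_pct(p), 3))

# The reported zero_rate corresponds to a single row out of 3,222.
zero_rows = round(0.0003104 * 3222)
print(zero_rows)  # 1
```

The square root is the milder of the two; log1p compresses the upper tail more strongly, which may suit a skew of about 1.4 but is worth checking against residual plots.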