veterans merged county analysis
Reading
This dataset contains 3,144 rows, one per U.S. county or county equivalent, and 18 columns. It combines Census geographic identifiers (GEOID, STATE_NAME, NAMELSAD, ALAND, AWATER) with veteran and active-duty military estimates and rate-normalized fields. The raw magnitude columns (total_pop, active_duty_est, veterans_est, ALAND) are extremely right-skewed, with skew values above 8 and hundreds of outliers each, so analyze them on a log scale or use the per-capita versions. The rate columns tell a cleaner story: active_duty_per_10k is roughly symmetric (skew -0.38, mean ~4,694 per 10k), while veterans_per_100 is mildly right-skewed (skew 0.88, mean 6.19, max 18.09) and is the better candidate for ranking counties. Note, however, that a mean of ~4,694 per 10k implies about 47% of residents, which is implausibly high for active-duty personnel, so confirm what active_duty_est measures before interpreting it. State coverage is uneven: Texas alone supplies 254 counties (8.1%), followed by Georgia (159) and Virginia (133), so unweighted county averages over-represent states with many counties. LSAD is also heavily imbalanced (95.4% code '06'), and GEOID and fips have identical summary statistics, so they appear to be duplicates of each other.
citing: row_count · column_count · ALAND · total_pop · active_duty_est · veterans_est · active_duty_per_10k · veterans_per_100 · STATE_NAME · LSAD · NAMELSAD
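Because county counts per state are so uneven, a plain mean of county rates gives a 50-person county the same weight as a multi-million-person one. The sketch below pools counts before dividing, which is one common way to get a state-level rate. It assumes each row is a dict keyed by the schema's column names (`STATE_NAME`, `veterans_est`, `total_pop`); the function name and the sample rows are hypothetical and not taken from the dataset.

```python
from collections import defaultdict

def state_veteran_rates(rows):
    """Return {state: veterans per 100 residents}, pooling counts across counties."""
    vets = defaultdict(float)
    pop = defaultdict(float)
    for row in rows:
        vets[row["STATE_NAME"]] += row["veterans_est"]
        pop[row["STATE_NAME"]] += row["total_pop"]
    # Divide pooled veterans by pooled population; skip states with no population.
    return {s: 100.0 * vets[s] / pop[s] for s in pop if pop[s] > 0}

# Made-up rows for illustration only.
sample = [
    {"STATE_NAME": "A", "veterans_est": 10, "total_pop": 100},
    {"STATE_NAME": "A", "veterans_est": 50, "total_pop": 1900},
    {"STATE_NAME": "B", "veterans_est": 30, "total_pop": 600},
]
rates = state_veteran_rates(sample)
print(rates)
```

In the sample, state A's two county rates are 10 and about 2.6 per 100, but the pooled rate is 3.0 because the larger county dominates.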
Charts the summary said to look at first
STATE_NAME: 20 most frequent values
| value | count | share |
|---|---|---|
| Texas | 254 | 8.1% |
| Georgia | 159 | 5.1% |
| Virginia | 133 | 4.2% |
| Kentucky | 120 | 3.8% |
| Missouri | 115 | 3.7% |
| Kansas | 105 | 3.3% |
| Illinois | 102 | 3.2% |
| North Carolina | 100 | 3.2% |
| Iowa | 99 | 3.1% |
| Tennessee | 95 | 3.0% |
| Nebraska | 93 | 3.0% |
| Indiana | 92 | 2.9% |
| Ohio | 88 | 2.8% |
| Minnesota | 87 | 2.8% |
| Michigan | 83 | 2.6% |
| Mississippi | 82 | 2.6% |
| Oklahoma | 77 | 2.4% |
| Arkansas | 75 | 2.4% |
| Wisconsin | 72 | 2.3% |
| Alabama | 67 | 2.1% |
veterans_per_100: histogram (40 equal-width bins, 0 to 18.09)
| bin | count |
|---|---|
| 0 – 0.4522 | 1 |
| 0.4522 – 0.9043 | 0 |
| 0.9043 – 1.357 | 6 |
| 1.357 – 1.809 | 13 |
| 1.809 – 2.261 | 21 |
| 2.261 – 2.713 | 34 |
| 2.713 – 3.165 | 60 |
| 3.165 – 3.617 | 84 |
| 3.617 – 4.07 | 137 |
| 4.07 – 4.522 | 187 |
| 4.522 – 4.974 | 271 |
| 4.974 – 5.426 | 319 |
| 5.426 – 5.878 | 346 |
| 5.878 – 6.33 | 358 |
| 6.33 – 6.783 | 313 |
| 6.783 – 7.235 | 259 |
| 7.235 – 7.687 | 173 |
| 7.687 – 8.139 | 128 |
| 8.139 – 8.591 | 94 |
| 8.591 – 9.043 | 88 |
| 9.043 – 9.496 | 60 |
| 9.496 – 9.948 | 38 |
| 9.948 – 10.4 | 39 |
| 10.4 – 10.85 | 26 |
| 10.85 – 11.3 | 16 |
| 11.3 – 11.76 | 21 |
| 11.76 – 12.21 | 14 |
| 12.21 – 12.66 | 15 |
| 12.66 – 13.11 | 4 |
| 13.11 – 13.57 | 6 |
| 13.57 – 14.02 | 6 |
| 14.02 – 14.47 | 1 |
| 14.47 – 14.92 | 1 |
| 14.92 – 15.37 | 3 |
| 15.37 – 15.83 | 0 |
| 15.83 – 16.28 | 0 |
| 16.28 – 16.73 | 1 |
| 16.73 – 17.18 | 0 |
| 17.18 – 17.63 | 0 |
| 17.63 – 18.09 | 1 |
active_duty_per_10k: histogram (40 equal-width bins, 1,708 to 7,200)
| bin | count |
|---|---|
| 1708 – 1845 | 1 |
| 1845 – 1983 | 0 |
| 1983 – 2120 | 1 |
| 2120 – 2257 | 0 |
| 2257 – 2395 | 1 |
| 2395 – 2532 | 1 |
| 2532 – 2669 | 1 |
| 2669 – 2807 | 6 |
| 2807 – 2944 | 7 |
| 2944 – 3081 | 9 |
| 3081 – 3218 | 21 |
| 3218 – 3356 | 20 |
| 3356 – 3493 | 29 |
| 3493 – 3630 | 57 |
| 3630 – 3768 | 52 |
| 3768 – 3905 | 92 |
| 3905 – 4042 | 111 |
| 4042 – 4179 | 165 |
| 4179 – 4317 | 193 |
| 4317 – 4454 | 224 |
| 4454 – 4591 | 267 |
| 4591 – 4729 | 303 |
| 4729 – 4866 | 280 |
| 4866 – 5003 | 301 |
| 5003 – 5141 | 277 |
| 5141 – 5278 | 268 |
| 5278 – 5415 | 177 |
| 5415 – 5552 | 136 |
| 5552 – 5690 | 61 |
| 5690 – 5827 | 37 |
| 5827 – 5964 | 18 |
| 5964 – 6102 | 8 |
| 6102 – 6239 | 5 |
| 6239 – 6376 | 5 |
| 6376 – 6514 | 3 |
| 6514 – 6651 | 2 |
| 6651 – 6788 | 2 |
| 6788 – 6925 | 0 |
| 6925 – 7063 | 0 |
| 7063 – 7200 | 3 |
total_pop: histogram (40 equal-width bins, 50 to 9.937e+06)
| bin | count |
|---|---|
| 50 – 2.485e+05 | 2863 |
| 2.485e+05 – 4.969e+05 | 137 |
| 4.969e+05 – 7.453e+05 | 57 |
| 7.453e+05 – 9.937e+05 | 37 |
| 9.937e+05 – 1.242e+06 | 14 |
| 1.242e+06 – 1.491e+06 | 10 |
| 1.491e+06 – 1.739e+06 | 7 |
| 1.739e+06 – 1.987e+06 | 3 |
| 1.987e+06 – 2.236e+06 | 3 |
| 2.236e+06 – 2.484e+06 | 4 |
| 2.484e+06 – 2.733e+06 | 3 |
| 2.733e+06 – 2.981e+06 | 0 |
| 2.981e+06 – 3.229e+06 | 1 |
| 3.229e+06 – 3.478e+06 | 1 |
| 3.478e+06 – 3.726e+06 | 0 |
| 3.726e+06 – 3.975e+06 | 0 |
| 3.975e+06 – 4.223e+06 | 0 |
| 4.223e+06 – 4.472e+06 | 1 |
| 4.472e+06 – 4.72e+06 | 0 |
| 4.72e+06 – 4.968e+06 | 1 |
| 4.968e+06 – 5.217e+06 | 0 |
| 5.217e+06 – 5.465e+06 | 1 |
| 5.465e+06 – 5.714e+06 | 0 |
| 5.714e+06 – 5.962e+06 | 0 |
| 5.962e+06 – 6.21e+06 | 0 |
| 6.21e+06 – 6.459e+06 | 0 |
| 6.459e+06 – 6.707e+06 | 0 |
| 6.707e+06 – 6.956e+06 | 0 |
| 6.956e+06 – 7.204e+06 | 0 |
| 7.204e+06 – 7.453e+06 | 0 |
| 7.453e+06 – 7.701e+06 | 0 |
| 7.701e+06 – 7.949e+06 | 0 |
| 7.949e+06 – 8.198e+06 | 0 |
| 8.198e+06 – 8.446e+06 | 0 |
| 8.446e+06 – 8.695e+06 | 0 |
| 8.695e+06 – 8.943e+06 | 0 |
| 8.943e+06 – 9.191e+06 | 0 |
| 9.191e+06 – 9.44e+06 | 0 |
| 9.44e+06 – 9.688e+06 | 0 |
| 9.688e+06 – 9.937e+06 | 1 |
LSAD: value counts
| value | count | share |
|---|---|---|
| 06 | 2999 | 95.4% |
| 15 | 64 | 2.0% |
| 25 | 40 | 1.3% |
| 04 | 13 | 0.4% |
| 05 | 11 | 0.3% |
| PL | 9 | 0.3% |
| 03 | 4 | 0.1% |
| 00 | 2 | 0.1% |
| 12 | 2 | 0.1% |
Schema
18 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| STATEFP | numeric | 0.0% | 51 | |
| COUNTYFP | numeric | 0.0% | 329 | high_skew, outliers |
| COUNTYNS | numeric | 0.0% | 3,144 | |
| GEOIDFQ | text | 0.0% | 3,144 | near_unique, one_word, allcaps, short_text |
| GEOID | numeric | 0.0% | 3,144 | |
| NAME | text | 0.0% | 1,838 | one_word, short_text, duplicates |
| NAMELSAD | text | 0.0% | 1,882 | short_text, duplicates |
| STUSPS | categorical | 0.0% | 51 | |
| STATE_NAME | categorical | 0.0% | 51 | |
| LSAD | categorical | 0.0% | 9 | imbalance |
| ALAND | numeric | 0.0% | 3,144 | high_skew, outliers |
| AWATER | numeric | 0.0% | 3,144 | high_skew, outliers |
| fips | numeric | 0.0% | 3,144 | |
| active_duty_est | numeric | 0.0% | 3,028 | high_skew, outliers |
| veterans_est | numeric | 0.0% | 2,424 | high_skew, outliers |
| total_pop | numeric | 0.0% | 3,080 | high_skew, outliers |
| active_duty_per_10k | numeric | 0.0% | 3,144 | |
| veterans_per_100 | numeric | 0.0% | 3,143 | |
STATEFP
numeric foreign_key
STATEFP is the U.S. Census state FIPS code. It is a 2-digit code by convention but is stored numerically here, so leading zeros are lost. Values range from 1 to 56 with 51 unique entries across 3,144 rows, matching the count of U.S. states plus DC, and the row count aligns with the number of U.S. counties. The near-uniform spread (skew -0.08, kurtosis -1.10) and zero outliers are consistent with a categorical state code rather than a measured quantity. Treatment: Cast to a zero-padded 2-character string and treat as a categorical state key for joins, not a numeric feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 51
- min: 1
- max: 56
- mean: 30.26
- median: 29
- std: 15.15
- q1: 18
- q3: 45
- iqr: 27
- skew: -0.08128
- kurtosis: -1.099
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
COUNTYFP
numeric identifier high_skew outliers
COUNTYFP is the 3-digit FIPS county code, stored numerically across 3,144 rows with no nulls and 329 unique values. The distribution is heavily right-skewed (skew 2.84, kurtosis 11.4), with a max of 840 well beyond the Q3 of 133.5, which flags 176 outliers (5.6%). This is expected behavior: FIPS codes are categorical identifiers, not measurements, and high values simply correspond to specific county assignments. Treatment: Cast to a zero-padded 3-character string and combine with STATEFP to form the 5-digit GEOID join key; do not treat as numeric.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 329
- min: 1
- max: 840
- mean: 103.9
- median: 79
- std: 107.6
- q1: 35
- q3: 133.5
- iqr: 98.5
- skew: 2.841
- kurtosis: 11.38
- n_outliers: 176
- outlier_rate: 0.05598
- zero_rate: 0
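The STATEFP and COUNTYFP treatments both call for zero-padding and concatenation. A minimal sketch of that step, assuming the two codes arrive as integers (as they are stored here); `make_geoid` is a hypothetical helper name.

```python
def make_geoid(statefp, countyfp):
    """Build the 5-character county GEOID from numeric state and county FIPS codes."""
    # Pad the state code to 2 digits and the county code to 3 digits.
    return f"{int(statefp):02d}{int(countyfp):03d}"

# The two ends of the GEOID range reported in this profile (1001 and 56045).
print(make_geoid(1, 1))    # 01001
print(make_geoid(56, 45))  # 56045
```

Keeping the result as a string preserves the leading zero that the numeric GEOID and fips columns drop.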
COUNTYNS
numeric identifier
COUNTYNS is the Census Bureau's permanent numeric ANSI/GNIS identifier for U.S. counties: all 3,144 values are unique with no nulls or zeros, and the range (23901 to 2830254) matches the GNIS ID space. The distribution is broad but unremarkable (skew 0.17, kurtosis -0.80, 11 flagged outliers), as expected for an ID code rather than a measurement. Treatment: Treat as a county-level key for joins; do not use as a numeric feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 23,901
- max: 2.83e+06
- mean: 9.503e+05
- median: 9.741e+05
- std: 5.168e+05
- q1: 4.85e+05
- q3: 1.384e+06
- iqr: 8.99e+05
- skew: 0.1721
- kurtosis: -0.8015
- n_outliers: 11
- outlier_rate: 0.003499
- zero_rate: 0
GEOIDFQ
text identifier near_unique one_word allcaps short_text
GEOIDFQ is the Census Bureau's fully qualified GEOID for U.S. counties: every value is exactly 14 characters, single-token, and all-caps, consisting of the `0500000US` summary-level prefix followed by the 5-digit state+county FIPS code. All 3,144 rows are unique with no nulls or duplicates, matching the count of U.S. counties. Vocab size equals row count (3,144), confirming it is a pure identifier with no analytical signal of its own. Treatment: Use as a left-join key against Census geographies; do not feed into models.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- len_min: 14
- len_max: 14
- len_mean: 14
- len_median: 14
- len_p95: 14
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 3,144
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
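Since GEOIDFQ is the `0500000US` prefix plus the state+county code, the short GEOID can be recovered by stripping the prefix. The sketch below assumes the 14-character layout described above; `geoid_from_geoidfq` is a hypothetical helper, and the validation is a defensive choice rather than anything the profile prescribes.

```python
PREFIX = "0500000US"  # county summary-level prefix reported for this column

def geoid_from_geoidfq(geoidfq):
    """Return the 5-character state+county code from a 14-character GEOIDFQ."""
    if len(geoidfq) != 14 or not geoidfq.startswith(PREFIX):
        raise ValueError(f"unexpected GEOIDFQ: {geoidfq!r}")
    return geoidfq[len(PREFIX):]

print(geoid_from_geoidfq("0500000US01001"))  # 01001
```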
GEOID
numeric identifier
GEOID is the 5-digit FIPS code identifying U.S. counties: every one of the 3,144 rows is unique with no nulls, and the range 1001 to 56045 matches the state+county FIPS convention (Alabama through Wyoming). The near-zero skew (-0.08) and flat kurtosis (-1.10) reflect roughly uniform coverage across state codes rather than any meaningful distribution. Treating these values as numbers is misleading because they are categorical keys. Treatment: Cast to a zero-padded 5-character string and use as a join key to county-level geographies; do not model as numeric.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 1,001
- max: 56,045
- mean: 3.037e+04
- median: 29,174
- std: 1.517e+04
- q1: 1.817e+04
- q3: 4.508e+04
- iqr: 26,905
- skew: -0.07923
- kurtosis: -1.099
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
NAME
text label one_word short_text duplicates
NAME holds short place names, almost certainly U.S. county names, given the dominance of 'Washington' (31), 'Franklin' (26), 'Jefferson' (26), 'Lincoln' (24), and 'Madison' (20), all classic county namesakes. Values are overwhelmingly single-word (one_word_rate 0.934, word_mean 1.07) and short (len_mean 7.05, len_max 30), with no nulls. The 41.5% duplicate_rate is expected here: the same county name recurs across states, so 3,144 rows collapse to 1,838 unique strings. Treatment: Treat as a non-unique name; pair with a state/FIPS column before joining or grouping.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 1,838
- len_min: 3
- len_max: 30
- len_mean: 7.05
- len_median: 7
- len_p95: 11
- word_mean: 1.072
- word_median: 1
- n_empty: 0
- n_duplicates: 1,306
- duplicate_rate: 0.4154
- vocab_size: 1,875
- readability_flesch_mean: 36.74
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.9342
- allcaps_rate: 0
- boilerplate_rate: 0
NAMELSAD
text label short_text duplicates
NAMELSAD is the full legal name of U.S. counties and county equivalents (from Census TIGER), with 'county' appearing 2999 times alongside 64 'parish' and 47 'city' entries, reflecting Louisiana and independent-city conventions. Names are short (mean 14.08 chars, ~2 words) and heavily duplicated across states: 1,262 duplicates (40.1%), driven by repeated names like 'Washington County' (30), 'Jefferson County' (25), and 'Franklin County' (24). With only 1,882 unique values across 3,144 rows, this field alone does not identify a county. Treatment: Use as a display label only; join on a state+county FIPS code rather than this name to avoid duplicate collisions.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 1,882
- len_min: 10
- len_max: 46
- len_mean: 14.08
- len_median: 14
- len_p95: 18
- word_mean: 2.08
- word_median: 2
- n_empty: 0
- n_duplicates: 1,262
- duplicate_rate: 0.4014
- vocab_size: 1,883
- readability_flesch_mean: 35.29
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0
- boilerplate_rate: 0
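The NAME and NAMELSAD treatments both warn that a name alone does not identify a county. The sketch below shows one way to detect ambiguous names and to disambiguate them with a state code. It assumes rows are dicts keyed by the schema's `STUSPS` and `NAMELSAD` columns and that the pair is unique within this file, which should be verified; the helper names and the three sample rows are illustrative only.

```python
from collections import Counter

def ambiguous_names(rows, name_col="NAMELSAD"):
    """Return {name: occurrences} for every name that appears more than once."""
    counts = Counter(row[name_col] for row in rows)
    return {name: n for name, n in counts.items() if n > 1}

def composite_key(row):
    """Pair the state abbreviation with the name so equal names in different states stay distinct."""
    return (row["STUSPS"], row["NAMELSAD"])

# Illustrative rows only.
sample = [
    {"STUSPS": "AL", "NAMELSAD": "Washington County"},
    {"STUSPS": "AR", "NAMELSAD": "Washington County"},
    {"STUSPS": "AL", "NAMELSAD": "Autauga County"},
]
print(ambiguous_names(sample))
print(len({composite_key(r) for r in sample}))
```

For joins against other Census tables, the FIPS-based key remains the safer choice; the composite key is mainly useful for display and grouping.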
STUSPS
categorical foreign_key
STUSPS is the USPS two-letter state abbreviation, with 51 distinct values across 3,144 rows, consistent with U.S. states plus DC at the county grain. The distribution matches county counts per state: TX leads at 254 (8.08%), followed by GA (159), VA (133), and KY (120). No nulls and a high entropy ratio (0.93) indicate clean, well-spread categorical coverage. Treatment: Left-join on this code to state-level reference tables, or one-hot/target-encode for modelling.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 51
- top_value: TX
- top_rate: 0.08079
- cardinality: 51
- entropy: 5.277
- entropy_ratio: 0.9304
STATE_NAME
categorical feature
STATE_NAME holds U.S. state labels across 3,144 rows with exactly 51 unique values (50 states plus, most likely, DC) and zero nulls. The distribution mirrors county counts per state: Texas leads at 254 (8.1%), followed by Georgia (159) and Virginia (133), consistent with one row per U.S. county. An entropy ratio of 0.93 indicates a fairly even spread across states given their natural county-count differences. Treatment: Use as a categorical grouping key or one-hot/target-encode for modelling.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 51
- top_value: Texas
- top_rate: 0.08079
- cardinality: 51
- entropy: 5.277
- entropy_ratio: 0.9304
LSAD
categorical metadata imbalance
LSAD is the Census Legal/Statistical Area Description code identifying the type of geographic entity for each of the 3,144 rows. The distribution is extremely imbalanced: code '06' (county) accounts for 95.39% of rows, and the remaining 8 of the 9 distinct codes share under 5%, giving an entropy ratio of 0.117. The minor categories tail off quickly, from '15' (64 rows) and '25' (40) down to single-digit counts for 'PL' (9), '03' (4), '00' (2), and '12' (2). Treatment: Collapse rare codes into an 'other' bucket or drop the column, since one value dominates.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 9
- top_value: 06
- top_rate: 0.9539
- cardinality: 9
- entropy: 0.3707
- entropy_ratio: 0.1169
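The entropy and entropy_ratio figures for the categorical columns are consistent with Shannon entropy in bits divided by log2 of the cardinality. That the profiler uses exactly this definition is an assumption, but the sketch below reproduces the LSAD numbers from the value counts in the LSAD table; the function names are hypothetical.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0

# LSAD value counts from the table in this report (they sum to 3,144).
lsad_counts = [2999, 64, 40, 13, 11, 9, 4, 2, 2]
print(entropy_bits(lsad_counts), entropy_ratio(lsad_counts))
```

A ratio near 1 (as for STATE_NAME, 0.93) means categories are close to evenly used; a ratio near 0.12 means one category carries almost all rows.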
ALAND
numeric feature high_skew outliers
ALAND looks like land area in square meters for 3,144 unique geographic units (matching the U.S. county count), with no nulls or zeros. The distribution is extremely right-skewed (skew 26.8, kurtosis 953): the median is 1.59B while the max reaches 377B, and 11.5% of rows (362) flag as outliers. A handful of very large areas pull the mean (2.91B) well above the median. Treatment: Log-transform before modelling to tame the heavy right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 5.3e+06
- max: 3.771e+11
- mean: 2.911e+09
- median: 1.594e+09
- std: 9.306e+09
- q1: 1.116e+09
- q3: 2.394e+09
- iqr: 1.277e+09
- skew: 26.82
- kurtosis: 953.2
- n_outliers: 362
- outlier_rate: 0.1151
- zero_rate: 0
AWATER
numeric feature high_skew outliers
AWATER is the standard Census TIGER field for water area in square meters, here at what looks like county granularity given 3,144 rows. The distribution is extremely right-skewed (skew 13.18, kurtosis 210.8): the median is 19.4M but the mean is 222M and the max reaches 25.99B, with 440 outliers (14.0% of rows). One row is zero, and all 3,144 values are unique, so this behaves like a continuous geographic feature rather than a key. Treatment: Apply a log1p transform before any modelling to tame the skew and heavy outlier tail; log1p is preferred over a plain log because of the zero row.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 0
- max: 2.599e+10
- mean: 2.22e+08
- median: 1.939e+07
- std: 1.241e+09
- q1: 7.132e+06
- q3: 5.946e+07
- iqr: 5.233e+07
- skew: 13.18
- kurtosis: 210.8
- n_outliers: 440
- outlier_rate: 0.1399
- zero_rate: 0.0003181
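The log1p recommendation matters because AWATER (like veterans_est) contains a zero, where a plain logarithm is undefined. A small sketch of the difference, using the min, median, and max reported above as inputs; `log_area` is a hypothetical helper name.

```python
import math

def log_area(square_meters):
    """log1p transform: log(1 + x), defined at zero and close to log(x) for large x."""
    return math.log1p(square_meters)

awater_min, awater_median, awater_max = 0.0, 1.939e7, 2.599e10
transformed = [log_area(v) for v in (awater_min, awater_median, awater_max)]
print(transformed)

# A plain log fails on the zero row.
try:
    math.log(awater_min)
except ValueError as exc:
    print("math.log(0) raised:", exc)
```

On the transformed scale, values that span more than ten orders of magnitude compress into a range of roughly 0 to 24, and for values this large log1p and log differ by a negligible amount.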
fips
numeric identifier
fips is the 5-digit U.S. county FIPS code: every one of the 3,144 rows is unique, there are no nulls, and the range 1001 to 56045 spans the standard state+county encoding. The distribution is essentially uniform across the code space (skew -0.08, kurtosis -1.10), as expected for an identifier rather than a measurement. Its summary statistics are identical to GEOID's, so the two columns appear to carry the same key. Treatment: Left-join on this id; do not use as a numeric feature.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 1,001
- max: 56,045
- mean: 3.037e+04
- median: 29,174
- std: 1.517e+04
- q1: 1.817e+04
- q3: 4.508e+04
- iqr: 26,905
- skew: -0.07923
- kurtosis: -1.099
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
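Matching summary statistics suggest that fips and GEOID are the same key, but they do not prove it row by row. A minimal check, assuming rows are dicts and that either column may arrive as an integer or a zero-padded string; `columns_identical` is a hypothetical helper and the sample rows are illustrative.

```python
def columns_identical(rows, col_a, col_b):
    """True when col_a and col_b hold the same code (compared as integers) in every row."""
    return all(int(row[col_a]) == int(row[col_b]) for row in rows)

# Illustrative rows: one column as a padded string, the other as a number.
sample = [
    {"GEOID": "01001", "fips": 1001},
    {"GEOID": "56045", "fips": 56045},
]
print(columns_identical(sample, "GEOID", "fips"))
```

If the check passes on the full table, one of the two columns can be dropped without losing information.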
active_duty_est
numeric feature high_skew outliers
active_duty_est is a numeric estimate of active-duty population per record, with 3,144 rows and 3,028 unique values, suggesting one row per geographic unit (likely county-level given the row count). The distribution is severely right-skewed (skew 13.14, kurtosis 288.57): the median is 11,698 but the mean is 53,782.95 and the max reaches 5,240,842, with 449 outliers (14.3%). There are no nulls or zeros, and the IQR of 27,868 is dwarfed by a std of 176,262.59. Treatment: Log-transform before modelling to tame the heavy right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,028
- min: 36
- max: 5.241e+06
- mean: 5.378e+04
- median: 11,698
- std: 1.763e+05
- q1: 4722
- q3: 3.259e+04
- iqr: 27,868
- skew: 13.14
- kurtosis: 288.6
- n_outliers: 449
- outlier_rate: 0.1428
- zero_rate: 0
veterans_est
numeric feature high_skew outliers
veterans_est holds estimated veteran counts per row (likely U.S. counties given n=3,144), ranging from 0 to 244,160 with a median of 1,547.5 but a mean of 5,419.5. The distribution is heavily right-skewed (skew 8.01, kurtosis 100.0), with 408 outliers (12.98%) reflecting a few highly populous areas dwarfing the rest. There are no nulls and only one zero row, so coverage is essentially complete. Treatment: log1p-transform before modelling to tame the heavy right skew.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 2,424
- min: 0
- max: 244,160
- mean: 5419
- median: 1548
- std: 1.311e+04
- q1: 634.8
- q3: 4428
- iqr: 3,793
- skew: 8.014
- kurtosis: 100
- n_outliers: 408
- outlier_rate: 0.1298
- zero_rate: 0.0003181
total_pop
numeric feature high_skew outliers
total_pop looks like a per-row population total across 3,144 rows (suggestive of U.S. counties), with no nulls and 3,080 unique values. The distribution is severely right-skewed (skew 13.17, kurtosis 289.76): the median is 25,784.5 but the mean is 105,310.94 and the max reaches 9,936,690, with 440 rows (14.0%) flagged as outliers. The min is 50 and zero_rate is 0, so every row carries a real count. Treatment: Log-transform before regression to tame the heavy right tail.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,080
- min: 50
- max: 9.937e+06
- mean: 1.053e+05
- median: 2.578e+04
- std: 3.338e+05
- q1: 1.084e+04
- q3: 6.808e+04
- iqr: 57,244
- skew: 13.17
- kurtosis: 289.8
- n_outliers: 440
- outlier_rate: 0.1399
- zero_rate: 0
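The report does not say which rule produced the outlier counts. A common choice is Tukey's fences (1.5 × IQR beyond the quartiles), and the sketch below applies that rule to the rounded total_pop quartiles listed above. Both the rule and the multiplier `k=1.5` are assumptions here, and `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Rounded quartiles from the total_pop stats (q1 = 1.084e+04, q3 = 6.808e+04).
low, high = tukey_fences(10_840, 68_080)
print(low, high)  # -75020.0 153940.0
```

Under this rule the lower fence is negative, so every flagged total_pop row would sit in the upper tail, above roughly 154,000 residents. The same pattern applies to the other count columns, which is why a log scale is the more useful view.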
active_duty_per_10k
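numeric feature
active_duty_per_10k is a per-capita rate (active-duty personnel per 10,000 residents) reported across 3,144 rows with no nulls, no zeros, and every value unique. Values range from 1,708.13 to 7,200.00 and cluster tightly around a mean of 4,693.79 and median of 4,733.28 (std 592.22), with a mild left skew (-0.38) and 57 outliers (1.81%). Rates of this size imply that roughly 47% of a typical county's residents are on active duty, which is implausibly high, so verify how active_duty_est is defined before interpreting the column. The 3,144 row count strongly suggests one record per U.S. county. Treatment: Usable as-is as a continuous feature once its definition is confirmed; the mild skew does not require transformation.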
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,144
- min: 1708
- max: 7,200
- mean: 4694
- median: 4733
- std: 592.2
- q1: 4331
- q3: 5102
- iqr: 771.6
- skew: -0.3768
- kurtosis: 0.8418
- n_outliers: 57
- outlier_rate: 0.01813
- zero_rate: 0
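The rate is presumably active_duty_est divided by total_pop, scaled to 10,000; the report does not state the formula, so treat this as an assumption. It is at least consistent with the extremes: the smallest active_duty_est (36) over the smallest total_pop (50) gives exactly the column's maximum of 7,200, although the profile does not confirm those two values come from the same row. The helper names below are hypothetical.

```python
def per_10k(count, population):
    """Rate per 10,000 residents."""
    return 10_000 * count / population

def implied_share(rate_per_10k):
    """Convert a per-10,000 rate back to a fraction of the population."""
    return rate_per_10k / 10_000

print(per_10k(36, 50))       # 7200.0, equal to the reported column max
print(implied_share(4694))   # 0.4694, the population share implied by the mean rate
```

The second line is a quick plausibility check: a mean rate of 4,694 per 10k corresponds to about 47% of residents, which is why the definition of active_duty_est deserves a second look.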
veterans_per_100
numeric feature
veterans_per_100 reports veterans per 100 residents, with 3,143 unique values across 3,144 rows (likely one row per U.S. county). Values range from 0 to 18.09 with a mean of 6.19 and median of 5.98, showing a mild right skew (0.88) and 125 outliers (~3.98%), concentrated in the upper tail. Only one row is zero, so the distribution is effectively continuous and well-populated. Treatment: Use as-is for modelling; optionally winsorize the ~4% of rows flagged as outliers.
- n: 3,144
- nulls: 0 (0.0%)
- unique: 3,143
- min: 0
- max: 18.09
- mean: 6.19
- median: 5.985
- std: 1.998
- q1: 4.92
- q3: 7.136
- iqr: 2.216
- skew: 0.8797
- kurtosis: 2.029
- n_outliers: 125
- outlier_rate: 0.03976
- zero_rate: 0.0003181
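The optional winsorizing step can be sketched as capping values at an upper empirical quantile. The version below uses a nearest-rank quantile and caps only the upper tail; the 0.96 default loosely mirrors the ~4% outlier share but is an assumption, as are the function name and the sample values.

```python
import math

def winsorize_upper(values, upper_quantile=0.96):
    """Cap values above the nearest-rank empirical quantile at that quantile."""
    ordered = sorted(values)
    # Nearest-rank index, clamped to the valid range.
    idx = min(len(ordered) - 1, max(0, math.ceil(upper_quantile * len(ordered)) - 1))
    cap = ordered[idx]
    return [min(v, cap) for v in values]

# Made-up rates for illustration; only 18.09 echoes the column's reported max.
sample = [5.0, 5.2, 5.5, 5.9, 6.0, 6.1, 6.4, 7.0, 7.5, 18.09]
capped = winsorize_upper(sample, upper_quantile=0.9)
print(capped)
```

Winsorizing keeps every county in the data while limiting the pull of the extreme tail; for ranking counties, the uncapped values are still the ones to sort on.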