data trove us military veteran analysis

source /home/coolhand/html/datavis/data_trove/demographic/veterans/military_firearm_merged_analysis.csv 54 rows 23 columns profiled 2026-06-22

Reading

dataset summary · medium confidence anthropic:default

This is a 54-row dataset covering 50 U.S. states (California, North Carolina, and Texas appear on multiple rows) that merges military and veteran demographics with firearm licensing, suicide rates, and installation-level economic data. The most striking signal is the veteran suicide rate (mean 35.6, range 24.9–52.3), which is roughly double the civilian suicide rate (mean 17.2, range 7.7–28.9); the veteran_risk_ratio column quantifies this gap directly (mean 2.2x) across states. A second area worth scrutiny is the extreme right skew in active_duty_per_100k (median 92, max 5,544) and ffl_per_100k (median 12, max 342), which suggests that a handful of states, likely those hosting large installations, dominate the upper tail of both distributions. The installation-level columns (county, installation, economic impact) are null in about 78% of rows, so installation-linked data covers only 12 records (about 22% of the dataset). Analysts should examine how firearm density and military concentration interact with veteran mental health outcomes across states.

citing: veteran_suicide_rate.stats.mean · veteran_suicide_rate.stats.max · civilian_suicide_rate.stats.mean · veteran_risk_ratio.stats.mean · active_duty_per_100k.stats.median · active_duty_per_100k.stats.max · active_duty_per_100k.alerts · ffl_per_100k.stats.median · ffl_per_100k.stats.max · county.null_rate · installation.null_rate · ptsd_prevalence_pct.null_rate · row_count

Charts the summary said to look at first

veteran_suicide_rate · Look for the spread and upper tail — rates above 40 in some states reveal where veteran suicide risk is most acute.

Histogram bins for veteran_suicide_rate (median: 34.65).
bin	count
24.9 – 28.81	12
28.81 – 32.73	10
32.73 – 36.64	10
36.64 – 40.56	8
40.56 – 44.47	6
44.47 – 48.39	4
48.39 – 52.3	4
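
The bin edges in the table above are consistent with seven equal-width bins spanning the column's min and max. A minimal sketch of that binning follows. The function names are illustrative, not part of the profiler, and treating the last bin as closed on the right is an assumption the profile does not confirm.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    # Pin the final edge to hi so float drift cannot move it.
    return [lo + i * width for i in range(n_bins)] + [hi]


def bin_counts(values, edges):
    """Count values per bin: [left, right) for every bin except the last, which is [left, right]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the histogram range
        idx = len(counts) - 1  # default: last bin (assumed right-closed)
        for i in range(len(counts)):
            if v < edges[i + 1]:
                idx = i
                break
        counts[idx] += 1
    return counts


# min and max of veteran_suicide_rate as reported in the profile
edges = equal_width_edges(24.9, 52.3, 7)
print([round(e, 2) for e in edges])
```

Rounded to two decimals, these edges match the bin boundaries listed in the table.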

civilian_suicide_rate · Compare this distribution against the veteran rate to visually confirm the roughly 2x risk gap across states.

Histogram bins for civilian_suicide_rate (median: 17.1).
bin	count
7.7 – 10.73	10
10.73 – 13.76	9
13.76 – 16.79	7
16.79 – 19.81	9
19.81 – 22.84	7
22.84 – 25.87	7
25.87 – 28.9	5

ffl_per_100k · A heavily right-skewed distribution with a max of 342 — identify the outlier states driving firearm density far above the median of 12.

Histogram bins for ffl_per_100k (median: 12.37).
bin	count
1.377 – 49.99	41
49.99 – 98.6	4
98.6 – 147.2	2
147.2 – 195.8	1
195.8 – 244.4	2
244.4 – 293	0
293 – 341.7	3

va_utilization_pct · VA utilization ranges from 14% to 42%, worth comparing against suicide rates to see if access to care correlates with outcomes.

Histogram bins for va_utilization_pct (median: 26.2).
bin	count
13.8 – 17.87	8
17.87 – 21.94	9
21.94 – 26.01	10
26.01 – 30.09	7
30.09 – 34.16	8
34.16 – 38.23	7
38.23 – 42.3	5

state · California, Texas, and North Carolina appear multiple times, reflecting multiple installations — check whether multi-row states skew aggregate metrics.

Top values for state (20 unique shown, of 50 total).
value	count	share
California	3	5.6%
North Carolina	2	3.7%
Texas	2	3.7%
Alabama	1	1.9%
Alaska	1	1.9%
Arizona	1	1.9%
Arkansas	1	1.9%
Colorado	1	1.9%
Connecticut	1	1.9%
Delaware	1	1.9%
Florida	1	1.9%
Georgia	1	1.9%
Hawaii	1	1.9%
Idaho	1	1.9%
Illinois	1	1.9%
Indiana	1	1.9%
Iowa	1	1.9%
Kansas	1	1.9%
Kentucky	1	1.9%
Louisiana	1	1.9%

Schema

23 columns

Per-column summary. Click column name to jump to its detail.
column	type	null rate	unique	alerts
NAME	categorical	1.9%	49	long_tail
state	categorical	0.0%	50	long_tail
veteran_population	numeric	1.9%	49
total_population	numeric	1.9%	49
veteran_percentage	numeric	1.9%	49	high_skew outliers
active_duty_personnel	numeric	0.0%	50	outliers
ownership_percentage	numeric	0.0%	48	outliers
ffl_count	numeric	0.0%	50	outliers
veteran_suicide_rate	numeric	0.0%	50
civilian_suicide_rate	numeric	0.0%	50
veteran_risk_ratio	numeric	0.0%	41
ptsd_prevalence_pct	numeric	46.3%	25	null_rate
va_users_with_ptsd_pct	numeric	46.3%	25	null_rate
spouse_unemployment_rate	numeric	64.8%	15	null_rate outliers
spouse_labor_force_participation	numeric	64.8%	15	null_rate
va_utilization_pct	numeric	0.0%	50
installation	categorical	77.8%	12	long_tail null_rate
county	categorical	77.8%	10	long_tail null_rate
annual_economic_impact_millions	numeric	77.8%	12	null_rate
total_personnel	numeric	77.8%	12	null_rate
direct_jobs	numeric	77.8%	12	null_rate
ffl_per_100k	numeric	1.9%	49	high_skew outliers
active_duty_per_100k	numeric	1.9%	49	high_skew outliers
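
The null-rate and unique-count columns above can be reproduced with a few lines of standard-library Python. This is a sketch rather than the profiler's own code: `column_profile` is a hypothetical helper, and treating both `None` and empty strings as missing is an assumption about how the CSV encodes gaps.

```python
def column_profile(rows, column):
    """Return (null_rate, unique_count) for one column of a list-of-dicts table."""
    values = [row.get(column) for row in rows]
    # Assumption: None and "" both mean "missing" (csv.DictReader yields "" for empty cells).
    present = [v for v in values if v is not None and v != ""]
    null_rate = 1 - len(present) / len(values) if values else 0.0
    return null_rate, len(set(present))
```

For a column with 12 distinct populated values in 54 rows, such as installation, this gives a null rate of about 0.778 and a unique count of 12.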

NAME

categorical label long_tail

This column contains U.S. state names and serves as a geographic label in a small dataset of 54 rows. With 49 unique values among 53 non-null rows and an entropy ratio of 0.991, cardinality is near-maximal: almost every row has a distinct state name. The top value 'California' appears only 3 times (5.66% of non-null rows), and the long-tail alert reflects that most values appear just once, so the table is close to a one-row-per-state lookup with a handful of repeats. Treatment: Use as a geographic join key or group-by label; deduplicate or aggregate the three repeated states (California ×3, North Carolina ×2, Texas ×2, i.e. 4 surplus rows) before any state-level analysis. high · anthropic:default

n: 54
nulls: 1 (1.9%)
unique: 49
top_value: California
top_rate: 0.0566
cardinality: 49
entropy: 5.563
entropy_ratio: 0.9907
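
One way to deduplicate before state-level analysis is to keep a single record per state and report any state whose repeated rows disagree. The sketch below assumes that state-level columns carry identical values on every row of a repeated state, which the profile does not verify. `collapse_to_states` is a hypothetical helper, and it keys on `state` rather than `NAME` because `NAME` has one null.

```python
def collapse_to_states(rows, state_columns, key="state"):
    """Keep one record per state; also return states whose repeated rows disagree."""
    collapsed, conflicts = {}, set()
    for row in rows:
        state = row[key]
        values = {c: row.get(c) for c in state_columns}
        if state not in collapsed:
            # First row seen for this state supplies the state-level values.
            collapsed[state] = {key: state, **values}
        elif any(collapsed[state][c] != values[c] for c in state_columns):
            conflicts.add(state)
    return list(collapsed.values()), sorted(conflicts)
```

A non-empty conflict list means the "one value per state" assumption fails for those states, and they need manual review instead of silent deduplication.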

state

categorical label long_tail

This column contains U.S. state names, with 50 unique values across only 54 rows — nearly one entry per state, suggesting near-complete national coverage. California appears 3 times (top_rate 5.6%) and North Carolina and Texas appear twice each, while the remaining 47 states appear exactly once. The entropy ratio of 0.991 confirms an almost flat distribution, and the long_tail alert is technically triggered but is largely an artifact of the tiny dataset size rather than meaningful concentration. Treatment: Use as a categorical grouping key; encode with target or ordinal encoding if modelling, or use as a join/filter dimension for geographic aggregation. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 50
top_value: California
top_rate: 0.05556
cardinality: 50
entropy: 5.593
entropy_ratio: 0.9909
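
The entropy figures above can be checked by hand from the value counts. The sketch below assumes the profiler reports Shannon entropy in bits and normalizes by log2 of the cardinality. The profile does not state either convention, but both reproduce the reported numbers.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum possible value, log2(number of categories)."""
    k = len(counts)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 1.0


# Counts described in the profile: California x3, North Carolina x2, Texas x2, 47 states once.
state_counts = [3, 2, 2] + [1] * 47
print(round(entropy_bits(state_counts), 3), round(entropy_ratio(state_counts), 4))
```

With these counts the script prints 5.593 and 0.9909, matching the entropy and entropy_ratio reported for state.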

veteran_population

numeric feature

This column represents veteran population counts, likely at the U.S. state or territory level given the 54 rows and the plausible range of 61,090 to 1,786,891. The distribution is remarkably symmetric (skew ≈ 0.009) and platykurtic (kurtosis ≈ −1.35), meaning values are broadly spread across the range with no sharp central peak and no outliers detected. The IQR of 988,439 relative to a mean of ~820,444 indicates substantial spread across geographies, consistent with large population differences between small and large states. Treatment: Use as-is or normalize by total population to create a veteran share ratio before modelling. high · anthropic:default

n: 54
nulls: 1 (1.9%)
unique: 49
min: 61,090
max: 1.787e+06
mean: 8.204e+05
median: 811,743
std: 5.29e+05
q1: 279,178
q3: 1.268e+06
iqr: 988,439
skew: 0.009014
kurtosis: -1.347
n_outliers: 0
outlier_rate: 0
zero_rate: 0

total_population

numeric feature

This column represents total population counts at the U.S. state level, consistent with the value range (min 548,984 to max 39,227,468) and the state column. The distribution is notably flat and near-uniform: kurtosis of -1.24 indicates lighter tails than normal, skew is near zero (0.093), and the IQR of 22,033,717 spans more than half the full range, confirming wide spread without outliers. There are 4 duplicate values among the 53 non-null rows (49 unique), which is what the repeated California, North Carolina, and Texas rows would produce, and a 1.85% null rate (1 missing record) worth investigating. Treatment: Use as-is or normalize per area/density for modelling; investigate the 1 null row and the 4 duplicate values before joining or aggregating. high · anthropic:default

n: 54
nulls: 1 (1.9%)
unique: 49
min: 548,984
max: 3.923e+07
mean: 1.87e+07
median: 1.958e+07
std: 1.247e+07
q1: 6.898e+06
q3: 2.893e+07
iqr: 2.203e+07
skew: 0.09278
kurtosis: -1.243
n_outliers: 0
outlier_rate: 0
zero_rate: 0

veteran_percentage

numeric feature high_skew outliers

This column represents veteran percentage, likely a share (%) of veterans within some geographic or demographic unit across 54 records. The distribution is severely right-skewed (skew=6.03, kurtosis=37.997) with a median of 4.85% but a mean of 14.09%, driven by 9 outliers (17% of rows) including a maximum of 277.05 — a value that cannot represent a valid percentage and almost certainly reflects a data quality issue such as a raw count, a decimal-point error, or a different unit entirely. The std of 38.90 dwarfs the IQR of 5.89, confirming extreme contamination from these outliers. Treatment: Investigate and cap or correct values exceeding 100 (max=277.05 is impossible for a percentage), then consider log-transform or winsorization before modelling. high · anthropic:default

n: 54
nulls: 1 (1.9%)
unique: 49
min: 0.22
max: 277.1
mean: 14.09
median: 4.85
std: 38.9
q1: 1.88
q3: 7.77
iqr: 5.89
skew: 6.031
kurtosis: 38
n_outliers: 9
outlier_rate: 0.1698
zero_rate: 0
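
A quick way to investigate the impossible values is to flag stored percentages outside 0–100 and compare them with a share recomputed from the two population columns. The sketch assumes `veteran_percentage` is meant to equal 100 × veteran_population / total_population, which the profile does not state. Both function names are illustrative.

```python
def recompute_veteran_percentage(row):
    """Veteran share of total population in percent, or None if an input is missing or zero."""
    vets, total = row.get("veteran_population"), row.get("total_population")
    if vets is None or not total:
        return None
    return 100.0 * vets / total


def invalid_percentages(rows, column="veteran_percentage"):
    """Return rows whose stored percentage falls outside the valid 0-100 range."""
    return [r for r in rows if r.get(column) is not None and not 0 <= r[column] <= 100]
```

If the recomputed share is plausible for a flagged row, the stored value is probably a unit or decimal-point error. If the recomputed share is also implausible, the population columns themselves may be misaligned.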

active_duty_personnel

numeric feature outliers

This column represents the count of active-duty military personnel per row, which in this dataset means per state, with 54 observations and no nulls. The distribution is right-skewed (skew=1.59) with a median of 11,584 but a mean of 34,003, driven by a long upper tail stretching to 162,362. Eight records (14.8% of the dataset) are flagged as outliers, indicating a small number of states with disproportionately large forces. The std of 46,995 exceeds the IQR of 38,504, consistent with a spread dominated by high-end values. Treatment: Log-transform before regression or clustering to reduce skew impact from high-value outliers. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 50
min: 1,166
max: 162,362
mean: 3.4e+04
median: 11,584
std: 4.7e+04
q1: 3884
q3: 4.239e+04
iqr: 3.85e+04
skew: 1.594
kurtosis: 1.291
n_outliers: 8
outlier_rate: 0.1481
zero_rate: 0

ownership_percentage

numeric feature outliers

This column most likely represents the firearm ownership rate per state, given the dataset's firearm-licensing focus, although the profile does not define it. Values range from 14.7% to 66.3% across 54 records with no nulls. The distribution is moderately left-skewed (skew = -0.69), with the middle half of values between Q1 40.05% and Q3 51.4%. Five outliers (~9.3% of rows) sit in the lower tail. Excess kurtosis is essentially zero (-0.002), so tail weight is close to that of a normal distribution rather than unusually flat. Treatment: Use as-is or clip outliers at the IQR fences before modelling; confirm the column's definition and unit against the source before interpreting it. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 48
min: 14.7
max: 66.3
mean: 43.58
median: 45.75
std: 13.04
q1: 40.05
q3: 51.4
iqr: 11.35
skew: -0.6887
kurtosis: -0.002081
n_outliers: 5
outlier_rate: 0.09259
zero_rate: 0
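
The outlier counts in this report are consistent with the common 1.5 × IQR (Tukey) fence rule, though the profile does not name its method, so treat that as an assumption. A minimal sketch using this column's reported quartiles:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# q1 and q3 of ownership_percentage as reported in the profile
low, high = tukey_fences(40.05, 51.4)
print(round(low, 3), round(high, 3))
```

Under this rule the fences are about 23.03 and 68.43, so the reported min of 14.7 falls below the lower fence while the max of 66.3 stays inside the upper one. That agrees with the outliers sitting in the lower tail.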

ffl_count

numeric feature outliers

This column represents a count of Federal Firearms Licensees (FFL) — likely per geographic unit such as state or county — across 54 observations with no nulls. The distribution is right-skewed (skew = 1.66) with a wide IQR of 2496.5 and a standard deviation (2772.66) nearly equal to the mean (3073.11), signaling high dispersion. Five outliers (≈9.3% of rows) pull the tail toward the maximum of 10904, while the minimum is 220 and median only 2096.5, confirming a long upper tail. The kurtosis of 2.0 is moderate, suggesting the outliers are notable but not extreme relative to a normal distribution. Treatment: Log-transform before regression or modelling to reduce right skew; investigate the 5 outlier rows for data integrity. medium · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 50
min: 220
max: 10,904
mean: 3073
median: 2096
std: 2773
q1: 1300
q3: 3,796
iqr: 2496
skew: 1.66
kurtosis: 2.002
n_outliers: 5
outlier_rate: 0.09259
zero_rate: 0

veteran_suicide_rate

numeric numeric_target

This column contains veteran suicide rates, likely per 100,000 veterans, covering 54 observations (50 U.S. states, with California, North Carolina, and Texas repeated across multiple rows). Values range from 24.9 to 52.3 with a mean of 35.6 and median of 34.65, indicating a mildly right-skewed distribution (skew=0.498). The spread is wide, with an IQR of ~10.85 and a max more than double the min, which points to substantial geographic disparity in veteran suicide rates, yet no statistical outliers were flagged. Treatment: Use as-is or apply mild log-transform if residuals show heteroscedasticity; collapse the 4 duplicate values from repeated state rows before modelling. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 50
min: 24.9
max: 52.3
mean: 35.64
median: 34.65
std: 7.393
q1: 29.8
q3: 40.65
iqr: 10.85
skew: 0.4978
kurtosis: -0.6953
n_outliers: 0
outlier_rate: 0
zero_rate: 0

civilian_suicide_rate

numeric feature

This column represents a civilian suicide rate (likely per 100,000 population) across 54 records, with no nulls and no zeros. The distribution is notably well-behaved: near-zero skew (0.17), platykurtic shape (kurtosis −1.09), and no detected outliers, suggesting values spread broadly but uniformly between 7.7 and 28.9. The mean (17.22) and median (17.1) are nearly identical, and the IQR of 9.8 covers a substantial range, indicating genuine cross-unit variation rather than clustering. Treatment: Use as-is in modelling; no transformation needed given near-symmetric distribution and absence of outliers. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 50
min: 7.7
max: 28.9
mean: 17.22
median: 17.1
std: 6.026
q1: 12.2
q3: 22
iqr: 9.8
skew: 0.1737
kurtosis: -1.092
n_outliers: 0
outlier_rate: 0
zero_rate: 0

veteran_risk_ratio

numeric feature

This column appears to be the ratio of the veteran suicide rate to the civilian suicide rate (the dataset summary describes it as quantifying that gap), with all 54 rows populated and no outliers detected. Values range from 1.8 to 3.23, with a mean of ~2.20 and median of 2.025, indicating veterans in this dataset are consistently at elevated risk (all values above 1.0 by a wide margin). The distribution is moderately right-skewed (skew ≈ 0.95) with a relatively tight IQR of 0.59: the middle half of observations falls in the 1.85–2.44 range, and a tail of higher-ratio states pulls the mean upward. The mildly platykurtic shape (kurtosis ≈ -0.31) and 41 unique values out of 54 rows suggest this is a continuous derived metric rather than a categorised score. Treatment: Use as-is or apply mild log-transform to reduce right skew before regression or classification modelling. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 41
min: 1.8
max: 3.23
mean: 2.197
median: 2.025
std: 0.4101
q1: 1.85
q3: 2.44
iqr: 0.59
skew: 0.9459
kurtosis: -0.3143
n_outliers: 0
outlier_rate: 0
zero_rate: 0
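
Because this looks like a derived column, it is worth confirming how it was computed before using it alongside its inputs. The sketch below assumes veteran_risk_ratio = veteran_suicide_rate / civilian_suicide_rate. The profile never states this, although the reported extremes are suggestive: 24.9 / 7.7 ≈ 3.23, the column's max. `ratio_mismatches` is a hypothetical helper.

```python
def ratio_mismatches(rows, tol=0.01):
    """Return rows where the stored ratio differs from veteran/civilian rate by more than tol."""
    bad = []
    for r in rows:
        civ = r.get("civilian_suicide_rate")
        vet = r.get("veteran_suicide_rate")
        stored = r.get("veteran_risk_ratio")
        if not civ or vet is None or stored is None:
            continue  # skip rows with a missing or zero denominator
        if abs(vet / civ - stored) > tol:
            bad.append(r)
    return bad
```

An empty result on the real file would support the assumed definition, in which case the ratio and its two inputs should not all enter the same model as independent features.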

ptsd_prevalence_pct

numeric feature null_rate

This column captures PTSD prevalence as a percentage, likely drawn from epidemiological or clinical survey data across 54 records. The most striking issue is a 46.3% null rate — nearly half the rows are missing, which severely limits usability and warrants investigation into whether missingness is systematic (e.g., tied to a subgroup or data source). Among the 29 non-null values, the distribution is fairly compact (min 6.3, max 12.3, IQR 3.0) with a near-flat kurtosis of −1.05, suggesting a roughly uniform spread rather than a peaked central cluster. Only 25 unique values across 29 non-null rows implies some repeated percentage figures, possibly due to rounding or grouped reporting. Treatment: Investigate missingness mechanism before imputing; if MAR, impute with group-level median; if MNAR, model missingness as a separate binary indicator. medium · anthropic:default

n: 54
nulls: 25 (46.3%)
unique: 25
min: 6.3
max: 12.3
mean: 8.621
median: 8.3
std: 1.848
q1: 7.1
q3: 10.1
iqr: 3
skew: 0.4299
kurtosis: -1.049
n_outliers: 0
outlier_rate: 0
zero_rate: 0
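
The treatment above mentions a missingness indicator and median imputation. A minimal sketch of both follows. It uses the overall median rather than a group-level median (a simplification, since the profile does not identify a grouping variable), and `add_missing_flag_and_impute` is a hypothetical helper.

```python
from statistics import median


def add_missing_flag_and_impute(rows, column):
    """Return copies of rows with a `<column>_missing` flag and median-filled gaps."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed) if observed else None
    out = []
    for r in rows:
        missing = r.get(column) is None
        new = dict(r)  # copy so the input rows are left untouched
        new[f"{column}_missing"] = int(missing)
        if missing:
            new[column] = fill
        out.append(new)
    return out
```

With nearly half the values missing, the flag column will carry much of the signal. Comparing it against the same flag for va_users_with_ptsd_pct would show whether the two columns are missing for the same rows.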

va_users_with_ptsd_pct

numeric feature null_rate

This column represents the percentage of VA users diagnosed with PTSD, aggregated at the state level like the rest of the dataset. The most surprising signal is the 46.3% null rate: nearly half the rows are missing, which severely limits usability and warrants investigation into whether missingness is systematic (e.g., certain regions not reporting). Among observed values, the distribution is fairly uniform (kurtosis –1.13, mild skew of 0.32) ranging from 7.9% to 18.5% with no outliers, suggesting genuine geographic variation rather than data error. Treatment: Investigate missingness pattern before use; impute or subset to complete cases, then use as-is (no transform needed given the near-symmetric, outlier-free distribution). medium · anthropic:default

n: 54
nulls: 25 (46.3%)
unique: 25
min: 7.9
max: 18.5
mean: 12.26
median: 11.9
std: 3.315
q1: 9.5
q3: 14.8
iqr: 5.3
skew: 0.3168
kurtosis: -1.132
n_outliers: 0
outlier_rate: 0
zero_rate: 0

spouse_unemployment_rate

numeric feature null_rate outliers

This column records a spouse unemployment rate, expressed as a percentage; in a state-level military dataset it most likely refers to military spouses in each state, though the profile does not say. It is missing for 64.81% of records, which suggests the source covers only a subset of states rather than a row-level data entry failure. Among the 19 non-null observations, values range from 7.35 to 16.28 with a mean of ~10.46 and median of ~10.34; the distribution is moderately right-skewed (skew 0.77, kurtosis 0.08), and one outlier at 16.28 sits just beyond the upper fence. Only 15 unique values across 19 observations indicates repeated values, possibly from the duplicated state rows or from rounding. Treatment: Check which states are covered before use; model missingness as a binary indicator or subset to non-null rows for spouse-specific analysis; investigate the single outlier at 16.28 before inclusion. medium · anthropic:default

n: 54
nulls: 35 (64.8%)
unique: 15
min: 7.35
max: 16.28
mean: 10.46
median: 10.34
std: 2.382
q1: 8.59
q3: 11.45
iqr: 2.86
skew: 0.7742
kurtosis: 0.07957
n_outliers: 1
outlier_rate: 0.05263
zero_rate: 0

spouse_labor_force_participation

numeric feature null_rate

This column represents the labor force participation rate (likely as a percentage) of spouses in a surveyed population. The most striking feature is a null rate of 64.81% — nearly two-thirds of the 54 rows are missing, triggering an alert and severely limiting usability. Among the 19 non-null observations, values are tightly clustered between 66.8 and 73.4 with a mean of ~69.7 and IQR of only 2.2, suggesting very low variance and minimal outlier risk within the observed subset. Treatment: Investigate missingness mechanism before use; if MAR/MCAR, consider imputation with caution given only 19 valid observations across 15 unique values. medium · anthropic:default

n: 54
nulls: 35 (64.8%)
unique: 15
min: 66.8
max: 73.4
mean: 69.69
median: 69.8
std: 1.68
q1: 68.35
q3: 70.55
iqr: 2.2
skew: 0.3659
kurtosis: -0.3936
n_outliers: 0
outlier_rate: 0
zero_rate: 0

va_utilization_pct

numeric feature

This column represents a VA (Veterans Affairs) utilization percentage, most likely the share of each state's veterans using VA services, expressed as a numeric rate across 54 records. The distribution is notably uniform and platykurtic (kurtosis ≈ −1.05), meaning values are spread broadly and flatly across the range 13.8–42.3 with no outliers and near-zero skew (0.165). The mean (27.04) and median (26.2) are closely aligned, and the IQR of 12.48 spans a moderate band, suggesting this metric reflects genuine variation across states rather than a concentrated signal. Treatment: Use as-is in modelling; no transformation needed given near-symmetric, outlier-free distribution. high · anthropic:default

n: 54
nulls: 0 (0.0%)
unique: 50
min: 13.8
max: 42.3
mean: 27.04
median: 26.2
std: 7.952
q1: 21
q3: 33.48
iqr: 12.48
skew: 0.1651
kurtosis: -1.051
n_outliers: 0
outlier_rate: 0
zero_rate: 0

installation

categorical label long_tail null_rate

This column records U.S. military installation names (bases and airfields) associated with records in the dataset. The most striking issue is that 77.78% of the 54 rows are null, leaving only 12 non-null values — each appearing exactly once, yielding a perfectly uniform distribution (entropy_ratio = 1.0) with no dominant installation. The long-tail alert is somewhat misleading given the uniformity; the real concern is the extreme missingness, which severely limits analytical utility. Treatment: Investigate source of 77.78% nulls before use; if imputation is not feasible, treat as a sparse categorical feature or exclude from models dependent on this column. high · anthropic:default

n: 54
nulls: 42 (77.8%)
unique: 12
top_value: Naval Base San Diego, CA
top_rate: 0.08333
cardinality: 12
entropy: 3.585
entropy_ratio: 1

county

categorical metadata long_tail null_rate

This column represents a US county name associated with each record, likely geographic metadata for a location or address field. The most striking issue is a 77.78% null rate — only 12 of 54 rows have a value at all, rendering the column nearly empty. Among the 12 non-null values, cardinality is 10 with a near-uniform distribution (entropy ratio 0.979), meaning almost every populated entry is a distinct county, with only San Diego and Cumberland appearing twice. Treatment: Investigate source of missing values before use; with 77.78% nulls the column is unreliable as a feature without significant imputation or enrichment. high · anthropic:default

n: 54
nulls: 42 (77.8%)
unique: 10
top_value: San Diego
top_rate: 0.1667
cardinality: 10
entropy: 3.252
entropy_ratio: 0.9788

annual_economic_impact_millions

numeric feature null_rate

This column records the estimated annual economic impact of a military installation in millions of currency units (presumably U.S. dollars), but 77.78% of the 54 rows are null: only 12 rows carry a value, and all 12 values are distinct. The non-null values span 15,000 to 41,000 with a mean of ~23,833 and an IQR of 10,250, indicating a wide but plausible spread across large installations; skew is moderate (0.87) and no outliers are flagged. The null count matches the installation column exactly (42 rows), which strongly suggests the field is populated only for rows that name an installation, not missing at random. Treatment: Treat as an installation-level attribute; analyse it on the 12 populated rows only, or drop it from state-level models that cannot tolerate this much missingness. medium · anthropic:default

n: 54
nulls: 42 (77.8%)
unique: 12
min: 15,000
max: 41,000
mean: 2.383e+04
median: 21,500
std: 8178
q1: 17,750
q3: 28,000
iqr: 10,250
skew: 0.8718
kurtosis: -0.3627
n_outliers: 0
outlier_rate: 0
zero_rate: 0

total_personnel

numeric feature null_rate

This column represents total personnel counts, most likely headcount at the military installation named on the same row. The most striking issue is a null rate of 77.78% (only 12 of 54 rows have a value), making it severely under-populated and potentially unreliable for modelling. The 12 non-null values are all distinct, ranging from 27,000 to 82,000 with a mean of ~46,083 and mild positive skew (0.855). Excess kurtosis is close to zero (≈ 0.006), so tail weight is roughly normal, and there are no outliers, so the non-null values themselves are internally well-behaved. Treatment: Investigate source of 77.78% missingness before use; impute or drop depending on missingness mechanism, then consider log-transform given positive skew. medium · anthropic:default

n: 54
nulls: 42 (77.8%)
unique: 12
min: 27,000
max: 82,000
mean: 4.608e+04
median: 43,500
std: 1.621e+04
q1: 34,250
q3: 53,500
iqr: 19,250
skew: 0.8549
kurtosis: 0.005583
n_outliers: 0
outlier_rate: 0
zero_rate: 0

direct_jobs

numeric feature null_rate

This column records the count of direct jobs associated with each record, most likely jobs supported by the military installation named on the same row, with values ranging from 8,500 to 25,000 and a mean of ~14,292. The most striking signal is an extremely high null rate of 77.78%, meaning only 12 of 54 rows carry a value, which severely limits its usability. All 12 populated rows hold distinct figures, and the round endpoints (8,500 and 25,000) suggest rounded estimates rather than exact counts. Distribution is mildly right-skewed (skew 0.81) with no outliers detected. Treatment: Impute or flag missing values (77.78% null rate) before use; consider whether missingness is systematic before including in any model. medium · anthropic:default

n: 54
nulls: 42 (77.8%)
unique: 12
min: 8,500
max: 25,000
mean: 1.429e+04
median: 13,500
std: 4883
q1: 10,750
q3: 16,500
iqr: 5,750
skew: 0.8149
kurtosis: -0.06741
n_outliers: 0
outlier_rate: 0
zero_rate: 0

ffl_per_100k

numeric feature high_skew outliers

This column represents the number of Federal Firearms Licensees (FFLs) per 100,000 residents, almost certainly measured at the U.S. state (or territory) level given n=54. The distribution is severely right-skewed (skew=2.47, kurtosis=5.08): the median is only 12.37 but the mean is pulled to 49.44 by extreme outliers, with a maximum of 341.66 — over 27× the median. Eight observations (15.1%) are flagged as outliers, likely small-population states or territories where per-capita rates explode due to a low denominator. Treatment: Log-transform before modelling to reduce skew; investigate the 8 outliers for small-denominator population artifacts before inclusion. high · anthropic:default

n: 54
nulls: 1 (1.9%)
unique: 49
min: 1.377
max: 341.7
mean: 49.44
median: 12.37
std: 87.37
q1: 7.034
q3: 31.09
iqr: 24.05
skew: 2.474
kurtosis: 5.076
n_outliers: 8
outlier_rate: 0.1509
zero_rate: 0

active_duty_per_100k

numeric feature high_skew outliers

This column represents active-duty military personnel per 100,000 residents, measured at the U.S. state level (the state column has 50 unique values, with three states repeated to reach n=54). The distribution is severely right-skewed (skew=2.91, kurtosis=7.22): the median is only 92.5 yet the mean is 610.7 and the max reaches 5,544, driven by 9 outliers (17% of rows), plausibly states with large military installations relative to their population, such as Hawaii, Alaska, or Virginia. The IQR of 301 versus a std of 1,389 further confirms extreme concentration at low values with a long heavy tail. Treatment: Log-transform (log1p) before regression or clustering to reduce skew; consider flagging or separately modelling the 9 outlier rows. high · anthropic:default

n: 54
nulls: 1 (1.9%)
unique: 49
min: 3.105
max: 5544
mean: 610.7
median: 92.49
std: 1389
q1: 21.96
q3: 323.3
iqr: 301.3
skew: 2.911
kurtosis: 7.217
n_outliers: 9
outlier_rate: 0.1698
zero_rate: 0
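
The log1p transform recommended above can be sketched with the standard library alone. `log1p_transform` is an illustrative helper, and the three input values are the min, median, and max reported in this profile, not rows from the file.

```python
import math


def log1p_transform(values):
    """Apply log(1 + x) to each value, passing missing values (None) through unchanged."""
    return [math.log1p(v) if v is not None else None for v in values]


# min, median, and max of active_duty_per_100k as reported in the profile
summary = [3.105, 92.49, 5544]
print([round(x, 2) for x in log1p_transform(summary)])
```

On the raw scale the max is roughly 60 times the median. After the transform the three summary points are spaced far more evenly, which is the property regression and clustering methods benefit from.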