data-trove-us-military-veteran-analysis

Overview

Source: /home/coolhand/html/datavis/data_trove/demographic/veterans/military_firearm_merged_analysis.csv

Saturn profiled 54 rows across 23 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

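# Install the profiler into the environment used by this notebook kernel.
%pip install saturn-dissect

import subprocess

# Profile the source CSV, write the findings to JSON, and request the opt-in LLM prose.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/demographic/veterans/military_firearm_merged_analysis.csv",
        "--findings", "data-trove-us-military-veteran-analysis.json",
        "--llm", "anthropic:default",
    ],
    check=True,  # raise CalledProcessError if the CLI exits with a non-zero status
)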

Summary confidence: medium

This is a 54-row, state-level dataset merging U.S. military and veteran demographics with firearm licensing, suicide rates, and installation-level economic data. The most striking signal is the veteran suicide rate (mean 35.6, range 24.9–52.3), which is roughly double the civilian suicide rate (mean 17.2, range 7.7–28.9); the veteran_risk_ratio column quantifies this gap directly (mean 2.2x across states). A second area worth scrutiny is the extreme right skew in active_duty_per_100k (median 92, max 5,544) and ffl_per_100k (median 12, max 342), which suggests that a handful of states, likely those hosting large installations, dominate the upper tail of both distributions. The installation-level columns (installation, county, economic impact, personnel, jobs) are 77.8% null, so installation-linked data covers only 12 of the 54 rows (about 22%). Analysts should examine how firearm density and military concentration interact with veteran mental health outcomes across states.

citing: veteran_suicide_rate.stats.mean · veteran_suicide_rate.stats.max · civilian_suicide_rate.stats.mean · veteran_risk_ratio.stats.mean · active_duty_per_100k.stats.median · active_duty_per_100k.stats.max · active_duty_per_100k.alerts · ffl_per_100k.stats.median · ffl_per_100k.stats.max · county.null_rate · installation.null_rate · ptsd_prevalence_pct.null_rate · row_count

Out[4]:

saturn.schema() · 23 columns

column	kind	n	null%	unique	alerts
NAME	categorical	54	1.9%	49	long_tail
state	categorical	54	0.0%	50	long_tail
veteran_population	numeric	54	1.9%	49
total_population	numeric	54	1.9%	49
veteran_percentage	numeric	54	1.9%	49	high_skew outliers
active_duty_personnel	numeric	54	0.0%	50	outliers
ownership_percentage	numeric	54	0.0%	48	outliers
ffl_count	numeric	54	0.0%	50	outliers
veteran_suicide_rate	numeric	54	0.0%	50
civilian_suicide_rate	numeric	54	0.0%	50
veteran_risk_ratio	numeric	54	0.0%	41
ptsd_prevalence_pct	numeric	54	46.3%	25	null_rate
va_users_with_ptsd_pct	numeric	54	46.3%	25	null_rate
spouse_unemployment_rate	numeric	54	64.8%	15	null_rate outliers
spouse_labor_force_participation	numeric	54	64.8%	15	null_rate
va_utilization_pct	numeric	54	0.0%	50
installation	categorical	54	77.8%	12	long_tail null_rate
county	categorical	54	77.8%	10	long_tail null_rate
annual_economic_impact_millions	numeric	54	77.8%	12	null_rate
total_personnel	numeric	54	77.8%	12	null_rate
direct_jobs	numeric	54	77.8%	12	null_rate
ffl_per_100k	numeric	54	1.9%	49	high_skew outliers
active_duty_per_100k	numeric	54	1.9%	49	high_skew outliers

Fig 1.

veteran_suicide_rate · Look for the spread and upper tail — rates above 40 in some states reveal where veteran suicide risk is most acute.

Histogram bins for veteran_suicide_rate (median: 34.65).
bin	count
24.9 – 28.81	12
28.81 – 32.73	10
32.73 – 36.64	10
36.64 – 40.56	8
40.56 – 44.47	6
44.47 – 48.39	4
48.39 – 52.3	4
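
The bin edges above are consistent with seven equal-width bins between the column's minimum and maximum. A minimal sketch of that arithmetic follows; the function name `equal_width_edges` is hypothetical, and whether Saturn builds its bins this way is an assumption inferred from the printed edges, not something the report states.

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# min and max of veteran_suicide_rate, taken from the stats in this report
edges = equal_width_edges(24.9, 52.3, 7)
print([round(e, 2) for e in edges])
```

Rounded to two decimals, the edges agree with the bin boundaries listed in the table.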

Fig 2.

civilian_suicide_rate · Compare this distribution against the veteran rate to visually confirm the roughly 2x risk gap across states.

Histogram bins for civilian_suicide_rate (median: 17.1).
bin	count
7.7 – 10.73	10
10.73 – 13.76	9
13.76 – 16.79	7
16.79 – 19.81	9
19.81 – 22.84	7
22.84 – 25.87	7
25.87 – 28.9	5

Fig 3.

ffl_per_100k · A heavily right-skewed distribution with a max of 342 — identify the outlier states driving firearm density far above the median of 12.

Histogram bins for ffl_per_100k (median: 12.37).
bin	count
1.377 – 49.99	41
49.99 – 98.6	4
98.6 – 147.2	2
147.2 – 195.8	1
195.8 – 244.4	2
244.4 – 293	0
293 – 341.7	3
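
The `_per_100k` columns read as population-normalised versions of the raw counts. The sketch below shows that normalisation under the assumption that ffl_per_100k is ffl_count divided by total_population and scaled to 100,000 residents; the report does not state the formula, and `per_100k` is a hypothetical helper.

```python
def per_100k(count: float, population: float) -> float:
    """Express a raw count as a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population

# Hypothetical state (not a row from the CSV): 2,000 licensees, 10 million residents.
rate = per_100k(2_000, 10_000_000)
print(rate)
```

Normalising this way makes small states with many licensees stand out, which would be one explanation for the long upper tail in the histogram.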

Fig 4.

va_utilization_pct · VA utilization ranges from 14% to 42%, worth comparing against suicide rates to see if access to care correlates with outcomes.

Histogram bins for va_utilization_pct (median: 26.2).
bin	count
13.8 – 17.87	8
17.87 – 21.94	9
21.94 – 26.01	10
26.01 – 30.09	7
30.09 – 34.16	8
34.16 – 38.23	7
38.23 – 42.3	5

Fig 5.

state · California, Texas, and North Carolina appear multiple times, reflecting multiple installations — check whether multi-row states skew aggregate metrics.

Top values for state (20 unique shown, of 50 total).
value	count	share
California	3	5.6%
North Carolina	2	3.7%
Texas	2	3.7%
Alabama	1	1.9%
Alaska	1	1.9%
Arizona	1	1.9%
Arkansas	1	1.9%
Colorado	1	1.9%
Connecticut	1	1.9%
Delaware	1	1.9%
Florida	1	1.9%
Georgia	1	1.9%
Hawaii	1	1.9%
Idaho	1	1.9%
Illinois	1	1.9%
Indiana	1	1.9%
Iowa	1	1.9%
Kansas	1	1.9%
Kentucky	1	1.9%
Louisiana	1	1.9%

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
NAME	categorical	1.9%
state	categorical	0.0%
veteran_population	numeric	1.9%
total_population	numeric	1.9%
veteran_percentage	numeric	1.9%
active_duty_personnel	numeric	0.0%
ownership_percentage	numeric	0.0%
ffl_count	numeric	0.0%
veteran_suicide_rate	numeric	0.0%
civilian_suicide_rate	numeric	0.0%
veteran_risk_ratio	numeric	0.0%
ptsd_prevalence_pct	numeric	46.3%
va_users_with_ptsd_pct	numeric	46.3%
spouse_unemployment_rate	numeric	64.8%
spouse_labor_force_participation	numeric	64.8%
va_utilization_pct	numeric	0.0%
installation	categorical	77.8%
county	categorical	77.8%
annual_economic_impact_millions	numeric	77.8%
total_personnel	numeric	77.8%
direct_jobs	numeric	77.8%
ffl_per_100k	numeric	1.9%
active_duty_per_100k	numeric	1.9%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
	veteran_population	total_population	veteran_percentage	active_duty_personnel	ownership_percentage	ffl_count	veteran_suicide_rate	civilian_suicide_rate	veteran_risk_ratio	ptsd_prevalence_pct	va_users_with_ptsd_pct	spouse_unemployment_rate
veteran_population	+1.00	+0.37	+0.36	-0.79	+0.28	-0.73	+0.30	+0.34	-0.48	-0.26	-0.23	-0.35
total_population	+0.37	+1.00	-0.40	-0.62	+0.80	-0.63	+0.74	+0.74	-0.75	+0.51	+0.52	+0.38
veteran_percentage	+0.36	-0.40	+1.00	-0.31	-0.17	-0.36	-0.21	-0.21	+0.15	-0.20	-0.21	-0.35
active_duty_personnel	-0.79	-0.62	-0.31	+1.00	-0.53	+0.96	-0.60	-0.61	+0.70	-0.28	-0.29	-0.03
ownership_percentage	+0.28	+0.80	-0.17	-0.53	+1.00	-0.60	+0.87	+0.87	-0.89	+0.66	+0.67	+0.53
ffl_count	-0.73	-0.63	-0.36	+0.96	-0.60	+1.00	-0.60	-0.61	+0.68	-0.35	-0.37	-0.16
veteran_suicide_rate	+0.30	+0.74	-0.21	-0.60	+0.87	-0.60	+1.00	+1.00	-0.95	+0.56	+0.58	+0.67
civilian_suicide_rate	+0.34	+0.74	-0.21	-0.61	+0.87	-0.61	+1.00	+1.00	-0.97	+0.54	+0.56	+0.63
veteran_risk_ratio	-0.48	-0.75	+0.15	+0.70	-0.89	+0.68	-0.95	-0.97	+1.00	-0.54	-0.57	-0.48
ptsd_prevalence_pct	-0.26	+0.51	-0.20	-0.28	+0.66	-0.35	+0.56	+0.54	-0.54	+1.00	+1.00	+0.60
va_users_with_ptsd_pct	-0.23	+0.52	-0.21	-0.29	+0.67	-0.37	+0.58	+0.56	-0.57	+1.00	+1.00	+0.60
spouse_unemployment_rate	-0.35	+0.38	-0.35	-0.03	+0.53	-0.16	+0.67	+0.63	-0.48	+0.60	+0.60	+1.00
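
The matrix mixes fully populated columns with sparse ones (spouse_unemployment_rate has only 19 non-null rows), so each coefficient may rest on a different number of rows. The report does not say how Saturn treats nulls here; the sketch below shows one common choice, pairwise deletion, and returns the number of pairs used alongside the coefficient. The function name and the sample values are made up for illustration.

```python
import math
from typing import Optional, Sequence

def pearson_pairwise(xs: Sequence[Optional[float]],
                     ys: Sequence[Optional[float]]) -> tuple[float, int]:
    """Pearson r over rows where both values are present, plus the pair count."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y), n

# Made-up values; None marks a missing cell.
a = [1.0, 2.0, None, 4.0, 5.0]
b = [2.0, 4.0, 6.0, None, 10.0]
r, n_used = pearson_pairwise(a, b)
print(round(r, 2), n_used)
```

Reporting the pair count next to each coefficient makes it easier to discount correlations that rest on a small subset of rows.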

NAME categorical label

This column contains U.S. state names and serves as a geographic label in this 54-row dataset. With 49 unique values among 53 non-null rows and an entropy ratio of 0.991, cardinality is near-maximal: almost every row has a distinct state name. The top value, 'California', appears only 3 times (5.66% of non-null rows), and the long_tail alert reflects the 46 values that appear exactly once, so the table is close to one row per state with a handful of repeats.

Treatment: Use as a geographic join key or group-by label; deduplicate or aggregate the 7 rows belonging to the 3 repeated states (California ×3, North Carolina ×2, Texas ×2) before any state-level analysis.

anthropic:default · confidence high

Out[13]:

saturn.columns["NAME"].stats

stat	value
n	54
nulls	1 (1.9%)
unique	49
top_value	California
top_rate	0.0566
cardinality	49
entropy	5.563
entropy_ratio	0.9907
alert: long_tail	46 singleton categories
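
The entropy figures in this table can be checked by hand from the value counts: three repeated states (3, 2, and 2 rows) plus 46 singletons over 53 non-null rows. The sketch assumes Saturn reports Shannon entropy in bits and divides by log2 of the number of unique values to get entropy_ratio; that reading is inferred from the numbers rather than documented, and the function name is hypothetical.

```python
import math

def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given value counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# NAME: California x3, North Carolina x2, Texas x2, and 46 singletons (53 non-null rows).
counts = [3, 2, 2] + [1] * 46
entropy = shannon_entropy_bits(counts)
ratio = entropy / math.log2(len(counts))  # normalise by the maximum for 49 categories
print(round(entropy, 3), round(ratio, 4))
```

Under these assumptions the result agrees with the reported entropy (5.563) and entropy_ratio (0.9907) to the printed precision.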

Fig 8.

Top values for NAME.

Top values for NAME (20 unique shown, of 49 total).
value	count	share
California	3	5.6%
North Carolina	2	3.7%
Texas	2	3.7%
Alabama	1	1.9%
Alaska	1	1.9%
Arizona	1	1.9%
Arkansas	1	1.9%
Colorado	1	1.9%
Connecticut	1	1.9%
Delaware	1	1.9%
Florida	1	1.9%
Georgia	1	1.9%
Hawaii	1	1.9%
Idaho	1	1.9%
Illinois	1	1.9%
Indiana	1	1.9%
Iowa	1	1.9%
Kansas	1	1.9%
Kentucky	1	1.9%
Louisiana	1	1.9%
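
The repeated states at the top of this table need collapsing before any one-row-per-state analysis. The sketch below is one way to do that in plain Python, assuming state-level values are identical on every row of a repeated state (unverified against the CSV). The rate values are made up; only the installation names come from this report.

```python
from collections import defaultdict

# Hypothetical rows: the state-level value is repeated on every installation row.
rows = [
    {"state": "California", "veteran_suicide_rate": 30.0, "installation": "Naval Base San Diego, CA"},
    {"state": "California", "veteran_suicide_rate": 30.0, "installation": "Travis AFB, CA"},
    {"state": "California", "veteran_suicide_rate": 30.0, "installation": "Camp Pendleton, CA"},
    {"state": "Alabama", "veteran_suicide_rate": 35.0, "installation": None},
]

def collapse_by_state(rows):
    """Keep one record per state and gather its installations into a list."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["state"]].append(row)
    collapsed = []
    for state, members in grouped.items():
        rates = {m["veteran_suicide_rate"] for m in members}
        if len(rates) > 1:
            # Refuse to pick a value silently when the rows disagree.
            raise ValueError(f"conflicting state-level values for {state}")
        collapsed.append({
            "state": state,
            "veteran_suicide_rate": members[0]["veteran_suicide_rate"],
            "installations": [m["installation"] for m in members if m["installation"]],
        })
    return collapsed

states = collapse_by_state(rows)
print(len(states))
```

Raising on conflicting values, rather than averaging, surfaces merge problems instead of hiding them.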

state categorical label

This column contains U.S. state names, with 50 unique values across only 54 rows — nearly one entry per state, suggesting near-complete national coverage. California appears 3 times (top_rate 5.6%) and North Carolina and Texas appear twice each, while the remaining 47 states appear exactly once. The entropy ratio of 0.991 confirms an almost flat distribution, and the long_tail alert is technically triggered but is largely an artifact of the tiny dataset size rather than meaningful concentration.

Treatment: Use as a categorical grouping key; encode with target or ordinal encoding if modelling, or use as a join/filter dimension for geographic aggregation.

anthropic:default · confidence high

Out[16]:

saturn.columns["state"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	50
top_value	California
top_rate	0.05556
cardinality	50
entropy	5.593
entropy_ratio	0.9909
alert: long_tail	47 singleton categories

Fig 9.

Top values for state.

Top values for state (20 unique shown, of 50 total).
value	count	share
California	3	5.6%
North Carolina	2	3.7%
Texas	2	3.7%
Alabama	1	1.9%
Alaska	1	1.9%
Arizona	1	1.9%
Arkansas	1	1.9%
Colorado	1	1.9%
Connecticut	1	1.9%
Delaware	1	1.9%
Florida	1	1.9%
Georgia	1	1.9%
Hawaii	1	1.9%
Idaho	1	1.9%
Illinois	1	1.9%
Indiana	1	1.9%
Iowa	1	1.9%
Kansas	1	1.9%
Kentucky	1	1.9%
Louisiana	1	1.9%

veteran_population numeric feature

This column represents veteran population counts, likely at the U.S. state or territory level given the 54 rows and the plausible range of 61,090 to 1,786,891. The distribution is remarkably symmetric (skew ≈ 0.009) and platykurtic (kurtosis ≈ −1.35), meaning values are broadly spread across the range with no sharp central peak and no outliers detected. The IQR of 988,439 relative to a mean of ~820,444 indicates substantial spread across geographies, consistent with large population differences between small and large states.

Treatment: Use as-is or normalize by total population to create a veteran share ratio before modelling.

anthropic:default · confidence high

Out[19]:

saturn.columns["veteran_population"].stats

stat	value
n	54
nulls	1 (1.9%)
unique	49
min	61,090
max	1.787e+06
mean	8.204e+05
median	811,743
std	5.29e+05
q1	279,178
q3	1.268e+06
iqr	988,439
skew	0.009014
kurtosis	-1.347
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 10.

Distribution of veteran_population. Vertical dash marks the median.

Histogram bins for veteran_population (median: 811743.0).
bin	count
6.109e+04 – 3.076e+05	15
3.076e+05 – 5.542e+05	4
5.542e+05 – 8.007e+05	6
8.007e+05 – 1.047e+06	7
1.047e+06 – 1.294e+06	9
1.294e+06 – 1.54e+06	8
1.54e+06 – 1.787e+06	4

total_population numeric feature

This column holds total population counts, presumably at the U.S. state level given the value range (min 548,984 to max 39,227,468) and the 54-row table. The distribution is notably flat: kurtosis of -1.24 indicates lighter tails than a normal distribution, skew is near zero (0.093), and the IQR of 22,033,717 spans more than half the full range, confirming wide spread without outliers. The 53 non-null rows contain 49 unique values, so 4 rows repeat a value already present, the same count as the surplus rows for California, North Carolina, and Texas; the 1.85% null rate corresponds to 1 missing record worth investigating.

Treatment: Use as-is or normalize per area/density for modelling; investigate the 1 null row and the 4 repeated values before joining or aggregating.

anthropic:default · confidence high

Out[22]:

saturn.columns["total_population"].stats

stat	value
n	54
nulls	1 (1.9%)
unique	49
min	548,984
max	3.923e+07
mean	1.87e+07
median	1.958e+07
std	1.247e+07
q1	6.898e+06
q3	2.893e+07
iqr	2.203e+07
skew	0.09278
kurtosis	-1.243
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 11.

Distribution of total_population. Vertical dash marks the median.

Histogram bins for total_population (median: 19582629.0).
bin	count
5.49e+05 – 6.074e+06	13
6.074e+06 – 1.16e+07	5
1.16e+07 – 1.713e+07	4
1.713e+07 – 2.265e+07	10
2.265e+07 – 2.818e+07	7
2.818e+07 – 3.37e+07	3
3.37e+07 – 3.923e+07	11

veteran_percentage numeric feature

This column represents veteran percentage, likely a share (%) of veterans within some geographic or demographic unit across 54 records. The distribution is severely right-skewed (skew=6.03, kurtosis=37.997) with a median of 4.85% but a mean of 14.09%, driven by 9 outliers (17% of rows) including a maximum of 277.05 — a value that cannot represent a valid percentage and almost certainly reflects a data quality issue such as a raw count, a decimal-point error, or a different unit entirely. The std of 38.90 dwarfs the IQR of 5.89, confirming extreme contamination from these outliers.

Treatment: Investigate and cap or correct values exceeding 100 (max=277.05 is impossible for a percentage), then consider log-transform or winsorization before modelling.

anthropic:default · confidence high

Out[25]:

saturn.columns["veteran_percentage"].stats

stat	value
n	54
nulls	1 (1.9%)
unique	49
min	0.22
max	277.1
mean	14.09
median	4.85
std	38.9
q1	1.88
q3	7.77
iqr	5.89
skew	6.031
kurtosis	38
n_outliers	9
outlier_rate	0.1698
zero_rate	0
alert: high_skew	skew=+6.03
alert: outliers	17.0% rows beyond 1.5 IQR

Fig 12.

Distribution of veteran_percentage. Vertical dash marks the median.

Histogram bins for veteran_percentage (median: 4.85).
bin	count
0.22 – 39.77	49
39.77 – 79.31	3
79.31 – 118.9	0
118.9 – 158.4	0
158.4 – 198	0
198 – 237.5	0
237.5 – 277.1	1
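
One way to follow up on the impossible maximum is to recompute the share from veteran_population and total_population and flag anything outside 0 to 100. The sketch assumes veteran_percentage is meant to be 100 × veteran_population / total_population, which the report does not state; both function names are hypothetical and the inputs are illustrative.

```python
def veteran_share_pct(veteran_population: float, total_population: float) -> float:
    """Veterans as a percentage of total population."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return 100.0 * veteran_population / total_population

def is_valid_pct(value: float) -> bool:
    """A share of a whole must lie between 0 and 100 inclusive."""
    return 0.0 <= value <= 100.0

# Hypothetical inputs for illustration; not rows from the CSV.
share = veteran_share_pct(500_000, 10_000_000)
print(share)
print(is_valid_pct(277.05))  # the reported maximum fails the range check
```

Comparing the recomputed share with the stored column row by row would show whether the bad values come from a unit error or from a mismatched merge.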

active_duty_personnel numeric feature

This column holds the count of active duty military personnel per record (per state, given the state column), with 54 observations and no nulls. The distribution is right-skewed (skew=1.59): the median is 11,584 but the mean is 34,003, pulled up by a long upper tail stretching to 162,362. Eight records (14.8% of the dataset) are flagged as outliers, indicating a small number of states with disproportionately large active duty populations. The std of 46,995 exceeds the IQR of 38,504, consistent with spread dominated by the high-end values.

Treatment: Log-transform before regression or clustering to reduce skew impact from high-value outliers.

anthropic:default · confidence high

Out[28]:

saturn.columns["active_duty_personnel"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	50
min	1,166
max	162,362
mean	3.4e+04
median	11,584
std	4.7e+04
q1	3884
q3	4.239e+04
iqr	3.85e+04
skew	1.594
kurtosis	1.291
n_outliers	8
outlier_rate	0.1481
zero_rate	0
alert: outliers	14.8% rows beyond 1.5 IQR

Fig 13.

Distribution of active_duty_personnel. Vertical dash marks the median.

Histogram bins for active_duty_personnel (median: 11584.0).
bin	count
1166 – 2.419e+04	37
2.419e+04 – 4.722e+04	4
4.722e+04 – 7.025e+04	3
7.025e+04 – 9.328e+04	2
9.328e+04 – 1.163e+05	2
1.163e+05 – 1.393e+05	3
1.393e+05 – 1.624e+05	3
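
The log-transform suggested for this column can be sanity-checked with the five summary points from the stats table (min, q1, median, q3, max). These are not the raw rows, so the skew values are only indicative; `moment_skewness` is a hypothetical helper using the simple moment-based definition, which may differ from the estimator Saturn uses.

```python
import math

def moment_skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# min, q1, median, q3 (as printed), max of active_duty_personnel from this report
summary = [1166, 3884, 11584, 42390, 162362]
logged = [math.log1p(v) for v in summary]

raw_skew = moment_skewness(summary)
log_skew = moment_skewness(logged)
print(round(raw_skew, 2), round(log_skew, 2))
```

On these summary points the logged values are far closer to symmetric than the raw counts, which is the effect the treatment is after.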

ownership_percentage numeric feature

This column most plausibly records a firearm ownership rate per state; this is an inference from the source file name (military_firearm_merged_analysis.csv) and the neighbouring ffl_count column, not from a data dictionary. Values range from 14.7% to 66.3% across 54 records with no nulls. The distribution is moderately left-skewed (skew = -0.69), with the middle half of values between Q1 40.05% and Q3 51.4%. Five outliers (~9.3% of rows) sit in the lower tail, below the 1.5 IQR fence of about 23%. Excess kurtosis is essentially zero (-0.002), so tail weight is close to that of a normal distribution rather than unusually flat or peaked.

Treatment: Use as-is, or clip the low-end outliers at the IQR fence before modelling; confirm the column's definition (for example, household versus individual ownership) against the source before interpreting it.

anthropic:default · confidence high

Out[31]:

saturn.columns["ownership_percentage"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	48
min	14.7
max	66.3
mean	43.58
median	45.75
std	13.04
q1	40.05
q3	51.4
iqr	11.35
skew	-0.6887
kurtosis	-0.002081
n_outliers	5
outlier_rate	0.09259
zero_rate	0
alert: outliers	9.3% rows beyond 1.5 IQR

Fig 14.

Distribution of ownership_percentage. Vertical dash marks the median.

Histogram bins for ownership_percentage (median: 45.75).
bin	count
14.7 – 22.07	5
22.07 – 29.44	5
29.44 – 36.81	3
36.81 – 44.19	7
44.19 – 51.56	20
51.56 – 58.93	10
58.93 – 66.3	4

ffl_count numeric feature

This column represents a count of Federal Firearms Licensees (FFL) — likely per geographic unit such as state or county — across 54 observations with no nulls. The distribution is right-skewed (skew = 1.66) with a wide IQR of 2496.5 and a standard deviation (2772.66) nearly equal to the mean (3073.11), signaling high dispersion. Five outliers (≈9.3% of rows) pull the tail toward the maximum of 10904, while the minimum is 220 and median only 2096.5, confirming a long upper tail. The kurtosis of 2.0 is moderate, suggesting the outliers are notable but not extreme relative to a normal distribution.

Treatment: Log-transform before regression or modelling to reduce right skew; investigate the 5 outlier rows for data integrity.

anthropic:default · confidence medium

Out[34]:

saturn.columns["ffl_count"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	50
min	220
max	10,904
mean	3073
median	2096
std	2773
q1	1300
q3	3,796
iqr	2496
skew	1.66
kurtosis	2.002
n_outliers	5
outlier_rate	0.09259
zero_rate	0
alert: outliers	9.3% rows beyond 1.5 IQR

Fig 15.

Distribution of ffl_count. Vertical dash marks the median.

Histogram bins for ffl_count (median: 2096.5).
bin	count
220 – 1746	21
1746 – 3273	15
3273 – 4799	11
4799 – 6325	1
6325 – 7851	1
7851 – 9378	0
9378 – 1.09e+04	5

veteran_suicide_rate numeric numeric_target

This column contains veteran suicide rates, likely per 100,000 veterans, covering 54 observations: the state column has 50 unique values, with California, North Carolina, and Texas repeated across multiple rows. Values range from 24.9 to 52.3 with a mean of 35.6 and median of 34.65, indicating a mildly right-skewed distribution (skew=0.498). The spread is wide, with an IQR of 10.85 and a maximum just over double the minimum, which highlights substantial geographic disparity in veteran suicide rates; no statistical outliers were flagged.

Treatment: Use as-is or apply a mild log-transform if residuals show heteroscedasticity; collapse the 4 surplus rows from repeated states (54 rows, 50 unique values) before modelling.

anthropic:default · confidence high

Out[37]:

saturn.columns["veteran_suicide_rate"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	50
min	24.9
max	52.3
mean	35.64
median	34.65
std	7.393
q1	29.8
q3	40.65
iqr	10.85
skew	0.4978
kurtosis	-0.6953
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 16.

Distribution of veteran_suicide_rate. Vertical dash marks the median.

Histogram bins for veteran_suicide_rate (median: 34.65).
bin	count
24.9 – 28.81	12
28.81 – 32.73	10
32.73 – 36.64	10
36.64 – 40.56	8
40.56 – 44.47	6
44.47 – 48.39	4
48.39 – 52.3	4

civilian_suicide_rate numeric feature

This column represents a civilian suicide rate (likely per 100,000 population) across 54 records, with no nulls and no zeros. The distribution is notably well-behaved: near-zero skew (0.17), platykurtic shape (kurtosis −1.09), and no detected outliers, suggesting values spread broadly but uniformly between 7.7 and 28.9. The mean (17.22) and median (17.1) are nearly identical, and the IQR of 9.8 covers a substantial range, indicating genuine cross-unit variation rather than clustering.

Treatment: Use as-is in modelling; no transformation needed given near-symmetric distribution and absence of outliers.

anthropic:default · confidence high

Out[40]:

saturn.columns["civilian_suicide_rate"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	50
min	7.7
max	28.9
mean	17.22
median	17.1
std	6.026
q1	12.2
q3	22
iqr	9.8
skew	0.1737
kurtosis	-1.092
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 17.

Distribution of civilian_suicide_rate. Vertical dash marks the median.

Histogram bins for civilian_suicide_rate (median: 17.1).
bin	count
7.7 – 10.73	10
10.73 – 13.76	9
13.76 – 16.79	7
16.79 – 19.81	9
19.81 – 22.84	7
22.84 – 25.87	7
25.87 – 28.9	5

veteran_risk_ratio numeric feature

This column is a risk ratio that, per the summary above, quantifies the gap between veteran and civilian suicide rates; all 54 rows are populated and no outliers are detected. Values range from 1.8 to 3.23, with a mean of ~2.20 and median of 2.025, so veterans are at elevated risk in every row (all values well above 1.0). The distribution is moderately right-skewed (skew ≈ 0.95) with a tight IQR of 0.59: the middle half of observations falls in the 1.85–2.44 range, while a tail of higher-ratio rows pulls the mean above the median. The 41 unique values out of 54 rows and the mildly negative excess kurtosis (≈ -0.31) suggest a continuous derived metric rather than a categorised score.

Treatment: Use as-is or apply mild log-transform to reduce right skew before regression or classification modelling.

anthropic:default · confidence high

Out[43]:

saturn.columns["veteran_risk_ratio"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	41
min	1.8
max	3.23
mean	2.197
median	2.025
std	0.4101
q1	1.85
q3	2.44
iqr	0.59
skew	0.9459
kurtosis	-0.3143
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 18.

Distribution of veteran_risk_ratio. Vertical dash marks the median.

Histogram bins for veteran_risk_ratio (median: 2.025).
bin	count
1.8 – 2.004	25
2.004 – 2.209	8
2.209 – 2.413	7
2.413 – 2.617	4
2.617 – 2.821	3
2.821 – 3.026	4
3.026 – 3.23	3
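
The reported mean ratio (2.197) is higher than the ratio of the two column means (35.64 / 17.22 ≈ 2.07). That gap is expected if the column is a per-row ratio, because the mean of ratios generally differs from the ratio of means. The sketch assumes veteran_risk_ratio is veteran_suicide_rate divided by civilian_suicide_rate, which the report implies but does not state; the rate pairs are made up to show the arithmetic.

```python
from statistics import mean

def risk_ratio(veteran_rate: float, civilian_rate: float) -> float:
    """Veteran rate divided by civilian rate for one row."""
    if civilian_rate <= 0:
        raise ValueError("civilian_rate must be positive")
    return veteran_rate / civilian_rate

# Made-up (veteran, civilian) rate pairs, chosen only to show the arithmetic.
pairs = [(30.0, 10.0), (40.0, 20.0), (50.0, 25.0)]

mean_of_ratios = mean(risk_ratio(v, c) for v, c in pairs)
ratio_of_means = mean(v for v, _ in pairs) / mean(c for _, c in pairs)
print(round(mean_of_ratios, 3), round(ratio_of_means, 3))
```

Rows with a low civilian rate inflate the per-row ratio, which would also explain the strong negative correlation between the ratio and both rate columns in the matrix.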

ptsd_prevalence_pct numeric feature

This column captures PTSD prevalence as a percentage, likely drawn from epidemiological or clinical survey data across 54 records. The most striking issue is a 46.3% null rate — nearly half the rows are missing, which severely limits usability and warrants investigation into whether missingness is systematic (e.g., tied to a subgroup or data source). Among the 29 non-null values, the distribution is fairly compact (min 6.3, max 12.3, IQR 3.0) with a near-flat kurtosis of −1.05, suggesting a roughly uniform spread rather than a peaked central cluster. Only 25 unique values across 29 non-null rows implies some repeated percentage figures, possibly due to rounding or grouped reporting.

Treatment: Investigate missingness mechanism before imputing; if MAR, impute with group-level median; if MNAR, model missingness as a separate binary indicator.

anthropic:default · confidence medium

Out[46]:

saturn.columns["ptsd_prevalence_pct"].stats

stat	value
n	54
nulls	25 (46.3%)
unique	25
min	6.3
max	12.3
mean	8.621
median	8.3
std	1.848
q1	7.1
q3	10.1
iqr	3
skew	0.4299
kurtosis	-1.049
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	46.3% null

Fig 19.

Distribution of ptsd_prevalence_pct. Vertical dash marks the median.

Histogram bins for ptsd_prevalence_pct (median: 8.3).
bin	count
6.3 – 7.5	10
7.5 – 8.7	6
8.7 – 9.9	5
9.9 – 11.1	4
11.1 – 12.3	4
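
If this column is imputed, keeping a missingness flag next to the filled values preserves the information that nearly half the rows were originally null. The sketch fills gaps with the overall observed median, a simplification of the group-level median mentioned in the treatment, since no grouping variable is defined here. The function name and sample values are hypothetical.

```python
from statistics import median

def impute_with_indicator(values):
    """Return (imputed values, was_missing flags), filling gaps with the observed median."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing

# Made-up percentages; None marks a null cell.
column = [6.3, None, 8.3, 12.3, None]
imputed, was_missing = impute_with_indicator(column)
print(imputed)
print(was_missing)
```

With a 46.3% null rate, median filling compresses the variance considerably, so complete-case analysis may be the safer default.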

va_users_with_ptsd_pct numeric feature

This column represents the percentage of VA users diagnosed with PTSD, likely aggregated at a state or facility level given n=54 (matching U.S. states/territories). The most surprising signal is the 46.3% null rate — nearly half the rows are missing, which severely limits usability and warrants investigation into whether missingness is systematic (e.g., certain facility types or regions not reporting). Among observed values, the distribution is fairly uniform (kurtosis –1.13, near-zero skew of 0.32) ranging from 7.9% to 18.5% with no outliers, suggesting genuine geographic variation rather than data error.

Treatment: Investigate missingness pattern before use; impute or subset to complete cases, then use as-is (no transform needed given the near-symmetric, outlier-free distribution).

anthropic:default · confidence medium

Out[49]:

saturn.columns["va_users_with_ptsd_pct"].stats

stat	value
n	54
nulls	25 (46.3%)
unique	25
min	7.9
max	18.5
mean	12.26
median	11.9
std	3.315
q1	9.5
q3	14.8
iqr	5.3
skew	0.3168
kurtosis	-1.132
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	46.3% null

Fig 20.

Distribution of va_users_with_ptsd_pct. Vertical dash marks the median.

Histogram bins for va_users_with_ptsd_pct (median: 11.9).
bin	count
7.9 – 10.02	10
10.02 – 12.14	5
12.14 – 14.26	5
14.26 – 16.38	5
16.38 – 18.5	4

spouse_unemployment_rate numeric feature

This column records a spouse unemployment rate, expressed as a percentage; because each row is a state rather than an individual respondent, it is presumably an aggregate rate for military spouses. It is missing for 64.81% of records, which more likely reflects limited source coverage (only 19 rows carry a value) than anything about individual households, so the missingness mechanism should be checked against the source. Among the 19 non-null observations, values range from 7.35 to 16.28 with a mean of ~10.46 and median of ~10.34; the distribution is moderately right-skewed (skew 0.77, kurtosis 0.08), and one outlier at 16.28 sits just beyond the upper fence. The 15 unique values across 19 observations leave 4 repeats, the same count as the surplus rows for repeated states.

Treatment: Add a binary missingness indicator if the column is used in modelling; impute or subset to non-null rows for any spouse-specific analysis; investigate the single outlier at 16.28 before inclusion.

anthropic:default · confidence medium

Out[52]:

saturn.columns["spouse_unemployment_rate"].stats

stat	value
n	54
nulls	35 (64.8%)
unique	15
min	7.35
max	16.28
mean	10.46
median	10.34
std	2.382
q1	8.59
q3	11.45
iqr	2.86
skew	0.7742
kurtosis	0.07957
n_outliers	1
outlier_rate	0.05263
zero_rate	0
alert: null_rate	64.8% null
alert: outliers	5.3% rows beyond 1.5 IQR
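
The single flagged outlier can be checked against the quartiles in this table. The sketch assumes the "beyond 1.5 IQR" alert uses the usual Tukey fences (q1 − 1.5·IQR and q3 + 1.5·IQR); the helper name is hypothetical.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 of spouse_unemployment_rate from the stats table
lower, upper = tukey_fences(8.59, 11.45)
print(round(lower, 2), round(upper, 2))
print(16.28 > upper)  # is the reported maximum beyond the upper fence?
```

The upper fence works out to about 15.74, so the maximum of 16.28 exceeds it only narrowly.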

Fig 21.

Distribution of spouse_unemployment_rate. Vertical dash marks the median.

Histogram bins for spouse_unemployment_rate (median: 10.34).
bin	count
7.35 – 9.136	6
9.136 – 10.92	7
10.92 – 12.71	2
12.71 – 14.49	3
14.49 – 16.28	1

spouse_labor_force_participation numeric feature

This column represents the labor force participation rate (likely as a percentage) of spouses in a surveyed population. The most striking feature is a null rate of 64.81% — nearly two-thirds of the 54 rows are missing, triggering an alert and severely limiting usability. Among the 19 non-null observations, values are tightly clustered between 66.8 and 73.4 with a mean of ~69.7 and IQR of only 2.2, suggesting very low variance and minimal outlier risk within the observed subset.

Treatment: Investigate missingness mechanism before use; if MAR/MCAR, consider imputation with caution given only 19 valid observations across 15 unique values.

anthropic:default · confidence medium

Out[55]:

saturn.columns["spouse_labor_force_participation"].stats

stat	value
n	54
nulls	35 (64.8%)
unique	15
min	66.8
max	73.4
mean	69.69
median	69.8
std	1.68
q1	68.35
q3	70.55
iqr	2.2
skew	0.3659
kurtosis	-0.3936
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	64.8% null

Fig 22.

Distribution of spouse_labor_force_participation. Vertical dash marks the median.

Histogram bins for spouse_labor_force_participation (median: 69.8).
bin	count
66.8 – 68.12	4
68.12 – 69.44	4
69.44 – 70.76	6
70.76 – 72.08	3
72.08 – 73.4	2

va_utilization_pct numeric feature

This column represents a utilization percentage for VA (likely Veterans Affairs) resources or capacity, expressed as a numeric rate across 54 records. The distribution is notably uniform and platykurtic (kurtosis ≈ −1.05), meaning values are spread broadly and flatly across the range 13.8–42.3 with no outliers and near-zero skew (0.165). The mean (27.04) and median (26.2) are tightly aligned, and the IQR of 12.48 spans a moderate but consistent band, suggesting this metric reflects genuine variation across units or time periods rather than a concentrated signal.

Treatment: Use as-is in modelling; no transformation needed given near-symmetric, outlier-free distribution.

anthropic:default · confidence high

Out[58]:

saturn.columns["va_utilization_pct"].stats

stat	value
n	54
nulls	0 (0.0%)
unique	50
min	13.8
max	42.3
mean	27.04
median	26.2
std	7.952
q1	21
q3	33.48
iqr	12.48
skew	0.1651
kurtosis	-1.051
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 23.

Distribution of va_utilization_pct. Vertical dash marks the median.

Histogram bins for va_utilization_pct (median: 26.2).
bin	count
13.8 – 17.87	8
17.87 – 21.94	9
21.94 – 26.01	10
26.01 – 30.09	7
30.09 – 34.16	8
34.16 – 38.23	7
38.23 – 42.3	5

installation categorical label

This column records U.S. military installation names (bases and airfields) associated with records in the dataset. The most striking issue is that 77.78% of the 54 rows are null, leaving only 12 non-null values, each appearing exactly once, which yields a perfectly uniform distribution (entropy_ratio = 1.0) with no dominant installation. Two of those values, 'Fort Liberty, NC' and 'Fort Bragg (Liberty), NC', appear to name the same installation under different labels, so the number of distinct installations may be 11 rather than 12. The long_tail alert is an artifact of every value being a singleton; the real concern is the extreme missingness, which severely limits analytical utility.

Treatment: Investigate the source of the 77.78% nulls before use and reconcile duplicate installation labels; if imputation is not feasible, treat as a sparse categorical feature or exclude it from models that depend on this column.

anthropic:default · confidence high

Out[61]:

saturn.columns["installation"].stats

stat	value
n	54
nulls	42 (77.8%)
unique	12
top_value	Naval Base San Diego, CA
top_rate	0.08333
cardinality	12
entropy	3.585
entropy_ratio	1
alert: long_tail	12 singleton categories
alert: null_rate	77.8% null

Fig 24.

Top values for installation.

Top values for installation (12 unique shown, of 12 total).
value	count	share
Naval Base San Diego, CA	1	1.9%
Travis AFB, CA	1	1.9%
Camp Pendleton, CA	1	1.9%
Eglin AFB, FL	1	1.9%
Fort Stewart, GA	1	1.9%
Fort Campbell, KY	1	1.9%
Fort Liberty, NC	1	1.9%
Fort Bragg (Liberty), NC	1	1.9%
Joint Base San Antonio, TX	1	1.9%
Fort Cavazos, TX	1	1.9%
Naval Station Norfolk, VA	1	1.9%
Naval Base Kitsap, WA	1	1.9%

county categorical metadata

This column represents a US county name associated with each record, likely geographic metadata for a location or address field. The most striking issue is a 77.78% null rate — only 12 of 54 rows have a value at all, rendering the column nearly empty. Among the 12 non-null values, cardinality is 10 with a near-uniform distribution (entropy ratio 0.979), meaning almost every populated entry is a distinct county, with only San Diego and Cumberland appearing twice.

Treatment: Investigate source of missing values before use; with 77.78% nulls the column is unreliable as a feature without significant imputation or enrichment.

anthropic:default · confidence high

Out[64]:

saturn.columns["county"].stats

stat	value
n	54
nulls	42 (77.8%)
unique	10
top_value	San Diego
top_rate	0.1667
cardinality	10
entropy	3.252
entropy_ratio	0.9788
alert: long_tail	8 singleton categories
alert: null_rate	77.8% null

Fig 25.

Top values for county.

Top values for county (10 unique shown, of 10 total).
value	count	share
San Diego	2	3.7%
Cumberland	2	3.7%
Solano	1	1.9%
Okaloosa	1	1.9%
Liberty	1	1.9%
Christian	1	1.9%
Bexar	1	1.9%
Bell	1	1.9%
Norfolk	1	1.9%
Kitsap	1	1.9%

annual_economic_impact_millions numeric feature

This column records estimated annual economic impact in millions (presumably U.S. dollars, given the U.S. installations listed; the unit is not stated). 77.78% of the 54 rows are null, so only 12 rows carry a value, and all 12 values are distinct. The non-null values span 15,000 to 41,000 with a mean of ~23,833, a median of 21,500, and an IQR of 10,250, a wide but plausibly real spread for large installations; skew is moderate (0.87) and no outliers are flagged. The extreme null rate is the dominant concern. The null count matches the other installation-level columns (42 of 54), which strongly suggests the field is sparsely populated by design or incompletely collected, not missing at random.

Treatment: Investigate source of 77.78% nulls before use; if imputation is not justified, consider as a sparse auxiliary feature or drop depending on model tolerance for missingness.

anthropic:default · confidence medium

Out[67]:

saturn.columns["annual_economic_impact_millions"].stats

stat	value
n	54
nulls	42 (77.8%)
unique	12
min	15,000
max	41,000
mean	2.383e+04
median	21,500
std	8178
q1	17,750
q3	28,000
iqr	10,250
skew	0.8718
kurtosis	-0.3627
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	77.8% null

Fig 26.

Distribution of annual_economic_impact_millions. Vertical dash marks the median.

Histogram bins for annual_economic_impact_millions (median: 21500.0).
bin	count
1.5e+04 – 2.02e+04	5
2.02e+04 – 2.54e+04	3
2.54e+04 – 3.06e+04	1
3.06e+04 – 3.58e+04	2
3.58e+04 – 4.1e+04	1
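
One way to investigate the source of the nulls is to check whether the installation-level columns are missing on the same rows. The sketch below counts rows by missingness pattern. The column names come from this report; the helper names and the demo rows are hypothetical, and treating empty strings as missing is an assumption about how the CSV encodes nulls.

```python
from collections import Counter

# Installation-level columns named in this report.
INSTALLATION_COLS = [
    "installation", "county", "annual_economic_impact_millions",
    "total_personnel", "direct_jobs",
]

def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing (assumption)."""
    return value is None or str(value).strip() == ""

def missingness_patterns(rows, columns):
    """Count rows by which of the given columns are missing.

    Keys are tuples of booleans, one per column; True means missing.
    """
    return Counter(
        tuple(is_missing(row.get(col)) for col in columns) for row in rows
    )

# Made-up rows for illustration only; these are NOT values from the dataset.
demo_rows = [
    {"installation": "Base A", "county": "X",
     "annual_economic_impact_millions": "1", "total_personnel": "2",
     "direct_jobs": "3"},
    {col: "" for col in INSTALLATION_COLS},
    {col: "" for col in INSTALLATION_COLS},
]

patterns = missingness_patterns(demo_rows, INSTALLATION_COLS)
for pattern, count in patterns.most_common():
    print(dict(zip(INSTALLATION_COLS, pattern)), count)
```

On the real file, rows could be loaded with `csv.DictReader`. If only two patterns appear (all five present, all five missing), the missingness is structural: the fields exist only for rows with an installation record.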

total_personnel numeric feature

This column records total personnel counts, most plausibly headcount per installation given that it has the same null count as the installation column. The dominant issue is a null rate of 77.78%: only 12 of 54 rows have a value, which makes the column severely under-populated and potentially unreliable for modelling. All 12 non-null values are distinct, ranging from 27,000 to 82,000 with a mean of ~46,083 and mild positive skew (0.855). The distribution is approximately mesokurtic (kurtosis ≈ 0.006, read here as excess kurtosis because other columns report negative values), meaning tail weight is close to that of a normal distribution. No outliers are flagged, so the non-null values themselves are internally well-behaved.

Treatment: Investigate source of 77.78% missingness before use; impute or drop depending on missingness mechanism, then consider log-transform given positive skew.

anthropic:default · confidence medium

Out[70]:

saturn.columns["total_personnel"].stats

stat	value
n	54
nulls	42 (77.8%)
unique	12
min	27,000
max	82,000
mean	4.608e+04
median	43,500
std	1.621e+04
q1	34,250
q3	53,500
iqr	19,250
skew	0.8549
kurtosis	0.005583
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	77.8% null

Fig 27.

Distribution of total_personnel. Vertical dash marks the median.

Histogram bins for total_personnel (median: 43500.0).
bin	count
2.7e+04 – 3.8e+04	4
3.8e+04 – 4.9e+04	4
4.9e+04 – 6e+04	2
6e+04 – 7.1e+04	1
7.1e+04 – 8.2e+04	1
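
Tail-weight labels follow the sign of excess kurtosis: near zero is mesokurtic, negative is platykurtic, positive is leptokurtic. The sketch below shows the population formula and a simple labelling rule. It assumes the kurtosis values in this report are excess kurtosis; Saturn's exact estimator (for example, whether it applies a small-sample correction) is not stated, and the tolerance band is an arbitrary illustrative choice.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3.0

def describe_tails(kurt, tolerance=0.1):
    """Label tail weight; the tolerance around 0 is illustrative, not a standard."""
    if kurt > tolerance:
        return "leptokurtic (heavier tails than normal)"
    if kurt < -tolerance:
        return "platykurtic (lighter tails than normal)"
    return "approximately mesokurtic (normal-like tails)"

# Kurtosis values as reported in this report's stats tables.
reported = [
    ("total_personnel", 0.005583),
    ("annual_economic_impact_millions", -0.3627),
    ("active_duty_per_100k", 7.217),
]
for name, kurt in reported:
    print(f"{name}: {describe_tails(kurt)}")
```

With only 12 non-null values, any kurtosis estimate is noisy, so these labels are descriptive at best.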

direct_jobs numeric feature

This column records the count of direct jobs associated with each record, most plausibly jobs attributed to the listed military installation, with values ranging from 8,500 to 25,000 and a mean of ~14,292. The dominant issue is the 77.78% null rate: only 12 of 54 rows carry a value, which severely limits usability. All 12 non-null values are distinct. The round-number summary values (min 8,500, median 13,500, max 25,000) suggest rounded estimates rather than exact counts. The distribution is mildly right-skewed (skew 0.81) with no outliers detected.

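Treatment: Flag missing values before use, and impute only if the missingness mechanism justifies it (with a 77.78% null rate, few observed values are available to impute from); establish whether missingness is systematic before including the column in any model.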

anthropic:default · confidence medium

Out[73]:

saturn.columns["direct_jobs"].stats

stat	value
n	54
nulls	42 (77.8%)
unique	12
min	8,500
max	25,000
mean	1.429e+04
median	13,500
std	4883
q1	10,750
q3	16,500
iqr	5,750
skew	0.8149
kurtosis	-0.06741
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	77.8% null

Fig 28.

Distribution of direct_jobs. Vertical dash marks the median.

Histogram bins for direct_jobs (median: 13500.0).
bin	count
8500 – 1.18e+04	4
1.18e+04 – 1.51e+04	4
1.51e+04 – 1.84e+04	2
1.84e+04 – 2.17e+04	1
2.17e+04 – 2.5e+04	1
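
Flagging missingness, as the treatment for direct_jobs suggests, can be as simple as adding a 0/1 indicator column next to the original. This is a minimal sketch; the helper name, the `_missing` suffix, and the demo rows are hypothetical, and empty strings are assumed to encode nulls.

```python
def add_missing_flag(rows, column, flag_name=None):
    """Return new row dicts with a 0/1 indicator for whether `column` is missing."""
    flag_name = flag_name or f"{column}_missing"
    flagged = []
    for row in rows:
        value = row.get(column)
        # Assumption: None and blank strings both count as missing.
        missing = value is None or str(value).strip() == ""
        flagged.append({**row, flag_name: int(missing)})
    return flagged

# Made-up rows for illustration only; these are NOT values from the dataset.
demo_rows = [
    {"state": "AA", "direct_jobs": "10000"},
    {"state": "BB", "direct_jobs": ""},
    {"state": "CC", "direct_jobs": None},
]

flagged_rows = add_missing_flag(demo_rows, "direct_jobs")
print([row["direct_jobs_missing"] for row in flagged_rows])
```

The indicator keeps the information that a row has no installation record even if the original column is later dropped or imputed.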

ffl_per_100k numeric feature

This column represents the number of Federal Firearms Licensees (FFLs) per 100,000 residents, almost certainly measured at the U.S. state or territory level given n=54. The distribution is severely right-skewed (skew=2.47, kurtosis=5.08): the median is only 12.37, but the mean is pulled up to 49.44 by the upper tail, with a maximum of 341.66, over 27× the median. Eight observations (15.1% of the 53 non-null rows) are flagged as outliers, possibly small-population states or territories where per-capita rates inflate because of a low denominator.

Treatment: Log-transform before modelling to reduce skew; investigate the 8 outliers for small-denominator population artifacts before inclusion.

anthropic:default · confidence high

Out[76]:

saturn.columns["ffl_per_100k"].stats

stat	value
n	54
nulls	1 (1.9%)
unique	49
min	1.377
max	341.7
mean	49.44
median	12.37
std	87.37
q1	7.034
q3	31.09
iqr	24.05
skew	2.474
kurtosis	5.076
n_outliers	8
outlier_rate	0.1509
zero_rate	0
alert: high_skew	skew=+2.47
alert: outliers	15.1% rows beyond 1.5 IQR

Fig 29.

Distribution of ffl_per_100k. Vertical dash marks the median.

Histogram bins for ffl_per_100k (median: 12.368927310188392).
bin	count
1.377 – 49.99	41
49.99 – 98.6	4
98.6 – 147.2	2
147.2 – 195.8	1
195.8 – 244.4	2
244.4 – 293	0
293 – 341.7	3
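
The outlier alert refers to rows "beyond 1.5 IQR". The sketch below computes the corresponding fences from the reported quartiles, assuming the standard Tukey rule (Q1 − 1.5·IQR, Q3 + 1.5·IQR); the report does not state Saturn's exact rule, and the function name is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and extremes reported for ffl_per_100k above.
q1, q3 = 7.034, 31.09
col_min, col_max = 1.377, 341.7

lower, upper = tukey_fences(q1, q3)
print(f"lower fence: {lower:.2f}, upper fence: {upper:.2f}")

# If the minimum is above the lower fence, all flagged rows are on the high side.
low_side_possible = col_min < lower
high_side_possible = col_max > upper
print("low-side outliers possible:", low_side_possible)
print("high-side outliers possible:", high_side_possible)
```

Under this rule the upper fence is about 67.2 and the lower fence is negative, so every flagged row lies in the upper tail. The histogram shows 8 rows at or above 98.6, which is consistent with the reported count of 8 if the four rows in the 49.99 – 98.6 bin fall below the fence.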

active_duty_per_100k numeric feature

This column represents active-duty military personnel per 100,000 residents, almost certainly measured at the U.S. state or territory level (n=54 is consistent with the 50 states plus D.C. and territories). The distribution is severely right-skewed (skew=2.91, kurtosis=7.22): the median is only 92.5, yet the mean is 610.7 and the max reaches 5,544, driven by 9 outliers (17.0% of the 53 non-null rows). These are plausibly states with large installations relative to population, such as Hawaii, Alaska, or Virginia, or small-population territories; the stats alone do not identify them. The IQR of 301 against a std of 1,389 further confirms concentration at low values with a long, heavy tail.

Treatment: Log-transform (log1p) before regression or clustering to reduce skew; consider flagging or separately modelling the 9 outlier rows.

anthropic:default · confidence high

Out[79]:

saturn.columns["active_duty_per_100k"].stats

stat	value
n	54
nulls	1 (1.9%)
unique	49
min	3.105
max	5544
mean	610.7
median	92.49
std	1389
q1	21.96
q3	323.3
iqr	301.3
skew	2.911
kurtosis	7.217
n_outliers	9
outlier_rate	0.1698
zero_rate	0
alert: high_skew	skew=+2.91
alert: outliers	17.0% rows beyond 1.5 IQR

Fig 30.

Distribution of active_duty_per_100k. Vertical dash marks the median.

Histogram bins for active_duty_per_100k (median: 92.48724939519369).
bin	count
3.105 – 794.7	44
794.7 – 1586	3
1586 – 2378	2
2378 – 3170	0
3170 – 3961	0
3961 – 4753	1
4753 – 5544	3
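
To illustrate what the suggested log1p transform does to this column, the sketch below applies it to the reported five-number summary only (no raw rows are used). It shows how far the gap between the median and the maximum shrinks; it is not a substitute for transforming the actual column.

```python
import math

# Five-number summary reported for active_duty_per_100k above.
summary = {"min": 3.105, "q1": 21.96, "median": 92.49, "q3": 323.3, "max": 5544.0}

# log1p(x) = ln(1 + x); it is monotonic, so the ordering of values is preserved.
logged = {name: math.log1p(value) for name, value in summary.items()}

raw_ratio = summary["max"] / summary["median"]
log_ratio = logged["max"] / logged["median"]

for name in summary:
    print(f"{name:>6}: raw={summary[name]:>8.2f}  log1p={logged[name]:.3f}")
print(f"max/median raw: {raw_ratio:.1f}x, after log1p: {log_ratio:.2f}x")
```

On the raw scale the maximum is roughly 60 times the median; after log1p it is less than twice the median, which is why the transform helps distance-based methods such as clustering.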

How to cite