saturn·

data trove us military veteran analysis

saturn notebook · generated 2026-06-22

Overview

Source: /home/coolhand/html/datavis/data_trove/demographic/veterans/military_firearm_merged_analysis.csv

Saturn profiled 54 rows across 23 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
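# Install the saturn-dissect package (IPython shell escape; run inside a notebook).
!pip install saturn-dissect
import subprocess

# Profile the CSV; check=True raises CalledProcessError if the command exits non-zero.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/demographic/veterans/military_firearm_merged_analysis.csv",
    "--findings", "data-trove-us-military-veteran-analysis.json",
    "--llm", "anthropic:default",
], check=True)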

Summary confidence: medium

This is a 54-row, state-level dataset merging U.S. military and veteran demographics with firearm licensing, suicide rates, and installation-level economic data. The most striking signal is the veteran suicide rate (mean 35.6, range 24.9–52.3), which is roughly double the civilian suicide rate (mean 17.2, range 7.7–28.9); the veteran_risk_ratio column quantifies this gap directly (mean 2.2x across rows). A second area worth scrutiny is the extreme right skew in active_duty_per_100k (median 92, max 5,544) and ffl_per_100k (median 12, max 342), which suggests that a handful of states, likely those hosting large installations, dominate these distributions. The installation-level columns (county, installation, economic impact) are 77.8% null, so the installation-linked data covers only 12 of the 54 rows (about 22%). Analysts should examine how firearm density and military concentration interact with veteran mental health outcomes across states.

citing: veteran_suicide_rate.stats.mean · veteran_suicide_rate.stats.max · civilian_suicide_rate.stats.mean · veteran_risk_ratio.stats.mean · active_duty_per_100k.stats.median · active_duty_per_100k.stats.max · active_duty_per_100k.alerts · ffl_per_100k.stats.median · ffl_per_100k.stats.max · county.null_rate · installation.null_rate · ptsd_prevalence_pct.null_rate · row_count

Out[4]:

saturn.schema() · 23 columns

column kind n null% unique alerts
NAME categorical 54 1.9% 49 long_tail
state categorical 54 0.0% 50 long_tail
veteran_population numeric 54 1.9% 49
total_population numeric 54 1.9% 49
veteran_percentage numeric 54 1.9% 49 high_skew outliers
active_duty_personnel numeric 54 0.0% 50 outliers
ownership_percentage numeric 54 0.0% 48 outliers
ffl_count numeric 54 0.0% 50 outliers
veteran_suicide_rate numeric 54 0.0% 50
civilian_suicide_rate numeric 54 0.0% 50
veteran_risk_ratio numeric 54 0.0% 41
ptsd_prevalence_pct numeric 54 46.3% 25 null_rate
va_users_with_ptsd_pct numeric 54 46.3% 25 null_rate
spouse_unemployment_rate numeric 54 64.8% 15 null_rate outliers
spouse_labor_force_participation numeric 54 64.8% 15 null_rate
va_utilization_pct numeric 54 0.0% 50
installation categorical 54 77.8% 12 long_tail null_rate
county categorical 54 77.8% 10 long_tail null_rate
annual_economic_impact_millions numeric 54 77.8% 12 null_rate
total_personnel numeric 54 77.8% 12 null_rate
direct_jobs numeric 54 77.8% 12 null_rate
ffl_per_100k numeric 54 1.9% 49 high_skew outliers
active_duty_per_100k numeric 54 1.9% 49 high_skew outliers
Fig 1.
veteran_suicide_rate · Look for the spread and upper tail — rates above 40 in some states reveal where veteran suicide risk is most acute.
Histogram bins for veteran_suicide_rate (median: 34.65).
bin  count
24.9 – 28.81  12
28.81 – 32.73  10
32.73 – 36.64  10
36.64 – 40.56  8
40.56 – 44.47  6
44.47 – 48.39  4
48.39 – 52.3  4
Fig 2.
civilian_suicide_rate · Compare this distribution against the veteran rate to visually confirm the roughly 2x risk gap across states.
Histogram bins for civilian_suicide_rate (median: 17.1).
bin  count
7.7 – 10.73  10
10.73 – 13.76  9
13.76 – 16.79  7
16.79 – 19.81  9
19.81 – 22.84  7
22.84 – 25.87  7
25.87 – 28.9  5
Fig 3.
ffl_per_100k · A heavily right-skewed distribution with a max of 342 — identify the outlier states driving firearm density far above the median of 12.
Histogram bins for ffl_per_100k (median: 12.37).
bin  count
1.377 – 49.99  41
49.99 – 98.6  4
98.6 – 147.2  2
147.2 – 195.8  1
195.8 – 244.4  2
244.4 – 293  0
293 – 341.7  3
Fig 4.
va_utilization_pct · VA utilization ranges from 14% to 42%, worth comparing against suicide rates to see if access to care correlates with outcomes.
Histogram bins for va_utilization_pct (median: 26.2).
bin  count
13.8 – 17.87  8
17.87 – 21.94  9
21.94 – 26.01  10
26.01 – 30.09  7
30.09 – 34.16  8
34.16 – 38.23  7
38.23 – 42.3  5
Fig 5.
state · California, Texas, and North Carolina appear multiple times, reflecting multiple installations — check whether multi-row states skew aggregate metrics.
Top values for state (20 unique shown, of 50 total).
value  count  share
California  3  5.6%
North Carolina  2  3.7%
Texas  2  3.7%
Alabama  1  1.9%
Alaska  1  1.9%
Arizona  1  1.9%
Arkansas  1  1.9%
Colorado  1  1.9%
Connecticut  1  1.9%
Delaware  1  1.9%
Florida  1  1.9%
Georgia  1  1.9%
Hawaii  1  1.9%
Idaho  1  1.9%
Illinois  1  1.9%
Indiana  1  1.9%
Iowa  1  1.9%
Kansas  1  1.9%
Kentucky  1  1.9%
Louisiana  1  1.9%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column  kind  null %
NAME  categorical  1.9%
state  categorical  0.0%
veteran_population  numeric  1.9%
total_population  numeric  1.9%
veteran_percentage  numeric  1.9%
active_duty_personnel  numeric  0.0%
ownership_percentage  numeric  0.0%
ffl_count  numeric  0.0%
veteran_suicide_rate  numeric  0.0%
civilian_suicide_rate  numeric  0.0%
veteran_risk_ratio  numeric  0.0%
ptsd_prevalence_pct  numeric  46.3%
va_users_with_ptsd_pct  numeric  46.3%
spouse_unemployment_rate  numeric  64.8%
spouse_labor_force_participation  numeric  64.8%
va_utilization_pct  numeric  0.0%
installation  categorical  77.8%
county  categorical  77.8%
annual_economic_impact_millions  numeric  77.8%
total_personnel  numeric  77.8%
direct_jobs  numeric  77.8%
ffl_per_100k  numeric  1.9%
active_duty_per_100k  numeric  1.9%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
column veteran_population total_population veteran_percentage active_duty_personnel ownership_percentage ffl_count veteran_suicide_rate civilian_suicide_rate veteran_risk_ratio ptsd_prevalence_pct va_users_with_ptsd_pct spouse_unemployment_rate
veteran_population +1.00 +0.37 +0.36 -0.79 +0.28 -0.73 +0.30 +0.34 -0.48 -0.26 -0.23 -0.35
total_population +0.37 +1.00 -0.40 -0.62 +0.80 -0.63 +0.74 +0.74 -0.75 +0.51 +0.52 +0.38
veteran_percentage +0.36 -0.40 +1.00 -0.31 -0.17 -0.36 -0.21 -0.21 +0.15 -0.20 -0.21 -0.35
active_duty_personnel -0.79 -0.62 -0.31 +1.00 -0.53 +0.96 -0.60 -0.61 +0.70 -0.28 -0.29 -0.03
ownership_percentage +0.28 +0.80 -0.17 -0.53 +1.00 -0.60 +0.87 +0.87 -0.89 +0.66 +0.67 +0.53
ffl_count -0.73 -0.63 -0.36 +0.96 -0.60 +1.00 -0.60 -0.61 +0.68 -0.35 -0.37 -0.16
veteran_suicide_rate +0.30 +0.74 -0.21 -0.60 +0.87 -0.60 +1.00 +1.00 -0.95 +0.56 +0.58 +0.67
civilian_suicide_rate +0.34 +0.74 -0.21 -0.61 +0.87 -0.61 +1.00 +1.00 -0.97 +0.54 +0.56 +0.63
veteran_risk_ratio -0.48 -0.75 +0.15 +0.70 -0.89 +0.68 -0.95 -0.97 +1.00 -0.54 -0.57 -0.48
ptsd_prevalence_pct -0.26 +0.51 -0.20 -0.28 +0.66 -0.35 +0.56 +0.54 -0.54 +1.00 +1.00 +0.60
va_users_with_ptsd_pct -0.23 +0.52 -0.21 -0.29 +0.67 -0.37 +0.58 +0.56 -0.57 +1.00 +1.00 +0.60
spouse_unemployment_rate -0.35 +0.38 -0.35 -0.03 +0.53 -0.16 +0.67 +0.63 -0.48 +0.60 +0.60 +1.00
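
The coefficients in the matrix above are Pearson correlations. The report describes Saturn's computation only as "sampled, bounded", so the following is a minimal sketch of the standard formula, not Saturn's implementation. The helper name `pearson`, the choice to drop rows where either value is missing, and the sample values are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present (hypothetical helper)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only, not rows from the dataset.
rates_a = [10.0, 12.0, None, 16.0, 20.0]
rates_b = [21.0, 25.0, 30.0, 33.0, 41.0]
r = pearson(rates_a, rates_b)
print(round(r, 2))
```

If nulls are handled by dropping incomplete pairs as in this sketch, any coefficient involving spouse_unemployment_rate rests on at most its 19 non-null rows, so those entries deserve more caution than the ones computed on fully populated columns.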

NAME categorical label

This column contains U.S. state names, functioning as a geographic label or identifier in a small dataset of 54 rows. With 49 unique values and an entropy ratio of 0.991, cardinality is near-maximal — almost every row has a distinct state name. The top value 'California' appears only 3 times (5.66%), and the long-tail alert confirms that most values appear just once, suggesting this may be a nearly one-per-state lookup table with a handful of duplicates.

Treatment: Use as a geographic join key or group-by label; deduplicate or aggregate the three repeated states (California ×3, North Carolina ×2, Texas ×2, which together contribute 4 rows beyond one per state) before any state-level analysis.

anthropic:default · confidence high
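
The treatment above recommends aggregating repeated states before state-level analysis. A minimal sketch of one way to do that, assuming a simple mean per state is acceptable; the helper name `aggregate_by_state` and the sample rows are hypothetical and do not come from the CSV.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_state(rows, key="NAME", value="veteran_suicide_rate"):
    """Collapse repeated states to one entry by averaging a numeric column."""
    grouped = defaultdict(list)
    for row in rows:
        if row[key] is None or row[value] is None:
            continue  # skip rows that cannot be assigned or averaged
        grouped[row[key]].append(row[value])
    return {state: mean(values) for state, values in grouped.items()}


# Made-up values: repeated states carry the same state-level rate on each row.
rows = [
    {"NAME": "California", "veteran_suicide_rate": 30.0},
    {"NAME": "California", "veteran_suicide_rate": 30.0},
    {"NAME": "California", "veteran_suicide_rate": 30.0},
    {"NAME": "Texas", "veteran_suicide_rate": 34.0},
    {"NAME": "Texas", "veteran_suicide_rate": 34.0},
    {"NAME": "Alaska", "veteran_suicide_rate": 40.0},
]
per_state = aggregate_by_state(rows)

row_level_mean = mean(r["veteran_suicide_rate"] for r in rows)
state_level_mean = mean(per_state.values())
print(row_level_mean, round(state_level_mean, 2))
```

In this toy input the row-level mean is lower than the state-level mean because the repeated states are counted several times, which is the kind of distortion the treatment warns about.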
Out[13]:

saturn.columns["NAME"].stats

stat value
n 54
nulls 1 (1.9%)
unique 49
top_value California
top_rate 0.0566
cardinality 49
entropy 5.563
entropy_ratio 0.9907
alert: long_tail 46 singleton categories
Fig 8.
Top values for NAME.
Top values for NAME (20 unique shown, of 49 total).
value  count  share
California  3  5.6%
North Carolina  2  3.7%
Texas  2  3.7%
Alabama  1  1.9%
Alaska  1  1.9%
Arizona  1  1.9%
Arkansas  1  1.9%
Colorado  1  1.9%
Connecticut  1  1.9%
Delaware  1  1.9%
Florida  1  1.9%
Georgia  1  1.9%
Hawaii  1  1.9%
Idaho  1  1.9%
Illinois  1  1.9%
Indiana  1  1.9%
Iowa  1  1.9%
Kansas  1  1.9%
Kentucky  1  1.9%
Louisiana  1  1.9%

state categorical label

This column contains U.S. state names, with 50 unique values across only 54 rows — nearly one entry per state, suggesting near-complete national coverage. California appears 3 times (top_rate 5.6%) and North Carolina and Texas appear twice each, while the remaining 47 states appear exactly once. The entropy ratio of 0.991 confirms an almost flat distribution, and the long_tail alert is technically triggered but is largely an artifact of the tiny dataset size rather than meaningful concentration.

Treatment: Use as a categorical grouping key; encode with target or ordinal encoding if modelling, or use as a join/filter dimension for geographic aggregation.

anthropic:default · confidence high
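
The entropy and entropy_ratio figures can be reproduced from the value counts. The sketch below assumes Shannon entropy in bits, divided by log2 of the cardinality; this formula is inferred from the reported numbers, not taken from Saturn's documentation, and the helper name `entropy_bits` is hypothetical.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# state column as described above: California x3, North Carolina x2, Texas x2,
# and 47 states that appear once (54 rows, 50 categories).
state_counts = [3, 2, 2] + [1] * 47
entropy = entropy_bits(state_counts)

# Normalise by the maximum possible entropy for this many categories.
entropy_ratio = entropy / math.log2(len(state_counts))
print(round(entropy, 3), round(entropy_ratio, 4))
```

Under these assumptions the result agrees with the reported entropy of 5.593 and entropy_ratio of 0.9909 for state, which supports reading the ratio as "how close to perfectly uniform" the column is.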
Out[16]:

saturn.columns["state"].stats

stat value
n 54
nulls 0 (0.0%)
unique 50
top_value California
top_rate 0.05556
cardinality 50
entropy 5.593
entropy_ratio 0.9909
alert: long_tail 47 singleton categories
Fig 9.
Top values for state.
Top values for state (20 unique shown, of 50 total).
value  count  share
California  3  5.6%
North Carolina  2  3.7%
Texas  2  3.7%
Alabama  1  1.9%
Alaska  1  1.9%
Arizona  1  1.9%
Arkansas  1  1.9%
Colorado  1  1.9%
Connecticut  1  1.9%
Delaware  1  1.9%
Florida  1  1.9%
Georgia  1  1.9%
Hawaii  1  1.9%
Idaho  1  1.9%
Illinois  1  1.9%
Indiana  1  1.9%
Iowa  1  1.9%
Kansas  1  1.9%
Kentucky  1  1.9%
Louisiana  1  1.9%

veteran_population numeric feature

This column represents veteran population counts, likely at the U.S. state or territory level given the 54 rows and the plausible range of 61,090 to 1,786,891. The distribution is remarkably symmetric (skew ≈ 0.009) and platykurtic (kurtosis ≈ −1.35), meaning values are broadly spread across the range with no sharp central peak and no outliers detected. The IQR of 988,439 relative to a mean of ~820,444 indicates substantial spread across geographies, consistent with large population differences between small and large states.

Treatment: Use as-is or normalize by total population to create a veteran share ratio before modelling.

anthropic:default · confidence high
Out[19]:

saturn.columns["veteran_population"].stats

stat value
n 54
nulls 1 (1.9%)
unique 49
min 61,090
max 1.787e+06
mean 8.204e+05
median 811,743
std 5.29e+05
q1 279,178
q3 1.268e+06
iqr 988,439
skew 0.009014
kurtosis -1.347
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 10.
Distribution of veteran_population. Vertical dash marks the median.
Histogram bins for veteran_population (median: 811743.0).
bin  count
6.109e+04 – 3.076e+05  15
3.076e+05 – 5.542e+05  4
5.542e+05 – 8.007e+05  6
8.007e+05 – 1.047e+06  7
1.047e+06 – 1.294e+06  9
1.294e+06 – 1.54e+06  8
1.54e+06 – 1.787e+06  4

total_population numeric feature

This column holds total population counts. Given the value range (min 548,984 to max 39,227,468) and the 54 rows, the units are consistent with U.S. states. The distribution is notably flat and near-uniform: kurtosis of -1.24 indicates lighter tails than normal, skew is near zero (0.093), and the IQR of 22,033,717 spans more than half the full range, confirming wide spread without outliers. The 53 non-null rows hold 49 unique values, so 4 values repeat, which is consistent with the 4 extra rows contributed by the multi-row states (California, North Carolina, Texas); the 1.85% null rate corresponds to 1 missing record worth investigating.

Treatment: Use as-is or normalize per area/density for modelling; investigate the 1 null row and the 4 repeated values before joining or aggregating.

anthropic:default · confidence high
Out[22]:

saturn.columns["total_population"].stats

stat value
n 54
nulls 1 (1.9%)
unique 49
min 548,984
max 3.923e+07
mean 1.87e+07
median 1.958e+07
std 1.247e+07
q1 6.898e+06
q3 2.893e+07
iqr 2.203e+07
skew 0.09278
kurtosis -1.243
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 11.
Distribution of total_population. Vertical dash marks the median.
Histogram bins for total_population (median: 19582629.0).
bin  count
5.49e+05 – 6.074e+06  13
6.074e+06 – 1.16e+07  5
1.16e+07 – 1.713e+07  4
1.713e+07 – 2.265e+07  10
2.265e+07 – 2.818e+07  7
2.818e+07 – 3.37e+07  3
3.37e+07 – 3.923e+07  11

veteran_percentage numeric feature

This column represents veteran percentage, likely a share (%) of veterans within some geographic or demographic unit across 54 records. The distribution is severely right-skewed (skew=6.03, kurtosis=37.997) with a median of 4.85% but a mean of 14.09%, driven by 9 outliers (17% of rows) including a maximum of 277.05 — a value that cannot represent a valid percentage and almost certainly reflects a data quality issue such as a raw count, a decimal-point error, or a different unit entirely. The std of 38.90 dwarfs the IQR of 5.89, confirming extreme contamination from these outliers.

Treatment: Investigate and cap or correct values exceeding 100 (max=277.05 is impossible for a percentage), then consider log-transform or winsorization before modelling.

anthropic:default · confidence high
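
The treatment above has two steps: set aside values that cannot be percentages, then limit the influence of the remaining outliers. A minimal sketch of both, where the helper names `split_valid_percentages` and `winsorize` are hypothetical, the inputs are the reported summary statistics rather than real rows, and capping at the Tukey upper fence is one possible choice of winsorization limit, not the report's prescription.

```python
def split_valid_percentages(values, low=0.0, high=100.0):
    """Separate values inside [low, high] from those that cannot be percentages."""
    valid, suspect = [], []
    for value in values:
        if value is None:
            continue  # nulls are handled separately
        if low <= value <= high:
            valid.append(value)
        else:
            suspect.append(value)
    return valid, suspect


def winsorize(values, lower, upper):
    """Clamp every value into [lower, upper]."""
    return [min(max(value, lower), upper) for value in values]


# Reported min, q1, median, q3, max, used as stand-in inputs, plus one null.
sample = [0.22, 1.88, 4.85, 7.77, 277.05, None]
valid, suspect = split_valid_percentages(sample)

# Tukey upper fence from the reported quartiles: q3 + 1.5 * iqr.
upper_fence = 7.77 + 1.5 * 5.89

# 50.0 is a made-up in-range outlier added to show the clamp taking effect.
capped = winsorize(valid + [50.0], 0.0, upper_fence)
print(suspect, capped)
```

Keeping the suspect values in their own list, instead of silently clamping them to 100, leaves them available for the data-quality investigation the treatment calls for.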
Out[25]:

saturn.columns["veteran_percentage"].stats

stat value
n 54
nulls 1 (1.9%)
unique 49
min 0.22
max 277.1
mean 14.09
median 4.85
std 38.9
q1 1.88
q3 7.77
iqr 5.89
skew 6.031
kurtosis 38
n_outliers 9
outlier_rate 0.1698
zero_rate 0
alert: high_skew skew=+6.03
alert: outliers 17.0% rows beyond 1.5 IQR
Fig 12.
Distribution of veteran_percentage. Vertical dash marks the median.
Histogram bins for veteran_percentage (median: 4.85).
bin  count
0.22 – 39.77  49
39.77 – 79.31  3
79.31 – 118.9  0
118.9 – 158.4  0
158.4 – 198  0
198 – 237.5  0
237.5 – 277.1  1

active_duty_personnel numeric feature

This column holds the count of active duty military personnel per row (a state-level record in this dataset), with 54 observations and no nulls. The distribution is right-skewed (skew=1.59): the median is 11,584 but the mean is 34,003, pulled up by a long upper tail stretching to 162,362. Eight records (14.8% of the dataset) are flagged as outliers, indicating a small number of states with disproportionately large forces. The std of 46,995 exceeds the IQR of 38,504, which is consistent with spread dominated by high-end values.

Treatment: Log-transform before regression or clustering to reduce skew impact from high-value outliers.

anthropic:default · confidence high
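
To illustrate what the recommended log-transform does to this column, the sketch below applies log(1 + x) to the reported summary statistics. The helper name `log_transform` is hypothetical, and using `log1p` rather than a plain logarithm is an assumption (it also tolerates zero counts, although this column has none).

```python
import math


def log_transform(values):
    """Apply log(1 + x) to non-negative counts; nulls pass through unchanged."""
    return [None if v is None else math.log1p(v) for v in values]


# Reported summary statistics for active_duty_personnel: min, median, mean, max.
reported = [1166, 11584, 34003, 162362]
logged = log_transform(reported)

raw_ratio = reported[3] / reported[1]  # max relative to median on the raw scale
log_ratio = logged[3] / logged[1]      # the same comparison on the log scale
print(round(raw_ratio, 1), round(log_ratio, 2))
```

On the raw scale the maximum is about 14 times the median; after the transform it is only about 1.3 times, which is why the largest states stop dominating a regression or a distance-based clustering.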
Out[28]:

saturn.columns["active_duty_personnel"].stats

stat value
n 54
nulls 0 (0.0%)
unique 50
min 1,166
max 162,362
mean 3.4e+04
median 11,584
std 4.7e+04
q1 3884
q3 4.239e+04
iqr 3.85e+04
skew 1.594
kurtosis 1.291
n_outliers 8
outlier_rate 0.1481
zero_rate 0
alert: outliers 14.8% rows beyond 1.5 IQR
Fig 13.
Distribution of active_duty_personnel. Vertical dash marks the median.
Histogram bins for active_duty_personnel (median: 11584.0).
bin  count
1166 – 2.419e+04  37
2.419e+04 – 4.722e+04  4
4.722e+04 – 7.025e+04  3
7.025e+04 – 9.328e+04  2
9.328e+04 – 1.163e+05  2
1.163e+05 – 1.393e+05  3
1.393e+05 – 1.624e+05  3

ownership_percentage numeric feature

This column holds an ownership percentage. Given the dataset's firearm focus, it most plausibly measures firearm ownership per state (an inference from context; the column name alone does not say). Values range from 14.7% to 66.3% across 54 records with no nulls. The distribution is moderately left-skewed (skew = -0.69), with the middle half of values between Q1 40.05% and Q3 51.4%. Five outliers (~9.3% of rows) sit in the lower tail. Excess kurtosis is approximately 0 (-0.002), so tail weight is close to that of a normal distribution rather than unusually flat.

Treatment: Use as-is or clip outliers at the IQR fences before modelling.

anthropic:default · confidence high
Out[31]:

saturn.columns["ownership_percentage"].stats

stat value
n 54
nulls 0 (0.0%)
unique 48
min 14.7
max 66.3
mean 43.58
median 45.75
std 13.04
q1 40.05
q3 51.4
iqr 11.35
skew -0.6887
kurtosis -0.002081
n_outliers 5
outlier_rate 0.09259
zero_rate 0
alert: outliers 9.3% rows beyond 1.5 IQR
Fig 14.
Distribution of ownership_percentage. Vertical dash marks the median.
Histogram bins for ownership_percentage (median: 45.75).
bin  count
14.7 – 22.07  5
22.07 – 29.44  5
29.44 – 36.81  3
36.81 – 44.19  7
44.19 – 51.56  20
51.56 – 58.93  10
58.93 – 66.3  4

ffl_count numeric feature

This column represents a count of Federal Firearms Licensees (FFL) — likely per geographic unit such as state or county — across 54 observations with no nulls. The distribution is right-skewed (skew = 1.66) with a wide IQR of 2496.5 and a standard deviation (2772.66) nearly equal to the mean (3073.11), signaling high dispersion. Five outliers (≈9.3% of rows) pull the tail toward the maximum of 10904, while the minimum is 220 and median only 2096.5, confirming a long upper tail. The kurtosis of 2.0 is moderate, suggesting the outliers are notable but not extreme relative to a normal distribution.

Treatment: Log-transform before regression or modelling to reduce right skew; investigate the 5 outlier rows for data integrity.

anthropic:default · confidence medium
Out[34]:

saturn.columns["ffl_count"].stats

stat value
n 54
nulls 0 (0.0%)
unique 50
min 220
max 10,904
mean 3073
median 2096
std 2773
q1 1300
q3 3,796
iqr 2496
skew 1.66
kurtosis 2.002
n_outliers 5
outlier_rate 0.09259
zero_rate 0
alert: outliers 9.3% rows beyond 1.5 IQR
Fig 15.
Distribution of ffl_count. Vertical dash marks the median.
Histogram bins for ffl_count (median: 2096.5).
bin  count
220 – 1746  21
1746 – 3273  15
3273 – 4799  11
4799 – 6325  1
6325 – 7851  1
7851 – 9378  0
9378 – 1.09e+04  5

veteran_suicide_rate numeric numeric_target

This column contains veteran suicide rates, likely per 100,000 veterans, covering 54 observations with 50 unique values (the state column shows 50 states, with California, North Carolina, and Texas repeated). Values range from 24.9 to 52.3 with a mean of 35.6 and median of 34.65, indicating a mildly right-skewed distribution (skew=0.498). The spread is wide: the IQR is 10.85 and the max is just over double the min, which points to substantial geographic disparity in veteran suicide rates, yet no statistical outliers were flagged.

Treatment: Use as-is or apply mild log-transform if residuals show heteroscedasticity; investigate the 4 duplicate values among 54 rows before modelling.

anthropic:default · confidence high
Out[37]:

saturn.columns["veteran_suicide_rate"].stats

stat value
n 54
nulls 0 (0.0%)
unique 50
min 24.9
max 52.3
mean 35.64
median 34.65
std 7.393
q1 29.8
q3 40.65
iqr 10.85
skew 0.4978
kurtosis -0.6953
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 16.
Distribution of veteran_suicide_rate. Vertical dash marks the median.
Histogram bins for veteran_suicide_rate (median: 34.65).
bin  count
24.9 – 28.81  12
28.81 – 32.73  10
32.73 – 36.64  10
36.64 – 40.56  8
40.56 – 44.47  6
44.47 – 48.39  4
48.39 – 52.3  4

civilian_suicide_rate numeric feature

This column represents a civilian suicide rate (likely per 100,000 population) across 54 records, with no nulls and no zeros. The distribution is notably well-behaved: near-zero skew (0.17), platykurtic shape (kurtosis −1.09), and no detected outliers, suggesting values spread broadly but uniformly between 7.7 and 28.9. The mean (17.22) and median (17.1) are nearly identical, and the IQR of 9.8 covers a substantial range, indicating genuine cross-unit variation rather than clustering.

Treatment: Use as-is in modelling; no transformation needed given near-symmetric distribution and absence of outliers.

anthropic:default · confidence high
Out[40]:

saturn.columns["civilian_suicide_rate"].stats

stat value
n 54
nulls 0 (0.0%)
unique 50
min 7.7
max 28.9
mean 17.22
median 17.1
std 6.026
q1 12.2
q3 22
iqr 9.8
skew 0.1737
kurtosis -1.092
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 17.
Distribution of civilian_suicide_rate. Vertical dash marks the median.
Histogram bins for civilian_suicide_rate (median: 17.1).
bin  count
7.7 – 10.73  10
10.73 – 13.76  9
13.76 – 16.79  7
16.79 – 19.81  9
19.81 – 22.84  7
22.84 – 25.87  7
25.87 – 28.9  5

veteran_risk_ratio numeric feature

This column represents a risk ratio specifically for a veteran population, with all 54 rows populated and no outliers detected. Values range from 1.8 to 3.23, with a mean of ~2.20 and median of 2.025, indicating veterans in this dataset are consistently at elevated risk (all values above 1.0 by a wide margin). The distribution is moderately right-skewed (skew ≈ 0.95) with a relatively tight IQR of 0.59, suggesting most observations cluster in the 1.85–2.44 range but a tail of higher-risk cases pulls the mean upward. The near-platykurtic shape (kurtosis ≈ -0.31) and 41 unique values out of 54 rows suggest this is a continuous derived metric rather than a categorised score.

Treatment: Use as-is or apply mild log-transform to reduce right skew before regression or classification modelling.

anthropic:default · confidence high
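
The report does not state how veteran_risk_ratio is derived. Assuming it is the per-row quotient of veteran_suicide_rate over civilian_suicide_rate, the sketch below shows why the column mean (2.197) need not equal the quotient of the two reported means. The helper name `risk_ratio` is hypothetical and the two sample rows are made up.

```python
def risk_ratio(veteran_rate, civilian_rate):
    """Veteran rate divided by civilian rate (assumed definition of the column)."""
    if civilian_rate <= 0:
        raise ValueError("civilian rate must be positive")
    return veteran_rate / civilian_rate


# Quotient of the reported column means.
ratio_of_means = risk_ratio(35.64, 17.22)

# Made-up rows as (veteran_rate, civilian_rate): a low-rate state and a high-rate state.
rows = [(30.0, 10.0), (40.0, 20.0)]
mean_of_ratios = sum(risk_ratio(v, c) for v, c in rows) / len(rows)
ratio_of_row_means = risk_ratio(
    sum(v for v, _ in rows) / len(rows),
    sum(c for _, c in rows) / len(rows),
)
print(round(ratio_of_means, 2), mean_of_ratios, round(ratio_of_row_means, 2))
```

Averaging per-row ratios weights rows with low civilian rates more heavily than dividing the averages does, so a gap between the reported mean ratio and the quotient of means is expected under this definition and is not by itself a sign of an error.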
Out[43]:

saturn.columns["veteran_risk_ratio"].stats

stat value
n 54
nulls 0 (0.0%)
unique 41
min 1.8
max 3.23
mean 2.197
median 2.025
std 0.4101
q1 1.85
q3 2.44
iqr 0.59
skew 0.9459
kurtosis -0.3143
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 18.
Distribution of veteran_risk_ratio. Vertical dash marks the median.
Histogram bins for veteran_risk_ratio (median: 2.025).
bin  count
1.8 – 2.004  25
2.004 – 2.209  8
2.209 – 2.413  7
2.413 – 2.617  4
2.617 – 2.821  3
2.821 – 3.026  4
3.026 – 3.23  3

ptsd_prevalence_pct numeric feature

This column captures PTSD prevalence as a percentage, likely drawn from epidemiological or clinical survey data across 54 records. The most striking issue is a 46.3% null rate — nearly half the rows are missing, which severely limits usability and warrants investigation into whether missingness is systematic (e.g., tied to a subgroup or data source). Among the 29 non-null values, the distribution is fairly compact (min 6.3, max 12.3, IQR 3.0) with a near-flat kurtosis of −1.05, suggesting a roughly uniform spread rather than a peaked central cluster. Only 25 unique values across 29 non-null rows implies some repeated percentage figures, possibly due to rounding or grouped reporting.

Treatment: Investigate missingness mechanism before imputing; if MAR, impute with group-level median; if MNAR, model missingness as a separate binary indicator.

anthropic:default · confidence medium
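
The treatment above combines two ideas: record which rows were missing, and fill the gaps with a median. A minimal sketch follows, with one stated simplification: it uses the overall median of the observed values as a stand-in for the group-level median, because the report does not say which grouping would apply. The helper name `impute_with_indicator` and the sample list are hypothetical.

```python
from statistics import median


def impute_with_indicator(values):
    """Fill nulls with the observed median and return a parallel missingness flag."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Reported min, median, and max of ptsd_prevalence_pct, interleaved with gaps.
values = [6.3, None, 8.3, None, 12.3]
imputed, was_missing = impute_with_indicator(values)
print(imputed, was_missing)
```

Carrying `was_missing` into a model as its own feature lets the model learn whether missingness itself is informative, which matters if the nulls turn out not to be random.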
Out[46]:

saturn.columns["ptsd_prevalence_pct"].stats

stat value
n 54
nulls 25 (46.3%)
unique 25
min 6.3
max 12.3
mean 8.621
median 8.3
std 1.848
q1 7.1
q3 10.1
iqr 3
skew 0.4299
kurtosis -1.049
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 46.3% null
Fig 19.
Distribution of ptsd_prevalence_pct. Vertical dash marks the median.
Histogram bins for ptsd_prevalence_pct (median: 8.3).
bin  count
6.3 – 7.5  10
7.5 – 8.7  6
8.7 – 9.9  5
9.9 – 11.1  4
11.1 – 12.3  4

va_users_with_ptsd_pct numeric feature

This column represents the percentage of VA users diagnosed with PTSD, likely aggregated at a state or facility level given n=54 (matching U.S. states/territories). The most surprising signal is the 46.3% null rate — nearly half the rows are missing, which severely limits usability and warrants investigation into whether missingness is systematic (e.g., certain facility types or regions not reporting). Among observed values, the distribution is fairly uniform (kurtosis –1.13, near-zero skew of 0.32) ranging from 7.9% to 18.5% with no outliers, suggesting genuine geographic variation rather than data error.

Treatment: Investigate missingness pattern before use; impute or subset to complete cases, then use as-is (no transform needed, since the observed values are near-symmetric and outlier-free).

anthropic:default · confidence medium
Out[49]:

saturn.columns["va_users_with_ptsd_pct"].stats

stat value
n 54
nulls 25 (46.3%)
unique 25
min 7.9
max 18.5
mean 12.26
median 11.9
std 3.315
q1 9.5
q3 14.8
iqr 5.3
skew 0.3168
kurtosis -1.132
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 46.3% null
Fig 20.
Distribution of va_users_with_ptsd_pct. Vertical dash marks the median.
Histogram bins for va_users_with_ptsd_pct (median: 11.9).
bin  count
7.9 – 10.02  10
10.02 – 12.14  5
12.14 – 14.26  5
14.26 – 16.38  5
16.38 – 18.5  4

spouse_unemployment_rate numeric feature

This column records a spouse unemployment rate, expressed as a percentage; in this state-level military dataset it most plausibly refers to military spouses in each state (an inference from context). It is missing for 64.81% of rows, leaving 19 non-null observations, so the figure appears to be available only for a subset of states. Among those 19 observations, values fall between 7.35 and 16.28 with a mean of ~10.46 and median of ~10.34; the distribution is moderately right-skewed (skew 0.77, kurtosis 0.08), and one outlier at 16.28 sits just beyond the upper fence. Only 15 unique values across 19 observations indicates repeated values, possibly from the states that appear in more than one row.

Treatment: Model nulls with a separate binary missingness indicator; impute or subset to non-null rows for any spouse-specific analysis; investigate the single outlier at 16.28 before inclusion.

anthropic:default · confidence medium
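
The "beyond 1.5 IQR" rule behind the outliers alert can be checked by hand from the reported quartiles. The sketch below assumes the conventional Tukey fences (q1 - 1.5·iqr and q3 + 1.5·iqr); the helper name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences k interquartile ranges outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported quartiles for spouse_unemployment_rate.
lower, upper = tukey_fences(8.59, 11.45)

reported_min, reported_max = 7.35, 16.28
max_is_outlier = reported_max > upper
min_is_outlier = reported_min < lower
print(round(lower, 2), round(upper, 2), max_is_outlier, min_is_outlier)
```

With these quartiles the upper fence is about 15.74, so the maximum of 16.28 falls just outside it while the minimum of 7.35 stays well inside the lower fence of about 4.30, matching the single outlier reported for this column.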
Out[52]:

saturn.columns["spouse_unemployment_rate"].stats

stat value
n 54
nulls 35 (64.8%)
unique 15
min 7.35
max 16.28
mean 10.46
median 10.34
std 2.382
q1 8.59
q3 11.45
iqr 2.86
skew 0.7742
kurtosis 0.07957
n_outliers 1
outlier_rate 0.05263
zero_rate 0
alert: null_rate 64.8% null
alert: outliers 5.3% rows beyond 1.5 IQR
Fig 21.
Distribution of spouse_unemployment_rate. Vertical dash marks the median.
Histogram bins for spouse_unemployment_rate (median: 10.34).
bin  count
7.35 – 9.136  6
9.136 – 10.92  7
10.92 – 12.71  2
12.71 – 14.49  3
14.49 – 16.28  1

spouse_labor_force_participation numeric feature

This column represents the labor force participation rate (likely as a percentage) of spouses in a surveyed population. The most striking feature is a null rate of 64.81% — nearly two-thirds of the 54 rows are missing, triggering an alert and severely limiting usability. Among the 19 non-null observations, values are tightly clustered between 66.8 and 73.4 with a mean of ~69.7 and IQR of only 2.2, suggesting very low variance and minimal outlier risk within the observed subset.

Treatment: Investigate missingness mechanism before use; if MAR/MCAR, consider imputation with caution given only 19 valid observations across 15 unique values.

anthropic:default · confidence medium
Out[55]:

saturn.columns["spouse_labor_force_participation"].stats

stat value
n 54
nulls 35 (64.8%)
unique 15
min 66.8
max 73.4
mean 69.69
median 69.8
std 1.68
q1 68.35
q3 70.55
iqr 2.2
skew 0.3659
kurtosis -0.3936
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 64.8% null
Fig 22.
Distribution of spouse_labor_force_participation. Vertical dash marks the median.
Histogram bins for spouse_labor_force_participation (median: 69.8).
bin  count
66.8 – 68.12  4
68.12 – 69.44  4
69.44 – 70.76  6
70.76 – 72.08  3
72.08 – 73.4  2

va_utilization_pct numeric feature

This column represents a utilization percentage for VA (likely Veterans Affairs) resources or capacity, expressed as a numeric rate across 54 records. The distribution is notably uniform and platykurtic (kurtosis ≈ −1.05), meaning values are spread broadly and flatly across the range 13.8–42.3 with no outliers and near-zero skew (0.165). The mean (27.04) and median (26.2) are tightly aligned, and the IQR of 12.48 spans a moderate but consistent band, suggesting this metric reflects genuine variation across units or time periods rather than a concentrated signal.

Treatment: Use as-is in modelling; no transformation needed given near-symmetric, outlier-free distribution.

anthropic:default · confidence high
Out[58]:

saturn.columns["va_utilization_pct"].stats

stat value
n 54
nulls 0 (0.0%)
unique 50
min 13.8
max 42.3
mean 27.04
median 26.2
std 7.952
q1 21
q3 33.48
iqr 12.48
skew 0.1651
kurtosis -1.051
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 23.
Distribution of va_utilization_pct. Vertical dash marks the median.
Histogram bins for va_utilization_pct (median: 26.2).
bin  count
13.8 – 17.87  8
17.87 – 21.94  9
21.94 – 26.01  10
26.01 – 30.09  7
30.09 – 34.16  8
34.16 – 38.23  7
38.23 – 42.3  5

installation categorical label

This column records U.S. military installation names (bases and airfields) associated with records in the dataset. The most striking issue is that 77.78% of the 54 rows are null, leaving only 12 non-null values — each appearing exactly once, yielding a perfectly uniform distribution (entropy_ratio = 1.0) with no dominant installation. The long-tail alert is somewhat misleading given the uniformity; the real concern is the extreme missingness, which severely limits analytical utility.

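Treatment: Investigate source of 77.78% nulls before use; if imputation is not feasible, treat as a sparse categorical feature or exclude from models dependent on this column. Also check whether "Fort Liberty, NC" and "Fort Bragg (Liberty), NC" refer to the same installation under two names (the county column lists Cumberland twice, which is consistent with this) and merge them if so.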

anthropic:default · confidence high
Out[61]:

saturn.columns["installation"].stats

stat value
n 54
nulls 42 (77.8%)
unique 12
top_value Naval Base San Diego, CA
top_rate 0.08333
cardinality 12
entropy 3.585
entropy_ratio 1
alert: long_tail 12 singleton categories
alert: null_rate 77.8% null
Fig 24.
Top values for installation.
Top values for installation (12 unique shown, of 12 total).
value  count  share
Naval Base San Diego, CA  1  1.9%
Travis AFB, CA  1  1.9%
Camp Pendleton, CA  1  1.9%
Eglin AFB, FL  1  1.9%
Fort Stewart, GA  1  1.9%
Fort Campbell, KY  1  1.9%
Fort Liberty, NC  1  1.9%
Fort Bragg (Liberty), NC  1  1.9%
Joint Base San Antonio, TX  1  1.9%
Fort Cavazos, TX  1  1.9%
Naval Station Norfolk, VA  1  1.9%
Naval Base Kitsap, WA  1  1.9%

county categorical metadata

This column represents a US county name associated with each record, likely geographic metadata for a location or address field. The most striking issue is a 77.78% null rate — only 12 of 54 rows have a value at all, rendering the column nearly empty. Among the 12 non-null values, cardinality is 10 with a near-uniform distribution (entropy ratio 0.979), meaning almost every populated entry is a distinct county, with only San Diego and Cumberland appearing twice.

Treatment: Investigate source of missing values before use; with 77.78% nulls the column is unreliable as a feature without significant imputation or enrichment.

anthropic:default · confidence high
Out[64]:

saturn.columns["county"].stats

stat value
n 54
nulls 42 (77.8%)
unique 10
top_value San Diego
top_rate 0.1667
cardinality 10
entropy 3.252
entropy_ratio 0.9788
alert: long_tail 8 singleton categories
alert: null_rate 77.8% null
Fig 25.
Top values for county.
Top values for county (10 unique shown, of 10 total).
value  count  share
San Diego  2  3.7%
Cumberland  2  3.7%
Solano  1  1.9%
Okaloosa  1  1.9%
Liberty  1  1.9%
Christian  1  1.9%
Bexar  1  1.9%
Bell  1  1.9%
Norfolk  1  1.9%
Kitsap  1  1.9%

annual_economic_impact_millions numeric feature

This column records estimated annual economic impact in millions (presumably of U.S. dollars), but 77.78% of the 54 rows are null: only 12 rows carry a value, and all 12 values are distinct. The non-null values span 15,000 to 41,000 with a mean of ~23,833 and an IQR of 10,250, a wide but plausible spread; skew is moderate (0.87) and no outliers are flagged. The null count matches that of the installation column (42 nulls each), which suggests the field is populated only for rows tied to an installation rather than missing at random.

Treatment: Investigate source of 77.78% nulls before use; if imputation is not justified, consider as a sparse auxiliary feature or drop depending on model tolerance for missingness.

anthropic:default · confidence medium
Out[67]:

saturn.columns["annual_economic_impact_millions"].stats

stat value
n 54
nulls 42 (77.8%)
unique 12
min 15,000
max 41,000
mean 2.383e+04
median 21,500
std 8178
q1 17,750
q3 28,000
iqr 10,250
skew 0.8718
kurtosis -0.3627
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 77.8% null
Fig 26.
Distribution of annual_economic_impact_millions. Vertical dash marks the median.
Histogram bins for annual_economic_impact_millions (median: 21500.0).
bin  count
1.5e+04 – 2.02e+04  5
2.02e+04 – 2.54e+04  3
2.54e+04 – 3.06e+04  1
3.06e+04 – 3.58e+04  2
3.58e+04 – 4.1e+04  1
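
The Treatment note for annual_economic_impact_millions asks where the nulls come from. One way to check is to see whether the installation-level columns are always null together. The sketch below is a minimal, standard-library illustration of that check; the example rows are hypothetical and are not taken from the CSV, and the column list assumes the five installation-linked fields named in this report.

```python
from collections import Counter

# Columns this report treats as installation-level fields.
INSTALLATION_COLUMNS = [
    "county",
    "installation",
    "annual_economic_impact_millions",
    "total_personnel",
    "direct_jobs",
]


def missingness_patterns(rows, columns):
    """Count how often each combination of null / non-null flags occurs."""
    return Counter(
        tuple(row.get(col) is None for col in columns) for row in rows
    )


def is_structural(patterns):
    """True when every row is either fully populated or fully null."""
    return all(len(set(flags)) == 1 for flags in patterns)


# Hypothetical rows for illustration only; they are not taken from the CSV.
example_rows = [
    {"state": "AA", "county": "Example", "installation": "Base One",
     "annual_economic_impact_millions": 20000, "total_personnel": 40000,
     "direct_jobs": 12000},
    {"state": "BB", "county": None, "installation": None,
     "annual_economic_impact_millions": None, "total_personnel": None,
     "direct_jobs": None},
    {"state": "CC", "county": None, "installation": None,
     "annual_economic_impact_millions": None, "total_personnel": None,
     "direct_jobs": None},
]

patterns = missingness_patterns(example_rows, INSTALLATION_COLUMNS)
print(patterns)
print("structural:", is_structural(patterns))
```

If the real data shows only the all-null and all-present patterns, the gaps are structural (rows without a linked installation), and imputation would invent installations rather than recover missing measurements. To run this on the real file, the rows could be loaded with csv.DictReader, mapping empty strings to None first.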

total_personnel numeric feature

This column represents total personnel counts, most likely headcount at the installation tied to each record. The most striking issue is a null rate of 77.78%: only 12 of 54 rows have a value, which makes the column severely under-populated and potentially unreliable for modelling. All 12 non-null values are distinct, ranging from 27,000 to 82,000 with a mean of ~46,083 and mild positive skew (0.855). Excess kurtosis is essentially zero (≈ 0.006), so the tails are close to those of a normal distribution rather than platykurtic, and no outliers are flagged; the non-null values themselves are internally well-behaved.

Treatment: Investigate source of 77.78% missingness before use; impute or drop depending on missingness mechanism, then consider log-transform given positive skew.

anthropic:default · confidence medium
Out[70]:

saturn.columns["total_personnel"].stats

stat value
n 54
nulls 42 (77.8%)
unique 12
min 27,000
max 82,000
mean 4.608e+04
median 43,500
std 1.621e+04
q1 34,250
q3 53,500
iqr 19,250
skew 0.8549
kurtosis 0.005583
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 77.8% null
Fig 27.
Distribution of total_personnel. Vertical dash marks the median.
Histogram bins for total_personnel (median: 43500.0).
bin  count
2.7e+04 – 3.8e+04  4
3.8e+04 – 4.9e+04  4
4.9e+04 – 6e+04  2
6e+04 – 7.1e+04  1
7.1e+04 – 8.2e+04  1

direct_jobs numeric feature

This column records the count of direct jobs associated with each record, most likely jobs tied to a military installation, with values ranging from 8,500 to 25,000 and a mean of ~14,292. The most striking signal is an extremely high null rate of 77.78%, meaning only 12 of 54 rows carry a value, which severely limits its usability. All 12 non-null values are distinct, and the round figures (min 8,500, median 13,500, max 25,000) suggest rounded estimates rather than exact counts. The distribution is mildly right-skewed (skew 0.81) with no outliers detected.

Treatment: Impute or flag missing values (77.78% null rate) before use; consider whether missingness is systematic before including in any model.

anthropic:default · confidence medium
Out[73]:

saturn.columns["direct_jobs"].stats

stat value
n 54
nulls 42 (77.8%)
unique 12
min 8,500
max 25,000
mean 1.429e+04
median 13,500
std 4883
q1 10,750
q3 16,500
iqr 5,750
skew 0.8149
kurtosis -0.06741
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 77.8% null
Fig 28.
Distribution of direct_jobs. Vertical dash marks the median.
Histogram bins for direct_jobs (median: 13500.0).
bin  count
8500 – 1.18e+04  4
1.18e+04 – 1.51e+04  4
1.51e+04 – 1.84e+04  2
1.84e+04 – 2.17e+04  1
2.17e+04 – 2.5e+04  1

ffl_per_100k numeric feature

This column represents the number of Federal Firearms Licensees (FFLs) per 100,000 residents, almost certainly measured at the U.S. state (or territory) level given n=54. The distribution is severely right-skewed (skew=2.47, kurtosis=5.08): the median is only 12.37, but the mean is pulled to 49.44 by extreme values, with a maximum of 341.66, over 27× the median. Eight of the 53 non-null observations (15.1%) are flagged as outliers, likely small-population states or territories where per-capita rates explode due to a low denominator.

Treatment: Log-transform before modelling to reduce skew; investigate the 8 outliers for small-denominator population artifacts before inclusion.

anthropic:default · confidence high
Out[76]:

saturn.columns["ffl_per_100k"].stats

stat value
n 54
nulls 1 (1.9%)
unique 49
min 1.377
max 341.7
mean 49.44
median 12.37
std 87.37
q1 7.034
q3 31.09
iqr 24.05
skew 2.474
kurtosis 5.076
n_outliers 8
outlier_rate 0.1509
zero_rate 0
alert: high_skew skew=+2.47
alert: outliers 15.1% rows beyond 1.5 IQR
Fig 29.
Distribution of ffl_per_100k. Vertical dash marks the median.
Histogram bins for ffl_per_100k (median: 12.368927310188392).
bin  count
1.377 – 49.99  41
49.99 – 98.6  4
98.6 – 147.2  2
147.2 – 195.8  1
195.8 – 244.4  2
244.4 – 293  0
293 – 341.7  3
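
The outlier alert for ffl_per_100k refers to rows "beyond 1.5 IQR". Assuming this means the standard Tukey rule (fences at 1.5 × IQR below q1 and above q3), which this report does not confirm, the fences can be recomputed from the reported quartiles. The helper names below are illustrative and are not part of saturn.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles as reported in saturn.columns["ffl_per_100k"].stats.
lower, upper = tukey_fences(q1=7.034, q3=31.09)
print(f"lower fence {lower:.2f}, upper fence {upper:.2f}")

# Reported min, median, and max, used here only as probe values.
print(flag_outliers([1.377, 12.37, 341.7], q1=7.034, q3=31.09))
```

Under that assumption the upper fence sits near 67.2 and the lower fence is negative. Because a per-capita rate cannot be negative, all 8 flagged rows must lie on the high side, which is consistent with the right-skewed histogram.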

active_duty_per_100k numeric feature

This column represents active-duty military personnel per 100,000 residents, almost certainly measured at the U.S. state/territory level (n=54 aligns with states plus D.C. and territories). The distribution is severely right-skewed (skew=2.91, kurtosis=7.22): the median is only 92.5, yet the mean is 610.7 and the max reaches 5,544. Nine of the 53 non-null rows (17.0%) are flagged as outliers, most likely states with large military installations relative to population (for example Hawaii, Alaska, or Virginia) or small-population territories. The IQR of 301 versus a std of 1,389 is consistent with most values sitting at the low end and a long, heavy right tail.

Treatment: Log-transform (log1p) before regression or clustering to reduce skew; consider flagging or separately modelling the 9 outlier rows.

anthropic:default · confidence high
Out[79]:

saturn.columns["active_duty_per_100k"].stats

stat value
n 54
nulls 1 (1.9%)
unique 49
min 3.105
max 5544
mean 610.7
median 92.49
std 1389
q1 21.96
q3 323.3
iqr 301.3
skew 2.911
kurtosis 7.217
n_outliers 9
outlier_rate 0.1698
zero_rate 0
alert: high_skew skew=+2.91
alert: outliers 17.0% rows beyond 1.5 IQR
Fig 30.
Distribution of active_duty_per_100k. Vertical dash marks the median.
Histogram bins for active_duty_per_100k (median: 92.48724939519369).
bin  count
3.105 – 794.7  44
794.7 – 1586  3
1586 – 2378  2
2378 – 3170  0
3170 – 3961  0
3961 – 4753  1
4753 – 5544  3
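
The Treatment note for active_duty_per_100k recommends log1p. As a rough illustration of what that does, the sketch below applies math.log1p to the five summary statistics reported above rather than to the raw rows, which this notebook does not expose. It only shows how the transform compresses the scale; it is not a substitute for transforming the actual column.

```python
import math

# Summary statistics as reported in saturn.columns["active_duty_per_100k"].stats.
reported = {"min": 3.105, "q1": 21.96, "median": 92.49, "q3": 323.3, "max": 5544.0}

# log1p(x) = ln(1 + x); safe at zero and monotone, so the ordering is preserved.
transformed = {name: math.log1p(value) for name, value in reported.items()}

# How far the maximum sits from the median, before and after the transform.
raw_spread = reported["max"] / reported["median"]
log_spread = transformed["max"] / transformed["median"]

# Distance from the median to each quartile on the log scale.
lower_gap = transformed["median"] - transformed["q1"]
upper_gap = transformed["q3"] - transformed["median"]

for name, value in transformed.items():
    print(f"{name:>6}: {reported[name]:>8.2f} -> {value:.3f}")
print(f"max/median raw: {raw_spread:.1f}, after log1p: {log_spread:.2f}")
print(f"quartile gaps after log1p: {lower_gap:.2f} below, {upper_gap:.2f} above")
```

On the raw scale the maximum is roughly 60 times the median; after log1p it is less than twice the median, and the two quartiles sit at broadly similar distances from the median. That is the behaviour a regression or clustering step benefits from.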

How to cite

BibTeX
@misc{saturn-data-trove-us-military-veteran-analysis-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove us military veteran analysis},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-us-military-veteran-analysis}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove us military veteran analysis. Source: /home/coolhand/html/datavis/data_trove/demographic/veterans/military_firearm_merged_analysis.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-us-military-veteran-analysis