saturn

fips county geology counties

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/geographic/fips_county/geology_counties.csv

Saturn profiled 3,235 rows across 9 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in and added after profiling; the model never sees raw rows).

[2]:
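%pip install saturn-dissect
import subprocess

# Run saturn on the CSV: --findings names the JSON output file and
# --llm opts in to the model-written interpretation layer.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/geographic/fips_county/geology_counties.csv",
        "--findings", "fips_county-geology_counties.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if saturn exits with a non-zero status
)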

Summary confidence: high

This dataset links 3,235 U.S. counties and county-equivalents (by FIPS code) to their nearest geological mineral or fuel deposit, recording the deposit's type, era, state, and distance. Coal dominates deposit_type at roughly 42% of rows, with Copper, Iron, and Oil rounding out the major categories; it is worth checking whether this reflects true geological prevalence or sampling bias. The distance_to_deposit column is heavily right-skewed (skew ~7.5, max 5652 vs. median 152), so a small number of remote counties pull the mean (230) well above typical values and deserve a closer look. The deposit_era column uses nine labels of mixed granularity (broad eras such as Paleozoic alongside periods such as Devonian), led by Pennsylvanian (~23%), and deposit_state concentrates in Missouri, Ohio, and Alabama even though counties themselves are spread across all 56 state codes.

citing: row_count · column_count · deposit_type.top_values · deposit_type.top_rate · distance_to_deposit.skew · distance_to_deposit.median · distance_to_deposit.max · distance_to_deposit.mean · deposit_era.top_values · deposit_era.top_rate · deposit_state.top_values · state_name.top_values · state_name.cardinality

Fig 1.
deposit_type · Shows how heavily Coal dominates the deposit mix relative to metals and hydrocarbons.
Top values for deposit_type (10 unique shown, of 10 total).
value          count   share
Coal            1345   41.6%
Copper           485   15.0%
Iron             403   12.5%
Oil              400   12.4%
Natural Gas      235    7.3%
Lead             170    5.3%
Phosphate         81    2.5%
Gold              72    2.2%
Zinc              23    0.7%
Silver            21    0.6%
Fig 2.
distance_to_deposit · Highlights the strong right skew and the long tail of counties far from any deposit.
Histogram bins for distance_to_deposit (median: 152.0).
bin              count
1.8 – 143.1       1519
143.1 – 284.3     1181
284.3 – 425.6      343
425.6 – 566.9       64
566.9 – 708.1        4
708.1 – 849.4        2
849.4 – 990.7        4
990.7 – 1132         2
1132 – 1273          0
1273 – 1414          3
1414 – 1556          8
1556 – 1697         82
1697 – 1838          3
1838 – 1980          3
1980 – 2121          1
2121 – 2262          0
2262 – 2403          0
2403 – 2545          5
2545 – 2686          1
2686 – 2827          0
2827 – 2968          0
2968 – 3110          0
3110 – 3251          0
3251 – 3392          0
3392 – 3533          0
3533 – 3675          0
3675 – 3816          0
3816 – 3957          0
3957 – 4098          0
4098 – 4240          0
4240 – 4381          0
4381 – 4522          0
4522 – 4664          0
4664 – 4805          3
4805 – 4946          2
4946 – 5087          0
5087 – 5229          0
5229 – 5370          0
5370 – 5511          2
5511 – 5652          3
Fig 3.
deposit_era · Compares the nine geological era/period labels, with Pennsylvanian leading and Permian trailing.
Top values for deposit_era (9 unique shown, of 9 total).
value           count   share
Pennsylvanian     732   22.6%
Devonian          422   13.0%
Paleozoic         419   13.0%
Tertiary          401   12.4%
Mississippian     401   12.4%
Precambrian       327   10.1%
Cretaceous        289    8.9%
Miocene           149    4.6%
Permian            95    2.9%
Fig 4.
deposit_state · Reveals which states' deposits are nearest to the most counties, led by Missouri and Ohio.
Top values for deposit_state (20 unique shown, of 25 total).
value           count   share
Missouri          478   14.8%
Ohio              448   13.8%
Alabama           434   13.4%
Indiana           263    8.1%
Arkansas          257    7.9%
South Dakota      210    6.5%
New Jersey        179    5.5%
Texas             170    5.3%
Colorado          144    4.5%
Louisiana         115    3.6%
New York           99    3.1%
Oregon             71    2.2%
California         68    2.1%
Idaho              54    1.7%
New Mexico         51    1.6%
Washington         47    1.5%
Rhode Island       43    1.3%
Montana            37    1.1%
Utah               30    0.9%
Arizona            16    0.5%
Fig 5.
state_name · Confirms broad geographic coverage across states, with Texas and Georgia contributing the most county rows.
Top values for state_name (20 unique shown, of 56 total).
value            count   share
Texas              254    7.9%
Georgia            159    4.9%
Virginia           133    4.1%
Kentucky           120    3.7%
Missouri           115    3.6%
Kansas             105    3.2%
Illinois           102    3.2%
North Carolina     100    3.1%
Iowa                99    3.1%
Tennessee           95    2.9%
Nebraska            93    2.9%
Indiana             92    2.8%
Ohio                88    2.7%
Minnesota           87    2.7%
Michigan            83    2.6%
Mississippi         82    2.5%
Puerto Rico         78    2.4%
Oklahoma            77    2.4%
Arkansas            75    2.3%
Wisconsin           72    2.2%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column                kind          null %
fips                  numeric       0.0%
county_name           text          0.0%
state                 categorical   0.0%
state_name            categorical   0.0%
distance_to_deposit   numeric       0.0%
nearest_deposit       categorical   0.0%
deposit_type          categorical   0.0%
deposit_era           categorical   0.0%
deposit_state         categorical   0.0%
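Fig 6 reports no nulls in any column. A minimal sketch of how one might cross-check per-column null rates independently of saturn, using only the standard library. Treating an empty or whitespace-only cell as null is an assumption, since the report does not say how saturn defines a null, and `null_rates` is a hypothetical helper name; the sample rows are toy data.

```python
import csv
import io
from typing import Dict, TextIO


def null_rates(handle: TextIO) -> Dict[str, float]:
    """Return the fraction of empty cells per column in a CSV stream.

    Hypothetical helper. Assumption: a cell counts as null when it is missing
    or empty after stripping whitespace; saturn's own definition is not stated.
    """
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    empty = {name: 0 for name in fields}
    rows = 0
    for row in reader:
        rows += 1
        for name in fields:
            value = row.get(name)
            if value is None or value.strip() == "":
                empty[name] += 1
    # A header-only file has no rows, so report 0.0 instead of dividing by zero.
    return {name: (empty[name] / rows if rows else 0.0) for name in fields}


# Toy rows, not taken from the dataset.
sample = io.StringIO(
    "fips,county_name,state\n"
    "1001,Autauga County,AL\n"
    "1003,,AL\n"
)
print(null_rates(sample))  # {'fips': 0.0, 'county_name': 0.5, 'state': 0.0}
```

On the real file, opening it with `open(path, newline="")` and passing the handle should yield 0.0 for all nine columns if the report is right.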
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
                      fips    distance_to_deposit
fips                  +1.00   +0.30
distance_to_deposit   +0.30   +1.00

fips numeric identifier

This is the FIPS code identifying U.S. counties (or equivalent geographies), with all 3,235 values unique and no nulls. Values span 1001 to 78030, consistent with a two-digit state prefix followed by a three-digit county code whose leading zero was lost when the column was read as a number (1001 is county 01001). The distribution is broad (IQR 27,090) rather than meaningfully skewed (skew 0.17). Treat the numeric stats as incidental: the magnitude of a FIPS code has no quantitative meaning.

Treatment: Cast to a zero-padded five-character string (1001 becomes "01001") and use it as a join key to county-level reference data; do not model as numeric.

anthropic:claude-opus-4-7 · confidence high
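The treatment above can be sketched as follows: pad each code to five digits, then read the first two characters as the state FIPS prefix and the last three as the county code (the standard 2+3 county FIPS layout). The function names are hypothetical illustrations, not part of saturn.

```python
from typing import Tuple


def to_fips_string(code: int) -> str:
    """Hypothetical helper: zero-pad a numeric county FIPS code to five characters."""
    if not 0 < code <= 99999:
        raise ValueError(f"not a plausible county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> Tuple[str, str]:
    """Hypothetical helper: split a county FIPS code into (state prefix, county suffix)."""
    text = to_fips_string(code)
    return text[:2], text[2:]


print(to_fips_string(1001))  # 01001
print(split_fips(78030))     # ('78', '030')
```

With pandas, the same padding can be written as `df["fips"].astype(str).str.zfill(5)`, which is safe here because the column has no nulls and is read as integers.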
Out[13]:

saturn.columns["fips"].stats

stat           value
n              3,235
nulls          0 (0.0%)
unique         3,235
min            1,001
max            78,030
mean           3.152e+04
median         30,035
std            1.643e+04
q1             19,036
q3             46,126
iqr            27,090
skew           0.1738
kurtosis       -0.6075
n_outliers     0
outlier_rate   0
zero_rate      0
Fig 8.
Distribution of fips. Vertical dash marks the median.
Histogram bins for fips (median: 30035.0).
bin                        count
1001 – 2927                   97
2927 – 4852                   15
4852 – 6778                  133
6778 – 8704                   64
8704 – 1.063e+04              12
1.063e+04 – 1.256e+04         68
1.256e+04 – 1.448e+04        159
1.448e+04 – 1.641e+04         49
1.641e+04 – 1.833e+04        194
1.833e+04 – 2.026e+04        204
2.026e+04 – 2.218e+04        184
2.218e+04 – 2.411e+04         39
2.411e+04 – 2.604e+04         33
2.604e+04 – 2.796e+04        152
2.796e+04 – 2.989e+04        197
2.989e+04 – 3.181e+04        149
3.181e+04 – 3.374e+04         27
3.374e+04 – 3.566e+04         54
3.566e+04 – 3.759e+04        162
3.759e+04 – 3.952e+04        141
3.952e+04 – 4.144e+04        113
4.144e+04 – 4.337e+04         67
4.337e+04 – 4.529e+04         51
4.529e+04 – 4.722e+04        161
4.722e+04 – 4.914e+04        283
4.914e+04 – 5.107e+04         48
5.107e+04 – 5.3e+04           99
5.3e+04 – 5.492e+04           94
5.492e+04 – 5.685e+04         95
5.685e+04 – 5.877e+04          0
5.877e+04 – 6.07e+04           5
6.07e+04 – 6.262e+04           0
6.262e+04 – 6.455e+04          0
6.455e+04 – 6.648e+04          1
6.648e+04 – 6.84e+04           0
6.84e+04 – 7.033e+04           4
7.033e+04 – 7.225e+04         78
7.225e+04 – 7.418e+04          0
7.418e+04 – 7.61e+04           0
7.61e+04 – 7.803e+04           3

county_name text metadata

This column holds U.S. county-level place names, with 1,973 unique values across 3,235 rows. Almost every entry contains the word 'county' (2,999 occurrences), alongside Louisiana 'parish' (64) and Puerto Rico 'municipio' (78) variants. Names repeat heavily: the duplicate rate is 39%, with classics like 'Washington County' (30), 'Jefferson County' (25), and 'Franklin County' (24) topping the list, which is expected because the same county name recurs across states. Entries are short (mean 14.2 chars, about 2 words), and there are no nulls or empties.

Treatment: Pair with a state column to form a unique geographic key before aggregating by name; for joins, prefer fips, which is already unique.

anthropic:claude-opus-4-7 · confidence high
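The duplicate figures above are consistent with counting every repeat after the first occurrence: n_duplicates = n - unique = 3,235 - 1,973 = 1,262, and 1,262 / 3,235 ≈ 0.3901. A minimal sketch of that arithmetic plus a check that (state, county_name) pairs are unique; the helper names are hypothetical and the sample rows are toy data.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def duplicate_stats(values: Sequence[str]) -> Tuple[int, float]:
    """Hypothetical helper: (n_duplicates, duplicate_rate), counting repeats after the first."""
    n_duplicates = len(values) - len(set(values))
    rate = n_duplicates / len(values) if values else 0.0
    return n_duplicates, rate


def repeated_keys(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: (state, county_name) pairs that occur more than once."""
    counts = Counter(pairs)
    return [key for key, count in counts.items() if count > 1]


# Reported stats: n = 3,235, unique = 1,973.
print(3235 - 1973, round((3235 - 1973) / 3235, 4))  # 1262 0.3901

# Toy rows: the name repeats, but the (state, name) key does not.
names = ["Washington County", "Washington County", "Jefferson County"]
states = ["AL", "AR", "AL"]
print(duplicate_stats(names))              # (1, 0.3333333333333333)
print(repeated_keys(zip(states, names)))   # []
```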
Out[16]:

saturn.columns["county_name"].stats

stat                      value
n                         3,235
nulls                     0 (0.0%)
unique                    1,973
len_min                   4
len_max                   46
len_mean                  14.18
len_median                14
len_p95                   18
word_mean                 2.084
word_median               2
n_empty                   0
n_duplicates              1,262
duplicate_rate            0.3901
vocab_size                1,973
readability_flesch_mean   33.65
emoji_rate                0
url_rate                  0
one_word_rate             0.0003091
allcaps_rate              0
boilerplate_rate          0
alert: short_text (95th-percentile length under 20 chars)
alert: duplicates (39.0% duplicate strings)
Fig 9.
Character-length distribution for county_name.
Character-length distribution for county_name (mean: 14.18).
chars      count
4 – 5          1
5 – 6          0
6 – 7          0
7 – 8          0
8 – 9          0
9 – 10        29
10 – 11      256
11 – 12      465
12 – 13      683
13 – 14      588
14 – 16      495
16 – 17      294
17 – 18      221
18 – 19       67
19 – 20       51
20 – 21       23
21 – 22       16
22 – 23       14
23 – 24        8
24 – 25        4
25 – 26        7
26 – 27        1
27 – 28        1
28 – 29        1
29 – 30        0
30 – 31        2
31 – 32        1
32 – 33        1
33 – 34        1
34 – 36        1
36 – 37        0
37 – 38        0
38 – 39        0
39 – 40        0
40 – 41        2
41 – 42        1
42 – 43        0
43 – 44        0
44 – 45        0
45 – 46        1

state categorical feature

This is a U.S. state code column with 56 unique values, six more than the 50 states. The fips column includes codes above 58,000, beyond Wyoming's state prefix of 56, so the six extras are most likely DC plus the five inhabited territories (American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands); PR appears in the top values below. The distribution is fairly even (entropy ratio 0.92), with TX leading at 7.9% (254 of 3,235 rows) followed by GA, VA, KY, and MO, consistent with a county-level dataset where states with more counties contribute more rows. No nulls.

Treatment: One-hot or target-encode for modelling; verify the six codes beyond the 50 states.

anthropic:claude-opus-4-7 · confidence high
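One way to carry out the check suggested above is to diff the observed codes against the 50 state postal abbreviations. A sketch; the set is the standard USPS list of the 50 states, while `codes_beyond_states` is a hypothetical helper and the example input is illustrative rather than the dataset's full code list (the report shows only the top 20).

```python
from typing import Iterable, List

# The 50 U.S. state postal abbreviations (DC and territories deliberately excluded).
STATES_50 = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)


def codes_beyond_states(codes: Iterable[str]) -> List[str]:
    """Hypothetical helper: distinct codes that are not one of the 50 states, sorted."""
    return sorted({code.strip().upper() for code in codes} - STATES_50)


print(len(STATES_50))                                        # 50
print(codes_beyond_states(["TX", "GA", "PR", "DC", "tx"]))   # ['DC', 'PR']
```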
Out[19]:

saturn.columns["state"].stats

stat            value
n               3,235
nulls           0 (0.0%)
unique          56
top_value       TX
top_rate        0.07852
cardinality     56
entropy         5.338
entropy_ratio   0.9192
Fig 10.
Top values for state.
Top values for state (20 unique shown, of 56 total).
value   count   share
TX        254    7.9%
GA        159    4.9%
VA        133    4.1%
KY        120    3.7%
MO        115    3.6%
KS        105    3.2%
IL        102    3.2%
NC        100    3.1%
IA         99    3.1%
TN         95    2.9%
NE         93    2.9%
IN         92    2.8%
OH         88    2.7%
MN         87    2.7%
MI         83    2.6%
MS         82    2.5%
PR         78    2.4%
OK         77    2.4%
AR         75    2.3%
WI         72    2.2%

state_name categorical feature

This column holds U.S. state names, almost certainly one row per county or county-equivalent given the 3,235 total rows and 56 distinct values (the 50 states plus DC and territories). Texas leads with 254 rows (7.85%), followed by Georgia (159) and Virginia (133), which matches the known ranking of states by county count. The distribution is fairly even across categories (entropy ratio 0.92), with no nulls.

Treatment: Use as a categorical grouping key; one-hot or target-encode for modelling. Its counts, cardinality, and entropy match the state column exactly, so the two columns are likely redundant and only one should enter a model.

anthropic:claude-opus-4-7 · confidence high
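Identical summary stats for state and state_name suggest, but do not prove, a one-to-one mapping between codes and names. A sketch of a direct check, assuming the two columns are read as parallel lists; `is_one_to_one` is a hypothetical helper and the inputs are toy data.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def is_one_to_one(left: Sequence[str], right: Sequence[str]) -> bool:
    """Hypothetical helper: True if each left value maps to exactly one right value and vice versa."""
    if len(left) != len(right):
        raise ValueError("columns must have the same length")
    forward: Dict[str, Set[str]] = defaultdict(set)
    backward: Dict[str, Set[str]] = defaultdict(set)
    for a, b in zip(left, right):
        forward[a].add(b)
        backward[b].add(a)
    return all(len(v) == 1 for v in forward.values()) and all(
        len(v) == 1 for v in backward.values()
    )


print(is_one_to_one(["TX", "GA", "TX"], ["Texas", "Georgia", "Texas"]))  # True
print(is_one_to_one(["TX", "TX"], ["Texas", "Tejas"]))                   # False
```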
Out[22]:

saturn.columns["state_name"].stats

stat            value
n               3,235
nulls           0 (0.0%)
unique          56
top_value       Texas
top_rate        0.07852
cardinality     56
entropy         5.338
entropy_ratio   0.9192
Fig 11.
Top values for state_name.
Top values for state_name (20 unique shown, of 56 total).
value            count   share
Texas              254    7.9%
Georgia            159    4.9%
Virginia           133    4.1%
Kentucky           120    3.7%
Missouri           115    3.6%
Kansas             105    3.2%
Illinois           102    3.2%
North Carolina     100    3.1%
Iowa                99    3.1%
Tennessee           95    2.9%
Nebraska            93    2.9%
Indiana             92    2.8%
Ohio                88    2.7%
Minnesota           87    2.7%
Michigan            83    2.6%
Mississippi         82    2.5%
Puerto Rico         78    2.4%
Oklahoma            77    2.4%
Arkansas            75    2.3%
Wisconsin           72    2.2%

distance_to_deposit numeric feature

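Numeric feature measuring the distance from each county to its nearest deposit, with all 3,235 rows populated and 2,202 distinct values. The unit is not stated. Metres is implausible, since only 97 deposit sites serve 3,235 counties and a median gap of 152 m would be far too small; miles or kilometres are more likely. The distribution is severely right-skewed (skew 7.51, kurtosis 77.6): the median is 152.0 while the mean is 230.12, and the max stretches to 5652.4, roughly 24x the Q3 of 235.75. About 4.9% of rows (159) are flagged as outliers, and there are no zeros or nulls.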

Treatment: log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
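A sketch of the arithmetic behind the paragraph above, using the reported quartiles (Q3 = 235.75 as quoted in the prose; the stat table rounds it to 235.8): the max/Q3 ratio, Tukey's 1.5×IQR upper fence, and the effect of a log transform on skewness. Two assumptions: the report does not say which rule produces n_outliers = 159, so Tukey's fence is shown only as one plausible rule; and the skewness here is the moment-based Fisher-Pearson coefficient, which may differ slightly from saturn's estimator. The sample values are toy data.

```python
import math
from typing import Sequence, Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Tukey fences Q1 - k*IQR and Q3 + k*IQR (one common outlier rule, assumed here)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def skewness(values: Sequence[float]) -> float:
    """Moment-based (Fisher-Pearson, biased) sample skewness; saturn's estimator may differ."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Reported stats for distance_to_deposit.
q1, q3, max_value = 85.5, 235.75, 5652.4
low, high = tukey_fences(q1, q3)
print(round(max_value / q3, 1))  # 24.0
print(high)                      # 461.125

# Toy right-skewed sample: a log transform compresses the long tail.
sample = [60.0, 90.0, 120.0, 150.0, 180.0, 240.0, 400.0, 1600.0, 5600.0]
logged = [math.log(x) for x in sample]  # safe: every value is > 0 (the column minimum is 1.8)
print(skewness(sample) > skewness(logged))  # True
```

Because the column minimum is 1.8, a plain `log` is defined for every row; `log1p` would only be needed if zeros were possible.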
Out[25]:

saturn.columns["distance_to_deposit"].stats

stat           value
n              3,235
nulls          0 (0.0%)
unique         2,202
min            1.8
max            5652
mean           230.1
median         152
std            399.9
q1             85.5
q3             235.8
iqr            150.2
skew           7.511
kurtosis       77.6
n_outliers     159
outlier_rate   0.04915
zero_rate      0
alert: high_skew (skew=+7.51)
Fig 12.
Distribution of distance_to_deposit. Vertical dash marks the median.
Histogram bins for distance_to_deposit (median: 152.0).
bin              count
1.8 – 143.1       1519
143.1 – 284.3     1181
284.3 – 425.6      343
425.6 – 566.9       64
566.9 – 708.1        4
708.1 – 849.4        2
849.4 – 990.7        4
990.7 – 1132         2
1132 – 1273          0
1273 – 1414          3
1414 – 1556          8
1556 – 1697         82
1697 – 1838          3
1838 – 1980          3
1980 – 2121          1
2121 – 2262          0
2262 – 2403          0
2403 – 2545          5
2545 – 2686          1
2686 – 2827          0
2827 – 2968          0
2968 – 3110          0
3110 – 3251          0
3251 – 3392          0
3392 – 3533          0
3533 – 3675          0
3675 – 3816          0
3816 – 3957          0
3957 – 4098          0
4098 – 4240          0
4240 – 4381          0
4381 – 4522          0
4522 – 4664          0
4664 – 4805          3
4805 – 4946          2
4946 – 5087          0
5087 – 5229          0
5229 – 5370          0
5370 – 5511          2
5511 – 5652          3

nearest_deposit categorical feature

This column names the nearest deposit for each record, with 97 distinct sites across 3,235 rows and no nulls. The distribution is moderately concentrated: "Hatchet Creek Copper" alone accounts for 13.4% (434 rows), and the top three deposits cover about 31% of the data (999 rows), yet an entropy ratio of 0.76 indicates the long tail still carries meaningful spread. Names mix commodity-specific mines (copper, clay, sulfur), pits, banks, quads, districts, processing plants (two sulfur plants), and a placeholder ('Unknown - Coal & Zn'), suggesting heterogeneous source nomenclature rather than a controlled vocabulary.

Treatment: Treat as a high-cardinality categorical: target- or frequency-encode, and consider grouping rare deposits into an 'other' bucket.

anthropic:claude-opus-4-7 · confidence high
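A minimal sketch of the suggested treatment: frequency-encode each deposit name as its share of rows, and fold rare names into an 'other' bucket. The helper names are hypothetical, the `min_count` cutoff is an arbitrary illustration (the report suggests no threshold), and the rows are toy data. In a real model the shares and the rare set should be computed on training rows only, to avoid leaking information from evaluation data.

```python
from collections import Counter
from typing import Dict, List, Sequence


def frequency_encode(values: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: map each category to its share of rows."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def group_rare(values: Sequence[str], min_count: int, other: str = "other") -> List[str]:
    """Hypothetical helper: replace categories seen fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Toy rows using real deposit names from the table above.
deposits = ["Hatchet Creek Copper"] * 3 + ["Cardonia Pit"] * 2 + ["Old Leyden Mine"]
print(group_rare(deposits, min_count=2))
# ['Hatchet Creek Copper', 'Hatchet Creek Copper', 'Hatchet Creek Copper',
#  'Cardonia Pit', 'Cardonia Pit', 'other']
print(frequency_encode(deposits)["Cardonia Pit"])  # 0.3333333333333333
```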
Out[28]:

saturn.columns["nearest_deposit"].stats

stat            value
n               3,235
nulls           0 (0.0%)
unique          97
top_value       Hatchet Creek Copper
top_rate        0.1342
cardinality     97
entropy         4.999
entropy_ratio   0.7574
Fig 13.
Top values for nearest_deposit.
Top values for nearest_deposit (20 unique shown, of 97 total).
value                                 count   share
Hatchet Creek Copper                    434   13.4%
Chaney No 1 Clay Mine                   302    9.3%
Cardonia Pit                            263    8.1%
Hager Mine                              179    5.5%
Lodgepole Quad                          171    5.3%
Cooper Mine                             164    5.1%
Stewart May                             161    5.0%
Main Pass Sulfur Mine                   115    3.6%
Dunn Bank                               101    3.1%
Batesville District                      96    3.0%
Unknown - Coal & Zn                      90    2.8%
Tole and Thorp Fireclay Mine             89    2.8%
Ventech Gas Processors Sulfur Plant      84    2.6%
Midland Farms Sulfur Plant               66    2.0%
Belden Pit                               65    2.0%
Afc Pit                                  45    1.4%
Iron Mine Hill Deposit                   43    1.3%
Butte Valley, Alamo #1                   42    1.3%
Santa Rosa Tar Sands                     41    1.3%
Old Leyden Mine                          39    1.2%

deposit_type categorical label

Categorical label identifying the type of mineral or fuel deposit, with 10 distinct values across 3,235 rows and no nulls. Coal dominates at 41.6% (1,345 rows), followed by Copper, Iron, and Oil, while Zinc (23) and Silver (21) are rare. An entropy ratio of 0.76 indicates a moderately concentrated distribution skewed toward fossil fuels, copper, and iron rather than precious metals.

Treatment: One-hot encode; consider grouping rare classes (Zinc, Silver) if used as a target.

anthropic:claude-opus-4-7 · confidence high
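The entropy values in these cards appear to be Shannon entropy in bits, and entropy_ratio appears to be that entropy divided by log2 of the number of distinct values: for deposit_type, 2.536 / log2(10) ≈ 0.763, which matches the card. That definition is inferred from the numbers, not documented by saturn. A sketch that recomputes it from the table counts; `entropy_bits` is a hypothetical helper, and its output should agree with the reported values (2.536 and 0.7633 for deposit_type, 0.9453 for deposit_era) up to rounding.

```python
import math
from typing import Sequence, Tuple


def entropy_bits(counts: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical helper: Shannon entropy in bits and its ratio to the maximum, log2(k).

    The ratio definition is inferred from the report's numbers, not documented.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    h = -sum(p * math.log2(p) for p in probs)
    k = len(probs)
    ratio = h / math.log2(k) if k > 1 else 0.0
    return h, ratio


# deposit_type counts from the table above (Coal ... Silver).
deposit_type_counts = [1345, 485, 403, 400, 235, 170, 81, 72, 23, 21]
h, ratio = entropy_bits(deposit_type_counts)
print(f"deposit_type: entropy={h:.3f} ratio={ratio:.4f}")

# deposit_era counts (Pennsylvanian ... Permian) as a second check.
era_h, era_ratio = entropy_bits([732, 422, 419, 401, 401, 327, 289, 149, 95])
print(f"deposit_era: entropy={era_h:.3f} ratio={era_ratio:.4f}")
```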
Out[31]:

saturn.columns["deposit_type"].stats

stat            value
n               3,235
nulls           0 (0.0%)
unique          10
top_value       Coal
top_rate        0.4158
cardinality     10
entropy         2.536
entropy_ratio   0.7633
Fig 14.
Top values for deposit_type.
Top values for deposit_type (10 unique shown, of 10 total).
value          count   share
Coal            1345   41.6%
Copper           485   15.0%
Iron             403   12.5%
Oil              400   12.4%
Natural Gas      235    7.3%
Lead             170    5.3%
Phosphate         81    2.5%
Gold              72    2.2%
Zinc              23    0.7%
Silver            21    0.6%

deposit_era categorical feature

Categorical geological time label for deposits, with 9 distinct values across 3,235 complete rows. The distribution is unusually flat for a categorical (entropy_ratio 0.945): Pennsylvanian leads at only 22.6% (732 rows), and even the smallest, Permian, holds 95 rows. Note the mixed granularity: broad intervals (Paleozoic, Precambrian) sit alongside specific periods and epochs (Devonian, Miocene), so the categories overlap in geological time. Paleozoic contains Devonian, Mississippian, Pennsylvanian, and Permian, and Tertiary contains Miocene.

Treatment: One-hot encode, but consider reconciling overlapping era/period granularity before modelling.

anthropic:claude-opus-4-7 · confidence high
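One way to reconcile the overlapping labels is to roll every label up to its era on the geologic time scale. A sketch; the mapping follows the standard time scale (Devonian, Mississippian, Pennsylvanian, and Permian are Paleozoic; Cretaceous is Mesozoic; Tertiary and its Miocene epoch are Cenozoic), while keeping Precambrian as its own bucket is a modelling choice, not something the report specifies. `ERA_OF` and `roll_up` are hypothetical names.

```python
from collections import Counter
from typing import Dict

# Hypothetical mapping from each deposit_era label to its geologic era.
# Keeping "Precambrian" as one bucket is a modelling choice.
ERA_OF: Dict[str, str] = {
    "Precambrian": "Precambrian",
    "Paleozoic": "Paleozoic",
    "Devonian": "Paleozoic",
    "Mississippian": "Paleozoic",
    "Pennsylvanian": "Paleozoic",
    "Permian": "Paleozoic",
    "Cretaceous": "Mesozoic",
    "Tertiary": "Cenozoic",
    "Miocene": "Cenozoic",
}


def roll_up(counts: Dict[str, int]) -> Dict[str, int]:
    """Aggregate label counts to era counts; an unmapped label raises KeyError."""
    totals: Counter = Counter()
    for label, count in counts.items():
        totals[ERA_OF[label]] += count
    return dict(totals)


# Counts from the deposit_era table.
era_counts = roll_up({
    "Pennsylvanian": 732, "Devonian": 422, "Paleozoic": 419,
    "Tertiary": 401, "Mississippian": 401, "Precambrian": 327,
    "Cretaceous": 289, "Miocene": 149, "Permian": 95,
})
print(era_counts)
# {'Paleozoic': 2069, 'Cenozoic': 550, 'Precambrian': 327, 'Mesozoic': 289}
```

After the roll-up, Paleozoic holds about 64% of rows (2,069 of 3,235), so the flat distribution in the original labels is partly an artifact of splitting the Paleozoic into periods.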
Out[34]:

saturn.columns["deposit_era"].stats

stat            value
n               3,235
nulls           0 (0.0%)
unique          9
top_value       Pennsylvanian
top_rate        0.2263
cardinality     9
entropy         2.997
entropy_ratio   0.9453
Fig 15.
Top values for deposit_era.
Top values for deposit_era (9 unique shown, of 9 total).
value           count   share
Pennsylvanian     732   22.6%
Devonian          422   13.0%
Paleozoic         419   13.0%
Tertiary          401   12.4%
Mississippian     401   12.4%
Precambrian       327   10.1%
Cretaceous        289    8.9%
Miocene           149    4.6%
Permian            95    2.9%

deposit_state categorical feature

`deposit_state` is a categorical U.S.-state field with 25 distinct values across 3,235 rows and no nulls. The distribution is fairly even (entropy ratio 0.83): the top state, Missouri, accounts for only 14.8% (478 rows), followed closely by Ohio (448) and Alabama (434). Coverage is partial: deposits come from only half of the 50 states, so counties in the remaining states are necessarily matched to an out-of-state deposit, and the column does not describe a nationwide sample of deposits.

Treatment: One-hot or target-encode for modelling; verify whether the 25-state coverage is intentional.

anthropic:claude-opus-4-7 · confidence high
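Because deposits come from only 25 states, a sizeable share of counties must point to a deposit across a state line. A sketch of measuring that share, assuming state_name and deposit_state use the same full state names (both tables suggest they do); `out_of_state_rate` is a hypothetical helper and the rows are toy data.

```python
from typing import Sequence


def out_of_state_rate(county_states: Sequence[str], deposit_states: Sequence[str]) -> float:
    """Hypothetical helper: fraction of rows whose nearest deposit lies in a different state."""
    if len(county_states) != len(deposit_states):
        raise ValueError("columns must have the same length")
    if not county_states:
        return 0.0
    # Compare case-insensitively, ignoring surrounding whitespace.
    mismatched = sum(
        1 for county, deposit in zip(county_states, deposit_states)
        if county.strip().casefold() != deposit.strip().casefold()
    )
    return mismatched / len(county_states)


print(out_of_state_rate(["Texas", "Georgia", "Ohio", "Ohio"],
                        ["Texas", "Alabama", "Ohio", "Indiana"]))  # 0.5
```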
Out[37]:

saturn.columns["deposit_state"].stats

stat            value
n               3,235
nulls           0 (0.0%)
unique          25
top_value       Missouri
top_rate        0.1478
cardinality     25
entropy         3.85
entropy_ratio   0.829
Fig 16.
Top values for deposit_state.
Top values for deposit_state (20 unique shown, of 25 total).
value           count   share
Missouri          478   14.8%
Ohio              448   13.8%
Alabama           434   13.4%
Indiana           263    8.1%
Arkansas          257    7.9%
South Dakota      210    6.5%
New Jersey        179    5.5%
Texas             170    5.3%
Colorado          144    4.5%
Louisiana         115    3.6%
New York           99    3.1%
Oregon             71    2.2%
California         68    2.1%
Idaho              54    1.7%
New Mexico         51    1.6%
Washington         47    1.5%
Rhode Island       43    1.3%
Montana            37    1.1%
Utah               30    0.9%
Arizona            16    0.5%

How to cite


BibTeX
@misc{saturn-fips-county-geology-counties-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: fips county geology counties},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/fips_county-geology_counties}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: fips county geology counties. Source: /home/coolhand/html/datavis/data_trove/geographic/fips_county/geology_counties.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/fips_county-geology_counties