saturn

/home/coolhand/html/datavis/data_trove/geographic/fips_county/geology_counties.csv 3,235 rows sample n=3,235 seed 42 2026-05-01T17:27:52+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/geographic/fips_county/geology_counties.csv
Total rows: 3,235
Profiled sample: 3,235
Columns: 9
Generated: 2026-05-01T17:27:52+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset links 3,235 U.S. counties (by FIPS code) to their nearest geological mineral or fuel deposit, including the deposit's type, era, state, and distance. Coal dominates deposit_type at roughly 42% of rows, followed by Copper, Iron, and Oil; it is worth checking whether this reflects true geological prevalence or sampling bias. The distance_to_deposit column is heavily right-skewed (skew ~7.5, max 5652 vs. median 152), so a small number of remote counties pull the mean (230) well above the median and deserve a closer look. deposit_era has nine labels that mix eras, periods, and epochs, led by Pennsylvanian (~23%), and deposit_state concentrates in Missouri, Ohio, and Alabama even though the counties themselves are spread across all 56 state codes.

fips high anthropic:claude-opus-4-7

This is the FIPS code identifying U.S. counties (or equivalent geographies), with all 3235 values unique and no nulls. Values span 1001 to 78030, consistent with state-prefixed county codes, and the distribution is broad (IQR 27090) rather than meaningfully skewed (skew 0.17). Treat the numeric stats as incidental — magnitude has no quantitative meaning here.
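
Because the column is stored as a number, codes whose state prefix starts with zero appear shortened (1001 rather than 01001). The sketch below restores the string form, assuming the standard five-digit county FIPS layout (two-digit state prefix, three-digit county suffix). The helper names to_fips5 and split_fips are hypothetical and not part of saturn.

```python
def to_fips5(code: int) -> str:
    """Zero-pad a numeric county FIPS code to its 5-character string form."""
    if not 0 < code < 100_000:
        raise ValueError(f"not a 5-digit county FIPS code: {code}")
    return f"{code:05d}"


def split_fips(code: int) -> tuple[str, str]:
    """Return (state_prefix, county_suffix) for a numeric county FIPS code."""
    padded = to_fips5(code)
    return padded[:2], padded[2:]


# The column's reported min and max values
print(to_fips5(1001))     # 01001
print(split_fips(78030))  # ('78', '030')
```

Storing the padded string instead of the integer also keeps joins against other FIPS-keyed tables from silently missing the low-numbered states.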

county_name high anthropic:claude-opus-4-7

This column holds US county-level place names, with 1,973 unique values across 3,235 rows. The word 'county' appears 2,999 times, alongside Louisiana 'parish' (64) and Puerto Rico 'municipio' (78) variants. Names repeat heavily: the duplicate rate is 39%, with 'Washington County' (30), 'Jefferson County' (25), and 'Franklin County' (24) topping the list, which is expected since the same county name recurs across states. Entries are short (mean 14.2 chars, ~2 words) and there are no nulls or empties.
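
The reported figures are consistent with n_duplicates being rows minus unique values (3,235 - 1,973 = 1,262). The sketch below reproduces that arithmetic and shows why county_name alone is not a usable key while the (county_name, state) pair can be. The sample rows are hypothetical, and the definition of duplicate_rate is inferred from the numbers, not taken from saturn's source.

```python
def duplicate_rate(values):
    """Share of rows that repeat an earlier value: (rows - unique) / rows."""
    values = list(values)
    if not values:
        return 0.0
    return (len(values) - len(set(values))) / len(values)


# Hypothetical rows: (county_name, state)
rows = [
    ("Washington County", "AL"),
    ("Washington County", "GA"),
    ("Jefferson County", "AL"),
    ("Knox County", "TN"),
]

names_only = duplicate_rate(name for name, _ in rows)  # name repeats across states
name_and_state = duplicate_rate(rows)                  # pair is unique in this sample

# Same formula applied to the figures in the report
report_rate = (3235 - 1973) / 3235

print(names_only, name_and_state, round(report_rate, 3))
```

For joins, the fips column remains the safer key, since it is unique across all 3,235 rows.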

state high anthropic:claude-opus-4-7

This is a US state code column with 56 unique values, more than the 50 states, which indicates that DC and territories such as PR (78 rows) are included. The distribution is fairly even (entropy ratio 0.92), with TX leading at 7.9% (254 of 3235 rows) followed by GA, VA, KY, and MO, consistent with a county-level dataset where states with more counties contribute more rows. No nulls.

state_name high anthropic:claude-opus-4-7

This column holds U.S. state names, almost certainly one row per county or county-equivalent given the 3,235 total rows and 56 distinct values (the 50 states plus territories/DC). Texas leads at 254 rows (7.85%), followed by Georgia (159) and Virginia (133), which matches the known county-count ranking. The distribution is fairly even across categories (entropy ratio 0.92) with no nulls.

distance_to_deposit high anthropic:claude-opus-4-7

Numeric feature measuring distance to a deposit, with all 3235 rows populated and 2202 distinct values. The unit is not stated; values in the hundreds suggest kilometres or miles rather than metres at county scale. The distribution is severely right-skewed (skew 7.51, kurtosis 77.6): the median is 152.0 while the mean is 230.12 and the max stretches to 5652.4, about 24x the Q3 of 235.75. About 4.9% of rows (159) flag as outliers, and there are no zeros or nulls.
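
The report does not say how outliers are flagged. One common convention is Tukey's 1.5 x IQR rule, sketched below with the reported quartiles. Treating this as saturn's method is an assumption, and tukey_fences is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for distance_to_deposit
lower, upper = tukey_fences(85.5, 235.75)
print(lower, upper)  # -139.875 461.125

# The lower fence is negative and the minimum is 1.8, so under this rule
# only large distances (above the upper fence) could be flagged.
```

Under that assumption, any county more than about 461 distance units from its nearest deposit would count as an outlier. Whether that threshold reproduces the 159 flagged rows can only be checked against the raw CSV.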

nearest_deposit high anthropic:claude-opus-4-7

This column names the nearest mineral deposit for each record, with 97 distinct sites across 3,235 rows and no nulls. Distribution is moderately concentrated: "Hatchet Creek Copper" alone accounts for 13.4% (434 rows), and the top three deposits cover roughly 30% of the data, yet entropy ratio of 0.76 indicates the long tail still carries meaningful spread. Names mix mine types (copper, clay, sulfur), pits, banks, quads, and districts, suggesting heterogeneous source nomenclature rather than a clean controlled vocabulary.

deposit_type high anthropic:claude-opus-4-7

Categorical label identifying the type of mineral or fuel deposit, with 10 distinct values across 3235 rows and no nulls. Coal dominates at 41.6% (1345 rows), followed by Copper, Iron, and Oil, while Zinc (23) and Silver (21) are rare. Entropy ratio of 0.76 indicates a moderately concentrated distribution skewed toward fossil/base resources rather than precious metals.
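
The entropy and entropy_ratio figures in this report are consistent with Shannon entropy in bits divided by log2 of the number of categories. The sketch below recomputes both from the ten deposit_type counts listed in the stats. The function name entropy_ratio and the choice to return 0.0 for a single-category column are assumptions, not saturn's documented behaviour.

```python
import math


def entropy_ratio(counts):
    """Return (entropy_bits, entropy / log2(k)) for a list of category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    k = len(probs)
    if k < 2:
        return entropy, 0.0  # assumed convention: one category has no spread
    return entropy, entropy / math.log2(k)


# deposit_type counts: Coal, Copper, Iron, Oil, Natural Gas,
# Lead, Phosphate, Gold, Zinc, Silver
deposit_type_counts = [1345, 485, 403, 400, 235, 170, 81, 72, 23, 21]
h, ratio = entropy_ratio(deposit_type_counts)
print(round(h, 3), round(ratio, 3))
```

A ratio of 1.0 means every category is equally common, and values near 0 mean one category holds almost all rows, so 0.76 here sits between the two, pulled down mainly by Coal's 41.6% share.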

deposit_era high anthropic:claude-opus-4-7

Categorical geological time label for deposits, spanning 9 distinct values across 3235 complete rows. Distribution is unusually flat for a categorical (entropy_ratio 0.945): Pennsylvanian leads at only 22.6% (732 rows) and even the smallest, Permian, holds 95 rows. Note the mixed granularity: broad divisions (Paleozoic, Precambrian) sit alongside specific periods (Devonian, Permian) and an epoch (Miocene), so categories are not mutually exclusive in geological time.

deposit_state high anthropic:claude-opus-4-7

deposit_state is a categorical US-state field with 25 distinct values across 3,235 rows and no nulls. Distribution is fairly even (entropy ratio 0.83); the top state, Missouri, accounts for only 14.8% (478 rows), followed closely by Ohio (448) and Alabama (434). Coverage is partial: only half of the 50 US states appear as deposit locations, so the deposit list itself is not nationwide.

Numeric correlation

fips numeric

rows: 3,235
null: 0 (0.0%)
unique: 3,235
min: 1,001
max: 78,030
mean: 31,523
median: 30,035
std: 16,432
q1: 19,036
q3: 46,126
iqr: 27,090
skew: 0.174
kurtosis: -0.608
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

county_name text

95th-percentile length under 20 chars; 39.0% duplicate strings
rows: 3,235
null: 0 (0.0%)
unique: 1,973
len_min: 4
len_max: 46
len_mean: 14.179
len_median: 14.000
len_p95: 18.000
word_mean: 2.084
word_median: 2.000
n_empty: 0
n_duplicates: 1,262
duplicate_rate: 0.390
vocab_size: 1,973
readability_flesch_mean: 33.650
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 3.09e-04
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Pickens County
  2. Knox County
  3. Alpine County
  4. Ozaukee County
  5. Keokuk County
  6. Rush County
  7. DeKalb County
  8. Butler County
  9. Dewey County
  10. Hartley County

state categorical

rows: 3,235
null: 0 (0.0%)
unique: 56
top_value: TX
top_rate: 0.079
cardinality: 56
entropy: 5.338
entropy_ratio: 0.919
Top values (rank 1–20)
  1. TX — 254
  2. GA — 159
  3. VA — 133
  4. KY — 120
  5. MO — 115
  6. KS — 105
  7. IL — 102
  8. NC — 100
  9. IA — 99
  10. TN — 95
  11. NE — 93
  12. IN — 92
  13. OH — 88
  14. MN — 87
  15. MI — 83
  16. MS — 82
  17. PR — 78
  18. OK — 77
  19. AR — 75
  20. WI — 72

state_name categorical

rows: 3,235
null: 0 (0.0%)
unique: 56
top_value: Texas
top_rate: 0.079
cardinality: 56
entropy: 5.338
entropy_ratio: 0.919
Top values (rank 1–20)
  1. Texas — 254
  2. Georgia — 159
  3. Virginia — 133
  4. Kentucky — 120
  5. Missouri — 115
  6. Kansas — 105
  7. Illinois — 102
  8. North Carolina — 100
  9. Iowa — 99
  10. Tennessee — 95
  11. Nebraska — 93
  12. Indiana — 92
  13. Ohio — 88
  14. Minnesota — 87
  15. Michigan — 83
  16. Mississippi — 82
  17. Puerto Rico — 78
  18. Oklahoma — 77
  19. Arkansas — 75
  20. Wisconsin — 72

distance_to_deposit numeric

skew=+7.51
rows: 3,235
null: 0 (0.0%)
unique: 2,202
min: 1.800
max: 5,652
mean: 230.120
median: 152.000
std: 399.854
q1: 85.500
q3: 235.750
iqr: 150.250
skew: 7.511
kurtosis: 77.601
n_outliers: 159
outlier_rate: 0.049
zero_rate: 0.000

nearest_deposit categorical

rows: 3,235
null: 0 (0.0%)
unique: 97
top_value: Hatchet Creek Copper
top_rate: 0.134
cardinality: 97
entropy: 4.999
entropy_ratio: 0.757
Top values (rank 1–20)
  1. Hatchet Creek Copper — 434
  2. Chaney No 1 Clay Mine — 302
  3. Cardonia Pit — 263
  4. Hager Mine — 179
  5. Lodgepole Quad — 171
  6. Cooper Mine — 164
  7. Stewart May — 161
  8. Main Pass Sulfur Mine — 115
  9. Dunn Bank — 101
  10. Batesville District — 96
  11. Unknown - Coal & Zn — 90
  12. Tole and Thorp Fireclay Mine — 89
  13. Ventech Gas Processors Sulfur Plant — 84
  14. Midland Farms Sulfur Plant — 66
  15. Belden Pit — 65
  16. Afc Pit — 45
  17. Iron Mine Hill Deposit — 43
  18. Butte Valley, Alamo #1 — 42
  19. Santa Rosa Tar Sands — 41
  20. Old Leyden Mine — 39

deposit_type categorical

rows: 3,235
null: 0 (0.0%)
unique: 10
top_value: Coal
top_rate: 0.416
cardinality: 10
entropy: 2.536
entropy_ratio: 0.763
Top values (rank 1–20)
  1. Coal — 1,345
  2. Copper — 485
  3. Iron — 403
  4. Oil — 400
  5. Natural Gas — 235
  6. Lead — 170
  7. Phosphate — 81
  8. Gold — 72
  9. Zinc — 23
  10. Silver — 21

deposit_era categorical

rows: 3,235
null: 0 (0.0%)
unique: 9
top_value: Pennsylvanian
top_rate: 0.226
cardinality: 9
entropy: 2.997
entropy_ratio: 0.945
Top values (rank 1–20)
  1. Pennsylvanian — 732
  2. Devonian — 422
  3. Paleozoic — 419
  4. Tertiary — 401
  5. Mississippian — 401
  6. Precambrian — 327
  7. Cretaceous — 289
  8. Miocene — 149
  9. Permian — 95

deposit_state categorical

rows: 3,235
null: 0 (0.0%)
unique: 25
top_value: Missouri
top_rate: 0.148
cardinality: 25
entropy: 3.850
entropy_ratio: 0.829
Top values (rank 1–20)
  1. Missouri — 478
  2. Ohio — 448
  3. Alabama — 434
  4. Indiana — 263
  5. Arkansas — 257
  6. South Dakota — 210
  7. New Jersey — 179
  8. Texas — 170
  9. Colorado — 144
  10. Louisiana — 115
  11. New York — 99
  12. Oregon — 71
  13. California — 68
  14. Idaho — 54
  15. New Mexico — 51
  16. Washington — 47
  17. Rhode Island — 43
  18. Montana — 37
  19. Utah — 30
  20. Arizona — 16