
bigfoot listings 20260210

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/bigfoot/listings_20260210.json

Saturn profiled 5,411 rows across 9 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
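# Install the profiler (notebook shell escape; in a terminal, run the same pip command without the "!").
!pip install saturn-dissect
import subprocess

# Profile the source file, name the findings file, and request the opt-in language-model prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/bigfoot/listings_20260210.json",
    "--findings", "bigfoot-listings_20260210.json",
    "--llm", "anthropic:claude-opus-4-7",
])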

Summary confidence: high

This dataset contains 5,411 Bigfoot sighting reports from BFRO, with 9 columns covering a record id, location (state, state code, county), timing (year, month), a classification grade, a short description, and a source URL. Sightings are concentrated in Washington, California, Ohio, and Florida, and peak in summer and early fall (August, October, and July are the three most common months). Classification is dominated by Class B (2,722) and Class A (2,655), with Class C barely represented (34), which is worth flagging if you plan to filter by report quality. The year distribution is left-skewed with a median of 2001 and a long tail back to 1870, so most reports are recent. The county field has 338 empty values and an 81% duplicate rate (expected, since counties repeat across reports).

citing: row_count · column_count · columns.state.top_values · columns.month.top_values · columns.classification.top_values · columns.year.stats · columns.county.stats · columns.description.stats

Out[4]:

saturn.schema() · 9 columns

column          kind         n      null%  unique  alerts
id              numeric      5,411  0.0%   5,411
state           categorical  5,411  0.0%   53
state_code      categorical  5,411  0.0%   53
county          text         5,411  0.0%   1,022   one_word short_text duplicates
url             text         5,411  0.0%   5,411   near_unique one_word url_heavy
month           categorical  5,411  3.0%   32
year            numeric      5,411  1.1%   99
classification  categorical  5,411  0.0%   3
description     text         5,411  0.0%   5,407   near_unique
Fig 1.
state · Top states for sightings — Washington and California lead by a wide margin.
Top values for state (20 unique shown, of 53 total).
value             count  share
Washington          631  11.7%
California          431   8.0%
Ohio                317   5.9%
Florida             314   5.8%
Oregon              253   4.7%
Illinois            239   4.4%
Texas               238   4.4%
Michigan            217   4.0%
Missouri            161   3.0%
Georgia             135   2.5%
Colorado            128   2.4%
Pennsylvania        125   2.3%
British Columbia    122   2.3%
New York            116   2.1%
Kentucky            115   2.1%
Arkansas            104   1.9%
Tennessee           104   1.9%
West Virginia       104   1.9%
Oklahoma            101   1.9%
Idaho                99   1.8%
Fig 2.
classification · Report quality is split almost evenly between Class A and B, with Class C negligible.
Top values for classification (3 unique shown, of 3 total).
value    count  share
Class B   2722  50.3%
Class A   2655  49.1%
Class C     34   0.6%
Fig 3.
month · Seasonality of sightings — peaks in August, October, and July.
Top values for month (20 unique shown, of 32 total).
value      count  share
August       634  11.7%
October      632  11.7%
July         618  11.4%
September    515   9.5%
June         468   8.6%
November     458   8.5%
May          303   5.6%
April        259   4.8%
December     233   4.3%
January      228   4.2%
Summer       217   4.0%
March        201   3.7%
February     163   3.0%
Fall         129   2.4%
Spring        96   1.8%
Winter        57   1.1%
Late           6   0.1%
about          6   0.1%
mid            5   0.1%
or             5   0.1%
Fig 4.
year · Reports skew heavily toward recent decades, centered around 2001.
Histogram bins for year (median: 2001).
bin          count
1870 – 1874      1
1874 – 1878      0
1878 – 1882      0
1882 – 1886      0
1886 – 1889      0
1889 – 1893      1
1893 – 1897      0
1897 – 1901      0
1901 – 1905      0
1905 – 1909      1
1909 – 1913      1
1913 – 1916      0
1916 – 1920      2
1920 – 1924      2
1924 – 1928      2
1928 – 1932      2
1932 – 1936      4
1936 – 1940      2
1940 – 1944      5
1944 – 1948      4
1948 – 1951     15
1951 – 1955     13
1955 – 1959     18
1959 – 1963     24
1963 – 1967     53
1967 – 1971    120
1971 – 1975    158
1975 – 1978    331
1978 – 1982    307
1982 – 1986    257
1986 – 1990    224
1990 – 1994    195
1994 – 1998    380
1998 – 2002    610
2002 – 2006    679
2006 – 2010    622
2010 – 2013    616
2013 – 2017    355
2017 – 2021    220
2021 – 2025    130
Fig 5.
description · Descriptions are short (median 10 words), suggesting summary-level rather than full narrative text.
Character-length distribution for description (mean: 67.04).
chars      count
10 – 15        2
15 – 21        4
21 – 26       21
26 – 31       56
31 – 36      108
36 – 42      185
42 – 47      376
47 – 52      551
52 – 57      525
57 – 63      568
63 – 68      692
68 – 73      495
73 – 79      486
79 – 84      369
84 – 89      330
89 – 94      196
94 – 100     135
100 – 105     99
105 – 110     74
110 – 116     42
116 – 121     26
121 – 126     23
126 – 131     10
131 – 137      9
137 – 142      6
142 – 147      4
147 – 152      6
152 – 158      2
158 – 163      1
163 – 168      2
168 – 174      3
174 – 179      0
179 – 184      0
184 – 189      2
189 – 195      1
195 – 200      0
200 – 205      0
205 – 210      0
210 – 216      0
216 – 221      2
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column          kind         null %
id              numeric      0.0%
state           categorical  0.0%
state_code      categorical  0.0%
county          text         0.0%
url             text         0.0%
month           categorical  3.0%
year            numeric      1.1%
classification  categorical  0.0%
description     text         0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
       id     year
id     +1.00  +0.12
year   +0.12  +1.00

id numeric identifier

This column is almost certainly a row identifier: all 5411 values are unique, none are null, and they span a wide integer range from 60 to 79711. The distribution is right-skewed (skew 0.91) with no outliers flagged, consistent with sparsely allocated record IDs rather than a measured quantity.

Treatment: Drop from modelling features; retain only as a join key.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["id"].stats

stat          value
n             5,411
nulls         0 (0.0%)
unique        5,411
min           60
max           79,711
mean          2.329e+04
median        16,598
std           2.138e+04
q1            4,898
q3            3.636e+04
iqr           31,464
skew          0.9109
kurtosis      -0.151
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 8.
Distribution of id. Vertical dash marks the median.
Histogram bins for id (median: 16,598).
bin                    count
60 – 2051                743
2051 – 4043              469
4043 – 6034              305
6034 – 8025              306
8025 – 1.002e+04         268
1.002e+04 – 1.201e+04    202
1.201e+04 – 1.4e+04      198
1.4e+04 – 1.599e+04      176
1.599e+04 – 1.798e+04    119
1.798e+04 – 1.997e+04     81
1.997e+04 – 2.196e+04     89
2.196e+04 – 2.396e+04    146
2.396e+04 – 2.595e+04    254
2.595e+04 – 2.794e+04    215
2.794e+04 – 2.993e+04    191
2.993e+04 – 3.192e+04    105
3.192e+04 – 3.391e+04     77
3.391e+04 – 3.59e+04      85
3.59e+04 – 3.789e+04      98
3.789e+04 – 3.989e+04     91
3.989e+04 – 4.188e+04    113
4.188e+04 – 4.387e+04     90
4.387e+04 – 4.586e+04     90
4.586e+04 – 4.785e+04     84
4.785e+04 – 4.984e+04     71
4.984e+04 – 5.183e+04     80
5.183e+04 – 5.382e+04     10
5.382e+04 – 5.582e+04     33
5.582e+04 – 5.781e+04     70
5.781e+04 – 5.98e+04      92
5.98e+04 – 6.179e+04      18
6.179e+04 – 6.378e+04     78
6.378e+04 – 6.577e+04     47
6.577e+04 – 6.776e+04     65
6.776e+04 – 6.975e+04     42
6.975e+04 – 7.175e+04      8
7.175e+04 – 7.374e+04     33
7.374e+04 – 7.573e+04     45
7.573e+04 – 7.772e+04     50
7.772e+04 – 7.971e+04     74
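
As a minimal sketch of the treatment suggested for id (drop it from the model features, keep it as a join key), the helper below splits one record into its key and its remaining fields. The function name split_key_and_features and the sample record are hypothetical; only the column names come from the schema.

```python
def split_key_and_features(record, key="id"):
    """Return (key_value, features), where features excludes the key column."""
    features = {name: value for name, value in record.items() if name != key}
    return record[key], features


# Hypothetical record shaped like one row of the listings file (values invented).
row = {"id": 12345, "state": "Washington", "year": 2001, "classification": "Class A"}
key, features = split_key_and_features(row)
```

Keeping the key alongside the feature dict, rather than discarding it, lets predictions be joined back to the source rows later.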

state categorical feature

State or province names across 5,411 rows, with 53 unique values and no nulls. That is three more than the 50 US states; the top-20 table includes British Columbia, so at least one of the extras is a Canadian province, and the rest may be DC, territories, or stray entries. The distribution is fairly even (entropy ratio 0.877), but Washington leads at 11.66% with 631 rows, ahead of California (431) and Ohio (317), which is unusual since California typically dominates US samples.

Treatment: One-hot or target-encode; investigate the 3 extra categories beyond 50 states.

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["state"].stats

stat           value
n              5,411
nulls          0 (0.0%)
unique         53
top_value      Washington
top_rate       0.1166
cardinality    53
entropy        5.025
entropy_ratio  0.8773
Fig 9.
Top values for state.
Top values for state (20 unique shown, of 53 total).
value             count  share
Washington          631  11.7%
California          431   8.0%
Ohio                317   5.9%
Florida             314   5.8%
Oregon              253   4.7%
Illinois            239   4.4%
Texas               238   4.4%
Michigan            217   4.0%
Missouri            161   3.0%
Georgia             135   2.5%
Colorado            128   2.4%
Pennsylvania        125   2.3%
British Columbia    122   2.3%
New York            116   2.1%
Kentucky            115   2.1%
Arkansas            104   1.9%
Tennessee           104   1.9%
West Virginia       104   1.9%
Oklahoma            101   1.9%
Idaho                99   1.8%
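
To follow up on the categories beyond the 50 states, one option is a set difference against a reference list of state names. This is a sketch rather than anything saturn provides: the helper name extra_categories is made up for illustration, and the sample list holds a few values from the top-20 table, not the full column.

```python
US_STATES = {
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
}


def extra_categories(values, known=US_STATES):
    """Return the sorted distinct values that are not in the known set."""
    return sorted(set(values) - known)


# A few values from the top-20 table; the real column has 53 distinct values.
sample = ["Washington", "California", "British Columbia", "Ohio", "Washington"]
extras = extra_categories(sample)
```

Run against the full state column, the same call would list every non-state category at once, which is quicker than scanning the top-values table by eye.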

state_code categorical feature

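Lowercase region codes with 53 distinct values and no nulls. Most are two-letter US state abbreviations, but the top-20 table also contains the compound code ca-bc (122 rows, the same count as British Columbia in state), so the column is not limited to US states. The top-20 counts match the state column row for row, which suggests the two columns carry the same information. The distribution is fairly even (entropy ratio 0.877), but wa leads at 11.7% (631 rows), with ca, oh, and fl also prominent; the ranking does not follow population.

Treatment: One-hot or target-encode for modelling; safe to use as-is since the column is complete and low-cardinality.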

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["state_code"].stats

stat           value
n              5,411
nulls          0 (0.0%)
unique         53
top_value      wa
top_rate       0.1166
cardinality    53
entropy        5.025
entropy_ratio  0.8773
Fig 10.
Top values for state_code.
Top values for state_code (20 unique shown, of 53 total).
value  count  share
wa       631  11.7%
ca       431   8.0%
oh       317   5.9%
fl       314   5.8%
or       253   4.7%
il       239   4.4%
tx       238   4.4%
mi       217   4.0%
mo       161   3.0%
ga       135   2.5%
co       128   2.4%
pa       125   2.3%
ca-bc    122   2.3%
ny       116   2.1%
ky       115   2.1%
ar       104   1.9%
tn       104   1.9%
wv       104   1.9%
ok       101   1.9%
id        99   1.8%
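
Because ca (California) and ca-bc share a prefix, it may help to split the codes into a country and a region before encoding. The sketch below assumes that a hyphenated code means country-region and that a bare code is a US state; that reading is inferred from ca-bc having the same count as British Columbia, not stated in the report. The function name parse_region_code is hypothetical.

```python
def parse_region_code(code):
    """Split a code such as 'wa' or 'ca-bc' into (country, region), upper-cased.

    Assumption: '<country>-<region>' for hyphenated codes, US state otherwise.
    """
    parts = code.strip().upper().split("-", 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return "US", parts[0]


codes = ["wa", "ca", "ca-bc"]
parsed = [parse_region_code(c) for c in codes]
```

With this split, California becomes ("US", "CA") and British Columbia becomes ("CA", "BC"), so the two can no longer be confused by a prefix match.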

county text feature

Single-word US county names (Pierce, Jefferson, Lewis, Snohomish, Skamania suggest a Pacific Northwest tilt), with 1,022 unique values across 5,411 rows. Duplicates dominate at 81.1% (4,389 repeats, which is 5,411 rows minus 1,022 distinct values); that is expected for a categorical. However, 338 rows are empty strings rather than nulls, so the null rate reads 0.0% only because the blanks are not typed as null.

Treatment: Coerce empty strings to null, then treat as a categorical (target/frequency encode for modelling).

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["county"].stats

stat                     value
n                        5,411
nulls                    0 (0.0%)
unique                   1,022
len_min                  0
len_max                  23
len_mean                 6.621
len_median               7
len_p95                  10
word_mean                1
word_median              1
n_empty                  338
n_duplicates             4,389
duplicate_rate           0.8111
vocab_size               1,020
readability_flesch_mean  16.9
emoji_rate               0
url_rate                 0
one_word_rate            1
allcaps_rate             0
boilerplate_rate         0
alert: one_word · 100.0% of rows are a single word
alert: short_text · 95th-percentile length under 20 chars
alert: duplicates · 81.1% duplicate strings
Fig 11.
Character-length distribution for county.
Character-length distribution for county (mean: 6.621). Bin edges are shown rounded to whole characters, so some labels repeat; the bins with a count of 0 are omitted here.
chars    count
0 – 1      338
3 – 3       28
3 – 4      457
5 – 5      640
6 – 6     1110
7 – 7      802
7 – 8      916
9 – 9      608
10 – 10    301
11 – 12     62
12 – 12     94
13 – 13      5
14 – 14     24
15 – 16     16
16 – 16      3
17 – 17      3
19 – 20      3
22 – 23      1
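
The county treatment (coerce empty strings to null, then encode as a categorical) could be sketched as follows. The helper names clean_county and frequency_encode are illustrative, the sample values are a made-up mix of county names from the description plus blanks, and frequency encoding is only one of the two encodings the treatment mentions.

```python
from collections import Counter


def clean_county(value):
    """Map None, empty, or whitespace-only strings to None; strip other values."""
    if value is None or not value.strip():
        return None
    return value.strip()


def frequency_encode(values):
    """Replace each non-null value with its share of the non-null rows."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return [None if v is None else counts[v] / total for v in values]


raw = ["Pierce", "", "Lewis", "Pierce", "  "]
cleaned = [clean_county(v) for v in raw]
encoded = frequency_encode(cleaned)
```

Once the blanks are typed as null, a profiler that counts nulls would report them in the null rate instead of in n_empty.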

url text identifier

This column holds a unique BFRO (Bigfoot Field Researchers Organization) report URL for each of the 5411 rows, all following the pattern https://www.bfro.net/gdb/show_report.asp?id=. Every value is unique (n_unique=5411, duplicate_rate=0.0), non-null, and url_rate=1.0, so it functions as a per-row identifier rather than a feature. Lengths cluster tightly between 46 and 49 characters, consistent with the report id being the only varying segment.

Treatment: Drop from modelling; retain as a row-level link or extract the numeric report id as a key.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["url"].stats

stat                     value
n                        5,411
nulls                    0 (0.0%)
unique                   5,411
len_min                  46
len_max                  49
len_mean                 48.56
len_median               49
len_p95                  49
word_mean                1
word_median              1
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               5,411
readability_flesch_mean  -301.8
emoji_rate               0
url_rate                 1
one_word_rate            1
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 100.0% of rows are a single word
alert: url_heavy · 100.0% of rows contain a URL
Fig 12.
Character-length distribution for url.
Character-length distribution for url (mean: 48.56). Bin edges are shown rounded to whole characters, so labels repeat; the bins with a count of 0 are omitted here.
chars    count
46 – 46     11
47 – 47    288
48 – 48   1789
49 – 49   3323
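
Extracting the numeric report id from the URL, as the treatment suggests, can be done with the standard library. The sketch below assumes every URL follows the show_report.asp?id= pattern quoted above; the function name report_id is hypothetical and the example URL is built from that pattern for illustration. The fixed prefix is 44 characters long, so ids of 2 to 5 digits give the observed lengths of 46 to 49.

```python
from urllib.parse import parse_qs, urlparse


def report_id(url):
    """Return the integer `id` query parameter, or None if absent or non-numeric."""
    values = parse_qs(urlparse(url).query).get("id", [])
    if values and values[0].isdigit():
        return int(values[0])
    return None


# Example URL assembled from the pattern in the column description.
example = "https://www.bfro.net/gdb/show_report.asp?id=60"
extracted = report_id(example)
```

Whether the extracted value equals the id column is worth checking on the real data before using one as a substitute for the other; the report does not confirm it.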

month categorical feature

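Month of the sighting, stored as free text. The distribution is seasonal: August, October, and July lead, and the winter months trail. The top_rate of 0.1207 for August matches 634 / 5,251, so it appears to be computed over the non-null rows, while the 11.7% share in the top-values table is over all 5,411 rows. Cardinality is 32, far above the expected 12. The top-20 values show why: alongside the 12 month names there are the seasons Summer, Fall, Spring, and Winter (499 rows combined) and fragments such as Late, about, mid, and or, which look like pieces of longer phrases. The null rate is 2.96% (160 rows).

Treatment: Normalize to the 12 canonical months (resolve the 20 extra categories, keeping season-only values distinguishable from true nulls) and impute or flag nulls before encoding.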

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["month"].stats

stat           value
n              5,411
nulls          160 (3.0%)
unique         32
top_value      August
top_rate       0.1207
cardinality    32
entropy        3.807
entropy_ratio  0.7614
Fig 13.
Top values for month.
Top values for month (20 unique shown, of 32 total).
value      count  share
August       634  11.7%
October      632  11.7%
July         618  11.4%
September    515   9.5%
June         468   8.6%
November     458   8.5%
May          303   5.6%
April        259   4.8%
December     233   4.3%
January      228   4.2%
Summer       217   4.0%
March        201   3.7%
February     163   3.0%
Fall         129   2.4%
Spring        96   1.8%
Winter        57   1.1%
Late           6   0.1%
about          6   0.1%
mid            5   0.1%
or             5   0.1%
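
One way to resolve the extra month categories is to map each raw value to a canonical month, a season, or neither. This is a sketch under the assumption that season names are worth keeping in a separate field; the function name normalize_month is hypothetical, and fragments such as "about" are left unresolved because the month cannot be recovered from the fragment alone.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
SEASONS = ["Spring", "Summer", "Fall", "Winter"]

_MONTH_LOOKUP = {name.lower(): name for name in MONTHS}
_SEASON_LOOKUP = {name.lower(): name for name in SEASONS}


def normalize_month(value):
    """Return (month, season): a canonical month, else a season, else (None, None)."""
    if value is None:
        return None, None
    key = value.strip().lower()
    if key in _MONTH_LOOKUP:
        return _MONTH_LOOKUP[key], None
    if key in _SEASON_LOOKUP:
        return None, _SEASON_LOOKUP[key]
    return None, None


normalized = [normalize_month(v) for v in ["August", "Summer", "about", None]]
```

Returning the season separately keeps the 499 season-only rows usable for coarse seasonality analysis instead of folding them into the nulls.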

year numeric timestamp

Sighting year, spanning 1870 to 2025 with a median of 2001 and an IQR of 22 years (1987 to 2009). The distribution is left-skewed (skew -0.97) with a long tail of older entries, and 49 outliers (0.9% of the 5,354 non-null rows) sit on the early end. The null rate is low at 1.05% (57 rows) and there are 99 distinct years.

Treatment: Treat as a temporal feature; consider bucketing by decade or computing age relative to a reference year.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["year"].stats

stat          value
n             5,411
nulls         57 (1.1%)
unique        99
min           1870
max           2025
mean          1998
median        2001
std           15.79
q1            1987
q3            2009
iqr           22
skew          -0.9738
kurtosis      1.997
n_outliers    49
outlier_rate  0.009152
zero_rate     0
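
The report does not say which rule flags the 49 outliers. As an assumption only, the common Tukey rule (1.5 times the IQR beyond the quartiles) can be applied to the q1 and q3 values above; the function name tukey_fences is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# q1 and q3 are taken from the year stats table.
lower, upper = tukey_fences(1987, 2009)
```

Under this rule the lower fence is 1954 and the upper fence is 2042. The maximum year, 2025, is below the upper fence, which would be consistent with all flagged outliers sitting on the early end.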
Fig 14.
Distribution of year. Vertical dash marks the median.
Histogram bins for year (median: 2001).
bin          count
1870 – 1874      1
1874 – 1878      0
1878 – 1882      0
1882 – 1886      0
1886 – 1889      0
1889 – 1893      1
1893 – 1897      0
1897 – 1901      0
1901 – 1905      0
1905 – 1909      1
1909 – 1913      1
1913 – 1916      0
1916 – 1920      2
1920 – 1924      2
1924 – 1928      2
1928 – 1932      2
1932 – 1936      4
1936 – 1940      2
1940 – 1944      5
1944 – 1948      4
1948 – 1951     15
1951 – 1955     13
1955 – 1959     18
1959 – 1963     24
1963 – 1967     53
1967 – 1971    120
1971 – 1975    158
1975 – 1978    331
1978 – 1982    307
1982 – 1986    257
1986 – 1990    224
1990 – 1994    195
1994 – 1998    380
1998 – 2002    610
2002 – 2006    679
2006 – 2010    622
2010 – 2013    616
2013 – 2017    355
2017 – 2021    220
2021 – 2025    130
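
The two temporal features named in the year treatment, a decade bucket and an age relative to a reference year, might be derived like this. The function names and the default reference year of 2026 (the year the report was generated) are assumptions for illustration.

```python
def decade(year):
    """Return the first year of the decade containing `year`, or None for nulls."""
    return None if year is None else (year // 10) * 10


def age(year, reference_year=2026):
    """Return years elapsed between `year` and the reference year, or None for nulls."""
    return None if year is None else reference_year - year


years = [1870, 2001, None, 2025]
decades = [decade(y) for y in years]
ages = [age(y) for y in years]
```

Passing nulls through unchanged leaves the choice between imputing and flagging the 57 missing years to a later step.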

classification categorical label

A 3-level categorical label, almost certainly the target or stratification class. Class B (2722) and Class A (2655) split the data nearly 50/50, while Class C appears only 34 times — a severe minority that will distort accuracy-style metrics. No nulls across 5411 rows.

Treatment: Use as classification target with stratified splits and class-weighting to handle the Class C minority.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["classification"].stats

stat           value
n              5,411
nulls          0 (0.0%)
unique         3
top_value      Class B
top_rate       0.503
cardinality    3
entropy        1.049
entropy_ratio  0.6616
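
The entropy and entropy_ratio values above can be reproduced from the three class counts if entropy is Shannon entropy in bits and the ratio divides it by log2 of the cardinality. That definition is inferred from the numbers, not documented in the report; the function name entropy_bits is illustrative.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Class B, Class A, Class C counts from the classification table.
counts = [2722, 2655, 34]
h = entropy_bits(counts)
ratio = h / math.log2(len(counts))
```

A ratio of 1.0 would mean three equally common classes; the value here is pulled down by Class C holding under 1% of the rows.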
Fig 15.
Top values for classification.
Top values for classification (3 unique shown, of 3 total).
value    count  share
Class B   2722  50.3%
Class A   2655  49.1%
Class C     34   0.6%
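
For the class-weighting part of the classification treatment, a common heuristic sets each weight to n_samples / (n_classes * class_count), the same formula scikit-learn uses for its "balanced" option. The sketch below applies it to the counts in the table; the function name balanced_class_weights is hypothetical, and the report does not prescribe this particular scheme.

```python
def balanced_class_weights(counts):
    """Return weights n_samples / (n_classes * class_count) for each label."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights({"Class A": 2655, "Class B": 2722, "Class C": 34})
```

With these counts Class C receives a weight of about 53, against roughly 0.66 to 0.68 for the two majority classes, which shows how strongly 34 examples would be upweighted.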

description text free_text

Short free-text descriptions, averaging 10.6 words (median 10) and 67 characters, almost certainly capturing sighting reports — top tokens include 'sighting' (1436), 'possible' (1117), 'near' (2283). Values are nearly unique (5407 distinct out of 5411) with only 4 duplicates and no nulls or empties, and Flesch readability of 55.7 suggests fairly plain prose. Vocabulary of 7169 words across this small corpus indicates rich lexical variety rather than templated text.

Treatment: Tokenize and embed (or extract entities) before modelling; do not treat as a categorical.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["description"].stats

stat                     value
n                        5,411
nulls                    0 (0.0%)
unique                   5,407
len_min                  10
len_max                  221
len_mean                 67.04
len_median               65
len_p95                  101.5
word_mean                10.62
word_median              10
n_empty                  0
n_duplicates             4
duplicate_rate           0.0007392
vocab_size               7,169
readability_flesch_mean  55.71
emoji_rate               0
url_rate                 0
one_word_rate            0
allcaps_rate             0
boilerplate_rate         0.0001848
alert: near_unique · 99.9% of rows are unique strings
Fig 16.
Character-length distribution for description.
Character-length distribution for description (mean: 67.04).
chars      count
10 – 15        2
15 – 21        4
21 – 26       21
26 – 31       56
31 – 36      108
36 – 42      185
42 – 47      376
47 – 52      551
52 – 57      525
57 – 63      568
63 – 68      692
68 – 73      495
73 – 79      486
79 – 84      369
84 – 89      330
89 – 94      196
94 – 100     135
100 – 105     99
105 – 110     74
110 – 116     42
116 – 121     26
121 – 126     23
126 – 131     10
131 – 137      9
137 – 142      6
142 – 147      4
147 – 152      6
152 – 158      2
158 – 163      1
163 – 168      2
168 – 174      3
174 – 179      0
179 – 184      0
184 – 189      2
189 – 195      1
195 – 200      0
200 – 205      0
205 – 210      0
210 – 216      0
216 – 221      2
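
The tokenize step in the description treatment, and the kind of token counts quoted in the column summary, could be sketched with the standard library as below. The regular expression is a simple assumption (lowercase runs of letters, digits, and apostrophes), not saturn's tokenizer, and the two sample descriptions are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_tokens(texts, n=3):
    """Return the n most common tokens across all texts as (token, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Invented examples in the style of the short descriptions.
samples = ["Possible sighting near a lake", "Sighting near a logging road"]
token_counts = dict(top_tokens(samples, n=10))
```

For modelling, the token lists would feed an embedding or entity-extraction step; the counts alone are mainly useful for spotting stop words such as "near" that dominate the vocabulary.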

How to cite


BibTeX
@misc{saturn-bigfoot-listings-20260210-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: bigfoot listings 20260210},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/bigfoot-listings_20260210}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: bigfoot listings 20260210. Source: /home/coolhand/html/datavis/data_trove/cache/bigfoot/listings_20260210.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/bigfoot-listings_20260210