saturn

/home/coolhand/html/datavis/data_trove/data/wild/bigfoot_sightings.json | 5,411 rows | sample n=5,411 | seed 42 | 2026-06-22T00:50:12+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/wild/bigfoot_sightings.json
Total rows: 5,411
Profiled sample: 5,411
Columns: 9
Generated: 2026-06-22T00:50:12+00:00
Per-column null rate across the corpus.
column          kind         null %
id              numeric      0.0%
state           categorical  0.0%
state_code      categorical  0.0%
county          text         0.0%
url             text         0.0%
month           categorical  3.0%
year            numeric      1.1%
classification  categorical  0.0%
description     text         0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset contains 5,411 Bigfoot sighting reports sourced from the Bigfoot Field Researchers Organization (BFRO), with attributes including location, date, classification, and a short description. The state field holds 53 distinct values, which cover U.S. states plus at least one Canadian province (British Columbia). Washington leads with 631 reports (11.7% of all sightings), followed by California (431) and Ohio (317), a geographic clustering worth examining. The temporal distribution is skewed toward recent decades: the median year is 2001, with records stretching back to 1870. This raises the question of whether sightings are truly increasing or simply better reported. Sighting classifications split almost evenly between Class A (direct sightings, 2,655) and Class B (indirect evidence, 2,722), with Class C rare at just 34 reports.

county high anthropic:default

This column contains county names and functions as a categorical geographic label, with 1,022 unique values across 5,411 rows. The 81.1% duplicate rate is expected for a field where many records share the same geography. Notably, 338 rows (6.2%) hold empty strings rather than nulls, so the reported null rate of 0.0% understates the true missingness. The top counties (Pierce, Jefferson, Lewis, Washington, Snohomish) suggest a Pacific Northwest-heavy dataset, while the presence of Humboldt and Polk hints at multi-state coverage.
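
The gap between the null rate and the empty-string count can be checked with a small helper. This is a minimal sketch, not saturn's own logic: the function name `effective_missing_rate` and the sample list are hypothetical, and treating whitespace-only strings as missing is an assumption.

```python
from typing import Iterable, Optional


def effective_missing_rate(values: Iterable[Optional[str]]) -> float:
    """Share of values that are None, empty, or whitespace-only."""
    items = list(values)
    if not items:
        return 0.0
    missing = sum(1 for v in items if v is None or not v.strip())
    return missing / len(items)


# Tiny illustrative column (not taken from the dataset).
county = ["Lewis", "", "Skamania", "  ", None, "Ferry", "Bibb", ""]
print(effective_missing_rate(county))  # 4 of the 8 values count as missing

# The report's own figures: 338 empty strings out of 5,411 rows.
print(round(338 / 5411, 3))
```

Applied to the real column, a rate near 6.2% rather than 0.0% would confirm that empty strings are standing in for nulls.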

url high anthropic:default

This column contains unique URLs pointing to individual report pages on bfro.net (the Bigfoot Field Researchers Organization database), all following the pattern `https://www.bfro.net/GDB/show_report.asp?id=` followed by a numeric report ID. Every one of the 5,411 rows holds a distinct URL (duplicate_rate: 0.0, n_unique: 5411), with lengths between 46 and 49 characters (len_mean: 48.56, len_median: 49.0), reflecting only the number of digits in the report ID. This column is effectively a primary or foreign key into the BFRO report database, not a content feature, and carries no modelling signal on its own.
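
If the report ID is needed as a join key, it can be pulled out of the URL with the standard library. The sketch below uses two of the sample URLs from the url section; the helper names `report_id` and `is_report_url` are hypothetical. The host and path check is case-insensitive as a precaution, since the sample values spell the path segment as `GDB`.

```python
from urllib.parse import parse_qs, urlparse


def report_id(url: str) -> int:
    """Return the numeric `id` query parameter of a report URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["id"][0])


def is_report_url(url: str) -> bool:
    """Case-insensitive check of the host and path of a report URL."""
    parts = urlparse(url)
    return (parts.netloc.lower() == "www.bfro.net"
            and parts.path.lower() == "/gdb/show_report.asp")


samples = [
    "https://www.bfro.net/GDB/show_report.asp?id=21714",
    "https://www.bfro.net/GDB/show_report.asp?id=7604",
]
print([report_id(u) for u in samples])      # [21714, 7604]
print(all(is_report_url(u) for u in samples))  # True
```

The fixed prefix is 44 characters long, so URL lengths of 46 to 49 correspond to IDs of 2 to 5 digits. That is consistent with the id column's range of 60 to 79,711, although the report does not state that the two fields hold the same number.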

description high anthropic:default

This column contains short free-text summaries of the reported Bigfoot sightings; high-frequency terms include 'sighting', 'near', and 'possible'. With 5,407 unique values out of 5,411 rows and zero nulls, entries are nearly all distinct; the 4 duplicates (duplicate_rate 0.00074) are negligible. A mean length of ~67 characters and ~10 words per entry suggests one-line summaries rather than long narratives. The mean Flesch reading ease is 55.7, which falls in the band conventionally labelled "fairly difficult" (50 to 60), and the vocabulary spans 7,169 unique tokens across the corpus.

id high anthropic:default

This column is a numeric row identifier: all 5,411 values are unique, there are no nulls, and the zero rate is 0.0, consistent with a primary key or surrogate ID. The IDs are not contiguous (range 60 to 79,711, mean ~23,288, median ~16,598), suggesting they originate from a larger parent table or were assigned with gaps. The positive skew (0.91) reflects a concentration at the low end: the lowest histogram bin (60 to 2,051) alone holds 743 records, and counts thin out toward higher IDs. The distribution is therefore not uniform, even though kurtosis is near zero (-0.15) and no values are flagged as outliers.

year high anthropic:default

This column represents calendar years for records in the dataset, spanning 1870 to 2025 with 99 distinct values. The distribution is left-skewed (skew = -0.974) with a mean of ~1998 and IQR of 22 years (1987–2009), meaning the bulk of records cluster in the late 20th to early 21st century while a thin tail extends back to 1870. The 49 outliers (0.9%) likely correspond to those historically distant records, and analysts should verify whether pre-20th-century entries are genuine or data quality issues.
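
The report does not say how saturn flags outliers. A common convention is Tukey's rule, which flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the reported quartiles; the rule itself and the helper names are assumptions, not confirmed behaviour.

```python
from typing import Iterable, List, Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: Iterable[float], low: float, high: float) -> List[float]:
    """Values strictly outside the [low, high] fences."""
    return [v for v in values if v < low or v > high]


# Quartiles as reported for the year column (q1 = 1987, q3 = 2009).
low, high = tukey_fences(1987, 2009)
print(low, high)  # 1954.0 2042.0

# Hypothetical years, chosen only to show which side of the fence they fall on.
print(flag_outliers([1870, 1953, 1954, 2001, 2025], low, high))  # [1870, 1953]
```

Under this rule every flagged year would be earlier than 1954. The histogram is compatible with that reading: 42 records fall in bins ending at or before 1951, and the 1951 to 1955 bin holds 13 more, so a total of 49 flagged values is plausible.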

classification high anthropic:default

This column is a three-level classification label (ordinal or nominal) applied to all 5,411 rows with no nulls. The distribution is nearly balanced between 'Class B' (2,722; 50.3%) and 'Class A' (2,655; 49.1%), but 'Class C' is severely underrepresented with only 34 instances (~0.6%). If the column is used as a modelling target, the minority class will likely need oversampling, class-weight adjustments, or merging with another class.
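
To give a sense of how strong a class-weight adjustment would have to be, the sketch below computes "balanced" weights as total / (number of classes × class count), the same heuristic scikit-learn documents for `class_weight="balanced"`. The function name is hypothetical; the counts are the ones reported for this column.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Counts as reported for the classification column.
counts = {"Class A": 2655, "Class B": 2722, "Class C": 34}
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

Class C ends up with a weight of roughly 53, against about 0.7 for the two large classes. With only 34 examples, a weight that large tends to make a model sensitive to individual rows, which is one argument for merging or dropping the class instead.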

state high anthropic:default

This column contains state and province names: 53 unique values with no nulls across 5,411 rows. The values are not limited to U.S. states, since British Columbia appears among the top 20 with 122 records. Washington leads at 11.7% (631 records), roughly 1.5× California's count (431), suggesting a geographic bias toward the Pacific Northwest. An entropy ratio of 0.877 indicates a reasonably broad spread across values, though concentration in a handful of large or coastal states is apparent.

state_code high anthropic:default

This column contains lowercase state and province abbreviations. Its 53 distinct values exceed the 50 U.S. states because Canadian provinces are included: 'ca-bc' has 122 rows, the same count as British Columbia in the state column. Washington ('wa') is notably over-represented at 11.7% of 5,411 rows, roughly 1.5× California ('ca') and nearly 2× Ohio ('oh'), suggesting a geographic skew toward the Pacific Northwest rather than a nationally representative sample. The counts, entropy (5.025), and entropy ratio (0.877) match the state column exactly, so the two fields appear to be redundant encodings of the same information.
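
Because state and state_code report identical cardinality, entropy, and top counts, it is worth confirming that each name maps to exactly one code and vice versa. A minimal sketch of such a check follows. The helper name is hypothetical, and the name/code pairings in the example are inferred from matching counts in the two top-value tables rather than read from the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mapping_conflicts(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return names with several codes, and codes with several names."""
    forward: Dict[str, Set[str]] = defaultdict(set)
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for name, code in pairs:
        forward[name].add(code)
        reverse[code].add(name)
    multi_code = {n: c for n, c in forward.items() if len(c) > 1}
    multi_name = {c: n for c, n in reverse.items() if len(n) > 1}
    return multi_code, multi_name


# Pairings assumed from matching counts (631, 431, 122) in the two tables.
pairs = [
    ("Washington", "wa"),
    ("California", "ca"),
    ("British Columbia", "ca-bc"),
    ("Washington", "wa"),
]
print(mapping_conflicts(pairs))  # ({}, {})
```

Two empty dictionaries on the full dataset would justify dropping one of the columns before modelling.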

month medium anthropic:default

This column is meant to hold calendar month names, but its cardinality is 32 rather than the expected 12, and 160 rows (3.0%) are null. The top values show why: alongside the twelve months there are season labels (Summer 217, Fall 129, Spring 96, Winter 57) and free-text fragments such as 'Late', 'about', 'mid', and 'or', which look like pieces of approximate dates. Among the proper months, the distribution leans toward summer and autumn (August 634, October 632, July 618), with winter months much less common (December 233, January 228, February 163), suggesting a seasonal pattern in when sightings occur or are reported. The field is dirty and needs normalisation before use.
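
One way to normalise the field is to sort each raw value into month, season, other, or missing before deciding what to do with the non-month rows. The sketch below is an assumption about a reasonable cleaning step, not part of saturn; `classify_month` and the example list are hypothetical, while the season labels and fragments are the ones shown in the top-values table for this column.

```python
from collections import Counter
from typing import Optional, Tuple

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)
SEASONS = ("Spring", "Summer", "Fall", "Winter")


def classify_month(value: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (kind, normalised value), kind being month/season/other/missing."""
    if value is None or not value.strip():
        return ("missing", None)
    token = value.strip().capitalize()
    if token in MONTHS:
        return ("month", token)
    if token in SEASONS:
        return ("season", token)
    return ("other", value.strip())


# Illustrative raw values; the casing and padding variants are made up.
raw = ["August", "october", "Summer", "Late", "mid", None, " July "]
kinds = Counter(classify_month(v)[0] for v in raw)
print(kinds["month"], kinds["season"], kinds["other"], kinds["missing"])  # 3 1 2 1
```

Keeping the season rows as a separate category preserves some signal, whereas forcing them into a month would invent precision the reports do not have.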

Numeric correlation

Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
        id      year
id      +1.00   +0.12
year    +0.12   +1.00
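
For reference, the Pearson coefficient in this matrix can be reproduced with a few lines of standard-library Python. How saturn treats the 57 null years is not stated; the sketch assumes pairwise deletion (rows with a missing value in either column are dropped), and the function names and sample rows are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def complete_pairs(
    xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]
) -> Tuple[List[float], List[float]]:
    """Drop positions where either value is None (pairwise deletion)."""
    kept = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [x for x, _ in kept], [y for _, y in kept]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up rows, including one missing value that gets dropped.
ids = [1, 2, 3, 4, 5]
years = [2, 1, 4, 3, None]
x, y = complete_pairs(ids, years)
print(round(pearson(x, y), 2))  # 0.6
```

A coefficient of +0.12 between id and year is weak, so report IDs are only loosely ordered by the year of the sighting.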

id numeric

rows: 5,411
null: 0 (0.0%)
unique: 5,411
min: 60.000
max: 79,711
mean: 23,288
median: 16,598
std: 21,383
q1: 4,898
q3: 36,362
iqr: 31,464
skew: 0.911
kurtosis: -0.151
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
Histogram bins for id (median: 16598.0).
bin                      count
60 – 2051                743
2051 – 4043              469
4043 – 6034              305
6034 – 8025              306
8025 – 1.002e+04         268
1.002e+04 – 1.201e+04    202
1.201e+04 – 1.4e+04      198
1.4e+04 – 1.599e+04      176
1.599e+04 – 1.798e+04    119
1.798e+04 – 1.997e+04    81
1.997e+04 – 2.196e+04    89
2.196e+04 – 2.396e+04    146
2.396e+04 – 2.595e+04    254
2.595e+04 – 2.794e+04    215
2.794e+04 – 2.993e+04    191
2.993e+04 – 3.192e+04    105
3.192e+04 – 3.391e+04    77
3.391e+04 – 3.59e+04     85
3.59e+04 – 3.789e+04     98
3.789e+04 – 3.989e+04    91
3.989e+04 – 4.188e+04    113
4.188e+04 – 4.387e+04    90
4.387e+04 – 4.586e+04    90
4.586e+04 – 4.785e+04    84
4.785e+04 – 4.984e+04    71
4.984e+04 – 5.183e+04    80
5.183e+04 – 5.382e+04    10
5.382e+04 – 5.582e+04    33
5.582e+04 – 5.781e+04    70
5.781e+04 – 5.98e+04     92
5.98e+04 – 6.179e+04     18
6.179e+04 – 6.378e+04    78
6.378e+04 – 6.577e+04    47
6.577e+04 – 6.776e+04    65
6.776e+04 – 6.975e+04    42
6.975e+04 – 7.175e+04    8
7.175e+04 – 7.374e+04    33
7.374e+04 – 7.573e+04    45
7.573e+04 – 7.772e+04    50
7.772e+04 – 7.971e+04    74

state categorical

rows: 5,411
null: 0 (0.0%)
unique: 53
top_value: Washington
top_rate: 0.117
cardinality: 53
entropy: 5.025
entropy_ratio: 0.877
Top values for state (20 unique shown, of 53 total).
rank  value             count  share
1     Washington        631    11.7%
2     California        431    8.0%
3     Ohio              317    5.9%
4     Florida           314    5.8%
5     Oregon            253    4.7%
6     Illinois          239    4.4%
7     Texas             238    4.4%
8     Michigan          217    4.0%
9     Missouri          161    3.0%
10    Georgia           135    2.5%
11    Colorado          128    2.4%
12    Pennsylvania      125    2.3%
13    British Columbia  122    2.3%
14    New York          116    2.1%
15    Kentucky          115    2.1%
16    Arkansas          104    1.9%
17    Tennessee         104    1.9%
18    West Virginia     104    1.9%
19    Oklahoma          101    1.9%
20    Idaho             99     1.8%

state_code categorical

rows: 5,411
null: 0 (0.0%)
unique: 53
top_value: wa
top_rate: 0.117
cardinality: 53
entropy: 5.025
entropy_ratio: 0.877
Top values for state_code (20 unique shown, of 53 total).
rank  value  count  share
1     wa     631    11.7%
2     ca     431    8.0%
3     oh     317    5.9%
4     fl     314    5.8%
5     or     253    4.7%
6     il     239    4.4%
7     tx     238    4.4%
8     mi     217    4.0%
9     mo     161    3.0%
10    ga     135    2.5%
11    co     128    2.4%
12    pa     125    2.3%
13    ca-bc  122    2.3%
14    ny     116    2.1%
15    ky     115    2.1%
16    ar     104    1.9%
17    tn     104    1.9%
18    wv     104    1.9%
19    ok     101    1.9%
20    id     99     1.8%

county text

100.0% rows are a single word | 95th-percentile length under 20 chars | 81.1% duplicate strings
rows: 5,411
null: 0 (0.0%)
unique: 1,022
len_min: 0
len_max: 23
len_mean: 6.621
len_median: 7.000
len_p95: 10.000
word_mean: 1.000
word_median: 1.000
n_empty: 338
n_duplicates: 4,389
duplicate_rate: 0.811
vocab_size: 1,020
readability_flesch_mean: 16.897
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for county (mean: 6.620957309184994).
chars      count
0 – 1      338
1 – 1      0
1 – 2      0
2 – 2      0
2 – 3      0
3 – 3      28
3 – 4      457
4 – 5      0
5 – 5      640
5 – 6      0
6 – 6      1110
6 – 7      0
7 – 7      802
7 – 8      916
8 – 9      0
9 – 9      608
9 – 10     0
10 – 10    301
10 – 11    0
11 – 12    62
12 – 12    94
12 – 13    0
13 – 13    5
13 – 14    0
14 – 14    24
14 – 15    0
15 – 16    16
16 – 16    3
16 – 17    0
17 – 17    3
17 – 18    0
18 – 18    0
18 – 19    0
19 – 20    3
20 – 20    0
20 – 21    0
21 – 21    0
21 – 22    0
22 – 22    0
22 – 23    1
Sample values (first 10)
  1. Bibb
  2. Houston
  3. Lewis
  4. Cowlitz
  5. Montmorency
  6. Cass
  7. Skamania
  8. Ferry
  9. Navajo

url text

100.0% of rows are unique strings | 100.0% rows are a single word | 100.0% rows contain a URL
rows: 5,411
null: 0 (0.0%)
unique: 5,411
len_min: 46
len_max: 49
len_mean: 48.557
len_median: 49.000
len_p95: 49.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 5,411
readability_flesch_mean: -301.780
emoji_rate: 0.000
url_rate: 1.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for url (mean: 48.55682868231381).
The histogram splits the range 46 to 49 into 40 bins whose rounded labels repeat; only four bins are non-empty, one per whole-number length. The other 36 bins have a count of 0.
chars  count
46     11
47     288
48     1789
49     3323
Sample values (first 10)
  1. https://www.bfro.net/GDB/show_report.asp?id=21714
  2. https://www.bfro.net/GDB/show_report.asp?id=7604
  3. https://www.bfro.net/GDB/show_report.asp?id=12803
  4. https://www.bfro.net/GDB/show_report.asp?id=55658
  5. https://www.bfro.net/GDB/show_report.asp?id=27009
  6. https://www.bfro.net/GDB/show_report.asp?id=43994
  7. https://www.bfro.net/GDB/show_report.asp?id=14942
  8. https://www.bfro.net/GDB/show_report.asp?id=21473
  9. https://www.bfro.net/GDB/show_report.asp?id=25722
  10. https://www.bfro.net/GDB/show_report.asp?id=29358

month categorical

rows: 5,411
null: 160 (3.0%)
unique: 32
top_value: August
top_rate: 0.121
cardinality: 32
entropy: 3.807
entropy_ratio: 0.761
Top values for month (20 unique shown, of 32 total).
rank  value      count  share
1     August     634    11.7%
2     October    632    11.7%
3     July       618    11.4%
4     September  515    9.5%
5     June       468    8.6%
6     November   458    8.5%
7     May        303    5.6%
8     April      259    4.8%
9     December   233    4.3%
10    January    228    4.2%
11    Summer     217    4.0%
12    March      201    3.7%
13    February   163    3.0%
14    Fall       129    2.4%
15    Spring     96     1.8%
16    Winter     57     1.1%
17    Late       6      0.1%
18    about      6      0.1%
19    mid        5      0.1%
20    or         5      0.1%

year numeric

rows: 5,411
null: 57 (1.1%)
unique: 99
min: 1,870
max: 2,025
mean: 1,998
median: 2,001
std: 15.785
q1: 1,987
q3: 2,009
iqr: 22.000
skew: -0.974
kurtosis: 1.997
n_outliers: 49
outlier_rate: 9.15e-03
zero_rate: 0.000
Histogram bins for year (median: 2001.0).
bin            count
1870 – 1874    1
1874 – 1878    0
1878 – 1882    0
1882 – 1886    0
1886 – 1889    0
1889 – 1893    1
1893 – 1897    0
1897 – 1901    0
1901 – 1905    0
1905 – 1909    1
1909 – 1913    1
1913 – 1916    0
1916 – 1920    2
1920 – 1924    2
1924 – 1928    2
1928 – 1932    2
1932 – 1936    4
1936 – 1940    2
1940 – 1944    5
1944 – 1948    4
1948 – 1951    15
1951 – 1955    13
1955 – 1959    18
1959 – 1963    24
1963 – 1967    53
1967 – 1971    120
1971 – 1975    158
1975 – 1978    331
1978 – 1982    307
1982 – 1986    257
1986 – 1990    224
1990 – 1994    195
1994 – 1998    380
1998 – 2002    610
2002 – 2006    679
2006 – 2010    622
2010 – 2013    616
2013 – 2017    355
2017 – 2021    220
2021 – 2025    130

classification categorical

rows: 5,411
null: 0 (0.0%)
unique: 3
top_value: Class B
top_rate: 0.503
cardinality: 3
entropy: 1.049
entropy_ratio: 0.662
Top values for classification (3 unique shown, of 3 total).
rank  value    count  share
1     Class B  2,722  50.3%
2     Class A  2,655  49.1%
3     Class C  34     0.6%
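
The entropy and entropy_ratio figures reported for the categorical columns are consistent with Shannon entropy in bits divided by log2 of the cardinality: 5.025 / log2(53) ≈ 0.877 for state and state_code, 3.807 / log2(32) ≈ 0.761 for month, and 1.049 / log2(3) ≈ 0.662 here. That definition is an inference from the numbers, not something the report states. A minimal sketch, using the three classification counts above:

```python
import math
from typing import Sequence


def entropy_bits(counts: Sequence[int]) -> float:
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: Sequence[int]) -> float:
    """Entropy divided by its maximum, log2 of the number of non-empty classes."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0
    return entropy_bits(counts) / math.log2(k)


# Class B, Class A, Class C counts from the table above.
counts = [2722, 2655, 34]
print(round(entropy_bits(counts), 3))   # close to the reported 1.049
print(round(entropy_ratio(counts), 3))  # close to the reported 0.662
```

A ratio of 1.0 would mean perfectly even classes. The value here is well below 1 almost entirely because of Class C, since the two large classes are close to an even split.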

description text

99.9% of rows are unique strings
rows: 5,411
null: 0 (0.0%)
unique: 5,407
len_min: 10
len_max: 221
len_mean: 67.042
len_median: 65.000
len_p95: 101.500
word_mean: 10.621
word_median: 10.000
n_empty: 0
n_duplicates: 4
duplicate_rate: 7.39e-04
vocab_size: 7,169
readability_flesch_mean: 55.705
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 1.85e-04
Character-length distribution for description (mean: 67.04213638883755).
chars        count
10 – 15      2
15 – 21      4
21 – 26      21
26 – 31      56
31 – 36      108
36 – 42      185
42 – 47      376
47 – 52      551
52 – 57      525
57 – 63      568
63 – 68      692
68 – 73      495
73 – 79      486
79 – 84      369
84 – 89      330
89 – 94      196
94 – 100     135
100 – 105    99
105 – 110    74
110 – 116    42
116 – 121    26
121 – 126    23
126 – 131    10
131 – 137    9
137 – 142    6
142 – 147    4
147 – 152    6
152 – 158    2
158 – 163    1
163 – 168    2
168 – 174    3
174 – 179    0
179 – 184    0
184 – 189    2
189 – 195    1
195 – 200    0
200 – 205    0
205 – 210    0
210 – 216    0
216 – 221    2
Sample values (first 10)
  1. Rescue workers describes possible stalking on the Cahaba River outside Montevallo
  2. Possible bigfoot activity near Walker County line.
  3. Hikers off Lewis River Trail find large footprint east of Cougar
  4. Hunters on bikes have close encounter with a sasquatch near Randle
  5. Woman recalls daylight sighting while driving and a possible incident at a home east of Gaylord
  6. Possible sighting near Bowser on Vancouver Island
  7. County workers find possible footprints near Bena
  8. Man and girlfriend, camping, hear loud footsteps and tree knocking near Cispus, Washington
  9. Possible reoccurring activity at a hunting spot near Republic
  10. Brief daylight sighting within the city limits of Show Low