saturn

/home/coolhand/html/datavis/data_trove/cache/quirky/nuforc_sightings.parquet | 147,890 rows | sample n=147,890 | seed 42 | 2026-05-01T23:39:09+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/quirky/nuforc_sightings.parquet
Total rows: 147,890
Profiled sample: 147,890
Columns: 13
Generated: 2026-05-01T23:39:09+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset contains 147,890 UFO sighting reports (likely from NUFORC) with 13 columns covering location, shape, duration, witness counts, and free-text descriptions. The Shape field is a clean categorical with 39 values dominated by 'Light' (27,494), 'Circle', and 'Triangle' — a natural starting point for understanding what people report. Duration is text-based but highly repetitive, with '5 minutes' and '2 minutes' as the most common values, suggesting witnesses anchor on round numbers. Watch out for 'No of observers': it is extremely skewed (max 20,000, min -10, skew 109) with ~13% outliers, so it needs cleaning before any quantitative use. Also note that 'Explanation' is 99.5% null — only a tiny fraction of sightings have an official label, with 'Starlink' explanations leading the small set that do.

Sighting high anthropic:claude-opus-4-7

Sighting is almost certainly a row identifier: every one of the 147890 values is unique, there are no nulls, and the distribution is essentially uniform (skew -0.013, kurtosis -1.13) spanning 111 to 179773. The values are not a dense 1..N sequence, suggesting an externally assigned record or sighting ID with gaps. No outliers and no zeros, consistent with an ID rather than a measurement.

Occurred high anthropic:claude-opus-4-7

Timestamp strings of the form 'YYYY-MM-DD HH:MM:SS Local', with length tightly clustered at 25 characters (mean 24.96, p95 25). Stored as text rather than parsed datetimes, and 14.6% of values are duplicates (21,626 rows), with notable spikes on July 4th evenings and one outlier '2015-11-07 18:00:00' appearing 104 times. 299 rows contain just the bare token 'Local' with no date, which will break naive datetime parsing.
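The parsing hazard described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not part of saturn; the function name `parse_occurred` is hypothetical, and it assumes every valid value follows the 'YYYY-MM-DD HH:MM:SS Local' layout seen in the sample values.

```python
from datetime import datetime
from typing import Optional


def parse_occurred(value: Optional[str], suffix: str = "Local") -> Optional[datetime]:
    """Parse 'YYYY-MM-DD HH:MM:SS Local'; return None when no usable date is present."""
    if value is None:
        return None
    text = value.strip()
    # Drop the trailing zone label if it is there.
    if text.endswith(suffix):
        text = text[: -len(suffix)].strip()
    if not text:
        # The bare 'Local' token carries no date at all.
        return None
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Anything that does not match the assumed layout is treated as missing.
        return None
```

Returning None rather than raising keeps the date-less rows visible as missing values instead of aborting a whole-column conversion.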

Location high anthropic:claude-opus-4-7

Short 'City, State/Region, Country' location strings, averaging 20 characters and 3.6 words, dominated by US entries (Phoenix, Seattle, Las Vegas lead) with 'usa' appearing 17,880 times. The column is highly repetitive: 110,819 of 147,890 rows are duplicates (75%) across only 37,070 unique values, so it behaves like a categorical despite being free text. Language detection flags multilingual content but this mostly reflects short-string misclassification — 4,481 detected as English versus small counts in 27 other codes.
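Because the strings follow a 'City, State/Region, Country' pattern, they could be split into components before grouping. The sketch below is an assumption about how one might do that, not something saturn performs; `split_location` is a hypothetical helper. It splits from the right so that a comma inside the city part does not shift the fields.

```python
from typing import Dict, Optional


def split_location(value: str) -> Dict[str, Optional[str]]:
    """Split 'City, Region, Country' from the right; empty parts become None."""
    parts = [p.strip() for p in value.rsplit(",", 2)]
    if len(parts) != 3:
        # Unexpected layout: keep the whole string as the city and leave the rest unknown.
        return {"city": value.strip() or None, "region": None, "country": None}
    city, region, country = parts
    return {
        "city": city or None,
        "region": region or None,
        "country": country or None,
    }
```

For the sample value 'Santiago (Chile), , Chile' this yields an empty region rather than misplacing the country.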

Shape high anthropic:claude-opus-4-7

Categorical descriptor of UFO sighting shapes across 39 distinct values, with 'Light' leading at 19.4% of non-null records (27,494). The distribution is moderately spread (entropy ratio 0.74), and 'Other' (10,062) and 'Unknown' (10,021) together total 20,083, more than the second-largest category 'Circle' (14,367), suggesting substantial reporter ambiguity. Null rate is 4.29%, modest but non-trivial.
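The report does not define its entropy ratio. The reported figures are consistent with Shannon entropy in bits divided by log2 of the cardinality (for Shape, 3.930 / log2(39) is about 0.744). The sketch below implements that assumed definition; `entropy_ratio` is a hypothetical name and None stands in for nulls.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def entropy_ratio(values: Iterable[Optional[Hashable]]) -> float:
    """Shannon entropy (bits) of the non-null values, divided by log2(number of categories)."""
    counts = Counter(v for v in values if v is not None)
    k = len(counts)
    if k < 2:
        # With zero or one category there is no spread to measure.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(k)
```

A ratio of 1.0 means all categories are equally frequent; values near 0 mean one category dominates.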

Duration high anthropic:claude-opus-4-7

This is a free-text duration field, almost always a number-plus-unit phrase like '5 minutes' or '30 seconds' (mean 9.5 chars, ~2 words). Values are highly repetitive: only 15,527 distinct strings across 147,890 rows and an 89% duplicate rate, with a 4.8% null rate. The dominant units are 'minutes' and 'seconds', but the presence of an abbreviated 'min' token signals inconsistent formatting that will need normalisation.
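One possible normalisation is to convert each phrase to seconds. The sketch below is illustrative only: the unit table and the name `duration_seconds` are assumptions, it takes the first number-plus-unit pair it finds, and it returns None for values with no recognised unit (such as the bare '20' in the sample values).

```python
import re
from typing import Optional

# Assumed unit spellings; extend after inspecting the actual vocabulary.
_UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}

# A number (optionally decimal) followed by a run of letters.
_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")


def duration_seconds(value: Optional[str]) -> Optional[float]:
    """Convert phrases like '35 min.' or 'about 15 minutes' to seconds, else None."""
    if value is None:
        return None
    match = _PATTERN.search(value.lower())
    if match is None:
        return None  # no number followed by a unit word
    number, unit = match.groups()
    factor = _UNIT_SECONDS.get(unit)
    if factor is None:
        return None  # unit not in the assumed table
    return float(number) * factor
```

Leaving unit-less values as None avoids guessing whether '20' means seconds or minutes.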

No of observers high anthropic:claude-opus-4-7

Counts of observers per record, with a typical value of 1-2 (median 2, IQR 1) but a maximum of 20000 driving mean 4.6 and std 129.5. Skew of 109.3 and kurtosis of 14332 are extreme, and 13.0% of non-null rows (18,390) are flagged as outliers. A min of -10 is suspicious for a count, and 6.9% are zero with 4.5% null.
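The outlier flag is described in the stats as "rows beyond 1.5 IQR". Assuming the usual Tukey fences, the reported q1 = 1 and q3 = 2 give bounds of -0.5 and 3.5, so any report with four or more observers would be flagged, which may explain the high outlier rate. The sketch below separates impossible values from merely large ones; the function names and labels are hypothetical.

```python
from typing import Optional, Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify_observers(n: Optional[float], fences: Tuple[float, float]) -> str:
    """Label a count as 'missing', 'invalid' (negative), 'outlier', or 'ok'."""
    if n is None:
        return "missing"
    if n < 0:
        # A negative head count cannot be real, whatever the fences say.
        return "invalid"
    low, high = fences
    if n < low or n > high:
        return "outlier"
    return "ok"
```

With fences this tight, a domain cap (for example, treating only very large counts as suspect) may be more useful than the IQR rule for this column.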

Reported high anthropic:claude-opus-4-7

This is a 'Reported' timestamp stored as text in fixed 27-character format like '1999-11-16 00:00:00 Pacific'. Every value has identical length (min/max/mean = 27) and 3 words, with 'Pacific' appearing as a constant timezone suffix in ~20000 rows. The multilingual alert is a false positive from the language detector misreading dates; the duplicate rate of 7.7% (11418 rows) reflects multiple events sharing a report date.

Posted high anthropic:claude-opus-4-7

This column stores posting dates as datetime strings with zeroed time components, almost certainly a publication or upload timestamp. Across 147,890 rows there are 626 distinct dates and a single null, and the distribution is remarkably flat: entropy ratio 0.93, with the most common date (2020-06-25) accounting for only 1.24% of rows. The top dates span 1999 to 2023, suggesting the dataset covers more than two decades of activity.

Characteristics high anthropic:claude-opus-4-7

This is a multi-label categorical feature describing observed object characteristics (e.g. "Lights on object", "Aura or haze around object", "Aircraft nearby"), stored as comma-joined tags rather than a structured list. Despite 147,890 rows, only 1,446 distinct strings exist and 98.6% are duplicates, with a tiny vocab of 43 tokens. Watch for the 28.15% null rate and a truncated tag ("Changed Colo", apparently "Changed Color" cut off) that recurs across thousands of rows.
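Since the tags are comma-joined, they can be split into a list per row and counted individually. This is a minimal sketch under the assumption that a comma followed by optional whitespace is the only separator; `split_tags` and `tag_counts` are hypothetical helpers, and None stands in for nulls.

```python
from collections import Counter
from typing import Iterable, List, Optional


def split_tags(value: Optional[str]) -> List[str]:
    """Split a comma-joined tag string into trimmed, non-empty tags."""
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def tag_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count how many rows mention each individual tag."""
    counts: Counter = Counter()
    for value in values:
        counts.update(split_tags(value))
    return counts
```

Counting at the tag level is also a quick way to check how often a truncated tag appears next to its full spelling.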

Summary high anthropic:claude-opus-4-7

Free-text summary field with 144,208 unique values across 147,890 rows and a 0.6% null rate, so virtually every record carries its own short description. Lengths are highly skewed: median 76 characters / 13 words but a max of 10,624 characters and a p95 of 479, and mean Flesch readability of 67.3 suggests fairly plain English prose. Top tokens are stopwords plus 'light' (4,676 occurrences), hinting at a recurring topical theme worth investigating; duplicates (1.9%) and boilerplate (<0.1%) are negligible.

Text high anthropic:claude-opus-4-7

Free-text field containing medium-length English prose, averaging 949.9 characters and 181.6 words with a Flesch readability of 69.6, consistent with first-person witness narratives. The column is near-unique (127,124 unique of 147,890) yet still carries 4,091 exact duplicates (3.1%) and an 11.3% null rate worth investigating. Top words are dominated by English stopwords plus a frequent first-person 'i' and 'my', hinting at personal/subjective writing rather than formal documents.

Location details high anthropic:claude-opus-4-7

Free-text supplementary location notes, populated for only 6.9% of the 147,890 rows (null_rate 0.931). When present, entries are short prose averaging 38.7 characters / 6.9 words with readable Flesch 69.8, and the top tokens ('the', 'of', 'in', 'my', 'from') confirm natural-language descriptions rather than structured place codes. Cardinality is high (9,713 uniques) but 492 exact duplicates (4.8%) hint at recurring phrases worth normalising.

Explanation high anthropic:claude-opus-4-7

Free-form classification labels explaining UFO/sky-object sightings, with categories like 'Starlink - Probable', 'Rocket - Certain', and 'Balloon - Possible' combining an object type with a confidence qualifier. The column is 99.46% null (only 803 of 147,890 rows carry a value), so it functions as a sparse annotation rather than a primary feature. Among populated rows, 58 distinct labels appear with relatively even spread (entropy ratio 0.82); the top label 'Starlink - Probable' covers just 9.71% of non-nulls.
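Each label pairs an object type with a confidence qualifier, so it can be split into two fields for analysis. The sketch below assumes ' - ' (space, hyphen, space) is the separator, as in the top values listed for this column; `split_explanation` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_explanation(label: Optional[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Split 'Type - Confidence' on the last ' - '; confidence is None if absent."""
    if label is None:
        return None
    kind, separator, confidence = label.rpartition(" - ")
    if not separator:
        # No qualifier present: keep the whole label as the type.
        return label.strip(), None
    return kind.strip(), confidence.strip()
```

Splitting on the last separator keeps types that contain punctuation, such as 'Planet/Star', intact.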

Numeric correlation

Languages detected

Per-string language detection across text columns (sampled).

Sighting numeric

rows: 147,890
null: 0 (0.0%)
unique: 147,890
min: 111.000
max: 179,773
mean: 91,984
median: 91,434
std: 50,436
q1: 50,146
q3: 134,515
iqr: 84,368
skew: -0.013
kurtosis: -1.134
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

Occurred text

rows: 147,890
null: 0 (0.0%)
unique: 126,264
len_min: 5
len_max: 25
len_mean: 24.960
len_median: 25.000
len_p95: 25.000
word_mean: 2.996
word_median: 3.000
n_empty: 0
n_duplicates: 21,626
duplicate_rate: 0.146
vocab_size: 10,098
readability_flesch_mean: 90.718
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 2.02e-03
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2012-04-10 22:30:00 Local
  2. 2011-10-13 01:36:00 Local
  3. 2013-04-08 23:35:00 Local
  4. 2017-03-10 00:30:00 Local
  5. 2019-11-29 00:00:00 Local
  6. 2012-12-04 18:45:00 Local
  7. 2005-02-18 04:00:00 Local
  8. 2022-12-28 17:03:00 Local
  9. 2018-01-10 06:45:00 Local
  10. 2020-01-29 19:25:00 Local

Location text

29 languages detected in sample; 74.9% duplicate strings
rows: 147,890
null: 1 (0.0%)
unique: 37,070
len_min: 7
len_max: 92
len_mean: 20.378
len_median: 18.000
len_p95: 37.000
word_mean: 3.571
word_median: 3.000
n_empty: 0
n_duplicates: 110,819
duplicate_rate: 0.749
vocab_size: 8,750
readability_flesch_mean: 26.719
emoji_rate: 6.76e-06
url_rate: 0.000
one_word_rate: 6.76e-06
allcaps_rate: 3.98e-03
boilerplate_rate: 0.000
Sample values (first 10)
  1. Gaithersburg, MD, USA
  2. Butler, MO, USA
  3. Oklahoma City, OK, USA
  4. Edmonds, WA, USA
  5. Asheville, NC, USA
  6. Texico, IL, USA
  7. Santiago (Chile), , Chile
  8. Foxboro, MA, USA
  9. Toronto (north of) (Canada), ON, Canada
  10. Cheektowaga, NY, USA

Shape categorical

rows: 147,890
null: 6,343 (4.3%)
unique: 39
top_value: Light
top_rate: 0.194
cardinality: 39
entropy: 3.930
entropy_ratio: 0.744
Top values (rank 1–20)
  1. Light — 27,494
  2. Circle — 14,367
  3. Triangle — 13,086
  4. Other — 10,062
  5. Unknown — 10,021
  6. Fireball — 9,880
  7. Disk — 8,716
  8. Sphere — 7,652
  9. Oval — 6,369
  10. Orb — 5,924
  11. Formation — 4,864
  12. Changing — 3,987
  13. Cigar — 3,753
  14. Rectangle — 2,610
  15. Cylinder — 2,482
  16. Flash — 2,439
  17. Diamond — 2,116
  18. Chevron — 1,742
  19. Egg — 1,289
  20. Teardrop — 1,238

Duration text

95th-percentile length under 20 chars; 89.0% duplicate strings
rows: 147,890
null: 7,148 (4.8%)
unique: 15,527
len_min: 1
len_max: 37
len_mean: 9.469
len_median: 9.000
len_p95: 15.000
word_mean: 2.066
word_median: 2.000
n_empty: 0
n_duplicates: 125,215
duplicate_rate: 0.890
vocab_size: 1,757
readability_flesch_mean: 84.409
emoji_rate: 7.11e-06
url_rate: 0.000
one_word_rate: 0.090
allcaps_rate: 0.030
boilerplate_rate: 0.000
Sample values (first 10)
  1. 15 Minutes
  2. 15 minutes
  3. 35 min.
  4. 15 minutes
  5. 20
  6. 90 seconds
  7. 5 minutes
  8. 1 hour
  9. 45 minutes
  10. about 15 minutes

No of observers numeric

skew=+109.30; 13.0% rows beyond 1.5 IQR
rows: 147,890
null: 6,661 (4.5%)
unique: 137
min: -10.000
max: 20,000
mean: 4.603
median: 2.000
std: 129.522
q1: 1.000
q3: 2.000
iqr: 1.000
skew: 109.296
kurtosis: 14,332
n_outliers: 18,390
outlier_rate: 0.130
zero_rate: 0.069

Reported text

8 languages detected in sample
rows: 147,890
null: 1 (0.0%)
unique: 136,471
len_min: 27
len_max: 27
len_mean: 27.000
len_median: 27.000
len_p95: 27.000
word_mean: 3.000
word_median: 3.000
n_empty: 0
n_duplicates: 11,418
duplicate_rate: 0.077
vocab_size: 24,077
readability_flesch_mean: 62.790
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2012-04-11 08:38:00 Pacific
  2. 2009-06-13 05:38:15 Pacific
  3. 2014-01-29 22:28:38 Pacific
  4. 2022-06-02 08:34:35 Pacific
  5. 2021-01-02 02:46:40 Pacific
  6. 2012-12-04 19:49:39 Pacific
  7. 2002-07-12 10:31:36 Pacific
  8. 2021-12-30 15:36:37 Pacific
  9. 2007-04-01 14:50:54 Pacific
  10. 2020-01-29 18:08:21 Pacific

Posted categorical

rows: 147,890
null: 1 (0.0%)
unique: 626
top_value: 2020-06-25 00:00:00
top_rate: 0.012
cardinality: 626
entropy: 8.644
entropy_ratio: 0.930
Top values (rank 1–20)
  1. 2020-06-25 00:00:00 — 1,833
  2. 2009-12-12 00:00:00 — 1,627
  3. 2006-10-30 00:00:00 — 1,573
  4. 2019-12-01 00:00:00 — 1,484
  5. 2010-11-21 00:00:00 — 1,365
  6. 2022-09-09 00:00:00 — 1,333
  7. 1999-11-02 00:00:00 — 1,314
  8. 2020-12-23 00:00:00 — 1,312
  9. 2023-03-06 00:00:00 — 1,274
  10. 2008-10-31 00:00:00 — 1,274
  11. 2022-12-22 00:00:00 — 1,252
  12. 2001-08-05 00:00:00 — 1,229
  13. 2009-03-19 00:00:00 — 1,201
  14. 2009-01-10 00:00:00 — 1,198
  15. 2013-08-30 00:00:00 — 1,142
  16. 2022-03-04 00:00:00 — 1,035
  17. 2023-09-10 00:00:00 — 1,028
  18. 2008-06-12 00:00:00 — 1,023
  19. 2012-09-24 00:00:00 — 1,019
  20. 2011-10-10 00:00:00 — 1,017

Characteristics text

28.1% null; 98.6% duplicate strings
rows: 147,890
null: 41,631 (28.1%)
unique: 1,446
len_min: 6
len_max: 251
len_mean: 33.271
len_median: 26.000
len_p95: 76.000
word_mean: 5.613
word_median: 5.000
n_empty: 0
n_duplicates: 104,813
duplicate_rate: 0.986
vocab_size: 43
readability_flesch_mean: 69.461
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 3.84e-03
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Lights on object
  2. Lights on object
  3. Lights on object, Emitted beams
  4. Lights on object
  5. Lights on object, Emitted other objects, Emitted beams
  6. Lights on object
  7. Lights on object, Animals reacted
  8. Lights on object
  9. Changed Color, Aircraft nearby
  10. Aura or haze around object

Summary text

98.1% of rows are unique strings
rows: 147,890
null: 891 (0.6%)
unique: 144,208
len_min: 1
len_max: 10,624
len_mean: 134.901
len_median: 76.000
len_p95: 479.000
word_mean: 25.061
word_median: 13.000
n_empty: 0
n_duplicates: 2,791
duplicate_rate: 0.019
vocab_size: 28,632
readability_flesch_mean: 67.281
emoji_rate: 1.09e-04
url_rate: 7.76e-04
one_word_rate: 2.33e-03
allcaps_rate: 0.018
boilerplate_rate: 8.71e-04
Sample values (first 10)
  1. Large round craft in Madison Heights near 11&I-75 and similar craft over 1-75 in hazel park/ferndale area.
  2. light in sky ,noise in room, temporary paralysis, keep waking back in bed. help me please?
  3. Slow moving triangle over Little Rock, AR.
  4. 8-10 round yellow lights grouped in 2 lines of 4 with one or 2 a short distance away
  5. Brilliant glowing red object in northern Illinois moving from the northwest to the south.
  6. Sphere shaped object with colors moving at a abnormal rate of speed.
  7. White or silver sphere moving fast and erratically across horizon and against the wind
  8. There were orange colored lights coming from the northwest traveling in a line at a fast speed. The most at one time were 3. One appeared to stop overhead until the one behind came to the same point and the first one would dim and disapear. They appeared to be coming over the oce…
  9. Curly que white con trail with sideways V grey trail in front. Very high and fast silver craft.
  10. Hovering craft over houses with several bright white and red flashing lights,picked up speed and disappeared into a field

Text text

96.9% of rows are unique strings
rows: 147,890
null: 16,675 (11.3%)
unique: 127,124
len_min: 1
len_max: 64,550
len_mean: 949.922
len_median: 682.000
len_p95: 2,673
word_mean: 181.578
word_median: 131.000
n_empty: 0
n_duplicates: 4,091
duplicate_rate: 0.031
vocab_size: 65,106
readability_flesch_mean: 69.561
emoji_rate: 1.98e-04
url_rate: 0.018
one_word_rate: 6.78e-04
allcaps_rate: 7.22e-03
boilerplate_rate: 1.87e-03
Sample values (first 10)
  1. Large spherical object moving westward changing colors and patterns of lights on it. Video is zoomed 16x on my Nikon camera. We spoke briefly via telephone with the witness, and he confirms that the date is correct, but he corrects the time, to reflect a sighting that occurred 5…
  2. I was a passenger and my freind was driving approx. 45 mph when the saucer followed us on passenger side It looked in comarasion of close encounters craft.It was about 500 meters from me.When it left it passed trough a door of deminsion.There was not a doubt.What happened after w…
  3. While waiting to go on duty, I noticed what appeared to be a flare coming from the north in the vicinity of Cape Canaveral. At first I thought it may have been a rocket launch then I realized it was heading in a south westerly direction when all rocket launches always go east tow…
  4. I was turning off of 30th Ave N on to 34th st going north bound and I just looked up at the sky and seen this large silver egg shaped object in the sky moving slowly and then it vary rapidly just changed directions, not like it turned but like it just started moving sideways and …
  5. At exactly 9 pm (21:00 hrs) I was walking my friend out to his truck. It was a clear sky and the stars were very bright. I noticed a somewhat dim, amber light come into view. I have seen a hundred satellites in our desert sky. This was not a satellite. I am an amature astronomer …
  6. There are four beams of light in the clouds going in a circular motion going back to the center, very very big and this is out of the usual. Advertising lights. PD
  7. 8chevron lights flying @approx.2000ft or lower i live in a ravine area as the craft flew over the resonance was high pitched the air temp.was+3C. the visibility was 20000ft.the object was flying due west &would veer 15degrees of west &climb then return to270 &veer15degrees east o…
  8. I witnessed a craft that was not an airplane or a helicopter. It was a boomerang shaped object that was covered in various colored lights, and had many pinpoint lights...like what planes have on their tails.....but there were lots of them. It sparkled like a diamond in the light.…
  9. Witness elects to remain totally anonymous; provides no contact information. PD
  10. At first we noticed lights on the horizon darting up and down, they disapeared and were replaced with a red light that moved slowly down out of the sky towards the house we were standing outside. The light then hovered and I noticed a brighter white light in the middle of the re…

Location details text

95.2% of rows are unique strings; 93.1% null
rows: 147,890
null: 137,685 (93.1%)
unique: 9,713
len_min: 1
len_max: 197
len_mean: 38.747
len_median: 33.000
len_p95: 92.000
word_mean: 6.930
word_median: 6.000
n_empty: 0
n_duplicates: 492
duplicate_rate: 0.048
vocab_size: 12,384
readability_flesch_mean: 69.850
emoji_rate: 0.000
url_rate: 1.08e-03
one_word_rate: 0.038
allcaps_rate: 0.020
boilerplate_rate: 1.96e-04
Sample values (first 10)
  1. Behind my house on top of a mountain
  2. Huntington Elementary School
  3. 520 Greene Rd Dobson NC.
  4. North of my position
  5. Standing outside Guanabanas restaurant in Jupiter, FL
  6. Above the 210 Freeway nears Fairoaks
  7. TRIANGLE FLOATING LIGHTS
  8. Object was very high
  9. Thumb Broadcasting Studio
  10. over the water near the North Topsail Inlet

Explanation categorical

99.5% null
rows: 147,890
null: 147,087 (99.5%)
unique: 58
top_value: Starlink - Probable
top_rate: 0.097
cardinality: 58
entropy: 4.832
entropy_ratio: 0.825
Top values (rank 1–20)
  1. Starlink - Probable — 78
  2. Starlink - Certain — 69
  3. Rocket - Certain — 67
  4. Balloon - Possible — 50
  5. Starlink - Possible — 49
  6. Planet/Star - Possible — 42
  7. Planet/Star - Probable — 41
  8. Aircraft - Possible — 35
  9. Aircraft - Probable — 35
  10. Camera Anomaly - Probable — 33
  11. Camera Anomaly - Certain — 25
  12. Rocket - Probable — 24
  13. Bird - Possible — 21
  14. Searchlight - Certain — 18
  15. Balloon - Probable — 18
  16. Drone - Possible — 17
  17. Camera Anomaly - Possible — 15
  18. Bird - Probable — 14
  19. Searchlight - Probable — 12
  20. Rocket - Possible — 11