This dataset contains 147,890 UFO sighting reports (likely from NUFORC) with 13 columns covering location, shape, duration, witness counts, and free-text descriptions. The Shape field is a clean categorical with 39 values dominated by 'Light' (27,494), 'Circle', and 'Triangle' — a natural starting point for understanding what people report. Duration is text-based but highly repetitive, with '5 minutes' and '2 minutes' as the most common values, suggesting witnesses anchor on round numbers. Watch out for 'No of observers': it is extremely skewed (max 20,000, min -10, skew 109) with ~13% outliers, so it needs cleaning before any quantitative use. Also note that 'Explanation' is 99.5% null — only a tiny fraction of sightings have an official label, with 'Starlink' explanations leading the small set that do.
saturn
/home/coolhand/html/datavis/data_trove/cache/quirky/nuforc_sightings.parquet 147,890 rows sample n=147,890 seed 42 2026-05-01T23:39:09+00:00
Overview
| Source | /home/coolhand/html/datavis/data_trove/cache/quirky/nuforc_sightings.parquet |
| Total rows | 147,890 |
| Profiled sample | 147,890 |
| Columns | 13 |
| Generated | 2026-05-01T23:39:09+00:00 |
Insights (opt-in)
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
Sighting is almost certainly a row identifier: every one of the 147890 values is unique, there are no nulls, and the distribution is essentially uniform (skew -0.013, kurtosis -1.13) spanning 111 to 179773. The values are not a dense 1..N sequence, suggesting an externally assigned record or sighting ID with gaps. No outliers and no zeros, consistent with an ID rather than a measurement.
Occurred holds timestamp strings of the form 'YYYY-MM-DD HH:MM:SS Local', with length tightly clustered at 25 characters (mean 24.96, p95 25). The values are stored as text rather than parsed datetimes, and 14.6% of them are duplicates (21,626 rows), with notable spikes on July 4th evenings and a single value, '2015-11-07 18:00:00', appearing 104 times. 299 rows contain just the bare token 'Local' with no date, which will break naive datetime parsing.
Location holds short 'City, State/Region, Country' strings, averaging 20 characters and 3.6 words, dominated by US entries (Phoenix, Seattle, Las Vegas lead) with the token 'usa' appearing 17,880 times. The column is highly repetitive: 110,819 of 147,890 rows are duplicates (75%) across only 37,070 unique values, so it behaves like a categorical despite being free text. Language detection flags multilingual content, but this mostly reflects misclassification of short strings: 4,481 values were detected as English versus small counts in 27 other codes.
Categorical descriptor of UFO sighting shapes across 39 distinct values, with 'Light' leading at 19.4% of non-null values (27,494 rows, or 18.6% of all rows). The distribution is moderately spread (entropy ratio 0.74), and notably 'Other' (10,062) and 'Unknown' (10,021) together outnumber the second-largest named shape, 'Circle' (14,367), suggesting substantial reporter ambiguity. Null rate is 4.29%, modest but non-trivial.
This is a free-text duration field, almost always a number-plus-unit phrase like '5 minutes' or '30 seconds' (mean 9.5 chars, ~2 words). Values are highly repetitive: only 15,527 distinct strings across 147,890 rows and an 89% duplicate rate, with a 4.8% null rate. The dominant units are 'minutes' and 'seconds', but the presence of an abbreviated 'min' token signals inconsistent formatting that will need normalisation.
Counts of observers per record, with a typical value of 1-2 (median 2, IQR 1) but a maximum of 20000 driving mean 4.6 and std 129.5. Skew of 109.3 and kurtosis of 14332 are extreme, and 13.0% of rows are flagged as outliers. A min of -10 is suspicious for a count, and 6.9% are zero with 4.5% null.
This is a 'Reported' timestamp stored as text in a fixed 27-character format like '1999-11-16 00:00:00 Pacific'. Every value has identical length (min/max/mean = 27) and 3 words, with 'Pacific' appearing as a constant timezone suffix in ~20,000 rows. The multilingual alert is a false positive from the language detector misreading dates; the duplicate rate of 7.7% (11,418 rows) reflects multiple reports sharing the same timestamp.
This column stores posting dates as datetime strings with zeroed time components, almost certainly a publication or upload timestamp. Across 147,890 rows there are 626 distinct dates with no nulls, and the distribution is remarkably flat — entropy ratio 0.93 and the most common date (2020-06-25) accounting for only 1.24% of rows. The top dates span 1999 to 2023, suggesting the dataset covers more than two decades of activity.
This is a multi-label categorical feature describing observed object characteristics (e.g. "Lights on object", "Aura or haze around object", "Aircraft nearby"), stored as comma-joined tags rather than a structured list. Despite 147,890 rows, only 1,446 distinct strings exist and 98.6% are duplicates, with a tiny vocab of 43 tokens. Watch for the 28.15% null rate and a truncated tag ("Changed Colo", apparently "Changed Color" cut off) that recurs across thousands of rows.
Free-text summary field with 144,208 unique values across 147,890 rows and a 0.6% null rate, so virtually every record carries its own short description. Lengths are highly skewed: median 76 characters / 13 words but a max of 10,624 characters and a p95 of 479, and mean Flesch readability of 67.3 suggests fairly plain English prose. Top tokens are stopwords plus 'light' (4,676 occurrences), hinting at a recurring topical theme worth investigating; duplicates (1.9%) and boilerplate (<0.1%) are negligible.
Text is a free-text field containing medium-length English prose, averaging 949.9 characters and 181.6 words with a Flesch readability of 69.6, consistent with the first-person witness narratives seen in the sample values. The column is near-unique (127,124 unique of 147,890) yet still carries 4,091 exact duplicates (3.1%) and an 11.3% null rate worth investigating. Top words are dominated by English stopwords plus a frequent first-person 'i' and 'my', hinting at personal/subjective writing rather than formal documents.
Free-text supplementary location notes, populated for only 6.9% of the 147,890 rows (null_rate 0.931). When present, entries are short prose averaging 38.7 characters / 6.9 words with readable Flesch 69.8, and the top tokens ('the', 'of', 'in', 'my', 'from') confirm natural-language descriptions rather than structured place codes. Cardinality is high (9,713 uniques) but 492 exact duplicates (4.8%) hint at recurring phrases worth normalising.
Free-form classification labels explaining UFO/sky-object sightings, with categories like 'Starlink - Probable', 'Rocket - Certain', and 'Balloon - Possible' combining an object type with a confidence qualifier. The column is 99.46% null — only ~800 of 147,890 rows carry a value — so it functions as a sparse annotation rather than a primary feature. Among populated rows, 58 distinct labels appear with relatively even spread (entropy ratio 0.82); the top label 'Starlink - Probable' covers just 9.71% of non-nulls.
Numeric correlation
Languages detected
Per-string language detection across text columns (sampled).
Sighting numeric
Occurred text
Sample values (first 10)
- 2012-04-10 22:30:00 Local
- 2011-10-13 01:36:00 Local
- 2013-04-08 23:35:00 Local
- 2017-03-10 00:30:00 Local
- 2019-11-29 00:00:00 Local
- 2012-12-04 18:45:00 Local
- 2005-02-18 04:00:00 Local
- 2022-12-28 17:03:00 Local
- 2018-01-10 06:45:00 Local
- 2020-01-29 19:25:00 Local
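Because Occurred is stored as text with a trailing 'Local' label, and some rows hold only the bare token 'Local', a parser has to tolerate a missing date part. The following is a minimal sketch using only the standard library; the function name `parse_occurred` is hypothetical, and treating the last alphabetic token as a label is an assumption based on the sample values above.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_occurred(raw: Optional[str]) -> Tuple[Optional[datetime], Optional[str]]:
    """Split 'YYYY-MM-DD HH:MM:SS Label' into (naive datetime, label).

    Returns (None, label) when the date part is missing or malformed,
    for example when the value is just the bare token 'Local'.
    """
    if raw is None:
        return None, None
    text = raw.strip()
    if not text:
        return None, None
    # Assumption: a trailing alphabetic token is a label such as 'Local'.
    head, _, tail = text.rpartition(" ")
    if tail.isalpha():
        stamp, label = head, tail
    else:
        stamp, label = text, None
    try:
        return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), label
    except ValueError:
        # Covers the bare 'Local' rows and any other unparseable date part.
        return None, label


print(parse_occurred("2012-04-10 22:30:00 Local"))
print(parse_occurred("Local"))
```

The returned datetime is naive: 'Local' does not identify a time zone, so no offset is attached.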
Location text
Sample values (first 10)
- Gaithersburg, MD, USA
- Butler, MO, USA
- Oklahoma City, OK, USA
- Edmonds, WA, USA
- Asheville, NC, USA
- Texico, IL, USA
- Santiago (Chile), , Chile
- Foxboro, MA, USA
- Toronto (north of) (Canada), ON, Canada
- Cheektowaga, NY, USA
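Since Location follows a 'City, State/Region, Country' pattern, it can probably be split into three fields. The sketch below is one way to do that under the assumption that the last two commas are the field separators; `Place` and `split_location` are hypothetical names, and values that do not yield three parts are kept whole in the city field rather than guessed at.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    city: Optional[str]
    region: Optional[str]
    country: Optional[str]


def split_location(raw: str) -> Place:
    # Split from the right so any extra commas stay inside the city part.
    parts = [p.strip() or None for p in raw.rsplit(",", 2)]
    if len(parts) != 3:
        # Assumption: without three parts the structure is unknown.
        return Place(raw.strip() or None, None, None)
    return Place(*parts)


print(split_location("Gaithersburg, MD, USA"))
print(split_location("Santiago (Chile), , Chile"))
```

Empty segments, as in the Santiago sample, come back as None instead of an empty string.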
Shape categorical
Top values (rank 1–20)
- Light — 27,494
- Circle — 14,367
- Triangle — 13,086
- Other — 10,062
- Unknown — 10,021
- Fireball — 9,880
- Disk — 8,716
- Sphere — 7,652
- Oval — 6,369
- Orb — 5,924
- Formation — 4,864
- Changing — 3,987
- Cigar — 3,753
- Rectangle — 2,610
- Cylinder — 2,482
- Flash — 2,439
- Diamond — 2,116
- Chevron — 1,742
- Egg — 1,289
- Teardrop — 1,238
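The report gives an entropy ratio for Shape but does not say how saturn computes it. A common definition, assumed here, is Shannon entropy divided by its maximum log(k) for k observed categories, so 1.0 means a perfectly even spread. The function name `entropy_ratio` is hypothetical. The list above covers only 20 of the 39 values, so feeding it these counts would not reproduce the reported 0.74.

```python
import math
from typing import Iterable


def entropy_ratio(counts: Iterable[int]) -> float:
    """Shannon entropy of the counts divided by log(k), k = non-empty categories."""
    values = [c for c in counts if c > 0]
    if len(values) < 2:
        # A single category has no spread.
        return 0.0
    total = sum(values)
    entropy = -sum((c / total) * math.log(c / total) for c in values)
    return entropy / math.log(len(values))


# Top three Shape counts from the list above, for illustration only.
print(entropy_ratio([27494, 14367, 13086]))
```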
Duration text
Sample values (first 10)
- 15 Minutes
- 15 minutes
- 35 min.
- 15 minutes
- 20
- 90 seconds
- 5 minutes
- 1 hour
- 45 minutes
- about 15 minutes
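The samples above show mixed casing, abbreviations ('min.'), leading qualifiers ('about') and a bare number with no unit. One possible normalisation to seconds is sketched below; `duration_seconds` and the unit table are hypothetical, and a value without a recognised unit is returned as None rather than guessed. Ranges are not handled specially.

```python
import re
from typing import Optional

# Assumption: these spellings cover the common units; extend as needed.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}
# First number followed by an alphabetic unit token.
_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")


def duration_seconds(raw: Optional[str]) -> Optional[float]:
    """Convert a phrase like '35 min.' to seconds, or None if unclear."""
    if not raw:
        return None
    match = _PATTERN.search(raw.lower())
    if match is None:
        return None  # e.g. a bare '20' with no unit
    unit = _UNIT_SECONDS.get(match.group(2))
    if unit is None:
        return None
    return float(match.group(1)) * unit


for sample in ["15 Minutes", "35 min.", "20", "about 15 minutes", "1 hour"]:
    print(sample, "->", duration_seconds(sample))
```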
No of observers numeric
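The profile reports a minimum of -10 and a maximum of 20,000 for this column, so some cleaning is needed before quantitative use. The sketch below treats negative counts as missing and flags high values with a Tukey upper fence (Q3 + 1.5 × IQR). This is an assumption: the report does not say which outlier rule saturn applies, and `clean_observers` is a hypothetical name. Values are flagged rather than dropped, since a large count may be a genuine mass sighting.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def clean_observers(
    values: Sequence[Optional[float]], k: float = 1.5
) -> List[Tuple[Optional[float], bool]]:
    """Return (count, is_outlier) pairs; negative counts become None."""
    # Negative counts are impossible, so treat them like missing values.
    counts = [v if v is not None and v >= 0 else None for v in values]
    valid = [c for c in counts if c is not None]
    if len(valid) < 2:
        return [(c, False) for c in counts]
    q1, _median, q3 = statistics.quantiles(valid, n=4)
    fence = q3 + k * (q3 - q1)
    return [(c, c is not None and c > fence) for c in counts]


rows = clean_observers([1, 1, 2, 2, 2, 3, 1, 2, 20000, -10, None])
print(rows)
```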
Reported text
Sample values (first 10)
- 2012-04-11 08:38:00 Pacific
- 2009-06-13 05:38:15 Pacific
- 2014-01-29 22:28:38 Pacific
- 2022-06-02 08:34:35 Pacific
- 2021-01-02 02:46:40 Pacific
- 2012-12-04 19:49:39 Pacific
- 2002-07-12 10:31:36 Pacific
- 2021-12-30 15:36:37 Pacific
- 2007-04-01 14:50:54 Pacific
- 2020-01-29 18:08:21 Pacific
Posted categorical
Top values (rank 1–20)
- 2020-06-25 00:00:00 — 1,833
- 2009-12-12 00:00:00 — 1,627
- 2006-10-30 00:00:00 — 1,573
- 2019-12-01 00:00:00 — 1,484
- 2010-11-21 00:00:00 — 1,365
- 2022-09-09 00:00:00 — 1,333
- 1999-11-02 00:00:00 — 1,314
- 2020-12-23 00:00:00 — 1,312
- 2023-03-06 00:00:00 — 1,274
- 2008-10-31 00:00:00 — 1,274
- 2022-12-22 00:00:00 — 1,252
- 2001-08-05 00:00:00 — 1,229
- 2009-03-19 00:00:00 — 1,201
- 2009-01-10 00:00:00 — 1,198
- 2013-08-30 00:00:00 — 1,142
- 2022-03-04 00:00:00 — 1,035
- 2023-09-10 00:00:00 — 1,028
- 2008-06-12 00:00:00 — 1,023
- 2012-09-24 00:00:00 — 1,019
- 2011-10-10 00:00:00 — 1,017
Characteristics text
Sample values (first 10)
- Lights on object
- Lights on object
- Lights on object, Emitted beams
- Lights on object
- Lights on object, Emitted other objects, Emitted beams
- Lights on object
- Lights on object, Animals reacted
- Lights on object
- Changed Color, Aircraft nearby
- Aura or haze around object
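Because Characteristics stores several tags in one comma-joined string, per-tag analysis needs the string split first. A minimal sketch follows; `split_tags` and `tag_counts` are hypothetical names, and mapping the truncated 'Changed Colo' to 'Changed Color' is an assumption taken from the profile's own reading of that tag.

```python
from collections import Counter
from typing import Iterable, List, Optional

# Assumption: the truncated tag stands for 'Changed Color'.
_REPAIRS = {"Changed Colo": "Changed Color"}


def split_tags(raw: Optional[str]) -> List[str]:
    """Split a comma-joined tag string into a list of cleaned tags."""
    if not raw:
        return []
    tags = (t.strip() for t in raw.split(","))
    return [_REPAIRS.get(t, t) for t in tags if t]


def tag_counts(rows: Iterable[Optional[str]]) -> Counter:
    """Count how many rows mention each tag."""
    return Counter(tag for row in rows for tag in split_tags(row))


sample = [
    "Lights on object",
    "Lights on object, Emitted beams",
    "Changed Colo, Aircraft nearby",
    None,
]
print(tag_counts(sample))
```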
Summary text
Sample values (first 10)
- Large round craft in Madison Heights near 11&I-75 and similar craft over 1-75 in hazel park/ferndale area.
- light in sky ,noise in room, temporary paralysis, keep waking back in bed. help me please?
- Slow moving triangle over Little Rock, AR.
- 8-10 round yellow lights grouped in 2 lines of 4 with one or 2 a short distance away
- Brilliant glowing red object in northern Illinois moving from the northwest to the south.
- Sphere shaped object with colors moving at a abnormal rate of speed.
- White or silver sphere moving fast and erratically across horizon and against the wind
- There were orange colored lights coming from the northwest traveling in a line at a fast speed. The most at one time were 3. One appeared to stop overhead until the one behind came to the same point and the first one would dim and disapear. They appeared to be coming over the oce…
- Curly que white con trail with sideways V grey trail in front. Very high and fast silver craft.
- Hovering craft over houses with several bright white and red flashing lights,picked up speed and disappeared into a field
Text text
Sample values (first 10)
- Large spherical object moving westward changing colors and patterns of lights on it. Video is zoomed 16x on my Nikon camera. We spoke briefly via telephone with the witness, and he confirms that the date is correct, but he corrects the time, to reflect a sighting that occurred 5…
- I was a passenger and my freind was driving approx. 45 mph when the saucer followed us on passenger side It looked in comarasion of close encounters craft.It was about 500 meters from me.When it left it passed trough a door of deminsion.There was not a doubt.What happened after w…
- While waiting to go on duty, I noticed what appeared to be a flare coming from the north in the vicinity of Cape Canaveral. At first I thought it may have been a rocket launch then I realized it was heading in a south westerly direction when all rocket launches always go east tow…
- I was turning off of 30th Ave N on to 34th st going north bound and I just looked up at the sky and seen this large silver egg shaped object in the sky moving slowly and then it vary rapidly just changed directions, not like it turned but like it just started moving sideways and …
- At exactly 9 pm (21:00 hrs) I was walking my friend out to his truck. It was a clear sky and the stars were very bright. I noticed a somewhat dim, amber light come into view. I have seen a hundred satellites in our desert sky. This was not a satellite. I am an amature astronomer …
- There are four beams of light in the clouds going in a circular motion going back to the center, very very big and this is out of the usual. Advertising lights. PD
- 8chevron lights flying @approx.2000ft or lower i live in a ravine area as the craft flew over the resonance was high pitched the air temp.was+3C. the visibility was 20000ft.the object was flying due west &would veer 15degrees of west &climb then return to270 &veer15degrees east o…
- I witnessed a craft that was not an airplane or a helicopter. It was a boomerang shaped object that was covered in various colored lights, and had many pinpoint lights...like what planes have on their tails.....but there were lots of them. It sparkled like a diamond in the light.…
- Witness elects to remain totally anonymous; provides no contact information. PD
- At first we noticed lights on the horizon darting up and down, they disapeared and were replaced with a red light that moved slowly down out of the sky towards the house we were standing outside. The light then hovered and I noticed a brighter white light in the middle of the re…
Location details text
Sample values (first 10)
- Behind my house on top of a mountain
- Huntington Elementary School
- 520 Greene Rd Dobson NC.
- North of my position
- Standing outside Guanabanas restaurant in Jupiter, FL
- Above the 210 Freeway nears Fairoaks
- TRIANGLE FLOATING LIGHTS
- Object was very high
- Thumb Broadcasting Studio
- over the water near the North Topsail Inlet
Explanation categorical
Top values (rank 1–20)
- Starlink - Probable — 78
- Starlink - Certain — 69
- Rocket - Certain — 67
- Balloon - Possible — 50
- Starlink - Possible — 49
- Planet/Star - Possible — 42
- Planet/Star - Probable — 41
- Aircraft - Possible — 35
- Aircraft - Probable — 35
- Camera Anomaly - Probable — 33
- Camera Anomaly - Certain — 25
- Rocket - Probable — 24
- Bird - Possible — 21
- Searchlight - Certain — 18
- Balloon - Probable — 18
- Drone - Possible — 17
- Camera Anomaly - Possible — 15
- Bird - Probable — 14
- Searchlight - Probable — 12
- Rocket - Possible — 11
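Each label above combines an object type with a confidence qualifier, joined by ' - '. Splitting the two lets the sparse annotations be grouped by object type. The sketch below assumes the last ' - ' is the separator; `split_explanation` is a hypothetical name, and the counts are copied from the list above.

```python
from collections import Counter
from typing import Optional, Tuple


def split_explanation(raw: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split 'Starlink - Probable' into ('Starlink', 'Probable')."""
    if not raw:
        return None, None
    kind, sep, confidence = raw.rpartition(" - ")
    if not sep:
        # No qualifier present: keep the whole label as the object type.
        return raw.strip(), None
    return kind.strip(), confidence.strip()


# Counts taken from the top-values list above.
listed = {
    "Starlink - Probable": 78,
    "Starlink - Certain": 69,
    "Starlink - Possible": 49,
    "Rocket - Certain": 67,
    "Rocket - Probable": 24,
    "Rocket - Possible": 11,
}
by_kind = Counter()
for label, n in listed.items():
    kind, _confidence = split_explanation(label)
    by_kind[kind] += n
print(by_kind)
```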