Sighting is almost certainly a row identifier: all 147,890 values are unique, none are null, and the distribution is essentially flat (skew -0.013, kurtosis -1.13) across the range 111 to 179,773. Together with the absence of flagged outliers, this points to sparse integer IDs rather than a measured quantity.
saturn
/home/coolhand/html/datavis/data_trove/cache/quirky/ufo_sightings_20260121.parquet · 147,890 rows · sample n=147,890 · seed 42 · 2026-05-01T23:37:35+00:00
Overview
| Field | Value |
|---|---|
| Source | /home/coolhand/html/datavis/data_trove/cache/quirky/ufo_sightings_20260121.parquet |
| Total rows | 147,890 |
| Profiled sample | 147,890 |
| Columns | 13 |
| Generated | 2026-05-01T23:37:35+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
Stored as text but functionally a timestamp: values follow a 'YYYY-MM-DD HH:MM:SS Local' pattern (median length 25 characters, median 3 words), and 'local' is among the top words with 20,000 occurrences. Of 147,890 rows, 126,264 values are unique, so 21,626 rows (14.6%) repeat an earlier value; the repeats cluster on specific evenings such as 2014-07-04 22:00:00 (US Independence Day) and 2015-11-07 18:00:00. 299 rows contain only the bare token 'Local', meaning the date and time are missing and need cleaning.
Short 'City, State, Country' location strings, averaging 20.4 characters and 3.6 words. Heavy duplication (74.9%, 110,819 of 147,890 rows) reflects expected city repetition, with Phoenix, Seattle, and Las Vegas leading. The multilingual flag is misleading: 4,481 of the sampled rows were detected as English, and the non-English detections likely come from place names such as 'Montréal' or 'São Paulo'. Across 37,070 unique values the dataset skews strongly toward the USA (17,880 word occurrences) and Canada. Some values leave the state slot empty (e.g. 'Santiago (Chile), , Chile').
Categorical descriptor of object shape with 39 distinct values, led by 'Light' (27,494 rows: 18.6% of all rows, 19.4% of non-null values) followed by 'Circle' (14,367) and 'Triangle' (13,086). An entropy ratio of 0.74 indicates a moderately spread distribution rather than one-class dominance, but the presence of both 'Other' (10,062) and 'Unknown' (10,021) in the top 5 signals substantial reporter ambiguity. The null rate is 4.3%, and several top categories ('Sphere', 'Orb', 'Circle') may be semantic duplicates worth consolidating.
Free-text duration field expressing time spans like '5 minutes' or '30 seconds', averaging 2.07 words and 9.5 characters. Values are highly repetitive: only 15,527 are unique across 147,890 rows, so 88.97% of non-null values repeat an earlier one, and the top word 'minutes' appears 8,115 times alongside 'seconds' and 'min'. Mixed units, abbreviations, and capitalization (e.g. 'min.' vs 'minutes', '15 Minutes' vs '15 minutes'), plus bare numbers with no unit such as '20', signal inconsistent formatting that needs normalization; 4.83% of rows are null.
Numeric count of observers per record, with a tight typical range (q1 = 1, median = 2, q3 = 2) but an extreme upper tail reaching 20,000 and an impossible minimum of -10. Skew (109.3) and kurtosis (14,332.3) confirm the long upper tail; 13.0% of values are flagged as outliers, 6.9% are zero, and 4.5% are null. The negative minimum and the 20,000 maximum suggest data-entry errors or sentinel values worth scrubbing.
This is a report timestamp stored as a fixed 27-character string with a trailing 'Pacific' time-zone label (e.g. '1999-11-16 00:00:00 Pacific'). All 147,890 values have length 27 and a three-word structure, and 'pacific' accounts for roughly 20,000 word occurrences, confirming a uniform date + time-zone format rather than free text. The 'multilingual' alert and the small de/ru/zh language counts are spurious artifacts of running language detection on numeric date strings. The 7.7% duplicate rate (11,418 rows) means exact timestamps repeat, e.g. on 1999-11-16 (73 rows); identical stamps may reflect date-only entries recorded at midnight as well as busy reporting days.
Posted holds 626 distinct date stamps (all at 00:00:00) across 147,890 rows with no nulls, consistent with a date-only publication stamp for each record. The distribution is highly diffuse: the entropy ratio is 0.93 and the top date, 2020-06-25, accounts for only 1.24% (1,833 rows). The top values range from 1999 to 2023, pointing to a wide historical window rather than a recent dump, and at an average of about 236 rows per date (147,890 / 626), reports appear to have been posted in batches.
This is a multi-label categorical feature describing observed object characteristics (e.g. "Lights on object", "Aura or haze around object", "Aircraft nearby"), stored as comma-separated tags rather than free text. Only 1,446 distinct tag combinations appear across 147,890 rows, so 98.6% of non-null values repeat an earlier one, and "Lights on object" alone accounts for 29,911 entries. Note the 28.15% null rate, and that the language detector classified only 3,527 of the sampled rows as English, likely because most values are short tag strings.
Free-text summary field with 144,208 unique values across 147,890 rows, so it is effectively a per-record description rather than a category. Lengths range from 1 to 10,624 characters (mean 134.9, median 76), and the word distribution is dominated by English stopwords plus 'light' (4,676), consistent with short descriptions of sightings. Readability is conversational (Flesch 67.3), with negligible URLs, emoji, or boilerplate, and the duplicate rate is only 1.9%.
Free-text field containing medium-length English prose, averaging 949 characters and 181 words with a Flesch reading ease of 69.6, i.e. reasonably accessible writing. Values are nearly all distinct (127,124 unique across 147,890 rows) and the vocabulary is large (65,106 words), while frequent first-person tokens like 'i' and 'my' point to witness-written narratives of sightings. Worth flagging: 11.28% nulls, 3.1% exact duplicates (4,091 rows), and a maximum length of 64,550 characters, indicating extreme outliers.
Free-text 'Location details' annotations, present on only about 6.9% of rows (null rate 0.931), with 9,713 unique values across 147,890 records. Entries are short prose (mean 38.7 characters, 6.9 words, Flesch 69.8) dominated by English function words like 'the', 'of', and 'in', suggesting free-form, reporter-written descriptions rather than structured location codes; some are not locations at all (e.g. 'TRIANGLE FLOATING LIGHTS', 'Object was very high'). A duplicate rate of 0.048 (492 repeated strings) hints at some recurring phrases, but the field is otherwise near-unique and sparsely populated.
This is a sparse explanatory label classifying sightings by likely cause (e.g. Starlink, rocket, balloon, aircraft, planet/star, camera anomaly, bird, searchlight, drone), paired with a confidence qualifier (Certain/Probable/Possible). It is almost entirely empty at 99.46% null, leaving only about 800 labelled rows spread across 58 categories with no dominant class (top rate 0.0971, entropy ratio 0.825). The compound 'Cause - Confidence' structure is worth splitting into two fields for cleaner analysis.
Errors during insight pass (1)
dataset:__global__:anthropic:claude-opus-4-7: Json5EOF ("Unclosed b'array' starting near 892"). Partially recovered payload (confidence: high; evidence keys: columns, row_count, kinds):
Narrative: This is a dataset of 147,890 UFO sighting reports with 13 columns covering location, shape, duration, witness counts, and free-text narratives. The Shape field is a clean categorical with 39 values dominated by 'Light' (27,494), 'Circle', and 'Triangle', making it the most informative quick view of what people report seeing. Witness counts ('No of observers') are extremely skewed: the median is 2 but the maximum is 20,000, the minimum is negative (-10), and there are over 18,000 outliers, so that column needs cleaning before any analysis. Locations are heavily US-centric, led by Phoenix, Seattle, and Las Vegas, and Duration clusters around round numbers like '5 minutes' and '2 minutes'. Note also that 'Explanation' is 99.5% null; only a tiny labelled subset has an official cause assigned.
Featured charts:
- Shape (bar): Top reported UFO shapes; 'Light' alone accounts for nearly 1 in 5 sightings.
- No of observers (histogram): Heavily skewed witness counts; watch for the 20,000 max and negative values that need cleaning.
- Location (bar): Most-reported cities; Phoenix, Seattle, and Las Vegas dominate, confirming a strong US bias.
- Duration (bar): Reported sighting durations cluster on round numbers like 5, 2, and 10 minutes.
- Text (length): Narrative length distribution; median 682 chars but max over 64,000, useful for filtering before NLP.
Numeric correlation
Languages detected
Per-string language detection across text columns (sampled).
Sighting (numeric)
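The row-identifier reading of Sighting can be checked directly: no nulls, no duplicates, and how densely the values fill their min..max range. A minimal sketch, assuming the column is available as a plain Python sequence; `id_profile` is a hypothetical helper, not part of saturn. Applied to the profiled figures, 147,890 IDs in a range holding 179,663 possible values gives a density of about 0.82, so roughly 18% of the integers between 111 and 179,773 are unused, which is what "sparse" means here.

```python
from typing import Optional, Sequence


def id_profile(values: Sequence[Optional[int]]) -> dict:
    """Summarize properties that suggest an integer column is a row ID.

    Hypothetical helper: checks for nulls, duplicates, and how much of the
    min..max range the observed values actually occupy.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-null values")
    lo, hi = min(present), max(present)
    return {
        "no_nulls": len(present) == len(values),
        "all_unique": len(set(present)) == len(present),
        # 1.0 means every integer in [lo, hi] is used; lower means gaps.
        "density": len(present) / (hi - lo + 1),
    }


# Figures from the profile: 147,890 values spanning 111..179,773.
print(round(147_890 / (179_773 - 111 + 1), 3))  # prints 0.823
```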
Occurred (text)
Sample values (first 10)
- 2012-04-10 22:30:00 Local
- 2011-10-13 01:36:00 Local
- 2013-04-08 23:35:00 Local
- 2017-03-10 00:30:00 Local
- 2019-11-29 00:00:00 Local
- 2012-12-04 18:45:00 Local
- 2005-02-18 04:00:00 Local
- 2022-12-28 17:03:00 Local
- 2018-01-10 06:45:00 Local
- 2020-01-29 19:25:00 Local
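Parsing Occurred into real datetimes is the natural first cleaning step for this text-encoded timestamp. A minimal sketch, assuming every well-formed value follows the 'YYYY-MM-DD HH:MM:SS Local' layout seen in the samples above; `parse_occurred` is a hypothetical name. Because 'Local' refers to each sighting's own, unnamed time zone, the result is left naive, and the bare 'Local' rows come back as `None`.

```python
from datetime import datetime
from typing import Optional


def parse_occurred(value: Optional[str]) -> Optional[datetime]:
    """Parse 'YYYY-MM-DD HH:MM:SS Local' into a naive datetime.

    The 'Local' label is dropped and no time zone is attached, because it
    refers to the (unnamed) zone at each sighting location. Nulls, bare
    'Local' values, and anything else that fails to parse return None.
    """
    if value is None:
        return None
    text = value.strip()
    if text.endswith("Local"):
        text = text[: -len("Local")].strip()
    if not text:
        return None  # bare 'Local': date and time are missing
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
```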
Location (text)
Sample values (first 10)
- Gaithersburg, MD, USA
- Butler, MO, USA
- Oklahoma City, OK, USA
- Edmonds, WA, USA
- Asheville, NC, USA
- Texico, IL, USA
- Santiago (Chile), , Chile
- Foxboro, MA, USA
- Toronto (north of) (Canada), ON, Canada
- Cheektowaga, NY, USA
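Splitting Location into separate city, state, and country fields makes the US/Canada skew easy to aggregate. A sketch, assuming values follow the 'City, State, Country' shape of the samples above; `Place` and `split_location` are hypothetical names. Splitting from the right keeps parenthesised qualifiers such as 'Toronto (north of) (Canada)' intact in the city and turns the empty state slot in 'Santiago (Chile), , Chile' into an empty string.

```python
from typing import NamedTuple


class Place(NamedTuple):
    city: str
    state: str
    country: str


def split_location(value: str) -> Place:
    """Split 'City, State, Country' from the right.

    Splitting from the right keeps any extra commas inside the city part.
    Values without two commas are returned whole as the city (an assumption;
    the profile does not describe such rows).
    """
    parts = [p.strip() for p in value.rsplit(",", 2)]
    if len(parts) != 3:
        return Place(value.strip(), "", "")
    return Place(*parts)
```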
Shape (categorical)
Top values (rank 1–20)
- Light — 27,494
- Circle — 14,367
- Triangle — 13,086
- Other — 10,062
- Unknown — 10,021
- Fireball — 9,880
- Disk — 8,716
- Sphere — 7,652
- Oval — 6,369
- Orb — 5,924
- Formation — 4,864
- Changing — 3,987
- Cigar — 3,753
- Rectangle — 2,610
- Cylinder — 2,482
- Flash — 2,439
- Diamond — 2,116
- Chevron — 1,742
- Egg — 1,289
- Teardrop — 1,238
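The consolidation suggested for Shape amounts to merging counts under a grouping. In this sketch the single 'Round' bucket for 'Circle', 'Sphere', and 'Orb' is a hypothetical choice for illustration, since the profile only flags them as possible duplicates, and `consolidate` is a hypothetical helper. The counts are taken from the top values listed above.

```python
from collections import Counter
from typing import Dict, Set


def consolidate(counts: Dict[str, int], groups: Dict[str, Set[str]]) -> Counter:
    """Merge category counts according to a grouping; ungrouped names pass through."""
    lookup = {member: group for group, members in groups.items() for member in members}
    merged: Counter = Counter()
    for shape, n in counts.items():
        merged[lookup.get(shape, shape)] += n
    return merged


# Hypothetical grouping: the profile only says these *may* be duplicates.
ROUND = {"Round": {"Circle", "Sphere", "Orb"}}
top = {"Light": 27_494, "Circle": 14_367, "Triangle": 13_086, "Sphere": 7_652, "Orb": 5_924}
print(consolidate(top, ROUND).most_common(2))  # [('Round', 27943), ('Light', 27494)]
```

Under this grouping the combined round shapes (27,943) would edge past 'Light' (27,494), so whether to merge changes the headline ranking.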
Duration (text)
Sample values (first 10)
- 15 Minutes
- 15 minutes
- 35 min.
- 15 minutes
- 20
- 90 seconds
- 5 minutes
- 1 hour
- 45 minutes
- about 15 minutes
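The normalization the Duration profile calls for can start as a small regex-based converter to seconds. A sketch, assuming the first number directly followed by a unit word carries the duration; `duration_seconds` and the alias table are hypothetical and cover the spellings in the samples above ('15 Minutes', '35 min.', '90 seconds', '1 hour') plus a few assumed variants. Unitless values such as '20' are left unparsed rather than guessed.

```python
import re
from typing import Optional

# Seconds per unit; aliases cover the spellings seen in the samples
# ('minutes', 'Minutes', 'min.', 'seconds', 'hour') plus a few assumed variants.
UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}
# First number that is followed (after optional spaces) by a run of letters.
NUMBER_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)", re.IGNORECASE)


def duration_seconds(value: Optional[str]) -> Optional[float]:
    """Convert free-text durations like 'about 15 minutes' to seconds.

    Returns None for nulls, unitless numbers such as '20', and unknown units.
    """
    if value is None:
        return None
    match = NUMBER_UNIT.search(value)
    if match is None:
        return None
    amount, unit = match.groups()
    factor = UNIT_SECONDS.get(unit.lower())
    return None if factor is None else float(amount) * factor
```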
No of observers (numeric)
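If saturn flags outliers with the common 1.5 × IQR (Tukey) rule, which the report does not state, q1 = 1 and q3 = 2 put the fences at -0.5 and 3.5, so any count of 4 or more is an outlier; that would explain how 13.0% of a column with median 2 gets flagged. The sketch also shows one way to scrub impossible values. `tukey_fences` and `clean_observer_count` are hypothetical helpers, and `upper_cap` is an analyst-chosen limit, not a value from the data.

```python
import math
from typing import Optional, Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) outlier fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_observer_count(value: Optional[float], upper_cap: Optional[int] = None) -> Optional[int]:
    """Map missing, negative, or above-cap counts to None; keep the rest as int.

    Zero is kept; whether 0 observers is meaningful is left to the analyst.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    if value < 0:
        return None  # e.g. the -10 minimum: not a possible head count
    if upper_cap is not None and value > upper_cap:
        return None  # e.g. the 20,000 maximum, if the cap is set below it
    return int(value)


print(tukey_fences(1, 2))  # (-0.5, 3.5)
```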
Reported (text)
Sample values (first 10)
- 2012-04-11 08:38:00 Pacific
- 2009-06-13 05:38:15 Pacific
- 2014-01-29 22:28:38 Pacific
- 2022-06-02 08:34:35 Pacific
- 2021-01-02 02:46:40 Pacific
- 2012-12-04 19:49:39 Pacific
- 2002-07-12 10:31:36 Pacific
- 2021-12-30 15:36:37 Pacific
- 2007-04-01 14:50:54 Pacific
- 2020-01-29 18:08:21 Pacific
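Reading Occurred and Reported together gives a reporting lag. A minimal sketch, assuming both columns keep the label-suffixed format shown in their samples; the helper names are hypothetical. The example strings are simply the first sample value of each column, used as inputs; the report does not say they belong to the same row. Because the labels name different zones, the lag is naive and only meaningful at day scale.

```python
from datetime import datetime, timedelta


def _parse_labelled(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS <Label>' and drop the zone label.

    Raises ValueError if the timestamp part is missing (e.g. a bare 'Local').
    """
    stamp, _label = value.rsplit(" ", 1)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def naive_reporting_lag(occurred: str, reported: str) -> timedelta:
    """Reported minus Occurred, ignoring zone labels.

    Occurred is in the sighting's local time and Reported in Pacific time,
    so lags of under a day (including small negative ones) are unreliable.
    """
    return _parse_labelled(reported) - _parse_labelled(occurred)


print(naive_reporting_lag("2012-04-10 22:30:00 Local", "2012-04-11 08:38:00 Pacific"))  # 10:08:00
```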
Posted (categorical)
Top values (rank 1–20)
- 2020-06-25 00:00:00 — 1,833
- 2009-12-12 00:00:00 — 1,627
- 2006-10-30 00:00:00 — 1,573
- 2019-12-01 00:00:00 — 1,484
- 2010-11-21 00:00:00 — 1,365
- 2022-09-09 00:00:00 — 1,333
- 1999-11-02 00:00:00 — 1,314
- 2020-12-23 00:00:00 — 1,312
- 2023-03-06 00:00:00 — 1,274
- 2008-10-31 00:00:00 — 1,274
- 2022-12-22 00:00:00 — 1,252
- 2001-08-05 00:00:00 — 1,229
- 2009-03-19 00:00:00 — 1,201
- 2009-01-10 00:00:00 — 1,198
- 2013-08-30 00:00:00 — 1,142
- 2022-03-04 00:00:00 — 1,035
- 2023-09-10 00:00:00 — 1,028
- 2008-06-12 00:00:00 — 1,023
- 2012-09-24 00:00:00 — 1,019
- 2011-10-10 00:00:00 — 1,017
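The batch reading of Posted can be checked by counting rows per posting date. A minimal sketch, assuming Posted is read as strings exactly as listed above; `posting_batches` is a hypothetical helper and expects a non-empty list. On the full column, 147,890 rows over 626 dates averages about 236 rows per date.

```python
from collections import Counter
from typing import Dict, List


def posting_batches(posted: List[str]) -> Dict[str, object]:
    """Summarize how Posted values group into publication batches (non-empty input)."""
    counts = Counter(posted)
    top_date, top_rows = counts.most_common(1)[0]
    return {
        "distinct_dates": len(counts),
        "mean_rows_per_date": len(posted) / len(counts),
        "top": (top_date, top_rows),
        # True when every stamp is date-only, i.e. ends in midnight.
        "all_midnight": all(stamp.endswith(" 00:00:00") for stamp in counts),
    }
```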
Characteristics (text)
Sample values (first 10)
- Lights on object
- Lights on object
- Lights on object, Emitted beams
- Lights on object
- Lights on object, Emitted other objects, Emitted beams
- Lights on object
- Lights on object, Animals reacted
- Lights on object
- Changed Color, Aircraft nearby
- Aura or haze around object
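Because Characteristics stores comma-separated tags, its 1,446 distinct strings are tag combinations rather than individual traits; splitting on commas and counting each tag gives per-characteristic frequencies. A sketch using the ten sample values above as input; it assumes no tag itself contains a comma (true of the samples, unverified for the full column), and `tag_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Optional


def tag_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count individual tags across comma-separated multi-label values."""
    counts: Counter = Counter()
    for value in values:
        if not value:
            continue  # null or empty: no characteristics recorded
        counts.update(tag.strip() for tag in value.split(",") if tag.strip())
    return counts


samples = [
    "Lights on object", "Lights on object", "Lights on object, Emitted beams",
    "Lights on object", "Lights on object, Emitted other objects, Emitted beams",
    "Lights on object", "Lights on object, Animals reacted", "Lights on object",
    "Changed Color, Aircraft nearby", "Aura or haze around object",
]
print(tag_counts(samples).most_common(2))  # [('Lights on object', 8), ('Emitted beams', 2)]
```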
Summary (text)
Sample values (first 10)
- Large round craft in Madison Heights near 11&I-75 and similar craft over 1-75 in hazel park/ferndale area.
- light in sky ,noise in room, temporary paralysis, keep waking back in bed. help me please?
- Slow moving triangle over Little Rock, AR.
- 8-10 round yellow lights grouped in 2 lines of 4 with one or 2 a short distance away
- Brilliant glowing red object in northern Illinois moving from the northwest to the south.
- Sphere shaped object with colors moving at a abnormal rate of speed.
- White or silver sphere moving fast and erratically across horizon and against the wind
- There were orange colored lights coming from the northwest traveling in a line at a fast speed. The most at one time were 3. One appeared to stop overhead until the one behind came to the same point and the first one would dim and disapear. They appeared to be coming over the oce…
- Curly que white con trail with sideways V grey trail in front. Very high and fast silver craft.
- Hovering craft over houses with several bright white and red flashing lights,picked up speed and disappeared into a field
Text (text)
Sample values (first 10)
- Large spherical object moving westward changing colors and patterns of lights on it. Video is zoomed 16x on my Nikon camera. We spoke briefly via telephone with the witness, and he confirms that the date is correct, but he corrects the time, to reflect a sighting that occurred 5…
- I was a passenger and my freind was driving approx. 45 mph when the saucer followed us on passenger side It looked in comarasion of close encounters craft.It was about 500 meters from me.When it left it passed trough a door of deminsion.There was not a doubt.What happened after w…
- While waiting to go on duty, I noticed what appeared to be a flare coming from the north in the vicinity of Cape Canaveral. At first I thought it may have been a rocket launch then I realized it was heading in a south westerly direction when all rocket launches always go east tow…
- I was turning off of 30th Ave N on to 34th st going north bound and I just looked up at the sky and seen this large silver egg shaped object in the sky moving slowly and then it vary rapidly just changed directions, not like it turned but like it just started moving sideways and …
- At exactly 9 pm (21:00 hrs) I was walking my friend out to his truck. It was a clear sky and the stars were very bright. I noticed a somewhat dim, amber light come into view. I have seen a hundred satellites in our desert sky. This was not a satellite. I am an amature astronomer …
- There are four beams of light in the clouds going in a circular motion going back to the center, very very big and this is out of the usual. Advertising lights. PD
- 8chevron lights flying @approx.2000ft or lower i live in a ravine area as the craft flew over the resonance was high pitched the air temp.was+3C. the visibility was 20000ft.the object was flying due west &would veer 15degrees of west &climb then return to270 &veer15degrees east o…
- I witnessed a craft that was not an airplane or a helicopter. It was a boomerang shaped object that was covered in various colored lights, and had many pinpoint lights...like what planes have on their tails.....but there were lots of them. It sparkled like a diamond in the light.…
- Witness elects to remain totally anonymous; provides no contact information. PD
- At first we noticed lights on the horizon darting up and down, they disapeared and were replaced with a red light that moved slowly down out of the sky towards the house we were standing outside. The light then hovered and I noticed a brighter white light in the middle of the re…
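The duplicate percentages quoted for several text columns do not match 1 - unique/rows; for Text that formula gives about 14%, not 3.1%. They do match, within rounding, when the denominator is the non-null count, which suggests saturn measures duplicates among non-null values. That definition is an inference from the published numbers, not documented behavior, and `duplicate_rate` is a hypothetical helper that only reproduces the arithmetic.

```python
def duplicate_rate(n_rows: int, n_unique: int, null_rate: float) -> float:
    """Share of non-null values that repeat an earlier value.

    Assumed definition: 1 - unique / non_null. The report does not state
    saturn's formula; this one reproduces its figures within rounding.
    """
    non_null = n_rows * (1 - null_rate)
    return 1 - n_unique / non_null


# (column, unique values, null rate) as reported; all columns have 147,890 rows.
for name, unique, nulls in [("Text", 127_124, 0.1128),
                            ("Duration", 15_527, 0.0483),
                            ("Characteristics", 1_446, 0.2815)]:
    print(name, round(duplicate_rate(147_890, unique, nulls), 4))
```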
Location details (text)
Sample values (first 10)
- Behind my house on top of a mountain
- Huntington Elementary School
- 520 Greene Rd Dobson NC.
- North of my position
- Standing outside Guanabanas restaurant in Jupiter, FL
- Above the 210 Freeway nears Fairoaks
- TRIANGLE FLOATING LIGHTS
- Object was very high
- Thumb Broadcasting Studio
- over the water near the North Topsail Inlet
Explanation (categorical)
Top values (rank 1–20)
- Starlink - Probable — 78
- Starlink - Certain — 69
- Rocket - Certain — 67
- Balloon - Possible — 50
- Starlink - Possible — 49
- Planet/Star - Possible — 42
- Planet/Star - Probable — 41
- Aircraft - Possible — 35
- Aircraft - Probable — 35
- Camera Anomaly - Probable — 33
- Camera Anomaly - Certain — 25
- Rocket - Probable — 24
- Bird - Possible — 21
- Searchlight - Certain — 18
- Balloon - Probable — 18
- Drone - Possible — 17
- Camera Anomaly - Possible — 15
- Bird - Probable — 14
- Searchlight - Probable — 12
- Rocket - Possible — 11
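Splitting Explanation into separate cause and confidence fields, as the profile recommends, could look like this sketch. It assumes every label uses ' - ' as the separator, as all top-20 values above do; `split_explanation` and `cause_totals` are hypothetical names. Splitting on the last separator keeps multi-word causes such as 'Camera Anomaly' and slashed ones such as 'Planet/Star' intact.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def split_explanation(value: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split 'Cause - Confidence' on the last ' - '; unlabelled rows give (None, None)."""
    if not value:
        return None, None
    cause, sep, confidence = value.rpartition(" - ")
    if not sep:
        return value.strip(), None  # no confidence qualifier present
    return cause.strip(), confidence.strip()


def cause_totals(counts: Dict[str, int]) -> Counter:
    """Collapse 'Cause - Confidence' counts to per-cause totals."""
    totals: Counter = Counter()
    for label, n in counts.items():
        cause, _ = split_explanation(label)
        totals[cause] += n
    return totals


top = {"Starlink - Probable": 78, "Starlink - Certain": 69, "Starlink - Possible": 49,
       "Rocket - Certain": 67, "Rocket - Probable": 24, "Rocket - Possible": 11}
print(cause_totals(top))  # Counter({'Starlink': 196, 'Rocket': 102})
```

Within the top 20 alone, Starlink labels total 196 rows across the three confidence levels, even though no single 'Cause - Confidence' value exceeds 78.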