saturn

/home/coolhand/html/datavis/data_trove/cache/quirky/ufo_sightings_20260121.parquet 147,890 rows sample n=147,890 seed 42 2026-05-01T23:37:35+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/quirky/ufo_sightings_20260121.parquet
Total rows: 147,890
Profiled sample: 147,890
Columns: 13
Generated: 2026-05-01T23:37:35+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Sighting high anthropic:claude-opus-4-7

Sighting is almost certainly a row identifier: every one of the 147890 values is unique, there are no nulls, and the distribution is essentially flat (skew -0.013, kurtosis -1.13) spanning 111 to 179773. The values look like sparse integer IDs rather than a measured quantity, since uniqueness equals row count and no outliers are flagged.

Occurred high anthropic:claude-opus-4-7

Stored as text but functionally a timestamp: values follow a 'YYYY-MM-DD HH:MM:SS Local' pattern with len_median 25 and word_median 3, and the token 'local' accounts for 20,000 of the top-word occurrences. Of 147,890 rows, 126,264 are unique; the remaining 21,626 duplicates (14.6%) cluster on specific evening times such as 2014-07-04 22:00:00 (US Independence Day) and 2015-11-07 18:00:00. 299 rows contain only the bare token 'Local', indicating missing date/time fragments that need cleaning.
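
The cleaning step suggested above could be sketched as follows. This is a minimal illustration, not part of saturn: the function name parse_occurred is hypothetical, and it assumes the 'Local' suffix carries no usable zone information, so the result is a naive datetime.

```python
from datetime import datetime
from typing import Optional


def parse_occurred(raw: Optional[str]) -> Optional[datetime]:
    """Parse 'YYYY-MM-DD HH:MM:SS Local' into a naive datetime.

    Returns None for missing values and for fragments such as the bare
    token 'Local', so those rows can be counted and reviewed separately.
    """
    if raw is None:
        return None
    text = raw.strip()
    # Drop the trailing 'Local' label; the actual zone is unknown (assumption).
    if text.endswith("Local"):
        text = text[: -len("Local")].strip()
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Empty or partial fragments end up here.
        return None


samples = ["2012-04-10 22:30:00 Local", "Local", None]
parsed = [parse_occurred(s) for s in samples]
```

Rows that come back as None are the candidates for the roughly 299 fragment-only values mentioned above.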

Location high anthropic:claude-opus-4-7

Short 'City, State, Country' location strings, with a mean length of 20.4 characters and 3.6 words. Heavy duplication (74.9%, 110,819 of 147,890 rows) reflects expected city repetition, with Phoenix, Seattle, and Las Vegas leading; the multilingual flag is misleading since 4,481 of detected rows are English and the non-English counts likely come from place names like 'Montréal' or 'São Paulo'. Across 37,070 unique values the dataset skews strongly to USA (17,880 word occurrences) and Canada.

Shape high anthropic:claude-opus-4-7

Categorical descriptor of object shape across 147,890 rows with 39 distinct values, dominated by 'Light' at 19.4% of non-null rows (27,494), followed by 'Circle' (14,367) and 'Triangle' (13,086). An entropy ratio of 0.74 indicates a moderately spread distribution rather than one-class dominance, but the presence of both 'Other' (10,062) and 'Unknown' (10,021) in the top 5 signals substantial reporter ambiguity. The null rate is 4.3%, and several top categories ('Sphere', 'Orb', 'Circle') are likely semantic duplicates worth consolidating.
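
To see what such a consolidation would do to the ranking, here is a small sketch using counts copied from the Shape "Top values" table. The mapping and the group label 'Round' are hypothetical; whether these shapes really are duplicates is a judgment call for the analyst.

```python
from collections import Counter

# Counts copied from the Shape "Top values" table in this report.
shape_counts = Counter(
    {"Light": 27494, "Circle": 14367, "Triangle": 13086, "Sphere": 7652, "Orb": 5924}
)

# Hypothetical consolidation map; 'Round' is an illustrative label only.
CANONICAL = {"Circle": "Round", "Sphere": "Round", "Orb": "Round"}


def consolidate(counts: Counter, mapping: dict) -> Counter:
    """Merge counts of labels that map to the same canonical label."""
    merged = Counter()
    for label, n in counts.items():
        merged[mapping.get(label, label)] += n
    return merged


merged = consolidate(shape_counts, CANONICAL)
top_label, top_count = merged.most_common(1)[0]
```

Under this particular mapping the merged group totals 27,943 and edges past 'Light' (27,494), so the choice of consolidation changes which shape appears dominant.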

Duration high anthropic:claude-opus-4-7

Free-text duration field expressing time spans like '5 minutes' or '30 seconds', averaging 2.07 words and 9.5 characters. Values are highly repetitive: 88.97% are duplicates with only 15,527 uniques across 147,890 rows, and the top word 'minutes' appears 8,115 times alongside 'seconds' and 'min'. The mix of units and abbreviations (e.g., 'min' vs 'minutes') signals inconsistent formatting that needs normalization, and 4.83% are null.
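
One possible shape for that normalization is sketched below. It is an assumption-laden illustration: the names UNIT_SECONDS and duration_seconds are hypothetical, only a few unit spellings are covered, and values without a unit (such as a bare '20') are deliberately left unresolved rather than guessed.

```python
import re
from typing import Optional

# Seconds per unit, keyed by a lower-case prefix of the unit word (assumption:
# this short list covers the common spellings; extend as needed).
UNIT_SECONDS = {"sec": 1, "min": 60, "hour": 3600, "hr": 3600}

# A number followed by an optional unit word, e.g. '35 min.' or '1 hour'.
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)?", re.IGNORECASE)


def duration_seconds(raw: Optional[str]) -> Optional[float]:
    """Convert a free-text duration to seconds, or None if it is ambiguous."""
    if not raw:
        return None
    match = PATTERN.search(raw)
    if match is None or match.group(2) is None:
        return None  # no number, or a number without a unit
    value = float(match.group(1))
    unit = match.group(2).lower()
    for prefix, factor in UNIT_SECONDS.items():
        if unit.startswith(prefix):
            return value * factor
    return None  # unit word not recognised


examples = ["15 Minutes", "35 min.", "20", "90 seconds", "1 hour", "about 15 minutes"]
converted = [duration_seconds(e) for e in examples]
```

The examples are taken from the Duration sample values in this report; ranges such as '5-10 minutes' are not handled and would return None.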

No of observers high anthropic:claude-opus-4-7

Numeric count of observers per record, with a tight typical range (q1=1, median=2, q3=2) but extreme tail reaching 20000 and a nonsensical min of -10. Skew (109.3) and kurtosis (14332.3) confirm a long upper tail, and 13.0% of values flag as outliers while 6.9% are zero and 4.5% are null. The negative minimum and 20000 maximum suggest data-entry errors or sentinel values worth scrubbing.
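
The outlier figure is easier to interpret once the 1.5 IQR fences are written out. With q1=1 and q3=2 the upper fence is 3.5, so any report with four or more observers is flagged; much of the 13.0% outlier rate may therefore reflect the narrow IQR rather than bad data. The sketch below assumes the standard Tukey fences (the report's badge mentions 1.5 IQR); the classify helper and its labels are hypothetical.

```python
from typing import Optional, Tuple


def iqr_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported for 'No of observers'.
low, high = iqr_fences(1.0, 2.0)


def classify(count: Optional[float]) -> str:
    """Label a single observer count (labels are illustrative only)."""
    if count is None:
        return "missing"
    if count < 0:
        return "invalid"  # a negative head count cannot be real
    if count < low or count > high:
        return "outlier"
    return "ok"


labels = [classify(c) for c in (-10, 0, 2, 4, 20000, None)]
```

Separating "invalid" from "outlier" keeps plausible group sightings (4, 5, 10 observers) apart from values such as -10 that can only be entry errors.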

Reported high anthropic:claude-opus-4-7

This is a report-date timestamp stored as a fixed 27-character string with a trailing 'Pacific' timezone label (e.g., '1999-11-16 00:00:00 Pacific'). All 147,889 non-null values share identical length 27 and a 3-word structure, and 'pacific' appears in roughly 20,000 word occurrences, confirming a uniform date+tz format rather than free text. The 'multilingual' alert and small de/ru/zh language counts are spurious artifacts of language detection on numeric date strings, and a 7.7% duplicate rate (11,418 rows) reflects busy reporting days like 1999-11-16 (73 rows).

Posted high anthropic:claude-opus-4-7

Posted holds 626 unique date-stamps (all at 00:00:00) across 147,890 rows with a single null, so it's almost certainly the post date of each record. The distribution is highly diffuse: the entropy ratio is 0.93 and the top date, 2020-06-25, accounts for only 1.24% (1,833 rows). The top values span 1999 through 2023, suggesting a wide historical window rather than a recent dump.

Characteristics high anthropic:claude-opus-4-7

This is a categorical multi-label feature describing observed object characteristics (e.g. "Lights on object", "Aura or haze around object", "Aircraft nearby"), stored as comma-separated tags rather than free text. Despite 147,890 rows, only 1,446 unique strings exist and 98.6% are duplicates, with "Lights on object" alone accounting for 29,911 entries. Note the 28.15% null rate and that the language detector only classified 3,527 rows as English — likely because most values are short tag strings.
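
Because the values are comma-separated tags, counting individual tags is usually more informative than counting whole strings. A minimal sketch, assuming a comma never appears inside a tag (the function name split_tags is hypothetical):

```python
from collections import Counter
from typing import List, Optional


def split_tags(raw: Optional[str]) -> List[str]:
    """Split a comma-separated tag string into trimmed, non-empty tags."""
    if not raw:
        return []
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


# A few values from the Characteristics samples in this report, plus a null.
rows = [
    "Lights on object",
    "Lights on object, Emitted beams",
    "Lights on object, Emitted other objects, Emitted beams",
    "Changed Color, Aircraft nearby",
    None,
]

tag_counts = Counter(tag for row in rows for tag in split_tags(row))
```

Counting per tag also separates the 28.15% of rows with no tags at all from rows that carry several.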

Summary high anthropic:claude-opus-4-7

Free-text summary field with 144,208 unique values across 147,890 rows, so it's effectively a per-record description rather than a category. Lengths swing wildly from 1 to 10,624 characters (mean 134.9, median 76), and the word distribution is dominated by English stopwords plus 'light' (4,676), suggesting narrative prose about sightings or observations. Readability is conversational (Flesch 67.3), with negligible URLs, emoji, or boilerplate, and only a 1.9% duplicate rate.

Text high anthropic:claude-opus-4-7

Free-text field containing medium-length English prose, averaging about 950 characters and 182 words with a Flesch readability of 69.6, suggesting reasonably accessible writing. High lexical diversity (127,124 unique values across 147,890 rows, vocab 65,106) and frequent first-person tokens like 'i' and 'my' point to witness-authored sighting narratives. Worth flagging: 11.28% nulls, 3.1% exact duplicates (4,091 rows), and a max length of 64,550 characters indicating extreme outliers.

Location details high anthropic:claude-opus-4-7

Free-text 'Location details' annotations, present on only ~6.9% of rows (null_rate 0.931) with 9,713 unique values across 147,890 records. Entries are short prose (len_mean 38.7, word_mean 6.9, Flesch 69.8) dominated by English function words like 'the', 'of', 'in', suggesting free-form descriptions rather than structured location codes. A duplicate_rate of 0.048 (492 repeated strings) hints at some recurring phrases, but the field is otherwise near-unique and sparsely populated.

Explanation high anthropic:claude-opus-4-7

This is a sparse explanatory label classifying observations into likely causes (Starlink, rocket, balloon, aircraft, planet/star, camera anomaly) paired with a confidence qualifier (Certain/Probable/Possible). It is almost entirely empty at 99.46% null, leaving only ~800 labelled rows spread across 58 categories with no dominant class (top_rate 0.0971, entropy_ratio 0.825). The compound 'Cause - Confidence' structure is worth splitting into two fields for cleaner analysis.
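
The split suggested above could look like this. It is a sketch that assumes the confidence qualifier always follows the last ' - ' separator, as it does in the top values listed in this report; split_explanation is a hypothetical name.

```python
from typing import Optional, Tuple


def split_explanation(raw: Optional[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Split 'Cause - Confidence' into (cause, confidence).

    Returns None for a null value, and (text, None) when no separator is
    present, so unexpected formats stay visible instead of being dropped.
    """
    if raw is None:
        return None
    cause, sep, confidence = raw.rpartition(" - ")
    if not sep:
        return raw.strip(), None
    return cause.strip(), confidence.strip()


pairs = [
    split_explanation(v)
    for v in ("Starlink - Probable", "Planet/Star - Possible", "Camera Anomaly - Certain")
]
```

Splitting on the last separator keeps causes that contain spaces or slashes, such as 'Camera Anomaly' and 'Planet/Star', intact.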

Errors during insight pass (1)
  • dataset:__global__:anthropic:claude-opus-4-7: Json5EOF — ("Unclosed b'array' starting near 892", {'narrative': "This is a dataset of 147,890 UFO sighting reports with 13 columns covering location, shape, duration, witness counts, and free-text narratives. The Shape field is a clean categorical with 39 values dominated by 'Light' (27,494), 'Circle', and 'Triangle', making it the most informative quick view of what people report seeing. Witness counts ('No of observers') are extremely skewed — median is 2 but the max is 20,000 with negative minimums (-10) and over 18,000 outliers, so that column needs cleaning before any analysis. Locations are heavily US-centric, led by Phoenix, Seattle, and Las Vegas, and Duration clusters around round numbers like '5 minutes' and '2 minutes'. Note also that 'Explanation' is 99.5% null — only a tiny labelled subset has an official cause assigned.", 'confidence': 'high', 'evidence_keys': ['columns', 'row_count', 'kinds'], 'featured_charts': [{'column': 'Shape', 'kind': 'bar', 'caption': "Top reported UFO shapes — 'Light' alone accounts for nearly 1 in 5 sightings."}, {'column': 'No of observers', 'kind': 'histogram', 'caption': 'Heavily skewed witness counts; watch for the 20,000 max and negative values that need cleaning.'}, {'column': 'Location', 'kind': 'bar', 'caption': 'Most-reported cities — Phoenix, Seattle, and Las Vegas dominate, confirming a strong US bias.'}, {'column': 'Duration', 'kind': 'bar', 'caption': 'Reported sighting durations cluster on round numbers like 5, 2, and 10 minutes.'}, {'column': 'Text', 'kind': 'length', 'caption': 'Narrative length distribution — median 682 chars but max over 64,000, useful for filtering before NLP.'}]}, None)

Numeric correlation

Languages detected

Per-string language detection across text columns (sampled).

Sighting numeric

rows: 147,890
null: 0 (0.0%)
unique: 147,890
min: 111.000
max: 179,773
mean: 91,984
median: 91,434
std: 50,436
q1: 50,146
q3: 134,515
iqr: 84,368
skew: -0.013
kurtosis: -1.134
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

Occurred text

rows: 147,890
null: 0 (0.0%)
unique: 126,264
len_min: 5
len_max: 25
len_mean: 24.960
len_median: 25.000
len_p95: 25.000
word_mean: 2.996
word_median: 3.000
n_empty: 0
n_duplicates: 21,626
duplicate_rate: 0.146
vocab_size: 10,098
readability_flesch_mean: 90.718
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 2.02e-03
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2012-04-10 22:30:00 Local
  2. 2011-10-13 01:36:00 Local
  3. 2013-04-08 23:35:00 Local
  4. 2017-03-10 00:30:00 Local
  5. 2019-11-29 00:00:00 Local
  6. 2012-12-04 18:45:00 Local
  7. 2005-02-18 04:00:00 Local
  8. 2022-12-28 17:03:00 Local
  9. 2018-01-10 06:45:00 Local
  10. 2020-01-29 19:25:00 Local

Location text

29 languages detected in sample; 74.9% duplicate strings
rows: 147,890
null: 1 (0.0%)
unique: 37,070
len_min: 7
len_max: 92
len_mean: 20.378
len_median: 18.000
len_p95: 37.000
word_mean: 3.571
word_median: 3.000
n_empty: 0
n_duplicates: 110,819
duplicate_rate: 0.749
vocab_size: 8,750
readability_flesch_mean: 26.719
emoji_rate: 6.76e-06
url_rate: 0.000
one_word_rate: 6.76e-06
allcaps_rate: 3.98e-03
boilerplate_rate: 0.000
Sample values (first 10)
  1. Gaithersburg, MD, USA
  2. Butler, MO, USA
  3. Oklahoma City, OK, USA
  4. Edmonds, WA, USA
  5. Asheville, NC, USA
  6. Texico, IL, USA
  7. Santiago (Chile), , Chile
  8. Foxboro, MA, USA
  9. Toronto (north of) (Canada), ON, Canada
  10. Cheektowaga, NY, USA

Shape categorical

rows: 147,890
null: 6,343 (4.3%)
unique: 39
top_value: Light
top_rate: 0.194
cardinality: 39
entropy: 3.930
entropy_ratio: 0.744
Top values (rank 1–20)
  1. Light — 27,494
  2. Circle — 14,367
  3. Triangle — 13,086
  4. Other — 10,062
  5. Unknown — 10,021
  6. Fireball — 9,880
  7. Disk — 8,716
  8. Sphere — 7,652
  9. Oval — 6,369
  10. Orb — 5,924
  11. Formation — 4,864
  12. Changing — 3,987
  13. Cigar — 3,753
  14. Rectangle — 2,610
  15. Cylinder — 2,482
  16. Flash — 2,439
  17. Diamond — 2,116
  18. Chevron — 1,742
  19. Egg — 1,289
  20. Teardrop — 1,238
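
The report does not state how entropy_ratio and top_rate are computed. The figures shown are consistent with entropy measured in bits divided by log2(cardinality), and with top_rate taken over non-null rows; the sketch below checks that reading against the reported numbers and should be treated as an inference, not as saturn's documented formula.

```python
import math
from typing import Iterable


def shannon_entropy(counts: Iterable[int]) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def entropy_ratio(entropy_bits: float, cardinality: int) -> float:
    """Entropy normalised by its maximum, log2(cardinality)."""
    return entropy_bits / math.log2(cardinality)


# Reported (entropy, cardinality) pairs for the three categorical columns.
shape_ratio = round(entropy_ratio(3.930, 39), 3)
posted_ratio = round(entropy_ratio(8.644, 626), 3)
explanation_ratio = round(entropy_ratio(4.832, 58), 3)

# top_rate for Shape: count of 'Light' over non-null rows (rows minus nulls).
shape_top_rate = round(27494 / (147890 - 6343), 3)
```

A ratio of 1.0 would mean all categories are equally frequent; the Shape value of about 0.74 sits well below that because a handful of shapes carry most of the rows.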

Duration text

95th-percentile length under 20 chars; 89.0% duplicate strings
rows: 147,890
null: 7,148 (4.8%)
unique: 15,527
len_min: 1
len_max: 37
len_mean: 9.469
len_median: 9.000
len_p95: 15.000
word_mean: 2.066
word_median: 2.000
n_empty: 0
n_duplicates: 125,215
duplicate_rate: 0.890
vocab_size: 1,757
readability_flesch_mean: 84.409
emoji_rate: 7.11e-06
url_rate: 0.000
one_word_rate: 0.090
allcaps_rate: 0.030
boilerplate_rate: 0.000
Sample values (first 10)
  1. 15 Minutes
  2. 15 minutes
  3. 35 min.
  4. 15 minutes
  5. 20
  6. 90 seconds
  7. 5 minutes
  8. 1 hour
  9. 45 minutes
  10. about 15 minutes
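
The duplicate figures for the text columns in this report are consistent with n_duplicates = non-null rows minus unique values, and duplicate_rate = n_duplicates divided by non-null rows. That relationship is inferred from the numbers shown, not from saturn's documentation; the helper below is a hypothetical sketch that reproduces it for Duration.

```python
from typing import Tuple


def duplicate_stats(rows: int, nulls: int, unique: int) -> Tuple[int, float]:
    """Return (n_duplicates, duplicate_rate) over non-null rows."""
    non_null = rows - nulls
    n_duplicates = non_null - unique
    return n_duplicates, n_duplicates / non_null


# Duration: 147,890 rows, 7,148 nulls, 15,527 unique values.
duration_dupes, duration_rate = duplicate_stats(147890, 7148, 15527)

# Location details: 147,890 rows, 137,685 nulls, 9,713 unique values.
details_dupes, details_rate = duplicate_stats(147890, 137685, 9713)
```

Using non-null rows as the denominator matters most for sparse columns such as Location details, where dividing by all 147,890 rows would understate the duplicate rate considerably.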

No of observers numeric

skew=+109.30; 13.0% rows beyond 1.5 IQR
rows: 147,890
null: 6,661 (4.5%)
unique: 137
min: -10.000
max: 20,000
mean: 4.603
median: 2.000
std: 129.522
q1: 1.000
q3: 2.000
iqr: 1.000
skew: 109.296
kurtosis: 14,332
n_outliers: 18,390
outlier_rate: 0.130
zero_rate: 0.069

Reported text

8 languages detected in sample
rows: 147,890
null: 1 (0.0%)
unique: 136,471
len_min: 27
len_max: 27
len_mean: 27.000
len_median: 27.000
len_p95: 27.000
word_mean: 3.000
word_median: 3.000
n_empty: 0
n_duplicates: 11,418
duplicate_rate: 0.077
vocab_size: 24,077
readability_flesch_mean: 62.790
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2012-04-11 08:38:00 Pacific
  2. 2009-06-13 05:38:15 Pacific
  3. 2014-01-29 22:28:38 Pacific
  4. 2022-06-02 08:34:35 Pacific
  5. 2021-01-02 02:46:40 Pacific
  6. 2012-12-04 19:49:39 Pacific
  7. 2002-07-12 10:31:36 Pacific
  8. 2021-12-30 15:36:37 Pacific
  9. 2007-04-01 14:50:54 Pacific
  10. 2020-01-29 18:08:21 Pacific

Posted categorical

rows: 147,890
null: 1 (0.0%)
unique: 626
top_value: 2020-06-25 00:00:00
top_rate: 0.012
cardinality: 626
entropy: 8.644
entropy_ratio: 0.930
Top values (rank 1–20)
  1. 2020-06-25 00:00:00 — 1,833
  2. 2009-12-12 00:00:00 — 1,627
  3. 2006-10-30 00:00:00 — 1,573
  4. 2019-12-01 00:00:00 — 1,484
  5. 2010-11-21 00:00:00 — 1,365
  6. 2022-09-09 00:00:00 — 1,333
  7. 1999-11-02 00:00:00 — 1,314
  8. 2020-12-23 00:00:00 — 1,312
  9. 2023-03-06 00:00:00 — 1,274
  10. 2008-10-31 00:00:00 — 1,274
  11. 2022-12-22 00:00:00 — 1,252
  12. 2001-08-05 00:00:00 — 1,229
  13. 2009-03-19 00:00:00 — 1,201
  14. 2009-01-10 00:00:00 — 1,198
  15. 2013-08-30 00:00:00 — 1,142
  16. 2022-03-04 00:00:00 — 1,035
  17. 2023-09-10 00:00:00 — 1,028
  18. 2008-06-12 00:00:00 — 1,023
  19. 2012-09-24 00:00:00 — 1,019
  20. 2011-10-10 00:00:00 — 1,017

Characteristics text

28.1% null; 98.6% duplicate strings
rows: 147,890
null: 41,631 (28.1%)
unique: 1,446
len_min: 6
len_max: 251
len_mean: 33.271
len_median: 26.000
len_p95: 76.000
word_mean: 5.613
word_median: 5.000
n_empty: 0
n_duplicates: 104,813
duplicate_rate: 0.986
vocab_size: 43
readability_flesch_mean: 69.461
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 3.84e-03
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. Lights on object
  2. Lights on object
  3. Lights on object, Emitted beams
  4. Lights on object
  5. Lights on object, Emitted other objects, Emitted beams
  6. Lights on object
  7. Lights on object, Animals reacted
  8. Lights on object
  9. Changed Color, Aircraft nearby
  10. Aura or haze around object

Summary text

98.1% of rows are unique strings
rows: 147,890
null: 891 (0.6%)
unique: 144,208
len_min: 1
len_max: 10,624
len_mean: 134.901
len_median: 76.000
len_p95: 479.000
word_mean: 25.061
word_median: 13.000
n_empty: 0
n_duplicates: 2,791
duplicate_rate: 0.019
vocab_size: 28,632
readability_flesch_mean: 67.281
emoji_rate: 1.09e-04
url_rate: 7.76e-04
one_word_rate: 2.33e-03
allcaps_rate: 0.018
boilerplate_rate: 8.71e-04
Sample values (first 10)
  1. Large round craft in Madison Heights near 11&I-75 and similar craft over 1-75 in hazel park/ferndale area.
  2. light in sky ,noise in room, temporary paralysis, keep waking back in bed. help me please?
  3. Slow moving triangle over Little Rock, AR.
  4. 8-10 round yellow lights grouped in 2 lines of 4 with one or 2 a short distance away
  5. Brilliant glowing red object in northern Illinois moving from the northwest to the south.
  6. Sphere shaped object with colors moving at a abnormal rate of speed.
  7. White or silver sphere moving fast and erratically across horizon and against the wind
  8. There were orange colored lights coming from the northwest traveling in a line at a fast speed. The most at one time were 3. One appeared to stop overhead until the one behind came to the same point and the first one would dim and disapear. They appeared to be coming over the oce…
  9. Curly que white con trail with sideways V grey trail in front. Very high and fast silver craft.
  10. Hovering craft over houses with several bright white and red flashing lights,picked up speed and disappeared into a field

Text text

96.9% of rows are unique strings
rows: 147,890
null: 16,675 (11.3%)
unique: 127,124
len_min: 1
len_max: 64,550
len_mean: 949.922
len_median: 682.000
len_p95: 2,673
word_mean: 181.578
word_median: 131.000
n_empty: 0
n_duplicates: 4,091
duplicate_rate: 0.031
vocab_size: 65,106
readability_flesch_mean: 69.561
emoji_rate: 1.98e-04
url_rate: 0.018
one_word_rate: 6.78e-04
allcaps_rate: 7.22e-03
boilerplate_rate: 1.87e-03
Sample values (first 10)
  1. Large spherical object moving westward changing colors and patterns of lights on it. Video is zoomed 16x on my Nikon camera. We spoke briefly via telephone with the witness, and he confirms that the date is correct, but he corrects the time, to reflect a sighting that occurred 5…
  2. I was a passenger and my freind was driving approx. 45 mph when the saucer followed us on passenger side It looked in comarasion of close encounters craft.It was about 500 meters from me.When it left it passed trough a door of deminsion.There was not a doubt.What happened after w…
  3. While waiting to go on duty, I noticed what appeared to be a flare coming from the north in the vicinity of Cape Canaveral. At first I thought it may have been a rocket launch then I realized it was heading in a south westerly direction when all rocket launches always go east tow…
  4. I was turning off of 30th Ave N on to 34th st going north bound and I just looked up at the sky and seen this large silver egg shaped object in the sky moving slowly and then it vary rapidly just changed directions, not like it turned but like it just started moving sideways and …
  5. At exactly 9 pm (21:00 hrs) I was walking my friend out to his truck. It was a clear sky and the stars were very bright. I noticed a somewhat dim, amber light come into view. I have seen a hundred satellites in our desert sky. This was not a satellite. I am an amature astronomer …
  6. There are four beams of light in the clouds going in a circular motion going back to the center, very very big and this is out of the usual. Advertising lights. PD
  7. 8chevron lights flying @approx.2000ft or lower i live in a ravine area as the craft flew over the resonance was high pitched the air temp.was+3C. the visibility was 20000ft.the object was flying due west &would veer 15degrees of west &climb then return to270 &veer15degrees east o…
  8. I witnessed a craft that was not an airplane or a helicopter. It was a boomerang shaped object that was covered in various colored lights, and had many pinpoint lights...like what planes have on their tails.....but there were lots of them. It sparkled like a diamond in the light.…
  9. Witness elects to remain totally anonymous; provides no contact information. PD
  10. At first we noticed lights on the horizon darting up and down, they disapeared and were replaced with a red light that moved slowly down out of the sky towards the house we were standing outside. The light then hovered and I noticed a brighter white light in the middle of the re…

Location details text

95.2% of rows are unique strings; 93.1% null
rows: 147,890
null: 137,685 (93.1%)
unique: 9,713
len_min: 1
len_max: 197
len_mean: 38.747
len_median: 33.000
len_p95: 92.000
word_mean: 6.930
word_median: 6.000
n_empty: 0
n_duplicates: 492
duplicate_rate: 0.048
vocab_size: 12,384
readability_flesch_mean: 69.850
emoji_rate: 0.000
url_rate: 1.08e-03
one_word_rate: 0.038
allcaps_rate: 0.020
boilerplate_rate: 1.96e-04
Sample values (first 10)
  1. Behind my house on top of a mountain
  2. Huntington Elementary School
  3. 520 Greene Rd Dobson NC.
  4. North of my position
  5. Standing outside Guanabanas restaurant in Jupiter, FL
  6. Above the 210 Freeway nears Fairoaks
  7. TRIANGLE FLOATING LIGHTS
  8. Object was very high
  9. Thumb Broadcasting Studio
  10. over the water near the North Topsail Inlet

Explanation categorical

99.5% null
rows: 147,890
null: 147,087 (99.5%)
unique: 58
top_value: Starlink - Probable
top_rate: 0.097
cardinality: 58
entropy: 4.832
entropy_ratio: 0.825
Top values (rank 1–20)
  1. Starlink - Probable — 78
  2. Starlink - Certain — 69
  3. Rocket - Certain — 67
  4. Balloon - Possible — 50
  5. Starlink - Possible — 49
  6. Planet/Star - Possible — 42
  7. Planet/Star - Probable — 41
  8. Aircraft - Possible — 35
  9. Aircraft - Probable — 35
  10. Camera Anomaly - Probable — 33
  11. Camera Anomaly - Certain — 25
  12. Rocket - Probable — 24
  13. Bird - Possible — 21
  14. Searchlight - Certain — 18
  15. Balloon - Probable — 18
  16. Drone - Possible — 17
  17. Camera Anomaly - Possible — 15
  18. Bird - Probable — 14
  19. Searchlight - Probable — 12
  20. Rocket - Possible — 11