data trove ufo sightings analysis

saturn notebook · generated 2026-06-22

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/ufo_by_state.json

Saturn profiled 58 rows across 2 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

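# Install the profiler (IPython shell escape; run this inside a notebook cell).
!pip install saturn-dissect

import subprocess

# Run the saturn CLI on the source file; check=True raises if it exits non-zero.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/ufo_by_state.json",
    "--findings", "data-trove-ufo-sightings-analysis.json",
    "--llm", "anthropic:default",
], check=True)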

Summary confidence: high

This dataset contains UFO sighting counts aggregated by U.S. state code: 58 rows, one per code, with no missing values. The count distribution is heavily right-skewed (skew ≈ 2.93) with high kurtosis (11.75), and 4 outlier states far exceed the norm. The maximum of 16,197 sightings is more than ten times the median of 1,510, so a handful of states dominate UFO reports. Because the state column has exactly one entry per code, the interesting story is entirely in how unevenly sightings are distributed across states. Look closely at the top states to see which ones account for the bulk of reported sightings.

citing: row_count · column_count · stats.max · stats.median · stats.mean · stats.skew · stats.kurtosis · n_outliers · outlier_rate · top_value

Out[4]:

saturn.schema() · 2 columns

column	kind	n	null%	unique	alerts
state	categorical	58	0.0%	58	long_tail
count	numeric	58	0.0%	55	high_skew outliers

Fig 1.

count · Look for the extreme right tail — a few states have sighting counts many times higher than the median of 1,510.

Histogram bins for count (median: 1510.5).
bin	count
1 – 2315	38
2315 – 4628	13
4628 – 6942	4
6942 – 9256	2
9256 – 11570	0
11570 – 13880	0
13880 – 16200	1
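
The bin edges above are consistent with seven equal-width bins spanning the column's minimum (1) and maximum (16,197). The sketch below reproduces them under that assumption. The report does not state Saturn's binning rule, and `equal_width_edges` is a hypothetical helper name, not part of saturn-dissect.

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 edges that split [lo, hi] into equal-width bins."""
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = hi - lo
    return [lo + i * width / n_bins for i in range(n_bins + 1)]


# min and max come from the count column's stats; 7 bins is read off the table.
edges = equal_width_edges(1, 16197, 7)
rounded_edges = [round(e) for e in edges]
print(rounded_edges)
```

Each bin is roughly 2,314 sightings wide, and the two empty bins in the table are what separate the largest value from the rest of the distribution.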

Fig 2.

state · Compare sighting counts across all 58 state codes to identify the handful that dominate UFO reports.

Top values for state (20 unique shown, of 58 total).
value	count	share
CA	1	1.7%
FL	1	1.7%
WA	1	1.7%
TX	1	1.7%
NY	1	1.7%
PA	1	1.7%
AZ	1	1.7%
OH	1	1.7%
IL	1	1.7%
NC	1	1.7%
MI	1	1.7%
OR	1	1.7%
CO	1	1.7%
NJ	1	1.7%
MO	1	1.7%
GA	1	1.7%
IN	1	1.7%
MA	1	1.7%
VA	1	1.7%
WI	1	1.7%

Fig 3.

count · Ranking states by count makes the outlier states immediately visible against the rest of the distribution.

Histogram bins for count (median: 1510.5).
bin	count
1 – 2315	38
2315 – 4628	13
4628 – 6942	4
6942 – 9256	2
9256 – 11570	0
11570 – 13880	0
13880 – 16200	1

Fig 4.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
state	categorical	0.0%
count	numeric	0.0%

state categorical identifier

This column contains US state abbreviations, with exactly 58 unique values across 58 rows: every row has a distinct code, so the dataset holds one record per code. Since 58 exceeds the 50 states, the 8 additional codes presumably cover DC, territories, or other non-state entries. An entropy ratio of 1.0 and a top_rate of 0.0172 (1/58) confirm a perfectly uniform distribution with zero repetition, making this effectively a lookup key rather than a grouping variable. The long_tail alert is technically correct but misleading: there is no tail, only 58 singleton categories.

Treatment: Use as a join key or index; do not one-hot encode or use as a categorical feature without aggregating additional rows per state first.

anthropic:default · confidence high
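
The entropy figures for this column can be checked by hand. The sketch below assumes Saturn reports base-2 Shannon entropy and defines entropy_ratio as entropy divided by log2 of the cardinality. That assumption is consistent with the reported 5.858 and 1, but it is not confirmed by the report. `shannon_entropy_bits` is a hypothetical helper name.

```python
import math


def shannon_entropy_bits(counts: list[int]) -> float:
    """Shannon entropy (base 2) of a categorical column, given per-category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# 58 categories, each appearing exactly once, as in the state column.
counts = [1] * 58
entropy = shannon_entropy_bits(counts)
max_entropy = math.log2(len(counts))  # entropy of a perfectly uniform column
entropy_ratio = entropy / max_entropy
top_rate = max(counts) / sum(counts)

print(round(entropy, 3), round(entropy_ratio, 3), round(top_rate, 5))
```

With every category a singleton, the entropy equals its maximum, which is why the ratio is 1 and the top_rate is simply 1/58.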

Out[10]:

saturn.columns["state"].stats

stat	value
n	58
nulls	0 (0.0%)
unique	58
top_value	CA
top_rate	0.01724
cardinality	58
entropy	5.858
entropy_ratio	1
alert: long_tail	58 singleton categories

Fig 5.

Top values for state.

Top values for state (20 unique shown, of 58 total).
value	count	share
CA	1	1.7%
FL	1	1.7%
WA	1	1.7%
TX	1	1.7%
NY	1	1.7%
PA	1	1.7%
AZ	1	1.7%
OH	1	1.7%
IL	1	1.7%
NC	1	1.7%
MI	1	1.7%
OR	1	1.7%
CO	1	1.7%
NJ	1	1.7%
MO	1	1.7%
GA	1	1.7%
IN	1	1.7%
MA	1	1.7%
VA	1	1.7%
WI	1	1.7%

count numeric feature

This column holds the number of reported UFO sightings for each of the 58 state codes. The distribution is severely right-skewed (skew = 2.93, kurtosis = 11.75) with a min of 1 and a max of 16,197 against a median of only 1,510.5, indicating a handful of dominant observations pulling the mean (2,274.7) well above the median. Four outliers (≈6.9% of rows) drive the extreme tail, and the standard deviation (2,642.8) exceeds the mean, confirming high dispersion.

Treatment: Log-transform before modelling to reduce skew; investigate the 4 outliers (max 16,197) for data-quality issues before including them.

anthropic:default · confidence medium
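
To illustrate the suggested log-transform, the sketch below compares skewness before and after taking logs. The sample values are hypothetical and chosen only to be right-skewed; they are not the real rows. The `skewness` helper uses the population (biased) estimator, which may differ from the one Saturn reports.

```python
import math


def skewness(values: list[float]) -> float:
    """Population (biased) skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed counts for illustration only; NOT the real data.
sample = [40, 300, 650, 900, 1500, 1520, 2800, 4600, 9000, 16197]

# The real column has min 1 and zero_rate 0, so a plain log is defined for every row.
logged = [math.log(v) for v in sample]

raw_skew = skewness(sample)
log_skew = skewness(logged)
print(f"raw skew: {raw_skew:.2f}, log skew: {log_skew:.2f}")
```

On this sample the transformed skew is much closer to zero than the raw skew. A log can overshoot into mild left skew when very small counts are present, so it is worth re-checking the statistic after transforming.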

Out[13]:

saturn.columns["count"].stats

stat	value
n	58
nulls	0 (0.0%)
unique	55
min	1
max	16,197
mean	2,275
median	1,510
std	2,643
q1	648.8
q3	2,789
iqr	2,140
skew	2.925
kurtosis	11.75
n_outliers	4
outlier_rate	0.06897
zero_rate	0
alert: high_skew	skew=+2.93
alert: outliers	6.9% rows beyond 1.5 IQR
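
The outliers alert refers to rows "beyond 1.5 IQR". The sketch below assumes this means the standard Tukey fences (q1 − 1.5·IQR and q3 + 1.5·IQR) and uses the rounded quartiles printed above, so the fences are approximate. `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as printed in the stats table (rounded values).
lower, upper = tukey_fences(q1=648.8, q3=2789)
print(f"lower fence: {lower:.1f}, upper fence: {upper:.1f}")

# 4 flagged rows out of 58 gives the reported outlier_rate.
outlier_rate = 4 / 58
print(round(outlier_rate, 5))
```

Under this rule the lower fence is negative, so no count can fall below it, and all four flagged rows would lie above roughly 6,000 sightings.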

Fig 6.

Distribution of count. Vertical dash marks the median.

Histogram bins for count (median: 1510.5).
bin	count
1 – 2315	38
2315 – 4628	13
4628 – 6942	4
6942 – 9256	2
9256 – 11570	0
11570 – 13880	0
13880 – 16200	1

How to cite

BibTeX

@misc{saturn-data-trove-ufo-sightings-analysis-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove ufo sightings analysis},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-ufo-sightings-analysis}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}

APA

Steuber, L. (2026). Saturn reading: data trove ufo sightings analysis. Source: /home/coolhand/html/datavis/data_trove/data/quirky/ufo_by_state.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-ufo-sightings-analysis