
quirky ufo shapes aggregated

source: /home/coolhand/html/datavis/data_trove/data/quirky/ufo_shapes_aggregated.json · 28 rows · 5 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset aggregates UFO sightings by shape, with 28 rows and 5 columns covering shape categories, sighting counts, average durations, and nested sightings and yearly-trend data. The numeric fields are highly skewed: avgDuration ranges from 30 to 37,800 with a mean of about 3,749 and skew near 3.95, while count ranges from 1 to 12,877 with a median of just 993.5. Both fields have flagged outliers worth inspecting (1 in count, 2 in avgDuration), likely a few dominant shape categories pulling the distributions upward. The shape column has 28 unique values (one row per shape), so it functions as an identifier rather than a grouping variable. Start by looking at which shapes drive the count and duration extremes.
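A minimal sketch of that first step, assuming the file is a JSON array of objects keyed by the column names in this report. The helpers `load_records` and `rank_by` are hypothetical names, and the toy rows are illustrative values, not figures from the dataset.

```python
import json
from pathlib import Path


def load_records(path):
    """Load the aggregated file, assuming it holds a JSON array of objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def rank_by(records, field, top=3):
    """Return the `top` records with the largest value of `field`."""
    return sorted(records, key=lambda r: r[field], reverse=True)[:top]


# Toy rows for illustration only; these numbers are NOT from the dataset.
toy = [
    {"shape": "light", "count": 50, "avgDuration": 120.0},
    {"shape": "triangle", "count": 20, "avgDuration": 900.0},
    {"shape": "other", "count": 5, "avgDuration": 60.0},
]

print([r["shape"] for r in rank_by(toy, "count", top=2)])
print([r["shape"] for r in rank_by(toy, "avgDuration", top=1)])
```

With the real file, `rank_by(load_records(path), "count")` and the same call on `"avgDuration"` would list the shapes behind each extreme.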

citing: row_count · column_count · columns[avgDuration].stats · columns[count].stats · columns[shape].stats · columns[shape].top_values

Schema

5 columns
Per-column summary.

Column       Type         Nulls  Unique  Alerts
shape        categorical  0.0%   28      long_tail
count        numeric      0.0%   24      high_skew
sightings    unknown      0.0%   -       skipped
yearlyTrend  unknown      0.0%   -       skipped
avgDuration  numeric      0.0%   28      high_skew outliers

shape

categorical identifier long_tail
This column enumerates UFO shape descriptors (light, triangle, circle, fireball, sphere, disk, oval, formation, etc.). Each of the 28 rows holds a distinct value, giving cardinality 28 and entropy_ratio 1.0, so the column behaves as a unique key rather than a categorical feature. Bucket terms such as 'unknown' and 'other' alongside specific shapes suggest a reference/lookup list of shape categories, not individual observations. Treatment: Treat as a lookup dimension; left-join on this key rather than using it as a model feature. high · anthropic:claude-opus-4-7
n: 28
nulls: 0 (0.0%)
unique: 28
top_value: light
top_rate: 0.03571
cardinality: 28
entropy: 4.807
entropy_ratio: 1
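The entropy figures can be checked by hand. The sketch below assumes the profiler reports Shannon entropy in bits and normalises it by log2 of the number of distinct values. That assumption is consistent with the reported 4.807 and 1, but it is not confirmed by the report. The placeholder labels stand in for the 28 shape names.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the value frequencies, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(set(values))
    if k <= 1:
        return 0.0
    return entropy_bits(values) / math.log2(k)


# 28 distinct placeholder labels, one per row, as in the shape column.
shapes = [f"shape_{i}" for i in range(28)]

print(round(entropy_bits(shapes), 3))   # 4.807
print(round(entropy_ratio(shapes), 3))  # 1.0
```

When every value appears exactly once, each has probability 1/28, which matches top_rate 0.03571, and the entropy reaches its maximum of log2(28).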

count

numeric feature high_skew
Numeric tally column with 28 rows, 24 unique values, and no nulls or zeros. Values range from 1 to 12,877, with a median of 993.5 and a mean of 2,163.93. The distribution is heavily right-skewed (skew 2.06, kurtosis 4.84), with one outlier flagged at the high end and an IQR of 3,786 against a standard deviation of 2,876.24. Treatment: Log-transform before modelling to tame the right skew. high · anthropic:claude-opus-4-7
n: 28
nulls: 0 (0.0%)
unique: 24
min: 1
max: 12,877
mean: 2,164
median: 993.5
std: 2,876
q1: 134.2
q3: 3,920
iqr: 3,786
skew: 2.06
kurtosis: 4.845
n_outliers: 1
outlier_rate: 0.03571
zero_rate: 0
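The report does not say which outlier rule the profiler uses. As a sanity check, the sketch below assumes the common Tukey rule (values beyond 1.5 x IQR from the quartiles) and applies it to the rounded quartiles listed above. `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (low, high) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, min and max for `count`, as reported by the profile.
low, high = tukey_fences(134.2, 3920)
count_min, count_max = 1, 12877

print(f"fences: low={low:.1f}, high={high:.1f}")
print("max is above the high fence:", count_max > high)
print("min is below the low fence:", count_min < low)
```

Under this assumption the high fence is about 9,599, so the maximum of 12,877 lies outside it while the minimum lies inside the low fence. That agrees with an outlier at the high end only.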

sightings

unknown other skipped
The column 'sightings' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. The only confirmed facts are 28 rows and a null rate of 0.0; cardinality and value distribution are unavailable. Treatment: Re-profile or inspect manually to determine type before any downstream use. low · anthropic:claude-opus-4-7
n: 28
nulls: 0 (0.0%)
unique: not computed (column skipped)

yearlyTrend

unknown other skipped
The column 'yearlyTrend' was skipped by the profiler, so its kind is unknown and no statistics were computed beyond a row count of 28 and a null rate of 0.0. With no uniqueness, type, or value signals available, its content and role cannot be inferred from this evidence. Treatment: Re-profile this column with parsing enabled before deciding on any downstream handling. low · anthropic:claude-opus-4-7
n: 28
nulls: 0 (0.0%)
unique: not computed (column skipped)
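For skipped columns such as 'sightings' and 'yearlyTrend', a manual inspection could start by tallying the value types per row. This is a sketch only. `describe_column` is a hypothetical helper, and the toy rows (including their inner `year` and `count` keys) are invented for illustration, since the real structure of these columns is unknown.

```python
import json
from collections import Counter


def describe_column(records, field):
    """Tally the Python types found in `field` and the size range of containers."""
    types = Counter(type(r.get(field)).__name__ for r in records)
    lengths = [
        len(r[field]) for r in records
        if isinstance(r.get(field), (list, dict))
    ]
    return {
        "types": dict(types),
        "min_len": min(lengths, default=None),
        "max_len": max(lengths, default=None),
    }


# Hypothetical rows; the real nested layout has not been profiled.
toy = json.loads(
    '[{"shape": "a", "yearlyTrend": [{"year": 2000, "count": 1}]},'
    ' {"shape": "b", "yearlyTrend": []}]'
)

print(describe_column(toy, "yearlyTrend"))
```

If every row turns out to hold a list or object, that would explain why a profiler built for scalar columns skipped the field.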

avgDuration

numeric feature high_skew outliers
Likely a per-group average duration metric (probably in seconds), with one value for each of the 28 shapes and no nulls. The distribution is heavily right-skewed (skew 3.95, kurtosis 15.42): the median is 1,906.65 but the maximum is 37,800, roughly 20x the median, and 2 outliers (7.14%) pull the mean up to 3,748.62. The standard deviation (7,305.74) is nearly twice the mean, consistent with a long right tail. Treatment: Log-transform before modelling to tame the right tail and outliers. high · anthropic:claude-opus-4-7
n: 28
nulls: 0 (0.0%)
unique: 28
min: 30
max: 37,800
mean: 3,749
median: 1,907
std: 7,306
q1: 926.6
q3: 3,130
iqr: 2,203
skew: 3.948
kurtosis: 15.42
n_outliers: 2
outlier_rate: 0.07143
zero_rate: 0
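To illustrate the suggested log-transform, the sketch below applies `math.log1p` to the reported min, median and max of avgDuration and compares how far the maximum sits from the median before and after. Choosing log1p over a plain log is an assumption made here; the report does not specify the variant.

```python
import math


def log_transform(values):
    """Apply log1p, i.e. log(1 + x), to each non-negative value."""
    return [math.log1p(v) for v in values]


# Summary statistics for avgDuration, as reported by the profile.
summary = {"min": 30, "median": 1906.65, "max": 37800}
logged = dict(zip(summary, log_transform(summary.values())))

raw_ratio = summary["max"] / summary["median"]
log_ratio = logged["max"] / logged["median"]

print(f"max/median before: {raw_ratio:.1f}")  # roughly 19.8
print(f"max/median after:  {log_ratio:.1f}")  # roughly 1.4
```

The transform pulls the maximum from about 20 times the median to about 1.4 times, which is the compression the treatment note is after.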