quirky ufo shapes aggregated
Reading
This dataset aggregates UFO sightings by shape: 28 rows and 5 columns covering the shape category, sighting count, average duration, and two nested columns (sightings and yearlyTrend). Both numeric fields are strongly right-skewed. avgDuration ranges from 30 to 37,800 with a mean of about 3,749, a median of about 1,907, and skew near 3.95; count ranges from 1 to 12,877 with a mean of about 2,164, a median of 993.5, and skew near 2.06. Each field has flagged outliers worth inspecting (2 in avgDuration, 1 in count), likely a few dominant shape categories stretching the upper tail. The shape column has 28 unique values (one row per shape), so it functions as an identifier rather than a grouping variable. Start by looking at which shapes drive the count and duration extremes.
citing: row_count · column_count · columns[avgDuration].stats · columns[count].stats · columns[shape].stats · columns[shape].top_values
Charts the summary said to look at first
Histogram of count
| count range | rows |
|---|---|
| 1 – 2576 | 19 |
| 2576 – 5151 | 6 |
| 5151 – 7727 | 2 |
| 7727 – 1.03e+04 | 0 |
| 1.03e+04 – 1.288e+04 | 1 |
Histogram of avgDuration
| avgDuration range | rows |
|---|---|
| 30 – 7584 | 26 |
| 7584 – 1.514e+04 | 0 |
| 1.514e+04 – 2.269e+04 | 1 |
| 2.269e+04 – 3.025e+04 | 0 |
| 3.025e+04 – 3.78e+04 | 1 |
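The bin boundaries in the two histograms are consistent with five equal-width bins spanning each column's minimum and maximum. The sketch below reproduces them under that assumption; the helper name `equal_width_edges` is made up for illustration, and the profiler's actual binning rule is not stated in the report.

```python
def equal_width_edges(lo, hi, bins=5):
    """Return bins + 1 equally spaced edges from lo to hi (assumed binning rule)."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]

# min/max values taken from the column statistics in this report
count_edges = equal_width_edges(1, 12877)
duration_edges = equal_width_edges(30, 37800)

print([round(e) for e in count_edges])
print([round(e) for e in duration_edges])
```

With equal-width bins on data this skewed, most rows land in the first bin, which is why 19 of 28 counts and 26 of 28 durations share a single bar.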
Top values of shape (20 of 28 shown)
| value | count | share |
|---|---|---|
| light | 1 | 3.6% |
| triangle | 1 | 3.6% |
| circle | 1 | 3.6% |
| fireball | 1 | 3.6% |
| unknown | 1 | 3.6% |
| other | 1 | 3.6% |
| sphere | 1 | 3.6% |
| disk | 1 | 3.6% |
| oval | 1 | 3.6% |
| formation | 1 | 3.6% |
| cigar | 1 | 3.6% |
| changing | 1 | 3.6% |
| flash | 1 | 3.6% |
| rectangle | 1 | 3.6% |
| cylinder | 1 | 3.6% |
| diamond | 1 | 3.6% |
| chevron | 1 | 3.6% |
| teardrop | 1 | 3.6% |
| egg | 1 | 3.6% |
| cone | 1 | 3.6% |
Schema
5 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| shape | categorical | 0.0% | 28 | long_tail |
| count | numeric | 0.0% | 24 | high_skew |
| sightings | unknown | 0.0% | — | skipped |
| yearlyTrend | unknown | 0.0% | — | skipped |
| avgDuration | numeric | 0.0% | 28 | high_skew, outliers |
shape
categorical · identifier · long_tail
This column enumerates UFO shape descriptors (light, triangle, circle, fireball, sphere, disk, oval, formation, etc.). Every one of the 28 rows holds a distinct value, giving cardinality 28 and entropy_ratio 1.0, so the column behaves as a unique key rather than a categorical feature. The presence of bucket terms like 'unknown' and 'other' alongside specific shapes suggests this is a reference/lookup list of shape categories, not observations. Treatment: Treat as a lookup dimension; left-join on this key rather than using it as a model feature.
- n: 28
- nulls: 0 (0.0%)
- unique: 28
- top_value: light
- top_rate: 0.03571
- cardinality: 28
- entropy: 4.807
- entropy_ratio: 1
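The entropy figures for shape can be checked by hand. The sketch below assumes the profiler reports Shannon entropy in bits and defines entropy_ratio as entropy divided by log2 of the cardinality; neither definition is stated in the report, but both match the numbers shown. The labels are placeholders, since only the fact that all 28 values are distinct matters here.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Placeholder labels: 28 distinct values, one per row
shapes = [f"shape_{i}" for i in range(28)]

entropy = shannon_entropy(shapes)
entropy_ratio = entropy / math.log2(len(set(shapes)))  # assumed definition

print(round(entropy, 3), round(entropy_ratio, 3))  # 4.807 1.0
```

A ratio of 1 is the maximum possible and means no value repeats, which is what makes the column an identifier.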
count
numeric · feature · high_skew
Numeric tally column with 28 rows, 24 unique values, no nulls or zeros, ranging from 1 to 12877 with a median of 993.5 and mean of 2163.93. The distribution is heavily right-skewed (skew 2.06, kurtosis 4.84) with one outlier flagged at the high end and an IQR of 3786 against a std of 2876.24. Treatment: Log-transform before modelling to tame the right skew.
- n: 28
- nulls: 0 (0.0%)
- unique: 24
- min: 1
- max: 12,877
- mean: 2164
- median: 993.5
- std: 2876
- q1: 134.2
- q3: 3920
- iqr: 3,786
- skew: 2.06
- kurtosis: 4.845
- n_outliers: 1
- outlier_rate: 0.03571
- zero_rate: 0
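As a rough illustration of the suggested log transform, the sketch below applies log10 to the summary points reported above rather than to the raw rows, which are not reproduced in this report. The choice of log10 is an assumption (any base works, and a plain log is safe here because zero_rate is 0), and `tail_ratio` is a made-up helper that compares the stretch above the median with the stretch below it.

```python
import math

# Summary points for count, copied from the statistics above
count_summary = {"min": 1, "q1": 134.2, "median": 993.5, "q3": 3920, "max": 12877}

def tail_ratio(points):
    """(max - median) / (median - min): values far above 1 indicate a long upper tail."""
    return (points["max"] - points["median"]) / (points["median"] - points["min"])

logged = {name: math.log10(value) for name, value in count_summary.items()}

raw_ratio = tail_ratio(count_summary)
log_ratio = tail_ratio(logged)

print(f"raw tail ratio: {raw_ratio:.2f}")
print(f"log tail ratio: {log_ratio:.2f}")
```

On the raw scale the distance from median to max is roughly twelve times the distance from min to median; after the transform the upper stretch is the shorter one, so the largest category no longer dominates the scale.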
sightings
unknown · other · skipped
The column 'sightings' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. The only confirmed facts are 28 rows and a null rate of 0.0; cardinality and value distribution are unavailable. Treatment: Re-profile or inspect manually to determine type before any downstream use.
- n: 28
- nulls: 0 (0.0%)
- unique: —
yearlyTrend
unknown · other · skipped
The column 'yearlyTrend' was skipped by the profiler, so its kind is unknown and no statistics were computed beyond a row count of 28 and a null rate of 0.0. With no uniqueness, type, or value signals available, its content and role cannot be inferred from this evidence. Treatment: Re-profile this column with parsing enabled before deciding on any downstream handling.
- n: 28
- nulls: 0 (0.0%)
- unique: —
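Because the profiler skipped both sightings and yearlyTrend, a first manual inspection could simply tally the Python types and lengths found in each column. The sketch below is one way to do that; the rows are entirely hypothetical (the real structure of these columns is not known from this report), and `describe_column` is an illustrative helper, not part of any profiling tool.

```python
from collections import Counter

def describe_column(rows, name):
    """Tally value types and, for sized values, the min and max length in one column."""
    types = Counter(type(row.get(name)).__name__ for row in rows)
    lengths = [len(row[name]) for row in rows
               if isinstance(row.get(name), (list, dict, str))]
    return {
        "types": dict(types),
        "min_len": min(lengths, default=None),
        "max_len": max(lengths, default=None),
    }

# Hypothetical rows for illustration only; field names inside the nested values are invented
rows = [
    {"shape": "light", "sightings": [{"id": 1}, {"id": 2}],
     "yearlyTrend": [{"year": 2000, "n": 2}]},
    {"shape": "cone", "sightings": [{"id": 3}], "yearlyTrend": []},
]

sightings_info = describe_column(rows, "sightings")
trend_info = describe_column(rows, "yearlyTrend")
print(sightings_info)
print(trend_info)
```

If every value turns out to be a list, the length per row is a natural first derived feature; for sightings it could also be compared against the count column as a consistency check.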
avgDuration
numeric · feature · high_skew · outliers
Likely a per-group average duration metric (probably seconds) summarised across 28 unique entities with no nulls. The distribution is heavily right-skewed (skew 3.95, kurtosis 15.42) with a median of 1906.65 but a max of 37800, roughly 20x the median, and 2 outliers (7.14%) pulling the mean up to 3748.62. Standard deviation (7305.74) exceeds the mean, confirming a long tail. Treatment: Log-transform before modelling to tame the right tail and outliers.
- n: 28
- nulls: 0 (0.0%)
- unique: 28
- min: 30
- max: 37,800
- mean: 3749
- median: 1907
- std: 7306
- q1: 926.6
- q3: 3130
- iqr: 2203
- skew: 3.948
- kurtosis: 15.42
- n_outliers: 2
- outlier_rate: 0.07143
- zero_rate: 0
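The report does not say how outliers are flagged. One common convention is Tukey's rule, which marks values above q3 + 1.5 × IQR; the sketch below applies it to the reported quartiles as an assumption, not as the profiler's documented method. The function name `tukey_upper_fence` is illustrative.

```python
def tukey_upper_fence(q1, q3, k=1.5):
    """Upper outlier threshold under Tukey's rule: q3 + k * (q3 - q1)."""
    return q3 + k * (q3 - q1)

# Quartiles copied from the column statistics in this report
avg_duration_fence = tukey_upper_fence(926.6, 3130)
count_fence = tukey_upper_fence(134.2, 3920)

print(f"avgDuration fence: {avg_duration_fence:.1f} (max 37800)")
print(f"count fence: {count_fence:.1f} (max 12877)")
```

Under this rule the fences come out near 6,435 for avgDuration and 9,599 for count. Both column maxima sit well above their fence, which is compatible with the reported 2 and 1 flagged outliers.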