data trove ufo sightings analysis
Reading
This dataset contains UFO sighting counts aggregated by U.S. state: 58 rows, one per state code, with no missing values. The count distribution is heavily right-skewed (skew ≈ 2.93) with high kurtosis (11.75) and 4 outlier states that far exceed the norm. The maximum of 16,197 sightings is more than ten times the median of about 1,510, so a handful of states dominate UFO reports. Because the state column has exactly one entry per state, the interesting story is entirely in how unevenly sightings are distributed across states. Look closely at the top states to see which ones account for the bulk of reported sightings.
citing: row_count · column_count · stats.max · stats.median · stats.mean · stats.skew · stats.kurtosis · n_outliers · outlier_rate · top_value
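The imbalance described in the summary can be put in rough numbers using only the statistics the report quotes. The sketch below is illustrative: it recovers an approximate total from mean × row count (approximate because the displayed mean is rounded), and the variable names are placeholders rather than fields of any particular tool.

```python
# Summary statistics quoted in the report (rounded as displayed).
n_rows = 58
mean_count = 2274.7
median_count = 1510.5
max_count = 16197

# Total sightings recovered from mean * n; approximate because the mean is rounded.
total_estimate = mean_count * n_rows

# Share of all sightings held by the single largest state.
top_share = max_count / total_estimate

# How many "typical" (median) states the largest state is worth.
max_to_median = max_count / median_count

print(f"estimated total sightings: {total_estimate:,.0f}")
print(f"largest state's share:     {top_share:.1%}")
print(f"max / median:              {max_to_median:.1f}x")
```

Under these assumptions the largest state alone accounts for roughly one eighth of all reported sightings.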
Charts the summary said to look at first
| bin | count |
|---|---|
| 1 – 2315 | 38 |
| 2315 – 4628 | 13 |
| 4628 – 6942 | 4 |
| 6942 – 9256 | 2 |
| 9256 – 11570 | 0 |
| 11570 – 13883 | 0 |
| 13883 – 16197 | 1 |
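The bin edges in this table appear to be seven equal-width intervals spanning the column's min (1) and max (16,197). That is an inference from the numbers, not something the report states, and the sketch below simply checks that the inference reproduces the listed edges and that the bin counts add up to the 58 rows.

```python
# Values taken from the count column's summary: min = 1, max = 16,197.
count_min, count_max = 1, 16197
n_bins = 7  # the histogram table has seven rows

# Equal-width bin edges (assumed binning rule).
width = (count_max - count_min) / n_bins
edges = [count_min + i * width for i in range(n_bins + 1)]

# Per-bin counts as listed in the table.
bin_counts = [38, 13, 4, 2, 0, 0, 1]

for lo, hi, c in zip(edges[:-1], edges[1:], bin_counts):
    print(f"{lo:>8.0f} - {hi:>8.0f}: {c}")

print("rows covered:", sum(bin_counts))
```

The first bin alone holds 38 of the 58 rows, which is the right skew the summary describes.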
| value | count | share |
|---|---|---|
| CA | 1 | 1.7% |
| FL | 1 | 1.7% |
| WA | 1 | 1.7% |
| TX | 1 | 1.7% |
| NY | 1 | 1.7% |
| PA | 1 | 1.7% |
| AZ | 1 | 1.7% |
| OH | 1 | 1.7% |
| IL | 1 | 1.7% |
| NC | 1 | 1.7% |
| MI | 1 | 1.7% |
| OR | 1 | 1.7% |
| CO | 1 | 1.7% |
| NJ | 1 | 1.7% |
| MO | 1 | 1.7% |
| GA | 1 | 1.7% |
| IN | 1 | 1.7% |
| MA | 1 | 1.7% |
| VA | 1 | 1.7% |
| WI | 1 | 1.7% |
Schema
2 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| state | categorical | 0.0% | 58 | long_tail |
| count | numeric | 0.0% | 55 | high_skew, outliers |
state
categorical · identifier · long_tail
This column contains U.S. state abbreviations, with exactly 58 unique values across 58 rows: every row has a distinct code, so the dataset holds one record per state code. That is 8 codes beyond the standard 50 states, potentially DC, territories, or other non-state codes. An entropy ratio of 1.0 and a top_rate of 0.0172 (1/58) confirm a perfectly uniform distribution with no repetition, which makes this column a lookup key rather than a grouping variable. The long_tail alert is technically correct but misleading: there is no tail, only full cardinality. Treatment: use as a join key or index; do not one-hot encode it or use it as a categorical feature unless the data is first expanded to multiple rows per state.
- n: 58
- nulls: 0 (0.0%)
- unique: 58
- top_value: CA
- top_rate: 0.01724
- cardinality: 58
- entropy: 5.858
- entropy_ratio: 1
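The entropy figures for the state column follow directly from the fact that each of the 58 codes appears once. The sketch below assumes the profiler reports Shannon entropy in bits (base 2); the report does not say so, but log2(58) ≈ 5.858 matches the listed value.

```python
import math

n_rows = 58
# One row per state code, so every value has probability 1/58.
probabilities = [1 / n_rows] * n_rows

# Shannon entropy in bits: H = -sum(p * log2(p)).
entropy = -sum(p * math.log2(p) for p in probabilities)

# Maximum possible entropy for 58 distinct values is log2(58).
max_entropy = math.log2(n_rows)
entropy_ratio = entropy / max_entropy
top_rate = max(probabilities)

print(f"entropy       = {entropy:.3f} bits")
print(f"entropy_ratio = {entropy_ratio:.3f}")
print(f"top_rate      = {top_rate:.5f}")
```

A ratio of exactly 1 means no value is more common than any other, which is why top_value (CA) carries no information here.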
count
numeric · feature · high_skew · outliers
This column holds the number of reported UFO sightings for each of the 58 state codes. The distribution is severely right-skewed (skew = 2.93, kurtosis = 11.75), with a min of 1 and a max of 16,197 against a median of only 1,510.5, so a handful of dominant states pull the mean (2,274.7) well above the median. Four outliers (≈6.9% of rows) drive the extreme tail, and the standard deviation (2,642.8) exceeds the mean, confirming high dispersion. Treatment: log-transform before modelling to reduce skew; check the 4 outliers (max 16,197) for data-quality issues before including them.
- n: 58
- nulls: 0 (0.0%)
- unique: 55
- min: 1
- max: 16,197
- mean: 2,275
- median: 1,510
- std: 2,643
- q1: 648.8
- q3: 2,789
- iqr: 2,140
- skew: 2.925
- kurtosis: 11.75
- n_outliers: 4
- outlier_rate: 0.06897
- zero_rate: 0
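The report does not state which rule produced the outlier count. A common choice is Tukey's 1.5 × IQR fences, and the sketch below applies that rule to the listed quartiles as an assumption. It also shows how a log1p transform, one way to carry out the suggested log-transform, compresses the gap between the min, median, and max.

```python
import math

# Quartiles as reported for the count column.
q1, q3 = 648.8, 2789.0
iqr = q3 - q1

# Tukey's rule (assumed): values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

n_rows, n_outliers = 58, 4
outlier_rate = n_outliers / n_rows

print(f"iqr = {iqr:.1f}, fences = [{lower_fence:.1f}, {upper_fence:.1f}]")
print(f"outlier_rate = {outlier_rate:.5f}")

# log1p compresses the right tail; compare min, median, and max before and after.
for label, value in [("min", 1), ("median", 1510.5), ("max", 16197)]:
    print(f"{label:>6}: raw = {value:>8,.1f}   log1p = {math.log1p(value):.2f}")
```

Under this rule the lower fence is negative, so only unusually high counts (above roughly 6,000) can be flagged, which fits a distribution whose outliers all sit in the right tail.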