This dataset captures 500 trending Wikipedia articles, with each row identified by a unique title and described by days_in_top_100, peak_views, total_views, and a daily_views series. All three numeric columns are heavily right-skewed with significant outliers: total_views has a skew of 10.4 and a max of ~23.9M against a median of ~213K, and peak_views behaves similarly (skew 9.7). Most articles spend only a few days in the top 100 (median 3), but a long tail stretches out to the 30-day maximum. Start by examining the distributions of total_views and days_in_top_100 to understand how concentrated attention is on a few breakout articles.
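One way to put a number on "how concentrated attention is" is the share of all views held by the top fraction of articles. The sketch below is a minimal illustration, not part of the profiler: the helper name `top_share` and the toy numbers are hypothetical, and loading the JSON file is left out because its layout is not described here.

```python
def top_share(values, fraction=0.1):
    """Share of the grand total held by the largest `fraction` of values."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values, reverse=True)
    # Always keep at least one row so tiny samples still return a share.
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


# Toy numbers for illustration only; they are NOT taken from the dataset.
toy_total_views = [100, 120, 150, 200, 210, 250, 300, 400, 900, 24000]
share = top_share(toy_total_views, fraction=0.1)
print(f"top 10% of rows hold {share:.1%} of all views")
```

Applied to the real total_views column, a share far above the chosen fraction would suggest that a few breakout articles account for most of the attention.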
saturn
/home/coolhand/datasets/us-attention-data/wikipedia_trending.json · 500 rows · sample n=500 · seed 42 · 2026-05-01T17:25:20+00:00
Overview
| Source | /home/coolhand/datasets/us-attention-data/wikipedia_trending.json |
| Total rows | 500 |
| Profiled sample | 500 |
| Columns | 5 |
| Generated | 2026-05-01T17:25:20+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
title: Wikipedia-style article titles with underscores (e.g. '1989_Tiananmen_Square_protests_and_massacre', 'Stranger_Things_season_5'), unique across all 500 rows (n_unique=500, entropy_ratio=1.0). Every value appears exactly once, so this functions as a row identifier rather than a categorical feature. The long_tail alert simply reflects that uniqueness.
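To see why an all-unique column yields an entropy ratio of 1.0, here is a minimal sketch. It assumes the ratio is Shannon entropy divided by the log of the number of distinct values; the report does not state how saturn defines entropy_ratio, so treat this definition and the function name as assumptions.

```python
import math
from collections import Counter


def entropy_ratio(values):
    """Shannon entropy of the value counts, normalized by its maximum.

    Assumed definition: H / log(n_unique). Returns 0.0 for a constant column.
    """
    counts = Counter(values)
    n = len(values)
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))


# Every value distinct: each has probability 1/n, so H = log(n) and the ratio is 1.
unique_ratio = entropy_ratio(["Bruce_Lee", "ChatGPT", "Venezuela", "Dhurandhar"])
# Repeated values pull the ratio below 1.
skewed_ratio = entropy_ratio(["a", "a", "a", "b"])
print(unique_ratio, skewed_ratio)
```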
total_views: Likely a per-row view count, with all 500 values unique and no nulls or zeros. The distribution is severely right-skewed (skew 10.44, kurtosis 149.66): the median is 213,065 but the mean is 580,331 and the max reaches 23,890,102, roughly 112x the median. Outliers make up 10.2% of rows (51 of 500), so a small set of viral entries dominates the tail.
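The ratios quoted above can be checked with a few lines of arithmetic, and the same snippet sketches how a moment-based skewness behaves on a long-tailed sample. The population formula used here is an assumption (the report does not say which estimator saturn uses), and the toy list is illustrative only.

```python
import math


def skewness(xs):
    """Population skewness m3 / m2**1.5 (assumed estimator, no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


# Figures quoted in the profile for this column.
median, mean, maximum = 213_065, 580_331, 23_890_102
max_to_median = maximum / median    # about 112
mean_to_median = mean / median      # about 2.7

# Toy sample (NOT from the dataset): one large value drags the skew upward,
# and a log10 transform reduces it.
toy = [1, 2, 3, 4, 100]
raw_skew = skewness(toy)
log_skew = skewness([math.log10(x) for x in toy])
print(round(max_to_median, 1), round(mean_to_median, 2), raw_skew, log_skew)
```

With a mean about 2.7 times the median, plotting this column on a log scale is one common way to make the bulk of the distribution readable.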
days_in_top_100: This column counts the days a record spent in a top-100 ranking, with 500 non-null integer values ranging from 1 to 30 and a median of just 3. The distribution is heavily right-skewed (skew 2.45, kurtosis 5.98): most items churn out fast while a long tail lingers, producing 56 high-end outliers (an 11.2% outlier rate) even though q3 is only 6. The mean (5.18) sits well above the median, and with only 27 unique values the column is bounded and discrete.
peak_views: A numeric measure of peak viewership per record, with 499 unique values across 500 rows and no nulls or zeros. The distribution is severely right-skewed (skew 9.71, kurtosis 127.99): the median is 104,303 but the mean is 159,797 and the max reaches 4,011,044, well above q3 of 148,907. 57 outliers (11.4%) sit above the upper whisker, suggesting a small tail of viral peaks dominates the variance (std 250,171).
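The "upper whisker" wording suggests a box-plot style rule. A minimal sketch of the common Tukey fence (q3 + 1.5 × IQR) follows; the multiplier, the quantile method, and the function name are assumptions, since the report does not state which rule saturn applies.

```python
import statistics


def upper_fence_outliers(values, k=1.5):
    """Return (fence, outliers) using the Tukey rule: fence = q3 + k * (q3 - q1)."""
    # method="inclusive" treats the data as the full population; this choice is assumed.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return fence, [v for v in values if v > fence]


# Toy sample (NOT from the dataset): q1 = 3, q3 = 7, so the fence is 7 + 1.5 * 4 = 13.
toy_peaks = [1, 2, 3, 4, 5, 6, 7, 8, 100]
fence, outliers = upper_fence_outliers(toy_peaks)
print(fence, outliers)
```

Under this rule an outlier must exceed the fence, not merely q3, so the cutoff for peak_views would lie above the q3 of 148,907 quoted here.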
Column 'daily_views' was skipped by the profiler, so no type, uniqueness, or distribution stats were computed despite a full 500 non-null rows. The name suggests a per-day view count (likely numeric and right-skewed in practice), but nothing in the evidence confirms that. Re-run profiling with this column included before drawing any conclusions.
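If the column turns out to hold a nested series, one option before re-profiling is to collapse each series into a few scalar features. The sketch below assumes each daily_views value is a plain list of numbers, which nothing in the report confirms; the function name, the feature names, and the toy row are all hypothetical.

```python
def summarize_series(series):
    """Collapse one per-day series into scalar features a column profiler can handle.

    ASSUMPTION: `series` is a list of numeric view counts, one entry per day.
    """
    if not series:
        return {"n_days": 0, "sum_views": 0, "max_views": None, "peak_day_index": None}
    # Index of the largest entry; ties resolve to the earliest day.
    peak_index = max(range(len(series)), key=series.__getitem__)
    return {
        "n_days": len(series),
        "sum_views": sum(series),
        "max_views": series[peak_index],
        "peak_day_index": peak_index,
    }


# Hypothetical row for illustration only; not taken from the dataset.
toy_row = {"title": "Example_article", "daily_views": [120, 4500, 900]}
features = summarize_series(toy_row["daily_views"])
print(features)
```

Inspect a few raw values first; if the series are keyed by date or stored as strings, the extraction step would need to change accordingly.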
Numeric correlation
title categorical
Top values (rank 1–20)
- 1989_Tiananmen_Square_protests_and_massacre — 1
- .xxx — 1
- Wikipedia:Featured_pictures — 1
- Dhurandhar — 1
- Avatar:_Fire_and_Ash — 1
- Nicolás_Maduro — 1
- Stranger_Things — 1
- Marty_Supreme — 1
- Stranger_Things_season_5 — 1
- List_of_highest-grossing_Indian_films — 1
- Bruce_Lee — 1
- Heated_Rivalry — 1
- Venezuela — 1
- One_Battle_After_Another — 1
- Donald_Trump — 1
- ChatGPT — 1
- Brigitte_Bardot — 1
- Pluribus_(TV_series) — 1
- The_Housemaid_(2025_film) — 1
- Google_Chrome — 1