us attention data wikipedia trending
Reading
This dataset captures 500 trending Wikipedia articles, with each row identified by a unique title and described by days_in_top_100, peak_views, total_views, and a daily_views series. All three numeric columns are heavily right-skewed with significant outliers: total_views has a skew of 10.4 and a max of ~23.9M against a median of ~213K, and peak_views behaves similarly (skew 9.7, max ~4.0M against a median of ~104K). Most articles spend only a few days in the top 100 (median 3), but a long tail extends to the maximum of 30 days. Start by examining the distributions of total_views and days_in_top_100 to understand how concentrated attention is on a few breakout articles.
citing: days_in_top_100.stats · peak_views.stats · total_views.stats · title.top_values · row_count
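One way to put a number on "how concentrated attention is" is the share of all views held by the largest articles. The sketch below is a minimal illustration, not part of the profiler: the function name `top_share` and the toy values are hypothetical, and a real run would pass in the total_views column instead.

```python
def top_share(values, fraction=0.1):
    """Share of the grand total held by the largest `fraction` of values."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(values, reverse=True)
    # Always include at least one row so tiny inputs still work.
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical toy values, not rows from the dataset.
toy_total_views = [91, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(top_share(toy_total_views, 0.1))  # 0.91
```

A value close to 1.0 for a small fraction would indicate that a few breakout articles account for most of the traffic.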
Charts the summary said to look at first
total_views distribution (histogram, 22 bins)
| bin | count |
|---|---|
| 7.645e+04 – 1.159e+06 | 449 |
| 1.159e+06 – 2.241e+06 | 27 |
| 2.241e+06 – 3.324e+06 | 13 |
| 3.324e+06 – 4.406e+06 | 4 |
| 4.406e+06 – 5.489e+06 | 1 |
| 5.489e+06 – 6.571e+06 | 1 |
| 6.571e+06 – 7.654e+06 | 1 |
| 7.654e+06 – 8.736e+06 | 2 |
| 8.736e+06 – 9.818e+06 | 0 |
| 9.818e+06 – 1.09e+07 | 1 |
| 1.09e+07 – 1.198e+07 | 0 |
| 1.198e+07 – 1.307e+07 | 0 |
| 1.307e+07 – 1.415e+07 | 0 |
| 1.415e+07 – 1.523e+07 | 0 |
| 1.523e+07 – 1.631e+07 | 0 |
| 1.631e+07 – 1.74e+07 | 0 |
| 1.74e+07 – 1.848e+07 | 0 |
| 1.848e+07 – 1.956e+07 | 0 |
| 1.956e+07 – 2.064e+07 | 0 |
| 2.064e+07 – 2.173e+07 | 0 |
| 2.173e+07 – 2.281e+07 | 0 |
| 2.281e+07 – 2.389e+07 | 1 |
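The bin boundaries above look like 22 equal-width intervals between the reported min (76,451) and max (23,890,102) of total_views. That binning scheme is an assumption inferred from the table, not something the report states. Under that assumption, the edges can be sketched as follows; `equal_width_edges` is a hypothetical helper name.

```python
def equal_width_edges(lo, hi, bins):
    """Return bins + 1 evenly spaced edges from lo to hi."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# Min and max of total_views as reported in the column stats.
edges = equal_width_edges(76_451, 23_890_102, 22)
print(f"{edges[1]:.4g}")  # 1.159e+06
```

With equal-width bins on data this skewed, 449 of the 500 rows land in the first bin, which is why a log-scaled axis is often easier to read for this kind of column.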
peak_views distribution (histogram, 22 bins)
| bin | count |
|---|---|
| 4.033e+04 – 2.208e+05 | 430 |
| 2.208e+05 – 4.013e+05 | 38 |
| 4.013e+05 – 5.818e+05 | 19 |
| 5.818e+05 – 7.623e+05 | 5 |
| 7.623e+05 – 9.428e+05 | 1 |
| 9.428e+05 – 1.123e+06 | 3 |
| 1.123e+06 – 1.304e+06 | 2 |
| 1.304e+06 – 1.484e+06 | 0 |
| 1.484e+06 – 1.665e+06 | 0 |
| 1.665e+06 – 1.845e+06 | 0 |
| 1.845e+06 – 2.026e+06 | 0 |
| 2.026e+06 – 2.206e+06 | 0 |
| 2.206e+06 – 2.387e+06 | 0 |
| 2.387e+06 – 2.567e+06 | 1 |
| 2.567e+06 – 2.748e+06 | 0 |
| 2.748e+06 – 2.928e+06 | 0 |
| 2.928e+06 – 3.109e+06 | 0 |
| 3.109e+06 – 3.289e+06 | 0 |
| 3.289e+06 – 3.47e+06 | 0 |
| 3.47e+06 – 3.65e+06 | 0 |
| 3.65e+06 – 3.831e+06 | 0 |
| 3.831e+06 – 4.011e+06 | 1 |
days_in_top_100 distribution (histogram, 22 bins)
| bin | count |
|---|---|
| 1 – 2.318 | 247 |
| 2.318 – 3.636 | 60 |
| 3.636 – 4.955 | 47 |
| 4.955 – 6.273 | 35 |
| 6.273 – 7.591 | 12 |
| 7.591 – 8.909 | 11 |
| 8.909 – 10.23 | 17 |
| 10.23 – 11.55 | 9 |
| 11.55 – 12.86 | 6 |
| 12.86 – 14.18 | 18 |
| 14.18 – 15.5 | 5 |
| 15.5 – 16.82 | 5 |
| 16.82 – 18.14 | 2 |
| 18.14 – 19.45 | 2 |
| 19.45 – 20.77 | 0 |
| 20.77 – 22.09 | 5 |
| 22.09 – 23.41 | 1 |
| 23.41 – 24.73 | 0 |
| 24.73 – 26.05 | 3 |
| 26.05 – 27.36 | 0 |
| 27.36 – 28.68 | 1 |
| 28.68 – 30 | 14 |
title top values (20 of 500 shown, each appearing once)
| value | count | share |
|---|---|---|
| 1989_Tiananmen_Square_protests_and_massacre | 1 | 0.2% |
| .xxx | 1 | 0.2% |
| Wikipedia:Featured_pictures | 1 | 0.2% |
| Dhurandhar | 1 | 0.2% |
| Avatar:_Fire_and_Ash | 1 | 0.2% |
| Nicolás_Maduro | 1 | 0.2% |
| Stranger_Things | 1 | 0.2% |
| Marty_Supreme | 1 | 0.2% |
| Stranger_Things_season_5 | 1 | 0.2% |
| List_of_highest-grossing_Indian_films | 1 | 0.2% |
| Bruce_Lee | 1 | 0.2% |
| Heated_Rivalry | 1 | 0.2% |
| Venezuela | 1 | 0.2% |
| One_Battle_After_Another | 1 | 0.2% |
| Donald_Trump | 1 | 0.2% |
| ChatGPT | 1 | 0.2% |
| Brigitte_Bardot | 1 | 0.2% |
| Pluribus_(TV_series) | 1 | 0.2% |
| The_Housemaid_(2025_film) | 1 | 0.2% |
| Google_Chrome | 1 | 0.2% |
Schema
5 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| title | categorical | 0.0% | 500 | long_tail |
| total_views | numeric | 0.0% | 500 | high_skew, outliers |
| days_in_top_100 | numeric | 0.0% | 27 | high_skew, outliers |
| peak_views | numeric | 0.0% | 499 | high_skew, outliers |
| daily_views | unknown | 0.0% | — | skipped |
title
categorical · identifier · long_tail
Wikipedia-style article titles with underscores (e.g. '1989_Tiananmen_Square_protests_and_massacre', 'Stranger_Things_season_5'), unique across all 500 rows (n_unique=500, entropy_ratio=1.0). Every value appears exactly once, so the column functions as a row identifier rather than a categorical feature, and the long_tail alert simply reflects that uniqueness. Treatment: treat as a row key; drop from modelling or use only for joins and lookups.
- n: 500
- nulls: 0 (0.0%)
- unique: 500
- top_value: 1989_Tiananmen_Square_protests_and_massacre
- top_rate: 0.002
- cardinality: 500
- entropy: 8.966
- entropy_ratio: 1
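The entropy figures for title can be checked by hand: 500 values that each appear once give a Shannon entropy of log2(500) ≈ 8.966 bits. The sketch below assumes the profiler defines entropy_ratio as entropy divided by log2 of the number of unique values. That definition is an assumption, though it matches the reported numbers. The function names and the placeholder titles are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(values):
    """Shannon entropy of the value frequencies, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum for this many unique values (assumed definition)."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # Convention chosen here; a single value carries no information.
    return shannon_entropy_bits(values) / math.log2(k)


# 500 distinct placeholder titles stand in for the real column.
titles = [f"article_{i}" for i in range(500)]
print(round(shannon_entropy_bits(titles), 3))  # 8.966
print(round(entropy_ratio(titles), 3))  # 1.0
```

A ratio of 1.0 means the values are spread as evenly as possible, which is what a row key looks like.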
total_views
numeric · feature · high_skew · outliers
Total view count for each article, with all 500 values unique and no nulls or zeros. The distribution is severely right-skewed (skew 10.44, kurtosis 149.66): the median is 213,065, the mean is 580,331, and the max reaches 23,890,102, roughly 112x the median. Outliers make up 10.2% of rows (51 of 500), so a small set of viral articles dominates the tail. Treatment: log-transform before modelling to tame the heavy right tail.
- n: 500
- nulls: 0 (0.0%)
- unique: 500
- min: 76,451
- max: 2.389e+07
- mean: 5.803e+05
- median: 213,065
- std: 1.424e+06
- q1: 1.151e+05
- q3: 535,224
- iqr: 4.201e+05
- skew: 10.44
- kurtosis: 149.7
- n_outliers: 51
- outlier_rate: 0.102
- zero_rate: 0
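To illustrate why a log transform is the suggested treatment, the sketch below compares skewness before and after taking log10. It uses the population (biased) skewness formula, which may differ from the estimator the profiler uses. The six toy values are hypothetical, and only the min, median, and max are taken from the stats above.

```python
import math


def skewness(values):
    """Population (biased) skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    # Raises ZeroDivisionError if all values are identical (m2 == 0).
    return m3 / m2 ** 1.5


# Hypothetical toy values with one viral outlier, not rows from the dataset.
toy = [100_000, 120_000, 200_000, 210_000, 500_000, 24_000_000]
logged = [math.log10(v) for v in toy]
print(skewness(toy) > skewness(logged))  # True

# Reported min, median, and max of total_views on a log10 scale.
for label, value in [("min", 76_451), ("median", 213_065), ("max", 23_890_102)]:
    print(label, round(math.log10(value), 2))
```

On the log10 scale the reported min, median, and max sit at about 4.88, 5.33, and 7.38, so the tail is still long but no longer spans two orders of magnitude beyond the median in raw units.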
days_in_top_100
numeric · feature · high_skew · outliers
Counts the days each article spent in the top 100, with 500 non-null integer values ranging from 1 to 30 and a median of just 3. The distribution is heavily right-skewed (skew 2.45, kurtosis 5.98): most articles drop out quickly while a long tail lingers. The 56 outliers (11.2% outlier rate) are the articles with more than 12 days in the top 100, which is consistent with a Tukey upper fence of q3 + 1.5 × IQR = 6 + 1.5 × 4 = 12 and with the 56 rows in the histogram bins from 12.86 upward. The mean (5.18) sits well above the median, and only 27 unique values indicate that tenure is bounded and discrete. Treatment: log- or sqrt-transform before modelling to tame the right skew and outlier mass.
- n: 500
- nulls: 0 (0.0%)
- unique: 27
- min: 1
- max: 30
- mean: 5.176
- median: 3
- std: 6.239
- q1: 2
- q3: 6
- iqr: 4
- skew: 2.449
- kurtosis: 5.979
- n_outliers: 56
- outlier_rate: 0.112
- zero_rate: 0
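The outlier count for days_in_top_100 is consistent with the common 1.5 × IQR (Tukey) rule, although the report does not say which rule the profiler applies, so treat that as an assumption. A minimal sketch using the reported quartiles (q1 = 2, q3 = 6) follows; the helper name `tukey_fences` and the toy day counts are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported for days_in_top_100.
lower, upper = tukey_fences(2, 6)
print(lower, upper)  # -4.0 12.0

# Hypothetical toy day counts, not rows from the dataset.
toy_days = [1, 2, 3, 6, 12, 13, 30]
outliers = [d for d in toy_days if d < lower or d > upper]
print(outliers)  # [13, 30]
```

Because the lower fence is negative and the column starts at 1, only the upper fence can flag anything here: every outlier is an article that stayed in the top 100 for 13 days or more.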
peak_views
numeric · feature · high_skew · outliers
Peak view count per article, with 499 unique values across 500 rows (one value appears twice) and no nulls or zeros. The distribution is severely right-skewed (skew 9.71, kurtosis 127.99): the median is 104,303, the mean is 159,797, and the max reaches 4,011,044, roughly 38x the median and 27x the q3 of 148,907. 57 outliers (11.4%) sit above the upper whisker, so a small tail of viral peaks dominates the variance (std 250,171). Treatment: log-transform before modelling to tame the heavy right tail.
- n: 500
- nulls: 0 (0.0%)
- unique: 499
- min: 40,332
- max: 4.011e+06
- mean: 1.598e+05
- median: 104,303
- std: 2.502e+05
- q1: 7.73e+04
- q3: 148,907
- iqr: 7.16e+04
- skew: 9.709
- kurtosis: 128
- n_outliers: 57
- outlier_rate: 0.114
- zero_rate: 0
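With 499 unique values in 500 non-null rows, exactly one peak_views value must occur twice. The sketch below shows one way to locate such a repeat. The helper name `repeated_values` and the toy numbers are hypothetical, and whether the repeat is a coincidence or a data issue cannot be told from the stats alone.

```python
from collections import Counter


def repeated_values(values):
    """Map each value that occurs more than once to its count."""
    return {v: c for v, c in Counter(values).items() if c > 1}


# Hypothetical toy peaks, not rows from the dataset.
toy_peaks = [104_303, 148_907, 77_300, 148_907, 4_011_044]
print(repeated_values(toy_peaks))  # {148907: 2}
```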
daily_views
unknown · other · skipped
The profiler skipped 'daily_views', so no type, uniqueness, or distribution stats were computed even though all 500 rows are non-null. The name suggests a per-day view count series (likely numeric and right-skewed in practice), but nothing in the evidence confirms that. Treatment: re-profile with this column included to recover its type and distribution before any downstream use.
- n: 500
- nulls: 0 (0.0%)
- unique: —
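Before re-profiling daily_views, it can help to find out what each cell actually holds, since a nested or serialized series is a common reason for a profiler to skip a column. The sketch below is a hypothetical first pass: it assumes cells might be Python lists, JSON-encoded strings, plain strings, or nulls, none of which is confirmed by the report. The function names and toy cells are illustrative only.

```python
import json
from collections import Counter


def cell_kind(cell):
    """Classify one cell by its Python type, decoding JSON strings when possible."""
    if cell is None:
        return "null"
    if isinstance(cell, str):
        try:
            parsed = json.loads(cell)
        except json.JSONDecodeError:
            return "str"
        return f"json:{type(parsed).__name__}"
    return type(cell).__name__


def profile_cells(cells):
    """Count how many cells fall into each kind."""
    return Counter(cell_kind(c) for c in cells)


# Hypothetical toy cells, not rows from the dataset.
toy_cells = ["[120, 340, 95]", [10, 20], "not json", None]
print(profile_cells(toy_cells))
```

If every cell turns out to be a list or a JSON list, the series could then be expanded to one row per day and profiled as an ordinary numeric column.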