saturn·

us attention data wikipedia trending

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/us-attention-data/wikipedia_trending.json

Saturn profiled 500 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/us-attention-data/wikipedia_trending.json",
    "--findings", "us-attention-data-wikipedia_trending.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset captures 500 trending Wikipedia articles, with each row identified by a unique title and described by days_in_top_100, peak_views, total_views, and a daily_views series. All three numeric columns are heavily right-skewed with significant outliers: total_views skew is 10.4 with a max of ~23.9M against a median of ~213K, and peak_views shows similar behavior (skew 9.7). Most articles spend only a few days in the top 100 (median 3), but a long tail extends up to the maximum of 30 days. Start by examining the distribution of total_views and days_in_top_100 to understand how concentrated attention is on a few breakout articles.

citing: days_in_top_100.stats · peak_views.stats · total_views.stats · title.top_values · row_count
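
As a rough first look at concentration, the reported summary statistics alone give a bound on how much attention the single largest article holds. The sketch below uses only the mean, median, max, and row count reported for total_views; because the mean is rounded, the grand total is approximate, and the variable names are illustrative.

```python
# Values copied from the total_views stats in this report.
n_rows = 500
mean_views = 580_331       # rounded mean, so the total below is approximate
median_views = 213_065
max_views = 23_890_102

# Mean times row count recovers the approximate sum of total_views.
total_all = mean_views * n_rows

top_share = max_views / total_all          # share held by the largest article
max_to_median = max_views / median_views   # how far the max sits above a typical row

print(f"approx. sum of total_views: {total_all:,}")
print(f"largest article's share:    {top_share:.1%}")
print(f"max / median:               {max_to_median:.0f}x")
```

A fuller concentration measure (for example the share held by the top 10 articles) would need the raw rows, which this report does not include.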

Out[4]:

saturn.schema() · 5 columns

column           kind         n    null%  unique  alerts
title            categorical  500  0.0%   500     long_tail
total_views      numeric      500  0.0%   500     high_skew, outliers
days_in_top_100  numeric      500  0.0%   27      high_skew, outliers
peak_views       numeric      500  0.0%   499     high_skew, outliers
daily_views      unknown      500  0.0%   -       skipped
Fig 1.
total_views · Look for an extreme right tail — a few articles dominate total attention while most cluster near the median of ~213K.
Histogram bins for total_views (median: 213065.0).
bin                     count
7.645e+04 – 1.159e+06   449
1.159e+06 – 2.241e+06   27
2.241e+06 – 3.324e+06   13
3.324e+06 – 4.406e+06   4
4.406e+06 – 5.489e+06   1
5.489e+06 – 6.571e+06   1
6.571e+06 – 7.654e+06   1
7.654e+06 – 8.736e+06   2
8.736e+06 – 9.818e+06   0
9.818e+06 – 1.09e+07    1
1.09e+07 – 1.198e+07    0
1.198e+07 – 1.307e+07   0
1.307e+07 – 1.415e+07   0
1.415e+07 – 1.523e+07   0
1.523e+07 – 1.631e+07   0
1.631e+07 – 1.74e+07    0
1.74e+07 – 1.848e+07    0
1.848e+07 – 1.956e+07   0
1.956e+07 – 2.064e+07   0
2.064e+07 – 2.173e+07   0
2.173e+07 – 2.281e+07   0
2.281e+07 – 2.389e+07   1
Fig 2.
peak_views · Check how peak single-day views are distributed; skew of 9.7 suggests a handful of viral spikes far above the typical ~104K.
Histogram bins for peak_views (median: 104303.0).
bin                     count
4.033e+04 – 2.208e+05   430
2.208e+05 – 4.013e+05   38
4.013e+05 – 5.818e+05   19
5.818e+05 – 7.623e+05   5
7.623e+05 – 9.428e+05   1
9.428e+05 – 1.123e+06   3
1.123e+06 – 1.304e+06   2
1.304e+06 – 1.484e+06   0
1.484e+06 – 1.665e+06   0
1.665e+06 – 1.845e+06   0
1.845e+06 – 2.026e+06   0
2.026e+06 – 2.206e+06   0
2.206e+06 – 2.387e+06   0
2.387e+06 – 2.567e+06   1
2.567e+06 – 2.748e+06   0
2.748e+06 – 2.928e+06   0
2.928e+06 – 3.109e+06   0
3.109e+06 – 3.289e+06   0
3.289e+06 – 3.47e+06    0
3.47e+06 – 3.65e+06     0
3.65e+06 – 3.831e+06    0
3.831e+06 – 4.011e+06   1
Fig 3.
days_in_top_100 · See how briefly most articles trend — the median is just 3 days, but some persist up to 30.
Histogram bins for days_in_top_100 (median: 3.0).
bin             count
1 – 2.318       247
2.318 – 3.636   60
3.636 – 4.955   47
4.955 – 6.273   35
6.273 – 7.591   12
7.591 – 8.909   11
8.909 – 10.23   17
10.23 – 11.55   9
11.55 – 12.86   6
12.86 – 14.18   18
14.18 – 15.5    5
15.5 – 16.82    5
16.82 – 18.14   2
18.14 – 19.45   2
19.45 – 20.77   0
20.77 – 22.09   5
22.09 – 23.41   1
23.41 – 24.73   0
24.73 – 26.05   3
26.05 – 27.36   0
27.36 – 28.68   1
28.68 – 30      14
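
The bin edges in this table look like 22 equal-width bins between the min (1) and max (30), which means integer day counts are spread unevenly: some bins hold two day values and others only one. The sketch below reconstructs the edges under that reading. The half-open bin convention (last bin closed) is an assumption about how Saturn bins values, not something the report states.

```python
# Min, max, and bin count read off the days_in_top_100 histogram table.
lo, hi, n_bins = 1, 30, 22
width = (hi - lo) / n_bins
edges = [lo + i * width for i in range(n_bins + 1)]

# Assumed convention: bins are [left, right), except the last, which includes hi.
days_per_bin = []
for i in range(n_bins):
    left, right = edges[i], edges[i + 1]
    is_last = i == n_bins - 1
    days = [d for d in range(lo, hi + 1)
            if (left <= d < right) or (is_last and d == hi)]
    days_per_bin.append(days)

# Show the first few bins and how many bins cover two integer day values.
for i, days in enumerate(days_per_bin[:4]):
    print(f"{edges[i]:.3f} to {edges[i + 1]:.3f}: days {days}")
print("bins holding two day values:", sum(len(d) == 2 for d in days_per_bin))
```

Under these assumptions, part of the unevenness in the counts (for example 17, then 9, then 6, then 18) comes from bins alternating between one and two day values rather than from the data itself. Re-binning with one bin per integer day would avoid that.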
Fig 4.
title · All 500 titles are unique; inspect title length to get a feel for the kinds of pages trending.
Top values for title (20 unique shown, of 500 total).
value                                         count  share
1989_Tiananmen_Square_protests_and_massacre   1      0.2%
.xxx                                          1      0.2%
Wikipedia:Featured_pictures                   1      0.2%
Dhurandhar                                    1      0.2%
Avatar:_Fire_and_Ash                          1      0.2%
Nicolás_Maduro                                1      0.2%
Stranger_Things                               1      0.2%
Marty_Supreme                                 1      0.2%
Stranger_Things_season_5                      1      0.2%
List_of_highest-grossing_Indian_films         1      0.2%
Bruce_Lee                                     1      0.2%
Heated_Rivalry                                1      0.2%
Venezuela                                     1      0.2%
One_Battle_After_Another                      1      0.2%
Donald_Trump                                  1      0.2%
ChatGPT                                       1      0.2%
Brigitte_Bardot                               1      0.2%
Pluribus_(TV_series)                          1      0.2%
The_Housemaid_(2025_film)                     1      0.2%
Google_Chrome                                 1      0.2%
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column           kind         null %
title            categorical  0.0%
total_views      numeric      0.0%
days_in_top_100  numeric      0.0%
peak_views       numeric      0.0%
daily_views      unknown      0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
                 total_views  days_in_top_100  peak_views
total_views      +1.00        +0.59            +0.80
days_in_top_100  +0.59        +1.00            +0.23
peak_views       +0.80        +0.23            +1.00
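
Fig 6 reports Pearson coefficients. As a rough illustration of what that statistic measures, and why it can be sensitive to extreme rows in columns as skewed as these, here is a minimal sketch. The `pearson` helper and the toy values are hypothetical; they are not drawn from the dataset, and this is not necessarily how Saturn computes the matrix (the caption notes that its values are sampled and bounded).

```python
import math

def pearson(xs, ys):
    """Pearson correlation: co-deviation divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not taken from the dataset.
x_base, y_base = [1, 2, 3, 4], [5, 3, 4, 1]
r_without = pearson(x_base, y_base)

# Append one extreme row to both columns and recompute.
r_with = pearson(x_base + [100], y_base + [90])

print(f"without extreme row: {r_without:+.2f}")
print(f"with extreme row:    {r_with:+.2f}")
```

In this toy case a single extreme row flips the sign of the coefficient. For heavy-tailed columns, a rank-based coefficient such as Spearman's, or Pearson on log-transformed values, is a common companion check.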

title categorical identifier

Wikipedia-style article titles with underscores (e.g. '1989_Tiananmen_Square_protests_and_massacre', 'Stranger_Things_season_5'), unique across all 500 rows (n_unique=500, entropy_ratio=1.0). Every value appears exactly once, so this functions as a row identifier rather than a categorical feature. The long_tail alert simply reflects that uniqueness.

Treatment: Treat as a row key; drop from modelling or use only for joins and lookup.

anthropic:claude-opus-4-7 · confidence high
Out[12]:

saturn.columns["title"].stats

stat              value
n                 500
nulls             0 (0.0%)
unique            500
top_value         1989_Tiananmen_Square_protests_and_massacre
top_rate          0.002
cardinality       500
entropy           8.966
entropy_ratio     1
alert: long_tail  500 singleton categories
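
The entropy figures above are consistent with Shannon entropy in bits, normalized by the log of the cardinality: 500 equally likely values give log2(500), about 8.966, and a ratio of 1. The sketch below checks that reading. The base-2 convention and the normalization are inferred from the reported numbers rather than from Saturn's documentation, and the placeholder titles are hypothetical stand-ins.

```python
import math
from collections import Counter

def shannon_entropy_bits(values):
    """Shannon entropy of the empirical value distribution, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical stand-ins for 500 unique titles (every value appears once).
titles = [f"title_{i}" for i in range(500)]

entropy = shannon_entropy_bits(titles)
# Assumed normalization: divide by the maximum entropy for this cardinality.
entropy_ratio = entropy / math.log2(len(set(titles)))

print(round(entropy, 3), round(entropy_ratio, 3))
```

A ratio of 1 means the column is as spread out as it can be for its cardinality, which is what an identifier column looks like.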
Fig 7.
Top values for title.
Top values for title (20 unique shown, of 500 total).
value                                         count  share
1989_Tiananmen_Square_protests_and_massacre   1      0.2%
.xxx                                          1      0.2%
Wikipedia:Featured_pictures                   1      0.2%
Dhurandhar                                    1      0.2%
Avatar:_Fire_and_Ash                          1      0.2%
Nicolás_Maduro                                1      0.2%
Stranger_Things                               1      0.2%
Marty_Supreme                                 1      0.2%
Stranger_Things_season_5                      1      0.2%
List_of_highest-grossing_Indian_films         1      0.2%
Bruce_Lee                                     1      0.2%
Heated_Rivalry                                1      0.2%
Venezuela                                     1      0.2%
One_Battle_After_Another                      1      0.2%
Donald_Trump                                  1      0.2%
ChatGPT                                       1      0.2%
Brigitte_Bardot                               1      0.2%
Pluribus_(TV_series)                          1      0.2%
The_Housemaid_(2025_film)                     1      0.2%
Google_Chrome                                 1      0.2%

total_views numeric feature

Likely a per-row view count, with all 500 values unique and no nulls or zeros. The distribution is severely right-skewed (skew 10.44, kurtosis 149.66): the median is 213,065 but the mean is 580,331 and the max reaches 23,890,102, roughly 112x the median. Outliers make up 10.2% of rows (51 of 500), so a small set of viral entries dominates the tail.

Treatment: log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[15]:

saturn.columns["total_views"].stats

stat              value
n                 500
nulls             0 (0.0%)
unique            500
min               76,451
max               2.389e+07
mean              5.803e+05
median            213,065
std               1.424e+06
q1                1.151e+05
q3                535,224
iqr               4.201e+05
skew              10.44
kurtosis          149.7
n_outliers        51
outlier_rate      0.102
zero_rate         0
alert: high_skew  skew=+10.44
alert: outliers   10.2% rows beyond 1.5 IQR
Fig 8.
Distribution of total_views. Vertical dash marks the median.
Histogram bins for total_views (median: 213065.0).
bin                     count
7.645e+04 – 1.159e+06   449
1.159e+06 – 2.241e+06   27
2.241e+06 – 3.324e+06   13
3.324e+06 – 4.406e+06   4
4.406e+06 – 5.489e+06   1
5.489e+06 – 6.571e+06   1
6.571e+06 – 7.654e+06   1
7.654e+06 – 8.736e+06   2
8.736e+06 – 9.818e+06   0
9.818e+06 – 1.09e+07    1
1.09e+07 – 1.198e+07    0
1.198e+07 – 1.307e+07   0
1.307e+07 – 1.415e+07   0
1.415e+07 – 1.523e+07   0
1.523e+07 – 1.631e+07   0
1.631e+07 – 1.74e+07    0
1.74e+07 – 1.848e+07    0
1.848e+07 – 1.956e+07   0
1.956e+07 – 2.064e+07   0
2.064e+07 – 2.173e+07   0
2.173e+07 – 2.281e+07   0
2.281e+07 – 2.389e+07   1
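
The treatment note for total_views recommends a log transform. To give a sense of how much that compresses the right tail, the sketch below applies log10 to the reported min, median, and max and compares how lopsided the distribution is around the median before and after. The choice of base 10 and the "asymmetry" ratio are illustrative assumptions; the report does not specify either.

```python
import math

# Exact values copied from the total_views stats in this report.
min_views = 76_451
median_views = 213_065
max_views = 23_890_102

def asymmetry(lo, mid, hi):
    """Distance from the median up to the max, relative to the distance down to the min."""
    return (hi - mid) / (mid - lo)

raw_asymmetry = asymmetry(min_views, median_views, max_views)
log_asymmetry = asymmetry(
    math.log10(min_views), math.log10(median_views), math.log10(max_views)
)

print(f"raw scale:   upper span is {raw_asymmetry:.0f}x the lower span")
print(f"log10 scale: upper span is {log_asymmetry:.1f}x the lower span")
```

The tail is still longer on the upper side after the transform, so the column stays right-skewed, only far less so. Since zero_rate is 0, a plain logarithm is defined for every row.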

days_in_top_100 numeric feature

This column counts days a record spent in some top-100 ranking, with 500 non-null integer values ranging from 1 to 30 and a median of just 3. The distribution is heavily right-skewed (skew 2.45, kurtosis 5.98): most items churn out fast while a long tail lingers, producing 56 outliers (11.2% outlier rate) above the upper fence of 12 (q3 of 6 plus 1.5 × the IQR of 4). The mean (5.18) sits well above the median, and having only 27 unique values suggests tenure is bounded and discrete.

Treatment: Log- or sqrt-transform before modelling to tame the right skew and outlier mass.

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["days_in_top_100"].stats

stat              value
n                 500
nulls             0 (0.0%)
unique            27
min               1
max               30
mean              5.176
median            3
std               6.239
q1                2
q3                6
iqr               4
skew              2.449
kurtosis          5.979
n_outliers        56
outlier_rate      0.112
zero_rate         0
alert: high_skew  skew=+2.45
alert: outliers   11.2% rows beyond 1.5 IQR
Fig 9.
Distribution of days_in_top_100. Vertical dash marks the median.
Histogram bins for days_in_top_100 (median: 3.0).
bin             count
1 – 2.318       247
2.318 – 3.636   60
3.636 – 4.955   47
4.955 – 6.273   35
6.273 – 7.591   12
7.591 – 8.909   11
8.909 – 10.23   17
10.23 – 11.55   9
11.55 – 12.86   6
12.86 – 14.18   18
14.18 – 15.5    5
15.5 – 16.82    5
16.82 – 18.14   2
18.14 – 19.45   2
19.45 – 20.77   0
20.77 – 22.09   5
22.09 – 23.41   1
23.41 – 24.73   0
24.73 – 26.05   3
26.05 – 27.36   0
27.36 – 28.68   1
28.68 – 30      14
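
The outlier alert for days_in_top_100 ("rows beyond 1.5 IQR") can be cross-checked against the histogram. The sketch below computes the fences from the reported q1 and q3 and sums the bins that lie entirely above the upper fence. It assumes the usual Tukey rule with a strict inequality, and it relies on the values being integers, so that the 11.55 to 12.86 bin holds only the value 12, which sits on the fence rather than beyond it.

```python
# Quartiles copied from the days_in_top_100 stats.
q1, q3 = 2, 6
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # below the minimum of 1, so no low outliers
upper_fence = q3 + 1.5 * iqr

# (left bin edge, count) pairs copied from the histogram table.
bins = [
    (1, 247), (2.318, 60), (3.636, 47), (4.955, 35), (6.273, 12), (7.591, 11),
    (8.909, 17), (10.23, 9), (11.55, 6), (12.86, 18), (14.18, 5), (15.5, 5),
    (16.82, 2), (18.14, 2), (19.45, 0), (20.77, 5), (22.09, 1), (23.41, 0),
    (24.73, 3), (26.05, 0), (27.36, 1), (28.68, 14),
]

n_rows = sum(count for _, count in bins)
# Bins whose left edge exceeds the fence contain only values beyond it.
n_outliers = sum(count for left, count in bins if left > upper_fence)

print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {n_outliers} of {n_rows} ({n_outliers / n_rows:.1%})")
```

The count recovered this way matches the reported n_outliers and outlier_rate, which supports reading the alert as articles that spent 13 or more days in the top 100.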

peak_views numeric feature

A numeric measure of peak viewership per record, with 499 unique values across 500 rows and no nulls or zeros. The distribution is severely right-skewed (skew 9.71, kurtosis 127.99): the median is 104,303 but the mean is 159,797 and the max reaches 4,011,044, well above q3 of 148,907. 57 outliers (11.4%) sit above the upper whisker, suggesting a small tail of viral peaks dominates the variance (std 250,171).

Treatment: log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[21]:

saturn.columns["peak_views"].stats

stat              value
n                 500
nulls             0 (0.0%)
unique            499
min               40,332
max               4.011e+06
mean              1.598e+05
median            104,303
std               2.502e+05
q1                7.73e+04
q3                148,907
iqr               7.16e+04
skew              9.709
kurtosis          128
n_outliers        57
outlier_rate      0.114
zero_rate         0
alert: high_skew  skew=+9.71
alert: outliers   11.4% rows beyond 1.5 IQR
Fig 10.
Distribution of peak_views. Vertical dash marks the median.
Histogram bins for peak_views (median: 104303.0).
bin                     count
4.033e+04 – 2.208e+05   430
2.208e+05 – 4.013e+05   38
4.013e+05 – 5.818e+05   19
5.818e+05 – 7.623e+05   5
7.623e+05 – 9.428e+05   1
9.428e+05 – 1.123e+06   3
1.123e+06 – 1.304e+06   2
1.304e+06 – 1.484e+06   0
1.484e+06 – 1.665e+06   0
1.665e+06 – 1.845e+06   0
1.845e+06 – 2.026e+06   0
2.026e+06 – 2.206e+06   0
2.206e+06 – 2.387e+06   0
2.387e+06 – 2.567e+06   1
2.567e+06 – 2.748e+06   0
2.748e+06 – 2.928e+06   0
2.928e+06 – 3.109e+06   0
3.109e+06 – 3.289e+06   0
3.289e+06 – 3.47e+06    0
3.47e+06 – 3.65e+06     0
3.65e+06 – 3.831e+06    0
3.831e+06 – 4.011e+06   1

daily_views unknown other

Column 'daily_views' was skipped by the profiler (no profiler exists for kind=unknown), so no type, uniqueness, or distribution stats were computed even though all 500 rows are non-null. The name suggests a per-day view count (likely numeric and right-skewed in practice), but nothing in the evidence confirms that. Re-run profiling with this column included before drawing any conclusions.

Treatment: Re-profile the column to recover type and distribution before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[24]:

saturn.columns["daily_views"].stats

stat            value
n               500
nulls           0 (0.0%)
unique          -
alert: skipped  no profiler for kind=unknown
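
Before re-profiling, it may help to look at what daily_views actually holds. The sketch below tallies the Python type and length of each value in a column. The `summarize_column` helper and the sample records are hypothetical, and the assumption that the source JSON is a list of per-article objects is not confirmed by this report.

```python
from collections import Counter

def summarize_column(records, key):
    """Tally the Python type of each value and, for sized values, its length."""
    types = Counter(type(r.get(key)).__name__ for r in records)
    lengths = Counter(
        len(r[key]) for r in records if isinstance(r.get(key), (list, dict, str))
    )
    return types, lengths

# Hypothetical records standing in for rows of the source file.
# With the real file one would load them first, e.g. via json.load,
# assuming the top level is a list of objects.
sample = [
    {"title": "A", "daily_views": [10, 20, 30]},
    {"title": "B", "daily_views": [5, 7]},
]

types, lengths = summarize_column(sample, "daily_views")
print("value types:", dict(types))
print("value lengths:", dict(lengths))
```

If the values turn out to be lists or nested objects, that would explain why the column was assigned kind=unknown; flattening it into per-day numeric columns or summary features would then be one way to make it profilable.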

How to cite


BibTeX
@misc{saturn-us-attention-data-wikipedia-trending-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: us attention data wikipedia trending},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/us-attention-data-wikipedia_trending}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: us attention data wikipedia trending. Source: /home/coolhand/datasets/us-attention-data/wikipedia_trending.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/us-attention-data-wikipedia_trending