
us attention data wikipedia event articles

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/datasets/us-attention-data/wikipedia_event_articles.json

Saturn profiled 10 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
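# Install the profiler CLI (IPython shell escape; run inside a notebook).
!pip install saturn-dissect
import subprocess

# Profile the source file, name the findings file, and request the opt-in LLM prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/us-attention-data/wikipedia_event_articles.json",
    "--findings", "us-attention-data-wikipedia_event_articles.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if the CLI exits non-zero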

Summary confidence: high

This is a small dataset of 10 Wikipedia articles tracking US public attention, with view metrics (peak_views, avg_daily_views, total_views) plus an article name and a timeline field. The view metrics are right-skewed, most sharply peak_views, which has a skew of 2.61 and a max of 739,258 against a median of just 22,111; its histogram puts nine articles in the lowest bin and one in the highest, so a single article dominates peak attention. Each numeric column flags one outlier (10% outlier rate), so it's worth identifying which article is pulling each distribution. The article column has 10 unique values for 10 rows, so it functions as an identifier rather than a category to aggregate on.

citing: peak_views.stats.skew · peak_views.stats.max · peak_views.stats.median · peak_views.stats.n_outliers · avg_daily_views.stats.skew · avg_daily_views.stats.n_outliers · total_views.stats.skew · total_views.stats.n_outliers · article.stats.cardinality · row_count
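
To follow up on which article pulls each distribution, one option is to rank rows by a metric and report the leader's share of the column total. The sketch below assumes the source JSON parses to a top-level list of records keyed by the column names in the schema; the report never shows raw rows, so that layout is an assumption. `top_share` and `load_records` are hypothetical helpers, not part of saturn.

```python
import json
from pathlib import Path


def top_share(records, metric, label="article"):
    """Return (label, share) for the row with the largest `metric` value.

    `share` is that row's fraction of the column total.
    """
    total = sum(row[metric] for row in records)
    leader = max(records, key=lambda row: row[metric])
    return leader[label], leader[metric] / total


def load_records(path):
    """Load the dataset, assuming a top-level JSON array of objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Usage (path taken from the report's Source line; the record layout is assumed):
# records = load_records("/home/coolhand/datasets/us-attention-data/wikipedia_event_articles.json")
# for metric in ("peak_views", "avg_daily_views", "total_views"):
#     print(metric, top_share(records, metric))
```

A share well above 1/10 for the same article across all three metrics would point to a single row driving every outlier flag.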

Out[4]:

saturn.schema() · 5 columns

column           kind         n   null%  unique  alerts
article          categorical  10  0.0%   10      long_tail
avg_daily_views  numeric      10  0.0%   10      outliers
peak_views       numeric      10  0.0%   10      high_skew, outliers
total_views      numeric      10  0.0%   10      outliers
timeline         unknown      10  0.0%           skipped
Fig 1.
peak_views · Look for the long right tail: one article has a peak far above the median of 22,111.
Histogram bins for peak_views (median: 22111.0).
bin                      count
3613 – 1.507e+05         9
1.507e+05 – 2.979e+05    0
2.979e+05 – 4.45e+05     0
4.45e+05 – 5.921e+05     0
5.921e+05 – 7.393e+05    1
Fig 2.
total_views · Check how concentrated total attention is; mean (1.86M) sits well above the median (1.20M).
Histogram bins for total_views (median: 1195666.5).
bin                      count
2.001e+05 – 1.377e+06    6
1.377e+06 – 2.554e+06    2
2.554e+06 – 3.732e+06    1
3.732e+06 – 4.909e+06    0
4.909e+06 – 6.086e+06    1
Fig 3.
avg_daily_views · See whether sustained daily interest follows the same skew as peak spikes.
Histogram bins for avg_daily_views (median: 13139.0).
bin                      count
2199 – 1.513e+04         6
1.513e+04 – 2.807e+04    2
2.807e+04 – 4.101e+04    1
4.101e+04 – 5.394e+04    0
5.394e+04 – 6.688e+04    1
Fig 4.
article · Use this as a label axis to see which specific articles drive the outliers in the view metrics.
Top values for article (10 unique shown, of 10 total).
value                    count  share
Donald_Trump             1      10.0%
Joe_Biden                1      10.0%
Climate_change           1      10.0%
COVID-19_pandemic        1      10.0%
Artificial_intelligence  1      10.0%
Russia                   1      10.0%
Israel                   1      10.0%
Taylor_Swift             1      10.0%
Elon_Musk                1      10.0%
United_States            1      10.0%
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column           kind         null %
article          categorical  0.0%
avg_daily_views  numeric      0.0%
peak_views       numeric      0.0%
total_views      numeric      0.0%
timeline         unknown      0.0%
Fig 6.
Pearson correlation across 3 numeric columns (sampled, bounded; values clipped to 2 decimals).
                 avg_daily_views  peak_views  total_views
avg_daily_views  +1.00            +0.90       +1.00
peak_views       +0.90            +1.00       +0.90
total_views      +1.00            +0.90       +1.00
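
A +1.00 correlation between avg_daily_views and total_views, together with their identical skew (1.573) and kurtosis (1.608), is what a pure rescaling would produce: Pearson correlation, skew, and kurtosis are all unchanged when a column is multiplied by a positive constant. The rough check below uses only the summary values reported in this notebook. Reading the constant as a window length in days is an assumption the source does not confirm.

```python
# Summary values copied from the avg_daily_views and total_views sections.
avg_daily = {"min": 2199, "median": 13139, "mean": 20484, "max": 66878}
total = {"min": 200122, "median": 1195666.5, "mean": 1864031.7, "max": 6085895}

# If total_views = k * avg_daily_views, every paired statistic shares the same k.
ratios = {name: total[name] / avg_daily[name] for name in avg_daily}

for name, ratio in ratios.items():
    print(f"{name:>6}: {ratio:.3f}")

# A small spread means one constant explains all four pairs.
spread = max(ratios.values()) - min(ratios.values())
print(f"spread across statistics: {spread:.4f}")
```

All four ratios land within 0.01 of 91, so the two columns very likely carry the same information, and keeping both in a model would add redundancy rather than signal.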

article categorical identifier

This column holds Wikipedia-style article titles (e.g., Donald_Trump, COVID-19_pandemic, Taylor_Swift) using underscore-separated naming. Every one of the 10 rows is unique (n_unique=10, entropy_ratio=1.0), so it functions as a row identifier rather than a categorical feature. The mix spans people, countries, and topics, with no nulls and no repeated value (top_rate=0.1).

Treatment: Use as a join key to Wikipedia metadata; do not one-hot encode.

anthropic:claude-opus-4-7 · confidence high
Out[12]:

saturn.columns["article"].stats

stat              value
n                 10
nulls             0 (0.0%)
unique            10
top_value         Donald_Trump
top_rate          0.1
cardinality       10
entropy           3.322
entropy_ratio     1
alert: long_tail  10 singleton categories
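
The entropy figures in the table follow directly from the value counts. As a worked check, assuming saturn reports Shannon entropy in bits and normalizes it by log2 of the cardinality (the report does not state its formula), ten singleton values should give log2(10) and a ratio of 1:

```python
import math
from collections import Counter

# The ten article titles listed in the report, each appearing once.
articles = [
    "Donald_Trump", "Joe_Biden", "Climate_change", "COVID-19_pandemic",
    "Artificial_intelligence", "Russia", "Israel", "Taylor_Swift",
    "Elon_Musk", "United_States",
]


def shannon_entropy_bits(values):
    """Shannon entropy of the empirical distribution, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


entropy = shannon_entropy_bits(articles)
cardinality = len(set(articles))
entropy_ratio = entropy / math.log2(cardinality)  # 1.0 means perfectly uniform

print(round(entropy, 3), round(entropy_ratio, 3))
```

The rounded values agree with the reported entropy of 3.322 and entropy_ratio of 1, which is consistent with the assumed definitions.
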
Fig 7.
Top values for article (10 unique shown, of 10 total).
value                    count  share
Donald_Trump             1      10.0%
Joe_Biden                1      10.0%
Climate_change           1      10.0%
COVID-19_pandemic        1      10.0%
Artificial_intelligence  1      10.0%
Russia                   1      10.0%
Israel                   1      10.0%
Taylor_Swift             1      10.0%
Elon_Musk                1      10.0%
United_States            1      10.0%

avg_daily_views numeric feature

Numeric column capturing average daily views per item, with all 10 rows unique and no nulls or zeros. The distribution is right-skewed (skew 1.57) with a mean of 20484 sitting well above the median of 13139, and a max of 66878 flagged as the lone outlier (10% outlier rate) versus a min of 2199. Standard deviation (19006) nearly matches the mean, signalling high dispersion in a tiny sample.

Treatment: Log-transform before regression to tame the right skew and outlier.
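
To illustrate what the log transform buys, the sketch below compares how far the maximum sits above the median relative to how far the minimum sits below it, before and after transforming. It uses only the min, median, and max reported for this column; the choice of `log1p` (natural log of 1 + x) is an assumption for illustration, not something the report prescribes.

```python
import math

# Summary values reported for avg_daily_views in this section.
reported = {"min": 2199, "median": 13139, "max": 66878}


def log_transform(value: float) -> float:
    """Natural log of (1 + value); log1p stays defined if a zero ever appears."""
    return math.log1p(value)


logged = {name: log_transform(v) for name, v in reported.items()}


def gap_ratio(stats):
    """(max - median) / (median - min); 1.0 means the extremes are symmetric."""
    return (stats["max"] - stats["median"]) / (stats["median"] - stats["min"])


raw_gap_ratio = gap_ratio(reported)
log_gap_ratio = gap_ratio(logged)

print(f"raw scale: {raw_gap_ratio:.2f}")
print(f"log scale: {log_gap_ratio:.2f}")
```

On the raw scale the upper gap is several times the lower gap; on the log scale the two are roughly equal, which is the symmetry a regression benefits from. With only 10 rows, this remains a rough indication rather than a distributional test.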

anthropic:claude-opus-4-7 · confidence medium
Out[15]:

saturn.columns["avg_daily_views"].stats

stat             value
n                10
nulls            0 (0.0%)
unique           10
min              2,199
max              66,878
mean             20,484
median           13,139
std              1.901e+04
q1               1.087e+04
q3               23,820
iqr              1.295e+04
skew             1.573
kurtosis         1.608
n_outliers       1
outlier_rate     0.1
zero_rate        0
alert: outliers  10.0% rows beyond 1.5 IQR
Fig 8.
Distribution of avg_daily_views. Vertical dash marks the median.
Histogram bins for avg_daily_views (median: 13139.0).
bin                      count
2199 – 1.513e+04         6
1.513e+04 – 2.807e+04    2
2.807e+04 – 4.101e+04    1
4.101e+04 – 5.394e+04    0
5.394e+04 – 6.688e+04    1

peak_views numeric feature

This appears to be a peak view-count metric per item, with all 10 rows unique and no nulls. The distribution is heavily right-skewed (skew 2.61, kurtosis 4.93): the median is 22,111 while the mean is 101,490.2 and the max reaches 739,258, roughly 12.7 times the Q3 of 58,074.25. One outlier (10% of rows) drags the standard deviation (225,418) far above the IQR (43,543.25).

Treatment: Log-transform before any modelling or aggregation to tame the skew and outlier.

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["peak_views"].stats

stat              value
n                 10
nulls             0 (0.0%)
unique            10
min               3,613
max               739,258
mean              1.015e+05
median            22,111
std               2.254e+05
q1                14,531
q3                5.807e+04
iqr               4.354e+04
skew              2.609
kurtosis          4.928
n_outliers        1
outlier_rate      0.1
zero_rate         0
alert: high_skew  skew=+2.61
alert: outliers   10.0% rows beyond 1.5 IQR
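
The outliers alert reports rows "beyond 1.5 IQR". Assuming this refers to the standard Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR), which the report does not spell out, the fences can be rebuilt from the quartiles quoted in this section and compared with the column's extremes:

```python
# Quartiles and extremes reported for peak_views in this section.
q1, q3 = 14531, 58074.25
col_min, col_max = 3613, 739258

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # values below this count as low outliers
upper_fence = q3 + 1.5 * iqr  # values above this count as high outliers

print(f"IQR: {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print("max beyond upper fence:", col_max > upper_fence)
print("min beyond lower fence:", col_min < lower_fence)
```

The recomputed IQR matches the reported 43,543.25, the maximum lies far above the upper fence, and the minimum stays inside the lower one, so the single flagged row is on the high side.
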
Fig 9.
Distribution of peak_views. Vertical dash marks the median.
Histogram bins for peak_views (median: 22111.0).
bin                      count
3613 – 1.507e+05         9
1.507e+05 – 2.979e+05    0
2.979e+05 – 4.45e+05     0
4.45e+05 – 5.921e+05     0
5.921e+05 – 7.393e+05    1

total_views numeric feature

Numeric view-count metric across just 10 rows, all distinct and non-null. Distribution is right-skewed (skew 1.57) with values ranging from 200,122 to 6,085,895 against a median of 1,195,666.5, and one row (10%) flagged as an outlier pulling the mean up to 1,864,031.7. With n=10 the shape estimates are fragile.

Treatment: Log-transform before any modelling to tame the right skew and outlier.

anthropic:claude-opus-4-7 · confidence medium
Out[21]:

saturn.columns["total_views"].stats

stat             value
n                10
nulls            0 (0.0%)
unique           10
min              200,122
max              6.086e+06
mean             1.864e+06
median           1.196e+06
std              1.73e+06
q1               9.891e+05
q3               2.168e+06
iqr              1.178e+06
skew             1.573
kurtosis         1.608
n_outliers       1
outlier_rate     0.1
zero_rate        0
alert: outliers  10.0% rows beyond 1.5 IQR
Fig 10.
Distribution of total_views. Vertical dash marks the median.
Histogram bins for total_views (median: 1195666.5).
bin                      count
2.001e+05 – 1.377e+06    6
1.377e+06 – 2.554e+06    2
2.554e+06 – 3.732e+06    1
3.732e+06 – 4.909e+06    0
4.909e+06 – 6.086e+06    1

timeline unknown other

This column, named 'timeline', was skipped by the profiler and has no computed statistics beyond a row count of 10 and a null rate of 0.0. Its kind is reported as 'unknown' and the unique count is missing, so nothing can be said about cardinality, types, or value distribution. The 'skipped' alert is the only substantive signal present.

Treatment: Re-profile with an appropriate parser before deciding on downstream use.
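
Before choosing a parser, it may help to see what the field actually holds. The sketch below tallies the Python type of one field across records; it assumes the source JSON parses to a list of objects and that timeline is some nested or non-scalar value, neither of which the report confirms. `field_type_counts` is a hypothetical helper, not part of saturn.

```python
from collections import Counter


def field_type_counts(records, field):
    """Count the Python type name of `field` across records.

    Records that lack the key are tallied under "<missing>".
    """
    counts = Counter()
    for row in records:
        if field in row:
            counts[type(row[field]).__name__] += 1
        else:
            counts["<missing>"] += 1
    return counts


# Usage (record layout is assumed; load the file with json.load first):
# import json
# with open("/home/coolhand/datasets/us-attention-data/wikipedia_event_articles.json") as fh:
#     records = json.load(fh)
# print(field_type_counts(records, "timeline"))
```

A result dominated by `list` or `dict` would suggest flattening the field into its own table before profiling, while `str` would suggest trying a date or JSON-string parser.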

anthropic:claude-opus-4-7 · confidence low
Out[24]:

saturn.columns["timeline"].stats

stat            value
n               10
nulls           0 (0.0%)
unique
alert: skipped  no profiler for kind=unknown

How to cite


BibTeX
@misc{saturn-us-attention-data-wikipedia-event-articles-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: us attention data wikipedia event articles},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/us-attention-data-wikipedia_event_articles}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: us attention data wikipedia event articles. Source: /home/coolhand/datasets/us-attention-data/wikipedia_event_articles.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/us-attention-data-wikipedia_event_articles