saturn·

us attention data wikipedia event articles

source: /home/coolhand/datasets/us-attention-data/wikipedia_event_articles.json · 10 rows · 5 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This is a small dataset of 10 Wikipedia articles tracking US public attention. Each row has an article title, three view metrics (peak_views, avg_daily_views, total_views), and a timeline field. The view metrics are heavily right-skewed: peak_views has a skew of 2.61 and a maximum of 739,258 against a median of just 22,111, which suggests that a single article dominates attention. Each numeric column flags exactly one outlier (a 10% outlier rate), so it is worth identifying which article is pulling the distribution. The article column has 10 unique values across 10 rows, so it functions as an identifier rather than a category to aggregate on.

citing: peak_views.stats.skew · peak_views.stats.max · peak_views.stats.median · peak_views.stats.n_outliers · avg_daily_views.stats.skew · avg_daily_views.stats.n_outliers · total_views.stats.skew · total_views.stats.n_outliers · article.stats.cardinality · row_count
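To find which article is pulling the distribution, one option is to rank rows by a metric and report the leader's share of the column total. This is a minimal sketch that assumes the source file is a top-level JSON array of objects keyed by the column names above (the file's layout is not shown in this profile); `load_records` and `leader_share` are hypothetical helpers, and the records in the demo are placeholders, not values from the dataset.

```python
import json


def load_records(path):
    """Load the profiled file, assuming a top-level JSON array of objects (unverified)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def leader_share(records, metric):
    """Return (article, value, share of column total) for the row with the largest metric."""
    top = max(records, key=lambda r: r[metric])
    total = sum(r[metric] for r in records)
    return top["article"], top[metric], top[metric] / total


# Hypothetical placeholder records, used only to exercise the helper.
records = [
    {"article": "Article_A", "peak_views": 900},
    {"article": "Article_B", "peak_views": 60},
    {"article": "Article_C", "peak_views": 40},
]
print(leader_share(records, "peak_views"))
```

The profile alone already hints at the answer for peak_views: the column total is 10 × 101,490.2 = 1,014,902, so the 739,258 maximum accounts for about 73% of all peak views.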

Schema

5 columns
Per-column summary (column · type · null rate · unique values · alerts):
article · categorical · 0.0% · 10 · long_tail
avg_daily_views · numeric · 0.0% · 10 · outliers
peak_views · numeric · 0.0% · 10 · high_skew, outliers
total_views · numeric · 0.0% · 10 · outliers
timeline · unknown · 0.0% · n/a · skipped

article

categorical · identifier · long_tail
This column holds Wikipedia article titles in underscore-separated form (e.g., Donald_Trump, COVID-19_pandemic, Taylor_Swift). All 10 values are distinct (unique=10, entropy_ratio=1.0), so the column works as a row identifier rather than a categorical feature. The titles span people, countries, and topics; there are no nulls, and even the most frequent value appears in only 1 of 10 rows (top_rate=0.1).
Treatment: use as a join key to Wikipedia metadata; do not one-hot encode.
high confidence · anthropic:claude-opus-4-7
n: 10
nulls: 0 (0.0%)
unique: 10
top_value: Donald_Trump
top_rate: 0.1
cardinality: 10
entropy: 3.322
entropy_ratio: 1
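The entropy figures above can be reproduced with Shannon entropy in bits, normalised by its maximum, log2(cardinality). The profiler's exact formula is not shown, so this is an assumption; it does match the reported values, since 10 equally frequent titles give log2(10) ≈ 3.322 and a ratio of 1. The titles below are hypothetical placeholders.

```python
import math
from collections import Counter


def categorical_profile(values):
    """Cardinality, Shannon entropy (bits), normalised entropy ratio, and top-value rate."""
    counts = Counter(values)
    n = len(values)
    cardinality = len(counts)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Maximum possible entropy for this many distinct values.
    max_entropy = math.log2(cardinality) if cardinality > 1 else 0.0
    entropy_ratio = entropy / max_entropy if max_entropy else 0.0
    # Ties are broken by first occurrence (Counter.most_common keeps insertion order).
    top_value, top_count = counts.most_common(1)[0]
    return {
        "cardinality": cardinality,
        "entropy": round(entropy, 3),
        "entropy_ratio": round(entropy_ratio, 3),
        "top_value": top_value,
        "top_rate": top_count / n,
    }


titles = [f"Article_{i}" for i in range(10)]  # hypothetical placeholder titles
profile = categorical_profile(titles)
print(profile)
```

Because every count is tied at 1, `top_value` here is simply the first title encountered, so in a fully unique column it carries no real signal.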

avg_daily_views

numeric · feature · outliers
Average daily views per article, with all 10 values distinct and no nulls or zeros. The distribution is right-skewed (skew 1.57): the mean of 20,484 sits well above the median of 13,139, and the maximum of 66,878 is the lone flagged outlier (10% outlier rate), against a minimum of 2,199. The standard deviation (19,006) nearly equals the mean, which signals high dispersion in a tiny sample.
Treatment: log-transform before regression to tame the right skew and the outlier.
medium confidence · anthropic:claude-opus-4-7
n: 10
nulls: 0 (0.0%)
unique: 10
min: 2,199
max: 66,878
mean: 20,484
median: 13,139
std: 1.901e+04
q1: 1.087e+04
q3: 23,820
iqr: 1.295e+04
skew: 1.573
kurtosis: 1.608
n_outliers: 1
outlier_rate: 0.1
zero_rate: 0
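The profile does not state which outlier rule it uses. Assuming the common Tukey rule (values beyond 1.5 × IQR outside the quartiles), the reported quartiles are enough to see why only the maximum is flagged. `tukey_fences` is a hypothetical helper, and q1 is taken from the rounded 1.087e+04 shown above.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Reported avg_daily_views quartiles (q1 is shown rounded in the profile).
lower, upper = tukey_fences(10_870, 23_820)
print(lower, upper)                    # -8555.0 43245.0
print(66_878 > upper, 2_199 < lower)   # True False
```

Under this assumption the maximum lies well above the upper fence of 43,245, while the lower fence is negative, so no low-side outlier is possible for a view count; that is consistent with n_outliers = 1.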

peak_views

numeric · feature · high_skew · outliers
A peak view count per article, with all 10 values distinct and no nulls. The distribution is heavily right-skewed (skew 2.61, kurtosis 4.93): the median is 22,111 while the mean is 101,490.2, and the maximum of 739,258 is roughly 13x the Q3 of 58,074.25. One outlier (10% of rows) pushes the standard deviation (225,418) far above the IQR (43,543.25).
Treatment: log-transform before any modelling or aggregation to tame the skew and the outlier.
high confidence · anthropic:claude-opus-4-7
n: 10
nulls: 0 (0.0%)
unique: 10
min: 3,613
max: 739,258
mean: 1.015e+05
median: 22,111
std: 2.254e+05
q1: 14,531
q3: 5.807e+04
iqr: 4.354e+04
skew: 2.609
kurtosis: 4.928
n_outliers: 1
outlier_rate: 0.1
zero_rate: 0
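The recommended log transform compresses the tail, but it is worth checking whether it actually removes the outlier. A rough check, assuming the same 1.5 × IQR rule (not confirmed by the profile) and approximating the quartiles of log10(peak_views) by the log of the reported quartiles, which is exact for order statistics but only approximate when quartiles are interpolated. `log_fence_check` is a hypothetical helper.

```python
import math


def log_fence_check(q1, q3, value, k=1.5):
    """Approximate Tukey upper fence on a log10 scale.

    Returns (upper fence converted back to raw units, whether value exceeds it).
    """
    lq1, lq3 = math.log10(q1), math.log10(q3)
    upper = lq3 + k * (lq3 - lq1)
    return 10 ** upper, math.log10(value) > upper


# Reported peak_views summary: q1 = 14,531, q3 = 58,074.25, max = 739,258.
fence, still_outlier = log_fence_check(14_531, 58_074.25, 739_258)
print(round(fence), still_outlier)
```

Under these assumptions the log-scale fence sits at roughly 464,000 views, below the 739,258 maximum, so the transform narrows the gap but may not eliminate the flagged outlier.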

total_views

numeric · feature · outliers
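Total views per article across just 10 rows, all distinct and non-null. The distribution is right-skewed (skew 1.57), with values ranging from 200,122 to 6,085,895 and a median of 1,195,666.5; one row (10%) is flagged as an outlier and pulls the mean up to 1,864,031.7. Its skew (1.573) and kurtosis (1.608) are identical to those of avg_daily_views, and each of its summary statistics is about 91 times the matching avg_daily_views statistic, so the two columns appear to differ only by a constant factor and should not be treated as independent features. With n=10 the shape estimates are fragile.
Treatment: log-transform before any modelling to tame the right skew and the outlier.
medium confidence · anthropic:claude-opus-4-7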
n: 10
nulls: 0 (0.0%)
unique: 10
min: 200,122
max: 6.086e+06
mean: 1.864e+06
median: 1.196e+06
std: 1.73e+06
q1: 9.891e+05
q3: 2.168e+06
iqr: 1.178e+06
skew: 1.573
kurtosis: 1.608
n_outliers: 1
outlier_rate: 0.1
zero_rate: 0
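total_views and avg_daily_views report the same skew (1.573) and kurtosis (1.608). Both statistics are unchanged by multiplying a column by a positive constant, which hints that one column may be a scaled copy of the other. Dividing the reported summary statistics checks this; `ratio_report` is a hypothetical helper, and several inputs are the rounded values printed in this profile.

```python
def ratio_report(a, b):
    """Ratio a[key] / b[key] for every statistic present in both dicts."""
    return {k: a[k] / b[k] for k in a if k in b}


# Summary statistics as reported in this profile (some are rounded).
total_views = {"min": 200_122, "q1": 9.891e5, "median": 1_195_666.5,
               "mean": 1_864_031.7, "q3": 2.168e6, "max": 6_085_895, "std": 1.73e6}
avg_daily_views = {"min": 2_199, "q1": 1.087e4, "median": 13_139,
                   "mean": 20_484, "q3": 23_820, "max": 66_878, "std": 19_006}

ratios = ratio_report(total_views, avg_daily_views)
for stat, r in ratios.items():
    print(f"{stat:>6}: {r:.2f}")
```

Every ratio lands within about 0.03% of 91, which is consistent with avg_daily_views being total_views averaged over a fixed 91-day window. That window length is inferred from these summaries only; the profile does not state how avg_daily_views was computed.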

timeline

unknown · other · skipped
The profiler skipped this column, so it has no computed statistics beyond a row count of 10 and a null rate of 0.0%. Its kind is reported as 'unknown' and its unique count is missing, so nothing can be said about its cardinality, value types, or distribution; the 'skipped' alert is the only substantive signal.
Treatment: re-profile with an appropriate parser before deciding on downstream use.
low confidence · anthropic:claude-opus-4-7
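Before choosing a parser, it helps to see what the skipped field actually contains. A minimal sketch, assuming the file is a JSON array of objects (not confirmed here): it counts the JSON types found in one field and, for non-empty lists, the type of the first element. `sniff_column` is a hypothetical helper, and the two records are placeholders that only illustrate nested values a flat profiler would not summarise.

```python
import json
from collections import Counter


def sniff_column(records, column):
    """Count value types in one field; for non-empty lists, include the first element's type."""
    kinds = Counter()
    for rec in records:
        value = rec.get(column)
        if isinstance(value, list) and value:
            kinds[f"list[{type(value[0]).__name__}]"] += 1
        else:
            kinds[type(value).__name__] += 1
    return kinds


# Hypothetical records; the real shape of "timeline" is unknown from the profile.
raw = """[
  {"article": "Article_A", "timeline": [{"date": "2024-01-01", "views": 10}]},
  {"article": "Article_B", "timeline": {"2024-01-01": 7}}
]"""
records = json.loads(raw)
print(sniff_column(records, "timeline"))
```

Running the same helper over the real records would show whether timeline needs a list-of-objects parser, a mapping parser, or something else before re-profiling.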
n: 10
nulls: 0 (0.0%)
unique