us attention data wikipedia event articles
Reading
This is a small dataset of 10 Wikipedia articles tracking US public attention, with three view metrics (peak_views, avg_daily_views, total_views) plus an article name and a timeline field. The view metrics are right-skewed, peak_views most heavily: it has a skew of 2.61 and a max of 739,258 against a median of just 22,111, which points to a single article dominating attention. Each numeric column flags exactly one outlier (10% outlier rate), so it is worth identifying which article is pulling each distribution. The article column has 10 unique values for 10 rows, so it functions as an identifier rather than a category to aggregate on.
citing: peak_views.stats.skew · peak_views.stats.max · peak_views.stats.median · peak_views.stats.n_outliers · avg_daily_views.stats.skew · avg_daily_views.stats.n_outliers · total_views.stats.skew · total_views.stats.n_outliers · article.stats.cardinality · row_count
Charts the summary said to look at first
peak_views histogram bins
| bin | count |
|---|---|
| 3613 – 1.507e+05 | 9 |
| 1.507e+05 – 2.979e+05 | 0 |
| 2.979e+05 – 4.45e+05 | 0 |
| 4.45e+05 – 5.921e+05 | 0 |
| 5.921e+05 – 7.393e+05 | 1 |
total_views histogram bins
| bin | count |
|---|---|
| 2.001e+05 – 1.377e+06 | 6 |
| 1.377e+06 – 2.554e+06 | 2 |
| 2.554e+06 – 3.732e+06 | 1 |
| 3.732e+06 – 4.909e+06 | 0 |
| 4.909e+06 – 6.086e+06 | 1 |
avg_daily_views histogram bins
| bin | count |
|---|---|
| 2199 – 1.513e+04 | 6 |
| 1.513e+04 – 2.807e+04 | 2 |
| 2.807e+04 – 4.101e+04 | 1 |
| 4.101e+04 – 5.394e+04 | 0 |
| 5.394e+04 – 6.688e+04 | 1 |
article value counts
| value | count | share |
|---|---|---|
| Donald_Trump | 1 | 10.0% |
| Joe_Biden | 1 | 10.0% |
| Climate_change | 1 | 10.0% |
| COVID-19_pandemic | 1 | 10.0% |
| Artificial_intelligence | 1 | 10.0% |
| Russia | 1 | 10.0% |
| Israel | 1 | 10.0% |
| Taylor_Swift | 1 | 10.0% |
| Elon_Musk | 1 | 10.0% |
| United_States | 1 | 10.0% |
Schema
5 columns
| column | kind | nulls | unique | alerts |
|---|---|---|---|---|
| article | categorical | 0.0% | 10 | long_tail |
| avg_daily_views | numeric | 0.0% | 10 | outliers |
| peak_views | numeric | 0.0% | 10 | high_skew, outliers |
| total_views | numeric | 0.0% | 10 | outliers |
| timeline | unknown | 0.0% | — | skipped |
article
categorical · identifier · long_tail
This column holds Wikipedia-style article titles (e.g., Donald_Trump, COVID-19_pandemic, Taylor_Swift) using underscore-separated naming. Every one of the 10 rows is unique (n_unique=10, entropy_ratio=1.0), so it functions as a row identifier rather than a categorical feature. The mix spans people, countries, and topics, with no nulls and no repeated value (top_rate=0.1). Treatment: Use as a join key to Wikipedia metadata; do not one-hot encode.
- n: 10
- nulls: 0 (0.0%)
- unique: 10
- top_value: Donald_Trump
- top_rate: 0.1
- cardinality: 10
- entropy: 3.322
- entropy_ratio: 1
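The entropy figures for this column can be reproduced from the value counts alone. The following is a minimal sketch, assuming the profiler uses base-2 Shannon entropy and defines entropy_ratio as entropy divided by log2 of the number of distinct values; the report states neither convention, and the function names are illustrative.

```python
import math

def shannon_entropy(counts):
    """Base-2 Shannon entropy of a list of value counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0  # assumption: a single-valued column is reported as 0
    return shannon_entropy(counts) / math.log2(k)

# Ten article titles, each appearing exactly once (from the value-count table).
article_counts = [1] * 10

entropy = shannon_entropy(article_counts)
ratio = entropy_ratio(article_counts)
print(round(entropy, 3), round(ratio, 3))
```

With ten equally frequent values the entropy is log2(10), which rounds to the reported 3.322, and the ratio is 1. Under these assumptions, any column in which every value is unique would score the same way.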
avg_daily_views
numeric · feature · outliers
Numeric column capturing average daily views per item, with all 10 rows unique and no nulls or zeros. The distribution is right-skewed (skew 1.57) with a mean of 20484 sitting well above the median of 13139, and a max of 66878 flagged as the lone outlier (10% outlier rate) versus a min of 2199. Standard deviation (19006) nearly matches the mean, signalling high dispersion in a tiny sample. Treatment: Log-transform before regression to tame the right skew and outlier.
- n: 10
- nulls: 0 (0.0%)
- unique: 10
- min: 2,199
- max: 66,878
- mean: 20,484
- median: 13,139
- std: 1.901e+04
- q1: 1.087e+04
- q3: 23,820
- iqr: 1.295e+04
- skew: 1.573
- kurtosis: 1.608
- n_outliers: 1
- outlier_rate: 0.1
- zero_rate: 0
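To see what the suggested log transform does to this column, it helps to apply it to the reported summary points. This is a sketch only: it uses the five-number summary above (q1 is the rounded value shown), not the underlying rows, and it relies on zero_rate being 0 so that every logarithm is defined.

```python
import math

# Summary points for avg_daily_views, copied from the stats above
# (q1 is the rounded figure 1.087e+04).
summary = {"min": 2199, "q1": 10870, "median": 13139, "q3": 23820, "max": 66878}

logged = {name: math.log10(value) for name, value in summary.items()}

# Distance from the median to each extreme, before and after the transform.
raw_lower = summary["median"] - summary["min"]
raw_upper = summary["max"] - summary["median"]
log_lower = logged["median"] - logged["min"]
log_upper = logged["max"] - logged["median"]

for name in summary:
    print(f"{name:>6}: raw={summary[name]:>7,}  log10={logged[name]:.3f}")
print(f"raw gaps:   lower={raw_lower:,}  upper={raw_upper:,}")
print(f"log10 gaps: lower={log_lower:.3f}  upper={log_upper:.3f}")
```

On the raw scale the gap above the median is several times the gap below it; on the log10 scale the two gaps are of similar size, which is the sense in which the transform tames the right tail.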
peak_views
numeric · feature · high_skew · outliers
This appears to be a peak view-count metric per item, with all 10 rows unique and no nulls. The distribution is heavily right-skewed (skew 2.61, kurtosis 4.93): the median is 22111 while the mean is 101490.2 and the max reaches 739258, roughly 13x the Q3 of 58074.25. One outlier (10% of rows) is dragging the standard deviation (225418) far above the IQR (43543.25). Treatment: Log-transform before any modelling or aggregation to tame the skew and outlier.
- n: 10
- nulls: 0 (0.0%)
- unique: 10
- min: 3,613
- max: 739,258
- mean: 1.015e+05
- median: 22,111
- std: 2.254e+05
- q1: 14,531
- q3: 5.807e+04
- iqr: 4.354e+04
- skew: 2.609
- kurtosis: 4.928
- n_outliers: 1
- outlier_rate: 0.1
- zero_rate: 0
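The report does not say which rule produces n_outliers. As a sketch, assuming the common Tukey rule (a value is an outlier if it lies more than 1.5 × IQR above Q3), the reported quartiles are enough to check that the maximum is flagged. The helper name is illustrative.

```python
def tukey_upper_fence(q1, q3, k=1.5):
    """Upper outlier fence q3 + k * IQR (Tukey's rule; k=1.5 is an assumption)."""
    return q3 + k * (q3 - q1)

# peak_views quartiles and max as reported in the column summary.
q1, q3, max_value = 14531.0, 58074.25, 739258.0

fence = tukey_upper_fence(q1, q3)
print(f"upper fence = {fence:,.3f}")
print(f"max exceeds fence: {max_value > fence}")
print(f"max is {max_value / fence:.1f}x the fence")
```

Under this assumption the fence sits at 123,389.125, so the maximum of 739,258 lies far outside it. The same check can be repeated for avg_daily_views and total_views with their own quartiles.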
total_views
numeric · feature · outliers
Numeric view-count metric across just 10 rows, all distinct and non-null. Distribution is right-skewed (skew 1.57) with values ranging from 200,122 to 6,085,895 against a median of 1,195,666.5, and one row (10%) flagged as an outlier pulling the mean up to 1,864,031.7. With n=10 the shape estimates are fragile. Treatment: Log-transform before any modelling to tame the right skew and outlier.
- n: 10
- nulls: 0 (0.0%)
- unique: 10
- min: 200,122
- max: 6.086e+06
- mean: 1.864e+06
- median: 1.196e+06
- std: 1.73e+06
- q1: 9.891e+05
- q3: 2.168e+06
- iqr: 1.178e+06
- skew: 1.573
- kurtosis: 1.608
- n_outliers: 1
- outlier_rate: 0.1
- zero_rate: 0
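The skew (1.573) and kurtosis (1.608) reported here are identical to those of avg_daily_views, which is what one would expect if one column were a constant multiple of the other. The sketch below tests that idea using only the summary statistics quoted in the two column reports. It is an inference from the summaries, not something the report states, and the row-level data would be needed to confirm it.

```python
# Summary statistics copied from the avg_daily_views and total_views reports.
avg_daily = {"min": 2199, "max": 66878, "mean": 20484, "median": 13139}
total = {"min": 200122, "max": 6085895, "mean": 1864031.7, "median": 1195666.5}

# If total_views = c * avg_daily_views for every row, each of these
# statistics scales by the same constant c.
ratios = {name: total[name] / avg_daily[name] for name in avg_daily}
for name, r in ratios.items():
    print(f"{name:>6}: total / avg_daily = {r:.2f}")

spread = max(ratios.values()) - min(ratios.values())
print(f"ratio spread = {spread:.3f}")
```

All four ratios come out close to 91, which is consistent with a fixed observation window of about 91 days. If that holds, the two columns carry the same information and only one of them is needed as a model feature.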
timeline
unknown · other · skipped
This column, named 'timeline', was skipped by the profiler and has no computed statistics beyond a row count of 10 and a null rate of 0.0. Its kind is reported as 'unknown' and the unique count is missing, so nothing can be said about cardinality, types, or value distribution. The 'skipped' alert is the only substantive signal present. Treatment: Re-profile with an appropriate parser before deciding on downstream use.
- n: 10
- nulls: 0 (0.0%)
- unique: —
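Before choosing a parser, a quick census of what the cells actually contain can help. The sketch below is one way to do that, assuming the column can be loaded as a list of Python objects. The sample cells are hypothetical and exist only to exercise the function; the real timeline values do not appear in this report.

```python
import json
from collections import Counter

def cell_kind(cell):
    """Classify one cell: its Python type name, or 'json:<type>' for JSON text."""
    if isinstance(cell, str):
        try:
            return f"json:{type(json.loads(cell)).__name__}"
        except json.JSONDecodeError:
            return "str"
    return type(cell).__name__

def kind_census(cells):
    """Count how many cells fall into each kind."""
    return Counter(cell_kind(c) for c in cells)

# HYPOTHETICAL cells for illustration only, not taken from the dataset.
sample = ['[{"date": "2024-01-01", "views": 100}]', [1, 2, 3], "not json", None]
census = kind_census(sample)
print(census)
```

If the census shows a single dominant kind (for example, every cell is a JSON-encoded list), that suggests which parser to apply before re-profiling. A mixed census suggests the column needs cleaning first.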