saturn·

satire theonion index to dataset

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/entertainment/satire/theonion_index_to_dataset.csv

Saturn profiled 2,103 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
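# Install the profiler. The leading "!" is an IPython shell escape, so this line only works in a notebook cell.
!pip install saturn-dissect
import subprocess

# Invoke the saturn CLI on the source CSV with the flags used for this report.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/entertainment/satire/theonion_index_to_dataset.csv",
    "--findings", "satire-theonion_index_to_dataset.json",
    "--llm", "anthropic:claude-opus-4-7",
])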

Summary confidence: high

This dataset contains 2,103 rows and three columns scraped from The Onion: a numeric index, a satirical headline, and an associated image URL. The headline column is the most substantive: it has 7,613 unique vocabulary tokens, a median of 9 words per headline, and a mean Flesch reading-ease score of about 46.9, which falls in the 30 to 50 "difficult" band of that scale. The image URL column is uniform in structure (every value is a single URL averaging about 108 characters) but has a duplicate rate of roughly 9.9% (209 of 2,103 rows), with one image reused 11 times; that is worth a look if you are checking scrape integrity. The numeric index column is a gap-free sequence from 2 to 2,104 with no outliers and is essentially a row identifier. The sequence starting at 2, together with column names that are themselves a record (index 1, a headline, and an image URL), suggests the file was loaded without a header row.

citing: row_count · column_count · columns[0].stats.duplicate_rate · columns[0].stats.n_duplicates · columns[0].stats.len_mean · columns[0].stats.url_rate · columns[1].stats.word_median · columns[1].stats.vocab_size · columns[1].stats.readability_flesch_mean · columns[1].top_words · columns[2].stats.min · columns[2].stats.max · columns[2].stats.mean
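
The duplicate figures quoted in the summary are consistent with counting every row beyond the first copy of a value as a duplicate, so that n_duplicates = n - unique. The sketch below works through that arithmetic; the function name `duplicate_stats` is hypothetical, and this is an assumption about how the stats relate, not a description of saturn's internals.

```python
from collections import Counter

def duplicate_stats(values):
    """Return (n_duplicates, duplicate_rate, top_repeat_count) for a column."""
    n = len(values)
    counts = Counter(values)
    n_duplicates = n - len(counts)      # rows beyond the first copy of each value
    rate = n_duplicates / n if n else 0.0
    top_repeat = max(counts.values(), default=0)
    return n_duplicates, rate, top_repeat

# Reported counts for the image URL column: 2,103 rows, 1,894 unique values.
n, unique = 2103, 1894
print(n - unique, round((n - unique) / n, 5))   # 209 0.09938
```

Applied to the real column, `top_repeat` would be the figure to compare against the reported 11 reuses of a single image.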

Fig 1.
‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident · Distribution of headline character lengths: check the spread around the 58-character median and the long tail up to 926.
Character-length distribution for ‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident (mean: 60.99048977650975).
chars        count
8 – 31       131
31 – 54      781
54 – 77      719
77 – 100     353
100 – 123    99
123 – 146    13
146 – 169    4
169 – 238    0 (3 empty bins)
238 – 260    1
260 – 283    1
283 – 903    0 (27 empty bins)
903 – 926    1
Fig 2.
‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident · Top words across headlines highlight the common connectors (to, of, in) that dominate Onion-style phrasing.
Fig 3.
https://i.kinja-img.com/gawker-media/image/upload/c_fit,f_auto,g_center,q_60,w_645/a30d02609da7e7ddb46d6edea9460d5e.jpg · Most-repeated image URLs: useful for spotting the ~10% duplicate images, including one used 11 times.
Character-length distribution for https://i.kinja-img.com/gawker-media/image/upload/c_fit,f_auto,g_center,q_60,w_645/a30d02609da7e7ddb46d6edea9460d5e.jpg (mean: 107.88445078459344).
chars        count
107 – 107    1948
107 – 119    0 (38 empty bins)
119 – 119    155
Fig 4.
1 · The index column is a uniform 2–2104 sequence; flat shape confirms it is just a row identifier.
Histogram bins for 1 (median: 1053.0).
40 equal-width bins from 2 to 2104 (width about 52.55). Every bin holds 52 or 53 rows: 23 bins with 53 and 17 bins with 52.
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column  kind  null %
1  numeric  0.0%
‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident  text  0.0%
https://i.kinja-img.com/gawker-media/image/upload/c_fit,f_auto,g_center,q_60,w_645/a30d02609da7e7ddb46d6edea9460d5e.jpg  text  0.0%

1 · numeric · identifier

Column '1' holds unique integers running from 2 to 2,104 across all 2,103 rows, with zero nulls and a symmetric, flat distribution (skew 0.0, kurtosis -1.2, mean = median = 1,053). The contiguous range and one-to-one cardinality strongly suggest a row index or sequential identifier rather than a measured feature. The name '1' and the start at 2 are consistent with the first record having been read as the header row.

Treatment: drop before modelling; it is a sequential row id with no predictive content.
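
The row-identifier reading can be checked directly: a sequential id is a set of distinct integers that covers its own range with no gaps. The sketch below is a minimal check under that definition, with a hypothetical function name (`looks_like_row_id`); it also reproduces the reported mean and standard deviation from the range alone, assuming the reported std is the sample standard deviation.

```python
import statistics

def looks_like_row_id(values):
    """True if values are distinct integers covering a contiguous range."""
    values = list(values)
    if not values or any(not isinstance(v, int) for v in values):
        return False
    distinct = len(set(values)) == len(values)
    contiguous = max(values) - min(values) + 1 == len(values)
    return distinct and contiguous

ids = list(range(2, 2105))                 # 2..2104 inclusive, as reported
print(len(ids), looks_like_row_id(ids))    # 2103 True
print(round(statistics.stdev(ids), 1))     # 607.2, matching the reported std
```

Because the mean and spread follow entirely from the endpoints, the column carries no information beyond row order.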

anthropic:claude-opus-4-7 · confidence high
Out[11]:

saturn.columns["1"].stats

stat           value
n              2,103
nulls          0 (0.0%)
unique         2,103
min            2
max            2,104
mean           1,053
median         1,053
std            607.2
q1             527.5
q3             1,578
iqr            1,051
skew           0
kurtosis       -1.2
n_outliers     0
outlier_rate   0
zero_rate      0
Fig 6.
Distribution of 1. Vertical dash marks the median.

‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident · text · free_text

This column holds short English headlines (mean 9.27 words, median length 58 characters); the column name is itself one such headline. The maximum length of 926 characters, against a 95th percentile of 102, points to at least one entry that is far longer than a headline. Of 2,103 rows, 2,095 are unique and 8 are duplicates, so cardinality is near-unique. The top words are common function words such as 'to', 'of', and 'in', consistent with news titles. Flesch reading ease averages 46.87, in the 30 to 50 "difficult" band, and there are no URLs, emoji, or empty strings.

Treatment: tokenize and embed before modelling; do not use as a categorical key given near-unique values.
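
As a rough illustration of the tokenize step, the sketch below splits headlines into lowercase word tokens and derives a median word count and a vocabulary size. The regular expression and the function names (`tokenize`, `headline_stats`) are assumptions made for this example; saturn's own tokenizer is not documented here, so its counts may differ, and the embedding step is left out.

```python
import re
import statistics

# Letters and digits, optionally followed by an apostrophe suffix ("that'll").
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)?")

def tokenize(headline):
    """Lowercase word tokens; punctuation and currency symbols are dropped."""
    return [t.lower() for t in TOKEN_RE.findall(headline)]

def headline_stats(headlines):
    """Return (median word count, vocabulary size) over a list of headlines."""
    token_lists = [tokenize(h) for h in headlines]
    vocab = {tok for toks in token_lists for tok in toks}
    return statistics.median(len(t) for t in token_lists), len(vocab)

header = "‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident"
print(tokenize(header)[:3], len(tokenize(header)))
```

The token lists, not the raw strings, are what a downstream embedding model would consume.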

anthropic:claude-opus-4-7 · confidence high
Out[14]:

saturn.columns["‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident"].stats

stat                      value
n                         2,103
nulls                     0 (0.0%)
unique                    2,095
len_min                   8
len_max                   926
len_mean                  60.99
len_median                58
len_p95                   102
word_mean                 9.268
word_median               9
n_empty                   0
n_duplicates              8
duplicate_rate            0.003804
vocab_size                7,613
readability_flesch_mean   46.87
emoji_rate                0
url_rate                  0
one_word_rate             0.0004755
allcaps_rate              0.0004755
boilerplate_rate          0.0004755
alert: near_unique · 99.6% of rows are unique strings
Fig 7.
Character-length distribution for ‘That’ll Be $3,’ Says Trump After Handing Water Bottle To Sick Ohio Resident.

https://i.kinja-img.com/gawker-media/image/upload/c_fit,f_auto,g_center,q_60,w_645/a30d02609da7e7ddb46d6edea9460d5e.jpg · text · metadata

This column holds image URLs. The header value points to the Kinja (Gawker Media) CDN with the transform path c_fit,f_auto,g_center,q_60,w_645 and a hashed filename, and the stats (url_rate 1.0, one_word_rate 1.0) confirm that every row is a single URL; whether every row shares that exact path cannot be confirmed from the stats alone. Lengths take only two values in the histogram: 1,948 URLs are 107 characters long and 155 are 119 (min 107, max 119, median 107). Across 2,103 rows there are 1,894 unique values, a 9.9% duplicate_rate (209 duplicates), and the top URL recurs 11 times. The column header is itself a URL, which suggests the file was loaded without a proper header row.

Treatment: Treat as an image asset reference; fix the header, then either drop or fetch/encode the image separately rather than feeding the URL string into a model.
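
One way to fix the header is to re-read the file so that every record is treated as data and explicit column names are attached. The sketch below uses only the standard library; the names `index`, `headline`, and `image_url` are hypothetical labels chosen for this example, and the file is assumed to be comma-delimited with exactly three fields per record.

```python
import csv
import io

# Hypothetical column names; the source file does not define any.
COLUMN_NAMES = ["index", "headline", "image_url"]

def read_headerless_csv(handle, names=COLUMN_NAMES):
    """Read every record as data and attach explicit column names."""
    rows = []
    for record in csv.reader(handle):
        if not record:                    # skip blank lines
            continue
        if len(record) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(record)}")
        rows.append(dict(zip(names, record)))
    return rows

# Tiny stand-in for the real file, which has no header row.
sample = io.StringIO(
    "1,First headline,https://example.com/a.jpg\n"
    "2,Second headline,https://example.com/b.jpg\n"
)
rows = read_headerless_csv(sample)
print(len(rows), rows[0]["index"])        # 2 1
```

For the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where the UTF-8 encoding is itself an assumption. Re-profiling after this fix should report one more row than before, with the index starting at 1.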

anthropic:claude-opus-4-7 · confidence high
Out[17]:

saturn.columns["https://i.kinja-img.com/gawker-media/image/upload/c_fit,f_auto,g_center,q_60,w_645/a30d02609da7e7ddb46d6edea9460d5e.jpg"].stats

stat                      value
n                         2,103
nulls                     0 (0.0%)
unique                    1,894
len_min                   107
len_max                   119
len_mean                  107.9
len_median                107
len_p95                   119
word_mean                 1
word_median               1
n_empty                   0
n_duplicates              209
duplicate_rate            0.09938
vocab_size                1,894
readability_flesch_mean   -1672
emoji_rate                0
url_rate                  1
one_word_rate             1
allcaps_rate              0
boilerplate_rate          0
alert: one_word · 100.0% rows are a single word
alert: url_heavy · 100.0% rows contain a URL
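
The readability_flesch_mean of -1672 is not a data error so much as a sign that the metric does not apply to URLs. The standard Flesch Reading Ease formula penalizes syllables per word, and a URL counted as one word can be assigned many syllables. The sketch below shows the formula; the syllable count of 22 is an illustrative assumption, since how saturn counts syllables is not stated in this report.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch Reading Ease score from raw counts."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# A URL treated as one sentence containing one "word": the score is driven
# almost entirely by the syllables assigned to that single token.
# 22 syllables is an assumed value for illustration only.
print(round(flesch_reading_ease(1, 1, 22)))   # -1655
```

A score in this range carries no meaning for the column, so the value is best ignored when the url_heavy alert is raised.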
Fig 8.
Character-length distribution for https://i.kinja-img.com/gawker-media/image/upload/c_fit,f_auto,g_center,q_60,w_645/a30d02609da7e7ddb46d6edea9460d5e.jpg.

How to cite

BibTeX
@misc{saturn-satire-theonion-index-to-dataset-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: satire theonion index to dataset},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/satire-theonion_index_to_dataset}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: satire theonion index to dataset. Source: /home/coolhand/html/datavis/data_trove/entertainment/satire/theonion_index_to_dataset.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/satire-theonion_index_to_dataset