saturn·

quirky social actions

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json

Saturn profiled 2,000 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
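# Install saturn-dissect, then profile the source file with the saturn CLI.
!pip install saturn-dissect
import subprocess

subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json",
        "--findings", "quirky-social_actions.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the command exits non-zero
)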

Summary confidence: high

This dataset has 2,000 rows and 3 columns: a numeric `count` and two near-identical text fields (`name` and `full`) that look like short phrases about social behavior. The `count` column is extremely right-skewed (skew 6.26, kurtosis 76.6), with a median of 14, a max of 461, and 85 outliers, so inspect the tail before averaging. The two text columns are essentially twins: same length profile (mean ~28 chars, ~4.5 words), same top words (`your`, `being`, `to`, `a`), and nearly equal vocabulary sizes (1,628 for `full` vs 1,626 for `name`), suggesting `full` may be a near-duplicate or light reformat of `name`. Start by inspecting the `count` distribution on a log scale and spot-checking a few rows to see how `name` and `full` actually differ.

citing: row_count · column_count · columns.count.stats.skew · columns.count.stats.kurtosis · columns.count.stats.median · columns.count.stats.max · columns.count.stats.n_outliers · columns.full.stats.len_mean · columns.full.stats.word_mean · columns.full.top_words · columns.name.stats.len_mean · columns.name.top_words · columns.full.stats.vocab_size · columns.name.stats.vocab_size
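
The spot check suggested in the summary can be sketched in a few lines of standard-library Python. This is a minimal sketch and not part of Saturn: it assumes the JSON file is an array of objects with `name`, `count`, and `full` keys (the report does not state the file layout), and the helper names `load_rows`, `classify_pair`, and `summarize` are hypothetical. The two toy rows are invented for illustration and are not taken from the dataset.

```python
import json
from collections import Counter


def load_rows(path):
    """Load the file, assuming it is a JSON array of {name, count, full} objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def classify_pair(name, full):
    """Label how `name` relates to `full` for a single row."""
    if name == full:
        return "identical"
    if name.strip().lower() == full.strip().lower():
        return "case_or_whitespace"
    # Assumption: a trailing ellipsis or dots on `name` mark a truncation.
    stem = name.rstrip(".… ")
    if stem and full.startswith(stem):
        return "name_is_prefix"
    return "different"


def summarize(rows):
    """Count rows per relationship label."""
    return Counter(classify_pair(r["name"], r["full"]) for r in rows)


# Toy rows, invented for illustration only.
toy = [
    {"name": "being late", "count": 12, "full": "being late"},
    {"name": "talking over", "count": 9, "full": "talking over people in meetings"},
]
print(summarize(toy))
```

Against the real file, `summarize(load_rows(path))` would show how many rows fall into each label, which is a quick way to test the near-duplicate hypothesis before reading individual rows.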

Out[4]:

saturn.schema() · 3 columns

column kind n null% unique alerts
name text 2,000 0.0% 2,000 near_unique
count numeric 2,000 0.0% 99 high_skew
full text 2,000 0.0% 2,000 near_unique
Fig 1.
count · Look for the heavy right tail — most values cluster low but a few reach into the hundreds.
Histogram bins for count (median: 14.0).
bin              count
8 – 19.32        1268
19.32 – 30.65    274
30.65 – 41.97    111
41.97 – 53.3     251
53.3 – 64.62     40
64.62 – 75.95    12
75.95 – 87.27    12
87.27 – 98.6     9
98.6 – 109.9     4
109.9 – 121.2    1
121.2 – 132.6    1
132.6 – 143.9    4
143.9 – 155.2    3
155.2 – 166.5    3
166.5 – 177.9    1
177.9 – 189.2    0
189.2 – 200.5    0
200.5 – 211.8    1
211.8 – 223.2    3
223.2 – 234.5    0
234.5 – 245.8    0
245.8 – 257.1    0
257.1 – 268.5    0
268.5 – 279.8    0
279.8 – 291.1    0
291.1 – 302.4    0
302.4 – 313.8    0
313.8 – 325.1    0
325.1 – 336.4    1
336.4 – 347.8    0
347.8 – 359.1    0
359.1 – 370.4    0
370.4 – 381.7    0
381.7 – 393      0
393 – 404.4      0
404.4 – 415.7    0
415.7 – 427      0
427 – 438.3      0
438.3 – 449.7    0
449.7 – 461      1
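
The bin edges above are consistent with 40 equal-width bins of (461 − 8) / 40 = 11.325, and the first bin alone holds 1,268 of the 2,000 rows, so the tail is hard to read at this scale. One way to follow the log-scale suggestion is to bin `count` on logarithmically spaced edges. The sketch below uses only the standard library; `log_bins` is a hypothetical helper, not a Saturn function, and the sample values are made up to mimic the reported minimum and maximum.

```python
import math
from bisect import bisect_right


def log_bins(values, n_bins=10):
    """Histogram strictly positive values on logarithmically spaced bin edges."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("log bins need strictly positive values")
    step = (math.log(hi) - math.log(lo)) / n_bins
    edges = [math.exp(math.log(lo) + i * step) for i in range(n_bins + 1)]
    edges[0], edges[-1] = lo, hi  # pin the ends against floating-point drift
    counts = [0] * n_bins
    for v in values:
        # Index of the bin whose left edge is <= v; the maximum is clamped into the last bin.
        idx = min(bisect_right(edges, v) - 1, n_bins - 1)
        counts[idx] += 1
    return edges, counts


# Made-up sample, not rows from the dataset.
edges, counts = log_bins([8, 8, 10, 14, 28, 461], n_bins=4)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:8.2f} – {right:8.2f}  {c}")
```

Log-spaced bins spread the dense low end over several bins while keeping the few large values visible, which a linear 40-bin view cannot do for this column.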
Fig 2.
full · Phrase-length distribution centered around 24 characters with a p95 near 59.
Character-length distribution for full (mean: 28.2645).
chars        count
5 – 8        14
8 – 10       29
10 – 13      51
13 – 16      187
16 – 18      259
18 – 21      157
21 – 23      222
23 – 26      151
26 – 29      183
29 – 31      138
31 – 34      92
34 – 36      93
36 – 39      75
39 – 42      61
42 – 44      47
44 – 47      28
47 – 50      30
50 – 52      30
52 – 55      24
55 – 58      19
58 – 60      20
60 – 63      9
63 – 65      19
65 – 68      11
68 – 71      20
71 – 73      8
73 – 76      3
76 – 78      4
78 – 81      3
81 – 84      1
84 – 86      0
86 – 89      1
89 – 92      0
92 – 94      2
94 – 97      2
97 – 100     4
100 – 102    0
102 – 105    1
105 – 107    1
107 – 110    1
Fig 3.
name · Compare to `full` — the length profile is nearly identical, hinting these columns overlap.
Character-length distribution for name (mean: 28.153).
chars      count
5 – 7      5
7 – 9      20
9 – 11     18
11 – 12    51
12 – 14    103
14 – 16    168
16 – 18    175
18 – 20    80
20 – 22    160
22 – 24    139
24 – 26    151
26 – 28    136
28 – 29    87
29 – 31    98
31 – 33    92
33 – 35    27
35 – 37    66
37 – 39    55
39 – 41    55
41 – 42    44
42 – 44    29
44 – 46    28
46 – 48    21
48 – 50    9
50 – 52    22
52 – 54    23
54 – 56    16
56 – 58    12
58 – 59    12
59 – 61    15
61 – 63    7
63 – 65    7
65 – 67    14
67 – 69    9
69 – 71    15
71 – 72    4
72 – 74    6
74 – 76    2
76 – 78    3
78 – 80    16
Fig 4.
full · Top words like `your`, `being`, and `people` reveal the dataset's social/behavioral framing.
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column  kind     null %
name    text     0.0%
count   numeric  0.0%
full    text     0.0%

name text free_text

Despite the column header 'name', the values are short free-text phrases averaging 4.48 words (median 4) and up to 80 characters, with top tokens like 'your', 'being', 'to', and 'people' suggesting descriptive statements rather than proper names. All 2000 rows are unique with zero nulls or duplicates, and a Flesch readability of 51.7 indicates ordinary prose rather than identifiers.

Treatment: Tokenize and embed before modelling; do not treat as a key despite the column name.
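
As a rough illustration of the tokenization step, the sketch below lowercases each phrase, splits it into word tokens with a regular expression, and reports the vocabulary size, mean words per phrase, and most frequent tokens. Saturn's own tokenizer is not documented in this report, so the regex and the helper name `token_stats` are assumptions, and the numbers it produces may not match `vocab_size` or `word_mean` exactly. Embedding is left out because it depends on a model choice the report does not make.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")  # assumed token pattern, not Saturn's


def tokenize(text):
    """Lowercase a phrase and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def token_stats(phrases, top_n=5):
    """Return (vocab_size, mean words per phrase, top tokens) for a list of phrases."""
    counts = Counter()
    total_words = 0
    for phrase in phrases:
        tokens = tokenize(phrase)
        counts.update(tokens)
        total_words += len(tokens)
    mean_words = total_words / len(phrases) if phrases else 0.0
    return len(counts), mean_words, counts.most_common(top_n)


# Invented phrases for illustration only.
print(token_stats(["Being late", "being on your phone"]))
```

The token lists returned by `tokenize` are what an embedding model or a bag-of-words vectorizer would consume, which is why the column should not be used as a join key even though it is unique per row.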

anthropic:claude-opus-4-7 · confidence medium
Out[11]:

saturn.columns["name"].stats

stat                     value
n                        2,000
nulls                    0 (0.0%)
unique                   2,000
len_min                  5
len_max                  80
len_mean                 28.15
len_median               24
len_p95                  59
word_mean                4.48
word_median              4
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,626
readability_flesch_mean  51.69
emoji_rate               0
url_rate                 0
one_word_rate            0.0215
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
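
Several of the length statistics in this table can be approximated with the standard library. The sketch below is an assumption-laden illustration, not Saturn's implementation: `length_profile` is a hypothetical helper, words are split on whitespace, and the 95th percentile uses the nearest-rank method, which may differ from whatever interpolation Saturn applies.

```python
import math
import statistics


def length_profile(phrases):
    """Character- and word-length summary for a non-empty list of strings."""
    lens = sorted(len(p) for p in phrases)
    words = [len(p.split()) for p in phrases]
    # Nearest-rank p95: smallest length with at least 95% of values at or below it.
    p95 = lens[max(0, math.ceil(0.95 * len(lens)) - 1)]
    return {
        "len_min": lens[0],
        "len_max": lens[-1],
        "len_mean": statistics.fmean(lens),
        "len_median": statistics.median(lens),
        "len_p95": p95,
        "word_mean": statistics.fmean(words),
        "one_word_rate": sum(1 for w in words if w == 1) / len(words),
    }


# Invented phrases for illustration only.
print(length_profile(["hi", "being late", "a b c"]))
```

Running the same function over both `name` and `full` would make it easy to see where the two profiles diverge, for example in `len_max`.
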
Fig 6.
Character-length distribution for name.

count numeric feature

A positive integer count (minimum 8) with 99 distinct values across 2,000 rows and no nulls or zeros. The distribution is severely right-skewed (skew 6.26, kurtosis 76.6): the median is 14 and Q3 is 28, yet the max reaches 461, producing 85 outliers (4.25%). The mean (23.17) sits well above the median, confirming a heavy right tail rather than a symmetric spread.

Treatment: log1p-transform before modelling to tame the heavy right tail.
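
To illustrate what the transform does to the tail, the sketch below applies `math.log1p` and compares skewness before and after. It is a minimal sketch: `skewness` here is the population (biased) moment estimator, which may not be the exact estimator Saturn reports, and the sample values are made up to echo the reported minimum, median, Q3, and maximum.

```python
import math


def skewness(xs):
    """Population skewness: third central moment divided by the variance to the power 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 else 0.0


def log1p_transform(xs):
    """Apply log(1 + x) element-wise; defined at x = 0, unlike a plain log."""
    return [math.log1p(x) for x in xs]


# Made-up sample shaped like the column, not rows from the dataset.
sample = [8] * 50 + [14] * 30 + [28] * 15 + [461]
print("raw skew:  ", round(skewness(sample), 2))
print("log1p skew:", round(skewness(log1p_transform(sample)), 2))
```

The transform compresses large values far more than small ones, so a single extreme row no longer dominates means, variances, or distance-based models. Predictions made on the transformed scale can be mapped back with `math.expm1`.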

anthropic:claude-opus-4-7 · confidence high
Out[14]:

saturn.columns["count"].stats

stat          value
n             2,000
nulls         0 (0.0%)
unique        99
min           8
max           461
mean          23.17
median        14
std           24.32
q1            10
q3            28
iqr           18
skew          6.264
kurtosis      76.59
n_outliers    85
outlier_rate  0.0425
zero_rate     0
alert: high_skew · skew=+6.26
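
The outlier count can be sanity-checked against the quartiles in this table. Assuming Saturn flags outliers with the common 1.5 × IQR (Tukey) rule, which this report does not confirm, the fences work out as shown below; `tukey_fences` and `count_outliers` are hypothetical helpers.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count values strictly outside the fences."""
    lo, hi = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < lo or v > hi)


lower, upper = tukey_fences(q1=10, q3=28)  # quartiles from the stats table
print(lower, upper)  # -17.0 55.0
```

Under that assumption only values above 55 can be outliers, since the minimum of 8 sits well above the lower fence. The histogram bins starting at 53.3 hold 96 rows in total, so 85 rows above 55 is plausible.
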
Fig 7.
Distribution of count. Vertical dash marks the median.

full text free_text

The `full` column holds short English phrases averaging 4.5 words (median 4) and 28 characters, with all 2,000 rows unique and no duplicates or empties. Top words like "your", "being", "having", and "people" suggest these are descriptive statements or prompts rather than names or codes. A mean Flesch readability of 51.6 indicates fairly plain prose, and the vocabulary of 1,628 distinct words across 2,000 short rows points to varied but thematically related content.

Treatment: Tokenize and embed before modelling; do not treat as a categorical.
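
The reason not to treat this column as a categorical is its cardinality: with as many distinct values as rows, every category would appear exactly once. A quick check along those lines is sketched below. The helper `uniqueness` and its 0.95 threshold are assumptions for illustration; the report does not say what cutoff triggers Saturn's `near_unique` alert.

```python
def uniqueness(values, threshold=0.95):
    """Return (unique_ratio, duplicate_rate, near_unique flag) for a column."""
    n = len(values)
    if n == 0:
        return 0.0, 0.0, False
    n_unique = len(set(values))
    unique_ratio = n_unique / n
    duplicate_rate = (n - n_unique) / n
    return unique_ratio, duplicate_rate, unique_ratio >= threshold


# Invented values for illustration only.
print(uniqueness(["a", "b", "c", "a"]))
print(uniqueness(["a", "b"]))
```

A column that comes back with a ratio of 1.0, as `full` does here, carries no grouping signal as a category, so its information has to come from the text itself.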

anthropic:claude-opus-4-7 · confidence high
Out[17]:

saturn.columns["full"].stats

stat                     value
n                        2,000
nulls                    0 (0.0%)
unique                   2,000
len_min                  5
len_max                  110
len_mean                 28.26
len_median               24
len_p95                  59
word_mean                4.495
word_median              4
n_empty                  0
n_duplicates             0
duplicate_rate           0
vocab_size               1,628
readability_flesch_mean  51.63
emoji_rate               0
url_rate                 0
one_word_rate            0.0215
allcaps_rate             0
boilerplate_rate         0
alert: near_unique · 100.0% of rows are unique strings
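
For readers unfamiliar with the readability score in this table, the Flesch reading ease formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The sketch below applies it with a crude vowel-run syllable counter. Both the syllable heuristic and the sentence splitting are assumptions, and the name `readability_flesch_mean` suggests Saturn averages a per-row score, so this will not reproduce the reported value exactly.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel runs, with a floor of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch reading ease for one text; higher scores mean easier reading."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    # A phrase with no terminal punctuation is treated as one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


# Invented phrase for illustration only.
print(round(flesch_reading_ease("the cat sat"), 2))
```

Because the phrases in this column are only four or five words long, the words-per-sentence term stays small and the score is driven mostly by syllables per word.
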
Fig 8.
Character-length distribution for full.

How to cite

BibTeX
@misc{saturn-quirky-social-actions-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky social actions},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-social_actions}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky social actions. Source: /home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-social_actions