quirky social actions

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json

Saturn profiled 2,000 rows across 3 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

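%pip install saturn-dissect
import subprocess

# Run the saturn CLI on the source JSON. --findings appears to name the findings
# output file; --llm selects the model used for the opt-in prose layer.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json",
        "--findings", "quirky-social_actions.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if saturn exits with a non-zero status
)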

Summary confidence: high

This dataset has 2,000 rows and 3 columns: a numeric `count` and two near-identical text fields (`name` and `full`) that look like short phrases about social behavior. The `count` column is extremely right-skewed (skew 6.26, kurtosis 76.6), with a median of 14 but a max of 461 and 85 outliers, so investigate it before any averaging. The two text columns are essentially twins: the same length profile (mean ~28 chars, ~4.5 words), the same top words (`your`, `being`, `to`, `a`), and nearly equal vocabulary sizes (1,628 vs 1,626), suggesting `full` may be a near-duplicate or light reformat of `name`. Start by inspecting the `count` distribution on a log scale and spot-checking a few rows to see how `name` and `full` actually differ.

citing: row_count · column_count · columns.count.stats.skew · columns.count.stats.kurtosis · columns.count.stats.median · columns.count.stats.max · columns.count.stats.n_outliers · columns.full.stats.len_mean · columns.full.stats.word_mean · columns.full.top_words · columns.name.stats.len_mean · columns.name.top_words · columns.full.stats.vocab_size · columns.name.stats.vocab_size
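The `citing:` line lists dotted paths into the findings output. A minimal sketch of how such paths could be resolved, assuming (hypothetically) that the findings JSON is a nested object whose keys mirror those paths; the helper `resolve` and the trimmed `findings` dictionary are illustrative, with values copied from this report rather than read from the real file.

```python
from typing import Any


def resolve(findings: dict, path: str) -> Any:
    """Follow a dotted path such as 'columns.count.stats.skew' through nested dicts."""
    node: Any = findings
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"{path!r}: missing key {key!r}")
        node = node[key]
    return node


# Hypothetical excerpt of a findings file; values are taken from this report.
findings = {
    "row_count": 2000,
    "column_count": 3,
    "columns": {
        "count": {"stats": {"skew": 6.264, "median": 14, "max": 461, "n_outliers": 85}},
        "name": {"stats": {"vocab_size": 1626}},
        "full": {"stats": {"vocab_size": 1628}},
    },
}

for cited in ["row_count", "columns.count.stats.skew", "columns.full.stats.vocab_size"]:
    print(cited, "=", resolve(findings, cited))
```

Raising `KeyError` on a missing path makes a stale or mistyped citation fail loudly instead of silently citing nothing.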

Out[4]:

saturn.schema() · 3 columns

column	kind	n	null%	unique	alerts
name	text	2,000	0.0%	2,000	near_unique
count	numeric	2,000	0.0%	99	high_skew
full	text	2,000	0.0%	2,000	near_unique

Fig 1.

count · Look for the heavy right tail — most values cluster low but a few reach into the hundreds.


Histogram bins for count (median: 14.0).
bin (count value)	rows
8 – 19.32	1268
19.32 – 30.65	274
30.65 – 41.97	111
41.97 – 53.3	251
53.3 – 64.62	40
64.62 – 75.95	12
75.95 – 87.27	12
87.27 – 98.6	9
98.6 – 109.9	4
109.9 – 121.2	1
121.2 – 132.6	1
132.6 – 143.9	4
143.9 – 155.2	3
155.2 – 166.5	3
166.5 – 177.9	1
177.9 – 189.2	0
189.2 – 200.5	0
200.5 – 211.8	1
211.8 – 223.2	3
223.2 – 234.5	0
234.5 – 245.8	0
245.8 – 257.1	0
257.1 – 268.5	0
268.5 – 279.8	0
279.8 – 291.1	0
291.1 – 302.4	0
302.4 – 313.8	0
313.8 – 325.1	0
325.1 – 336.4	1
336.4 – 347.8	0
347.8 – 359.1	0
359.1 – 370.4	0
370.4 – 381.7	0
381.7 – 393	0
393 – 404.4	0
404.4 – 415.7	0
415.7 – 427	0
427 – 438.3	0
438.3 – 449.7	0
449.7 – 461	1
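The bin edges above are consistent with 40 equal-width bins spanning the observed minimum (8) and maximum (461), each (461 - 8) / 40 = 11.325 wide; that Saturn uses exactly this rule is an inference from the table, not documented behaviour. With linear bins, 1,268 of 2,000 rows land in the first bin, so the sketch also builds geometrically spaced edges for the log-scale view the summary recommends. Both function names are illustrative.

```python
import math


def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """n_bins + 1 evenly spaced edges from lo to hi, as in the table above."""
    width = (hi - lo) / n_bins
    # Compute each edge from lo directly so rounding error does not accumulate.
    return [lo + i * width for i in range(n_bins)] + [hi]


def log_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """n_bins + 1 geometrically spaced edges (requires lo > 0)."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + i * (b - a) / n_bins) for i in range(n_bins + 1)]


linear = equal_width_edges(8, 461, 40)  # min and max of `count`
print([round(e, 2) for e in linear[:4]])
print([round(e, 1) for e in log_edges(8, 461, 10)])
```

Log-spaced edges give the dense 8 to 30 range several bins of its own while still reaching the single value at 461.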

Fig 2.

full · Phrase lengths center on a median of 24 characters (mean 28.3), with a p95 of 59.


Character-length distribution for full (mean: 28.2645).
chars	count
5 – 8	14
8 – 10	29
10 – 13	51
13 – 16	187
16 – 18	259
18 – 21	157
21 – 23	222
23 – 26	151
26 – 29	183
29 – 31	138
31 – 34	92
34 – 36	93
36 – 39	75
39 – 42	61
42 – 44	47
44 – 47	28
47 – 50	30
50 – 52	30
52 – 55	24
55 – 58	19
58 – 60	20
60 – 63	9
63 – 65	19
65 – 68	11
68 – 71	20
71 – 73	8
73 – 76	3
76 – 78	4
78 – 81	3
81 – 84	1
84 – 86	0
86 – 89	1
89 – 92	0
92 – 94	2
94 – 97	2
97 – 100	4
100 – 102	0
102 – 105	1
105 – 107	1
107 – 110	1

Fig 3.

name · Compare to `full` — the length profile is nearly identical, hinting these columns overlap.


Character-length distribution for name (mean: 28.153).
chars	count
5 – 7	5
7 – 9	20
9 – 11	18
11 – 12	51
12 – 14	103
14 – 16	168
16 – 18	175
18 – 20	80
20 – 22	160
22 – 24	139
24 – 26	151
26 – 28	136
28 – 29	87
29 – 31	98
31 – 33	92
33 – 35	27
35 – 37	66
37 – 39	55
39 – 41	55
41 – 42	44
42 – 44	29
44 – 46	28
46 – 48	21
48 – 50	9
50 – 52	22
52 – 54	23
54 – 56	16
56 – 58	12
58 – 59	12
59 – 61	15
61 – 63	7
63 – 65	7
65 – 67	14
67 – 69	9
69 – 71	15
71 – 72	4
72 – 74	6
74 – 76	2
76 – 78	3
78 – 80	16
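Matching length profiles do not prove the columns hold the same strings, and `full` reaches 110 characters where `name` stops at 80, so some `full` values may extend their `name`. A sketch for classifying each (`name`, `full`) pair, assuming the JSON loads as a list of records with `name` and `full` keys; the category labels and the three sample rows are made up for illustration, not drawn from the dataset.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_pair(name: str, full: str) -> str:
    """Label how a `full` value relates to its `name` value."""
    if name == full:
        return "identical"
    if " ".join(name.lower().split()) == " ".join(full.lower().split()):
        return "case/whitespace only"
    if name.lower() in full.lower():
        return "full extends name"
    return "different"


def similarity(name: str, full: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, name, full).ratio()


# Hypothetical records; in practice load them with json.load from the source file.
rows = [
    {"name": "Example phrase", "full": "Example phrase"},
    {"name": "Example phrase", "full": "example  phrase"},
    {"name": "Example", "full": "Example phrase, longer"},
]

print(Counter(classify_pair(r["name"], r["full"]) for r in rows))
```

Rows labelled "different" are the ones worth reading by hand; sorting them by `similarity` surfaces light rewordings first.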

Fig 4.

full · Top words like `your`, `being`, and `people` reveal the dataset's social/behavioral framing.


Fig 5.

Per-column null rate across the corpus. Columns are ordered by input position.


Per-column null rate across the corpus.
column	kind	null %
name	text	0.0%
count	numeric	0.0%
full	text	0.0%
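A null rate is the share of rows with no value in a column. A sketch, assuming the JSON is a list of flat records and that "null" means a missing key or a JSON `null`; Saturn may treat empty strings separately (it reports `n_empty` on its own), and `null_rates` is an illustrative name.

```python
def null_rates(records: list[dict], columns: list[str]) -> dict[str, float]:
    """Percent of records in which each column is absent or JSON null (None)."""
    n = len(records)
    if n == 0:
        return {col: 0.0 for col in columns}
    return {
        col: 100.0 * sum(1 for r in records if r.get(col) is None) / n
        for col in columns
    }


# Hypothetical records: the second lacks `count` entirely.
sample = [
    {"name": "a", "count": 12, "full": "a"},
    {"name": "b", "full": "b"},
]
print(null_rates(sample, ["name", "count", "full"]))  # → {'name': 0.0, 'count': 50.0, 'full': 0.0}
```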

name text free_text

Despite the column header 'name', the values are short free-text phrases averaging 4.48 words (median 4) and up to 80 characters, with top tokens like 'your', 'being', 'to', and 'people' suggesting descriptive statements rather than proper names. All 2000 rows are unique with zero nulls or duplicates, and a Flesch readability of 51.7 indicates ordinary prose rather than identifiers.

Treatment: Tokenize and embed before modelling; do not treat as a key despite the column name.

anthropic:claude-opus-4-7 · confidence medium
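The treatment calls for tokenizing and embedding. A dependency-free sketch of the tokenizing step plus a hashing-trick bag of words, used here as a simple stand-in for a learned embedding (a swapped-in technique, not what the report prescribes); `tokenize`, `hashed_bow`, the regex, the 64-dimension size, and the example phrase are all illustrative choices.

```python
import hashlib
import re

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric/apostrophe tokens."""
    return TOKEN_RE.findall(text.lower())


def hashed_bow(text: str, dim: int = 64) -> list[float]:
    """Map tokens into a fixed-size count vector via a stable hash (the hashing trick)."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        # md5 is stable across runs, unlike Python's salted built-in hash().
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec


print(tokenize("Being kind to your neighbours"))  # → ['being', 'kind', 'to', 'your', 'neighbours']
```

Because every row is unique, a per-value lookup table would learn nothing; vectors built from shared tokens such as "your" and "being" let similar phrases land near each other.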

Out[11]:

saturn.columns["name"].stats

stat	value
n	2,000
nulls	0 (0.0%)
unique	2,000
len_min	5
len_max	80
len_mean	28.15
len_median	24
len_p95	59
word_mean	4.48
word_median	4
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,626
readability_flesch_mean	51.69
emoji_rate	0
url_rate	0
one_word_rate	0.0215
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
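A sketch of how the text statistics above could be computed from a list of strings. The definitions are assumptions, not Saturn's documented ones: whitespace-split words, lowercased whitespace tokens for `vocab_size`, a nearest-rank p95, and `duplicate_rate` as the share of rows repeating an earlier value. The function name `text_stats` and the four sample strings are illustrative.

```python
import math
import statistics
from collections import Counter


def text_stats(values: list[str]) -> dict[str, float]:
    """Length, word, vocabulary, and duplicate statistics for a text column."""
    lengths = sorted(len(v) for v in values)
    words = [len(v.split()) for v in values]
    n = len(values)
    n_duplicates = sum(c - 1 for c in Counter(values).values())  # repeats beyond the first copy
    p95_rank = max(1, math.ceil(0.95 * n))  # nearest-rank percentile, 1-based
    return {
        "len_mean": statistics.fmean(lengths),
        "len_median": statistics.median(lengths),
        "len_p95": lengths[p95_rank - 1],
        "word_mean": statistics.fmean(words),
        "word_median": statistics.median(words),
        "one_word_rate": sum(1 for w in words if w == 1) / n,
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n,
        "vocab_size": len({tok for v in values for tok in v.lower().split()}),
    }


print(text_stats(["be kind", "Be kind", "hello", "be kind"]))
```

Whether "Be kind" and "be kind" count as duplicates depends on normalisation; this sketch compares raw strings for duplicates but lowercases for vocabulary, which is one reason its numbers need not match Saturn's exactly.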

Fig 6.

Character-length distribution for name.


count numeric feature

A non-negative integer count with 99 distinct values across 2000 rows and no nulls or zeros. The distribution is severely right-skewed (skew 6.26, kurtosis 76.6): the median is 14 and Q3 is 28, yet the max reaches 461, producing 85 outliers (4.25%). Mean (23.17) sits well above median, confirming a heavy tail rather than a symmetric spread.

Treatment: log1p-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
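A minimal sketch of the log1p treatment above, with a population (Fisher-Pearson, g1) skewness helper to check that the transform shrinks the tail. `math.log1p(x)` computes ln(1 + x), so it stays defined at zero even though this column has none. The input list is made up and only loosely shaped like `count` (a cluster near the median plus one extreme value); Saturn's reported skew may use a different estimator, so the printed numbers are not comparable to 6.264.

```python
import math


def skewness(xs: list[float]) -> float:
    """Population (Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


raw = [8, 9, 10, 12, 14, 14, 20, 28, 50, 461]  # illustrative values only
logged = [math.log1p(x) for x in raw]

print(f"skew raw={skewness(raw):.2f}  skew log1p={skewness(logged):.2f}")
```

On the real column, the transform maps the observed range 8 to 461 onto roughly 2.2 to 6.1, compressing the tail far more than the bulk.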

Out[14]:

saturn.columns["count"].stats

stat	value
n	2,000
nulls	0 (0.0%)
unique	99
min	8
max	461
mean	23.17
median	14
std	24.32
q1	10
q3	28
iqr	18
skew	6.264
kurtosis	76.59
n_outliers	85
outlier_rate	0.0425
zero_rate	0
alert: high_skew	skew=+6.26
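The 85 flagged values (outlier_rate 0.0425 = 85 / 2,000) are plausibly Tukey's 1.5 × IQR rule: with Q1 = 10, Q3 = 28, and IQR = 18, the fences are 10 - 1.5 × 18 = -17 and 28 + 1.5 × 18 = 55. The lower fence sits below the minimum of 8, so every outlier is on the high side. The report does not state which rule it uses, so treat this as an inference; the function names are illustrative.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: list[float], q1: float, q3: float, k: float = 1.5) -> list[float]:
    """Values strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


print(tukey_fences(10, 28))  # q1 and q3 from the table above → (-17.0, 55.0)
```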

Fig 7.

Distribution of count. Vertical dash marks the median.


full text free_text

The `full` column holds short English phrases averaging 4.5 words (median 4) and 28 characters, with all 2,000 rows unique and no duplicates or empty values. Top words such as "your", "being", "having", and "people" suggest these are descriptive statements or prompts rather than names or codes. A mean Flesch reading ease of about 51.6 falls in the "fairly difficult" band of Flesch's scale, though the score is noisy on four-word phrases; what it mainly shows is that the values read as prose rather than identifiers. A vocabulary of 1,628 distinct words across 2,000 short rows points to varied but thematically related content.

Treatment: Tokenize and embed before modelling; do not treat as a categorical.

anthropic:claude-opus-4-7 · confidence high

Out[17]:

saturn.columns["full"].stats

stat	value
n	2,000
nulls	0 (0.0%)
unique	2,000
len_min	5
len_max	110
len_mean	28.26
len_median	24
len_p95	59
word_mean	4.495
word_median	4
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	1,628
readability_flesch_mean	51.63
emoji_rate	0
url_rate	0
one_word_rate	0.0215
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings

Fig 8.

Character-length distribution for full.


How to cite


BibTeX

@misc{saturn-quirky-social-actions-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky social actions},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-social_actions}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}

APA

Steuber, L. (2026). Saturn reading: quirky social actions. Source: /home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-social_actions