saturn·

quirky social actions

source /home/coolhand/html/datavis/data_trove/data/quirky/social_actions.json · 2,000 rows · 3 columns · profiled 2026-05-01

Reading

dataset summary · high confidence · anthropic:claude-opus-4-7

This dataset has 2,000 rows and 3 columns: a numeric `count` and two near-identical text fields (`name` and `full`) that look like short phrases about social behavior. The `count` column is extremely right-skewed (skew 6.26, kurtosis 76.6): the median is 14, but the max is 461 and 85 values are flagged as outliers, so check the tail before averaging. The two text columns are close twins: nearly the same length profile (mean ~28 chars, ~4.5 words), the same top words (`your`, `being`, `to`, `a`), and almost equal vocabulary sizes (1,628 for `full` vs 1,626 for `name`), suggesting `full` may be a near-duplicate or light reformat of `name`. Start by inspecting the `count` distribution on a log scale and spot-checking a few rows to see how `name` and `full` actually differ.
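The spot-check suggested above could be sketched as follows. This is a minimal example that assumes the source file is a JSON array of objects with `name`, `count`, and `full` keys (the report does not show the file layout); the helper names and the sample rows are made up for illustration and are not taken from the dataset.

```python
import difflib
import json
from pathlib import Path


def load_rows(path):
    # Assumption: the file is a JSON array of {"name", "count", "full"} objects.
    return json.loads(Path(path).read_text(encoding="utf-8"))


def describe_differences(rows):
    """Return (n_identical, diffs); diffs holds the (name, full) pairs that differ."""
    identical = 0
    diffs = []
    for row in rows:
        name, full = row["name"], row["full"]
        if name == full:
            identical += 1
        else:
            diffs.append((name, full))
    return identical, diffs


def changed_spans(name, full):
    """List the non-equal edit operations that turn `name` into `full`."""
    matcher = difflib.SequenceMatcher(a=name, b=full)
    return [
        (tag, name[i1:i2], full[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical rows, for illustration only.
sample = [
    {"name": "being polite", "count": 10, "full": "being polite"},
    {"name": "holding the door", "count": 12, "full": "holding the door open"},
]
identical, diffs = describe_differences(sample)
print(identical, "identical;", len(diffs), "differing")
for name, full in diffs:
    print(changed_spans(name, full))
```

On the real file, `describe_differences(load_rows(path))` would show how many of the 2,000 rows are exact copies, and `changed_spans` would show whether the remaining rows differ by truncation, punctuation, or something else.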

citing: row_count · column_count · columns.count.stats.skew · columns.count.stats.kurtosis · columns.count.stats.median · columns.count.stats.max · columns.count.stats.n_outliers · columns.full.stats.len_mean · columns.full.stats.word_mean · columns.full.top_words · columns.name.stats.len_mean · columns.name.top_words · columns.full.stats.vocab_size · columns.name.stats.vocab_size

Schema

3 columns
Per-column summary.
column · type · nulls · unique · alerts
name · text · 0.0% · 2,000 · near_unique
count · numeric · 0.0% · 99 · high_skew
full · text · 0.0% · 2,000 · near_unique
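The report does not state the rules behind the `near_unique` and `high_skew` alerts. As a rough sketch of how such flags might be derived, the function below uses two hypothetical thresholds (a unique-to-row ratio of 0.95 and an absolute skew of 2.0); both are assumptions chosen for illustration, not values documented by the profiler.

```python
def column_alerts(n_rows, n_unique, skew=None,
                  near_unique_ratio=0.95, high_skew_abs=2.0):
    """Return alert labels for one column.

    Both thresholds are hypothetical defaults for this sketch.
    """
    alerts = []
    if n_rows and n_unique / n_rows >= near_unique_ratio:
        alerts.append("near_unique")
    if skew is not None and abs(skew) >= high_skew_abs:
        alerts.append("high_skew")
    return alerts


# Row counts, unique counts, and skew as listed in this report.
print("name ", column_alerts(2000, 2000))
print("count", column_alerts(2000, 99, skew=6.264))
print("full ", column_alerts(2000, 2000))
```

With these assumed thresholds the sketch reproduces the alerts shown in the schema, but other cut-offs would do so as well.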

name

text · free_text · near_unique
Despite the column header 'name', the values are short free-text phrases averaging 4.48 words (median 4) and up to 80 characters, with top tokens like 'your', 'being', 'to', and 'people' suggesting descriptive statements rather than proper names. All 2,000 rows are unique with zero nulls or duplicates, and a Flesch readability of 51.7 indicates ordinary prose rather than identifiers. Treatment: Tokenize and embed before modelling; do not treat as a key despite the column name.
medium confidence · anthropic:claude-opus-4-7
n: 2,000
nulls: 0 (0.0%)
unique: 2,000
len_min: 5
len_max: 80
len_mean: 28.15
len_median: 24
len_p95: 59
word_mean: 4.48
word_median: 4
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,626
readability_flesch_mean: 51.69
emoji_rate: 0
url_rate: 0
one_word_rate: 0.0215
allcaps_rate: 0
boilerplate_rate: 0
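Several of the text statistics listed for this column (length, word counts, vocabulary size, one-word rate) can be recomputed with a few lines of code. The sketch below assumes a simple regex tokenizer and lower-casing; the profiler's own tokenization is not documented here, so figures such as `vocab_size` may not match exactly. The function name and sample phrases are hypothetical.

```python
import re
from statistics import mean, median

# Assumed tokenizer: runs of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def text_profile(values):
    """Compute basic length and vocabulary statistics for a list of strings."""
    lengths = [len(v) for v in values]
    tokens_per_row = [WORD_RE.findall(v.lower()) for v in values]
    word_counts = [len(tokens) for tokens in tokens_per_row]
    vocab = {tok for tokens in tokens_per_row for tok in tokens}
    return {
        "n": len(values),
        "unique": len(set(values)),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "word_mean": mean(word_counts),
        "word_median": median(word_counts),
        "vocab_size": len(vocab),
        "one_word_rate": sum(c == 1 for c in word_counts) / len(values),
    }


# Hypothetical phrases, not taken from the dataset.
phrases = ["being polite", "waving", "holding the door open"]
profile = text_profile(phrases)
for key, value in profile.items():
    print(f"{key}: {value}")
```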

count

numeric · feature · high_skew
A positive integer count (min 8) with 99 distinct values across 2,000 rows and no nulls or zeros. The distribution is severely right-skewed (skew 6.26, kurtosis 76.6): the median is 14 and Q3 is 28, yet the max reaches 461, and 85 values (4.25%) are flagged as outliers. The mean (23.17) sits well above the median (14), consistent with a heavy right tail rather than a symmetric spread. Treatment: log1p-transform before modelling to tame the heavy right tail.
high confidence · anthropic:claude-opus-4-7
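To illustrate the suggested treatment, the sketch below applies `math.log1p` to a small, made-up sample of right-skewed counts and compares skewness before and after. The sample values are hypothetical, and the skewness helper uses plain population moments, which may differ from the estimator behind the reported 6.26.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample, for illustration only.
counts = [8, 9, 10, 10, 12, 14, 14, 18, 28, 40, 120, 461]
logged = [math.log1p(c) for c in counts]

print("raw skew:   ", round(skewness(counts), 2))
print("log1p skew: ", round(skewness(logged), 2))

# math.expm1 inverts the transform when predictions need to go back to counts.
restored = [math.expm1(v) for v in logged]
```

Because the minimum here is 8, a plain `log` would also be defined on every value; `log1p` is the safer default if zeros could appear in future data.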
n: 2,000
nulls: 0 (0.0%)
unique: 99
min: 8
max: 461
mean: 23.17
median: 14
std: 24.32
q1: 10
q3: 28
iqr: 18
skew: 6.264
kurtosis: 76.59
n_outliers: 85
outlier_rate: 0.0425
zero_rate: 0
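The report does not say how the 85 outliers were identified. One common convention is Tukey's 1.5 × IQR rule; assuming that rule, the fences implied by the quartiles listed for `count` (q1 = 10, q3 = 28) can be worked out as below. The function names and the short test list are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count values falling strictly outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return sum(v < lower or v > upper for v in values)


lower, upper = tukey_fences(10, 28)
print(lower, upper)  # -17.0 55.0

# Hypothetical values: only 56 and 461 fall outside the fences.
print(count_outliers([8, 14, 55, 56, 461], 10, 28))  # 2
```

Under this assumption the lower fence is negative, so with a minimum of 8 every flagged value would sit in the upper tail, above 55.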

full

text · free_text · near_unique
The `full` column holds short English phrases averaging 4.5 words (median 4) and 28 characters, with every one of the 2,000 rows unique and no duplicates or empties. Top words like "your", "being", "having", and "people" suggest these are descriptive statements or prompts rather than names or codes. A mean Flesch readability of 51.6 indicates fairly plain prose, and the vocabulary of 1,628 distinct words across 2,000 short rows points to varied but thematically related content. Treatment: Tokenize and embed before modelling; do not treat as a categorical.
high confidence · anthropic:claude-opus-4-7
n: 2,000
nulls: 0 (0.0%)
unique: 2,000
len_min: 5
len_max: 110
len_mean: 28.26
len_median: 24
len_p95: 59
word_mean: 4.495
word_median: 4
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 1,628
readability_flesch_mean: 51.63
emoji_rate: 0
url_rate: 0
one_word_rate: 0.0215
allcaps_rate: 0
boilerplate_rate: 0
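For readers unfamiliar with the `readability_flesch_mean` figure, the Flesch reading-ease score is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The sketch below implements that formula with a rough vowel-group syllable heuristic; the heuristic and the function names are assumptions, and the profiler may count syllables and sentences differently, so scores on very short phrases can diverge from the reported mean.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text):
    """Flesch reading ease for one string, or None if it contains no words."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return None
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(len(sentences), 1)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / n_sentences)
            - 84.6 * (syllables / len(words)))


def mean_flesch(values):
    """Average the per-row scores, skipping rows with no words."""
    scores = [s for s in map(flesch_reading_ease, values) if s is not None]
    return sum(scores) / len(scores) if scores else None


# Hypothetical phrases, not taken from the dataset.
print(mean_flesch(["being polite", "holding the door open"]))
```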