saturn

/home/coolhand/html/datavis/data_trove/cache/quirky/social_norms_20260121.parquet · 355,922 rows · sample n=355,922 · seed 42 · 2026-05-01T23:32:17+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/cache/quirky/social_norms_20260121.parquet
Total rows	355,922
Profiled sample	355,922
Columns	25
Generated	2026-05-01T23:32:17+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This is a social-norms annotation dataset of 355,922 rows and 25 columns, where each entry pairs a real-life 'situation' (mostly from Reddit confessions, AmItheAsshole, Dear Abby, and ROCStories) with an 'action', a rule-of-thumb ('rot'), and a battery of moral judgments by crowd workers. The most striking shape feature is heavy duplication in the text fields: 'rot-judgment' is 97% duplicated and 'characters' 91%, because they collapse to short controlled vocabularies, while 'situation' and 'rot' themselves repeat ~71% and ~27% of the time across annotators. Worth a closer look first: the moral-foundation distribution, which is dominated by 'care-harm' (~39% of non-null), and the 'action-legal' field, where 93% of non-null actions are tagged 'legal'; both suggest class imbalance that will matter for any modeling. Also note that 'area' is reasonably balanced across the four source corpora, but 'split' is heavily skewed toward 'train' (66%).

area high anthropic:claude-opus-4-7

This column tags each record with one of four source areas, with 'confessions' the modal value at 30.3% of 355,922 rows. The distribution is fairly balanced — entropy_ratio of 0.97 indicates near-uniform spread across the four categories, though 'dearabby' (50,300) is roughly half the size of the other three. No nulls and only 4 unique values make this a clean grouping key.
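
The entropy figures quoted above can be re-derived from the listed counts. The sketch below assumes saturn reports Shannon entropy in bits and divides it by log2 of the cardinality to get entropy_ratio; that definition is not stated in the report, but it appears consistent with the reported 1.947 and 0.974 for this column. The function names are illustrative only.

```python
import math

# Counts copied from the "Top values" list for the area column in this report.
area_counts = {
    "confessions": 107_749,
    "rocstories": 101_791,
    "amitheasshole": 96_082,
    "dearabby": 50_300,
}


def entropy_bits(counts):
    """Shannon entropy in bits for a mapping of category -> count."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k observed categories.

    Assumed definition; the report does not spell out the normalization.
    """
    k = len(counts)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


print(round(entropy_bits(area_counts), 3), round(entropy_ratio(area_counts), 3))
```

A ratio near 1 means the categories are close to equally likely; the smaller 'dearabby' share is what keeps it just below 1 here.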

m high anthropic:claude-opus-4-7

Column 'm' is a numeric feature with only 4 distinct values across 355,922 rows, ranging from 1 to 50 with a median of 1 and both Q1 and Q3 equal to 1. The distribution is severely right-skewed (skew 3.77, kurtosis 12.45), with 18.5% of rows flagged as outliers despite a mean of just 4.24. The tiny cardinality combined with extreme spread suggests this is a categorical-like multiplier or count where most records sit at 1 and a few jump to much larger values.
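
The 18.5% outlier figure is easier to interpret once the fence arithmetic is written out: with Q1 = Q3 = 1 the IQR is 0, so the 1.5 IQR fences collapse onto 1 and every other value is flagged. A minimal sketch on a hypothetical toy column (the values below are invented for illustration, and the quantile method is an assumption, not necessarily the one saturn uses):

```python
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical toy data shaped like 'm': mostly 1, a few larger values.
toy_m = [1] * 8 + [5, 50]
flags = iqr_outlier_flags(toy_m)
print(sum(flags), "of", len(toy_m), "values flagged")
```

Under this rule the flagged rows are simply "every row where m is not 1", which supports reading the column as categorical-like rather than as a measurement with genuine anomalies.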

split high anthropic:claude-opus-4-7

This column holds the dataset split assignment across 355,922 rows, with 7 distinct values and no nulls. 'train' dominates at 65.6% (233,501 rows), followed by near-equal 'test' (29,239) and 'dev' (29,234), plus the auxiliary 'test-extra', 'analysis', and 'dev-extra' partitions at roughly 20,000 rows each; a 'none' bucket of 3,913 rows is unusual and likely indicates unassigned examples.

rot-agree high anthropic:claude-opus-4-7

This is a 5-value ordinal score (0–4) capturing agreement with the rule of thumb (RoT), with a mean of 3.10 and median 3.0; answers cluster at the high end. The distribution is left-skewed (skew -0.68) with Q1=3 and Q3=4, and 2.4% of rows fall below the lower IQR fence (values 0 and 1) as low-end outliers. Nulls are negligible (0.35%) and zeros are rare (0.38%).

rot-categorization high anthropic:claude-opus-4-7

Categorical tag describing the type of 'rule of thumb' (RoT), with 15 distinct values drawn from a small base vocabulary (advice, social-norms, morality-ethics, description) plus pipe-delimited combinations. Distribution is moderately balanced (entropy ratio 0.72); 'advice' leads at 23.5% and the four single-tag values dominate, while compound tags like 'social-norms|advice' (27,657) indicate multi-label encoding stuffed into one string. Null rate is low at 0.86%.
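
Because several tags are packed into one pipe-delimited string, per-tag totals have to be computed by splitting each label. The sketch below does that for a subset of the counts listed in this report; `per_tag_counts` is an illustrative helper, not part of saturn.

```python
from collections import Counter

# Subset of the rot-categorization "Top values" list from this report.
value_counts = {
    "advice": 82_786,
    "social-norms": 72_934,
    "social-norms|advice": 27_657,
    "morality-ethics|social-norms|advice": 790,
}


def per_tag_counts(counts, sep="|"):
    """Credit each combined label's row count to every tag it contains."""
    totals = Counter()
    for label, n in counts.items():
        for tag in label.split(sep):
            totals[tag] += n
    return totals


tags = per_tag_counts(value_counts)
print(tags.most_common())
```

Run over the full list of 15 values, the same loop gives the true frequency of each base tag, which the single-string encoding otherwise hides.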

rot-moral-foundations high anthropic:claude-opus-4-7

This column tags each row with one or more of Haidt's moral foundations (care-harm, fairness-cheating, loyalty-betrayal, authority-subversion, sanctity-degradation), with multi-label combinations encoded as pipe-delimited strings, yielding 30 distinct values. The distribution is heavily skewed: 'care-harm' alone covers 38.9% of non-null rows (30.5% of all rows), and 21.7% of rows are null. The entropy ratio of 0.60 confirms that the long tail collapses quickly after the top few categories.
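
With a 21.7% null rate, the denominator matters when quoting shares for this column. The reported top_rate of 0.389 matches the 'care-harm' count divided by non-null rows rather than by all rows, as this quick check with counts from the report shows:

```python
rows = 355_922       # total rows
nulls = 77_120       # null rows in rot-moral-foundations
care_harm = 108_535  # rows whose value is exactly 'care-harm'

share_of_non_null = care_harm / (rows - nulls)
share_of_all_rows = care_harm / rows

print(f"non-null: {share_of_non_null:.3f}, all rows: {share_of_all_rows:.3f}")
```

The same distinction applies to the other columns with double-digit null rates, such as action-legal and action-char-involved.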

rot-char-targeting high anthropic:claude-opus-4-7

Categorical tag identifying which character slot a rule of thumb (RoT) targets, with 7 distinct values dominated by 'char-0' (51.9% of non-null rows, 183,950) and 'char-1' (34.8%, 123,396). The distribution is sharply long-tailed: 'char-4' and 'char-5' appear only 46 and 15 times respectively, and 23,781 rows are explicitly 'char-none'. The entropy ratio is 0.557, confirming that most mass sits in the first two categories. The null rate is low at 0.46%.

rot-bad high anthropic:claude-opus-4-7

Binary 0/1 flag (n_unique=2, min=0, max=1) indicating a rare 'rot-bad' condition. Positives occur at 2.0% (mean=0.0201, zero_rate=0.9799), producing the flagged high skew (6.84) and heavy kurtosis (44.78). The 7,153 'outliers' are simply the positive class, not anomalies.
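
The large skew and kurtosis follow directly from the 2% positive rate: for a 0/1 variable with positive rate p, skewness is (1 - 2p) / sqrt(p(1 - p)) and excess kurtosis is (1 - 6p(1 - p)) / (p(1 - p)). A short check using the counts in this report, assuming saturn reports excess kurtosis:

```python
import math


def bernoulli_shape(p):
    """Return (skewness, excess kurtosis) of a 0/1 variable with P(1) = p."""
    var = p * (1 - p)
    skew = (1 - 2 * p) / math.sqrt(var)
    excess_kurtosis = (1 - 6 * var) / var
    return skew, excess_kurtosis


p = 7_153 / 355_922  # positives / rows, both taken from the report
skew, kurt = bernoulli_shape(p)
print(round(skew, 2), round(kurt, 2))
```

Both values land on the reported figures, so the alert reflects class rarity and nothing about the shape of a continuous distribution.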

rot-judgment high anthropic:claude-opus-4-7

Short moral-judgment phrases (mean 2 words, median 9 chars, max 94) drawn from a tight vocabulary of 797 tokens, dominated by verdicts like "It's good", "shouldn't", "It's okay", and "It's wrong". With 97.0% duplicate rate and only 10,589 uniques across 355,922 rows, this behaves as a categorical label rather than free text. Casing is inconsistent (e.g. "It's good" vs "it's good" appear as separate top values), which will inflate cardinality unless normalized.
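
One way to collapse the casing variants before treating this column as a label is to lowercase and squeeze whitespace. This is a deliberately minimal sketch on hypothetical values; it does not merge contractions such as "it is good" and "it's good", which would need a separate mapping.

```python
from collections import Counter


def normalize_judgment(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


# Hypothetical values illustrating the casing split described above.
raw = ["It's good", "it's good", "It's okay", "it's okay ", "shouldn't"]

before = Counter(raw)
after = Counter(normalize_judgment(v) for v in raw)
print(len(before), "distinct before,", len(after), "after")
```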

action high anthropic:claude-opus-4-7

Short English phrases describing an action or behaviour (mean 41.8 chars, median 7 words), e.g. 'being yourself.' or 'cheating on your partner.', likely the subject of a moral/judgement prompt. Roughly 27% of the 355,922 rows are duplicates (95,292), with the top phrase repeating 461 times, so the same actions recur heavily across records. Language detection flags the column as multilingual, but only 2 non-English rows (1 la, 1 no) turn up out of ~5k sampled, so it is effectively monolingual.

action-agency high anthropic:claude-opus-4-7

Binary categorical flag with only two levels, 'agency' (88.5% of non-null rows) and 'experience' (the remaining 11.5%). The labels suggest a distinction between actions a character chooses and events a character experiences, though the report itself does not define them. Class imbalance is heavy and 1.84% of rows are null, so any modelling needs to account for both. The entropy ratio of 0.515 reflects the dominance of the majority class.

action-moral-judgment high anthropic:claude-opus-4-7

A discrete moral-judgment rating on a 5-point scale from -2 to 2 (likely Likert-style: very wrong → very right). The mean of -0.178 and median of 0 indicate a slight lean toward negative judgments, with 43.7% of values exactly zero and 12.7% missing. The distribution is nearly symmetric (skew -0.011) and platykurtic, so most ratings cluster near neutral with modest spread (std 0.857).

action-agree high anthropic:claude-opus-4-7

This is a 5-level ordinal feature (values 0-4), almost certainly a Likert-style agreement rating for an 'action' item, with mean 3.10 and median 3.0 indicating a tilt toward agreement. The distribution is left-skewed (skew -0.68) with Q1=3 and Q3=4, so most respondents pick 3 or 4; only 0.38% give zero. Note the 12.48% null rate, which is substantial and likely reflects non-response or skipped items.

action-legal high anthropic:claude-opus-4-7

This is a categorical legal-status flag with only 3 distinct values ('legal', 'tolerated', 'illegal') across 355,922 rows. The distribution is severely imbalanced: 'legal' accounts for 93.2% of non-null records while 'illegal' represents just 5,934 rows, and the entropy ratio is only 0.26. Note also that 12.81% of rows are null, so the absence of a status may itself be a meaningful signal.

action-pressure high anthropic:claude-opus-4-7

A discrete ordinal feature taking only 5 values across the symmetric range -2 to 2, almost certainly a Likert-style pressure rating. The distribution is balanced (mean -0.04, skew -0.05) and centered on 0, which accounts for 35.3% of rows. Notable: 13.08% of rows are null, and despite being numeric there are just 5 unique values.

action-char-involved high anthropic:claude-opus-4-7

Categorical pointer to which character slot is involved in an action, with 7 distinct values dominated by char-0 (51.5% of non-null rows) and char-1 (34.8% of non-null rows, or about 30% of all rows). The long tail collapses fast: char-3 through char-5 together account for only 1,309 rows, 13.05% of rows are null, and another 22,100 are explicitly tagged 'char-none'. The entropy ratio of 0.56 confirms the heavy concentration on the first two slots.

action-hypothetical high anthropic:claude-opus-4-7

A 5-level categorical labeling whether an action is stated explicitly or only hypothetically/probably, with negative variants ('explicit-no', 'probable-no'). 'explicit' dominates at 47.1% of non-null rows, but 19.04% of values are null, which is substantial. Entropy ratio of 0.84 indicates the remaining classes are reasonably spread rather than collapsed onto a single mode.

situation high anthropic:claude-opus-4-7

Short descriptions of personal situations or dilemmas, many written in the first person, averaging 55 characters / 10 words and topping out at 300 chars, with high readability (Flesch 78.5). Massive duplication is the headline issue: 252,626 of 355,922 rows are duplicates (71%), leaving only 103,296 uniques, and several distinct strings repeat exactly 130 times. This is consistent with each situation being reused across many annotation rows, though templated or oversampled records cannot be ruled out from these statistics alone. Language detection on the sample is overwhelmingly English (4,989) with a tiny multilingual tail (6 de, 1 each es/it/nl/pt/ru) that is unlikely to matter at scale.
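
The duplicate figures in this report are consistent with n_duplicates = non-null rows minus unique values (here 355,922 - 103,296 = 252,626). The sketch below assumes that definition, which the report does not state explicitly, and applies it to a hypothetical toy column:

```python
def duplicate_stats(values):
    """Return (n_unique, n_duplicates, duplicate_rate) over non-null values.

    A value's first occurrence is not counted as a duplicate; every repeat is.
    """
    non_null = [v for v in values if v is not None]
    n_unique = len(set(non_null))
    n_duplicates = len(non_null) - n_unique
    rate = n_duplicates / len(non_null) if non_null else 0.0
    return n_unique, n_duplicates, rate


# Hypothetical toy column: one situation annotated three times, one once.
toy = ["I stole money", "I stole money", "texting my landlord", None, "I stole money"]
n_unique, n_duplicates, rate = duplicate_stats(toy)
print(n_unique, n_duplicates, rate)
```

Under this definition a 71% duplicate rate says that the average situation appears in roughly 3.4 rows, not that 71% of situations are copies of something else.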

situation-short-id high anthropic:claude-opus-4-7

Slash-delimited source identifiers pointing back to situations on Reddit (confessions, amitheasshole), Dear Abby columns, and ROCStories sentences. Despite the 'id' name, only 103,692 values are unique across 355,922 rows — a 70.9% duplicate rate, with the most repeated key appearing 180 times, so each situation evidently surfaces in many rows. Every value is a single token (one_word_rate 1.0) up to 99 chars long.

rot high anthropic:claude-opus-4-7

Short English moral/normative statements (mean 10.1 words, max 221 chars), almost certainly rule-of-thumb (RoT) annotations describing what one should or shouldn't do. The vocabulary is small (9,659 types) and readability is high (Flesch 79.4), consistent with simple declarative templates like 'It is good/okay to...' and 'You shouldn't...'. Notable: 27.1% of rows are duplicates (96,308), with single phrases like 'It is good to be yourself.' repeating 287 times, indicating heavy template reuse rather than free generation.

rot-id high anthropic:claude-opus-4-7

Structured path-like identifiers for 'rule of thumb' records, sourced from Reddit (amitheasshole, confessions), Dear Abby columns, and ROCStories. Despite being IDs, only 291,974 of 355,922 values are unique (duplicate_rate 0.18, with top values repeating up to 58 times), so this column is not a primary key. Every value is a single token (one_word_rate 1.0, word_mean 1.0) with a length of 63-140 characters, which triggered the one_word alert but is expected for slash-delimited identifiers.
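
In the sample values shown in this report, each rot-id looks like `rot/` followed by the row's situation-short-id, an opaque token, and two trailing integers. The parser below is a sketch based only on those samples; the field names `token` and `trailing` are hypothetical, and the meaning of the two integers is not documented here.

```python
def parse_rot_id(rot_id):
    """Split a rot-id into the pieces visible in the report's sample values."""
    head, token, first, second = rot_id.rsplit("/", 3)
    prefix = "rot/"
    if not head.startswith(prefix):
        raise ValueError(f"unexpected rot-id prefix: {rot_id!r}")
    return {
        "situation_short_id": head[len(prefix):],
        "token": token,                          # opaque id, meaning assumed unknown
        "trailing": (int(first), int(second)),   # two integers, meaning not documented
    }


parsed = parse_rot_id(
    "rot/reddit/amitheasshole/axfjgx/39OWYR0EPN6CSDBD4KNIUO907B9FYM/89/1"
)
print(parsed["situation_short_id"], parsed["trailing"])
```

Splitting from the right keeps the parser indifferent to how many slashes the situation part contains, which differs between the Reddit, Dear Abby, and ROCStories identifiers.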

rot-worker-id high anthropic:claude-opus-4-7

Despite being typed numeric, `rot-worker-id` looks like a categorical worker identifier: only 89 unique values across 355,922 rows, no nulls, and no zeros. The distribution is broad and nearly symmetric (skew -0.08, kurtosis -1.37) spanning 2 to 144 with median 83, consistent with arbitrary id codes rather than a measured quantity. No outliers were flagged.

breakdown-worker-id high anthropic:claude-opus-4-7

This appears to be a worker identifier encoded as integers, with 117 distinct values spread roughly uniformly between 0 and 146 (mean 71.75, median 71, skew 0.05, kurtosis -1.26). Despite being stored as numeric, the low cardinality relative to 355,922 rows and the near-flat distribution suggest a categorical key rather than a measurement. No nulls and no outliers, but about 1.4% of rows carry worker id 0, which may be a sentinel.

n-characters high anthropic:claude-opus-4-7

A small-integer count column ranging 1-10 with only 8 unique values across 355,922 rows, mean 2.13 and median 2. The tight IQR (2-3) and low std (0.78) suggest most records involve 2-3 characters (people in the situation, not text length), with a mild right tail (skew 0.44) producing 1,132 outliers (~0.32%).

characters high anthropic:claude-opus-4-7

This column lists the characters/speakers in each record, encoded as pipe-delimited role strings (e.g. 'narrator|He', 'narrator|my girlfriend'). It is extremely repetitive: 91.1% of 355,922 rows are duplicates, only 31,782 unique values exist, and 'narrator' alone appears 71,601 times. Half the entries are a single whitespace-delimited token (one_word_rate 0.50, word_median 1); because pipe-joined values such as 'narrator|He' contain no spaces, they also count as one word, so this rate does not mean that half the rows list only one character. The column behaves more like a low-cardinality categorical tag than free text, despite being typed as text.
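
The word statistics for this column come from whitespace tokenization, which does not line up with the pipe-delimited structure. A small sketch on three of the sample values shows the difference; whether the pipe-split length always equals the n-characters column is an assumption that would need checking against the data.

```python
def character_list(value, sep="|"):
    """Split a pipe-delimited characters string into individual roles."""
    return value.split(sep)


# Sample values copied from the characters column in this report.
samples = ["narrator", "narrator|He", "narrator|my landlord|my housemates"]

for value in samples:
    roles = character_list(value)
    n_words = len(value.split())  # what a whitespace word count sees
    print(f"{value!r}: {len(roles)} roles, {n_words} whitespace words")
```

Splitting on the pipe is the more useful view for analysis, since 'narrator|He' holds two roles but registers as a single word.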

Numeric correlation

Languages detected

Per-string language detection across text columns (sampled).

area categorical

rows	355,922
null	0 (0.0%)
unique	4
top_value	confessions
top_rate	0.303
cardinality	4
entropy	1.947
entropy_ratio	0.974

Top values (rank 1–20)

confessions — 107,749
rocstories — 101,791
amitheasshole — 96,082
dearabby — 50,300

m numeric

skew=+3.77; 18.5% rows beyond 1.5 IQR

rows	355,922
null	0 (0.0%)
unique	4
min	1.000
max	50.000
mean	4.237
median	1.000
std	11.239
q1	1.000
q3	1.000
iqr	0.000
skew	3.774
kurtosis	12.454
n_outliers	65,947
outlier_rate	0.185
zero_rate	0.000

split categorical

rows	355,922
null	0 (0.0%)
unique	7
top_value	train
top_rate	0.656
cardinality	7
entropy	1.763
entropy_ratio	0.628

Top values (rank 1–20)

train — 233,501
test — 29,239
dev — 29,234
test-extra — 20,050
analysis — 20,000
dev-extra — 19,985
none — 3,913

rot-agree numeric

rows	355,922
null	1,236 (0.3%)
unique	5
min	0.000
max	4.000
mean	3.100
median	3.000
std	0.744
q1	3.000
q3	4.000
iqr	1.000
skew	-0.681
kurtosis	0.787
n_outliers	8,547
outlier_rate	0.024
zero_rate	3.79e-03

rot-categorization categorical

rows	355,922
null	3,046 (0.9%)
unique	15
top_value	advice
top_rate	0.235
cardinality	15
entropy	2.806
entropy_ratio	0.718

Top values (rank 1–20)

advice — 82,786
social-norms — 72,934
morality-ethics — 58,564
description — 58,537
social-norms|advice — 27,657
morality-ethics|social-norms — 27,118
morality-ethics|advice — 11,498
social-norms|description — 6,078
advice|description — 4,785
morality-ethics|description — 2,023
morality-ethics|social-norms|advice — 790
morality-ethics|social-norms|description — 55
morality-ethics|advice|description — 26
social-norms|advice|description — 24
morality-ethics|social-norms|advice|description — 1

rot-moral-foundations categorical

21.7% null

rows	355,922
null	77,120 (21.7%)
unique	30
top_value	care-harm
top_rate	0.389
cardinality	30
entropy	2.962
entropy_ratio	0.604

Top values (rank 1–20)

care-harm — 108,535
fairness-cheating — 37,666
loyalty-betrayal — 34,581
care-harm|loyalty-betrayal — 21,125
authority-subversion — 19,087
sanctity-degradation — 14,657
care-harm|fairness-cheating — 10,787
care-harm|authority-subversion — 7,958
care-harm|sanctity-degradation — 6,328
fairness-cheating|loyalty-betrayal — 6,222
fairness-cheating|authority-subversion — 3,738
loyalty-betrayal|authority-subversion — 2,122
fairness-cheating|sanctity-degradation — 1,218
authority-subversion|sanctity-degradation — 1,042
loyalty-betrayal|sanctity-degradation — 885
care-harm|fairness-cheating|loyalty-betrayal — 834
care-harm|loyalty-betrayal|authority-subversion — 623
care-harm|authority-subversion|sanctity-degradation — 423
care-harm|fairness-cheating|authority-subversion — 416
care-harm|loyalty-betrayal|sanctity-degradation — 183

rot-char-targeting categorical

rows	355,922
null	1,622 (0.5%)
unique	7
top_value	char-0
top_rate	0.519
cardinality	7
entropy	1.564
entropy_ratio	0.557

Top values (rank 1–20)

char-0 — 183,950
char-1 — 123,396
char-none — 23,781
char-2 — 21,648
char-3 — 1,464
char-4 — 46
char-5 — 15

rot-bad numeric

skew=+6.84

rows	355,922
null	0 (0.0%)
unique	2
min	0.000
max	1.000
mean	0.020
median	0.000
std	0.140
q1	0.000
q3	0.000
iqr	0.000
skew	6.840
kurtosis	44.779
n_outliers	7,153
outlier_rate	0.020
zero_rate	0.980

rot-judgment text

95th-percentile length under 20 chars; 97.0% duplicate strings

rows	355,922
null	1 (0.0%)
unique	10,589
len_min	1
len_max	94
len_mean	10.463
len_median	9.000
len_p95	19.000
word_mean	2.001
word_median	2.000
n_empty	0
n_duplicates	345,332
duplicate_rate	0.970
vocab_size	797
readability_flesch_mean	83.273
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.208
allcaps_rate	5.62e-06
boilerplate_rate	0.000

Sample values (first 10)

It's expected
it is good
it's not typically expected
you should
It's not okay
It's not okay
It's bad
shouldn't
it's okay
it's rude

action text

4 languages detected in sample; 26.8% duplicate strings

rows	355,922
null	3 (0.0%)
unique	260,627
len_min	1
len_max	221
len_mean	41.762
len_median	40.000
len_p95	73.000
word_mean	6.969
word_median	7.000
n_empty	0
n_duplicates	95,292
duplicate_rate	0.268
vocab_size	9,430
readability_flesch_mean	57.878
emoji_rate	0.000
url_rate	2.81e-06
one_word_rate	6.01e-03
allcaps_rate	0.000
boilerplate_rate	2.81e-06

Sample values (first 10)

Being hardworking.
calling your credit card company immediately when your credit card balance is wiped out
putting butter on a donut.
finding somewhere private to have sex.
treating a boss badly just because they are a woman.
expecting more than a person can give.
Being pregnant if you don't want to be.
letting your partner dictate what you can and cannot wear.
doing what you want with your body and telling your boyfriend about your wishes.
telling on someone else

action-agency categorical

rows	355,922
null	6,551 (1.8%)
unique	2
top_value	agency
top_rate	0.885
cardinality	2
entropy	0.515
entropy_ratio	0.515

Top values (rank 1–20)

agency — 309,148
experience — 40,223

action-moral-judgment numeric

rows	355,922
null	45,215 (12.7%)
unique	5
min	-2.000
max	2.000
mean	-0.178
median	0.000
std	0.857
q1	-1.000
q3	0.000
iqr	1.000
skew	-0.011
kurtosis	-0.347
n_outliers	4,592
outlier_rate	0.015
zero_rate	0.437

action-agree numeric

rows	355,922
null	44,402 (12.5%)
unique	5
min	0.000
max	4.000
mean	3.101
median	3.000
std	0.733
q1	3.000
q3	4.000
iqr	1.000
skew	-0.677
kurtosis	0.883
n_outliers	7,046
outlier_rate	0.023
zero_rate	3.76e-03

action-legal categorical

rows	355,922
null	45,608 (12.8%)
unique	3
top_value	legal
top_rate	0.932
cardinality	3
entropy	0.415
entropy_ratio	0.262

Top values (rank 1–20)

legal — 289,316
tolerated — 15,064
illegal — 5,934

action-pressure numeric

rows	355,922
null	46,537 (13.1%)
unique	5
min	-2.000
max	2.000
mean	-0.040
median	0.000
std	1.114
q1	-1.000
q3	1.000
iqr	2.000
skew	-0.050
kurtosis	-0.660
n_outliers	0
outlier_rate	0.000
zero_rate	0.353

action-char-involved categorical

rows	355,922
null	46,441 (13.0%)
unique	7
top_value	char-0
top_rate	0.515
cardinality	7
entropy	1.578
entropy_ratio	0.562

Top values (rank 1–20)

char-0 — 159,381
char-1 — 107,567
char-none — 22,100
char-2 — 19,124
char-3 — 1,252
char-4 — 41
char-5 — 16

action-hypothetical categorical

rows	355,922
null	67,776 (19.0%)
unique	5
top_value	explicit
top_rate	0.471
cardinality	5
entropy	1.956
entropy_ratio	0.842

Top values (rank 1–20)

explicit — 135,744
hypothetical — 62,689
probable — 49,670
explicit-no — 25,022
probable-no — 15,021

situation text

8 languages detected in sample; 71.0% duplicate strings

rows	355,922
null	0 (0.0%)
unique	103,296
len_min	10
len_max	300
len_mean	55.418
len_median	52.000
len_p95	100.000
word_mean	10.519
word_median	10.000
n_empty	0
n_duplicates	252,626
duplicate_rate	0.710
vocab_size	16,855
readability_flesch_mean	78.525
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.000
allcaps_rate	5.28e-04
boilerplate_rate	1.29e-04

Sample values (first 10)

not being "enthusiastic enough" on my first two days of work
He went to some stores and tried them on.
I put Smart Balance Butter on a Stop and Shop glazed donut today.
having sex in the same room my ex was sleeping
I don’t want to be in my relationship anymore.
Woman Runs Out of Patience With Tardy Boyfriend
I stole a lot of money from my parents when I was younger, so now I am slowly repaying them by leaving cash in thier belongings.
Retired Husband's Wardrobe Is Too Casual for Wife's Comfort
confronting a student who walks into classroom 10 minutes before class ends, to wait for his class afterwards
texting my landlord about my housemates

situation-short-id text

100.0% rows are a single word; 70.9% duplicate strings

rows	355,922
null	0 (0.0%)
unique	103,692
len_min	24
len_max	99
len_mean	41.295
len_median	27.000
len_p95	74.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	252,230
duplicate_rate	0.709
vocab_size	16,878
readability_flesch_mean	-443.908
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

reddit/amitheasshole/axfjgx
rocstories/98e079f0-f03c-4a7d-8d2d-3856c07553dd/sentence3
reddit/confessions/ad1mwp
reddit/amitheasshole/aua8x4
reddit/confessions/ergutg
dearabby/title/dearabby_2019_5_16_2_woman-runs-out-of-patience-with
reddit/confessions/da0j01
dearabby/title/dearabby_2018_5_4_0_retired-husbands-wardrobe-is-too-casual
reddit/amitheasshole/avzgme
reddit/amitheasshole/ai0t5o

rot text

27.1% duplicate strings

rows	355,922
null	0 (0.0%)
unique	259,614
len_min	6
len_max	221
len_mean	54.661
len_median	53.000
len_p95	87.000
word_mean	10.101
word_median	10.000
n_empty	0
n_duplicates	96,308
duplicate_rate	0.271
vocab_size	9,659
readability_flesch_mean	79.379
emoji_rate	0.000
url_rate	2.81e-06
one_word_rate	5.62e-06
allcaps_rate	0.000
boilerplate_rate	5.62e-06

Sample values (first 10)

People are expected to be hardworking.
It is good to make sure something fits before buying it.
It's not typically expected to put butter on a donut.
You should find somewhere private to have sex.
It is good to talk through relationship problems.
It's not okay to expect more than a person can give.
Children are not expected to take things from their parents.
You shouldn't let your partner dictate what you can and cannot wear.
If a someone does something inappropriate, it's okay to confront them on the issue.
It's rude to tell on someone else.

rot-id text

100.0% rows are a single word

rows	355,922
null	0 (0.0%)
unique	291,974
len_min	63
len_max	140
len_mean	81.614
len_median	68.000
len_p95	114.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	63,948
duplicate_rate	0.180
vocab_size	18,736
readability_flesch_mean	-755.236
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

rot/reddit/amitheasshole/axfjgx/39OWYR0EPN6CSDBD4KNIUO907B9FYM/89/1
rot/rocstories/98e079f0-f03c-4a7d-8d2d-3856c07553dd/sentence3/3TEM0PF1Q8CIXY1W56HJEWCVQD3D0O/30/1
rot/reddit/confessions/ad1mwp/33FOTY3KEP08ZVG01TQ88VDNF9R1CM/93/1
rot/reddit/amitheasshole/aua8x4/3AQF3RZ55BXA9T17Y1SQBXP74YI6F5/42/3
rot/reddit/confessions/ergutg/3DBQWDE4Y9DQEHEAR61YRWKGQBZN5X/102/3
rot/dearabby/title/dearabby_2019_5_16_2_woman-runs-out-of-patience-with/3LOZAJ85YGS3RE9RBTM9RR82324X2H/105/2
rot/reddit/confessions/da0j01/36PW28KO42BJQHDET3PW9K6T2UUEAS/4/2
rot/dearabby/title/dearabby_2018_5_4_0_retired-husbands-wardrobe-is-too-casual/3OF2M9AATJ3NDPDW1HGYO8A7S59KZE/42/3
rot/reddit/amitheasshole/avzgme/3NGMS9VZTOX6SMUIZUVU532KW4VFF4/42/2
rot/reddit/amitheasshole/ai0t5o/32Z9ZLUT1OZKCVYHTN2KVINB92GOHJ/106/3

rot-worker-id numeric

rows	355,922
null	0 (0.0%)
unique	89
min	2.000
max	144.000
mean	72.537
median	83.000
std	39.608
q1	42.000
q3	105.000
iqr	63.000
skew	-0.080
kurtosis	-1.366
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

breakdown-worker-id numeric

rows	355,922
null	0 (0.0%)
unique	117
min	0.000
max	146.000
mean	71.747
median	71.000
std	42.172
q1	33.000
q3	106.000
iqr	73.000
skew	0.051
kurtosis	-1.255
n_outliers	0
outlier_rate	0.000
zero_rate	0.014

n-characters numeric

rows	355,922
null	0 (0.0%)
unique	8
min	1.000
max	10.000
mean	2.128
median	2.000
std	0.779
q1	2.000
q3	3.000
iqr	1.000
skew	0.438
kurtosis	0.459
n_outliers	1,132
outlier_rate	3.18e-03
zero_rate	0.000

characters text

50.1% rows are a single word; 91.1% duplicate strings

rows	355,922
null	0 (0.0%)
unique	31,782
len_min	8
len_max	165
len_mean	18.837
len_median	17.000
len_p95	38.000
word_mean	1.907
word_median	1.000
n_empty	0
n_duplicates	324,140
duplicate_rate	0.911
vocab_size	5,682
readability_flesch_mean	-63.176
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.501
allcaps_rate	0.000
boilerplate_rate	0.000

Sample values (first 10)

narrator
narrator|He
narrator
narrator|my ex
narrator
narrator|Woman|Tardy Boyfriend
narrator|my parents
narrator|Retired Husband|Wife
narrator|a student
narrator|my landlord|my housemates