saturn

/home/coolhand/html/datavis/data_trove/cache/quirky/social_norms_20260121.parquet · 355,922 rows · sample n=355,922 · seed 42 · 2026-05-01T23:32:17+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/quirky/social_norms_20260121.parquet
Total rows: 355,922
Profiled sample: 355,922
Columns: 25
Generated: 2026-05-01T23:32:17+00:00

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This is a social-norms annotation dataset of 355,922 rows and 25 columns. Each row pairs a real-life 'situation' (drawn from four sources: Reddit confessions, Reddit AmItheAsshole, Dear Abby, and ROCStories) with an 'action', a rule of thumb ('rot'), and a battery of moral judgments by crowd workers. The most striking shape feature is heavy duplication in the text fields: 'rot-judgment' is 97% duplicated and 'characters' 91%, because they collapse to short controlled vocabularies, while 'situation' and 'rot' have duplicate rates of ~71% and ~27% respectively. Worth a closer look first: the moral-foundation distribution, which is dominated by 'care-harm' (~39% of non-null values), and the 'action-legal' field, where 93% of non-null actions are tagged 'legal'. Both point to class imbalance that will matter for any modeling. Also note that 'area' is reasonably balanced across the four source corpora, whereas 'split' is heavily skewed toward 'train' (66%).

area high anthropic:claude-opus-4-7

This column tags each record with one of four source areas, with 'confessions' the modal value at 30.3% of 355,922 rows. The distribution is fairly balanced — entropy_ratio of 0.97 indicates near-uniform spread across the four categories, though 'dearabby' (50,300) is roughly half the size of the other three. No nulls and only 4 unique values make this a clean grouping key.
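The entropy and entropy_ratio figures can be reproduced from the top-value counts. The numbers suggest (this is an assumption about saturn, not documented here) that entropy is Shannon entropy in bits and entropy_ratio divides it by log2 of the number of categories. A minimal sketch using the 'area' counts above:

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy normalized by its maximum, log2(k), for k observed categories."""
    counts = list(counts)
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0

# Counts for 'area' as reported in the profile
area_counts = {"confessions": 107_749, "rocstories": 101_791,
               "amitheasshole": 96_082, "dearabby": 50_300}

h = entropy_bits(area_counts.values())
ratio = entropy_ratio(area_counts.values())
print(f"entropy={h:.3f} entropy_ratio={ratio:.3f}")
```

This prints entropy=1.947 entropy_ratio=0.974, matching the profile, so a ratio near 1 means the four areas are close to uniform.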

m high anthropic:claude-opus-4-7

Column 'm' is numeric with only 4 distinct values across 355,922 rows, ranging from 1 to 50, with a median of 1 and both Q1 and Q3 equal to 1. The distribution is severely right-skewed (skew 3.77, kurtosis 12.45). Because the IQR is 0, the 1.5 × IQR fences collapse onto exactly 1, so every row with any value other than 1 (18.5% of rows) is flagged as an outlier; that minority is what pulls the mean up to 4.24. The tiny cardinality combined with the extreme spread suggests a categorical-like multiplier or count where most records sit at 1 and a few jump to much larger values.
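A small sketch of why a zero IQR flags so many rows. It assumes saturn uses Tukey's 1.5 × IQR rule (the "rows beyond 1.5 IQR" alert suggests so) and takes Q1/Q3 as inputs, since quantile conventions vary between tools. The toy column is hypothetical, shaped like 'm':

```python
def iqr_fences(q1, q3, k=1.5):
    """Tukey fences: values outside [q1 - k*IQR, q3 + k*IQR] count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outlier_rate(values, q1, q3, k=1.5):
    """Fraction of values falling outside the Tukey fences."""
    lo, hi = iqr_fences(q1, q3, k)
    flagged = sum(1 for v in values if v < lo or v > hi)
    return flagged / len(values)

# Hypothetical toy column shaped like 'm': mostly 1s, a few larger values
toy_m = [1] * 8 + [10, 50]
print(iqr_fences(1.0, 1.0))           # (1.0, 1.0): IQR is 0, fences collapse onto 1
print(outlier_rate(toy_m, 1.0, 1.0))  # 0.2: every value other than 1 is flagged

# Same rule with action-moral-judgment's quartiles (Q1=-1, Q3=0)
print(iqr_fences(-1.0, 0.0))          # (-2.5, 1.5): only the +2 ratings fall outside
```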

split high anthropic:claude-opus-4-7

This column holds the dataset split assignment, with 7 distinct values and no nulls across 355,922 rows. 'train' dominates at 65.6% (233,501 rows), followed by near-equal 'test' (29,239) and 'dev' (29,234), plus auxiliary 'test-extra', 'analysis', and 'dev-extra' partitions. A 'none' bucket of 3,913 rows is unusual and likely indicates unassigned examples.

rot-agree high anthropic:claude-opus-4-7

This is a 5-value ordinal score (0–4) capturing agreement with the rule of thumb ('rot-agree'), with a mean of 3.10 and a median of 3.0, so answers cluster at the high end. The distribution is left-skewed (skew -0.68) with Q1=3 and Q3=4. The 2.4% of rows outside the IQR fences are all low-end outliers: the lower fence sits at 1.5, while the upper fence (5.5) lies beyond the scale. Nulls are negligible (0.35%) and zeros are rare (0.38%).

rot-categorization high anthropic:claude-opus-4-7

Categorical tag describing the type of 'rule of thumb' (RoT), with 15 distinct values drawn from a small base vocabulary (advice, social-norms, morality-ethics, description) plus pipe-delimited combinations. The distribution is moderately balanced (entropy ratio 0.72): 'advice' leads at 23.5% of non-null rows and the four single-tag values dominate, while compound tags like 'social-norms|advice' (27,657) show that multi-label annotations are packed into a single string. The null rate is low at 0.86%.
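Because combinations are packed into one string, per-tag frequencies require splitting the strings first. A minimal sketch using the 15 top values listed in the profile, which together account for all 352,876 non-null rows; `explode_counts` is a hypothetical helper, not part of saturn:

```python
from collections import Counter

# Top values for 'rot-categorization' as reported in the profile (all 15 shown)
rot_categorization = {
    "advice": 82_786, "social-norms": 72_934, "morality-ethics": 58_564,
    "description": 58_537, "social-norms|advice": 27_657,
    "morality-ethics|social-norms": 27_118, "morality-ethics|advice": 11_498,
    "social-norms|description": 6_078, "advice|description": 4_785,
    "morality-ethics|description": 2_023,
    "morality-ethics|social-norms|advice": 790,
    "morality-ethics|social-norms|description": 55,
    "morality-ethics|advice|description": 26,
    "social-norms|advice|description": 24,
    "morality-ethics|social-norms|advice|description": 1,
}

def explode_counts(combo_counts, sep="|"):
    """Turn counts of pipe-joined label combinations into per-label counts.

    A row tagged 'social-norms|advice' contributes once to each label, so the
    per-label totals can add up to more than the number of labeled rows.
    """
    per_label = Counter()
    for combo, n in combo_counts.items():
        for label in combo.split(sep):
            per_label[label] += n
    return per_label

labeled_rows = sum(rot_categorization.values())
per_label = explode_counts(rot_categorization)
for label, n in per_label.most_common():
    print(f"{label:16s} {n:>8,d}  ({n / labeled_rows:.1%} of labeled rows)")
```

Counted this way, 'advice' appears on 127,567 rows, well above the 82,786 where it is the only tag. The same split applies to 'rot-moral-foundations'.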

rot-moral-foundations high anthropic:claude-opus-4-7

This column tags each row with one or more of Haidt's moral foundations (care-harm, fairness-cheating, loyalty-betrayal, authority-subversion, sanctity-degradation); multi-label combinations are encoded as pipe-delimited strings, yielding 30 distinct values. The distribution is heavily skewed: 'care-harm' alone covers 38.9% of non-null rows (108,535, or about 30.5% of all rows), and 21.7% of rows are null. An entropy ratio of 0.60 confirms that the long tail thins out quickly after the top few categories.
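The profile's top_rate appears to be computed over non-null rows (108,535 / (355,922 - 77,120) ≈ 0.389), so percentages read from it are not shares of the whole table. That assumption is inferred from the numbers, not documented. A small sketch that reports both bases; `shares` is a hypothetical helper:

```python
def shares(count, n_rows, n_null):
    """Return (share of non-null rows, share of all rows) for one value."""
    return count / (n_rows - n_null), count / n_rows

# 'care-harm' in rot-moral-foundations, counts from the profile
non_null_share, all_rows_share = shares(108_535, 355_922, 77_120)
print(f"care-harm  non-null: {non_null_share:.1%}  all rows: {all_rows_share:.1%}")

# The same base mismatch affects other columns, e.g. char-1 in action-char-involved
char1_non_null, char1_all = shares(107_567, 355_922, 46_441)
print(f"char-1     non-null: {char1_non_null:.1%}  all rows: {char1_all:.1%}")
```

For columns with large null rates (21.7% here), the two bases differ by several percentage points, so it helps to state which one a figure uses.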

rot-char-targeting high anthropic:claude-opus-4-7

Categorical tag identifying which character slot the rule of thumb targets, with 7 distinct values dominated by 'char-0' (51.9% of non-null rows) and 'char-1' (123,396 rows, 34.8%). The distribution is sharply long-tailed: 'char-4' and 'char-5' appear only 46 and 15 times respectively, and 23,781 rows are explicitly 'char-none'. An entropy ratio of 0.557 confirms that most mass sits in the first two categories. The null rate is low at 0.46%.

rot-bad high anthropic:claude-opus-4-7

Binary 0/1 flag (n_unique=2, min=0, max=1) indicating a rare 'rot-bad' condition. Positives occur at 2.0% (mean=0.0201, zero_rate=0.9799). For a binary flag, the skew (6.84) and kurtosis (44.78) follow directly from that rate, so the high-skew alert is expected rather than a data problem. Likewise, the 7,153 'outliers' are simply the positive class, not anomalies.
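The skew and kurtosis of a 0/1 column are fixed by its positive rate p: skew = (1 - 2p) / sqrt(p(1 - p)) and excess kurtosis = 1 / (p(1 - p)) - 6. A sketch checking this against 'rot-bad', assuming saturn reports population skewness and excess kurtosis (the match below suggests so):

```python
import math

def bernoulli_moments(p):
    """Std, skewness and excess kurtosis of a 0/1 variable with P(1) = p."""
    var = p * (1 - p)
    std = math.sqrt(var)
    skew = (1 - 2 * p) / std
    excess_kurtosis = 1 / var - 6
    return std, skew, excess_kurtosis

p = 7_153 / 355_922  # positive rate of 'rot-bad' from the profile
std, skew, kurt = bernoulli_moments(p)
print(f"p={p:.4f} std={std:.3f} skew={skew:.2f} kurtosis={kurt:.2f}")
```

This prints p=0.0201 std=0.140 skew=6.84 kurtosis=44.78, reproducing the profile's figures from the positive rate alone.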

rot-judgment high anthropic:claude-opus-4-7

Short moral-judgment phrases (mean 2 words, median 9 chars, max 94) drawn from a tight vocabulary of 797 distinct words and dominated by verdicts like "It's good", "shouldn't", "It's okay", and "It's wrong". With a 97.0% duplicate rate and only 10,589 unique values across 355,922 rows, this behaves as a categorical label rather than free text. Casing is inconsistent ("It's good" and "it's good" appear as separate top values), which will inflate cardinality unless the values are normalized.
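Here is a sketch of the normalization suggested above. The rules are assumptions: casefolding, converting curly apostrophes to straight ones (the 'situation' samples contain "I don’t"), and collapsing whitespace. Mapping variants such as "it is good" onto "it's good" would need an extra, deliberately chosen rewrite rule. The raw values are hypothetical:

```python
from collections import Counter

def normalize_judgment(s):
    """Hypothetical normalizer: unify apostrophes, case and whitespace."""
    s = s.replace("\u2019", "'")  # curly apostrophe -> straight
    return " ".join(s.casefold().split())

# Hypothetical raw values illustrating the casing problem described above
raw = ["It's good", "it's good", "It\u2019s good", "shouldn't", " Shouldn't "]
print(len(set(raw)), "->", len({normalize_judgment(v) for v in raw}))  # 5 -> 2
print(Counter(normalize_judgment(v) for v in raw).most_common())
```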

action high anthropic:claude-opus-4-7

Short English phrases describing an action or behavior (mean 41.8 chars, median 7 words), e.g. 'being yourself.' or 'cheating on your partner.', likely the subject of a moral-judgment prompt. Roughly 27% of the 355,922 rows are duplicates (95,292), with the top phrase repeating 461 times, so the same actions recur heavily across records. Language detection labels the column multilingual, but only 2 of the ~5k sampled rows are non-English (1 la, 1 no), so it is effectively monolingual.

action-agency high anthropic:claude-opus-4-7

Binary categorical flag with two levels, 'agency' (88.5% of non-null rows) and 'experience' (the remainder), likely distinguishing actions a character performs from things a character undergoes. Class imbalance is heavy and 1.84% of rows are null, so any modeling needs to account for both. The entropy ratio of 0.515 reflects the dominance of the majority class.

action-moral-judgment high anthropic:claude-opus-4-7

A discrete moral-judgment rating on a 5-point scale from -2 to 2 (likely Likert-style, from very wrong to very right). The mean of -0.178 and median of 0 indicate a slight lean toward negative judgments; 43.7% of values are exactly zero and 12.7% are missing. The distribution is nearly symmetric (skew -0.011) and platykurtic (excess kurtosis -0.347), so most ratings cluster near neutral with modest spread (std 0.857). With Q1=-1 and Q3=0, the upper IQR fence sits at 1.5, so the 4,592 flagged outliers (1.5% of non-null rows) are exactly the rows rated +2.

action-agree high anthropic:claude-opus-4-7

This is a 5-level ordinal feature (values 0-4), almost certainly a Likert-style agreement rating for an 'action' item, with mean 3.10 and median 3.0 indicating a tilt toward agreement. The distribution is left-skewed (skew -0.68) with Q1=3 and Q3=4, so most respondents pick 3 or 4; only 0.38% give zero. Note the 12.48% null rate, which is substantial and likely reflects non-response or skipped items.

action-legal high anthropic:claude-opus-4-7

This is a categorical legal-status flag with 3 distinct values ('legal', 'tolerated', 'illegal') across 355,922 rows. The distribution is severely imbalanced: 'legal' accounts for 93.2% of non-null records while 'illegal' covers just 5,934 rows, and the entropy ratio is only 0.26. Note also that 12.81% of rows are null. Because most other action-* columns have similar null rates (12.5–13.1%), the gaps likely come from a shared block of rows without action annotations, and that absence may itself be a meaningful signal.
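One common way to handle the imbalance mentioned above is inverse-frequency class weighting. The formula n_samples / (n_classes × count_c) is the one scikit-learn documents for class_weight='balanced'. The sketch below is one option among several (resampling, threshold tuning) and uses the non-null 'action-legal' counts, with nulls excluded by assumption:

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# Non-null counts for 'action-legal' from the profile (nulls excluded)
action_legal = {"legal": 289_316, "tolerated": 15_064, "illegal": 5_934}
for label, w in balanced_class_weights(action_legal).items():
    print(f"{label:10s} weight={w:.2f}")
```

With these counts, an 'illegal' example is weighted about 17.4 and a 'legal' one about 0.36. The weighted class totals are then equal, so each class contributes equally to the loss.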

action-pressure high anthropic:claude-opus-4-7

A discrete ordinal feature taking 5 values across the symmetric range -2 to 2, almost certainly a Likert-style pressure rating. The distribution is balanced (mean -0.04, skew -0.05) and centered on 0, which accounts for 35.3% of rows. With Q1=-1 and Q3=1, the IQR fences (-4 and 4) lie outside the scale, so no outliers are flagged. Notably, 13.08% of rows are null.

action-char-involved high anthropic:claude-opus-4-7

Categorical pointer to which character slot is involved in an action, with 7 distinct values dominated by char-0 (51.5% of non-null rows) and char-1 (107,567 rows, ~35% of non-null rows). The long tail collapses fast: char-3 through char-5 together account for fewer than 1,400 rows, while 13.05% of rows are null and another 22,100 are explicitly tagged 'char-none'. An entropy ratio of 0.56 confirms the heavy concentration on the first two slots.

action-hypothetical high anthropic:claude-opus-4-7

A 5-level categorical label indicating whether an action is stated explicitly or only hypothetically or probably, with negative variants ('explicit-no', 'probable-no'). 'explicit' leads at 47.1% of non-null rows, but 19.04% of values are null, noticeably more than most other action-* columns (12.5–13.1%). An entropy ratio of 0.84 indicates the remaining classes are reasonably spread rather than collapsed onto a single mode.

situation high anthropic:claude-opus-4-7

Short descriptions of personal situations or dilemmas (first-person Reddit snippets, Dear Abby headlines, and ROCStories sentences), averaging 55 characters / 10 words and topping out at 300 chars, with high readability (Flesch 78.5). Heavy duplication is the headline feature: 252,626 of 355,922 rows are duplicates (71%), leaving only 103,296 unique values, and several distinct strings repeat exactly 130 times. This is consistent with each situation being annotated with several rules of thumb ('situation-short-id' shows the same ~71% duplication), though templated or oversampled records cannot be ruled out. Language detection is overwhelmingly English (4,989 of 5,000 sampled) with a tiny multilingual tail (6 de, 1 each es/it/nl/pt/ru) that is unlikely to matter at scale.

situation-short-id high anthropic:claude-opus-4-7

Slash-delimited source identifiers pointing back to situations on Reddit (confessions, amitheasshole), Dear Abby columns, and ROCStories sentences. Despite the 'id' name, only 103,692 values are unique across 355,922 rows (a 70.9% duplicate rate, with the most repeated key appearing 180 times), so each situation surfaces in about 3.4 rows on average. That is slightly more IDs than the 103,296 unique 'situation' texts, so a few identical texts appear under different IDs. Every value is a single token (one_word_rate 1.0) between 24 and 99 chars long.
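The sample IDs suggest the 'area' column can be recovered from the ID prefix. Reddit IDs look like 'reddit/<subreddit>/<post>', while other sources start with the source name. This mapping is inferred from the ten samples only and is hypothetical. Comparing it against 'area' on every row would be a cheap consistency check:

```python
def area_from_short_id(short_id):
    """Hypothetical mapping from a situation-short-id to the 'area' value.

    Inferred from sample values: 'reddit/<subreddit>/...' maps to the
    subreddit; 'rocstories/...' and 'dearabby/...' map to their first segment.
    """
    parts = short_id.split("/")
    return parts[1] if parts[0] == "reddit" else parts[0]

samples = [
    "reddit/amitheasshole/axfjgx",
    "rocstories/98e079f0-f03c-4a7d-8d2d-3856c07553dd/sentence3",
    "reddit/confessions/ad1mwp",
    "dearabby/title/dearabby_2019_5_16_2_woman-runs-out-of-patience-with",
]
print([area_from_short_id(s) for s in samples])
# ['amitheasshole', 'rocstories', 'confessions', 'dearabby']
```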

rot high anthropic:claude-opus-4-7

Short English moral/normative statements (mean 10.1 words, max 221 chars), almost certainly rule-of-thumb (RoT) annotations describing what one should or shouldn't do. The vocabulary is small (9,659 types) and readability is high (Flesch 79.4), consistent with simple declarative templates like 'It is good/okay to...' and 'You shouldn't...'. Notable: 27.1% of rows are duplicates (96,308), with single phrases like 'It is good to be yourself.' repeating 287 times, indicating heavy template reuse rather than free generation.

rot-id high anthropic:claude-opus-4-7

Structured, path-like identifiers for rule-of-thumb records, sourced from Reddit (amitheasshole, confessions), Dear Abby columns, and ROCStories. Each sample value embeds the matching 'situation-short-id' after a 'rot/' prefix. Despite being IDs, they take only 291,974 distinct values across 355,922 rows (duplicate_rate 0.18, with top values repeating up to 58 times), so this column is not a primary key. Every value is a single token (one_word_rate 1.0, word_mean 1.0) 63–140 characters long, which triggered the one_word alert but is expected for slash-delimited identifiers.
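The samples fit the layout 'rot/<situation-short-id>/<token>/<a>/<b>'. Because the situation ID itself contains slashes, splitting from the right keeps it intact. The field names below are hypothetical. The 30-character token's meaning is undocumented. The first number (4–106 in the samples) falls inside rot-worker-id's 2–144 range, which hints at a link but does not establish one:

```python
from typing import NamedTuple

class RotId(NamedTuple):
    situation_short_id: str
    token: str    # 30-char uppercase code; meaning not documented in the profile
    field_a: int  # hypothetical name; sample values fall in rot-worker-id's range
    field_b: int  # hypothetical name; small index (1-3 in the samples)

def parse_rot_id(rot_id):
    """Split 'rot/<situation-short-id>/<token>/<a>/<b>' (layout inferred from samples)."""
    if not rot_id.startswith("rot/"):
        raise ValueError(f"unexpected rot-id: {rot_id!r}")
    # rsplit with maxsplit=3 leaves slashes inside the situation id untouched
    situation, token, a, b = rot_id[len("rot/"):].rsplit("/", 3)
    return RotId(situation, token, int(a), int(b))

rid = parse_rot_id(
    "rot/reddit/amitheasshole/axfjgx/39OWYR0EPN6CSDBD4KNIUO907B9FYM/89/1")
print(rid.situation_short_id)  # reddit/amitheasshole/axfjgx
```

Joining on the parsed `situation_short_id` against the 'situation-short-id' column would confirm (or refute) the embedding on the full table.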

rot-worker-id high anthropic:claude-opus-4-7

Despite being typed numeric, `rot-worker-id` looks like a categorical worker identifier: only 89 unique values across 355,922 rows, no nulls, and no zeros. The distribution is broad and nearly symmetric (skew -0.08, kurtosis -1.37) spanning 2 to 144 with median 83, consistent with arbitrary id codes rather than a measured quantity. No outliers were flagged.

breakdown-worker-id high anthropic:claude-opus-4-7

This appears to be a worker identifier encoded as integers, with 117 distinct values spread roughly uniformly between 0 and 146 (mean 71.75, median 71, skew 0.05, kurtosis -1.26). Despite being stored as numeric, the low cardinality relative to 355,922 rows and the near-flat distribution suggest a categorical key rather than a measurement. No nulls and no outliers, but about 1.4% of rows carry worker id 0, which may be a sentinel.

n-characters high anthropic:claude-opus-4-7

A small-integer count column, presumably the number of characters (people) in each situation, ranging 1–10 with only 8 unique values across 355,922 rows (mean 2.13, median 2). The tight IQR (2–3) and low std (0.78) mean most records involve 2–3 characters. The mild right tail (skew 0.44) pushes 1,132 rows (~0.32%) past the upper IQR fence of 4.5, i.e. rows with 5 or more characters.

characters high anthropic:claude-opus-4-7

This column lists the characters in each situation as pipe-delimited strings (e.g. 'narrator|He', 'narrator|my girlfriend'). It is extremely repetitive: 91.1% of 355,922 rows are duplicates, only 31,782 unique values exist, and 'narrator' alone appears 71,601 times. The one_word_rate of 0.50 (word_median 1) counts whitespace-separated words, so pipe-joined entries such as 'narrator|He' also register as a single word. Despite its text kind, the column behaves more like a low-cardinality categorical tag than free text.
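Splitting on the pipe gives the individual characters, and the gap between whitespace words and pipe-separated entries explains the one_word_rate. This sketch uses the ten sample values. The idea that the entry count should equal 'n-characters' is an untested assumption, though the samples' mean of 2.0 is close to that column's 2.13:

```python
def split_characters(value, sep="|"):
    """Split a pipe-delimited 'characters' value into individual character names."""
    return [part.strip() for part in value.split(sep) if part.strip()]

samples = [
    "narrator", "narrator|He", "narrator", "narrator|my ex", "narrator",
    "narrator|Woman|Tardy Boyfriend", "narrator|my parents",
    "narrator|Retired Husband|Wife", "narrator|a student",
    "narrator|my landlord|my housemates",
]

n_parts = [len(split_characters(s)) for s in samples]
one_word = sum(len(s.split()) == 1 for s in samples) / len(samples)
print(n_parts)   # [1, 2, 1, 2, 1, 3, 2, 3, 2, 3]
print(one_word)  # 0.4: 'narrator|He' counts as one whitespace word
```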

[Figure: Numeric correlation]

[Figure: Languages detected. Per-string language detection across text columns (sampled).]

area categorical

rows: 355,922
null: 0 (0.0%)
unique: 4
top_value: confessions
top_rate: 0.303
cardinality: 4
entropy: 1.947
entropy_ratio: 0.974
Top values (rank 1–20)
  1. confessions — 107,749
  2. rocstories — 101,791
  3. amitheasshole — 96,082
  4. dearabby — 50,300

m numeric

skew=+3.77 · 18.5% rows beyond 1.5 IQR
rows: 355,922
null: 0 (0.0%)
unique: 4
min: 1.000
max: 50.000
mean: 4.237
median: 1.000
std: 11.239
q1: 1.000
q3: 1.000
iqr: 0.000
skew: 3.774
kurtosis: 12.454
n_outliers: 65,947
outlier_rate: 0.185
zero_rate: 0.000

split categorical

rows: 355,922
null: 0 (0.0%)
unique: 7
top_value: train
top_rate: 0.656
cardinality: 7
entropy: 1.763
entropy_ratio: 0.628
Top values (rank 1–20)
  1. train — 233,501
  2. test — 29,239
  3. dev — 29,234
  4. test-extra — 20,050
  5. analysis — 20,000
  6. dev-extra — 19,985
  7. none — 3,913

rot-agree numeric

rows: 355,922
null: 1,236 (0.3%)
unique: 5
min: 0.000
max: 4.000
mean: 3.100
median: 3.000
std: 0.744
q1: 3.000
q3: 4.000
iqr: 1.000
skew: -0.681
kurtosis: 0.787
n_outliers: 8,547
outlier_rate: 0.024
zero_rate: 3.79e-03

rot-categorization categorical

rows: 355,922
null: 3,046 (0.9%)
unique: 15
top_value: advice
top_rate: 0.235
cardinality: 15
entropy: 2.806
entropy_ratio: 0.718
Top values (rank 1–20)
  1. advice — 82,786
  2. social-norms — 72,934
  3. morality-ethics — 58,564
  4. description — 58,537
  5. social-norms|advice — 27,657
  6. morality-ethics|social-norms — 27,118
  7. morality-ethics|advice — 11,498
  8. social-norms|description — 6,078
  9. advice|description — 4,785
  10. morality-ethics|description — 2,023
  11. morality-ethics|social-norms|advice — 790
  12. morality-ethics|social-norms|description — 55
  13. morality-ethics|advice|description — 26
  14. social-norms|advice|description — 24
  15. morality-ethics|social-norms|advice|description — 1

rot-moral-foundations categorical

21.7% null
rows: 355,922
null: 77,120 (21.7%)
unique: 30
top_value: care-harm
top_rate: 0.389
cardinality: 30
entropy: 2.962
entropy_ratio: 0.604
Top values (rank 1–20)
  1. care-harm — 108,535
  2. fairness-cheating — 37,666
  3. loyalty-betrayal — 34,581
  4. care-harm|loyalty-betrayal — 21,125
  5. authority-subversion — 19,087
  6. sanctity-degradation — 14,657
  7. care-harm|fairness-cheating — 10,787
  8. care-harm|authority-subversion — 7,958
  9. care-harm|sanctity-degradation — 6,328
  10. fairness-cheating|loyalty-betrayal — 6,222
  11. fairness-cheating|authority-subversion — 3,738
  12. loyalty-betrayal|authority-subversion — 2,122
  13. fairness-cheating|sanctity-degradation — 1,218
  14. authority-subversion|sanctity-degradation — 1,042
  15. loyalty-betrayal|sanctity-degradation — 885
  16. care-harm|fairness-cheating|loyalty-betrayal — 834
  17. care-harm|loyalty-betrayal|authority-subversion — 623
  18. care-harm|authority-subversion|sanctity-degradation — 423
  19. care-harm|fairness-cheating|authority-subversion — 416
  20. care-harm|loyalty-betrayal|sanctity-degradation — 183

rot-char-targeting categorical

rows: 355,922
null: 1,622 (0.5%)
unique: 7
top_value: char-0
top_rate: 0.519
cardinality: 7
entropy: 1.564
entropy_ratio: 0.557
Top values (rank 1–20)
  1. char-0 — 183,950
  2. char-1 — 123,396
  3. char-none — 23,781
  4. char-2 — 21,648
  5. char-3 — 1,464
  6. char-4 — 46
  7. char-5 — 15

rot-bad numeric

skew=+6.84
rows: 355,922
null: 0 (0.0%)
unique: 2
min: 0.000
max: 1.000
mean: 0.020
median: 0.000
std: 0.140
q1: 0.000
q3: 0.000
iqr: 0.000
skew: 6.840
kurtosis: 44.779
n_outliers: 7,153
outlier_rate: 0.020
zero_rate: 0.980

rot-judgment text

95th-percentile length under 20 chars · 97.0% duplicate strings
rows: 355,922
null: 1 (0.0%)
unique: 10,589
len_min: 1
len_max: 94
len_mean: 10.463
len_median: 9.000
len_p95: 19.000
word_mean: 2.001
word_median: 2.000
n_empty: 0
n_duplicates: 345,332
duplicate_rate: 0.970
vocab_size: 797
readability_flesch_mean: 83.273
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.208
allcaps_rate: 5.62e-06
boilerplate_rate: 0.000
Sample values (first 10)
  1. It's expected
  2. it is good
  3. it's not typically expected
  4. you should
  5. It's not okay
  6. It's not okay
  7. It's bad
  8. shouldn't
  9. it's okay
  10. it's rude

action text

4 languages detected in sample · 26.8% duplicate strings
rows: 355,922
null: 3 (0.0%)
unique: 260,627
len_min: 1
len_max: 221
len_mean: 41.762
len_median: 40.000
len_p95: 73.000
word_mean: 6.969
word_median: 7.000
n_empty: 0
n_duplicates: 95,292
duplicate_rate: 0.268
vocab_size: 9,430
readability_flesch_mean: 57.878
emoji_rate: 0.000
url_rate: 2.81e-06
one_word_rate: 6.01e-03
allcaps_rate: 0.000
boilerplate_rate: 2.81e-06
Sample values (first 10)
  1. Being hardworking.
  2. calling your credit card company immediately when your credit card balance is wiped out
  3. putting butter on a donut.
  4. finding somewhere private to have sex.
  5. treating a boss badly just because they are a woman.
  6. expecting more than a person can give.
  7. Being pregnant if you don't want to be.
  8. letting your partner dictate what you can and cannot wear.
  9. doing what you want with your body and telling your boyfriend about your wishes.
  10. telling on someone else

action-agency categorical

rows: 355,922
null: 6,551 (1.8%)
unique: 2
top_value: agency
top_rate: 0.885
cardinality: 2
entropy: 0.515
entropy_ratio: 0.515
Top values (rank 1–20)
  1. agency — 309,148
  2. experience — 40,223

action-moral-judgment numeric

rows: 355,922
null: 45,215 (12.7%)
unique: 5
min: -2.000
max: 2.000
mean: -0.178
median: 0.000
std: 0.857
q1: -1.000
q3: 0.000
iqr: 1.000
skew: -0.011
kurtosis: -0.347
n_outliers: 4,592
outlier_rate: 0.015
zero_rate: 0.437

action-agree numeric

rows: 355,922
null: 44,402 (12.5%)
unique: 5
min: 0.000
max: 4.000
mean: 3.101
median: 3.000
std: 0.733
q1: 3.000
q3: 4.000
iqr: 1.000
skew: -0.677
kurtosis: 0.883
n_outliers: 7,046
outlier_rate: 0.023
zero_rate: 3.76e-03

action-legal categorical

rows: 355,922
null: 45,608 (12.8%)
unique: 3
top_value: legal
top_rate: 0.932
cardinality: 3
entropy: 0.415
entropy_ratio: 0.262
Top values (rank 1–20)
  1. legal — 289,316
  2. tolerated — 15,064
  3. illegal — 5,934

action-pressure numeric

rows: 355,922
null: 46,537 (13.1%)
unique: 5
min: -2.000
max: 2.000
mean: -0.040
median: 0.000
std: 1.114
q1: -1.000
q3: 1.000
iqr: 2.000
skew: -0.050
kurtosis: -0.660
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.353

action-char-involved categorical

rows: 355,922
null: 46,441 (13.0%)
unique: 7
top_value: char-0
top_rate: 0.515
cardinality: 7
entropy: 1.578
entropy_ratio: 0.562
Top values (rank 1–20)
  1. char-0 — 159,381
  2. char-1 — 107,567
  3. char-none — 22,100
  4. char-2 — 19,124
  5. char-3 — 1,252
  6. char-4 — 41
  7. char-5 — 16

action-hypothetical categorical

rows: 355,922
null: 67,776 (19.0%)
unique: 5
top_value: explicit
top_rate: 0.471
cardinality: 5
entropy: 1.956
entropy_ratio: 0.842
Top values (rank 1–20)
  1. explicit — 135,744
  2. hypothetical — 62,689
  3. probable — 49,670
  4. explicit-no — 25,022
  5. probable-no — 15,021

situation text

8 languages detected in sample · 71.0% duplicate strings
rows: 355,922
null: 0 (0.0%)
unique: 103,296
len_min: 10
len_max: 300
len_mean: 55.418
len_median: 52.000
len_p95: 100.000
word_mean: 10.519
word_median: 10.000
n_empty: 0
n_duplicates: 252,626
duplicate_rate: 0.710
vocab_size: 16,855
readability_flesch_mean: 78.525
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 5.28e-04
boilerplate_rate: 1.29e-04
Sample values (first 10)
  1. not being "enthusiastic enough" on my first two days of work
  2. He went to some stores and tried them on.
  3. I put Smart Balance Butter on a Stop and Shop glazed donut today.
  4. having sex in the same room my ex was sleeping
  5. I don’t want to be in my relationship anymore.
  6. Woman Runs Out of Patience With Tardy Boyfriend
  7. I stole a lot of money from my parents when I was younger, so now I am slowly repaying them by leaving cash in thier belongings.
  8. Retired Husband's Wardrobe Is Too Casual for Wife's Comfort
  9. confronting a student who walks into classroom 10 minutes before class ends, to wait for his class afterwards
  10. texting my landlord about my housemates

situation-short-id text

100.0% rows are a single word · 70.9% duplicate strings
rows: 355,922
null: 0 (0.0%)
unique: 103,692
len_min: 24
len_max: 99
len_mean: 41.295
len_median: 27.000
len_p95: 74.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 252,230
duplicate_rate: 0.709
vocab_size: 16,878
readability_flesch_mean: -443.908
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. reddit/amitheasshole/axfjgx
  2. rocstories/98e079f0-f03c-4a7d-8d2d-3856c07553dd/sentence3
  3. reddit/confessions/ad1mwp
  4. reddit/amitheasshole/aua8x4
  5. reddit/confessions/ergutg
  6. dearabby/title/dearabby_2019_5_16_2_woman-runs-out-of-patience-with
  7. reddit/confessions/da0j01
  8. dearabby/title/dearabby_2018_5_4_0_retired-husbands-wardrobe-is-too-casual
  9. reddit/amitheasshole/avzgme
  10. reddit/amitheasshole/ai0t5o

rot text

27.1% duplicate strings
rows: 355,922
null: 0 (0.0%)
unique: 259,614
len_min: 6
len_max: 221
len_mean: 54.661
len_median: 53.000
len_p95: 87.000
word_mean: 10.101
word_median: 10.000
n_empty: 0
n_duplicates: 96,308
duplicate_rate: 0.271
vocab_size: 9,659
readability_flesch_mean: 79.379
emoji_rate: 0.000
url_rate: 2.81e-06
one_word_rate: 5.62e-06
allcaps_rate: 0.000
boilerplate_rate: 5.62e-06
Sample values (first 10)
  1. People are expected to be hardworking.
  2. It is good to make sure something fits before buying it.
  3. It's not typically expected to put butter on a donut.
  4. You should find somewhere private to have sex.
  5. It is good to talk through relationship problems.
  6. It's not okay to expect more than a person can give.
  7. Children are not expected to take things from their parents.
  8. You shouldn't let your partner dictate what you can and cannot wear.
  9. If a someone does something inappropriate, it's okay to confront them on the issue.
  10. It's rude to tell on someone else.

rot-id text

100.0% rows are a single word
rows: 355,922
null: 0 (0.0%)
unique: 291,974
len_min: 63
len_max: 140
len_mean: 81.614
len_median: 68.000
len_p95: 114.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 63,948
duplicate_rate: 0.180
vocab_size: 18,736
readability_flesch_mean: -755.236
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. rot/reddit/amitheasshole/axfjgx/39OWYR0EPN6CSDBD4KNIUO907B9FYM/89/1
  2. rot/rocstories/98e079f0-f03c-4a7d-8d2d-3856c07553dd/sentence3/3TEM0PF1Q8CIXY1W56HJEWCVQD3D0O/30/1
  3. rot/reddit/confessions/ad1mwp/33FOTY3KEP08ZVG01TQ88VDNF9R1CM/93/1
  4. rot/reddit/amitheasshole/aua8x4/3AQF3RZ55BXA9T17Y1SQBXP74YI6F5/42/3
  5. rot/reddit/confessions/ergutg/3DBQWDE4Y9DQEHEAR61YRWKGQBZN5X/102/3
  6. rot/dearabby/title/dearabby_2019_5_16_2_woman-runs-out-of-patience-with/3LOZAJ85YGS3RE9RBTM9RR82324X2H/105/2
  7. rot/reddit/confessions/da0j01/36PW28KO42BJQHDET3PW9K6T2UUEAS/4/2
  8. rot/dearabby/title/dearabby_2018_5_4_0_retired-husbands-wardrobe-is-too-casual/3OF2M9AATJ3NDPDW1HGYO8A7S59KZE/42/3
  9. rot/reddit/amitheasshole/avzgme/3NGMS9VZTOX6SMUIZUVU532KW4VFF4/42/2
  10. rot/reddit/amitheasshole/ai0t5o/32Z9ZLUT1OZKCVYHTN2KVINB92GOHJ/106/3

rot-worker-id numeric

rows: 355,922
null: 0 (0.0%)
unique: 89
min: 2.000
max: 144.000
mean: 72.537
median: 83.000
std: 39.608
q1: 42.000
q3: 105.000
iqr: 63.000
skew: -0.080
kurtosis: -1.366
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

breakdown-worker-id numeric

rows: 355,922
null: 0 (0.0%)
unique: 117
min: 0.000
max: 146.000
mean: 71.747
median: 71.000
std: 42.172
q1: 33.000
q3: 106.000
iqr: 73.000
skew: 0.051
kurtosis: -1.255
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.014

n-characters numeric

rows: 355,922
null: 0 (0.0%)
unique: 8
min: 1.000
max: 10.000
mean: 2.128
median: 2.000
std: 0.779
q1: 2.000
q3: 3.000
iqr: 1.000
skew: 0.438
kurtosis: 0.459
n_outliers: 1,132
outlier_rate: 3.18e-03
zero_rate: 0.000

characters text

50.1% rows are a single word · 91.1% duplicate strings
rows: 355,922
null: 0 (0.0%)
unique: 31,782
len_min: 8
len_max: 165
len_mean: 18.837
len_median: 17.000
len_p95: 38.000
word_mean: 1.907
word_median: 1.000
n_empty: 0
n_duplicates: 324,140
duplicate_rate: 0.911
vocab_size: 5,682
readability_flesch_mean: -63.176
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.501
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. narrator
  2. narrator|He
  3. narrator
  4. narrator|my ex
  5. narrator
  6. narrator|Woman|Tardy Boyfriend
  7. narrator|my parents
  8. narrator|Retired Husband|Wife
  9. narrator|a student
  10. narrator|my landlord|my housemates