letterboxd users export

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/entertainment/movies/letterboxd/users_export.csv

Saturn profiled 8,139 rows across 5 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/entertainment/movies/letterboxd/users_export.csv",
    "--findings", "letterboxd-users_export.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset contains 8,139 Letterboxd user profiles with 5 columns covering identifiers (username, _id), a display name, and two activity metrics (num_reviews, num_ratings_pages). The activity metrics are the most interesting signal: num_reviews is heavily right-skewed with a mean of 868 but a median of 588 and a max of 17,184, and num_ratings_pages shows similar skew along with a 41.7% null rate that warrants investigation. Display names are also worth a look — about 60% are one-word, 12.3% are duplicates, and 'null' literally appears 307 times as a value, suggesting some data quality issues. The username and _id columns are fully unique identifiers and can largely be ignored for analytical purposes.

citing: row_count · column_count · num_reviews.stats · num_ratings_pages.stats · num_ratings_pages.null_rate · display_name.stats · display_name.top_values · username.n_unique · _id.n_unique
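
The summary infers right skew from the mean sitting above the median. A minimal sketch of that check follows, using the population moment formula for skewness. Which estimator Saturn uses is not stated in this report, so the formula is an assumption, and the sample values are made up for illustration.

```python
from statistics import mean, median

def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2**1.5.

    Assumes at least two distinct values (otherwise m2 is zero).
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative counts only (not rows from the dataset): many small values, one very large.
toy_counts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 2000]

print(mean(toy_counts) > median(toy_counts))  # mean is pulled up by the tail
print(skewness(toy_counts) > 0)               # positive skew means a right tail
```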

Out[4]:

saturn.schema() · 5 columns

column             kind     n      null%  unique  alerts
_id                text     8,139  0.0%   8,139   near_unique, one_word
display_name       text     8,139  0.0%   7,136   one_word, short_text
num_ratings_pages  numeric  8,139  41.7%  177     null_rate, high_skew
num_reviews        numeric  8,139  0.0%   2,416   high_skew, outliers
username           text     8,139  0.0%   8,139   near_unique, one_word, short_text
Fig 1.
num_reviews · Look at the long right tail — most users sit well below the 868 mean while a few hit thousands of reviews.
Histogram bins for num_reviews (median: 588.0; bin edges rounded to four significant figures).
bin            count
0 – 429.6      3145
429.6 – 859.2  2163
859.2 – 1289   1165
1289 – 1718    615
1718 – 2148    396
2148 – 2578    217
2578 – 3007    158
3007 – 3437    98
3437 – 3866    56
3866 – 4296    37
4296 – 4726    22
4726 – 5155    15
5155 – 5585    10
5585 – 6014    11
6014 – 6444    5
6444 – 6874    9
6874 – 7303    4
7303 – 7733    1
7733 – 8162    2
8162 – 8592    1
8592 – 9022    0
9022 – 9451    1
9451 – 9881    2
9881 – 10310   1
10310 – 10740  0
10740 – 11170  0
11170 – 11600  0
11600 – 12030  0
12030 – 12460  1
12460 – 12890  0
12890 – 13320  0
13320 – 13750  1
13750 – 14180  0
14180 – 14610  1
14610 – 15040  1
15040 – 15470  0
15470 – 15900  0
15900 – 16320  0
16320 – 16750  0
16750 – 17180  1
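
The num_reviews bin edges in Fig 1 are consistent with 40 equal-width bins spanning the observed range (17184 / 40 = 429.6). The sketch below shows that binning scheme. It is an illustration under that assumption, not Saturn's own code, and the function name is hypothetical.

```python
def equal_width_histogram(values, n_bins=40):
    """Count values into n_bins equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: a single degenerate bin holds everything.
        return [lo, hi], [len(values)]
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin (closed right edge).
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# min, median, q3 and max of num_reviews as quoted in this report.
edges, counts = equal_width_histogram([0, 588, 1130, 17184])
print(round(edges[1], 1))  # 429.6
```

With a long right tail, most of the 40 bins end up empty or nearly empty, which is why the table is dominated by its first few rows.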
Fig 2.
num_ratings_pages · Same skew pattern as reviews, but note that 41.7% of values are missing before reading the distribution.
Histogram bins for num_ratings_pages (median: 25.0; bin edges rounded to four significant figures).
bin            count
1 – 31.18      2911
31.18 – 61.35  1307
61.35 – 91.53  355
91.53 – 121.7  100
121.7 – 151.9  36
151.9 – 182.1  15
182.1 – 212.2  9
212.2 – 242.4  1
242.4 – 272.6  3
272.6 – 302.8  3
302.8 – 332.9  2
332.9 – 363.1  2
363.1 – 393.3  0
393.3 – 423.4  1
423.4 – 453.6  0
453.6 – 483.8  0
483.8 – 514    0
514 – 544.1    0
544.1 – 574.3  0
574.3 – 604.5  0
604.5 – 634.7  0
634.7 – 664.9  0
664.9 – 695    0
695 – 725.2    1
725.2 – 755.4  0
755.4 – 785.6  0
785.6 – 815.7  0
815.7 – 845.9  0
845.9 – 876.1  0
876.1 – 906.2  0
906.2 – 936.4  0
936.4 – 966.6  0
966.6 – 996.8  0
996.8 – 1027   0
1027 – 1057    0
1057 – 1087    0
1087 – 1117    0
1117 – 1148    0
1148 – 1178    0
1178 – 1208    1
Fig 3.
display_name · Top display names show common first names like Sam and Jack, plus 307 literal 'null' entries flagging data quality issues.
Fig 4.
display_name · Most display names are short (median 9 chars) but the max stretches to 97 — check for outlier-style names.
Character-length distribution for display_name (mean: 9.283).
chars    count
1 – 3    329
3 – 6    1711
6 – 8    1902
8 – 11   1037
11 – 13  1236
13 – 15  1354
15 – 18  275
18 – 20  167
20 – 23  39
23 – 25  27
25 – 27  31
27 – 30  4
30 – 32  14
32 – 35  2
35 – 37  2
37 – 39  2
39 – 42  0
42 – 44  1
44 – 47  1
47 – 49  1
49 – 51  1
51 – 54  0
54 – 56  0
56 – 59  0
59 – 61  0
61 – 63  0
63 – 66  2
66 – 68  0
68 – 71  0
71 – 73  0
73 – 75  0
75 – 78  0
78 – 80  0
80 – 83  0
83 – 85  0
85 – 87  0
87 – 90  0
90 – 92  0
92 – 95  0
95 – 97  1
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column             kind     null %
_id                text     0.0%
display_name       text     0.0%
num_ratings_pages  numeric  41.7%
num_reviews        numeric  0.0%
username           text     0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
                   num_ratings_pages  num_reviews
num_ratings_pages  +1.00              +0.08
num_reviews        +0.08              +1.00
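
Because 41.7% of num_ratings_pages is null, any correlation with num_reviews can only be computed on rows where both values are present. The sketch below computes Pearson's r with pairwise deletion. How Saturn handles nulls or sampling here is not stated in this report, so treat the null handling as an assumption; the function name is hypothetical.

```python
import math

def pearson_pairwise(xs, ys):
    """Pearson's r over positions where both values are present (not None).

    Assumes at least two complete pairs and non-zero variance in each column.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

# Toy values only: the pair with a missing x is dropped before computing r.
r = pearson_pairwise([1, 2, 3, None], [2, 4, 6, 8])
print(round(r, 2))  # 1.0
```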

_id text identifier

This column is a per-row identifier, almost certainly MongoDB ObjectIds: every one of the 8139 values is unique, exactly 24 characters long, single-token, and the samples are 24-char hex strings. There are no nulls, duplicates, or empties, and vocab_size equals n, confirming a pure primary key with no analytic content.

Treatment: Drop for modelling; retain only as a join key.
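
If the values are MongoDB ObjectIds, as the profile suggests, that can be checked by shape: 24 hexadecimal characters, of which the first 8 encode a creation time in seconds since the Unix epoch. The sketch below is a shape check only, and the sample ID is made up rather than taken from the dataset.

```python
import re
from datetime import datetime, timezone

OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")

def looks_like_object_id(value):
    """True if value is exactly 24 hexadecimal characters."""
    return bool(OBJECT_ID_RE.fullmatch(value))

def object_id_timestamp(value):
    """Decode the leading 4 bytes (8 hex chars) as a UTC creation time."""
    return datetime.fromtimestamp(int(value[:8], 16), tz=timezone.utc)

sample_id = "5f8d0d55b54764421b7156c3"  # hypothetical value for illustration
if looks_like_object_id(sample_id):
    print(object_id_timestamp(sample_id).year)  # 2020
```

If the timestamps decode to plausible dates, the column carries an approximate record-creation time even though the identifier itself should still be kept out of models.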

anthropic:claude-opus-4-7 · confidence high
Out[12]:

saturn.columns["_id"].stats

stat value
n 8,139
nulls 0 (0.0%)
unique 8,139
len_min 24
len_max 24
len_mean 24
len_median 24
len_p95 24
word_mean 1
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 8,139
readability_flesch_mean 31.97
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 0
boilerplate_rate 0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 100.0% rows are a single word
Fig 7.
Character-length distribution for _id.
Character-length distribution for _id (mean: 24.0). Every value is 24 characters long, so all 8139 rows fall in a single bin.
chars  count
24     8139

display_name text free_text

Short user display names: nearly 60% are a single word (one_word_rate 0.5979), median length 9 chars and median word count 1, with the top values dominated by first names like Sam, Jack, Emma. Notable quirks: 307 rows literally contain the string "null" (not actual nulls, since null_rate is 0.0), duplicate_rate is 12.3% with 1003 repeats, and 5.8% include emoji. Vocabulary is wide (7487 tokens across 8139 rows), consistent with free-form handles rather than a controlled label set.

Treatment: Treat as free-text handles: replace literal "null" tokens with true missing, lowercase-normalize, and avoid using as a join key given 12% duplicates.
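
A minimal sketch of that treatment follows. It assumes the literal token should be matched case-insensitively, so that lowercasing a value like "Null" does not recreate the placeholder. That choice is an assumption, as is the function name, and a real user who chose the name "Null" would be treated as missing too.

```python
def clean_display_name(name):
    """Return a lowercased, trimmed name, or None for the literal string 'null'."""
    stripped = name.strip()
    # Assumption: any casing of "null" is a serialized missing value.
    if stripped.lower() == "null":
        return None
    return stripped.lower()

# Illustrative values only.
print([clean_display_name(n) for n in ["Sam", "null", " Jack ", "EMMA"]])
# ['sam', None, 'jack', 'emma']
```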

anthropic:claude-opus-4-7 · confidence high
Out[15]:

saturn.columns["display_name"].stats

stat value
n 8,139
nulls 0 (0.0%)
unique 7,136
len_min 1
len_max 97
len_mean 9.283
len_median 9
len_p95 16
word_mean 1.479
word_median 1
n_empty 0
n_duplicates 1,003
duplicate_rate 0.1232
vocab_size 7,487
readability_flesch_mean 45.67
emoji_rate 0.05836
url_rate 0
one_word_rate 0.5979
allcaps_rate 0.01954
boilerplate_rate 0
alert: one_word · 59.8% rows are a single word
alert: short_text · 95th-percentile length under 20 chars
Fig 8.
Character-length distribution for display_name.

num_ratings_pages numeric feature

Numeric count of rating pages per user, present for roughly 58% of rows (null_rate 0.4168), with 177 distinct values from 1 to 1208 and a median of 25. The distribution is severely right-skewed (skew 11.24, kurtosis 298.34) with 236 outliers above the IQR fence, so the mean of 32.81 sits well above the typical row.

Treatment: log-transform and impute missing before modelling.
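
One way to apply this treatment is sketched below. The report does not say which imputation to use, so filling with the observed median and keeping a missing-value flag are both assumptions. A plain log is safe here because the observed minimum is 1.

```python
import math
from statistics import median

def log_with_median_impute(values):
    """Return (log_values, was_missing), filling None with the observed median.

    Assumes every observed value is positive, as in this column (min 1).
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    was_missing = [v is None for v in values]
    logged = [math.log(fill if v is None else v) for v in values]
    return logged, was_missing

# Illustrative values only: one missing entry among three observed ones.
logged, was_missing = log_with_median_impute([1, None, 25, 42])
print(was_missing)  # [False, True, False, False]
```

With 41.7% of rows missing, the flag lets a model distinguish imputed rows from observed ones instead of treating them all as a median-activity user.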

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["num_ratings_pages"].stats

stat value
n 8,139
nulls 3,392 (41.7%)
unique 177
min 1
max 1,208
mean 32.81
median 25
std 35.23
q1 15
q3 42
iqr 27
skew 11.24
kurtosis 298.3
n_outliers 236
outlier_rate 0.04972
zero_rate 0
alert: null_rate · 41.7% null
alert: high_skew · skew=+11.24
Fig 9.
Distribution of num_ratings_pages. Vertical dash marks the median.

num_reviews numeric feature

num_reviews is a count of reviews per user, ranging from 0 to 17184 with a median of 588 and a mean of 868. The distribution is heavily right-skewed (skew 3.92, kurtosis 33.3), with 505 outliers (6.2%) and only 0.96% zeros. The gap between q3 (1130) and max (17184) signals a long tail of highly prolific reviewers.

Treatment: log1p-transform before modelling to tame the right tail.
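
log1p computes log(1 + x), which stays defined at zero; that matters here because about 1% of rows have no reviews. The sketch below applies it to the summary values quoted in this section. The function name is hypothetical.

```python
import math

def log1p_transform(counts):
    """Apply log(1 + x) to each count; zero maps to zero."""
    return [math.log1p(c) for c in counts]

# min, median, q3 and max of num_reviews as quoted in this report.
summary = [0, 588, 1130, 17184]
for count, transformed in zip(summary, log1p_transform(summary)):
    print(f"{count:>6} -> {transformed:.2f}")
```

The transform is invertible with math.expm1, so predictions made on the transformed scale can be mapped back to review counts.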

anthropic:claude-opus-4-7 · confidence high
Out[21]:

saturn.columns["num_reviews"].stats

stat value
n 8,139
nulls 0 (0.0%)
unique 2,416
min 0
max 17,184
mean 868.4
median 588
std 979.1
q1 267
q3 1,130
iqr 863
skew 3.923
kurtosis 33.31
n_outliers 505
outlier_rate 0.06205
zero_rate 0.009583
alert: high_skew · skew=+3.92
alert: outliers · 6.2% rows beyond 1.5 IQR
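
The outlier alert refers to the 1.5 × IQR rule. As a worked example, the fences implied by the quartiles listed for this column can be computed as below. This assumes the conventional Tukey fences (q1 − 1.5·IQR and q3 + 1.5·IQR), which the report does not spell out.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = tukey_fences(267, 1130)  # q1 and q3 of num_reviews
print(lower, upper)  # -1027.5 2424.5
```

Counts cannot be negative, so only the upper fence is active: under this rule, users with more than about 2,424 reviews are the ones flagged.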
Fig 10.
Distribution of num_reviews. Vertical dash marks the median.

username text identifier

This column holds unique single-token usernames: every one of the 8139 rows has a distinct value (n_unique=8139, duplicate_rate=0.0) and one_word_rate is 1.0. Lengths are short and tightly bounded (len_min=2, len_mean≈9.79, len_max=15), consistent with a handle field rather than free text. No nulls, no URLs, no emoji, and allcaps usage is negligible (0.00037).

Treatment: Treat as a user identifier; drop from modelling features and use only for joins or deduplication.
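
Before relying on a column as a join key, it is worth confirming uniqueness directly. The sketch below counts duplicates as rows minus distinct values, which is consistent with the figures in this report (for display_name, 8,139 − 7,136 = 1,003). The function name and return shape are hypothetical.

```python
from collections import Counter

def duplicate_report(values):
    """Summarize how many rows repeat an earlier value."""
    counts = Counter(values)
    n = len(values)
    n_duplicates = n - len(counts)
    return {
        "n": n,
        "unique": len(counts),
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n if n else 0.0,
    }

# Illustrative handles only.
report = duplicate_report(["ana", "ben", "ana", "ana"])
print(report["n_duplicates"], report["duplicate_rate"])  # 2 0.5
```

A column is safe as a key only when n_duplicates is 0, as it is for username and _id here.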

anthropic:claude-opus-4-7 · confidence high
Out[24]:

saturn.columns["username"].stats

stat value
n 8,139
nulls 0 (0.0%)
unique 8,139
len_min 2
len_max 15
len_mean 9.793
len_median 10
len_p95 15
word_mean 1
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 8,139
readability_flesch_mean 2.78
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 0.0003686
boilerplate_rate 0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 100.0% rows are a single word
alert: short_text · 95th-percentile length under 20 chars
Fig 11.
Character-length distribution for username.
Character-length distribution for username (mean: 9.793). Lengths are whole numbers, so the empty fractional bins are omitted.
chars  count
2      9
3      42
4      168
5      343
6      570
7      778
8      941
9      937
10     996
11     932
12     805
13     654
14     483
15     481

How to cite

BibTeX
@misc{saturn-letterboxd-users-export-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: letterboxd users export},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/letterboxd-users_export}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: letterboxd users export. Source: /home/coolhand/html/datavis/data_trove/entertainment/movies/letterboxd/users_export.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/letterboxd-users_export