saturn
/home/coolhand/datasets/bsky-firehose-anonymized-dec-2025/bluesky_posts.csv · 101,040 rows · sample n=101,040 · seed 42 · 2026-04-22T05:56:37+00:00
Overview
| Source | /home/coolhand/datasets/bsky-firehose-anonymized-dec-2025/bluesky_posts.csv |
| Total rows | 101,040 |
| Profiled sample | 101,040 |
| Columns | 19 |
| Generated | 2026-04-22T05:56:37+00:00 |
Numeric correlation
Languages detected
Per-string language detection across text columns (sampled).
text text
31 languages detected in sample
16.9% rows are all-caps
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 95,935 |
| len_min | 1 |
| len_max | 525 |
| len_mean | 97.627 |
| len_median | 68.000 |
| len_p95 | 290.000 |
| word_mean | 14.235 |
| word_median | 10.000 |
| n_empty | 0 |
| n_duplicates | 5,105 |
| duplicate_rate | 0.051 |
| vocab_size | 77,183 |
| readability_flesch_mean | 64.091 |
| emoji_rate | 0.183 |
| url_rate | 0.076 |
| one_word_rate | 0.190 |
| allcaps_rate | 0.169 |
| boilerplate_rate | 1.05e-03 |
Sample values (first 10)
- Un client arrêté après avoir poignardé son livreur de repas "Un livreur de repas a été grièvement blessé lors d’une tentative de meurtre dans la nuit de dimanche à lundi, à Bülach. Le..." https://www.20min.ch/fr/story/buelach-zh-un-client-arrete-apres-avoir-poignarde-son-livreu…
- Fındığım bu giboyla ilgili itiraf etmek istediğin bir şey varsa tam vakti 😅 Dm kutuma gel anlat,söz bende kalacak anlattıkların 😅😅
- -#strongertogether
- Tu b’Shevat is approaching rapidly. Nit to put any pressure on you, but…
- Someone mentioned it in my timeline, and so I just rewatched "The Blue Carbuncle", from The Adventures of Sherlock Holmes (1984) with Jeremy Brett. It's a great Christmas story and Jeremy Brett is without question the best Holmes ever. I found it on Britbox
- https://trecome.info/articles/89cfe941-6cf1-45ae-8626-cc0241375b46 【新着記事】 宇宙ステーションは「組み立てる」時代から「一発で広げる」時代へ?
- Happy Christmas Eve, sweet Flanoy! 🎄🧡🐈
- THESE FUCKERS SERIOUSLY COULDN’T WAIT ONE DAY!? /vneg #dandysworld
- 어딜 가지.....
- Made in hckr.fr 🏴☠️🖤 Le genre de petit message qui me fait chaud au cœur.
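The 16.9% all-caps flag on `text` deserves a second look. The report does not state how `allcaps_rate` is computed. One definition that is consistent with the other columns (every `created_at` and `timestamp` value counts as all-caps, a small fraction of the hex hashes do, and `[]` cells do not) is "no lowercase letters and at least one letter or digit". Under that assumed definition, posts in scripts without letter case (Japanese, Korean) would also be counted. The sketch below contrasts that assumption with the stricter `str.isupper()`; both function names are hypothetical and neither is confirmed to be the profiler's method.

```python
def allcaps_loose(s: str) -> bool:
    """Assumed profiler-style test: no lowercase letters, at least one letter or digit."""
    return s == s.upper() and any(ch.isalnum() for ch in s)


def allcaps_strict(s: str) -> bool:
    """Stricter test: at least one cased letter, and none of them lowercase."""
    return s.isupper()


examples = [
    "HAPPY CHRISTMAS EVE",       # genuine upper-case text
    "Happy Christmas Eve",       # mixed case
    "어딜 가지.....",              # Hangul has no letter case
    "2025-12-24T05:46:28.199Z",  # ISO timestamp: only "T" and "Z" are cased
    "0123456789012345",          # hypothetical digit-only hash
    "[]",                        # empty list cell
]
for s in examples:
    print(f"{s!r:30} loose={allcaps_loose(s)} strict={allcaps_strict(s)}")
```

If the loose variant is close to what the profiler does, restricting the check to rows whose `language` uses a cased script would give a more useful "shouting" rate.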
author_did_hash text
56.5% duplicate strings
95th-percentile length under 20 chars
100.0% rows are a single word
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 43,998 |
| len_min | 16 |
| len_max | 16 |
| len_mean | 16.000 |
| len_median | 16.000 |
| len_p95 | 16.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 57,042 |
| duplicate_rate | 0.565 |
| vocab_size | 13,938 |
| readability_flesch_mean | 68.345 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 3.46e-04 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- 203b2f94ca34ad57
- e3fb7462b68ce168
- 8b80d746cd58f608
- 74e2cbc89edd37a6
- ed4f29630f55ae1d
- 039de54e8bef8899
- 61ee7267320497ee
- b464fb16192641fa
- 7422e82a369d2ace
- 8dc83ec255dde07a
uri_hash text
100.0% of rows are unique strings
95th-percentile length under 20 chars
100.0% rows are a single word
| rows | 101,040 |
| null | 1 (0.0%) |
| unique | 101,039 |
| len_min | 16 |
| len_max | 16 |
| len_mean | 16.000 |
| len_median | 16.000 |
| len_p95 | 16.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 0 |
| duplicate_rate | 0.000 |
| vocab_size | 20,000 |
| readability_flesch_mean | 69.614 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 5.25e-04 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- 1a925eae4a68e954
- 9c7b35c448e9f56a
- 00000da0008897eb
- cd823fdacd02b11c
- 0a525ed50b0474f2
- 175ab73228973fa3
- 60f3a1a69409b7ae
- e9e0e481dbe7f266
- d49a9bf37ba42904
- ef51be69a5ee76f8
reply_parent_hash text
95th-percentile length under 20 chars
57.7% null
100.0% rows are a single word
| rows | 101,040 |
| null | 58,270 (57.7%) |
| unique | 34,738 |
| len_min | 16 |
| len_max | 16 |
| len_mean | 16.000 |
| len_median | 16.000 |
| len_p95 | 16.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 8,032 |
| duplicate_rate | 0.188 |
| vocab_size | 17,415 |
| readability_flesch_mean | 71.729 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 7.95e-04 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- fc2267f29dd1a492
- 6b56ce9644d8dcfc
- 701912916dd3aecb
- f16b66c1507d3da9
- 63ea68b3eabeb6c5
- 2e341c64d79713f6
- bfbbd6900834f900
- dd990c5f31cc4ea6
- 3b2f41bfb941204a
- a5ba750d7bf30263
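The duplicate figures in the text-column profiles are consistent with counting duplicates over non-null rows only: `n_duplicates = (rows - null) - unique` and `duplicate_rate = n_duplicates / (rows - null)`. That is an inference from the reported numbers, not a documented formula. A minimal sketch (the function name is hypothetical) that reproduces the values for `reply_parent_hash`, `author_did_hash`, and `text`:

```python
from typing import Tuple


def duplicate_stats(rows: int, nulls: int, unique: int) -> Tuple[int, float]:
    """Return (n_duplicates, duplicate_rate), both computed over non-null rows."""
    non_null = rows - nulls
    n_duplicates = non_null - unique
    rate = n_duplicates / non_null if non_null else 0.0
    return n_duplicates, rate


# Inputs are the rows / null / unique values reported in the profiles.
for name, nulls, unique in [
    ("reply_parent_hash", 58_270, 34_738),
    ("author_did_hash", 0, 43_998),
    ("text", 0, 95_935),
]:
    n_dup, rate = duplicate_stats(101_040, nulls, unique)
    print(f"{name}: n_duplicates={n_dup:,} duplicate_rate={rate:.3f}")
```

Dividing by all 101,040 rows instead would give about 0.079 for `reply_parent_hash`, which does not match the reported 0.188.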
reply_root_hash text
50.3% duplicate strings
95th-percentile length under 20 chars
57.7% null
100.0% rows are a single word
| rows | 101,040 |
| null | 58,270 (57.7%) |
| unique | 21,277 |
| len_min | 16 |
| len_max | 16 |
| len_mean | 16.000 |
| len_median | 16.000 |
| len_p95 | 16.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 21,493 |
| duplicate_rate | 0.503 |
| vocab_size | 12,498 |
| readability_flesch_mean | 77.228 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 8.18e-04 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- fc2267f29dd1a492
- 6b56ce9644d8dcfc
- 701912916dd3aecb
- f16b66c1507d3da9
- 63ea68b3eabeb6c5
- 2e341c64d79713f6
- dc0cf00aab42248a
- 65f573012a42f37a
- 152ff36a17b9ab54
- 2da15f2e55e9a171
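Both reply columns are null in 58,270 rows, which leaves 42,770 replies. In the sample values, the first six `reply_parent_hash` entries equal the corresponding `reply_root_hash` entries, while the last four differ. Assuming the two sample lists are row-aligned (not stated in the report), that pattern separates direct replies to a thread's root from replies deeper in the thread. A minimal sketch of that classification, with hypothetical label names:

```python
from typing import Optional


def thread_position(parent: Optional[str], root: Optional[str]) -> str:
    """Classify a post by its reply hashes; None or '' is treated as null."""
    if not parent and not root:
        return "top_level"
    if parent == root:
        return "direct_reply"   # replying to the post that started the thread
    return "nested_reply"       # replying to another reply


print(thread_position(None, None))
print(thread_position("fc2267f29dd1a492", "fc2267f29dd1a492"))
print(thread_position("bfbbd6900834f900", "dc0cf00aab42248a"))
```

When the CSV is loaded with a dataframe library, nulls may arrive as NaN rather than `None`, so a conversion step would be needed before calling this function.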
sentiment categorical
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 3 |
| top_value | neutral |
| top_rate | 0.485 |
| cardinality | 3 |
| entropy | 1.473 |
| entropy_ratio | 0.930 |
Top values (rank 1–20)
- neutral — 48,981
- positive — 34,622
- negative — 17,437
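The reported `entropy` and `entropy_ratio` match Shannon entropy in bits and its ratio to the maximum possible entropy, `log2(cardinality)`. That reading is inferred from the numbers rather than stated by the report. A short sketch that recomputes both from the sentiment counts above (function names are hypothetical):

```python
import math
from typing import Iterable


def entropy_bits(counts: Iterable[int]) -> float:
    """Shannon entropy in bits of a categorical distribution given raw counts."""
    values = [c for c in counts if c > 0]
    total = sum(values)
    return -sum((c / total) * math.log2(c / total) for c in values)


def entropy_ratio(counts: Iterable[int]) -> float:
    """Entropy divided by log2(k), where k is the number of observed categories."""
    values = [c for c in counts if c > 0]
    if len(values) < 2:
        return 0.0
    return entropy_bits(values) / math.log2(len(values))


sentiment = {"neutral": 48_981, "positive": 34_622, "negative": 17_437}
print(f"entropy={entropy_bits(sentiment.values()):.3f}")
print(f"entropy_ratio={entropy_ratio(sentiment.values()):.3f}")
```

A ratio near 1 means the categories are close to evenly used; the 0.930 here reflects three classes with a moderate tilt toward `neutral`.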
sentiment_score numeric
5.7% rows beyond 1.5 IQR
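| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 1,928 |
| min | -0.998 |
| max | 1.000 |
| mean | 0.107 |
| median | 0.000 |
| std | 0.410 |
| q1 | 0.000 |
| q3 | 0.402 |
| iqr | 0.402 |
| skew | 0.019 |
| kurtosis | 0.018 |
| n_outliers | 5,763 |
| outlier_rate | 0.057 |
| zero_rate | 0.478 |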
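The "beyond 1.5 IQR" flag can be unpacked from the reported quartiles. Assuming the profiler uses the usual Tukey fences (`q1 - 1.5 * iqr` and `q3 + 1.5 * iqr`), the sketch below computes them for `sentiment_score`; the function name is hypothetical.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper outlier fences for a given pair of quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(q1=0.000, q3=0.402)
print(f"lower fence={lower:.3f}, upper fence={upper:.3f}")
```

With these quartiles the upper fence (about 1.005) lies above the reported maximum of 1.000, so under this assumption all 5,763 flagged rows would be strongly negative scores below about -0.603.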
created_at text
95.6% of rows are unique strings
100.0% rows are a single word
100.0% rows are all-caps
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 96,576 |
| len_min | 20 |
| len_max | 35 |
| len_mean | 24.345 |
| len_median | 24.000 |
| len_p95 | 27.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 4,464 |
| duplicate_rate | 0.044 |
| vocab_size | 19,720 |
| readability_flesch_mean | 121.220 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 1.000 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- 2025-12-15T15:02:56.000000Z
- 2025-12-24T05:46:28.199Z
- 2025-12-24T05:53:52.540Z
- 2025-12-24T05:51:06.770Z
- 2025-12-24T05:24:05.186Z
- 2025-12-24T06:00:49.556+00:00
- 2025-12-24T05:25:08.535Z
- 2025-12-24T05:56:08.507Z
- 2025-12-24T05:51:12.695Z
- 2025-12-24T05:00:11.869Z
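`created_at` is profiled as text, and the samples mix several ISO 8601 spellings: a `Z` suffix or a `+00:00` offset, and three or six fractional digits, with lengths ranging from 20 to 35 characters. A minimal normalising parser is sketched below. It assumes that values without an offset are UTC and truncates fractions beyond microseconds; both choices, and the function name, are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_ISO = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"  # date and time to the second
    r"(?:\.(\d+))?"                            # optional fractional seconds
    r"(Z|[+-]\d{2}:\d{2})?$"                   # optional 'Z' or numeric offset
)


def parse_created_at(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 string to an aware UTC datetime, or None if unparseable."""
    match = _ISO.match(value.strip())
    if match is None:
        return None
    base, frac, tz = match.groups()
    try:
        micro = int((frac or "0")[:6].ljust(6, "0"))  # pad or truncate to microseconds
        parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(microsecond=micro)
        if tz in (None, "Z"):
            offset = timezone.utc  # assumption: missing offset means UTC
        else:
            sign = 1 if tz[0] == "+" else -1
            offset = timezone(sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6])))
    except ValueError:
        return None
    return parsed.replace(tzinfo=offset).astimezone(timezone.utc)


for raw in ["2025-12-15T15:02:56.000000Z",
            "2025-12-24T05:46:28.199Z",
            "2025-12-24T06:00:49.556+00:00"]:
    print(raw, "->", parse_created_at(raw))
```

Returning `None` for unmatched values makes it easy to count how many rows fall outside the expected formats before deciding how to handle them.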
timestamp text
100.0% of rows are unique strings
100.0% rows are a single word
100.0% rows are all-caps
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 101,040 |
| len_min | 26 |
| len_max | 26 |
| len_mean | 26.000 |
| len_median | 26.000 |
| len_p95 | 26.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 0 |
| duplicate_rate | 0.000 |
| vocab_size | 20,000 |
| readability_flesch_mean | 121.220 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 1.000 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- 2025-12-23T23:35:20.812256
- 2025-12-23T23:46:29.113253
- 2025-12-23T23:53:52.818216
- 2025-12-23T23:51:06.721420
- 2025-12-23T23:24:08.130284
- 2025-12-24T00:00:49.619695
- 2025-12-23T23:25:09.314207
- 2025-12-23T23:56:13.728686
- 2025-12-23T23:51:13.117978
- 2025-12-23T23:00:12.134916
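Comparing the samples, `timestamp` looks like a naive capture time that trails `created_at` by roughly six hours on the clock (for example `2025-12-24T05:53:52.540Z` against `2025-12-23T23:53:52.818216`). One reading is that `timestamp` was recorded in a UTC-6 local zone at ingestion. That reading, the row alignment of the two sample lists, and the names below are all assumptions. Under them, the ingestion lag can be sketched as:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the collector wrote naive local times at a fixed UTC-6 offset.
ASSUMED_CAPTURE_TZ = timezone(timedelta(hours=-6))


def ingestion_lag(created_at_utc: datetime, timestamp_naive: str) -> timedelta:
    """Time between a post's created_at (aware, UTC) and its naive capture timestamp."""
    captured = datetime.fromisoformat(timestamp_naive).replace(tzinfo=ASSUMED_CAPTURE_TZ)
    return captured - created_at_utc


created = datetime(2025, 12, 24, 5, 53, 52, 540000, tzinfo=timezone.utc)
print(ingestion_lag(created, "2025-12-23T23:53:52.818216"))
```

Under the same assumptions the first sample pair yields a lag of more than eight days, which would point to a post whose `created_at` was set well before it reached the firehose.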
language categorical
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 90 |
| top_value | en |
| top_rate | 0.608 |
| cardinality | 90 |
| entropy | 2.178 |
| entropy_ratio | 0.336 |
Top values (rank 1–20)
- en — 61,468
- ja — 12,607
- unknown — 11,481
- en-US — 3,617
- ko — 2,406
- de — 1,821
- pt — 1,295
- es — 1,153
- fr — 746
- th — 612
- tr — 548
- nl — 525
- zh — 315
- it — 276
- ru — 213
- fi — 193
- ja-JP — 170
- id — 158
- pl — 139
- el — 116
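The top values list both bare language codes and region-qualified tags (`en` and `en-US`, `ja` and `ja-JP`), so part of the cardinality of 90 comes from tag variants rather than distinct languages. A minimal sketch of collapsing tags to their primary subtag follows, using a subset of the counts above; the function name is hypothetical, and whether to merge regional variants at all is an analysis choice.

```python
from collections import Counter


def primary_subtag(tag: str) -> str:
    """Reduce a language tag such as 'en-US' to its primary subtag 'en'."""
    return tag.split("-", 1)[0].lower()


# Subset of the reported top values.
top_counts = {
    "en": 61_468, "ja": 12_607, "unknown": 11_481,
    "en-US": 3_617, "ko": 2_406, "ja-JP": 170,
}

merged = Counter()
for tag, n in top_counts.items():
    merged[primary_subtag(tag)] += n

for lang, n in merged.most_common():
    print(f"{lang}: {n:,}")
```

Merging would also fold script variants such as `zh-Hans` and `zh-Hant` together if they occur, so it may be worth keeping the original tag in a separate column.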
char_count numeric
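| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 341 |
| min | 1.000 |
| max | 525.000 |
| mean | 97.627 |
| median | 68.000 |
| std | 86.052 |
| q1 | 30.000 |
| q3 | 143.000 |
| iqr | 113.000 |
| skew | 1.018 |
| kurtosis | -0.057 |
| n_outliers | 289 |
| outlier_rate | 2.86e-03 |
| zero_rate | 0.000 |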
word_count numeric
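| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 79 |
| min | 0.000 |
| max | 83.000 |
| mean | 14.675 |
| median | 10.000 |
| std | 14.223 |
| q1 | 3.000 |
| q3 | 22.000 |
| iqr | 19.000 |
| skew | 1.209 |
| kurtosis | 0.699 |
| n_outliers | 2,882 |
| outlier_rate | 0.029 |
| zero_rate | 6.04e-04 |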
has_images numeric
skew=+2.12
13.6% rows beyond 1.5 IQR
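| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 2 |
| min | 0.000 |
| max | 1.000 |
| mean | 0.136 |
| median | 0.000 |
| std | 0.343 |
| q1 | 0.000 |
| q3 | 0.000 |
| iqr | 0.000 |
| skew | 2.120 |
| kurtosis | 2.497 |
| n_outliers | 13,768 |
| outlier_rate | 0.136 |
| zero_rate | 0.864 |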
has_video numeric
skew=+8.50
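| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 2 |
| min | 0.000 |
| max | 1.000 |
| mean | 0.013 |
| median | 0.000 |
| std | 0.115 |
| q1 | 0.000 |
| q3 | 0.000 |
| iqr | 0.000 |
| skew | 8.497 |
| kurtosis | 70.192 |
| n_outliers | 1,344 |
| outlier_rate | 0.013 |
| zero_rate | 0.987 |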
has_link numeric
18.0% rows beyond 1.5 IQR
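| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 2 |
| min | 0.000 |
| max | 1.000 |
| mean | 0.180 |
| median | 0.000 |
| std | 0.384 |
| q1 | 0.000 |
| q3 | 0.000 |
| iqr | 0.000 |
| skew | 1.670 |
| kurtosis | 0.789 |
| n_outliers | 18,140 |
| outlier_rate | 0.180 |
| zero_rate | 0.820 |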
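The skew and outlier flags on `has_images`, `has_video`, and `has_link` follow from these being 0/1 indicators rather than from anything unusual in the data. With `q1 = q3 = 0` the IQR is zero, so every row equal to 1 falls outside the fences and `n_outliers` is simply the count of ones. The shape statistics likewise match the closed-form moments of a Bernoulli variable. A sketch under the assumption that the profiler reports excess kurtosis (the function name is hypothetical):

```python
import math
from typing import Tuple


def bernoulli_shape(ones: int, rows: int) -> Tuple[float, float, float, float]:
    """Return (mean, std, skew, excess kurtosis) for a 0/1 column with `ones` ones."""
    p = ones / rows
    var = p * (1 - p)
    skew = (1 - 2 * p) / math.sqrt(var)
    excess_kurtosis = (1 - 6 * var) / var
    return p, math.sqrt(var), skew, excess_kurtosis


# The counts of ones are taken from the reported n_outliers values.
for name, ones in [("has_images", 13_768), ("has_video", 1_344), ("has_link", 18_140)]:
    mean, std, skew, kurt = bernoulli_shape(ones, 101_040)
    print(f"{name}: mean={mean:.3f} std={std:.3f} skew={skew:.3f} kurtosis={kurt:.3f}")
```

Treating these columns as booleans in the profiler configuration would avoid the outlier and skew warnings altogether.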
embed_type categorical
61.2% null
| rows | 101,040 |
| null | 61,791 (61.2%) |
| unique | 5 |
| top_value | app.bsky.embed.external |
| top_rate | 0.462 |
| cardinality | 5 |
| entropy | 1.717 |
| entropy_ratio | 0.739 |
Top values (rank 1–20)
- app.bsky.embed.external — 18,140
- app.bsky.embed.images — 13,768
- app.bsky.embed.record — 5,126
- app.bsky.embed.video — 1,344
- app.bsky.embed.recordWithMedia — 871
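The embed counts line up with the indicator columns: `external` (18,140), `images` (13,768), and `video` (1,344) equal the number of ones in `has_link`, `has_images`, and `has_video` respectively, and the five counts sum to the 39,249 non-null rows. This suggests, without the report confirming it, that the flags are derived from `embed_type` alone and that `recordWithMedia` posts set none of them. A sketch of that assumed mapping:

```python
from typing import Dict, Optional

# Assumed mapping, inferred from the matching counts in the profile.
FLAG_FOR_EMBED = {
    "app.bsky.embed.external": "has_link",
    "app.bsky.embed.images": "has_images",
    "app.bsky.embed.video": "has_video",
}
FLAGS = ("has_link", "has_images", "has_video")


def flags_from_embed(embed_type: Optional[str]) -> Dict[str, int]:
    """Derive the three 0/1 flags from a single embed_type value (None = no embed)."""
    active = FLAG_FOR_EMBED.get(embed_type)
    return {flag: int(flag == active) for flag in FLAGS}


embed_counts = {
    "app.bsky.embed.external": 18_140,
    "app.bsky.embed.images": 13_768,
    "app.bsky.embed.record": 5_126,
    "app.bsky.embed.video": 1_344,
    "app.bsky.embed.recordWithMedia": 871,
}
print(sum(embed_counts.values()), "non-null rows")
print(flags_from_embed("app.bsky.embed.images"))
```

If the mapping holds, a quote post with attached images would have `has_images = 0`, which may matter when estimating how much media the sample contains.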
hashtags text
90.0% duplicate strings
90.2% rows are a single word
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 10,103 |
| len_min | 2 |
| len_max | 1,122 |
| len_mean | 10.378 |
| len_median | 2.000 |
| len_p95 | 63.000 |
| word_mean | 1.384 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 90,937 |
| duplicate_rate | 0.900 |
| vocab_size | 7,036 |
| readability_flesch_mean | 2.752 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 0.902 |
| allcaps_rate | 5.70e-03 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- []
- []
- ["#strongertogether"]
- []
- []
- []
- []
- ["#dandysworld"]
- []
- []
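The `hashtags` cells look like JSON-encoded arrays, which is why the column profiles as highly duplicated text with a minimum and median length of 2: the string `[]`. Text statistics such as readability are not meaningful here. A minimal sketch of decoding the cells and counting tags follows, assuming every cell is valid JSON as in the samples; the function name is hypothetical.

```python
import json
from collections import Counter
from typing import List


def parse_list_cell(cell: str) -> List[str]:
    """Decode a JSON array cell; return [] for empty, malformed, or non-list values."""
    try:
        value = json.loads(cell)
    except (TypeError, ValueError):
        return []
    return value if isinstance(value, list) else []


# The ten sample values shown above.
cells = ["[]", "[]", '["#strongertogether"]', "[]", "[]",
         "[]", "[]", '["#dandysworld"]', "[]", "[]"]

tag_counts = Counter(tag.lower() for cell in cells for tag in parse_list_cell(cell))
empty_share = sum(1 for cell in cells if not parse_list_cell(cell)) / len(cells)
print(tag_counts.most_common())
print(f"share of rows without hashtags: {empty_share:.1f}")
```

If some cells turned out to use single quotes (a Python `repr` rather than JSON), `ast.literal_eval` would be one alternative decoder to try.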
mentions text
98.1% duplicate strings
95th-percentile length under 20 chars
99.6% rows are a single word
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 1,921 |
| len_min | 2 |
| len_max | 420 |
| len_mean | 2.670 |
| len_median | 2.000 |
| len_p95 | 2.000 |
| word_mean | 1.012 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 99,119 |
| duplicate_rate | 0.981 |
| vocab_size | 660 |
| readability_flesch_mean | 0.702 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 0.996 |
| allcaps_rate | 3.96e-05 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- []
- []
- []
- []
- []
- []
- []
- []
- []
- []
links text
96.3% duplicate strings
95th-percentile length under 20 chars
99.8% rows are a single word
| rows | 101,040 |
| null | 0 (0.0%) |
| unique | 3,771 |
| len_min | 2 |
| len_max | 266 |
| len_mean | 4.950 |
| len_median | 2.000 |
| len_p95 | 2.000 |
| word_mean | 1.003 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 97,269 |
| duplicate_rate | 0.963 |
| vocab_size | 904 |
| readability_flesch_mean | -20.108 |
| emoji_rate | 0.000 |
| url_rate | 0.048 |
| one_word_rate | 0.998 |
| allcaps_rate | 0.000 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- ["https://www.20min.ch/fr/story/buelach-zh-un-client-arrete-apres-avoir-poignarde-son-livreur-de-repas-103470448"]
- []
- []
- []
- []
- ["https://trecome.info/articles/89cfe941-6cf1-45ae-8626-cc0241375b46"]
- []
- []
- []
- []
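As with `hashtags`, the `links` cells appear to be JSON arrays, here holding full URLs. Aggregating by host is usually more informative than the string-level statistics above. A minimal sketch using only the standard library and the two non-empty sample values follows; the function name is hypothetical, and the sketch assumes the cells are valid JSON.

```python
import json
from collections import Counter
from typing import List
from urllib.parse import urlsplit


def link_domains(cell: str) -> List[str]:
    """Return the lower-cased host of every URL in a JSON array cell."""
    try:
        urls = json.loads(cell)
    except (TypeError, ValueError):
        return []
    if not isinstance(urls, list):
        return []
    hosts = [urlsplit(u).netloc.lower() for u in urls if isinstance(u, str)]
    return [h for h in hosts if h]


cells = [
    '["https://www.20min.ch/fr/story/buelach-zh-un-client-arrete-apres-avoir-poignarde-son-livreur-de-repas-103470448"]',
    "[]",
    '["https://trecome.info/articles/89cfe941-6cf1-45ae-8626-cc0241375b46"]',
]
domain_counts = Counter(host for cell in cells for host in link_domains(cell))
print(domain_counts.most_common())
```

Stripping a leading `www.` or grouping by registrable domain would need additional rules and is left out of this sketch.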