saturn

/home/coolhand/datasets/bsky-firehose-anonymized-dec-2025/bluesky_posts.csv | 101,040 rows | sample n=101,040 | seed 42 | 2026-04-22T05:56:37+00:00

Overview

Source: /home/coolhand/datasets/bsky-firehose-anonymized-dec-2025/bluesky_posts.csv
Total rows: 101,040
Profiled sample: 101,040
Columns: 19
Generated: 2026-04-22T05:56:37+00:00

Numeric correlation

Languages detected

Per-string language detection across text columns (sampled).

text text

31 languages detected in sample; 16.9% rows are all-caps
rows: 101,040
null: 0 (0.0%)
unique: 95,935
len_min: 1
len_max: 525
len_mean: 97.627
len_median: 68.000
len_p95: 290.000
word_mean: 14.235
word_median: 10.000
n_empty: 0
n_duplicates: 5,105
duplicate_rate: 0.051
vocab_size: 77,183
readability_flesch_mean: 64.091
emoji_rate: 0.183
url_rate: 0.076
one_word_rate: 0.190
allcaps_rate: 0.169
boilerplate_rate: 1.05e-03
Sample values (first 10)
  1. Un client arrêté après avoir poignardé son livreur de repas "Un livreur de repas a été grièvement blessé lors d’une tentative de meurtre dans la nuit de dimanche à lundi, à Bülach. Le..." https://www.20min.ch/fr/story/buelach-zh-un-client-arrete-apres-avoir-poignarde-son-livreu…
  2. Fındığım bu giboyla ilgili itiraf etmek istediğin bir şey varsa tam vakti 😅 Dm kutuma gel anlat,söz bende kalacak anlattıkların 😅😅
  3. -#strongertogether
  4. Tu b’Shevat is approaching rapidly. Nit to put any pressure on you, but…
  5. Someone mentioned it in my timeline, and so I just rewatched "The Blue Carbuncle", from The Adventures of Sherlock Holmes (1984) with Jeremy Brett. It's a great Christmas story and Jeremy Brett is without question the best Holmes ever. I found it on Britbox
  6. https://trecome.info/articles/89cfe941-6cf1-45ae-8626-cc0241375b46 【新着記事】 宇宙ステーションは「組み立てる」時代から「一発で広げる」時代へ?
  7. Happy Christmas Eve, sweet Flanoy! 🎄🧡🐈
  8. THESE FUCKERS SERIOUSLY COULDN’T WAIT ONE DAY!? /vneg #dandysworld
  9. 어딜 가지.....
  10. Made in hckr.fr 🏴‍☠️🖤 Le genre de petit message qui me fait chaud au cœur.

author_did_hash text

56.5% duplicate strings; 95th-percentile length under 20 chars; 100.0% rows are a single word
rows: 101,040
null: 0 (0.0%)
unique: 43,998
len_min: 16
len_max: 16
len_mean: 16.000
len_median: 16.000
len_p95: 16.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 57,042
duplicate_rate: 0.565
vocab_size: 13,938
readability_flesch_mean: 68.345
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 3.46e-04
boilerplate_rate: 0.000
Sample values (first 10)
  1. 203b2f94ca34ad57
  2. e3fb7462b68ce168
  3. 8b80d746cd58f608
  4. 74e2cbc89edd37a6
  5. ed4f29630f55ae1d
  6. 039de54e8bef8899
  7. 61ee7267320497ee
  8. b464fb16192641fa
  9. 7422e82a369d2ace
  10. 8dc83ec255dde07a

uri_hash text

100.0% of rows are unique strings; 95th-percentile length under 20 chars; 100.0% rows are a single word
rows: 101,040
null: 1 (0.0%)
unique: 101,039
len_min: 16
len_max: 16
len_mean: 16.000
len_median: 16.000
len_p95: 16.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 20,000
readability_flesch_mean: 69.614
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 5.25e-04
boilerplate_rate: 0.000
Sample values (first 10)
  1. 1a925eae4a68e954
  2. 9c7b35c448e9f56a
  3. 00000da0008897eb
  4. cd823fdacd02b11c
  5. 0a525ed50b0474f2
  6. 175ab73228973fa3
  7. 60f3a1a69409b7ae
  8. e9e0e481dbe7f266
  9. d49a9bf37ba42904
  10. ef51be69a5ee76f8

reply_parent_hash text

95th-percentile length under 20 chars; 57.7% null; 100.0% rows are a single word
rows: 101,040
null: 58,270 (57.7%)
unique: 34,738
len_min: 16
len_max: 16
len_mean: 16.000
len_median: 16.000
len_p95: 16.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,032
duplicate_rate: 0.188
vocab_size: 17,415
readability_flesch_mean: 71.729
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 7.95e-04
boilerplate_rate: 0.000
Sample values (first 10)
  1. fc2267f29dd1a492
  2. 6b56ce9644d8dcfc
  3. 701912916dd3aecb
  4. f16b66c1507d3da9
  5. 63ea68b3eabeb6c5
  6. 2e341c64d79713f6
  7. bfbbd6900834f900
  8. dd990c5f31cc4ea6
  9. 3b2f41bfb941204a
  10. a5ba750d7bf30263
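
The duplicate figures for this column only add up if nulls are excluded from the denominator: 8,032 / 101,040 would be about 0.079, not the reported 0.188. The sketch below assumes (this is inferred from the numbers, not stated by the report) that n_duplicates is the non-null row count minus the unique count, and that duplicate_rate divides by the non-null row count. The function name is hypothetical.

```python
def duplicate_stats(rows, nulls, unique):
    """Return (n_duplicates, duplicate_rate), computed over non-null rows only.

    Assumed definitions, inferred from the reported figures:
      n_duplicates   = non-null rows - unique values
      duplicate_rate = n_duplicates / non-null rows
    """
    non_null = rows - nulls
    n_duplicates = non_null - unique
    return n_duplicates, n_duplicates / non_null


# Figures copied from the reply_parent_hash block.
n_dup, rate = duplicate_stats(rows=101_040, nulls=58_270, unique=34_738)
print(n_dup, round(rate, 3))
```

Under the same assumption, the text column (no nulls, 95,935 unique values) gives 5,105 duplicates and a rate of about 0.051, in line with its block.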

reply_root_hash text

50.3% duplicate strings; 95th-percentile length under 20 chars; 57.7% null; 100.0% rows are a single word
rows: 101,040
null: 58,270 (57.7%)
unique: 21,277
len_min: 16
len_max: 16
len_mean: 16.000
len_median: 16.000
len_p95: 16.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 21,493
duplicate_rate: 0.503
vocab_size: 12,498
readability_flesch_mean: 77.228
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 8.18e-04
boilerplate_rate: 0.000
Sample values (first 10)
  1. fc2267f29dd1a492
  2. 6b56ce9644d8dcfc
  3. 701912916dd3aecb
  4. f16b66c1507d3da9
  5. 63ea68b3eabeb6c5
  6. 2e341c64d79713f6
  7. dc0cf00aab42248a
  8. 65f573012a42f37a
  9. 152ff36a17b9ab54
  10. 2da15f2e55e9a171

sentiment categorical

rows: 101,040
null: 0 (0.0%)
unique: 3
top_value: neutral
top_rate: 0.485
cardinality: 3
entropy: 1.473
entropy_ratio: 0.930
Top values (rank 1–20)
  1. neutral — 48,981
  2. positive — 34,622
  3. negative — 17,437
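
The entropy and entropy_ratio values can be checked against the three counts above. The sketch assumes the profiler reports Shannon entropy in bits and normalizes it by log2 of the cardinality; the report does not state the formula, but the numbers agree with that reading. The helper name is hypothetical.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Counts copied from the sentiment "Top values" list.
sentiment_counts = [48_981, 34_622, 17_437]

h = entropy_bits(sentiment_counts)
h_max = math.log2(len(sentiment_counts))  # entropy of a uniform three-way split
ratio = h / h_max

print(f"entropy={h:.3f} ratio={ratio:.3f}")
```

A ratio near 1 means the classes are close to evenly spread. Here the neutral class holds just under half the rows, which keeps the ratio slightly below 1.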

sentiment_score numeric

5.7% rows beyond 1.5 IQR
rows: 101,040
null: 0 (0.0%)
unique: 1,928
min: -0.998
max: 1.000
mean: 0.107
median: 0.000
std: 0.410
q1: 0.000
q3: 0.402
iqr: 0.402
skew: 0.019
kurtosis: 0.018
n_outliers: 5,763
outlier_rate: 0.057
zero_rate: 0.478
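
The "beyond 1.5 IQR" flag is read here as the usual Tukey fence rule: a value is an outlier if it lies below q1 - 1.5 * iqr or above q3 + 1.5 * iqr. This is an assumption about the profiler, and the sketch uses the rounded quartiles printed above (q1 = 0.000, q3 = 0.402), so the fences are approximate. The function name is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Rounded quartiles copied from the sentiment_score block.
low, high = tukey_fences(q1=0.000, q3=0.402)

# Reported extremes of the column.
col_min, col_max = -0.998, 1.000

print(f"fences: [{low:.3f}, {high:.3f}]")
print("values can fall below the lower fence:", col_min < low)
print("values can exceed the upper fence:", col_max > high)
```

Under that assumption the upper fence (about 1.005) sits above the column maximum of 1.000, so the 5,763 flagged rows would all be strongly negative scores, below roughly -0.603.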

created_at text

95.6% of rows are unique strings; 100.0% rows are a single word; 100.0% rows are all-caps
rows: 101,040
null: 0 (0.0%)
unique: 96,576
len_min: 20
len_max: 35
len_mean: 24.345
len_median: 24.000
len_p95: 27.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 4,464
duplicate_rate: 0.044
vocab_size: 19,720
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2025-12-15T15:02:56.000000Z
  2. 2025-12-24T05:46:28.199Z
  3. 2025-12-24T05:53:52.540Z
  4. 2025-12-24T05:51:06.770Z
  5. 2025-12-24T05:24:05.186Z
  6. 2025-12-24T06:00:49.556+00:00
  7. 2025-12-24T05:25:08.535Z
  8. 2025-12-24T05:56:08.507Z
  9. 2025-12-24T05:51:12.695Z
  10. 2025-12-24T05:00:11.869Z
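
The samples above mix at least three ISO 8601 spellings (microseconds with Z, milliseconds with Z, and an explicit +00:00 offset), which is why the column was profiled as text. A minimal sketch of normalizing them to timezone-aware UTC datetimes with the standard library follows. The helper name is hypothetical, and only the formats visible in the samples are covered: values near the reported len_max of 35 are not shown and may need extra handling.

```python
from datetime import datetime, timezone


def parse_created_at(value):
    """Parse one created_at string into a timezone-aware UTC datetime."""
    # Python versions before 3.11 reject a trailing "Z", so spell it as an offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)


# Three of the sample values listed for this column.
samples = [
    "2025-12-15T15:02:56.000000Z",
    "2025-12-24T05:46:28.199Z",
    "2025-12-24T06:00:49.556+00:00",
]
parsed = [parse_created_at(s) for s in samples]

for original, dt in zip(samples, parsed):
    print(original, "->", dt.isoformat())
```

Once parsed, the values can be sorted and compared as datetimes rather than as strings.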

timestamp text

100.0% of rows are unique strings; 100.0% rows are a single word; 100.0% rows are all-caps
rows: 101,040
null: 0 (0.0%)
unique: 101,040
len_min: 26
len_max: 26
len_mean: 26.000
len_median: 26.000
len_p95: 26.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 20,000
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2025-12-23T23:35:20.812256
  2. 2025-12-23T23:46:29.113253
  3. 2025-12-23T23:53:52.818216
  4. 2025-12-23T23:51:06.721420
  5. 2025-12-23T23:24:08.130284
  6. 2025-12-24T00:00:49.619695
  7. 2025-12-23T23:25:09.314207
  8. 2025-12-23T23:56:13.728686
  9. 2025-12-23T23:51:13.117978
  10. 2025-12-23T23:00:12.134916

language categorical

rows: 101,040
null: 0 (0.0%)
unique: 90
top_value: en
top_rate: 0.608
cardinality: 90
entropy: 2.178
entropy_ratio: 0.336
Top values (rank 1–20)
  1. en — 61,468
  2. ja — 12,607
  3. unknown — 11,481
  4. en-US — 3,617
  5. ko — 2,406
  6. de — 1,821
  7. pt — 1,295
  8. es — 1,153
  9. fr — 746
  10. th — 612
  11. tr — 548
  12. nl — 525
  13. zh — 315
  14. it — 276
  15. ru — 213
  16. fi — 193
  17. ja-JP — 170
  18. id — 158
  19. pl — 139
  20. el — 116
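
The list above contains both bare language codes and region-qualified tags (en and en-US, ja and ja-JP), so part of the cardinality of 90 is likely the same language under different spellings. One possible cleanup, sketched below, folds each tag to its primary subtag before counting. The helper name is hypothetical, and the fold is lossy by design: it would also merge script variants such as zh-Hans and zh-Hant if those occur in the data.

```python
from collections import Counter


def primary_subtag(tag):
    """Reduce a tag such as 'en-US' to its primary language subtag ('en')."""
    return tag.split("-", 1)[0].lower()


# A subset of the counts listed above (ranks 1, 2, 3, 4, and 17).
top_counts = {
    "en": 61_468,
    "ja": 12_607,
    "unknown": 11_481,
    "en-US": 3_617,
    "ja-JP": 170,
}

merged = Counter()
for tag, n in top_counts.items():
    merged[primary_subtag(tag)] += n

for lang, n in merged.most_common():
    print(lang, n)
```

On these five entries, en rises from 61,468 to 65,085 and ja from 12,607 to 12,777. The "unknown" bucket is left untouched.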

char_count numeric

rows: 101,040
null: 0 (0.0%)
unique: 341
min: 1.000
max: 525.000
mean: 97.627
median: 68.000
std: 86.052
q1: 30.000
q3: 143.000
iqr: 113.000
skew: 1.018
kurtosis: -0.057
n_outliers: 289
outlier_rate: 2.86e-03
zero_rate: 0.000

word_count numeric

rows: 101,040
null: 0 (0.0%)
unique: 79
min: 0.000
max: 83.000
mean: 14.675
median: 10.000
std: 14.223
q1: 3.000
q3: 22.000
iqr: 19.000
skew: 1.209
kurtosis: 0.699
n_outliers: 2,882
outlier_rate: 0.029
zero_rate: 6.04e-04

has_images numeric

skew=+2.12; 13.6% rows beyond 1.5 IQR
rows: 101,040
null: 0 (0.0%)
unique: 2
min: 0.000
max: 1.000
mean: 0.136
median: 0.000
std: 0.343
q1: 0.000
q3: 0.000
iqr: 0.000
skew: 2.120
kurtosis: 2.497
n_outliers: 13,768
outlier_rate: 0.136
zero_rate: 0.864
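
This column holds only 0 and 1, so the skew, kurtosis, and outlier warnings follow directly from the share of ones rather than from anything unusual in the data. With q1 = q3 = 0 the IQR is 0, so every row equal to 1 counts as an outlier, and the outlier rate equals the mean. The sketch below uses the textbook Bernoulli formulas for skewness and excess kurtosis. Whether the profiler uses population or sample-corrected estimators is not stated, so small differences in the third decimal are possible.

```python
import math


def bernoulli_shape(p):
    """Skewness and excess kurtosis of a 0/1 variable whose mean is p."""
    var = p * (1 - p)
    skew = (1 - 2 * p) / math.sqrt(var)
    excess_kurtosis = (1 - 6 * var) / var
    return skew, excess_kurtosis


# Share of rows with has_images == 1, from n_outliers and rows above.
p_images = 13_768 / 101_040

skew, kurt = bernoulli_shape(p_images)
print(f"p={p_images:.3f} skew={skew:.2f} excess_kurtosis={kurt:.2f}")
```

The same formulas applied to has_video (1,344 ones) give a skew of about 8.5, in line with that column's block. For flag columns like these, the mean is the informative statistic and the shape warnings can be ignored.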

has_video numeric

skew=+8.50
rows: 101,040
null: 0 (0.0%)
unique: 2
min: 0.000
max: 1.000
mean: 0.013
median: 0.000
std: 0.115
q1: 0.000
q3: 0.000
iqr: 0.000
skew: 8.497
kurtosis: 70.192
n_outliers: 1,344
outlier_rate: 0.013
zero_rate: 0.987

has_link numeric

18.0% rows beyond 1.5 IQR
rows: 101,040
null: 0 (0.0%)
unique: 2
min: 0.000
max: 1.000
mean: 0.180
median: 0.000
std: 0.384
q1: 0.000
q3: 0.000
iqr: 0.000
skew: 1.670
kurtosis: 0.789
n_outliers: 18,140
outlier_rate: 0.180
zero_rate: 0.820

embed_type categorical

61.2% null
rows: 101,040
null: 61,791 (61.2%)
unique: 5
top_value: app.bsky.embed.external
top_rate: 0.462
cardinality: 5
entropy: 1.717
entropy_ratio: 0.739
Top values (rank 1–20)
  1. app.bsky.embed.external — 18,140
  2. app.bsky.embed.images — 13,768
  3. app.bsky.embed.record — 5,126
  4. app.bsky.embed.video — 1,344
  5. app.bsky.embed.recordWithMedia — 871

hashtags text

90.0% duplicate strings; 90.2% rows are a single word
rows: 101,040
null: 0 (0.0%)
unique: 10,103
len_min: 2
len_max: 1,122
len_mean: 10.378
len_median: 2.000
len_p95: 63.000
word_mean: 1.384
word_median: 1.000
n_empty: 0
n_duplicates: 90,937
duplicate_rate: 0.900
vocab_size: 7,036
readability_flesch_mean: 2.752
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.902
allcaps_rate: 5.70e-03
boilerplate_rate: 0.000
Sample values (first 10)
  1. []
  2. []
  3. ["#strongertogether"]
  4. []
  5. []
  6. []
  7. []
  8. ["#dandysworld"]
  9. []
  10. []
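
The sample values suggest this column stores a serialized list per row, with "[]" standing for "no hashtags". That would explain why the column has no nulls and no empty strings, a minimum and median length of 2, and a high duplicate rate. A minimal sketch of decoding the values follows, assuming they are valid JSON as the samples appear to be. If the file instead holds Python-style lists with single quotes, json.loads would fail and a different parser would be needed.

```python
import json
from collections import Counter

# Three distinct values from the samples listed above.
samples = ["[]", '["#strongertogether"]', '["#dandysworld"]']

# Decode each cell into a real list of tag strings.
parsed = [json.loads(s) for s in samples]

# Count individual tags across rows and the share of rows with no tags.
tag_counts = Counter(tag for tags in parsed for tag in tags)
empty_share = sum(1 for tags in parsed if not tags) / len(parsed)

print(tag_counts)
print(f"rows without hashtags: {empty_share:.2f}")
```

The mentions and links columns show the same "[]" pattern in their samples, so the same decoding step likely applies to them.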

mentions text

98.1% duplicate strings; 95th-percentile length under 20 chars; 99.6% rows are a single word
rows: 101,040
null: 0 (0.0%)
unique: 1,921
len_min: 2
len_max: 420
len_mean: 2.670
len_median: 2.000
len_p95: 2.000
word_mean: 1.012
word_median: 1.000
n_empty: 0
n_duplicates: 99,119
duplicate_rate: 0.981
vocab_size: 660
readability_flesch_mean: 0.702
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.996
allcaps_rate: 3.96e-05
boilerplate_rate: 0.000
Sample values (first 10)
  1. []
  2. []
  3. []
  4. []
  5. []
  6. []
  7. []
  8. []
  9. []
  10. []

links text

96.3% duplicate strings; 95th-percentile length under 20 chars; 99.8% rows are a single word
rows: 101,040
null: 0 (0.0%)
unique: 3,771
len_min: 2
len_max: 266
len_mean: 4.950
len_median: 2.000
len_p95: 2.000
word_mean: 1.003
word_median: 1.000
n_empty: 0
n_duplicates: 97,269
duplicate_rate: 0.963
vocab_size: 904
readability_flesch_mean: -20.108
emoji_rate: 0.000
url_rate: 0.048
one_word_rate: 0.998
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. ["https://www.20min.ch/fr/story/buelach-zh-un-client-arrete-apres-avoir-poignarde-son-livreur-de-repas-103470448"]
  2. []
  3. []
  4. []
  5. []
  6. ["https://trecome.info/articles/89cfe941-6cf1-45ae-8626-cc0241375b46"]
  7. []
  8. []
  9. []
  10. []