Summary confidence: high
This is a 404,841-row Bluesky image-post dataset (lukeslp/bluesky-alt-text) capturing posts with attached images, their alt text, author identifiers, and raw AT-Protocol records across 21 columns. Two things stand out for follow-up. First, alt-text length is heavily right-skewed (mean 227 characters, max 65,192, with ~12% of values flagged as outliers), suggesting that a small number of very long descriptions are pulling the mean upward. Second, authorship is highly concentrated: one DID accounts for 32,558 posts and the top handle, 'firefaerie81.bsky.social', contributes 6,828, so it is worth checking for bot or scraper bias. Content is predominantly English (~72% of langs_json) but spans 214 language tags, and images are mostly JPEG (93.5%), with PNG a distant second. Note also that ~31% of image URLs and author handles are null, which likely reflects the split between the 'author_feed' (69%) and 'jetstream' (31%) source modes.
citing: row_count · column_count · columns.image_alt_length.stats · columns.author_did.top_values · columns.author_handle.top_values · columns.author_handle.null_rate · columns.image_mime_type.top_values · columns.langs_json.top_values · columns.source_mode.top_values · columns.image_count_in_post.stats · columns.text.language_counts
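The two follow-up checks named in the summary (length skew and author concentration) could be sketched as below. This is a minimal illustration using only the Python standard library, not the method that produced the figures above: the helper names `length_skew_summary` and `top_author_share` are hypothetical, the outlier rule (Tukey's 1.5 × IQR upper fence) is an assumption since the summary does not say how its ~12% figure was derived, and the sample values are made up rather than drawn from the dataset.

```python
from collections import Counter
from statistics import mean, median, quantiles


def length_skew_summary(lengths):
    """Summarise alt-text lengths and the share above an upper outlier fence."""
    q1, _, q3 = quantiles(lengths, n=4)  # quartile cut points
    # Assumption: Tukey's rule; the profile's own outlier definition is unknown.
    upper_fence = q3 + 1.5 * (q3 - q1)
    outliers = [x for x in lengths if x > upper_fence]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
        "outlier_share": len(outliers) / len(lengths),
    }


def top_author_share(author_dids, top_n=1):
    """Fraction of non-null posts contributed by the top_n most frequent DIDs."""
    counts = Counter(d for d in author_dids if d is not None)
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(top_n))
    return top / total if total else 0.0


# Made-up sample standing in for the image_alt_length and author_did columns.
sample_lengths = [10, 12, 15, 18, 20, 22, 25, 30, 35, 5000]
sample_dids = ["did:plc:a"] * 6 + ["did:plc:b"] * 3 + ["did:plc:c"]

skew = length_skew_summary(sample_lengths)
share = top_author_share(sample_dids)
print(skew)
print(f"top author share: {share:.0%}")
```

A mean far above the median, as in the sample, is the signature of a few very long descriptions dominating the average. Applied to the real columns, a large `top_author_share` would be a reason to inspect the top DIDs before treating the data as representative of typical Bluesky users.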