A collection of image describers, from a simple drop-and-go page to multi-pass analyzers, a safety scanner, and an art critic. Pick whichever fits what you're doing. Every tool below is live.
Drop or paste an image, get alt text. Start here.
Visual Describer
The flagship. Splits your image into quadrants and describes each region for more thorough results. Works in light and dark mode.
The original single-page generator — click, drop, or paste an image and get alt text back. Fastest option if you just need a description.
A preview of the next major version. Functionally complete, still being polished.
Runs entirely in your browser and saves nothing to a server. Bring your own API key; nothing leaves your machine except the image going to the model.
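For the curious, the mechanics behind that privacy claim are simple. Here is a minimal sketch of the pattern, not any tool's actual code: it assumes an OpenAI-compatible vision endpoint, and the model name and prompt are placeholders.

```ts
// Browser-side sketch: the key and image stay in the page; the only
// network request is the one carrying the image to the model provider.

async function fileToDataUrl(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file); // yields "data:image/png;base64,..."
  });
}

async function describeImage(file: File, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // User-supplied key, sent to the model provider and nowhere else.
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder: any vision-capable model
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Write concise alt text for this image." },
          { type: "image_url", image_url: { url: await fileToDataUrl(file) } },
        ],
      }],
    }),
  });
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // the generated alt text
}
```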
Specialised tools for when plain alt text isn't enough — deeper analysis, longer critiques, safety scans, and model comparisons.
Critiques an image as art — composition, mood, technique — then gives a subjective Hot Take. Great for artwork and photography; overkill for screenshots.
Scans an image for unsafe content and flags what it finds by category. Meant as a moderation aid, not a gatekeeper.
Runs the image through three passes: an overview, a detailed scan, and a final synthesis that merges them into one description.
The three-pass describer with all passes running at once. Faster, but it uses more API quota.
A three-stage pipeline where you can pick a different model for each stage: overview, detailed analysis, and synthesis. Good for mixing a cheap model with a precise one. The pattern is sketched after this section.
Runs the same image through several models side by side so you can compare their outputs and pick the one you like.
Uses your own LM Studio server as the vision backend. Keep everything on your machine, use whichever model you've loaded locally.
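To show how the multi-pass tools fit together, here is a minimal sketch rather than the tools' actual code. The ask() helper, endpoint, model names, and prompts are all illustrative; it assumes the same OpenAI-compatible request shape as above.

```ts
const ENDPOINT = "https://api.openai.com/v1/chat/completions"; // illustrative
const KEY = "sk-..."; // user-supplied, as above

// Hypothetical helper: one vision request in, one text reply out.
async function ask(model: string, prompt: string, imageUrl: string): Promise<string> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` },
    body: JSON.stringify({
      model,
      messages: [{
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      }],
    }),
  });
  return (await res.json()).choices[0].message.content;
}

// One model per stage; pass the same model three times for the plain
// three-pass tool, or mix cheap and precise models per stage.
interface StageModels { overview: string; detail: string; synthesis: string }

// Sequential: each pass can build on the one before it.
async function threePass(image: string, m: StageModels): Promise<string> {
  const overview = await ask(m.overview, "Give a one-sentence overview of this image.", image);
  const detail = await ask(m.detail, `Overview so far: ${overview}\nNow describe each region in detail.`, image);
  return ask(m.synthesis, `Merge these notes into one description:\n${overview}\n${detail}`, image);
}

// Concurrent: overview and detail run at once. Finishes sooner, but
// always spends the quota for both calls.
async function threePassConcurrent(image: string, m: StageModels): Promise<string> {
  const [overview, detail] = await Promise.all([
    ask(m.overview, "Give a one-sentence overview of this image.", image),
    ask(m.detail, "Describe each region of this image in detail.", image),
  ]);
  return ask(m.synthesis, `Merge these notes into one description:\n${overview}\n${detail}`, image);
}
```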
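The LM Studio-backed page is the same request pointed at your own machine, since LM Studio's local server speaks an OpenAI-compatible protocol. A sketch assuming the default local address; the model field names whatever you have loaded:

```ts
// Same request shape as above, aimed at LM Studio's local server.
// Default port 1234 assumed; no real API key is needed.
const LOCAL = "http://localhost:1234/v1/chat/completions";

async function describeLocally(imageUrl: string, modelId: string): Promise<string> {
  const res = await fetch(LOCAL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelId, // whichever vision model is loaded locally
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Write concise alt text for this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      }],
    }),
  });
  return (await res.json()).choices[0].message.content;
}
```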
The same simple describer, each page backed by a single vision model. Use whichever you have an API key for, or try a few and compare how they describe the same image.
Grok's vision model. Fast, conversational, decent on photographs.
Claude's vision capability. Tends toward careful, thorough descriptions.
OpenAI's vision model. A good general-purpose baseline, often the fallback if another model struggles.
Mistral's Pixtral vision model. Experimental — sometimes excellent, occasionally patchy.
A variant of the Pixtral describer that keeps a history of images and their generated alt text.
A smaller Pixtral sandbox for quick experiments. Same model, stripped-down UI.
Uses Hugging Face Inference for a smaller, open vision model. Cheaper and slower than the commercial ones above.
A describer routed through a Coze bot tuned for accessible, ethical image descriptions. Slower, more careful output.
Google's Gemma 3 vision variant. Currently broken — kept here so the work isn't lost.
Development copies kept live so they can be tested in the open. Expect rough edges.
Bleeding-edge copy of the main Visual Describer. Same tool, newer code.
Dev copy of the tuned single-page generator.
A rebuild of the alt generator with the UI split into smaller reusable pieces. For testing, not production use.
The "be right back" page that used to live at /alt/. Kept for continuity.
If you'd rather not use a web tool, these run on your own computer. Linked to their downloads or GitHub repositories.
Downloadable desktop app. Runs a local vision model so your images never leave your computer.
Source code for the desktop app. Electron + Ollama. Clone and build your own.
A full image viewer with EXIF parsing, Imgur integration, and alt-text generation built in. Private repo — ask for access.
The scraper that built the dataset above. Collects and validates alt text from the Bluesky firehose.
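For a sense of what the scraper does, here is a minimal sketch of the collection side only. It assumes Bluesky's public Jetstream JSON feed rather than the raw firehose, the instance URL and filtering are illustrative, and the real scraper also validates and deduplicates what it keeps.

```ts
// Subscribe to post-creation events and keep human-written alt text.
const JETSTREAM =
  "wss://jetstream1.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post";

const ws = new WebSocket(JETSTREAM);

ws.onmessage = (ev) => {
  const msg = JSON.parse(ev.data as string);
  // Only newly created posts are interesting.
  if (msg.kind !== "commit" || msg.commit?.operation !== "create") return;
  const embed = msg.commit.record?.embed;
  // Posts with image embeds carry an alt field per image.
  if (embed?.$type !== "app.bsky.embed.images") return;
  for (const img of embed.images) {
    if (img.alt && img.alt.trim().length > 0) {
      // A candidate pair: the author's DID and their alt text.
      console.log(msg.did, JSON.stringify(img.alt));
    }
  }
};
```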