Vision and Image Guide

The shared library exposes a unified interface for computer vision (image analysis) and image generation across multiple providers.

Core Capabilities

👁️ Vision (Analysis)

Analyze images to produce descriptions, extract text (OCR), detect code, and more.

Providers: OpenAI (GPT-4o), Anthropic (Claude 3.5), xAI (Grok Vision)

🎨 Generation

Create images from text prompts.

Providers: OpenAI (DALL-E 3), xAI (Aurora - coming soon)

Usage Patterns

1. Simple Vision Analysis

from llm_providers.xai_provider import XAIProvider

provider = XAIProvider()

# Read the image as raw bytes and send it together with a text prompt
with open('chart.png', 'rb') as f:
    result = provider.analyze_image(
        image=f.read(),
        prompt="Describe the trends in this chart"
    )

# The model's text response
print(result.content)
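
Because every vision provider is described as sharing the same interface, callers can treat them interchangeably. The sketch below is one hedged illustration of that idea: it tries a list of providers in order and returns the first successful answer. The `VisionProvider` protocol and `analyze_with_fallback` helper are hypothetical names, not part of the library; the only assumptions taken from the example above are the `analyze_image(image=..., prompt=...)` call and the `.content` attribute on its result.

```python
from typing import Protocol, Sequence


class VisionProvider(Protocol):
    """Hypothetical structural type: any object with an analyze_image method."""

    def analyze_image(self, image: bytes, prompt: str): ...


def analyze_with_fallback(
    providers: Sequence[VisionProvider], image: bytes, prompt: str
) -> str:
    """Try each provider in order; return the first successful result's content."""
    errors = []
    for provider in providers:
        try:
            return provider.analyze_image(image=image, prompt=prompt).content
        except Exception as exc:
            # Broad catch on purpose: the providers' error types are not
            # documented here, so this sketch treats any exception as a failure.
            errors.append(f"{type(provider).__name__}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

In real use, the list would hold provider instances such as `XAIProvider()`, ordered by preference.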

2. Generating Images

from llm_providers.openai_provider import OpenAIProvider

provider = OpenAIProvider()
result = provider.generate_image(
    prompt="A futuristic city on Mars, vaporwave style",
    model="dall-e-3"
)
# result.image_data contains the generated image as a base64 string
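
Since the generated image comes back as a base64 string, it usually has to be decoded before it can be saved or displayed. A minimal sketch, assuming `result.image_data` is plain base64 with no `data:` URL prefix; `save_base64_image` is a hypothetical helper, not a library function.

```python
import base64
from pathlib import Path


def save_base64_image(image_data: str, path: str) -> int:
    """Decode a base64 string, write the raw bytes to path, return the byte count."""
    # validate=True rejects characters outside the base64 alphabet instead of
    # silently skipping them; drop it if the string may contain line breaks.
    raw = base64.b64decode(image_data, validate=True)
    Path(path).write_bytes(raw)
    return len(raw)
```

With the example above, this would be called as `save_base64_image(result.image_data, "mars_city.png")`.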

Real-World Examples

Alt Text Generator

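from llm_providers.xai_provider import XAIProvider

def generate_alt_text(image_path):
    """Return concise alt text for the image at image_path."""
    provider = XAIProvider()
    with open(image_path, 'rb') as f:
        return provider.analyze_image(
            image=f.read(),
            prompt="Generate concise alt text for accessibility."
        ).content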

Supported Formats

Input: JPEG, PNG, GIF, WebP
Output: PNG (base64 encoded)
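
This guide does not say how the library reacts to an unsupported input format, so checking the bytes before calling a provider can save a failed request. The sketch below identifies the four supported input formats by their standard file signatures; `detect_image_format` is a hypothetical helper and does not reflect how the library itself validates input.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'jpeg', 'png', 'gif', or 'webp' based on magic bytes, else None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A caller could read the file once, reject it if the function returns `None`, and otherwise pass the same bytes on to `analyze_image`.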

Last Updated: November 19, 2025