Back to full roadmap

topiccore

Vision / Image Input

Add images to prompts — OCR, classification, scene understanding, UI critique.

3 hours1 resources

GPT-4o, Claude 4, Gemini 2.x — all vision-capable. Practical: Higher resolution → more tokens; downscale when unnecessary.

Patterns:

"Generate code from this mockup"
"Extract 5 fields from this invoice PDF: amount, date, vendor, VAT, invoice #"
"List the UX problems in this flow"

Resources(1)

DDocs(1)

Anthropic — Vision

✓ Production-Ready

Document Understanding (PDF / Tables)

Open the full interactive roadmap