Back to full roadmap
topiccore
Vision / Image Input
Add images to prompts — OCR, classification, scene understanding, UI critique.
3 hours1 resources
GPT-4o, Claude 4, Gemini 2.x — all vision-capable. Practical: Higher resolution → more tokens; downscale when unnecessary.
Patterns:
- "Generate code from this mockup"
- "Extract 5 fields from this invoice PDF: amount, date, vendor, VAT, invoice #"
- "List the UX problems in this flow"