
Vision / Image Input

Add images to prompts — OCR, classification, scene understanding, UI critique.

3 hours · 1 resource

GPT-4o, Claude 4, and Gemini 2.x are all vision-capable. In practice, higher-resolution images consume more tokens, so downscale an image whenever the task does not need fine detail.
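As a rough illustration of the downscaling advice, the sketch below computes target dimensions that cap an image's longer side while preserving aspect ratio. The helper name `fit_within` and the 1024-pixel cap are assumptions chosen for the example, not a limit documented by any of the models named above.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side.

    Hypothetical helper: max_side is whatever budget you choose for the task.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; never upscale.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A 4000x3000 photo capped at an assumed 1024-pixel longest side.
    print(fit_within(4000, 3000, 1024))
```

The resulting pair can be handed to an image library's resize call before the image is attached to the prompt. For OCR on small print, a larger cap is likely the safer choice.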

Patterns:

  • "Generate code from this mockup"
  • "Extract 5 fields from this invoice PDF: amount, date, vendor, VAT, invoice #"
  • "List the UX problems in this flow"
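The invoice pattern above tends to work best when the prompt pins down an output format that can be checked. A minimal sketch, assuming you ask for JSON and validate the reply yourself: the key names (`vat`, `invoice_number`) and both helper functions are hypothetical, and sending the prompt plus image to a model is left out because it depends on the provider's API.

```python
import json

# Hypothetical key names for the five fields listed in the pattern.
INVOICE_FIELDS = ("amount", "date", "vendor", "vat", "invoice_number")


def build_invoice_prompt(fields=INVOICE_FIELDS) -> str:
    """Build the text part of the prompt that accompanies the invoice image."""
    keys = ", ".join(fields)
    return (
        f"Extract {len(fields)} fields from this invoice: {keys}. "
        "Reply with a single JSON object using exactly those keys; "
        "use null for any field you cannot read."
    )


def parse_invoice_reply(reply: str, fields=INVOICE_FIELDS) -> dict:
    """Parse the model's reply and check that every requested key is present."""
    data = json.loads(reply)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the requested keys, in a stable order.
    return {f: data[f] for f in fields}


if __name__ == "__main__":
    print(build_invoice_prompt())
    # Made-up reply text, used only to exercise the parser.
    sample = '{"amount": null, "date": null, "vendor": null, "vat": null, "invoice_number": null}'
    print(parse_invoice_reply(sample))
```

Rejecting replies with missing keys makes a failed extraction visible immediately, which is usually preferable to silently storing a partial record.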

Resources (1)

Vision / Image Input · Prompt Engineer Roadmap | Şükrü Yusuf Kaya