Topic, Advanced, ★ Pro
Video Understanding
Gemini native video + Twelve Labs — frame extraction + temporal QA + scene segmentation.
3 hours, 2 resources, 1 prerequisite
Gemini 2.x accepts video as a native input: a 1-hour video fits directly in context. You can then ask questions pinned to a timestamp with second-level granularity, such as "What happens at 10:32?"
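A minimal sketch of a timestamp-pinned question, assuming the `google-genai` Python SDK and an API key in the environment. The helper names (`mmss_to_seconds`, `timestamp_question`, `ask_video`), the 5-second window, and the model name `gemini-2.0-flash` are illustrative choices, not something this roadmap prescribes. The two pure helpers run offline; only `ask_video` touches the network.

```python
import time


def mmss_to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into an offset in seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def timestamp_question(stamp: str, window: int = 5) -> str:
    """Build a prompt that pins the question to a short window around `stamp`."""
    t = mmss_to_seconds(stamp)
    start, end = max(0, t - window), t + window

    def fmt(s: int) -> str:
        return f"{s // 60:02d}:{s % 60:02d}"

    return (
        f"Describe what happens in the video at {stamp} "
        f"(roughly {fmt(start)} to {fmt(end)}). "
        "Quote timestamps as MM:SS."
    )


def ask_video(path: str, stamp: str, model: str = "gemini-2.0-flash") -> str:
    """Upload a video and ask a timestamped question (needs network + API key)."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai

    client = genai.Client()  # assumption: API key is set in the environment
    video = client.files.upload(file=path)
    # Uploaded videos are processed asynchronously; poll until ready.
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = client.files.get(name=video.name)
    response = client.models.generate_content(
        model=model, contents=[video, timestamp_question(stamp)]
    )
    return response.text
```

Asking about a small window around the timestamp, rather than a single instant, is a design choice here: it gives the model some tolerance if its frame sampling does not land exactly on the requested second.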
Alternative: extract frames yourself (at 1 fps or on scene changes) and send them in batches to a vision LLM. This is more expensive but more flexible.
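The two frame-selection strategies above can be sketched in plain Python. This is an illustration under simplifying assumptions: frames are already decoded into flat sequences of grayscale pixel values (decoding with a tool such as OpenCV or ffmpeg is left out), and the function names and the threshold of 30 are hypothetical and would need tuning on real footage.

```python
from typing import List, Sequence


def fixed_rate_indices(
    total_frames: int, source_fps: float, target_fps: float = 1.0
) -> List[int]:
    """Indices of frames to keep when downsampling to roughly `target_fps`."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def scene_change_indices(
    frames: Sequence[Sequence[int]], threshold: float = 30.0
) -> List[int]:
    """Keep frame 0 plus every frame that differs sharply from the last kept one."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept


# Usage: a 3-second clip at 30 fps, sampled at 1 fps.
print(fixed_rate_indices(90, 30.0))  # [0, 30, 60]

# Usage: two dark frames followed by two bright ones (4 pixels each).
dark, dark2, bright = [10] * 4, [12] * 4, [200] * 4
print(scene_change_indices([dark, dark2, bright, bright]))  # [0, 2]
```

Fixed-rate sampling gives a predictable frame count (and therefore cost), while scene-change sampling sends fewer frames for static footage and more for fast cuts.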
Twelve Labs: dedicated video AI — semantic search, scene segmentation, key moment detection.