
Video Understanding

Gemini native video + Twelve Labs — frame extraction + temporal QA + scene segmentation.

3 hours · 2 resources · 1 prerequisite

Gemini 2.x has native video input: a 1-hour video can go directly into the context window. You can then ask questions at second-level precision, such as "What happens at 10:32?"
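
Timestamp questions are easier to generate and check programmatically if timestamps are converted to and from seconds. The following is a minimal sketch using only the standard library. The helper names (`timestamp_to_seconds`, `seconds_to_timestamp`, `build_timestamp_question`) and the prompt wording are illustrative assumptions, not part of any Gemini SDK, and sending the prompt to a model is left out.

```python
def timestamp_to_seconds(ts: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    parts = [int(p) for p in ts.split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"unsupported timestamp: {ts!r}")
    seconds = 0
    for part in parts:
        # Each step to the right is a 60x smaller unit.
        seconds = seconds * 60 + part
    return seconds


def seconds_to_timestamp(total: int) -> str:
    """Format seconds as 'MM:SS', or 'H:MM:SS' once past an hour."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def build_timestamp_question(ts: str) -> str:
    """Hypothetical prompt template for a timestamp question."""
    normalized = seconds_to_timestamp(timestamp_to_seconds(ts))
    return f"What happens at {normalized}?"


if __name__ == "__main__":
    print(timestamp_to_seconds("10:32"))
    print(build_timestamp_question("10:32"))
```

The resulting string would be passed alongside the video in whatever client call you use; the conversion helpers are also useful for validating that a model's answer refers to a time inside the video.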

Alternative: extract frames (at a fixed 1 fps or on scene changes) and send them in batches to a vision LLM. This is more expensive but more flexible.
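
The two sampling strategies can be sketched as follows. This is a simplified illustration, not a production pipeline: frames are modeled as flat sequences of grayscale pixel values, the function names and the threshold of 30 are assumptions chosen for the example, and decoding a real video (for instance with OpenCV or ffmpeg) and calling the vision LLM are left out.

```python
import math
from typing import List, Sequence


def uniform_sample_indices(total_frames: int, native_fps: float,
                           sample_fps: float = 1.0) -> List[int]:
    """Indices of the frames to keep when sampling at `sample_fps`."""
    if total_frames <= 0:
        return []
    if native_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Sampling faster than the source would only duplicate frames.
    sample_fps = min(sample_fps, native_fps)
    count = math.ceil(total_frames * sample_fps / native_fps)
    return [int(i * native_fps / sample_fps) for i in range(count)]


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def scene_change_indices(frames: Sequence[Sequence[int]],
                         threshold: float = 30.0) -> List[int]:
    """Keep a frame when it differs enough from the last kept frame."""
    kept: List[int] = []
    last = None
    for i, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(i)
            last = frame
    return kept


if __name__ == "__main__":
    # A 3-second clip at 30 fps sampled at 1 fps.
    print(uniform_sample_indices(90, 30.0))
    # Five tiny 4-pixel frames with two abrupt brightness changes.
    clip = [[0] * 4, [2] * 4, [100] * 4, [101] * 4, [0] * 4]
    print(scene_change_indices(clip))
```

Fixed-rate sampling of a 1-hour video at 1 fps yields 3,600 frames, which is where the cost comes from. Scene-change sampling usually sends fewer frames for static footage, at the risk of missing gradual changes that never cross the threshold.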

Twelve Labs: a dedicated video AI platform for semantic search, scene segmentation, and key-moment detection.

Prerequisites

Resources (2)