# Video LLM FT: LLaVA-NeXT-Video + VideoLLaMA3 + Frame Sampling Strategy

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-video-llm-finetuning
> Updated: 2026-05-14T14:42:54.603Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part VI — Vision-Language Multimodal FT
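
**TLDR:** A video LLM is the temporal extension of an image LLM: instead of one image, it encodes a sequence of sampled frames. This recipe covers LLaVA-NeXT-Video, VideoLLaMA3, and Qwen 2.5-VL with its native video input; frame sampling strategies (uniform vs. adaptive); temporal token compression; and long-video Q&A (>1 h). Fine-tuning a video LLM on an RTX 4090 is practical with short clips (10-30 s).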
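
To make the frame-sampling and compression trade-offs concrete, here is a minimal NumPy sketch. It assumes the clip is already decoded into an array of shape `(T, H, W, C)`. `uniform_sample_indices` takes the middle frame of equal-length segments. `adaptive_sample_indices` is one simple adaptive heuristic of our own (not the sampler used by LLaVA-NeXT-Video, VideoLLaMA3, or Qwen 2.5-VL): it spends more of the frame budget where consecutive frames differ most. `temporal_pool_tokens` illustrates temporal token compression as plain average pooling over groups of consecutive frames. All three function names are hypothetical.

```python
import numpy as np


def uniform_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` frame indices spread evenly over the clip.

    The clip is split into equal segments and the middle frame of each is kept.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    edges = np.linspace(0, num_frames, num_samples + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    return [int(m) for m in mids]


def adaptive_sample_indices(frames: np.ndarray, num_samples: int,
                            uniform_mix: float = 0.2) -> list[int]:
    """Hypothetical motion-weighted sampler for frames of shape (T, H, W, C).

    Each frame is weighted by how much it differs from the previous frame,
    blended with a uniform floor (`uniform_mix`) so static stretches are not
    skipped entirely. Indices are taken at evenly spaced quantiles of the
    cumulative weight, so high-motion regions receive more samples. Duplicate
    hits are merged, so fewer than `num_samples` indices may be returned.
    """
    t = frames.shape[0]
    if t == 0 or num_samples <= 0:
        return []
    if t == 1:
        return [0]
    f = frames.astype(np.float32)
    # Mean absolute pixel change between frame i and frame i + 1, shape (T - 1,).
    change = np.abs(f[1:] - f[:-1]).mean(axis=tuple(range(1, f.ndim)))
    # Frame 0 has no predecessor, so give it the average change.
    motion = np.concatenate([[change.mean()], change])
    if motion.sum() <= 0:
        # Completely static clip: nothing to adapt to, fall back to uniform.
        return uniform_sample_indices(t, num_samples)
    weights = (1.0 - uniform_mix) * motion / motion.sum() + uniform_mix / t
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0  # guard against float round-off at the end
    targets = (np.arange(num_samples) + 0.5) / num_samples
    idx = np.searchsorted(cdf, targets, side="left")
    return sorted({int(i) for i in idx})


def temporal_pool_tokens(tokens: np.ndarray, stride: int) -> np.ndarray:
    """Average visual tokens over groups of `stride` consecutive frames.

    tokens: (T, N, D) per-frame visual tokens -> (ceil(T / stride), N, D).
    The last group may contain fewer than `stride` frames.
    """
    groups = [tokens[s:s + stride].mean(axis=0)
              for s in range(0, tokens.shape[0], stride)]
    return np.stack(groups)
```

As a worked example with a hypothetical 196 visual tokens per frame, 16 sampled frames cost 3,136 tokens; pooling with `stride=2` halves that to 1,568. A reduction of this kind is one lever for keeping short clips within the 24 GB memory of an RTX 4090. Because the motion-weighted sampler can return fewer indices than requested when a single frame absorbs several quantiles, a caller may want to top up with uniform indices.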

