LLaVA-1.5 / 1.6 / OneVision: 2-Stage Training + Projector Pretrain + Instruction Tune
LLaVA's classic 2-stage training recipe: (1) projector-only pretraining on the 558K LAION-CC-SBU image-caption pairs, (2) end-to-end instruction tuning (LLaVA-Instruct-150K + custom data). Also covered: a freeze-strategy ablation (vision frozen vs. unfrozen, LLM frozen vs. unfrozen) and fine-tuning LLaVA-1.6 Mistral 7B on an RTX 4090.
Şükrü Yusuf KAYA
32-minute read · Advanced
1. LLaVA 2-Stage Training
Stage 1: Projector Pretrain
- Frozen: vision encoder, LLM
- Trainable: projector (MLP) only
- Data: 558K LAION-CC-SBU image-caption pairs
- Format: `<image>{caption}`
- Duration: ~12 hours on 8×A100 80GB
- Goal: align image embeddings with the LLM's embedding space

Stage 2: Visual Instruction Tune
- Frozen: vision encoder (usually)
- Trainable: LLM (full or LoRA) + projector
- Data: 150K-665K visual instruction pairs (LLaVA-Instruct + custom)
- Format: `<image>\nUser: question\nAssistant: answer`
- Duration: ~10 hours on 8×A100 80GB (Vicuna 13B)
- Goal: multimodal instruction following
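The two stages differ only in which modules receive gradients and in the text template. A minimal sketch of that schedule in plain Python, assuming hypothetical module names (`vision_tower`, `projector`, `llm`) and the templates listed above; in a real PyTorch setup the same flags would be applied as `param.requires_grad = flag` on every parameter of the corresponding module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    train_vision: bool
    train_projector: bool
    train_llm: bool  # full FT or LoRA adapters; this sketch does not distinguish them
    template: str    # text side of a sample; '<image>' marks where image tokens go


# Stage 1: only the projector learns, on plain image-caption pairs.
STAGE1 = StageConfig("projector_pretrain", False, True, False, "<image>{caption}")

# Stage 2: projector + LLM learn, vision encoder usually stays frozen.
STAGE2 = StageConfig(
    "visual_instruction_tune", False, True, True,
    "<image>\nUser: {question}\nAssistant: {answer}",
)


def trainable_modules(cfg: StageConfig) -> list[str]:
    """Return the (hypothetical) module names that receive gradients in this stage."""
    flags = {
        "vision_tower": cfg.train_vision,
        "projector": cfg.train_projector,
        "llm": cfg.train_llm,
    }
    return [name for name, trainable in flags.items() if trainable]


def format_sample(cfg: StageConfig, **fields: str) -> str:
    """Fill the stage's template with caption or question/answer fields."""
    return cfg.template.format(**fields)


if __name__ == "__main__":
    print(trainable_modules(STAGE1))
    print(trainable_modules(STAGE2))
    print(format_sample(STAGE2, question="What is in the image?", answer="A dog on a beach."))
```

Keeping the freeze flags and the template in one config object makes it hard to run Stage 2 data with Stage 1 freezing by accident.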
2. Freeze Strategy Ablation (LLaVA-1.6 Mistral 7B, RTX 4090 QLoRA)
| Config | Trainable params | MMBench accuracy | Wall-clock |
|---|---|---|---|
| Frozen vision + frozen LLM + train projector | ~7M | 38.2 | 6h |
| Frozen vision + LoRA LLM + train projector | 64M | 56.8 | 8h |
| Unfrozen vision + LoRA LLM + projector | 124M | 58.4 | 10h |
| Full FT (vision + LLM + projector) | 7.5B | 60.1 | needs cloud |
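For intuition on the "Trainable params" column: a LoRA adapter of rank r on a linear layer with input size d_in and output size d_out adds r·(d_in + d_out) parameters. The sketch below assumes Mistral 7B's public shapes (hidden size 4096, MLP intermediate size 14336, 32 layers, 8 KV heads of dim 128) and adapters on all seven linear projections per layer; the rank and target modules behind the table are not stated, so treat the output as an estimate.

```python
# Assumed Mistral-7B shapes (from its public config); adjust for other LLMs.
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (grouped-query attention)

# (d_in, d_out) of each linear projection in one decoder layer.
LAYER_LINEARS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, targets: tuple[str, ...] = tuple(LAYER_LINEARS)) -> int:
    """LoRA adds A (d_in x r) and B (r x d_out) per adapted layer: r * (d_in + d_out)."""
    per_layer = sum(rank * (LAYER_LINEARS[t][0] + LAYER_LINEARS[t][1]) for t in targets)
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    for r in (8, 16, 64):
        print(f"rank {r:>2}, all linears: {lora_params(r) / 1e6:.1f}M")
    print(f"rank 16, q/v only: {lora_params(16, ('q_proj', 'v_proj')) / 1e6:.1f}M")
```

Restricting the adapters to attention projections shrinks the count sharply; the projector's own parameters come on top of whatever LoRA adds.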
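Decision: for an RTX 4090 baseline, use frozen vision + LoRA LLM + trainable projector. It is cost-effective with good quality: unfreezing the vision encoder adds only 1.6 MMBench points for 2 more hours, and full FT (60.1) does not fit on the card.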
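The decision can be made explicit by computing the marginal gain of each step down the table. A small sketch using only the table's numbers, restricted to the three rows with a local wall-clock time (the full-FT row has none); "points per extra hour" is an illustrative metric chosen here, not one the ablation itself reports.

```python
# (config, MMBench accuracy, wall-clock hours) copied from the ablation table.
ROWS = [
    ("frozen vision + frozen LLM + projector", 38.2, 6),
    ("frozen vision + LoRA LLM + projector", 56.8, 8),
    ("unfrozen vision + LoRA LLM + projector", 58.4, 10),
]


def marginal_gains(rows):
    """Accuracy points gained per extra hour when moving from each row to the next."""
    gains = []
    for (_, acc_prev, h_prev), (name, acc, h) in zip(rows, rows[1:]):
        gains.append((name, round((acc - acc_prev) / (h - h_prev), 2)))
    return gains


if __name__ == "__main__":
    for name, per_hour in marginal_gains(ROWS):
        print(f"{name}: +{per_hour} points/hour")
```

Adding LoRA to the LLM yields roughly ten times more accuracy per extra hour than unfreezing the vision encoder, which is why the middle row is the default.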
✅ Deliverables
1) Fine-tune LLaVA-1.6 Mistral 7B on a small visual instruction dataset.
2) Run the frozen vs. unfrozen vision encoder ablation.
3) Next lesson: 6.3, Llama 3.2 Vision 11B/90B.