Skip to content

Pixtral 12B + Pixtral Large: Mistral Multimodal — Resolution-Free + Apache 2.0

Pixtral 12B (Mistral Nemo 12B + 400M ViT) + Pixtral Large (124B) — Mistral's open multimodal. Apache 2.0, resolution-free, EU AI Act-compliance friendly. 7-32 image per context, 128K context. Pixtral 12B QLoRA marginal on RTX 4090 (~22 GB).

Şükrü Yusuf KAYA
22 min read
Advanced
Pixtral 12B + Pixtral Large: Mistral Multimodal — Resolution-Free + Apache 2.0

1. Pixtral Spec#

ModelTotalVisionLLM BaseContextLisans
Pixtral 12B12B + 400M ViTPixtral ViT 400MMistral Nemo 12B128KApache 2.0
Pixtral Large124BPixtral ViT 1BMistral Large 2 123B128KMistral Research
Apache 2.0 önemli: EU AI Act + commercial production için en esnek lisans.
# Pixtral 12B basic FT from transformers import LlavaForConditionalGeneration, AutoProcessor model = LlavaForConditionalGeneration.from_pretrained( "mistral-community/pixtral-12b", quantization_config=bnb_4bit, torch_dtype="bfloat16", ) # LoRA + Visual instruction tune workflow Qwen 2.5-VL ile aynı
✅ Teslim
  1. Pixtral 12B AWQ inference test (vLLM). 2) Mini visual SFT. 3) Sonraki ders: 6.6 — InternVL2.5 / Idefics3 / Phi-4-Multimodal.

Yorumlar & Soru-Cevap

(0)
Yorum yazmak için giriş yap.
Yorumlar yükleniyor...

Related Content