Pixtral 12B + Pixtral Large: Mistral Multimodal — Resolution-Free + Apache 2.0
Pixtral 12B (Mistral Nemo 12B + 400M ViT) and Pixtral Large (124B) are Mistral's open multimodal models. Pixtral 12B is Apache 2.0, resolution-free, and friendly to EU AI Act compliance. 7-32 images per context, 128K context window. Pixtral 12B QLoRA fits marginally on an RTX 4090 (~22 GB).
Şükrü Yusuf KAYA
22 min read
Advanced

## 1. Pixtral Spec
| Model | Total | Vision | LLM Base | Context | License |
|---|---|---|---|---|---|
| Pixtral 12B | 12B + 400M ViT | Pixtral ViT 400M | Mistral Nemo 12B | 128K | Apache 2.0 |
| Pixtral Large | 124B | Pixtral ViT 1B | Mistral Large 2 123B | 128K | Mistral Research |
Apache 2.0 matters: it is the most flexible license for the EU AI Act and commercial production.
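"Resolution-free" means the vision encoder tokenizes images at their native resolution instead of resizing to a fixed square. A back-of-envelope sketch of the token cost, assuming the published 16x16 patch size and 1024x1024 max resolution for Pixtral 12B (a simplification that ignores the image-break tokens inserted between patch rows):

```python
# Approximate vision-token count for Pixtral's resolution-free ViT.
# Assumes 16x16 patches (per the Pixtral 12B release); ignores
# row-separator tokens, so real counts are slightly higher.
def vision_tokens(height: int, width: int, patch: int = 16) -> int:
    return (height // patch) * (width // patch)

print(vision_tokens(1024, 1024))  # 4096 tokens at the max resolution
print(vision_tokens(512, 256))   # 512 tokens for a small image
```

The practical upshot: small images cost few tokens, so fitting 7-32 images into a 128K context is realistic as long as resolutions stay modest.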
```python
# Pixtral 12B basic FT: load the base model in 4-bit for QLoRA
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b",
    quantization_config=bnb_4bit,
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# LoRA + visual instruction tuning workflow is the same as Qwen 2.5-VL
```
✅ Deliverables
1. Pixtral 12B AWQ inference test (vLLM).
2. Mini visual SFT.
3. Next lesson: 6.6 - InternVL2.5 / Idefics3 / Phi-4-Multimodal.
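For deliverable 1, one way to structure the test is against a vLLM OpenAI-compatible server. A minimal sketch of the request payload, assuming the `mistral-community/pixtral-12b` model ID and default endpoint; the serve command flags are assumptions to verify against the vLLM docs:

```python
# Multimodal chat payload for a vLLM OpenAI-compatible endpoint.
# Start the server first (flags are assumptions, check vLLM docs):
#   vllm serve mistral-community/pixtral-12b --quantization awq
import json

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }
]
payload = {"model": "mistral-community/pixtral-12b", "messages": messages}

# POST this as JSON to http://localhost:8000/v1/chat/completions
print(json.dumps(payload, indent=2))
```

The same payload shape works with any OpenAI-compatible client, so the inference test can be swapped between local vLLM and hosted endpoints without code changes.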