Mistral 7B v0.3 + Mistral Small 3 (24B): Sliding Window Deprecation + Tool-Calling

Mistral 7B v0.3 (Apache 2.0, 32K context) and Mistral Small 3 (24B, Apache 2.0, 32K context). This lesson covers the sliding-window deprecation carried into v0.3, the function-calling chat template, and tool-token training. On an RTX 4090, one QLoRA epoch of Mistral 7B takes about 45 minutes. For Mistral Small 3 (24B), the NF4 weights take about 12 GB and QLoRA fits only marginally (~22 GB peak).

Şükrü Yusuf KAYA

32-minute read

14.05.2026

Advanced

1. A Short History of the Mistral Family

Version	Release	Key differences
Mistral 7B v0.1	2023-09	Sliding window 4K, GQA
Mistral 7B v0.2	2024-03	Sliding window removed, 32K context
Mistral 7B v0.3	2024-05	Function-calling, vocab 32000 → 32768 (tool/control tokens added)
Mistral Large 2	2024-07	123B (Mistral Research License; commercial use requires a paid license)
Mistral Small 3	2025-01	24B, Apache 2.0, low-latency

Cookbook rule: stick to the Apache 2.0 models (7B v0.3, Small 3, Codestral Mamba); the others carry commercial restrictions.

2. Function-Calling Chat Template

Mistral 7B v0.3 chat template:

<s>[INST] {system_message}\n\n{user_message} [/INST] {assistant_message}</s>
[AVAILABLE_TOOLS] [...] [/AVAILABLE_TOOLS][INST] {user_message} [/INST] [TOOL_CALLS] [...] </s>

Special tokens:

[INST] / [/INST]: instruction wrapper
[AVAILABLE_TOOLS] / [/AVAILABLE_TOOLS]: tool descriptions
[TOOL_CALLS]: the model's tool calls
[TOOL_RESULTS] / [/TOOL_RESULTS]: tool output
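To make the role of each special token concrete, here is a minimal sketch that renders a conversation into a Mistral v0.3-style prompt string in pure Python. The function name `render_mistral_v3_prompt` is hypothetical, and the layout (tool list placed right before the last user turn, tool-call ids omitted) is a simplifying assumption; in real training the tokenizer's own `apply_chat_template` output should be treated as authoritative.

```python
import json


def render_mistral_v3_prompt(messages, tools=None):
    """Render messages into a simplified Mistral v0.3-style prompt string.

    Illustrative only: whitespace and tool-call ids are simplified compared
    with the official chat template.
    """
    parts = ["<s>"]
    # The tool schemas are attached to the last user turn only.
    last_user = max(i for i, m in enumerate(messages) if m["role"] == "user")
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role == "user":
            if tools and i == last_user:
                schema = json.dumps(tools, ensure_ascii=False)
                parts.append(f"[AVAILABLE_TOOLS] {schema}[/AVAILABLE_TOOLS]")
            parts.append(f"[INST] {msg['content']}[/INST]")
        elif role == "assistant" and msg.get("tool_calls"):
            calls = [call["function"] for call in msg["tool_calls"]]
            parts.append(f"[TOOL_CALLS] {json.dumps(calls, ensure_ascii=False)}</s>")
        elif role == "tool":
            parts.append(f"[TOOL_RESULTS] {msg['content']}[/TOOL_RESULTS]")
        elif role == "assistant":
            parts.append(f" {msg['content']}</s>")
    return "".join(parts)
```

Printing the result for a short conversation is a quick way to check that every `[TOOL_CALLS]` turn is followed by a `[TOOL_RESULTS]` turn before the final assistant answer.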

Dataset format for tool-calling fine-tuning:

{
  "messages": [
    {"role": "user", "content": "Bugün Istanbul'da hava?"},
    {"role": "assistant", "tool_calls": [{
      "id": "call00001", "type": "function",
      "function": {"name": "get_weather", "arguments": "{\"city\": \"Istanbul\"}"}
    }]},
    {"role": "tool", "tool_call_id": "call00001", "content": "{\"temp\": 18, \"condition\": \"partly cloudy\"}"},
    {"role": "assistant", "content": "Istanbul'da hava 18°C, parçalı bulutlu."}
  ]
}
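Malformed records are a common source of silent failures in tool-calling fine-tuning, so it can help to validate the dataset before training. The sketch below checks the record shape shown above; `validate_record` is a hypothetical helper, and the rules it enforces (string-encoded JSON arguments, every tool result preceded by a tool call) are assumptions drawn from that example rather than an official schema.

```python
import json


def validate_record(record):
    """Return a list of problems found in one {"messages": [...]} record."""
    problems = []
    pending_calls = 0  # tool calls still waiting for a tool result
    for i, msg in enumerate(record.get("messages", [])):
        role = msg.get("role")
        if role == "assistant" and msg.get("tool_calls"):
            for call in msg["tool_calls"]:
                fn = call.get("function", {})
                if not fn.get("name"):
                    problems.append(f"message {i}: tool call without a function name")
                try:
                    args = json.loads(fn.get("arguments"))
                    if not isinstance(args, dict):
                        problems.append(f"message {i}: arguments must decode to an object")
                except (TypeError, json.JSONDecodeError):
                    problems.append(f"message {i}: arguments are not a valid JSON string")
            pending_calls += len(msg["tool_calls"])
        elif role == "tool":
            if pending_calls == 0:
                problems.append(f"message {i}: tool result without a preceding tool call")
            else:
                pending_calls -= 1
        elif role in ("system", "user", "assistant"):
            if not isinstance(msg.get("content"), str):
                problems.append(f"message {i}: missing text content")
        else:
            problems.append(f"message {i}: unknown role {role!r}")
    if pending_calls:
        problems.append("conversation ends with unanswered tool calls")
    return problems
```

An empty list means the record passed; anything else can be logged and the record dropped before `dataset.map` is called.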

python

# === Mistral 7B v0.3 Function-Calling FT Lab ===
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"

# 4-bit NF4 weights with double quantization; compute runs in bf16.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_use_double_quant=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tok = AutoTokenizer.from_pretrained(MODEL_ID)
# The tokenizer ships without a pad token, so EOS is reused for padding.
tok.pad_token = tok.eos_token

# LoRA adapters on every attention and MLP projection.
lora = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05,
                  target_modules=["q_proj","k_proj","v_proj","o_proj",
                                  "gate_proj","up_proj","down_proj"],
                  task_type="CAUSAL_LM")

# Function-calling dataset (e.g. Glaive-Function-Calling-v2 TR-translated).
# Replace the repo id with your own dataset; each row is {"messages": [...]}.
dataset = load_dataset("user/tr-function-calling", split="train")

def format_fc(example):
    # Render one conversation (plus optional tool schemas) with the chat template.
    return {"text": tok.apply_chat_template(
        example["messages"],
        tools=example.get("tools"),
        tokenize=False,
    )}

# Keep only the rendered "text" column so the trainer does not see raw messages.
dataset = dataset.map(format_fc, num_proc=8, remove_columns=dataset.column_names)

cfg = SFTConfig(
    output_dir="mistral-7b-v03-fc-tr",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True, optim="paged_adamw_8bit",
    max_seq_length=4096, packing=True,  # newer TRL releases call this max_length
    dataset_text_field="text",
    logging_steps=5, report_to="wandb",
)
# Passing peft_config lets the trainer prepare the 4-bit model and attach LoRA.
# Newer TRL releases rename tokenizer= to processing_class=.
trainer = SFTTrainer(model=model, tokenizer=tok, train_dataset=dataset,
                     args=cfg, peft_config=lora)
trainer.train()

Mistral 7B v0.3 function-calling FT
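As a sanity check on the lab configuration, the arithmetic behind the LoRA adapter size and the tokens consumed per optimizer step can be sketched as follows. The layer dimensions are those of the published Mistral 7B architecture (hidden size 4096, MLP size 14336, 32 layers, 8 KV heads of dimension 128); the helper names are hypothetical, and the token count assumes packing fills every sequence to the maximum length.

```python
# Mistral 7B architecture constants.
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads (GQA) x head dimension 128

# (input dim, output dim) of each LoRA-targeted projection in one layer.
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, shapes=PROJECTION_SHAPES, layers=LAYERS):
    """Trainable parameters: each adapter adds r * (d_in + d_out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return layers * per_layer


def tokens_per_optimizer_step(per_device_batch, grad_accum, seq_len, num_gpus=1):
    """Upper bound on tokens per optimizer step (reached when packing is full)."""
    return per_device_batch * grad_accum * seq_len * num_gpus


if __name__ == "__main__":
    print(f"LoRA params at r=32: {lora_param_count(32):,}")
    print(f"Tokens per step: {tokens_per_optimizer_step(2, 4, 4096):,}")
```

With the lab's settings (r=32, batch 2, accumulation 4, sequence length 4096) this works out to roughly 83.9M trainable parameters and 32,768 tokens per optimizer step on a single GPU.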

3. Mistral Small 3 (24B): A Marginal Fit on the RTX 4090

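At NF4, the 24B weights (W) alone take about 12 GB. Activations, LoRA, and B add roughly 8-10 GB. The total comes to about 21-22 GB, which fits on a 24 GB RTX 4090 but leaves very little headroom.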

Term	Value
W (NF4)	12.0 GB
A (seq=2048, batch=1, grad-ckpt)	5.5 GB
O + G + B	3.5 GB
Total	21 GB

Cookbook rule: for 24B QLoRA, start with batch=1 and seq=2048, then increase gradually.
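The budget above is simple addition, and a small sketch makes it easy to re-run for other model sizes. The function names are hypothetical, the activation and optimizer/gradient/buffer figures are the lesson's own estimates passed in as inputs, and the weight estimate ignores the small overhead of quantization constants, so real usage will be slightly higher.

```python
GB = 1e9  # decimal gigabytes, matching the "24B at 4 bits = 12 GB" rule of thumb


def nf4_weight_gb(n_params, bits_per_param=4.0):
    """Approximate size of 4-bit quantized weights, ignoring quantization constants."""
    return n_params * bits_per_param / 8 / GB


def qlora_budget(n_params, activations_gb, opt_grad_buf_gb, vram_gb=24.0):
    """Sum the memory terms and report the headroom left on the card."""
    weights = nf4_weight_gb(n_params)
    total = weights + activations_gb + opt_grad_buf_gb
    return {
        "weights_gb": weights,
        "total_gb": total,
        "headroom_gb": vram_gb - total,
    }


if __name__ == "__main__":
    # Mistral Small 3 (24B) with the table's estimates for seq=2048, batch=1.
    for name, value in qlora_budget(24e9, 5.5, 3.5).items():
        print(f"{name}: {value:.1f}")
```

With the table's numbers this leaves about 3 GB of headroom on a 24 GB card, which is why raising the batch size or sequence length should be done one step at a time.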

✅ Deliverables

1) Fine-tune Mistral 7B v0.3 on a function-calling dataset. 2) Measure tool-call accuracy. 3) Next lesson: 3.6, Gemma 3 1B / 4B / 12B / 27B.
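For the second deliverable, one reasonable definition of tool-call accuracy is exact match: a prediction counts as correct when it contains the same function names with the same parsed arguments as the reference, regardless of call order or JSON formatting. The sketch below implements that definition; `tool_call_accuracy` and `normalize_call` are hypothetical helpers, and stricter or looser metrics (name-only match, per-argument scoring) are equally valid choices.

```python
import json


def normalize_call(call):
    """Reduce a tool call to a comparable (name, canonical-arguments) pair."""
    fn = call.get("function", call)
    args = fn.get("arguments", {})
    if isinstance(args, str):
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            args = None  # unparseable arguments never match a valid reference
    return fn.get("name") or "", json.dumps(args, sort_keys=True)


def tool_call_accuracy(predictions, references):
    """Fraction of examples whose predicted calls exactly match the reference calls.

    Both inputs are lists (one entry per example) of lists of tool calls.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = 0
    for pred, ref in zip(predictions, references):
        if sorted(map(normalize_call, pred)) == sorted(map(normalize_call, ref)):
            hits += 1
    return hits / len(references)
```

The predicted calls would come from parsing the JSON that follows `[TOOL_CALLS]` in the model's generations on a held-out split.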
