Phi-4 + Phi-4-mini: Microsoft's Synthetic-Curriculum Models and Why They Are Fragile in Turkish

Phi-4 14B and Phi-4-mini 3.8B are Microsoft's models trained on "textbook quality" synthetic data. They are strong at math and code but weak at general Turkish (TR) conversation. Why? The synthetic data is predominantly English. This lesson covers a Phi-4 QLoRA lab on an RTX 4090 and shows where the models shine in niche domains (math reasoning, code completion).

Şükrü Yusuf KAYA

30-minute read

14.05.2026

Advanced

1. Phi-4 Architecture

Feature	Phi-4	Phi-4-mini
Layers	40	32
Hidden	5120	3072
KV heads	10	8
Vocab	100,352 (tiktoken cl100k)	200,064
Parameters (dense)	14.7B	3.82B
Native context	16K	128K
Pre-train data	9.8T tokens (synthetic-heavy)	5T tokens (synthetic + web)
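
The layer, hidden-size, and KV-head figures in the table translate directly into inference memory. The sketch below estimates the KV-cache footprint of Phi-4 at its native 16K context. It assumes 40 query heads (a value not listed in the table, so treat it as an assumption) and a bf16 cache at 2 bytes per value; `AttnShape` and `kv_cache_bytes_per_token` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnShape:
    layers: int
    hidden: int
    q_heads: int   # assumption: not listed in the table above
    kv_heads: int

    @property
    def head_dim(self) -> int:
        # Per-head width, derived from the hidden size and query-head count
        return self.hidden // self.q_heads


def kv_cache_bytes_per_token(shape: AttnShape, bytes_per_value: int = 2) -> int:
    # One K and one V tensor per layer, each kv_heads * head_dim values wide
    return 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_value


# Layers, hidden size, and KV heads taken from the Phi-4 column of the table
PHI4 = AttnShape(layers=40, hidden=5120, q_heads=40, kv_heads=10)

per_token = kv_cache_bytes_per_token(PHI4)
full_ctx_gib = per_token * 16 * 1024 / 2**30  # 16K native context

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at 16K context: {full_ctx_gib:.3f} GiB")
```

Under these assumptions, grouped-query attention with 10 KV heads keeps the cache at a quarter of what 40 full KV heads would need, which is part of why a 14B model remains workable on a single 24 GB card.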

Synthetic curriculum:

Phi-1/2/3: web data filtered for "textbook quality" + LLM-generated synthetic textbooks
Phi-4: 2.5T synthetic + 5T filtered web + 1.5T code tokens

Result: general English quality is very high, but:

Multilingual performance is weak (the synthetic data is mostly English)
TR-MMLU: 26-29 (vs. Llama 8B at 32.4 and Qwen 7B at 38.1)
On math and code it comfortably beats Llama 8B

2. Where Phi-4 Shines

Benchmark	Phi-4 14B	Llama 3.1 8B	Qwen 2.5 7B
MMLU (EN)	84.8	73.0	74.2
GSM8K (math)	94.3	84.5	85.4
MATH-500	80.4	51.9	49.8
HumanEval (code)	82.6	72.6	80.5
TR-MMLU	27.4	32.4	38.1
MT-Bench-TR	4.1	6.4	7.3

Decision: for English math/code use cases, take Phi-4 as the baseline. For Turkish use, choose Llama, Qwen, or Gemma.
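
To make the decision rule concrete, the sketch below loads the scores from the table above into a dictionary and computes Phi-4's margin over the stronger of the two smaller models on each benchmark. The helper names `best_model` and `phi4_margin` are hypothetical; the numbers are copied from the table, not re-measured.

```python
# Scores copied from the benchmark table above
SCORES = {
    "MMLU (EN)":        {"Phi-4 14B": 84.8, "Llama 3.1 8B": 73.0, "Qwen 2.5 7B": 74.2},
    "GSM8K (math)":     {"Phi-4 14B": 94.3, "Llama 3.1 8B": 84.5, "Qwen 2.5 7B": 85.4},
    "MATH-500":         {"Phi-4 14B": 80.4, "Llama 3.1 8B": 51.9, "Qwen 2.5 7B": 49.8},
    "HumanEval (code)": {"Phi-4 14B": 82.6, "Llama 3.1 8B": 72.6, "Qwen 2.5 7B": 80.5},
    "TR-MMLU":          {"Phi-4 14B": 27.4, "Llama 3.1 8B": 32.4, "Qwen 2.5 7B": 38.1},
    "MT-Bench-TR":      {"Phi-4 14B": 4.1,  "Llama 3.1 8B": 6.4,  "Qwen 2.5 7B": 7.3},
}


def best_model(benchmark: str) -> str:
    # Name of the highest-scoring model on one benchmark
    row = SCORES[benchmark]
    return max(row, key=row.get)


def phi4_margin(benchmark: str) -> float:
    # Phi-4 score minus the best non-Phi-4 score (negative means Phi-4 loses)
    row = SCORES[benchmark]
    best_other = max(v for name, v in row.items() if name != "Phi-4 14B")
    return round(row["Phi-4 14B"] - best_other, 1)


for name in SCORES:
    print(f"{name:18s} best={best_model(name):13s} phi4_margin={phi4_margin(name):+.1f}")
```

The sign of the margin flips exactly at the two Turkish benchmarks, which is the pattern the decision rule encodes.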

python

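# === Phi-4 14B Math FT Lab (English, GSM8K-style) ===
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
import torch

# 4-bit NF4 quantization with double quantization (QLoRA setup)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_use_double_quant=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.bfloat16, device_map="cuda",
)
tok = AutoTokenizer.from_pretrained("microsoft/phi-4")

# Freeze the base weights and enable gradient checkpointing before adding adapters
model = prepare_model_for_kbit_training(model)

# microsoft/phi-4 loads with the Phi-3 architecture, which fuses q/k/v into
# qkv_proj and gate/up into gate_up_proj. Listing q_proj, k_proj, ... by name
# would match only o_proj and down_proj, so target every linear layer instead.
lora = LoraConfig(r=32, lora_alpha=64,
                  target_modules="all-linear",
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

dataset = load_dataset("openai/gsm8k", "main", split="train")

def to_chat(ex):
    # Render each question/answer pair with the model's own chat template
    messages = [
        {"role": "user", "content": ex["question"]},
        {"role": "assistant", "content": ex["answer"]},
    ]
    return {"text": tok.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chat, num_proc=8)

cfg = SFTConfig(
    output_dir="phi-4-math-ft",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,            # effective batch size: 2 x 4 = 8
    learning_rate=1e-4,                       # Phi-4 is LR-sensitive; keep the learning rate low
    bf16=True, optim="paged_adamw_8bit",
    gradient_checkpointing=True,              # trades compute for memory on a 24 GB card
    max_seq_length=4096, packing=True,
    dataset_text_field="text",
    logging_steps=5, report_to="wandb",
)
# Note: newer TRL releases rename tokenizer= to processing_class= and
# max_seq_length to max_length; adjust to the TRL version you have pinned.
trainer = SFTTrainer(model=model, tokenizer=tok, train_dataset=dataset, args=cfg)
trainer.train()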

Phi-4 14B Math FT (GSM8K) — RTX 4090
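
Before launching the lab it helps to check that the quantized weights leave headroom on a 24 GB card. The sketch below is a rough lower-bound estimate, assuming 4 bits per parameter plus about 0.127 bits of quantization-constant overhead with double quantization (the figure reported for NF4 with block size 64). It ignores unquantized embeddings, activations, the LoRA adapters, and optimizer state, and the helper names are hypothetical.

```python
def nf4_weight_gib(n_params: float, overhead_bits: float = 0.127) -> float:
    # 4-bit weights plus per-parameter overhead of the quantization constants
    bits_per_param = 4.0 + overhead_bits
    return n_params * bits_per_param / 8 / 2**30


def effective_batch(per_device: int, grad_accum: int, n_gpus: int = 1) -> int:
    # Number of sequences contributing to each optimizer step
    return per_device * grad_accum * n_gpus


phi4_weights = nf4_weight_gib(14.7e9)   # parameter count from the architecture table
batch = effective_batch(per_device=2, grad_accum=4)

print(f"Approx. NF4 weight memory: {phi4_weights:.2f} GiB")
print(f"Effective batch size: {batch}")
```

Roughly 7 GiB for weights leaves the rest of the 24 GB for activations at 4096-token sequences, which is why gradient checkpointing and a small per-device batch are the usual levers if the run goes out of memory.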

✅ Deliverables

1) Try fine-tuning Phi-4 on Turkish data (hypothesis to test: does the TR adaptation of a synthetic-curriculum model behave differently from that of classic models?).
2) Fine-tune Phi-4-mini on GSM8K and raise math reasoning accuracy by +5%.
3) Next lesson: 3.8, SmolLM3 1.7B.
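
To verify a +5% gain on GSM8K you need a consistent scoring rule for the base and fine-tuned models. GSM8K reference answers end with a `#### <number>` line, so one minimal approach, sketched here with hypothetical helper names, is to extract that final number and compare it numerically. It assumes the model's output follows the same `####` convention, which is plausible after fine-tuning on the dataset's own answer format but should be checked on your generations.

```python
import re
from typing import List, Optional

# Final-answer marker used by GSM8K reference solutions, e.g. "#### 1,234"
_FINAL = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    # Return the number after "####" with thousands separators removed
    match = _FINAL.search(text)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    # Fraction of examples whose extracted final numbers are numerically equal
    hits = 0
    for pred, ref in zip(predictions, references):
        p, r = extract_final_answer(pred), extract_final_answer(ref)
        hits += int(p is not None and r is not None and float(p) == float(r))
    return hits / len(references)


# Toy example: the second prediction lacks the "####" marker and counts as wrong
preds = ["48 + 24 = 72\n#### 72", "I think it is 10", "So the total is\n#### 1,000"]
refs = ["#### 72", "#### 10", "#### 1000"]
accuracy = exact_match_accuracy(preds, refs)
print(f"exact match: {accuracy:.3f}")
```

Run the same function over the GSM8K test split for both the base and the fine-tuned checkpoint, and report the difference between the two accuracies.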
