
Phi-4 + Phi-4-mini: Microsoft's Synthetic-Curriculum Models and Why They're Fragile in Turkish

Phi-4 14B and Phi-4-mini 3.8B are Microsoft's models trained on "textbook quality" synthetic data. They are strong at math and code but weak in general Turkish conversation. Why? The synthetic data is overwhelmingly English. Includes a Phi-4 QLoRA lab on an RTX 4090 and a look at the niche domains where the model shines (math reasoning, code completion).

Şükrü Yusuf KAYA
30-minute read
Advanced

1. Phi-4 Architecture

| Feature | Phi-4 | Phi-4-mini |
|---|---|---|
| Layers | 40 | 32 |
| Hidden size | 5120 | 3072 |
| KV heads | 10 | 24 |
| Vocab | 100,352 (tiktoken cl100k) | 100,352 |
| Active params | 14.7B | 3.82B |
| Native context | 16K | 128K |
| Pre-train data | 9.4T (synthetic-heavy) | 7T (synthetic + web) |
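
If you want to check these numbers yourself, the published configs expose them directly. A minimal sketch; the Hub IDs `microsoft/phi-4` and `microsoft/Phi-4-mini-instruct` are the assumed checkpoint names:

```python
# Minimal sketch: read the published configs to double-check the table above.
from transformers import AutoConfig

for name in ("microsoft/phi-4", "microsoft/Phi-4-mini-instruct"):
    cfg = AutoConfig.from_pretrained(name)
    print(
        name,
        "| layers:", cfg.num_hidden_layers,
        "| hidden:", cfg.hidden_size,
        "| kv heads:", cfg.num_key_value_heads,
        "| vocab:", cfg.vocab_size,
    )
```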
Synthetic curriculum:
  • Phi-1/2/3: "textbook-quality" web filtering + GPT-4-generated synthetic textbooks
  • Phi-4: 2.5T synthetic + 5T filtered web + 1.5T code
Result: overall English quality is very high, but:
  • Multilingual performance is weak (the synthetic data is mostly English)
  • TR-MMLU: 26-29 (Llama 8B: 32.4, Qwen 7B: 38.1)
  • On math and code it comfortably beats Llama 8B

2. Where Phi-4 Shines

| Benchmark | Phi-4 14B | Llama 3.1 8B | Qwen 2.5 7B |
|---|---|---|---|
| MMLU (EN) | 84.8 | 73.0 | 74.2 |
| GSM8K (math) | 94.3 | 84.5 | 85.4 |
| MATH-500 | 80.4 | 51.9 | 49.8 |
| HumanEval (code) | 82.6 | 72.6 | 80.5 |
| TR-MMLU | 27.4 | 32.4 | 38.1 |
| MT-Bench-TR | 4.1 | 6.4 | 7.3 |
Verdict: for English math/code use cases, Phi-4 is the baseline. For Turkish usage, pick Llama / Qwen / Gemma.
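
Before the fine-tuning lab below, it can be useful to spot-check the GSM8K number on your own hardware. A rough small-sample sketch with greedy decoding and last-number answer extraction; this is not the exact harness behind the table above, and the 4-bit load (to fit a 4090) will shift the score slightly:

```python
# Rough sketch: small-sample GSM8K spot check for microsoft/phi-4.
# 50 samples, greedy decoding, last-number extraction: crude approximations of a real harness.
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

tok = AutoTokenizer.from_pretrained("microsoft/phi-4")
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4", quantization_config=bnb, device_map="cuda")

test = load_dataset("openai/gsm8k", "main", split="test").select(range(50))

def last_number(text):
    # Take the last number in the text as the final answer (crude but serviceable).
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1].rstrip(".") if nums else None

correct = 0
for ex in test:
    prompt = tok.apply_chat_template(
        [{"role": "user", "content": ex["question"]}],
        tokenize=False, add_generation_prompt=True,
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    completion = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    gold = ex["answer"].split("####")[-1].strip().replace(",", "")
    if last_number(completion) == gold:
        correct += 1

print(f"GSM8K accuracy on {len(test)} samples: {correct / len(test):.1%}")
```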
```python
# === Phi-4 14B Math FT Lab (English, GSM8K-style) ===
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
import torch

# 4-bit NF4 quantization so the 14B model fits on a single RTX 4090
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tok = AutoTokenizer.from_pretrained("microsoft/phi-4")
model = prepare_model_for_kbit_training(model)

# Phi-4 ships with the HF Phi-3 architecture, which fuses q/k/v into qkv_proj
# and gate/up into gate_up_proj, so LoRA must target the fused module names.
lora = LoraConfig(
    r=32, lora_alpha=64,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

dataset = load_dataset("openai/gsm8k", "main", split="train")

def to_chat(ex):
    # Wrap each GSM8K question/answer pair in the Phi-4 chat template
    messages = [
        {"role": "user", "content": ex["question"]},
        {"role": "assistant", "content": ex["answer"]},
    ]
    return {"text": tok.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chat, num_proc=8)

cfg = SFTConfig(
    output_dir="phi-4-math-ft",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,            # Phi-4 wants a lower lr (sensitive to aggressive updates)
    bf16=True, optim="paged_adamw_8bit",
    max_seq_length=4096, packing=True,
    dataset_text_field="text",
    logging_steps=5, report_to="wandb",
)
SFTTrainer(model=model, tokenizer=tok, train_dataset=dataset, args=cfg).train()
```
Phi-4 14B Math FT (GSM8K) — RTX 4090
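
Once training finishes, it is worth saving the adapter and running a quick generation smoke test to confirm the chat template still behaves. A minimal follow-on sketch; the output path and the prompt are placeholders, not part of the lab:

```python
# Minimal follow-on sketch: persist the LoRA adapter and do a greedy smoke test.
# "phi-4-math-ft/adapter" is a placeholder path.
model.save_pretrained("phi-4-math-ft/adapter")   # saves only the LoRA weights
tok.save_pretrained("phi-4-math-ft/adapter")

model.eval()
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "A bakery sells 12 cupcakes for $18. How much do 30 cupcakes cost?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```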
✅ Deliverables
  1. Try fine-tuning Phi-4 with Turkish data (the hypothesis to test: does a synthetic-curriculum model adapt to Turkish differently from conventionally pre-trained models?).
  2. Fine-tune Phi-4-mini on GSM8K and lift its math reasoning by +5% (see the sketch below).
  3. Next lesson: 3.8, SmolLM3 1.7B.
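
For deliverable 2, the lab above carries over almost unchanged. A sketch of the deltas; the `microsoft/Phi-4-mini-instruct` checkpoint name and the batch settings are assumptions for a 3.8B model on a 4090, not tested values:

```python
# Sketch of the changes for the Phi-4-mini run of the same lab.
# Checkpoint name and batch sizes are assumptions, not tested settings.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    quantization_config=bnb,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tok = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")

# Phi-4-mini shares the fused Phi-3 module layout, so the same LoraConfig
# (qkv_proj / o_proj / gate_up_proj / down_proj) can be reused as-is.
model = get_peft_model(prepare_model_for_kbit_training(model), lora)

# Re-map GSM8K: the two models use different chat templates.
dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_chat, num_proc=8)

cfg = SFTConfig(
    output_dir="phi-4-mini-math-ft",
    num_train_epochs=2,
    per_device_train_batch_size=8,     # 3.8B leaves far more VRAM headroom than 14B
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    bf16=True, optim="paged_adamw_8bit",
    max_seq_length=4096, packing=True,
    dataset_text_field="text",
    logging_steps=5, report_to="wandb",
)
SFTTrainer(model=model, tokenizer=tok, train_dataset=dataset, args=cfg).train()
```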
