Qwen 2.5 32B / 72B Math + Code Mastery: GSM8K + MATH-500 + HumanEval FT Recipe
Qwen 2.5 32B/72B is the baseline that beats Llama 70B on math + code. This lesson covers the math-heavy dataset mix (GSM8K + MATH + AIME + MetaMathQA), step-by-step CoT formatting, the code-execution loop, and the hyperparameter differences (lower learning rate, more epochs), plus a 4×H100 80GB QLoRA recipe for 32B (~3 hours).
Şükrü Yusuf KAYA
32-minute read
## 1. Memory Budget
| Setup | Hardware | Weights (NF4) | Total peak | Fits? |
|---|---|---|---|---|
| Qwen 2.5 32B QLoRA | 1×H100 80GB | 16 GB | 38 GB | ✅ |
| Qwen 2.5 32B QLoRA | 4×H100 80GB FSDP | 4 GB/GPU | 12 GB/GPU | ✅ comfortable |
| Qwen 2.5 32B QLoRA | 1× RTX 4090 24GB | 16 GB | OOM | ❌ |
| Qwen 2.5 32B QLoRA | 2× RTX 4090 FSDP | 8 GB/GPU | 18 GB/GPU | ⚠️ marginal |
| Qwen 2.5 72B QLoRA | 8×H100 80GB FSDP | 4.5 GB/GPU | 14 GB/GPU | ✅ |
| Qwen 2.5 72B QLoRA | 1×H100 80GB CPU offload | 36 GB | 70 GB | ⚠️ slow |
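The weight column in the table follows directly from parameter count and bit width. A minimal sketch (the helper name and the simplification are mine, not from the cookbook):

```python
def quantized_weight_gb(n_params_billion: float, bits: int = 4) -> float:
    """Approximate size of the quantized base weights in GB.

    Ignores NF4 block-scale overhead (~0.5 bit/param) and everything
    else in the "Total peak" column: LoRA adapters, optimizer states,
    gradients, and activations.
    """
    # 1e9 params * (bits / 8) bytes each = n_params_billion * bits / 8 GB
    return n_params_billion * bits / 8

# Matches the table: 32B at 4-bit -> 16 GB, 72B -> 36 GB
print(quantized_weight_gb(32))  # 16.0
print(quantized_weight_gb(72))  # 36.0
```

The gap between this number and "Total peak" is what FSDP sharding attacks: the quantized weights split evenly across GPUs, while per-GPU activation memory stays roughly constant.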
```python
# === Math-Heavy Dataset Mix ===
from datasets import load_dataset
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# The cookbook's math mix
gsm8k = load_dataset("openai/gsm8k", "main", split="train")                        # 7.4K
math_ds = load_dataset("hendrycks/competition_math", split="train")                # 7.5K
metamath = load_dataset("meta-math/MetaMathQA", split="train")                     # 395K
orca_math = load_dataset("microsoft/orca-math-word-problems-200k", split="train")  # 200K

# Step-by-step CoT format
def to_chat_cot(ex, source):
    if source == "gsm8k":
        problem = ex["question"]
        solution = ex["answer"]  # already step-by-step
    elif source == "math":
        problem = ex["problem"]
        solution = ex["solution"]
    elif source == "metamath":
        problem = ex["query"]
        solution = ex["response"]
    else:  # orca_math
        problem = ex["question"]
        solution = ex["answer"]
    return {
        "text": tokenizer.apply_chat_template([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ], tokenize=False)
    }

# Mix with τ=0.4 sampling
sizes = np.array([len(gsm8k), len(math_ds), len(metamath), len(orca_math)])
weights = (sizes ** 0.4) / (sizes ** 0.4).sum()

# Train — Qwen 2.5 32B + 4×H100 + ~3 hours
```

Math-heavy dataset mix — Qwen 32B
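The effect of the τ=0.4 exponent can be sanity-checked against plain size-proportional sampling. A standalone sketch using the approximate dataset sizes from the snippet above:

```python
import numpy as np

# Approximate sizes: gsm8k, MATH, MetaMathQA, orca-math
sizes = np.array([7_400, 7_500, 395_000, 200_000])

def mix_weights(sizes, tau):
    """Temperature-style sampling weights: p_i proportional to |D_i|^tau."""
    w = sizes.astype(float) ** tau
    return w / w.sum()

proportional = mix_weights(sizes, 1.0)  # tau=1 is size-proportional sampling
tempered = mix_weights(sizes, 0.4)      # the mix used above

# tau < 1 up-weights the small, curated sets (GSM8K, MATH) and
# down-weights the large synthetic ones (MetaMathQA, orca-math).
print(proportional.round(3))
print(tempered.round(3))
```

With τ=1, MetaMathQA alone would supply roughly two thirds of the batches; at τ=0.4 the GSM8K share rises severalfold, which is the point of tempered sampling.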
## 2. Qwen Code Variant — Qwen 2.5 Coder
| Model | HumanEval base | After FT |
|---|---|---|
| Qwen 2.5 Coder 32B | 92.7% | 95.2% (+2.5) |
| DeepSeek-Coder-V2 16B | 90.2% | 93.1% |
| Llama 3.1 70B (non-code) | 80.5% | 85.0% |
Decision: if you are fine-tuning for code, start from Qwen 2.5 Coder 32B. For general chat, use Qwen 2.5 72B-Instruct.
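The HumanEval numbers above are execution-based: a completion counts only if the reference tests actually pass when run. A minimal in-process sketch of that check, using the HumanEval-style `check(entry_point)` convention (real harnesses run this in a sandboxed subprocess with a timeout, never `exec()` in-process on untrusted model output):

```python
def passes(completion: str, test_src: str, entry_point: str) -> bool:
    """Execute a model completion, then run the unit tests against it.

    WARNING: exec() on model output is unsafe; sketch only.
    """
    env: dict = {}
    try:
        exec(completion, env)           # define the candidate function
        exec(test_src, env)             # define check()
        env["check"](env[entry_point])  # raises AssertionError on failure
        return True
    except Exception:
        return False

test_src = "def check(f):\n    assert f(2, 3) == 5\n    assert f(-1, 1) == 0"
print(passes("def add(a, b):\n    return a + b", test_src, "add"))  # True
print(passes("def add(a, b):\n    return a - b", test_src, "add"))  # False
```

Pass@1 is then just the mean of this boolean over one sampled completion per problem.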
✅ Deliverable
- 1) Fine-tune Qwen 2.5 32B on the mini math mix (1,000 samples). 2) Measure GSM8K accuracy. 3) Next lesson: 4.9 — Command-R+ + Granite 3.
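For step 2, GSM8K references end with a `#### <number>` line, so accuracy reduces to extracting and comparing that final number. A minimal scorer sketch (the helper names are mine, not from an official harness):

```python
import re

def final_answer(text: str):
    """Pull the number after the GSM8K-style '####' marker, normalized."""
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    return m.group(1).replace(",", "") if m else None

def gsm8k_accuracy(predictions, references):
    """Exact-match accuracy on the extracted final answers."""
    hits = sum(
        final_answer(p) is not None and final_answer(p) == final_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

refs = ["She makes 9 * 2 = 18 dollars every day.\n#### 18"]
preds = ["Step by step: 9 eggs * $2 = $18.\n#### 18"]
print(gsm8k_accuracy(preds, refs))  # 1.0
```

Prompting the fine-tuned model to end its CoT with the same `####` marker keeps this extraction trivial; otherwise you need a looser "last number in the output" fallback.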