
Qwen 2.5 32B / 72B Math + Code Mastery: GSM8K + MATH-500 + HumanEval FT Recipe

Qwen 2.5 32B/72B — a baseline that beats Llama 70B on math + code. Math-heavy dataset mix (GSM8K + MATH + AIME + MetaMathQA), step-by-step CoT formatting, a code execution loop, and the hyperparameter differences (lower lr, more epochs). A 4×H100 80GB QLoRA recipe for the 32B (~3 hours).

Şükrü Yusuf KAYA
32 minute read
Advanced

1. Memory Budget#

| Setup | Hardware | W (NF4) | Total peak | Fits? |
| --- | --- | --- | --- | --- |
| Qwen 2.5 32B QLoRA | 1×H100 80GB | 16 GB | 38 GB | ✅ |
| Qwen 2.5 32B QLoRA | 4×H100 80GB FSDP | 4 GB/GPU | 12 GB/GPU | ✅ comfortable |
| Qwen 2.5 32B QLoRA | 1× RTX 4090 24GB | 16 GB | — | ❌ OOM |
| Qwen 2.5 32B QLoRA | 2× RTX 4090 FSDP | 8 GB/GPU | 18 GB/GPU | ⚠️ marginal |
| Qwen 2.5 72B QLoRA | 8×H100 80GB FSDP | 4.5 GB/GPU | 14 GB/GPU | ✅ |
| Qwen 2.5 72B QLoRA | 1×H100 80GB CPU offload | 36 GB | 70 GB | ⚠️ slow |
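The NF4 weight column can be sanity-checked with a back-of-the-envelope formula (a minimal sketch: 4 bits ≈ 0.5 bytes per parameter, ignoring the few-percent overhead of double-quantization constants and the LoRA adapter/optimizer/activation memory that makes up the rest of the "Total peak" column):

```python
def nf4_weight_gb(params_billion: float) -> float:
    """NF4 stores weights at 4 bits = 0.5 bytes per parameter.
    (Double-quant constants add a few percent on top; ignored here.)"""
    return params_billion * 0.5

def per_gpu_gb(total_gb: float, n_gpus: int) -> float:
    """Under FSDP the quantized shards split roughly evenly across GPUs."""
    return total_gb / n_gpus

print(nf4_weight_gb(32))                 # 16.0 GB — the 1×H100 row
print(per_gpu_gb(nf4_weight_gb(32), 4))  # 4.0 GB/GPU — the 4×H100 FSDP row
print(nf4_weight_gb(72))                 # 36.0 GB — the CPU-offload row
```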
```python
# === Math-Heavy Dataset Mix ===
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# The cookbook's math mix
gsm8k = load_dataset("openai/gsm8k", "main", split="train")                        # 7.4K
math_ds = load_dataset("hendrycks/competition_math", split="train")                # 7.5K
metamath = load_dataset("meta-math/MetaMathQA", split="train")                     # 395K
orca_math = load_dataset("microsoft/orca-math-word-problems-200k", split="train")  # 200K

# Step-by-step CoT format
def to_chat_cot(ex, source):
    if source == "gsm8k":
        problem = ex["question"]
        solution = ex["answer"]  # already step-by-step
    elif source == "math":
        problem = ex["problem"]
        solution = ex["solution"]
    elif source == "metamath":
        problem = ex["query"]
        solution = ex["response"]
    else:  # orca_math
        problem = ex["question"]
        solution = ex["answer"]
    return {
        "text": tokenizer.apply_chat_template([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ], tokenize=False)
    }

# Mix with τ=0.4 sampling
import numpy as np
sizes = np.array([len(gsm8k), len(math_ds), len(metamath), len(orca_math)])
weights = (sizes ** 0.4) / (sizes ** 0.4).sum()

# Train — Qwen 2.5 32B + 4×H100 + ~3 hours
```

Math-heavy dataset mix — Qwen 32B
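The τ=0.4 weights above can be fed straight into `datasets.interleave_datasets(..., probabilities=...)` to build the mixed training stream. A pure-Python sketch of the same weighting (dataset sizes are the approximate counts from the loading code above):

```python
# τ-sampling: raise each dataset size to the power τ < 1, then normalize.
# This downweights huge corpora (MetaMathQA) relative to proportional mixing.
def tau_weights(sizes, tau=0.4):
    scaled = [n ** tau for n in sizes]
    total = sum(scaled)
    return [s / total for s in scaled]

sizes = [7_400, 7_500, 395_000, 200_000]  # gsm8k, MATH, MetaMathQA, OrcaMath
probs = tau_weights(sizes)
# MetaMathQA falls from ~65% of the raw pool to ~46% of sampled examples.
# Usage: interleave_datasets([gsm8k, math_ds, metamath, orca_math],
#                            probabilities=probs, seed=42,
#                            stopping_strategy="all_exhausted")
```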

2. Qwen Code Variant — Qwen 2.5 Coder#

| Model | HumanEval base | After FT |
| --- | --- | --- |
| Qwen 2.5 Coder 32B | 92.7% | 95.2% (+2.5) |
| DeepSeek-Coder-V2 16B | 90.2% | 93.1% |
| Llama 3.1 70B (non-code) | 80.5% | 85.0% |
Verdict: if you're doing code FT, start from Qwen 2.5 Coder 32B. For general chat, Qwen 2.5 72B-Instruct.
✅ Deliverable
  1) FT Qwen 2.5 32B on a mini math mix (1,000 samples). 2) Measure GSM8K accuracy. 3) Next lesson: 4.9 — Command-R+ + Granite 3.
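For step 2, GSM8K gold answers end in a `#### <number>` line, so exact-match accuracy reduces to extracting that number from both the model output and the reference (a minimal sketch; assumes your CoT prompt asks the model to end with the same `#### answer` convention):

```python
import re

def extract_gsm8k_answer(text):
    """Pull the final numeric answer from a '#### <number>' line."""
    m = re.search(r"####\s*(-?[\d,\.]+)", text)
    if not m:
        return None
    return m.group(1).replace(",", "").rstrip(".")

def gsm8k_accuracy(predictions, references):
    """Exact-match accuracy over extracted final answers."""
    hits = sum(
        extract_gsm8k_answer(p) == extract_gsm8k_answer(r)
        and extract_gsm8k_answer(p) is not None
        for p, r in zip(predictions, references)
    )
    return hits / len(references)
```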
