OpenAI o-series Reinforcement Fine-Tuning (RFT): Grader Function Design
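OpenAI announced RFT in late 2024 as a way to fine-tune o-series reasoning models (o1, o3, o4-mini) with reinforcement learning. The **grader function** is a function you write that assigns a numerical score to each model output, for example by checking math correctness, executing code, or applying a custom rule. RFT is best suited to verifiable domains, where an output can be checked automatically. Graders are specified in JSON.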
Şükrü Yusuf KAYA
28 min read
Advanced

```json
{
  "type": "multi",
  "graders": {
    "math_correctness": {
      "type": "string_check",
      "name": "math_correctness",
      "input": "{{sample.output_json.final_answer}}",
      "reference": "{{item.gold_answer}}",
      "operation": "eq"
    },
    "step_count": {
      "type": "python",
      "name": "step_count",
      "source": "def grade(sample, item):\n    steps = sample['output_text'].count('Step')\n    return min(steps / 5.0, 1.0)"
    },
    "uses_formula": {
      "type": "string_check",
      "name": "uses_formula",
      "input": "{{sample.output_text}}",
      "reference": "{{item.required_formula}}",
      "operation": "like"
    }
  },
  "calculate_output": "0.7 * math_correctness + 0.2 * uses_formula + 0.1 * step_count"
}
```
OpenAI RFT grader spec
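To see how `calculate_output` combines the three sub-scores, the arithmetic can be reproduced locally. The following is a minimal sketch, not OpenAI's implementation: `exact_match`, `contains`, and `combined_score` are hypothetical helpers standing in for the two string checks and the weighted formula, and the sample layout (`output_text` plus `output_json["final_answer"]`) is an assumption about how the model output is exposed to the grader.

```python
def exact_match(value: str, reference: str) -> float:
    """Stand-in for the equality string check: 1.0 on an exact match, else 0.0."""
    return 1.0 if value == reference else 0.0


def contains(text: str, needle: str) -> float:
    """Stand-in for the case-sensitive substring check: 1.0 if needle occurs in text."""
    return 1.0 if needle in text else 0.0


def step_count(sample: dict, item: dict) -> float:
    """Same logic as the embedded Python grader: saturates at five 'Step' mentions."""
    steps = sample["output_text"].count("Step")
    return min(steps / 5.0, 1.0)


def combined_score(sample: dict, item: dict) -> float:
    """Weighted sum from the spec: 0.7 correctness + 0.2 formula + 0.1 steps."""
    math_correctness = exact_match(
        sample["output_json"]["final_answer"], item["gold_answer"]
    )
    uses_formula = contains(sample["output_text"], item["required_formula"])
    return 0.7 * math_correctness + 0.2 * uses_formula + 0.1 * step_count(sample, item)


# Illustrative data (assumed, not from a real dataset).
item = {"gold_answer": "5", "required_formula": "a^2 + b^2 = c^2"}
sample = {
    "output_text": "Step 1: apply a^2 + b^2 = c^2. Step 2: c = 5.",
    "output_json": {"final_answer": "5"},
}

# Correct answer, formula present, two of five steps: about 0.94.
print(round(combined_score(sample, item), 2))
```

With these weights, an output with a wrong final answer can earn at most 0.3, so mentioning the formula and padding the response with the word "Step" cannot outweigh correctness.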
✅ Deliverables
1) Design a grader function for a math problem set.
2) Run a pilot RFT job on o4-mini.
3) Next lesson: 14.3, GPT-5/5.1 Distillation Pipeline.
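As a starting point for the first deliverable, a math grader can compare answers numerically instead of as raw strings, so that "0.75" and "3/4" both count as correct. The sketch below makes several assumptions: the model returns structured output with a `final_answer` field under `sample["output_json"]`, each dataset item has a `gold_answer` field, the grader runtime provides the Python standard library, and `numeric_match` is a hypothetical grader name. The source is kept as a string because a Python grader embeds its code in the spec's `source` field, and running that same string locally is a cheap sanity check before starting a job.

```python
import json

# Grader source as a string, mirroring how a "python" grader embeds code in "source".
GRADER_SOURCE = '''
from fractions import Fraction


def _parse(value):
    """Parse 3/4, 0.75, or 7 into an exact Fraction; return None if unparseable."""
    try:
        return Fraction(str(value).strip())
    except (ValueError, ZeroDivisionError):
        return None


def grade(sample, item):
    """Return 1.0 if the final answer equals the gold answer numerically, else 0.0."""
    output = sample.get("output_json") or {}
    predicted = _parse(output.get("final_answer", ""))
    gold = _parse(item.get("gold_answer", ""))
    if predicted is None or gold is None:
        return 0.0
    return 1.0 if predicted == gold else 0.0
'''

# Execute the exact source locally to test it before submitting it anywhere.
namespace = {}
exec(GRADER_SOURCE, namespace)
grade = namespace["grade"]

item = {"gold_answer": "3/4"}
print(grade({"output_json": {"final_answer": "0.75"}}, item))            # equivalent form
print(grade({"output_json": {"final_answer": "0.8"}}, item))             # wrong value
print(grade({"output_json": {"final_answer": "three quarters"}}, item))  # unparseable

# Hypothetical spec entry; it could replace the string-equality check in a multi grader.
spec = {"type": "python", "name": "numeric_match", "source": GRADER_SOURCE}
payload = json.dumps(spec)
```

Returning 0.0 for unparseable answers, rather than raising, keeps a malformed model output from failing the grading run. Exact rational comparison avoids floating-point tolerance choices, at the cost of rejecting answers with units or surrounding text.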