OpenAI o-series Reinforcement Fine-Tuning (RFT): Grader Function Design

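OpenAI announced Reinforcement Fine-Tuning (RFT) in late 2024. It fine-tunes o-series reasoning models (o1, o3, o4-mini) with reinforcement learning. The core piece is the **grader function**: a function you write that assigns a numerical score to each model output, for example by checking math correctness, executing code, or applying a custom rule. RFT is best suited to verifiable domains, where an answer can be checked automatically. Graders are defined in a JSON-based spec.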

Şükrü Yusuf KAYA

28 min read

6/26/2026

Advanced
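The summary describes a grader as a function that turns a model output into a number. As a minimal sketch of that idea for math correctness, the function below compares the model's answer with a reference answer numerically, so that `0.5` and `1/2` count as equal. The `grade(sample, item)` shape follows the Python grader used later in this lesson; the `output_text` key and the `parse_number` helper are assumptions made for illustration.

```python
from fractions import Fraction


def parse_number(text):
    """Return text as an exact Fraction, or None if it is not a plain number."""
    try:
        return Fraction(str(text).strip().replace(",", ""))
    except (ValueError, ZeroDivisionError):
        return None


def grade(sample, item):
    """Score 1.0 when the model's answer matches the reference, else 0.0."""
    answer = str(sample["output_text"])  # assumed key for the model's final text
    gold = str(item["gold_answer"])      # reference answer from the dataset row
    a, g = parse_number(answer), parse_number(gold)
    if a is not None and g is not None:
        # Both sides are numeric: compare exact values, not their spelling.
        return 1.0 if a == g else 0.0
    # Otherwise fall back to a case- and whitespace-insensitive string match.
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
```

Comparing exact fractions avoids penalizing a correct answer for its formatting, which a plain string equality check would do.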


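```json
{
  "type": "multi",
  "graders": {
    "math_correctness": {
      "type": "string_check",
      "name": "math_correctness",
      "input": "{{sample.output_json.final_answer}}",
      "reference": "{{item.gold_answer}}",
      "operation": "eq"
    },
    "step_count": {
      "type": "python",
      "name": "step_count",
      "source": "def grade(sample, item):\n    steps = sample['output_text'].count('Step')\n    return min(steps / 5.0, 1.0)"
    },
    "uses_formula": {
      "type": "string_check",
      "name": "uses_formula",
      "input": "{{sample.output_text}}",
      "reference": "{{item.required_formula}}",
      "operation": "like"
    }
  },
  "calculate_output": "0.7 * math_correctness + 0.2 * uses_formula + 0.1 * step_count"
}
```

OpenAI RFT grader spec. The final score weights answer correctness at 0.7, use of the required formula at 0.2, and step count at 0.1. The `sample.output_json.final_answer` reference assumes the job requests structured output with a `final_answer` field; `gold_answer` and `required_formula` are fields of each dataset item. The substring check uses the `like` operation, since `string_check` has no `contains` operation.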
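The weighted formula in `calculate_output` can be checked by hand before any job is submitted. The sketch below re-implements the three sub-graders locally in plain Python and combines them with the same 0.7 / 0.2 / 0.1 weights. It is not the platform's evaluator; the helper names `exact_match`, `contains_text`, and `combined_score` are hypothetical, and the sample item is made up for illustration.

```python
def exact_match(value, reference):
    """Local stand-in for the exact-equality string check: 1.0 or 0.0."""
    return 1.0 if value == reference else 0.0


def contains_text(value, reference):
    """Local stand-in for the substring string check: 1.0 or 0.0."""
    return 1.0 if reference in value else 0.0


def step_count(response):
    """Mirror the spec's Python grader: five or more 'Step' mentions earn 1.0."""
    return min(response.count("Step") / 5.0, 1.0)


def combined_score(final_answer, response, item):
    """Apply the weighting from calculate_output to the three sub-scores."""
    math_correctness = exact_match(final_answer, item["gold_answer"])
    uses_formula = contains_text(response, item["required_formula"])
    return 0.7 * math_correctness + 0.2 * uses_formula + 0.1 * step_count(response)


# Made-up example: correct answer, required formula present, two steps.
item = {"gold_answer": "12", "required_formula": "a^2 + b^2"}
response = "Step 1: apply a^2 + b^2 = c^2. Step 2: solve for the missing side."
print(round(combined_score("12", response, item), 2))  # prints 0.94
```

Running a few hand-written samples through a local copy like this makes it easier to see how much each term contributes. For instance, the step-count term rewards the literal word "Step", so a response can raise its score by repeating it; the 0.1 weight limits how much that can matter.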

✅ Deliverables

1) Design a grader function for a math problem set.
2) Run a pilot RFT job on o4-mini.
3) Next lesson: 14.3, GPT-5/5.1 Distillation Pipeline.
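For the o4-mini pilot job, it can help to assemble the request as plain data first and inspect it before sending anything. The sketch below only builds a dictionary; it makes no network call. The field layout (`method.type = "reinforcement"` with the grader nested under `method.reinforcement`) and the model snapshot name reflect my understanding of the fine-tuning API and should be verified against the current API reference. The function name and the file IDs are hypothetical placeholders.

```python
def build_rft_job_request(training_file_id, validation_file_id, grader,
                          model="o4-mini-2025-04-16"):
    """Assemble the arguments for a reinforcement fine-tuning job as a dict.

    The structure is an assumption to check against the API reference;
    nothing is sent to the API here.
    """
    return {
        "model": model,
        "training_file": training_file_id,
        "validation_file": validation_file_id,
        "method": {
            "type": "reinforcement",
            "reinforcement": {"grader": grader},
        },
    }


# Hypothetical file IDs and a trivial grader, for illustration only.
request = build_rft_job_request(
    "file-TRAIN_PLACEHOLDER",
    "file-VALID_PLACEHOLDER",
    {
        "type": "string_check",
        "name": "math_correctness",
        "input": "{{sample.output_text}}",
        "reference": "{{item.gold_answer}}",
        "operation": "eq",
    },
)
print(request["method"]["type"])  # prints reinforcement
```

Once the payload has been reviewed, a dictionary like this could be passed as keyword arguments to the SDK's `client.fine_tuning.jobs.create(**request)`, assuming the installed SDK version supports the reinforcement method.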
