# Reasoning Eval: AIME 2024/2025 + MATH-500 + GPQA-Diamond + LiveCodeBench

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-reasoning-eval-aime-math-gpqa
> Updated: 2026-05-14T14:42:59.217Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XII — Reasoning Model FT (R1-style)
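
**TLDR:** The standard eval suite for reasoning models has five benchmarks. AIME 2024 has 30 problems from the American Invitational Mathematics Examination, the qualifying exam for the USA Math Olympiad. AIME 2025 is the newer 30-problem set. MATH-500 is a 500-problem subset of the MATH benchmark of high-school competition problems. GPQA-Diamond is graduate-level multiple-choice science Q&A. LiveCodeBench is a coding benchmark refreshed monthly with new contest problems. The page also covers the difference between pass@1 and majority voting over 64 samples (often reported as cons@64 or maj@64). Majority voting is not the same as pass@64, which counts a problem as solved if any one of the 64 samples is correct. Finally, it sets out the cookbook's standard eval pipeline.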
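The three metrics named above can be compared on the same set of sampled completions. The following is a minimal sketch, not the cookbook's actual pipeline. It assumes each completion ends with a `\boxed{...}` final answer, which is the usual convention for AIME and MATH-500. GPQA-Diamond (letter choices) and LiveCodeBench (test execution) need different graders. The helpers `extract_boxed`, `normalize`, and `evaluate` are hypothetical names. `normalize` is deliberately minimal: real MATH-500 grading usually needs a proper math-equivalence check. `pass_at_k` is the standard unbiased estimator 1 - C(n-c, k) / C(n, k). With k = 1 it reduces to c / n, the per-problem fraction of correct samples.

```python
import re
from collections import Counter
from math import comb
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion (nested braces allowed)."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    depth, i, chars = 1, start + len(r"\boxed{"), []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: treat as "no answer"


def normalize(ans: Optional[str]) -> Optional[str]:
    """Hypothetical minimal normalizer: drop spaces and a trailing period, canonicalize integers."""
    if ans is None:
        return None
    s = ans.replace(" ", "").rstrip(".")
    if re.fullmatch(r"-?\d+", s):
        return str(int(s))  # "033" -> "33" for AIME-style integer answers
    return s


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def evaluate(completions: list[list[str]], references: list[str], k: int = 1) -> dict:
    """Score n sampled completions per problem; returns dataset-level averages."""
    pass1, passk, maj = [], [], []
    for samples, ref in zip(completions, references):
        preds = [normalize(extract_boxed(s)) for s in samples]
        gold = normalize(ref)
        n, c = len(preds), sum(p == gold for p in preds)
        pass1.append(c / n)               # pass@1 averaged over the n samples
        passk.append(pass_at_k(n, c, k))  # P(at least one of k samples is correct)
        votes = Counter(p for p in preds if p is not None)
        top = votes.most_common(1)[0][0] if votes else None  # ties: first-seen answer wins
        maj.append(float(top == gold))    # majority vote over all n samples (cons@n)
    m = len(references)
    return {"pass@1": sum(pass1) / m, "pass@k": sum(passk) / m, "maj@n": sum(maj) / m}


# Toy usage: 2 problems x 4 samples each (a real run would use e.g. 64 samples per problem).
completions = [
    [r"... so the answer is \boxed{33}", r"\boxed{033}", r"\boxed{12}", "no final answer"],
    [r"\boxed{\frac{1}{2}}", r"\boxed{\frac{1}{3}}", r"\boxed{\frac{1}{3}}", r"\boxed{\frac{1}{4}}"],
]
references = ["33", r"\frac{1}{2}"]
scores = evaluate(completions, references, k=2)
print({name: round(v, 3) for name, v in scores.items()})
# -> {'pass@1': 0.375, 'pass@k': 0.667, 'maj@n': 0.5}
```

The second toy problem shows why the metrics diverge. One of four samples is correct, so pass@2 credits it with 0.5. The wrong answer `\frac{1}{3}` wins the vote, so majority voting scores it 0. This is why pass@64 is always at least as high as cons@64 and should not be reported in its place.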

