# DeepSeek-R1 GRPO in Depth: Mathematics of Open Reasoning RL — Group Relative Policy Optimization

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/deepseek-r1-grpo-derinlemesine-matematik
> Updated: 2026-05-13T13:00:30.896Z
> Category: LLM Engineering
> Module: Module 17: Reasoning Models — Test-Time Compute Revolution

**TLDR:** GRPO (Group Relative Policy Optimization) is the main training algorithm of DeepSeek-R1 (January 2025). This lesson derives its differences from PPO line by line, explains how it estimates advantages without a value function by comparing responses within a sampled group, and walks through the 4-stage training pipeline in detail (R1-Zero → Cold Start → Reasoning RL → Distill). It also covers the empirical "aha moment" phenomenon, with the examples and statistical analysis given in the paper, and closes with strategies for fine-tuning R1 for Turkish.
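
As a preview of the group-comparison idea, here is a minimal sketch of a group-relative advantage: each sampled response is scored against the mean and standard deviation of the rewards in its own group, so no learned value function is needed. The function name, the `eps` stabilizer, the use of the population standard deviation, and the reward values are all assumptions made for illustration, not details taken from the DeepSeek-R1 code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds one scalar reward per response sampled for the same prompt.
    `eps` (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct answers end up with a positive advantage and incorrect ones with a negative advantage, and the values sum to roughly zero within the group. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.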

