Reasoning Revolution: From OpenAI o1 to DeepSeek-R1 — Test-Time Compute and the Rebirth of Chain-of-Thought
The 2024-2026 LLM frontier: reasoning models. The OpenAI o1 (Sept 2024) and DeepSeek-R1 (Jan 2025) revolution. Test-time compute scaling (a new dimension beyond Kaplan's laws), chain-of-thought intensification, hidden reasoning tokens (o1) vs. visible (R1), RL-trained reasoning patterns, the AIME and MATH benchmark revolution, and the GPT-4o → o1 accuracy leap.
Şükrü Yusuf KAYA
75 min read
Advanced 🧠 Reasoning models — the 2024-2026 LLM revolution
September 12, 2024. The OpenAI o1 launch. AIME (American Invitational Mathematics Examination) accuracy: GPT-4o 12% → o1 83%. A ~7x improvement from a single model change. The mechanism: test-time compute scaling. The model spends more time 'thinking' — 100K+ reasoning tokens before the final answer. January 2025: DeepSeek-R1 delivers comparable quality as open source, with full transparency. 'Reasoning' is the new paradigm. 75 minutes from now, you will have a deep grasp of the mathematical anatomy of reasoning models, the architectural differences between o1 and R1, and the math of test-time compute scaling.
Lesson Map (10 Sections)#
- Pre-reasoning era — the GPT-4 ceiling
- Chain-of-Thought — Wei 2022, a prompting trick
- o1 launch — OpenAI, September 2024
- Test-time compute scaling — Kaplan's new dimension
- RL for reasoning — process reward models
- DeepSeek-R1 (January 2025) — open-source breakthrough
- Reasoning tokens — hidden (o1) vs visible (R1)
- AIME, MATH, Codeforces benchmarks — the quality leap
- Cost economics — reasoning is expensive
- Turkish reasoning — practical implications
1-5. Reasoning Evolution#
1.1 Pre-reasoning era (GPT-4)#
GPT-4 on math:
- Simple arithmetic: OK
- AIME problems: 12% accuracy
- Hard olympiad math: <5%
LLMs' inability to 'think' was a well-known problem: direct token generation, with no explicit reasoning phase.
1.2 Chain-of-Thought (Wei 2022)#
Google, Wei et al., 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'.
The prompt trick:
Query: '23 × 47 = ?'
Bad (direct answer): the model jumps straight to a product and often gets it wrong, because it skips the intermediate steps.
Good (CoT prompt): 'Let me think step by step. 23 × 47 = 23 × 40 + 23 × 7 = 920 + 161 = 1081'
CoT prompting gave a 20-40% accuracy boost on math problems and became standard practice in the GPT-4 era.
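The contrast above can be sketched as two tiny prompt builders. This is a hypothetical helper (not from the Wei et al. paper); the function names and instruction wording are illustrative.

```python
# Sketch: direct vs. chain-of-thought prompting.
# Hypothetical helpers -- the exact instruction wording is an assumption.

def direct_prompt(question: str) -> str:
    """Ask for the answer only; the model tends to skip steps."""
    return f"{question}\nAnswer with just the final result."

def cot_prompt(question: str) -> str:
    """Elicit step-by-step reasoning before the final answer."""
    return (
        f"{question}\n"
        "Let's think step by step, then state the final answer "
        "on a line starting with 'Answer:'."
    )

print(cot_prompt("What is 23 x 47?"))
```

The only difference is the instruction appended to the query, which is exactly why CoT was considered a "trick" rather than an architectural change.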
1.3 The limits of CoT#
Prompt-based CoT yields short reasoning sequences — not enough for complex problems (multi-step proofs, complex algorithms).
The model cannot do beam-search-like exploration — only linear generation.
1.4 o1 (OpenAI, September 2024)#
'Learning to Reason with LLMs' blog post.
Key innovation: RL-trained reasoning patterns.
- Pre-trained base + extensive RL on math/code/reasoning tasks
- Reward: final answer correctness
- Model discovers long reasoning strategies
The result — reasoning tokens before the answer:
User: 'AIME 2024 Problem 1: ...'
Model internal: '<reasoning>Let me think... try approach A... no, B...</reasoning>'
Model output: 'Answer: 42'
The reasoning tokens are hidden (not visible to the user); 'thinking time' is the marketing metaphor.
1.5 Test-time compute scaling#
Kaplan 2020 scaling laws: more training compute → lower loss.
New dimension: test-time compute = reasoning tokens generated.
Accuracy = f(train_compute, test_time_compute)
The o1 release data showed that doubling test-time compute yields a consistent accuracy improvement — a new scaling law.
GPT-4o: ~100 tokens of 'thinking'
o1: ~10,000-100,000 reasoning tokens before answering
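The `Accuracy = f(train_compute, test_time_compute)` relationship can be illustrated with a toy curve. All numbers below are made up for illustration; the shape (roughly linear in the log of reasoning tokens, saturating at a ceiling) is the point, not the constants.

```python
import math

# Toy illustration of test-time compute scaling. The base accuracy,
# per-doubling gain, and ceiling are assumptions, not measured values.

def toy_accuracy(reasoning_tokens: int, base: float = 0.12,
                 slope: float = 0.10, ceiling: float = 0.90) -> float:
    """Hypothetical curve: each doubling of reasoning tokens beyond a
    100-token baseline adds ~`slope` accuracy, until the ceiling."""
    gain = slope * math.log2(max(reasoning_tokens, 1) / 100)
    return min(ceiling, max(base, base + gain))

for tokens in [100, 1_000, 10_000, 100_000]:
    print(f"{tokens:>7} reasoning tokens -> accuracy ~{toy_accuracy(tokens):.2f}")
```

Note the diminishing returns: going from 10K to 100K tokens buys far less than the first few doublings, which is why a cost ceiling on reasoning tokens matters in practice.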
1.6 RL training in detail#
Process Reward Model (PRM): reward not just the final answer but also the intermediate reasoning steps.
Reward(reasoning_step) = f(helpful_to_final_answer, no_logical_errors)
The model learns that the path of thought itself must be correct: backtracking, self-verification, alternative approaches.
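The combination of per-step (process) and final-answer (outcome) rewards can be sketched as a weighted sum. This is a minimal sketch of the idea only; the function, weighting, and scores are assumptions, not any lab's actual reward implementation.

```python
# Sketch: combining process reward (per-step scores) with outcome reward
# (final-answer correctness). The 50/50 weighting is an assumption.

def combined_reward(step_scores: list[float], final_correct: bool,
                    w_process: float = 0.5) -> float:
    """Weighted mix of mean step quality (PRM) and outcome reward (ORM)."""
    prm = sum(step_scores) / len(step_scores)   # mean per-step score in [0, 1]
    orm = 1.0 if final_correct else 0.0         # final-answer correctness
    return w_process * prm + (1 - w_process) * orm

# Three clean steps and a correct answer -> maximum reward:
print(combined_reward([1.0, 1.0, 1.0], final_correct=True))  # -> 1.0
```

A trace with good steps but a wrong final answer still earns partial credit here, which is the core difference from outcome-only reward.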
1.7 Self-play + search#
AlphaGo-style RL: the model critiques its own reasoning and considers alternatives.
Thought 1 → Critic: 'wrong direction'
Backtrack → Thought 2 → Critic: 'good'
Continue... → Final answer
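The propose/critique/backtrack loop above can be sketched as a tiny search routine. The `propose` and `critique` functions are hypothetical stand-ins for model calls (here just string generation and a random accept/reject).

```python
import random

# Toy sketch of critic-guided reasoning search. `propose` and `critique`
# stand in for LLM calls; in reality both would be model invocations.

def propose(chain: list[str]) -> str:
    """Generate the next candidate reasoning step."""
    return f"thought-{len(chain) + 1}"

def critique(thought: str) -> bool:
    """Pretend critic: randomly rejects ~30% of thoughts."""
    return random.random() > 0.3

def reason(max_steps: int = 5, max_retries: int = 10) -> list[str]:
    chain: list[str] = []
    retries = 0
    while len(chain) < max_steps and retries < max_retries:
        thought = propose(chain)
        if critique(thought):
            chain.append(thought)   # critic approves: extend the chain
        else:
            retries += 1            # critic rejects: backtrack and retry
    return chain

random.seed(0)
print(reason())
```

Unlike plain CoT, this loop can discard a bad step instead of committing to it, which is the "beam-search-like exploration" that prompt-only CoT lacks.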
6-10. DeepSeek-R1 + Practice#
6.1 DeepSeek-R1 (January 2025)#
DeepSeek AI: 'DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning'.
Major open-source breakthrough:
- Full paper, architecture, and training recipe published
- Distilled smaller variants (R1-Distill 7B, 32B)
- Open weights on Hugging Face
- Quality comparable to o1
DeepSeek-R1 details:
- Base: DeepSeek-V3 671B MoE (active 37B)
- RL training: GRPO (Group Relative Policy Optimization)
- Rule-based rewards (answer accuracy + format), rather than a learned process reward model
- 30M reasoning examples
- ~$5M training cost
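GRPO's core trick is computing advantages relative to a group of sampled responses instead of training a value network. A minimal sketch of that normalization, assuming the standard (reward minus group mean, over group std) form described in the DeepSeek papers:

```python
from statistics import mean, stdev

# Sketch of GRPO's group-relative advantage: sample G responses per prompt,
# then normalize each reward within its group. No value network needed.

def group_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled response relative to its group."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0   # all-equal group: avoid division by zero
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one prompt, rewarded 1.0 if correct else 0.0:
print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers get positive advantage, wrong ones negative, and the advantages in each group sum to zero — the policy is pushed toward whatever beat its own siblings.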
6.2 Hidden vs Visible reasoning#
o1 hidden: reasoning tokens stay internal; the user never sees them. Privacy + IP protection.
R1 visible: reasoning tokens are public; the user sees them.
The visible approach is educational and debuggable; the hidden approach gives a cleaner UX.
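Because R1's reasoning is visible, it can be separated from the final answer programmatically. DeepSeek-R1 wraps its chain-of-thought in `<think>...</think>` tags; the helper below is a sketch of splitting on them (the example text is made up).

```python
import re

# Split an R1-style completion into (reasoning, answer). DeepSeek-R1 emits
# its chain-of-thought inside <think>...</think> tags before the answer.

def split_reasoning(text: str) -> tuple[str, str]:
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()          # no think block: everything is answer
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()      # whatever follows the closing tag
    return reasoning, answer

raw = "<think>Try 23*40=920, 23*7=161, sum=1081.</think>\nAnswer: 1081"
reasoning, answer = split_reasoning(raw)
print(answer)  # -> Answer: 1081
```

This kind of split is what makes the visible approach debuggable: you can log, grade, or display the reasoning independently of the answer.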
6.3 Reasoning quality leap#
AIME 2024:
- GPT-4o: 12%
- Claude 3.5 Sonnet: 16%
- o1-preview: 75%
- o1: 83%
- DeepSeek-R1: 80%
MATH benchmark:
- GPT-4o: 76%
- o1: 94%
- R1: 93%
Codeforces:
- GPT-4o: 12th percentile
- o1: 89th percentile (expert-programmer level)
- R1: 85th percentile
6.4 Cost economics#
Reasoning is expensive:
- o1 input: $15 / 1M tokens (GPT-4o: $2.50)
- o1 output: $60 / 1M tokens (GPT-4o: $10)
- 4-6x more expensive
Plus, reasoning tokens are billed as output: 10K reasoning tokens + a 500-token answer = 10,500 output tokens billed.
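The billing arithmetic above can be wrapped in a small calculator. The default prices are assumptions taken from the figures quoted in this section (per 1M tokens); always check current rates before relying on them.

```python
# Sketch: cost of one reasoning-model request. Hidden reasoning tokens are
# billed at the output rate. Default prices ($/1M tokens) are assumptions.

def request_cost(input_tokens: int, reasoning_tokens: int, answer_tokens: int,
                 in_price: float = 15.0, out_price: float = 60.0) -> float:
    """Total dollar cost for a single request."""
    billed_output = reasoning_tokens + answer_tokens   # both count as output
    return (input_tokens * in_price + billed_output * out_price) / 1_000_000

# 1K-token prompt, 10K hidden reasoning tokens, 500-token answer:
print(f"${request_cost(1_000, 10_000, 500):.3f}")  # -> $0.645
```

Note that the reasoning tokens dominate: the 10K hidden tokens cost ~40x more than the prompt in this example, which is why reasoning models are poor defaults for cheap, simple tasks.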
6.5 When to use reasoning models#
- Math, science, coding (high benefit)
- Multi-step planning (chain-of-thought needed)
- NOT for: simple Q&A, summarization, creative writing (overkill)
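The use-case split above is often implemented as a simple router in front of two models. A minimal sketch with hypothetical model names and task labels (the routing table is illustrative, not a standard API):

```python
# Hypothetical router: send only high-benefit task types to the expensive
# reasoning model; everything else goes to a fast, cheap model.

REASONING_TASKS = {"math", "science", "coding", "planning"}

def pick_model(task_type: str) -> str:
    """Route a task to the reasoning model only when it pays off."""
    return "reasoning-model" if task_type in REASONING_TASKS else "fast-model"

print(pick_model("math"))           # -> reasoning-model
print(pick_model("summarization"))  # -> fast-model
```

Production routers typically classify the incoming request with a small model first; the principle is the same — pay for test-time compute only where it moves accuracy.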
6.6 Turkish reasoning#
DeepSeek-R1 Turkish quality:
- Math problem reasoning: OK (internal reasoning in English, output in Turkish)
- Turkish-specific tasks (e.g. reasoning about Turkish grammar): weaker than in English
Fine-tuning for Turkish reasoning is an emerging research area.
6.7 Future#
- 2025-2026: reasoning becomes standard across all major labs
- Test-time compute scaling continues
- 'Reasoning agents' — multi-LLM verification
- AI safety implications: the model 'thinks' before producing output, making it potentially more controllable
✅ Lesson 17.1 Summary — Reasoning Models
Reasoning models are the 2024-2026 LLM frontier. o1 (OpenAI, September 2024): RL on reasoning tasks, hidden reasoning tokens, 83% on AIME. DeepSeek-R1 (January 2025): open-source breakthrough, visible reasoning, comparable quality, GRPO-based RL. Test-time compute is the new scaling dimension — a fourth axis beyond Kaplan's laws. Cost: 4-6x more than GPT-4o. Use cases: math, code, planning. Turkish reasoning is emerging. Lesson 17.2 is a DeepSeek-R1 deep dive and self-hosting guide.
Next Lesson: DeepSeek-R1 Self-Host#
Lesson 17.2: self-hosting DeepSeek-R1-Distill (7B, 32B), prompt patterns, deploying Turkish math reasoning.
Frequently Asked Questions
Are reasoning models better at everything? No — much better at math, code, and planning, but overkill for simple Q&A, creative writing, and summarization (slow + expensive). The use case matters.