
The Reasoning Revolution: From OpenAI o1 to DeepSeek-R1, Test-Time Compute, and the Rebirth of Chain-of-Thought

The 2024-2026 LLM frontier: reasoning models. OpenAI o1 (September 2024) and DeepSeek-R1 (January 2025) as the revolution. Test-time compute scaling (a new axis beyond Kaplan's laws), chain-of-thought intensification, hidden (o1) vs. visible (R1) reasoning tokens, RL-trained reasoning patterns. Breakthroughs on the AIME and MATH benchmarks, with the GPT-4o → o1 leap from 12% to 83% AIME accuracy.

Şükrü Yusuf KAYA
75 minute read
Advanced
🧠 Reasoning models: the 2024-2026 revolution in LLMs
September 12, 2024: the OpenAI o1 launch. Accuracy on AIME (American Invitational Mathematics Examination) problems: GPT-4o 12% → o1 83%, roughly a 7x improvement from a single model change. The mechanism: test-time compute scaling. The model spends more time 'thinking', generating 100K+ reasoning tokens before the final answer. January 2025: DeepSeek-R1 reaches comparable quality as open source, with full transparency. 'Reasoning' is the new paradigm. 75 minutes from now, you will have a deep grasp of the mathematical anatomy of reasoning models, the architectural differences between o1 and R1, and the math behind test-time compute scaling.

Lesson Map (10 Sections)

  1. The pre-reasoning era: the GPT-4 ceiling
  2. Chain-of-Thought: Wei 2022, a prompting trick
  3. The o1 launch: OpenAI, September 2024
  4. Test-time compute scaling: a new axis beyond Kaplan
  5. RL for reasoning: process reward models
  6. DeepSeek-R1 (January 2025): the open-source breakthrough
  7. Reasoning tokens: hidden (o1) vs. visible (R1)
  8. AIME, MATH, Codeforces benchmarks: the quality leap
  9. Cost economics: reasoning is expensive
  10. Turkish reasoning: practical implications

1-5. The Evolution of Reasoning

1.1 The pre-reasoning era (GPT-4)

GPT-4 at math:
  • Simple arithmetic: OK
  • AIME problems: 12% accuracy
  • Hard olympiad math: <5%
That LLMs 'cannot think' was a well-known problem: they generate tokens directly, with no deliberate reasoning phase.

1.2 Chain-of-Thought (Wei 2022)

Google's Wei et al., 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'.
The prompting trick:
Query: '23 × 47 = ?'
Bad (direct answer): e.g. '1071' (wrong; the model skips the intermediate computation)
Good (CoT prompt): 'Let me think step by step. 23 × 47 = 23 × 40 + 23 × 7 = 920 + 161 = 1081'
CoT prompting gives a 20-40% accuracy boost on math problems. It was standard practice in the GPT-4 era, as in the sketch below.
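
A minimal sketch of the two prompting styles, assuming the OpenAI Python SDK with an API key in the environment (the model choice is illustrative, not from the original):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    question = "23 × 47 = ?"

    # Direct prompting: the model must produce the answer in one shot.
    direct = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )

    # CoT prompting: explicitly ask for intermediate steps first.
    cot = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": question + " Let's think step by step."}],
    )

    print(direct.choices[0].message.content)
    print(cot.choices[0].message.content)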

1.3 The limits of CoT

Prompt-based CoT yields only short reasoning sequences. It is not enough for complex problems (multi-step proofs, complex algorithms).
The model cannot do beam-search-like exploration; it only generates linearly.
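
One era-typical workaround was self-consistency (Wang et al. 2022): sample several CoT chains and majority-vote over the final answers, approximating a crude parallel search. A sketch, where `ask` is a hypothetical stand-in for a sampled LLM call:

    from collections import Counter

    def self_consistency(ask, question, n=8):
        # `ask` is a hypothetical sampled LLM call (temperature > 0)
        # returning (reasoning_text, final_answer).
        answers = [ask(question)[1] for _ in range(n)]
        return Counter(answers).most_common(1)[0][0]  # majority vote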

1.4 o1 (OpenAI, September 2024)

The 'Learning to Reason with LLMs' blog post.
Key innovation: reasoning patterns trained via RL.
  • Pre-trained base + extensive RL on math/code/reasoning tasks
  • Reward: final-answer correctness
  • The model discovers long reasoning strategies on its own
Result: reasoning tokens before the answer:
User: 'AIME 2024 Problem 1: ...'
Model, internally: '<reasoning>Let me think... try approach A... no, B...</reasoning>'
Model output: 'Answer: 42'
The reasoning tokens are hidden (not visible to the user). OpenAI markets this as 'thinking time'.
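
A sketch of how the hidden tokens surface anyway: with the OpenAI SDK, o1-class responses report a reasoning-token count in the usage details even though the reasoning text itself is never returned (field names as of the o1-era SDK; check current docs):

    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="o1",  # reasoning model; no system prompt / temperature knobs at launch
        messages=[{"role": "user",
                   "content": "How many positive divisors does 2024 have?"}],
    )

    # The chain of thought is hidden, but it is counted (and billed) as output:
    details = resp.usage.completion_tokens_details
    print("total output tokens  :", resp.usage.completion_tokens)
    print("hidden reasoning part:", details.reasoning_tokens)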

1.5 Test-time compute scaling

Kaplan's 2020 scaling laws relate training compute to loss. The new dimension: test-time compute, i.e. the number of reasoning tokens generated.
Accuracy = f(train_compute, test_time_compute)
The o1 release post: doubling test-time compute yields a consistent accuracy improvement. A new scaling law.
GPT-4o: ~100 tokens of 'thinking'. o1: ~10,000-100,000 reasoning tokens before the answer.
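
OpenAI's plot shows accuracy rising roughly linearly in the log of test-time compute. A toy curve with that shape (the coefficients here are invented for illustration, not fitted to o1 data):

    import math

    def toy_accuracy(reasoning_tokens, base=0.30, slope=0.08, cap=0.95):
        # Invented coefficients; only the log-linear *shape* mirrors the o1 plots.
        return min(cap, base + slope * math.log2(reasoning_tokens / 100))

    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} reasoning tokens -> {toy_accuracy(n):.0%}")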

1.6 RL training in detail

Process Reward Model (PRM): reward not just the final answer but the intermediate reasoning steps.
Reward(reasoning_step) = f(helpful_to_final_answer, no_logical_errors)
The model learns that the path of thought itself must be sound: backtracking, self-verification, alternative approaches.
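
A minimal sketch of PRM-style step scoring, with `prm_score` as a hypothetical model call that rates one step in context (the actual o1 recipe is unpublished):

    def score_trajectory(question, steps, prm_score):
        # `prm_score` is a hypothetical PRM call returning
        # P(step is correct and useful) in [0, 1] given the context so far.
        context, step_rewards = question, []
        for step in steps:
            step_rewards.append(prm_score(context, step))
            context += "\n" + step
        # Common aggregation: a trajectory is only as good as its weakest step.
        return min(step_rewards), step_rewards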

1.7 Self-play + search

AlphaGo-style RL: the model critiques its own reasoning and explores alternatives.
Thought 1 → Critic: 'wrong direction' → Backtrack → Thought 2 → Critic: 'good' → Continue... → Final answer
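
A sketch of that loop, with `propose` and `critique` as hypothetical LLM calls (again, the real training procedure is not public):

    def reason_with_backtracking(question, propose, critique, max_steps=20):
        # `propose` suggests the next thought given the trail so far;
        # `critique` judges whether a thought looks sound. Both hypothetical.
        trail = []
        for _ in range(max_steps):
            thought = propose(question, trail)
            if not critique(question, trail, thought):
                continue                      # reject and re-propose (backtrack)
            trail.append(thought)             # accept and keep going
            if thought.startswith("Answer:"):
                return thought, trail
        return None, trail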

6-10. DeepSeek-R1 + Practice

6.1 DeepSeek-R1 (January 2025)

DeepSeek AI: 'DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning'.
A major open-source breakthrough:
  • Full paper, architecture, and training recipe
  • Distilled smaller variants (R1-Distill 7B, 32B)
  • Open weights on HuggingFace
  • Quality comparable to o1
DeepSeek-R1 details:
  • Base: DeepSeek-V3, a 671B MoE (37B active)
  • RL training: GRPO (Group Relative Policy Optimization), sketched below
  • Rule-based outcome rewards (answer accuracy + format); the paper explicitly avoids a learned PRM
  • 30M reasoning examples
  • ~$5.6M reported pre-training cost for the V3 base (the R1 RL stage cost was not disclosed)
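
The core of GRPO, as described in the DeepSeek papers: sample a group of responses per prompt and use the group's own reward statistics as the baseline, so no learned value network is needed. A minimal sketch of the advantage computation:

    import statistics

    def grpo_advantages(rewards):
        # Group-relative advantage: normalize each sampled response's reward
        # by its own group's mean and std deviation.
        mu = statistics.mean(rewards)
        sigma = statistics.pstdev(rewards) or 1.0  # guard: all-equal group
        return [(r - mu) / sigma for r in rewards]

    # 8 sampled answers to one math prompt, reward 1 if the answer checks out:
    print(grpo_advantages([1, 0, 0, 1, 0, 0, 0, 1]))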

6.2 Hidden vs. visible reasoning

o1 hidden: the reasoning tokens stay internal; the user never sees them (privacy + IP protection). R1 visible: the reasoning tokens are public; the user sees them.
The visible approach is educational and debuggable; the hidden approach makes for a cleaner UX.
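
R1-family models emit the visible reasoning inside <think> ... </think> tags (per the R1 chat template), so separating thought from answer is a few lines; a sketch:

    import re

    def split_r1_output(text):
        # Separate DeepSeek-R1's visible reasoning from the final answer.
        m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
        reasoning = m.group(1).strip() if m else ""
        answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        return reasoning, answer

    reasoning, answer = split_r1_output(
        "<think>23 × 47 = 920 + 161 = 1081</think>The answer is 1081."
    )
    print(answer)  # -> The answer is 1081.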

6.3 The reasoning quality leap

AIME 2024:
  • GPT-4o: 12%
  • Claude 3.5 Sonnet: 16%
  • o1-preview: 75%
  • o1: 83%
  • DeepSeek-R1: 80%
MATH benchmark:
  • GPT-4o: 76%
  • o1: 94%
  • R1: 93%
Codeforces:
  • GPT-4o: 12th percentile
  • o1: 89th percentile (expert-level programmer)
  • R1: 85th percentile

6.4 Cost economics

Reasoning is expensive:
  • o1 input: $15 / 1M tokens (vs. GPT-4o: $2.50)
  • o1 output: $60 / 1M tokens (vs. GPT-4o: $10)
  • 4-6x more expensive per token
Plus, reasoning tokens count as output: 10K reasoning tokens + 500 visible output tokens = 10,500 output tokens billed. A quick sanity check below.
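
A small calculator for that billing model (prices per the figures above; they change over time, so treat them as illustrative):

    def o1_cost(input_tokens, reasoning_tokens, visible_output_tokens,
                in_price=15.0, out_price=60.0):
        # Cost in USD; hidden reasoning tokens are billed at the output rate.
        billed_output = reasoning_tokens + visible_output_tokens
        return (input_tokens * in_price + billed_output * out_price) / 1_000_000

    # One hard math question: 1K tokens in, 10K hidden reasoning, 500 out.
    print(f"${o1_cost(1_000, 10_000, 500):.3f}")  # -> $0.645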

6.5 When to use reasoning models

  • Math, science, coding (high benefit)
  • Multi-step planning (where long chains of thought are needed)
  • NOT for: simple Q&A, summarization, creative writing (overkill)
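
A deliberately naive routing heuristic to make the trade-off concrete (keyword list and model names are placeholders, not a recommendation):

    REASONING_HINTS = ("prove", "step by step", "algorithm", "optimize", "derive")

    def pick_model(query: str) -> str:
        # Send likely multi-step problems to a reasoning model,
        # everything else to a cheap general model.
        q = query.lower()
        if any(hint in q for hint in REASONING_HINTS):
            return "deepseek-r1"
        return "gpt-4o-mini"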

6.6 Turkish reasoning

DeepSeek-R1's Turkish quality:
  • Math problem reasoning: OK (internal reasoning in English, output in Turkish)
  • Turkish-specific tasks (e.g. reasoning about Turkish grammar): weaker than in English
Fine-tuning for Turkish reasoning is an emerging research area.

6.7 The future

  • 2025-2026: reasoning becomes standard across all major labs
  • Test-time compute scaling continues
  • 'Reasoning agents': multi-LLM verification
  • AI safety implications: a model that 'thinks' before it answers is more inspectable and controllable
✅ Lesson 17.1 Summary: Reasoning Models
Reasoning models are the 2024-2026 LLM frontier. o1 (OpenAI, September 2024): RL on reasoning tasks, hidden reasoning tokens, 83% on AIME. DeepSeek-R1 (January 2025): the open-source breakthrough, visible reasoning, comparable quality, GRPO for RL. Test-time compute is the new scaling dimension, a fourth axis beyond Kaplan's laws. Cost: 4-6x more than GPT-4o. Use cases: math, code, planning. Turkish reasoning is emerging. Lesson 17.2: a DeepSeek-R1 deep dive and self-hosting.

Next Lesson: DeepSeek-R1 Self-Hosting

Lesson 17.2: self-hosting DeepSeek-R1-Distill (7B, 32B), prompt patterns, and deploying Turkish math reasoning.

Frequently Asked Questions

Are reasoning models better at everything?
No. They are excellent at math/code/planning, but overkill for simple Q&A, creative writing, and summarization (slow + expensive). The use case matters.

