
DeepSeek-R1 Self-Host + Turkish Reasoning: Distilled Models, Prompt Patterns, Production Deployment

DeepSeek-R1-distilled (7B, 14B, 32B) self-hosting: vLLM deployment, hardware requirements, prompt patterns for reasoning, and a Turkish math problem-solving demo. Using reasoning models in production: when, how, and the cost-benefit tradeoffs.

Şükrü Yusuf KAYA
65-minute read
Advanced
🧮 R1-distilled — your own reasoning model on a single GPU
The full DeepSeek-R1 model is a 671B MoE and needs several GPUs. BUT distilled variants exist: R1-Distill-Qwen-7B, R1-Distill-Qwen-14B, R1-Distill-Qwen-32B. Quality is close to o1-mini, and they run on a single H100 / RTX 4090. Production reasoning AI is now within everyone's reach. 65 minutes from now you will have learned: deploying R1-distilled, a Turkish math reasoning demo, and reasoning prompt patterns.

Lesson Map (8 Sections)#

  1. R1 distilled variants — 7B/14B/32B comparison
  2. Hardware requirements — VRAM math
  3. vLLM deployment — production-ready
  4. Reasoning prompt patterns — when CoT, when direct
  5. Turkish math demo — numerical problem solving
  6. Visible reasoning tokens — UX considerations
  7. Cost analysis — self-host vs API
  8. Limitations + future — small distilled vs full R1
```python
# DeepSeek-R1-Distill-Qwen-32B vLLM deployment

# 1. Install vLLM
#   pip install vllm

# 2. Start the OpenAI-compatible server (CLI)
#   python -m vllm.entrypoints.openai.api_server \
#     --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
#     --tensor-parallel-size 1 \
#     --max-model-len 32768 \
#     --gpu-memory-utilization 0.9 \
#     --port 8000

# Hardware needs:
#   - R1-Distill-7B:  24 GB VRAM (RTX 4090, A100 40GB)
#   - R1-Distill-14B: 40 GB VRAM (A100 40GB, H100 80GB)
#   - R1-Distill-32B: 80 GB VRAM (H100 80GB) or 2x A100 40GB

# 3. Use it as an OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="any",  # vLLM ignores the key unless one is configured
)

# Turkish math reasoning example
# (English: a high-speed train leaves Istanbul for Ankara at 200 km/h;
#  at the same moment another train leaves Ankara for Istanbul at 150 km/h;
#  the distance is 450 km. When do the trains meet?)
messages = [
    {"role": "user", "content": """
Bir hızlı tren İstanbul'dan Ankara'ya saatte 200 km hızla gidiyor.
Aynı anda Ankara'dan İstanbul'a saatte 150 km hızla başka bir tren çıkıyor.
İstanbul-Ankara arası 450 km. Trenler ne zaman karşılaşır?
"""},
]

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=messages,
    max_tokens=4096,
    temperature=0.6,  # R1 paper recommendation
)

# The response includes visible reasoning inside <think> tags
print(response.choices[0].message.content)
# Example output:
# <think>
# İki tren karşı yönde geliyor. Toplam hız = 200 + 150 = 350 km/saat.
# Aralarındaki mesafe = 450 km.
# Karşılaşma zamanı = 450 / 350 = 9/7 saat ≈ 1.286 saat ≈ 1 saat 17 dakika.
# </think>
#
# Trenler yaklaşık **1 saat 17 dakika** sonra karşılaşır.
# (English: the trains meet after about 1 hour 17 minutes.)
```
DeepSeek-R1-Distilled Turkish reasoning deployment
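The VRAM figures in the comments above follow from a simple rule of thumb: FP16/BF16 weights take ~2 bytes per parameter, plus headroom for KV cache and runtime overhead. A minimal sketch of that estimate (the 20% overhead fraction is an assumption for illustration, not a vendor spec):

```python
# Rough VRAM estimate for FP16/BF16 inference:
# weights (2 bytes/param) + an assumed overhead fraction for KV cache etc.

def estimate_vram_gb(params_b: float, overhead_frac: float = 0.2) -> float:
    """params_b: parameter count in billions. FP16 = 2 bytes per parameter."""
    weights_gb = params_b * 2  # 1B params * 2 bytes ≈ 2 GB
    return weights_gb * (1 + overhead_frac)

for name, params in [("R1-Distill-7B", 7), ("R1-Distill-14B", 14), ("R1-Distill-32B", 32)]:
    print(f"{name}: ~{estimate_vram_gb(params):.0f} GB")
```

This reproduces the table's order of magnitude (≈17 / 34 / 77 GB); the quoted 24/40/80 GB figures leave extra headroom for long-context KV cache.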

4-7. Prompt Patterns + Cost#

4.1 Reasoning prompt patterns#

R1 works best with:
  • "Solve step-by-step" instructions
  • Explicitly stated math problems
  • "Think carefully" style prompts
  • Multi-step reasoning queries
R1 is overkill for:
  • "What's 2+2?"
  • "Translate to English"
  • "Summarize this article"
For simple tasks, use Llama-3-Instruct instead (faster and cheaper).
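In practice this split can be automated with a cheap router in front of both models. A minimal sketch, assuming the model IDs from this lesson; the keyword list and length threshold are illustrative assumptions, not a tested heuristic:

```python
# Heuristic router: send multi-step/math queries to the R1 distill,
# everything else to a cheaper instruct model.
# REASONING_HINTS and the 40-word threshold are illustrative assumptions.

REASONING_HINTS = ("step-by-step", "prove", "solve", "calculate", "kaç", "hesapla")

def pick_model(query: str) -> str:
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS) or len(q.split()) > 40:
        return "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    return "meta-llama/Meta-Llama-3-8B-Instruct"  # faster, cheaper for simple tasks

print(pick_model("What's 2+2?"))                    # instruct model
print(pick_model("Solve step-by-step: x^2 = 9"))    # R1 distill
```

A production version would use a small classifier or the user's explicit mode choice rather than keywords, but the routing shape is the same.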

4.2 Turkish reasoning quality#

R1-Distill-32B on Turkish math:
  • Simple word problems: 85%+ accuracy
  • AIME-level problems: 70%+
  • Turkish-specific math vocabulary: occasional confusion
Language mixing: reasoning is mostly Turkish, but English occasionally leaks in (typical of distilled models).

4.3 Cost analysis#

Self-hosted R1-Distill-32B:
  • H100 spot: $2.50/hour
  • Throughput: ~40 tokens/sec (reasoning is token-intensive)
  • Reasoning + output: ~5000 tokens/query on average
  • ~125 sec/query (≈2 minutes)
  • Cost per query: ~$0.087
OpenAI o1:
  • $15/1M input + $60/1M output tokens
  • 500 input + 5000 output tokens ≈ $0.31/query
Self-hosting is roughly 3-4x cheaper per query. For high-volume reasoning workloads, it is justified.
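The per-query figures above can be reproduced directly from the lesson's own assumptions (spot price, throughput, and token counts as stated):

```python
# Reproduce the cost comparison using the figures from this section.

# Self-host: H100 spot at $2.50/h, ~40 tok/s, ~5000 reasoning+output tokens/query
gpu_cost_per_sec = 2.5 / 3600
secs_per_query = 5000 / 40                     # 125 s ≈ 2 minutes
self_host = gpu_cost_per_sec * secs_per_query
print(f"self-host: ${self_host:.3f}/query")    # ~$0.087

# OpenAI o1: $15/1M input + $60/1M output; 500 input + 5000 output tokens
o1 = (500 / 1e6) * 15 + (5000 / 1e6) * 60
print(f"o1: ${o1:.2f}/query")                  # ~$0.31

print(f"ratio: {o1 / self_host:.1f}x")         # ~3.5x
```

Note the self-host figure assumes the GPU is fully utilized; idle time between queries makes the real per-query cost higher, which is why the 3-4x advantage only materializes at sustained volume.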

4.4 Latency considerations#

Reasoning models are slow:
  • GPT-4o: 1-2 sec per response
  • o1: 10-60 sec (sometimes minutes)
  • Self-hosted R1-Distill-32B: 30-120 sec
UI implications: progress indicators and streaming of reasoning tokens.
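One common UX pattern is to separate the `<think>…</think>` block (shown in the demo output earlier) from the final answer, so the UI can put the reasoning in a collapsible pane and show only the answer by default. A minimal non-streaming sketch; the `split_reasoning` helper is my own, not part of any library:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, final_answer)."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

raw = "<think>350 km/h combined; 450/350 h.</think>\nThe trains meet after ~1 h 17 min."
reasoning, answer = split_reasoning(raw)
print(answer)  # The trains meet after ~1 h 17 min.
```

With streaming, the same idea applies incrementally: buffer tokens until `</think>` arrives, rendering them into the "reasoning" pane as a progress signal, then stream the rest as the answer.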
🎉 Module 17 Complete — Reasoning Models
Over 2 lessons: the reasoning revolution (o1 + R1), test-time compute scaling, RL training. DeepSeek-R1-distilled (7B/14B/32B) self-hosting is accessible. Turkish math reasoning reaches 85%+ accuracy. Cost: self-hosting is 3-4x cheaper than the OpenAI API. Module 17 inventory: 2 lessons, 140 min. Overall curriculum: 18 modules, 89 lessons, ~97 hours. Next up: Module 18 — Mixture of Experts.

Module 17 Inventory (Complete)#

| # | Lesson | Duration |
|---|--------|----------|
| 17.1 | Reasoning Revolution: o1 + R1 | 75 min |
| 17.2 | DeepSeek-R1 Self-Host + Turkish | 65 min |
| Total | 2 lessons | 140 min (~2.3 hours) |

Frequently Asked Questions

Full R1 (671B MoE) > Distill 32B > Distill 14B > Distill 7B, with quality scaling roughly with size. Distill 32B is at o1-mini level, full R1 at o1 level. Distill 7B/14B are best suited for hobbyist use.
