
DeepSeek-R1 Self-Host + Turkish Reasoning: Distilled Models, Prompt Patterns, Production Deployment

DeepSeek-R1-distilled (7B, 14B, 32B) self-hosting: vLLM deployment, hardware requirements, prompt patterns for reasoning, and a Turkish math problem-solving demo. Using reasoning models in production: when, how, and the cost-benefit tradeoffs.

Şükrü Yusuf KAYA
65-minute read
Advanced
🧮 R1-distilled — your own reasoning model on a single GPU
The full DeepSeek-R1 model is a 671B MoE and needs several GPUs. BUT distilled variants exist: R1-Distill-Qwen-7B, R1-Distill-Qwen-14B, R1-Distill-Qwen-32B. Quality is close to o1-mini, and they run on a single H100 / RTX 4090. Production reasoning AI is now within everyone's reach. 65 minutes from now you will have learned: deploying R1-distilled, a Turkish math reasoning demo, and reasoning prompt patterns.

Lesson Map (8 Sections)#

  1. R1 distilled variants — 7B/14B/32B comparison
  2. Hardware requirements — VRAM math
  3. vLLM deployment — production-ready
  4. Reasoning prompt patterns — when CoT, when direct
  5. Turkish math demo — numerical problem solving
  6. Visible reasoning tokens — UX considerations
  7. Cost analysis — self-host vs API
  8. Limitations + future — small distilled vs full R1
```python
# DeepSeek-R1-Distill-Qwen-32B vLLM deployment

# 1. Install vLLM
#   pip install vllm

# 2. Start the OpenAI-compatible server (CLI)
#   python -m vllm.entrypoints.openai.api_server \
#     --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
#     --tensor-parallel-size 1 \
#     --max-model-len 32768 \
#     --gpu-memory-utilization 0.9 \
#     --port 8000

# Hardware needs:
#   - R1-Distill-7B:  24 GB VRAM (RTX 4090, A100 40GB)
#   - R1-Distill-14B: 40 GB VRAM (A100 40GB, H100 80GB)
#   - R1-Distill-32B: 80 GB VRAM (H100 80GB) or 2x A100 40GB

# 3. Use it as an OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="any",  # vLLM ignores the key unless one is configured
)

# Turkish math reasoning example
# (English: a high-speed train leaves Istanbul for Ankara at 200 km/h;
#  at the same moment another train leaves Ankara for Istanbul at 150 km/h;
#  the distance is 450 km. When do the trains meet?)
messages = [
    {"role": "user", "content": """
Bir hızlı tren İstanbul'dan Ankara'ya saatte 200 km hızla gidiyor.
Aynı anda Ankara'dan İstanbul'a saatte 150 km hızla başka bir tren çıkıyor.
İstanbul-Ankara arası 450 km. Trenler ne zaman karşılaşır?
"""},
]

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=messages,
    max_tokens=4096,
    temperature=0.6,  # R1 paper recommendation
)

# The response includes visible reasoning inside <think> tags
print(response.choices[0].message.content)
# Example output:
# <think>
# İki tren karşı yönde geliyor. Toplam hız = 200 + 150 = 350 km/saat.
# Aralarındaki mesafe = 450 km.
# Karşılaşma zamanı = 450 / 350 = 9/7 saat ≈ 1.286 saat ≈ 1 saat 17 dakika.
# </think>
#
# Trenler yaklaşık **1 saat 17 dakika** sonra karşılaşır.
# (English: the trains meet after about 1 hour 17 minutes.)
```
DeepSeek-R1-Distilled Turkish reasoning deployment
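The VRAM figures in the comments above follow from a simple rule of thumb: FP16/BF16 weights take ~2 bytes per parameter, plus headroom for KV cache and runtime overhead. A minimal sketch of that estimate (the 20% overhead fraction is an assumption for illustration, not a vendor spec):

```python
# Rough VRAM estimate for FP16/BF16 inference:
# weights (2 bytes/param) + an assumed overhead fraction for KV cache etc.

def estimate_vram_gb(params_b: float, overhead_frac: float = 0.2) -> float:
    """params_b: parameter count in billions. FP16 = 2 bytes per parameter."""
    weights_gb = params_b * 2  # 1B params * 2 bytes ≈ 2 GB
    return weights_gb * (1 + overhead_frac)

for name, params in [("R1-Distill-7B", 7), ("R1-Distill-14B", 14), ("R1-Distill-32B", 32)]:
    print(f"{name}: ~{estimate_vram_gb(params):.0f} GB")
```

This reproduces the table's order of magnitude (≈17 / 34 / 77 GB); the quoted 24/40/80 GB figures leave extra headroom for long-context KV cache.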

4-7. Prompt Patterns + Cost#

4.1 Reasoning prompt patterns#

R1 works best with:
  • "Solve step-by-step" instructions
  • Explicitly stated math problems
  • "Think carefully" style prompts
  • Multi-step reasoning queries
R1 is overkill for:
  • "What's 2+2?"
  • "Translate to English"
  • "Summarize this article"
For simple tasks, use Llama-3-Instruct instead (faster and cheaper).
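In practice this split can be automated with a cheap router in front of both models. A minimal sketch, assuming the model IDs from this lesson; the keyword list and length threshold are illustrative assumptions, not a tested heuristic:

```python
# Heuristic router: send multi-step/math queries to the R1 distill,
# everything else to a cheaper instruct model.
# REASONING_HINTS and the 40-word threshold are illustrative assumptions.

REASONING_HINTS = ("step-by-step", "prove", "solve", "calculate", "kaç", "hesapla")

def pick_model(query: str) -> str:
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS) or len(q.split()) > 40:
        return "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    return "meta-llama/Meta-Llama-3-8B-Instruct"  # faster, cheaper for simple tasks

print(pick_model("What's 2+2?"))                    # instruct model
print(pick_model("Solve step-by-step: x^2 = 9"))    # R1 distill
```

A production version would use a small classifier or the user's explicit mode choice rather than keywords, but the routing shape is the same.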

4.2 Turkish reasoning quality#

R1-Distill-32B on Turkish math:
  • Simple word problems: 85%+ accuracy
  • AIME-level problems: 70%+
  • Turkish-specific math vocabulary: occasional confusion
Language mixing: reasoning is mostly Turkish, but English occasionally leaks in (typical of distilled models).

4.3 Cost analysis#

Self-hosted R1-Distill-32B:
  • H100 spot: $2.50/hour
  • Throughput: ~40 tokens/sec (reasoning is token-intensive)
  • Reasoning + output: ~5000 tokens/query on average
  • ~125 sec/query (≈2 minutes)
  • Cost per query: ~$0.087
OpenAI o1:
  • $15/1M input + $60/1M output tokens
  • 500 input + 5000 output tokens ≈ $0.31/query
Self-hosting is roughly 3-4x cheaper per query. For high-volume reasoning workloads, it is justified.
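The per-query figures above can be reproduced directly from the lesson's own assumptions (spot price, throughput, and token counts as stated):

```python
# Reproduce the cost comparison using the figures from this section.

# Self-host: H100 spot at $2.50/h, ~40 tok/s, ~5000 reasoning+output tokens/query
gpu_cost_per_sec = 2.5 / 3600
secs_per_query = 5000 / 40                     # 125 s ≈ 2 minutes
self_host = gpu_cost_per_sec * secs_per_query
print(f"self-host: ${self_host:.3f}/query")    # ~$0.087

# OpenAI o1: $15/1M input + $60/1M output; 500 input + 5000 output tokens
o1 = (500 / 1e6) * 15 + (5000 / 1e6) * 60
print(f"o1: ${o1:.2f}/query")                  # ~$0.31

print(f"ratio: {o1 / self_host:.1f}x")         # ~3.5x
```

Note the self-host figure assumes the GPU is fully utilized; idle time between queries makes the real per-query cost higher, which is why the 3-4x advantage only materializes at sustained volume.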

4.4 Latency considerations#

Reasoning models are slow:
  • GPT-4o: 1-2 sec per response
  • o1: 10-60 sec (sometimes minutes)
  • Self-hosted R1-Distill-32B: 30-120 sec
UI implications: progress indicators and streaming of reasoning tokens.
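One common UX pattern is to separate the `<think>…</think>` block (shown in the demo output earlier) from the final answer, so the UI can put the reasoning in a collapsible pane and show only the answer by default. A minimal non-streaming sketch; the `split_reasoning` helper is my own, not part of any library:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, final_answer)."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

raw = "<think>350 km/h combined; 450/350 h.</think>\nThe trains meet after ~1 h 17 min."
reasoning, answer = split_reasoning(raw)
print(answer)  # The trains meet after ~1 h 17 min.
```

With streaming, the same idea applies incrementally: buffer tokens until `</think>` arrives, rendering them into the "reasoning" pane as a progress signal, then stream the rest as the answer.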
🎉 Module 17 Complete — Reasoning Models
Over 2 lessons: the reasoning revolution (o1 + R1), test-time compute scaling, RL training. DeepSeek-R1-distilled (7B/14B/32B) self-hosting is accessible. Turkish math reasoning reaches 85%+ accuracy. Cost: self-hosting is 3-4x cheaper than the OpenAI API. Module 17 inventory: 2 lessons, 140 min. Overall curriculum: 18 modules, 89 lessons, ~97 hours. Next up: Module 18 — Mixture of Experts.

Module 17 Inventory (Complete)#

| # | Lesson | Duration |
|---|--------|----------|
| 17.1 | Reasoning Revolution: o1 + R1 | 75 min |
| 17.2 | DeepSeek-R1 Self-Host + Turkish | 65 min |
| Total | 2 lessons | 140 min (~2.3 hours) |

Frequently Asked Questions

Full R1 (671B MoE) > Distill 32B > Distill 14B > Distill 7B, with quality scaling roughly with size. Distill 32B is at o1-mini level, full R1 at o1 level. Distill 7B/14B are best suited for hobbyist use.
