Combined Lab: Same Application, 3 Providers (A/B/C Comparison)

Author: Şükrü Yusuf KAYA

Let's build a Turkish-language developer assistant. Using the same ~30K-token codebase and the same 50 questions, we measure cost, latency, and accuracy side by side on Anthropic, OpenAI, and Gemini.

20-minute read

14.05.2026

Advanced

Lab #5: Same Application, 3 Providers (A/B/C)

Scenario: An assistant that helps a Turkish developer work on an open-source Python project (~30K tokens of code).

Goal: Implement the same application on Anthropic, OpenAI, and Gemini, run 50 queries against each, and compare the results.

Cost: ~$3-5 (total across the three providers).

Step 1: Defining the Common Interface

To keep the code organized, we define the same interface for every provider:

python

from abc import ABC, abstractmethod
from dataclasses import dataclass
 
@dataclass
class CallResult:
    provider: str
    answer: str
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    latency_sec: float
    cost_usd: float
 
class CachingAssistant(ABC):
    """Common interface for the three providers."""
 
    @abstractmethod
    def setup(self, codebase: str) -> None:
        """Prepare the cache or the cache_control payload."""
        ...
 
    @abstractmethod
    def ask(self, question: str) -> CallResult:
        """Run a query and return the result."""
        ...
 
    @abstractmethod
    def teardown(self) -> None:
        """Clean up the cache (if needed)."""
        ...

Abstract interface: every provider implements it
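Before spending any API budget, it can help to exercise the interface with an offline stand-in. The sketch below is not part of the original lab: `FakeAssistant`, its whitespace-based "token" count, and its miss-then-hit cache behavior are all hypothetical, and the two lab classes are repeated in condensed form only so the block runs on its own.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CallResult:
    provider: str
    answer: str
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    latency_sec: float
    cost_usd: float


class CachingAssistant(ABC):
    @abstractmethod
    def setup(self, codebase: str) -> None: ...

    @abstractmethod
    def ask(self, question: str) -> CallResult: ...

    @abstractmethod
    def teardown(self) -> None: ...


class FakeAssistant(CachingAssistant):
    """Hypothetical offline provider: no network calls, no cost."""

    def __init__(self) -> None:
        self.codebase = ""
        self.calls = 0

    def setup(self, codebase: str) -> None:
        self.codebase = codebase

    def ask(self, question: str) -> CallResult:
        start = time.perf_counter()
        self.calls += 1
        # Crude whitespace "token" count; real tokenizers behave differently.
        prefix_tokens = len(self.codebase.split())
        question_tokens = len(question.split())
        # Pretend the first call misses the cache and later calls hit it.
        cached = 0 if self.calls == 1 else prefix_tokens
        fresh = question_tokens + (prefix_tokens - cached)
        return CallResult(
            provider="fake",
            answer=f"echo: {question}",
            input_tokens=fresh,
            cached_tokens=cached,
            output_tokens=question_tokens + 1,
            latency_sec=time.perf_counter() - start,
            cost_usd=0.0,
        )

    def teardown(self) -> None:
        self.codebase = ""


fake = FakeAssistant()
fake.setup("def add(a, b): return a + b")
first = fake.ask("What does add do?")
second = fake.ask("What does add do?")
fake.teardown()
```

Because the stand-in follows the same `setup` / `ask` / `teardown` contract, it can be dropped into a benchmark loop to check the bookkeeping before any real provider is called.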

Step 2: Anthropic Implementation

python

import anthropic
import time
 
class AnthropicAssistant(CachingAssistant):
    PRICE = {"input": 3.0, "output": 15.0, "cache_write": 3.75, "cache_read": 0.30}
 
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.codebase = ""
 
    def setup(self, codebase: str) -> None:
        self.codebase = codebase
 
    def ask(self, question: str) -> CallResult:
        start = time.perf_counter()
        resp = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=400,
            system=[
                {
                    "type": "text",
                    "text": f"Python kod yardımcısısın.\n\nKod tabanı:\n{self.codebase}",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            messages=[{"role": "user", "content": question}],
        )
        latency = time.perf_counter() - start
        u = resp.usage
        cw = u.cache_creation_input_tokens or 0
        cr = u.cache_read_input_tokens or 0
        cost = (
            u.input_tokens / 1e6 * self.PRICE["input"]
            + cw / 1e6 * self.PRICE["cache_write"]
            + cr / 1e6 * self.PRICE["cache_read"]
            + u.output_tokens / 1e6 * self.PRICE["output"]
        )
        return CallResult(
            provider="anthropic",
            answer=resp.content[0].text,
            # Count cache-write tokens as uncached input so hit rates are comparable across providers.
            input_tokens=u.input_tokens + cw,
            cached_tokens=cr,
            output_tokens=u.output_tokens,
            latency_sec=latency,
            cost_usd=cost,
        )
 
    def teardown(self) -> None:
        pass  # ephemeral cache expires on its own

AnthropicAssistant: uses cache_control
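To see why the cache write on the first call pays off, the cost formula above can be worked through with round numbers. This is a back-of-the-envelope sketch: the prices are copied from the `PRICE` dict, while the 30,000-token prefix, the 50-token question, and the assumption that all 49 follow-up calls hit the cache are illustrative, not measured.

```python
# USD per 1M tokens, copied from the PRICE dict in AnthropicAssistant.
PRICE = {"input": 3.0, "output": 15.0, "cache_write": 3.75, "cache_read": 0.30}


def anthropic_call_cost(fresh: int, cache_write: int, cache_read: int, output: int) -> float:
    """Cost of one call in USD, mirroring the formula used in ask()."""
    return (
        fresh * PRICE["input"]
        + cache_write * PRICE["cache_write"]
        + cache_read * PRICE["cache_read"]
        + output * PRICE["output"]
    ) / 1e6


PREFIX = 30_000  # cached system prompt size (assumed)
QUESTION = 50    # tokens per question (hypothetical)
ANSWER = 400     # upper bound implied by max_tokens=400

first_call = anthropic_call_cost(QUESTION, PREFIX, 0, ANSWER)        # writes the cache
later_call = anthropic_call_cost(QUESTION, 0, PREFIX, ANSWER)        # reads the cache
no_cache_call = anthropic_call_cost(QUESTION + PREFIX, 0, 0, ANSWER)  # no caching at all

total_cached = first_call + 49 * later_call
total_uncached = 50 * no_cache_call
```

Under these assumptions the 50-question run comes to about $0.86 with caching versus about $4.81 without. The estimate only holds if each call arrives before the ephemeral cache entry expires; a long pause between questions would trigger another cache write.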

Step 3: OpenAI Implementation

python

import time

from openai import OpenAI
 
class OpenAIAssistant(CachingAssistant):
    PRICE = {"input": 2.5, "output": 10.0, "cached_input": 1.25}
 
    def __init__(self):
        self.client = OpenAI()
        self.codebase = ""
 
    def setup(self, codebase: str) -> None:
        self.codebase = codebase
 
    def ask(self, question: str) -> CallResult:
        start = time.perf_counter()
        resp = self.client.chat.completions.create(
            model="gpt-4o",
            max_tokens=400,
            messages=[
                {"role": "system", "content": f"Python kod yardımcısısın.\n\nKod tabanı:\n{self.codebase}"},
                {"role": "user", "content": question},
            ],
        )
        latency = time.perf_counter() - start
        u = resp.usage
        # Guard against both a missing details object and a missing cached_tokens value.
        cached = (u.prompt_tokens_details.cached_tokens or 0) if u.prompt_tokens_details else 0
        fresh = u.prompt_tokens - cached
        cost = (
            fresh / 1e6 * self.PRICE["input"]
            + cached / 1e6 * self.PRICE["cached_input"]
            + u.completion_tokens / 1e6 * self.PRICE["output"]
        )
        return CallResult(
            provider="openai",
            answer=resp.choices[0].message.content,
            input_tokens=fresh,
            cached_tokens=cached,
            output_tokens=u.completion_tokens,
            latency_sec=latency,
            cost_usd=cost,
        )
 
    def teardown(self) -> None:
        pass  # caching is automatic, nothing to clean up

OpenAIAssistant — automatic caching
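Since caching here is automatic, the only evidence of a hit is the usage report, so the fresh/cached split is worth isolating and testing without a network call. The helper below is a hypothetical sketch: `split_prompt_tokens` is not an SDK function, and the `SimpleNamespace` objects merely imitate the attributes that `ask()` reads above, with made-up token counts.

```python
from types import SimpleNamespace


def split_prompt_tokens(usage) -> tuple[int, int]:
    """Return (fresh, cached) prompt tokens from a usage-like object."""
    details = getattr(usage, "prompt_tokens_details", None)
    # Treat a missing details object or a None count as "nothing cached".
    cached = getattr(details, "cached_tokens", None) or 0
    return usage.prompt_tokens - cached, cached


# Stand-ins for API usage objects (hypothetical numbers, no network call).
miss = SimpleNamespace(prompt_tokens=30_050, prompt_tokens_details=None)
hit = SimpleNamespace(
    prompt_tokens=30_050,
    prompt_tokens_details=SimpleNamespace(cached_tokens=29_952),
)

fresh_on_miss, cached_on_miss = split_prompt_tokens(miss)
fresh_on_hit, cached_on_hit = split_prompt_tokens(hit)
```

Keeping this logic in one function makes it easy to verify that a cache miss is billed entirely at the regular input price.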

Step 4: Gemini Implementation

python

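import datetime
import time

import google.generativeai as genai
 
class GeminiAssistant(CachingAssistant):
    # USD per 1M tokens. storage_per_hour is listed for reference only:
    # ask() below does not add cache storage to cost_usd.
    PRICE = {"input": 1.25, "output": 10.0, "cached": 0.31, "storage_per_hour": 0.31}
 
    def __init__(self):
        self.cache = None
        self.model = None
 
    def setup(self, codebase: str) -> None:
        # Create an explicit cache holding the codebase; it lives for one hour unless deleted.
        self.cache = genai.caching.CachedContent.create(
            model="gemini-2.5-pro",
            contents=[{"role": "user", "parts": [{"text": codebase}]}],
            system_instruction="Python kod yardımcısısın.",
            ttl=datetime.timedelta(hours=1),
        )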
        self.model = genai.GenerativeModel.from_cached_content(self.cache)
 
    def ask(self, question: str) -> CallResult:
        start = time.perf_counter()
        response = self.model.generate_content(question)
        latency = time.perf_counter() - start
        u = response.usage_metadata
        cached = u.cached_content_token_count or 0
        fresh = u.prompt_token_count - cached
        cost = (
            cached / 1e6 * self.PRICE["cached"]
            + fresh / 1e6 * self.PRICE["input"]
            + u.candidates_token_count / 1e6 * self.PRICE["output"]
        )
        return CallResult(
            provider="gemini",
            answer=response.text,
            input_tokens=fresh,
            cached_tokens=cached,
            output_tokens=u.candidates_token_count,
            latency_sec=latency,
            cost_usd=cost,
        )
 
    def teardown(self) -> None:
        if self.cache:
            self.cache.delete()

GeminiAssistant — explicit cache create/delete
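The `PRICE` dict above lists a `storage_per_hour` rate, but `ask()` never charges it, so the per-call totals leave out the cost of keeping the explicit cache alive. A minimal sketch of how that term could be added follows. The helper names are hypothetical, the rate is simply the value from the lab's dict (worth checking against the current price list), and billing by tokens multiplied by hours is an assumption about how storage is charged.

```python
def gemini_storage_cost(cached_tokens: int, hours: float, usd_per_mtok_hour: float) -> float:
    """Storage charge for keeping `cached_tokens` alive for `hours` (assumed model)."""
    if cached_tokens < 0 or hours < 0:
        raise ValueError("cached_tokens and hours must be non-negative")
    return cached_tokens / 1e6 * usd_per_mtok_hour * hours


def run_cost_with_storage(per_call_costs: list[float], storage_usd: float) -> float:
    """Total for a benchmark run: per-call costs plus one storage charge."""
    return sum(per_call_costs) + storage_usd


# Rate copied from PRICE["storage_per_hour"] in the lab; treat it as an assumption.
STORAGE_RATE = 0.31
CACHED_TOKENS = 30_000  # approximate codebase size from the scenario
TTL_HOURS = 1.0         # the cache in setup() is created with a one-hour TTL

storage = gemini_storage_cost(CACHED_TOKENS, TTL_HOURS, STORAGE_RATE)
example_total = run_cost_with_storage([0.01, 0.02], storage)  # made-up call costs
```

With these inputs the storage term is under one cent, but it grows linearly with both cache size and lifetime, which matters for caches kept alive over many hours.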

Step 5: Benchmark Runner

python

# Assumes the classes defined above are available
 
CODEBASE = """
# Sahte ~30K token Python kod tabanı
def calculate_compound_interest(principal, rate, time):
    \"\"\"Bileşik faiz hesaplar.\"\"\"
    return principal * (1 + rate) ** time
 
class BankAccount:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance
    # ... çok daha fazla sınıf ve fonksiyon
""" * 200  # ~30K token
 
QUESTIONS = [
    "calculate_compound_interest fonksiyonu ne yapıyor?",
    "BankAccount sınıfında balance neden ön ödemeli?",
    "Bu kodun en zayıf güvenlik noktası?",
    # ... 47 more questions
] * 17
QUESTIONS = QUESTIONS[:50]
 
# Run the three providers sequentially rather than in parallel
results = {"anthropic": [], "openai": [], "gemini": []}
 
for AssistantClass, name in [
    (AnthropicAssistant, "anthropic"),
    (OpenAIAssistant, "openai"),
    (GeminiAssistant, "gemini"),
]:
    print(f"\n═══ {name.upper()} ═══")
    assistant = AssistantClass()
    assistant.setup(CODEBASE)
 
    for q in QUESTIONS:
        r = assistant.ask(q)
        results[name].append(r)
 
    assistant.teardown()
 
    total_cost = sum(r.cost_usd for r in results[name])
    avg_latency = sum(r.latency_sec for r in results[name]) / len(results[name])
    total_cached = sum(r.cached_tokens for r in results[name])
    total_input = sum(r.input_tokens for r in results[name])
    # Share of prompt tokens served from cache; max(..., 1) only guards against division by zero.
    hit_rate = total_cached / max(total_cached + total_input, 1) * 100
 
    # 33.5 is the assumed USD-to-TRY exchange rate.
    print(f"Total cost:    ${total_cost:.4f}  |  {total_cost * 33.5:.2f} TL")
    print(f"Avg latency:   {avg_latency:.2f}s")
    print(f"Cache hit:     {hit_rate:.1f}%")
 
# Comparison table (33.5 is the assumed USD-to-TRY exchange rate)
WIDTH = 56
print("\n╔" + "═" * WIDTH + "╗")
print(f"║{'COMPARISON TABLE':^{WIDTH}}║")
print("╠" + "═" * WIDTH + "╣")
print("║ Provider   │ Cost (USD) │ Cost (TL) │ Latency │ Hit %  ║")
print("╠" + "═" * WIDTH + "╣")
for name in ["anthropic", "openai", "gemini"]:
    rows = results[name]
    tc = sum(r.cost_usd for r in rows)
    al = sum(r.latency_sec for r in rows) / len(rows)  # no hardcoded question count
    tt = sum(r.cached_tokens for r in rows)
    tf = sum(r.input_tokens for r in rows)
    hr = tt / max(tt + tf, 1) * 100
    print(f"║ {name:<11}│ {tc:>10.4f} │ {tc * 33.5:>9.2f} │ {al:>6.2f}s │ {hr:>5.1f}% ║")
print("╚" + "═" * WIDTH + "╝")

Ask all 3 providers the same 50 questions and collect the results
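The runner reports a mean latency only, which a single slow call can distort. As one possible extension, the sketch below aggregates a provider's results into total cost, mean, median, and 95th-percentile latency, and cache hit rate. `summarize` is a hypothetical helper, the nearest-rank percentile is a choice made here rather than something the lab prescribes, and the sample data is invented.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class CallResult:  # same fields as the lab's CallResult
    provider: str
    answer: str
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    latency_sec: float
    cost_usd: float


def summarize(results: list[CallResult]) -> dict[str, float]:
    """Aggregate one provider's results (hypothetical helper)."""
    if not results:
        raise ValueError("no results to summarize")
    latencies = sorted(r.latency_sec for r in results)
    cached = sum(r.cached_tokens for r in results)
    fresh = sum(r.input_tokens for r in results)
    prompt_total = cached + fresh
    # Nearest-rank 95th percentile: smallest latency covering 95% of calls.
    p95_rank = math.ceil(0.95 * len(latencies))
    return {
        "total_cost_usd": sum(r.cost_usd for r in results),
        "mean_latency_sec": statistics.fmean(latencies),
        "median_latency_sec": statistics.median(latencies),
        "p95_latency_sec": latencies[p95_rank - 1],
        "hit_rate_pct": 100 * cached / prompt_total if prompt_total else 0.0,
    }


# Invented sample: one cache miss followed by two hits.
demo = [
    CallResult("demo", "a", 100, 0, 10, 2.0, 0.010),
    CallResult("demo", "b", 10, 90, 10, 1.0, 0.002),
    CallResult("demo", "c", 10, 90, 10, 1.5, 0.002),
]
summary = summarize(demo)
```

In the runner, `summarize(results[name])` could replace the inline sums, so the per-provider printout and the comparison table share one calculation.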

Interpreting the Results

In this particular scenario Gemini is well ahead: 59% cheaper and 16% faster. Anthropic's code quality (judged by manual review) is probably higher, though. Measuring quality is subjective; in Module 11 we will look at "LLM-as-judge" benchmarks.
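Statements such as "X% cheaper" or "X% faster" depend on which provider is taken as the baseline, so it helps to make the formula explicit. The sketch below shows one way to compute such figures. `percent_lower` is a hypothetical helper, and the input numbers are placeholders chosen for easy arithmetic, not the lab's measurements.

```python
def percent_lower(candidate: float, baseline: float) -> float:
    """How many percent `candidate` is below `baseline` (negative if it is above)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100


# Placeholder totals, NOT the lab's measurements.
baseline_cost, candidate_cost = 4.00, 3.00        # USD for the whole run
baseline_latency, candidate_latency = 2.0, 1.0    # average seconds per call

cost_saving = percent_lower(candidate_cost, baseline_cost)
latency_saving = percent_lower(candidate_latency, baseline_latency)
```

Reporting the baseline next to each percentage keeps the comparison unambiguous when three providers are involved.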

Practical Decision Logic

The main takeaway of this lab: don't pick a provider on price alone.

A three-dimensional decision:

Cost: Gemini's strong point from the start
Quality: depends on the domain (for code: Claude > GPT > Gemini)
Setup complexity: Anthropic > OpenAI > Gemini (ranked by difficulty)

My recommendation: start the pilot with Anthropic, and consider moving cost-sensitive scenarios to Gemini as you scale.
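One way to make the three-dimensional trade-off explicit is a weighted score. The sketch below is purely illustrative: the candidate names, the 1-to-3 scores (higher is better), and the weights are all placeholders to be replaced with your own measurements and priorities, and a weighted sum is only one of several reasonable ways to combine the dimensions.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (higher is better)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Candidate names sorted from best to worst under the given weights."""
    return sorted(candidates, key=lambda n: weighted_score(candidates[n], weights), reverse=True)


# Placeholder scores on a 1-3 scale; not measurements from the lab.
candidates = {
    "provider_a": {"cost": 1, "quality": 3, "setup": 2},
    "provider_b": {"cost": 3, "quality": 1, "setup": 2},
}

cost_first = {"cost": 0.5, "quality": 0.3, "setup": 0.2}
quality_first = {"cost": 0.2, "quality": 0.6, "setup": 0.2}

ranking_cost_first = rank(candidates, cost_first)
ranking_quality_first = rank(candidates, quality_first)
```

The two rankings come out in opposite order, which mirrors the point above: the same measurements can justify different providers depending on what the project values most.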

In the Next Lesson

Module 3 final exam: 10 questions covering the 3 providers. Score 70% or higher to move on to Module 4.
