Lab: 50 Turn Diyalog, Cache Hit Rate Trend

Name: Lab: 50 Turn Diyalog, Cache Hit Rate Trend
Author: Şükrü Yusuf KAYA

Gerçek 50 turn'lük bir destek diyaloğu simüle et, her turn'de hit rate'i izle. Cache cost trend grafiği.

Şükrü Yusuf KAYA

15 dakikalık okuma

14.05.2026

Orta

Lab #10: 50 Turn Diyalog

Bu lab gerçekçi bir uzun diyalog simüle eder. Hedef: turn ilerledikçe hit rate'in nasıl değiştiğini görmek.

Senaryo: Müşteri sipariş + iade + kargo + kupon sorularını sırayla soruyor.

Adım 1 — 50 Turn Senaryosu#

python

CONVERSATION_SCRIPT = [
    ("user", "Merhaba, siparişim hakkında soru sormak istiyorum."),
    ("user", "Sipariş numaram XYZ-123."),
    ("user", "Kargo durumu ne?"),
    ("user", "Ne zaman teslim olur?"),
    ("user", "İstanbul'a ne kadar sürede gelir genelde?"),
    ("user", "Kapıda ödeme yapabilir miyim?"),
    # ... 44 daha (gerçek senaryo için)
]
# Toplam 50 user mesaj, modelin 50 cevabı = 100 mesaj

50 user mesajlık script

python

import anthropic
import time
 
client = anthropic.Anthropic()
SYSTEM = "Müşteri destek asistanısın. Türkçe ve net cevap ver. " * 30  # ~1500 token
KB = "..." * 5000  # ~12K KB
 
conversation = []
total_cost = 0.0
total_cache_creation = 0
total_cache_read = 0
 
for turn, (role, msg) in enumerate(CONVERSATION_SCRIPT, 1):
    # Conversation history'ye breakpoint ekle (incremental cache)
    if conversation:
        last = conversation[-1]
        history_cache = [
            *conversation[:-1],
            {
                **last,
                "content": [{"type": "text", "text": last["content"], "cache_control": {"type": "ephemeral", "ttl": "5m"}}],
            },
        ]
    else:
        history_cache = []
 
    messages = [*history_cache, {"role": role, "content": msg}]
 
    start = time.perf_counter()
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=300,
        system=[
            {"type": "text", "text": KB + SYSTEM, "cache_control": {"type": "ephemeral", "ttl": "1h"}},
        ],
        messages=messages,
    )
    latency = time.perf_counter() - start
 
    u = resp.usage
    cw = u.cache_creation_input_tokens or 0
    cr = u.cache_read_input_tokens or 0
    total_cache_creation += cw
    total_cache_read += cr
 
    cost = (
        u.input_tokens / 1e6 * 3.0
        + cw / 1e6 * 3.75
        + cr / 1e6 * 0.30
        + u.output_tokens / 1e6 * 15.0
    )
    total_cost += cost
 
    hit_rate = cr / max(1, cr + cw) * 100
    if turn in [1, 5, 10, 20, 30, 40, 50]:
        print(f"Turn {turn:>3}: hit={hit_rate:>5.1f}% | cw={cw:>5} cr={cr:>6} | cost=${cost:.4f} | latency={latency:.2f}s")
 
    # Assistant cevabını ekle
    conversation.append({"role": "user", "content": msg})
    conversation.append({"role": "assistant", "content": resp.content[0].text})
 
# Özet
overall_hit_rate = total_cache_read / max(1, total_cache_read + total_cache_creation) * 100
print(f"\n═══ ÖZET ═══")
print(f"Toplam 50 turn maliyet: ${total_cost:.4f}  |  {total_cost*33.5:.2f} TL")
print(f"Overall cache hit rate: {overall_hit_rate:.1f}%")
print(f"Cache write toplam:     {total_cache_creation:>10,} token")
print(f"Cache read toplam:      {total_cache_read:>10,} token")

50 turn boyunca cache telemetrisi ve cost

Multi-Turn'in Gücü

50 turn'de %98.7 cache hit rate. Cache açık olmadan aynı diyalog ~

15-20 tutacaktı (200K token × 50 turn ×

3/M). Cache ile $0.69 — %96 tasarruf.

Hit Rate Yükselişi#

Turn ilerledikçe hit rate %0'dan %98+'ya çıktı çünkü:

Turn 1: tüm system + KB cache write — büyük overhead
Turn 2-5: cache hit + her turn 1 yeni assistant message yazılır
Turn 10+: cache hit stabilize, marginal yeni write minimal

Bu growing prefix pattern'in matematiksel sonucu.

✓ Pekiştir#

Bir Sonraki Derste#

Memory management: conversation 100K context limit'ine yaklaşıyorsa ne yaparsın?

Yorumlar & Soru-Cevap

(0)

Yorum yazmak için giriş yap.

Yorumlar yükleniyor...

İlgili İçerikler

1. Temeller — Context Penceresi Ekonomisi

Lab #10: 50 Turn Diyalog

Adım 1 — 50 Turn Senaryosu#

Hit Rate Yükselişi#

✓ Pekiştir#

Bir Sonraki Derste#

Yorumlar & Soru-Cevap

İlgili İçerikler

Bu Eğitim Hakkında ve Prompt Caching Neden Önemli?

Token Ekonomisi 101: Input vs Output Cost Asimetrisi

Context Window Evrimi: 4K'dan 1M'a 5 Yılda Ne Oldu?