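Does the order of the 4 breakpoints matter?

Yes. Because the cache is hierarchical, **static content must come first and dynamic content last**. The API assembles the prefix in the order tools → system → messages, so within that order put the most stable content earliest and keep the user query (the most volatile part) at the very end. Otherwise every dynamic change also invalidates the static parts.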

What happens if I remove a cache breakpoint?

On the next request there is no cache entry at that point, so that content is processed as fresh input. Side effect: cached breakpoints that come after it can also be affected, because the prefix changed. Be careful when adding or removing cache_control markers.

Anthropic Prompt Caching: cache_control and Breakpoints

Author: Şükrü Yusuf KAYA

Anthropic's caching mechanism: cache_control breakpoints, ephemeral TTL, and the 4-breakpoint limit. In this lesson you will learn the API structure, TTL strategies, and telemetry.

16 min read

5/14/2026

Intermediate

Anthropic Prompt Caching: The Full Logic of the API

In Modules 1 and 2 we saw why caching exists and how it works. In Module 3 we move on to the APIs we will actually use.

First stop: Anthropic. In my view it has the clearest and most controllable caching API of the three providers. In this lesson we will go through every detail of the cache_control mechanism.

Anthropic's Caching Philosophy: "Explicit Breakpoint"

With OpenAI, caching is implicit: you do nothing, and if a prefix repeats you get a cache hit automatically. With Anthropic it is explicit: you mark the point up to which content may be cached.

This looks like more work, but it has major advantages:

You choose which layer is cached
You choose the TTL (5 minutes or 1 hour)
You control the order in which things are cached
You get detailed telemetry (cache_creation vs cache_read)

In short: you are in control.

Reminder

In our first lab (Module 1, Lesson 5) we used a single cache_control. Now we move to the 4-breakpoint architecture, the standard pattern in production systems.

API Structure: Message Structure

With Anthropic, caching can be applied at 4 different layers:

system: the system prompt
tools: tool definitions
messages: content blocks inside the conversation history
Inside a document block: {"type": "document", ...} or {"type": "text", ...}

By adding cache_control to the last content block of each, you are saying "cache everything up to here".
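As a minimal sketch of the "mark the last block" rule, the helper below copies a list of content blocks and adds cache_control to the final one only. The name `with_breakpoint` is hypothetical and not part of the Anthropic SDK; it works on plain dicts, so the same idea would apply to system blocks, tool definitions, or the content blocks of a message.

```python
from copy import deepcopy


def with_breakpoint(blocks: list[dict], ttl: str = "5m") -> list[dict]:
    """Return a copy of `blocks` with cache_control on the last block only."""
    if not blocks:
        return []
    marked = deepcopy(blocks)  # leave the caller's list untouched
    marked[-1]["cache_control"] = {"type": "ephemeral", "ttl": ttl}
    return marked


system_blocks = with_breakpoint([
    {"type": "text", "text": "Long reference document ..."},
    {"type": "text", "text": "Instructions ..."},
])
print(system_blocks[-1]["cache_control"])
```

Only the last block carries the marker, which is enough to tell the API that everything up to and including that block belongs to the cacheable prefix.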

En Basit Örnek: Tek Breakpoint#

python

import anthropic

client = anthropic.Anthropic()

LONG_DOC = "..." * 10000  # placeholder; should be ~1500+ tokens

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_DOC,
            "cache_control": {"type": "ephemeral"},  # ← the magic line
        }
    ],
    messages=[{"role": "user", "content": "What is the summary of the document?"}]
)

# Telemetry
print(f"Cache creation: {response.usage.cache_creation_input_tokens}")
print(f"Cache read:     {response.usage.cache_read_input_tokens}")
print(f"Fresh input:    {response.usage.input_tokens}")

Single breakpoint: caching the system prompt

Decoding the Telemetry

Field	Meaning	Price (Sonnet 4.6)
`cache_creation_input_tokens`	Tokens written to the cache on first write	$3.75/M
`cache_read_input_tokens`	Tokens served from a cache hit	$0.30/M
`input_tokens`	Fresh tokens, not cached at all	$3.00/M
`output_tokens`	Generation	$15/M

First request: cache_creation > 0, cache_read = 0
Subsequent requests (within 5 minutes): cache_creation = 0, cache_read > 0
After the TTL expires: cache_creation > 0 again (hard cache reset)
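To make the table concrete, here is a rough sketch that turns the four usage counters into a dollar estimate. The prices are copied from the table above; `request_cost` and the sample usage dicts are hypothetical, and the flat write price is assumed to be the default 5m TTL rate. A real `response.usage` object would need converting to a dict first.

```python
# USD per million tokens, copied from the table above (Sonnet 4.6).
PRICE_PER_MTOK = {
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
    "input_tokens": 3.00,
    "output_tokens": 15.00,
}


def request_cost(usage: dict) -> float:
    """Estimate the USD cost of one request from its usage counters."""
    return sum(
        usage.get(field, 0) * price / 1_000_000
        for field, price in PRICE_PER_MTOK.items()
    )


# Made-up counters: the same 10,000-token prefix, first written and then read.
first = {"cache_creation_input_tokens": 10_000, "cache_read_input_tokens": 0,
         "input_tokens": 50, "output_tokens": 200}
repeat = {"cache_creation_input_tokens": 0, "cache_read_input_tokens": 10_000,
          "input_tokens": 50, "output_tokens": 200}

print(f"first request:  ${request_cost(first):.5f}")
print(f"repeat request: ${request_cost(repeat):.5f}")
```

Comparing the two numbers shows where the saving comes from: the prefix moves from the write price to the much lower read price, while fresh input and output cost the same in both requests.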

4-Breakpoint Architecture (Production Standard)

A single breakpoint is very basic. In production, 4 breakpoints are used, one for each layer with a different rate of change:

Layers (Rate of Change)

Static knowledge: does not change for days (company documents, product catalog)
Tool definitions: may change weekly (function schemas, MCP tools)
System prompt: changes weekly (instructions, persona)
Conversation history: changes every turn (a growing message list)

Cache Strategy

Static knowledge → 1h TTL (long-lived)
Tool defs → 1h TTL (rarely changes)
System prompt → 5m TTL (medium frequency)
History → 5m TTL (new breakpoint every turn)

A separate cache_control for each layer = 4 breakpoints

4-Breakpoint Limit

Anthropic's limit: a maximum of 4 cache_control breakpoints per request. If you exceed it, the API returns an error. The 4-layer architecture sits exactly at this limit.
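Since the 4-layer design sits right at the limit, it can be useful to count markers before sending a request. The following is a sketch only: `count_breakpoints` and `check_breakpoints` are hypothetical helpers, and they inspect top-level blocks only (markers nested inside other blocks are not counted).

```python
MAX_BREAKPOINTS = 4  # the limit described above


def _marked(blocks) -> int:
    """Count dict blocks that carry a cache_control key."""
    return sum(1 for b in blocks if isinstance(b, dict) and "cache_control" in b)


def count_breakpoints(system=(), tools=(), messages=()) -> int:
    """Count cache_control markers in tools, system blocks, and message content."""
    total = _marked(tools)
    # A plain-string system prompt cannot carry a marker.
    total += 0 if isinstance(system, str) else _marked(system)
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):  # plain-string content has no blocks
            total += _marked(content)
    return total


def check_breakpoints(system=(), tools=(), messages=()) -> None:
    """Raise ValueError before sending a request that exceeds the limit."""
    n = count_breakpoints(system, tools, messages)
    if n > MAX_BREAKPOINTS:
        raise ValueError(
            f"{n} cache_control breakpoints found; the limit is {MAX_BREAKPOINTS}"
        )
```

Calling `check_breakpoints(system=..., tools=..., messages=...)` just before `client.messages.create(...)` turns an API error into a local one that is easier to debug.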

Full Production Example: 4 Breakpoints

python

import anthropic
 
client = anthropic.Anthropic()
 
# Layer 1: static knowledge (should live longest: 1 hour)
with open("company_handbook.txt", encoding="utf-8") as f:
    COMPANY_DOCS = f.read()

# Layer 2: tool definitions (should live long: 1 hour)
TOOLS = [
    {
        "name": "search_products",
        "description": "Searches the company product catalog.",
        "input_schema": {...},  # put a real JSON Schema here
        # cache_control is added to the last tool below
    },
    # ... more tools
]

# Layer 3: system prompt (medium: 5 minutes)
SYSTEM_INSTRUCTIONS = """You are a customer support assistant. Speak Turkish.
Instructions: ..."""

# Layer 4: conversation history (5 minutes)
conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "Information about Product X"},
    {"role": "assistant", "content": "Product X has the following features..."},
]
 
# Now place the cache_control markers.
# Note: the API assembles the prefix as tools → system → messages,
# so the tools breakpoint comes first regardless of the numbering below.
system_blocks = [
    {
        "type": "text",
        "text": COMPANY_DOCS,
        "cache_control": {"type": "ephemeral", "ttl": "1h"},  # ← Breakpoint 1
    },
    {
        "type": "text",
        "text": SYSTEM_INSTRUCTIONS,
        "cache_control": {"type": "ephemeral", "ttl": "5m"},  # ← Breakpoint 3
    },
]

tools = [
    *TOOLS[:-1],
    {
        **TOOLS[-1],
        "cache_control": {"type": "ephemeral", "ttl": "1h"},  # ← Breakpoint 2
    },
]

# cache_control goes on the LAST message of the conversation history
# (so that all history before the new user query is cached)
messages_with_cache = [
    *conversation[:-1],
    {
        **conversation[-1],
        "content": [
            {
                "type": "text",
                "text": conversation[-1]["content"],
                "cache_control": {"type": "ephemeral", "ttl": "5m"},  # ← Breakpoint 4
            }
        ],
    },
    {"role": "user", "content": "What is the stock status?"},  # dynamic query: NOT cached
]
 
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=system_blocks,
    tools=tools,
    messages=messages_with_cache,
)
 
u = response.usage
print(f"Cache create: {u.cache_creation_input_tokens:>6}")
print(f"Cache read:   {u.cache_read_input_tokens:>6}")
print(f"Fresh input:  {u.input_tokens:>6}")
print(f"Output:       {u.output_tokens:>6}")

Production-grade 4-breakpoint architecture
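The history breakpoint in the example above is hard-coded for one conversation. A possible generalisation, sketched here with a hypothetical `build_messages` helper, marks the last block of the last history message and then appends the new query without a marker. It assumes each history message has either string content or a non-empty list of content blocks.

```python
def build_messages(history: list[dict], new_query: str, ttl: str = "5m") -> list[dict]:
    """Put a breakpoint on the last history message, then append the uncached query."""
    messages = [dict(m) for m in history]  # shallow copies; originals stay untouched
    if messages:
        last = messages[-1]
        content = last["content"]
        if isinstance(content, str):
            # Plain strings cannot carry cache_control, so wrap them in a text block.
            content = [{"type": "text", "text": content}]
        else:
            content = [dict(block) for block in content]
        content[-1]["cache_control"] = {"type": "ephemeral", "ttl": ttl}
        last["content"] = content
    messages.append({"role": "user", "content": new_query})
    return messages


history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, how can I help you?"},
]
for message in build_messages(history, "What is the stock status?"):
    print(message)
```

Keeping `history` free of markers and calling the helper on every turn means only one history breakpoint exists per request, so the total stays within the limit of 4.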

Choosing a TTL: 5m vs 1h

Anthropic offers two TTLs:

TTL	Cache Write Price	Cache Read Price	When?
5m (default)	1.25× input	0.1× input	Heavy traffic, frequent repeats
1h (beta)	2× input	0.1× input	Sparse but recurring queries

Break-even: when should you switch from the 5m TTL to 1h?

A 1h cache write costs 2× input and a 5m cache write costs 1.25× input, a difference of 0.75× input. A 5m entry is refreshed each time it is read, so it only has to be rewritten when more than 5 minutes pass between requests. Each such rewrite costs another 1.25× input, so two writes within the hour (2.5× input) already cost more than a single 1h write (2× input).

So if you end up writing the 5m cache 2 or more times within an hour, going straight to 1h is cheaper.

python

def best_ttl(refresh_count_per_hour: int) -> str:
    """Compare costs given how many times the 5m cache is written within an hour."""
    cost_5m = refresh_count_per_hour * 1.25  # each write costs 1.25× input
    cost_1h = 1 * 2.0                         # a single write at 2× input
    if cost_5m > cost_1h:
        return "1h cache is cheaper"
    elif cost_5m < cost_1h:
        return "5m cache is cheaper"
    else:
        return "Equal cost"

for n in [1, 2, 3, 4, 5, 10]:
    print(f"{n:>2} writes per hour → {best_ttl(n)}")

5m vs 1h break-even calculation

Minimum Token Threshold

With Anthropic, a prefix shorter than 1024 tokens is not written to the cache (2048 for Haiku). The reason: for very short content, the overhead of the cache infrastructure outweighs the gain.

Model	Minimum Cache Tokens
Claude Opus 4.7	1024
Claude Sonnet 4.6	1024
Claude Haiku 4.5	2048

In practice: if the content you want to cache is below this threshold, adding cache_control has no effect and it is billed as normal input.
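A small guard can catch this case before a request is sent. The sketch below copies the thresholds from the table above and keys them by the display names used there; `is_cacheable` is a hypothetical helper, and because thresholds are model-specific the values are best treated as inputs to verify rather than constants.

```python
# Minimum cacheable prefix length, copied from the table above.
MIN_CACHE_TOKENS = {
    "Claude Opus 4.7": 1024,
    "Claude Sonnet 4.6": 1024,
    "Claude Haiku 4.5": 2048,
}


def is_cacheable(model_name: str, prefix_tokens: int) -> bool:
    """True if a prefix of this length meets the model's minimum threshold."""
    return prefix_tokens >= MIN_CACHE_TOKENS[model_name]


for model_name in MIN_CACHE_TOKENS:
    print(f"{model_name}: 1500-token prefix cacheable? {is_cacheable(model_name, 1500)}")
```

The `prefix_tokens` argument is the token count of everything up to the breakpoint, not of the whole request.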

Hierarchical Cache Lookup

An important property: Anthropic's cache works hierarchically. A cache hit on a shorter prefix is independent of a cache miss on a longer prefix.

Request 1: [Tools (cached)] [System (cached)] [History v1 (cached)] [Query A]
Request 2: [Tools (cached)] [System (cached)] [History v1 (cached)] [Query B]
  → All cache hits (tools, system, history)

Request 3: [Tools (cached)] [System (cached)] [History v2 (new)] [Query C]
  → Tools and system hit, history misses (it is written again)

Thanks to this "incremental caching", the cache hit rate stays high in multi-turn conversations. Module 8 covers this in detail.
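The behaviour of the three requests can be reproduced with a toy model. This is not Anthropic's implementation, only an illustration of prefix matching under the assumption that each breakpoint is keyed by a hash of all content before it; `ToyPrefixCache` is a hypothetical class, and the segments follow the API's tools → system → messages order.

```python
import hashlib


class ToyPrefixCache:
    """Toy model: one entry per breakpoint, keyed by a hash of the whole prefix."""

    def __init__(self) -> None:
        self.entries: set[str] = set()

    def request(self, segments: list[tuple[str, str]]) -> dict[str, str]:
        """`segments` is an ordered list of (label, content), each ending at a breakpoint."""
        result = {}
        hasher = hashlib.sha256()
        for label, content in segments:
            # The running hash covers every segment so far, not just this one.
            hasher.update(content.encode("utf-8"))
            hasher.update(b"\x00")  # separator between segments
            key = hasher.hexdigest()
            result[label] = "hit" if key in self.entries else "miss (written)"
            self.entries.add(key)
        return result


cache = ToyPrefixCache()
base = [("tools", "TOOLS"), ("system", "SYSTEM")]
print(cache.request(base + [("history", "history v1")]))  # request 1
print(cache.request(base + [("history", "history v1")]))  # request 2
print(cache.request(base + [("history", "history v2")]))  # request 3
```

Because each key covers the entire prefix, changing an early segment changes every later key as well, which is why stable content belongs at the front.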

In the Next Lesson

Hands-on: we will build the 4-breakpoint architecture against the real Anthropic API and reach a cache hit rate above 90%. Get your calculator ready.

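Frequently Asked Questions

I added cache_control but the telemetry shows no caching at all. Why?

Three possible causes: (1) The content is below 1024 tokens, so no cache entry is written. (2) The cache TTL has expired, so the entry is being written again. (3) A single character changed somewhere in the preceding prefix; an exact prefix match is required. If cache_creation > 0, the entry is already being written and you should see a read on the next request.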
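The checklist above can be turned into a first-pass triage of the usage counters. This is a heuristic sketch, not an official diagnostic: `diagnose` is a hypothetical helper that takes the counters as a plain dict, and it can only narrow down the causes, not prove one.

```python
def diagnose(usage: dict) -> str:
    """Map the cache counters of one response to the likely situation."""
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read > 0:
        return "cache hit"
    if created > 0:
        # Normal on a first request; on repeats it points to causes (2) or (3).
        return "cache written: first request, expired TTL, or changed prefix"
    # Nothing written and nothing read points to cause (1) or a missing marker.
    return "no cache activity: prefix below the minimum length or cache_control missing"


print(diagnose({"cache_creation_input_tokens": 0, "cache_read_input_tokens": 0,
                "input_tokens": 300}))
```

If repeated requests keep returning the "cache written" case, comparing the exact prefix bytes of two consecutive requests is usually the next step.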
