Step 1: Context Taxonomy and Breakpoint Planning

Author: Şükrü Yusuf KAYA

Design the context structure of the final project: how many tokens each layer takes, which cache TTL it gets, and how many breakpoints you need.


12-minute read

14.05.2026

Advanced

Step 1: Context Taxonomy

Plan before you implement an assistant. By the end of this step you will have:

A 6-layer table (token size, rate of change, TTL)
A breakpoint allocation (where the 4 cache_control markers go)
An expected cache hit rate
A cost estimate

Layer Table

Layer	Content	Tokens	Change rate	Cache TTL
System	Persona, rules, tone	1K	Weekly	5m
Documentation	Software docs	200K	Monthly	1h
Tool defs	8 tools, sample inputs	4K	Every 3 months	1h
Conversation history	Multi-turn	0-30K	Every turn	5m
User query	Active query	0.1K	Per request	No cache
Output	Citations + answer	0.5K	-	-

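Total input can reach ~235K tokens (1K + 200K + 4K + 30K + 0.1K). That does not fit in a 200K-token Claude context window, and a real fit also needs headroom for the output, so the plan needs either a larger context window or smaller layers.

If the total exceeds 200K:

Split the documentation into 2 parts (e.g. v1 + v2)
Load only the relevant section into the context (hybrid RAG, Module 7)
Compress more aggressively (summarization)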
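
The budget check above can be sketched in a few lines. This is a minimal illustration, not part of the project code: the layer sizes come from the layer table (history at its 30K maximum), while `CONTEXT_WINDOW`, `OUTPUT_RESERVE`, and `budget_report` are hypothetical names chosen for this example.

```python
# Layer sizes in tokens, taken from the layer table (history at its maximum).
LAYERS = {
    "system": 1_000,
    "documentation": 200_000,
    "tool_defs": 4_000,
    "conversation_history": 30_000,
    "user_query": 100,
}
OUTPUT_RESERVE = 500       # output budget from the table
CONTEXT_WINDOW = 200_000   # assumed window size; adjust to your model


def budget_report(layers, window, output_reserve=0):
    """Return (total_input, overflow); overflow > 0 means the plan does not fit."""
    total = sum(layers.values())
    overflow = max(0, total + output_reserve - window)
    return total, overflow


total, overflow = budget_report(LAYERS, CONTEXT_WINDOW, OUTPUT_RESERVE)
print(f"Total input: {total:,} tokens, overflow: {overflow:,} tokens")
```

Running the check whenever a layer size changes shows early how many tokens have to be cut by splitting, retrieval, or summarization.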

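Breakpoint Allocation (4 BP)

python

# 4-breakpoint architecture.
# DOCS, SYSTEM_PROMPT, and TOOLS are assumed to be defined elsewhere.
# The cached prefix is built in the order tools -> system -> messages,
# so breakpoints are numbered in that order and 1h entries precede 5m ones.
system_blocks = [
    # Breakpoint 2: Documentation (200K, 1h TTL), large and stable
    {
        "type": "text",
        "text": DOCS,
        "cache_control": {"type": "ephemeral", "ttl": "1h"},
    },
    # Breakpoint 3: System instructions (1K, 5m TTL)
    {
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral", "ttl": "5m"},
    },
]

tools = [
    *TOOLS[:-1],
    # Breakpoint 1: Tools (4K, 1h TTL); marking the last tool caches all of them
    {
        **TOOLS[-1],
        "cache_control": {"type": "ephemeral", "ttl": "1h"},
    },
]

# Conversation history: put a breakpoint on the last turn
def make_messages(conversation, new_query):
    if conversation:
        last = conversation[-1]
        content = last["content"]
        # Accept both plain-string content and a list of content blocks
        if isinstance(content, str):
            blocks = [{"type": "text", "text": content}]
        else:
            blocks = [dict(block) for block in content]  # copy, do not mutate the caller's data
        # Breakpoint 4: History tail (5m TTL)
        blocks[-1]["cache_control"] = {"type": "ephemeral", "ttl": "5m"}
        history_with_cache = [*conversation[:-1], {**last, "content": blocks}]
    else:
        history_with_cache = []

    return [*history_with_cache, {"role": "user", "content": new_query}]

4-breakpoint architecture plan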
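
Before sending a request it can help to check the plan mechanically. The sketch below assumes two constraints: at most 4 `cache_control` markers per request, and longer TTLs placed before shorter ones in prefix order (tools, then system, then messages). The helpers `collect_ttls` and `validate_breakpoints` and the `demo_*` data are hypothetical and only illustrate the check; they are not part of any SDK.

```python
MAX_BREAKPOINTS = 4
TTL_SECONDS = {"5m": 300, "1h": 3600}


def collect_ttls(tools, system_blocks, messages):
    """Return the TTL of every cache_control marker, in prefix order."""
    ttls = []
    for block in [*tools, *system_blocks]:
        if "cache_control" in block:
            ttls.append(block["cache_control"].get("ttl", "5m"))
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            continue  # plain strings cannot carry cache_control
        for block in content:
            if "cache_control" in block:
                ttls.append(block["cache_control"].get("ttl", "5m"))
    return ttls


def validate_breakpoints(tools, system_blocks, messages):
    """Raise ValueError on too many breakpoints or badly ordered TTLs."""
    ttls = collect_ttls(tools, system_blocks, messages)
    if len(ttls) > MAX_BREAKPOINTS:
        raise ValueError(f"{len(ttls)} breakpoints, limit is {MAX_BREAKPOINTS}")
    seconds = [TTL_SECONDS[t] for t in ttls]
    if seconds != sorted(seconds, reverse=True):
        raise ValueError(f"longer TTLs must come first, got {ttls}")
    return ttls


# Placeholder data with the same shape as the plan above.
demo_tools = [
    {"name": "search"},
    {"name": "lookup", "cache_control": {"type": "ephemeral", "ttl": "1h"}},
]
demo_system = [
    {"type": "text", "text": "docs", "cache_control": {"type": "ephemeral", "ttl": "1h"}},
    {"type": "text", "text": "rules", "cache_control": {"type": "ephemeral", "ttl": "5m"}},
]
demo_messages = [
    {"role": "assistant", "content": [
        {"type": "text", "text": "hi", "cache_control": {"type": "ephemeral", "ttl": "5m"}},
    ]},
    {"role": "user", "content": "new question"},
]
print(validate_breakpoints(demo_tools, demo_system, demo_messages))
```

Calling the validator on the real `tools`, `system_blocks`, and `make_messages(...)` output catches a fifth marker or a 5m block placed ahead of a 1h block before the API rejects the request.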

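Expected Cache Hit Rate

First Query (Cold Start)

Everything is a cache miss
Docs + tools + system are written to the cache
~205K tokens written
Cost: ~$0.8 at the 5m write rate ($3.75/MTok); ~$1.2 if docs and tools are written at the 1h rate ($6/MTok)
Latency: ~25-30s

Mitigation: cache warming (Module 11)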

Subsequent Queries

Docs + tools cache hit (204K read)
System + history hit
New: the query + the last assistant message
Cost: ~$0.075 (205K cached tokens read at $0.30/MTok is ~$0.06, plus fresh input and output)
Latency: ~1.5-2s
Hit rate: ~95%+
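
The per-request arithmetic behind the cold and warm cases can be made explicit. This is a rough sketch that reuses the per-million-token rates from the monthly estimate in this lesson (5m write rate only) and ignores conversation history; `request_cost` and the `RATE_*` names are hypothetical helpers for illustration.

```python
# USD per million tokens; the same rates the monthly estimate in this lesson uses.
RATE_CACHE_WRITE = 3.75   # 5m write rate; 1h writes are billed higher
RATE_CACHE_READ = 0.30
RATE_INPUT = 3.0
RATE_OUTPUT = 15.0


def request_cost(written=0, read=0, fresh=0, output=0):
    """Cost in USD of one request, given token counts per billing category."""
    return (
        written * RATE_CACHE_WRITE
        + read * RATE_CACHE_READ
        + fresh * RATE_INPUT
        + output * RATE_OUTPUT
    ) / 1e6


STATIC = 205_000  # docs + tools + system
cold = request_cost(written=STATIC, fresh=500, output=800)
warm = request_cost(read=STATIC, fresh=500, output=800)
print(f"cold: ${cold:.3f}, warm: ${warm:.3f}, ratio: {cold / warm:.1f}x")
```

Separating the four billing categories makes it easy to see which term dominates: the cache write on a cold start, the cache read on every warm request.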

Monthly Cost Estimation

python

# 50K queries/month
QUERIES = 50_000
STATIC_TOKEN = 205_000  # docs + tools + system
DYNAMIC_TOKEN = 500     # user query + last assistant message
OUTPUT_TOKEN = 800      # answer + citations
HIT_RATE = 0.95
USD_TO_TL = 33.5        # assumed exchange rate

# Anthropic Sonnet 4.6, USD per million tokens
cost = (
    QUERIES * (1 - HIT_RATE) * STATIC_TOKEN / 1e6 * 3.75   # cache write at the 5m rate; 1h writes cost more
    + QUERIES * HIT_RATE * STATIC_TOKEN / 1e6 * 0.30       # cache read
    + QUERIES * DYNAMIC_TOKEN / 1e6 * 3.0                  # fresh input
    + QUERIES * OUTPUT_TOKEN / 1e6 * 15.0                  # output
)
print(f"Monthly: ${cost:,.2f} → {cost * USD_TO_TL:,.2f} TL")
print(f"Per query: ${cost / QUERIES:.4f}")

# Expected output (approximate):
# Monthly: ~$5,500 → ~185K TL
# Per query: ~$0.11

Monthly cost projection

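Target Check

Expected: ~$0.11/query, ~$5.5K/month with the parameters above. A higher hit rate lowers the bill, but not below ~$0.075/query (~$3.75K/month), because every cache hit still reads the 205K static tokens. Compare these figures against the project requirement.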
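
To see how sensitive the monthly figure is to the hit rate, the estimate can be wrapped in a function and swept over a few values. This is a sketch under the same simplifying assumptions as the estimate (a single 5m write rate, no conversation history); `monthly_cost` is a hypothetical helper, not part of the project code.

```python
def monthly_cost(hit_rate, queries=50_000, static=205_000, dynamic=500, output=800):
    """Monthly USD cost using the per-million-token rates from the estimate."""
    write = queries * (1 - hit_rate) * static / 1e6 * 3.75  # cache misses
    read = queries * hit_rate * static / 1e6 * 0.30         # cache hits
    fresh = queries * dynamic / 1e6 * 3.0                   # uncached input
    out = queries * output / 1e6 * 15.0                     # generated tokens
    return write + read + fresh + out


for rate in (0.80, 0.90, 0.95, 0.99, 1.00):
    cost = monthly_cost(rate)
    print(f"hit rate {rate:.0%}: ${cost:,.0f}/month, ${cost / 50_000:.3f}/query")
```

The sweep makes the floor visible: even a perfect hit rate still pays for cache reads, fresh input, and output on every query.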


In the Next Lesson

Step 2: Implementation + monitoring.
