Hot/Cold Cache Pattern: Production Hybrid Architecture

Cache the frequently retrieved docs and use RAG for the long tail. Systems such as Perplexity and Notion AI are thought to use a variant of this pattern, though their architectures are not public. Implementation details.

Şükrü Yusuf KAYA

14 min read

6/23/2026

Advanced

Hot/Cold Pattern: Production Hybrid

A common observation is the 80/20 rule: roughly 20% of a knowledge base is used in about 80% of queries.

The pattern: cache the hot 20% and leave the cold 80% to RAG.
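Whether your own traffic is this skewed is worth checking before adopting the pattern. A minimal sketch, assuming a hypothetical `hit_counts` dict (doc ID → retrieval count) pulled from your own retrieval logs: it returns the smallest fraction of docs that accounts for 80% of all retrievals.

```python
def coverage_fraction(hit_counts: dict[str, int], target: float = 0.80) -> float:
    """Smallest fraction of docs whose retrievals add up to `target` of the total."""
    counts = sorted(hit_counts.values(), reverse=True)  # hottest docs first
    total = sum(counts)
    if total == 0:
        return 0.0  # no traffic, no skew to measure
    covered = 0
    for i, count in enumerate(counts, start=1):
        covered += count
        if covered / total >= target:
            return i / len(counts)
    return 1.0


# Toy data: 10 docs, 100 retrievals in total
hits = {"a": 60, "b": 20, "c": 5, "d": 5, "e": 2,
        "f": 2, "g": 2, "h": 2, "i": 1, "j": 1}
print(coverage_fraction(hits))  # → 0.2 (2 of 10 docs cover 80% of retrievals)
```

A result near 0.2 means the 80/20 assumption holds for your traffic; a much higher value means a hot cache covers a smaller share of queries per token spent.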

Identifying Hot Docs

Which docs count as "hot"?

Hit count: retrieved in more than 30% of queries over the last 30 days
Recency: retrieved steadily throughout the past week
Importance: manually flagged as "always include"

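```python
def identify_hot_docs(threshold_hit_rate=0.30, days=30, max_docs=50):
    """Find docs retrieved in more than threshold_hit_rate of queries over the last `days` days."""
    # count_queries_last_n_days / retrieve_stats: your analytics store (placeholders)
    total_queries = count_queries_last_n_days(days)
    if total_queries == 0:
        return []  # no traffic yet, nothing is hot
    hit_rates = {
        doc_id: hit_count / total_queries
        for doc_id, hit_count in retrieve_stats(days)
    }
    hot = [doc_id for doc_id, rate in hit_rates.items() if rate > threshold_hit_rate]
    # Most-retrieved first, so the cap keeps the hottest docs (context budget)
    hot.sort(key=hit_rates.get, reverse=True)
    return hot[:max_docs]
```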
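The function above uses only the hit-count criterion. The three criteria in the list can be combined along these lines; this is a sketch that assumes each criterion is sufficient on its own (the text does not say how they combine), and `DocStats` with its field names is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocStats:
    doc_id: str
    hits_30d: int            # retrievals in the last 30 days
    active_days_last_7: int  # days in the past week with at least one retrieval
    pinned: bool = False     # manual "always include" flag


def is_hot(stats: DocStats, total_queries_30d: int,
           min_hit_rate: float = 0.30, min_active_days: int = 7) -> bool:
    # Importance: a manual pin always wins
    if stats.pinned:
        return True
    if total_queries_30d == 0:
        return False
    # Hit count: share of all queries that retrieved this doc
    frequent = stats.hits_30d / total_queries_30d > min_hit_rate
    # Recency: retrieved on every day of the past week (by default)
    steady = stats.active_days_last_7 >= min_active_days
    return frequent or steady
```

Requiring both `frequent` and `steady` instead would give a stricter, smaller hot set; which rule fits depends on how much context budget you have.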

Hot Cache Structure

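```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # example model ID; use the model you deploy
# GENERAL_KB (the always-included base prompt) and vector_db are defined elsewhere

# Periodic (hourly or daily) hot cache rebuild
def rebuild_hot_cache():
    hot_ids = identify_hot_docs()
    hot_docs = [load_doc(doc_id) for doc_id in hot_ids]  # load_doc: placeholder doc-store lookup
    hot_text = "\n\n---\n\n".join(
        f"# Doc: {d.title}\n\n{d.content}" for d in hot_docs
    )
    # ~30K-50K tokens in total
    return hot_ids, hot_text

# Global, refreshed by the periodic rotation; IDs and text must stay in sync
HOT_DOC_IDS, HOT_CACHE_BLOCK = rebuild_hot_cache()

def hybrid_query(user_query: str):
    # Retrieve top-5, skipping docs already in the hot cache
    # (vector_db.search and exclude_ids stand in for your vector store's API)
    retrieved = vector_db.search(
        user_query,
        k=5,
        exclude_ids=set(HOT_DOC_IDS),
    )

    return client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=[
            # Stable KB first: rotating the hot block does not invalidate this cached prefix
            {"type": "text", "text": GENERAL_KB,
             "cache_control": {"type": "ephemeral", "ttl": "1h"}},
            {"type": "text", "text": HOT_CACHE_BLOCK,  # ← hot docs
             "cache_control": {"type": "ephemeral", "ttl": "1h"}},
        ],
        messages=[
            {"role": "user", "content": (
                "# Additional documents (query-specific)\n\n"
                + "\n\n".join(r.content for r in retrieved)  # assumes results expose .content
                + f"\n\n# Question\n\n{user_query}"
            )}
        ],
    )
```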
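The 50-doc cap in `identify_hot_docs` and the ~30K-50K token comment are stand-ins for a token budget. One way to enforce the budget directly is to pack docs greedily by hit rate until it is full. A minimal sketch: the `(doc_id, hit_rate, text)` tuple shape is an assumption, and token counts use a rough 4-characters-per-token heuristic rather than a real tokenizer.

```python
def pack_hot_docs(candidates, budget_tokens=40_000):
    """Greedily select the highest-hit-rate docs that fit in budget_tokens.

    candidates: iterable of (doc_id, hit_rate, text) tuples (assumed shape).
    Returns (chosen_doc_ids, estimated_tokens_used).
    """
    chosen, used = [], 0
    for doc_id, _rate, text in sorted(candidates, key=lambda c: c[1], reverse=True):
        cost = len(text) // 4 + 1  # rough estimate: ~4 characters per token
        if used + cost > budget_tokens:
            continue  # too big for what's left; a smaller, cooler doc may still fit
        chosen.append(doc_id)
        used += cost
    return chosen, used


docs = [("a", 0.50, "x" * 400), ("b", 0.40, "x" * 400_000), ("c", 0.30, "x" * 800)]
print(pack_hot_docs(docs, budget_tokens=1_000))  # → (['a', 'c'], 302)
```

For exact counts, the Anthropic Python SDK's `client.messages.count_tokens` can replace the character heuristic, at the cost of an extra API call per rebuild.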

Hot/cold architecture implementation

Hot Cache Rotation Frequency

Trade-off:

Updating too often (hourly) → frequent cache misses, and every rebuild pays for a fresh cache write
Updating too rarely (weekly) → misses shifts in the access pattern
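To put a number on the first trade-off: each rotation changes the hot block, so the next request pays for a fresh cache write. A back-of-envelope sketch, assuming a 40K-token hot block, a hypothetical base input price of $3 per million tokens, and a 1-hour cache write priced at 2× the base input rate (Anthropic's multiplier at the time of writing; check current pricing).

```python
def daily_rewrite_cost(block_tokens: int, rebuilds_per_day: int,
                       base_usd_per_mtok: float, write_multiplier: float = 2.0) -> float:
    """USD per day spent re-writing the hot block into the cache after rebuilds."""
    per_write = block_tokens / 1_000_000 * base_usd_per_mtok * write_multiplier
    return rebuilds_per_day * per_write


hourly = daily_rewrite_cost(40_000, rebuilds_per_day=24, base_usd_per_mtok=3.0)
daily = daily_rewrite_cost(40_000, rebuilds_per_day=1, base_usd_per_mtok=3.0)
print(f"hourly: ${hourly:.2f}/day, daily: ${daily:.2f}/day")
# → hourly: $5.76/day, daily: $0.24/day
```

This counts only rotation-triggered writes for a single cached block. The 1-hour TTL is refreshed each time the cached prefix is read, so with steady traffic a daily schedule pays for roughly one write per day; gaps in traffic longer than the TTL add writes under either schedule.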

In practice: rotate daily. Recompute the hot set at 03:00 each night and start the day with a fresh cache.
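A minimal scheduling sketch for the 03:00 rebuild, assuming a long-running worker process and naive local time (DST transitions are ignored). `rebuild` is any callable that recomputes the hot set and swaps in the new cache block; in many deployments a cron entry such as `0 3 * * *` does the same job.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now: datetime, hour: int = 3) -> float:
    """Seconds from `now` until the next `hour`:00 (today if still ahead, else tomorrow)."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily_rotation(rebuild, hour: int = 3) -> None:
    """Block forever, calling rebuild() once a day at `hour`:00 local time."""
    while True:
        time.sleep(seconds_until_next_run(datetime.now(), hour))
        rebuild()


print(seconds_until_next_run(datetime(2026, 6, 23, 2, 0)))    # → 3600.0
print(seconds_until_next_run(datetime(2026, 6, 23, 23, 30)))  # → 12600.0
```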

Real-World Examples

Perplexity's approach (speculative; the system is closed source):

Hot: the most-searched topics (news, popular Wikipedia entries), refreshed hourly
Cold: long-tail web pages and deep links, served via retrieval
Quotes and citations always go through cold (real-time) retrieval

Pareto in Caching

The hot/cold pattern applies the Pareto principle (80/20) to caching. Many production hybrid systems follow some variant of it.


In the Next Lesson

Hands-on: comparing pure RAG with the hybrid approach on a 50-doc knowledge base.
