Lab: Cutting a 10-Step Agent's Cost by 85%

Author: Şükrü Yusuf KAYA

An e-commerce "order assistant" agent completes its task in 10 steps. This lab compares the cost with caching on versus off.


16 min read

5/14/2026

Advanced

Lab #11: 10-Step Agent Caching

Scenario: a "research a laptop and place an order" agent whose task completes in 10 tool calls.

Goal: show savings of 85% or more with caching.

Agent Logic

python

AGENT_TASK = "Research the MacBook Air M4 16GB, find the best-priced option, and place an order."
 
EXPECTED_STEPS = [
    "search_products('macbook air m4 16gb')",
    "get_product_details('SKU-MAC-001')",
    "get_product_details('SKU-MAC-002')",  # second product, for comparison
    "compare_prices(['SKU-MAC-001', 'SKU-MAC-002'])",
    "check_stock('SKU-MAC-001')",
    "calculate_shipping('İstanbul', 2.5)",
    "apply_coupon('cart_xyz', 'WELCOME10')",
    "create_order(...)",
    "send_confirmation_email(...)",
    "log_interaction(...)",
]

10-step agent task
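
Each step above assumes the tool has been declared to the model with a name, a description, and a JSON Schema for its input. A minimal sketch of what two such declarations might look like follows, along with a helper that marks the last one as a cache breakpoint without mutating the shared list. The descriptions, schema fields, and the names `EXAMPLE_TOOLS` and `with_cache_breakpoint` are illustrative assumptions, not the lab's actual tool definitions.

```python
import copy

# Hypothetical tool declarations (name, description, input_schema).
# The field contents are illustrative only.
EXAMPLE_TOOLS = [
    {
        "name": "search_products",
        "description": "Search the catalog and return matching SKUs.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "check_stock",
        "description": "Return the number of units in stock for a SKU.",
        "input_schema": {
            "type": "object",
            "properties": {"sku": {"type": "string"}},
            "required": ["sku"],
        },
    },
]


def with_cache_breakpoint(tools, ttl="1h"):
    """Return a deep copy of `tools` whose last entry carries a cache_control marker."""
    if not tools:
        return []
    marked = copy.deepcopy(tools)
    marked[-1]["cache_control"] = {"type": "ephemeral", "ttl": ttl}
    return marked


cached_tools = with_cache_breakpoint(EXAMPLE_TOOLS)
```

Copying before marking keeps the uncached baseline run free of stray `cache_control` fields, so the same tool list can serve both sides of the comparison.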

python

import anthropic
import time
 
client = anthropic.Anthropic()
SYSTEM = "You are an e-commerce assistant. Use the tools."
TOOLS = [...]  # 10 tool definitions, ~3K tokens
 
def run_agent(use_cache: bool):
    """Run the 10-step agent and return (total cost in USD, total latency in seconds)."""
    conversation = [{"role": "user", "content": AGENT_TASK}]
    total_cost = 0.0
    total_latency = 0.0
 
    for step in range(1, 11):
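        if use_cache:
            # Cache the static prefix (system prompt + tool definitions) for 1 hour.
            system_blocks = [{
                "type": "text",
                "text": SYSTEM,
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            }]
            tools_with_cache = [
                *TOOLS[:-1],
                {**TOOLS[-1], "cache_control": {"type": "ephemeral", "ttl": "1h"}},
            ]
            # Clear breakpoints left on earlier tool_results: a request accepts
            # at most 4 cache breakpoints, and two are already used above.
            for msg in conversation:
                if isinstance(msg.get("content"), list):
                    for block in msg["content"]:
                        if isinstance(block, dict):
                            block.pop("cache_control", None)
            # Put the moving breakpoint on the most recent tool_result.
            if conversation and isinstance(conversation[-1].get("content"), list):
                last_content = conversation[-1]["content"]
                if last_content and last_content[-1].get("type") == "tool_result":
                    last_content[-1]["cache_control"] = {"type": "ephemeral", "ttl": "5m"}
        else:
            system_blocks = SYSTEM  # plain string, no caching
            tools_with_cache = TOOLS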
 
        start = time.perf_counter()
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=300,
            system=system_blocks,
            tools=tools_with_cache,
            messages=conversation,
        )
        latency = time.perf_counter() - start
        total_latency += latency
 
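        u = resp.usage
        # Cache writes are billed by TTL: 1.25x base input for 5m, 2x for 1h.
        # Split them when the usage object reports the breakdown; otherwise
        # everything is priced at the 5m rate.
        breakdown = getattr(u, "cache_creation", None)
        write_1h = getattr(breakdown, "ephemeral_1h_input_tokens", 0) or 0
        write_5m = (u.cache_creation_input_tokens or 0) - write_1h
        cost = (
            u.input_tokens / 1e6 * 3.0
            + write_5m / 1e6 * 3.75
            + write_1h / 1e6 * 6.0
            + (u.cache_read_input_tokens or 0) / 1e6 * 0.30
            + u.output_tokens / 1e6 * 15.0
        )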
        total_cost += cost
 
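        # Append the assistant turn, then answer every tool_use block with a simulated result.
        conversation.append({"role": "assistant", "content": resp.content})
        tool_uses = [b for b in resp.content if b.type == "tool_use"]
        if not tool_uses:
            break  # the model stopped calling tools, so the task is finished
        conversation.append({
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": b.id, "content": "simulated result"}
                for b in tool_uses
            ],
        })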
 
    return total_cost, total_latency
 
print("═══ Cache OFF ═══")
cost_off, lat_off = run_agent(use_cache=False)
print(f"Cost: ${cost_off:.4f}  |  Latency: {lat_off:.2f}s")
 
print("\n═══ Cache ON ═══")
cost_on, lat_on = run_agent(use_cache=True)
print(f"Cost: ${cost_on:.4f}  |  Latency: {lat_on:.2f}s")
 
savings = (cost_off - cost_on) / cost_off * 100
print(f"\nSavings: {savings:.1f}%")

Cache OFF vs. ON comparison

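Result

85.8% savings and a 33% speedup. For agents, this is concrete evidence of the value caching delivers: every step resends the same system prompt, tool definitions, and growing history, and a cache hit bills that prefix at a fraction of the base input price. Deploy agents to production with caching enabled.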
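
To see where the savings come from, the billing can be modeled offline without calling the API. The sketch below is a simplified cost model, not a reproduction of the lab run: the ~3K-token static prefix comes from the lab, while the 400 tokens added per turn, the 150 output tokens per step, and the 2x-base rate for 1-hour cache writes are assumptions made for illustration. It also ignores the tokens of the initial task message.

```python
# USD per million tokens. input/write_5m/read/output match the lab's cost formula;
# write_1h (2x base input) is an assumption added for the 1-hour prefix write.
PRICE = {"input": 3.00, "write_5m": 3.75, "write_1h": 6.00, "read": 0.30, "output": 15.00}


def run_cost(steps, prefix_tokens, tokens_per_turn, output_tokens, cached):
    """Estimate the total USD cost of an agent run under a simple growth model."""
    total = 0.0
    conversation = 0  # conversation tokens sent with the current request
    cached_conv = 0   # conversation tokens already cached by earlier requests
    for step in range(steps):
        if not cached:
            # Everything is billed as fresh input on every step.
            usd = (prefix_tokens + conversation) * PRICE["input"]
        else:
            new_conv = conversation - cached_conv
            # The static prefix is written once, then read on every later step.
            prefix_rate = PRICE["write_1h"] if step == 0 else PRICE["read"]
            usd = prefix_tokens * prefix_rate
            # Old history is read from cache; only the newest turn is written.
            usd += cached_conv * PRICE["read"] + new_conv * PRICE["write_5m"]
            cached_conv = conversation
        usd += output_tokens * PRICE["output"]
        total += usd / 1e6
        conversation += tokens_per_turn
    return total


cost_off = run_cost(10, 3000, 400, 150, cached=False)
cost_on = run_cost(10, 3000, 400, 150, cached=True)
savings_pct = (cost_off - cost_on) / cost_off * 100
print(f"off=${cost_off:.4f}  on=${cost_on:.4f}  savings={savings_pct:.1f}%")
```

The percentage this model produces depends heavily on the assumed token counts: the larger the static prefix and history are relative to the output, the closer the savings get to the 90% discount on cached reads. Output tokens are never discounted, so they set a floor on the cost.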

Optimization Tips

Filter tool results: truncate or summarize unnecessarily long results
Use Skills: with the Skills pattern, the system + tools prefix is cached more densely
Parallel tool calls: Claude can request several tools in a single turn by default (the 'disable_parallel_tool_use' option in 'tool_choice' turns this off), which reduces the number of round trips
Result-caching DB: when a tool is called again with the same arguments, serve the result from an external cache (Redis) instead of re-executing it
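
Two of these tips, truncating long results and reusing results for repeated arguments, can be sketched together. The example below uses an in-memory dictionary as a stand-in for Redis. The names `truncate_result`, `ToolResultCache`, `MAX_RESULT_CHARS`, and the `check_stock` stub are hypothetical and exist only for illustration.

```python
import json

MAX_RESULT_CHARS = 2000  # hypothetical budget; tune per tool


def truncate_result(text, limit=MAX_RESULT_CHARS):
    """Trim an oversized tool result and note how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return f"{text[:limit]}\n[truncated {dropped} characters]"


class ToolResultCache:
    """In-memory stand-in for an external cache such as Redis."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tool_name, arguments):
        # sort_keys makes the key independent of argument order.
        return f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"

    def call(self, tool_name, arguments, execute):
        """Return the cached result, or run `execute(**arguments)` and store it."""
        key = self._key(tool_name, arguments)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = truncate_result(execute(**arguments))
        self._store[key] = result
        return result


def check_stock(sku):
    """Hypothetical read-only tool used to exercise the cache."""
    return f"{sku}: 12 units in stock"


cache = ToolResultCache()
first = cache.call("check_stock", {"sku": "SKU-MAC-001"}, check_stock)
second = cache.call("check_stock", {"sku": "SKU-MAC-001"}, check_stock)
```

This kind of cache only suits read-only tools whose results stay valid for a while. Calls with side effects, such as `create_order` or `send_confirmation_email`, should always execute, and stock or price lookups likely need a short expiry.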

In the Next Lesson

Multi-agent / orchestrator caching: how do multiple agents share a common context?
