Back to full roadmap
topiccore
Prompt Caching
Cache repeated long context — 50-90% cost + latency reduction.
2 hours2 resources1 prereqs
Anthropic prompt caching and OpenAI prompt caching work automatically (or via cache_control for manual control). Even if the same system prompt + RAG context + examples are sent each request, after the first call they're served from cache.
Pattern:
- Static parts first (system, examples, knowledge base)
- Variable parts last (user turn)
- TTL: 5 min (Claude) — for hot endpoints, always warm