Skip to content
Back to full roadmap
topiccore

Prompt Caching

Cache repeated long context — 50-90% cost + latency reduction.

2 hours2 resources1 prereqs

Anthropic prompt caching and OpenAI prompt caching work automatically (or via cache_control for manual control). Even if the same system prompt + RAG context + examples are sent each request, after the first call they're served from cache.

Pattern:

  • Static parts first (system, examples, knowledge base)
  • Variable parts last (user turn)
  • TTL: 5 min (Claude) — for hot endpoints, always warm

Prerequisites

Resources(2)