topiccore

Prompt Caching

Cache repeated long context — 50-90% cost + latency reduction.

2 hours2 resources1 prereqs

Anthropic prompt caching and OpenAI prompt caching work automatically (or via cache_control for manual control). Even if the same system prompt + RAG context + examples are sent each request, after the first call they're served from cache.

Pattern:

Static parts first (system, examples, knowledge base)
Variable parts last (user turn)
TTL: 5 min (Claude) — for hot endpoints, always warm

Prerequisites

Context Window

The total tokens the model can hold in one turn. Must-know.

→

Resources(2)

DDocs(2)

Anthropic — Prompt caching

· en

freeofficial

OpenAI — Prompt caching

· en

freeofficial

Constraint Decoding

Long-Context Techniques

Open the full interactive roadmap