KV Cache

A mechanism that stores the key and value tensors computed for previous tokens, avoiding repeated attention work during autoregressive generation.

The KV cache is one of the most fundamental optimizations in LLM inference. In autoregressive generation, each new token attends to all previous tokens; caching the key and value projections of those tokens avoids recomputing them at every step, which yields major speed gains in long generations. However, cache memory grows linearly with context length, so careful resource management is required.
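
To make the mechanism concrete, here is a minimal sketch of a single-head KV cache in NumPy. The names (`KVCache`, `decode_step`) and the single-head, unbatched setup are illustrative assumptions, not any particular library's API; real implementations are multi-head, batched, and preallocate cache memory.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Append-only store of per-token key/value vectors (single head, illustrative)."""
    def __init__(self, d_head):
        self.keys = np.empty((0, d_head))
        self.values = np.empty((0, d_head))

    def append(self, k, v):
        # Cache grows by one row per generated token: memory is O(sequence length).
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

def decode_step(x, W_q, W_k, W_v, cache):
    """One autoregressive step: project only the new token's embedding,
    then attend over the cached keys/values instead of re-projecting
    every previous token."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    cache.append(k, v)
    scores = cache.keys @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ cache.values  # attention output for the new token

# Usage: generate five steps, reusing the cache across steps.
rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
cache = KVCache(d)
for token_embedding in rng.standard_normal((5, d)):
    out = decode_step(token_embedding, W_q, W_k, W_v, cache)
```

Without the cache, step t would recompute key/value projections for all t previous tokens; with it, each step projects only one token, trading memory for compute.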