Technical Glossary: Generative AI and LLM
KV Cache
A mechanism that stores previous attention computations to reduce repeated work in autoregressive generation.
The KV cache is one of the most fundamental components of LLM inference optimization. By storing the key and value representations of prior tokens, it avoids recomputing them at every decoding step, which yields major speed gains in long generations. However, the cache's memory footprint grows linearly with context length, so careful resource management is required.
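The mechanism can be sketched in a few lines. The following is a minimal single-head toy example, not a real model: the projection matrices (`W_q`, `W_k`, `W_v`) are random placeholders, and only the new token's key and value are computed each step while earlier ones are read from the cache.

```python
import numpy as np

def attention(q, K, V):
    # Single-query attention over all cached keys/values (one head).
    scores = q @ K.T / np.sqrt(q.shape[-1])   # (1, t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over the t cached positions
    return weights @ V                        # (1, d)

class KVCache:
    """Toy KV cache: append each new token's key/value instead of
    recomputing K and V for the entire prefix at every step."""
    def __init__(self, d_model):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

# Hypothetical projection weights, random for illustration only.
rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

cache = KVCache(d)
for step in range(5):                  # autoregressive decoding loop
    x = rng.standard_normal((1, d))    # current token's hidden state
    cache.append(x @ W_k, x @ W_v)     # only the NEW token's K/V are computed
    out = attention(x @ W_q, cache.keys, cache.values)

print(cache.keys.shape)  # → (5, 8): the cache grows with context length
```

Note the trade-off the sketch makes visible: each step's attention cost stays proportional to the prefix length rather than requiring a full recomputation of K and V, but `cache.keys` and `cache.values` keep growing, which is exactly the memory pressure mentioned above.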