# Key-Value Cache

> Source: https://sukruyusufkaya.com/en/glossary/key-value-cache
> Updated: 2026-05-13T19:59:34.033Z
> Type: glossary
> Category: derin-ogrenme

**TLDR:** A mechanism that speeds up autoregressive Transformer inference by storing the key and value vectors computed for previous tokens, so they are not recomputed at every generation step.

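<p>A key-value (KV) cache avoids repeated computation during token-by-token (autoregressive) generation in large language models. In each attention layer, the key and value vectors computed for earlier tokens are stored. At every new step, only the newest token's query, key, and value have to be computed, and attention runs over the cached keys and values instead of re-projecting the entire prefix. This substantially speeds up inference and is critical for practical deployment. The trade-off is memory: the cache grows linearly with sequence length (and with the number of layers, heads, and batch size). In long-context generation it is therefore both one of the main efficiency mechanisms and a major consumer of accelerator memory.</p>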
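To make the mechanism concrete, here is a minimal sketch in plain Python, assuming a single attention head, toy hand-picked weights, and no positional encoding, batching, or multiple layers. The names `KVCache`, `decode_step_cached`, and `decode_step_uncached` are hypothetical and not taken from any library. The sketch checks that decoding with a cache gives the same attention output as recomputing keys and values for the whole prefix at every step.

```python
import math

def matvec(W, x):
    # W is a list of rows (d_out x d_in); x is a vector of length d_in.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given positions.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

class KVCache:
    # Hypothetical minimal cache: one key and one value vector per past token.
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_step_cached(x, cache, Wq, Wk, Wv):
    # Project only the newest token; earlier keys/values come from the cache.
    q, k, v = matvec(Wq, x), matvec(Wk, x), matvec(Wv, x)
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)

def decode_step_uncached(xs, Wq, Wk, Wv):
    # Without a cache: re-project keys/values for the whole prefix every step.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    q = matvec(Wq, xs[-1])
    return attend(q, keys, values)

# Toy 3-dimensional weights and token embeddings (illustrative values only).
Wq = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.4, 0.2, 0.9]]
Wk = [[0.7, 0.1, -0.3], [-0.2, 0.6, 0.4], [0.1, -0.5, 0.8]]
Wv = [[0.2, 0.9, -0.1], [0.6, -0.3, 0.5], [-0.7, 0.4, 0.2]]
tokens = [[1.0, 0.0, 0.5], [0.2, 1.0, -0.3], [-0.6, 0.4, 1.0], [0.9, -0.8, 0.1]]

cache = KVCache()
for t in range(len(tokens)):
    cached = decode_step_cached(tokens[t], cache, Wq, Wk, Wv)
    full = decode_step_uncached(tokens[: t + 1], Wq, Wk, Wv)
    assert all(abs(a - b) < 1e-12 for a, b in zip(cached, full))

print(len(cache.keys))  # 4
```

In this sketch, the cached step performs three matrix-vector projections per token, while the uncached step performs 2(t+1)+1 at step t. Attention itself still reads every cached key and value, so per-step attention cost remains linear in the prefix length, and the cache's memory grows by one key and one value vector per token (per head and per layer in a real model).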