
Paged Attention

A KV-cache memory-management technique that stores attention keys and values in fixed-size blocks rather than contiguous per-request buffers, improving memory efficiency and resource use when serving many requests concurrently.

Paged attention is important for LLM serving efficiency, especially under high concurrency. Instead of reserving a contiguous KV-cache buffer sized for each request's maximum possible length, it allocates fixed-size blocks on demand and tracks them through a per-request block table, much like virtual-memory paging in an operating system. This reduces fragmentation, allows far more requests to share the same GPU memory, and keeps resource use balanced in long-context and multi-user scenarios. It is a good example of how deeply systems engineering and model behavior are intertwined in large-model deployment.
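To make the block-table bookkeeping concrete, here is a minimal sketch in Python. It is not the API of any real serving engine such as vLLM; the names KVBlockAllocator, Sequence, and BLOCK_SIZE are illustrative assumptions, and real implementations manage the actual key/value tensors on the GPU rather than just block ids.

```python
BLOCK_SIZE = 16  # tokens per KV block (illustrative value)


class KVBlockAllocator:
    """Pool of fixed-size physical KV-cache blocks, analogous to page frames."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Per-request block table mapping logical token positions to physical blocks."""

    def __init__(self, allocator: KVBlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is allocated only when the current one fills up,
        # so wasted space is bounded by at most one partial block per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def release(self) -> None:
        # Return all blocks to the shared pool so other requests can reuse them.
        for block_id in self.block_table:
            self.allocator.free(block_id)
        self.block_table.clear()


if __name__ == "__main__":
    allocator = KVBlockAllocator(num_blocks=1024)
    seq = Sequence(allocator)
    for _ in range(40):        # a 40-token request occupies ceil(40 / 16) = 3 blocks
        seq.append_token()
    print(seq.block_table)     # three physical block ids drawn from the shared pool
    seq.release()              # memory becomes available to other requests immediately
```

The key point of the sketch is that a request's KV cache no longer needs to be contiguous or pre-sized for the longest possible output: blocks are allocated one at a time as tokens are generated and returned to the pool the moment the request finishes.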