Technical Glossary: Generative AI and LLMs
Paged Attention
A memory-management technique for attention that stores the KV cache in small fixed-size blocks ("pages") rather than one contiguous buffer per request, reducing fragmentation and improving resource use when serving many requests concurrently.
Paged attention is important for LLM serving efficiency, especially under high concurrency. Instead of reserving a large contiguous KV-cache buffer sized for each request's maximum length, it maps a request's keys and values to fixed-size pages through a per-request block table, much as an operating system maps virtual memory to physical frames. Pages are allocated only as tokens are generated and returned to a shared pool when a request finishes, which cuts fragmentation and over-allocation and enables more balanced resource use in long-context and multi-user scenarios. It is a good example of how deeply systems engineering and model behavior are intertwined in large-model deployment.
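The bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not a real serving framework's API: the names `PagedKVCache`, `BLOCK_SIZE`, `append_token`, and `free` are assumptions made for this example, and real implementations store actual key/value tensors on the GPU rather than just slot indices.

```python
BLOCK_SIZE = 16  # tokens per page (illustrative; real systems pick this per kernel)

class PagedKVCache:
    """Toy page allocator: maps each request's tokens to fixed-size physical blocks."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # shared pool of physical blocks
        self.block_tables = {}  # request id -> list of physical block ids
        self.lengths = {}       # request id -> tokens written so far

    def append_token(self, req_id: str) -> tuple[int, int]:
        """Reserve a (physical block, offset) slot for a request's next token."""
        table = self.block_tables.setdefault(req_id, [])
        n = self.lengths.get(req_id, 0)
        if n % BLOCK_SIZE == 0:  # current page is full: allocate a fresh one on demand
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a real server would preempt or swap")
            table.append(self.free_blocks.pop())
        self.lengths[req_id] = n + 1
        return table[-1], n % BLOCK_SIZE

    def free(self, req_id: str) -> None:
        """Return a finished request's pages to the pool for immediate reuse."""
        self.free_blocks.extend(self.block_tables.pop(req_id, []))
        self.lengths.pop(req_id, None)

cache = PagedKVCache(num_blocks=4)
slots = [cache.append_token("req-A") for _ in range(20)]
# 20 tokens occupy only ceil(20/16) = 2 physical blocks, wherever they happen to be
print(len({block for block, _ in slots}))  # 2
cache.free("req-A")
print(len(cache.free_blocks))  # 4, all pages reclaimed for other requests
```

The key point the sketch shows: memory is granted one page at a time as tokens arrive, so a short request never pins memory sized for the maximum context, and finished requests' pages are recycled across the whole batch.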
