# KV Cache + Paged Attention: Inference Serving Optimization — vLLM Paged Attention and Continuous Batching

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/kv-cache-paged-attention-vllm-continuous-batching
> Updated: 2026-05-13T13:00:27.737Z
> Category: LLM Engineering
> Module: Module 8: Attention Mathematics — The Heart of Transformer

**TLDR:** LLM inference serving optimization: KV cache anatomy (prefill vs. decode phases), the memory fragmentation problem, paged attention (vLLM, Kwon et al. 2023), continuous batching, and dynamic memory allocation. Llama-3 production serving math: throughput, latency trade-offs, multi-tenancy.
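To make the "production serving math" mentioned above concrete, here is a minimal back-of-envelope sketch of KV cache sizing. The dimensions used (32 layers, 8 KV heads under GQA, head dimension 128, fp16) are the commonly cited Llama-3-8B values; treat the exact numbers and the `kv_cache_bytes_per_token` helper as illustrative assumptions rather than the article's own code.

```python
# Back-of-envelope KV cache sizing (illustrative sketch, assumed Llama-3-8B-style config).

def kv_cache_bytes_per_token(num_layers: int = 32,
                             num_kv_heads: int = 8,
                             head_dim: int = 128,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of K and V stored per token across all layers (fp16 by default)."""
    # Factor of 2 accounts for the separate K and V tensors kept at every layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

per_token = kv_cache_bytes_per_token()      # 131,072 bytes ~= 128 KiB per token
per_seq_8k = per_token * 8192               # ~= 1 GiB for a single 8,192-token sequence
print(f"KV cache: {per_token / 1024:.0f} KiB/token, "
      f"{per_seq_8k / 2**30:.2f} GiB for an 8192-token sequence")
```

Numbers like these are why naive per-request pre-allocation fragments GPU memory, and why the block-based allocation of paged attention and the request interleaving of continuous batching matter for throughput.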

