# vLLM Internals: Continuous Batching + PagedAttention + Prefix Cache

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-vllm-internals-continuous-batching-paged
> Updated: 2026-05-14T14:43:00.866Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XV — Serving Engineering
**TLDR:** vLLM (Kwon et al., 2023) is the gold standard for production LLM serving. Continuous batching: requests are added to and removed from the batch at each decode step → no GPU idle time waiting for the slowest sequence. PagedAttention: the KV cache is managed in fixed-size blocks → no external fragmentation (waste is bounded by each sequence's last, partially filled block). Prefix cache: shared prompt prefixes such as system prompts are computed once and reused. Llama 3.1 8B on an RTX 4090: ~175 tok/s at batch=1, ~920 tok/s at batch=16.
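To see why fixed-size blocks eliminate external fragmentation, here is a minimal pure-Python sketch of the block-table bookkeeping behind PagedAttention. All names (`PagedKVCache`, `append`, `release`) are hypothetical; real vLLM manages GPU tensors and copy-on-write sharing, not Python lists.

```python
class PagedKVCache:
    """Toy PagedAttention-style allocator: the KV cache is a pool of
    fixed-size blocks, and each sequence holds a block table mapping
    its logical positions to physical blocks. Any free block can serve
    any sequence, so external fragmentation cannot occur; the only
    waste is the unused tail of each sequence's last block."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}  # seq_id -> number of tokens stored

    def append(self, seq_id: int) -> None:
        """Reserve KV-cache space for one new token of `seq_id`."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # last block full -> grab a fresh one
            if not self.free:
                # real vLLM would preempt/swap a sequence here
                raise MemoryError("KV cache exhausted")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id: int) -> None:
        """On completion, blocks return to the pool for immediate reuse."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


kv = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):        # 3 tokens with block_size=2 -> 2 blocks
    kv.append(seq_id=0)
print(len(kv.tables[0]))  # blocks held by sequence 0
kv.release(seq_id=0)
print(len(kv.free))       # all blocks back in the pool
```

Because blocks need not be contiguous, a finishing sequence's blocks can be handed to a newly admitted request on the very next step, which is what lets continuous batching keep memory utilization high.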

