# PagedAttention (vLLM): Block Table + Copy-on-Write + KV-Cache Fragmentation

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-paged-attention-vllm-internals
> Updated: 2026-05-14T14:42:59.652Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XIII — Custom Kernels & Performance Surgery

**TLDR:** A deep anatomy of vLLM's killer feature, PagedAttention: the KV-cache is split into 16-token blocks, a logical→physical block table maps each sequence onto them, and copy-on-write enables prefix sharing, cutting internal fragmentation to near zero. Includes CUDA implementation snippets and guided reading of the vLLM source. A prefix-cache hit rate of 50%+ yields roughly +60% throughput on an RTX 4090.
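Before diving into the CUDA internals, here is a minimal Python sketch of the two mechanisms the TLDR names: a per-sequence logical→physical block table and refcount-based copy-on-write for prefix sharing. All names here (`BlockAllocator`, `Sequence`, `append_token`, `fork`) are illustrative, not vLLM's actual classes, and the toy only tracks block indices; the real engine manages the KV tensors on the GPU.

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default)


class BlockAllocator:
    """Free-list allocator over a fixed pool of physical blocks, with refcounts."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))
        self.refcount = [0] * num_blocks

    def alloc(self) -> int:
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def incref(self, block: int) -> None:
        self.refcount[block] += 1

    def decref(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free.append(block)


class Sequence:
    """Tracks one request's logical->physical block table."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # index = logical block, value = physical block
        self.num_tokens = 0

    def append_token(self) -> None:
        slot = self.num_tokens % BLOCK_SIZE
        if slot == 0:
            # Last block is full (or the table is empty): grab a fresh block.
            self.block_table.append(self.allocator.alloc())
        elif self.allocator.refcount[self.block_table[-1]] > 1:
            # The partially filled tail block is shared: copy-on-write.
            # (A real engine would also copy the block's KV data on the GPU.)
            old = self.block_table[-1]
            self.block_table[-1] = self.allocator.alloc()
            self.allocator.decref(old)
        self.num_tokens += 1

    def fork(self) -> "Sequence":
        """Share every block with a child; private copies happen lazily via CoW."""
        child = Sequence(self.allocator)
        child.block_table = list(self.block_table)
        child.num_tokens = self.num_tokens
        for block in self.block_table:
            self.allocator.incref(block)
        return child


# Demo: fork after 20 tokens, then diverge by one token.
alloc = BlockAllocator(num_blocks=64)
parent = Sequence(alloc)
for _ in range(20):           # fills block 0, half-fills block 1
    parent.append_token()
child = parent.fork()         # both tables point at the same 2 physical blocks
child.append_token()          # CoW: child gets a private copy of the tail block
assert parent.block_table[0] == child.block_table[0]    # full prefix block shared
assert parent.block_table[-1] != child.block_table[-1]  # tail block diverged
```

Because every block is the same fixed size, the only memory ever wasted is the unfilled tail of each sequence's last block, which is why fragmentation is near zero rather than exactly zero.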