# CUDA Graph Capture: Static-Shape Inference Graph + Eliminating Latency Tail

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-cuda-graph-capture
> Updated: 2026-05-14T14:42:59.823Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XIII — Custom Kernels & Performance Surgery

**TLDR:** CUDA Graphs are a technique for eliminating kernel launch overhead. You "capture" a compute graph once, then "replay" it; each replay costs roughly 5-10 µs, versus 30-50 µs for a kernel launch. This matters most for inference latency, especially the decode fast path. vLLM uses it. Capture requires static shapes.
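
To make the figures in the TLDR concrete, here is a rough back-of-the-envelope sketch of CPU-side launch overhead per decode step. It assumes eager execution pays the launch cost once per kernel, while a captured graph pays a single replay cost for the whole step. The kernel count (`N_KERNELS = 200`) and the helper name `launch_overhead_us` are hypothetical, chosen only for illustration; the µs ranges are the ones quoted above.

```python
def launch_overhead_us(n_kernels, per_launch_us, replay_us):
    """Return (eager, graph) launch overhead per decode step, in microseconds."""
    eager = n_kernels * per_launch_us  # eager mode: one launch per kernel
    graph = replay_us                  # graph mode: one replay for the whole step
    return eager, graph


# Hypothetical kernel count per decode step (an assumption, not from the source).
N_KERNELS = 200

# Low and high ends of the ranges quoted in the TLDR.
low = launch_overhead_us(N_KERNELS, per_launch_us=30, replay_us=5)
high = launch_overhead_us(N_KERNELS, per_launch_us=50, replay_us=10)

for label, (eager, graph) in (("low", low), ("high", high)):
    print(f"{label}: eager {eager} us vs graph replay {graph} us per step")
```

Under these assumptions the eager path spends milliseconds per step on launches alone, which is why the saving shows up most in decode, where each step does little GPU work per kernel. Real numbers depend on the model, the kernel count, and how much launch latency overlaps with GPU execution.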

