# SGLang RadixAttention: Structured Output + JSON-Mode + Multi-Branch Caching

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-sglang-radixattention-structured-output
> Updated: 2026-05-14T14:43:01.039Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XV: Serving Engineering

**TLDR:** SGLang (Zheng et al. 2024) is an LLM serving engine and an alternative to vLLM. Its RadixAttention mechanism organizes the prefix cache as a trie/radix tree, so requests that branch from a common prefix share the cached part instead of recomputing it. SGLang also provides constrained decoding (regex, JSON schema) for native structured output and is optimized for agent workflows. The worked example serves Llama 3.1 8B with SGLang on an RTX 4090 and returns JSON-only responses.
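To make the multi-branch sharing idea concrete, here is a minimal sketch of a prefix cache keyed by token ids. It is a toy, uncompressed trie written for illustration, not SGLang's implementation: the names `Node`, `PrefixCache`, `match_prefix`, and `insert` are hypothetical, the token ids are made up, and a real radix tree would also compress single-child chains and hold KV-cache tensors rather than bare nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by token id. A real radix tree would merge
    # single-child chains into one edge; this toy trie does not.
    children: dict[int, "Node"] = field(default_factory=dict)


class PrefixCache:
    """Toy token-level trie standing in for a radix-tree prefix cache."""

    def __init__(self) -> None:
        self.root = Node()

    def match_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: list[int]) -> int:
        """Cache a sequence and return the number of newly stored tokens."""
        node, new = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = Node()
                new += 1
            node = node.children[tok]
        return new


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]              # hypothetical token ids
branch_a = system_prompt + [10, 11]       # first request
branch_b = system_prompt + [20, 21, 22]   # second request, same prefix

stored_a = cache.insert(branch_a)         # nothing cached yet: all tokens are new
reused_b = cache.match_prefix(branch_b)   # the shared system prompt is found
stored_b = cache.insert(branch_b)         # only the divergent suffix is added
```

In this sketch the second branch reuses the four prompt tokens and stores only its own three-token suffix, which is the saving that prefix sharing is meant to deliver when many agent or few-shot requests start from the same context.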

