
SGLang RadixAttention: Structured Output + JSON-Mode + Multi-Branch Caching

SGLang (Zheng et al. 2024) is the main alternative to vLLM. RadixAttention organizes the prefix cache as a Trie/Radix tree, enabling multi-branch sharing. It offers constrained decoding (regex, JSON schema), native structured output, and is optimized for agent workflows. Lab: serving Llama 3.1 8B with SGLang on an RTX 4090, with JSON-only responses.

Şükrü Yusuf KAYA
28 minute read
Advanced

1. SGLang vs vLLM — Which One?

| Feature | vLLM | SGLang |
|---|---|---|
| Continuous batching | ✅ | ✅ |
| Prefix caching | linear | Radix tree (multi-branch) |
| LoRA hot-swap | ✅ | ✅ (newer) |
| Structured output (JSON schema) | xgrammar (Q4 2024) | native, faster |
| Regex decoding | yes | yes, faster |
| Agent / chained calls | basic | optimized (forks share cache) |
| Throughput (single user) | 175 tok/s (AWQ) | 165 tok/s (AWQ) |
| Throughput (batch=16) | 920 tok/s | 950 tok/s |
| Ecosystem | most widespread | growing |
Cookbook recommendation:
  • General chat API → vLLM (ecosystem)
  • Agent workflows + structured output → SGLang
  • JSON-mode critical → SGLang
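The "Radix tree (multi-branch)" row above is the core idea behind RadixAttention: requests that share a leading token sequence pay for that prefix's KV cache only once. A toy sketch in plain Python (the `PrefixTrie`/`TrieNode` names and token-count bookkeeping are ours for illustration; real RadixAttention stores KV-cache tensors at the nodes and evicts them with an LRU policy):

```python
class TrieNode:
    """One node per token; children map token -> next node."""
    def __init__(self):
        self.children = {}


class PrefixTrie:
    """Toy model of RadixAttention's shared prefix cache."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        """Cache a token sequence; return how many leading tokens
        were already cached (i.e., KV computation that is skipped)."""
        node, reused, matching = self.root, 0, True
        for tok in tokens:
            if matching and tok in node.children:
                reused += 1          # prefix hit: reuse cached KV
            else:
                matching = False     # diverged: new branch starts here
                node.children.setdefault(tok, TrieNode())
            node = node.children[tok]
        return reused


system = [101, 7, 42, 9]                 # shared system-prompt tokens
cache = PrefixTrie()
print(cache.insert(system + [10, 11]))   # 0 -> first request computes everything
print(cache.insert(system + [20, 21]))   # 4 -> second branch reuses the shared prefix
```

A linear (single-sequence) prefix cache can only reuse the prefix of the one most recent request; the trie lets any number of agent forks hang off the same shared prefix, which is why the "forks share cache" row favors SGLang.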
```python
# === SGLang Structured Output Lab ===
import json

import sglang as sgl

# Backend: Llama 3.1 8B served by a local SGLang server
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

# Regex-constrained generation that forces a JSON object
@sgl.function
def extract_person(s, text):
    # Prompt (Turkish): "Extract the person info from the sentence below as JSON"
    s += sgl.user(f"Aşağıdaki cümleden kişi bilgilerini JSON olarak çıkar: {text}")
    s += sgl.assistant(sgl.gen(
        "output",
        regex=r'\{"name": "[a-zA-ZçğıöşüÇĞİÖŞÜ ]+", "age": \d+\}',
        max_tokens=100,
    ))

# Run ("Ahmet is a 35-year-old engineer.")
result = extract_person.run(text="Ahmet 35 yaşında bir mühendistir.")
data = json.loads(result["output"])
print(data)  # {"name": "Ahmet", "age": 35}

# Launch the server:
# python -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct \
#   --port 30000 --enable-radix-cache
```
SGLang JSON schema-constrained generation
✅ Deliverables
  1. JSON-mode Turkish entity extraction with SGLang.
  2. Compare the same prompt against vLLM with constrained decoding.
  3. Next lesson: 15.4 — TGI (Text Generation Inference).
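For deliverable 2, the same extraction can be requested from a vLLM OpenAI-compatible server via its `guided_json` guided-decoding parameter. A sketch of the request, assuming a server at `http://localhost:8000/v1` (URL/port are assumptions; the network call itself is commented out so the snippet runs without a server):

```python
import json

# JSON Schema equivalent of the regex used in the SGLang lab
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

request_kwargs = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{
        "role": "user",
        # Same Turkish prompt as the SGLang lab:
        # "Extract the person info from the sentence below as JSON: ..."
        "content": "Aşağıdaki cümleden kişi bilgilerini JSON olarak çıkar: "
                   "Ahmet 35 yaşında bir mühendistir.",
    }],
    "extra_body": {"guided_json": person_schema},  # vLLM constrained decoding
}

# With a running vLLM OpenAI-compatible server:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# resp = client.chat.completions.create(**request_kwargs)
# print(json.loads(resp.choices[0].message.content))

print(json.dumps(request_kwargs["extra_body"], ensure_ascii=False))
```

Note the design difference: SGLang's frontend constrains with a regex inside `sgl.gen`, while vLLM's server accepts a full JSON Schema per request; for timing the comparison, send the same sentence to both and compare latency and output validity.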
