
Liger Kernel Tour: RMSNorm + SwiGLU + GeGLU + Fused Linear+CE — Source Reading

Liger Kernel (LinkedIn, 2024) is a production-grade Triton kernel suite: fused RMSNorm, SwiGLU and GeGLU activations, fused RoPE rotary embeddings, fused linear + cross-entropy (the big memory saver), and a chunked CrossEntropy. On an RTX 4090, Llama 3.1 8B fine-tuning throughput improves by about 20% and peak memory drops by about 30%. What you take away from reading the source: production Triton patterns (a minimal RMSNorm sketch follows below).

Şükrü Yusuf KAYA
26 minute read
Advanced
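
Before the drop-in usage below, it helps to see the shape of the thing you will be reading. What follows is not Liger's actual kernel; it is a minimal Triton RMSNorm forward sketch with illustrative names (_rms_norm_fwd, rms_norm), one program per row, and the whole row held in a single block. The real kernel adds a fused backward, casting modes, and more careful offset handling, but the core pattern is the same.
python
# Minimal RMSNorm forward sketch in Triton (illustrative, not Liger's source)
import torch
import triton
import triton.language as tl
 
@triton.jit
def _rms_norm_fwd(X, W, Y, stride, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # One program instance normalizes one row of X (shape [n_rows, n_cols]).
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(X + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: y = x / sqrt(mean(x^2) + eps) * w  (no mean subtraction, unlike LayerNorm)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(W + cols, mask=mask, other=0.0).to(tl.float32)
    y = (x / rms) * w
    tl.store(Y + row * stride + cols, y.to(Y.dtype.element_ty), mask=mask)
 
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Flatten to 2D [rows, hidden] and launch one program per row.
    x2d = x.reshape(-1, x.shape[-1]).contiguous()
    y = torch.empty_like(x2d)
    n_rows, n_cols = x2d.shape
    BLOCK_SIZE = triton.next_power_of_2(n_cols)  # whole row fits in one block
    _rms_norm_fwd[(n_rows,)](x2d, weight, y, x2d.stride(0), n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return y.reshape(x.shape)
Design note: holding the whole row in one block keeps the sum-of-squares reduction inside a single program, so each row is read from HBM once, normalized, and written back once. That read-reduce-write-once structure is the thing to look for when you open the real kernel files.
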
python
# === Liger Kernel usage — Llama 3.1 8B fine-tuning ===
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM
 
# Apply Liger replacements
apply_liger_kernel_to_llama(
    rope=True,                           # Fused RoPE
    cross_entropy=False,                 # Skip if not using lm_head CE
    fused_linear_cross_entropy=True,     # Fused linear + CE (huge memory save)
    rms_norm=True,                       # Fused RMSNorm
    swiglu=True,                         # Fused SwiGLU
)
 
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Meta-Llama-3.1-8B-Instruct",
torch_dtype="bfloat16",
attn_implementation="flash_attention_2",
)
# Now model uses Liger kernels everywhere — drop-in replacement
 
# Bench (RTX 4090 + Llama 3.1 8B QLoRA):
# Vanilla HF + FA2: 1.78 step/s, peak 13.4 GB
# + Liger Kernel: 2.14 step/s (+20%), peak 9.5 GB (-29%)
# + Unsloth (everything): 3.10 step/s, peak 11.8 GB
Liger Kernel + Llama 8B FT
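
The biggest memory win in the config above comes from fused_linear_cross_entropy=True. Stripped down to plain PyTorch, the idea is: never materialize the full [batch*seq, vocab] logits tensor; run the lm_head matmul and the CE loss chunk by chunk. The sketch below is illustrative (the function name and chunk size are not Liger's API) and only caps forward/eval memory; Liger's Triton version also computes gradients chunk by chunk in a custom backward, so the saving carries over to training, which this naive version does not.
python
# Chunked linear + cross-entropy sketch (illustrative, not Liger's API)
import torch
import torch.nn.functional as F
 
def chunked_linear_cross_entropy(hidden, lm_head_weight, labels, chunk_size=4096):
    # hidden: [N, hidden_dim] (flattened batch*seq)
    # lm_head_weight: [vocab, hidden_dim]
    # labels: [N], with -100 marking ignored positions
    total_loss = hidden.new_zeros(())
    total_tokens = 0
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        logits = h @ lm_head_weight.T  # only [chunk, vocab] is alive at once
        total_loss = total_loss + F.cross_entropy(
            logits.float(), y, ignore_index=-100, reduction="sum"
        )
        total_tokens += (y != -100).sum().item()
    return total_loss / max(total_tokens, 1)
Rough arithmetic for why this matters: with Llama 3.1's ~128K vocabulary, the full fp32 logits for a 4096-token batch are about 4096 × 128000 × 4 bytes ≈ 2 GB; chunking caps that at one chunk's worth.
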
✅ Deliverable
  1) Install Liger Kernel. 2) Benchmark Llama 8B fine-tuning with Liger on vs. off (a minimal harness sketch follows below). 3) Next lesson: 13.5 — PagedAttention (vLLM).
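
For item 2, here is a minimal on/off microbenchmark sketch, assuming a synthetic batch and forward + backward only: no optimizer step and no QLoRA, so it will not reproduce the 4090 numbers above and needs either enough VRAM for the full bf16 model or a PEFT/4-bit setup added on top. The bench function and its defaults are illustrative. Run each configuration in a fresh Python process, because the Liger patch modifies the Llama modeling classes globally and must be applied before the model is loaded.
python
# On/off microbenchmark sketch (synthetic batch, forward + backward only)
import time
import torch
from transformers import AutoModelForCausalLM
 
def bench(use_liger: bool, model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
          steps=10, seq_len=2048):
    if use_liger:
        # Patch the Llama modeling classes before instantiating the model.
        from liger_kernel.transformers import apply_liger_kernel_to_llama
        apply_liger_kernel_to_llama(
            rope=True, rms_norm=True, swiglu=True,
            cross_entropy=False, fused_linear_cross_entropy=True,
        )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    ).cuda()
    model.gradient_checkpointing_enable()
    model.train()
    input_ids = torch.randint(0, model.config.vocab_size, (1, seq_len), device="cuda")
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(steps):
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    peak_gb = torch.cuda.max_memory_allocated() / 2**30
    print(f"liger={use_liger}: {steps / dt:.2f} step/s, peak {peak_gb:.1f} GB")
Call bench(use_liger=False) and bench(use_liger=True) in separate runs and compare step/s and peak GB.
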

Yorumlar & Soru-Cevap

(0)
Yorum yazmak için giriş yap.
Yorumlar yükleniyor...

İlgili İçerikler