Liger Kernel Tour: RMSNorm + SwiGLU + GeGLU + Fused Linear+CE — Source Reading
Liger Kernel (LinkedIn, 2024) is a production-grade Triton kernel suite: fused RMSNorm, fused SwiGLU/GeGLU/GeLU activations, fused RoPE, fused linear + cross-entropy (the big memory saver), and a chunked CrossEntropy kernel. On an RTX 4090, Llama 3.1 8B fine-tuning throughput improves by about +20% and peak memory drops about -30%. This lesson is a source reading of production Triton patterns.
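To ground the source reading, here is a minimal sketch of the fused-RMSNorm pattern Liger implements in Triton: one program per row, squares summed in registers, and normalize + scale fused so no intermediate tensor is written back to HBM. This is illustrative only (names like rmsnorm_fwd are mine, and the kernel assumes a contiguous 2-D input); Liger's real kernel also fuses the backward pass and handles strides, offsets, and dtype casting.

# === Sketch: fused RMSNorm forward in Triton (illustrative, not Liger's code) ===
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_fwd(X, W, Y, N, eps, BLOCK_N: tl.constexpr):
    # One program instance normalizes one row of a contiguous [M, N] matrix.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_N)
    mask = cols < N
    x = tl.load(X + row * N + cols, mask=mask, other=0.0).to(tl.float32)
    # Fused path: mean of squares -> inverse RMS -> scale by weight,
    # all in registers; nothing intermediate touches global memory.
    inv_rms = 1.0 / tl.sqrt(tl.sum(x * x, axis=0) / N + eps)
    w = tl.load(W + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(Y + row * N + cols, x * inv_rms * w, mask=mask)  # casts on store

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    assert x.ndim == 2 and x.is_contiguous()
    M, N = x.shape
    y = torch.empty_like(x)
    rmsnorm_fwd[(M,)](x, weight, y, N, eps, BLOCK_N=triton.next_power_of_2(N))
    return y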
Şükrü Yusuf KAYA
26 min read
Advanced · Python
# === Liger Kernel usage: Llama 3.1 8B FT ===
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Apply Liger replacements
apply_liger_kernel_to_llama(
    rope=True,                       # Fused RoPE
    cross_entropy=False,             # Skip if not using lm_head CE
    fused_linear_cross_entropy=True, # Fused linear + CE (huge memory save)
    rms_norm=True,                   # Fused RMSNorm
    swiglu=True,                     # Fused SwiGLU
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)
# The model now uses Liger kernels everywhere (drop-in replacement)

# Bench (RTX 4090 + Llama 3.1 8B QLoRA):
# Vanilla HF + FA2:       1.78 step/s, peak 13.4 GB
# + Liger Kernel:         2.14 step/s (+20%), peak 9.5 GB (-29%)
# + Unsloth (everything): 3.10 step/s, peak 11.8 GB

Liger Kernel + Llama 8B FT
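Why fused_linear_cross_entropy is the biggest win: with Llama 3.1's 128,256-token vocabulary, the logits for a 4,096-token micro-batch alone are 4096 × 128256 values, roughly 1 GB in bf16 before gradients and fp32 upcasts. Liger fuses the lm_head matmul with the loss and walks the tokens in chunks, so the full logits tensor is never materialized. Below is a hedged pure-PyTorch sketch of just the chunking idea; Liger implements this as a single Triton-backed autograd function, and chunked_linear_ce is an illustrative name, not Liger's API.

# === Sketch: chunked linear + cross-entropy (illustrative, not Liger's code) ===
import torch
import torch.nn.functional as F

def chunked_linear_ce(hidden, lm_head_w, labels, chunk=1024):
    # hidden: [tokens, hidden_dim], lm_head_w: [vocab, hidden_dim], labels: [tokens]
    loss_sum = hidden.new_zeros((), dtype=torch.float32)
    n_valid = 0
    for i in range(0, hidden.shape[0], chunk):
        # Only a [chunk, vocab] slice of logits is alive at any moment.
        logits = hidden[i:i + chunk] @ lm_head_w.T
        loss_sum = loss_sum + F.cross_entropy(
            logits.float(), labels[i:i + chunk],
            reduction="sum", ignore_index=-100)
        n_valid += int((labels[i:i + chunk] != -100).sum())
    return loss_sum / max(n_valid, 1)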
✅ Deliverables
1) Install Liger Kernel. 2) Benchmark your Llama 8B fine-tune with Liger on vs. off (a measurement sketch follows below). 3) Next lesson: 13.5, PagedAttention (vLLM).
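For deliverable 2, a minimal A/B timing sketch. It assumes a train_step() closure (a hypothetical helper: wrap one forward + backward + optimizer step of your own loop in it), run once with Liger applied and once without.

# === Sketch: step/s and peak-VRAM benchmark (assumes your own train_step) ===
import time
import torch

def bench(train_step, warmup: int = 3, iters: int = 20) -> None:
    for _ in range(warmup):          # let Triton JIT / cuDNN autotune settle
        train_step()
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    t0 = time.perf_counter()
    for _ in range(iters):
        train_step()
    torch.cuda.synchronize()         # make GPU time visible to the host clock
    dt = time.perf_counter() - t0
    peak_gb = torch.cuda.max_memory_allocated() / 2**30
    print(f"{iters / dt:.2f} step/s, peak {peak_gb:.1f} GB")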