
torch.compile + Inductor: Reduce-Overhead + Dynamic Shapes + Recompile Watcher

PyTorch 2.x's flagship feature: torch.compile. Inductor backend (Triton kernel generation), 3 modes (default, reduce-overhead, max-autotune), dynamic shapes (recompile watcher), CUDA graphs, integration into FT training pipeline. Llama 3.1 8B FT throughput +15% on RTX 4090.

Şükrü Yusuf KAYA
26 min read
Advanced
```python
# === torch.compile training integration ===
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
).cuda()

# Compile the model
# mode options:
#   "default"         - safe, balanced
#   "reduce-overhead" - CUDA graphs, fastest for stable shapes
#   "max-autotune"    - aggressive autotuning, long first warmup
model = torch.compile(
    model,
    mode="reduce-overhead",
    fullgraph=False,  # False = partial graph capture (safer)
    dynamic=True,     # variable shapes (sequence length)
)

# Training loop is unchanged
for batch in loader:
    loss = model(**batch).loss
    loss.backward()
    ...

# Bench:
# Vanilla:                  1.78 step/s
# + torch.compile default:  1.92 step/s (+8%)
# + reduce-overhead:        2.04 step/s (+15%)
```
torch.compile + training
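The step/s comparison above can be reproduced on any model with a small timing harness, as long as the compile warmup calls are excluded from the measurement. A minimal CPU sketch (the toy model, input, and `backend="eager"` are illustrative stand-ins, not the lesson's actual setup):

```python
import time
import torch

def bench(fn, x, warmup=3, iters=10):
    # Warm up first: the initial calls include compilation time,
    # which must not be counted in the steady-state rate.
    for _ in range(warmup):
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return iters / (time.perf_counter() - start)  # steps per second

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(8, 64)

eager_rate = bench(model, x)
compiled = torch.compile(model, backend="eager")  # toy backend; use Inductor in practice
compiled_rate = bench(compiled, x)
```

On GPU, a real harness additionally needs `torch.cuda.synchronize()` around the timed region, since kernel launches are asynchronous.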

1. Recompile Watcher

torch.compile recompiles whenever an input shape changes, and every recompile costs time. The cookbook's rule:

```python
import torch._dynamo

torch._dynamo.config.suppress_errors = True
torch._dynamo.config.capture_scalar_outputs = True
torch._dynamo.config.cache_size_limit = 64  # default is 8, not enough for fine-tuning
# Log recompile events
torch._dynamo.config.verbose = True
```
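Beyond verbose logging, recompiles can be counted programmatically. A sketch using `torch._dynamo.utils.counters` (an internal stats dict, so treat it as an implementation detail that may change between releases); `backend="eager"` keeps the demo cheap:

```python
import torch
from torch._dynamo.utils import counters

def step(x):
    return (x * 2 + 1).sum()

# dynamic=False forces static-shape graphs, so every new input
# shape fails a guard and triggers a recompile.
compiled = torch.compile(step, backend="eager", dynamic=False)

counters.clear()
compiled(torch.randn(4))
compiled(torch.randn(8))  # shape change -> recompile

# Each recompiled frame shows up as a separate captured graph.
print(counters["stats"]["unique_graphs"])
```

The same reasons are also printed human-readably with `TORCH_LOGS="recompiles"` set in the environment.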
Common recompile causes in fine-tuning:
  • Sequence length changes (packing on/off)
  • Dynamic batch size
  • Gradient checkpointing toggled on/off mid-training
Fix: fixed shapes (
max_seq_length=4096
,
batch_size=2
fixed) → zero recompiles.
✅ Deliverables
  1. Compare throughput on the Llama 8B fine-tune with compile on vs. off.
  2. Analyze the recompile logs.
  3. Next lesson: 13.7, CUDA Graph Capture.
