# Custom Triton Kernel Lab: Cross-Entropy + Ignore-Index — Unsloth-Style Speedup

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-custom-triton-kernel-cross-entropy
> Updated: 2026-05-14T14:42:59.480Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XIII — Custom Kernels & Performance Surgery

**TLDR:** PyTorch's native `F.cross_entropy(ignore_index=-100)` is one of the most frequently called kernels in LLM training. A Triton kernel can be about 30% faster than the naive implementation. This Cookbook Lab fuses logits + softmax + CE + grad into a single kernel, the same pattern Unsloth uses. Result: +15% fine-tuning throughput for an 8B model on an RTX 4090.
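
To make the fusion idea concrete, here is a minimal pure-Python sketch of the per-row math that a fused cross-entropy kernel has to produce: the loss and the gradient with respect to the logits come out of the same pass, and rows whose target equals the ignore index contribute nothing. This is a reference for the arithmetic only, not the Lab's or Unsloth's actual Triton kernel. The function names `ce_row_forward_backward` and `ce_batch` are hypothetical, and returning a loss of 0 when every row is ignored is a choice made for this sketch.

```python
import math

IGNORE_INDEX = -100  # the sentinel value named in the TLDR


def ce_row_forward_backward(logits, target, ignore_index=IGNORE_INDEX):
    """Return (loss, grad) for one row of logits.

    A fused kernel would do this work once per row, without writing the
    softmax probabilities out as a separate intermediate tensor.
    """
    if target == ignore_index:
        # Ignored rows add no loss and receive a zero gradient.
        return 0.0, [0.0] * len(logits)

    row_max = max(logits)                      # subtract the max for stability
    exps = [math.exp(x - row_max) for x in logits]
    denom = sum(exps)
    logsumexp = row_max + math.log(denom)

    loss = logsumexp - logits[target]          # -log softmax(logits)[target]
    grad = [e / denom for e in exps]           # softmax probabilities
    grad[target] -= 1.0                        # softmax - one_hot(target)
    return loss, grad


def ce_batch(logits_rows, targets, ignore_index=IGNORE_INDEX):
    """Mean loss over non-ignored rows, plus gradients scaled to match."""
    n_valid = sum(1 for t in targets if t != ignore_index)
    scale = 1.0 / max(n_valid, 1)              # sketch choice: avoid 0/0

    total, grads = 0.0, []
    for row, t in zip(logits_rows, targets):
        loss, grad = ce_row_forward_backward(row, t, ignore_index)
        total += loss
        grads.append([g * scale for g in grad])
    return total * scale, grads


if __name__ == "__main__":
    loss, grads = ce_batch([[0.0, 0.0], [5.0, 1.0]], [0, IGNORE_INDEX])
    print(loss, grads)
```

Averaging over the non-ignored rows only mirrors the default `reduction="mean"` behavior of `F.cross_entropy`, which makes a function like this a convenient oracle when checking a custom kernel on small inputs.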

