# Triton Crash Course: Block Pointer + Autotune + Masks — GPU Kernel in 50 Lines

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-triton-crash-course
> Updated: 2026-05-14T14:42:59.388Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XIII — Custom Kernels & Performance Surgery
**TLDR:** Triton (OpenAI, 2021) is a Python-embedded language for writing GPU kernels that aims for performance close to hand-tuned CUDA with far less code. This crash course covers `@triton.jit`, `tl.program_id`, `tl.arange`, block pointer arithmetic, the `@triton.autotune` decorator, mask-based load/store, and Triton's abstraction over shared memory, then writes vector add → matmul → softmax kernels from scratch on an RTX 4090.
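To make the indexing vocabulary in the TLDR concrete before any GPU code appears, here is a minimal CPU-only sketch in plain Python. It is not Triton code: it emulates what `tl.program_id`, `tl.arange`, and a masked `tl.load`/`tl.store` compute for a 1D vector add. The names `cdiv`, `add_block`, and `vector_add`, and the block size of 4, are hypothetical choices for illustration only.

```python
# CPU-only emulation of Triton's block/mask indexing; no GPU or Triton install needed.
# All function names here are illustrative assumptions, not part of the Triton API.

def cdiv(a: int, b: int) -> int:
    """Ceiling division, the same arithmetic as triton.cdiv."""
    return (a + b - 1) // b


def add_block(x, y, out, n, pid, block_size):
    """Body of one emulated program instance (one block of the launch grid)."""
    # Triton: offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    offsets = [pid * block_size + lane for lane in range(block_size)]
    # Triton: mask = offsets < n_elements
    mask = [off < n for off in offsets]
    # Triton: tl.load(x_ptr + offsets, mask=mask, other=0.0)
    xs = [x[off] if ok else 0.0 for off, ok in zip(offsets, mask)]
    ys = [y[off] if ok else 0.0 for off, ok in zip(offsets, mask)]
    # Triton: tl.store(out_ptr + offsets, xs + ys, mask=mask)
    for off, ok, a, b in zip(offsets, mask, xs, ys):
        if ok:  # masked-off lanes never touch memory
            out[off] = a + b
    return mask


def vector_add(x, y, block_size=4):
    """Run every program in the 1D grid sequentially (a GPU runs them in parallel)."""
    n = len(x)
    out = [0.0] * n
    for pid in range(cdiv(n, block_size)):  # pid stands in for tl.program_id(axis=0)
        add_block(x, y, out, n, pid, block_size)
    return out


x = [float(i) for i in range(10)]
y = [10.0] * 10
result = vector_add(x, y, block_size=4)
```

With 10 elements and a block size of 4, the grid has three programs, and the last one covers offsets 8 to 11. The mask disables the two lanes past the end of the array, which is why a kernel does not need the input length to be a multiple of the block size.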

