# FlashAttention v2/v3 Internals: Tile + Online Softmax + Hopper WGMMA

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-flashattention-internals-tile-online-softmax
> Updated: 2026-05-14T14:42:59.303Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XIII — Custom Kernels & Performance Surgery

**TLDR:** FlashAttention's mathematical heart is tile-by-tile attention computation driven by **online softmax** (an incrementally maintained running max and running sum), plus a recomputation strategy for the backward pass. The v2 → v3 step adds Hopper WGMMA instructions, asynchronous memory movement, and FP8 attention. Also covered: the head-size constraint, deterministic mode, and the varlen variant.
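To make the online-softmax idea concrete before the full walkthrough, here is a minimal NumPy sketch for a single query vector. The function name `online_softmax_attention` and the `tile_size` parameter are illustrative, not part of the FlashAttention API; the real kernel processes query tiles in SRAM with fused GPU matrix instructions rather than a Python loop, but the running-max/running-sum update is the same.

```python
import numpy as np

def online_softmax_attention(q, K, V, tile_size=64):
    """Tile-by-tile attention for one query vector q, computing
    softmax(q @ K.T / sqrt(d)) @ V without ever materializing the
    full score row: the softmax normalizer is maintained online via
    a running max `m` and running sum `l` (illustrative sketch)."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)

    m = -np.inf                                   # running max of scores seen so far
    l = 0.0                                       # running sum of exp(score - m)
    acc = np.zeros_like(V[0], dtype=np.float64)   # unnormalized output accumulator

    for start in range(0, K.shape[0], tile_size):
        k_tile = K[start:start + tile_size]
        v_tile = V[start:start + tile_size]

        s = (k_tile @ q) * scale    # scores for this key tile
        m_new = max(m, s.max())     # updated running max

        # Rescale the previous accumulator and sum to the new max,
        # then fold in this tile's (unnormalized) probabilities.
        correction = np.exp(m - m_new)
        p = np.exp(s - m_new)

        l = l * correction + p.sum()
        acc = acc * correction + p @ v_tile
        m = m_new

    return acc / l                  # normalize once at the end

# Sanity check against a naive full-row softmax.
rng = np.random.default_rng(0)
N, d = 256, 32
q, K, V = rng.normal(size=(d,)), rng.normal(size=(N, d)), rng.normal(size=(N, d))

scores = (K @ q) / np.sqrt(d)
weights = np.exp(scores - scores.max())
reference = (weights / weights.sum()) @ V

assert np.allclose(online_softmax_attention(q, K, V), reference)
```

The `correction = exp(m - m_new)` rescaling is what lets previously accumulated partial results stay valid when a later tile raises the running max, which is the key to processing keys and values in a single streaming pass.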

