# Anatomy of Activation Memory: Why O(L·s·h) and the Real Savings of FlashAttention

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-activation-memory-anatomy-flashattention
> Updated: 2026-05-14T14:42:49.550Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part I — Hardware & Memory Engineering
**TLDR:** Activation memory is the forward pass's most misleading memory consumer. This article walks through a layer-by-layer breakdown (attention intermediates, FFN, normalization, residual stream), the math behind FlashAttention's memory savings (O(s²) → O(s) for the attention matrices), the "sqrt(L) savings" myth of gradient checkpointing, and sequence packing with variable-length attention.
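As a rough illustration of why the quadratic attention matrices dominate at long sequence lengths, here is a back-of-envelope estimate for a single transformer layer. This is a simplified sketch, not the article's exact accounting: it assumes fp16 activations, ignores dropout masks and norm statistics, and uses hypothetical GPT-class dimensions.

```python
def layer_activation_bytes(batch, s, h, n_heads,
                           ffn_mult=4, bytes_per_elem=2, flash=False):
    """Rough per-layer activation footprint in bytes (simplified sketch).

    s: sequence length, h: hidden size. Counts only the largest tensors:
    Q/K/V projections, the FFN intermediate, the residual stream, and
    (without FlashAttention) the [batch, heads, s, s] score/softmax matrices.
    """
    linear = batch * s * h * bytes_per_elem      # one [b, s, h] tensor
    qkv = 3 * linear                             # Q, K, V projections
    ffn = ffn_mult * linear                      # FFN intermediate [b, s, 4h]
    residual = linear                            # residual-stream copy
    # Attention logits and softmax probabilities, each [b, heads, s, s].
    scores = batch * n_heads * s * s * bytes_per_elem
    # FlashAttention recomputes these tile-by-tile and never materializes
    # the full s-by-s matrices, so the quadratic term drops out.
    attn_quadratic = 0 if flash else 2 * scores
    return qkv + ffn + residual + attn_quadratic


# Hypothetical 4k-context, 4096-hidden, 32-head layer in fp16:
no_flash = layer_activation_bytes(1, 4096, 4096, 32)
with_flash = layer_activation_bytes(1, 4096, 4096, 32, flash=True)
print(f"without FlashAttention: {no_flash / 2**30:.2f} GiB")
print(f"with FlashAttention:    {with_flash / 2**30:.2f} GiB")
```

With these (assumed) dimensions, the two s×s matrices alone cost 2 GiB per layer, dwarfing the O(s·h) linear terms; that gap is exactly the O(s²) → O(s) savings the article quantifies.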

