# FSDP + ZeRO: Sharded Training — Memory Revolution from Rajbhandari 2020 to Llama-3

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/fsdp-zero-sharded-training-rajbhandari-2020
> Updated: 2026-05-13T13:00:29.222Z
> Category: LLM Engineering
> Module: Module 13: Distributed Training — Multi-GPU/Multi-Node
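
**TLDR:** ZeRO (Zero Redundancy Optimizer, Rajbhandari et al., 2020), implemented in the DeepSpeed library, shards training state across data-parallel GPUs in three stages: stage 1 shards the optimizer state, stage 2 also shards the gradients, and stage 3 also shards the parameters. FSDP (Fully Sharded Data Parallel) is PyTorch's native implementation of the ZeRO-3 approach. Llama-3 production training: FSDP + activation checkpointing. Memory math: mixed-precision Adam needs about 16 bytes per parameter, so an 8B model carries roughly 128 GB of model state. That exceeds a single 80 GB H100 when unsharded, but the per-GPU share under full sharding fits on one H100.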
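
The memory math in the TLDR can be made concrete with a small calculator. This is a minimal sketch, assuming the mixed-precision Adam accounting from the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes for fp32 optimizer state (master weights, momentum, variance). The function name `zero_model_state_gb` and the 8-GPU setup are illustrative choices, not part of DeepSpeed or PyTorch, and the estimate covers model state only: activations, temporary buffers, and fragmentation are excluded.

```python
# Per-GPU model-state memory under the ZeRO stages (Rajbhandari et al., 2020).
# Assumption: mixed-precision Adam, i.e. 2 bytes/param fp16 weights,
# 2 bytes/param fp16 gradients, K = 12 bytes/param fp32 optimizer state.
# Activations, buffers, and fragmentation are NOT counted here.

GB = 1e9  # decimal gigabytes


def zero_model_state_gb(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Return the estimated model-state memory per GPU in GB.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer state sharded
    stage 2: optimizer state + gradients sharded
    stage 3: optimizer state + gradients + parameters sharded
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = k * num_params      # fp32 master weights + Adam moments

    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus

    return (params + grads + optim) / GB


if __name__ == "__main__":
    # Hypothetical setup: an 8B-parameter model on 8 GPUs.
    for stage in range(4):
        per_gpu = zero_model_state_gb(8e9, num_gpus=8, stage=stage)
        print(f"ZeRO stage {stage}: {per_gpu:.0f} GB of model state per GPU")
```

Under these assumptions, an 8B model on 8 GPUs needs about 128 GB per GPU with no sharding, 44 GB at stage 1, 30 GB at stage 2, and 16 GB at stage 3. Only the sharded stages leave headroom on an 80 GB H100 for activations, which is where activation checkpointing comes in.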

