# DeepSpeed ZeRO Stage 1/2/3 + ZeRO-Infinity: NVMe Offload + 70B on Single GPU?

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-deepspeed-zero-stages-infinity
> Updated: 2026-05-14T14:42:52.403Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part IV — Mid-Large Models (13B-70B+) + Distributed Internals
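
**TLDR:** ZeRO (Microsoft) pioneered sharded data-parallel training and predates PyTorch FSDP. Stage 1 shards the optimizer states, Stage 2 also shards the gradients, and Stage 3 also shards the parameters (the equivalent of FSDP's FULL_SHARD). ZeRO-Infinity spills model states over to CPU RAM and NVMe, which makes training a 70B model on a single GPU theoretically possible, though slow. Ends with a decision matrix: ZeRO vs FSDP.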
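
To make the stage differences concrete, here is a minimal sketch of the per-GPU model-state arithmetic, assuming mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state. The helper name `zero_model_state_gb`, the 8-GPU count, and the `/local_nvme` path are illustrative assumptions, not values from this article. Activations, buffers, and fragmentation are ignored, so real usage will be higher.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory (GB) under a given ZeRO stage.

    Assumes mixed-precision Adam; hypothetical helper for illustration only.
    """
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 master weights + Adam momentum + variance
    if stage >= 1:
        optim /= num_gpus     # Stage 1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus     # Stage 2: also shard gradients
    if stage >= 3:
        params /= num_gpus    # Stage 3: also shard parameters
    return (params + grads + optim) / 1e9


# A ZeRO-Infinity style DeepSpeed config fragment (Stage 3 + NVMe offload).
# The nvme_path and batch size are placeholders to adapt to your machine.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

if __name__ == "__main__":
    for stage in range(4):
        gb = zero_model_state_gb(70e9, num_gpus=8, stage=stage)
        print(f"ZeRO stage {stage}: ~{gb:.1f} GB of model states per GPU")
```

Under these assumptions, a 70B model on 8 GPUs needs roughly 1120 GB of model states per GPU without ZeRO and about 140 GB per GPU at Stage 3. With a single GPU there is nothing to shard across, so the full 1120 GB remains, which is why offloading to CPU RAM and NVMe is what makes the single-GPU case feasible at all.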

