# Optimization: From SGD to AdamW, Lion, Muon — All Modern LLM Optimizers

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/optimization-sgd-adam-adamw-lion-muon
> Updated: 2026-05-13T13:00:23.285Z
> Category: LLM Engineering
> Module: Module 1: The AI Engineer's Mathematical Arsenal
**TLDR:** The evolution of the gradient descent family: GD, SGD, momentum (heavy ball, Nesterov), AdaGrad, RMSProp, Adam, AdamW, Lion, and Muon. Learning rate schedules: linear warmup followed by cosine decay. Loss landscape: sharp vs. flat minima.
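The "linear warmup + cosine decay" schedule mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the article; the function name and all hyperparameter values (`max_lr`, `warmup_steps`, `total_steps`) are illustrative assumptions.

```python
import math

def lr_at_step(step, max_lr=3e-4, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    All default values are illustrative, not prescribed by the article.
    """
    if step < warmup_steps:
        # Linear ramp: reach max_lr exactly at the end of warmup.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps: progress goes 0 -> 1.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At `step == warmup_steps` the cosine term equals 1, so the schedule hands off at exactly `max_lr` with no discontinuity, then decays smoothly toward `min_lr`.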

