Stochastic Gradient Descent

An optimization approach that updates parameters using single examples or small subsets instead of the full dataset at each step.

Stochastic Gradient Descent is a more practical and scalable variation of classical gradient descent. Instead of computing the gradient over the entire dataset at every step, it updates the parameters using individual examples or small mini-batches. This makes each update far cheaper and lets training scale to large datasets. The noise introduced by sampling can also help the optimizer escape shallow local minima and saddle points on the loss surface. However, because each update is based on only a fraction of the data, the trajectory is noisier and convergence can be less stable. For that reason, SGD is both powerful and sensitive to tuning, particularly of the learning rate and batch size.
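
To make the idea concrete, here is a minimal sketch of mini-batch SGD applied to a simple linear regression problem. The data, model, learning rate, and batch size are all illustrative assumptions, not part of any particular library's API; the point is only to show that each parameter update uses the gradient computed on a small random subset rather than the full dataset.

```python
import numpy as np

# Synthetic data for a simple linear model y = 2x + 1 plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)

# Parameters: a single weight and bias, both learned by SGD.
w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 16

for epoch in range(20):
    # Shuffle once per epoch so each mini-batch is a random subset of the data.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]

        # Gradient of the mean squared error computed on the mini-batch only.
        error = (w * xb + b) - yb
        grad_w = 2.0 * np.mean(error * xb)
        grad_b = 2.0 * np.mean(error)

        # Update parameters using this noisy mini-batch gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}  (true values: 2.00, 1.00)")
```

Each inner-loop step touches only `batch_size` examples, which is why the cost per update stays constant as the dataset grows; the trade-off is the gradient noise described above, which is usually managed by tuning the learning rate or decaying it over time.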