
Autoregressive Decoding

A generation mode in which the model produces output token by token using previous outputs as context.

Autoregressive decoding is the most common generation mechanism in modern large language models. At each step, the model predicts the next token conditioned on the prompt plus all tokens generated so far, then appends that token to the context and repeats. While this yields flexible and powerful generation, it also introduces major engineering challenges: per-token latency (tokens must be produced sequentially, not in parallel), error accumulation (an early mistake conditions every later token), and the growing cost of long outputs as the context lengthens.
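The loop described above can be sketched in a few lines. This is a minimal illustration, not a real LLM: `toy_next_token` is a hypothetical stand-in for a model forward pass plus token selection, and the token values are arbitrary integers.

```python
def toy_next_token(context):
    # Hypothetical stand-in for a real model's next-token prediction:
    # here it simply returns the last token plus one, wrapping at 5.
    return (context[-1] + 1) % 5

def autoregressive_decode(prompt, steps):
    # Start from the prompt; each generated token is appended to the
    # context and conditions the next prediction.
    tokens = list(prompt)
    for _ in range(steps):
        next_token = toy_next_token(tokens)
        tokens.append(next_token)  # output becomes part of the context
    return tokens

print(autoregressive_decode([0, 1], 4))  # → [0, 1, 2, 3, 4, 0]
```

Note that the loop is inherently sequential: step *k* cannot begin until step *k − 1* has produced its token, which is the root cause of the latency issues mentioned above.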