
Decoder-Only Transformer

A modern large-language-model architecture that generates autoregressively by predicting the next token.

The decoder-only Transformer forms the basis of most modern large language models. It generates text autoregressively: at each step it predicts the next token from the tokens that precede it, then appends that prediction to the context and repeats. Because this next-token objective is simple and self-supervised, it scales naturally to pretraining on massive unlabeled corpora. The architecture has become dominant in text generation, code generation, and general-purpose language modeling.
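The generate-and-append loop described above can be sketched in a few lines. This is a minimal illustration, not a real model: the hypothetical `predict_next` function here is a hand-written bigram lookup standing in for a Transformer forward pass, but the surrounding loop is the same autoregressive pattern a decoder-only model uses.

```python
def predict_next(context):
    """Toy stand-in for a model forward pass.

    A real decoder-only Transformer would score every vocabulary token
    using causal self-attention over the full context; here we just look
    up the last token in a fixed bigram table (an illustrative assumption).
    """
    bigrams = {
        "<bos>": "the",
        "the": "model",
        "model": "predicts",
        "predicts": "tokens",
        "tokens": "<eos>",
    }
    return bigrams.get(context[-1], "<eos>")

def generate(max_tokens=10):
    tokens = ["<bos>"]
    for _ in range(max_tokens):
        nxt = predict_next(tokens)  # next token depends only on prior context
        tokens.append(nxt)          # appended output becomes part of the context
        if nxt == "<eos>":
            break
    return tokens

print(" ".join(generate()))
```

The key property on display is that generation is strictly left-to-right: each token is produced from previously emitted tokens only, which is exactly the condition the next-token pretraining objective trains for.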