Causal Attention
An autoregressive attention mechanism in which each token may attend only to positions at or before its own.
Causal attention preserves the left-to-right order of generative language models by preventing each position from attending to future tokens, typically by masking out the upper-triangular part of the attention score matrix. This ensures that when predicting the next token, the model uses only past context, which is what makes autoregressive generation logically consistent. Most modern large language models are trained under this attention constraint.
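Below is a minimal sketch of this masking idea in PyTorch, assuming a single attention head and illustrative tensor shapes; the function name and dimensions are not from the original entry.

```python
# A minimal causal self-attention sketch (single head, illustrative shapes).
import math
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """q, k, v: tensors of shape (batch, seq_len, d_model)."""
    seq_len, d_model = q.size(1), q.size(2)
    # Scaled dot-product attention scores.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)
    # Upper-triangular mask: position i must not attend to positions j > i.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    # Softmax over keys; masked (future) positions receive zero weight.
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Example: one sequence of 4 tokens with 8-dimensional embeddings.
x = torch.randn(1, 4, 8)
out = causal_attention(x, x, x)
print(out.shape)  # torch.Size([1, 4, 8])
```

Setting the masked scores to negative infinity before the softmax drives the corresponding attention weights to zero, so each output position depends only on itself and earlier positions.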
