
Attention Mask

A control mechanism that determines which positions a model is allowed to attend to during the attention computation.

An attention mask makes context access in attention mechanisms explicit and rule-based. It can be used to ignore padding tokens, to hide future positions (causal masking), or to restrict attention to specific regions of the input. Without a mask, the model may attend to irrelevant or disallowed information, for example to tokens that lie in the future relative to the current position. In Transformer training, correct masking is therefore essential for the semantic correctness of the architecture.
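The mechanism can be sketched in a few lines: masked positions have their raw attention scores set to negative infinity before the softmax, so they receive exactly zero weight. This is a minimal plain-Python illustration, not a production implementation; the helper names `softmax` and `masked_attention` and the uniform scores are chosen here for clarity.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def masked_attention(scores, mask):
    # scores: n x n raw attention scores (rows: queries, columns: keys).
    # mask:   n x n booleans; False means "may not attend here".
    # Masked positions get a score of -inf, so softmax assigns them weight 0.
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        masked = [s if allowed else float("-inf")
                  for s, allowed in zip(row_scores, row_mask)]
        weights.append(softmax(masked))
    return weights

# Causal mask for 3 positions: each query may attend only to itself
# and to earlier positions, hiding the future.
n = 3
causal = [[k <= q for k in range(n)] for q in range(n)]
scores = [[0.0] * n for _ in range(n)]  # uniform scores for illustration
w = masked_attention(scores, causal)
```

With uniform scores, the first query attends only to position 0 (weight 1.0), the second splits its weight evenly over positions 0 and 1, and the third over all three; every future position gets weight exactly 0, which is precisely the effect described above.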