
Self-Attention

An attention mechanism in which each element of a sequence directly models its relationship to every element of that same sequence, including itself.

Self-attention lets each token in a sequence attend directly to every other token, so long-range dependencies can be captured without the strictly sequential processing of recurrent networks. Concretely, each position is projected into a query, a key, and a value; the output at that position is a weighted sum of all value vectors, with weights given by the softmax-normalized similarity between its query and every key. Because these computations are independent across positions, they parallelize efficiently on modern hardware. This combination of parallelism and global context modeling is why self-attention is the core building block of the Transformer architecture and the backbone of modern NLP.
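As a minimal sketch of the mechanism described above, the following NumPy code computes scaled dot-product self-attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, the variant used in Transformers. The function and variable names are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (illustrative sketch).

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries: (seq_len, d_k)
    k = x @ w_k  # keys:    (seq_len, d_k)
    v = x @ w_v  # values:  (seq_len, d_k)

    # Similarity of every position to every other position,
    # scaled by sqrt(d_k) to keep the softmax well-conditioned.
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len)

    # Row-wise softmax turns raw scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output position is a weighted sum of all value vectors.
    return weights @ v  # (seq_len, d_k)

# Toy usage: 4 tokens with 8-dimensional embeddings and projections.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Note that every attention weight and every output position is produced by a handful of matrix multiplications over the whole sequence at once, which is what makes self-attention so amenable to GPU parallelism, in contrast to the step-by-step recurrence of an RNN.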