Sparse Attention
An attention mechanism that reduces computational cost by letting each token attend only to a selected subset of positions rather than the full sequence.
Sparse attention was developed to reduce the quadratic complexity of standard self-attention: for a sequence of n tokens, letting every token attend to every other token costs O(n²) time and memory. Sparse variants restrict each token to a predefined pattern instead, such as a local sliding window, fixed strides, or a small set of global tokens. This matters in long-document modeling, genomic data, and large-context language models, and it introduces a trade-off between computational efficiency and representational richness.
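As a concrete illustration, here is a minimal NumPy sketch of one common sparse pattern, sliding-window attention, where each token attends only to its local neighbourhood. The function name and shapes are hypothetical, and for clarity the sketch builds the full score matrix and then masks it; a real implementation would compute only the in-window scores to actually realize the O(n · window) cost.

```python
import numpy as np

def sliding_window_attention(Q, K, V, window: int):
    """Toy sliding-window sparse attention (one common sparse pattern).

    Each query position i attends only to key positions j with
    |i - j| <= window, instead of the full sequence, so the effective
    cost is O(n * window) rather than O(n^2).
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)  # (n, n); dense here only for clarity

    # Mask out every position outside the local window.
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf

    # Row-wise softmax over the surviving (local) positions.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

# Usage: 8 tokens, 4-dim embeddings, each token sees +/- 2 neighbours.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = sliding_window_attention(Q, K, V, window=2)
print(out.shape)  # (8, 4)
```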
