Technical Glossary: Deep Learning
Multi-Head Attention
A mechanism that runs attention in parallel across multiple representation subspaces, allowing the model to learn different types of relationships.
Multi-head attention enables the model to learn multiple relationship patterns simultaneously instead of relying on a single attention map. Some heads may focus on local context, others on long-range dependencies, and still others on different semantic structures. This multiplicity significantly increases the representational power of Transformer models. It has become a standard architectural component in modern language and multimodal systems.
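As a concrete illustration, here is a minimal NumPy sketch of multi-head self-attention: the input is projected into queries, keys, and values, split into heads that each attend within a smaller subspace, and the head outputs are concatenated and mixed by an output projection. All names, shapes, and the random weights are illustrative assumptions, not part of this entry, and the sketch omits masking, batching, and dropout found in production implementations.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention with num_heads parallel heads (illustrative sketch).

    x: (seq_len, d_model) input sequence
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads  # each head attends in a smaller subspace

    # Project the input, then split the feature dimension into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    def project_and_split(w):
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = project_and_split(w_q)
    k = project_and_split(w_k)
    v = project_and_split(w_v)

    # Scaled dot-product attention, computed independently per head,
    # so each head can learn a different relationship pattern.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Tiny usage example with random weights (illustrative only).
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 4, 2
x = rng.normal(size=(seq_len, d_model))
ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_attention(x, *ws, num_heads=num_heads)
print(out.shape)  # (4, 8)
```

Because the heads operate on disjoint slices of the projected features, each one can specialize, for example in local versus long-range dependencies, before the output projection recombines their results.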
