# Multi-Head Attention

> Source: https://sukruyusufkaya.com/en/glossary/multi-head-attention
> Updated: 2026-05-13T20:58:24.486Z
> Type: glossary
> Category: deep-learning

**TLDR:** A structure that runs attention in parallel across multiple subspaces so each head can learn a different type of relationship.

Multi-head attention lets a model learn multiple relationship patterns simultaneously instead of relying on a single attention map. Some heads may focus on local context, others on long-range dependencies, and still others on different semantic structures. Because each head attends within its own lower-dimensional subspace, this multiplicity substantially increases the representational power of Transformer models, and multi-head attention has become a standard architectural component of modern language and multimodal systems.
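
As a concrete sketch, the snippet below implements the standard multi-head self-attention computation in NumPy: project the input into per-head query, key, and value subspaces, run scaled dot-product attention independently in each head, then concatenate the heads and apply an output projection. This is a minimal illustration assuming a single unbatched sequence with no masking or bias terms; the function and weight names are illustrative, not a reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention.

    x:              (seq_len, d_model) input sequence
    Wq, Wk, Wv, Wo: (d_model, d_model) projection weights
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input, then split the model dimension into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head).
    def project(W):
        return (x @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = project(Wq), project(Wk), project(Wv)

    # Each head runs scaled dot-product attention in its own subspace,
    # so different heads can attend to different relationship patterns.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)

    # Concatenate the heads back together and mix them with Wo.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage: 4 tokens, model width 8, 2 heads of width 4 each.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq, Wk, Wv, Wo = (rng.standard_normal((8, 8)) for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (4, 8)
```

Note that each head only sees a `d_model / num_heads` slice of the projected representation, so adding heads does not increase the parameter count; it partitions the same width into independent attention maps.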