# Multi-Head Attention

> Source: https://sukruyusufkaya.com/en/glossary/multi-head-attention
> Updated: 2026-05-13T20:58:24.486Z
> Type: glossary
> Category: deep-learning

**TLDR:** A structure that runs attention in parallel across multiple subspaces so each head can learn a different type of relationship.

Multi-head attention lets a model learn multiple relationship patterns simultaneously instead of relying on a single attention map. Some heads may focus on local context, others on long-range dependencies, and still others on different semantic structures. Because each head attends within its own lower-dimensional subspace, this multiplicity substantially increases the representational power of Transformer models, and multi-head attention has become a standard architectural component of modern language and multimodal systems.
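
As a concrete sketch, the snippet below implements the standard multi-head self-attention computation in NumPy: project the input into per-head query, key, and value subspaces, run scaled dot-product attention independently in each head, then concatenate the heads and apply an output projection. This is a minimal illustration assuming a single unbatched sequence with no masking or bias terms; the function and weight names are illustrative, not a reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention.

    x:              (seq_len, d_model) input sequence
    Wq, Wk, Wv, Wo: (d_model, d_model) projection weights
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input, then split the model dimension into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head).
    def project(W):
        return (x @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = project(Wq), project(Wk), project(Wv)

    # Each head runs scaled dot-product attention in its own subspace,
    # so different heads can attend to different relationship patterns.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)

    # Concatenate the heads back together and mix them with Wo.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage: 4 tokens, model width 8, 2 heads of width 4 each.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq, Wk, Wv, Wo = (rng.standard_normal((8, 8)) for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (4, 8)
```

Note that each head only sees a `d_model / num_heads` slice of the projected representation, so adding heads does not increase the parameter count; it partitions the same width into independent attention maps.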