# Scaled Dot-Product Attention

> Source: https://sukruyusufkaya.com/en/glossary/scaled-dot-product-attention
> Updated: 2026-05-13T21:07:53.633Z
> Type: glossary
> Category: derin-ogrenme (deep learning)

**TLDR:** The core Transformer operation that scales query-key dot products, turns them into attention weights with softmax, and uses those weights to take a weighted sum of value vectors.

Scaled dot-product attention is the core attention computation inside a Transformer. Dot products between query and key vectors measure their similarity; the scores are divided by the square root of the key dimension, which keeps the softmax inputs in a stable range, and softmax converts them into attention weights that sum to one. Those weights then form a weighted average of the value vectors, letting the model learn which tokens should attend most strongly to which others. Because it is simple and reduces to highly parallel matrix multiplications, it sits at the center of modern attention architectures.
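
In the notation of the original Transformer paper, the computation is

$$\mathrm{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V$$

The sketch below implements this formula in NumPy as a minimal illustration: the function name is arbitrary, and batching, masking, and multi-head splitting are deliberately omitted.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal scaled dot-product attention (no masking or batching).

    q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v)
    Returns: (n_queries, d_v) attention-weighted sums of value rows.
    """
    d_k = q.shape[-1]
    # Dot-product similarity between every query and every key,
    # scaled by sqrt(d_k) so scores do not grow with vector dimension.
    scores = q @ k.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis turns scores into
    # weights that sum to one per query.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v

# Example usage with random inputs: 4 queries, 6 keys/values.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 16))
out = scaled_dot_product_attention(q, k, v)  # shape (4, 16)
```

Because the whole computation is two matrix multiplications plus a softmax, it parallelizes well on GPUs, which is the practical reason behind its central role noted above.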