# Pre-Norm Transformer

> Source: https://sukruyusufkaya.com/en/glossary/pre-norm-transformer
> Updated: 2026-05-13T21:00:32.102Z
> Type: glossary
> Category: derin-ogrenme (deep learning)

**TLDR:** A Transformer variant that applies layer normalization before each attention or feed-forward sub-block instead of after it.

The pre-norm Transformer became important chiefly for keeping gradient flow stable when training very deep stacks. In the original post-norm design, each sub-layer computes `LN(x + Sublayer(x))`; pre-norm moves the normalization inside the residual branch, computing `x + Sublayer(LN(x))`, so the residual path remains a pure identity connection and gradients can pass through the stack unattenuated. This makes optimization noticeably more reliable at depth (post-norm models typically need careful learning-rate warmup), which is why pre-norm is the default in many large language models, including GPT-2 and its successors. It shows that Transformer success depends not only on attention itself, but also on subtle architectural arrangements.
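
To make the contrast concrete, here is a minimal sketch of one pre-norm block in PyTorch. The names (`PreNormBlock`, `d_model`, `n_heads`, `d_ff`) and the GELU feed-forward are illustrative assumptions, not taken from any particular model:

```python
import torch
import torch.nn as nn


class PreNormBlock(nn.Module):
    """One Transformer block with normalization *before* each sub-layer."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm: normalize the *input* to each sub-layer, then add the
        # residual. The residual path itself is never normalized, so an
        # identity route for gradients runs through the whole stack.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.ffn(self.ln2(x))
        return x


block = PreNormBlock(d_model=512, n_heads=8, d_ff=2048)
y = block(torch.randn(2, 16, 512))  # (batch, seq, d_model) -> same shape
```

A post-norm block would instead compute `self.ln1(x + attn_out)`, normalizing after the residual addition. The code change is tiny, but the training dynamics at depth differ markedly, which is the point of this design variant.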