
Post-Norm Transformer

The classical Transformer variant that applies layer normalization after the residual addition around each attention or feed-forward (FFN) sub-layer.

The post-norm Transformer follows the arrangement used in the original Transformer design (Vaswani et al., 2017): each sub-layer computes LayerNorm(x + Sublayer(x)). While it trains well at moderate depth, it can suffer optimization instability in very deep models, because gradients must pass through a normalization at every layer and the residual path is repeatedly rescaled; in practice, post-norm training often relies on learning-rate warmup to converge. The rise of pre-norm designs, which instead compute x + Sublayer(LayerNorm(x)) and keep the residual path unnormalized, made the practical consequences of this difference more visible. Still, comparing the two remains important for understanding architectural behavior.
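The contrast between the two orderings can be sketched in a few lines of NumPy. This is a minimal illustration, not a full Transformer: the `ffn` stand-in and all function names here are hypothetical, and a toy ReLU replaces the real attention/FFN sub-layer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last dimension to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    # Post-norm: normalize AFTER the residual addition,
    # so the block's output is always normalized.
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # Pre-norm: normalize BEFORE the sub-layer; the residual
    # path itself is left unnormalized.
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
ffn = lambda h: np.maximum(h, 0.0)  # toy stand-in for the attention/FFN sub-layer

y_post = post_norm_block(x, ffn)
y_pre = pre_norm_block(x, ffn)
```

Running this shows the key structural difference: `y_post` has zero mean and unit variance per position (the normalization is the last operation), while `y_pre` does not, since its residual branch bypasses the normalization entirely.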