
Transformer Feed-Forward Network

A Transformer sub-block that operates independently on each token, applying a nonlinear transformation to enrich its representation.

The feed-forward network inside a Transformer provides a token-wise (position-wise) nonlinear transformation that attention alone does not supply. It typically consists of two linear layers with a nonlinear activation (such as ReLU or GELU) in between: the first layer expands the hidden dimension, commonly by a factor of four, and the second projects it back to the model dimension. Although it operates independently on each token, it contributes a major portion of the model’s overall capacity, and in large language models a substantial share of the parameters resides in this substructure.
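The structure described above can be sketched as follows. This is a minimal illustration in NumPy, not any particular model's implementation; the dimensions (`d_model = 8`, `d_ff = 32`) and the tanh approximation of GELU are illustrative assumptions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    # x has shape (seq_len, d_model); the same weights are applied
    # to every token independently, so tokens never mix here
    return gelu(x @ W1 + b1) @ W2 + b2

# illustrative sizes: d_ff is commonly 4 * d_model
d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
W1 = rng.standard_normal((d_model, d_ff)) * 0.02   # expansion layer
b1 = np.zeros(d_ff)
W2 = rng.standard_normal((d_ff, d_model)) * 0.02   # projection back down
b2 = np.zeros(d_model)

x = rng.standard_normal((5, d_model))              # 5 tokens
y = feed_forward(x, W1, b1, W2, b2)
print(y.shape)                                     # (5, 8): d_model is preserved
```

Because the transformation is token-wise, running the network on a single token produces the same result as the corresponding row of the full-sequence output, which is what distinguishes this sub-block from attention.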