Transformer Feed-Forward Network
A Transformer sub-block that operates independently on each token, applying a nonlinear transformation to its representation.
The feed-forward network inside a Transformer provides the token-wise nonlinear transformation that attention alone does not supply. It typically consists of two linear layers separated by an activation function (such as ReLU or GELU), with the hidden dimension commonly about four times the model dimension. Although it operates on each token independently, with no mixing of information across positions, it contributes a major portion of the model's overall capacity: in large language models, a substantial share of the parameters resides in this substructure.
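A minimal sketch of a position-wise feed-forward network in NumPy, assuming the common two-layer design with a GELU activation and a 4x hidden expansion. The sizes (`d_model=8`, `d_ff=32`) and random weights are illustrative, not from any particular model.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, widely used in GPT-style models
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: each token (row of x) is transformed independently.
    x: (seq_len, d_model); w1: (d_model, d_ff); w2: (d_ff, d_model)."""
    return gelu(x @ w1 + b1) @ w2 + b2

# Illustrative sizes: d_ff = 4 * d_model, the common expansion factor
rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5
w1 = rng.standard_normal((d_model, d_ff)) * 0.02
b1 = np.zeros(d_ff)
w2 = rng.standard_normal((d_ff, d_model)) * 0.02
b2 = np.zeros(d_model)

x = rng.standard_normal((seq_len, d_model))
out = feed_forward(x, w1, b1, w2, b2)
print(out.shape)  # same (seq_len, d_model) shape as the input
```

Because the same weights are applied to every row of `x`, running the FFN on a single token yields the same result as that token's row in the batched output, which is exactly the token-independence property described above.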