
GELU Activation

A modern activation function that transforms inputs with probabilistic smoothness rather than a hard threshold.

GELU (Gaussian Error Linear Unit) is a modern activation function that has become especially common in Transformer-based models. Rather than applying a hard threshold like ReLU, it weights each input by the probability that a standard normal variable falls below it: GELU(x) = x · Φ(x), where Φ is the standard Gaussian CDF. The result is a smooth curve that passes small negative values through with reduced magnitude instead of zeroing them, which can lead to more stable learning in some architectures. It appears in many large language models and other attention-based systems, including BERT- and GPT-style models. Although slightly more expensive to compute than ReLU, it is often preferred for its performance benefits, and a cheaper tanh-based approximation is widely used in practice.
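As a minimal sketch of the definition above, the snippet below computes GELU in plain Python (no framework dependencies), using both the exact Gaussian CDF form and the common tanh approximation. The function names `gelu_exact` and `gelu_tanh` are illustrative, not from any particular library.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # expressed via the error function erf.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Widely used tanh-based approximation of GELU.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Small negative inputs are shrunk rather than zeroed, unlike ReLU.
    for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={v:+.1f}  exact={gelu_exact(v):+.4f}  approx={gelu_tanh(v):+.4f}")
```

In practice, deep learning frameworks expose GELU directly (for example as a built-in activation layer), and many implementations default to the tanh approximation for speed.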