
Mixture-of-Experts Transformer

A Transformer architecture that improves scaling efficiency by activating only a few selected expert subnetworks, rather than the full model, for each input.

Mixture-of-Experts Transformer architectures increase model capacity without requiring every parameter to be active for each input. A learned routing mechanism (the gating network) decides which expert subnetworks process each incoming token, typically the top-k experts by routing score. Because only those experts run, compute per token grows far more slowly than total parameter count, trading a small routing overhead for a favorable balance between computational cost and model scale. In large-scale systems, this embodies the idea of efficient specialization: experts can focus on different input patterns while sharing the rest of the network.
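The routing idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: the gating weights, expert sizes, and top-k choice below are all hypothetical, and the per-token loop is written for clarity rather than speed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: model width, expert hidden width, expert count, experts per token.
d_model, d_ff, n_experts, top_k = 8, 16, 4, 2

W_gate = rng.normal(size=(d_model, n_experts))          # router (gating) weights
W1 = rng.normal(size=(n_experts, d_model, d_ff)) * 0.1  # first layer of each expert
W2 = rng.normal(size=(n_experts, d_ff, d_model)) * 0.1  # second layer of each expert

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x):
    """Route each token to its top_k experts and mix their outputs."""
    probs = softmax(x @ W_gate)                  # (tokens, n_experts) routing scores
    top_idx = np.argsort(probs, axis=-1)[:, -top_k:]  # top_k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = top_idx[t]
        # Renormalize routing weights over the selected experts only.
        weights = probs[t, chosen] / probs[t, chosen].sum()
        for w, e in zip(weights, chosen):
            h = np.maximum(x[t] @ W1[e], 0.0)    # expert feed-forward with ReLU
            out[t] += w * (h @ W2[e])
    return out

tokens = rng.normal(size=(3, d_model))
y = moe_forward(tokens)
print(y.shape)  # (3, 8)
```

Each token runs through only `top_k` of the `n_experts` expert networks, so the per-token cost stays roughly constant as more experts are added, even though total parameter count grows with `n_experts`.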