Mixture-of-Experts Transformer
A Transformer architecture that improves scaling efficiency by routing each input to a small subset of expert subnetworks instead of activating the full model.
Mixture-of-Experts Transformer architectures increase model capacity without activating all parameters for every input. A learned routing mechanism, typically a small gating network, scores the incoming example against each expert and selects which expert subnetworks process it. Because only the selected experts run, compute per input grows far more slowly than total parameter count, decoupling model scale from per-example cost. In large-scale systems this encourages efficient specialization, with different experts handling different regions of the input distribution.
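The sketch below illustrates the idea with a minimal top-k routed MoE feed-forward layer in PyTorch. The class name `MoELayer` and parameters such as `num_experts` and `top_k` are illustrative, not taken from any particular library; real systems add batched expert dispatch and capacity limits that are omitted here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative Mixture-of-Experts feed-forward layer with top-k routing."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent two-layer feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for per-token routing.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Keep only the top-k experts per token; renormalize their scores.
        logits = self.router(tokens)                       # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1) # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)

        # Only selected experts run on each token; all others stay inactive.
        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            mask = indices == expert_id                    # (tokens, top_k)
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            expert_out = expert(tokens[token_ids])
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert_out

        return out.reshape(batch, seq_len, d_model)

# Example: with top_k=2 of 8 experts, each token touches only 1/4 of the
# expert parameters while the layer's total capacity is 8 full experts.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
y = layer(torch.randn(2, 10, 64))  # output shape: (2, 10, 64)
```

In practice, large MoE systems also add an auxiliary load-balancing loss so the router spreads tokens across experts; without it, routing tends to collapse onto a few heavily used experts.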
