# Mixture-of-Experts Transformer

> Source: https://sukruyusufkaya.com/en/glossary/mixture-of-experts-transformer
> Updated: 2026-05-13T21:09:23.824Z
> Type: glossary
> Category: derin-ogrenme (deep learning)

**TLDR:** A Transformer approach that improves scaling efficiency by activating selected expert subnetworks rather than the full model on every input.

Mixture-of-Experts Transformer architectures increase model capacity without requiring all parameters to be active for every input. A learned routing mechanism decides which expert subnetworks should process each incoming token, so the compute spent per token stays roughly constant even as the total parameter count grows. This creates a new balance between computational efficiency and model scale, and in large-scale systems it embodies the idea of efficient specialization: different experts can learn to handle different kinds of inputs.
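
The sketch below illustrates the idea with a minimal top-k routed MoE feed-forward layer in PyTorch. It is not a specific published implementation; the class and parameter names (`MoEFeedForward`, `num_experts`, `top_k`) are illustrative assumptions, and production systems add details such as load-balancing losses and capacity limits that are omitted here.

```python
# Minimal sketch of a Mixture-of-Experts feed-forward layer with top-k routing.
# Assumes PyTorch; all names here are illustrative, not from the source text.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: produces one score per expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent position-wise feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten to (tokens, d_model)
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                # renormalize over selected experts

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            expert_ids = indices[:, slot]
            gate = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = expert_ids == e
                if mask.any():
                    # Only tokens routed to expert e are processed by it.
                    out[mask] += gate[mask] * expert(tokens[mask])
        return out.reshape(x.shape)


# Usage: this layer replaces the dense feed-forward block inside a Transformer layer.
layer = MoEFeedForward(d_model=64, d_hidden=256, num_experts=4, top_k=2)
y = layer(torch.randn(2, 10, 64))  # each token activates only 2 of the 4 experts
```

The key property the sketch demonstrates is sparse activation: each token runs through only `top_k` of the `num_experts` feed-forward networks, so total parameters can grow with the number of experts while per-token compute stays close to that of a single dense feed-forward block.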