# MoE Mathematics: Top-K Router + Softmax + Noise + Auxiliary Load-Balancing Loss

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-moe-mathematics-router-load-balancing
> Updated: 2026-05-14T14:42:53.209Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part V — MoE Internals & Fine-Tuning

**TLDR:** The router is the heart of a Mixture-of-Experts (MoE) layer. This lesson covers the derivation of the top-K routing math (Shazeer et al. 2017; Switch Transformer, 2021), token-to-expert assignment, the expert capacity factor (overflow vs. underutilization), the auxiliary load-balancing loss, softmax temperature, top-K=2 vs. top-K=1, and Mixtral 8×7B's actual router config.
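
To make the pieces named above concrete, here is a minimal pure-Python sketch of top-K routing, a Switch-style auxiliary load-balancing loss, and expert capacity. It is an illustration under stated assumptions, not any model's actual implementation: the function names (`top_k_route`, `load_balancing_loss`, `expert_capacity`) are hypothetical, the noise is plain Gaussian with a fixed standard deviation (Shazeer et al. 2017 learn the noise scale per expert), dividing the dispatch fraction by `k` for top-K > 1 is an assumed normalization, and the loss coefficient that usually scales the auxiliary term is left out.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k=2, noise_std=0.0, rng=None):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_indices, gate_weights). The gate weights are a softmax
    over the selected logits only, so they sum to 1.
    """
    if noise_std > 0.0:
        # Assumption: fixed-scale Gaussian noise, a simplification of noisy gating.
        rng = rng or random.Random(0)
        logits = [x + rng.gauss(0.0, noise_std) for x in logits]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return top, weights


def load_balancing_loss(batch_logits, k=1):
    """Switch-style auxiliary loss: N * sum_i f_i * P_i.

    f_i is the fraction of routing slots dispatched to expert i and P_i is the
    mean router probability for expert i. The value is 1.0 when dispatch is
    perfectly uniform and grows as routing collapses onto few experts.
    """
    n_tokens = len(batch_logits)
    n_experts = len(batch_logits[0])
    f = [0.0] * n_experts
    p = [0.0] * n_experts
    for logits in batch_logits:
        probs = softmax(logits)
        top, _ = top_k_route(logits, k=k)
        for i in top:
            f[i] += 1.0 / (n_tokens * k)  # assumed normalization for k > 1
        for i, prob in enumerate(probs):
            p[i] += prob / n_tokens
    return n_experts * sum(fi * pi for fi, pi in zip(f, p))


def expert_capacity(n_tokens, n_experts, capacity_factor=1.25, k=1):
    """Maximum tokens an expert accepts per batch; tokens beyond this overflow."""
    return math.ceil(capacity_factor * k * n_tokens / n_experts)


# Four tokens, four experts: each token prefers a different expert (balanced)
# versus every token preferring expert 0 (collapsed).
balanced = [[5.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(load_balancing_loss(balanced), load_balancing_loss(collapsed))
```

The balanced batch yields a loss of 1.0, the minimum for uniform dispatch, while the collapsed batch scores noticeably higher, which is the signal the auxiliary term uses to push the router away from overloading a single expert.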

