# MoE Quantization & Inference: Expert Offload + Dynamic Routing Under Quant

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-moe-quantization-expert-offload
> Updated: 2026-05-14T14:42:53.730Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part V — MoE Internals & Fine-Tuning
**TLDR:** MoE inference differs from dense inference: some experts are 'cold' (rarely routed to), which makes offloading their weights to CPU or disk practical. This part covers the interaction between dynamic routing and quantization (how much quantization error the router tolerates), MoE-specific vLLM tuning, and Mixtral AWQ with sparse expert loading, ending with Mixtral 8×7B served on a single RTX 4090 at ~140 tok/s.
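
As a starting point, here is a minimal sketch of serving an AWQ-quantized Mixtral with vLLM on a 24 GB card. The checkpoint name is one commonly used community AWQ export (an assumption, not the article's specific setup), and `cpu_offload_gb` is a vLLM engine argument that spills part of the weights to CPU RAM; whether you need it depends on your context length and vLLM version.

```python
from vllm import LLM, SamplingParams

# Assumed checkpoint: any AWQ-quantized Mixtral export should work the same way.
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
    quantization="awq",
    dtype="half",                 # AWQ kernels compute in fp16
    gpu_memory_utilization=0.90,  # leave headroom for the KV cache
    max_model_len=4096,           # shorter context -> more KV-cache room on 24 GB
    cpu_offload_gb=4,             # spill some weights to CPU RAM (newer vLLM versions)
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts routing in one paragraph."], params)
print(outputs[0].outputs[0].text)
```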


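To make the 'cold expert' idea concrete, below is a toy PyTorch sketch of on-demand expert offload: an expert's weights live on CPU and are copied to the GPU only when the router actually selects it. The `OffloadedExpert` wrapper and its hit counter are illustrative names, not a real library API; production systems batch transfers, use pinned memory and CUDA streams, and keep hot experts resident via an LRU policy instead of moving weights on every call.

```python
import torch
import torch.nn as nn

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

class OffloadedExpert(nn.Module):
    """Keep an expert's weights on CPU; copy them to the GPU only when routed to."""

    def __init__(self, expert: nn.Module, device: str = DEVICE):
        super().__init__()
        self.expert = expert.to("cpu")
        self.device = device
        self.hits = 0  # routing frequency; a real server would pin hot experts on GPU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.hits += 1
        self.expert.to(self.device)  # fetch weights on demand
        out = self.expert(x)
        self.expert.to("cpu")        # release VRAM for the other experts
        return out

# Usage: wrap the experts of one MoE layer; only the selected expert touches the GPU.
experts = [OffloadedExpert(nn.Linear(4096, 4096)) for _ in range(8)]
x = torch.randn(2, 4096, device=DEVICE)
y = experts[3](x)  # expert 3 is routed to, so only its weights are transferred
```

The trade-off this sketch ignores is transfer latency: a per-call CPU-to-GPU copy only pays off for experts whose routing frequency is low enough that the saved VRAM outweighs the PCIe stalls, which is exactly why cold-expert statistics matter.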