# MoE History: From Jacobs 1991 to DeepSeek-V3 2024 — 33-Year Sparse Activation Revolution

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/moe-tarihce-jacobs-1991-deepseek-v3
> Updated: 2026-05-13T13:04:09.988Z
> Category: LLM Engineering
> Module: Module 18: Mixture of Experts (MoE) — Sparse Activation Revolution
**TLDR:** The 33-year intellectual journey of Mixture of Experts: Jacobs et al. 1991, the original 'Adaptive Mixtures of Local Experts' paper; Shazeer et al. 2017, 'Outrageously Large Neural Networks', the start of modern MoE; GShard 2020, Google-scale sparse models; Switch Transformer 2021; Mixtral 8x7B (January 2024), the open-source breakthrough; DeepSeek-V3 (December 2024), 671B total parameters with only 37B active per token. 'Why did it sit outside the door for 33 years, and why has it come back now?'
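
To anchor the "sparse activation" numbers in the TLDR (Mixtral's 8 experts with 2 active per token, or DeepSeek-V3's 671B total vs. 37B active parameters), here is a minimal, illustrative sketch of top-k expert routing. The function and parameter names are invented for this example and are not taken from any of the cited papers or models.

```python
import numpy as np

def topk_moe_forward(x, expert_weights, router_weights, k=2):
    """Minimal top-k MoE routing sketch (illustrative only).

    x              : (d,) hidden vector for one token
    expert_weights : list of N (d, d) matrices, one per expert
    router_weights : (N, d) matrix producing one logit per expert
    k              : number of experts activated per token
    """
    logits = router_weights @ x                 # one score per expert
    topk = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    gate = np.exp(logits[topk] - logits[topk].max())
    gate /= gate.sum()                          # softmax over the selected experts only
    # Only k of the N experts run for this token: this is the sparse-activation idea.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gate, topk))

# Toy usage: 8 experts, 2 active per token, in the spirit of a Mixtral-style layer.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d)) / np.sqrt(d)
token = rng.standard_normal(d)
out = topk_moe_forward(token, experts, router, k=2)
print(out.shape)  # (16,): same shape as the input, but only 2 of 8 experts did any work
```

The ratio of active to total experts is what lets parameter counts grow (671B for DeepSeek-V3) while per-token compute stays close to a much smaller dense model (roughly 37B active).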

