# Disaggregated Serving: Prefill/Decode Separation — Mooncake + DistServe

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-disaggregated-serving-prefill-decode
> Updated: 2026-05-14T14:43:01.569Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
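> Module: Part XV — Serving Engineering

**TLDR:** A recent trend in LLM serving (2024-2026) is to run prefill (processing the input prompt) and decode (token-by-token generation) on different GPUs. Prefill is compute-bound and decode is memory-bandwidth-bound, so separating the two phases can give a 30-50% throughput gain. This section covers the Mooncake (Moonshot AI, Kimi) and DistServe (Peking University / UC San Diego) recipes. On a multi-GPU RTX 4090 setup the treatment is conceptual only.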
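
The compute-bound versus memory-bound split can be illustrated with a back-of-envelope roofline estimate. The sketch below is a simplification, not a benchmark: the hardware figures (`PEAK_FLOPS`, `MEM_BW`) are round illustrative assumptions rather than measured specs, the function names are hypothetical, and attention and KV-cache traffic are ignored so that only weight reads are counted.

```python
# Back-of-envelope roofline check. Hardware numbers are illustrative
# assumptions, not measured specs for any particular GPU.
PEAK_FLOPS = 150e12   # assumed peak FP16 throughput, FLOP/s
MEM_BW = 1e12         # assumed memory bandwidth, bytes/s
BYTES_PER_PARAM = 2   # FP16 weights


def arithmetic_intensity(tokens_per_pass: int) -> float:
    """FLOPs per byte of weights read in one forward pass.

    Each weight is read once per pass and contributes roughly 2 FLOPs
    (multiply + add) per token. Attention and KV-cache traffic are ignored.
    """
    return 2 * tokens_per_pass / BYTES_PER_PARAM


def regime(tokens_per_pass: int) -> str:
    """Classify a pass against the ridge point PEAK_FLOPS / MEM_BW."""
    ridge = PEAK_FLOPS / MEM_BW  # FLOPs per byte needed to saturate compute
    if arithmetic_intensity(tokens_per_pass) >= ridge:
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    print("prefill, 2048-token prompt:", regime(2048))
    print("decode, 1 token per step:  ", regime(1))
```

Under these assumptions a single decode step does about 1 FLOP per byte of weights read, far below the ridge point of 150, while a long prompt processed in one pass sits well above it. Batching decode requests raises the tokens per pass, which is why the two phases prefer different batch sizes and hardware allocations.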

