# Speculative Decoding Production: Draft + Target Pairing + Accept Rate Measurement

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-speculative-decoding-production
> Updated: 2026-05-14T14:43:01.483Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XV — Serving Engineering

**TLDR:** In speculative decoding (Leviathan et al. 2023; Chen et al. 2023), a small draft model proposes 4-8 tokens and the target model **verifies** them in a single forward pass. A high accept rate can yield 2-3x throughput. Related approaches: EAGLE-2 (Li et al. 2024) and MEDUSA head training. Example: Llama 3.1 8B target + Llama 3.2 1B draft on an RTX 4090 goes from 175 to 290 tok/s (about 1.7x).
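
To make the link between accept rate and throughput concrete, here is a minimal sketch in Python. It assumes each drafted token is accepted independently with the same probability `alpha`, which is the simplifying assumption behind the expected-length formula in Leviathan et al. 2023; real accept rates vary by position and prompt. The function names and the counter values in the usage lines are hypothetical and not part of any serving library.

```python
def measured_accept_rate(accepted_tokens: int, proposed_tokens: int) -> float:
    """Fraction of drafted tokens that the target model accepted."""
    if proposed_tokens <= 0:
        raise ValueError("proposed_tokens must be positive")
    return accepted_tokens / proposed_tokens


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass.

    alpha: per-token accept probability (assumed i.i.d., a simplification).
    gamma: number of tokens the draft model proposes per step.
    Closed form: (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one token from the target itself.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def throughput_speedup(baseline_tok_s: float, speculative_tok_s: float) -> float:
    """Ratio of speculative throughput to baseline throughput."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline_tok_s must be positive")
    return speculative_tok_s / baseline_tok_s


# Hypothetical counters: 320 of 400 drafted tokens were accepted.
alpha = measured_accept_rate(320, 400)
tokens_per_step = expected_tokens_per_step(alpha, gamma=4)

# Throughput figures quoted in the TLDR (175 -> 290 tok/s).
speedup = throughput_speedup(175.0, 290.0)
print(f"alpha={alpha:.2f} tokens/step={tokens_per_step:.2f} speedup={speedup:.2f}x")
```

The tokens-per-step figure is an upper bound on the achievable speedup, because it ignores the cost of running the draft model; the measured tok/s ratio is the number that matters in production.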

