
Speculative Decoding

A decoding approach that speeds up generation by having a large model verify, in parallel, tokens proposed by a smaller draft model.

Speculative decoding is an inference technique developed to reduce LLM generation latency. A small, fast draft model proposes several tokens autoregressively, and the larger target model then verifies all of them in a single forward pass, accepting the longest prefix it agrees with. Because the target model's verdict decides every emitted token, the greedy variant reproduces the target model's output exactly, and with the standard rejection-sampling acceptance rule the sampled output distribution also matches the target model's. The speedup comes from the draft model's proposals being accepted often enough that each expensive target-model pass yields multiple tokens instead of one.
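The draft/verify loop can be sketched with toy deterministic "models" standing in for real LMs. The rules below (`target_next`, `draft_next`, the wrap-to-zero behavior, and `k=3`) are illustrative assumptions, not part of any real system; the point is the control flow: propose `k` tokens cheaply, verify them against the target, keep the accepted prefix, and emit the target's own token at the first mismatch.

```python
def target_next(tokens):
    # Toy "large" model: greedy next token. Stand-in for an expensive LM call.
    # Hypothetical rule: count up, but wrap to 0 at multiples of 5.
    n = tokens[-1] + 1
    return n if n % 5 else 0

def draft_next(tokens):
    # Toy "small" draft model: cheap but imperfect (ignores the wrap rule),
    # so its proposals are sometimes rejected.
    return tokens[-1] + 1

def speculative_decode(tokens, max_len, k=3):
    tokens = list(tokens)
    while len(tokens) < max_len:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target model checks each drafted position; in a real system this
        #    is one batched forward pass, not k sequential calls.
        accepted = 0
        for i in range(k):
            if target_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        if len(tokens) >= max_len:
            break
        # 3) At the first mismatch (or as a bonus token after full acceptance),
        #    emit the target model's own prediction, so the output is exactly
        #    what greedy decoding with the target model alone would produce.
        tokens.append(target_next(tokens))
    return tokens[:max_len]

print(speculative_decode([1], 10))
```

Note that step 3 guarantees progress: even when every drafted token is rejected, each iteration still emits one target-model token, so the worst case degrades to ordinary decoding rather than stalling.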