
Speculative Decoding

A decoding approach that speeds up generation by having a large model verify, in parallel, tokens proposed by a smaller draft model.

Speculative decoding is an inference technique developed to reduce LLM generation latency. A small, fast draft model proposes several tokens autoregressively, and the larger target model then verifies all of them in a single forward pass, accepting the longest prefix it agrees with. Because the target model's verdict decides every emitted token, the greedy variant reproduces the target model's output exactly, and with the standard rejection-sampling acceptance rule the sampled output distribution also matches the target model's. The speedup comes from the draft model's proposals being accepted often enough that each expensive target-model pass yields multiple tokens instead of one.
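The draft/verify loop can be sketched with toy deterministic "models" standing in for real LMs. The rules below (`target_next`, `draft_next`, the wrap-to-zero behavior, and `k=3`) are illustrative assumptions, not part of any real system; the point is the control flow: propose `k` tokens cheaply, verify them against the target, keep the accepted prefix, and emit the target's own token at the first mismatch.

```python
def target_next(tokens):
    # Toy "large" model: greedy next token. Stand-in for an expensive LM call.
    # Hypothetical rule: count up, but wrap to 0 at multiples of 5.
    n = tokens[-1] + 1
    return n if n % 5 else 0

def draft_next(tokens):
    # Toy "small" draft model: cheap but imperfect (ignores the wrap rule),
    # so its proposals are sometimes rejected.
    return tokens[-1] + 1

def speculative_decode(tokens, max_len, k=3):
    tokens = list(tokens)
    while len(tokens) < max_len:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target model checks each drafted position; in a real system this
        #    is one batched forward pass, not k sequential calls.
        accepted = 0
        for i in range(k):
            if target_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        if len(tokens) >= max_len:
            break
        # 3) At the first mismatch (or as a bonus token after full acceptance),
        #    emit the target model's own prediction, so the output is exactly
        #    what greedy decoding with the target model alone would produce.
        tokens.append(target_next(tokens))
    return tokens[:max_len]

print(speculative_decode([1], 10))
```

Note that step 3 guarantees progress: even when every drafted token is rejected, each iteration still emits one target-model token, so the worst case degrades to ordinary decoding rather than stalling.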