Direct Preference Optimization

A simpler alignment approach that learns directly from preference pairs.

Direct Preference Optimization (DPO) aligns a language model with preferences more directly than the classical RLHF pipeline of first fitting a reward model and then running reinforcement learning against it. Instead, human or system preferences are expressed as pairwise comparisons (a preferred and a dispreferred response to the same prompt), and the model is trained with a simple classification-style loss that raises the likelihood of preferred responses relative to a frozen reference model. Because no separate reward model or RL loop is required, the resulting training process tends to be more stable and easier to optimize in practice.
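
Concretely, DPO minimizes a logistic loss on the difference of implicit rewards, where each completion's implicit reward is β times the log-ratio of its policy probability to its reference-model probability. Below is a minimal PyTorch sketch of that loss under stated assumptions: the function name and β default are illustrative, and the log-probabilities are assumed to be pre-computed sums over completion tokens.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss from per-sequence log-probabilities, shape (batch,).

    `beta` controls how far the policy may drift from the reference.
    """
    # Implicit reward for each completion: how much more (or less)
    # the policy prefers it than the frozen reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Logistic loss on the margin: push the chosen completion's
    # implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice the chosen and rejected completions for each prompt are scored in a single forward pass through the policy and (with gradients disabled) the reference model, and the loss above is the only training objective.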