
Reward Model

An auxiliary model that scores how preferable generated outputs are, providing the training signal used for alignment.

A reward model is the learned evaluation mechanism used in RLHF-style alignment: given a prompt and a candidate response, it estimates how well that response matches human preferences. Trained on comparisons between responses, it converts those preferences into a continuous training signal that the policy is then optimized against. Because the policy optimizes directly against its scores, a biased or brittle reward model can mislead the entire alignment process, for example by rewarding fluent but factually wrong answers. It is therefore a critical but sensitive component.
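
As a rough illustration of how pairwise preference data becomes a continuous training signal, the sketch below assumes a PyTorch-style setup and a Bradley-Terry pairwise loss. The RewardModel class, the preference_loss function, and the random embeddings standing in for encoder outputs are hypothetical names for illustration, not any specific library's API.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward head: maps a pooled sequence embedding to one scalar score."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        # One scalar reward per sequence in the batch.
        return self.score(pooled_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: push the preferred response's score above the rejected one's.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings stand in for encoder outputs of chosen/rejected responses.
model = RewardModel(hidden_size=16)
chosen, rejected = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```

The pairwise formulation reflects how preference data is usually collected: annotators rank two responses rather than assigning absolute scores, and the loss only requires the chosen response to score higher than the rejected one.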