
Reward Model

An auxiliary model that scores how preferable generated outputs are, providing the training signal used for alignment.

A reward model is the learned evaluation mechanism used in RLHF-style alignment: given a prompt and a candidate response, it estimates how well that response matches human preferences. Trained on comparisons between responses, it converts those preferences into a continuous training signal that the policy is then optimized against. Because the policy optimizes directly against its scores, a biased or brittle reward model can mislead the entire alignment process, for example by rewarding fluent but factually wrong answers. It is therefore a critical but sensitive component.
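
As a rough illustration of how pairwise preference data becomes a continuous training signal, the sketch below assumes a PyTorch-style setup and a Bradley-Terry pairwise loss. The RewardModel class, the preference_loss function, and the random embeddings standing in for encoder outputs are hypothetical names for illustration, not any specific library's API.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward head: maps a pooled sequence embedding to one scalar score."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        # One scalar reward per sequence in the batch.
        return self.score(pooled_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: push the preferred response's score above the rejected one's.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings stand in for encoder outputs of chosen/rejected responses.
model = RewardModel(hidden_size=16)
chosen, rejected = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```

The pairwise formulation reflects how preference data is usually collected: annotators rank two responses rather than assigning absolute scores, and the loss only requires the chosen response to score higher than the rejected one.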