Technical Glossary: Generative AI and LLM
Reward Model
An auxiliary model that estimates how preferable generated outputs are and provides signals for alignment.
A reward model is the learned evaluation component in RLHF-style alignment: it is trained on human preference comparisons to predict which of several candidate outputs a person would prefer, converting those preferences into a continuous scalar training signal for the policy. If the reward model is biased or brittle, the policy can learn to exploit its errors (reward hacking), undermining the entire alignment process. It is therefore a critical but sensitive component.
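To make the "preferences into a continuous training signal" step concrete, below is a minimal sketch of reward-model training with the pairwise (Bradley-Terry style) ranking loss commonly used in RLHF: the model assigns a scalar score to each response, and the loss pushes the preferred response to score higher than the rejected one. The `RewardModel` class and the random token batches are hypothetical stand-ins for a transformer with a scalar value head and a real preference dataset.

```python
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Toy reward model: encodes a response and outputs one scalar score.

    Hypothetical stand-in for a pretrained transformer with a scalar
    reward head; the pooled embedding keeps the example self-contained.
    """

    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, hidden)  # crude pooled encoder
        self.head = nn.Linear(hidden, 1)                     # scalar reward head

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> rewards: (batch,)
        return self.head(self.encoder(token_ids)).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected).

    Minimized when the chosen response scores higher than the rejected one,
    turning discrete human preferences into a continuous gradient signal.
    """
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()


# Hypothetical preference batch: token ids for preferred vs. rejected responses.
model = RewardModel()
chosen = torch.randint(0, 1000, (4, 16))
rejected = torch.randint(0, 1000, (4, 16))

loss = preference_loss(model(chosen), model(rejected))
loss.backward()  # gradients push preferred outputs toward higher scores
```

Once trained this way, the scalar scores from the reward model serve as the optimization target for the policy (for example via PPO), which is why flaws in the reward model propagate into the aligned model.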
