# Direct Preference Optimization

> Source: https://sukruyusufkaya.com/en/glossary/direct-preference-optimization
> Updated: 2026-05-13T19:59:27.473Z
> Type: glossary
> Category: uretken-yapay-zeka-ve-llm

**TLDR:** A simpler alignment approach than RLHF that fine-tunes a language model directly on preference pairs, without training a separate reward model.

DPO (Direct Preference Optimization) offers a more direct alignment method than the classical pipeline of training a reward model and then optimizing against it with reinforcement learning, as in RLHF. Human or system preferences are expressed as pairwise comparisons between two candidate responses, and the model is fine-tuned directly on these pairs with a simple classification-style loss computed against a frozen reference policy. Because this removes both the reward model and the RL loop, alignment tends to be more stable and easier to optimize in practice.
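
Concretely, the DPO objective (Rafailov et al., 2023) is a logistic loss over the margin between how much the policy $\pi_\theta$ prefers the chosen response $y_w$ over the rejected response $y_l$, relative to the reference policy $\pi_{\mathrm{ref}}$:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $\beta$ controls how far the policy may drift from the reference model. Below is a minimal PyTorch sketch of this loss; the function name `dpo_loss` and the assumption that inputs arrive as per-sequence log-probabilities (summed token log-probs under each model) are illustrative choices, not an API from the source.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    under the trainable policy or the frozen reference model.
    """
    # Log-ratios of policy to reference for the preferred (chosen)
    # and dispreferred (rejected) completions.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # Logistic loss on the beta-scaled margin between the two ratios.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Toy usage with assumed per-sequence log-prob values for two pairs.
policy_chosen = torch.tensor([-12.3, -8.1])
policy_rejected = torch.tensor([-13.0, -9.5])
ref_chosen = torch.tensor([-12.5, -8.4])
ref_rejected = torch.tensor([-12.8, -9.2])
loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
```

Note that the gradient pushes up the likelihood of chosen responses and down that of rejected ones only where the policy disagrees with the preference data, which is what makes the update behave like an implicit reward-weighted correction.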