# Direct Preference Optimization

> Source: https://sukruyusufkaya.com/en/glossary/direct-preference-optimization
> Updated: 2026-05-13T19:59:27.473Z
> Type: glossary
> Category: uretken-yapay-zeka-ve-llm

**TLDR:** A simpler alignment approach than RLHF that fine-tunes a language model directly on preference pairs, without training a separate reward model.

DPO (Direct Preference Optimization) offers a more direct alignment method than the classical pipeline of training a reward model and then optimizing against it with reinforcement learning, as in RLHF. Human or system preferences are expressed as pairwise comparisons between two candidate responses, and the model is fine-tuned directly on these pairs with a simple classification-style loss computed against a frozen reference policy. Because this removes both the reward model and the RL loop, alignment tends to be more stable and easier to optimize in practice.
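
Concretely, the DPO objective (Rafailov et al., 2023) is a logistic loss over the margin between how much the policy $\pi_\theta$ prefers the chosen response $y_w$ over the rejected response $y_l$, relative to the reference policy $\pi_{\mathrm{ref}}$:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $\beta$ controls how far the policy may drift from the reference model. Below is a minimal PyTorch sketch of this loss; the function name `dpo_loss` and the assumption that inputs arrive as per-sequence log-probabilities (summed token log-probs under each model) are illustrative choices, not an API from the source.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    under the trainable policy or the frozen reference model.
    """
    # Log-ratios of policy to reference for the preferred (chosen)
    # and dispreferred (rejected) completions.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # Logistic loss on the beta-scaled margin between the two ratios.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Toy usage with assumed per-sequence log-prob values for two pairs.
policy_chosen = torch.tensor([-12.3, -8.1])
policy_rejected = torch.tensor([-13.0, -9.5])
ref_chosen = torch.tensor([-12.5, -8.4])
ref_rejected = torch.tensor([-12.8, -9.2])
loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
```

Note that the gradient pushes up the likelihood of chosen responses and down that of rejected ones only where the policy disagrees with the preference data, which is what makes the update behave like an implicit reward-weighted correction.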