KTO (Kahneman-Tversky Optimization): Alignment from One-Sided (Unpaired) Feedback

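KTO (Ethayarajh et al., 2024) targets the feedback you actually get in production: a thumbs-up or thumbs-down on a single response, not a ranked pair. Standard DPO needs (chosen, rejected) pairs, so it cannot consume this data directly. KTO fills the gap with a loss built on the utility function from prospect theory (Kahneman and Tversky), which makes a continuous learning loop in production practical.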

Şükrü Yusuf KAYA

28 min read

5/14/2026

Advanced


1. KTO Dataset Format

{
  "prompt": "What is the population of Ankara?",
  "completion": "About 5.6 million (2024 data).",
  "label": true
}

label: true = thumbs-up (good), false = thumbs-down (bad).

No pairs are needed. In production:

The user saw a response and gave it a thumbs-up →
label: true
The user gave it a thumbs-down →
label: false
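
The mapping above can be sketched as a small conversion step. The event schema used here (`prompt`, `response`, and a `vote` field holding "up", "down", or nothing) is a hypothetical logging format, not one the lesson defines; adapt the field names to your own logs.

```python
from typing import Iterable


def feedback_to_kto(events: Iterable[dict]) -> list[dict]:
    """Convert raw thumbs events into KTO records (prompt, completion, label)."""
    records = []
    for ev in events:
        vote = ev.get("vote")
        if vote not in ("up", "down"):
            continue  # skip responses the user never rated
        records.append({
            "prompt": ev["prompt"],
            "completion": ev["response"],
            "label": vote == "up",  # True = thumbs-up, False = thumbs-down
        })
    return records


# Hypothetical log entries, for illustration only
events = [
    {"prompt": "Capital of Turkey?", "response": "Ankara.", "vote": "up"},
    {"prompt": "Capital of Turkey?", "response": "I don't know.", "vote": "down"},
    {"prompt": "Capital of Turkey?", "response": "Maybe Istanbul?", "vote": None},
]
kto_records = feedback_to_kto(events)
```

Unrated responses are dropped rather than guessed at, so every record that reaches training carries an explicit user signal.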

Advantage: collecting pairs is 5-10x more expensive than collecting a single label, because the annotator has to compare two answers. That makes KTO a good fit for production-scale feedback.
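
Paired data you already hold is not wasted: each pair can be split into one desirable and one undesirable example. A minimal sketch, assuming the `prompt` / `chosen` / `rejected` field names commonly used by DPO-style datasets (your own dataset may name them differently):

```python
def unpair(pairs: list[dict]) -> list[dict]:
    """Split each (chosen, rejected) pair into two unpaired KTO records."""
    records = []
    for p in pairs:
        records.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        records.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return records


# Toy pair, for illustration only
pairs = [{"prompt": "2 + 2 = ?", "chosen": "4", "rejected": "5"}]
unpaired = unpair(pairs)
```

The result has the same shape as production thumbs data, so both sources can be concatenated into one training set.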

2. KTO Loss: From Prospect Theory

Kahneman and Tversky's utility function, which models the asymmetry of human preferences:

Gains:
U(g) = g^α
(α < 1, concave)
Losses:
U(l) = -λ |l|^α
(λ > 1; with λ ≈ 2.25, a loss hurts about 2.25x as much as an equal-sized gain)
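
To make the asymmetry concrete, here is a small sketch of the value function. The defaults α = 0.88 and λ = 2.25 are the median estimates reported by Tversky and Kahneman (1992); the function name `kt_value` is only illustrative.

```python
def kt_value(z: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Kahneman-Tversky value of an outcome z relative to a reference point at 0."""
    if z >= 0:
        return z ** alpha          # concave in gains
    return -lam * (-z) ** alpha    # same curvature, but scaled by lam in losses


gain = kt_value(100.0)     # subjective value of gaining 100
loss = kt_value(-100.0)    # subjective value of losing 100
ratio = abs(loss) / gain   # equals lam by construction
```

Because both branches share the same exponent, the loss-to-gain ratio for equal magnitudes is exactly λ, which is where the "2.25x" figure comes from.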

Adapted for KTO:

L_KTO = E_yes[w_yes · (1 - σ(β · (r(x, y) - z_0)))]
      + E_no [w_no  · (1 - σ(β · (z_0 - r(x, y))))]

r(x, y) = log(π_θ(y|x) / π_ref(y|x))    # DPO-style implicit reward
z_0 = reference point: an estimate of KL(π_θ || π_ref), computed per batch and treated as a constant (not a hyperparameter)
w_yes, w_no = sample weights (used to balance the true/false ratio)
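
The loss can be sketched in a few lines of plain Python. This is a scalar illustration, not the TRL implementation: it follows the formulation in Ethayarajh et al., which uses a single reference point (called `z0` here) for both classes, and it takes `z0` as a given number. In real training `z0` is a batch-level KL estimate that receives no gradient. The function and argument names are illustrative.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def kto_loss(policy_logps, ref_logps, labels, z0, beta=0.1, w_yes=1.0, w_no=1.0):
    """Mean KTO loss over a batch of unpaired examples.

    policy_logps / ref_logps: summed log-probabilities of each completion
    under the policy and the frozen reference model.
    labels: True = desirable, False = undesirable.
    z0: reference point, treated as a constant.
    """
    losses = []
    for lp, lr, good in zip(policy_logps, ref_logps, labels):
        r = lp - lr  # implicit reward: log(pi_theta / pi_ref)
        if good:
            losses.append(w_yes * (1.0 - sigmoid(beta * (r - z0))))
        else:
            losses.append(w_no * (1.0 - sigmoid(beta * (z0 - r))))
    return sum(losses) / len(losses)


# Policy raised the good answer's likelihood and lowered the bad one's
better = kto_loss([-10.0, -30.0], [-20.0, -20.0], [True, False], z0=0.0)
# Policy did the opposite
worse = kto_loss([-30.0, -10.0], [-20.0, -20.0], [True, False], z0=0.0)
```

Here `better` comes out lower than `worse`: the loss falls when desirable completions gain probability relative to the reference and undesirable ones lose it.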

Cookbook hyperparameters: β=0.1, w_yes=w_no=1 (if the dataset is balanced).
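
When the dataset is not balanced, the weights can be derived from the label counts. The sketch below keeps the majority class at weight 1.0 and upweights the minority class until (w_yes · n_yes) / (w_no · n_no) hits a target ratio. The helper name `balance_weights` and the default target of 1.0 are assumptions for illustration; the TRL documentation suggests keeping this ratio roughly between 1:1 and 4:3.

```python
def balance_weights(n_yes: int, n_no: int, target_ratio: float = 1.0) -> tuple[float, float]:
    """Return (w_yes, w_no) such that (w_yes * n_yes) / (w_no * n_no) == target_ratio."""
    if n_yes <= 0 or n_no <= 0:
        raise ValueError("need at least one example of each label")
    if n_yes / n_no < target_ratio:
        # too few thumbs-up examples: upweight the desirable class
        return target_ratio * n_no / n_yes, 1.0
    # too few thumbs-down examples (or already balanced): upweight the undesirable class
    return 1.0, n_yes / (target_ratio * n_no)


# Hypothetical counts: 8000 thumbs-up, 2000 thumbs-down
w_yes, w_no = balance_weights(8000, 2000)
```

The two returned values would go into `desirable_weight` and `undesirable_weight` in the training config.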

python

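# === KTO Lab: TRL KTOTrainer ===
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# 0. Policy model and tokenizer.
# Assumed checkpoint (inferred from output_dir); swap in your own SFT model.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token          # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# 1. Production thumbs data (example)
production_data = [
    {"prompt": "Population of Istanbul?", "completion": "About 15 million.", "label": True},
    {"prompt": "Population of Istanbul?", "completion": "I don't know.", "label": False},
    # ... 5000+ examples
]
dataset = Dataset.from_list(production_data)

# 2. Config
cfg = KTOConfig(
    output_dir="llama-3.1-8b-kto",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    bf16=True, optim="paged_adamw_8bit",   # paged 8-bit AdamW requires bitsandbytes
    max_length=4096,
    max_prompt_length=2048,
    beta=0.1,
    desirable_weight=1.0,                  # w_yes
    undesirable_weight=1.0,                # w_no
    logging_steps=5, report_to="wandb",
)

# 3. Train. With no ref_model passed, TRL builds the frozen reference from a copy of `model`.
trainer = KTOTrainer(model=model, args=cfg, train_dataset=dataset,
                     processing_class=tok)  # TRL < 0.12 takes tokenizer=tok instead
trainer.train()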

KTO Lab — TRL

✅ Deliverables

1) Generate a synthetic thumbs-up/down dataset. 2) Fine-tune with KTO. 3) Compare against DPO on the same model. 4) Next lesson: 11.6, DPO Family: SimPO, IPO, CPO, RPO.
