# LLM Fine-Tuning: A Comprehensive 2026 Guide to LoRA, QLoRA, DPO, and Modern Alignment

> Source: https://sukruyusufkaya.com/en/blog/llm-fine-tuning-lora-qlora-dpo
> Updated: 2026-05-13T19:58:09.245Z
> Type: blog
> Category: yapay-zeka
**TLDR:** The most current, detailed 2026 Turkish guide to adapting an LLM to your domain. Covers when fine-tuning is necessary, the math behind LoRA, 4-bit training with QLoRA, why DPO beats PPO, modern alternatives (ORPO/KTO/IPO), Turkish dataset sources, GPU/cloud cost modeling, production pipelines, 3 anonymized Turkish enterprise case studies, and KVKK-compliant training. For developers, MLOps engineers, and AI architects.

<tldr data-summary="[&#34;Fine-tuning is the additional training that locks specific dimensions of an LLM\&#39;s behavior — style, format, behavior, domain knowledge — without changing its core capabilities. It is the right answer for ~5% of needs.&#34;,&#34;LoRA (Low-Rank Adaptation) trains small adapter matrices instead of full weights; with 0.1-1% of parameters updated, it delivers 90-95% of full fine-tuning quality.&#34;,&#34;QLoRA pairs LoRA with 4-bit quantization, making a 70B model fine-tunable on a single A100 GPU — the engine behind the post-2023 personal/small-team fine-tuning boom.&#34;,&#34;DPO (Direct Preference Optimization) replaces classic RLHF\&#39;s PPO + reward-model loop with a simple supervised loss on preference pairs; the 2024-2026 modern alignment standard.&#34;,&#34;For Turkish enterprises, fine-tuning typically costs $200-$5,000; data preparation determines 70% of cost and quality — training is only the last step.&#34;]" data-one-line="Fine-tuning is the advanced AI-engineering discipline that, in the right situations — when RAG and prompt engineering fall short — permanently bends an LLM's behavior toward your organization's DNA."></tldr>

## 1. What is Fine-Tuning and When is it Necessary?

Three main strategies adapt LLMs to your use case: **prompt engineering**, **RAG**, and **fine-tuning**. The first two leave the model unchanged; fine-tuning **updates model weights through additional training**. In the right situations, it produces enormous value; in the wrong ones, it is a waste of money.

<definition-box data-term="Fine-Tuning" data-definition="The process of updating a pretrained language model's (foundation model's) weights via additional training on a custom dataset and task. Aligns the model to a specific domain, style, format, or behavior while preserving the existing knowledge base. Covers methods like full fine-tuning, LoRA, QLoRA, DPO, and ORPO." data-also="Model Adaptation"></definition-box>

### When to Fine-Tune?

A practical decision framework:

<comparison-table data-caption="Fine-Tuning vs Other Adaptation Methods" data-headers="[&#34;Need&#34;,&#34;Prompt Eng&#34;,&#34;RAG&#34;,&#34;Fine-tuning&#34;]" data-rows="[{&#34;feature&#34;:&#34;Lock in style/format&#34;,&#34;values&#34;:[&#34;Partial&#34;,&#34;-&#34;,&#34;Ideal&#34;]},{&#34;feature&#34;:&#34;Add domain knowledge&#34;,&#34;values&#34;:[&#34;-&#34;,&#34;Ideal&#34;,&#34;Limited&#34;]},{&#34;feature&#34;:&#34;Access fresh data&#34;,&#34;values&#34;:[&#34;-&#34;,&#34;Ideal&#34;,&#34;-&#34;]},{&#34;feature&#34;:&#34;Teach new behavior&#34;,&#34;values&#34;:[&#34;Partial&#34;,&#34;-&#34;,&#34;Ideal&#34;]},{&#34;feature&#34;:&#34;Reduce latency&#34;,&#34;values&#34;:[&#34;-&#34;,&#34;-&#34;,&#34;Yes (small model)&#34;]},{&#34;feature&#34;:&#34;Save tokens&#34;,&#34;values&#34;:[&#34;-&#34;,&#34;-&#34;,&#34;Ideal&#34;]},{&#34;feature&#34;:&#34;Setup time&#34;,&#34;values&#34;:[&#34;Hours&#34;,&#34;Weeks&#34;,&#34;Weeks-months&#34;]},{&#34;feature&#34;:&#34;Cost&#34;,&#34;values&#34;:[&#34;Very low&#34;,&#34;Medium&#34;,&#34;High (one-time)&#34;]}]"></comparison-table>

**Practical rule.** 70% of needs are solved by prompt engineering, 25% more by prompt + RAG. The remaining **5%** is where fine-tuning produces real value: locking in style/format, guaranteed structured output, lowering latency/cost (distillation), domain-specific language (Turkish law, medicine), and new behavior (agent tasks, tool use).

<stat-callout data-value="5%" data-context="The actual rate of production LLM applications that truly require fine-tuning —" data-outcome="the other 95% are solved by prompt engineering + RAG. Exhaust those two layers before reaching for fine-tuning." data-source="{&#34;label&#34;:&#34;OpenAI Cookbook + Anthropic Best Practices&#34;,&#34;url&#34;:&#34;https://platform.openai.com/docs/guides/fine-tuning&#34;,&#34;date&#34;:&#34;2025&#34;}"></stat-callout>

### Why Try Prompt and RAG First?

Fine-tuning has five side effects: high upfront cost (GPU hours, data, evals); model "freezing" (the work must be redone for each new base model); catastrophic-forgetting risk; data-management complexity (KVKK + IP + quality); and harder evaluation. That is why OpenAI, Anthropic, and Google all recommend **prompt + RAG first, fine-tuning later**.

## 2. The Full LLM Training Pipeline

A modern LLM goes through four training stages, each with a distinct purpose, dataset type, and cost.

<comparison-table data-caption="LLM Training Stages (Full Picture)" data-headers="[&#34;Stage&#34;,&#34;Purpose&#34;,&#34;Data Type&#34;,&#34;Time/Cost&#34;]" data-rows='[{"feature":"1. Pretraining","values":["General language","Trillions of tokens (internet, books, code)","Months, millions $"]},{"feature":"2. Supervised Fine-Tuning (SFT)","values":["Instruction following","Thousands of high-quality Q&A pairs","Days, thousands $"]},{"feature":"3. Preference Optimization (RLHF/DPO/ORPO)","values":["Human preference","Preference pairs (A > B)","Days, thousands $"]},{"feature":"4. Continued Fine-tuning (yours)","values":["Domain/style alignment","Hundreds-thousands of examples","Hours-days, $50-5,000"]}]'></comparison-table>

Enterprise fine-tuning usually happens at **Stage 4**.

### Supervised Fine-Tuning (SFT)

The most basic form — standard next-token prediction training on instruction-response pairs. Most enterprise fine-tunes are SFT (style, format, domain knowledge).

### Preference Optimization

Human evaluators see two responses (A, B) for the same prompt and mark the better one. The model is then pushed toward "good" responses via:

- **RLHF (PPO)** — classic; trains a reward model and applies PPO. Complex and resource-heavy.
- **DPO** — skips the reward model; supervised loss directly on preference pairs. Simple, effective, the standard since 2024.
- **ORPO / KTO / IPO** — derivatives and alternatives detailed below.

## 3. PEFT — Parameter-Efficient Fine-Tuning

Fully fine-tuning a 70B-parameter model means updating all 70B weights, which requires 800GB+ of VRAM; only large labs operate at that scale. **PEFT** solves this by updating only a **small subset of parameters**.

<definition-box data-term="PEFT (Parameter-Efficient Fine-Tuning)" data-definition="A family of techniques that fine-tune a small subset of parameters rather than the entire weights of pretrained large models. Includes LoRA, QLoRA, AdaLoRA, IA-3, Prefix Tuning, Prompt Tuning. Reduces compute by 10-100x with typically only 5-10% quality drop." data-also="Parameter-Efficient Fine-Tuning"></definition-box>

PEFT members: **LoRA**, **QLoRA**, **AdaLoRA**, **IA-3**, **Prefix Tuning**, **Prompt Tuning**, **DoRA** (2024), **MoRA** (2024).

## 4. LoRA — Low-Rank Adaptation

Published in 2021 by Microsoft researchers (Hu et al.), LoRA has become **the gold standard of modern fine-tuning**.

### 4.1. Math (Brief)

In full fine-tuning, a weight matrix <code>W</code> (e.g., 4096×4096) is updated directly: <code>W_new = W + ΔW</code>. LoRA's core assumption: <code>ΔW</code> is approximately **low-rank**.

LoRA expresses <code>ΔW</code> as the product of two small matrices:

<pre><code>ΔW ≈ B × A
B: 4096 × r
A: r × 4096
r &lt;&lt; 4096 (usually 4, 8, 16, 32, 64)</code></pre>

Only **A and B are updated** during training; original <code>W</code> is frozen. At inference, <code>W + B × A</code> is computed (or merged).
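A minimal PyTorch sketch of the idea (illustrative only; production code uses the <code>peft</code> library's implementation):

<pre><code>import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze W (and bias)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # r x d_in, small random init
        self.B = nn.Parameter(torch.zeros(d_out, r))         # d_out x r, zero init: dW starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # W x + (alpha/r) * B A x  -- only A and B receive gradients
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)</code></pre>

At inference the update can be folded in once (<code>W_merged = W + (α/r) × B × A</code>), so a merged adapter adds zero latency.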

### 4.2. LoRA Hyperparameters

**Rank (r)** — the inner dimension of the adapter matrices A and B. Common: 8 (default), 16, 32, 64. Higher rank means more capacity but also more overfitting risk.

**Alpha (α)** — scaling factor. <code>ΔW_effective = (α/r) × B × A</code>. Practical: <code>α = 2r</code>.

**Target modules** — which layers get LoRA?

- <code>q_proj, v_proj</code> — attention query/value only (minimal)
- <code>q_proj, k_proj, v_proj, o_proj</code> — all attention
- <code>q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj</code> — attention + MLP (most thorough)

**Tip.** Targeting all linear layers gives the best results; attention-only typically loses 5-10% quality on most tasks.
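In Hugging Face's <code>peft</code> library, these three hyperparameters map directly onto <code>LoraConfig</code>. A minimal sketch (values are illustrative; module names assume a Llama-style architecture):

<pre><code>from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

config = LoraConfig(
    r=16,                 # rank of the adapter matrices
    lora_alpha=32,        # alpha = 2r, per the rule of thumb above
    target_modules=[      # attention + MLP: the "most thorough" option
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically ~0.1-1% of total parameters</code></pre>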

### 4.3. Full Fine-Tuning vs LoRA

<comparison-table data-caption="Full Fine-Tuning vs LoRA (Llama 3 70B Example)" data-headers="[&#34;Dimension&#34;,&#34;Full FT&#34;,&#34;LoRA&#34;]" data-rows="[{&#34;feature&#34;:&#34;Trained params&#34;,&#34;values&#34;:[&#34;70B (full)&#34;,&#34;~0.5B (0.7%)&#34;]},{&#34;feature&#34;:&#34;VRAM need&#34;,&#34;values&#34;:[&#34;800GB+&#34;,&#34;48-80GB&#34;]},{&#34;feature&#34;:&#34;Training time&#34;,&#34;values&#34;:[&#34;1x&#34;,&#34;0.5-0.7x&#34;]},{&#34;feature&#34;:&#34;Quality&#34;,&#34;values&#34;:[&#34;100% (baseline)&#34;,&#34;90-95%&#34;]},{&#34;feature&#34;:&#34;Data need&#34;,&#34;values&#34;:[&#34;More&#34;,&#34;Less (1K-10K samples)&#34;]},{&#34;feature&#34;:&#34;Output size&#34;,&#34;values&#34;:[&#34;~140GB&#34;,&#34;~50MB-1GB (adapter only)&#34;]},{&#34;feature&#34;:&#34;Multi-task&#34;,&#34;values&#34;:[&#34;Hard&#34;,&#34;Multi-adapter swap&#34;]}]"></comparison-table>

LoRA's **small output** (50MB-1GB) is especially valuable — you can run 10 different LoRA adapters on the same model, switching at runtime.

## 5. QLoRA — 4-bit Quantization + LoRA

Published in 2023 by Dettmers et al., QLoRA pairs LoRA with **quantization** to make 70B models trainable on **a single A100 GPU**. The engine of the personal/small-team fine-tuning explosion.

### 5.1. Three Main Components

**4-bit NF4 (NormalFloat 4) quantization.** Model weights are stored at 4-bit instead of 16-bit precision. NF4 is more accurate than standard 4-bit formats because it is optimized for normally distributed values, which trained weights approximately are.

**Double Quantization (DQ).** Even the quantization constants are quantized for additional memory savings.

**Paged Optimizers.** Optimizer state is paged between CPU RAM and GPU memory so that transient memory spikes do not cause out-of-memory (OOM) errors.
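All three components are exposed through the standard QLoRA loading recipe in <code>transformers</code> + <code>bitsandbytes</code> (a minimal sketch):

<pre><code>import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight storage
    bnb_4bit_quant_type="nf4",              # NormalFloat4 instead of plain int4
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute still runs in 16-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B",
    quantization_config=bnb_config,
    device_map="auto",
)
# LoRA adapters are then attached on top (Section 4). Paged optimizers come
# from bitsandbytes, e.g. optim="paged_adamw_8bit" in TrainingArguments.</code></pre>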

### 5.2. Practical QLoRA Cost (2026)

<comparison-table data-caption="QLoRA Cost Estimates (2026)" data-headers="[&#34;Model&#34;,&#34;GPU&#34;,&#34;Time (10K samples)&#34;,&#34;Est. Cost&#34;]" data-rows="[{&#34;feature&#34;:&#34;Llama 3 8B&#34;,&#34;values&#34;:[&#34;1x RTX 4090 (24GB)&#34;,&#34;2-4 hours&#34;,&#34;$5-15 (RunPod)&#34;]},{&#34;feature&#34;:&#34;Llama 3 70B&#34;,&#34;values&#34;:[&#34;1x A100 80GB&#34;,&#34;8-12 hours&#34;,&#34;$50-150 (Modal/RunPod)&#34;]},{&#34;feature&#34;:&#34;Llama 4 70B&#34;,&#34;values&#34;:[&#34;1x H100 80GB&#34;,&#34;6-10 hours&#34;,&#34;$80-200&#34;]},{&#34;feature&#34;:&#34;Mixtral 8x7B&#34;,&#34;values&#34;:[&#34;1x A100 80GB&#34;,&#34;10-15 hours&#34;,&#34;$80-200&#34;]},{&#34;feature&#34;:&#34;Qwen 2.5 72B&#34;,&#34;values&#34;:[&#34;1x H100 80GB&#34;,&#34;8-12 hours&#34;,&#34;$120-250&#34;]}]"></comparison-table>

**Costs are training only.** Data prep, eval, and iteration usually add 2-5x to total.

## 6. DPO — Direct Preference Optimization

Published in 2023 by Rafailov et al., DPO offers a **much simpler mathematical formulation** than classic RLHF/PPO. The 2024-2026 modern alignment standard.

<definition-box data-term="DPO (Direct Preference Optimization)" data-definition="A method that, on a human preference dataset (chosen/rejected pairs), skips reward-model training and PPO steps and uses a supervised-style loss directly. Published in 2023 by Stanford and CMU researchers; dramatically reduces the operational complexity of classic RLHF. Has been the standard in the open-model ecosystem since 2024." data-also="Direct Preference Optimization"></definition-box>

### 6.1. PPO (Classic RLHF) vs DPO

<comparison-table data-caption="RLHF (PPO) vs DPO" data-headers="[&#34;Dimension&#34;,&#34;RLHF (PPO)&#34;,&#34;DPO&#34;]" data-rows="[{&#34;feature&#34;:&#34;Reward Model&#34;,&#34;values&#34;:[&#34;Required (separate training)&#34;,&#34;Not needed&#34;]},{&#34;feature&#34;:&#34;Pipeline stages&#34;,&#34;values&#34;:[&#34;3 (SFT + RM + PPO)&#34;,&#34;2 (SFT + DPO)&#34;]},{&#34;feature&#34;:&#34;Training stability&#34;,&#34;values&#34;:[&#34;Low (hyperparam sensitive)&#34;,&#34;High&#34;]},{&#34;feature&#34;:&#34;Compute cost&#34;,&#34;values&#34;:[&#34;~5x SFT&#34;,&#34;~1.5x SFT&#34;]},{&#34;feature&#34;:&#34;Code complexity&#34;,&#34;values&#34;:[&#34;High&#34;,&#34;Low&#34;]},{&#34;feature&#34;:&#34;Quality (frontier)&#34;,&#34;values&#34;:[&#34;Historically best&#34;,&#34;Equal or superior (recent research)&#34;]}]"></comparison-table>
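For reference, DPO's loss from Rafailov et al., in the same plain notation as the LoRA block above: a simple sigmoid cross-entropy on log-probability ratios against a frozen reference model.

<pre><code>L_DPO = −E_(x, y_w, y_l) [ log σ( β·log(π_θ(y_w|x)/π_ref(y_w|x))
                                − β·log(π_θ(y_l|x)/π_ref(y_l|x)) ) ]

y_w / y_l : chosen / rejected response
π_θ       : model being trained
π_ref     : frozen reference model (usually the SFT checkpoint)
β         : KL-control strength (typically 0.1-0.5)</code></pre>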

### 6.2. DPO Dataset Structure

You need **chosen/rejected** pairs.

<pre><code>{
  "prompt": "How would you respond to a customer complaint?",
  "chosen": "An empathetic, solution-focused, short, clear response...",
  "rejected": "A defensive, generic, overly long response..."
}</code></pre>

Usually 500-5,000 preference pairs suffice; quality matters more than quantity.
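With data in that shape, a DPO run in Hugging Face TRL takes only a few lines. A minimal sketch (argument names shift between TRL versions, and the checkpoint id is hypothetical):

<pre><code>from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("my-org/sft-checkpoint")  # hypothetical SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained("my-org/sft-checkpoint")
dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

args = DPOConfig(
    output_dir="dpo-out",
    beta=0.1,             # KL-control strength
    learning_rate=5e-7,   # DPO runs at a much lower LR than SFT
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,         # None: TRL clones the model as the frozen reference
    args=args,
    train_dataset=dataset,  # columns: prompt / chosen / rejected
    processing_class=tokenizer,
)
trainer.train()</code></pre>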

### 6.3. DPO Derivatives (2024-2026)

After DPO, many derivatives appeared:

- **ORPO (Odds Ratio Preference Optimization)** — Combines SFT and preference optimization in one step. Hong et al. (2024).
- **KTO (Kahneman-Tversky Optimization)** — Uses **single-answer reward/penalty** signals instead of preference pairs. Ethayarajh et al. (2024).
- **IPO (Identity Preference Optimization)** — Regularization against DPO over-fitting. Azar et al. (2023).
- **CPO (Contrastive Preference Optimization)** — Stronger reject signal. Xu et al. (2024).
- **SimPO (Simple Preference Optimization)** — Drops the reference model entirely. Meng et al. (2024).

<callout-box data-variant="tip" data-title="Practical Selection Guide">

For **standard enterprise fine-tuning**: **SFT + DPO** is the most stable 2026 choice.

For **combining SFT and DPO in one stage**: **ORPO**.

If **producing dual responses is expensive** (preference pairs hard to make): **KTO** (single-answer + binary feedback).

PPO is valuable only for academic research or frontier-model training — not worth the complexity for enterprise products.

</callout-box>

## 7. Practical Fine-Tuning Pipeline

A 7-stage pipeline from zero to production:

<howto-steps data-name="Production Fine-Tuning Pipeline — 7 Stages" data-description="A step-by-step path from zero to production-quality fine-tuning." data-time="P30D" data-steps="[{&#34;name&#34;:&#34;1. Use-Case Definition + Baseline&#34;,&#34;text&#34;:&#34;Why fine-tuning? How well does prompt + RAG work? Define baseline metrics.&#34;},{&#34;name&#34;:&#34;2. Data Collection&#34;,&#34;text&#34;:&#34;500-10,000 high-quality samples. Manual labeling, cleaned from existing data, or synthetic (a large model teaching a smaller one).&#34;},{&#34;name&#34;:&#34;3. Data Cleaning + QA&#34;,&#34;text&#34;:&#34;Dedupe, fix labels, strip PII (KVKK). Split train/val/test (usually 80/10/10).&#34;},{&#34;name&#34;:&#34;4. Format + Tokenization&#34;,&#34;text&#34;:&#34;Chat template (Llama, Mistral, ChatML), system prompt structure, sequence length, tokenizer checks.&#34;},{&#34;name&#34;:&#34;5. Training&#34;,&#34;text&#34;:&#34;Framework choice (Unsloth, Axolotl, LLaMA Factory). Hyperparams: learning rate (1e-4 LoRA, 5e-5 SFT), batch size, epochs (1-3), LoRA r/alpha. Cloud GPU or local.&#34;},{&#34;name&#34;:&#34;6. Evaluation&#34;,&#34;text&#34;:&#34;Automated metrics (perplexity, BLEU, custom) + LLM-as-judge + human eval. Pre-production eval set is mandatory.&#34;},{&#34;name&#34;:&#34;7. Deployment&#34;,&#34;text&#34;:&#34;Serve via vLLM, TGI, or Ollama. A/B test (existing vs fine-tune). Monitor performance + cost.&#34;}]"></howto-steps>

### 7.1. Training Frameworks

<comparison-table data-caption="2026 Fine-Tuning Framework Comparison" data-headers="[&#34;Framework&#34;,&#34;Speed&#34;,&#34;Ease&#34;,&#34;Scope&#34;]" data-rows="[{&#34;feature&#34;:&#34;Unsloth&#34;,&#34;values&#34;:[&#34;2-5x fast (Triton optimization)&#34;,&#34;High (simple Python)&#34;,&#34;LoRA, QLoRA, SFT, DPO&#34;]},{&#34;feature&#34;:&#34;Axolotl&#34;,&#34;values&#34;:[&#34;Standard&#34;,&#34;Medium (YAML config)&#34;,&#34;Full spectrum, including full FT&#34;]},{&#34;feature&#34;:&#34;LLaMA Factory&#34;,&#34;values&#34;:[&#34;Standard&#34;,&#34;High (CLI + UI)&#34;,&#34;LoRA, QLoRA, RLHF, DPO, ORPO, KTO&#34;]},{&#34;feature&#34;:&#34;Hugging Face TRL&#34;,&#34;values&#34;:[&#34;Standard&#34;,&#34;Medium (Python library)&#34;,&#34;Full spectrum, latest techniques&#34;]},{&#34;feature&#34;:&#34;Together / Replicate / Modal&#34;,&#34;values&#34;:[&#34;Cloud&#34;,&#34;Very high (managed)&#34;,&#34;LoRA, limited control&#34;]},{&#34;feature&#34;:&#34;OpenAI Fine-tuning API&#34;,&#34;values&#34;:[&#34;Cloud&#34;,&#34;Very high&#34;,&#34;SFT + limited DPO, closed-source&#34;]}]"></comparison-table>

**Practical pick.** **Unsloth** for developers/researchers (speed + ease). **LLaMA Factory** for production teams (broad scope). **Together** or **Modal** for cloud ease. **Axolotl + self-hosted GPU** for compliance-critical enterprises.
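As a concrete reference for stage 5 of the pipeline, a QLoRA SFT run with Unsloth looks roughly like this (a sketch following Unsloth's documented notebook pattern; argument layout shifts between TRL versions):

<pre><code>from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit base
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,       # expects a "text" column with chat-templated samples
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="sft-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=1e-4,      # the LoRA LR from the pipeline above
        num_train_epochs=2,
        optim="paged_adamw_8bit",
        logging_steps=10,
    ),
)
trainer.train()</code></pre>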

### 7.2. Data Preparation — The Invisible Success Factor

**Data quality determines 70% of the fine-tune outcome.** Training is only the last step. Practical advice:

- Manual labeling beats synthetic data on quality but costs 10-50x more.
- Use modern data-prep tooling: Self-Instruct, DataDreamer, Distilabel, Lilac.
- Keep the eval set strictly isolated from the training set.
- Ensure class balance.

## 8. Turkish Fine-Tuning — Practical Notes

5 key nuances absent from global guides:

### 8.1. Tokenizer Efficiency

Turkish's agglutinative morphology means a single word often becomes 2-5 tokens in typical English-centric tokenizers. In fine-tuning that means roughly 2x the required sequence length, 30-50% higher training cost, and less content fitting in the context window.

**Fix:** a Turkish-specific tokenizer (BERTurk) or vocabulary extension. Adding 3K-5K Turkish tokens to a Llama/Mistral BPE vocabulary improves Turkish token efficiency by 30-50%.
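Measuring the gap is cheap; before committing to a base model, count tokens per word with the candidate tokenizers (a minimal sketch; the sentence is illustrative):

<pre><code>from transformers import AutoTokenizer

sentence = "Kişisel verilerinizin korunması bizim için önceliklidir."
words = len(sentence.split())

for name in ["Qwen/Qwen2.5-7B", "dbmdz/bert-base-turkish-cased"]:  # BERTurk
    tok = AutoTokenizer.from_pretrained(name)
    n = len(tok.tokenize(sentence))
    print(f"{name}: {n} tokens ({n / words:.1f} per word)")</code></pre>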

### 8.2. Turkish Dataset Sources

Belebele Turkish, Cosmos QA TR, xCOPA Turkish, WMT translation pairs, Wikipedia Turkish, MultiWOZ TR, Hugging Face Turkish datasets (100+), Cezeri instruction-tuning data, plus your enterprise data (most valuable).

### 8.3. Base Model Selection (For Turkish)

<comparison-table data-caption="Base Models for Turkish Fine-Tuning" data-headers="[&#34;Model&#34;,&#34;Turkish Score&#34;,&#34;Size&#34;,&#34;License&#34;,&#34;Fine-tune Friendly&#34;]" data-rows="[{&#34;feature&#34;:&#34;Llama 4 8B&#34;,&#34;values&#34;:[&#34;Medium-good&#34;,&#34;8B&#34;,&#34;Meta open&#34;,&#34;High&#34;]},{&#34;feature&#34;:&#34;Llama 4 70B&#34;,&#34;values&#34;:[&#34;Good&#34;,&#34;70B&#34;,&#34;Meta open&#34;,&#34;High&#34;]},{&#34;feature&#34;:&#34;Mistral Small 3&#34;,&#34;values&#34;:[&#34;Good&#34;,&#34;22B&#34;,&#34;Apache 2.0&#34;,&#34;High&#34;]},{&#34;feature&#34;:&#34;Qwen 2.5 14B&#34;,&#34;values&#34;:[&#34;High (multilingual)&#34;,&#34;14B&#34;,&#34;Apache 2.0&#34;,&#34;High&#34;]},{&#34;feature&#34;:&#34;Qwen 2.5 72B&#34;,&#34;values&#34;:[&#34;Very high&#34;,&#34;72B&#34;,&#34;Apache 2.0&#34;,&#34;High&#34;]},{&#34;feature&#34;:&#34;DeepSeek V3&#34;,&#34;values&#34;:[&#34;High&#34;,&#34;671B (MoE)&#34;,&#34;MIT&#34;,&#34;Medium (large)&#34;]},{&#34;feature&#34;:&#34;BERTurk&#34;,&#34;values&#34;:[&#34;Excellent (NLP)&#34;,&#34;Base&#34;,&#34;MIT&#34;,&#34;For NLP tasks&#34;]}]"></comparison-table>

**Practical pick.** General Turkish instruction-tune: **Qwen 2.5 14B** or **Llama 4 8B/70B**. NLP-specific: **BERTurk**.

### 8.4. Turkish Style Locking

"siz" vs "sen", tone (formal/informal), regional dialects, sentence-order preferences — must be controlled in fine-tuning. Editor-level quality QA is mandatory.

### 8.5. Domain-Specific Turkish Examples

Turkish law (TBK, TMK, KVKK + case law), tax (VUK, VAT, GVK), health (anonymized medical reports), e-commerce (Trendyol/Hepsiburada catalogs), banking (BDDK + customer interactions).

## 9. Hardware, Cloud, Cost

### 9.1. GPU Choice (2026)

<comparison-table data-caption="GPU Options for Fine-Tuning (2026)" data-headers="[&#34;GPU&#34;,&#34;VRAM&#34;,&#34;Typical Cloud Price (USD/hr)&#34;,&#34;Max Model with QLoRA&#34;]" data-rows="[{&#34;feature&#34;:&#34;RTX 4090&#34;,&#34;values&#34;:[&#34;24GB&#34;,&#34;$0.40-0.80&#34;,&#34;7B-13B&#34;]},{&#34;feature&#34;:&#34;RTX 5090&#34;,&#34;values&#34;:[&#34;32GB&#34;,&#34;$0.60-1.20&#34;,&#34;13B-22B&#34;]},{&#34;feature&#34;:&#34;A100 40GB&#34;,&#34;values&#34;:[&#34;40GB&#34;,&#34;$1.20-2.00&#34;,&#34;13B-34B&#34;]},{&#34;feature&#34;:&#34;A100 80GB&#34;,&#34;values&#34;:[&#34;80GB&#34;,&#34;$1.80-3.50&#34;,&#34;34B-70B&#34;]},{&#34;feature&#34;:&#34;H100 80GB&#34;,&#34;values&#34;:[&#34;80GB&#34;,&#34;$3.50-6.00&#34;,&#34;34B-70B (fast)&#34;]},{&#34;feature&#34;:&#34;H200&#34;,&#34;values&#34;:[&#34;141GB&#34;,&#34;$5-9&#34;,&#34;70B+ (comfortable)&#34;]},{&#34;feature&#34;:&#34;GB200/B200 (Blackwell)&#34;,&#34;values&#34;:[&#34;192GB&#34;,&#34;$8-15&#34;,&#34;100B+ MoE&#34;]}]"></comparison-table>

### 9.2. Cloud Platforms

**Modal** (Python-native, pay-as-you-go), **RunPod** (cheapest spot), **Together AI** (managed FT + inference), **Replicate** (ready templates), **AWS SageMaker / GCP Vertex AI / Azure ML** (enterprise), **Lambda Cloud** (on-demand H100/H200).

### 9.3. Typical Cost Scenarios

- **Turkish style alignment, Llama 4 8B QLoRA, 5K samples:** ~$15-40 training + ~$50-100 data + ~$30 eval = **~$100-200 total**
- **Domain-specific Mistral Small 3 fine-tune, 20K samples:** ~$80-200 training + ~$300-800 data + ~$100 eval = **~$500-1,200**
- **Llama 4 70B QLoRA + DPO, 50K samples:** ~$300-600 training (2 phases) + $1,000-3,000 data + $200-500 eval = **~$2,000-5,000**

**Reminder:** data prep + eval is 60-70% of cost. GPU hours are the smallest line item.
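The same arithmetic as a reusable back-of-envelope helper (all inputs are rough estimates drawn from the scenarios above, not quotes):

<pre><code>def finetune_budget(gpu_hours: float, gpu_rate_usd: float,
                    data_usd: float, eval_usd: float) -> dict:
    """Back-of-envelope total cost; training is usually the smallest line item."""
    training = gpu_hours * gpu_rate_usd
    total = training + data_usd + eval_usd
    return {"training": training, "total": total,
            "training_share": round(training / total, 2)}

# Scenario 1 above: Llama 4 8B QLoRA, 5K samples, several runs on an RTX 4090
print(finetune_budget(gpu_hours=30, gpu_rate_usd=0.8, data_usd=75, eval_usd=30))
# {'training': 24.0, 'total': 129.0, 'training_share': 0.19}</code></pre>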

## 10. Case Studies (Anonymized Turkish Enterprises)

### Case 1 — Turkish Bank: Turkish Legal Document Assistant

**Problem.** Contract analysis on GPT-5 missed Turkish legal jargon (TBK, TMK references, court vocabulary).

**Solution.** Llama 4 70B QLoRA fine-tune:

- **Data:** 8,000 anonymized contracts + 3,000 Turkish Supreme Court decisions + 2,000 legal Q&A pairs
- **Method:** SFT + DPO (lawyers ranked 1,500 response pairs)
- **Duration:** 6 weeks (4 weeks data, 2 weeks training + eval)
- **Cost:** ~$8,000 (with labeling)

**Result.** Turkish legal accuracy 72% → 91%. Contract analysis time per lawyer 14 hours/week → 5 hours.

### Case 2 — E-Commerce: Category Classification + Description

**Problem.** Manual category selection + Turkish description writing took hours per new product. Prompt engineering on GPT-4o-mini was insufficient (12,000 sub-categories).

**Solution.** Qwen 2.5 14B QLoRA fine-tune:

- **Data:** 250,000 existing products (name + description → category + tags + SEO description)
- **Method:** SFT (DPO not needed)
- **Training:** 2x A100 80GB, 18 hours
- **Cost:** ~$1,200

**Result.** Category classification accuracy 78% → 96%. Average human-intervention time per product 15 min → 1 min. 80K products processed monthly at 90% lower cost than the ChatGPT API (self-hosted Qwen + LoRA).

### Case 3 — Healthcare: Medical-Report Structuring

**Problem.** Converting clinical notes to structured format (ICD-10 codes, diagnosis + treatment + medication) was 80% accurate on GPT-5; healthcare needs 95%+.

**Solution.** Mistral Small 3 ORPO fine-tune:

- **Data:** 15,000 anonymized clinical notes + expert-physician-approved structured outputs
- **Method:** ORPO (SFT + DPO in one stage)
- **KVKK safeguards:** all patient data anonymized; on-prem training; audit-logged eval
- **Cost:** ~$3,500 (with physician labeling)

**Result.** Medical-structuring accuracy 97%. KVKK + health regulation compliance. Enabled B2B integration with Turkish insurers.

## 11. Common Mistakes and Anti-Patterns

### 11.1. "Fine-Tune First, Ask Questions Later"

The most common mistake. Always **eval prompt + RAG first**; know how well those two layers do before reaching for fine-tuning.

### 11.2. Training with Too Little Data

Trying a style fine-tune with under 500 samples usually fails. Minimum: 1,000 high-quality samples; ideal: 5,000-10,000.

### 11.3. Catastrophic Forgetting

Wrong learning rate (too high) or too many epochs (3+) breaks the model's base capabilities. Track general benchmark performance during training.

### 11.4. Test Set Leakage

If part of the training data leaks into eval, the fine-tune score is artificially inflated but fails in production. Split at cleanup; never mix during training.

### 11.5. KVKK-Non-Compliant Data

Fine-tuning with prompts that contain customer/employee personal data. **KVKK breach + the learned personal data becomes embedded in model weights.** Always anonymize.

### 11.6. No Versioning

Not versioning fine-tune adapters and datasets. Use **HF Hub, W&B, MLflow** to track every experiment.

### 11.7. Shipping Without Eval

"Loss went down — it works" before going live. Loss is not eval; measure actual task success with an eval set.

### 11.8. Wrong Base Model Choice

Fine-tuning an English-only model for Turkish tasks. The base model **should already know Turkish**; fine-tuning adapts it to your domain, not teaches it Turkish from scratch.

## 12. Fine-Tuning vs Distillation

**Distillation** — training a small model (student) on the outputs of a large model (teacher). The most practical fine-tune pattern of 2025-2026 (a minimal sketch follows the steps):

1. Generate synthetic data with a large model (Claude Opus 4.7)
2. SFT the small model (Llama 4 8B) on that data
3. Small model = cheap + fast + 85-90% of the large model's quality
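
A minimal sketch of the data side of steps 1-2 (the <code>teacher_generate</code> wrapper is hypothetical; wire it to whichever frontier-model API you use, and the prompts/file names are illustrative):

<pre><code>import json

def teacher_generate(prompt: str) -> str:
    """Hypothetical wrapper around your frontier-model API (OpenAI, Anthropic, ...)."""
    raise NotImplementedError

seed_tasks = ["Summarize this contract clause...", "Classify this support ticket..."]

with open("distill_train.jsonl", "w", encoding="utf-8") as f:
    for task in seed_tasks:
        answer = teacher_generate(task)  # teacher produces the gold response
        f.write(json.dumps({"instruction": task, "output": answer},
                           ensure_ascii=False) + "\n")
# Step 2: run this JSONL through the SFT pipeline in Section 7 on the student model.</code></pre>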

## 13. Modern Fine-Tuning Trends (2026)

- **Synthetic-data dominance** — generation with GPT-5/Claude/Gemini instead of human labeling
- **Distillation everywhere** — knowledge transfer from frontier to small models
- **Self-Reward models** — the model rates its own outputs to create training data
- **Verifier models** — automatic quality control on fine-tune outputs
- **RLAIF (RL from AI Feedback)** — another AI's preferences instead of humans
- **Continual learning** — keeping the model updated without catastrophic forgetting
- **PEFT advances** — DoRA, MoRA, LoftQ; 2024-2025 improvements over LoRA

## 14. KVKK-Compliant Fine-Tuning

### 14.1. Risks

- **Data embeds in the model** — practically impossible to "delete" after fine-tuning
- **Membership inference attacks** — training-set membership can be inferred from outputs
- **Data leakage** — the model sometimes regurgitates training data almost verbatim

### 14.2. Mitigations

1. **Anonymization** — strip PII (national ID, name, phone, email); see the sketch after this list
2. **Differential privacy** — add noise during training (quality vs privacy trade-off)
3. **Federated learning** — train without centralizing data (advanced)
4. **Data residency** — train on Turkey or EU GPUs
5. **Audit logs** — which data was used in which training
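
A minimal sketch of regex-based anonymization for Turkish training data (illustrative patterns only; production pipelines add NER-based detection on top):

<pre><code>import re

PATTERNS = {
    "[TCKN]":  re.compile(r"\b[1-9]\d{10}\b"),  # 11-digit Turkish national ID
    "[PHONE]": re.compile(r"\b(?:\+90|0)?\s?5\d{2}[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}\b"),
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def anonymize(text: str) -> str:
    # Apply in insertion order so the ID pattern runs before the phone pattern
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

print(anonymize("Müşteri 12345678901, tel 0532 123 45 67, ali@example.com"))
# -> "Müşteri [TCKN], tel [PHONE], [EMAIL]"</code></pre>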

### 14.3. Under the EU AI Act

If the fine-tuned model is **high-risk** (credit scoring, HR selection, etc.):

- Technical documentation (Annex IV)
- Training-data governance
- Risk assessment
- Human oversight
- Conformity assessment

See our compliance guide on this site for details.

## 15. Frequently Asked Questions

<callout-box data-variant="answer" data-title="Fine-tune or RAG?">

**Try RAG first.** Fine-tune only for: (a) style/format/behavior locking, (b) teaching a small model a large model's behavior for low latency, (c) Turkish domain language (law, medicine), (d) guaranteed structured output. For knowledge base + fresh data, RAG is always faster/cheaper.

</callout-box>

<callout-box data-variant="answer" data-title="LoRA, QLoRA, or full FT?">

**QLoRA in 95% of cases.** Only full FT if: (a) working on a frontier model with large GPUs, (b) you genuinely need every bit of quality. LoRA (without quantization) when 16-bit GPU suffices and speed matters.

</callout-box>

<callout-box data-variant="answer" data-title="DPO, ORPO, or KTO?">

**DPO** is the standard enterprise pick. **ORPO** combines SFT + DPO into one stage. **KTO** when producing dual responses is expensive. In 2026, DPO or ORPO covers most needs.

</callout-box>

<callout-box data-variant="answer" data-title="Which base model should I start with?">

For Turkish: **Qwen 2.5 14B** or **Llama 4 8B/70B**. Prefer Apache 2.0/MIT licenses for commercial use. Do not pick without an eval set.

</callout-box>

<callout-box data-variant="answer" data-title="How much data is enough?">

Style alignment: 1,000-3,000 high-quality samples; domain knowledge: 5,000-15,000; behavior change: 10,000+. Quality > quantity, always.

</callout-box>

<callout-box data-variant="answer" data-title="What does fine-tuning cost?">

Typical Turkish range: **$200-$5,000** (model size + data labeling + eval). Synthetic data can cut cost by 60%. Data labeling is usually the most expensive line item.

</callout-box>

<callout-box data-variant="answer" data-title="How do I deploy a fine-tuned model?">

**vLLM** (fastest, production-grade), **TGI** (Hugging Face), **Ollama** (easy self-hosted), **LMDeploy** (TensorRT-LLM-based). LoRA adapters can be merged into the base model or loaded at runtime.

</callout-box>
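
For the runtime-loading path, vLLM's Python API looks roughly like this (a sketch; the adapter name and path are hypothetical):

<pre><code>from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", enable_lora=True)

out = llm.generate(
    ["Müşteri şikayetine nasıl yanıt verirsiniz?"],
    SamplingParams(max_tokens=256),
    lora_request=LoRARequest("support-style", 1, "/adapters/support-style"),
)
print(out[0].outputs[0].text)</code></pre>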

<callout-box data-variant="answer" data-title="How do I prevent catastrophic forgetting?">

Low learning rate (1e-4 LoRA, 5e-5 SFT), few epochs (1-3), general-benchmark eval during training (MMLU, HumanEval), prefer LoRA (less forgetting than full FT). Mixed batches (new + general data) help.

</callout-box>

<callout-box data-variant="answer" data-title="Should I use OpenAI or Anthropic fine-tuning APIs?">

OpenAI offers SFT + limited DPO via API; easy, but the weights stay on OpenAI's servers and it is expensive. Anthropic has no public fine-tuning API (limited Enterprise availability). Self-hosted is usually better for KVKK compliance + cost control.

</callout-box>

<callout-box data-variant="answer" data-title="How do I evaluate a fine-tuned model?">

3 layers: **(1)** automated metrics (perplexity, exact match, BLEU/ROUGE for translation/summarization), **(2)** LLM-as-judge (pairwise compare with GPT-5 / Claude Opus 4.7), **(3)** human evaluation (50-200 samples). Combined they give reliable signal.

</callout-box>

<callout-box data-variant="answer" data-title="How safe is synthetic data?">

Synthetic data is widespread and effective in 2026. Risks: **(a)** teacher-model biases transfer, **(b)** diversity may shrink (model collapse). Hybrid recommended: 70% synthetic + 30% human-labeled.

</callout-box>

<callout-box data-variant="answer" data-title="Does the model size grow after fine-tuning?">

LoRA/QLoRA: no. The adapter alone is ~50MB-1GB; merged into the base model, the total stays at the base size. Full FT: the output is the same size as the base (~140GB for Llama 70B).

</callout-box>

<callout-box data-variant="answer" data-title="How do I manage LoRA adapters?">

Version with **Hugging Face Hub** (private repo), **MLflow Model Registry**, **W&B Artifacts**. vLLM and TGI support multi-adapter loading at runtime — swap 10 different LoRAs on one model quickly.

</callout-box>

<callout-box data-variant="answer" data-title="For Turkish, BERTurk or fine-tune an LLM?">

Depends on task: **classic NLP** (classification, NER, sentiment) → BERTurk (small + fast + enough). **Generative tasks** (writing, translation, Q&A) → fine-tune an LLM (Qwen, Llama, Mistral).

</callout-box>

<callout-box data-variant="answer" data-title="Can I automate fine-tuning?">

Yes. **Continuous fine-tuning** pipeline: collect user feedback → monitor eval scores → retrain automatically when below threshold → A/B test → rollout. MLflow + Argo Workflows + Modal/Together is a practical combo.

</callout-box>

## 16. Next Steps

To shape LLM fine-tuning strategy in your company or move an existing fine-tune to production quality:

1. **Fine-Tune Use-Case Assessment.** Is fine-tuning really needed? Is RAG/prompt enough? Investment math + 4-hour workshop.
2. **Data + Pipeline Setup.** Turkish data collection, labeling strategy, training-platform choice, eval harness — end-to-end pipeline design.
3. **Production Fine-Tune Audit.** For existing fine-tunes: 360° audit on quality, KVKK compliance, cost, observability.

Reach out via the contact form.

<references-list data-items="[{&#34;title&#34;:&#34;LoRA: Low-Rank Adaptation of Large Language Models&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2106.09685&#34;,&#34;author&#34;:&#34;Hu et al.&#34;,&#34;publishedAt&#34;:&#34;2021-06&#34;,&#34;publisher&#34;:&#34;Microsoft Research&#34;},{&#34;title&#34;:&#34;QLoRA: Efficient Finetuning of Quantized LLMs&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2305.14314&#34;,&#34;author&#34;:&#34;Dettmers et al.&#34;,&#34;publishedAt&#34;:&#34;2023-05&#34;,&#34;publisher&#34;:&#34;University of Washington&#34;},{&#34;title&#34;:&#34;DPO: Your Language Model is Secretly a Reward Model&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2305.18290&#34;,&#34;author&#34;:&#34;Rafailov et al.&#34;,&#34;publishedAt&#34;:&#34;2023-05&#34;,&#34;publisher&#34;:&#34;Stanford&#34;},{&#34;title&#34;:&#34;ORPO: Monolithic Preference Optimization without Reference Model&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2403.07691&#34;,&#34;author&#34;:&#34;Hong et al.&#34;,&#34;publishedAt&#34;:&#34;2024-03&#34;,&#34;publisher&#34;:&#34;KAIST&#34;},{&#34;title&#34;:&#34;KTO: Model Alignment as Prospect Theoretic Optimization&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2402.01306&#34;,&#34;author&#34;:&#34;Ethayarajh et al.&#34;,&#34;publishedAt&#34;:&#34;2024-02&#34;,&#34;publisher&#34;:&#34;Stanford&#34;},{&#34;title&#34;:&#34;IPO: A General Theoretical Paradigm&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2310.12036&#34;,&#34;author&#34;:&#34;Azar et al.&#34;,&#34;publishedAt&#34;:&#34;2023-10&#34;,&#34;publisher&#34;:&#34;Google DeepMind&#34;},{&#34;title&#34;:&#34;InstructGPT: Training language models with human feedback&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2203.02155&#34;,&#34;author&#34;:&#34;Ouyang et al.&#34;,&#34;publishedAt&#34;:&#34;2022-03&#34;,&#34;publisher&#34;:&#34;OpenAI&#34;},{&#34;title&#34;:&#34;DoRA: Weight-Decomposed Low-Rank Adaptation&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2402.09353&#34;,&#34;author&#34;:&#34;Liu et al.&#34;,&#34;publishedAt&#34;:&#34;2024-02&#34;,&#34;publisher&#34;:&#34;NVIDIA&#34;},{&#34;title&#34;:&#34;Constitutional AI: Harmlessness from AI Feedback&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2212.08073&#34;,&#34;author&#34;:&#34;Bai et al.&#34;,&#34;publishedAt&#34;:&#34;2022-12&#34;,&#34;publisher&#34;:&#34;Anthropic&#34;},{&#34;title&#34;:&#34;Self-Instruct: Aligning Language Models with Self-Generated Instructions&#34;,&#34;url&#34;:&#34;https://arxiv.org/abs/2212.10560&#34;,&#34;author&#34;:&#34;Wang et al.&#34;,&#34;publishedAt&#34;:&#34;2022-12&#34;,&#34;publisher&#34;:&#34;University of Washington&#34;},{&#34;title&#34;:&#34;Unsloth Documentation&#34;,&#34;url&#34;:&#34;https://unsloth.ai/&#34;,&#34;author&#34;:&#34;Unsloth AI&#34;,&#34;publishedAt&#34;:&#34;2025&#34;,&#34;publisher&#34;:&#34;Unsloth&#34;},{&#34;title&#34;:&#34;Hugging Face TRL&#34;,&#34;url&#34;:&#34;https://huggingface.co/docs/trl/&#34;,&#34;author&#34;:&#34;Hugging Face&#34;,&#34;publishedAt&#34;:&#34;2025&#34;,&#34;publisher&#34;:&#34;Hugging Face&#34;},{&#34;title&#34;:&#34;Axolotl&#34;,&#34;url&#34;:&#34;https://github.com/axolotl-ai-cloud/axolotl&#34;,&#34;author&#34;:&#34;Axolotl&#34;,&#34;publishedAt&#34;:&#34;2025&#34;,&#34;publisher&#34;:&#34;Axolotl&#34;},{&#34;title&#34;:&#34;LLaMA Factory&#34;,&#34;url&#34;:&#34;https://github.com/hiyouga/LLaMA-Factory&#34;,&#34;author&#34;:&#34;LLaMA Factory&#34;,&#34;publishedAt&#34;:&#34;2025&#34;,&#34;publisher&#34;:&#34;GitHub&#34;},{&#34;title&#34;:&#34;KVKK - Law No. 6698&#34;,&#34;url&#34;:&#34;https://www.kvkk.gov.tr/&#34;,&#34;author&#34;:&#34;Republic of Turkiye - KVKK&#34;,&#34;publishedAt&#34;:&#34;2016&#34;,&#34;publisher&#34;:&#34;Republic of Turkiye&#34;},{&#34;title&#34;:&#34;EU AI Act&#34;,&#34;url&#34;:&#34;https://artificialintelligenceact.eu/&#34;,&#34;author&#34;:&#34;European Commission&#34;,&#34;publishedAt&#34;:&#34;2024-03&#34;,&#34;publisher&#34;:&#34;EU&#34;}]"></references-list>

---

This is a living document; the fine-tuning ecosystem (new methods, frameworks, base models) shifts every quarter, so it is **updated quarterly**.