
Capstone Module 14: Production Turkish Llama-3 8B Fine-Tune — QLoRA + SFT End-to-End

Module 14 capstone: Llama-3-8B base + Turkish SFT + QLoRA = production-quality Turkish Llama-3-Instruct. Dataset curation (50K Turkish instructions), QLoRA training (single H100, 8 hours), evaluation (MT-Bench-TR), publishing to the HuggingFace Hub, vLLM inference deployment.

Şükrü Yusuf KAYA
85 min read
Advanced
🏆 Capstone: build your own Turkish ChatGPT
Module 14 capstone: a production-grade Turkish Llama-3-8B-Instruct. Llama-3-8B base + Turkish instruction dataset + QLoRA + SFT pipeline. 50K Turkish examples, a single H100, 8 hours of training, $50 in cost. The result: ChatGPT-3.5-comparable Turkish quality, public on the HuggingFace Hub, serving-ready with vLLM. This is the curriculum's fourth real-world artifact, joining TurkTokenizer-tr (Module 6), Turkish Semantic Search (Module 7), and the Mini Llama-3 Pre-train (Module 11). 85 minutes from now, you will own your own production Turkish LLM.

Capstone Flow (8 Stages)

  1. Goal definition: TR-Llama-3-8B-Instruct, with a quality target
  2. Dataset curation: 50K Turkish instructions (Alpaca-TR + OASST + manual)
  3. QLoRA config: 4-bit base, r=16 adapter
  4. SFT training: single H100, 8 hours, $50
  5. Evaluation: MT-Bench-TR plus qualitative samples
  6. LoRA merge: production-ready model
  7. HuggingFace Hub publish: sukruyusufkaya/llama-3-8b-tr-instruct
  8. vLLM serving: production deployment
python
# Turkish Llama-3-8B Fine-Tune: Production-Grade Capstone Script
import torch
from transformers import (
    AutoModelForCausalLM, AutoTokenizer,
    BitsAndBytesConfig, TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset, concatenate_datasets
from trl import SFTTrainer
 
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
OUTPUT = "./llama-3-8b-tr-instruct"
 
# 1. Quantization (QLoRA)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
 
# 2. Load model + tokenizer
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
 
model = prepare_model_for_kbit_training(model)
 
# 3. LoRA config
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# trainable params: ~40M | all params: 8B | trainable%: 0.5
 
# 4. Dataset curation (Turkish instruction mix)
# Note: concatenate_datasets requires identical column schemas, so each source
# must first be normalized to {"instruction", "response"}. oasst1 in particular
# stores single messages in conversation trees, so Turkish prompt/reply pairs
# need to be assembled before this step (normalization omitted for brevity).
alpaca_tr = load_dataset("merve/turkish_instructions", split="train")
oasst_tr = load_dataset("OpenAssistant/oasst1", split="train").filter(lambda x: x["lang"] == "tr")
manual_tr = load_dataset("sukruyusufkaya/turkish-manual-50k", split="train")  # custom

full_dataset = concatenate_datasets([alpaca_tr, oasst_tr, manual_tr])
print(f"Total examples: {len(full_dataset):,}")  # ~50K
 
def format_chat(example):
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}
 
full_dataset = full_dataset.map(format_chat)
 
# 5. Training arguments
training_args = TrainingArguments(
    output_dir=OUTPUT,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch = 4 x 4 = 16 packed sequences
    learning_rate=2e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    weight_decay=0.0,
    bf16=True,
    logging_steps=10,
    save_steps=500,
    save_total_limit=3,
    optim="paged_adamw_8bit",  # paged 8-bit AdamW from the QLoRA paper
    report_to="wandb",
    run_name="llama-3-8b-tr-qlora",
)
 
# 6. SFT trainer
# Note: these kwargs match trl 0.8.x; newer trl versions move max_seq_length,
# dataset_text_field, and packing into SFTConfig.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=full_dataset,
    args=training_args,
    max_seq_length=2048,
    dataset_text_field="text",
    packing=True,  # pack short examples into full 2048-token sequences
)
 
# 7. Train (~8 hours on a single H100)
trainer.train()
trainer.save_model(OUTPUT + "/final-adapter")
 
# 8. Merge LoRA into base
# Note: the base was loaded in 4-bit; for a clean production merge it is common
# to reload the base in bf16, re-attach the adapter, and then merge, since
# merging into quantized weights costs precision.
merged = model.merge_and_unload()
merged.save_pretrained(OUTPUT + "/merged")
tokenizer.save_pretrained(OUTPUT + "/merged")
 
# 9. Push to HuggingFace Hub
merged.push_to_hub("sukruyusufkaya/llama-3-8b-tr-instruct")
tokenizer.push_to_hub("sukruyusufkaya/llama-3-8b-tr-instruct")
 
print("Done! Türkçe Llama-3-8B-Instruct ready on HF Hub.")
Turkish Llama-3-8B Fine-Tune: Full Capstone Pipeline
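Stage 5 (evaluation) is only named in the flow above, not implemented in the script. Below is a minimal sketch of a qualitative smoke test against the merged checkpoint; the two prompts are illustrative placeholders, and a real MT-Bench-TR run would go through its own evaluation harness rather than this loop.
python
# Qualitative smoke test on the merged model (stage 5)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "./llama-3-8b-tr-instruct/merged"
model = AutoModelForCausalLM.from_pretrained(
    merged_dir, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(merged_dir)

# Illustrative Turkish prompts; MT-Bench-TR covers far more categories
prompts = [
    "Türkiye'nin başkenti neresidir?",
    "Kuantum hesaplamayı basit bir dille açıkla.",
]
for p in prompts:
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": p}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
    print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))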
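Stage 8 (vLLM serving) is likewise left out of the script. Here is a minimal sketch using vLLM's offline Python API, assuming the merged model was pushed to the Hub under the name above and that a recent vLLM release (one that includes the LLM.chat helper) is installed:
python
# Serving sketch with vLLM's offline API (stage 8)
from vllm import LLM, SamplingParams

llm = LLM(model="sukruyusufkaya/llama-3-8b-tr-instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

# LLM.chat applies the model's chat template before generating
outputs = llm.chat(
    [{"role": "user", "content": "İstanbul hakkında kısa bir paragraf yaz."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
For an actual deployment you would typically launch vLLM's OpenAI-compatible server instead (in recent releases, `vllm serve sukruyusufkaya/llama-3-8b-tr-instruct`) and point clients at its /v1/chat/completions endpoint.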
🎉 Module 14 Complete: Fine-Tuning
Across 3 lessons: SFT (base to instruct), LoRA (Hu 2021, PEFT), QLoRA (Dettmers 2023, 4-bit + LoRA), and the capstone Turkish Llama-3-Instruct. Module 14 inventory: 3 lessons, 235 min. Overall curriculum: 15 modules, 83 lessons, ~79 hours. Up next: Module 15, RLHF + DPO (Ouyang 2022 InstructGPT; Rafailov 2023 DPO).

Module 14 Inventory (Complete)

| # | Lesson | Duration |
| --- | --- | --- |
| 14.1 | SFT: From Base to Instruct | 75 min |
| 14.2 | LoRA + QLoRA PEFT | 75 min |
| 14.3 | Capstone: Turkish Llama-3 Fine-Tune | 85 min |
| Total | 3 lessons | 235 min (~3.9 hours) |

Frequently Asked Questions

Q: What quality should I expect from the result?
A: Comparable to OpenChat and other strong smaller Turkish models. ChatGPT-3.5-level quality is reachable with careful dataset curation; ChatGPT-4-level quality is not, since that requires a reasoning model built from scratch (a Module 17+ topic).

