Capstone Module 14: Production Turkish Llama-3 8B Fine-Tune — QLoRA + SFT End-to-End
Module 14 capstone: Llama-3-8B base + Turkish SFT + QLoRA = a production-quality Turkish Llama-3-Instruct. Dataset curation (50K Turkish instructions), QLoRA training (single H100, 8 hours), evaluation (MT-Bench-TR), HuggingFace Hub publishing, and vLLM inference deployment.
Şükrü Yusuf KAYA
85 min read
Advanced 🏆 Capstone - build your own Turkish ChatGPT
Module 14 capstone: a production-grade Turkish Llama-3-8B-Instruct. Llama-3-8B base + Turkish instruction dataset + QLoRA + SFT pipeline. 50K Turkish examples, a single H100, 8 hours of training, roughly $50 of compute. The result: ChatGPT-comparable Turkish quality, public on the HuggingFace Hub, and serving-ready with vLLM. This is the curriculum's fourth real-world artifact, joining TurkTokenizer-tr (Module 6), Turkish Semantic Search (Module 7), and the Mini Llama-3 Pre-train (Module 11). 85 minutes from now, you will own your own production Turkish LLM.
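The $50 figure is consistent with the quoted training time. A minimal back-of-the-envelope check, assuming an on-demand H100 price of about $6.25/hour (the hourly rate is an assumption here, not a number from the module):

```python
# Rough cost sanity check for the capstone run.
H100_USD_PER_HOUR = 6.25  # assumed cloud on-demand rate; varies by provider
TRAIN_HOURS = 8           # single-H100 training time quoted in this module

print(f"Estimated training cost: ${H100_USD_PER_HOUR * TRAIN_HOURS:.0f}")  # -> ~$50
```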
Capstone Flow (8 Stages)
- Goal definition - TR-Llama-3-8B-Instruct, quality target
- Dataset curation - 50K Turkish instructions (Alpaca-TR + OASST + manual)
- QLoRA config - 4-bit base, r=16 adapter
- SFT training - single H100, 8 hours, $50
- Evaluation - MT-Bench-TR, qualitative samples
- LoRA merge - production-ready model
- HuggingFace Hub publish - sukruyusufkaya/llama-3-8b-tr-instruct
- vLLM serving - production deployment (see the serving sketch after the pipeline script below)
```python
# Turkish Llama-3-8B Fine-Tune - Production-Grade Capstone Script
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset, concatenate_datasets
from trl import SFTTrainer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
OUTPUT = "./llama-3-8b-tr-instruct"

# 1. Quantization (QLoRA)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# 2. Load model + tokenizer
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

model = prepare_model_for_kbit_training(model)

# 3. LoRA config
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# trainable params: ~40M | all params: 8B | trainable%: 0.5

# 4. Dataset curation (Turkish instruction mix)
alpaca_tr = load_dataset("merve/turkish_instructions", split="train")
oasst_tr = load_dataset("OpenAssistant/oasst1", split="train").filter(lambda x: x["lang"] == "tr")
manual_tr = load_dataset("sukruyusufkaya/turkish-manual-50k", split="train")  # custom

full_dataset = concatenate_datasets([alpaca_tr, oasst_tr, manual_tr])
print(f"Total examples: {len(full_dataset):,}")  # ~50K

def format_chat(example):
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

full_dataset = full_dataset.map(format_chat)

# 5. Training arguments
training_args = TrainingArguments(
    output_dir=OUTPUT,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    weight_decay=0.0,
    bf16=True,
    logging_steps=10,
    save_steps=500,
    save_total_limit=3,
    optim="paged_adamw_8bit",
    report_to="wandb",
    run_name="llama-3-8b-tr-qlora",
)

# 6. SFT trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=full_dataset,
    args=training_args,
    max_seq_length=2048,
    dataset_text_field="text",
    packing=True,
)

# 7. Train (~8 hours on a single H100)
trainer.train()
trainer.save_model(OUTPUT + "/final-adapter")

# 8. Merge LoRA into base
merged = model.merge_and_unload()
merged.save_pretrained(OUTPUT + "/merged")
tokenizer.save_pretrained(OUTPUT + "/merged")

# 9. Push to HuggingFace Hub
merged.push_to_hub("sukruyusufkaya/llama-3-8b-tr-instruct")
tokenizer.push_to_hub("sukruyusufkaya/llama-3-8b-tr-instruct")

print("Done! Turkish Llama-3-8B-Instruct ready on HF Hub.")
```
Turkish Llama-3-8B Fine-Tune - Full Capstone Pipeline
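Stage 8 (vLLM serving) is not part of the training script above. The sketch below shows a minimal offline-inference path with vLLM's Python API, assuming the merged model has already been pushed to the sukruyusufkaya/llama-3-8b-tr-instruct repo in step 9; the prompt text and sampling settings are illustrative, not from the module.

```python
# Minimal vLLM inference sketch for the published model (illustrative settings).
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

REPO = "sukruyusufkaya/llama-3-8b-tr-instruct"  # Hub repo from step 9 above

tokenizer = AutoTokenizer.from_pretrained(REPO)
llm = LLM(model=REPO, dtype="bfloat16", max_model_len=2048)

# Build a single-turn chat prompt with the model's own chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "İstanbul hakkında kısa bir paragraf yaz."}],
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0].outputs[0].text)
```

For a production HTTP endpoint, the same checkpoint can also be launched with vLLM's OpenAI-compatible server (`vllm serve sukruyusufkaya/llama-3-8b-tr-instruct`) and queried via the standard /v1/chat/completions route.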
🎉 Module 14 Complete: Fine-tuning
Across 3 lessons: SFT (base → instruct), LoRA (Hu 2021, PEFT), QLoRA (Dettmers 2023, 4-bit + LoRA), and the capstone Turkish Llama-3-Instruct. Module 14 inventory: 3 lessons, 235 min. Overall curriculum: 15 modules, 83 lessons, ~79 hours. Next up: Module 15, RLHF + DPO (Ouyang 2022 InstructGPT, Rafailov 2023 DPO).
Module 14 Inventory (Complete)
| # | Lesson | Duration |
|---|---|---|
| 14.1 | SFT: From Base to Instruct | 75 min |
| 14.2 | LoRA + QLoRA PEFT | 75 min |
| 14.3 | Capstone: Turkish Llama-3 Fine-Tune | 85 min |
| Total | 3 lessons | 235 min (~3.9 hours) |
Frequently Asked Questions
Is the result really "ChatGPT-comparable" Turkish quality? Quality is comparable to OpenChat or other smaller Turkish-tuned models. ChatGPT-3.5-level quality is reachable with a carefully curated dataset; ChatGPT-4-level quality is not, since that requires a reasoning model trained from scratch (a Module 17+ topic).
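Before a full MT-Bench-TR run (stage 5), a quick qualitative spot check helps catch chat-template or tokenizer mistakes early. A minimal sketch, assuming the merged checkpoint saved at ./llama-3-8b-tr-instruct/merged by the pipeline above; the prompts are illustrative.

```python
# Quick qualitative spot check on a handful of Turkish prompts (illustrative prompts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CKPT = "./llama-3-8b-tr-instruct/merged"  # merged output from step 8 above

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = [
    "Türkiye'nin başkenti neresidir?",
    "Bir e-ticaret sitesi için iade politikası taslağı yaz.",
]

for p in prompts:
    # Format each prompt with the model's chat template and generate a reply.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": p}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
    print("-" * 40)
```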