Advanced Level · 3 Days

LLM Continued Pretraining and Domain Adaptation Engineering Training (Turkish LLM + Legal/Healthcare/Finance Domain)

A 3-day advanced Turkish training that covers end to end the Continued Pretraining + Domain Adaptation discipline for those wishing to train a Turkish LLM (Cosmos Llama, Trendyol AI, KUIS-AI, Aya Expanse) or produce a custom LLM for legal/healthcare/finance/code domains. Includes catastrophic-forgetting mitigation, vocabulary expansion, YaRN long-context extension, DoReMi/RegMix data mixing, LoRA/DoRA/QLoRA/GaLore efficient CPT, and domain-benchmark production.

About This Course

This 3-day advanced Continued Pretraining (CPT) program is designed for ML Engineers, AI Researchers, Data Engineers, and ML Platform engineers who want to adapt open-source base LLMs (Llama 3.3, Qwen3, Gemma 3, Mistral) to the Turkish language or to domains such as legal, healthcare, finance, and code. In Turkey, Turkish LLM projects such as Cosmos, Trendyol AI, and KUIS-AI are growing rapidly. Demand for custom domain LLMs is growing alongside them: law firms need Harvey AI-style case-law reasoning, healthcare institutions need Med-PaLM-style medical expertise, and finance companies need BloombergGPT-style sectoral intelligence. However, Turkish-language training that covers this discipline end to end (math, data pipeline, mitigation, and eval) is virtually nonexistent: existing content either stays at the level of academic-paper summaries or goes no deeper than copy-and-paste example scripts. This program is designed to fill that gap as Turkey's most comprehensive production-grade CPT reference training.



The strategic backbone of the program is the first module, which frames where Continued Pretraining sits in the pre-training → CPT → SFT → DPO/RLHF → deployment flow and how it differs from SFT, RLHF, and RAG. An evidence-based decision matrix is provided: knowledge injection (teaching new knowledge, learning a new language, static domain knowledge) → CPT is optimal; behavior shaping (response style, formatting, instruction following) → SFT is sufficient; dynamic or frequently updated knowledge → RAG is mandatory; very high-volume, static domain knowledge → CPT + RAG hybrid. Production case studies are analyzed from a strategic perspective: BloombergGPT (a 50B-parameter finance LLM), Med-PaLM (healthcare), Code Llama (code), Cosmos Llama / Trendyol AI / KUIS-AI / Aya Expanse (Turkish), and DeepSeek-Math / Qwen3-Math / Llemma (math).
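The decision matrix above can be sketched as a small rule function. This is only an illustration of the four rules as stated in the text; the `UseCase` fields, the `recommend` name, and the "prompting only" fallback are assumptions introduced here, not part of the course material.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical project description; field names are illustrative."""
    needs_new_knowledge: bool      # new language, new facts, static domain knowledge
    needs_behavior_change: bool    # response style, formatting, instruction following
    knowledge_changes_often: bool  # dynamic or frequently updated sources
    corpus_is_very_large: bool     # very high-volume static domain corpus


def recommend(use_case: UseCase) -> list[str]:
    """Map the four rules of the decision matrix to a list of techniques."""
    plan: list[str] = []
    # Rule 1: knowledge injection -> CPT.
    if use_case.needs_new_knowledge:
        plan.append("CPT")
    # Rule 3: dynamic knowledge -> RAG.
    # Rule 4: very high-volume static knowledge -> CPT + RAG hybrid.
    if use_case.knowledge_changes_often or (
        use_case.needs_new_knowledge and use_case.corpus_is_very_large
    ):
        plan.append("RAG")
    # Rule 2: behavior shaping -> SFT.
    if use_case.needs_behavior_change:
        plan.append("SFT")
    # Assumed fallback when none of the rules fire.
    return plan or ["prompting only"]


if __name__ == "__main__":
    legal_archive = UseCase(True, False, False, True)
    print(recommend(legal_archive))
```

Real projects rarely reduce to four booleans; the point of the sketch is that the rules compose, so a single use case can legitimately end up with CPT, RAG, and SFT together.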



The second module is dedicated to the data-engineering discipline that determines 70% of CPT success. HuggingFace FineWeb (15T tokens) and FineWeb-Edu methodology; Common Crawl WARC processing and HTML cleaning with Trafilatura; comparison of Cosmopedia, RefinedWeb, RedPajama, DOLMA datasets; for Turkish: Turkish FineWeb, mC4-tr, OSCAR-tr, Wikipedia-tr, Boğaziçi/İTÜ/KUIS open corpus sources; deduplication strategies (exact hash, MinHash LSH fuzzy dedup, embedding-based semantic dedup); quality filtering (Gopher rules + Cosmopedia + fastText classifier); KVKK-compliant PII detection (Turkish TC kimlik, IBAN, phone-number detection); toxicity and contamination detection — every stage is hands-on. The practical recipe for producing 100B-500B tokens from Turkish raw data is provided.
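As one concrete instance of the PII-detection stage mentioned above, the sketch below finds candidate Turkish national ID numbers (TC kimlik) and masks only those that pass the public checksum rule. The function names and the `<TCKN>` placeholder are assumptions for illustration; a production pipeline would also cover IBANs, phone numbers, and context-based checks.

```python
import re

# Candidate: exactly 11 ASCII digits, first digit non-zero,
# not embedded in a longer run of digits.
TCKN_CANDIDATE = re.compile(r"(?<!\d)[1-9]\d{10}(?!\d)", re.ASCII)


def is_valid_tckn(number: str) -> bool:
    """Check the TC kimlik checksum on an 11-digit string."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    if number[0] == "0":
        return False
    d = [int(ch) for ch in number]
    odd_sum = d[0] + d[2] + d[4] + d[6] + d[8]   # digits 1, 3, 5, 7, 9
    even_sum = d[1] + d[3] + d[5] + d[7]         # digits 2, 4, 6, 8
    tenth = (odd_sum * 7 - even_sum) % 10
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


def mask_tckn(text: str, placeholder: str = "<TCKN>") -> str:
    """Replace checksum-valid candidates; leave other 11-digit numbers alone."""
    def _sub(match: re.Match) -> str:
        return placeholder if is_valid_tckn(match.group()) else match.group()
    return TCKN_CANDIDATE.sub(_sub, text)


if __name__ == "__main__":
    # 10000000146 is a synthetic value that satisfies the checksum.
    print(mask_tckn("TC: 10000000146, order no 12345678901"))
```

Validating the checksum before masking keeps order numbers and other 11-digit identifiers in the corpus, which matters when the same filter runs over billions of tokens.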



The third module gives a mathematical analysis of the fundamental challenge of CPT: catastrophic forgetting. From the loss-landscape perspective, it covers the drift from the pre-training minimum to the domain minimum, the identification of important parameters with the Fisher Information Matrix, and the plasticity-stability dilemma. Classical mitigation: replay buffer (mixing pre-training data into the domain data; the practical recommendation is a 5-20% pre-training mix), EWC (Kirkpatrick 2017, Fisher-weighted L2 regularization), layer-wise learning rates, and embedding freezing. Modern approaches: LoRA-based CPT (a small adapter limits forgetting, but its capacity is also limited), model souping / weight averaging (Wortsman 2022), and Branch-Train-Merge (BTM, Li 2022) with domain-expert routing. The trade-offs of each approach are compared on the basis of evidence.
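The EWC idea mentioned above (a Fisher-weighted L2 penalty that anchors important parameters near their pre-trained values) can be written in a few lines. The sketch below uses plain Python lists and a diagonal Fisher estimate from squared per-example gradients; the function names and the toy numbers are assumptions for illustration, and a real implementation would operate on framework tensors.

```python
def fisher_diagonal(per_example_grads: list[list[float]]) -> list[float]:
    """Diagonal Fisher estimate: mean of squared per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(theta: list[float], theta_star: list[float],
                fisher: list[float], lam: float) -> float:
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )


def ewc_loss(domain_loss: float, theta: list[float], theta_star: list[float],
             fisher: list[float], lam: float) -> float:
    """Total objective: new-domain loss plus the EWC anchor term."""
    return domain_loss + ewc_penalty(theta, theta_star, fisher, lam)


if __name__ == "__main__":
    # Toy gradients: parameter 0 matters for the old task, parameter 1 does not.
    fisher = fisher_diagonal([[1.0, 0.0], [3.0, 0.0]])
    # Both parameters moved away from the pre-trained point (0, 0),
    # but only the move in parameter 0 is penalized.
    print(ewc_penalty([1.0, 5.0], [0.0, 0.0], fisher, lam=2.0))
```

The example shows the plasticity-stability trade-off in miniature: parameters with near-zero Fisher weight stay free to adapt to the domain, while high-Fisher parameters are held close to the pre-training solution.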



The fourth module addresses vocabulary expansion and tokenizer adaptation techniques, which are especially critical for Turkish. It starts with a Turkish fertility analysis of the Llama 3, Qwen3, and Gemma 3 tokenizers, where fertility is the average number of tokens a word is split into (1.0-1.3 in English versus 1.8-2.5 in Turkish, which roughly doubles cost and latency). Mean initialization (a new token's embedding is the average of existing token embeddings), FOCUS (Dobler 2023, semantic-aware initialization), and the Aya Expanse 2024 approach (23-language multilingual expansion + frozen base) are covered in detail. Training Turkish and domain tokenizers with SentencePiece, merging and extending them with the Hugging Face Tokenizers library, and the impact of a tokenizer change on the embedding and lm_head layers are shown in practice. The trade-off between CPT with and without vocabulary expansion is decided on the basis of evidence.
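Fertility and mean initialization, as described above, are both simple averages. The sketch below shows them on plain Python data; `toy_tokenize` is a deliberately crude stand-in (fixed three-character chunks), not a real tokenizer, and all names and numbers are assumptions for illustration.

```python
from typing import Callable


def fertility(words: list[str], tokenize: Callable[[str], list[str]]) -> float:
    """Average number of tokens a word is split into."""
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


def mean_init(piece_ids: list[int], embeddings: list[list[float]]) -> list[float]:
    """Embedding for a new token: mean of the embeddings of the pieces
    the old tokenizer would have split it into."""
    dim = len(embeddings[0])
    return [
        sum(embeddings[i][k] for i in piece_ids) / len(piece_ids)
        for k in range(dim)
    ]


def toy_tokenize(word: str) -> list[str]:
    """Stand-in tokenizer for the demo only: fixed 3-character chunks."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


if __name__ == "__main__":
    print(fertility(["evlerimizden", "kitap"], toy_tokenize))
    # Toy 2-dimensional embedding table with three existing tokens.
    table = [[1.0, 2.0], [9.0, 9.0], [3.0, 4.0]]
    print(mean_init([0, 2], table))
```

With a real Hugging Face tokenizer, the same `fertility` function could be called with the tokenizer's `tokenize` method as the callable, which makes it easy to compare candidate base models on the same Turkish word list before deciding on vocabulary expansion.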



The fifth module comparatively analyzes Turkey's four prominent open-source Turkish LLM projects from the CPT methodology perspective. Cosmos Llama 3.3 / 3.1 series (base, CPT data, SFT, instruct variants); Trendyol AI Llama 3 8B / 70B (Trendyol dataset + domain adaptation); KUIS-AI Turkish-Llama (Koç University contributions); Cohere Aya Expanse 8B / 32B (23-language multilingual CPT approach). For each, the base-model selection, CPT data strategy, vocabulary-expansion decision, training compute, and eval results are analyzed in detail. Comparison on Turkish MMLU, MMLU-Pro-tr, Belebele-tr, TruthfulQA-tr, Hellaswag-tr, ARC-tr benchmarks and Open LLM Leaderboard Turkish ranking analysis is performed. Boğaziçi, METU, İTÜ Turkish LLM research is also examined.



The sixth module provides CPT recipes for the four most-demanded domains in Turkey. Legal domain: a CPT pipeline over Turkish case law (Yargıtay, Danıştay, and Constitutional Court decisions), legislation (laws, regulations), and the Official Gazette archive; the Harvey AI approach (legal exam + risk assessment); KVKK-compliant data collection. Healthcare domain: CPT on DSM-5-TR, medical guidelines, and anonymized patient records; the Med-PaLM (Google 2023) and Med-PaLM 2 approach; HIPAA + KVKK biomedical compliance. Finance domain: adapting the data recipe of BloombergGPT (a 50B-parameter finance LLM); Turkish finance CPT with TCMB reports, KAP disclosures, BIST data, and a Turkish balance-sheet corpus. Code domain: comparison of the Code Llama, DeepSeek-Coder V3, and Qwen2.5-Coder recipes. For each domain, benchmark production (bar-exam simulation, USMLE-tr, FinanceBench-tr, HumanEval-tr, MBPP-tr, BigCodeBench-tr) and deployment practices that comply with sector regulation are provided.



The seventh module covers in depth the parameter-efficient and memory-efficient approaches that determine compute efficiency in production CPT. Full fine-tuning, LoRA (Hu 2021, low-rank decomposition W = W_0 + B·A), DoRA (Liu 2024, magnitude + direction separation), QLoRA (Dettmers 2023, 4-bit NF4 quantization + LoRA), ReFT (representation fine-tuning), and GaLore (Zhao 2024, memory-efficient full-parameter training via gradient low-rank projection) are compared on the basis of evidence. The capacity limits of LoRA for CPT (at what rank LoRA is sufficient for knowledge injection, and in which scenarios full FT is mandatory) are taught with a practical cookbook. Fine-tuning a 30B+ model on a single H100 with DeepSpeed ZeRO-3 + offload, and CPT with FSDP2 (PyTorch 2.x) + activation checkpointing, are shown in practice.
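The LoRA formulation W = W_0 + B·A quoted above is easiest to appreciate with shapes and parameter counts in hand. The following sketch uses nested lists instead of tensors so that it runs without dependencies; the helper names and the `scale` argument (standing in for the usual alpha/rank factor) are assumptions for illustration.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_merged_weight(w0: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """W = W_0 + scale * (B @ A), with B: d_out x r and A: r x d_in."""
    delta = matmul(b, a)
    return [
        [w + scale * d for w, d in zip(row_w, row_d)]
        for row_w, row_d in zip(w0, delta)
    ]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in B and A for one adapted weight matrix."""
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole matrix is updated."""
    return d_out * d_in


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0], [2.0]]   # d_out x r with r = 1
    a = [[3.0, 4.0]]     # r x d_in
    print(lora_merged_weight(w0, b, a))
    print(lora_param_count(4096, 4096, 16), full_param_count(4096, 4096))
```

The parameter counts make the capacity question concrete: at rank 16 a 4096 x 4096 projection trains under 1% of the parameters that full fine-tuning would, which is why rank selection matters when the goal is knowledge injection rather than style adaptation.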



The eighth module addresses techniques for extending a base model's context window via CPT. RoPE (Rotary Position Embeddings) is built up mathematically (one rotation matrix per pair of dimensions), and the main scaling methods are compared: linear Position Interpolation (Chen 2023), NTK-aware scaling, Dynamic NTK, YaRN (Yet another RoPE extensioN, Peng 2023, which adds an attention-scaling correction), and LongRoPE (Microsoft 2024). The Llama 3.1 128K extension recipe (Meta 2024), the Gemini 2.5 Pro 1M-10M context production approach, and the Mistral interleaved sliding-window attention are analyzed with practical examples. The curriculum covers a progressive 4K → 16K → 64K → 1M token extension strategy; needle-in-a-haystack and multi-needle eval; the NVIDIA RULER benchmark (long-context retrieval + reasoning); and LongBench and InfiniteBench for real-world long-context eval.
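The RoPE scaling methods listed above differ mainly in how they change the per-dimension rotation frequencies. The sketch below computes the standard RoPE inverse frequencies, the Position Interpolation variant, the NTK-aware base adjustment, and the attention-scaling factor from the YaRN paper. Function names are assumptions for illustration, and the full YaRN per-dimension ramp is intentionally left out.

```python
import math


def rope_inv_freq(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE: one frequency per pair of dimensions."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def position_interpolation(inv_freq: list[float], scale: float) -> list[float]:
    """Position Interpolation: positions m -> m / scale, which is the same
    as dividing every frequency by the extension factor."""
    return [f / scale for f in inv_freq]


def ntk_aware_inv_freq(head_dim: int, scale: float,
                       base: float = 10000.0) -> list[float]:
    """NTK-aware scaling: enlarge the base so that the highest frequency is
    untouched and the lowest frequency is divided by the scale factor."""
    new_base = base * scale ** (head_dim / (head_dim - 2))
    return rope_inv_freq(head_dim, new_base)


def yarn_attention_factor(scale: float) -> float:
    """YaRN attention-scaling correction: sqrt(1/t) = 0.1 * ln(s) + 1."""
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    scale = 8.0  # for example, extending a 16K window to 128K
    original = rope_inv_freq(128)
    interpolated = position_interpolation(original, scale)
    ntk = ntk_aware_inv_freq(128, scale)
    print(original[0], interpolated[0], ntk[0])
    print(original[-1], interpolated[-1], ntk[-1])
    print(yarn_attention_factor(scale))
```

Comparing the first and last entries shows the design difference: interpolation compresses every frequency equally, while the NTK-aware variant preserves high-frequency (local) resolution and stretches only the low-frequency end.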



The ninth module is dedicated to domain mixing ratios, that is, how much data to use from which domain in CPT, which is a first-order determinant of final model quality. DoReMi (Xie 2023, domain reweighting via worst-domain minimax optimization), RegMix (Liu 2024, regression-based mix prediction with small-scale proxy models), and DataMix approaches are covered at the mathematical level. For Turkish CPT, the module shows in practice the Turkish vs English ratio decision matrix (recommended: start at 70/30, then cool down at 50/50), the recipe for preventing catastrophic forgetting via a domain + general data mix, and the DeepSeek-Coder recipe in the code + math + general triangle. Curriculum learning (easy → hard data ordering) and the injection of high-quality data in the final cooldown/annealing stage, as done for Llama 3.1 and Qwen3 to boost MMLU, are also addressed.
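The DoReMi reweighting step mentioned above can be sketched as an exponentiated-gradient update on the domain weights: domains where the proxy model lags the reference model the most get more weight. The version below works on per-domain average losses rather than per-token losses; the function name, default hyperparameters, and the loss values are assumptions for illustration only.

```python
import math


def doremi_update(weights: list[float], proxy_losses: list[float],
                  reference_losses: list[float], step_size: float = 1.0,
                  smoothing: float = 1e-3) -> list[float]:
    """One domain-weight update in the style of DoReMi."""
    # Excess loss: how much worse the proxy is than the reference, clipped at 0.
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]
    # Exponentiated-gradient step, then renormalize to a distribution.
    unnormalized = [w * math.exp(step_size * e) for w, e in zip(weights, excess)]
    total = sum(unnormalized)
    k = len(weights)
    # Mix with the uniform distribution so no domain weight collapses to zero.
    return [(1.0 - smoothing) * u / total + smoothing / k for u in unnormalized]


if __name__ == "__main__":
    domains = ["turkish_web", "english_web"]  # hypothetical domain names
    weights = [0.5, 0.5]
    # Made-up losses: the proxy lags the reference only on the first domain.
    new_weights = doremi_update(weights, [3.0, 2.0], [2.0, 2.0])
    for name, weight in zip(domains, new_weights):
        print(name, round(weight, 3))
```

In the full method this update runs repeatedly while a small proxy model trains, and the averaged weights are then used to sample data for the large model.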



The tenth module is dedicated to the engineering side of CPT. Topics covered in detail: learning-rate selection (basic principle: 1/10 to 1/100 of the pre-training LR); warmup steps; comparison of cosine decay vs constant LR vs WSD (Warmup-Stable-Decay) schedules; a max-LR and min-LR tuning cookbook; batch-size scaling (global batch size of 1M-4M tokens), gradient accumulation, and mixed precision (bf16, plus fp8 on hardware that supports it, such as Blackwell B200/GB200); the DeepSpeed ZeRO-3 vs FSDP2 vs Megatron-LM distributed-setup decision matrix; combining TP (tensor parallel), PP (pipeline parallel), and DP (data parallel); training-run monitoring (loss curves, gradient norm, weight statistics); loss spikes and divergence-recovery strategies; and checkpoint frequency, async checkpointing, and the eval-on-checkpoint pipeline.
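The schedule comparison above (cosine decay vs WSD) is easier to reason about with the two curves written out. The sketch below is a minimal version with linear warmup in both cases and a linear final decay for WSD; the function names, the choice of linear decay, and the example learning rates are assumptions for illustration.

```python
import math


def wsd_lr(step: int, total_steps: int, max_lr: float, min_lr: float,
           warmup_steps: int, decay_steps: int) -> float:
    """Warmup-Stable-Decay: linear warmup, flat plateau, linear final decay."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return max_lr
    progress = min((step - decay_start) / max(decay_steps, 1), 1.0)
    return max_lr + (min_lr - max_lr) * progress


def cosine_lr(step: int, total_steps: int, max_lr: float, min_lr: float,
              warmup_steps: int) -> float:
    """Linear warmup followed by cosine decay from max_lr to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(total_steps - warmup_steps, 1), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical numbers: if pre-training peaked at 3e-4, the 1/10 rule
    # suggests a CPT peak around 3e-5.
    for step in (0, 100, 500, 900, 1000):
        print(step,
              wsd_lr(step, 1000, 3e-5, 3e-6, 100, 200),
              cosine_lr(step, 1000, 3e-5, 3e-6, 100))
```

One practical difference the curves make visible: with WSD the plateau can be extended and the decay phase re-run from any stable checkpoint, whereas a cosine schedule is tied to a total step count fixed in advance.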



The eleventh module addresses the four-dimensional post-CPT evaluation discipline. (1) Domain gain: Turkish MMLU, MMLU-Pro-tr, Belebele-tr, ARC-tr; producing domain-specific benchmarks (Turkish bar-exam simulation, FinanceBench-tr, USMLE-tr); chat-ability eval with MT-Bench Turkish and AlpacaEval Turkish. (2) Catastrophic forgetting: regression tests on general MMLU, HellaSwag, ARC, TruthfulQA; regression on code benchmarks (HumanEval, MBPP). (3) Long-context regression: RULER, needle-in-a-haystack, LongBench. (4) Production eval: production comparison of base model vs CPT model via A/B testing, online eval with user feedback (thumbs up/down), business metrics (conversion, satisfaction, task completion rate). All reporting formats are tied to enterprise compliance discipline.
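The domain-gain and forgetting dimensions above both come down to comparing the base model and the CPT model on the same benchmarks. A minimal sketch of such a regression report follows; the function name, the tolerance threshold, and all scores are made-up placeholders for illustration, not measured results.

```python
def regression_report(base_scores: dict[str, float],
                      cpt_scores: dict[str, float],
                      tolerance: float = 1.0) -> dict[str, dict]:
    """Per-benchmark delta (CPT minus base) and a regression flag.

    A benchmark counts as regressed when the CPT model drops by more
    than `tolerance` points relative to the base model.
    """
    report = {}
    for name, base in base_scores.items():
        delta = cpt_scores[name] - base
        report[name] = {"delta": delta, "regressed": delta < -tolerance}
    return report


if __name__ == "__main__":
    # Placeholder numbers only, chosen to show one gain and one regression.
    base = {"general_benchmark": 60.0, "domain_benchmark": 40.0}
    cpt = {"general_benchmark": 57.5, "domain_benchmark": 52.0}
    for name, row in regression_report(base, cpt).items():
        print(name, row)
```

Running the same report on every checkpoint turns forgetting from a post-hoc surprise into a tracked metric, which is the purpose of the eval-on-checkpoint pipeline.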



In the capstone module, each participant designs an end-to-end CPT pipeline tailored to their own scenario: scenario selection (Turkish LLM, legal, healthcare, finance, code, or the participant's own domain), base-model selection (Llama 3.3, Qwen3, Gemma 3, Mistral, DeepSeek base), Turkish and/or domain data collection (50B-200B tokens), the vocabulary-expansion decision, mitigation strategy (replay ratio + LoRA / full FT / hybrid), training stack (TRL + Axolotl or OpenRLHF + DeepSpeed), compute budget (single H100, 8x H100, multi-node planning), eval framework (4 dimensions), and a 90-day production deployment roadmap (including post-CPT SFT + DPO + RAG integration). By the end of the training, participants have the technical competence to apply the CPT vs SFT vs RAG decision matrix at enterprise scale; build a FineWeb-style data pipeline for Turkish and domain data; select catastrophic-forgetting mitigation recipes based on evidence; double Turkish token efficiency via vocabulary expansion and tokenizer adaptation; perform 128K-1M long-context extension with YaRN; estimate the optimal data mix with DoReMi/RegMix; make compute-optimal choices among LoRA, DoRA, QLoRA, and GaLore; and build production-grade CPT pipelines at the level of Cosmos, Trendyol AI, Aya Expanse, or BloombergGPT. The training consists of 3 days, 12 modules, and over 100 hands-on lessons.
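For the compute-budget step of the capstone, a common first-order estimate is the rule of thumb that dense-transformer training costs roughly 6 FLOPs per parameter per token. The sketch below applies it; the function names are assumptions, and the peak throughput and utilization in the example are round placeholder values that should be replaced with vendor datasheet figures and measured utilization.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb cost of full-parameter training: about 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens


def gpu_hours(flops: float, peak_flops_per_gpu: float, mfu: float) -> float:
    """GPU-hours needed at a given peak throughput and model FLOPs utilization."""
    effective_flops_per_second = peak_flops_per_gpu * mfu
    return flops / effective_flops_per_second / 3600.0


if __name__ == "__main__":
    # Hypothetical scenario: 8B-parameter base model, 50B CPT tokens.
    flops = training_flops(8e9, 50e9)
    # Placeholder hardware numbers: 1e15 FLOP/s peak, 40% utilization.
    hours = gpu_hours(flops, peak_flops_per_gpu=1e15, mfu=0.4)
    print(f"{flops:.2e} FLOPs, about {hours:.0f} GPU-hours")
```

The estimate covers full-parameter training only; adapter-based runs, activation recomputation, and long-context stages shift the constant, so it is best treated as a planning floor rather than a quote.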

Training Methodology

The only advanced program in Turkey that covers Turkish LLM (Cosmos, Trendyol AI, KUIS-AI, Aya Expanse) and domain-specific LLM CPT end to end

Production-grade data engineering with FineWeb pipeline + Turkish corpus + KVKK-compliant PII detection

Mathematical construction of catastrophic forgetting + EWC + replay buffer + LoRA-CPT + model souping mitigation

2x token efficiency for Turkish via vocabulary expansion + tokenizer adaptation (FOCUS, Aya Expanse approach)

CPT recipes for legal (Yargıtay/Danıştay), healthcare (DSM-5-TR), finance (TCMB/BIST), code (DeepSeek-Coder) domains

128K-1M long-context extension + RoPE scaling techniques with YaRN

DoReMi + RegMix data mixing + Llama 3.1/Qwen3 cooldown/annealing recipe

Compute-optimal-selection discipline via Full FT, LoRA, DoRA, QLoRA, GaLore comparison

Who Is This For?

ML Engineers and AI Researchers who want to train a Turkish LLM (Cosmos / Trendyol AI / KUIS-AI / Aya Expanse style)
Enterprise AI teams who want to produce a custom LLM for legal / healthcare / finance / code domains
Startup technical leaders who want to build domain-specific LLM architectures like BloombergGPT, Med-PaLM, Harvey AI
ML Platform engineers who want to adapt Llama 3.3 / Qwen3 / Gemma 3 / Mistral base to their sector
University research groups that want to lead on Turkish LLM benchmarks
Data Engineers who need to perform long-context (128K-1M) extension + KVKK-compliant Turkish deployment

Why This Course?

1. The only program in Turkey that covers the CPT + Domain Adaptation discipline end to end with math + data + mitigation + eval.
2. Comparatively analyzes Cosmos Llama, Trendyol AI, KUIS-AI, and Aya Expanse from the CPT methodology perspective.
3. Adapts BloombergGPT-, Med-PaLM-, and Harvey AI-style domain-specific LLM recipes to Turkish with KVKK compliance.
4. Covers 2024-2026 frontier techniques such as vocabulary expansion, YaRN long-context extension, and DoReMi data mixing.
5. Instills a compute-optimal CPT-selection discipline through a LoRA / DoRA / QLoRA / GaLore comparison.
6. Covers catastrophic forgetting in depth with the Fisher Information Matrix, EWC, and replay buffers.
7. Through the capstone project, equips the participant with a CPT pipeline, cost analysis, and roadmap applicable in their own domain.
8. Together with the RLHF, Reasoning Models, and Mech Interp trainings, completes a four-training frontier set covering the alignment, reasoning, interpretability, and knowledge-injection ecosystem.

Learning Outcomes

Apply the CPT vs SFT vs RAG decision matrix at enterprise scale.
Build a FineWeb-style data pipeline for Turkish + domain.
Select catastrophic-forgetting mitigation recipes based on evidence.
Double Turkish efficiency via vocabulary expansion + tokenizer adaptation.
Train a Turkish LLM at the Cosmos / Trendyol AI / KUIS-AI / Aya Expanse level.
Build a CPT pipeline in the legal, healthcare, finance, or code domain.
Make compute-optimal choices among LoRA, DoRA, QLoRA, GaLore.
Perform 128K-1M long-context extension with YaRN.
Estimate optimal data mix with DoReMi/RegMix.
Build a 4-dimensional (domain gain + forgetting + long-context + production) post-CPT eval framework.

Requirements

Active Python experience (intermediate to advanced), basic use of PyTorch and HuggingFace Transformers
Basic experience with LLM fine-tuning (at least conceptual familiarity with SFT, LoRA)
Foundational ML math: linear algebra, probability, gradient descent
Basic knowledge of transformer architecture (attention, residual stream, RoPE)
GPU access: H100 (80GB) or 2-4x A100 recommended for the capstone
HuggingFace and Weights & Biases accounts, created before the training

Course Curriculum

105 Lessons

Module 1: Strategic Introduction to the Continued Pretraining Discipline - CPT vs SFT vs RAG (10 lessons)
Module 2: Data Engineering for Continued Pretraining - FineWeb, Turkish Corpora, and Quality Filtering (9 lessons)
Module 3: The Catastrophic Forgetting Problem and Mitigation Strategies (9 lessons)
Module 4: Vocabulary Expansion and Tokenizer Adaptation - Token Efficiency for Turkish (9 lessons)
Module 5: Turkish LLM CPT Case Studies - Cosmos, Trendyol AI, KUIS-AI, Aya Expanse (9 lessons)
Module 6: Domain-Specific CPT - Legal, Healthcare, Finance, and Code Domains (9 lessons)
Module 7: Efficient CPT - Full FT, LoRA, DoRA, and QLoRA Comparison (9 lessons)
Module 8: Long-Context Extension - RoPE Scaling, YaRN, and 1M-10M Token CPT (9 lessons)
Module 9: Data Mixing Strategies - DoReMi, RegMix, and Optimal Domain Mix (9 lessons)
Module 10: Training Engineering - LR Schedule, Warmup, Hyperparams, and Distributed Setup (9 lessons)
Module 11: Post-CPT Evaluation - Domain Benchmark, Forgetting Tests, and Production Eval (9 lessons)
Module 12: Capstone - Producing a Turkish or Domain-Specific LLM (5 lessons)

Instructor

Şükrü Yusuf KAYA

AI Architect | Enterprise AI & LLM Training | Stanford University | Software & Technology Consultant

Şükrü Yusuf KAYA is an internationally experienced AI Consultant and Technology Strategist leading the integration of artificial intelligence technologies into the global business landscape. With operations spanning 6 different countries, he bridges the gap between the theoretical boundaries of technology and practical business needs, overseeing end-to-end AI projects in data-critical sectors such as banking, e-commerce, retail, and logistics. Deepening his technical expertise particularly in Generative AI and Large Language Models (LLMs), KAYA ensures that organizations build architectures that shape the future rather than relying on short-term solutions. His visionary approach to transforming complex algorithms and advanced systems into tangible business value aligned with corporate growth targets has positioned him as a sought-after solution partner in the industry. Distinguished by his role as an instructor alongside his consulting and project management career, Şükrü Yusuf KAYA is driven by the motto of "Making AI accessible and applicable for everyone." Through comprehensive training programs designed for a wide spectrum of professionals—from technical teams to C-level executives—he prioritizes increasing organizational AI literacy and establishing a sustainable culture of technological transformation.

Frequently Asked Questions