Chat Template Anatomy: Jinja, Special Tokens, and a Token-by-Token Breakdown

A chat template is the format in which an LLM understands a "conversation". This lesson covers the token-by-token anatomy of the chat templates of Llama-3, Qwen 2.5, Gemma 3, Mistral, and Phi-4: what apply_chat_template does behind the scenes, the token IDs of the system/user/assistant roles, tool-calling extensions, and multimodal turn formats.

Şükrü Yusuf KAYA

30-minute read

14.05.2026

Advanced

🎯 This lesson

Most cookbooks say 'apply_chat_template' and move on. In this lesson the cookbook unpacks every token one by one: why `<|im_start|>user\n` is 3 tokens (`<|im_start|>`, `user`, `\n`) while `<|begin_of_text|>` is a single token. The goal is an understanding that does not depend on memorization.

1. Chat Templates of 5 Popular Models

Llama 3.1/3.2/3.3

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_message}<|eot_id|>
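
To make the layout above concrete, here is a minimal pure-Python sketch that assembles the same string by hand. The function name `render_llama3` and its `add_generation_prompt` flag are illustrative (the flag is named after the real `apply_chat_template` parameter); the template shipped with a checkpoint may add more, such as default system text, so treat `apply_chat_template` as the source of truth.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Render {"role", "content"} dicts in the Llama 3 layout shown above (sketch only)."""
    out = "<|begin_of_text|>"
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn marker.
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out


demo = render_llama3(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    add_generation_prompt=True,
)
print(demo)
```

Comparing this output with `tok.apply_chat_template(..., tokenize=False)` is a quick way to spot anything the real template adds.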

Qwen 2.5

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>

Gemma 3

<start_of_turn>user
{system_message}\n\n{user_message}<end_of_turn>
<start_of_turn>model
{assistant_message}<end_of_turn>

Note: Gemma 3 has no separate "system" role; the system prompt is prepended to the first user message.
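
The system-prompt folding described in the note can be sketched in a few lines of plain Python. `render_gemma3` is a hypothetical helper, not the library's implementation; the tokenizer's real template also starts with a `<bos>` token, which is left out here to match the layout above.

```python
def render_gemma3(messages):
    """Render messages in the Gemma 3 layout above, folding a leading
    system message into the first user turn (illustrative sketch)."""
    system_prefix = ""
    if messages and messages[0]["role"] == "system":
        system_prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]
    if system_prefix and (not messages or messages[0]["role"] != "user"):
        raise ValueError("a system message must be followed by a user message")

    out = ""
    for i, m in enumerate(messages):
        # Gemma calls the assistant role "model".
        role = "model" if m["role"] == "assistant" else m["role"]
        content = system_prefix + m["content"] if i == 0 else m["content"]
        out += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    return out


gemma_demo = render_gemma3(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ]
)
print(gemma_demo)
```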

Mistral Small 3

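<s>[SYSTEM_PROMPT]{system_message}[/SYSTEM_PROMPT][INST]{user_message}[/INST]{assistant_message}</s>

Note: this is the V7-Tekken layout described on the Mistral Small 3 model card. Older Mistral instruct models have no system-prompt tokens and instead place the system text inside the first [INST] ... [/INST] block, as in [INST] {system_message}\n\n{user_message} [/INST] {assistant_message}</s>.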

Phi-4

<|im_start|>system<|im_sep|>{system_message}<|im_end|>
<|im_start|>user<|im_sep|>{user_message}<|im_end|>
<|im_start|>assistant<|im_sep|>{assistant_message}<|im_end|>

Important: every model has its own special tokens. If you fine-tune with the wrong template, the model loses its "conversation" format.
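
One cheap guard against the wrong-template mistake is to check rendered training text for the turn-end marker the target model expects. The sketch below uses only the markers visible in the templates above; the family keys and the `check_template` helper are hypothetical names chosen for illustration.

```python
# Turn-end markers as they appear in the five templates above.
TURN_END = {
    "llama3": "<|eot_id|>",
    "qwen2.5": "<|im_end|>",
    "gemma3": "<end_of_turn>",
    "mistral": "</s>",
    "phi4": "<|im_end|>",
}


def check_template(rendered_text, family):
    """Return (has_own_marker, foreign_markers) for a rendered conversation."""
    own = TURN_END[family]
    foreign = sorted(
        {marker for marker in TURN_END.values() if marker != own and marker in rendered_text}
    )
    return own in rendered_text, foreign


qwen_text = "<|im_start|>user\nHi<|im_end|>\n"
print(check_template(qwen_text, "qwen2.5"))  # (True, [])
print(check_template(qwen_text, "llama3"))   # (False, ['<|im_end|>'])
```

Running such a check over a sample of the dataset before training is much cheaper than discovering the mismatch after a fine-tuning run.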

python

# === Token-by-token breakdown of chat templates ===
# Some of these repos (e.g. Llama, Gemma) are gated on the Hugging Face Hub:
# accept the license and log in before running.
from transformers import AutoTokenizer

models = {
    "Llama 3.1": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "Qwen 2.5":  "Qwen/Qwen2.5-7B-Instruct",
    "Gemma 3":   "google/gemma-3-4b-it",  # Gemma 3 ships as 1B/4B/12B/27B; there is no 8B
    "Mistral":   "mistralai/Mistral-Small-24B-Instruct-2501",  # Mistral Small 3
    "Phi-4":     "microsoft/phi-4",
}

messages = [
    {"role": "system", "content": "You are a Turkish assistant."},
    {"role": "user", "content": "What is the population of Istanbul?"},
    {"role": "assistant", "content": "About 15 million."},
]

for name, model_id in models.items():
    tok = AutoTokenizer.from_pretrained(model_id)
    # Render the conversation to a plain string first...
    text = tok.apply_chat_template(messages, tokenize=False)
    # ...then tokenize it without adding BOS/EOS again (the template already emits them).
    ids = tok(text, add_special_tokens=False)["input_ids"]
    # IDs of every token registered as special (role headers, turn delimiters, ...).
    special_ids = {i for i, t in tok.added_tokens_decoder.items() if t.special}
    print(f"\n{'='*50}\n{name}\n{'='*50}")
    print(f"Text:\n{text}")
    print(f"\nToken IDs ({len(ids)} total):")
    for i, tok_id in enumerate(ids):
        tok_str = tok.convert_ids_to_tokens(tok_id)
        marker = " ←" if tok_id in special_ids else ""
        print(f"  [{i:3d}] {tok_id:7d}  '{tok_str}'{marker}")

Viewing the chat templates of 5 models token by token

2. Tool-Calling Extension (Llama 3.1+, Qwen 2.5)

Modern models support tool/function calling directly in their chat templates:

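from transformers import AutoTokenizer

# Load the Llama 3.1 tokenizer explicitly; after the loop above,
# `tok` would otherwise be the last model's tokenizer.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "user", "content": "What's the weather like in Istanbul today?"},
]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }
}]

text = tok.apply_chat_template(messages, tools=tools, tokenize=False)
# Abbreviated output (Llama 3.1); exact wording depends on the template version.
# By default, custom tool definitions are placed in the first user turn:
# <|begin_of_text|><|start_header_id|>system<|end_header_id|>
#
# Environment: ipython
# Cutting Knowledge Date: ...
# Today Date: ...
#
# <|eot_id|><|start_header_id|>user<|end_header_id|>
#
# Given the following functions, please respond with a JSON for a function call ...
#
# {"type": "function", "function": {"name": "get_weather", ...}}
#
# What's the weather like in Istanbul today?<|eot_id|>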
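
The other half of the round trip is turning the model's reply into an actual function call. The sketch below assumes the reply is a bare JSON object of the form {"name": ..., "parameters": {...}}, which is the shape the Llama 3.1 template asks for; other model families wrap tool calls differently, so this parsing step is model-specific. `get_weather`, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical names, and the weather stub returns placeholder data.

```python
import json


def get_weather(city):
    # Hypothetical stub: a real implementation would call a weather API.
    return {"city": city, "forecast": "unknown"}


# Maps tool names advertised in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(reply_text):
    """Parse a JSON tool call and run the matching function.

    Returns None when the reply is ordinary text or names an unknown tool.
    """
    try:
        call = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or call.get("name") not in TOOL_REGISTRY:
        return None
    return TOOL_REGISTRY[call["name"]](**call.get("parameters", {}))


result = dispatch_tool_call('{"name": "get_weather", "parameters": {"city": "Istanbul"}}')
print(result)
```

The result would then be appended to `messages` as a tool-role turn and the conversation rendered again with `apply_chat_template`.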

Fine-tuning for tool calling is covered in detail in Part XV of the cookbook.

🐛 FMD: 'I fine-tuned Llama-3 with the Qwen template and the model outputs garbage'

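Hypothesis: with the wrong template, the input no longer matches the format the pre-trained model "expects". Llama-3 learned to end a turn with `<|eot_id|>`; Qwen learned `<|im_end|>`. `<|im_end|>` is not even in the Llama-3 vocabulary, and because Llama-3's byte-level BPE tokenizer has no unk token, the string is split into several ordinary subword tokens instead of acting as a single turn delimiter. Fix: always fine-tune with the base model's own chat template. Drill: build a table of which token each model's template uses to mark the end of a turn.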
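
The failure can be illustrated without downloading any model. The toy tokenizer below keeps registered special tokens atomic and cuts everything else with a crude regex standing in for BPE; real subword pieces will differ, but the effect is the same: a marker that is not registered as special falls apart into ordinary pieces. `toy_tokenize` and the two token sets are illustrative assumptions, not real tokenizer internals.

```python
import re


def toy_tokenize(text, special_tokens):
    """Split text so registered special tokens stay atomic (toy model, not real BPE)."""
    if special_tokens:
        # Longest-first alternation so overlapping markers match greedily.
        ordered = sorted(special_tokens, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
        chunks = re.split(pattern, text)
    else:
        chunks = [text]

    pieces = []
    for chunk in chunks:
        if chunk in special_tokens:
            pieces.append(chunk)  # one token, one ID
        else:
            # Crude stand-in for subword splitting.
            pieces.extend(re.findall(r"\w+|[^\w\s]|\s+", chunk))
    return pieces


LLAMA3_SPECIAL = {"<|begin_of_text|>", "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"}
QWEN_SPECIAL = {"<|im_start|>", "<|im_end|>"}

print(toy_tokenize("Hi<|im_end|>", QWEN_SPECIAL))    # ['Hi', '<|im_end|>']
print(toy_tokenize("Hi<|im_end|>", LLAMA3_SPECIAL))  # ['Hi', '<', '|', 'im_end', '|', '>']
```

With a real tokenizer, the same comparison can be made by tokenizing the marker string and checking whether it comes back as a single ID.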

✅ Deliverables

1) Analyze the chat templates of 5 models token by token. 2) Read the tokenizer.chat_template Jinja string of your own base model. 3) Next lesson: 2.5, Loss Masking Implementation.
