Chat Template Anatomy: Jinja, Special Tokens, and Token-by-Token Breakdown

A chat template is the format an LLM understands as a 'conversation'. This lesson gives a token-by-token anatomy of the Llama-3, Qwen 2.5, Gemma 3, Mistral, and Phi-4 chat templates: what apply_chat_template does under the hood, the token IDs of the system/user/assistant roles, tool-calling extensions, and multimodal turn formats.

Şükrü Yusuf KAYA

30 min read

5/14/2026

Advanced


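🎯 This lesson

Most cookbooks say 'apply_chat_template' and move on. In this lesson, the cookbook unpacks every token one by one: why `<|im_start|>user\n` is 3 tokens in Qwen 2.5 (`<|im_start|>`, `user`, `\n`) while `<|begin_of_text|>` is a single token. The goal is a working understanding rather than memorization.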

1. Chat Templates of 5 Popular Models

Llama 3.1/3.2/3.3

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_message}<|eot_id|>
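
The Llama 3 layout above can be reproduced with plain string concatenation, which makes the role headers and turn terminators easy to see. This is a minimal sketch of the layout only: `render_llama3` is a hypothetical helper, and the template shipped with the real tokenizer may add extras such as a default system block.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Render messages in the Llama 3 layout shown above (illustrative helper)."""
    out = "<|begin_of_text|>"
    for msg in messages:
        # Each turn: role header, blank line, content, end-of-turn marker.
        out += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        out += f"{msg['content']}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header cues the model to start its answer.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out


demo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(render_llama3(demo, add_generation_prompt=True))
```

Comparing this string with the output of `apply_chat_template(..., tokenize=False)` for the same messages is a quick way to spot what the official template adds.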

Qwen 2.5

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>

Gemma 3

<start_of_turn>user
{system_message}\n\n{user_message}<end_of_turn>
<start_of_turn>model
{assistant_message}<end_of_turn>

Note: Gemma 3 has no separate "system" role; the system prompt is prepended to the first user message.
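
The folding step described in the note can be sketched in a few lines. Both `fold_system_into_user` and `render_gemma` are hypothetical helpers written for this illustration; the sketch follows the layout shown above and omits the `<bos>` token that the real tokenizer prepends.

```python
def fold_system_into_user(messages):
    """Merge a leading system message into the first user turn (illustrative helper)."""
    if not messages or messages[0]["role"] != "system":
        return list(messages)
    system, rest = messages[0], messages[1:]
    if not rest or rest[0]["role"] != "user":
        raise ValueError("expected a user turn right after the system message")
    merged = {
        "role": "user",
        "content": f"{system['content']}\n\n{rest[0]['content']}",
    }
    return [merged] + list(rest[1:])


def render_gemma(messages):
    """Render in the Gemma layout shown above; 'assistant' is written as 'model'."""
    out = ""
    for msg in fold_system_into_user(messages):
        role = "model" if msg["role"] == "assistant" else msg["role"]
        out += f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n"
    return out


msgs = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]
print(render_gemma(msgs))
```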

Mistral Small 3

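[INST] {system_message}\n\n{user_message} [/INST] {assistant_message}</s>

Note: this is the legacy `[INST]` layout from earlier Mistral instruct models, which have no system role. The Mistral Small 3 model card documents a newer layout that wraps the system prompt in `[SYSTEM_PROMPT]` and `[/SYSTEM_PROMPT]`; print the tokenizer's own template to confirm which one your checkpoint uses.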

Phi-4

<|im_start|>system<|im_sep|>{system_message}<|im_end|>
<|im_start|>user<|im_sep|>{user_message}<|im_end|>
<|im_start|>assistant<|im_sep|>{assistant_message}<|im_end|>

Important: every model has its own special tokens. If you fine-tune with the wrong template, the model loses its "conversation" format.

python

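# === Chat template token-by-token breakdown ===
# Some of these repos are gated on the Hugging Face Hub; accept the license
# and log in (huggingface-cli login) before running.
from transformers import AutoTokenizer

models = {
    "Llama 3.1": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "Qwen 2.5":  "Qwen/Qwen2.5-7B-Instruct",
    "Gemma 3":   "google/gemma-3-4b-it",  # Gemma 3 ships in 1B/4B/12B/27B sizes
    "Mistral":   "mistralai/Mistral-Small-24B-Instruct-2501",  # Mistral Small 3
    "Phi-4":     "microsoft/phi-4",
}

messages = [
    {"role": "system", "content": "You are a Turkish assistant."},
    {"role": "user", "content": "What is the population of Istanbul?"},
    {"role": "assistant", "content": "About 15 million."},
]

for name, model_id in models.items():
    tok = AutoTokenizer.from_pretrained(model_id)
    text = tok.apply_chat_template(messages, tokenize=False)
    # Tokenize the rendered text; the template already contains BOS/EOS,
    # so do not let the tokenizer add them a second time.
    ids = tok(text, add_special_tokens=False)["input_ids"]
    # special_tokens_map only lists bos/eos/pad-style entries, so collect
    # every added token flagged as special (role headers, turn markers).
    special_ids = {i for i, t in tok.added_tokens_decoder.items() if t.special}
    print(f"\n{'='*50}\n{name}\n{'='*50}")
    print(f"Text:\n{text}")
    print(f"\nToken IDs ({len(ids)} total):")
    for i, tok_id in enumerate(ids):
        tok_str = tok.convert_ids_to_tokens(tok_id)
        marker = " ←" if tok_id in special_ids else ""
        print(f"  [{i:3d}] {tok_id:7d}  '{tok_str}'{marker}")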

Viewing the chat templates of the 5 models token by token
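
Under the hood, `apply_chat_template` renders a Jinja template that the tokenizer stores as `tokenizer.chat_template`. The sketch below renders a simplified ChatML-style template with `jinja2` directly to show the mechanism. The template string is a hand-written approximation for illustration, not the exact string shipped with any model.

```python
from jinja2 import Template

# Simplified ChatML-style template (an assumption for illustration only).
CHATML_TEMPLATE = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]

# The same variables a chat template typically receives: the message list
# and a flag that opens an assistant turn for generation.
prompt = Template(CHATML_TEMPLATE).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)
```

To inspect a real template, print `tok.chat_template` for any tokenizer you have loaded and read it as ordinary Jinja: a loop over `messages` plus a few conditionals.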

2. Tool-Calling Extension (Llama 3.1+, Qwen 2.5)

Modern models support tool/function calling directly in their chat templates:

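from transformers import AutoTokenizer

# Load the Llama 3.1 tokenizer explicitly so the output below matches the model.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "user", "content": "What's the weather like in Istanbul today?"},
]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }
}]

text = tok.apply_chat_template(messages, tools=tools, tokenize=False)
print(text)
# Illustrative, abridged output (Llama 3.1). The exact text depends on the
# template version; the Hugging Face template may place the tool schema in
# the first user turn rather than in the system block.
# <|begin_of_text|><|start_header_id|>system<|end_header_id|>
#
# Environment: ipython
# Tools: get_weather
# ...
# Cutting Knowledge Date: ...
# Today Date: ...
#
# # Tool Instructions
# - Always execute python code in messages...
# <|eot_id|><|start_header_id|>user<|end_header_id|>
#
# What's the weather like in Istanbul today?
# <|eot_id|>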
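
After the model emits a tool call, the call and its result are usually appended to `messages` before the template is applied again. The sketch below builds those two extra turns as plain dictionaries. The `tool_calls` and `tool` role shapes follow the convention used in the Hugging Face chat templating documentation; treat the exact field names as an assumption to check against your model's template. `get_weather` and its return value are made up for illustration.

```python
import json


def get_weather(city):
    """Stand-in tool; a real one would call a weather API."""
    return {"city": city, "temp_c": 18}


def append_tool_round_trip(messages, name, arguments, tool_fn):
    """Return a new message list with the tool call and its result appended."""
    call = {"type": "function", "function": {"name": name, "arguments": arguments}}
    result = tool_fn(**arguments)
    return messages + [
        {"role": "assistant", "tool_calls": [call]},
        {"role": "tool", "name": name, "content": json.dumps(result)},
    ]


messages = [{"role": "user", "content": "What's the weather like in Istanbul today?"}]
updated = append_tool_round_trip(
    messages, "get_weather", {"city": "Istanbul"}, get_weather
)
for m in updated:
    print(m)
```

Passing `updated` back through `apply_chat_template` with the same `tools` list shows how a given template serializes the call and the tool result.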

Part XV of the cookbook covers fine-tuning for tool calling in detail.

🐛 FMD: 'I fine-tuned Llama-3 with the Qwen template and the model outputs garbage'

Hypothesis: the wrong template changed the format the pretrained model 'expects'. Llama-3 learned to end a turn with `<|eot_id|>`; Qwen learned to end it with `<|im_end|>`. `<|im_end|>` is not even in the Llama-3 vocabulary, and because Llama-3's byte-level BPE tokenizer has no unk token, the string is split into several ordinary subword pieces instead of mapping to one control token. Fix: always fine-tune with the base model's own chat template. Drill: build a table of which model's template marks the end of a turn with which token.
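
One way to use the drill table is as a cheap sanity check on rendered training samples before fine-tuning. The sketch below takes the end-of-turn markers from the layouts in section 1; `foreign_markers` is a hypothetical helper, and the substring check is deliberately naive.

```python
# End-of-turn markers taken from the layouts in section 1.
# Mistral closes assistant turns with </s>; user turns end with [/INST].
END_OF_TURN = {
    "Llama 3": "<|eot_id|>",
    "Qwen 2.5": "<|im_end|>",
    "Gemma 3": "<end_of_turn>",
    "Mistral": "</s>",
    "Phi-4": "<|im_end|>",
}


def foreign_markers(rendered, family):
    """List end-of-turn markers in `rendered` that belong to other families."""
    own = END_OF_TURN[family]
    others = set(END_OF_TURN.values()) - {own}
    return sorted(m for m in others if m in rendered)


sample = "<|im_start|>user\nHi<|im_end|>\n"
print(foreign_markers(sample, "Llama 3"))   # → ['<|im_end|>']
print(foreign_markers(sample, "Qwen 2.5"))  # → []
```

A non-empty result for your base model's family suggests the dataset was rendered with another model's template.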

✅ Deliverables

1) Analyze the chat templates of the 5 models token by token.
2) Read the `tokenizer.chat_template` Jinja string of your own base model.
3) Next lesson: 2.5, Loss Masking Implementation.

