# Vocabulary Extension: Add 8K TR Tokens to Llama-3 Tokenizer (Embedding Init Strategies)

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-vocabulary-extension-llama3-tr-8k
> Updated: 2026-05-14T14:42:50.274Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part II — Tokenizer & Data Engineering

**TLDR:** Llama-3 ships with a 128K-token tokenizer: broadly multilingual, but inefficient for Turkish. The "extension" approach adds 8K TR-specific tokens to Llama-3's vocabulary, grows the embedding matrix from 128K to 136K rows, and initializes the new rows intelligently (mean-init, SVD-init, byte-decomposition). Includes a practical lab and perplexity-delta measurements on an RTX 4090.


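To make the workflow concrete before the full lab, here is a minimal sketch of the mean-init path, assuming the Hugging Face `transformers` API. The model ID and the three-token sample list are hypothetical stand-ins for the article's 8K TR token list, not its exact lab code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID; the article targets Llama-3's 128K-vocab tokenizer.
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical TR-specific tokens; the real extension adds ~8K of these.
new_tokens = ["güzellik", "çalışma", "öğrenci"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow input and output embeddings from 128K to 128K + num_added rows.
model.resize_token_embeddings(len(tokenizer))

# Mean-init: overwrite the newly appended rows with the mean of the
# pre-existing rows, instead of leaving the default random init.
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    emb[-num_added:] = emb[:-num_added].mean(dim=0)
    # Llama-3's lm_head is untied, so its new rows need the same treatment.
    out = model.get_output_embeddings().weight
    out[-num_added:] = out[:-num_added].mean(dim=0)
```

Mean-init is the cheapest of the three strategies named above: the new rows start at the centroid of the existing embedding distribution, so their initial logits are plausible rather than noise, which tends to stabilize the first steps of continued pretraining.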