# Write BPE from Scratch in 200 Lines: Training + Encoding + Decoding + Turkish Corpus

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/bpe-sifirdan-200-satir-turkce-corpus
> Updated: 2026-05-13T13:00:26.162Z
> Category: LLM Engineering
> Module: Module 6: Tokenization Microsurgery

**TLDR:** A from-scratch, pure-Python BPE implementation in the style of Karpathy's minbpe. It covers BPE training (the Sennrich algorithm), encoding and decoding, regex pre-tokenization, and the byte-level extension, then trains on a Turkish corpus and compares the result with Trendyol-LLM. The goal is a practical understanding of how modern LLM tokenizers work.
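To make the training, encoding, and decoding steps named above concrete, here is a minimal byte-level sketch. It is not the article's implementation: the function names (`get_pair_counts`, `merge`, `train`, `encode`, `decode`) and the tiny sample string are assumptions chosen for illustration, and regex pre-tokenization is deliberately left out to keep the example short.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id_a, id_b) -> new_id, kept in training order
    for _ in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + len(merges)
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply learned merges to new text, earliest-learned merge first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        counts = get_pair_counts(ids)
        # Pick the present pair with the lowest merge id; unknown pairs rank last.
        pair = min(counts, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies any more
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids, merges):
    """Map token ids back to bytes, then to a string."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dict order matches training order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


# Hypothetical toy corpus; a real run would use a much larger Turkish text.
sample = "çiçek çiçekçi çiçekler"
merges = train(sample, num_merges=10)
ids = encode(sample, merges)
print(len(sample.encode("utf-8")), "bytes ->", len(ids), "tokens")
print(decode(ids, merges))
```

Because the base vocabulary is the 256 byte values, any UTF-8 string can be encoded, including Turkish characters such as `ç` or `İ` that never appeared during training. They simply fall back to their raw bytes.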

