# Character, Word, Subword: Tokenization Design Constraints and Decision Matrix

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/tokenization-karakter-sozcuk-subword-karar
> Updated: 2026-05-13T13:00:25.982Z
> Category: LLM Engineering
> Module: Module 6: Tokenization Microsurgery

**TLDR:** The tokenization design space spans three families: character-level (UTF-8 characters or raw bytes), word-level (whitespace splitting, morphological analysis), and subword (BPE, WordPiece, Unigram). It covers the mathematical and pragmatic trade-offs of each choice, the out-of-vocabulary (OOV) problem, a decision matrix for vocabulary size, multilingual challenges, and the characteristics of Turkish.
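
To make the ends of that design space concrete, here is a minimal sketch of toy character-level, byte-level, and word-level tokenizers applied to a short Turkish phrase. The function names, the `<unk>` placeholder, and the one-word vocabulary are assumptions made for illustration; subword methods such as BPE, WordPiece, and Unigram need a training step and are not shown.

```python
# Illustrative sketch only: toy tokenizers, not a production implementation.

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def char_tokens(text: str) -> list[str]:
    """Character-level: one token per Unicode code point."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Byte-level: one token per UTF-8 byte (values 0-255)."""
    return list(text.encode("utf-8"))


def word_tokens(text: str, vocab: set[str]) -> list[str]:
    """Word-level: split on whitespace, map unseen words to UNK."""
    return [w if w in vocab else UNK for w in text.split()]


# "çiçekçi geldi" ("the florist came"), written with escapes so that
# ç is unambiguously the precomposed code point U+00E7.
sample = "\u00e7i\u00e7ek\u00e7i geldi"
toy_vocab = {"geldi"}  # assumed closed vocabulary for the demo

print(len(char_tokens(sample)))        # 13
print(len(byte_tokens(sample)))        # 16
print(word_tokens(sample, toy_vocab))  # ['<unk>', 'geldi']
```

The byte sequence is longer than the character sequence because each `ç` occupies two bytes in UTF-8, while the word-level tokenizer produces the shortest sequence but loses `çiçekçi` entirely to `<unk>`. That is the OOV problem in miniature, and it is the gap subword methods aim to close.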

