# Tokenization

> Source: https://sukruyusufkaya.com/en/glossary/tokenization
> Updated: 2026-05-13T19:58:53.091Z
> Type: glossary
> Category: dogal-dil-isleme

**TLDR:** The core language processing step that splits text into units that a model can process.

<p>Tokenization is one of the foundational decisions that determines how an NLP system sees text. The split can happen at the word, subword, character, or special-symbol level. This choice affects not only the model's input, but also vocabulary size, error tolerance, and multilingual behavior. Although it may look like a technical detail, tokenization is a structural component at the core of model behavior.</p>
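The trade-off between granularities can be sketched with plain Python. The word and character splits below are real (whitespace split and per-character listing); the subword split is a hand-written illustration only, since real subword tokenizers such as BPE or WordPiece learn their merges from a training corpus.

```python
text = "Tokenization affects model behavior."

# Word-level: simple split, but every unseen word becomes an
# out-of-vocabulary token, so the vocabulary must be large.
word_tokens = text.split()

# Character-level: tiny vocabulary and no OOV problem,
# at the cost of much longer sequences.
char_tokens = list(text)

# Subword-level (hand-crafted for illustration): rare words break
# into frequent pieces, balancing vocabulary size against length.
subword_tokens = ["Token", "##ization", "affects", "model", "behavior", "."]

print(word_tokens)     # 4 whitespace-separated tokens
print(len(char_tokens))  # one token per character
```

Note how the same sentence yields 4 word tokens but 36 character tokens; subword schemes sit between the two extremes.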