# Subword Tokenization

> Source: https://sukruyusufkaya.com/en/glossary/subword-tokenization
> Updated: 2026-05-13T20:01:45.119Z
> Type: glossary
> Category: dogal-dil-isleme

**TLDR:** An approach that splits rare words into smaller meaningful pieces to balance vocabulary size and coverage.

<p>Subword tokenization has become the standard in modern NLP and large language models. It mitigates the rare-word problem of word-level approaches while avoiding the extreme fragmentation of character-level methods. It is especially advantageous for agglutinative languages such as Turkish, where a single stem can take many suffixes, and for multilingual systems. How a tokenizer splits unfamiliar words is one of the key design choices shaping a model's behavior: an unseen word can still be represented as a sequence of known subword pieces rather than collapsing into a single unknown token.</p>
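The splitting described above can be sketched with a toy greedy longest-match segmenter, the matching scheme used by WordPiece-style tokenizers. This is a minimal illustration, not a production tokenizer: the vocabulary, the example words, and the `<unk>` fallback token are all hypothetical choices made for this sketch.

```python
def subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right.

    Characters that match no vocabulary piece fall back to an <unk> token.
    """
    pieces, start = [], 0
    while start < len(word):
        # Try the longest remaining span first, shrinking until a piece matches.
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matches: emit <unk> and skip one character.
            pieces.append("<unk>")
            start += 1
        else:
            pieces.append(word[start:end])
            start = end
    return pieces

# A toy vocabulary covering a few stems and suffixes (hypothetical).
vocab = {"token", "ization", "un", "break", "able", "s"}
print(subword_tokenize("tokenization", vocab))  # ['token', 'ization']
print(subword_tokenize("unbreakable", vocab))   # ['un', 'break', 'able']
```

Because the unseen words are rebuilt from known pieces, the model never has to map them to a single out-of-vocabulary token, which is the coverage benefit the paragraph above describes. Real systems (e.g. BPE or SentencePiece) learn the vocabulary from corpus statistics rather than hand-picking it.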