# Unigram Language Model Tokenization

> Source: https://sukruyusufkaya.com/en/glossary/unigram-language-model-tokenization
> Updated: 2026-05-13T20:58:45.401Z
> Type: glossary
> Category: dogal-dil-isleme

**TLDR:** A tokenization method that learns a subword vocabulary probabilistically, so segmentation reflects the statistics of the training data.

Unigram tokenization builds its subword vocabulary probabilistically: it starts from a large candidate set and prunes units according to how much each one contributes to the likelihood of the training corpus under a unigram language model. Unlike BPE, the result is not determined by a greedy merge order; instead, a word can be segmented in several ways, each segmentation is scored as the product of its token probabilities, and the most probable one is typically selected with the Viterbi algorithm. The method is widely used in the SentencePiece family and enables more flexible vocabulary design than merge-based approaches.
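The segmentation step described above can be sketched as a small Viterbi search. This is a minimal illustration, not the SentencePiece implementation; the vocabulary and its probabilities are made-up values chosen only to show how the highest-probability split wins.

```python
import math

# Hypothetical unigram vocabulary: token -> probability (illustrative values only).
VOCAB = {
    "h": 0.05, "u": 0.05, "g": 0.05, "s": 0.05,
    "ug": 0.05, "hug": 0.10, "hugs": 0.15,
}

def viterbi_segment(text, vocab):
    """Return the most probable segmentation of `text` under a unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i] = best log-prob of text[:i]
    best[0] = 0.0
    back = [None] * (n + 1)       # back[i] = start index of the last token in text[:i]
    for i in range(1, n + 1):
        for j in range(i):
            piece = text[j:i]
            if piece in vocab:
                score = best[j] + math.log(vocab[piece])
                if score > best[i]:
                    best[i] = score
                    back[i] = j
    # Follow back-pointers to recover the token sequence.
    tokens, i = [], n
    while i > 0:
        j = back[i]
        tokens.append(text[j:i])
        i = j
    return tokens[::-1]

print(viterbi_segment("hugs", VOCAB))  # -> ['hugs']
```

Here the single token "hugs" (log 0.15) beats splits such as "hug" + "s" (log 0.10 + log 0.05), so the whole-word segmentation is chosen; during training, the same scoring drives the pruning of low-contribution tokens from the candidate vocabulary.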