# Unigram Language Model Tokenization

> Source: https://sukruyusufkaya.com/en/glossary/unigram-language-model-tokenization
> Updated: 2026-05-13T20:58:45.401Z
> Type: glossary
> Category: dogal-dil-isleme

**TLDR:** A tokenization method that learns a subword vocabulary probabilistically, so segmentation reflects the statistics of the training data.

Unigram tokenization builds its subword vocabulary probabilistically: it starts from a large candidate set and prunes units according to how much each one contributes to the likelihood of the training corpus under a unigram language model. Unlike BPE, the result is not determined by a greedy merge order; instead, a word can be segmented in several ways, each segmentation is scored as the product of its token probabilities, and the most probable one is typically selected with the Viterbi algorithm. The method is widely used in the SentencePiece family and enables more flexible vocabulary design than merge-based approaches.
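The segmentation step described above can be sketched as a small Viterbi search. This is a minimal illustration, not the SentencePiece implementation; the vocabulary and its probabilities are made-up values chosen only to show how the highest-probability split wins.

```python
import math

# Hypothetical unigram vocabulary: token -> probability (illustrative values only).
VOCAB = {
    "h": 0.05, "u": 0.05, "g": 0.05, "s": 0.05,
    "ug": 0.05, "hug": 0.10, "hugs": 0.15,
}

def viterbi_segment(text, vocab):
    """Return the most probable segmentation of `text` under a unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i] = best log-prob of text[:i]
    best[0] = 0.0
    back = [None] * (n + 1)       # back[i] = start index of the last token in text[:i]
    for i in range(1, n + 1):
        for j in range(i):
            piece = text[j:i]
            if piece in vocab:
                score = best[j] + math.log(vocab[piece])
                if score > best[i]:
                    best[i] = score
                    back[i] = j
    # Follow back-pointers to recover the token sequence.
    tokens, i = [], n
    while i > 0:
        j = back[i]
        tokens.append(text[j:i])
        i = j
    return tokens[::-1]

print(viterbi_segment("hugs", VOCAB))  # -> ['hugs']
```

Here the single token "hugs" (log 0.15) beats splits such as "hug" + "s" (log 0.10 + log 0.05), so the whole-word segmentation is chosen; during training, the same scoring drives the pruning of low-contribution tokens from the candidate vocabulary.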