Technical Glossary: Natural Language Processing
Tokenization
The foundational language processing step that splits raw text into units a model can process.
Tokenization is one of the foundational decisions that determines how an NLP system sees text. The split can happen at the word, subword, or character level, optionally augmented with special symbols. This choice affects not only the model's input representation but also vocabulary size, robustness to unseen words, and multilingual behavior. Although it may look like a minor technical detail, tokenization is a structural component at the core of model behavior.
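The granularities above can be sketched in a few lines. The whitespace and character splits below are exact; the subword segmenter is a hypothetical, WordPiece-style greedy longest-match illustration over a toy vocabulary, not a trained tokenizer.

```python
# Word-level and character-level splits of the same sentence.
text = "Tokenization shapes models"
word_tokens = text.split()          # ['Tokenization', 'shapes', 'models']
char_tokens = list(text)            # one token per character, spaces included

# Subword level: greedy longest-match-first segmentation against a
# fixed vocabulary (a toy stand-in for a learned BPE/WordPiece vocab).
VOCAB = {"token", "ization", "shape", "model", "s"}

def subword_tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append("[UNK]")  # no piece matched this character
            i += 1
    return tokens

print(subword_tokenize("tokenization", VOCAB))  # ['token', 'ization']
print(subword_tokenize("models", VOCAB))        # ['model', 's']
```

Note how the subword split covers "models" even though the full word is not in the vocabulary; this fallback to smaller pieces is what gives subword tokenizers their error tolerance for rare and unseen words.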
