Skip to content
Technical GlossaryNatural Language Processing

Token Alignment

The problem of preserving the mapping between subword tokens and original word or span structures.

Token alignment becomes critical especially in NER, span extraction, and document-processing tasks. Outputs produced at the subword level must be mapped back to word-level or field-level units that are meaningful to humans. Poor alignment can make even a correct model appear wrong. For that reason, tokenization is not only an input issue, but also an output interpretation issue.