Late-Interaction Embeddings

A retrieval approach that matches queries and documents through token-level interaction instead of compressing each into a single vector.

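Late-interaction embedding methods balance the efficiency of dense (single-vector) retrieval against the quality of cross-encoders. Like dense retrievers, they encode queries and documents separately, so document representations can be computed and indexed offline. Unlike dense retrievers, they keep one embedding per token and compute the final similarity from token-level interactions at query time, instead of comparing two pooled vectors. This makes them attractive for systems that need high-quality semantic retrieval but cannot afford to run a cross-encoder over every candidate. The trade-off is a larger index, since every document token's vector must be stored, which places late interaction in the middle of the design space between efficiency and expressiveness.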
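To make the token-level interaction concrete, here is a minimal sketch of one common scoring rule, the MaxSim operator popularized by ColBERT. Each query token is matched to its most similar document token, and those maxima are summed. The sketch assumes hand-written 3-dimensional token vectors standing in for the outputs of a trained encoder. The function names (`maxsim_score`, `single_vector_score`, `mean_pool`) are hypothetical and not a library API. A mean-pooled single-vector score is included only for contrast.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction (ColBERT-style MaxSim): for every query token, keep the
    similarity of its best-matching document token, then sum over query tokens.
    Assumes both token lists are non-empty."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


def mean_pool(tokens: List[Vector]) -> Vector:
    """Compress a sequence of token vectors into one unit vector (dense baseline)."""
    n = len(tokens)
    return normalize([sum(col) / n for col in zip(*tokens)])


def single_vector_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Single-vector baseline for contrast: cosine similarity of the pooled vectors."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


# Toy token embeddings standing in for encoder outputs (hypothetical values).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # matches each query token
doc_b = [[0.7, 0.7, 0.0]]  # one vector blending both query directions

print(f"MaxSim  doc_a={maxsim_score(query, doc_a):.3f} doc_b={maxsim_score(query, doc_b):.3f}")
# → MaxSim  doc_a=2.000 doc_b=1.414
print(f"Pooled  doc_a={single_vector_score(query, doc_a):.3f} doc_b={single_vector_score(query, doc_b):.3f}")
# → Pooled  doc_a=0.816 doc_b=1.000
```

In this toy case the two rules rank the documents differently. MaxSim favors `doc_a`, which contains a close match for every query token. Mean pooling favors `doc_b`, whose single blended vector happens to align with the pooled query. In a deployed system the document token vectors would be precomputed and indexed offline, and only the query would be encoded at search time.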