
Image-Text Retrieval

A task that retrieves relevant images from text or relevant text from images through a shared multimodal representation space.

Image-text retrieval is a core application of multimodal information access. A user can search for images with a natural-language query, or start from an image and retrieve relevant descriptions and documents. It is valuable in e-commerce, media archiving, design search, and content discovery. It is also a practical use of a shared semantic space: images and text are embedded as vectors in the same space, and retrieval ranks candidates by their similarity to the query.
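Retrieval through a shared representation space can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production system: the three-dimensional vectors are hand-made stand-ins for the embeddings that trained image and text encoders would produce, and the file names, captions, and the `retrieve` helper are hypothetical. Cosine similarity is used here as one common choice of ranking score.

```python
import math


def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, candidates, top_k=2):
    """Rank candidates (a name -> vector mapping) by similarity to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values). In a real system these would come from
# an image encoder and a text encoder trained to share one vector space.
image_embeddings = {
    "red_sneaker.jpg": [0.9, 0.1, 0.0],
    "blue_sofa.jpg": [0.1, 0.9, 0.1],
    "mountain_lake.jpg": [0.0, 0.2, 0.9],
}
text_embeddings = {
    "a red running shoe": [0.8, 0.2, 0.1],
    "a blue couch in a living room": [0.2, 0.8, 0.0],
    "a lake surrounded by mountains": [0.1, 0.1, 0.9],
}

# Text-to-image: rank images against a text query.
text_to_image = retrieve(text_embeddings["a red running shoe"], image_embeddings)

# Image-to-text: the same function works in the other direction,
# because both modalities live in the same space.
image_to_text = retrieve(image_embeddings["mountain_lake.jpg"], text_embeddings)

print(text_to_image)
print(image_to_text)
```

Because both directions reuse the same `retrieve` function, the sketch shows why a shared space matters: once images and text are comparable vectors, text-to-image and image-to-text search become the same nearest-neighbor problem. At larger scale the exhaustive scan would typically be replaced by an approximate nearest-neighbor index.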