Technical GlossaryComputer Vision
Image-Text Contrastive Learning
An approach that learns multimodal representations by bringing related image-text pairs together and pushing unrelated pairs apart in a shared space.
Image-text contrastive learning is one of the most effective representation learning strategies in modern vision-language systems. It allows the model to connect images and natural language descriptions within a shared semantic space. Zero-shot classification, semantic visual search, and multimodal retrieval systems are built on this foundation. It is a successful example of learning strong general representations from large-scale weakly labeled data.
You Might Also Like
Explore these concepts to continue your artificial intelligence journey.
