Vision Transformer Features
A visual representation obtained by splitting an image into patch tokens and relating them to one another through global self-attention.
Vision Transformer (ViT) features are among the most prominent examples of visual representation learning without convolutional neural networks (CNNs). The image is split into fixed-size patches, each of which is embedded and processed as a token, much like a word in a sentence. Because self-attention lets every patch attend to every other patch in each layer, the approach is especially strong at capturing global contextual relations. In recent years, it has become a powerful and increasingly standard representation family for classification, segmentation, and multimodal systems.
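The two ideas above, patch tokens and global attention, can be illustrated with a minimal NumPy sketch. It is not a full Vision Transformer: the function names `patchify` and `self_attention`, the single attention head, and the random projection matrices are assumptions made for illustration, and a real model would add learned patch embeddings, position embeddings, multiple heads, and stacked layers.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens.

    Every token attends to every other token, which is what makes the
    resulting features global rather than local.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Illustrative usage with random data and random (untrained) projections.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
tokens = patchify(image, patch_size=8)          # 16 tokens of dimension 192
dim = 64                                        # hypothetical head dimension
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(tokens.shape[1], dim)) for _ in range(3))
features, attn = self_attention(tokens, w_q, w_k, w_v)
```

Here `attn` is a 16 × 16 matrix: each row shows how strongly one patch attends to every patch in the image, including distant ones, which a small convolution kernel cannot do in a single layer.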
