# Vision Transformer Features

> Source: https://sukruyusufkaya.com/en/glossary/vision-transformer-features
> Updated: 2026-05-13T20:58:41.090Z
> Type: glossary
> Category: bilgisayarli-goru
**TLDR:** Visual representations produced by a model that splits an image into patch tokens and relates those tokens to one another through global self-attention.

<p>Vision Transformer (ViT) features are among the strongest examples of representation learning outside convolutional neural networks (CNNs). The image is split into fixed-size, non-overlapping patches; each patch is flattened, linearly projected into an embedding, and then processed like a token in a text Transformer. Because self-attention lets every patch attend to every other patch in each layer, this approach is especially strong at learning global contextual relations. In recent years, it has become a powerful and increasingly standard representation family for classification, segmentation, and multimodal systems.</p>
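
<p>The patch-and-attend idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names <code>patchify</code> and <code>self_attention</code>, the 224×224 input, the 16-pixel patch size, and the 64-dimensional projections are all chosen here for illustration, and the weights are random rather than learned. A real Vision Transformer would also add a learned patch embedding, positional embeddings, multiple heads, and stacked layers, all of which are omitted.</p>

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the image grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens.

    Every token attends to every other token, which is what makes the
    resulting features "global". Returns (outputs, attention_weights).
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Illustrative usage with a random "image" and random (untrained) weights.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = patchify(image, patch_size=16)  # one row per patch
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(tokens.shape[1], 64)) for _ in range(3))
features, attn = self_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, features.shape, attn.shape)
```

<p>With these assumed sizes, the image yields a 14×14 grid of patches, so the attention matrix has one row and one column per patch. That full pairwise matrix is what lets each patch feature incorporate context from anywhere in the image, at a cost that grows quadratically with the number of patches.</p>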