# Video Transformer

> Source: https://sukruyusufkaya.com/en/glossary/video-transformer
> Updated: 2026-05-23T19:52:34.623Z
> Type: glossary
> Category: bilgisayarli-goru

**TLDR:** A modern architectural approach that tokenizes video across time and space and models it with attention mechanisms.

Video Transformer architectures go beyond CNN-based video modeling by learning long-range spatio-temporal relations through attention mechanisms. This is especially useful for complex action sequences, long video context, and global scene interactions. However, because the cost of self-attention grows quadratically with the number of tokens, and the token count grows with both clip length and resolution, computational cost and context-length management remain central challenges in this area.
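
To make the cost concern concrete, here is a minimal Python sketch that counts the tokens a clip produces and the number of token pairs attention has to score. It is an illustration under stated assumptions, not a description of any particular model: the non-overlapping tubelet size `(2, 16, 16)`, the clip dimensions, and the function names are all hypothetical choices made for this example. The factorized variant (spatial attention within each time step, then temporal attention across time steps) is one commonly used way to reduce cost and is not taken from the source text.

```python
def count_tokens(frames, height, width, tubelet=(2, 16, 16)):
    """Tokens produced by cutting a clip into non-overlapping tubelets.

    `tubelet` is (frames, pixels high, pixels wide) per token; the default
    is an assumption chosen for illustration only.
    """
    t, h, w = tubelet
    if frames % t or height % h or width % w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return (frames // t) * (height // h) * (width // w)


def joint_attention_pairs(frames, height, width, tubelet=(2, 16, 16)):
    """Token pairs scored when every token attends to every other token."""
    n = count_tokens(frames, height, width, tubelet)
    return n * n


def factorized_attention_pairs(frames, height, width, tubelet=(2, 16, 16)):
    """Token pairs scored when attention is split into a spatial pass
    (within each time step) and a temporal pass (per spatial location)."""
    t, h, w = tubelet
    count_tokens(frames, height, width, tubelet)  # reuse the divisibility check
    n_time = frames // t
    n_space = (height // h) * (width // w)
    spatial = n_time * n_space * n_space   # one n_space x n_space map per time step
    temporal = n_space * n_time * n_time   # one n_time x n_time map per location
    return spatial + temporal


if __name__ == "__main__":
    for frames in (16, 32):
        n = count_tokens(frames, 224, 224)
        joint = joint_attention_pairs(frames, 224, 224)
        fact = factorized_attention_pairs(frames, 224, 224)
        print(f"{frames} frames: {n} tokens, joint={joint}, factorized={fact}")
```

Under these assumptions, doubling the clip from 16 to 32 frames doubles the token count but quadruples the number of pairs in joint attention, which is why long video context is hard to handle without some form of sparsity, factorization, or token reduction.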