Transformers in Computer Vision

Yannic and Hugo discuss the evolution of transformers in computer vision, from early struggles to current models such as ViT, DeiT, and CLIP. They explore the challenges posed by the different token sequence lengths of images and text, and consider adapting efficient-attention techniques from NLP, such as Linformer and Performer, to image transformers.
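
As a rough illustration of the token-length issue (a sketch, not something taken from the discussion itself): assume a ViT-style model that cuts a square image into non-overlapping 16x16 patches and treats each patch as one token. The function names below are hypothetical, and the count ignores any extra class token. Standard self-attention compares every token with every other token, so the attention matrix grows with the square of the token count, which is the cost that linear-attention methods such as Linformer and Performer aim to reduce.

```python
def num_patch_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of non-overlapping square patches in a square image.

    Assumes one token per patch, as in ViT-style models.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


def attention_matrix_entries(num_tokens: int) -> int:
    """Entries in one full self-attention matrix (per head, per layer)."""
    return num_tokens * num_tokens


# Token count and attention-matrix size at a few image resolutions.
for size in (224, 384, 1024):
    n = num_patch_tokens(size)
    print(f"{size}x{size} image: {n} tokens, {attention_matrix_entries(n)} attention entries")
```

Under these assumptions, a 224x224 image yields 196 tokens, while a 1024x1024 image yields 4096 tokens and an attention matrix of about 16.8 million entries, so higher resolutions quickly become expensive with full attention.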