Transformer in Computer Vision
Yannic and Hugo discuss the evolution of transformers in computer vision, from initial struggles to current models such as ViT, DeiT, and CLIP. They explore the challenges posed by the different token sequence lengths of images and text, and consider adapting efficient-attention techniques from NLP, such as Linformer and Performer, to image transformers.
From this podcast

Machine Learning Street Talk (MLST)
#044 - Data-efficient Image Transformers (Hugo Touvron)