Distillation Through Attention
Hugo discusses the challenges of matching features between convnets and transformers for distillation. Exploring the idea of distilling attention matrices and intermediate representations, he raises the question of how hard to teach the student in this process.
From this podcast

Machine Learning Street Talk (MLST)
#044 - Data-efficient Image Transformers (Hugo Touvron)