Transformer Biases and Distillation
Hugo discusses inducing biases in transformers to achieve flexibility and the benefits of distilling convolutional models into transformers. The conversation delves into the challenges of reducing assumptions in machine learning models and the potential for improved performance with more data.
From this podcast

Machine Learning Street Talk (MLST)
#044 - Data-efficient Image Transformers (Hugo Touvron)