Transformer Biases and Distillation

Hugo discusses inductive biases in transformers, the flexibility that comes from building fewer assumptions into the architecture, and the benefits of distilling convolutional models into transformers. The conversation covers the challenges of reducing the assumptions built into machine learning models and the potential for better performance as more data becomes available.