Evolution of Transformers
Jonathan discusses the fundamental nature of self-attention in transformers and its effectiveness in NLP models. He predicts the longevity of transformers in the field, drawing parallels to the enduring success of convolutional networks in vision tasks.In this clip
From this podcast

Gradient Dissent - A Machine Learning Podcast
Jonathan Frankle of MosiacML— Neural Network Pruning and Training
Related Questions