Evolution of Transformers

Jonathan discusses why self-attention is fundamental to transformers and why it works so well in NLP models. He predicts that transformers will endure in the field, drawing a parallel to the lasting success of convolutional networks in vision tasks.