Sparse Model Insights
Irwan Bello discusses the challenges and advantages of using sparse models for NLP tasks, highlighting the impressive speed-ups they deliver during pre-training. However, he points out a significant drawback: the gains from pre-training often vanish during fine-tuning, which can take just as long as fine-tuning a dense model, ultimately undoing the earlier advantage.
From this podcast

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Mixture-of-Experts and Trends in Large-Scale Language Modeling with Irwan Bello - #569