Sparse Model Insights

Irwan discusses the challenges and advantages of using sparse models in NLP tasks, highlighting the substantial speed-ups they deliver during pre-training. He also points out a significant drawback: those gains often vanish during fine-tuning, which can take just as long for a sparse model as for a dense one.