Model Training Insights
Akshita discusses the complexities of training larger models, highlighting the impact of weight tying on loss curves. She notes that while sharing weights between the embedding and output layers works well for smaller models, it presents challenges at the seven-billion-parameter scale. The findings suggest that different model sizes require distinct approaches to achieve stable training.
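To make the idea of sharing weights between the embedding and output layers concrete, here is a minimal pure-Python sketch. It is not the code discussed in the episode: the names `TinyLM` and `make_matrix`, the initialization scale, and the toy sizes are all hypothetical, chosen only to show that a tied model reuses one matrix for both the input lookup and the output projection.

```python
import random


def make_matrix(rows, cols, seed=0):
    """Hypothetical helper: a rows x cols matrix of small random values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class TinyLM:
    """Toy model holding only an embedding table and an output projection."""

    def __init__(self, vocab_size, d_model, tie_weights):
        self.embedding = make_matrix(vocab_size, d_model, seed=0)
        if tie_weights:
            # Tied: the output projection is the very same matrix object.
            self.output = self.embedding
        else:
            # Untied: the output projection has its own separate parameters.
            self.output = make_matrix(vocab_size, d_model, seed=1)

    def embed(self, token_id):
        # Input side: look up the row for this token.
        return self.embedding[token_id]

    def logits(self, hidden):
        # Output side: dot product of the hidden vector with every vocab row.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.output]

    def num_embedding_params(self):
        # Count each distinct matrix once, so a tied model is not double counted.
        matrices = [self.embedding]
        if self.output is not self.embedding:
            matrices.append(self.output)
        return sum(len(m) * len(m[0]) for m in matrices)


tied = TinyLM(vocab_size=10, d_model=4, tie_weights=True)
untied = TinyLM(vocab_size=10, d_model=4, tie_weights=False)
print(tied.num_embedding_params(), untied.num_embedding_params())
```

In the tied model, any update to the embedding table also changes the output projection, because both names refer to the same parameters. That coupling halves the embedding-related parameter count, and it is the design choice the clip describes as working for smaller models but causing difficulty at the larger scale.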
From this podcast

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
OLMo: Everything You Need to Train an Open Source LLM with Akshita Bhagia - 674