Model Training Insights

Stella Biderman describes how they trained their model on 400 billion tokens, in line with recent research findings. Although the initial methodology had flaws, continuous evaluations during training showed steady performance improvements, which raised questions about how training resources should be allocated.

From the podcast Gradient Dissent - A Machine Learning Podcast
Episode: How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman