Model Training Insights

Stella shares how they trained their model for 400 billion tokens, aligning with recent research findings. Despite initial methodological flaws, continuous evaluations showed steady performance improvements, raising questions about resource allocation.