Delayed Generalization

Grokking is the phenomenon in which a network's test performance improves significantly after a prolonged training period, even though training metrics have long since plateaued. This suggests that a plateau in training metrics does not mean learning has stopped: the network still receives gradient signal that eventually allows it to generalize better on unseen data. As training progresses, the network shifts from learning simple features to more complex ones, which challenges the conventional understanding of learning dynamics.
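One way to make the idea of delayed generalization concrete is to measure, from logged accuracy curves, how many steps pass between the training accuracy and the test accuracy first crossing the same threshold. The sketch below is only an illustration: the function names, the 0.95 threshold, and the accuracy values are assumptions made up for this example, not anything taken from the episode.

```python
def first_step_reaching(values, threshold):
    """Return the index of the first value >= threshold, or None if never reached."""
    for step, value in enumerate(values):
        if value >= threshold:
            return step
    return None


def generalization_delay(train_acc, test_acc, threshold=0.95):
    """Steps between train and test accuracy first reaching the threshold.

    Returns None if either curve never reaches it.
    """
    train_step = first_step_reaching(train_acc, threshold)
    test_step = first_step_reaching(test_acc, threshold)
    if train_step is None or test_step is None:
        return None
    return test_step - train_step


# Made-up curves for illustration only (not results from any experiment).
train_acc = [0.30, 0.80, 0.99, 1.00, 1.00, 1.00, 1.00, 1.00]
test_acc = [0.10, 0.12, 0.15, 0.15, 0.20, 0.60, 0.97, 0.99]

delay = generalization_delay(train_acc, test_acc)
print(delay)
```

A large positive delay on real training logs would be consistent with the grokking pattern described above, where test accuracy catches up long after training accuracy has plateaued.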

From this podcast
Machine Learning Street Talk (MLST)
Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero
Related Questions
- Is the basic pattern of learning in the episode Robotics Research Update, with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind and the clip Teachability Revolution - fast repetitions, slow down for refinement, and then possibly speed up again for repetition including refinement?
