Evaluating Algorithm Bias

Rishabh discusses how hard it is to evaluate algorithms reliably, particularly on Atari tasks. He highlights a significant bias in reported performance metrics: the median score and the average of medians can differ, and the resulting bias can be as large as 30%. This finding challenges the reliability of reported results and underscores the need for sound evaluation methods in machine learning research.
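
To make the gap between the two aggregates concrete, here is a minimal sketch under one possible reading of the summary: "median score" is taken as the median across tasks of run-averaged scores, and "average of medians" as the mean of each run's median across tasks. The scores are invented for illustration and are not Atari results. The function names are hypothetical, and the example does not attempt to reproduce the 30% figure.

```python
from statistics import mean, median

# Hypothetical normalized scores: each row is one run, each column one task.
# These values are made up for illustration; they are not Atari results.
SCORES = [
    [0.2, 1.0, 3.0],
    [0.8, 1.6, 1.0],
    [0.5, 0.4, 2.0],
]


def median_of_means(scores):
    """Average each task over runs, then take the median across tasks."""
    per_task_means = [mean(task_scores) for task_scores in zip(*scores)]
    return median(per_task_means)


def mean_of_medians(scores):
    """Take the median across tasks within each run, then average over runs."""
    per_run_medians = [median(run_scores) for run_scores in scores]
    return mean(per_run_medians)


def relative_gap(scores):
    """Gap between the two aggregates, relative to the median of means."""
    a = median_of_means(scores)
    b = mean_of_medians(scores)
    return abs(a - b) / a


if __name__ == "__main__":
    print(f"median of means: {median_of_means(SCORES):.3f}")
    print(f"mean of medians: {mean_of_medians(SCORES):.3f}")
    print(f"relative gap:    {relative_gap(SCORES):.1%}")
```

Even with only three runs and three tasks, the two aggregates disagree. So a reported "median" is ambiguous unless the order of aggregation is stated.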