Evaluating Algorithm Bias
Rishabh discusses the complexities of evaluating algorithms, particularly in the context of Atari tasks. He highlights a significant source of bias in reported performance metrics: the gap between the median score and the average of medians can amount to a bias as large as 30%. This insight calls the reliability of reported results into question and underscores the importance of proper evaluation methods in machine learning research.
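To make the distinction between the two aggregates concrete, here is a minimal sketch using made-up numbers, not results from the episode. It assumes one plausible reading of the summary above: "median score" as the median across tasks of the per-task mean over runs, and "average of medians" as the mean across runs of each run's median over tasks. The function names and the toy `scores` table are hypothetical.

```python
from statistics import mean, median

def median_of_task_means(scores):
    """Average each task over runs, then take the median across tasks."""
    per_task_means = [mean(task_scores) for task_scores in zip(*scores)]
    return median(per_task_means)

def mean_of_run_medians(scores):
    """Take each run's median across tasks, then average over runs."""
    per_run_medians = [median(run_scores) for run_scores in scores]
    return mean(per_run_medians)

def relative_gap(scores):
    """Gap between the two aggregates, relative to the median of task means."""
    a = median_of_task_means(scores)
    b = mean_of_run_medians(scores)
    return abs(a - b) / abs(a)

# Toy data (assumption): rows are runs, columns are tasks.
scores = [
    [0.0, 1.0, 10.0],   # run 1
    [10.0, 1.0, 0.0],   # run 2
]

print(median_of_task_means(scores))  # 5.0
print(mean_of_run_medians(scores))   # 1.0
print(relative_gap(scores))          # 0.8
```

The toy table is deliberately extreme so the two aggregates disagree sharply. On real benchmark data the gap depends on the number of runs and on how much scores vary from run to run.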
From this podcast

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Deep Reinforcement Learning at the Edge of the Statistical Precipice with Rishabh Agarwal - #559