Evaluating LLMs
Rosanne Liu highlights the challenges of using benchmarks such as BIG-bench and emphasizes how rapidly evaluation methods in AI are evolving. The conversation reveals a tension between transparency and the potential for bias, since closed evaluations can lack neutrality. Rosanne and Jon Krohn also discuss the irony of new models claiming state-of-the-art status only to be surpassed moments later, which illustrates how difficult it remains to assess AI performance.
From this podcast

Super Data Science: ML & AI Podcast with Jon Krohn
797: Deep Learning Classics and Trends — with Dr. Rosanne Liu