Evaluating LLMs

Rosanne Liu highlights the challenges of relying on benchmarks like BIG-bench, emphasizing how quickly evaluation methods in AI evolve. The conversation reveals a tension around transparency and bias, since closed evaluations can lack neutrality. She and host Jon Krohn also discuss the irony of new models claiming state-of-the-art status only to be surpassed moments later, which underscores how hard it remains to assess AI performance.

From this podcast
Super Data Science: ML & AI Podcast with Jon Krohn
797: Deep Learning Classics and Trends — with Dr. Rosanne Liu
