Published Aug 18, 2023

706: Large Language Model Leaderboards and Benchmarks — with Caterina Constantinescu

Caterina Constantinescu dives into the complexities of evaluating large language models, comparing innovative platforms like Chatbot Arena and HELM, and highlighting the importance of human feedback, benchmark diversity, and dataset integrity for fair model assessment.
Super Data Science: ML & AI Podcast with Jon Krohn
