Published Aug 18, 2023
706: Large Language Model Leaderboards and Benchmarks — with Caterina Constantinescu
Caterina Constantinescu dives into the complexities of evaluating large language models, comparing innovative platforms like Chatbot Arena and HELM, and highlighting the importance of human feedback, benchmark diversity, and dataset integrity for fair model assessment.

Related Episodes


784: Aligning Large Language Models — with Sinan Ozdemir
670: LLaMA: GPT-3 performance, 10x smaller — with Jon Krohn (@JonKrohnLearns)

797: Deep Learning Classics and Trends — with Dr. Rosanne Liu

SDS 549: Engineering Natural Language Models — with Lauren Zhu

767: Open-Source LLM Libraries and Techniques — with Dr. Sebastian Raschka

847: AI Engineering 101 — with Ed Donner

787: MLOps: The Job and The Key Tools — with Demetrios Brinkmann

747: Technical Intro to Transformers and LLMs — with Kirill Eremenko

661: Designing Machine Learning Systems — with Chip Huyen

695: NLP with Transformers — with Hugging Face's Lewis Tunstall