Dexa/Super Data Science: ML & AI Podcast with Jon Krohn

Published Aug 18, 2023

706: Large Language Model Leaderboards and Benchmarks — with Caterina Constantinescu

Caterina Constantinescu dives into the complexities of evaluating large language models, comparing innovative platforms like Chatbot Arena and HELM, and highlighting the importance of human feedback, benchmark diversity, and dataset integrity for fair model assessment.

Related Episodes

784: Aligning Large Language Models — with Sinan Ozdemir
670: LLaMA: GPT-3 performance, 10x smaller — with Jon Krohn (@JonKrohnLearns)
678: StableLM: Open-source "ChatGPT"-like LLMs you can fit on one GPU — with @JonKrohnLearns
797: Deep Learning Classics and Trends — with Dr. Rosanne Liu
SDS 549: Engineering Natural Language Models — with Lauren Zhu
767: Open-Source LLM Libraries and Techniques — with Dr. Sebastian Raschka
801: Merged LLMs Are Smaller And More Capable — with Arcee AI's Mark McQuade and Charles Goddard
847: AI Engineering 101 — with Ed Donner
787: MLOps: The Job and The Key Tools — with Demetrios Brinkmann
788: Multi-Agent Systems: How Teams of LLMs Excel at Complex Tasks — with @JonKrohnLearns
785: Math, Quantum ML and Language Embeddings — with Dr. Luis Serrano (@SerranoAcademy)
747: Technical Intro to Transformers and LLMs — with Kirill Eremenko
661: Designing Machine Learning Systems — with Chip Huyen
694: CatBoost: Powerful, efficient ML for large tabular datasets — with Jon Krohn (@JonKrohnLearns)
695: NLP with Transformers — with Hugging Face's Lewis Tunstall