Holistic Language Model Evaluation
Percy Liang discusses the creation of HELM, a comprehensive evaluation framework for language models that addresses both capabilities and risks in a broad context. The initiative covered diverse areas such as bias, reasoning, and robustness, resulting in an extensive benchmarking infrastructure.
From this podcast

Gradient Dissent - A Machine Learning Podcast
Shaping AI Benchmarks with Together AI Co-Founder Percy Liang