How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

Episode Highlights
Model Scaling
Stella Biderman discusses the complexities of scaling large language models (LLMs) and the infrastructure required to support them. She notes that although several LLMs now exceed 100 billion parameters, the focus has shifted from merely training and releasing large models to understanding their properties and limitations 1. Running these models is a challenge in its own right: few GPUs can handle them, and only a few models fit on high-end GPUs like the A6000 or A40 2. Stella explains that EleutherAI began by training smaller models and eventually reached a 20-billion-parameter model, thanks to partnerships with companies like CoreWeave 3.
We don't think that training and publicly releasing very large language models is an inherently good thing.
---
This shift in focus underscores the importance of studying LLMs beyond their size, emphasizing interpretability and alignment.
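To make the GPU constraint concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights need. It is an illustration under stated assumptions, not something described in the episode: it assumes 2 bytes per parameter (16-bit weights), uses the 48 GB memory capacity of the A6000 and A40, and ignores activations, caches, and framework overhead. The function names are hypothetical.

```python
# Rough estimate of weight memory. Assumptions (not from the episode):
# 16-bit weights (2 bytes per parameter) and a 48 GB GPU (A6000 / A40).
# Activations, caches, and framework overhead are ignored.

GPU_MEMORY_GB = 48  # memory capacity of an A6000 or A40


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_on_gpu(num_params: float,
                bytes_per_param: int = 2,
                gpu_memory_gb: float = GPU_MEMORY_GB) -> bool:
    """Return True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(num_params, bytes_per_param) <= gpu_memory_gb


if __name__ == "__main__":
    for name, params in [("20B model", 20e9), ("100B model", 100e9)]:
        needed = weight_memory_gb(params)
        print(f"{name}: ~{needed:.0f} GB of weights, "
              f"fits on a 48 GB GPU: {fits_on_gpu(params)}")
```

Under these assumptions, a 20-billion-parameter model needs about 40 GB for its weights and fits on one 48 GB card with little headroom, while a 100-billion-parameter model needs about 200 GB and has to be split across several GPUs.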
Fine-Tuning
Fine-tuning methods significantly impact the performance of LLMs, with multitask fine-tuning emerging as a beneficial approach. Stella Biderman explains that fine-tuning can be tailored to specific applications, such as creative storytelling or code writing, enhancing the model's effectiveness in those areas 4. Multitask fine-tuning, which involves training on task-like data rather than specific tasks, has been shown to improve performance on standard NLP benchmarks 5. This approach allows models to perform well even without task-specific fine-tuning, making them versatile across applications.
If there is something that's been fine-tuned to your application context, that's probably going to be the best.
---
Stella highlights the importance of choosing the right fine-tuning method based on the desired application and context.
Interpretability
Interpretability remains a significant challenge in the development of LLMs, with efforts focused on understanding model behavior and decision-making processes. Stella Biderman emphasizes the importance of mechanistic interpretability, which seeks to unravel what models do and why 6. She describes innovative approaches like circuit interpretability, which breaks down models into smaller components to better understand their interactions 6. Despite these efforts, challenges persist, such as models learning to obscure unwanted biases rather than eliminating them 7.
The thing that I'm most excited about is called mechanistic interpretability, which is a fancy way of saying, what does the model do and why does it do it?
---
These insights are crucial for improving model transparency and trustworthiness, guiding future research directions.
Related Episodes


Jerome Pesenti — Large Language Models, PyTorch, and Meta

Emily M. Bender — Language Models and Linguistics

Shaping AI Benchmarks with Together AI Co-Founder Percy Liang

Scaling LLMs and Accelerating Adoption: Interview with Aidan Gomez

Enabling LLM-Powered Applications with Harrison Chase of LangChain

Richard Socher — The Challenges of Making ML Work in the Real World

Johannes Otterbach — Unlocking ML for Traditional Companies

Revolutionizing AI Data Management with Jerry Liu, CEO of LlamaIndex

Elevating ML Infrastructure with Modal Labs CEO Erik Bernhardsson

The Explainability Benefits of Open Source LLMs

Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

Emad Mostaque — Stable Diffusion, Stability AI, and What’s Next

Jonathan Frankle of MosaicML — Neural Network Pruning and Training

Transforming Search with Perplexity AI’s CTO Denis Yarats