Published May 4, 2023

How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

Stella Biderman from EleutherAI delves into the evolution of this grassroots organization into a leader in open-source AI, the ethical considerations of making AI publicly accessible, and the technical challenges in training and improving large language models.

Episode Highlights

Model Scaling

Stella Biderman discusses the complexities of scaling large language models (LLMs) and the infrastructure required to support them. She notes that although several LLMs now exceed 100 billion parameters, the focus has shifted from merely training and releasing large models to understanding their properties and limitations. Running these massive models is a challenge in its own right: few GPUs can handle them, and only a few models fit on high-end GPUs like the A6000 or A40. Stella explains that EleutherAI's journey began with training smaller models and eventually led to a 20 billion parameter model, thanks to partnerships with companies like CoreWeave.

We don't think that training and publicly releasing very large language models is an inherently good thing.

---

This shift in focus underscores the importance of studying LLMs beyond their size, emphasizing interpretability and alignment.
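
To make the hardware constraint concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights need. It assumes 16-bit weights (2 bytes per parameter) and a 48 GB card, which is the published memory size of the A6000 and A40; neither figure comes from the interview. It also ignores activations and other runtime overhead, so real requirements are higher.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumptions (not from the interview): 2 bytes per parameter (16-bit
# weights) and 48 GB of memory per GPU, as on the A6000 and A40.

GPU_MEMORY_GB = 48


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the approximate size of the weights in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_on_one_gpu(num_params: float,
                    bytes_per_param: int = 2,
                    gpu_memory_gb: float = GPU_MEMORY_GB) -> bool:
    """Check whether the weights alone fit in a single GPU's memory."""
    return weight_memory_gb(num_params, bytes_per_param) <= gpu_memory_gb


if __name__ == "__main__":
    for label, params in [("20B", 20e9), ("100B", 100e9)]:
        size = weight_memory_gb(params)
        verdict = "fits" if fits_on_one_gpu(params) else "does not fit"
        print(f"{label}: ~{size:.0f} GB of weights, {verdict} on one 48 GB GPU")
```

Under these assumptions, a 20 billion parameter model squeezes onto a single card while a 100 billion parameter model has to be split across several, which is one reason so few very large models are practical to run.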

Fine-Tuning

Fine-tuning methods significantly impact the performance of LLMs, with multitask fine-tuning emerging as a beneficial approach. Stella Biderman explains that fine-tuning can be tailored to specific applications, such as creative storytelling or code writing, enhancing the model's effectiveness in those areas. Multitask fine-tuning, which involves training on task-like data rather than specific tasks, has been shown to improve performance on standard NLP benchmarks. This approach allows models to perform well even without task-specific fine-tuning, offering versatility across applications.

If there is something that's been fine-tuned to your application context, that's probably going to be the best.

---

Stella highlights the importance of choosing the right fine-tuning method based on the desired application and context.
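
As an illustration of what "task-like data" can look like, the sketch below turns examples from a couple of tasks into prompt and target text pairs and shuffles them into one training mixture. The task names, templates, and field names are hypothetical and chosen only for illustration; the episode does not describe a specific data format.

```python
import random

# Hypothetical prompt templates, one per task. Real multitask datasets
# typically use many templates per task; these are illustrative only.
TEMPLATES = {
    "sentiment": "Review: {text}\nIs this review positive or negative?",
    "summarize": "Article: {text}\nSummarize the article in one sentence.",
}


def to_prompt_pair(task: str, example: dict) -> dict:
    """Render one raw example as a prompt/target pair of plain text."""
    return {
        "task": task,
        "prompt": TEMPLATES[task].format(text=example["text"]),
        "target": example["target"],
    }


def build_mixture(datasets: dict, seed: int = 0) -> list:
    """Combine examples from all tasks into a single shuffled list."""
    pairs = [
        to_prompt_pair(task, example)
        for task, examples in datasets.items()
        for example in examples
    ]
    random.Random(seed).shuffle(pairs)  # fixed seed keeps the mix reproducible
    return pairs


if __name__ == "__main__":
    toy_data = {
        "sentiment": [{"text": "Loved it.", "target": "positive"}],
        "summarize": [{"text": "Rain fell all day.", "target": "It rained."}],
    }
    for pair in build_mixture(toy_data):
        print(pair["prompt"], "->", pair["target"])
```

Because every task is expressed as text in and text out, a single model can be fine-tuned on the whole mixture rather than on one task at a time.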

Interpretability

Interpretability remains a significant challenge in the development of LLMs, with efforts focused on understanding model behavior and decision-making processes. Stella Biderman emphasizes the importance of mechanistic interpretability, which seeks to unravel what models do and why. She describes approaches like circuit interpretability, which breaks a model down into smaller components to better understand how they interact. Despite these efforts, challenges persist, such as models learning to obscure unwanted biases rather than eliminating them.

The thing that I'm most excited about is called mechanistic interpretability, which is a fancy way of saying, what does the model do and why does it do it?

---

These insights are crucial for improving model transparency and trustworthiness, guiding future research directions.
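
One simple way to study a model as a set of interacting components is ablation: switch a component off and see how the output changes. The toy sketch below does this for a made-up "model" that is just the sum of three hand-written functions. It is an assumption-laden illustration of the general idea, not the circuit analysis discussed in the episode, and all names in it are hypothetical.

```python
# Toy "model" built from three hypothetical components whose outputs are summed.
def component_a(x: float) -> float:
    return 2.0 * x


def component_b(x: float) -> float:
    return 0.5  # constant offset, independent of the input


def component_c(x: float) -> float:
    return -1.0 * x


COMPONENTS = {"a": component_a, "b": component_b, "c": component_c}


def model(x: float, ablated=frozenset()) -> float:
    """Sum the outputs of every component that has not been ablated."""
    return sum(fn(x) for name, fn in COMPONENTS.items() if name not in ablated)


def ablation_effects(x: float) -> dict:
    """For each component, report how much the output drops when it is removed."""
    baseline = model(x)
    return {name: baseline - model(x, ablated={name}) for name in COMPONENTS}


if __name__ == "__main__":
    for name, effect in ablation_effects(3.0).items():
        print(f"removing component {name} changes the output by {effect:+.1f}")
```

In a real network the components would be attention heads or neurons rather than named functions, and their effects would not add up so cleanly, which is part of what makes interpretability hard.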

Related Episodes

Jerome Pesenti — Large Language Models, PyTorch, and Meta
Emily M. Bender — Language Models and Linguistics
Open Access of LLMs with Brandon Duderstadt, Co-Founder and CEO of Nomic AI (GPT4ALL and Atlas)
Shaping AI Benchmarks with Together AI Co-Founder Percy Liang
Scaling LLMs and Accelerating Adoption: Interview with Aidan Gomez
Enabling LLM-Powered Applications with Harrison Chase of LangChain
Richard Socher — The Challenges of Making ML Work in the Real World
Johannes Otterbach — Unlocking ML for Traditional Companies
Revolutionizing AI Data Management with Jerry Liu, CEO of LlamaIndex
Elevating ML Infrastructure with Modal Labs CEO Erik Bernhardsson
The Explainability Benefits of Open Source LLMs
Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez
Emad Mostaque — Stable Diffusion, Stability AI, and What’s Next
Jonathan Frankle of MosaicML — Neural Network Pruning and Training
Transforming Search with Perplexity AI’s CTO Denis Yarats

AI Impact and Ethics

Stella Biderman discusses the ethical implications of open-source AI and the significance of public access to large language models. She also explores how AI can augment human productivity by handling tedious tasks, allowing humans to focus on more critical work.
