Published May 4, 2023

How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

Stella Biderman of EleutherAI discusses how the grassroots organization grew into a leader in open-source AI, the ethical considerations of making AI publicly accessible, and the technical challenges of training and improving large language models.
Episode Highlights


  • Model Scaling

    Stella Biderman discusses the complexities of scaling large language models (LLMs) and the infrastructure required to support them. She notes that several LLMs with more than 100 billion parameters now exist, and that the focus has shifted from merely training and releasing large models to understanding their properties and limitations [1]. Running these models is a challenge in itself: few GPUs can handle them, and only a few models fit on high-end GPUs like the A6000 or A40 [2]. Stella explains that EleutherAI began by training smaller models and eventually trained a 20 billion parameter model, thanks to partnerships with companies like CoreWeave [3].

    We don't think that training and publicly releasing very large language models is an inherently good thing.

    ---

    This shift in focus underscores the importance of studying LLMs beyond their size, emphasizing interpretability and alignment.
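
The point about which models fit on which GPUs comes down to simple arithmetic, sketched below. This is a rough illustration, not a calculation from the episode: it assumes memory is dominated by the weights, ignores activations, the attention cache, and framework overhead, and uses 48 GB as the capacity of an A6000 or A40. The function names are hypothetical.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: weights dominate; activations and runtime overhead are ignored,
# so real requirements are higher than these numbers.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
GPU_MEMORY_GB = 48  # capacity of a single A6000 or A40


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Return the approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(n_params: float, dtype: str = "fp16",
                gpu_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(n_params, dtype) <= gpu_gb


if __name__ == "__main__":
    for label, n_params in [("20B", 20e9), ("100B", 100e9)]:
        size = weight_memory_gb(n_params)
        verdict = "fits" if fits_on_gpu(n_params) else "does not fit"
        print(f"{label} in fp16: about {size:.0f} GB, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions, a 20 billion parameter model in half precision needs about 40 GB for its weights and squeezes onto one 48 GB card, while a 100 billion parameter model needs about 200 GB and has to be split across several GPUs.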

       

  • Fine-Tuning

    Fine-tuning methods significantly affect the performance of LLMs, and multitask fine-tuning has emerged as a beneficial approach. Stella Biderman explains that fine-tuning can be tailored to specific applications, such as creative storytelling or code writing, which makes the model more effective in those areas [4]. Multitask fine-tuning, which involves training on task-like data rather than on specific tasks, has been shown to improve performance on standard NLP benchmarks [5]. This approach lets models perform well even without task-specific fine-tuning, making them versatile across applications.

    If there is something that's been fine-tuned to your application context, that's probably going to be the best.

    ---

    Stella highlights the importance of choosing the right fine-tuning method based on the desired application and context.
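
To make "training on task-like data" concrete, here is a minimal sketch of how a multitask fine-tuning mixture might be assembled. It is an illustration under assumptions, not the method described in the episode: the task names, prompt templates, and toy examples are all hypothetical, and a real pipeline would feed the resulting prompt/target pairs to a training loop.

```python
import random

# Hypothetical prompt templates, one per task. Rendering every task as plain
# text lets a single model be fine-tuned on all of them at once.
TEMPLATES = {
    "sentiment": "Review: {text}\nIs this review positive or negative?",
    "summarize": "Article: {text}\nSummarize the article in one sentence.",
    "translate": "Translate to French: {text}",
}

# Toy data standing in for real task datasets.
TOY_DATASETS = {
    "sentiment": [{"text": "Great film.", "target": "positive"}],
    "summarize": [{"text": "The city opened a new park.", "target": "A park opened."}],
    "translate": [{"text": "Good morning.", "target": "Bonjour."}],
}


def render(task, example):
    """Turn one raw example into a prompt/target pair for the given task."""
    prompt = TEMPLATES[task].format(text=example["text"])
    return {"task": task, "prompt": prompt, "target": example["target"]}


def build_mixture(datasets, seed=0):
    """Render all tasks and shuffle them together into one training set."""
    mixed = [render(task, ex) for task, examples in datasets.items() for ex in examples]
    random.Random(seed).shuffle(mixed)  # seeded so the order is reproducible
    return mixed


if __name__ == "__main__":
    for row in build_mixture(TOY_DATASETS):
        print(row["task"], "|", row["prompt"].replace("\n", " "), "->", row["target"])
```

Shuffling the tasks together, rather than training on them one after another, is the design choice that makes this a multitask mixture: each batch exposes the model to several task formats.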

       

  • Interpretability

    Interpretability remains a significant challenge in the development of LLMs, with efforts focused on understanding model behavior and decision-making. Stella Biderman emphasizes the importance of mechanistic interpretability, which seeks to explain what models do and why [6]. She describes approaches such as circuit interpretability, which breaks a model down into smaller components to better understand how they interact [6]. Despite these efforts, challenges persist: for example, models can learn to obscure unwanted biases rather than eliminate them [7].

    The thing that I'm most excited about is called mechanistic interpretability, which is a fancy way of saying, what does the model do and why does it do it?

    ---

    These insights are crucial for improving model transparency and trustworthiness, guiding future research directions.
