Published Dec 23, 2022

#92 - SARA HOOKER - Fairness, Interpretability, Language Models

Sara Hooker discusses the frontier of AI ethics, highlighting the challenges of ensuring fairness and addressing bias in machine learning models. The conversation covers the complexity of language models, with emphasis on interpretability, adaptive computation, and the importance of aligning AI systems with human values.

Episode Highlights

  • Prompting

    The dynamics of prompting in language models present both opportunities and challenges. Sara Hooker highlights the effectiveness of few-shot prompting, which allows models to adapt to new tasks without explicit gradient updates. However, she notes that the process is still akin to "tea leaf reading," as researchers probe models without fully understanding the high-dimensional spaces involved [1]. One of the hosts adds that, although some believe prompt engineering will become obsolete, prompts remain crucial for structuring how models interpret tasks [2].

    Prompt engineering is less a science and more a probing role where you're using your human decision boundary to understand the model decision boundary.

    ---

    The conversation underscores the need for a deeper understanding of how prompts interact with model architectures and data sets.
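    As a minimal sketch of what few-shot prompting looks like in practice, the snippet below assembles a prompt from a task instruction, a handful of worked examples, and a new input. The function name, the "Input:"/"Label:" layout, and the sentiment task are illustrative assumptions, not something described in the episode; the point is only that the task is specified through the text of the prompt, with no gradient update to the model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input.

    The "Input:"/"Label:" layout is an illustrative convention (an assumption),
    not a format required by any particular model.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final entry is left unlabeled; the model is expected to complete it.
    lines.append(f"Input: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical sentiment task used only for illustration.
examples = [
    ("The film was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each input.",
    examples,
    "The plot dragged, but the acting was superb.",
)
print(prompt)
```

    Changing the examples or their order changes what the model is being asked to do, which is one reason probing with prompts can feel more like exploration than engineering.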

       

  • Adaptiveness

    Adapting language models to new data and user preferences remains a significant challenge. Sara Hooker explains that current models often lack adaptive capacity, as they treat all data equally and rely on global updates that can override existing knowledge [3]. She suggests that adaptive computation, which focuses on distinguishing between difficult and noisy examples, could improve model performance [4].

    One of the core limitations of our current model is that we treat all examples uniformly, and we show all examples the same amount of time.

    ---

    This approach could lead to more robust models capable of adjusting to new distributions and preferences without losing valuable learned information.
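    One way to picture non-uniform treatment of examples is loss-based sampling: examples the model still gets wrong are shown more often, while examples with extremely high loss are set aside as suspected label noise. The sketch below is a crude heuristic chosen for illustration, not the method discussed in the episode; the function name, the `noise_threshold` parameter, and the loss values are all assumptions.

```python
import random


def sampling_weights(losses, noise_threshold):
    """Turn per-example losses into sampling probabilities.

    Assumed heuristic: a higher loss means a harder example, so it is sampled
    more often, but a loss above `noise_threshold` is treated as suspected
    label noise and given zero weight.
    """
    raw = [loss if loss <= noise_threshold else 0.0 for loss in losses]
    total = sum(raw)
    if total == 0.0:
        # Nothing informative left: fall back to uniform sampling.
        return [1.0 / len(losses)] * len(losses)
    return [w / total for w in raw]


# Hypothetical per-example losses: two easy, one hard, one suspiciously high.
losses = [0.1, 0.2, 1.5, 9.0]
weights = sampling_weights(losses, noise_threshold=5.0)

# Draw a batch of example indices according to the weights.
rng = random.Random(0)
batch = rng.choices(range(len(losses)), weights=weights, k=8)
print(weights, batch)
```

    A single loss threshold cannot reliably separate hard examples from noisy ones, which is the difficulty the discussion points to; the sketch only shows where such a distinction would plug into training.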

       

  • RLHF

    Reinforcement Learning from Human Feedback (RLHF) offers insights into aligning language models with human values. The discussion raises the question of whether RLHF makes models more robust or merely simpler, noting that it aligns model outputs with human preferences but may reduce creativity [5]. Sara Hooker argues that while RLHF is one method, the real gains come from comprehensive annotations and selective fine-tuning [6].

    RL, the way what it actually means, it feels like window dressing on this problem rather than like the core, core contribution of the set of techniques.

    ---

    She emphasizes the importance of data quality over the specific optimization techniques used, suggesting that better sampling could enhance model efficiency.
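    To make the role of annotations concrete, the sketch below shows the pairwise preference loss commonly used to train a reward model from human comparisons: the loss is small when the response the annotator preferred receives the higher score. This is a standard formulation offered as background, not something spelled out in the episode, and the function name and reward values are illustrative assumptions.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Written as log(1 + exp(-margin)), which is algebraically the same quantity.
    A sketch only: very large negative margins would overflow math.exp.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


# Hypothetical reward-model scores for a preferred and a rejected response.
agrees = preference_loss(2.0, 0.0)     # model ranks the pair like the annotator
disagrees = preference_loss(0.0, 2.0)  # model ranks the pair the other way
print(agrees, disagrees)
```

    The optimizer only ever sees the comparisons it is given, so the quality and coverage of the annotated pairs bound what any downstream technique can learn from them.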
