Published Dec 23, 2022

#92 - SARA HOOKER - Fairness, Interpretability, Language Models

Sara Hooker delves into the frontier of AI ethics, highlighting challenges in ensuring fairness and addressing bias in machine learning models. The discussion encapsulates the complexity of language models, emphasizing interpretability, adaptive computation, and the importance of aligning AI systems with human values.

Episode Highlights


Prompting

The dynamics of prompting in language models present both opportunities and challenges. Sara Hooker highlights the effectiveness of few-shot prompting, which allows models to adapt to new tasks without explicit gradient updates. However, she notes that the process is still akin to "tea leaf reading," as researchers probe models without fully understanding the high-dimensional spaces involved [1]. One of the hosts adds that, although some believe prompt engineering will become obsolete, prompts remain crucial for structuring how models interpret tasks [2].

Prompt engineering is less a science and more a probing role where you're using your human decision boundary to understand the model decision boundary.

---

The conversation underscores the need for a deeper understanding of how prompts interact with model architectures and data sets.
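Few-shot prompting, as mentioned above, places a handful of worked examples in the model's input instead of updating its weights. The following is a minimal sketch of how such a prompt might be assembled. The helper name `build_few_shot_prompt`, the "Input:/Label:" layout, and the sentiment task are all assumptions made for illustration and do not come from the episode.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query.

    Hypothetical helper for illustration; the layout is an assumption,
    not a format prescribed by any particular model or library.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final item is left unlabeled so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Label:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I loved this film.", "positive"), ("The plot was dull.", "negative")],
    "The acting was superb.",
)
print(prompt)
```

Changing the examples or their order changes what the model sees without touching any weights, which is one reason the probing described above can feel more like trial and error than science.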

Adaptiveness

Adapting language models to new data and user preferences remains a significant challenge. Hooker explains that current models often lack adaptive capacity, as they treat all data equally and rely on global updates that can override existing knowledge [3]. She suggests that adaptive computation, which focuses on distinguishing between difficult and noisy examples, could improve model performance [4].

One of the core limitations of our current model is that we treat all examples uniformly, and we show all examples the same amount of time.

---

This approach could lead to more robust models capable of adjusting to new distributions and preferences without losing valuable learned information.
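One way to picture moving away from uniform treatment of examples is loss-based sampling, sketched below. This is only an illustrative heuristic, not the method discussed in the episode: the function name `sampling_weights`, the `noise_threshold` parameter, and the rule that very high loss signals a noisy example are all assumptions.

```python
import random


def sampling_weights(losses, noise_threshold):
    """Turn per-example losses into sampling probabilities.

    Assumed heuristic: higher loss means a harder example that deserves
    more exposure, unless the loss exceeds noise_threshold, in which case
    the example is treated as likely noisy and given a minimal weight.
    """
    floor = 1e-3  # keeps every example reachable
    weights = []
    for loss in losses:
        if loss > noise_threshold:
            weights.append(floor)
        else:
            weights.append(loss + floor)
    total = sum(weights)
    return [w / total for w in weights]


losses = [0.05, 0.4, 1.2, 6.0]  # made-up per-example losses
weights = sampling_weights(losses, noise_threshold=3.0)

rng = random.Random(0)
batch = rng.choices(range(len(losses)), weights=weights, k=8)
print(weights, batch)
```

Under these assumptions the third example is drawn most often, while the fourth, with its outlier loss, is rarely shown. Telling hard examples apart from noisy ones is the difficult part in practice, and a single threshold is a crude stand-in for it.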

RLHF

Reinforcement Learning from Human Feedback (RLHF) offers insights into aligning language models with human values. The conversation raises the question of whether RLHF robustifies or simplifies models, noting that it aligns model outputs with human preferences but may reduce creativity [5]. Hooker argues that while RLHF is one method, the real gains come from comprehensive annotations and selective fine-tuning [6].

RL, the way what it actually means, it feels like window dressing on this problem rather than like the core, core contribution of the set of techniques.

---

She emphasizes the importance of data quality over the specific optimization techniques used, suggesting that better sampling could enhance model efficiency.
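For readers unfamiliar with how human preferences enter RLHF, a common formulation trains a reward model on pairs of responses where annotators preferred one over the other, using the pairwise loss -log(sigmoid(r_chosen - r_rejected)). The sketch below computes that loss for scalar rewards. The function name and the example values are assumptions for illustration, and nothing here is specific to the systems discussed in the episode.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it does the opposite.
    """
    margin = reward_chosen - reward_rejected
    # log(1 + exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


agrees = preference_loss(2.0, 0.0)     # reward model agrees with the annotator
disagrees = preference_loss(0.0, 2.0)  # reward model disagrees
print(agrees, disagrees)
```

The loss depends entirely on which response the annotator marked as preferred, which is consistent with the point above that annotation quality matters at least as much as the optimization technique.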

Related Episodes

Sara Hooker - The Hardware Lottery, Sparsity and Fairness
Sara Hooker - Why US AI Act Compute Thresholds Are Misguided
#73 - YASAMAN RAZEGHI & Prof. SAMEER SINGH - NLP benchmarks
#039 - Lena Voita - NLP
047 Interpretable Machine Learning - Christoph Molnar
#114 - Secrets of Deep Reinforcement Learning (Minqi Jiang)
#032- Simon Kornblith / GoogleAI - SimCLR and Paper Haul!
Explainability, Reasoning, Priors and GPT-3
#046 The Great ML Stagnation (Mark Saroufim and Dr. Mathew Salvaris)
#55 Self-Supervised Vision Models (Dr. Ishan Misra - FAIR).
#48 Machine Learning Security - Andy Smith
#80 AIDAN GOMEZ [CEO Cohere] - Language as Software
#68 DR. WALID SABA 2.0 - Natural Language Understanding [UNPLUGGED]
#91 - HATTIE ZHOU - Teaching Algorithmic Reasoning via In-context Learning #NeurIPS
#040 - Adversarial Examples (Dr. Nicholas Carlini, Dr. Wieland Brendel, Florian Tramèr)

Topics covered

Fairness and Bias

Sara Hooker, a leading figure in machine learning, discusses the evolving challenges of fairness in AI and the complexities of model bias. She also explores the distinction between types of errors in AI systems and their implications for data quality.

Language Model Dynamics

Sara Hooker explores the intricacies of prompting dynamics, adaptiveness challenges, and RLHF in language models. She emphasizes the need for improved understanding and methodologies to enhance model performance and alignment with human values. Subtopics: Prompting, Adaptiveness, RLHF.

Interpretability and Optimization

Sara Hooker explores the intricacies of interpretability techniques and ensemble methods in AI models. She advocates for adaptive computation and efficient data distribution strategies to enhance model performance and fairness.
