Unraveling Models

Stella discusses why mechanistic interpretability matters for understanding model behavior. She highlights research on circuit-level interpretability and on how training data affects language model performance.