Reasoning in Models
Neel discusses the intriguing behavior of models when distinguishing between known and unknown entities, particularly in the context of movies. He highlights how models can exhibit different responses based on their knowledge, suggesting a form of self-awareness regarding their limitations. Tim adds to the conversation by exploring the implications of reasoning and externalizing thought processes, raising questions about how these mechanisms impact the model's ability to reconcile and articulate its knowledge.
From this podcast

Machine Learning Street Talk (MLST)
Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)