Reverse Engineering Models
The discussion covers the difficulty of reverse engineering language models, in particular the challenge of fully understanding the algorithms they implement. Neel highlights a potentially more accessible approach: analyzing model activations and applying causal interventions. The conversation also contrasts the wide but shallow nature of neural networks with the deep, narrow character of symbolic methods, suggesting that a model may represent a superposition of algorithms whose use depends on the input.
From this podcast

Machine Learning Street Talk (MLST)
Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)