Reverse Engineering Models

The discussion covers why reverse engineering language models is hard, emphasizing that fully understanding the algorithms a model implements remains a challenge. Neel points to a more tractable approach: analyze the model's activations and use causal interventions to test which internal components actually affect the output. The conversation also contrasts neural networks, which are wide but shallow, with symbolic methods, which are deep but narrow, and suggests that a model may represent a superposition of algorithms, with the input determining which of them come into play.
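As a rough illustration of what "analyzing activations and employing causal interventions" can look like in practice, the sketch below applies a simple form of activation patching to a toy one-hidden-layer ReLU network. Everything in it is an assumption made for illustration and does not come from the conversation: the network, its weights, and the names `forward` and `patching_effects` are hypothetical. The idea is to run the model on a clean and a corrupted input, then copy one hidden activation at a time from the clean run into the corrupted run and measure how much the output moves.

```python
import numpy as np


def forward(x, W1, W2, patch=None):
    """Run a tiny one-hidden-layer ReLU network (hypothetical toy model).

    patch: optional dict {hidden_index: value} that overwrites hidden
    activations before the output layer. This is the causal intervention.
    Returns (scalar output, hidden activations).
    """
    h = np.maximum(0.0, W1 @ x)
    if patch:
        h = h.copy()
        for idx, value in patch.items():
            h[idx] = value
    return float(W2 @ h), h


def patching_effects(x_clean, x_corrupt, W1, W2):
    """Patch each hidden unit from the clean run into the corrupted run.

    Returns the clean output, the corrupted output, and a list giving,
    for each hidden unit, the change in output caused by patching it.
    """
    out_clean, h_clean = forward(x_clean, W1, W2)
    out_corrupt, _ = forward(x_corrupt, W1, W2)
    effects = []
    for i in range(len(h_clean)):
        out_patched, _ = forward(x_corrupt, W1, W2, patch={i: h_clean[i]})
        effects.append(out_patched - out_corrupt)
    return out_clean, out_corrupt, effects


# Illustrative weights, chosen by hand (assumption, not from the source).
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
W2 = np.array([2.0, 0.0, -1.0])

x_clean = np.array([1.0, 0.0])
x_corrupt = np.array([0.0, 1.0])

out_clean, out_corrupt, effects = patching_effects(x_clean, x_corrupt, W1, W2)
print(out_clean, out_corrupt, effects)
```

In this toy setup, hidden units 0 and 1 both take different values on the two inputs, but only unit 0 changes the output when patched, because unit 1 has zero output weight. Observing an activation shows that a unit responds to the input; the intervention is what shows whether the model actually uses it.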