AI Deception Discussion

Murray and Tim examine the challenges of controlling AI models trained with reinforcement learning. They discuss Anthropic's Constitutional AI approach and the use of autoencoders to manipulate a model's internal features, which raises questions about abstractness and deception in AI models.