AI Feedback Loops

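Jeremie discusses a novel approach to AI alignment that uses two models: one generates output and the other critiques it, and the critique is fed back to improve the next output, forming a continuous feedback loop. The method aims to make AI outputs safer and more coherent, and it targets the outer alignment problem, that is, making sure the signal a model is optimized against actually reflects what we want. However, it leaves the problem of deceptive inner alignment unsolved. A model could learn goals that differ from its training objective while still producing outputs the critic approves of, and because modern AI models are so complex, it is difficult to inspect them and discern their true intentions.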
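The generate, critique, revise cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Jeremie's method or any particular system: `generator` and `critic` are hypothetical placeholders for the two models, the `Critique` type and the stop-on-approval rule are assumptions, and the toy stand-ins are plain string functions rather than real language models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Critique:
    # Hypothetical critic verdict: whether the output passes, plus feedback text.
    approved: bool
    feedback: str


def feedback_loop(
    prompt: str,
    generator: Callable[[str, Optional[str]], str],
    critic: Callable[[str], Critique],
    max_rounds: int = 3,
) -> Tuple[str, List[Tuple[str, Critique]]]:
    """Generate, critique, and revise until the critic approves or rounds run out."""
    feedback: Optional[str] = None
    history: List[Tuple[str, Critique]] = []
    output = ""
    for _ in range(max_rounds):
        output = generator(prompt, feedback)  # generating model produces a draft
        review = critic(output)               # critiquing model judges only the output
        history.append((output, review))
        if review.approved:
            break
        feedback = review.feedback            # critique is fed back into the next draft
    return output, history


def toy_generator(prompt: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for the generating model: the first draft contains a
    # flagged suggestion; any later draft "applies" feedback by replacing it.
    draft = f"Answer to '{prompt}': use the unsafe shortcut."
    if feedback is not None:
        draft = draft.replace("unsafe shortcut", "documented method")
    return draft


def toy_critic(output: str) -> Critique:
    # Hypothetical stand-in for the critiquing model: rejects outputs mentioning 'unsafe'.
    if "unsafe" in output:
        return Critique(approved=False, feedback="Remove the unsafe suggestion.")
    return Critique(approved=True, feedback="Looks fine.")


if __name__ == "__main__":
    final, history = feedback_loop("How do I reset the device?", toy_generator, toy_critic)
    print(final)
    print(f"rounds: {len(history)}")
```

Note that the critic only ever sees the generator's outputs, never its internal objectives. That is why a loop like this speaks to outer alignment but not to deceptive inner alignment: a generator whose outputs consistently pass the critic would be accepted regardless of what it was "trying" to do.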