AI Feedback Loops

Jeremie Harris discusses a novel approach to AI alignment that uses two models: one generates output and the other critiques it, creating a continuous feedback loop. The method aims to make AI outputs safer and more coherent, which addresses the outer alignment problem. The challenge of deceptive inner alignment remains, however, because the complexity of AI models makes it difficult to discern their true intentions.

From this podcast
Super Data Science: ML & AI Podcast with Jon Krohn
668: GPT-4: Apocalyptic stepping stone? — with Jeremie Harris