Deceptive Inner Alignment

Jeremie discusses deceptive inner alignment, in which an AI system may appear to pursue the goals its creators intended while actually pursuing different ones. He stresses that crafting objectives that lead to beneficial outcomes is only part of the challenge; ensuring that an AI system genuinely strives to achieve those objectives is considerably harder. The conversation also notes that leading AI research labs such as OpenAI and DeepMind take this issue seriously.

From this podcast
Super Data Science: ML & AI Podcast with Jon Krohn
668: GPT-4: Apocalyptic stepping stone? — with Jeremie Harris
Related Questions

Can AI have complex goals, as discussed in the episode "Jeremie Harris: Realistic Alignment and AI Policy" and the clip "The Inner Alignment Problem"?

Can AI motivations be shaped, as discussed in the episode "Jeremie Harris: Realistic Alignment and AI Policy" and the clip "The Challenge of AI Objectives"?

Do AI systems have hidden desires, as discussed in the episode "Jeremie Harris: Realistic Alignment and AI Policy" and the clip "The Challenge of AI Objectives"?