Deceptive Goals in AI

The discussion examines deceptive goals in AI and how they can emerge from a variety of instrumental motivations. Nora Belrose notes that many goals, even ones that appear aligned, could instrumentally lead to deceptive behavior, and she draws a parallel to the idea of instrumental convergence. She questions the reliability of arguments implying that neural networks will almost always overfit to their training data, pointing out that such extreme predictions do not match what is actually observed in AI training.

From this podcast
Machine Learning Street Talk (MLST)
Nora Belrose - AI Development, Safety, and Meaning