> AI safety / x-risk folks have in fact made extensive and detailed arguments.
Can you provide examples? I have not seen any, other than philosophical hand-waving. Remember, the poster you replied to was asking for a specific path to destruction.
AGI safety from first principles [1] is a good write-up.
You can read more about instrumental convergence, reward misspecification, goal misgeneralization, and inner misalignment, some of the specific problems AI safety people care about, by skimming the curriculum of the AI Alignment Course [2], which points to several relevant blog posts and papers on these topics.
Is there a clear argument I can read in under 15 minutes? If such an argument exists somewhere, can you point to it?
Also note that we were talking about modern-day LLMs and their descendants, not science-fiction AGIs. Unless, of course, you have an argument for how one of these LLMs somehow develops into an AGI.