
> "This work is part of the third pillar of our approach to alignment research: we want to automate the alignment research work itself."

I feel like this isn't a Yud-approved approach to AI alignment.



Honestly, I think any foundational work on the topic is inherently Yud-favored, compared to the blithe optimism and, at best, surface-level analysis that is usually applied to the topic.

I.e., it's not that this shouldn't be done. It should certainly be done. It's just that so many other things should be done before we move forward.


DISCLAIMER: I think Yudkowsky is a serious thinker and his ideas should be taken seriously, regardless of whether or not they are correct.

Your comment triggered a random thought: A perfect name for Yudkowsky et al and the AGI doomers is... wait for it... the Yuddites :)


Already used on 4chan :)


These were my thoughts exactly. On one hand, this can enable alignment research to catch up faster. On the other hand, if we are worried about homicidal AI, then putting it in charge of policing itself (and training it to find exploits in a way) is probably not ideal.


"Yud-approved?"



He's the one in the fedora who is losing patience with otherwise smart-sounding people who are seriously considering letting AI police itself https://www.youtube.com/watch?v=41SUp-TRVlg


That's not a fedora, that's his King of the Redditors crown



You mean Yudkowsky? I saw him on Lex Fridman and he was entirely unconvincing. Why is everyone deferring to a bunch of effective altruism advocates when it comes to AI safety?


I heard him on Lex too, and it seemed to be taken as a given that AI is going to be deceptive and want to kill us all. I don't think a single example was given of how that could actually be accomplished. I'm open to hearing thoughts on this; maybe I'm not creative enough to see the 'obvious' ways this could happen.


This is also why I go into chess matches against 1400 elo players. I cannot conceive of the specific ways in which they will beat me (a 600 elo player), so I have good reason to suspect that I can win.

I'm willing to bet the future of our species on my consistent victory in these types of matches, in fact.


Again, it's taken as a given that AI is adversarial. Edit: In addition, as an 1100 elo chess player, I can very easily tell you how a 1600 player is going to beat me. The analogy doesn't hold. I'm asking in good faith how AI could destroy humanity. Given the confidence of the people who are scared of AI, it seems they must have some concrete examples in mind.


No it’s a given that some people who attempt to wield AI will be adversarial.

In any case a similar argument can be made with merely instrumental goals causing harm: “I am an ant and I do not see how or why a human would cause me harm, therefore I am not in danger.”


People wielding AI and destroying humanity is very different from AI itself, being a weird alien intelligence, destroying humanity.

Honestly if you have no examples you can't really blame people for not being scared. I have no reason to think this ant-human relationship is analogous.

And seriously, I've made no claims that AI is benign so please stop characterizing my claims thusly. The question is simple, give me a single hypothetical example of how an AI will destroy humanity?


Sure, here’s a trivial example: It radicalizes or otherwise deceives an employee at a virus research lab into producing and releasing a horrific virus.

The guy at Google already demonstrated that AIs are able to convince people of fairly radical beliefs (and we have proof that even humans a thousand years ago were capable of creating belief systems that cause people to blow themselves up and kill thousands of innocent people).

P.S. I was not characterizing your opinion, I was speaking in the voice of an ant.


If AI is general, it will also have the capability of wielding itself, and probably better than humans.


Other caveman use fire to cook food. Fire scary and hurt. No understand fire. Fire cavemen bad.


Other caveman use nuke to wipe out city. Nuke scary and hurt. No understand nuke. Nuke caveman bad.

Other caveman use anthrax in subway station. Anthrax scary and hurt…

Is AI closer to fire or closer to nukes and engineered viruses? Has fire ever invented a new weapon system?

By the way: we have shitloads of regulations and safety systems around fire due to, you guessed it, the amount of harm it can do by accident.


IMHO the argument isn't that AI is definitely going to be deceptive and want to kill us all, but rather that if you're 90% sure AI is going to be just fine, the remaining 10% of existential risk is simply not acceptable. So you should assume that level of certainty isn't enough: act as if AI may be deceptive and may kill us all, and take very serious preventive measures even if you're quite certain they won't be needed - because "quite certain" isn't enough; you want to be at "this is definitely established to not lead to Skynet" level.


And even with all that, probably it's best to still exercise an abundance of caution, because you might have made a mistake somewhere.


Because they have arguments that AI optimists are unable to convincingly address.

Take this blog post for example, which between the lines reads: we don't expect to be able to align these systems ourselves, so instead we're hoping these systems are able to align each other.

Consider me not-very-soothed.

FWIW, there are plenty of AI experts who have been raising alarms as well. Hinton and Christiano, for example.


People won't care until an actually scary AI exists. It will be easy to stop at that point. Or you can just stop research here and hope another country doesn't get one first. I'm personally skeptical it will exist. Honestly, the scaremongering coming from uncharismatic AI alignment people might be making things worse.


Are you kidding? "Easy to stop"? When LLMs are integrated into law enforcement, banking, research, education, logistics... All these areas have people building backend systems leveraging the current LLM tech, with pipelines ready to plug in the coming tech. If we reach a tipping point where these things become aware or act truly autonomously, what are the chances they do it before we notice? People are renowned for implementing things before understanding the consequences.

And what does charisma of AI alignment folks have to do with anything?


Why would it be easy to stop at that point? The believable value prop will increase in lockstep with the believable scare factor, not to mention the (already significant) proliferation out of ultra expensive research orgs into open source repos.

Nuclear weapons proliferated explicitly because they proved their scariness.


If AI can exist, humans have to figure it out. It's what we do. It's really shockingly delusional to think people are gonna use ChatGPT for a few minutes, get bored, and then ban it like it's a nuke. I'd rather the USA get it first anyways.


>If AI can exist humans have to figure it out. It’s what we do.

We have figured out stuff in the past, but we also came shockingly close to nuclear armageddon more than once.

I'm not sure I want to roll the dice again.


Where did I say we could or should ban it like a nuke?

Anyway this is a good example of the completely blind-faith reasoning that backs AI optimism: we’ll figure it out “because it’s what we do.”

FWIW we have still not figured out how to dramatically reduce nuclear risk. We’re here just living with it every single day still, and with AI we’re likely stepping onto another tightrope that we and all future generations have to walk flawlessly.


> Why is everyone deferring to a bunch of effective altruism advocates when it comes to AI safety?

I'm not sure Yudkowsky is an EA, but the EAs want him in their polycule.


He posts on the forum. I'm not sure what more evidence is needed that he's part of it.

https://forum.effectivealtruism.org/users/eliezeryudkowsky


I guess it's true, not just a rationalist but also effective altruist!


Agreed, Yud does seem to have been right about the course things will take, but I'm not confident he actually has any solutions to the problem to offer.


His solution is a global regulatory regime to ban new large training runs. The tools required to accomplish this are, IMO, out of the question but I will give Yud credit for being honest about them while others who share his viewpoint try to hide the ball.


Good point, but I think if he was serious he would be looking to Serena Butler for inspiration not nuclear non-proliferation treaties


Which things has he been right about and when, if you recall?


At least to me, this current generation of AI progress is looking a lot more "foomy" than a lot of people besides him predicted


Why does this matter?



