
If humanity does end up building a dangerous superintelligent AI, how long do you think our advances in cryptography are going to stand up to its advances in cryptanalysis?


It's a solid question. Only one way to find out ;)


> Only one way to find out

When it comes to building smarter-than-human AI, "try it and see" is never the right answer. You may only get one attempt to get it right, and you don't take "try it and see" chances with existential risk.

(There's been some interesting research into making it possible to monitor and halt a rogue AI, but no matter how promising that looks, it should still be treated as one of many risk mitigation strategies rather than as a panacea. Still better to consider that you might only get one attempt.)

I don't think it makes sense to consider this kind of approach with superintelligence; either it understands and implements human values, in which case attempting to treat it as an adversary is counterproductive, or it fails to understand and implement human values, in which case you've utterly failed on a "better luck next universe" scale.

However, it does make sense to consider this kind of approach with machine learning in general. One of the problems with machine learning techniques is "give us all your data and we'll do smart things with it", which doesn't work out so well if you want to keep such data private. This approach might provide more options in that case, such as offloading some of your expensive computations and learnings without actually exposing your data.
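One such option can be sketched concretely. Below is a toy illustration of federated averaging, an approach where each party shares only fitted model parameters, never the raw data; all names here are hypothetical and this is a sketch of the idea, not any real framework's API.

```python
# Toy sketch of federated averaging: each client fits a local model on
# its private data and only shares model parameters with the server.

def local_fit(data):
    """Fit y = w * x by least squares on one client's private points."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den

def federated_average(client_datasets):
    """Server averages client weights; raw data never leaves a client."""
    local_weights = [local_fit(d) for d in client_datasets]
    return sum(local_weights) / len(local_weights)

# Each client's points stay local; only one scalar weight is shared.
clients = [[(x, 2.0 * x) for x in range(1, 6)] for _ in range(3)]
w = federated_average(clients)  # recovers the true slope 2.0
```

The same shape generalizes to gradient updates in real systems, but the privacy guarantees there are subtle; this only shows the data-locality idea.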


When it comes to building smarter-than-human AI, "try it and see" is never the right answer.

Disagree emphatically. In fact it's the only way to do it because there is no way to know certainly that a superhuman-AGI will ensure the longevity of humanity. I go so far as to argue that it's not even necessary because there is no long term longevity for humanity anyway.

There is this implicit assumption that humans are, should and will always be the apex entity - and I think that is misguided.

If you instead view superhuman-AGI as our rightful offspring, something that we can't understand and is better than us, then all of the existential dread around it goes away.

Dying elderly often express "comfort" in dying when they see that their offspring are reproducing and are smarter than they were. We should see superhuman-AGI the same way, but for all of humanity.


1) Coping mechanisms around death aside, there's no "comfort" in building a "successor" in the form of a bot that tiles the universe with paperclips, or with neural networks that minimally satisfy some notion of "interesting" while taking up as few atoms as possible to maximize the number of them, or many other utter failure modes (which far outnumber successful outcomes). We're not talking about "smart alien-like intelligence that just doesn't care about humans", we're talking about the equivalent of an industrial accident but on a species-wide scale.

2) It's reasonable to think about how our values might change in the presence of superintelligence; we certainly shouldn't assume that our present values should forever dictate how everything works. That's different than allowing a view that sentient beings who exist today might have no value.

> In fact it's the only way to do it because there is no way to know certainly that a superhuman-AGI will ensure the longevity of humanity.

There's no way to know certainly; there are ways to know that the outcome has higher expected value than not having it, given the vast set of problems it can solve and the massive negative values associated with those problems.


I am acutely aware of all of the "failure" scenarios and I find few of them plausible - even granting that they are simple thought experiments.

What authors like Bostrom, Eliezer et al. seem to miss is that there would need to be a practical mechanism for a digital AGI to take physical control of systems out of the hands of humans. E.g. it would need to control the resources around mining or recovering metal, then build production plants, etc... So we either incrementally cede power to it, in which case in theory the humans previously controlling the systems are doing so "rationally" and thus see the AGI as better. Or the system outsmarts the humans controlling the systems, in which case it is demonstrating that it is smarter.

There is a tautology here that seems to be ignored: if we create a superhuman-AGI then by default its goals will be more universally optimal than ours. They may or may not be aligned. However, the definition of the term is based on the fact that it is "better" in outcome than all manner of humans.

So if we create one and it decides to maximize paperclips, then that means maximizing paperclips is a more optimal goal universally than whatever humans could coordinate as a goal on our own.

If we create a subhuman-AGI then we will be able to overcome its goals by virtue of the fact that we are still superior.

I'll go back to a very old example. An ant can't determine if building the Large Hadron Collider is an optimal global goal - it's inscrutable to the ant. All it knows is that its house and all its friends were destroyed.

If it is the case that an AGI can in fact take the reins of physical control from humans, then by definition it is smarter and will make a more optimal long term goal than we could - to the point that we probably wouldn't understand what it's doing.

I think the true concern is that we will make something that is superhuman-powerful without being superhuman-intelligent. Like the doomsday machine in Dr. Strangelove, but to me that is an altogether different question.


How can a goal be optimal? What does that even mean?

Your argument seems to imply that if an AGI tricks us into giving it the ability to destroy us, that's basically okay because its goals are "better" than human goals.

Speaking as a human, I don't consider goals that are compatible with the destruction of humanity to be "better" than goals which are aligned with human interests.


Your argument seems to imply that if an AGI tricks us into giving it the ability to destroy us, that's basically okay because its goals are "better" than human goals.

Yea that's about right.

I don't consider goals that are compatible with the destruction of humanity to be "better" than goals which are aligned with human interests.

Well of course you wouldn't, neither you nor I would possibly understand what a superhuman-AGI does or thinks.

I don't think people realize that actually creating a superhuman-AGI is effectively creating a God in all the forms that people interpret it now.


You are implicitly talking about being able to objectively measure a goal's "optimality".

Unless you believe in absolute morality or the like, there's no such thing as an objective measure. A goal can only be optimal to an agent.

In your example, the fact that we pursue science and can destroy ants doesn't mean that their goal is "objectively less optimal". Their goal is absolutely optimal to them, though they can't reach it if it collides with ours.

Same goes for a superintelligent AI.


I understand what an "optimal solution" might be. But what is an optimal goal?


Great question. More than likely everyone has a different answer, but each person, group, nation etc... has some implicit goals. I would argue the current human goal is "reproduce successfully" as that is what is baked into our genes - I doubt that is actually optimal though.

The failed idea of a coherent extrapolated volition (CEV) that came up years ago was (roughly) the idea of using revealed preferences to understand what Humanity's goal is. This would give us a benchmark for what an AGI's goal should be.

So if you want to be able to measure the capability of an AGI in comparison to human system, you need to understand the set of goals in humanity and then compare them to the AGI outcomes.

The concept of goal direction in AGI is hugely contentious - but make no mistake there needs to be a goal if it is going to actually function at superhuman levels.


> What authors like Bostrom, Eliezer et al. seem to miss is that there would need to be a practical mechanism for a digital AGI to take physical control of systems out of the hands of humans.

No, that's a pretty widely assumed premise, and most authors specifically do anticipate that. (There's actually some dispute about whether any AGI will be capable of growing that capable that fast; the "fast singularity" scenario is not universally accepted. But many authors do recognize and discuss that scenario, and have not "missed" it.)

> So we either incrementally cede power to them, in which case in theory the humans previously controlling the systems are doing so "rationally" and thus see the AGI as better.

Humans are not universally rational, and even if the humans making the decision were rational, they can still make a horrible mistake. As one of many possible failure modes: a group of humans build an AGI and try to hardcode their particular values, utterly fail at extrapolating how the computer will interpret those values, and end up destroyed by it. Or humans build an AGI they think they've programmed appropriately, but fail to implement it correctly.

> Or the system outsmarts the humans controlling the systems in which case it is demonstrating that it is smarter.

A computer can play chess better than any human at this point, which makes it "smarter" in a way, but that doesn't make its values appropriate. If you somehow gave a chess computer enough flexibility in achieving its goal that it consumed the universe to build more computronium so that it can compute better chess solutions, that doesn't make it better than humans, just better at playing chess and building computronium.

In fact, a far more likely scenario than most of the "actively evil AGI" failure modes is the "accidentally broken" AGI: humans aren't its enemy, but we're made of matter that it could put to other purposes.

> There is a tautology here that seems to be ignored: If we create a superhuman-AGI then by default its goals will be more universally optimal than ours.

"Might makes right" is not a particularly good value system. Supervillains are typically depicted as smarter than the people they defeat. That doesn't make their goals or values better.

And an AGI doesn't even have to be "smart" in the way we normally conceive of intelligence to fail fatally; it doesn't even have to "think" at all to attempt to optimize the wrong value function.

> So if we create one and it decides to maximize paperclips, then that means maximizing paperclips is a more optimal goal universally than whatever humans could coordinate as a goal on our own.

I can't even begin to imagine what value system you're using to reach that conclusion. I could imagine someone thinking "if a system were smarter it must necessarily be more morally right", which is blatantly untrue but in an understandable way. But directly describing a system that destroys the universe, including all of humanity, and replaces it with paperclips, as better...


While the AI FOOM and hard takeoff options are discussed, I have yet to see a practical breakdown of HOW - like step by step - from any of the existential warning people. It's all vagaries.

To your other points, you imply too much. The chess AI that turns into AGI isn't realistic - its values are "be the best at chess", which it can do with existing computing power. No need to tear the world apart - it would be inefficient.

I also never make the might-makes-right case. All of the examples you give are fantasy and don't reflect what an actual superintelligence might look like. Again, optimization to some narrow goal has too many weak points to take over all of humanity's functions.

"if a system were smarter it must necessarily be more morally right", which is blatantly untrue but in an understandable way

I'm unconvinced that this is blatantly untrue. "Moral right" is subjective - hence the point. We got to our morals today not through mysticism but empiricism so it's not out of the reach of superintelligence to optimize further.


> The chess AI that turns into AGI isn't realistic - its values are "be the best at chess", which it can do with existing computing power.

The AGI has whatever values we give it. Existing chess AIs don't seek to maximize their ability to play chess, they seek merely to win the particular game of chess they're playing.
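That scoping is visible even in a toy solver: the objective below is defined only over the current game (a take-1-or-2 Nim pile, last stone wins) and says nothing about "being the best" beyond it. This is an illustrative sketch, not any real engine's code.

```python
# Negamax solver for a tiny game: remove 1 or 2 stones from a pile;
# whoever takes the last stone wins. The value function exists only
# inside this game tree - there is nothing here to optimize outside it.

def negamax(pile):
    """Return +1 if the player to move can force a win, else -1."""
    if pile == 0:
        return -1  # previous player took the last stone; mover has lost
    return max(-negamax(pile - take) for take in (1, 2) if take <= pile)

# Known result for this game: piles divisible by 3 lose for the mover.
results = {n: negamax(n) for n in range(1, 7)}
```

Giving an agent the open-ended goal "be the best at chess" is a categorically different objective from anything in this bounded search, which is the point of the distinction above.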

But suppose we build a chess-playing AGI and tell it to "be the best at chess". It must anticipate that we might build a second, superior, chess-playing AGI and give it the same goal. One way to be the best at chess would be to prevent that second AGI being built. One way to prevent that second AGI being built would be to destroy humanity's capability to build AGIs. That probably counts as a loss for humanity.

Suppose the second AGI gets built despite the first's efforts. Now both AGIs have an incentive to destroy both the other, and the possibility of a third. At any particular time, one or both of the AGIs won't be the best at chess, so they'll also have an incentive to get better at chess by actually improving their chess-playing capability. This will involve converting the Earth into processing power for it to use. That probably counts as a loss for humanity.


> Again, optimization to some narrow goal has too many weak points to take over all of humanity's functions.

It doesn't have to take over all of humanity's functions to wreak havoc. A hypothetical AI disaster could be one goal-oriented system with a poorly constructed goal and enough initial resources.

> I'm unconvinced that this is blatantly untrue. "Moral right" is subjective - hence the point. We got to our morals today not through mysticism but empiricism so it's not out of the reach of superintelligence to optimize further.

I think you're making a fundamental and unwarranted assumption here.

You're anthropomorphizing "superintelligence" as something vaguely human-like but better. A system doesn't have to be "intelligent" in a sense that relates at all to what humans think of as "intelligent" to be dangerous. It could simply be a "really powerful optimization process". You're romanticizing the notion of a superintelligent being discarding human values and inventing some new moral system that it then follows, and ignoring the possibility of an algorithm no "smarter" than a nanobot instructed to make a copy of itself. That nanobot doesn't have an interesting value system; it doesn't need one to kill everyone and everything, though. And that's not an outcome that, individually or as a species, we should take any pride or "comfort" in.

You're also assuming that the ability to destroy the world requires some kind of intelligent process or executive function, and could not possibly be discovered by an optimization process. It wouldn't necessarily come across such a mechanism at random, but many of the approaches we might apply towards the creation of useful AI could provide exceptionally powerful pattern recognition and search capabilities.

As a complete hypothetical off the top of my head, imagine a ridiculously powerful pattern-search program effectively recreating the idea of afl-fuzz ("throw input at a program and find interesting behavior"), and applying it against the mechanisms running it in a sandbox. Improbable, but not wildly impossible, and an agent that succeeded would gain access to additional computation resources that would allow it to do better than the algorithms it competes with. So, now you have a complex pattern-search engine trained to break out of sandboxes...
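For concreteness, here is a deliberately toy version of that loop - random mutation plus "keep any input that produces new behavior" - run against a made-up target function standing in for the sandboxed mechanism. Everything here is a hypothetical illustration, not afl-fuzz itself.

```python
import random

# Coverage-guided fuzzing in miniature: mutate inputs at random and
# keep mutants that reach previously unseen behavior, until one trips
# the buried failure path (our stand-in for "escaping the sandbox").

def target(data):
    """Hypothetical program under test, with a hidden crashing branch."""
    if len(data) > 2 and data[0] == ord('F'):
        if data[1] == ord('U'):
            if data[2] == ord('Z'):
                raise RuntimeError("crash: reached the hidden path")
            return "depth2"
        return "depth1"
    return "base"

def fuzz(seed=b"AAA", rounds=200_000):
    random.seed(0)
    corpus, seen = [bytearray(seed)], set()
    for _ in range(rounds):
        mutant = bytearray(random.choice(corpus))
        mutant[random.randrange(len(mutant))] = random.randrange(256)
        try:
            outcome = target(mutant)
        except RuntimeError:
            return bytes(mutant)       # found the "interesting" input
        if outcome not in seen:        # new behavior: keep this input
            seen.add(outcome)
            corpus.append(mutant)
    return None

crasher = fuzz()
```

Pure random search would almost never find the three-byte trigger, but keeping intermediate "new behavior" inputs makes each layer cheap to reach - the same reason coverage feedback makes real fuzzers effective.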


We both likely think the other person hasn't thought through every facet of this issue.

We're past the point of the discussion where we can lay fundamental foundations for the arguments we are making.

I'll just say that I'm sure it will be interesting watching/building the future of AGI and its predecessors.


we need to merge hippocratic and asimovian branches in future.git


Great points.


I don't want to find out that way!


Forever. There is no structure to be found in the output of a PRF like AES. These functions are not going to be learnable.
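A crude way to see this, using SHA-256 from the standard library as a stand-in for a PRF like AES: feed in maximally structured inputs (consecutive integers) and check that the output bits show no gross bias a learner could latch onto. Passing this proves nothing rigorous, but a function with exploitable structure would typically fail far simpler tests than any model could exploit.

```python
import hashlib

# SHA-256 as a stand-in PRF: structured inputs, structure-free outputs.

def prf(x: int) -> bytes:
    return hashlib.sha256(x.to_bytes(8, "big")).digest()

def ones_fraction(n_inputs=2000):
    """Fraction of 1-bits across outputs for inputs 0..n_inputs-1."""
    ones = total = 0
    for x in range(n_inputs):
        for byte in prf(x):
            ones += bin(byte).count("1")
            total += 8
    return ones / total

frac = ones_fraction()  # very close to 0.5: no bias to learn from
```

Learning the input-output mapping of a secure PRF from samples would itself constitute a distinguishing attack, which is exactly what such functions are designed to resist.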




