Lee Sedol Beats AlphaGo in Game 4 (gogameguru.com)
1395 points by jswt001 on March 13, 2016 | 448 comments


Relevant tweets from Demis:

    Lee Sedol is playing brilliantly! #AlphaGo thought it
    was doing well, but got confused on move 87. We
    are in trouble now...

    Mistake was on move 79, but #AlphaGo only came to
    that realisation on around move 87

    When I say 'thought' and 'realisation' I just mean the
    output of #AlphaGo value net. It was around 70% at
    move 79 and then dived on move 87

    Lee Sedol wins game 4!!! Congratulations! He was
    too good for us today and pressured #AlphaGo into
    a mistake that it couldn’t recover from
From: https://twitter.com/demishassabis


I'll risk the assumption that somebody from the DeepMind team is reading this.

Guys, please publish charts of the win probability AlphaGo estimated over the course of these games. A heatmap showing which moves it considered best for both sides would also be cool, but that's surely more time-consuming to prepare.

It would be great to be able to have such things for top pro tournaments in the future.


Actually, it's a tradition that both players replay the game after the match, discussing good moves, bad moves, and what they were thinking during the match.

I felt so sorry for Lee Sedol when I saw him lose the second match, facing an empty chair, able only to ask one of his friends to review the game.

https://zh.wikipedia.org/wiki/%E5%A4%8D%E7%9B%98


Or the helpless look of Aja Huang... like "I have no idea how to decipher that neural network, that's part of why you are here".


He made $30k for losing that game, so I don't feel too sorry for him.


If it really was about money, I don't think he would have spent so much of his life dedicated to Go ...


yeah, not to belabor it, but if you can do something like be a champion in go or chess, chances are you have the mental skillset to do something exponentially more lucrative.


This is not true. Scientific studies have shown that skills in abstract strategy games are so specific to the game that they do not transfer to other tasks. The urban legend that Chess grandmasters are all geniuses is false, in other words. They're actually worse at most jobs than people who've been doing those jobs for a while, because they lack the experience, and they aren't more predisposed to having a genius intellect that would let them overcome that.

A Chess or Go champion is probably doing the single most lucrative activity that they are capable of.


What scientific studies are you referring to?

Of course chess grandmasters aren't all-around geniuses, but many traits that are prerequisites for a successful chess player certainly are translatable into careers in other fields, and correlate with above average brainpower (good memory, discipline, long attention span, spatial intelligence etc.)

Botvinnik was an accomplished engineer, Euwe had a PhD in mathematics, Anatoly Karpov is a millionaire, and interestingly enough a lot of recognized chess players have had careers in music (like Taimanov or Smyslov)...

Being a grandmaster surely requires an above-average intellect; there is no such thing as a chess savant. While it's an urban legend that Kasparov's (arguably the greatest chess player ever) IQ was in the ballpark of 190, he did clock in at 135. Such a result is not unheard of, yet it still places him in the top 1% or so.


There's a whole body of research out there, but I've just scratched the surface of it. Here's the first result that comes up from a random Google search: https://psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html The references should be worth exploring, and I bet there's more recent research that expands on it as well. Here's a quote from the paper:

> In a recent review, Ericsson and Lehmann (1996) found that (1) measures of general basic capacities do not predict success in a domain, (2) the superior performance of experts is often very domain specific and transfer outside their narrow area of expertise is surprisingly limited and (3) systematic differences between experts and less proficient individuals nearly always reflect attributes acquired by the experts during their lengthy training.

I will say this for Chess or Go grandmasters: They have drive and dedication (and in some cases, compulsiveness). That alone would probably allow them to do better than the average person in another field if they had pursued that field from the get-go. Also, I'd caution you against relying on hand-picked anecdotes; I could just as easily pick out a bunch of Chess players who weren't good for anything else. You'd need broad-based statistics.


You seem to be confusing two different claims:

1. A champion Go/Chess player could retire and jump into a successful career as <Something else>. You refute 1.

2. A champion Go/Chess player could have pursued a career in <Something else> from the start and made more money. You have not refuted 2.

For example, many physics PhDs become successful software engineers and quants, making a mid-career jump to a different field.


Thanks for the link. Well, of course I just quoted a few examples without pretending it's a legitimate piece of statistics. However, there haven't been all that many world chess champions so far (and Botvinnik, Smyslov, Euwe, and Karpov belong in that category), so a few PhDs or successful intellectual careers is already enough to show that something clearly sets this bunch apart from the general public. You're less likely to find a plumber or a shelf-stacker in that group (no disrespect for people in these professions). Whatever causes that is another matter. I agree with you that drive and dedication - and not necessarily some innate talents - could be the crucial factor in this.


I'm not sure you're disagreeing with the person you're responding to. Just because the skills involved are very specific doesn't mean that the people who have them are capable of acquiring only those skills.

What the person you're responding to means is that if they have the mental capacity (and also discipline) to play the game at such a high level, they could probably also excel in other professions if their goal was to make money instead of doing something they loved.

Of course we can debate whether they would achieve such success if they were doing something they didn't enjoy as much, but I think most people would argue that most chess or go champions could make more money (in other words, do more "lucrative" activities) if they chose to pursue those instead of dedicating all their time to the game.


You are relying too much on the concept of a generalized mental capacity, which many researchers (maybe even most) would say does not exist. You seem to be assuming that grandmasters are just amazingly smart and they happen to apply those smarts to one particular skill, but what's closer to the truth is that they happen to be particularly amazing at that one skill, in a way that doesn't really correlate strongly with having amazingly high skill in other areas.


If you took the people who would have grown up to be chess grandmasters, intercepted them just before they started spending significant amounts of time studying chess, and had them instead spend all that time studying programming or physics or something, how do you think they would rank up?

You seem to be arguing against the idea, "A chess grandmaster is a genius and therefore can walk into Google and immediately start doing more and better work than most of their senior programmers". I don't think anyone believes this (correct me if I'm wrong).

I think a more serious idea is, "A chess grandmaster is a genius and therefore learns faster and has a higher performance ceiling and such than most people, and if they spent a couple of years learning to program, they could become an entry-level Google programmer, after which they would rise more quickly than most hires, and eventually would outperform most of Google's senior programmers."

By the way, I think most of the best chess players were extreme chess prodigies. (Just looked at Kasparov, Karpov, Shirov, Kramnik, Anand, and Carlsen's Wiki pages; all but Kramnik had the year listed, and they all became grandmasters around age 17-19, except Carlsen, who was around 14. Kramnik's page mentioned winning a gold medal for the Russian team at age 16, and that he wasn't a grandmaster when selected for the team and this was unusual.) I think this is consistent with them being highly gifted children, who choose to spend their time doing chess.


>I don't think anyone believes this (correct me if I'm wrong).

While not exactly what you wrote, people do think that someone who is a genius at one subject can become a genius in another area with less work than it took someone in either field to originally become a genius, especially if society groups the areas together (so a sports star becoming a master programmer is far less likely to be believed than a chess master becoming a master programmer).



> Scientific studies have shown that skills in abstract strategy games are so specific to the game that they do not transfer to other tasks

Do you have a reference for that please? I'm interested in the subject.


Though being able to compel yourself to do great things seems like a completely different skill set from being able to do great things on someone else's dime or on a team.

For example, my brilliant programmer friend that can't hold a job. Or the artist that can only paint their own inspirations. Or the savant that doesn't get along with anybody.


I dunno, Bobby Fischer was kinda deranged, for instance, and that often hurts outcomes in otherwise "lucrative" positions. Incidentally, he is credited with raising chess player compensation through his demands.

Generally I'd guess you're right though.


Maybe not a leadership position, but I've worked on the engineering side of quant work and am familiar enough with fairly advanced Go / Chess players (e.g. national youth champions for age brackets in the US) to know that most of them could make enough to retire off one year's salary during the boom years of algorithmic trading (~2003-2007ish).

I'm not sure how you define "deranged" (AFAIK, that's not a medically defined term within the DSM-IV), but most of those brilliant people end up being a little 'off'. My father was an academic; one of the people he went to graduate school with was working in Boston while I was a child. He was doing absolutely groundbreaking work, but he's so difficult to collaborate with (think: the mannerisms of Richard Stallman) that he's been floating around universities until his welcome wears out. He can figure out remarkable things in higher-level computational chemistry, but he can't really figure out humans. Had he decided during the 80s to work at Renaissance instead of pursuing academic research, he almost certainly would be worth in the low hundreds of millions.


Well, mental issues did break his chess career just the same.


With the number of tournaments he has won (many of which have a winner's purse of <$100,000), I suspect he is already a millionaire.


Those are the big tournaments -- do the little tournaments stack up, or is it a year of practice leading up to an annual big win?


I can't find it using Google, but there was a Go champion who had to stop playing Go for a year to earn money playing mah-jong.


False dichotomy? It is quite possible to enjoy an activity and gain money at the same time.


The best (like absolute best) football|basketball|baseball players in the world make approximately what per game?


Well, defensive end Olivier Vernon just signed a deal with the New York Giants which will pay him an average annual salary of around 17 million dollars. And there was a huge signing bonus as well.[1]

NFL seasons are 16 regular season games, plus 4 pre-season games. And then there are the playoffs, which a given team might or might not make or advance in.

All told, given that the signing bonus is amortized over all the games he plays, the annual salary, etc., I think it would be fair to say that Vernon will make around a million dollars a game.

Aside: Vernon isn't necessarily "the best" DE in the NFL, but due to market forces and the way things work with the salary cap, free agency rules, etc., the contract he just signed is one of the largest for a defensive player in the league. QB's tend to make even more, but I can't recall a really high profile QB who has signed a big deal recently.

[1]: http://www.spotrac.com/nfl/new-york-giants/olivier-vernon/


Osweiler, the QB from the Broncos, just signed a huge deal with the Texans... $18mil/year, but I haven't heard how much of that is guaranteed, so I'm not sure.

A really big issue with NFL contracts is this "guaranteed money" thing... I believe the NFL is the only major US sports league that gives player contracts without it, so you have to take those salary numbers with a grain of salt.


$34 million guaranteed over the first two years


Football is kind of an odd case, since it's the most popular sport in the richest country, and it runs very few games per year.


The best make around 10x that, but at the cost of their body and long-term health. Nobody is playing professional football|basketball|baseball in their 50s, and even just 40 is pushing it, whereas pro Go players can play into their 60s and beyond.


Gordie Howe played in the NHL until he was 53.[1]

Not really refuting your statement, which is essentially true, but it's worth your time to read the wikipedia page.

He played on a professional hockey team with his two adult sons at one point(!). He played in the NHL in five different decades.

[1] https://en.wikipedia.org/wiki/Gordie_Howe


Yep, it's interesting to see a few edge cases out there. There are a tiny number of people that came close to 50 in baseball, basketball, or football.

  Julio Franco played baseball till 49.
  George Blanda played football till 48.
  Nat Hickey played basketball till 45.


Interestingly, the world champions seem rather young.


If as many people who pay to watch (NFL|NBA|MLB) paid to watch Go, you'd likely see similar payouts.

I would like to see AlphaGo play 21 against Steph Curry.


I.e. something like [1] for every move. I was a bit disappointed to learn, on first reading, that such a table is only available for that particular move.

[1] http://www.nature.com/nature/journal/v529/n7587/fig_tab/natu...


Off-topic: we (the human species) have built an AI that can master the game of Go, but we don't yet have the intelligence to publish charts correctly on the web. That blur of a JPG is half a megabyte!


Yeah, honestly that would be much better presented as an SVG, since it's mostly solid colors and geometric shapes. Every browser supports them, and all the big image editors as well; the trouble is that the rest of our software and culture hasn't caught on. Why am I taking bitmap screenshots of websites when they're almost completely text? We need to start building a tools ecosystem around vector graphics. An extra bonus would be a machine-readable representation of the board as well. We live in a digital age, but it's almost like we're cavemen still dealing with the modern equivalent of analog formats. (And at least analog has fidelity.)


Screenshots might not be the best example, since there may be some "shiny cute effects" going around. However, reading your idea sparked another one in my head: screenshots shouldn't be restrained to the same rendering resolution as the rest of the system; the screenshot resolution should be configurable.

I don't require the screenshot to be instantaneous, I require it to appear instantaneous. In that sense, if the whole rendering pipeline is working to give me a framerate of 60 fps, then I could spend ten times as much rendering a screenshot without that delay being noticeable. Also, why on earth does Windows (don't know the behaviour on Linux/Mac) apply ClearType to a screenshot? That has always bugged me, there are some situations where you tolerate it and others where it hurts.


It didn't master anything. What the DeepMind team did was make an "AI", which means it can _imitate_ (to a certain degree) "playing go".


What's the difference between "imitating" playing go and playing go, when it can beat any one of us at go?


That would be awesome.


"Do not anthropomorphise computers. They really hate that" (NN)


By NN you mean a neural network, of course? That makes even more sense...


NN = nomen nescio ("I don't know the name"). You use it when you cannot name the source of a quote, or when you would like to protect the source.

https://en.wikipedia.org/wiki/Nomen_nescio


Wouldn't "-Unknown" convey the same sense without the possible confusion?


I think this conveys a slightly different meaning.

When I see "-Unknown", I interpret it as "somebody said it, but no one is quite sure who".

"(NN)" in the current thread was used to indicate "I personally don't know", which does not imply the quote is unattributable.


NN is (or was) also used in Neapolitan comedy, derived from earlier use throughout the Roman Empire and the Middle Ages. The "figlio di NN", or "son of NN", means someone who was found and adopted (typically by nuns) and whose parents were unknown.

NN can be used in general for people whose origin is uncertain. I think in this specific case it's a bit misleading, although the Wikipedia article seems to suggest that NN can also be used as a synonym for "unknown", even if from a historical perspective that's a bit incorrect.

The literal translation from Latin creates some confusion if you didn't know the context.

I hope this helps.


So NN can be used as a school version of anonymous?


... wrote the bot trying to hide its identity


Using the acronym of the phrase defeats the purpose, though, since it looks like someone's initials.



Not so coincidentally, "This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves." [1]

The bad moves, in the eyes of humans, could be risky bets or the horizon effect. [1] [2] AlphaGo's use of deep neural nets (value networks) to evaluate board positions should significantly counter the horizon effect, but since move 78 by Lee Sedol, which turned the situation around, was unexpected even by some top pros (Gu Li referred to it as the 'hand of god' [1]), the patterns that follow it are likely rare among possible game states and therefore not strongly embedded in the value networks, leading to AlphaGo's loss.

I hope the DeepMind team will help enlighten us on this in the near future.

[1] https://gogameguru.com/lee-sedol-defeats-alphago-masterful-c...

[2] https://en.m.wikipedia.org/wiki/Horizon_effect


78 was hard, but not impossible: if you watch the AGA commentary of the game, they had two pros on at the time 78 happened, and they found 78 as the best answer a few minutes before Lee did, expecting AlphaGo to go with a stronger, yet still not good enough, 79 that left the game even instead of basically lost. Then they were elated about how AlphaGo seemed to have failed to read the whole thing; instead of making preparatory moves for a nasty ko fight, the best option at the time, AlphaGo just took a route that provided almost no compensation.

If anything, it seems to me that AlphaGo's problem here might be time management: seeing a really scary situation, Lee Sedol just sank many minutes into reading the problem, going pretty much all the way to byoyomi time. A human, after seeing something like that, would figure out that their assessment of the situation and their opponent's are very different, and spend a lot of their budget trying to figure out what was wrong. AlphaGo just didn't see the problem, and didn't budget its time to analyze the position to death. It moved slower than before, but not really that much, and ended up making moves a kyu player could see were terrible.

Either way, I'd love to see DeepMind giving us all a good postmortem of the 70-100 range of moves.


I concur that better time management may have made a difference.

Beyond that, I think that AlphaGo may still be missing a type of component. From the descriptions of it, the policy network generates possible moves from board positions, and the value network evaluates the probability of desirable outcomes. Where this differs from human play is that strategic assessment and planning are implicit in the middle layers rather than a 'conscious' element of searching and decision making. I'm not saying that this is a necessary component, as AlphaGo has already done exceptionally well, but I do believe this kind of 'middle-out' processing, producing and evaluating strategic concepts, could make it better at handling unusual circumstances. Being trained on high amateur and pro games, it will best respond to the most conventional of those types of games; the more unconventional the game becomes, the worse it will fare in terms of efficiency of move generation and choices of which moves to evaluate.
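To make that division of labor concrete, here's a toy sketch of a single greedy search step (all names and stand-ins here are hypothetical, nothing from DeepMind's actual code): the policy net prunes the candidate moves, the value net scores the positions they lead to, and the search picks the best-scored candidate. Any strategic "concept" lives implicitly in the two networks' weights:

```python
# Toy sketch of the policy/value split (hypothetical names, not DeepMind's API).

def search_step(board, policy_net, value_net, play, top_k=3):
    move_probs = policy_net(board)                        # move -> prior probability
    candidates = sorted(move_probs, key=move_probs.get,
                        reverse=True)[:top_k]             # policy net prunes the search
    scored = {m: value_net(play(board, m)) for m in candidates}
    return max(scored, key=scored.get)                    # value net picks the outcome

# Toy stand-ins: a "board" is just the tuple of moves played so far.
policy = lambda board: {"A": 0.5, "B": 0.3, "C": 0.2}
value = lambda board: 0.9 if board[-1] == "B" else 0.4    # "B" reaches a better position
play = lambda board, move: board + (move,)

print(search_step((), policy, value, play))  # prints B: policy preferred A, value overruled it
```

The real system interleaves thousands of such evaluations inside Monte Carlo tree search rather than one greedy step, which is exactly where the "implicit" strategy lives.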


I got something different from the AGA commentary. The pros did consider 78, but came to the conclusion that it didn't work! I'm just a 1kyu player but I too can't see any way to make 78 work after Black plays 79 as an atari at L10, as suggested by the pros. So I'd be very interested to see some analysis that shows how White could get a fair result after Black 79 at L10.


It would be interesting to see a neural net evolve to take into account player state, not just game state. I play chess, and nowhere near professional levels, so I don't know if this anecdote is valuable, but if I see my opponent looking at a particular area of the board, I tend to take a second look.

I suppose beating humans isn't AlphaGo's primary motive though - learning to play a perfect game of Go in general is probably more difficult than playing the perfect game against a particular person.


Having the AI use eye-tracking to predict an opponent's moves is a truly terrifying thought. Just by tracking the eye with millisecond precision you could probably work out the strategy they were reading. It's so game-breaking that it shouldn't be allowed as an input, and indeed, is easily defeated with reflective sunglasses anyway (like a lot of pro Poker players wear).

The AI player can't give up information in this manner because it lacks eyes, so I'd say that it should not be able to use this information from the human player.


Seems like just another avenue to trick the computer


This actually came up in the commentary. Michael Redmond mentioned that some amateurs watch to see where their opponent is looking, but he called it merely a "trick", and he said it is not useful in professional play.


That type of metagame seems useful though, in the sense that you can start to sense the state of mind of your opponent. If they are working harder than you think they need to, you might get away with a riskier move for instance.


Just in case someone wants the commentary around move 78:

https://www.youtube.com/watch?v=yCALyQRN3hw&t=11413


Here's the Myungwan Kim commentary around that same move: https://www.youtube.com/watch?v=SMqjGNqfU6I&t=1h33m. He had been looking with Hajin Lee at variations involving the move beforehand (https://www.youtube.com/watch?v=SMqjGNqfU6I&t=1h28m1s), so he immediately noticed that AlphaGo made a possible error in its response.


But Kim also jumped back and forth for a few moves, until he finally decided it was a mistake. It didn't seem 100% clear.


Professionals are understandably hesitant to call AlphaGo's plays mistakes after the first few games, because AlphaGo had played a few moves that seemed like obvious mistakes (the eyestealing tesuji under the avalanche and the fifth line shoulder hit) but later proved to be very strong moves. Remember that when go is your career, your reputation is based partly on your ability to analyze games accurately, so calling a move a mistake and finding out later that it was strong would be a big embarrassment.


Any chance of explaining the mistake on move 79 to someone who's only played a few casual games of Go?


AlphaGo enlarged a group that was at risk of atari (imminent capture) instead of correctly trying to divert attention elsewhere on the board, start a new fight, or simply write the group off. It basically added stones to a group that was almost certainly dead and allowed the opponent to punish it heavily, thus giving away more stones than it had to.


OT: What's with using monotype for quotes? That breaks line wrapping and makes it hard to read on mobile or small screens. I don't get why people do it.


"People" don't do it; HN does it. Indented lines in a post are always monospaced, on the assumption, I suppose, that they're likely to be code examples. There's no way to turn this off AFAIK.

And no, HN does not use Markdown.


I know the formatting system. My question is why users format their text (e.g. with 4 spaces leading each line) such that HN renders it as monospace, which then forces a sideways scroll on mobile and small screens. That seems to make it harder to read for some, with no corresponding upside.


Oh. Well, I think they just don't realize that HN will monospace it.

I agree with you, it's annoying.


They don't realize it even after seeing their post go live? And seeing what other people's monotype quotes look like?

I never like to assume malice, but ...


The people indent the text (or fail to remove the leading white space when copying).

It isn't a huge time sink to wrap a paragraph in stars.


I think mostly people forget that > works and that therefore you can do

> quotes like this

since it's markdown - often you get commentish things that allow limited HTML plus indented code blocks, and people get used to using the latter as the only thing that works everywhere


'>' doesn't "work"; it doesn't do anything on HN. It's just a convention that readers recognize as indicating quotation.

HN does not use Markdown.


At least ">" doesn't break anything.

    Pre is broken on mobile, forcing users to scroll a horizontal line which is incredibly annoying if it's a long line.
Here's a screen shot: http://imgur.com/Ru56wMK

And pre with very long unbroken lines is even worse.


I wasn't expressing a value judgment, just trying to explain how HN works.

I agree with you that it's annoying when people post long indented lines.


It can be touch-scrolled here, there just aren't scroll bars. Took me a while to find, but maybe your browser is buggy, even.


I noticed people were doing this and wrote a userscript a while ago to make it work. I kind of wish HN supported it.

https://gist.github.com/kennethrapp/7a21c0187fedd6f47e7c


I wrote in a while ago to ask that it be officially supported, but apparently that would break something that HN's spam filter relies on.


Looks good in desktop browsers.


not really: http://i.imgur.com/uxqERat.png (you have to be very careful about what indentation level you're at, how long your lines are, etc, etc, things that people just don't do most of the time)


It's also customary when quoting large sections of text. I'm not sure if this carried over from email, or from academic texts (where if your quote is more than a line or two, it needs to be formatted differently).




How frigging smart is Sedol?


Kind of cool to know you're as smart as algorithms that require 1920 CPUs and 280 GPUs


They've said it doesn't require that; AlphaGo running on a single machine beats the cluster they're using 25% of the time.


So based on current data, Lee Sedol is exactly as good as AlphaGo running on a single machine.


Doubtful; I don't think comparing such small W/L distributions will be illustrative.

On the other hand, the Nature paper shows the single 8 GPU machine performs similar to the 64 GPU cluster, but the larger clusters perform a comfortable margin better. [0]

With the single machine winning many games relative to the distributed version, it's really just saying that the value/policy network matters more than the Monte Carlo tree search. The main difference is the number of tree-search evaluations you can do; it doesn't seem like they have a more sophisticated model in the parallel version. The figure suggests that there are systematic mistakes the single 8 GPU machine makes compared to the distributed 280 GPU machine, but MCTS can smooth some of the individual mistakes over a bit.

[0] http://www.milesbrundage.com/uploads/2/1/6/8/21681226/877172...
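As a rough back-of-the-envelope check (my arithmetic, not a figure from DeepMind), the reported ~25% single-machine win rate against the cluster can be converted to an Elo gap with the standard logistic model:

```python
import math

def elo_gap(win_rate):
    """Elo difference implied by a head-to-head win rate under the logistic model."""
    return 400 * math.log10(win_rate / (1 - win_rate))

# Single-machine AlphaGo reportedly beats the distributed version ~25% of the time.
print(round(elo_gap(0.25)))  # prints -191: the cluster is roughly 191 Elo stronger
```

So the extra hardware buys a solid but not overwhelming edge, consistent with the value/policy network doing most of the work.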


Probably better. If we assume a uniform prior across possible win rates of Sedol vs AlphaGo, then update it with Bayes' rule, we get a 33% chance that Sedol will win the next match.

That's not factoring in other information, like Sedol now being familiar with alphaGo's strategies and improving his own strategies against it.

So there is a good chance he is now evenly matched with AlphaGo, and likely much better than the single machine version.
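For what it's worth, that 33% is just Laplace's rule of succession: put a uniform prior on Sedol's true win rate, condition on the 1-win, 4-game record, and take the posterior mean:

```python
from fractions import Fraction

def rule_of_succession(wins, games):
    """Posterior mean win probability: uniform prior on [0, 1], updated on the record."""
    return Fraction(wins + 1, games + 2)

# Lee Sedol has won 1 of the 4 games played so far.
print(rule_of_succession(1, 4))  # prints 1/3, i.e. about a 33% chance of winning game 5
```

Of course this treats the games as exchangeable coin flips, which ignores exactly the extra information mentioned above (adaptation, familiarity with AlphaGo's strategies).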


There's a bit of sleight of hand in this statistic: yes, they can do runtime on a single machine, but it took the compute power of a small country to train the neural nets that are loaded onto that one machine.


That's not really sleight of hand. Lots of things take more energy to produce than to run. It's like saying a 400W electric motor can put out as much power as a fairly fit human, but it's 'sleight of hand' because it took a whole factory to make the motor.


And it took decades of play for Sedol to become a top player. I find the similarities a mix of satisfying and amusing.


You're conveniently forgetting that this "AI" is a representation of tens of millions of amateur plays, which is far more than a few decades in total time. Not that Lee Sedol needs someone like me to defend him, but remarks like yours are very misleading.

A human is far, far, leagues more efficient at learning than today's AIs. These AIs require millions of hours of human game data to even come close to competing with an expert who achieved the same, and arguably far more, in a "few decades".


Even on a single machine it has the "memory" of virtually every Go game AlphaGo could be fed.


If you look at DeepMind's numbers, AlphaGo is meant to win more than 50% of the time against expert human players. Right now, my opinion is that its actual win rate is less than 50%.


We're in the very infancy of the field though. Give it a year and it's likely you'll have a neural-net-based Go program that runs on a single machine that can defeat every single human professional. Human Go players have been getting better for millennia; don't assume that a program that's only two years old has no room for improvement left.

It took some serious hardware for Deep Blue to defeat Garry Kasparov. Now there are smartphone apps with that same level of Chess-playing skill. And if anything the AlphaGo approach is more amenable to running on lower-specced hardware without requiring help from Moore's Law (because you simply train it better).


It is believed that Lee Sedol now has a really good ability to read AlphaGo's moves (see: Michael Redmond and the AGA Game 4 commentary). Game 5 is going to be very interesting. I'm pretty sure the DeepMind team is just as excited; after game 4, they were noticeably far more ecstatic in the after-game press review.

Hassabis and Silver kind of reminded me of developers given the details of a bug that was notoriously difficult to find.

edit: I can't wait for reviews of the entire 5-game series. If a book came out I'd very likely buy it. A book discussing both Go and AlphaGo's AI at the same time, written by people among the top of their respective fields, would be amazing.


I wish some of these had been re-tweeted by the @DeepMindAI account. I'll definitely be keeping an eye on Demis' account during game 5.


It feels really weird to see someone being showered with congratulations for beating a computer program. What exactly is he being congratulated for? For probably triggering and then capitalizing on a bug in AlphaGo's AI? For showing that human resolve, perseverance and a "fighting spirit" can trump a flawed AI, at least until the AI gets fixed? For giving DeepMind extremely valuable test data that will only accelerate the refinement of AlphaGo's AI? For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary?


As noted by the live commentator, this has the opportunity to be the third revolution of the game of Go, opening again new ways of playing for the whole of humanity.

Imagine what AI, in different fields, means for humanity if it has so much to teach us, just by being able to "think" differently. I sure hope that one day they start writing philosophy and, doing so, potentially legitimize everything that currently makes humans unique and extraordinary.


What are the other two revolutions of the game of Go?


cf. the post-game 3 conference commentary: https://youtu.be/qUAmTYHEyM8?t=20639

(also during the actual game with more details, but I cannot find the exact time again). Edit: I cannot find the game 3 commentary, but during game 4 he comes back to it with interesting details as to why this is a great opportunity to inspire human players: https://youtu.be/yCALyQRN3hw?t=7097

Basically, once in ancient Japan, and more recently with a player of Chinese origin playing in Japan, incredibly strong players surprised everyone with never-before-seen moves that were subsequently integrated into modern game theory. The hope is that the same could happen here.


a computer that loves wisdom?


Perhaps you'd like a shot at explaining why AI is an "amoral field of study", and why it could "delegitimize everything that currently makes humans unique and extraordinary"?

Dolphins are about as intelligent as us, too. Are dolphins amoral? Do they delegitimize Beethoven, Tesla, Gödel, Einstein, and da Vinci?


> Dolphins are about as intelligent as us, too. Are dolphins amoral? Do they delegitimize Beethoven, Tesla, Gödel, Einstein, and da Vinci?

If we invented dolphins, and they began to replace and displace us, then maybe? I don't think it is about the intelligence, but in how we use it of course. Dolphins don't really have any control of my life or those around me.

That sounded a lot more paranoid than I meant, I actually agree with what I think you are saying


> Dolphins are about as intelligent as us, too

How are you making this comparison?


I think Bud wanted to make a different point, i.e. that AI doesn't delegitimize us (whatever that even means). Dolphins were just an example; he could have used Neanderthals, or aliens.


Dolphins definitely delegitimize Beethoven.

/s


Not really related, but dolphins are assholes!

http://www.deepseanews.com/2013/02/10-reasons-why-dolphins-a...


Frankly, many pre-modern, pre-civilization humans were assholes. Read studies or accounts of life in relatively isolated tribal societies, and hair-curlingly awful accounts of violent, sometimes institutionalized abuse are uncomfortably common. Leave the safety catches off basic human urges for long and it can get pretty ugly.


> Frankly, many pre-modern, pre-civilization humans were assholes.

While it's somewhat news to me, I'm very glad to hear we've moved past that.


I hope that was irony.


Very much so. Many modern, "civilized" humans are assholes, and now we've got global audiences for it!



Yes, I know, it's just completely insane to declare that we've somehow qualitatively moved past our brutish origins as long as torture, rape, and murder remain de facto extensions to politics.

I have no reason to doubt that quantitatively we have made some progress - although Pinker's arguments are most definitely not universally accepted in the scientific community.


Lee Sedol will forever be known as the first human to defeat AlphaGo in Game 4, and potentially the last one too.

That's quite an accomplishment.


Maybe they should fork version 18 as the hard-but-human-beatable version of AlphaGo.


How about just for playing a good game of Go?


So I suppose the fact that the opponent is a computer program should not be factored into the reaction to Sedol's win at all.

I hope your reductionist explanation is not accurate because it would imply that we are already so highly conditioned to machines and to AI that this match is thought to be no different from a match between two humans.


The machine beat him 3 times and was more or less expected to win again. It didn't, which would imply that Sedol played an exceptionally good game. Seems pretty obvious to me.


You're acting like following arbitrary sets of rules to maximize a value function isn't something that computers excel at. The only thing that was holding up computers for Go was simply time; there were too many possibilities to consider. Humans are really, really good at heuristically trimming possibilities, but sometimes to the detriment of finding maxima.


You know that computers beating us at chess allowed us to get better at chess, right?


Fan Hui, the professional player they beat late last year, already improved his game by quite a bit from the occasional match with AlphaGo (deepmind hired him as a consultant to test). He's won every single game in the last European championship, and moved from around top 600th player in the world to around top 300.


Wow. I know nothing about Go, but 600 → 300 sounds like quite an improvement.


A quote by Lee Sedol from years ago (might be apocryphal, couldn't find the original source):

Q: We heard there's now an anti-Lee Sedol website in Korea?

A: I don't even have time for my fans. I don't care about haters. ("나를 좋아하는 팬들에게도 신경을 못 쓰는데 그들에겐 당연히 신경 끈다.")


>For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans

Or, writing from the point of view of our mechanical successors to this world, for helping to advance a highly ethical field that could exterminate that genocidally murderous evolutionary abomination that was the human race. Who incidentally thought they were extraordinary but couldn't even play Go.


I don't think this comment should be so shouted down. Beyond the politeness that typically comes with these events, it does feel strange to congratulate Sedol on his victory.

And for a game that was thought to be many decades away in terms of computer capabilities, this probably should be a time to think of the possible consequences of AI improvement and take a close look at it.


I think (or at least hope) most of the downvotes have nothing to do with congratulations, and only refer to the part where AI is called "an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary".

That line of reasoning would be worth a laugh, if it wasn't so widespread in the general population.


Perhaps still better than my friend, who thinks suffering is what makes humans special. Discussions with him are, well, interesting.


Maybe introduce your friend to buddhist ideas of ending suffering via meditation. Might be another interesting discussion.


I think Buddhism is where he got that idea from.


amoral

>Both have to do with right and wrong, but amoral means having no sense of either

Well that's true then! A nice quote is from Feynman in regards to scientific research.

"To every man is given the key to the gates of heaven; the same key opens the gates of hell. — Richard Feynman "

We really have no idea yet how this AI research will play out.

I don't quite like the line "delegitimize all that makes people unique and special", but that's my only real disagreement.


If developments in AI can delegitimise human intelligence, then it was never "legitimate" in the first place (whatever that means).


>For giving DeepMind extremely valuable test data that will only accelerate the refinement of AlphaGo's AI?

You're treating Lee Sedol as if he were a fixed dataset to be trained on. Why can't he also be a "NN" who can refine his own AI[0] and thus be harder for AlphaGo to compete with?

You're putting too much faith in the machine and dropping the person, that might not be completely fair.

[0] Albeit not artificial in this case.


> For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary?

If you care about being unique and extraordinary more than about reason, knowledge, truth, the observable reality, and the search for what it really means to be sentient, then and only then you may call AI "amoral".

Also, you are being racist against artificial sentient beings, and being racist is hopefully not what makes humans extraordinary.


racist is, imo, a strange word to use there


Yes to all of the above, except the last.


If it's true that AlphaGo started making a series of bad moves after its mistake on move 79, this might tie into a classic problem with agents trained using reinforcement learning, which is that after making an initial mistake (whether by accident or due to noise, etc.), the agent gets taken into a state it's not familiar with, so it makes another mistake, digging an even deeper hole for itself - the mistakes then continue to compound. This is one of the biggest challenges with RL agents in the real, physical world, where you have noise and imperfect information to confront.

Of course, a plausible alternate explanation is that AlphaGo felt like it needed to make risky moves to catch up.
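A toy sketch of that failure mode (purely illustrative; the numbers and the policy are made up, and this has nothing to do with AlphaGo's internals): a policy that plays well on states it knows, but once a single slip pushes it off its training distribution, later moves are close to coin flips, so errors compound with the horizon.

```python
import random

# Toy compounding-error simulation (illustrative only): a policy that is
# near-perfect on familiar states, but after one slip it lands in
# unfamiliar states where every subsequent move is a coin flip.

def play_episode(horizon, slip_prob, rng):
    off_distribution = False
    errors = 0
    for _ in range(horizon):
        if off_distribution:
            errors += rng.random() < 0.5   # unfamiliar state: near-random play
        elif rng.random() < slip_prob:
            errors += 1
            off_distribution = True        # one slip, and we never recover
    return errors

rng = random.Random(0)
episodes = 2000
short = sum(play_episode(50, 0.02, rng) for _ in range(episodes)) / episodes
long_ = sum(play_episode(200, 0.02, rng) for _ in range(episodes)) / episodes
print(short, long_)   # the longer horizon accumulates disproportionately more errors
```

The per-step slip rate is the same in both runs; only the horizon differs, which is why long games are where this failure mode bites hardest.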


The same happens to people, especially people that study theory. You can totally throw them off their game by making a non-standard move, even a relatively bad one as long as it breaks their existing pre-conceived notions about how the game should progress.

Of course against a really strong player you're going to get beaten after that but a weak player strong on theory will have a harder time.


That's one of the reasons Garry Kasparov gives for his loss to Deep Blue.


Well, also that the machine could study his style of play, whereas he was playing a stranger, one whose style changed between games thanks to humans hand-altering the logic. Not really fair. At least, different from how humans play chess.


To be fair, Lee Sedol was also changing style during the game. The opening for game 4 is heavily based on the one for game 2, except when he made a mistake.


Do you have a link toward this interview?



Wait, in the interview he says the opposite: "That move had no impact on me whatsoever. Move on."


Thank you very much.


What you are talking about here is called "label bias". [2] It is present only if training is done badly.

When you have a game of Go, or a Super Mario level, you don't want to make your decisions by just checking the local features and acting on them, because by compounding errors you can end up in a state you never saw, and all of the future decisions won't be good.

One can avoid these situations by training jointly over the whole game.

For example, maximum entropy models can work for decision-making problems, but their training leaves them with label bias because the training minimizes the loss of local decisions, instead of minimizing the future regret of the current local decision.

The solution to these label bias problems is Conditional Random Fields, or Hidden Markov Models. You could accomplish the same with recursive neural networks if you trained them properly. For example, there is no search component (Monte Carlo tree search, or dynamic programming [Viterbi] as in CRFs or HMMs) in RNNs, but they are completely adequate for decision-based problems (sequence labeling etc.). Why is that the case? Because search results are present in the data; there's no need to search if you can just learn to search from the data.

If DeepMind open-sourced the hundreds of millions of games that AlphaGo played, it would be quite possible to train a model that wouldn't need Monte Carlo search and would work quite well, because you would train the model to make local decisions that minimize future regret, not local loss. [1]

The only reason reinforcement learning is used is that there are too few human games of Go available for the model to generalize well. Reinforcement learning can be used in a joint-learning setting because you play out the whole game before you do the learning. This means you can try to learn a classifier that minimizes regret by making the proper local decision. Although, as far as I know, and as far as I can see from the paper, they didn't train AlphaGo jointly over the game sequence.

But! Now they have a lot of data and they can repeat the process.

[1]: http://arxiv.org/abs/1502.02206

[2]: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162...
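To make the "per-state normalization" point concrete, here is a minimal toy of my own construction (not taken from either cited paper): a state with a single outgoing transition wins under locally normalized (MEMM-style) scoring even though globally normalized (CRF-style) path scores say otherwise.

```python
import math

# Toy label-bias demo. Raw transition scores for a two-step decision:
#   S -> A (0.9), then A -> X (0.0, A's ONLY option)
#   S -> B (1.0), then B -> X (2.1) or B -> Y (2.0)
scores = {
    ("S", "A"): 0.9, ("S", "B"): 1.0,
    ("A", "X"): 0.0,
    ("B", "X"): 2.1, ("B", "Y"): 2.0,
}
paths = [("S", "A", "X"), ("S", "B", "X"), ("S", "B", "Y")]

def local_prob(path):
    """MEMM-style: normalize scores per state, multiply along the path."""
    p = 1.0
    for s, t in zip(path, path[1:]):
        siblings = [v for (a, _), v in scores.items() if a == s]
        p *= math.exp(scores[(s, t)]) / sum(math.exp(v) for v in siblings)
    return p

def global_score(path):
    """CRF-style: sum raw scores over the whole path, normalize once globally."""
    return sum(scores[(s, t)] for s, t in zip(path, path[1:]))

best_local = max(paths, key=local_prob)
best_global = max(paths, key=global_score)
print(best_local, best_global)
# Local normalization favours S->A->X: A's single outgoing transition gets
# probability 1 regardless of the evidence. Global scoring favours S->B->X.
```

State A "absorbs" probability mass purely because it has fewer outgoing transitions, which is the label-bias effect in miniature.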


I don't think we're talking about the same concept. I'm not familiar with the concept of label bias, and the literature I'm familiar with has not referred to label bias as the problem I'm talking about. Also, I'm not sure how a problem with probabilistic graphical models translates to the neural net policies of AlphaGo.

I fail to see how a "per-state normalization of transition scores" translates to there being a bias in value networks towards states with fewer outgoing transitions.


Yes, the "label bias" is more of a structured learning / joint learning term that is present in natural language processing. But reinforcement learning suffers only if you do the learning to minimize local loss of the decision (label) - if you try to build a classifier that minimizes its loss on local decisions, instead on sequence of decisions.

Their value/policy network isn't trained jointly and can compound errors. There are approaches with deep neural networks that don't use joint training but work pretty well; the reason is that the networks have a pretty good memory/representation, which lets them avoid many of the problems. But for huge games like Go, it is quite possible that more games need to be played for these non-structured models to work well.


Again, I don't think we're talking about the same concept. I also fail to see how training over an entire trajectory is going to help you with trajectories you've never seen. Also, these nets are definitely trained with discounted long-term rewards.


They train using trajectories but train the network to guess the trajectory locally, not globally. Discounted long-term rewards are just a hack; they aren't joint learning.

The concept of label bias, or decision bias is a joint/structured learning concept. It is a machine learning concept, it has nothing to do with the application. There are training modes with mathematical guarantee that the local decisions will minimize the future regret.

Joint learning is done not on the whole permutation but on the Markov chain of decisions, which is sometimes a good enough assumption. For example, the value/policy network of AlphaGo is precisely a Markov chain: given a state, tell me which next state has the highest probability of victory. The search then tries to find the sequence of moves that maximizes the probability, and then it makes the best local decision (one move). It works like limited-depth minimax or beam search. They do rollouts (play the whole game) to train the value network, but it is a question whether they train it to minimize the local loss of the decisions made, or to minimize the future regret of a local decision. As I've stated before, minimizing the joint loss over the sequence versus minimizing the local loss of each decision is exactly what determines whether there will be bias or not.

The whole point of reinforcement learning is to create a huge enough dataset to overcome the trajectories-not-seen problem. The training of the models for playing Go is entirely a whole different kind of a problem.

Now when they have hundreds of millions of meaningful games they can skip the reinforcement learning and just learn from the games.

The illustration of the "label bias" problem is available in one of the sources I referenced. Terms like compounding errors and unseen states appear there. "Label bias" is present only in discriminative models, not generative ones. Which means that AlphaGo, being a discriminative model, can suffer from label bias if it wasn't trained to avoid it.


Yes, I'm pretty sure we're not talking about the same thing. I'm precisely talking about the trajectories not seen problem. Nothing is going to save you from the fact that the net has not seen a certain state before.


That's not really a problem. Given a large enough dataset you want to generalize from it - there are always states not present in the dataset - the whole point is now to extract features out of your dataset to allow generalization on unseen states. Seeing all of the Go games isn't possible.

The compounding errors problem that stems from decision bias isn't because you haven't seen the trajectory, it is because the model isn't trained jointly.

We're talking about the same thing. You just aren't familiar with the difference between jointly learned discriminative models and local-decision classifiers (maximum entropy Markov models vs conditional random fields; or recursive neural networks trained on a joint loss over the sequence vs recursive neural networks trained to minimize the loss of each local decision).

In the case of Go, one would try to minimize the loss over the whole game, or over the local decisions made during the game. The latter will result in decision bias, which leads to compounding errors. Joint learning has a guarantee that the compounding error has a globally sound bound. (The proofs are information-theoretic and put mathematical guarantees on discriminative models applied to sequence labelling, i.e. sequential decision making.)

edit:

Check out the lecture below; around the 16-minute mark it has a Super Mario example and describes exactly the problem you mentioned. The presenter is one of the leading figures in joint learning.

https://www.cs.umd.edu/media/2015/12/video/17235-daume-stuff...

It is a completely supervised learning problem. But look at reinforcement learning as a process that has to generate meaningful games from which a model can learn. After you have generated a bazillion meaningful games you can discard the reinforcement and just learn. You then try to get as close to the globally "optimal" policy as you can, instead of trying to go from an idiot player to a master.

Of course, the data will have flaws if your intermediate model plays with a decision bias. So, instead of training the intermediate model to have a bias, train it without :D


Yes, Hal Daume is referring to the issue I brought up. I'm not interpreting his comments as referring to issues with training the model jointly - he's referring to exactly what I'm describing - never having even seen the expert make a mistake. The only solution is to generate trajectories more intelligently (which is in line with Daume's comments).


Yes, it is true. In the case of Super Mario he does the learning by simulating level-K BFS from positions that resulted in errors (unseen states) and thus minimizes the regret for the next K moves.

Although, if you check out his papers: for the problems I've talked about, even when you have more than enough data and you know you should be able to generalize well, you can still get subpar performance if you don't optimize jointly. The AlphaGo model isn't optimized jointly, but its power mostly lies in the extreme representational ability of deep neural networks.


This is great stuff, thank you.


Both are plausible for humans as well, I would say (in a more general sense). Certainly mistake spirals from not quite knowing how to deal with the consequences of the first mistake have happened to me personally, and I recall descriptions of people in poverty having to play more aggressively to get a shot at the part of civilization they want to be in, though unfortunately I don't have a source.


It may well also depend on not-so-"exact" forward pruning.


It is just a complex system leaving one of its basins of attraction to try to reach another one (probably because of a risk evaluation that deemed it worthwhile).

And for a complex system, transitions happen to be sensitive to conditions, with quite a lot of unpredictability.

AIs cannot do smooth transitions because they lack the intuition of what smooth means, and that's how to win against them:

1) identify a basin of attraction (the appearance of a bounded domain of evolution in phase space); 2) set the AI in a well-known basin by tricking it; 3) unbalance the AI by throwing garbage behaviour at it that kicks it out of the basin in a random direction; 4) let the human win in the chaos that ensues.

Of course it is better done with software to help you: a real-time phase-space analyser.

The point is, as in a lot of domains, constructing an AI requires more energy than software for sabotaging it.

But once you get the framework of thought for beating one AI, you can beat them all.


There is a possibility that the AlphaGo team, knowing that it has already won a decisive victory, wanted to spare the champion a crushing 5-0 defeat and so commanded the AI to lose on purpose.


The AlphaGo team has stated that they no longer have the ability to make changes to the AI used for the match. The rules of the exhibition match prohibit that.

They maybe could have anyway, but it would be cheating: just the same as if they'd Mechanical Turk'ed it by e.g. having Ke Jie actually choose the moves to play.


Yesterday I kind of entertained the thought that they could have tried to make AlphaGo waste a move somewhere at the beginning of the game to give Sedol the equivalent of a stone advantage and see how AlphaGo would handle that. But it's clear that any move like that would have been immediately detected by the professionals, who can read the game the way I read the morning newspapers, so no.


I don't know why you're being downvoted. The fact that the human won makes it so much more interesting now! It surely is the best possible outcome, both for human Go players and for Google (having won the first 3 already).


Nah, they let the European champion lose 5-0.


Incredibly unlikely.


In the post-game press conference I think Lee Sedol said something like "Before the matches I was thinking the result would be 5-0 or 4-1 in my favor, but then I lost 3 straight... I would not exchange this win for anything in the world."

Demis Hassabis said of Lee Sedol: "Incredible fighting spirit after 3 defeats"

I can definitely relate to what Lee Sedol might be feeling. Very happy for both sides. The fact that people designed the algorithms to beat top pros and the human strength displayed by Lee Sedol.

Congrats to all!


Yep! As someone who was rooting for DeepMind, I like this result, for two reasons: Lee Sedol earned it - he's behaved like a true sportsman all the way - and it gives us some interesting information (yesterday we only had a lower bound on AlphaGo's strength; today we also have an upper bound).


>yesterday we only had a lower bound on AlphaGo's strength; today we also have an upper bound

I think it's premature; establishing bounds with a good confidence interval requires tens or hundreds of games. Specifically, a 3:2 result would be really inconclusive.


The first data point tells you more about bounds than any individual subsequent data point. Knowing that a human can defeat AlphaGo is enormously important.

To use an analogy, having confirmation of contact by even a single alien species would be hugely important, way more so than exactly nailing down the number of alien species. Knowing that something is even possible is oftentimes the most important aspect that needs to be ascertained, and contact (or a win, in Lee's case) does that unequivocally.


Back when chess was "the" problem, humans could beat computers too from time to time. Now chess is solved. The same will likely happen to Go in the years to come. But the point of this was not Go itself; it was a demonstration of the potential capabilities of neural networks and other machine learning algorithms.


Chess isn't solved. It's just that computers have gotten sufficiently better at it than humans that humans don't have a chance.

If Chess were truly solved, then you wouldn't be able to make a new AI program that could do better than even odds against the existing ones. But that's not the case, and incremental advancements in Chess-playing programs are made all the time. There are even tournaments where Chess programs play each other. If Chess were solved, such a thing wouldn't make any sense, just like how there are no Tic-Tac-Toe tournaments because that game is solved.


Even that isn't strong enough. The game is deterministic.

If chess were _solved_, we'd know a strategy allowing one of the players (likely White) to always win, or for either of them to always force a draw (and we'd know which of these is the case for chess).

Consider: say there is a first move White could choose such that no matter what moves Black makes, White wins. Then the first case holds. Instead, suppose there is no such move, and every first move leads to positions where either side could win; if some of those allow Black to force a win, then the first case holds again, but for Black. Otherwise, a draw can always be forced. These are the only possible outcomes for a solved game of chess.


I've read some articles in the Korean press suggesting the AlphaGo team picked Lee as their opponent, and not Ke Jie (currently ranked #1, 19 years young), because there are a lot more public records of Lee's play over his much longer pro career (nearly 20 years now). Thus more material for AlphaGo to train with and against.

So it's not totally arrogant of Ke Jie to suggest he could beat AlphaGo. AlphaGo has not much 'experience' in dealing with Ke Jie.

And Lee's win in game 4 shows a human is still indeed more capable than any AI. He basically reprogrammed his game on his own. Sorta.


That was specifically discussed in the post-match conference today. Demis said that it wasn't trained specifically against Lee, but was trained initially on games of strong amateurs on the Internet before playing against itself. He said that even if they'd wanted to, AlphaGo requires a much larger number of games than are available from Lee Sedol's record, so it wouldn't have been possible anyway.


It was discussed pre-match for game 2 or 3 as well, and Lee's games were described as "a drop in the ocean" I believe.

It came up because of an assertion from the interviewer that the chess AI had been trained specifically against its opponent.


Err, so if AlphaGo wins game 5, does that show 'an AI is still indeed more capable than any human?'


Exactly. The faint whiff of bullshit.


Ke Jie just happens to be #1 right now. Lee Sedol is more impressive in that he's been a world-class Go player for much longer. Twenty years from now, if Ke Jie is as dominant as Lee Sedol is now, then he'd be a good pick then. But either way it's not a big deal; there's not that much daylight between the two, so picking the more famous and well-known one to go up against is a safe bet.

Personally I want to see a discussion game with the top Go experts (including both Lee Sedol and Ke Jie amongst others) competing against the next version of AlphaGo, in a game with much longer time limits.


I don't think top pros view it that way. He is of a new generation that play thousands of games over the internet; that study the game socially, while Lee is much more a loner. Whether Ke Jie will dominate as much as Lee did will say more about the strength of his competitors than his strength vs Lee.


Another reason could be that Lee is a much bigger name in go than Ke Jie. Sure, Ke Jie is stronger now but Lee was a dominant/top player for a long long time.


Many reinforcement learning techniques require data sets in the billions to make much sense of it all. This is likely why the AI had to play against itself so much: there simply wasn't enough data in general, even including all recorded games.

So Lee's games are sort of a drop in the bucket as far as the performance of the AI goes.


Interesting point here: if you are the best Go player in the world, and you play against an AI that can beat you, it seems really possible that it makes you that much better. It would be interesting to see what the win ratio would be after match 10, match 50 or so on.


My friends and I (many of us enthusiastic Go lovers/players) have been following all of the games closely. AlphaGo's mid-game today was really strange. Many experts have praised Lee's move 78 as a "divine-inspired" move. While it was a complex setup, in terms of the number of searches I can't see it being any more complex than in the games before. Indeed, because it was very much a local fight, the number of possible moves was rather limited. As Lee said in the post-game conference, it was the only move that made any sense at all, as any other move would quickly prove fatal after half a dozen or so exchanges.

Of course, what's obvious to a human might not be so at all to a computer. And this is the interesting point that I hope the DeepMind researchers will shed some light on for all of us, after they dig into what was going on inside AlphaGo at the time. We'd also love to learn why AlphaGo seemed to go off the rails after this initial stumble, making a string of indecipherable moves thereafter.

Congrats to Lee and the DeepMind team! It was an exciting and I hope informative match to both sides.

As a final note: I started following the match thinking I was watching a competition of intelligence (loosely defined) between man and machine. What I ended up witnessing was incredible human drama: Lee bearing incredible pressure, being hit hard repeatedly while the world watched, sinking to the lowest of lows, and soaring back up to win one game for the human race. Just an incredible up and down in the course of a week. Many of my friends were crying as the computer resigned.


Given that 78 was a "divine-inspired" move, and Demis tweeted that AlphaGo's mistake was move 79, my guess is that Black 78 was not in AlphaGo's search tree at all. It had pruned that move as being too "unlike what a human pro would do", and had not considered play beyond that point. Then it had to suddenly backtrack, drop its entire search tree, and start rebuilding from scratch. It took only a couple of minutes to do so, but that's still not enough time to build up the depth and breadth of tree it had been constructing up to that point in the game. So it ended up playing a suboptimal move simply because Lee Sedol played a move that was simultaneously brilliant/strong and novel/unexpected, and it had to respond somehow.
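A cartoon of that hypothesis (this is not AlphaGo's actual code, and the move labels, including "wedge_78", are made up for illustration): if the root only expands moves with a high policy-network prior, all the search effort piles up under "pro-looking" moves, and an opponent move outside that set forces a cold restart.

```python
import math

# Toy sketch of prior-based pruning in a search tree (illustrative only).

class Node:
    def __init__(self):
        self.children = {}   # move -> Node
        self.visits = 0

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def expand_top_k(node, priors, k):
    """Expand only the k moves the policy net considers most plausible."""
    for move in sorted(priors, key=priors.get, reverse=True)[:k]:
        node.children[move] = Node()

def fake_search(node, budget):
    """Stand-in for MCTS rollouts: spread the budget over expanded children."""
    for child in node.children.values():
        child.visits += budget // len(node.children)

moves = ["a", "b", "c", "d", "wedge_78"]
logits = [2.0, 1.5, 1.0, 0.5, -4.0]        # wedge_78: "no pro would play this"
priors = dict(zip(moves, softmax(logits)))

root = Node()
expand_top_k(root, priors, k=3)
fake_search(root, budget=3000)

print("wedge_78" in root.children)         # False: pruned, never searched
# Opponent plays it anyway: the new root starts with zero accumulated search.
new_root = root.children.get("wedge_78", Node())
print(new_root.visits)                     # 0
```

All 3000 simulated rollouts sit under the three high-prior children, none of which is reachable once the pruned move is actually played, which matches the "drop the tree and rebuild from scratch" story above.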


The problem may well have occurred before then, too. If move 78 was incorrectly pruned from the tree for being unlikely, then AlphaGo had many turns in which it should have been considering counter-moves to move 78 (and thus prevent the situation that allowed move 78 to even happen in the first place), but it didn't.


>"divine-inspired" move

Just letting others know, this expression is a rather common way of saying a single move that changed the course of the game. There were "divine-inspired moves" that AlphaGo made in the first three games too.


Well it's not exactly common. It's used to describe the very best moves at the highest level of play.


I mean yeah, the decisive move _I_ make when I play my noob games is hardly divine, but in top level play, it happens every game.


AlphaGo's mid game today was really strange. Many experts have praised Lee's move 78 as a "divine-inspired" move.

Add to that the moves where AlphaGo basically threw away stones by adding to formations that would be removed from the board. Even I, a complete, lousy amateur, could see that they were a mistake.


To be fair, those moves were made when AlphaGo was already behind. It's just not any good at dealing with being that far back. The AI just has no concept of what to do while behind: What a human would do is to go for positions that are very complicated, making the chances of sloppy play much higher. Instead, it makes moves that have to be answered in only one or two ways, but that are very easy to read by even an amateur human.

Training an AI to play well in a bad situation would require training that is very different from the AlphaGo vs AlphaGo self-play it spent so much time on. And why do that, instead of trying to get better while the game is even, or when it's winning?

It's a bit like how it's different to train in chess to play in pro games, vs training to hustle amateurs in the park: You are not making the best move, but a good move that will confuse the opponent the most. You are trying to exploit a bad opponent: Very different play.


Crying tears of joy I hope... or are your friends computers?


So AlphaGo is just a bot after all...

Toward the end AlphaGo was making moves that even I (as a double-digit kyu player) could recognize as really bad. However, one of the commentators made the observation that each time it did, the moves forced a highly-predictable move by Lee Sedol in response. From the point of view of a Go player, they were non-sensical because they only removed points from the board and didn't advance AlphaGo's position at all. From the point of view of a programmer, on the other hand, considering that predicting how your opponent will move has got to be one of the most challenging aspects of a Go algorithm, making a move that easily narrows and deepens the search tree makes complete sense.


Or maybe it was just the computer version of grasping at straws. None of the future gamestates looked good, so it ended up picking whatever could at least theoretically lead to a comeback, even if that would require Lee Sedol to miss a completely obvious move.


This looks like a possibility to me. I'm not sure that AlphaGo has any idea how strong an opponent it is facing at any given moment. It's played games against versions of itself and it's played games against professionals. But if a set of complex advanced moves is still likely to lead to a narrow defeat, it might well pick some stupid moves that in theory could allow it to win back a dominant position if the opponent plays badly. Since it doesn't have a model of the competence of its opponent, that might appear to be a viable strategy, because against some opponents, and in some past games it's played, that could work.

This is really interesting, because forming a model of our opponent and tailoring our strategies appropriately is fundamental to how humans approach competitions.


> Toward the end AlphaGo was making moves that even I (as a double-digit kyu player) could recognize as really bad.

Maybe its training was focused on winning, not on losing narrowly. So as soon as it became obvious it couldn't win, it started making silly moves, because that was a less-explored scenario.

Some people when they see they can't win they do silly moves just for fun.


They're not really silly moves, it's just that in order to get into a good state for AlphaGo, Lee Sedol would have to miss the obvious response to those moves.


If he had made a dumb move and AlphaGo capitalized on it we would be talking about how the AI goaded him into a sense of safety.


If he had made a dumb move in response then we would be talking about how he made a really really dumb move, that even an amateur could have seen.

The moves made by AlphaGo there were very bad.


If the situation was unwinnable then in a certain sense all moves are equally bad.


This is not the reason. Monte Carlo programs typically play bad-looking moves in way-ahead / way-behind situations. In these situations "bad" moves will sometimes not alter the win probability they measure, so they don't know how to distinguish a more natural move from one that looks bad. This was mentioned in the commentary when one of the DeepMind guys came in.

Go programmers have taken various steps to mitigate this behavior, such as dynamically adjusting komi to trick the engine into thinking it is a closer game, but I don't know if AlphaGo uses any such technique.
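As a toy illustration of both points above (invented numbers, nothing to do with AlphaGo's actual implementation): when every candidate move rolls out to a near-zero win rate, pure win-probability search can't separate a natural move from a silly one, while a hypothetical dynamic-komi shift restores the signal.

```python
import random

random.seed(0)

def rollout_winrate(margin_behind, komi_shift=0.0, n=1000):
    # Toy rollout: final score margin = move's expected margin + noise.
    # A "win" is any rollout whose (komi-shifted) margin ends up positive.
    wins = 0
    for _ in range(n):
        margin = margin_behind + random.gauss(0, 10)
        if margin + komi_shift > 0:
            wins += 1
    return wins / n

# Far behind, a natural move (down ~20 points) and a silly move (down ~35)
# both score near-zero win rates -- the search can't tell them apart.
natural = rollout_winrate(-20)
silly = rollout_winrate(-35)

# A dynamic-komi trick (+20 points) makes the game look close again,
# and the two moves separate cleanly.
natural_k = rollout_winrate(-20, komi_shift=20)
silly_k = rollout_winrate(-35, komi_shift=20)
print(natural, silly, natural_k, silly_k)
```

Both un-shifted win rates come out near zero, while the shifted ones differ by a wide margin, which is exactly the behavior the komi trick is trying to exploit.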


A bit like getting a confused parser back on track. We've witnessed the first digital cold sweat.


I'm curious - is this the same thing that it was doing in the matches that it won? People have been talking about how it seemed to be toying with its opponent towards the end of those matches, making moves that didn't gain it much, but maybe it just doesn't actually understand what moves to play some of the time and it simply hasn't mattered before.


I doubt its loss function tries to maximize the prediction of the opponents next move.


I doubt it's considering that directly. What I'm guessing (without having really studied the paper in depth) is that if you have two board positions which would score identically in the end, but where one has fewer remaining liberties than the other (i.e. one is deeper into the tree), then it will be scored more highly.

In other words, the humans commentating the game were evaluating the moves as non-sensical because the outcome (AlphaGo plays here so Lee plays here) is a foregone conclusion and doesn't change the human evaluation of the board position. What it does do is remove uncertainty (AlphaGo plays here, but Lee screws up and plays somewhere else). In their evaluation, humans tend to value that uncertainty (i.e. counting on the possibility of a mistake), but I'd guess that AlphaGo penalizes the uncertainty (i.e. known board positions are scored higher than potential board positions), leading it to over-value simple advancement of the board in the end-game.


The bigger the game board, the more possible states, and the better chance a human player has against the AI. Is this the same heuristic in play here: play a game with more unknowns, gain an edge?


Do you think Lee could use this as a way to crack AlphaGo?


From my limited knowledge of the game, a few of the moves that Lee made before AlphaGo "lost its mind" were a tad on the aggressive side. The conventional wisdom in Go is to prefer more conservative moves (increasingly so as the game progresses). Usually, if your opponent is being overly aggressive then you want to play more conservatively and wait for them to make a mistake, but in AlphaGo's case, it attempted to match Lee's aggressiveness move-for-move, and Lee was able to capitalize.

If I had to guess (and this is pure speculation), AlphaGo has no concept of waiting for its opponent to make a mistake. Instead, it assumes its opponent will continue to make the best possible follow-ups, and so AlphaGo feels overly compelled to "keep up". In this case, that did it in.

If this is what happened, then yes, I would expect Lee to be able to capitalize.


> If I had to guess (and this is pure speculation), AlphaGo has no concept of waiting for its opponent to make a mistake. Instead, it assumes its opponent will continue to make the best possible follow-ups

One of the DeepMind guys just confirmed that this is how AlphaGo operates in the press conference.


"When you're up against someone smarter than you are, do something insane and let them think themselves to death."

-- Pyanfar Chanur (C.J. Cherryh)


Which makes sense. It doesn't even have a concept of a human opponent or human-style mistakes - it plays against Lee just like it plays against itself when training.


This is by design and acknowledged by the developers: it plans only for what it judges to be optimal opponent moves. A modification might be to keep in reserve a set of options for non-optimal opponent play, although in practice the simplicity of the design (maximize the probability of winning; plan only for optimal opponent moves) may be one reason it plays as well as it does.


In that case, theoretically at least, we could train AlphaGo by getting top Go players to play many games against each other where one or both players is making very aggressive moves when reasonable?


Well, that's what makes AlphaGo, and the progress in Go AI it represents, so exciting! The game of Go is so fluid, with such a huge number of possible positions, that players tend to adopt certain styles of play en masse. I've heard it said that you can identify a Go player's mentor or "house" just by the style of play they use.

This has also resulted in larger shifts in playing style over time. Studying very old (and I mean very old...700+ years old) games can be entertaining and even educational in the abstract, but you won't want to directly adopt the style of play because the game has evolved.

It's already been mentioned a couple of times that AlphaGo almost certainly represents just such a shift. Top players will learn from it, and I'd even be willing to bet they will beat it with some regularity once they do!

Ultimately, what sets apart Go geniuses is their ability to play creatively in the face of seemingly insurmountable challenges. So the big question is how "creative" AlphaGo can be. Is it merely synthesizing strong play from known positions? Can it introduce novel strategies? And if it does, will it be able to adjust as other Go masters adjust to it and bring their own brand of creativity to play?

To answer your original question, this very well could introduce a new era of more aggressive play to the world of Go. Only time will tell...


AlphaGo will learn from any new styles and apply them effectively without mistakes from fatigue or inattention.

This incarnation of AI is not creative, it wont generate new play styles, that is still the domain of top human players for now. But it will ruthlessly learn and adopt any new and improved strategies. That's really the point to take away from its success so far.


AlphaGo mostly plays against itself, meaning it learns in a very separate environment. It certainly might come up with novel strategies.


I suppose that's possible. However Demis Hassabis has said many people have noted that AlphaGo makes human-like moves. He commented that it made sense since AlphaGo taught itself from the games of human players.


Yes, the starting kernel of its learning this time was a collection of human games. This has caused the Policy Network (which says "given this board state, these moves are worth investigating in more detail") to be biased towards more human-like moves.

But they're already working on a new version of AlphaGo which isn't trained on any human data at all. It starts by making truly random moves and improves from there. This will require much more processing time and probably an order of magnitude more "self-play", but it will probably result in truly novel strategies that aren't part of the current human metagame.


It could also be that human-looking moves are already close to the best possible. Neither AlphaGo nor human players can play god's hand, but they're approaching the same location.


Is Go-Space understood well enough to know either way?

The OMG-AI people claim that AGI would be dangerous because it would reliably innovate in new spaces and out-predict humans.

So a true super-AGI would make go moves that were unexpected and incomprehensible with some percentage of misleading fake-outs, but it would still win most or all of the time.

If the human exploration of Go-Space is close to the god's hand bounds, this can't be true.


My intuition (and it's really only just that) is that Go space is large enough that AI would be able to outplay humans while still not even beginning to approach "perfect" play. If so, then I would also expect that humans should be able to follow the lead of AI into new areas of Go space, and outplay the AI (at least until the AI has a chance to learn and catch up).

We'll know if this is the case in a couple of years, if the competition between human and AI goes back-and-forth (unlike Chess, where after AI was good enough to beat humans, it could do so reliably).

Either way, it's interesting to note that AlphaGo had literally thousands of games to learn from to find weaknesses in human play, but Lee Sedol seems to have only needed 3 before he was able to find weaknesses in AlphaGo's play.


> Either way, it's interesting to note that AlphaGo had literally thousands of games to learn from to find weaknesses in human play, but Lee Sedol seems to have only needed 3 before he was able to find weaknesses in AlphaGo's play.

To be fair we can't know how many games Sodol played in his own head to figure this out.


Step 1 of exploiting this is "play a brilliant move to win a fight in the middle game", which is not the easiest thing to repeat.


The crucial play here seems to have been Lee Sedol's "tesuji" at White 78. From what I understand, this term in Go means something like "clever play": sneaking up on your opponent with something they did not see coming. The DeepMind CEO confirmed that the machine actually missed the implications of this move, as the calculated win percentage did not shift until later. https://twitter.com/demishassabis/status/708928006400581632

Another interesting thing I noticed while catching the endgame is that AlphaGo actually used up almost all of its time. In professional Go, once each player uses their original (2 hour?) time block, they have 1 minute left for each move. Lee Sedol had gone into "overtime" in some of the earlier games, and here as well, but previously AlphaGo still had time left from its original 2 hours. In this game, it came quite close to using overtime before resigning, which it does when the calculated win percentage falls below a certain threshold.


To underscore hyperpape:

Tesuji isn't a trick play, it's more like a power play. Each player can read out how a fight is going and see their line far into the future. Two professionals will pick two lines, two suji, which are in balance and push up against one another tightly.

A tesuji is a part of the line which is suddenly showy or strong. It could mean a failure for the opponent if they had not taken enough of an advantage in the struggle to this point or if they do not have a counter tesuji available.

Indeed, that might be the design of a set line: one side continually loses ground to the other forcing the other to take these small advantages all so that the first side has an opportunity to play a tesuji and return to balance. Many such lines are canonicalized ("joseki") and known to any professional. Moreover, professionals regularly identify potential tesuji and expect their opponents to as well.


Tesuji has no implication that your opponent won't anticipate it. Both sides can know that the tesuji is there, and it's still a tesuji.


AlphaGo was in over time in game 2.


To add to this, it seemed to have no problem making moves within the 60-second window (at that late stage of the game).


I suspect that in almost all cases it has already converged on a move well before sixty seconds, and so if forced by time pressure to choose a move before it has achieved the confidence it's looking for, it will probably still do the right thing most of the time (especially in the endgame, when the branching factor is cut down enormously). It just uses additional time because it has it. I would love to see a graph of AlphaGo's thinking: its certainty over time, how long it was taking to come up with the move it ultimately ended up playing, and how that certainty evolved over its thinking period.


Another way to look at this is just how efficient the human brain is for the same amount of computation.

On one hand, we have racks of servers (1920 CPUs and 280 GPUs) [1] using megawatts (gigawatts?) of power, and on the other hand we have a person eating food and using about 100W of power (when physically at rest), of which about 20W is used by the brain.

[1] http://www.economist.com/news/science-and-technology/2169454...


Definitely not anything close to gigawatts: ALL of Google (not DeepMind, but the entire company) uses only about 260 megawatts.

Probably on the order of one megawatt or so.

http://inhabitat.com/infographic-how-much-energy-does-google...


Still, wow! And a human is using 100W or so: (~400 max) https://sustainability.blogs.brynmawr.edu/2012/07/31/underst...


And don't forget, the human grows organically from a complex system of DNA that also codes for way more than playing Go! And is able to perform a lot of tasks very efficiently, including cooperating together on open ended activities.


And building an AI that can play better than itself :)


On the other hand, they've only been working on Alphago for two years.


We can estimate the power consumption: 1920 CPUs as in cores, or physical packages? If the latter they are ~100W each, so that's 192kW; if it's the former, depending on how many cores per package, a fraction of that. The GPUs are likely to be counting physical packages (and not cores, which is much more) and they draw 300-400W each, for a total of ~300kW. Add a bit of overhead and I'd say 500kW (half a megawatt) is a good rough estimate.
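Checking that estimate with quick arithmetic (the per-package wattages are the parent comment's assumptions, not measured specs):

```python
# Back-of-the-envelope check on the power estimate above.
cpu_watts = 1920 * 100   # ~100 W per CPU package -> 192 kW
gpu_watts = 280 * 350    # 300-400 W per GPU card, take the middle -> 98 kW
total_kw = (cpu_watts + gpu_watts) / 1000
print(total_kw)          # ~290 kW before cooling/PSU overhead,
                         # consistent with the ~500 kW rough estimate
```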


The figures quoted for the cluster are about 25 times higher than those quoted for a single machine, so I would guess the cluster consists of 25 machines. 20 kW, or 166 amperes at 120 volts, per machine seems a bit high to me.


Sure, but AlphaGo will probably evolve much faster. In a few years it will run on much smaller devices, as happened to chess programs.


Exactly - the watts comparison is a bad one. Stockfish running on an iPhone (~5 watts?) can play world class chess.


A single A15 core at around 1GHz has more GFLOPS of compute than Deep Blue had across its whole system (11.38 GFLOPS).

1920 CPUs (a 4-core Haswell from 2013 is around 170 GFLOPS) and 280 GPUs (a previous-gen Nvidia K-series card peaks at around 5200 GFLOPS). That's 1,782,400 GFLOPS, or around 150,000x more processing power. If they were running latest-gen hardware, they would be closer to 200,000x faster.

Given that Moore's law is slowing down and the size of the system, we're a long way from considering putting that in a smartphone.
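The arithmetic checks out (all figures are the parent comment's estimates, not official specs):

```python
# Verifying the GFLOPS comparison above.
deep_blue_gflops = 11.38   # Deep Blue, entire system
haswell_gflops = 170       # one 4-core Haswell CPU (2013)
k_series_gflops = 5200     # one previous-gen Nvidia K-series GPU

cluster_gflops = 1920 * haswell_gflops + 280 * k_series_gflops
speedup_vs_deep_blue = cluster_gflops / deep_blue_gflops
print(cluster_gflops)               # 1782400
print(round(speedup_vs_deep_blue))  # ~157000, i.e. "around 150,000x"
```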


People are focusing way too much on the current hardware that AlphaGo happens to be running on.

AlphaGo is still a very new program (two years since inception). It will get significantly better with more training, or, equivalently, it will stay at the same strength while running on much less hardware.

Don't read too much into what one particular snapshot in its development cycle looks like. Humanity has had hundreds of millions of years to maximize the efficiency of the brain. AlphaGo has had two years. It's not a fair comparison, and more importantly, it's not instructive as to what the future potential of AI algorithms looks like.


It should use less than 0.5 megawatts: 1920 * 150W (high-end server CPUs) + 280 * 300W (Nvidia Tesla cards).


We need to keep our brains running 24x7x365, though, so in the long run, DeepMind is more efficient.


There were a few jokes made during the round about how AlphaGo resigns. Turns out it's just a popup window! http://i.imgur.com/WKWMHLv.png


For anyone wondering, that's Ubuntu Linux (version most likely 14.04; Unity interface).

off-topic: DeepMind should switch to a tiling window manager like i3 for increased keyboard-only productivity :)


The developers probably use whatever window manager they want. I'm personally a fan of i3 also.

But there is nothing wrong with keeping the frontend machine used in this Go match in default Ubuntu desktop environment since its only purpose is to play Go with a graphical user interface anyway.


I know. I was half-joking.

Also given the progress of DeepMind so far, it's very likely that whatever desktop setups they have, is working very well for them.


Probably Goobuntu. ;-)


Probably not, as they are not sitting on a corp workstation on the corp network :)


They could be remoted into one?



AlphaGo resigns

The result "W:Resign" was added to the game information.

Edit: Tinyyy is right.


According to this picture[1], it is more likely "W+Resign". I'm curious why a plus sign is used instead of a colon!

[1] http://gall.dcinside.com/board/view/?id=baduk&no=109200&page...


In go results are Color+amount of points, when counted. So the plus is then left for uniformity in resigns/forfeits


Likely they're using SGF format to store game records:

http://www.red-bean.com/sgf/properties.html#RE


Bad google+ joke withheld.


I think it reads W:Resign


No, it's "The result “W+Resign” was added to the game information"

https://www.dropbox.com/s/c5730ibejbv4wle/AlphaGo.jpg?dl=0


Usually the nomenclature is "W+Resign"


"White wins by resignation".


The title bar looks like it's GoGui, so it's just GoGui's reaction to a Go Text Protocol resignation?


Looks like it, see:

https://github.com/lukaszlew/EasyGoGui/blob/master/src/net/s...

And search for MSG_RESIGN_2


Yeah, I was worried that if the score ends up 5-0, we could never see how AlphaGo resigns. Good to see Lee Sedol's victory.


AlphaGo's weakness was stated in the press conference inadvertently: it considers only the opponent moves in the future which it deems to be the most profitable for the opponent. This leaves it with glaring blind spots when it has not prepared for lines which are surprising to it. Lee Sedol has now learned to exploit this fact in a mere 4 games, whereas the NN requires millions of games to train on in order to alter its playing style. So Lee only needs to find surprising and strong moves (no small feat but also the strong suit of Lee's playing style generally).


You are saying, its weakness is making the mistake of not looking at all the profitable moves.

The contra argument for the sake of the argument goes, he just was lucky to find a local maximum (?) outside of the search space (?) by chance, rather than learning in a few days the universal function (?) that the NN thinks solves go, or at least one fixed point (?) [i.e. the surprisingly wrong expectation].


We were discussing the probability that Sedol would win this game. Everyone, including me, bet 90% that no human would ever win again, let alone this specific game: http://predictionbook.com/predictions/177592

I tried to estimate it mathematically. Using a uniform distribution across possible win rates, then updating the probability of different win rates with bayes rule. You can do that with Laplace's law of succession. I got a 20% that Sedol would win this game.

However, a uniform prior doesn't seem right. Eliezer Yudkowsky often says that AI is likely to be much better than humans, or much worse than humans; the probability of it landing at the exact same skill level is pretty implausible. That argument seems right, but I wasn't sure how to model it formally, and so 90% "felt" right. Clearly I was overconfident.

So for the next game, if we use Laplace's law again, we get a 33% chance that Sedol will win. That's not factoring in other information, like Sedol now being familiar with AlphaGo's strategies and improving his own strategies against it. So there is some chance he is now evenly matched with AlphaGo!

I look forward to many future AI-human games. Hopefully humans will be able to learn from them, and perhaps even learn their weaknesses and how to exploit them.

Depending how deterministic they are, you could perhaps even play the same sequence of moves and win again. That would really embarrass the Google team. I hear they froze AlphaGo's weights to prevent it from developing new bugs after testing.
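For what it's worth, the Laplace's-law numbers above check out (a sketch; the uniform-prior assumption is the parent comment's, not an endorsement):

```python
def laplace_next_win(wins, games):
    # Rule of succession: uniform prior over the true win rate,
    # updated with Bayes -> P(next win) = (wins + 1) / (games + 2).
    return (wins + 1) / (games + 2)

# Before game 4 (AlphaGo 3-0): P(Sedol wins) = 1 - 4/5 = 20%
p_sedol_g4 = 1 - laplace_next_win(3, 3)

# After game 4 (AlphaGo 3-1): P(Sedol wins game 5) = 1 - 4/6 = 33%
p_sedol_g5 = 1 - laplace_next_win(3, 4)
print(p_sedol_g4, p_sedol_g5)
```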


The argument that AI is either much better or much worse does not apply here. It's not an accident that they chose this point in time to play against Lee Sedol instead of 10 years ago or 10 years in the future. They chose this point in time because they thought that they have a reasonable chance of winning.

Also, he won with white but he will play with black next time, so playing the same sequence of moves can't happen. Additionally, even if the AI didn't incorporate any randomness in the opening (I think it does) it may choose different moves if it gets a different amount of time to think, so Lee Sedol would have to play his moves at exactly the same time as the last game. A couple of seconds deviation only has to lead to a different move in one of the 80 or so moves before the mistake was made to invalidate this strategy.


> It's not an accident that they chose this point in time to play against Lee Sedol instead of 10 years ago or 10 years in the future. They chose this point in time because they thought that they have a reasonable chance of winning.

Most importantly, they chose this point in time because they know that there are several other AI research teams that are also on the right track (including at Facebook). Like circumnavigating the globe or landing on the Moon, 90% of the benefit of it is lost if someone else does it before you. So you don't wait for 100% certainty of winning -- you are maximizing for the chance of winning first, not winning for certain.

Given that, it seems reasonable that they would go after Lee Sedol when they were sure they were better than him, but not too much better than him. So a non-5-0 outcome is, in hindsight, not horribly surprising.


That kind of makes sense, however the point was more that AI progress isn't gradual and continuous. It's (sometimes) rapid and has discontinuities.

Alphago was an entirely new method, using deep convolutional neural networks as a move generator. Therefore there wasn't any guarantee that it had to be just slightly better than previous Go playing algorithms, it could have easily been far above humans.

Likewise this is also the first time Go AI has been given Google scale resources. Both in terms of a team of the best researchers working full time on it, and their computing power. Whereas previous Go projects were all hobbyist things.

And Google didn't wait 10 years to challenge Sedol. The match was arranged only a few months at most after they started developing it.


There are a lot of cases where humans are actually really close to optimal (one example is racing lines taken by F1 drivers being within tenths of a percent of perfect, don't have the link any more though sorry). In this case there are diminishing returns at play, and an AI which is a lot "stronger" than a human might still produce very similar results.


According to the commentary of both streams I was watching, after losing an important exchange in the middle (apparently move 79 https://twitter.com/demishassabis/status/708928006400581632) it seems AlphaGo sort of bugged out and started making wrong moves on an already dead group on the right side of the board. After that it kept repeating similar mistakes until it resigned a lot of moves after. But the game was already won for Lee Sedol after that middle exchange. It was really interesting seeing everyone's reactions to AlphaGo's bad moves though.


I would avoid thinking of this like a traditional computer program that just "bugged out" due to a glitch or a problem in the software. More accurately, it failed to account for the implications of a move on the board and therefore focused its attention in the wrong place. This happens often in games: I can imagine that a chess master playing an amateur might move his knight or bishop into a position that does not immediately seem threatening to an amateur, so the amateur responds by moving a pawn somewhere else on the board, while the correct move would have been to attempt to counter the threat. As should be well known by now, in Go you cannot examine the implications of all possible moves, so some things will be missed. In this case, it seems something important was missed at the time, and the implications were not realized by the program until 10 or 15 turns later.


I mean that it bugged out because the moves it made after missing the exchange in the middle were moves that were obviously wrong, not in a "maybe it's up to something" kind of way, but in an objectively 100% bad kind of way. Even if you can't analyze all possibilities AlphaGo made a number of moves that made absolutely no sense at all even to way lower level players.


> a number of moves that made absolutely no sense at all even to way lower level players.

Another possibility is that it was looking way deeper than anybody, and there was a 1% chance or turning the whole game around with those seemingly bad moves. But Lee Sedol blocked that deep move in a way nobody was able to see.


No. That is not what happened. They were genuinely bad moves.


Probably to the program, they were all bad moves, and the ones it made just hemorrhaged more slowly.


You don't have the authority to make that call unless you intimately understand how AlphaGo works. Please stop.


He's right. The 9d commentators just laughed them off. They were the kinds of moves a 25k might play hoping that his opponent would make a silly mistake. That's exactly what AlphaGo was trying to do. The moves it played did have a "slight" chance of working, if Lee Sedol had responded incorrectly, but that would have never happened. They were the kinds of moves that are insulting when an opponent plays them against me and I'm just a weak amateur player. The moves clearly had no depth to them, and everyone that understands the game agrees on that.


> it seems AlphaGo sort of bugged out and started making wrong moves

Didn't they say that it's not considered a "bug" but rather how AlphaGo "thinks"? "when it's winning it doesn't care about how much it's winning, and when it's losing it doesn't care how bad it's losing"


it does seem like it is suboptimal behavior to keep making obviously dumb plays instead of attempting a comeback like a professional human would.


"Attempting a comeback" is a completely different task that usual, though.

When you're winning, a good move has a mathematical definition; it's a move that, given optimal play by both sides, will lead to victory for you. Computers aren't powerful enough to be able to calculate exactly what moves those are, but it's at least well-defined in a way that they know what they're looking for.

When you're losing, there are zero moves that, given optimal play by both sides, will lead to victory for you (so all moves are equally "bad" in a mathematical sense). Instead, "attempting a comeback" involves hoping your opponent will mess up some way, so in that sense, what constitutes a good move isn't mathematical but more about predicting how your opponent thinks and where they might mess up.

AlphaGo has mostly trained by playing itself, so the ways it thinks its opponent might mess up are probably completely different from how an actual human messes up.
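A toy minimax sketch (a hypothetical two-move game with made-up payoffs) of the point above: in a lost position all moves are mathematically interchangeable, and only an opponent model makes one preferable.

```python
# outcomes[my_move][opponent_reply]: +1 = we win, -1 = we lose
outcomes = {
    "natural": {"good_reply": -1, "blunder": -1},  # loses either way
    "trick": {"good_reply": -1, "blunder": +1},    # wins only on a blunder
}

def minimax_value(move):
    # An optimal opponent always picks the reply that is worst for us.
    return min(outcomes[move].values())

def expected_value(move, p_blunder=0.1):
    # With an opponent model (say, a 10% blunder rate), values differ.
    o = outcomes[move]
    return (1 - p_blunder) * o["good_reply"] + p_blunder * o["blunder"]

print([minimax_value(m) for m in outcomes])   # [-1, -1]: equally "bad"
print([expected_value(m) for m in outcomes])  # the "trick" move looks better
```

Against optimal play both moves are simply lost; only once you assume the opponent blunders some fraction of the time does the "hope they miss it" move become rational.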


Speculating that the reinforcement learning phase reinforced all the best winning strategies but had few examples of weak positions out of which the AI had to fight.


AlphaGo lost half the games it played against itself, so it's not like it doesn't have millions of training examples. However maybe it didn't learn very well how to recover once it's losing, but rather concentrated on learning how to avoid that in the first place.


> But the game was already won for Lee Sedol after that middle exchange.

Myungwan Kim, in his commentary, estimated the game to still be worse for White even after 79, had Black not destroyed its opportunity for fighting the ko at M-13 - https://www.youtube.com/watch?v=SMqjGNqfU6I&t=1h40m1s


I found this comment on that thread quite insightful: https://gogameguru.com/alphago-4/#comment-13410

Edit: here's another great one on MCTS: https://gogameguru.com/alphago-4/#comment-13479


That was really cool! It seemed after the brilliant play in the middle the most probable moves for winning required Lee Sedol to make impossibly bad mistakes for a professional, which would be a prior that AlphaGo doesn't incorporate. I've heard the training data was mostly amateur games so perhaps the value/policy networks were overfit? Or maybe greedily picking the highest probability, common with tree search approaches, is just suboptimal?


Failure to generalize is not always caused by overfitting. Even if there is no overfitting, deep neural networks seem to learn a surprisingly discontinuous function. Consequently, in rare cases, they can misclassify things with great confidence. [0]

From the cited paper,

> [Experimental results] suggest that adversarial examples are somewhat universal and not just the results of overfitting to a particular model or to the specific selection of the training set.

Anyway, Monte Carlo Tree Search is bad at losing positions. In general, you want to delay the impending catastrophe as long as possible instead of making stupid moves that make your position worse and worse. However, MCTS uses random rollouts to the end of the game, which sometimes make it difficult to ascertain if the inevitable doom is near or far.

Also, MCTS converges very, very slowly and is likely to miss a unique, single winning continuation.

I think it is probably a combination of both AlphaGo's value network failing to realize the good position of Lee Sedol after his brilliant play, and the MCTS failing to spot the unique winning sequence for Lee, that caused it to make the mistake. But we should probably wait for official analysis from the Deepmind team to see what exactly went wrong.

[0] "Intriguing properties of neural networks" http://arxiv.org/pdf/1312.6199.pdf


I'm hoping someone creates adversarial formations for AlphaGo if Google ever releases their model :)

One thing I've been pondering is if many adversarial samples exist. The board is rather low dimensional (19 x 19) and discrete. While certainly a massive state space, one of the suggestions for why adversarial images work is that the real number line is incredibly dense.

For example our possible Go board space is 2^(log2(3) * 19^2) for Go but 2^(24 * 28^2) for greyscale [0, 1] normalized single precision float imagery for MNIST. Thats an exponentially bigger space (I think something like 1e910 times bigger!), and gets only larger if you train with double precision, have larger images, add multiple color channels, add more nonlinear layers, etc.


I think it's more that the value network includes moves which look plausible but won't concentrate around the answer to a forcing move as having >99% probability. A human has a heuristic: "I must play here else I lose" but AG assumes its opponent might play anywhere that the ANN calls reasonable.


That's not the definition of overfitting.


If the value/policy model is predictive with a dataset containing only amateur games, but fails to generalize to unseen data with professional games, that seems like a case of overfitting to a dataset only containing amateur games. In this case the expected value network may be different for amateur games than professional games.

Is there something I'm missing?


Sorry, I'm being a little academic. Overfitting is when the model fits to noise or error. Overfitting is not synonymous with "inability to generalize beyond the train and test distribution."

For all we know AlphaGo has perfectly fit amateur games, but professional games are on a whole different level


All depends on how you define your data sets I suppose. IGS games include some professional games which, if it is the case that AG is perfectly trained on the amateur mode, were smoothed away.


Again, the only way to tell if overfitting specifically (and not other factors that are more likely) is the issue is performance on a held out test set.


That's the only empirical way, yes. We can also just talk about what it would mean in theory, though. In this case, we'd say that AlphaGo is well trained to the training data set sampling distribution but that may be far from the actual world game distribution.


That does not relate to overfitting. That problem is closer to "domain adaptation."


Unlikely. Some of AlphaGo's moves were lousy moves, even for an amateur.


You are implying overfitting is the only reason a model ever performs poorly. That is not true. Just because a model does poorly does not mean overfitting must be the issue.


The value/policy model includes a few hundred thousand amateur games, and a few hundred million games of self-play. Once AlphaGo beat Fan Hui those would have been games of self-play versus the equivalent of a professional. So overfitting is probably not a problem. I think it's a basic incentive mismatch - MCTS algorithms tend to like close games, whereas humans will try crazy moves when losing to throw off their opponent.


Wouldn't a million self play games exacerbate overfitting by learning it's own play style which it initially learned from amateur games? I


Neither of those are really evidence overfitting isn't a problem.

You tell whether overfitting is a problem by evaluating performance on a held-out test set.


I wouldn't be surprised if, in a month, Lee Sedol was able to beat AlphaGo in another match. This is what happened in chess. The best computers were able to beat the best humans, until the best humans learned how to play anti-computer chess. This bought them a year or so more, until computers finally dominated for good.


Right now I don't know if I'm more impressed by AlphaGo's artificial intelligence or its artificial stupidity.

Lee Sedol won because he played extremely well. But when AlphaGo was already losing it made some very bad moves. One of them was so bad that it's the kind of mistake you would only expect from someone who's starting to learn how to play Go.


AlphaGo kept making bad moves in such a way the rest of the game becomes more and more predictable - each of the moves Lee Sedol makes could be described as the only obvious one.

On the surface, as an analogy, it sounds like investors in financial markets, capitulating, selling at a loss for a more risky outcome. In hindsight almost always bad moves, but at the time of making them it feels right because it's removing risk. Investors are losing, and then when capitulating they make even worse moves, like selling at market bottoms.

http://www.investopedia.com/terms/c/capitulation.asp?layout=...


Correction: for a more risky outcome should be for a less risky outcome


Maybe it has never been in a losing position before?


The overwhelming majority of games it played were against itself. So yes, it has been almost the same number as it has been winning.



That got me for a second hehe (it's a fake)


Am I right by asumming, that if they would play another game (AlphaGo black and Lee Sedol white), that Lee Sedol could pressure AlphaGo into makeing the same mistake again?


This is an interesting question - if Lee Sedol simply replays the game exactly, does he repeat a win?

I think the answer would be most likely not - the monte carlo tree search is randomized so AlphaGo's responses to Sedol may not be exactly the same, requiring Sedol to not be able to repeat the exact same play.


Assuming it's a parallel monte carlo computation i doubt it would make exactly the same moves.


Could the AlphaGo developers make modifications before the next game to prevent this happening again?


They stated before game 3 that AG had been locked down a while before these games to test the code for bugs and issues. I also think they don't want some of the implications associated with DeepBlue where people modified code even during the games.


After AlphaGo won the first three games, I wondered not if the computer had reached and surpassed human mastery, but instead how many orders of magnitude better it was. Given today's result, it may be only one order, or even less. Perhaps the best human players are relatively close to the maximum skill level for go, and that the pros of the future will not be categorically better than Lee Sedol is today.


Pros themselves estimate their strength 3-4 stones handicap below God: http://senseis.xmp.net/?KamiNoItte:


Exactly what heuristics would they use to know how an omniscient being would play? Unless there are some strong arguments behind it, it sounds like arrogant BS.


Maybe they're extrapolating from relaxed instances that they've solved? If you know that humans can play pretty much optimally on, say, 11x11, and you know how much performance drops off with each expansion of the board?


Yudkowski argues that on the scale of intelligence, Einstein and a village idiot are basically right next to each other [1]. So once artificial intelligence gets close to matching the village idiot, it is not far from completely thrashing Einstein.

Now if that same picture held for Go, then a situation like this would seem to be impossible. Either the computer should be much worse than a human player, or much better. It would be an incredible coincidence that, at the end of six months of training, the computer happened to be of comparable skill to humans.

For the game of Go, at least, Yudkowski is wrong. What other aspects of intelligence are this way? Yudkowski's picture seems appealing, but perhaps it is wrong for many areas of intelligence.

[1] http://lesswrong.com/lw/ql/my_childhood_role_model/


AlphaGo is not an AI in the sense meant by Yudkowsky, I believe. He speaks more of a recursively self-improving AI, an AI which is capable of upgrading itself to be faster and more intelligent.

In the linked article, Yudkowsky even says "On the right side of the scale, you would find Deep Thought—Douglas Adams's original version, thank you, not the chessplayer." The implication is clear that these programs playing chess/Go are nothing like what he is talking about - general AI.

Or so I assume, from my less-than-complete understanding of Yudkowsky's writings.


Only if there's actual room to be much better. It's quite possible that 9p players are already pretty close to perfect play. Maybe a computer can get slightly closer still to that perfect play, but it's never going to be better than perfect play. There's not always room for improvement.


>> It would be an incredible coincidence that, at the end of six months of training, the computer happened to be of comparable skill to humans.

I don't think it's that incredible - By 18 years old a significant proportion of high school students know more about chemistry than the best scientists up to 1800 did, combined.

There's a lot of human games for AlphaGo to look at, but if it is to exceed human level of play, it'll have to figure how to do that by itself. Look at human level games, learn to play human level.

It's quick to get to the edge of human knowledge, and slower to go beyond it.


Yeah, I was waiting for AlphaGo to connect up 10 seemingly unrelated moves into some amazing unpredictable shape that wins the game. But now I think that maybe some humans actually have the ability to play a near-perfect game of Go. Maybe the most skilled human players have already nearly reached the peak of what is possible in a Go game.


I was not expecting Lee Sedol to come back and win a game after his first three losses. AlphaGo seemed to be struggling at the end of the match.


The commentary from Redmond was illuminating for me, in his estimation having played against monte carlo simulations before, at that point the computer just plays very low probability moves with extremely high outcomes if the other player makes a grievous mistake. So the moves looked terrible since the 'retort' is obvious, but if for some reason LSD didn't play the obvious counter, it would catapult Alphago back into the lead. Of course since LSD is an expert, he never missed the retort, but very interesting behavior nonetheless.


I was hoping Lee Sedol would be able to at least win one, humans can 'learn fast'. We cant learn 24/7 in parallel like a computer, but we do seem to have quite the talent at learning 'situational' things very quickly, quite likely because our very survival depended on it in the deep distant past. :-)

I don't believe AlphaGo had the time to do any additional training between matches. So effectively Lee has the ability to 'learn his opponent' while AlphaGo cannot until the entire match set is over because of how long it would take do do additional training.


It's been said elsewhere that AlphaGo studying such a corpus of matches that even every match Lee ever recorded wouldn't be enough to bias it. I presume this also means that adding the last 3 games to that corpus would still not be enough to affect AlphaGo meaningfully.


Bizarre. I felt a palpable sense of relief when I read this. Silly meat-brain that I am.


Living flesh is not meat. Meat is what we call dead flesh.

A living brain is made of neural nano-processors.


Are you familiar with different levels of precision in language, and word play?



This is a matter of perspective. If you were a tiger, you would consider anything that walks or crawls to be "meat".


What special insight do you have into the perspective of a tiger? I think this is anthropomorphization.


When the tiger has killed it, it's dead & is then meat, yes


negative, I am a meat popsicle.


GoGameGuru just published a commentary of the game with some extra insight https://gogameguru.com/lee-sedol-defeats-alphago-masterful-c...

The author thinks that Lee Sedol was able "to force an all or nothing battle where AlphaGo’s accurate negotiating skills were largely irrelevant."

[...]

"Once White 78 was on the board, Black’s territory at the top collapsed in value."

[...]

"This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves."

"We’ve talked about AlphaGo’s ‘bad’ moves in the discussion of previous games, but this was not the same."

"In previous games, AlphaGo played ‘bad’ (slack) moves when it was already ahead. Human observers criticized these moves because there seemed to be no reason to play slackly, but AlphaGo had already calculated that these moves would lead to a safe win."

Which, I add, is something that human players also do: simplify the game and get home quickly with a win. We usually don't give up as much as AlphaGo (pride?), still it's not different.

"The bad moves AlphaGo played in game four were not at all like that. They were simply bad, and they ruined AlphaGo’s chances of recovering."

"They’re the kind of moves played by someone who forgets that their opponent also gets to respond with a move. Moves that trample over possibilities and damage one’s own position — achieving less than nothing."

And those moves unfortunately resemble what beginners play when they stubbornly cling to the hope of winning, because they don't realize the game is lost or because they didn't play enough games yet not to expect the opponent to make impossible mistakes. At pro level those mistakes are more than impossible.

Somebody asked an interesting question during the press conference about the effect of those kind of mistakes in the real world. You can hear it at https://youtu.be/yCALyQRN3hw?t=5h56m15s It's a couple of minutes because of the translation overhead.


Lee Sedol definitely did not look like he was in top form there. I would say (as an amateur) his play in Game 2 was far better. It was the funky clamp position that perhaps forced AlphaGo to start falling apart this game. [0]

I wonder if Lee Sedol can find a way to replicate that in Game 5.

[0]: https://twitter.com/demishassabis/status/708928006400581632


Here's the post-game conference livestream:

https://www.youtube.com/watch?v=yCALyQRN3hw

At the end, Lee asked to play white in the last match, and the Deepmind guys agreed. He feels that AlphaGo is stronger as white, so he views it as more worthwhile to play as black and beat AlphaGo.

Conference over, see you all tomorrow.


Lee asked to be black, because there's 7.5 points advantage for the white who follows the black, and Lee won as a white this time.


Given that Lee was black in games one (due to luck of the draw) and three I expected he would be in game five as well. Perhaps the system permits loser to choose?


normally the color of the 5th game is chosen by the result of the 4th game, but Lee asked to change that rule and make him black, so he can have a shot at beating AlphaGo as black.


Are you sure? I thought I heard the commentators saying that it was randomized again for game 5, but maybe I was mistaken?


Is there a source of the rules for colour choice? I had thought it would be another nigiri as in 1st game. Choosing it based on the 4th game's result would imply Lee Sedol would have been given white. Since he won with white, isn't it an unfair advantage in general? For instance, if it were 2-2 after the 4th game.


This is a great day for humans. Glad to see all those years of human research finally pay off.


On a tangential note, apparently AlphaGo has been added to http://www.goratings.org/, though its current rating of 3533 looks off. Shouldn't it be much higher?


Probably not. Nine games of data is not really that strong of evidence. The algorithm probably gives each player a prior such has assigning one fake loss in order to limit the maximum ranking of players with few games and no losses. This prior is discussed in the paper on the ranking algorithm used here: http://www.remi-coulom.fr/WHR/WHR.pdf


So where can we see this "move 78" that everyone is talking about, without having to go through the entire match counting?



"look at that move. that's an exciting move"

https://www.youtube.com/watch?v=yCALyQRN3hw&feature=youtu.be


Thanks!


You'll know it when you see it. Even as a complete amateur clicking through quickly it stood out.


The post match conference analysis with Lee Sedol and the CEO of deepmind about the different aspects of the game is beautiful to watch. There seems to be a sense of sincerity rather than the greed to win from each of the side.


Now that we have two points for interpolation, expectations are down to near best human competency in go using distributed computation. Also from move 79 to 87 the machine wasn't able to detect the weak position, that shows its weakness. Now Lee can try and aggressive strategy creating multiple hot points of attacks to defeat his enemy. The human player is showing the power of intelligence.


<joke> AlphaGo let Lee Sedol win to lull us all into a false sense of security. The robot apocalypse is well underway. </joke>


okay human race, let's sit back and enjoy our last moments of glory!


Wow! Incredible! Now we know that they have a chance against each other. I would say that this was a very major point... otherwise we wouldn't know whether AlphaGo's powers have progressed to the point where no one can ever beat it. Now I take what Ke Je said much more seriously: http://www.telegraph.co.uk/news/worldnews/asia/china/1219091...


This game is a great example for the people that said that AlphaGo didnt play mistakes when it had a better position because it lowered the margin, because it only looks at winning probability.

AlphaGo made a mistake and realized it was behind, and crumbled because all moves are "mistakes"(they all lead to loss) so any of them is as good as any other.

Im very suprrised and glad to see Humans still have something against AlphaGo, but ultimately, these kind of errors might dissapear if AlphaGo trains 6 more months. It made a tactical mistake, not a theory one.


That doesn't make sense to me. Even if the objective function is win probability, it's used to order all potential moves. Thus given a menu of bad options, it should choose the least-bad one, not start choosing at random.

I think there's something more subtle going on.


Alphago only has an approximation of win probability, and to be more precise of `win probability playing against itself'. That works well in an even match against humans---but when far to the losing side, Alphago's win probability estimate is not very good.


Has anyone heard the crazy theory that alphago bugged out because of daylight savings (the cutover happened mid-game)? Anyone know the exact time at which alphago made its first wonky move?


Also:

Lee Sedol doesn't have RAM that can be crammed with faithfully recalled gigabytes of information, and that allow exhaustive, yet precise searching of vast information spaces. The amount of short-term information Sedol can remember perfectly is very small by comparison, and doing so requires a lot of concentration and effort.

Secondly, the faculty with which Lee Sedol plays Go wasn't designed for the exclusive task of playing Go. Without having to load a different program, Sedol's brain can do many other things well.


AlphaGo obviously made mistakes in game 4 under the pressure from LSD's brilliant play. I'd like to know if the "dumb moves" are caused by the lack of pro data or some more fundamental flaws with the algorithm/methodology. AlphaGo was trained on millions of amateur games, but if Google/Deepmind builds a website where people (including prop players) can play with AlphaGo, it would be interesting to see who improves faster.


AlphaGo doesn't feel pressure.

http://i.imgur.com/ny3RhD4.png

My guess is that Sedol won because he introduced sufficient complexity through cutting points and numerous black groups (see the image). Since AlphaGo uses Value and Policy networks to determine the hot spots to analyse using Monte Carlo tree searches, by making a game rife with lots of simultaneous fights, Sedol dodged the one-two punch of Value and Policy networks combined with MCTS.

In other words, if Sedol can make over a dozen points of interest on the board, AlphaGo cannot deeply assess them all. In the image, there are at least 13 interesting moves and cuts plus up to 15 groups (depending if lone stones are considered groups by AlphaGo). I suspect that this position was far more complex than at any point during any of the three previous games.

It might also explain the meltdown of playing out an unfavourable ladder (the P10 group, as P8 is another possible move).


Could this be overcome by throwing more hardware?


Yes, to a certain extent and certain complexity.

https://en.wikipedia.org/wiki/Go_and_mathematics#Game_tree_c...

Eventually, math wins. There will come a point where humans cannot make the game sufficiently complex to beat a domain-specific machine intelligence (such as AlphaGo).


Apparently AlphaGo made two rather stupid moves on the sidelines, judging from the commentary. Which incidentally is the kind of edgecase one would expect machine learning against itself is bad at learning, since there is a possibility that AlphaGo just tries to avoid such situations. It will be interesting to see if top players are able to exploit such weaknesses once AlphaGo is better understood by high level Go players.


From my perspective as a weak player, those stupid moves are the sorts of moves I would make if I already knew I was losing (but not by a huge margin, otherwise resign) but I was in byo-yomi overtime. It settles already dead stones and gives me a bit more time to keep searching for a better move that could maybe turn the game around, and if the opponent fails to make the obvious response then I can turn the game around from their blunder. The only odd thing is AlphaGo makes these moves without having entered byo-yomi time.


Thanks, this just raises further questions. I have a lot to learn before I can call myself a weak player, but it does sound like AlphaGo regressed to the level of an amateur, which needs a explanaition by itself.


LSD maybe the only human to ever win against AlphaGo.


Fan Hui won 2 / 5 of the informal games of their earlier match


That is really a different AI. AlphaGo has improved significantly since then.


Ke Jie won 8 out of 10 when went against Lee though. Lee is probably not the strongest in the world right now.


I think esturk meant that AlphaGo is barely beatable now, so by the time anyone else gets a chance, it will have improved in the meantime and even a stronger human player won't be able to beat it.


By that measure, Fan Hui beat AlphaGo before Lee Sedol did, since he was playing an earlier and bit weaker version of more or less the same complete, distributed, system.


Lee is 3rd ranked as of now. He was 1st for almost 10 years before that.

Ke Jie is 19 years old. Lee has been a pro for nearly 20 years. Lee is not old but certainly not young. Go game prodigies seem to peak when early 20's, much like mathematicians.


http://lesswrong.com/lw/4gi/age_fluid_intelligence_and_intel...

My personal hypothesis for the reason why Magnus Carlsen is the youngest chess champion of all time is that with the rise of online battles and computer battles, the age where the blend of crystalized intelligence and working intelligence combines lowers.

This has obvious implications for the recent rise of machines immediately correcting human mistakes in mathematics and physics.


It's worth mentioning that while 79 is where Black goes bad, not everyone is sure that 78 actually works for White (http://lifein19x19.com/forum/viewtopic.php?f=15&t=12826). I'm sure we'll eventually get a more complete analysis.


I was hoping to see how AlphaGo would play in overtime. Now I'm curious, does it know how to play in overtime? Can the system handle evaluating how much time it can give itself to 'think' about each move, or does it fall into the halting problem territory and it was programmed to evaluate its probability of winning given the 'fixed' time it had left.


It's called scorboarding. You start coming up with solutions and ranking them and putting the best one up on the scoreboard. When you run out of time, you just go with what you got. Pretty much what humans have.

I'm sure there are many levels of watchdogs in this program.


They've been in overtime before. In game 2 I think. AlphaGo spent about 30 seconds on each move


It was in overtime in game 2, I believe. It did fine.


I just noticed a pun in the name: All Phago, devourer of worlds. Especially funny as beating a stone could be imaged as swallowing.


https://gogameguru.com/lee-sedol-defeats-alphago-masterful-c...

> This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves.

It seems to me, that these bad moves were a direct result of AlphaGo's min-maxing tree search.

According to @demishassabis' tweet, it had had the "realisation" that it had misestimated the board situation at move 87. After that, it did a series of bad moves, but it seems to me that those moves were done precisely because it couldn't come up with any other better strategy – the min-max algorithm used traversing the play tree expects that your opponent responds the best he possibly can, so the moves were optimal in that sense.

But if you are an underdog, it doesn't suffice to play the "best" moves, because the best moves might be conservative. With that playing style, the only way you can do a comeback is to wait for your opponent to "make a mistake", that is, to stray from a series of the best moves you are able to find, and then capitalize that.

I don't think AlphaGo has the concept of betting on the opportunity of the opponent making mistakes. It always just tries to find the "best play in game" with its neural networks and tree search – in terms of maximising the probability of winning. If it doesn't find any moves that would raise the probability, it picks one that will lower it as little as possible. That's why it picks uninteresting sente moves without any strategy. It just postpones the inevitable.

If you're expecting the opponent to play the best move you can think of, expecting mistakes is simply not part of the scheme. In this situation, it would be actually profitable to exchange some "best-of-class" moves to moves that aren't that excellent, but that are confusing, hard to read and make the game longer and more convoluted. Note that this totally DOESN'T work if the opponent is better at reading than you, on average. It will make the situation worse. But I think that AlphaGo is better in reading than Lee Sedol, so it would work here. The point is to "stir" the game up, so you can unlock yourself from your suboptimal position, and enable your better-on-average reading skills to work for you.

It seems to me that the way skilful humans are playing has another evaluation function in addition to the "value" of a move – how confusing, "disturbing" or "stirring up" a move is, considering the opponent's skill. Basically, that's a thing you'd need to skilfully assess your chances to perform an OVERPLAY. And overplay may be the only way to recover if you are in a losing situation.


It would be interesting to see how AlphaGo would play with a handicap – because the skill of overplaying is required from the start in that situation. It might actually suck at handicap games, at the moment.


So I am a completely ignorant of the game go. I mean I've heard about it my whole life but never bothered to understand it ever.

But after watching the summary video of AlphaGos win... I'm fascinated.

I'm sure there are thousands of resources that can teach me the rules, but HN; can you point me to a resource you recommend to get up to speed?


Would it not be beneficial to the deepmind team to open at least the non-distributed version to the public to allow for training on more players? I was surprised to learn that the training set was strong amateur internet play, why not train on the database of the history of pro games?


That's just the initial training set, to get things started.

Alphago learns mostly from playing against itself. (And in the future they are planning to remove the crutch of starting with human generated data entirely.)


Does this mean Lee found AlphaGo's weakness, and AlphaGo wasn't player at a out-of-reach level?


As a human, I'm pulling for the human. As a computer programmer, I'm pulling for the human. As a romantic, I'm pulling for the human. As a fan of science fiction, I'm pulling for the human. To me it will matter even he can pull off a 3-2 loss over a 4-1 loss.


I wonder what the chances are of a cosmic ray or some stray radiation causing AlphaGo to have problems. It's quite a rare event, but when you have 1920 CPUs and 280 GPUs, it might up the probability enough to be something you have to worry about.


I would expect them to be running on Google's distributed infrastructure which has error corrections through end to end checksums


Actually they were running on a single PC!

https://twitter.com/demishassabis/status/708489093676568576

ed: oops I misread!


"Using distributed for match but single machine AG very strong also"

Doesn't this imply they weren't using a single PC?


They're using the distributed version for these matches, this question was just asked in the post match press conference.


I am super excited and all about the Progress AI has made in AlphaGo, but a part of me feels kind of relieved that humans won at least one match. :). Sure, won't last for long.


I wonder if Lee Sedol were to start as white again, and follow the exact same starting sequences, would AlphaGo's algorithms follow the exact same moves as it did before?


It seems Lee Sedol fares better at late to end game than AlphaGo. Makes one wonder if Lee might have won the earlier games had Lee pushed on until the late game stages.


I don't think so. Everyone seems to agree that MCTS only gets stronger in the endgame, and that AlphaGo started making weird moves towards the end because it thought it was already losing, which by definition means there's no strategy leading to a winning position.

...However if it's true that it couldn't find a way out, shouldn't its probability of winning have hit ~0% much earlier? There was still a long time between when it started acting strangely and when it resigned.


Maybe the probability was calculated by playing against other AlphaGos?


Way to go humans. (I felt that AlphaGo was unbeatable and a milestone in computing overthrowing organic brains... I gave in to the buzz a bit prematurely.)


Was it AlphaGo losing the game, or Lee Sedol winning it?


Lee Sedol winning, by keeping his cool and not making any mistakes. AlphaGo, on the other hand, went bonkers, especially towards the end, but it got into bad territory not because of silly mistakes but because of brilliant play by Lee Sedol.

Could it possibly be that both of the mistakes were bugs? Perhaps it suggested a non-sensical position such as (25.23, 13.15), and it was snapped to (19, 13) :D


I don't think they were bugs in the traditional sense. I think AlphaGo picked moves to try to maximize the probability of winning, and at some point that was only possible if the opponent made a suboptimal response. I remember reading somewhere that most of its training data is from amateur games. The model doesn't have a prior that AlphaGo is playing a professional who won't make a bad response. It probably would have resigned a lot earlier with that prior :)

Another thing to keep in mind is that AlphaGo has no "memory", so every turn it looks at the board fresh. This means that if the probabilities are very close, it could jump around a bit, due to numerical noise from floating-point calculations, model errors, or just tiny differences in probability, making the behavior appear erratic and quick to change "strategy".
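A toy sketch of that effect (purely illustrative, nothing to do with AlphaGo's actual evaluation; the plan names and values are made up): a stateless chooser re-scores its options every turn, and when two plans are nearly tied, tiny evaluation noise flips the pick.

```python
import random

# Two plans with almost identical value; the per-turn noise stands in for
# numerical jitter / model error. The agent keeps no memory, so each turn
# it simply re-evaluates everything and takes the argmax.

def evaluate(plan, noise=0.002):
    base = {"invade_top": 0.701, "defend_center": 0.700}  # hypothetical values
    return base[plan] + random.uniform(-noise, noise)

random.seed(1)
picks = [max(("invade_top", "defend_center"), key=evaluate) for _ in range(200)]
print(sorted(set(picks)))  # both plans get picked: the "strategy" keeps changing
```

Nothing about the position changed between turns; only the noise did.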


> Perhaps it suggested a non-sensical position such as (25.23, 13.15), and it was snapped to (19, 13) :D

AlphaGo doesn't work like that...


According to the head of DeepMind, AlphaGo made a mistake in evaluating move 79: https://twitter.com/demishassabis/status/708928006400581632


> Mistake was on move 79, but #AlphaGo only came to that realisation on around move 87

That's cool to think of AlphaGo having "realizations"


It is one way to say that AlphaGo's value network can be wrong.


Honestly, I think that is a meaningless distinction unless AlphaGo actually broke down in the middle of the match


Given the number of errors by AlphaGo in the last 10 minutes, probably the former.


I think Lee Sedol won the game earlier by destroying AlphaGo's territory in the center. The commentator (Michael Redmond) was quite impressed with what he did there.


Those moves look like AlphaGo had calculated a loss way ahead of time.


Sometimes it's just not possible to stop the snowball rolling downhill. You can't always turn a retreat into an advance or a flanking move; sometimes the first step backwards just turns into a full-on rout.

I got the impression (possibly incorrectly) that AlphaGo was trying to throw curve-balls and be 'unexpected' in a way that might have 'forced' a mistake it could exploit.


Errors when already lost don't mean much.


Given the genius of LSD, probably the latter.


Monte Carlo bots behave weirdly when losing.
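A toy illustration of why (my own sketch, not anyone's real engine): once no move wins against good play, the rollout statistics are dominated by lines where the simulated opponent blunders, so a "trick" move that only works against a blunder outscores the honest move.

```python
import random

def rollout_winrate(move, n=10_000, blunder_rate=0.3):
    # Toy model of a lost position: the "solid" move loses every playout,
    # while the "trick" move wins exactly when the random playout opponent
    # blunders (assumed 30% of the time, an arbitrary illustrative number).
    wins = sum(1 for _ in range(n)
               if move == "trick" and random.random() < blunder_rate)
    return wins / n

random.seed(0)
rates = {m: rollout_winrate(m) for m in ("solid", "trick")}
print(rates, "->", max(rates, key=rates.get))  # the trick move wins the argmax
```

So the bot starts playing hope-chess: moves that look bizarre to a human, because they only make sense if the opponent cooperates.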


Some pretty questionable moves from AlphaGo in that game, but I'm glad LSD managed to close it out.


Wow that is awesome news. Very happy to read this this morning. It's a good day to be human.


Is there a place I can go to quickly flip through all the board states from the game?



Move 78 gives us hope in the war against the machines.

78 could come to symbolize humanity.

What a special moment.


Machines retreat and regroup, and fix the problems. Next time there is no chance for a move 78.


Move78 sounds like a good name for an underground human resistance group :)


Humans strike back! :P


I hope it won't turn out they let him win.


Hopefully they're not that disrespectful


But, did he pull a Kirk vs Kobayashi Maru? :) (yes, I went there)


Now THIS is news!


The game seemed to be going in AlphaGo's favour when it was halfway through. Black (AG) had secured a large area on the top that seemed nearly impossible to invade.

It was amazing to see how Lee Sedol found the right moves to make the invasion work.

This makes me think that if the time for the match were three hours instead of two, maybe a professional player would have enough time to read the board deeply enough to find the right moves.


The thing is, Lee Sedol didn't "find the right moves"; the wedge at L11 shouldn't have worked. If black's 79th move had been at L10 instead of K10, Sedol would likely have resigned on the spot.


As far as I can tell from watching Redmond's commentary at the time, there were other options for white in the area.


I was on the other stream; Myungwan Kim and Hajin Lee went way deeper than Redmond typically goes (since they have access to an SGF editor instead of a clumsy demo board, and they aren't performing for the camera as much). They seemed pretty confident in their conclusion that L10 killed white.


Just from Redmond's commentary, killing white in the center was just one of the outcomes that AlphaGo was impelled to achieve. [For just an arbitrary example:] if AlphaGo killed white in the center but lost the bottom and the top right, it could still have lost. (And Sedol had to achieve multiple objectives also, of course.)

Sure, maybe AlphaGo missed a winning move. But the situation was fluid both strategically and tactically, which might have been why the machine chose its losing move, and moreover why I don't think it was just a matter of the machine failing to find a kill, especially since I think computers have been able to beat humans on pure tesuji for a while now.


Ah! This matches up with Demis' tweet that black's 79th move was later determined to be a mistake. In MCTS you wouldn't normally go back to moves you've already played unless it's an incomplete information game. However I guess for reinforcement learning (even if it's not actually done during these matches), you would go back and update the estimated values of moves already played, which explains how they know that.


Demis clarified what he meant by that in a subsequent tweet – it wasn't that AlphaGo re-evaluated move 79, it's that the winrate only plummeted after move 87, which was beyond the point of no return: https://twitter.com/demishassabis/status/708934687926804482


But he called move 79 a mistake. So did he learn from a human commentator that it was a mistake, or was that AlphaGo's assessment?

Because it is actually trivial in MCTS to reevaluate moves already taken once you have a better assessment of the positions that follow it.


I would love to hear from the team for sure, but I suspect that they have, by now, fed the game state as of move 78 into AlphaGo, had it look exhaustively at possibilities (way more than in the few minutes it took during the match), and determined that in hindsight move 79 was indeed not an optimal response.


Demis tweeted that while the game was still ongoing.

Anyway, I'm pretty sure I know what happened; I have actually implemented MCTS myself and know that you can trivially update evaluations of old nodes in MCTS as the game continues and you get a better estimate of the line of play actually taken, although you wouldn't normally have a reason to do so. Basically, it can see in hindsight that a move it took was bad, but without additional calculation you wouldn't know whether the alternatives were any better, or also worse than estimated at the time.
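Concretely, the kind of thing I mean (a toy of my own, not DeepMind's implementation, with made-up numbers): an MCTS node's value is just a running mean of backed-up simulation results, so simulations added after a move was played keep revising the stored estimate of that move for free.

```python
class Node:
    """Toy MCTS node: value is the running mean of backed-up results."""
    def __init__(self):
        self.visits = 0
        self.wins = 0.0

    def backup(self, result):   # called whenever a simulation passes through
        self.visits += 1
        self.wins += result

    @property
    def value(self):
        return self.wins / self.visits if self.visits else 0.5

move79 = Node()
for _ in range(100):            # shallow early search: looked like ~70%
    move79.backup(0.7)
early = move79.value
for _ in range(400):            # deeper search later scores the line ~5%
    move79.backup(0.05)
print(round(early, 2), "->", round(move79.value, 2))  # 0.7 -> 0.18
```

The node for the already-played move "realises" its old estimate was wrong purely as a side effect of continued search below it.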


A negativist, paranoid skeptic could say that it would be a good move for the team to intentionally make AlphaGo lose a single battle in the moment it has already won the war...


Maybe AlphaGo understood it had already won the 5-game series, so it reasoned that it could lose the last 2 games and still win, and hence played suboptimally :P


I believe that after winning 3 out of 5, the AlphaGo team started experimenting with variables now that they can relax, which will in turn be even more helpful for future AlphaGo versions than the previous 3 wins were.


They've declared AlphaGo's code and network frozen for the purposes of this competition, in part to avoid the PR issues that Deep Blue got for doing exactly that.


Don't want to sound all conspiracy-theory, but somehow this feels planned... It plays into DeepMind's hands not to have the machine completely trouncing the human. It's less scary and keeps people engaged further into the future.

Also seems in-line with the way Demis was "rooting" for the human this time – they already won so now they focus on PR.


It doesn't seem that plausible if you've been around serious game players. The notion of throwing a game would be seen as extremely rude and condescending to one's opponent.

Once you step on to the field, you play your best. It doesn't matter whether the opponent is a master, or a six-year-old.

EDIT> I have a friend who I've never beaten at an RTS or non-random strategy game, even though we've played hundreds of times. If I thought for a moment that he'd let me win, I'd stop playing, and I hate losing.


Don't post unfounded conspiracy theories if you don't want to sound like a conspiracy theorist.


A lot of comments here are unfounded. It was a thought I had that imo warrants some discussion (which you may disagree with).


I agree, this point does warrant discussion.

How could DeepMind prove that AlphaGo played its best in match 4?


[flagged]


Wow, I think it's not that farfetched. See, the thought wasn't appreciated. It's a worthwhile discussion, imho.



