Lee Sedol is playing brilliantly! #AlphaGo thought it
was doing well, but got confused on move 87. We
are in trouble now...
Mistake was on move 79, but #AlphaGo only came to
that realisation around move 87
When I say 'thought' and 'realisation' I just mean the
output of #AlphaGo value net. It was around 70% at
move 79 and then dived on move 87
Lee Sedol wins game 4!!! Congratulations! He was
too good for us today and pressured #AlphaGo into
a mistake that it couldn’t recover from
I'll risk the assumption that somebody from the DeepMind team is reading this.
Guys, please publish charts of the win probability AlphaGo estimated over time during these games. A heatmap showing which moves it considered best for both sides during the games would also be cool, but that's surely more time-consuming to prepare.
It would be great to be able to have such things for top pro tournaments in the future.
Actually, it's a tradition that both players replay the game after the match and discuss the good moves, the bad moves, and what they were thinking during the game.
I felt so sorry for Lee Sedol when I saw him lose the second match, facing an empty chair, and he could only ask one of his friends to review the game.
yeah, not to belabor it, but if you can do something like be a champion in go or chess, chances are you have the mental skillset to do something exponentially more lucrative.
This is not true. Scientific studies have shown that skills in abstract strategy games are so specific to the game that they do not transfer to other tasks. The urban legend about Chess grandmasters all being geniuses is false, in other words. They're actually worse at most jobs than people who've been doing those jobs for a while, because they lack the experience, and they aren't more predisposed to having a genius intellect that would let them overcome that.
A Chess or Go champion is probably doing the single-most lucrative activity that they are capable of.
Of course chess grandmasters aren't all-around geniuses, but many traits that are prerequisites for a successful chess player certainly are translatable into careers in other fields, and correlate with above average brainpower (good memory, discipline, long attention span, spatial intelligence etc.)
Botvinnik was an accomplished engineer, Euwe had a PhD in mathematics, Anatoly Karpov is a millionaire, and interestingly enough a lot of recognized chess players had careers in music (like Taimanov or Smyslov)...
Being a grandmaster surely requires an above-average intellect; there is no such thing as a chess savant. While it's an urban legend that the IQ of Kasparov (arguably the greatest chess player ever) was in the ballpark of 190, he did clock in at 135. Such a result is not unheard of, yet it still places him in the top 1% or so.
There's a whole body of research out there, but I just scratched the surface of it. Here's the first result that comes up from a random Google search: https://psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html The references should be worth exploring, and I bet there's more recent research that expands on it as well. Here's a quote from the paper:
> In a recent review, Ericsson and Lehmann (1996) found that (1) measures of general basic capacities do not predict success in a domain, (2) the superior performance of experts is often very domain specific and transfer outside their narrow area of expertise is surprisingly limited and (3) systematic differences between experts and less proficient individuals nearly always reflect attributes acquired by the experts during their lengthy training.
I will say this for Chess or Go grandmasters: They have drive and dedication (and in some cases, compulsiveness). That alone would probably allow them to do better than the average person in another field if they had pursued that field from the get-go. Also, I'd caution you against relying on hand-picked anecdotes; I could just as easily pick out a bunch of Chess players who weren't good for anything else. You'd need broad-based statistics.
Thanks for the link. Well, of course I just quoted a few examples without pretending it's a legitimate piece of statistics. However, there haven't been all that many world chess champions so far (and Botvinnik, Smyslov, Euwe, and Karpov belong in that category), and thus a few PhDs or successful intellectual careers is already enough to show that something clearly sets this bunch apart from the general public. You're less likely to find a plumber or a shelf-stacker in that bunch (no disrespect to people in these professions). Whatever causes that is another matter. I agree with you that drive and dedication - and not necessarily some innate talents - could be the crucial factor in this.
I'm not sure you're disagreeing with the person you're responding to. Just because the skills involved are very specific doesn't imply that the people with those skills are only capable of obtaining that one skill.
What the person you're responding to means is that if they have the mental capacity (and also discipline) to play the game at such a high level, they could probably also excel in other professions if their goal was to make money instead of doing something they loved.
Of course we can debate whether they would achieve such success if they were doing something they didn't enjoy as much, but I think most people would argue that most chess or go champions could make more money, or in other words do more "lucrative" activities, if they chose to do them instead of dedicating all their time to the game.
You are relying too much on the concept of a generalized mental capacity, which many researchers (maybe even most) would say does not exist. You seem to be assuming that grandmasters are just amazingly smart and they happen to apply those smarts to one particular skill, but what's closer to the truth is that they happen to be particularly amazing at that one skill, in a way that doesn't really correlate strongly with having amazingly high skill in other areas.
If you took the people who would have grown up to be chess grandmasters, intercepted them just before they started spending significant amounts of time studying chess, and had them instead spend all that time studying programming or physics or something, how do you think they would rank up?
You seem to be arguing against the idea, "A chess grandmaster is a genius and therefore can walk into Google and immediately start doing more and better work than most of their senior programmers". I don't think anyone believes this (correct me if I'm wrong).
I think a more serious idea is, "A chess grandmaster is a genius and therefore learns faster and has a higher performance ceiling and such than most people, and if they spent a couple of years learning to program, they could become an entry-level Google programmer, after which they would rise more quickly than most hires, and eventually would outperform most of Google's senior programmers."
By the way, I think most of the best chess players were extreme chess prodigies. (Just looked at Kasparov, Karpov, Shirov, Kramnik, Anand, and Carlsen's Wiki pages; all but Kramnik had the year listed, and they all became grandmasters around age 17-19, except Carlsen, who was around 14. Kramnik's page mentioned winning a gold medal for the Russian team at age 16, and that he wasn't a grandmaster when selected for the team and this was unusual.) I think this is consistent with them being highly gifted children, who choose to spend their time doing chess.
>I don't think anyone believes this (correct me if I'm wrong).
While not exactly what you wrote, people do think that someone who is a genius at one subject can become a genius in another area with less work than it took for someone in either field to become a genius there originally, especially if society groups the areas together (so a sports star becoming a master programmer is far less likely to be believed than a chess master becoming a master programmer).
Though being able to compel yourself to do great things seems like a completely different skill set from being able to do great things on someone else's dime or on a team.
For example, my brilliant programmer friend that can't hold a job. Or the artist that can only paint their own inspirations. Or the savant that doesn't get along with anybody.
I dunno, Bobby Fischer was kinda deranged, for instance, and that often hurts outcomes in otherwise "lucrative" positions. Incidentally, he is credited with raising chess player compensation through his demands.
Maybe not a leadership position, but I've worked on the engineering side of quant work and am familiar enough with fairly advanced Go / Chess players (e.g. national youth champions for age brackets in the US) to know that most of them could make enough to retire off one year's salary during the boom years of algorithmic trading (~2003-2007ish).
I'm not sure how you define "deranged" (AFAIK, that's not a medically defined term within the DSM-IV), but most of those brilliant people end up being a little 'off'. My father was an academic; one of the people he went to graduate school with was working in Boston while I was a child. He was doing absolutely groundbreaking work, but he's so difficult to collaborate with (think: the mannerisms of Richard Stallman) that he's been floating around universities until his welcome wears out. He can figure out remarkable things in higher-level computational chemistry, but he can't really figure out humans. Had he decided during the 80s to work at Renaissance instead of pursuing academic research, he almost certainly would be worth in the low hundreds of millions.
Well, defensive end Olivier Vernon just signed a deal with the New York Giants which will pay him an average annual salary of around 17 million dollars. And there was a huge signing bonus as well.[1]
NFL seasons are 16 regular season games, plus 4 pre-season games. And then there are the playoffs, which a given team might or might not make or advance in.
All told, given that the signing bonus is amortized over all the games he plays, the annual salary, etc., I think it would be fair to say that Vernon will make around a million dollars a game.
Aside: Vernon isn't necessarily "the best" DE in the NFL, but due to market forces and the way things work with the salary cap, free agency rules, etc., the contract he just signed is one of the largest for a defensive player in the league. QB's tend to make even more, but I can't recall a really high profile QB who has signed a big deal recently.
Osweiler, the QB from the Broncos, just signed a huge deal with the Texans... $18mil/year, but I haven't heard how much of that is guaranteed, so I'm not sure.
A really big issue with NFL contracts is this "guaranteed money" thing... I believe the NFL is the only major US sports league that gives player contracts without it, so you have to take those salary numbers with a grain of salt.
Best is around 10x that, but at the cost of their body and long-term health. Nobody is playing professional football|basketball|baseball in their 50s, and even just 40 is pushing it, whereas pro go players can play into their 60s and beyond.
I.e. something like [1] for every move. I was a bit disappointed to learn, on first reading, that such a table is only available for that particular move.
Off-topic: we (the human species) have built an AI that can master the game of Go, but we don't yet have the intelligence to publish charts correctly on the web. That blur of a JPG is half a megabyte!
Yeah, honestly that would be much better presented as an SVG, since it's mostly solid colors and geometric shapes. Every browser supports them, and all the big image editors as well; the trouble is that the rest of our software and culture hasn't caught on. Why am I taking bitmap screenshots of websites when they're almost completely text? We need to start building a tools ecosystem around vector graphics. An extra bonus would be a machine-readable representation of the board as well. We live in a digital age, but it's almost like we're cavemen still dealing with the modern equivalent of analog formats. (And at least analog has fidelity.)
Screenshots might not be the best example, since there may be some "shiny cute effects" going on. However, reading your idea sparked another one in my head: screenshots shouldn't be restricted to the same rendering resolution as the rest of the system; the screenshot resolution should be configurable.
I don't require the screenshot to be instantaneous, I require it to appear instantaneous. In that sense, if the whole rendering pipeline is working to give me a framerate of 60 fps, then I could spend ten times as long rendering a screenshot without that delay being noticeable. Also, why on earth does Windows (I don't know the behaviour on Linux/Mac) apply ClearType to a screenshot? That has always bugged me; there are some situations where you tolerate it and others where it hurts.
NN is (or was) also used in Neapolitan comedy, derived from earlier use throughout the Roman Empire and the Middle Ages.
The "figlio di NN", or "son of NN" means someone who was found and adopted (typically by nuns) and whose parents were unknown.
NN can be used in general for people whose origin is uncertain. I think in this specific case it's a bit misleading - although the Wikipedia article seems to suggest that NN can also be used as a synonym for "unknown", from a historical perspective that is a bit incorrect.
The literal translation from Latin creates some confusion if you didn't know the context.
Not so coincidentally,
"This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves." [1]
The bad moves in the eyes of humans could be risky bets or the horizon effect. [1] [2] AlphaGo's use of deep neural nets (value networks) to evaluate board positions should significantly help counter the horizon effect. But since move 78 by Lee Sedol, which turned the situation around, was unexpected even by some top pros (Gu Li referred to it as the 'hand of god' [1]), the patterns that follow it are likely rare among possible game states and therefore not strongly embedded in the value networks, leading to AlphaGo's loss.
I hope the DeepMind team will help enlighten us on this in the near future.
78 was hard, but not impossible: if you watch the AGA commentary of the game, they had two pros on at the time 78 happened, and they found 78 as the best answer a few minutes before Lee did, expecting AlphaGo to answer with a stronger, yet still not good enough, 79 that left the game even instead of basically lost. Then they were elated about how AlphaGo seemed to have failed to read the whole thing: instead of making preparatory moves for a nasty ko fight, the best option at the time, AlphaGo just took a route that provided almost no compensation.
If anything, it seems to me that AlphaGo's problem here might be time management: seeing a really scary situation, Lee Sedol just sank many minutes into reading the problem, going pretty much all the way to byoyomi time. A human, after seeing something like that, would figure out that their assessment of the situation and their opponent's is very different, and spend a lot of budget trying to figure out what was wrong. AlphaGo just didn't see the problem, and didn't budget its time to analyze the position to death. It moved slower than before, but not really that much slower, and ended up making moves a kyu player could see were terrible.
Either way, I'd love to see Deepmind giving us all a good postmortem of the 70-100 range of moves.
I concur that better time management may have made a difference.
Beyond that, I think that AlphaGo may still be missing a type of component. From the descriptions of it, the policy network generates possible moves from board positions, and the value network evaluates the probability of desirable outcomes. How this differs from human play is that strategic assessment and planning are implicit in the middle layers rather than a 'conscious' element of searching and decision making. I'm not saying that this is a necessary component, as AlphaGo has already done exceptionally well. I do believe this kind of 'middle-out' processing, producing and evaluating strategic concepts, could make it better at handling unusual circumstances. Being trained on high-amateur and pro games, it will respond best to the most conventional games of those types; the more unconventional the game becomes, the worse it would fare in terms of efficiency of move generation and choice of which moves to evaluate.
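As a rough illustration of how those two networks can combine during search - a simplified, PUCT-style selection rule in the spirit of the AlphaGo paper, with placeholder names and constants rather than DeepMind's actual implementation:

    import math

    def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.0):
        # q_value: mean evaluation of this child from the value net / rollouts
        # prior:   policy-network probability assigned to this move
        # The exploration bonus shrinks as a child accumulates visits.
        exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
        return q_value + exploration

    # The move explored next is the argmax of this score over children, so
    # the policy net focuses the search while the value net grades outcomes.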
I got something different from the AGA commentary. The pros did consider 78, but came to the conclusion that it didn't work! I'm just a 1kyu player but I too can't see any way to make 78 work after Black plays 79 as an atari at L10, as suggested by the pros. So I'd be very interested to see some analysis that shows how White could get a fair result after Black 79 at L10.
It would be interesting to see a neural net evolve to take into account player state, not just game state. I play chess, and nowhere near professional level, so I don't know if this anecdote is valuable, but if I see my opponent looking at a particular area of the board, I tend to take a second look.
I suppose beating humans isn't AlphaGo's primary motive though - learning to play a perfect game of Go in general is probably more difficult than playing the perfect game against a particular person.
Having the AI use eye-tracking to predict an opponent's moves is a truly terrifying thought. Just by tracking the eye with millisecond precision you could probably work out the strategy they were reading. It's so game-breaking that it shouldn't be allowed as an input, and indeed, is easily defeated with reflective sunglasses anyway (like a lot of pro Poker players wear).
The AI player can't give up information in this manner because it lacks eyes, so I'd say that it should not be able to use this information from the human player.
This actually came up in the commentary. Michael Redmond mentioned that some amateurs watch to see where their opponent is looking, but he called it merely a "trick", and he said it is not useful in professional play.
That type of metagame seems useful though, in the sense that you can start to sense the state of mind of your opponent. If they are working harder than you think they need to, you might get away with a riskier move for instance.
Professionals are understandably hesitant to call AlphaGo's plays mistakes after the first few games, because AlphaGo had played a few moves that seemed like obvious mistakes (the eyestealing tesuji under the avalanche and the fifth line shoulder hit) but later proved to be very strong moves. Remember that when go is your career, your reputation is based partly on your ability to analyze games accurately, so calling a move a mistake and finding out later that it was strong would be a big embarrassment.
AlphaGo enlarged a group that was at risk of atari (loss to the opponent) instead of correctly trying to divert attention elsewhere on the board, starting a new fight, or simply writing it off. It basically added stones to a group that was almost certainly dead and allowed the opponent to punish it heavily, thus giving away more stones than it had to.
OT: What's with using monospace for quotes? That breaks line wrapping and makes it hard to read on mobile or small screens. I don't get why people do it.
"People" don't do it; HN does it. Indented lines in a post are always monospaced, on the assumption, I suppose, that they're likely to be code examples. There's no way to turn this off AFAIK.
I know the formatting system. My question is why users are formatting their text (e.g. with 4 spaces leading each line) such that HN renders it as monospace, which then forces a sideways scroll on mobile and small screens. That seems to make it harder to read for some, with no corresponding upside.
I think mostly people forget that > works and that therefore you can do
> quotes like this
since it's markdown - often you get commentish things that allow limited HTML plus indented code blocks, and people get used to using the latter as the only thing that works everywhere
not really: http://i.imgur.com/uxqERat.png (you have to be very careful about what indentation level you're at, how long your lines are, etc, etc, things that people just don't do most of the time)
It's also customary when quoting large sections of text. I'm not sure if this carried over from email, or from academic texts (where if your quote is more than a line or two, it needs to be formatted differently).
Doubtful; I don't think comparing such small W/L distributions will be illustrative.
On the other hand, the Nature paper shows the single 8-GPU machine performs similarly to the 64-GPU cluster, but the larger clusters perform a comfortable margin better. [0]
A single machine winning many games relative to the distributed version really just says that the value/policy network is more important than the Monte Carlo tree search. The main difference is the number of tree search evaluations you can do; it doesn't seem like they have a more sophisticated model in the parallel version. The figure suggests that there are systematic mistakes the single 8-GPU machine makes compared to the distributed 280-GPU machine, but MCTS can smooth some of the individual mistakes over a bit.
Probably better. If we assume a uniform distribution across possible win rates of Sedol vs AlphaGo, then update it with Bayes' rule, we get a 33% chance that Sedol wins the next match.
That's not factoring in other information, like Sedol now being familiar with AlphaGo's strategies and improving his own strategies against it.
So there is a good chance he is now evenly matched with AlphaGo, and likely much better than the single machine version.
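For the curious, that 33% is just Laplace's rule of succession: a uniform Beta(1, 1) prior over Sedol's per-game win rate, updated on his 1 win in 4 games, gives a Beta(2, 4) posterior whose mean is the predictive probability of a win. A quick check:

    # Uniform prior over the win rate, updated on Sedol's 1 win in 4 games.
    wins, games = 1, 4
    alpha, beta = 1 + wins, 1 + (games - wins)   # posterior is Beta(2, 4)
    print(alpha / (alpha + beta))                # 0.333... -> the ~33% above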
There's a bit of sleight of hand in this statistic - yes, they can do runtime on a single machine, but it took the compute power of a small country to train the neural nets that are loaded onto that one machine.
That's not really sleight of hand. Lots of things take more energy to produce than to run. It's like saying a 400W electric motor can put out as much power as a fairly fit human, but it's 'sleight of hand' because it took a whole factory to make the motor.
You're conveniently forgetting that this "AI" is a representation of tens of millions of amateur games, which is far more than a few decades in total time. Not that Lee Sedol needs someone like me to defend him, but remarks like yours are very misleading.
A human is far, far, leagues more efficient at learning than today's AIs. These AIs require millions of hours' worth of human play data to even come close to competing at the level of an expert who achieved the same, and arguably far more, in a "few decades".
If you look at DeepMind's numbers, AlphaGo is meant to win more than 50% of the time against expert human players. Right now, my opinion is that its actual win rate is less than 50%.
We're in the very infancy of the field though. Give it a year and it's likely you'll have a neural-net-based Go program that runs on a single machine that can defeat every single human professional. Human Go players have been getting better for millennia; don't assume that a program that's only two years old has no room for improvement left.
It took some serious hardware for Deep Blue to defeat Garry Kasparov. Now there are smartphone apps with that same level of Chess-playing skill. And if anything the AlphaGo approach is more amenable to running on lower-specced hardware without requiring help from Moore's Law (because you simply train it better).
It is believed that Lee Sedol now has a really good ability to read AlphaGo's moves (see: Michael Redmond and the AGA game 4 commentary). Game 5 is going to be very interesting. I'm pretty sure the DeepMind team is just as excited. After game 4, they were noticeably far more ecstatic in the post-game press conference.
Hassabis and Silver kind of reminded me of developers given the details of a bug that was notoriously difficult to find.
edit: I can't wait for the reviews of the entire 5-game series. If a book came out I'd very likely buy it. A book discussing both Go and the AlphaGo AI at the same time, written by people at the top of their respective fields, would be amazing.
It feels really weird to see someone being showered with congratulations for beating a computer program. What exactly is he being congratulated for? For probably triggering and then capitalizing on a bug in AlphaGo's AI? For showing that human resolve, perseverance and a "fighting spirit" can trump a flawed AI, at least until the AI gets fixed? For giving DeepMind extremely valuable test data that will only accelerate the refinement of AlphaGo's AI? For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary?
As noted by the live commentator, this has the opportunity to be the third revolution of the game of Go, opening again new ways of playing for the whole of humanity.
Imagine what AI, in different fields, means for humanity if it has so much to teach us, just by being able to "think" differently. I sure hope that one day they start writing philosophy and, doing so, potentially legitimize everything that currently makes humans unique and extraordinary.
(He also discussed it during the actual game with more details, but I cannot find the exact time again.)
Edit: I can't find the game 3 commentary, but during game 4 he comes back to it again with interesting details on why this is a great opportunity to inspire human players: https://youtu.be/yCALyQRN3hw?t=7097
Basically, once in ancient Japan and once more recently, a player of Chinese origin in Japan - both incredibly strong players - surprised everyone with never-before-seen moves that were subsequently integrated into modern game theory. The hope is that the same could happen here.
Perhaps you'd like a shot at explaining why AI is an "amoral field of study", and why it could "delegitimize everything that currently makes humans unique and extraordinary"?
Dolphins are about as intelligent as us, too. Are dolphins amoral? Do they delegitimize Beethoven, Tesla, Gödel, Einstein, and da Vinci?
> Dolphins are about as intelligent as us, too. Are dolphins amoral? Do they delegitimize Beethoven, Tesla, Gödel, Einstein, and da Vinci?
If we invented dolphins, and they began to replace and displace us, then maybe? I don't think it is about the intelligence, but in how we use it of course. Dolphins don't really have any control of my life or those around me.
That sounded a lot more paranoid than I meant, I actually agree with what I think you are saying
I think Bud wanted to make a different point, i.e. that AI doesn't delegitimize us (whatever that even means). Dolphins were just an example. He could have used Neanderthals, or aliens.
Frankly, many pre-modern, pre-civilization humans were assholes. Read studies or accounts of life in relatively isolated tribal societies, and hair-curlingly awful accounts of violent and sometimes institutionalised abuse are uncomfortably common. Leave the safety catches off basic human urges for long and it can get pretty ugly.
Yes, I know, it's just completely insane to declare that we've somehow qualitatively moved past our brutish origins as long as torture, rape, and murder remain de facto extensions to politics.
I have no reason to doubt that quantitatively we have made some progress - although Pinker's arguments are most definitely not universally accepted in the scientific community.
So I suppose the fact that the opponent is a computer program should not be factored into the reaction to Sedol's win at all.
I hope your reductionist explanation is not accurate because it would imply that we are already so highly conditioned to machines and to AI that this match is thought to be no different from a match between two humans.
The machine beat him 3 times and was more or less expected to win again. It didn't, which would imply that Sedol played an exceptionally good game. Seems pretty obvious to me.
You're acting like following arbitrary sets of rules to maximize a value function isn't something that computers excel at. The only thing that was holding up computers for Go was simply time; there were too many possibilities to consider. Humans are really, really good at heuristically trimming possibilities, but sometimes to the detriment of finding maxima.
Fan Hui, the professional player they beat late last year, has already improved his game by quite a bit from occasional matches with AlphaGo (DeepMind hired him as a consultant to test it). He's won every single game in the last European championship, and moved from around the 600th-ranked player in the world to around 300th.
>For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans
Or, writing from the point of view of our mechanical successors to this world, for helping to advance a highly ethical field that could exterminate that genocidally murderous evolutionary abomination that was the human race. Who incidentally thought they were extraordinary, but couldn't even play Go.
I don't think this comment should be so shouted down. Beyond the normal politeness that typically comes with these events, it does feel strange to shower congratulations on Sedol's victory.
And for a game that was thought to be many decades away in terms of computer capabilities, this probably should be a time to think of the possible consequences of AI improvement and take a close look at it.
I think (or at least hope) most of the downvotes have nothing to do with congratulations, and only refer to the part where AI is called "an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary".
That line of reasoning would be worth a laugh, if it wasn't so widespread in the general population.
>For giving DeepMind extremely valuable test data that will only accelerate the refinement of AlphaGo's AI?
You're treating Lee Sedol as if he is a fixed dataset to be trained on. Why can't he also be an "NN" who can "refine" his own AI[0] and thus be harder for AlphaGo to compete with?
You're putting too much faith in the machine and dropping the person, that might not be completely fair.
> For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary?
If you care about being unique and extraordinary more than about reason, knowledge, truth, the observable reality, and the search for what it really means to be sentient, then and only then you may call AI "amoral".
Also, you are being racist against artificial sentient beings, and being racist is hopefully not what makes humans extraordinary.
If it's true that AlphaGo started making a series of bad moves after its mistake on move 79, this might tie into a classic problem with agents trained using reinforcement learning, which is that after making an initial mistake (whether by accident or due to noise, etc.), the agent gets taken into a state it's not familiar with, so it makes another mistake, digging an even deeper hole for itself - the mistakes then continue to compound. This is one of the biggest challenges with RL agents in the real, physical world, where you have noise and imperfect information to confront.
Of course, a plausible alternate explanation is that AlphaGo felt like it needed to make risky moves to catch up.
The same happens to people, especially people that study theory. You can totally throw them off their game by making a non-standard move, even a relatively bad one as long as it breaks their existing pre-conceived notions about how the game should progress.
Of course against a really strong player you're going to get beaten after that but a weak player strong on theory will have a harder time.
Well, also, the machine could study his style of play, whereas he was playing a stranger - one whose style changed with each game thanks to humans hand-altering the logic. Not really fair. At least, different from how humans play chess.
To be fair, Lee Sedol was also changing style during the game. The opening for game 4 is heavily based on the one for game 2, except when he made a mistake.
What you are talking about here is called "label bias". [2] It is present only if training is done badly.
When you have a game of Go, or a Super Mario level, you don't want to make your decisions by just checking the local features and acting on them, because it can be the case that, by compounding errors, you end up in a state you never saw, and all of the future decisions won't be good.
One can avoid these situations by training jointly over the whole game.
For example, maximum entropy models can work for decision-making problems, but their training leaves them in a "label bias" state, because the training tries to minimize the loss of local decisions instead of trying to minimize the future regret of the current local decision.
The solution to these label bias problems is Conditional Random Fields, or Hidden Markov Models. You could accomplish the same with Recursive Neural Networks if you trained them properly. For example, there is no search part in RNNs (no Monte Carlo tree search, or dynamic programming [Viterbi] as in CRFs or HMMs), but they are completely adequate for decision-based problems (sequence labeling etc.). Why is that the case? Because search results are present in the data; there's no need to search if you can just learn to search from the data.
If DeepMind open-sourced the hundreds of millions of games that AlphaGo played, it would be quite possible to train a model that wouldn't need Monte Carlo search and would work quite well, because you would train the model to make local decisions that minimize future regret, not to minimize its local loss. [1]
The only reason why reinforcement learning is used is because there are too few human games of Go available for the model to generalize well. Reinforcement learning can be used in the setting of joint learning because you play out the whole game before you do the learning. This means that you can try to learn a classifier that will minimize the regret by making a proper local decision. Although, as far as I know, and can see from the paper, they didn't train AlphaGo jointly over the game sequence.
But! Now they have a lot of data and they can repeat the process.
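To make the local-vs-joint distinction concrete, here's a toy sketch (emphatically not AlphaGo's actual objective - the states, moves, and games below are made up): a "local" objective scores each move against its label in isolation, while a "joint" objective scores the whole move sequence by the game's outcome, so a locally plausible move that leads to a loss is still penalized.

    import numpy as np

    # Two toy "games": sequences of (state, move) pairs plus a final result.
    games = [
        {"moves": [(0, 1), (1, 0), (2, 1)], "won": True},
        {"moves": [(0, 1), (1, 1), (2, 0)], "won": False},
    ]

    # Tiny tabular policy: P(move | state) over 3 states x 2 moves.
    policy = {s: np.array([0.5, 0.5]) for s in range(3)}

    def local_loss(policy, games):
        # Per-decision cross-entropy: each move is an independent label,
        # regardless of how the game ended.
        return sum(-np.log(policy[s][m]) for g in games for s, m in g["moves"])

    def joint_loss(policy, games):
        # Sequence-level objective: log-probability of the whole trajectory,
        # signed by the outcome, so credit flows from the final result back
        # to every move in the sequence.
        total = 0.0
        for g in games:
            logp = sum(np.log(policy[s][m]) for s, m in g["moves"])
            total += -logp if g["won"] else logp
        return total

    print(local_loss(policy, games), joint_loss(policy, games))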
I don't think we're talking about the same concept. I'm not familiar with the concept of label bias, and the literature I'm familiar with has not referred to label bias as the problem I'm talking about. Also, I'm not sure how a problem with probabilistic graphical models translates to the neural net policies of AlphaGo.
I fail to see how a "per-state normalization of transition scores" translates to there being a bias in value networks towards states with fewer outgoing transitions.
Yes, the "label bias" is more of a structured learning / joint learning term that is present in natural language processing. But reinforcement learning suffers only if you do the learning to minimize local loss of the decision (label) - if you try to build a classifier that minimizes its loss on local decisions, instead on sequence of decisions.
Their value policy network isn't trained jointly and can compound errors. There are approaches with deep neural networks that don't have a joint training but work pretty well. The reason is that networks have a pretty good memory/representation and by that they avoid much of the problems. But for huge games like Go it is quite possible that more games need to be played for these non-structured models to work well.
Again, I don't think we're talking about the same concept. I also fail to see how training over an entire trajectory is going to help you with trajectories you've never seen. Also, these nets are definitely trained with discounted long-term rewards.
They train using trajectories but train them to guess the trajectory locally, not globally. Discounted long-term rewards are just a hack; they aren't joint learning.
The concept of label bias, or decision bias, is a joint/structured learning concept. It is a machine learning concept; it has nothing to do with the application. There are training regimes with a mathematical guarantee that the local decisions will minimize the future regret.
Joint learning is done not on the whole permutation but on the Markov chain of decisions, which is sometimes a good enough assumption. For example, the value/policy network of AlphaGo is precisely a Markov chain: given a state, tell me which next state has the highest probability of victory. The search then tries to find the sequence of moves that will maximize that probability, and then it makes the best local decision (one move). It works like limited-depth min-max or beam search. They do rollouts (play out the whole game) to train the value network, but it is now a question whether they train it to minimize the local loss of the decisions made, or to minimize the future regret of a local decision. As I've stated before, minimizing joint loss over the sequence versus minimizing local loss over each decision made is exactly what determines whether there will be bias or not.
The whole point of reinforcement learning is to create a large enough dataset to overcome the trajectories-not-seen problem. Training models to play Go is an entirely different kind of problem.
Now when they have hundreds of millions of meaningful games they can skip the reinforcement learning and just learn from the games.
An illustration of the "label bias" problem is available in one of the sources I referenced; terms like compounding errors and unseen states are there. "Label bias" is present only in discriminative models, not generative ones. Which means that AlphaGo, being a discriminative model, can suffer from label bias if it wasn't trained to avoid it.
Yes, I'm pretty sure we're not talking about the same thing. I'm precisely talking about the trajectories not seen problem. Nothing is going to save you from the fact that the net has not seen a certain state before.
That's not really a problem. Given a large enough dataset you want to generalize from it - there are always states not present in the dataset - the whole point is now to extract features out of your dataset to allow generalization on unseen states. Seeing all of the Go games isn't possible.
The compounding errors problem that stems from decision bias isn't because you haven't seen the trajectory, it is because the model isn't trained jointly.
We're talking about the same thing. You just aren't familiar with the difference between joint-learning discriminative models and local decision classifiers (maximum entropy Markov models vs conditional random fields - or recursive CNNs trained on a joint loss over the sequence vs recursive CNNs trained to minimize the loss of all local decisions).
In the case of Go, one would try to minimize the loss over the whole game of Go, or over the local decisions made during the game of Go. The latter will result in decision bias - that will lead to compounding errors. The joint learning has a guarantee that the compounding error has a globally sound bound. (proofs are information theory based and put mathematical guarantees on discriminative models applied to sequence labelling (or sequence decision making))
edit:
Check out the lecture below; around the 16-minute mark it has a Super Mario example and describes exactly the problem you mentioned. The presenter is one of the leading figures in joint learning.
It is a completely supervised learning problem. But look at reinforcement learning as a process that has to include a step of generating meaningful games from which a model can learn. After you have generated a bazillion meaningful games, you can discard the reinforcement and just learn. You now try to get as close to the global "optimal" policy as you can, instead of trying to go from an idiot player to a master.
Of course, the data will have flaws if your intermediate model plays with a decision bias. So, instead of training the intermediate to have a bias, train it without :D
Yes, Hal Daume is referring to the issue I brought up. I'm not interpreting his comments as referring to issues with training the model jointly - he's referring to exactly what I'm describing - never having even seen the expert make a mistake. The only solution is to generate trajectories more intelligently (which is in line with Daume's comments).
Yes, it is true. In the case of Super Mario he does the learning by simulating level-K BFS from positions that resulted in errors (unseen states) and thus minimizes the regret for the next K moves.
Although, if you check out his papers, regarding the problems I've talked about: even when you have more than enough data, and when you know you should be able to generalize well, you can still get subpar performance if you don't optimize jointly. The AlphaGo model isn't optimized jointly, but its power mostly lies in the extreme representational ability of deep neural networks.
Both are plausible for humans as well, I would say (in a more general sense). Certainly mistake spirals from not quite knowing how to deal with the consequences of the first mistake have happened to me personally, and I recall descriptions of people in poverty having to play more aggressively to get a shot at the part of civilization they want to be in, though unfortunately I don't have a source.
It is just a complex system leaving one of its basins of attraction to try to reach another one (probably because an evaluation of risk deemed it necessary).
And for a complex system, transitions happen to be sensitive to initial conditions and quite unpredictable.
AI cannot do smooth transitions because they lack the intuition of what smooth means, and that's how to win against them.
1) identify a basin of attraction (the appearance of a bounded domain of evolution in phase space);
2) set the AI in a well known basin by tricking it;
3) unbalance the AI by throwing garbage behaviour at it that kicks it out of the basin in a random direction;
4) let the human win in the chaos that ensues.
Of course it is better done with software to help you: a real-time phase-space analyser.
The point is that, as in a lot of domains, constructing an AI requires more energy than building software to sabotage it.
But once you have the framework of thought to win against one AI, you can beat them all.
There is a possibility that the AlphaGo team, knowing that it has already won a decisive victory, wanted to spare the champion a crushing 5-0 defeat and so commanded the AI to lose on purpose.
The AlphaGo team has stated that they no longer have the ability to make changes to the AI used for the match. The rules of the exhibition match prohibit that.
They maybe could have anyway, but it would be cheating: just the same as if they'd Mechanical Turk'ed it by e.g. having Ke Jie actually choose the moves to play.
Yesterday I kind of entertained the thought that they could have tried to make Alphago waste a move somewhere at the beginning of the game to give Sedol the equivalent of a stone advantage and see how Alphago could handle that. But it's clear that any move like that would have been immediately detected by the professionals who can read the game as I read the morning newspapers, so no.
I don't know why you're being downvoted. The fact that the human won makes it so much more interesting now! It surely is the best outcome possible both for human Go players and for Google as well (having won the first 3 already).
In the post-game press conference I think Lee Sedol said something like "Before the matches I was thinking the result would be 5-0 or 4-1 in my favor, but then I lost 3 straight... I would not exchange this win for anything in the world."
Demis Hassabis said of Lee Sedol: "Incredible fighting spirit after 3 defeats"
I can definitely relate to what Lee Sedol might be feeling.
Very happy for both sides: both for the fact that people designed algorithms that can beat top pros, and for the human strength displayed by Lee Sedol.
Yep! As someone who was rooting for DeepMind, I like this result, for two reasons: Lee Sedol earned it - he's behaved like a true sportsman all the way - and it gives us some interesting information (yesterday we only had a lower bound on AlphaGo's strength; today we also have an upper bound).
>yesterday we only had a lower bound on AlphaGo's strength; today we also have an upper bound
I think it's premature; establishing bounds with a good confidence interval requires tens or hundreds of games. Specifically, a 3:2 result would be really inconclusive.
The first data point tells you more about bounds than any individual subsequent data point. Knowing that a human can defeat AlphaGo is enormously important.
To use an analogy, having confirmation of contact by even a single alien species would be hugely important, way more so than exactly nailing down the number of alien species. Knowing that something is even possible is oftentimes the most important aspect that needs to be ascertained, and contact (or a win, in Lee's case) does that unequivocally.
At the time chess was "the" problem, humans could beat computers from time to time too. Now chess is solved. The same will likely happen to go in the years to come. But the point of this was not go itself. It was a demonstration of the potential capabilities of neural networks and other machine learning algorithms.
Chess isn't solved. It's just that computers have gotten sufficiently better at it than humans that humans don't have a chance.
If Chess were truly solved, then you wouldn't be able to make a new AI program that could do better than even odds against the existing ones. But that's not the case, and incremental advancements in Chess-playing programs are made all the time. There are even tournaments where Chess programs play each other. If Chess were solved, such a thing wouldn't make any sense, just like how there are no Tic-Tac-Toe tournaments because that game is solved.
Even that isn't strong enough. The game is deterministic.
If chess were _solved_ we'd know a strategy to allow one of the players (likely white) to always win, or for either of them to always force a draw. (and we'd know which of these strategies were possible for chess).
Consider, say there is a first move white could choose such that no matter what moves black makes, white will win. Then the first would be true. Instead consider, that there is no such move, and any first move has choices where either could win-- if some of those are ones which would force a black win, then the first is again true but for black. Otherwise, a draw can always be forced. These are the only possible outcomes for a solved game of chess.
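To make "solved" concrete, here's a toy solver for a much smaller deterministic game standing in for chess: from n stones, each player removes 1 or 2, and whoever takes the last stone wins. Solving it means computing, for every position, whether the player to move can force a win - the analogue of knowing chess's game-theoretic value.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def to_move_wins(n: int) -> bool:
        if n == 0:
            return False  # no stones left: the previous player just won
        # Winning iff some legal move leaves the opponent in a losing position.
        return any(not to_move_wins(n - k) for k in (1, 2) if k <= n)

    print([to_move_wins(n) for n in range(10)])
    # Losing exactly when n % 3 == 0: the complete solution to this game.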
I've read some articles in the Korean press suggesting the AlphaGo team picked Lee as their match, and not Ke Jie (currently #1 ranked, 19 years young), because there are a lot more public records of Lee's play over his much longer pro career (nearly 20 years now) - thus more material for AlphaGo to train with and against.
So it's not totally arrogant of Ke Jie to suggest he could beat AlphaGo. AlphaGo has not much 'experience' in dealing with Ke Jie.
And Lee's winning of game 4 shows a human is still indeed more capable than any AI. He basically reprogrammed his game on his own. Sorta.
That was specifically discussed in the after-match conference today. Demis said that it wasn't trained specifically against Lee but was trained against strong amateurs on the Internet initially, before playing against itself. He said that even if they'd wanted to, AlphaGo requires a much larger number of games than are available from Lee Sedol to train on, so it wouldn't have been possible anyway.
Ke Jie just happens to be #1 right now. Lee Sedol is more impressive in that he's been a world-class Go player for much longer. Twenty years from now, if Ke Jie is as dominant as Lee Sedol is now, then he'd be a good pick then. But either way it's not a big deal; there's not that much daylight between the two, so picking the more famous and well-known one to go up against is a safe bet.
Personally I want to see a discussion game with the top Go experts (including both Lee Sedol and Ke Jie amongst others) competing against the next version of AlphaGo, in a game with much longer time limits.
I don't think top pros view it that way. He is of a new generation that play thousands of games over the internet; that study the game socially, while Lee is much more a loner. Whether Ke Jie will dominate as much as Lee did will say more about the strength of his competitors than his strength vs Lee.
Another reason could be that Lee is a much bigger name in go than Ke Jie. Sure, Ke Jie is stronger now but Lee was a dominant/top player for a long long time.
Many of the reinforcement learning techniques require data sets in the billions to make much sense of it all. This is likely the reason why the AI had to play against itself so much, because there simply wasn't enough data in general, even including all recorded games.
So Lee's games are sort of a drop in the bucket as far as the performance of the AI goes.
Interesting point here: if you are the best Go player in the world, and you play against an AI that can beat you, it seems really possible that it makes you that much better. It would be interesting to see what the win ratio would be after match 10, match 50 or so on.
My friends and I (many of us are enthusiastic Go lovers/players) have been following all of the games closely. AlphaGo's mid game today was really strange. Many experts have praised Lee's move 78 as a "divine-inspired" move. While it was a complex setup, in terms of the number of searches I can't see it being any more complex than the games before. Indeed, because it was very much a local fight, the number of possible moves was rather limited. As Lee said in the post-game conference, it was the only move that made any sense at all, as any other move would quickly prove to be fatal after half a dozen or so exchanges.
Of course, what's obvious to a human might not be so at all to a computer. And this is the interesting point that I hope the DeepMind researchers will shed some light on for all of us after they dig out what was going on inside AlphaGo at the time. We'd also love to learn why AlphaGo seemed to go off the rails after this initial stumble and made a string of indecipherable moves thereafter.
Congrats to Lee and the DeepMind team! It was an exciting and I hope informative match to both sides.
As a final note: I started following the match thinking I am watching a competition of intelligence (loosely defined) between man and machine. What I ended up witnessing was incredible human drama, of Lee bearing incredible pressure, being hit hard repeatedly while the world is watching, sinking to the lowest of the lows, and soaring back up winning one game for the human race.. Just incredible up and down in a course of a week. Many of my friends were crying as the computer resigned.
Given that 78 was a "divine-inspired" move, and Demis tweeted that AlphaGo's mistake was move 79, my guess is that move 78 was not in AlphaGo's search tree at all. It had pruned that move as being too "unlike what a human pro would do", and had not considered play beyond that point. Then it had to suddenly backtrack, drop its entire search tree, and start rebuilding from scratch. It took only a couple of minutes to do so, but that's still not enough time to build up the depth and breadth of tree that it had been constructing up to that point in the game. So it ended up playing a suboptimal move simply because Lee Sedol played a move that was simultaneously brilliant/strong and novel/unexpected, and it had to respond somehow.
The problem may well have occurred before then, too. If move 78 was incorrectly pruned from the tree for being unlikely, then AlphaGo had many turns in which it should have been considering counter-moves to move 78 (and thus prevent the situation that allowed move 78 to even happen in the first place), but it didn't.
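If that's what happened, it matches how a generic MCTS engine handles opponent moves (a sketch of the general technique, not AlphaGo's code): when the opponent's move is already in the tree, the matching subtree and all its accumulated statistics are kept; when the move was pruned or never expanded, search restarts from an empty root.

    class Node:
        def __init__(self):
            self.children = {}  # move -> Node
            self.visits = 0     # rollouts accumulated through this node

    def advance_root(root: Node, opponent_move) -> Node:
        child = root.children.get(opponent_move)
        if child is not None:
            return child    # reuse the subtree: its statistics carry over
        return Node()       # unexpected move: rebuild the tree from scratch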
Just letting others know, this expression is a rather common way of saying a single move that changed the course of the game. There were "divine-inspired moves" that AlphaGo made in the first three games too.
> AlphaGo's mid game today was really strange. Many experts have praised Lee's move 78 as a "divine-inspired" move.
Add to that the moves where AlphaGo basically threw away stones by adding to formations that were going to be removed from the board. Even I, a complete, lousy amateur, could see that they were a mistake.
To be fair, those moves were made when AlphaGo was already behind. It's just not any good at dealing with being that far back. The AI has no concept of what to do while behind: what a human would do is go for positions that are very complicated, making the chances of sloppy play much higher. Instead, it makes moves that can be answered in only one or two ways, and that are very easy to read for even an amateur human.
Training an AI to play well in a bad situation would require it to train in ways very different from the AlphaGo vs AlphaGo training that it spent a lot of time doing. And why do that, instead of trying to make itself good while the game is even, or when it's winning?
It's a bit like how training in chess to play pro games differs from training to hustle amateurs in the park: you are not making the best move, but the good move that will confuse the opponent the most. You are trying to exploit a bad opponent: very different play.
Toward the end AlphaGo was making moves that even I (as a double-digit kyu player) could recognize as really bad. However, one of the commentators made the observation that each time it did, the moves forced a highly-predictable move by Lee Sedol in response. From the point of view of a Go player, they were non-sensical because they only removed points from the board and didn't advance AlphaGo's position at all. From the point of view of a programmer, on the other hand, considering that predicting how your opponent will move has got to be one of the most challenging aspects of a Go algorithm, making a move that easily narrows and deepens the search tree makes complete sense.
Or maybe it was just the computer version of grasping at straws. None of the future gamestates looked good, so it ended up picking whatever could at least theoretically lead to a comeback, even if that would require Lee Sedol to miss a completely obvious move.
This looks like a possibility to me. I'm not sure that AlphaGo has any idea how strong an opponent it is facing at any given moment. It's played games against venues and it's played games against professionals. But if a set of complex advanced moves is still likely to lead to a narrow defeat, it might well pick some stupid moves that in theory could allow it to win back a dominant position if the opponent plays badly. Since it doesn't have a model of the competence of its opponent, that might appear to be a viable strategy, because against some opponents, and in some past games it's played, that could work.
This is really interesting, because forming a model of our opponent and tailoring our strategies appropriately is fundamental to how humans approach competitions.
> Toward the end AlphaGo was making moves that even I (as a double-digit kyu player) could recognize as really bad.
Maybe its training was focused on winning, not on losing narrowly. So as soon as it became obvious it couldn't win, it was just making silly moves, because that was a less-researched scenario.
Some people, when they see they can't win, make silly moves just for fun.
They're not really silly moves, it's just that in order to get into a good state for AlphaGo, Lee Sedol would have to miss the obvious response to those moves.
This is not the reason. Monte Carlo programs typically play bad-looking moves in way-ahead / way-behind situations. In these situations "bad" moves will sometimes not alter the win probability they measure, so they don't know how to distinguish a more natural move from one that looks bad. This was mentioned in the commentary when one of the DeepMind guys came in.
Go programmers have taken various steps to mitigate this behavior, such as dynamically adjusting komi to trick the engine into thinking it is a closer game, but I don't know if AlphaGo uses any such technique.
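For those unfamiliar with dynamic komi, a minimal sketch of the idea (thresholds and names here are illustrative, not from any particular engine): when the measured win rate saturates near 0 or 1, shift the komi the engine evaluates against so positions score as closer, restoring a gradient between natural and bad-looking moves.

    def adjusted_komi(base_komi: float, win_rate: float, step: float = 2.0) -> float:
        # Far ahead: pretend we owe more points, so the game looks closer.
        if win_rate > 0.9:
            return base_komi + step
        # Far behind: pretend we owe fewer points, for the same reason.
        if win_rate < 0.1:
            return base_komi - step
        return base_komi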
I'm curious - is this the same thing that it was doing in the matches that it won? People have been talking about how it seemed to be toying with its opponent towards the end of those matches, making moves that didn't gain it much, but maybe it just doesn't actually understand what moves to play some of the time and it simply hasn't mattered before.
I doubt it's considering that directly. What I'm guessing (without having really studied the paper in depth) is that if you have two board positions which would score identically in the end, but where one has fewer remaining liberties than the other (i.e. one is deeper into the tree), then it will be scored more highly.
In other words, the humans commentating the game were evaluating the moves as non-sensical because the outcome (AlphaGo plays here so Lee plays here) is a foregone conclusion and doesn't change the human evaluation of the board position. What it does do is remove uncertainty (AlphaGo plays here, but Lee screws up and plays somewhere else). In their evaluation, humans tend to value that uncertainty (i.e. counting on the possibility of a mistake), but I'd guess that AlphaGo penalizes the uncertainty (i.e. known board positions are scored higher than potential board positions), leading it to over-value simple advancement of the board in the end-game.
The bigger the game board, the more possible states, the better chance a human player has against the AI. Is this the same heuristic in play here - Play a game with more unknowns, gain an edge?
From my limited knowledge of the game, a few of the moves that Lee made before AlphaGo "lost its mind" were a tad on the aggressive side. The conventional wisdom in Go is to prefer more conservative moves (increasingly so as the game progresses). Usually, if your opponent is being overly aggressive then you want to play more conservatively and wait for them to make a mistake, but in AlphaGo's case, it attempted to match Lee's aggressiveness move-for-move, and Lee was able to capitalize.
If I had to guess (and this is pure speculation), AlphaGo has no concept of waiting for its opponent to make a mistake. Instead, it assumes its opponent will continue to make the best possible follow-ups, and so AlphaGo feels overly compelled to "keep up". In this case, that did it in.
If this is what happened, then yes, I would expect Lee to be able to capitalize.
> If I had to guess (and this is pure speculation), AlphaGo has no concept of waiting for its opponent to make a mistake. Instead, it assumes its opponent will continue to make the best possible follow-ups
One of the DeepMind guys just confirmed that this is how AlphaGo operates in the press conference.
Which makes sense. It doesn't even have a concept of a human opponent or human-style mistakes - it plays against Lee just like it plays against itself when training.
This is by design and acknowledged by the developers - in fact it plans only for what it judges to be optimal opponent moves. A modification might be to keep in reserve a set of options for non-optimal opponent play, although in actual practice the simplicity of the design here (maximize probability of winning and plan only for optimal opponent moves) may be one reason why it plays as well as it does.
In that case, theoretically at least, could we train AlphaGo by getting top Go players to play many games against each other where one or both players make very aggressive moves when reasonable?
Well, that's where AlphaGo and the progress in Go AI that it represents is so exciting! The game of Go is so fluid with such a huge number of possible positions that players tend to adopt certain styles of play en masse. I've heard it said that you can identify a Go player's mentor or "house" just by the style of play they use.
This has also resulted in larger shifts in playing style over time. Studying very old (and I mean very old...700+ years old) games can be entertaining and even educational in the abstract, but you won't want to directly adopt the style of play because the game has evolved.
It's already been mentioned a couple of times that AlphaGo almost certainly represents just such a shift. Top players will learn from it, and I'd even be willing to bet they will beat it with some regularity once they do!
Ultimately, what sets apart Go geniuses is their ability to play creatively in the face of seemingly insurmountable challenges. So the big question is how "creative" AlphaGo can be. Is it merely synthesizing strong play from known positions? Can it introduce novel strategies? And if it does, will it be able to adjust as other Go masters adjust to it and bring their own brand of creativity to play?
To answer your original question, this very well could introduce a new era of more aggressive play to the world of Go. Only time will tell...
AlphaGo will learn from any new styles and apply them effectively without mistakes from fatigue or inattention.
This incarnation of AI is not creative; it won't generate new play styles. That is still the domain of top human players, for now. But it will ruthlessly learn and adopt any new and improved strategies. That's really the point to take away from its success so far.
I suppose that's possible. However Demis Hassabis has said many people have noted that AlphaGo makes human-like moves. He commented that it made sense since AlphaGo taught itself from the games of human players.
Yes, the starting kernel of its learning this time was a collection of human games. This has caused the Policy Network (which says "given this board state, these moves are worth investigating in more detail") to be biased towards more human-like moves.
But they're already working on a new version of AlphaGo which isn't trained on any human data at all. It starts by making truly random moves and improves from there. This will require much more processing time and probably an order of magnitude more "self-play", but it will probably result in truly novel strategies that aren't part of the current human metagame.
It could also be that human-looking moves are already close to the best possible. Neither AlphaGo nor human players can play god's hand, but they're approaching the same location.
Is Go-Space understood well enough to know either way?
The OMG-AI people claim that AGI would be dangerous because it would reliably innovate in new spaces and out-predict humans.
So a true super-AGI would make go moves that were unexpected and incomprehensible with some percentage of misleading fake-outs, but it would still win most or all of the time.
If the human exploration of Go-Space is close to the god's hand bounds, this can't be true.
My intuition (and it's really only just that) is that Go space is large enough that AI would be able to outplay humans while still not even beginning to approach "perfect" play. If so, then I would also expect that humans should be able to follow the lead of AI into new areas of Go space, and outplay the AI (at least until the AI has a chance to learn and catch up).
We'll know if this is the case in a couple of years, if the competition between human and AI goes back-and-forth (unlike Chess, where after AI was good enough to beat humans, it could do so reliably).
Either way, it's interesting to note that AlphaGo had literally thousands of games to learn from to find weaknesses in human play, but Lee Sedol seems to have only needed 3 before he was able to find weaknesses in AlphaGo's play.
> Either way, it's interesting to note that AlphaGo had literally thousands of games to learn from to find weaknesses in human play, but Lee Sedol seems to have only needed 3 before he was able to find weaknesses in AlphaGo's play.
To be fair, we can't know how many games Sedol played in his own head to figure this out.
The crucial play here seems to have been Lee Sedol's "tesuji" at White 78. From what I understand, the term in Go means something like "clever play": sneaking up on your opponent with something they did not see coming. The DeepMind CEO confirmed that the machine actually missed the implications of this move, as the calculated win percentage did not shift until later.
https://twitter.com/demishassabis/status/708928006400581632
Another interesting thing I noticed while catching the endgame is that AlphaGo actually used up almost all of its time. In professional Go, once each player uses their original (2 hour?) time block, they have 1 minute left for each move. Lee Sedol had gone into "overtime" in some of the earlier games, and here as well, but previously AlphaGo still had time left from its original 2 hours. In this game, it came quite close to using overtime before resigning, which it does when the calculated win percentage falls below a certain threshold.
Tesuji isn't a trick play, it's more like a power play. Each player can read out how a fight is going and see their line far into the future. Two professionals will pick two lines, two suji, which are in balance and push up against one another tightly.
A tesuji is a part of the line which is suddenly showy or strong. It could mean a failure for the opponent if they had not taken enough of an advantage in the struggle to this point or if they do not have a counter tesuji available.
Indeed, that might be the design of a set line: one side continually loses ground to the other forcing the other to take these small advantages all so that the first side has an opportunity to play a tesuji and return to balance. Many such lines are canonicalized ("joseki") and known to any professional. Moreover, professionals regularly identify potential tesuji and expect their opponents to as well.
I suspect that in almost all cases it has already converged on a move well before sixty seconds, so if forced by time pressure to choose a move before it has achieved the confidence it's looking for, it will probably still do the right thing most of the time (especially in the endgame, when the branching factor is cut down enormously). It just uses the additional time because it has it. I would love to see a graph of AlphaGo's thinking: its certainty over time, how long it took to come up with the move it ultimately ended up playing, and how that certainty evolved over its thinking period.
Another way to look at this is just how efficient the human brain is for the same amount of computation.
On one hand, we have racks of servers (1920 CPUs and 280 GPUs) [1] using megawatts (gigawatts?) of power, and on the other hand we have a person eating food and using about 100W of power (when physically at rest), of which about 20W is used by the brain.
And don't forget, the human grows organically from a complex system of DNA that also codes for way more than playing Go! And is able to perform a lot of tasks very efficiently, including cooperating together on open ended activities.
We can estimate the power consumption: 1920 CPUs as in cores, or physical packages? If the latter they are ~100W each, so that's 192kW; if it's the former, depending on how many cores per package, a fraction of that. The GPUs are likely to be counting physical packages (and not cores, which is much more) and they draw 300-400W each, for a total of ~300kW. Add a bit of overhead and I'd say 500kW (half a megawatt) is a good rough estimate.
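For what it's worth, the arithmetic, with the per-part wattages above as rough assumptions:

    cpu_w, gpu_w = 100, 350      # assumed per-package draw, in watts
    cpus, gpus = 1920, 280       # figures quoted for the cluster
    kw = (cpus * cpu_w + gpus * gpu_w) / 1000
    print(kw)          # 290.0 kW for the compute alone
    print(kw * 1.7)    # ~493 kW once cooling and other overhead are added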
The figures quoted for the cluster are about 25 times higher than those quoted for a single machine, so I would guess the cluster consists of 25 machines. 20 kW, or 166 amperes at 120 volts, per machine seems a bit high to me.
A single A15 core at around 1 GHz has more GFLOPS than Deep Blue had across its whole system (11.38 GFLOPS).
1920 CPUs (a 4-core Haswell from 2013 is around 170 GFLOPS) and 280 GPUs (a previous-gen Nvidia K-series card peaks at around 5200 GFLOPS): that's 1,782,400 GFLOPS, or around 150,000x more processing power. If they were running latest-gen hardware, they would be closer to 200,000x faster.
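Checking that arithmetic (all figures approximate, as above):

    haswell, k_gpu, deep_blue = 170, 5200, 11.38   # GFLOPS per unit
    total = 1920 * haswell + 280 * k_gpu
    print(total)               # 1782400 GFLOPS
    print(total / deep_blue)   # ~156,600x Deep Blue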
Given that Moore's law is slowing down and the size of the system, we're a long way from considering putting that in a smartphone.
People are focusing way too much on the current hardware that AlphaGo happens to be running on.
AlphaGo is still a very new program (two years since inception). It will get significantly better with more training, or, equivalently, it will stay at the same strength while running on much less hardware.
Don't read too much into what one particular snapshot in its development cycle looks like. Humanity has had hundreds of millions of years to maximize the efficiency of the brain. AlphaGo has had two years. It's not a fair comparison, and more importantly, it's not instructive as to what the future potential of AI algorithms looks like.
The developers probably use whatever window manager they want. I'm personally a fan of i3 also.
But there is nothing wrong with keeping the frontend machine used in this Go match on the default Ubuntu desktop environment, since its only purpose is to play Go with a graphical user interface anyway.
AlphaGo's weakness was stated inadvertently in the press conference: it considers only the future opponent moves it deems most profitable for the opponent. This leaves it with glaring blind spots when it hasn't prepared for lines that surprise it. Lee Sedol has now learned to exploit this fact in a mere 4 games, whereas the NN requires millions of games to train on in order to alter its playing style. So Lee only needs to find surprising and strong moves (no small feat, but also the strong suit of Lee's playing style generally).
You are saying its weakness is that it makes the mistake of not looking at all the profitable moves.
The counterargument, for the sake of argument, goes: he was just lucky to find a local maximum (?) outside the search space (?) by chance, rather than learning in a few days the universal function (?) that the NN thinks solves Go, or at least one fixed point (?) of it [i.e. the surprisingly wrong expectation].
We were discussing the probability that Sedol would win this game. Everyone, including me, bet 90% that no human would ever win again, let alone this specific game: http://predictionbook.com/predictions/177592
I tried to estimate it mathematically: using a uniform distribution across possible win rates, then updating the probability of different win rates with Bayes' rule. You can do that with Laplace's rule of succession. I got a 20% chance that Sedol would win this game.
However, a uniform prior doesn't seem right. Eliezer Yudkowsky often argues that AI is likely to be much better than humans, or much worse. The probability of it landing at exactly the same skill level is pretty implausible. That argument seemed right, but I wasn't sure how to model it formally, so 90% "felt" right. Clearly I was overconfident.
So for the next game, if we use Laplace's rule again, we get a 33% chance that Sedol will win. That's not factoring in other information, like Sedol now being familiar with AlphaGo's strategies and improving his own against it. So there is some chance he is now evenly matched with AlphaGo!
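For anyone checking the numbers: Laplace's rule of succession says that after s wins in n games, with a uniform prior on the win rate, the probability of winning the next game is (s + 1) / (n + 2). That reproduces both figures above:

    def laplace(wins, games):
        # P(win next game) = (wins + 1) / (games + 2) under a uniform prior
        return (wins + 1) / (games + 2)

    print(laplace(0, 3))   # before game 4: Sedol 0-3 -> 0.2
    print(laplace(1, 4))   # before game 5: Sedol 1-3 -> 0.333...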
I look forward to many future AI-human games. Hopefully humans will be able to learn from them, and perhaps even learn their weaknesses and how to exploit them.
Depending on how deterministic AlphaGo is, you could perhaps even play the same sequence of moves and win again. That would really embarrass the Google team. I hear they froze AlphaGo's weights to prevent it from developing new bugs after testing.
The argument that AI is either much better or much worse does not apply here. It's not an accident that they chose this point in time to play against Lee Sedol instead of 10 years ago or 10 years in the future. They chose this point in time because they thought that they have a reasonable chance of winning.
Also, he won with white but he will play with black next time, so playing the same sequence of moves can't happen. Additionally, even if the AI didn't incorporate any randomness in the opening (I think it does) it may choose different moves if it gets a different amount of time to think, so Lee Sedol would have to play his moves at exactly the same time as the last game. A couple of seconds deviation only has to lead to a different move in one of the 80 or so moves before the mistake was made to invalidate this strategy.
> It's not an accident that they chose this point in time to play against Lee Sedol instead of 10 years ago or 10 years in the future. They chose this point in time because they thought that they have a reasonable chance of winning.
Most importantly, they chose this point in time because they know that there are several other AI research teams that are also on the right track (including at Facebook). Like circumnavigating the globe or landing on the Moon, 90% of the benefit of it is lost if someone else does it before you. So you don't wait for 100% certainty of winning -- you are maximizing for the chance of winning first, not winning for certain.
Given that, it seems reasonable that they would go after Lee Sedol when they were sure they were better than him, but not too much better than him. So a non-5-0 outcome is, in hindsight, not horribly surprising.
That kind of makes sense, however the point was more that AI progress isn't gradual and continuous. It's (sometimes) rapid and has discontinuities.
AlphaGo was an entirely new method, using deep convolutional neural networks as a move generator, so there was no guarantee it would be just slightly better than previous Go algorithms; it could easily have been far above humans.
Likewise this is also the first time Go AI has been given Google scale resources. Both in terms of a team of the best researchers working full time on it, and their computing power. Whereas previous Go projects were all hobbyist things.
And Google didn't wait 10 years to challenge Sedol. The match was arranged only a few months at most after they started developing it.
There are a lot of cases where humans are actually really close to optimal (one example is racing lines taken by F1 drivers being within tenths of a percent of perfect, don't have the link any more though sorry). In this case there are diminishing returns at play, and an AI which is a lot "stronger" than a human might still produce very similar results.
According to the commentary of both streams I was watching, after losing an important exchange in the middle (apparently move 79 https://twitter.com/demishassabis/status/708928006400581632) it seems AlphaGo sort of bugged out and started making wrong moves on an already dead group on the right side of the board. After that it kept repeating similar mistakes until it resigned a lot of moves after. But the game was already won for Lee Sedol after that middle exchange. It was really interesting seeing everyone's reactions to AlphaGo's bad moves though.
I would avoid thinking of this like a traditional computer program that just "bugged out" due to a glitch or a problem in the software. More accurately, it failed to account for the implications of a move on the board and therefore focused its attention in the wrong place. This happens often in games: I can imagine a chess master playing an amateur moving his knight or bishop into a position that doesn't immediately seem threatening to the amateur, so the amateur responds by moving a pawn somewhere else on the board, when the correct move would have been to attempt to counter the threat. As should be well known by now, in Go you cannot examine the implications of all possible moves, so some things will be missed. In this case, it seems something important was missed at the time, and the implications were not realized by the program until 10 or 15 turns later.
I mean that it bugged out because the moves it made after missing the exchange in the middle were moves that were obviously wrong, not in a "maybe it's up to something" kind of way, but in an objectively 100% bad kind of way. Even if you can't analyze all possibilities AlphaGo made a number of moves that made absolutely no sense at all even to way lower level players.
> a number of moves that made absolutely no sense at all even to way lower level players.
Another possibility is that it was looking way deeper than anybody, and there was a 1% chance of turning the whole game around with those seemingly bad moves. But Lee Sedol blocked that deep move in a way nobody was able to see.
He's right. The 9d commentators just laughed them off. They were the kinds of moves a 25k might play hoping that his opponent would make a silly mistake. That's exactly what AlphaGo was trying to do. The moves it played did have a "slight" chance of working, if Lee Sedol had responded incorrectly, but that would have never happened. They were the kinds of moves that are insulting when an opponent plays them against me and I'm just a weak amateur player. The moves clearly had no depth to them, and everyone that understands the game agrees on that.
> it seems AlphaGo sort of bugged out and started making wrong moves
Didn't they say that it's not considered a "bug" but rather how AlphaGo "thinks"? "when it's winning it doesn't care about how much it's winning, and when it's losing it doesn't care how bad it's losing"
"Attempting a comeback" is a completely different task that usual, though.
When you're winning, a good move has a mathematical definition; it's a move that, given optimal play by both sides, will lead to victory for you. Computers aren't powerful enough to be able to calculate exactly what moves those are, but it's at least well-defined in a way that they know what they're looking for.
When you're losing, there are zero moves that, given optimal play by both sides, will lead to victory for you (so all moves are equally "bad" in a mathematical sense). Instead, "attempting a comeback" involves hoping your opponent will mess up in some way, so in that sense, what constitutes a good move isn't mathematical but more about predicting how your opponent thinks and where they might mess up.
AlphaGo has mostly trained by playing itself, so the ways it thinks its opponent might mess up are probably completely different from how an actual human messes up.
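A toy illustration of the "all losing moves tie" point, with a made-up game tree as nested lists (leaves are terminal values from the engine's perspective; this has nothing to do with AlphaGo's actual evaluation):

    def minimax(node, our_turn):
        if isinstance(node, (int, float)):   # leaf: +1 we win, -1 we lose
            return node
        vals = [minimax(child, not our_turn) for child in node]
        return max(vals) if our_turn else min(vals)

    # In a lost position, every candidate move evaluates to -1 against
    # best play, so picking among them is an arbitrary tie-break:
    moves = [[-1, -1], [-1, -1], [-1, -1]]
    print([minimax(m, False) for m in moves])   # [-1, -1, -1]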
I'm speculating that the reinforcement learning phase reinforced all the best winning strategies, but had few examples of weak positions out of which the AI had to fight.
AlphaGo lost half the games it played against itself, so it's not like it doesn't have millions of training examples. However maybe it didn't learn very well how to recover once it's losing, but rather concentrated on learning how to avoid that in the first place.
> But the game was already won for Lee Sedol after that middle exchange.
Myungwan Kim, in his commentary, estimated the game to be worse for White even after 79, had Black not destroyed its opportunity for fighting the ko at M-13 - https://www.youtube.com/watch?v=SMqjGNqfU6I&t=1h40m1s
That was really cool! It seemed that after the brilliant play in the middle, the most probable paths to winning required Lee Sedol to make mistakes impossibly bad for a professional, a prior that AlphaGo doesn't incorporate. I've heard the training data was mostly amateur games, so perhaps the value/policy networks were overfit? Or maybe greedily picking the highest probability, common with tree-search approaches, is just suboptimal?
Failure to generalize is not always caused by overfitting. Even if there is no overfitting, deep neural networks seem to learn a surprisingly discontinuous function. Consequently, in rare cases, they can misclassify things with great confidence. [0]
From the cited paper,
> [Experimental results] suggest that adversarial examples are somewhat universal and not just the results of overfitting to a particular model or to the specific selection of the training set.
Anyway, Monte Carlo tree search is bad at losing positions. In general, you want to delay the impending catastrophe as long as possible instead of making stupid moves that make your position worse and worse. However, MCTS uses random rollouts to the end of the game, which sometimes makes it difficult to ascertain whether the inevitable doom is near or far.
Also, MCTS converges very, very slowly and is likely to miss a unique, single winning continuation.
I think it is probably a combination of both AlphaGo's value network failing to realize the good position of Lee Sedol after his brilliant play, and the MCTS failing to spot the unique winning sequence for Lee, that caused it to make the mistake. But we should probably wait for official analysis from the Deepmind team to see what exactly went wrong.
I'm hoping someone creates adversarial formations for AlphaGo if Google ever releases their model :)
One thing I've been pondering is if many adversarial samples exist. The board is rather low dimensional (19 x 19) and discrete. While certainly a massive state space, one of the suggestions for why adversarial images work is that the real number line is incredibly dense.
For example, our possible Go board space is 2^(log2(3) * 19^2) for Go but 2^(24 * 28^2) for greyscale [0, 1]-normalized single-precision float imagery for MNIST. That's an exponentially bigger space (something like 1e5490 times bigger!), and it gets only larger if you train with double precision, have larger images, add multiple color channels, add more nonlinear layers, etc.
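The arithmetic, for anyone checking (only the gap between the two exponents matters):

    from math import log2, log10

    go_bits = log2(3) * 19**2    # ~572 bits: empty/black/white per point
    mnist_bits = 24 * 28**2      # 18816 bits under the assumptions above
    print((mnist_bits - go_bits) * log10(2))   # ~5492, i.e. ~1e5492x bigger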
I think it's more that the value network includes moves which look plausible, but won't concentrate >99% probability on the answer to a forcing move. A human has a heuristic - "I must play here or else I lose" - but AG assumes its opponent might play anywhere that the ANN calls reasonable.
If the value/policy model is predictive with a dataset containing only amateur games, but fails to generalize to unseen data with professional games, that seems like a case of overfitting to a dataset only containing amateur games. In this case the expected value network may be different for amateur games than professional games.
Sorry, I'm being a little academic. Overfitting is when the model fits to noise or error. Overfitting is not synonymous with "inability to generalize beyond the train and test distribution."
For all we know, AlphaGo has perfectly fit amateur games, but professional games are on a whole different level.
It all depends on how you define your data sets, I suppose. IGS games include some professional games which, if it is the case that AG is perfectly fit to the amateur mode, were smoothed away.
Again, the only way to tell if overfitting specifically (and not other factors that are more likely) is the issue is performance on a held out test set.
That's the only empirical way, yes. We can also just talk about what it would mean in theory, though. In this case, we'd say that AlphaGo is well trained to the training data set sampling distribution but that may be far from the actual world game distribution.
You are implying overfitting is the only reason a model ever performs poorly. That is not true. Just because a model does poorly does not mean overfitting must be the issue.
The value/policy model includes a few hundred thousand amateur games, and a few hundred million games of self-play. Once AlphaGo beat Fan Hui those would have been games of self-play versus the equivalent of a professional. So overfitting is probably not a problem. I think it's a basic incentive mismatch - MCTS algorithms tend to like close games, whereas humans will try crazy moves when losing to throw off their opponent.
I wouldn't be surprised if, in a month, Lee Sedol was able to beat AlphaGo in another match. This is what happened in chess. The best computers were able to beat the best humans, until the best humans learned how to play anti-computer chess. This bought them a year or so more, until computers finally dominated for good.
Right now I don't know if I'm more impressed by AlphaGo's artificial intelligence or its artificial stupidity.
Lee Sedol won because he played extremely well. But when AlphaGo was already losing it made some very bad moves. One of them was so bad that it's the kind of mistake you would only expect from someone who's starting to learn how to play Go.
AlphaGo kept making bad moves in such a way that the rest of the game became more and more predictable: each of the moves Lee Sedol made could be described as the only obvious one.
On the surface, as an analogy, it sounds like investors in financial markets capitulating: selling at a loss for a riskier outcome. In hindsight these are almost always bad moves, but at the time it feels right because it removes risk. Investors are losing, and then in capitulating they make even worse moves, like selling at the market bottom.
Am I right in assuming that if they played another game (AlphaGo black and Lee Sedol white), Lee Sedol could pressure AlphaGo into making the same mistake again?
This is an interesting question - if Lee Sedol simply replays the game exactly, does he repeat a win?
I think the answer is most likely not: the Monte Carlo tree search is randomized, so AlphaGo's responses to Sedol may not be exactly the same, and Sedol would not be able to repeat the exact same line of play.
They stated before game 3 that AG had been locked down a while before these games to test the code for bugs and issues. I also think they don't want some of the implications associated with DeepBlue where people modified code even during the games.
After AlphaGo won the first three games, I wondered not if the computer had reached and surpassed human mastery, but instead how many orders of magnitude better it was. Given today's result, it may be only one order, or even less. Perhaps the best human players are relatively close to the maximum skill level for go, and that the pros of the future will not be categorically better than Lee Sedol is today.
Exactly what heuristics would they use to know how an omniscient being would play? Unless there are some strong arguments behind it, it sounds like arrogant BS.
Maybe they're extrapolating from relaxed instances that they've solved? If you know that humans can play pretty much optimally on, say, 11x11, and you know how much performance drops off with each expansion of the board?
Yudkowsky argues that on the scale of intelligence, Einstein and a village idiot are basically right next to each other [1]. So once artificial intelligence gets close to matching the village idiot, it is not far from completely thrashing Einstein.
Now if that same picture held for Go, then a situation like this would seem to be impossible. Either the computer should be much worse than a human player, or much better. It would be an incredible coincidence that, at the end of six months of training, the computer happened to be of comparable skill to humans.
For the game of Go, at least, Yudkowsky is wrong. What other aspects of intelligence are this way? Yudkowsky's picture seems appealing, but perhaps it is wrong for many areas of intelligence.
AlphaGo is not an AI in the sense meant by Yudkowsky, I believe. He speaks more of a recursively self-improving AI, an AI which is capable of upgrading itself to be faster and more intelligent.
In the linked article, Yudkowsky even says "On the right side of the scale, you would find Deep Thought—Douglas Adams's original version, thank you, not the chessplayer." The implication is clear that these programs playing chess/Go are nothing like what he is talking about - general AI.
Or so I assume, from my less-than-complete understanding of Yudkowsky's writings.
Only if there's actual room to be much better. It's quite possible that 9p players are already pretty close to perfect play. Maybe a computer can get slightly closer still to that perfect play, but it's never going to be better than perfect play. There's not always room for improvement.
>> It would be an incredible coincidence that, at the end of six months of training, the computer happened to be of comparable skill to humans.
I don't think it's that incredible - By 18 years old a significant proportion of high school students know more about chemistry than the best scientists up to 1800 did, combined.
There are a lot of human games for AlphaGo to look at, but if it is to exceed the human level of play, it'll have to figure out how to do that by itself. Look at human-level games, learn to play at human level.
It's quick to get to the edge of human knowledge, and slower to go beyond it.
Yeah, I was waiting for AlphaGo to connect up 10 seemingly unrelated moves into some amazing unpredictable shape that wins the game. But now I think that maybe some humans actually have the ability to play a near-perfect game of Go. Maybe the most skilled human players have already nearly reached the peak of what is possible in a Go game.
The commentary from Redmond was illuminating for me. In his estimation (having played against Monte Carlo programs before), at that point the computer just plays very low-probability moves with extremely high payoffs if the other player makes a grievous mistake. So the moves looked terrible, since the 'retort' is obvious, but if for some reason LSD didn't play the obvious counter, it would catapult AlphaGo back into the lead. Of course, since LSD is an expert, he never missed the retort, but it's very interesting behavior nonetheless.
I was hoping Lee Sedol would be able to win at least one; humans can 'learn fast'. We can't learn 24/7 in parallel like a computer, but we do seem to have quite a talent for learning 'situational' things very quickly, quite likely because our very survival depended on it in the deep, distant past. :-)
I don't believe AlphaGo had the time to do any additional training between matches. So effectively Lee has the ability to 'learn his opponent' while AlphaGo cannot until the entire match set is over, because of how long it would take to do additional training.
It's been said elsewhere that AlphaGo studied such a large corpus of matches that even every game Lee ever recorded wouldn't be enough to bias it. I presume this also means that adding the last 3 games to that corpus would still not be enough to affect AlphaGo meaningfully.
The author thinks that Lee Sedol was able "to force an all or nothing battle where AlphaGo’s accurate negotiating skills were largely irrelevant."
[...]
"Once White 78 was on the board, Black’s territory at the top collapsed in value."
[...]
"This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves."
"We’ve talked about AlphaGo’s ‘bad’ moves in the discussion of previous games, but this was not the same."
"In previous games, AlphaGo played ‘bad’ (slack) moves when it was already ahead. Human observers criticized these moves because there seemed to be no reason to play slackly, but AlphaGo had already calculated that these moves would lead to a safe win."
Which, I add, is something that human players also do: simplify the game and get home quickly with a win. We usually don't give up as much as AlphaGo does (pride?), but it's not fundamentally different.
"The bad moves AlphaGo played in game four were not at all like that. They were simply bad, and they ruined AlphaGo’s chances of recovering."
"They’re the kind of moves played by someone who forgets that their opponent also gets to respond with a move. Moves that trample over possibilities and damage one’s own position — achieving less than nothing."
And those moves unfortunately resemble what beginners play when they stubbornly cling to the hope of winning, because they don't realize the game is lost, or because they haven't yet played enough games to stop expecting the opponent to make impossible mistakes. At pro level those mistakes are beyond impossible.
Somebody asked an interesting question during the press conference about the effect of those kind of mistakes in the real world. You can hear it at https://youtu.be/yCALyQRN3hw?t=5h56m15s It's a couple of minutes because of the translation overhead.
Lee Sedol definitely did not look like he was in top form there. I would say (as an amateur) his play in Game 2 was far better. It was the funky clamp position that perhaps forced AlphaGo to start falling apart this game. [0]
I wonder if Lee Sedol can find a way to replicate that in Game 5.
At the end, Lee asked to play black in the last match, and the DeepMind guys agreed. He feels that AlphaGo is stronger as white, so he views it as more worthwhile to play as black and beat AlphaGo.
Given that Lee was black in games one (due to luck of the draw) and three, I expected he would be in game five as well. Perhaps the system permits the loser to choose?
Normally the color for the 5th game is chosen by the result of the 4th game, but Lee asked to change that rule and make him black, so he can have a shot at beating AlphaGo as black.
Is there a source for the rules on colour choice? I had thought it would be another nigiri, as in the 1st game. Choosing it based on the 4th game's result would imply Lee Sedol would have been given white. Since he won with white, isn't that an unfair advantage in general? For instance, if it were 2-2 after the 4th game.
On a tangential note, apparently AlphaGo has been added to http://www.goratings.org/, though its current rating of 3533 looks off. Shouldn't it be much higher?
Probably not. Nine games of data is not really that strong of evidence. The algorithm probably gives each player a prior, such as assigning one fake loss, in order to limit the maximum ranking of players with few games and no losses. This prior is discussed in the paper on the ranking algorithm used here: http://www.remi-coulom.fr/WHR/WHR.pdf
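A toy version of that "fake loss" prior (the real WHR algorithm in the linked paper is Bayesian and considerably more involved; this just shows the capping effect):

    def capped_win_rate(wins, losses, virtual_losses=1):
        # one fake loss keeps a near-unbeaten record from implying perfection
        return wins / (wins + losses + virtual_losses)

    print(capped_win_rate(8, 1))   # an 8-1 record estimates 0.80, not 0.89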
The post-match conference analysis with Lee Sedol and the DeepMind CEO, going over the different aspects of the game, is beautiful to watch. There seems to be a sense of sincerity from each side, rather than a greed to win.
Now that we have two data points to interpolate between, expectations are down to roughly the best human competency in Go, using distributed computation. Also, from move 79 to 87 the machine wasn't able to detect its weak position, which shows a weakness. Now Lee can try an aggressive strategy, creating multiple hot points of attack to defeat his opponent. The human player is showing the power of intelligence.
Wow! Incredible! Now we know that they have a chance against each other. I would say that this was a very major point... otherwise we wouldn't know whether AlphaGo's powers have progressed to the point where no one can ever beat it. Now I take what Ke Jie said much more seriously: http://www.telegraph.co.uk/news/worldnews/asia/china/1219091...
This game is a great example for the people who said that AlphaGo wasn't making mistakes when it had the better position - it was lowering its margin because it only looks at winning probability.
AlphaGo made a mistake, realized it was behind, and crumbled, because now all moves are "mistakes" (they all lead to a loss), so any of them is as good as any other.
I'm very surprised and glad to see humans still have something against AlphaGo, but ultimately these kinds of errors might disappear if AlphaGo trains for 6 more months. It made a tactical mistake, not a theoretical one.
That doesn't make sense to me. Even if the objective function is win probability, it's used to order all potential moves. Thus given a menu of bad options, it should choose the least-bad one, not start choosing at random.
AlphaGo only has an approximation of win probability - more precisely, of "win probability playing against itself". That works well in an even match against humans, but when it is far behind, AlphaGo's win probability estimate is not very good.
Has anyone heard the crazy theory that alphago bugged out because of daylight savings (the cutover happened mid-game)? Anyone know the exact time at which alphago made its first wonky move?
Lee Sedol doesn't have RAM that can be crammed with faithfully recalled gigabytes of information, and that allow exhaustive, yet precise searching of vast information spaces. The amount of short-term information Sedol can remember perfectly is very small by comparison, and doing so requires a lot of concentration and effort.
Secondly, the faculty with which Lee Sedol plays Go wasn't designed for the exclusive task of playing Go. Without having to load a different program, Sedol's brain can do many other things well.
AlphaGo obviously made mistakes in game 4 under the pressure of LSD's brilliant play. I'd like to know whether the "dumb moves" are caused by the lack of pro data or by some more fundamental flaw in the algorithm/methodology. AlphaGo was trained on millions of amateur games, but if Google/DeepMind builds a website where people (including pro players) can play against AlphaGo, it would be interesting to see who improves faster.
My guess is that Sedol won because he introduced sufficient complexity through cutting points and numerous black groups (see the image). Since AlphaGo uses Value and Policy networks to determine the hot spots to analyse using Monte Carlo tree searches, by making a game rife with lots of simultaneous fights, Sedol dodged the one-two punch of Value and Policy networks combined with MCTS.
In other words, if Sedol can make over a dozen points of interest on the board, AlphaGo cannot deeply assess them all. In the image, there are at least 13 interesting moves and cuts plus up to 15 groups (depending if lone stones are considered groups by AlphaGo). I suspect that this position was far more complex than at any point during any of the three previous games.
It might also explain the meltdown of playing out an unfavourable ladder (the P10 group, as P8 is another possible move).
Eventually, math wins. There will come a point where humans cannot make the game sufficiently complex to beat a domain-specific machine intelligence (such as AlphaGo).
Apparently AlphaGo made two rather stupid moves on the sidelines, judging from the commentary. That is incidentally the kind of edge case one would expect machine learning against itself to be bad at learning, since there is a possibility that AlphaGo simply learns to avoid such situations. It will be interesting to see if top players are able to exploit such weaknesses once AlphaGo is better understood by high-level Go players.
From my perspective as a weak player, those stupid moves are the sorts of moves I would make if I already knew I was losing (but not by a huge margin, otherwise resign) but I was in byo-yomi overtime. It settles already dead stones and gives me a bit more time to keep searching for a better move that could maybe turn the game around, and if the opponent fails to make the obvious response then I can turn the game around from their blunder. The only odd thing is AlphaGo makes these moves without having entered byo-yomi time.
Thanks, this just raises further questions. I have a lot to learn before I can call myself even a weak player, but it does sound like AlphaGo regressed to the level of an amateur, which needs an explanation by itself.
I think esturk meant that AlphaGo is barely beatable now, so by the time anyone else gets a chance, it will have improved in the meantime and even a stronger human player won't be able to beat it.
By that measure, Fan Hui beat AlphaGo before Lee Sedol did, since he was playing an earlier and somewhat weaker version of more or less the same complete, distributed system.
Lee is 3rd ranked as of now. He was 1st for almost 10 years before that.
Ke Jie is 19 years old. Lee has been a pro for nearly 20 years. Lee is not old, but certainly not young. Go prodigies seem to peak in their early 20s, much like mathematicians.
My personal hypothesis for why Magnus Carlsen is the youngest chess champion of all time is that with the rise of online play and computer play, the age at which a player's blend of crystallized and working intelligence peaks has dropped.
This has obvious implications for the recent rise of machines immediately correcting human mistakes in mathematics and physics.
It's worth mentioning that while 79 is where Black goes bad, not everyone is sure that 78 actually works for White (http://lifein19x19.com/forum/viewtopic.php?f=15&t=12826). I'm sure we'll eventually get a more complete analysis.
I was hoping to see how AlphaGo would play in overtime. Now I'm curious: does it know how to play in overtime? Can the system evaluate how much time it can give itself to 'think' about each move, or does that fall into halting-problem territory, so it was programmed to evaluate its probability of winning given the 'fixed' time it had left?
It's called scoreboarding. You start coming up with solutions, ranking them, and putting the best one up on the scoreboard. When you run out of time, you just go with what you've got. Pretty much what humans do.
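Something like this minimal "anytime" loop (stand-in Python; random sampling takes the place of a real search iteration, and the move list and evaluator are made up):

    import random, time

    def anytime_best_move(moves, evaluate, budget_s):
        deadline = time.monotonic() + budget_s
        best, best_score = None, float("-inf")
        while time.monotonic() < deadline:
            m = random.choice(moves)     # one iteration of 'search'
            s = evaluate(m)
            if s > best_score:
                best, best_score = m, s  # update the scoreboard
        return best                      # play whatever is on it when time is up

    print(anytime_best_move(list(range(361)), lambda m: -abs(m - 180), 0.05))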
I'm sure there are many levels of watchdogs in this program.
> This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves.
It seems to me that these bad moves were a direct result of AlphaGo's min-maxing tree search.
According to @demishassabis' tweet, it had the "realisation" that it had misestimated the board situation at move 87. After that, it made a series of bad moves, but it seems to me those moves were made precisely because it couldn't come up with any better strategy - the min-max style of traversing the play tree expects that your opponent responds the best they possibly can, so the moves were optimal in that sense.
But if you are the underdog, it doesn't suffice to play the "best" moves, because the best moves might be conservative. With that playing style, the only way you can make a comeback is to wait for your opponent to "make a mistake", that is, to stray from the series of best moves you are able to find, and then capitalize on that.
I don't think AlphaGo has the concept of betting on the opponent making mistakes. It always just tries to find the "best play in game" with its neural networks and tree search, in terms of maximising the probability of winning. If it doesn't find any moves that would raise the probability, it picks one that lowers it as little as possible. That's why it picks uninteresting sente moves without any strategy: it just postpones the inevitable.
If you're expecting the opponent to play the best move you can think of, expecting mistakes is simply not part of the scheme. In this situation, it would actually be profitable to exchange some "best-of-class" moves for moves that aren't quite as excellent, but that are confusing, hard to read, and make the game longer and more convoluted. Note that this totally DOESN'T work if the opponent is better at reading than you, on average - it will make the situation worse. But I think AlphaGo is better at reading than Lee Sedol, so it would work here. The point is to "stir up" the game, so you can unlock yourself from your suboptimal position and enable your better-on-average reading skills to work for you.
It seems to me that skilful humans play with another evaluation function in addition to the "value" of a move: how confusing, "disturbing" or "stirring" a move is, given the opponent's skill. Basically, that's the thing you'd need in order to skilfully assess your chances of pulling off an OVERPLAY. And overplay may be the only way to recover when you are in a losing situation.
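As a purely hypothetical sketch (nothing like this is known to be in AlphaGo; the function, weights, and numbers are all made up), that second evaluation function could enter as an extra term:

    def overplay_score(move_value, p_opponent_error, comeback_value):
        # plain value, plus (chance this opponent mishandles the move)
        # times (how much a mishandling is worth to us)
        return move_value + p_opponent_error * comeback_value

    print(overplay_score(-0.90, 0.0, 0.0))   # "best" move in a lost game: -0.90
    print(overplay_score(-0.95, 0.2, 1.5))   # trappy overplay: -0.65, preferred

The point is that a slightly worse but confusing move can outrank the "best-of-class" move once you are losing anyway, but only if the opponent-error estimate is realistic.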
It would be interesting to see how AlphaGo would play with a handicap – because the skill of overplaying is required from the start in that situation. It might actually suck at handicap games, at the moment.
Would it not be beneficial for the DeepMind team to open at least the non-distributed version to the public, to allow for training against more players? I was surprised to learn that the training set was strong amateur internet play; why not train on the database of historical pro games?
That's just the initial training set, to get things started.
Alphago learns mostly from playing against itself. (And in the future they are planning to remove the crutch of starting with human generated data entirely.)
As a human, I'm pulling for the human. As a computer programmer, I'm pulling for the human. As a romantic, I'm pulling for the human. As a fan of science fiction, I'm pulling for the human. To me it will matter even if all he can pull off is a 3-2 loss rather than a 4-1 loss.
I wonder what the chances are of a cosmic ray or some stray radiation causing AlphaGo to have problems. It's quite a rare event, but when you have 1920 CPUs and 280 GPUs, it might up the probability enough to be something you have to worry about.
I am super excited about all the progress AI has made in AlphaGo, but a part of me feels kind of relieved that humans won at least one match. :) Sure, it won't last for long.
I wonder: if Lee Sedol were to start as white again and follow the exact same starting sequences, would AlphaGo's algorithms follow the exact same moves as they did before?
It seems Lee Sedol fares better in the late game and endgame than AlphaGo. Makes one wonder whether Lee might have won the earlier games had he pushed on to the late-game stages.
I don't think so. Everyone seems to agree that MCTS only gets stronger in the endgame, and that AlphaGo started making weird moves towards the end because it thought it was already losing, which by definition means there's no strategy leading to a winning position.
...However if it's true that it couldn't find a way out, shouldn't its probability of winning have hit ~0% much earlier? There was still a long time between when it started acting strangely and when it resigned.
Way to go, humans. (I felt that AlphaGo was unbeatable and a milestone in computing overthrowing organic brains... I gave in to the buzz a bit prematurely.)
Lee Sedol won by playing brilliantly, keeping his cool, and not making any mistakes. AlphaGo, on the other hand, went bonkers, especially towards the end, but it got into bad territory not because of silly mistakes; it was pushed there by brilliant play from Lee Sedol.
Could it possibly be that both of the mistakes were bugs? Perhaps it suggested a non-sensical position such as (25.23, 13.15), and it was snapped to (19, 13) :D
I don't think they were bugs in the traditional sense. I think AlphaGo picked moves to try to maximize the probability of winning, and at some point the only way to win was for the opponent to make a suboptimal response. I remember reading somewhere that most of its training data is from amateur games. The model doesn't have a prior that AlphaGo is playing a professional who won't make a bad response. It probably would have resigned a lot earlier with that prior :)
Another thing to keep in mind is that AlphaGo has no "memory", so every turn it looks at the board fresh. This means that if the probabilities are very close, it could jump around a bit - whether from numerical noise in floating-point calculations, model errors, or just tiny differences in probability - making its behavior appear erratic and quick to change "strategy".
I think Lee Sedol won the game earlier by destroying AlphaGo's territory in the center. The commentator (Michael Redmond) was quite impressed with what he did there.
Sometimes it's just not possible to stop the snowball rolling downhill. You can't always turn a retreat into an advance or a flanking move; sometimes the first step backwards just turns into a full-on rout.
I got the impression (possibly incorrectly) that AlphaGo was trying to throw curve-balls and be 'unexpected' in a way that might have 'forced' a mistake it could exploit.
The game seemed to be going in AlphaGo's favour when it was half way through. Black (AG) had secured a large area on the top that seemed nearly impossible to invade.
It was amazing to see how Lee Sedol found the right moves to make the invasion work.
This makes me think that if the time for the match were three hours instead of two, maybe a professional player would have enough time to read the board deeply enough to find the right moves.
The thing is, Lee Sedol didn't "find the right moves"; the wedge at L11 shouldn't have worked. If black's 79th move had been at L10 instead of K10, Sedol would likely have resigned on the spot.
I was on the other stream; Myungwan Kim and Hajin Lee went way deeper than Redmond typically goes (since they have access to an SGF editor instead of a clumsy demo board, and they aren't performing for the camera as much). They seemed pretty confident in their conclusion that L10 killed white.
Just from Redmond's commentary, killing white in the center was just one of the outcomes that AlphaGo was impelled to achieve. [For just an arbitrary example:] if AlphaGo killed white in the center but lost the bottom and the top right, it could still have lost. (And Sedol had to achieve multiple objectives as well, of course.)
Sure, maybe AlphaGo missed a winning move. But the situation was fluid both strategically and tactically, which might have been why the machine chose its losing move, and moreover why I don't think it was just a matter of the machine failing to find a kill - especially since I think computers have been able to beat humans on pure tesuji for a while now.
Ah! This matches up with Demis' tweet that black's 79th move was later determined to be a mistake. In MCTS you wouldn't normally go back to moves you've already played unless it's an incomplete information game. However I guess for reinforcement learning (even if it's not actually done during these matches), you would go back and update the estimated values of moves already played, which explains how they know that.
Demis clarified what he meant by that in a subsequent tweet – it wasn't that AlphaGo re-evaluated move 79, it's that the winrate only plummeted after move 87, which was beyond the point of no return: https://twitter.com/demishassabis/status/708934687926804482
I would love to hear from the team for sure, but I suspect that they have, by now, fed the game state as of move 78 into AlphaGo, had it look exhaustively at possibilities (way more than in the few minutes it took during the match), and determined that in hindsight move 79 was indeed not an optimal response.
Demis tweeted that while the game was still ongoing.
Anyway, I'm pretty sure what happened; I have actually implemented MCTS myself and know that you can trivially update evaluations of old nodes in MCTS as the game continues and you get a better estimate of the line of play actually taken, although you wouldn't normally have a reason to do so. Basically, it can see in hindsight that a move it took was bad; but without additional calculation you wouldn't know whether the alternatives were any better, or whether they too were worse than estimated at the time.
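For the curious, the bookkeeping is as simple as this (my own MCTS-style node, obviously not DeepMind's code):

    class Node:
        def __init__(self):
            self.visits = 0
            self.value_sum = 0.0      # sum of simulation results through here

        def update(self, result):     # can keep firing after the move is played
            self.visits += 1
            self.value_sum += result

        def mean_value(self):
            return self.value_sum / self.visits if self.visits else 0.0

Keep calling update() on an ancestor node as later simulations pass through it, and its mean drifts toward the better estimate; that's the "hindsight" effect.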
A negativist, paranoid skeptic could say that it would be a good move for the team to intentionally make AlphaGo lose a single battle at the moment it has already won the war...
I believe that after winning 3 out of 5, the AlphaGo team started experimenting with variables, now that they can relax. Which will in turn be even more helpful for future AlphaGo versions than the previous 3 wins.
They've declared AlphaGo's code and network frozen for the purposes of this competition, in part to avoid the PR issues that Deep Blue got for doing exactly that.
I don't want to sound all Conspiracy Theory, but somehow this feels planned. It plays into DeepMind's hand not to have the machine completely trounce the human: it's less scary and keeps people engaged further into the future.
Also seems in line with the way Demis was "rooting" for the human this time - they already won, so now they focus on PR.
It doesn't seem that plausible if you've been around serious game players. The notion of throwing a game would be seen as extremely rude and condescending to one's opponent.
Once you step on to the field, you play your best. It doesn't matter whether the opponent is a master, or a six-year-old.
EDIT> I have a friend who I've never beaten at an RTS or non-random strategy game, even though we've played hundreds of times. If I thought for a moment that he'd let me win, I'd stop playing, and I hate losing.