
The key part from the paper:

> To assess the merits of self-play reinforcement learning, compared to learning from human data, we trained a second neural network (using the same architecture) to predict expert moves in the KGS Server dataset; this achieved state-of-the-art prediction accuracy compared to previous work (refs. 12, 30–33) (see Extended Data Tables 1 and 2 for current and previous results, respectively). Supervised learning achieved a better initial performance, and was better at predicting human professional moves (Fig. 3). Notably, although supervised learning achieved higher move prediction accuracy, the self-learned player performed much better overall, defeating the human-trained player within the first 24 h of training. This suggests that AlphaGo Zero may be learning a strategy that is qualitatively different to human play.



That is really interesting. Given a neural network that exists solely to play Go, one influenced by the human mind is more limited than the exact same network without that influence.

EDIT: changed "a set of neurons" to "neural network" per andbbergers' comments


Please don't refer to it as 'a set of neurons' - it only serves to fuel the (IMO) absolutely ridiculous AI winter fearmongering, and is also just a bad description. Neural nets are linear algebra black boxes; the connections to biology are tenuous at best.

Sorry to be that guy, but the AI hype is getting out of hand. COSYNE this year was packed with papers comparing deep learning to the brain... it drives me nutty. Convnets can reasonably be put into analogy with the visual system... because they were inspired by it. But that's about it.

To address your actual comment: I would argue that this is not really interesting or surprising (at least to the ML practitioner); it is very well known that neural nets are incredibly sensitive to initialization. Think of it like this: as training progresses, the parameters of a neural net move along manifolds in parameter space, but they can get nudged off the "right" manifold and will never be able to recover.
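To make the initialization point concrete, here's a toy illustration (my own, in Python, not anything from the paper): the same plain gradient-descent update, started from two different points on a simple non-convex loss, settles into two different minima with different final losses.

    # Toy non-convex loss with two local minima; which one gradient descent
    # finds depends entirely on where it starts.
    def loss(x):
        return x**4 - 3 * x**2 + x

    def grad(x):
        return 4 * x**3 - 6 * x + 1

    def descend(x, lr=0.01, steps=2000):
        for _ in range(steps):
            x -= lr * grad(x)
        return x

    for x0 in (-2.0, 2.0):  # two different initializations
        x_final = descend(x0)
        print(f"init {x0:+.1f} -> x = {x_final:+.3f}, loss = {loss(x_final):+.3f}")

Starting at -2.0 ends up near x = -1.30 (the deeper minimum), while starting at +2.0 gets stuck near x = +1.14; a real network has millions of such choices made at initialization.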

Sorry for the rant, the AI hype is just getting really out of hand recently.

Machine learning is specifically not magic. Black boxes are not useful. Convnets work so well because they build the symmetries of natural scenes directly into the model: natural scenes are translation invariant (as well as having a couple of other symmetries), so anything that models them sure as hell better have those symmetries too, or you're just crippling your model with extra superfluous parameters.
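As a rough illustration of the parameter-count point (my own sketch, using PyTorch; the sizes are arbitrary): a convolution shares one small filter across every spatial position, while a fully connected layer on the same input learns a separate weight for every input/output pair.

    import torch.nn as nn

    in_ch, out_ch, h, w = 3, 16, 32, 32

    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # weight sharing
    dense = nn.Linear(in_ch * h * w, out_ch * h * w)           # no weight sharing

    def n_params(layer):
        return sum(p.numel() for p in layer.parameters())

    print("conv params: ", n_params(conv))   # 448
    print("dense params:", n_params(dense))  # ~50 million

Both layers map a 3x32x32 input to 16 feature maps of the same size, but the one without the translation symmetry built in pays for it with roughly 100,000x more parameters.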


I changed my comment to "neural network," since "a set of neurons" is somewhat wrong, but I don't really agree that there isn't much of a connection between this and biology. There might not be much of a connection between how these networks currently work and how our brains work, but the whole point of machine learning and neural networks is to improve computers' performance on the things we are good at. And while they were originally loosely modeled on the brain and may be quite different now, that doesn't mean people can't compare them to the brain. It would be wrong to say they are exactly like the brain, but I don't think there is anything wrong with comparing and contrasting the two. If our goal is to improve performance and we are the benchmark, then why shouldn't we compare them?

What I found interesting was mainly that it was us who nudged the parameters you talked about onto the "wrong" manifold, especially given how old and complicated Go is. The sheer amount of human brainpower that has gone into getting good at the game wasn't able to find certain aspects of it, and in 60 hours of training a neural network was able to.


I'm not saying there is nothing of value to be obtained by investigating connections between ML and the brain. That's how I got into ML in the first place, doing theoretical neuro research.

We absolutely should and do look to the brain for inspiration.

I'm taking issue with the rather ham-fisted series of papers that have come out in recent years aggressively pushing the agenda of connections between ML and neuro that just aren't there.

Are you sure that humans have done more net compute on Go than DeepMind just did? The Go game tree is _enormous_, and humans are biased. We don't invent strategies from scratch; we use heuristics handed down to us from the pros (who in turn were handed the heuristics by their mentors).

To me, it's not so interesting or surprising that the human initialized net performed worse. We just built the same biases and heuristics we have into that net.


As far as we know, the brain is just a "linear algebra blackbox". It's an uninteresting reduction, since linear algebra can describe almost everything. Yes, NNs aren't magic, but neither is the brain. Likely they use similar principles. Hinton has a theory about how real neurons might be implementing a variation of backpropagation, and there are a number of other theories.


>As far as we know the brain is just a "linear algebra blackbox"...Likely they use similar principles.

I'm not an expert, but my impression is that this is not really a reasonable claim, unless you're only considering very small function-like subsystems of the brain (e.g. visual cortex). Neural nets (of the nonrecurrent sort) are strict feed-forward function approximators, whereas the brain appears to be a big mess of feedback loops that is capable of (sloppily, and with much grumbling) modeling any algorithm you could want, and, importantly, adding small recursions/loops to the architecture as needed rather than a) unrolling them all into nonrecursive operations (like a feedforward net) or b) building them all into one central singly-nested loop (like an RNN).

The brain definitely seems to be using something backprop-like (in that it identifies pathways responsible for negative outcomes and penalizes them). But brains also seem to make efficiency improvements really aggressively (see: muscle memory, chunking, and other markers of proficiency), even in the absence of any external reward signal, which seems like something we don't really have a good analogue for in ANNs.


There are some parts of the brain we have no clue about: episodic memory, for instance, or our higher-level ability to reason. But most of the brain is doing low-level pattern matching, much like what NNs do.

The constraints you mention aren't deal breakers. We can make RNNs without maintaining a global state or fully unrolling the loop; see synthetic gradients, for instance. NNs can do unsupervised learning as well, through things like autoencoders.
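For the unsupervised-learning point, here's a minimal autoencoder sketch (mine, in PyTorch; the data is just random noise standing in for flattened images): the network gets no labels at all, only the objective of reconstructing its own input through a narrow bottleneck, which forces it to learn a compressed representation.

    import torch
    import torch.nn as nn

    autoencoder = nn.Sequential(
        nn.Linear(784, 32), nn.ReLU(),  # encoder: 784-dim input -> 32-dim code
        nn.Linear(32, 784),             # decoder: code -> reconstruction
    )
    opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

    x = torch.rand(64, 784)             # stand-in for a batch of flattened images
    for _ in range(100):
        recon = autoencoder(x)
        loss = nn.functional.mse_loss(recon, x)  # reconstruction error, no labels
        opt.zero_grad()
        loss.backward()
        opt.step()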


A pattern matcher can learn high-level reasoning. Reasoning is just a Boolean circuit.


> It's an uninteresting reduction since linear algebra can describe almost everything.

The question is whether it can do so efficiently. As far as I know, alternating applications of affine transforms and non-linearities are not so useful for some computations that are known to occur in the brain, such as routing, spatio-temporal clustering, frequency filtering, high-dimensional temporal states per neuron, etc.


Hinton changes his opinion about what the brain is doing every 5 years... Hinton is not a neuroscientist...


If he changes his opinion, which I understand to mean his models of the brain in this case, and each iteration improves the model, then that is perfectly fine. It would be bad if someone did not change their view in the face of inconsistent evidence.


For political opinions, sure, but if he's changing his opinions so often ...

When you're a big scientific figure, I think that you have some extra responsibility to the public to only say things you're very confident about. Or otherwise very clearly communicate your uncertainty!!


> Hinton is not a neuroscientist...

It's not like neuroscientists know that either.


Agreed. If we announce that A* search is superhuman at finding the best routes, most technorati wouldn't bat an eye. Technically it is probably accurate to say that the results here show that neural networks can find good heuristics for MCTS search through unsupervised training in the game of Go. According to the DeepMind authors:

"These search probabilities usually select much stronger moves than the raw move probabilities of the neural network; MCTS may therefore be viewed as a powerful policy improvement operator. Self-play with search – using the improved MCTS-based policy to select each move, then using the game winner as a sample of the value – may be viewed as a powerful policy evaluation operator. The main idea of our reinforcement learning algorithm is to use these search operators repeatedly in a policy iteration procedure ..."

The fact that this reinforcement training is unsupervised from the very beginning is quite exciting and may lead to better heuristics for other kinds of combinatorial optimization problems.
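For readers unfamiliar with the policy evaluation / policy improvement cycle the quote refers to, here is a self-contained toy version (my own illustration in Python, emphatically not AlphaGo Zero's algorithm): classic policy iteration on a five-state corridor where the agent earns +1 for reaching the rightmost state. In AlphaGo Zero the improvement operator is MCTS and the evaluation signal comes from self-play game outcomes, but the alternating structure is the same.

    N_STATES, GAMMA = 5, 0.9
    ACTIONS = (-1, +1)  # step left / step right

    def step(s, a):
        """Deterministic transition; reaching state N_STATES-1 pays +1."""
        s2 = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s2 == N_STATES - 1 and s != N_STATES - 1 else 0.0
        return s2, reward

    def evaluate(policy, sweeps=200):
        """Policy evaluation: iterative Bellman backups for a fixed policy."""
        V = [0.0] * N_STATES
        for _ in range(sweeps):
            for s in range(N_STATES - 1):  # terminal state keeps value 0
                s2, r = step(s, policy[s])
                V[s] = r + GAMMA * V[s2]
        return V

    def improve(V):
        """Policy improvement: act greedily with respect to the current values."""
        policy = [ACTIONS[0]] * N_STATES
        for s in range(N_STATES - 1):
            policy[s] = max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
        return policy

    policy = [-1] * N_STATES  # start with "always go left"
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:  # policy stable -> optimal
            break
        policy = new_policy

    print("optimal policy:", policy)  # +1 (go right) in every non-terminal state
    print("state values:  ", [round(v, 3) for v in V])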


Please don't refer to them as black boxes. The internals are fully observable.


Fully observable and we still have no idea what the hell it's doing.

Makes neuroscience seem kinda bleak, doesn't it?

There has been a lot of great work lately building up a theory of how these things work, but it is very much still in the early stage. Jascha Sohl-Dickstein in particular has been doing some great work on this.

We don't even have answers to the most basic questions.

For instance (pedagogically), how the hell is it possible to train these things at all? They have ridiculously non-convex loss landscapes and we optimize in the dumbest conceivable way, first-order stochastic gradient descent. This should not work. But it does, all too often.

Not a great example, because there are easy hand-wavy arguments as to why it should work, but as far as proofs go...

The hand-wavy argument goes as follows:

- We're in something like a 10,000-dimensional space; for the stationary point we're at to be a true local minimum, every one of those 10,000 dimensions has to go uphill in both directions. It's overwhelmingly likely that there's at least one way out.

- There are many, many different ways to set the params of the net that give the same function; permutation of the hidden units is a simple example.
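To put a number on that first bullet (my own back-of-the-envelope sketch, not from any paper): if each dimension at a random stationary point is independently equally likely to curve up or down, the chance that all d of them curve up, i.e. that the point is a true local minimum, is 2^-d.

    # P(random stationary point is a true local minimum) under the crude
    # "each direction independently up or down with probability 1/2" model.
    for d in (10, 100, 10_000):
        p_local_min = 0.5 ** d  # underflows to exactly 0.0 for d = 10,000
        print(f"d = {d:>6}: P(all directions uphill) = {p_local_min:.3e}")

At realistic dimensionalities essentially every stationary point in this crude model is a saddle with an escape direction, which is the intuition behind the hand-waving above.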

We really have no idea how these things work.

Anyone who tells you otherwise is lying to you...


> Fully observable and we still have no idea what the hell it's doing.

Of course we do. It's matching a smooth multi-dimensional curve to sample data.


It's a conceptual black box. There's no way for us to understand what each individual neuron is doing.


The tools we have developed so far are limited, but that's different from "there's no way". Many academics are working hard right now to better understand deep neural networks.



Is there meaningful information in what one observes?


Yes, it turns out you can find meaningful information; etiam provided this: https://arxiv.org/pdf/1312.6034.pdf . The main issue is making sure that what you are looking for is actually what the network is doing: you have to correctly interpret and visualize a jumble of numbers, which usually requires a hypothesis about how the network worked in the first place. But assuming both go well, you can gain meaningful information.
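For anyone curious what that looks like in practice, here's a rough sketch of the gradient-based saliency idea from the linked paper (written with PyTorch; the tiny model is just a stand-in, not a trained network): back-propagate the top class score to the input pixels, and the magnitude of each pixel's gradient indicates how much it influenced the decision.

    import torch
    import torch.nn as nn

    model = nn.Sequential(  # stand-in classifier
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.Flatten(), nn.Linear(8 * 32 * 32, 10),
    )
    model.eval()

    image = torch.rand(1, 3, 32, 32, requires_grad=True)  # stand-in input image
    scores = model(image)
    scores[0, scores.argmax()].backward()  # gradient of the top class score

    saliency = image.grad.abs().max(dim=1).values  # per-pixel importance map
    print(saliency.shape)  # torch.Size([1, 32, 32])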


Can I train an NN to visualize the numbers?



