I tried to understand the gist of the paper, and I'm not really convinced there's anything meaningful here. It just looks like a variation on the transformer architecture, inspired by biology, with no real innovation or demonstrated results.
> BDH is designed for interpretability. Activation vectors of BDH are sparse and positive.
This looks like the main tradeoff of the idea. Sparse and positive activations make me think the architecture has lower capacity than standard transformers. Having an architecture that's easier to interpret is a good thing, but it seems to come at a significant cost to performance and capacity, since transformers use superposition to represent features in activations that span a much larger space. I also suspect sparse autoencoders already make transformers just as interpretable as BDH.
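To make that concrete, here's a toy sketch (my own illustration, not code from the paper) of how much representational room a sparse, positive vector gives up compared to a dense, signed one:

```python
import torch

# Toy illustration (not from the BDH paper): a dense, signed activation
# vector can use every dimension and both signs, which is what lets
# transformers pack features in superposition. Forcing sparsity and
# positivity restricts it to a few non-negative entries.
d = 16
torch.manual_seed(0)
dense = torch.randn(d)                    # dense, signed: full use of the space

positive = torch.relu(dense)              # positive-only: negatives zeroed out
vals, idx = positive.topk(4)              # keep the 4 largest -> sparse and positive
sparse_pos = torch.zeros(d).scatter_(0, idx, vals)

print(int((dense != 0).sum()))            # 16 usable entries
print(int((sparse_pos != 0).sum()))       # 4 usable entries
```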
Attention mechanisms are wonderfully interpretable as is: you can literally see which tokens each token is attending to. People just don't bother to look much these days, so interpretability alone isn't a strong selling point.
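For example (a toy single head with made-up embeddings, not any particular model), the attention pattern is just a tokens-by-tokens matrix you can print and read directly:

```python
import torch
import torch.nn.functional as F

# Minimal single-head attention sketch with hypothetical toy values:
# row i of `attn` shows how much token i attends to every other token.
tokens = ["the", "cat", "sat"]
d = 8
torch.manual_seed(0)
x = torch.randn(len(tokens), d)           # stand-in token embeddings
Wq, Wk = torch.randn(d, d), torch.randn(d, d)

scores = (x @ Wq) @ (x @ Wk).T / d**0.5   # scaled dot-product scores
attn = F.softmax(scores, dim=-1)          # normalize each row

for i, tok in enumerate(tokens):
    row = ", ".join(f"{t}={attn[i, j]:.2f}" for j, t in enumerate(tokens))
    print(f"{tok:>4} -> {row}")
```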
That last line isn't true. To be brain-like, a model only needs to imitate one thing in the brain. That thing is usually tested in isolation against observed results in human brains. Then people combine multiple brain-inspired components in various ways.
That's standard in computational neuroscience. Our standard should simply be whether they are imitating an actual structure or technique in the brain. Papers usually say which one. If they don't, it's probably a nonsense comparison made to get more views or funding.
I am genuinely baffled by this reply. Every single sentence you've typed is complete and utter nonsense. I'm going to bookmark this as a great example of the Dunning-Kruger effect in the wild.
Just to illustrate the absurdity of your point: I could claim, using your standard, that a fresh pile of cow dung is brain-like because it imitates the warmth and moistness of a brain.
I'll ignore the insults and rhetoric to give some examples.
Brain-inspired papers have built realistic models of specific neurons, spiking, Hebbian learning, learning rates tied to neuron measurements, matched firing patterns, temporal synchronization, hippocampus-like memory, and prediction-based synthesis for self-training.
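To pick one item off that list, Hebbian learning is simple enough to show in a few lines (my own toy sketch, not from any specific paper):

```python
import numpy as np

# Classic Hebbian update: synapses strengthen in proportion to
# correlated pre- and postsynaptic activity ("fire together, wire together").
rng = np.random.default_rng(0)
pre = rng.random(5)               # presynaptic firing rates
post = rng.random(3)              # postsynaptic firing rates
W = np.zeros((3, 5))              # synaptic weights, post x pre

eta = 0.1                         # learning rate
W += eta * np.outer(post, pre)    # delta_W = eta * post * pre^T

print(W.round(3))
```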
Brain-like or brain-inspired appears to mean using techniques similar to the brain's. Researchers study the brain, develop models that match its machinery, implement them, and compare the observed outputs of both. That work, which is computational neuroscience, deserves to be called brain-like, since it duplicates hypothesized brain techniques with brain-like results.
Others take the principles or behavior of the above, figure out practical designs, and implement them. These have some attributes of the brain-like models, or similar behavior, but don't duplicate them. They could be called brain-inspired, but we need to be careful: folks could game the label with things that have nothing to do with brain-like models, or that have drifted very far from them.
I prefer to be specific about what is brain-like or brain-inspired. Otherwise, just name the technique (e.g., spiking NNs) so we can focus on what's actually being done.
Be specific and provide examples. Much of what brains do that we call intelligent is totally unknown to us. We have, quite literally, no idea what algorithms the brain employs. If you want to talk about intelligence, I don't know why you're talking about neuron spiking. We don't talk about semiconductor voltages when we talk about the computer programs we're working on.
AI systems are software, so if you want to build something brain-like, you need to understand what the brain is actually like. And we don't.
You can, of course, use the almost-equivalent scientific-sounding Greek-derived term ("neuromorphic"), buy popcorn, and come back with it for a discussion about memristors.
This is like the beginning, or the end, of the crypto bubble.
Publish a whitepaper for the next model architecture and hope that uninformed people with money blow up your company's... I mean, blow up the economy... I mean, ahh, whatever, you know.
The ATH was in July.
But yeah, I am not talking about BTC.
I am talking about the sheer number of rug pulls done by people forking Ethereum and claiming it as the new revolution in decentralized computing.
Prices have risen by orders of magnitude, untethered to any measurable fundamentals, then crashed, multiple times. I'm not sure what other definition of bubble you're operating with...