It has been clear for a long time (e.g., from Marvin Minsky's early research) that:
1. both ANNs and the brain need to solve the credit assignment problem
2. backprop works well for ANNs but probably isn't how the problem is solved in the brain
This paper is really interesting, but it is more of a novel theory about how the brain solves the credit assignment problem. The HN title makes it sound like differences between the brain and ANNs were previously unknown and is misleading IMO.
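To make "credit assignment" concrete, here is a minimal numpy sketch (my own toy example, not from the paper) of how backprop solves it in an ANN: every weight receives an explicit gradient quantifying its share of the blame for the output error.

```python
import numpy as np

# Toy 2-layer network: x -> h = tanh(W1 x) -> y = W2 h
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)) * 0.5
W2 = rng.normal(size=(2, 4)) * 0.5

x = rng.normal(size=3)
target = np.array([1.0, -1.0])

# Forward pass
h = np.tanh(W1 @ x)
y = W2 @ h
err = y - target                      # output error

# Backward pass: assign "credit" (a gradient) to every weight
dW2 = np.outer(err, h)                # blame for each entry of W2
dh = W2.T @ err                       # error carried backwards through W2
dW1 = np.outer(dh * (1 - h ** 2), x)  # chain rule through the tanh

# Gradient-descent update
lr = 0.1
W2 -= lr * dW2
W1 -= lr * dW1
```

The backward pass is exactly the part with no obvious biological counterpart: it reuses the forward weights, transposed, to route the error back.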
> The HN title makes it sound like differences between the brain and ANNs were previously unknown and is misleading IMO.
Agreed on both counts. There's nothing surprising in "there are differences between the brain and ANNs."
But there might be something useful in the "novel theory about how the brain solves the credit assignment problem" presented in the paper. At least for me, it caught my attention enough to justify giving it a full reading sometime soon.
Are there any results about the "optimality" of backpropagation? Can one show that it emerges naturally from some Bayesian optimality criterion or a dynamic programming principle? This is a significant advantage that the "free energy principle" people have.
For example, let's say instead of gradient descent you want to do a Newton descent. Then maybe there's a better way to compute the needed weight updates besides backprop?
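To make that contrast concrete, a toy comparison (my own sketch, nothing from the paper): gradient descent only needs the first-order gradient that backprop supplies, while a Newton update also needs curvature, which is what makes second-order methods expensive for large networks.

```python
import numpy as np

# Toy quadratic objective: f(w) = 0.5 * w^T A w - b^T w
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 2.0])

def grad(w):
    return A @ w - b          # first-order information (what backprop gives you)

def hessian(w):
    return A                  # curvature; constant for this toy objective

w_gd = np.zeros(2)
w_newton = np.zeros(2)

# Gradient descent: w <- w - lr * g
for _ in range(100):
    w_gd = w_gd - 0.1 * grad(w_gd)

# Newton's method: w <- w - H^{-1} g
for _ in range(5):
    w_newton = w_newton - np.linalg.solve(hessian(w_newton), grad(w_newton))

print(w_gd, w_newton, np.linalg.solve(A, b))  # both approach the minimizer A^{-1} b
```

As far as I know, second-order methods for neural networks are usually still built on top of backprop (e.g. Hessian-vector products via extra reverse-mode passes) rather than replacing it.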
I'd be willing to be proven wrong, but as a starting point I'd suggest it obviously isn't optimal for what it is being used for. AI performance on tasks seems quite poor relative to the time spent training. For example, when AIs overtake humans at Baduk it is normal for the AI to have played several orders of magnitude more games than elite human players.
The important thing is that backprop does work, so we're just scaling it up to absurd levels to get good results. There is going to be a big step change found sooner or later where training gets a lot better. Maybe there is some sort of threshold we're looking for, where a trick only works for models with lots of parameters, before we stumble on it; but if evolution can do it, so will researchers.
> For example, let's say instead of gradient descent you want to do a Newton descent. Then maybe there's a better way to compute the needed weight updates besides backprop?
IIRC, feedback alignment [1] approximates Gauss-Newton minimization.
So there is an easier way that is potentially more biologically plausible, though not necessarily a better way.
[1] https://www.nature.com/articles/ncomms13276#Sec20
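For anyone curious what [1] looks like in practice, a minimal toy version (my own sketch, not the paper's experiments): the backward pass sends the error through a fixed random matrix B instead of the transposed forward weights, so no weight transport is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer net: x -> h = tanh(W1 x) -> y = W2 h
W1 = rng.normal(size=(20, 10)) * 0.1
W2 = rng.normal(size=(5, 20)) * 0.1
B  = rng.normal(size=(20, 5)) * 0.1   # fixed random feedback weights (never trained)

def step(x, target, lr=0.01):
    """One training step using feedback alignment instead of backprop."""
    global W1, W2
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - target
    # Backprop would use W2.T here; feedback alignment uses the fixed random B.
    dh = (B @ e) * (1 - h ** 2)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
    return 0.5 * float(e @ e)

# Toy regression task: learn a random linear map M
M = rng.normal(size=(5, 10))
for i in range(5000):
    x = rng.normal(size=10)
    loss = step(x, M @ x)
print("final sample loss:", loss)
```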
"differs fundamentally", being in the tense that it is, with the widely known context that AI is "modeled after the brain", definitely does suggest that oh no, they got the brain wrong when that modelling happened, therefore AI is fundamentally built wrong. Or at least I can definitely see this angle in it.
The angle I actually see in it though is the typical pitiful appeal to the idea that the brain is this incredible thing we should never hope to unravel, that AI bad, and that everyone working on AI is an idiot as per the link (and then the link painting a leaps and bounds more nuanced picture).
The title does express that, due to context. An article in Nature with the title "X is Y" suggests that, until now, we didn't know that X is Y, or we even thought that X is definitely not Y.
The current HN title ("Brain learning differs fundamentally from artificial intelligence systems") seems very heavily editorialized.
Making the 'fundamental difference' the focus seems like laying the foundation for a claim that AI lacks some ability because of the difference. The difference does mean you cannot infer abilities present in one by detecting them in the other. This is similar to, and about as profound as, saying that you cannot say that rocks can move fast because of their lack of legs. Which is true, but says nothing about the ability of rocks to move fast by other means.
Not my area of expertise, but this paper may be important because it is more closely aligned with the “enactive” paradigm of understanding brain-body-behavior and learning than with a backpropagation-only paradigm.
(I like enactive models of perception such as those advocated by Alva Noë, Humberto Maturana, Francisco Varela, and others. They get us well beyond the straitjacket of Cartesian dualism.)
Rather than have error signals tweak synaptic weights after a behavior, a cognitive system generates a set of actions it predicts will accommodate its needs. This can apparently be accomplished without requiring short-term synaptic plasticity. Then, if all is good, weights are modified in a secondary phase that is more about asserting the utility of the “test” response. More selection than descent. The emphasis is more on feedforward modulation and selection. Clearly there must be error signal feedback, so some of you may argue that the distinction will be blurry at some levels. Agreed.
Look forward to reading more carefully to see how far off-base I am.
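Not the paper's algorithm, just a crude numpy toy of the "selection then consolidation" picture described above, with a made-up utility function standing in for the system's prediction of whether a candidate response meets its needs.

```python
import numpy as np

rng = np.random.default_rng(1)

M = np.array([[1.0, -0.5, 0.2],
              [0.3,  0.8, -1.0]])          # what the "needs" actually require per state
W = np.zeros((2, 3))                       # policy weights: state -> response

def predicted_utility(action, state):
    # Stand-in for the system's prediction of whether the action meets the need.
    return -np.sum((action - M @ state) ** 2)

for _ in range(500):
    state = rng.normal(size=3)
    base = W @ state
    # Phase 1: propose candidate responses and *select*, without touching the weights.
    candidates = base + 0.3 * rng.normal(size=(8, 2))
    best = max(candidates, key=lambda c: predicted_utility(c, state))
    # Phase 2: consolidate -- nudge the weights toward reproducing the selected response.
    W += 0.05 * np.outer(best - base, state)

print(np.round(W, 2))                      # weights drift toward M over many trials
```

The only point of the toy is the ordering: the response is chosen by selection first, and the weights are consolidated toward it afterwards.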
Theories that brains predict the pattern of expected neural activity aren't new (e.g., this paper cites work on the Free Energy Principle, but not the Embodied Predictive Interoception Coding work). I have zero neuroscience training so I doubt I'd be able to reliably answer my question just by reading this paper, but does anyone know how specifically their Prospective Configuration model differs from, or expands upon, the previous work? Is it a better model of how brains actually handle credit assignment than the aforementioned models?
The FEP is more about what objective function the brain (really the isocortex) ought to optimize. EPIC is a somewhat related hypothesis about how viscerosensory data is translated into percepts.
Prospective Configuration is an actual algorithm that, to my understanding, attempts to reproduce input patterns but can also engage in supervised learning.
I'm less clear on Prospective Configuration than the other two, which I've worked with directly.
> In prospective configuration, before synaptic weights are modified, neural activity changes across the network so that output neurons better predict the target output; only then are the synaptic weights (hereafter termed ‘weights’) modified to consolidate this change in neural activity. By contrast, in backpropagation, the order is reversed; weight modification takes the lead, and the change in neural activity is the result that follows.
What would neural activity changes look like in an ML model?
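One common way to picture it is an energy-based, predictive-coding-style scheme (a generic sketch under that assumption, not necessarily the paper's exact equations): hidden activities are treated as state variables that are first relaxed toward values consistent with both the input and the clamped target, and only the settled activities are used to update the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-layer net: x -> h -> y, trained in two phases per example:
# (1) relax the hidden *activity* h so the clamped target is better predicted,
# (2) only then update the weights toward the settled activity.
W1 = rng.normal(size=(8, 4)) * 0.3   # predicts hidden activity from the input
W2 = rng.normal(size=(2, 8)) * 0.3   # predicts the output from hidden activity

def train_step(x, target, n_infer=50, dt=0.1, lr=0.02):
    global W1, W2
    h = np.tanh(W1 @ x)                  # start from the feedforward activity
    y = target                           # output units clamped to the target

    # Phase 1: inference -- activities change, weights do not.
    for _ in range(n_infer):
        e1 = h - np.tanh(W1 @ x)         # local error at the hidden layer
        e2 = y - W2 @ h                  # local error at the clamped output
        h = h + dt * (-e1 + W2.T @ e2)   # move h to reduce both local errors

    # Phase 2: learning -- consolidate the settled activities into the weights.
    e1 = h - np.tanh(W1 @ x)
    e2 = y - W2 @ h
    W2 += lr * np.outer(e2, h)
    W1 += lr * np.outer(e1 * (1 - np.tanh(W1 @ x) ** 2), x)

x = rng.normal(size=4)
target = np.array([0.5, -0.5])
for _ in range(300):
    train_step(x, target)
y_ff = W2 @ np.tanh(W1 @ x)              # plain feedforward prediction after training
print("feedforward output vs target:", y_ff, target)
```

The two phases mirror the quoted passage: the activity change comes first, and the weight change consolidates it.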
The post headline is distracting people and producing a poor discussion. The paper describes a learning mechanism that has advantages over backprop, and may be closer to what we see in brains.
The contribution of the paper, and its actual title, is about the proposed mechanism.
All the comments amounting to ‘no shit, Sherlock’ are about the mangled headline, not the paper.
Oh hey, I know one of the authors on this paper. I've been meaning to ask him at NeurIPS how this prospective configuration algorithm works for latent variable models.
Dang it, how did I miss that. Uugh. :-(