What's missing is a part with more plasticity that can work in parallel and bi-directionally interact with the current static models in real-time.
This would mean models trained individually on their own experience, so that knowledge is captured not as context but as weight adjustments.
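To make that a bit more concrete, here's a toy sketch of the kind of thing I'm imagining (purely my own illustration, not an existing system): a frozen "static" base network plus a small plastic adapter whose weights get nudged after every interaction. The names (`PlasticAdapter`, `online_step`) and dimensions are all made up.

```python
import torch
import torch.nn as nn

class PlasticAdapter(nn.Module):
    """Small trainable module running alongside a frozen base model."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.delta = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, hidden_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual "correction" applied on top of the frozen features.
        return h + self.delta(h)

# The static part: a stand-in for the big pre-trained model, weights never change.
base = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
for p in base.parameters():
    p.requires_grad_(False)

adapter = PlasticAdapter(hidden_dim=128)
opt = torch.optim.SGD(adapter.parameters(), lr=1e-3)

def online_step(x: torch.Tensor, target: torch.Tensor) -> float:
    """One real-time interaction: the experience becomes a weight adjustment."""
    h = base(x)                      # frozen features from the static model
    y = adapter(h)                   # plastic correction
    loss = nn.functional.mse_loss(y, target)
    opt.zero_grad()
    loss.backward()
    opt.step()                       # knowledge lands in the adapter's weights
    return loss.item()
```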
Disclaimer: These are my not-terribly-informed layperson's thoughts :^)
The attention mechanism does seem to give us a certain adaptability (especially in the context of research showing chain-of-thought "hidden reasoning"), but I'm not sure that it's enough.
Thing is, earlier language models used recurrent units that could store intermediate data, which would give more of a foothold for this kind of on-the-fly adjustment. And here is where the theory hits the brick wall of engineering. Transformers are not just a pure machine learning innovation; the key is that they are massively scalable, and my understanding is that part of this comes from the _lack_ of recurrence.
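A toy PyTorch comparison of what I mean (again, just my own illustration): the recurrent model has to walk the sequence step by step because each step needs the previous hidden state, while self-attention processes the whole sequence in one parallel pass with no carried state.

```python
import torch
import torch.nn as nn

seq = torch.randn(1, 512, 64)              # (batch, time, features)

# Recurrent: each step depends on the previous hidden state,
# so the time dimension has to be processed sequentially.
rnn = nn.RNN(input_size=64, hidden_size=64, batch_first=True)
h = torch.zeros(1, 1, 64)
outputs = []
for t in range(seq.size(1)):               # inherently serial loop
    out, h = rnn(seq[:, t:t + 1, :], h)    # hidden state = the stored intermediate data
    outputs.append(out)

# Attention: every position attends to every other in one shot,
# which is why the whole sequence can be computed in parallel on a GPU.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
parallel_out, _ = attn(seq, seq, seq)      # no carried state, no serial dependency
```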
I guess this is where the interest in foundation models comes from. If you could take a codebase as a whole and turn it into effective training data to adjust the weights of an existing, more broadly trained model, you'd get something like that plasticity. But is this possible with a single codebase's worth of data?
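In the crudest form I'm picturing something like the sketch below, assuming the Hugging Face transformers library; "gpt2" and the `my_codebase` path are just placeholders, and a real attempt would need far more care (smaller learning rate, adapter layers, evaluation, etc.) to avoid wrecking the base model with so little data.

```python
from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Treat every source file in the codebase as raw training text.
files = list(Path("my_codebase").rglob("*.py"))        # hypothetical path
model.train()
for path in files:
    text = path.read_text(errors="ignore")
    enc = tok(text, return_tensors="pt", truncation=True, max_length=512)
    out = model(**enc, labels=enc["input_ids"])        # standard causal-LM loss
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```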
Here again we see the power of human intelligence at work: the ability to quite consciously develop new mental models even given very little data. I imagine this is made possible by leaning on very general internal world-models that let us predict the outcomes of even quite complex unseen ("out-of-distribution") situations, and that gives us extra data. It's what we experience as the frustrations and difficulties of the learning process.
Amen! I attend this same church.
My favorite professor in engineering school always gave open book tests.
In the real world of work, everyone has full access to all the available data and information.
Very few jobs involve paying someone simply to look up data in a book or on the internet. What they will pay for is someone who can analyze, understand, reason and apply data and information in unique ways needed to solve problems.
Doing this is called "engineering". And this is what this professor taught.
(I do mean memorisation fairly broadly; it doesn't have to mean reciting a meaningless list of items.)