If you don't know Gary Marcus: he has been repeating this same argument for many years, and has downplayed the capabilities of LLMs from the beginning. He has never really added much useful (constructive) content to the discussion or to the research, though, beyond advocating symbolic AI methods. So he is not really taken seriously by much of the community, but rather seen as annoying on this topic.
Although, of course, he is not alone with his argument. Even Yann LeCun repeats a similar argument about LLMs, as do many other serious researchers: that we probably need a bit more than just LLMs + our current training method + scaling up. For example, some model extension that handles long-term memory better (there is lots of work on this already), or other model architectures (e.g. Yann LeCun proposed JEPA), or online learning (unclear how this should work with LLMs), or different training criteria, etc. In any case, multi-modality (not just text) is important. Maybe embodiment (robots, interaction with the real world) as well.
(Edit: Wow, the votes on this comment go up and down. I guess it's a controversial topic.)
Right, but is any serious researcher insisting that just scaling will be enough to achieve AGI? Because while I'm a layperson on this stuff, I don't get that impression, meaning this criticism from Marcus is mostly attacking a strawman. And he's not the only one peddling it.
The understanding I have is that scalability of LLMs with data took even their developers by surprise. They kept at it because empirically, so far, it's worked. But nobody assumes it'll keep working indefinitely or that it'll lead to AGI alone. If they did, OpenAI, Google, etc. would have fired most of their researchers and would simply focus everything they have on scaling.
When you look at the history of scaling up Transformers, e.g. GPT-2, most people in the community were in fact quite surprised that just scaling up those models worked so well. And at every step you could read plenty of opinions that this was the end, and that scaling up further would not give (much) improvement. That criticism is quieter now, but still there.
Scaling self-attention has also worked much further than it was initially believed it could. The quadratic attention was always expected to be the main bottleneck, but this turned out to be wrong: when you look at where most of the compute is actually spent, the feed-forward layers are much more of a bottleneck.
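A rough back-of-envelope (my own numbers; the shapes below are just GPT-3-ish assumptions, not measurements) shows why the n^2 term is not the dominant cost at typical context lengths:

    # Per-layer forward FLOPs for a decoder block; d and n are assumptions.
    d = 12288                            # model width (GPT-3-ish)
    n = 2048                             # sequence length
    attn_proj = 8 * n * d * d            # Q, K, V and output projections
    attn_core = 4 * n * n * d            # Q K^T scores plus weighted sum over V
    ffn       = 16 * n * d * d           # two matmuls with 4x expansion
    total = attn_proj + attn_core + ffn
    for name, f in [("attention projections", attn_proj),
                    ("attention core (n^2)", attn_core),
                    ("feed-forward", ffn)]:
        print(f"{name:22s} {100 * f / total:5.1f}%")
    # -> the quadratic part is only a few percent here; it only starts to
    #    dominate once n grows to several times d.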
It's still a bit unclear how much further we can take the scaling. We are mostly at the limit of what is financially feasible today, but as long as some form of generalized Moore's law continues (e.g. number of transistors per dollar), it becomes financially feasible to scale further. At some point we also hit a limit of available data, but maybe self-play (or variants of it) might solve that.
I guess most researchers agree that just scaling up further is probably not optimal (w.r.t. reaching AGI), and may well not be enough, but it's still a somewhat open question.
There are plenty of leaders in SV, Sam Altman being one example, who are preaching that hardware is key. Just take a look at Nvidia's $2T market cap.
When we all know it isn't, and this can have the side effect of creating a bubble. I would be concerned about that, as it can have terrible long-term effects on AI research.
The word "scaling" has multiple meanings. It's not just the same hardware getting cheaper over time; there have been many architectural and algorithmic improvements:
- Better algorithms for training at lower precision
- FlashAttention and FlashAttention-2 for reducing memory bandwidth inside the GPU
- Ring Attention for reducing bandwidth across GPUs
- Algorithmic improvements in handling long context windows
- MoE (mixture of experts)
- Lots of algorithmic improvements in fine-tuning / alignment
- Groq's hardware architecture (deterministic hardware, storing all data in SRAM for inference instead of using cache hierarchies)
- Improvements in tokenization
So far, the core softmax(Q·K^T)·V operation is about the only thing that hasn't (yet) been touched.
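For reference, that untouched core fits in a few lines (a minimal numpy sketch, ignoring masking, batching, heads and numerical tricks):

    import numpy as np

    def attention(Q, K, V):
        # plain scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                        # (n, n): the quadratic part
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ V                                         # (n, d)

    n, d = 8, 16
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(attention(Q, K, V).shape)                          # (8, 16)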
If the question is whether "just" improving LLM perplexity through further algorithmic and hardware improvements will lead to AGI, then at this point many researchers believe the answer is yes (and many others that the answer is no :) ).
I lost all respect for Gary Marcus when he claimed in 2018 that autonomous driving was decades away.
Oh wait. I lost all respect for those hype-men who claimed it was just around the corner.
Before transformers, people claimed that you just needed a bigger neural network, not something new. It did not work.
Then came transformers, a new thing that again scaled up better than the previous generation. But transformers have an obvious limit: the computational complexity of the self-attention module scales quadratically with the sequence length. We are trying to get partial fixes with sparsity, linear transformers and other techniques, but it gets harder at every turn.
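The linear-transformer fix is essentially a reordering trick: swap the softmax for a kernel feature map so that (phi(Q) phi(K)^T) V can be computed as phi(Q) (phi(K)^T V), never materializing the n x n matrix. Roughly like this (a sketch of the Katharopoulos-et-al-style formulation from memory, non-causal, details not guaranteed):

    import numpy as np

    def phi(x):
        # elu(x) + 1: a simple positive feature map used in linear attention
        return np.where(x > 0, x + 1.0, np.exp(x))

    def linear_attention(Q, K, V):
        # O(n * d^2) instead of O(n^2 * d): no n x n score matrix is ever built
        Qf, Kf = phi(Q), phi(K)                  # (n, d)
        KV = Kf.T @ V                            # (d, d) summary of keys/values
        Z = Qf @ Kf.sum(axis=0)                  # (n,)  per-query normalizer
        return (Qf @ KV) / Z[:, None]

    n, d = 8, 16
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(linear_attention(Q, K, V).shape)       # (8, 16)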
We make progress, but "just scaling" is not working.
I'm not sure if I'm parsing your sentence right, but surely you can't be implying that autonomous driving isn't possible or that it is some undetermined time away?
You can come down to SF and take a Waymo today. It works great.
Your first paragraph is an ad hominem that you should be ashamed of. "Not really taken seriously by the community"? Which community is that? Last time I saw Marcus he was a co-organizer of a panel on Doug Lenat and Knowledge Representation at AAAI 2024 [1]. Does that sound like the AI community is not taking him seriously? Is anyone asking you to organise panels at AAAI?
The question is rhetorical: I can see you're a PhD student. My advice is to learn to have some respect for the person of others, as you will want them to have respect for your person when, one day, you find yourself saying something that "the community" disagrees with. And if you're not planning to, one day, find yourself in that situation, consider the possibility that you're in the wrong job.
And what is the above article saying that you think should not be taken seriously? Is it not a widely recognised fact that neural nets performance only improves with more data, more compute and more parameters? A five year old could tell you that. Is it controversial that this is a limitation?
[1] https://aaai.org/aaai-conference/aaai-24-panels/
I did not intend to attack Gary in any way. But I realize that my statement was probably too strong. Of course it's not the whole AI community. My intention with my post was just to give some perspective, some background for people who have not heard of Gary Marcus before.
Maybe I'm also in a bubble, but I was speaking mostly about the people I frequently read, i.e. lots of people from Google Brain, DeepMind, and others who frequently publish at NeurIPS, ICLR, ICML, etc. Among those people, Gary is usually not taken seriously. At least that was my impression.
But let's not make this so much about Gary: Most of these people disagree with the opinion that Gary shares, i.e. they don't really see such a big need for symbolic AI, or they see much more potential in pure neural approaches (after all, the human brain is fully neural).
> Is it not a widely recognised fact that neural nets performance only improves with more data, more compute and more parameters?
I'm not sure how you meant that to be parsed.
1) performance only improves by scaling up those factors, and can't be improved in any other way
OR
2) performance can only (can't help but) get better as you scale up
I'm guessing you meant 1), which is wrong, but just in case you meant 2), that is wrong too. Increased scaling - in the absence of other changes - will continue to improve performance until it doesn't, and no-one knows what the limit is.
As far as 1), nobody thinks that scaling up is the only way to improve the performance of these systems. Architectural advances, such as the one that created them in the first place, are the other way to improve. There have already been many architectural changes since the original transformer of 2017, and I'm sure we'll see more in the models released later this year.
You ask if it's controversial that there is a limit to how much training data is available, or how much compute can be used to train them. For training data, the informed consensus appears to be that this will not be a limiting factor; in the words of Anthropic's CEO Dario Amodei "It'd be nice [from safety perspective] if it [data availability] was [a limit], but it won't be". Synthetic data is all the rage, and these companies can generate as much as they need. There's also masses of human-generated audio/video data that has hardly been touched.
Sure, compute would eventually become a limiting factor if scaling were the only way these models were being improved (which it isn't), but there is still plenty of headroom at the moment. As long as each generation of models makes meaningful advances towards AGI, I expect the money will be found. It'd be very surprising if the technology were advancing rapidly but development were curtailed by lack of money; this is ultimately a national security issue, and the government could choose to fund it if they had to.
Next year he'll be teaming up with Rudy Giuliani to tout the success of SHRDLU at Four Seasons Landscaping.
The AI community asked GPT-4 to send him an invite, and he accepted.
https://www.linkedin.com/posts/yann-lecun_what-meta-learned-...
One might also imagine that, as one of the "godfathers of AI", he feels a bit sidelined by the success of LLMs (especially given the above), and wants to project the image of a visionary ahead of the pack.
I actually agree with him that if the goal is AGI and full animal intelligence then LLMs are not really the right path (although a very useful validation of the power of prediction). We really need much greater agency (even if only in a virtual world), online learning, innate drives, prediction applied to sensory inputs and motor outputs, etc.
Still, V-JEPA is nothing more than a pre-trained transformer applied to vision (predicting latent visual representations rather than text tokens), so it is just a validation of the power of transformers, rather than being any kind of architectural advance.
I'm skeptical both of the people who claim we'll achieve AGI and self-driving cars in just a few years, and of the people who claim it can't ever be achieved by LLMs at all.
I feel that we don't understand well enough (scientifically) what "general intelligence" even is, or how it comes to be in humans, to make claims either way.
To me, the only honest answer right now seems to be that we do not know. We have absolutely no clue how close, or how far, we are from AGI.
I was at an AI meetup, and I talked to someone who was in charge of the emerging technology division at a VC firm. I asked him why focus so hard on AGI, when AI tools are already looking quite impressive and are a much clearer area to focus investment on. His answer was that AGI was "easy" which I laughed at (in a good natured way) and tried to get him to elaborate on, at which point he started to get uncomfortable and made an excuse to go talk to someone else.
This same fellow was big on autonomous swarms for problem solving, but when I asked him what problem autonomous swarms were supposed to solve that you couldn't solve more easily and quickly with an LLM talking to itself, he didn't have an answer.
Remember that these same VC firms had large crypto and blockchain divisions and were hosting lavish conferences dedicated to it. Regardless of what their job title is, if someone can't back up their talk with real world experience with the tech, there's no reason to take them seriously.
I am patiently waiting for the industry and VCs to pivot their "visions" to "vertical AI", then to "targeted ML applications", and then to "knowledge-assisted automation systems for the manufacturing and logistics industries". They really burned a lot of money and energy on stuff no one seems to want or need. That money is going to run out soon.
For some reason, VCs got super hyped about foundation model companies, but my perspective, being very deep in this area, is that there's zero moat there given the prevalence of open source, so it's a very stupid area to burn massive amounts of cash in.
Targeted AI applications and virtual "executive assistant" agents are going to be huge though.
That's all it really is.
You make it sound silly, but it does seem to make good business sense to me. I'd argue that it seems like they learned from The Bitter Lesson[0] and instead of trying to manually solve things with today's technology, are relying on the exponential improvement in general purpose AI and building something that would utilize that.
On a somewhat related note, I'm reminded of the game Crysis, which was developed with a custom engine (CryEngine 2) which was famously too demanding to run at high settings on then-existing hardware (in 2007). They bet on the likes of Nvidia to continue rapidly improving the tech, and they were absolutely right, as it was a massive success.
[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
My impression was that he meant easy to build, since he had an engineering background. It seemed like he expected AGI to emerge fully formed, like Athena from the head of Zeus.
My take would be that AGI is inevitable, and it's your choice whether you want to be a part of making it (or of the dead-end branches along the way) or not. I take Ray Kurzweil's predicted timeline as the most sensible one.
This is why driverless cars are still just demos, and why LLMs will never be reliable.
It's hard to build something that has yet to even be logically defined.
Hoping that AGI will somehow just "emerge" from an inert box of silicon switches (aka a "computer" as we currently know it) is the stuff of movie fantasy ... or perhaps religion.
> ...is the stuff of movie fantasy ... or perhaps religion.
IMHO, something like "movie fantasy" constitutes the core belief of a great many software engineers and other so-called rational people in the tech space.
Driverless cars are here; without liability issues they would already be widespread. But scaling isn't what got them there. It was tons of labeled data, both from rules-based implementations and from shadowing human drivers.
The ability to generalize completely outside training data isn't that common among humans IME. That is a high bar. How many of us have done so without at least drawing an analogy to some other experience we have had? Truly unique thinkers aren't that commonplace. I myself could have probably just asked an LLM what I would say in this post and gotten pretty close...
Evolution is an incredibly dumb, massively parallel search over long timescales, and it managed to turn blobs of carbon soup into brains because those are a useful (but not necessary) tool for its primary optimization goal.
But that doesn't make carbon soup magical. There's no fundamental physics that privileges information-processing on carbon atoms. We don't have the time-scales of evolution, but we can optimize so much harder on a single goal than it can.
So I don't see how it's a movie fantasy any more than bottling up stars is (well, ICBMs do deliver bottled sunshine... uh, the analogy is going too far here). Anyway, the point is that while brains and intelligence are complicated systems there isn't anything at all known that says it's fundamentally impossible to replicate their functionality or something in the general category. And scaling will be a necessary but perhaps not sufficient component of that, just because they're going to be complex systems.
The equivalent of a human brain just "emerging" from inert silicon switches without any real understanding of how or why --- that's PFM (Pure Friggin' Magic). There is no logical reason to believe it is even possible or practical --- yet people still believe.
It's the modern day version of alchemy --- trying to create a fantastical result from a chemical reaction before it was understood that nuclear physics is the mechanism required. And even with this understanding, we have yet to succeed at turning lead into gold in any practical way.
It seems evolution developed intelligence as a way for organisms to react to and move through 3D space. An advantage was gained by organisms that could "understand" and predict, and biology then reused the same hardware for locating, moving and modeling the 3D world for more abstract processes like thinking and pattern recognition.
So evolution came up with solutions for surviving on planet Earth, which isn't necessarily the same as general problem solving, even though there are significant overlaps. Just the $.02 of a layperson.
The question is whether other animals even have the kind of intelligence that most people would call AGI. It seems that this kind of adaptive intelligence may be unique to mammals (and perhaps a few other outliers, like the octopus), with the vast majority of life being limited to inborn reflexes, imitating their conspecifics, and trial and error.
This is actually a great perspective to have. The idea that there is no fundamental law of physics that should prevent us from replicating or exceeding the functionality of the human brain through artificial means makes this a question of when, not if.
It's not mere hope to expect machines to match or exceed human-level intelligence, because we ourselves demonstrate human-level intelligent machines by virtue of existing, and the physical limits of computation far exceed those of our specific biology.
AKA, we know it’s possible for atomic systems to be as intelligent as we are, because we are. We suspect the limit is far beyond us because computers have dramatically faster processing, essentially perfect memory systems, etc
Do you agree that AGI can emerge from a bunch of haphazardly connected neurons, using about 700 MiB of procedurally generated initialization data (human DNA) and employing 1-2 decades of low-bandwidth, low-power (~25 W) training?
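Back-of-envelope for that ~700 MiB figure (my own rough arithmetic, uncompressed single copy of the genome):

    bases = 3.1e9                    # approximate human genome length in base pairs
    raw_bits = bases * 2             # 2 bits per base (A/C/G/T), uncompressed, one copy
    print(raw_bits / 8 / 2**20)      # ~739 MiB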
You might even agree that not a single directed thought went into the design of that whole system.
Having established that, it seems laughable to me that it would be an "impossible fantasy" to replicate this with purpose-built silicon.
Sure, our current whole-system understanding could still be drastically improved, and the sheer scale of the reference implementation (human connectome) is still intimidating even compared to our most advanced chips, but I have absolutely zero doubt that AGI is just a matter of time (and continuous gradual progress).
If a lack of understanding did not stop nature, why should it stop us? :P
To me, all the arguments against AGI appear unconvincing, motivated by religion/faith, or based on definitional sophistry.
But I'm very open to having that view changed...
> To me, all the arguments against AGI appear unconvincing, motivated by religion/faith, or based on definitional sophistry.
Religion is belief in the unknown without reason or logic.
Do you know of any inanimate object that is truly "intelligent"?
Without ever knowing or seeing a single, real working example, you "believe" and have "faith" that it is possible. By so doing, you are practicing religion --- not science.
> To me, all the arguments against AGI appear unconvincing, motivated by religion/faith, or based on definitional sophistry.
Correction: I'm an atheist. I don't see AGI in the future. Same reasons as the poster who responded to you below: evolution may not have had a design, but we know it works well on animate creatures.
We have yet to see any examples of inanimate objects exhibiting any sign of intelligence, or even instinct.
What we do know is that, even after billions of years of evolution, not a single rock has evolved to exhibit a sign of intelligence.
You're saying that with an intelligent hand directing the process, we can do better. I understand the argument, and I concede that it is a reasonable argument to make, I'm just unconvinced by it.
And the Sun just somehow emerged from a cloud of hydrogen gas.
The obvious flaw in this line of thinking (as related to AGI) is the assumption that we/humans can create/engineer the same or similar without any real understanding of the underlying mechanisms involved.
While it's true that it is worth being skeptical about the current deep learning/LLM boom, younger people who don't know who Gary Marcus is need to know that he is not really an unbiased observer here -- he has never liked neural net approaches and thinks they were a wrong turn (even though they obviously are far more successful than the older symbolic "GOFAI" methods).
I'd stand by the Bitter Lesson ( http://www.incompleteideas.net/IncIdeas/BitterLesson.html ) here. Of course it feels better to do something clever instead: fine-tune models, build clever agents or novel algorithms, ... only to be beaten by the next larger-scale general model.
Scaling alone will certainly not get us to AGI, but that's trivially obvious.
The "scale is all you need" argument, from anyone intelligent, assumes that other obvious deficiencies such as lack of short-term memory (to enable more capable planning/reasoning) will also be taken care of along the way, and I'd not be surprised to see that particular one addressed in upcoming next-gen models.
There's a recent paper from Google suggesting one way to do it (https://arxiv.org/pdf/2404.07143.pdf), although there are many other ways too.
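As I understand it, the rough idea there is to bolt a fixed-size recurrent memory onto attention so the model can keep consuming chunks of context without storing all past keys/values. A loose sketch of that general concept (my own toy version, not the paper's exact formulation):

    import numpy as np

    def phi(x):
        # elu(x) + 1: keeps the memory updates positive
        return np.where(x > 0, x + 1.0, np.exp(x))

    class CompressiveMemory:
        # Toy fixed-size memory: each chunk's keys/values are folded into M,
        # so older context can still be queried without keeping the tokens around.
        def __init__(self, d):
            self.M = np.zeros((d, d))            # memory matrix
            self.z = np.zeros(d)                 # running normalizer

        def retrieve(self, Q):
            Qf = phi(Q)
            return (Qf @ self.M) / np.maximum(Qf @ self.z, 1e-6)[:, None]

        def update(self, K, V):
            Kf = phi(K)
            self.M += Kf.T @ V
            self.z += Kf.sum(axis=0)

    d, chunk = 16, 8
    rng = np.random.default_rng(0)
    mem = CompressiveMemory(d)
    for _ in range(4):                           # stream a long sequence chunk by chunk
        Q, K, V = (rng.standard_normal((chunk, d)) for _ in range(3))
        from_memory = mem.retrieve(Q)            # attend over everything seen so far
        mem.update(K, V)                         # then fold this chunk into the state
        # a real model would mix this with ordinary local attention via a learned gate
    print(from_memory.shape)                     # (8, 16)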
The more interesting question is: what other components/capabilities need to be added to pre-trained transformers to get to AGI, or are some of the missing pieces so fundamental that they require a new approach and can't just be retrofitted along the way as we continue to scale up?