Per the author’s links, he warned that deep learning was hitting a wall in both 2018 and 2022. Now would be a reasonable time to look back and say “whoops, I was wrong about that.” Instead he seems to be doubling down.
A contrarian needs to keep spruiking the point, because if he relents, he loses the core audience that listened to him. It's the same with those who keep predicting market crashes and so on.
> Yet deep learning may well be approaching a wall, much as I anticipated earlier, at beginning of the resurgence (Marcus, 2012)
(From "Deep Learning: A Critical Appraisal")
https://arxiv.org/abs/1801.00631
Here are some of the points:
Is deep learning approaching a wall? - He doesn't make a concrete prediction, which seems like a hedge to avoid looking silly later. Similarly, I noticed a hedge in this post:
> Of course it ain’t over til it’s over. Maybe pure scaling ... will somehow magically yet solve ...
---
But the paper isn't wrong either:
Deep learning thus far is data hungry - yes, absolutely
Deep learning thus far is shallow and has limited capacity for transfer - yes, Sutskever is saying that deep learning doesn't generalize as well as humans
Deep learning thus far has no natural way to deal with hierarchical structure - I think this is technically true, but I would also say that a HUMAN can LEARN to use LLMs while taking these limitations into account. It's non-trivial to use them, but they are useful
Deep learning thus far has struggled with open-ended inference - same point as above -- all the limitations are of course open research questions, but it doesn't necessarily mean that scaling was "wrong". (The amount of money does seem crazy though, and if it screws up the US economy, I wouldn't be that surprised)
Deep learning thus far is not sufficiently transparent - absolutely, the scaling has greatly outpaced understanding/interpretability
Deep learning thus far has not been well integrated with prior knowledge - also seems like a valuable research direction
Deep learning thus far cannot inherently distinguish causation from correlation - ditto
Deep learning presumes a largely stable world, in ways that may be problematic - he uses the example of Google Flu Trends ... yes, deep learning cannot predict the future better than humans. That is a key point in the book "AI Snake Oil". I think this relates to the point about generalization -- deep learning is better at regurgitating and remixing the past, rather than generalizing and understanding the future.
Lots of people are saying otherwise, and then when you call them out on their predictions from 2 years ago, they have curiously short memories.
Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted - absolutely, this is the main limitation. You have to verify its answers, and that can be very costly. Deep learning is only useful when verifying, say, 5 candidate solutions is significantly cheaper than coming up with one yourself (a rough cost comparison is sketched right after this list).
Deep learning thus far is difficult to engineer with - this is still true, e.g. deep learning failed to solve self-driving ~10 years ago
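To make that verification trade-off concrete, here's a back-of-the-envelope sketch. The costs and the acceptance rate are numbers I made up for illustration, not measurements; the point is just that the LLM route only wins when review is much cheaper than solving from scratch and the hit rate isn't terrible.

    # Back-of-the-envelope model of when "generate with an LLM, then verify" beats
    # doing the work yourself. All numbers are invented for illustration.

    def expected_cost_with_llm(verify_cost, accept_prob, solve_cost, max_attempts=5):
        # Expected cost if we review up to max_attempts candidate answers and fall
        # back to solving the problem ourselves when all of them fail review.
        cost = 0.0
        p_still_searching = 1.0
        for _ in range(max_attempts):
            cost += p_still_searching * verify_cost   # pay to review another candidate
            p_still_searching *= (1.0 - accept_prob)  # chance none so far was acceptable
        cost += p_still_searching * solve_cost        # give up and do it by hand
        return cost

    solve_cost = 60.0   # minutes to produce a solution yourself (assumed)
    verify_cost = 5.0   # minutes to review one candidate (assumed)
    accept_prob = 0.5   # chance a given candidate passes review (assumed)

    print(expected_cost_with_llm(verify_cost, accept_prob, solve_cost))  # ~11.6 minutes
    print(solve_cost)                                                    # 60 minutes doing it yourself

Flip the assumptions (expensive review, low acceptance rate) and the expected cost quickly exceeds just doing the work, which is the point being made above.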
---
So Marcus is not wrong, and has nothing to apologize for. The scaling enthusiasts were not exactly wrong either, and we'll see what happens to their companies.
It does seem similar to the dot com bubble - when the dust cleared, real value was created. But you can also see that the marketing was very self-serving.
Stuff like "AGI 2027" will come off poorly -- it's an attempt by people with little power to curry favor with powerful people. They are serving as the marketing arm, and oddly not realizing it.
"AI will write all the code" will also come off poorly. Or at least we will realize that software creation != writing code, and software creation is the valuable activity
I think it would help if either side could be more quantitative about their claims, and the problem is both narratives are usually rather weaselly. Let's take this section:
>Deep learning thus far is shallow and has limited capacity for transfer - yes, Sutskever is saying that deep learning doesn't generalize as well as humans
But they do generalize to some extent, and my limited understanding is that they generalize way more than expected ("emergent abilities") from the pre-LLM era, when this prediction was made. Sutskever pretty much starts the podcast saying "Isn’t it straight out of science fiction?"
Now Gary Marcus says "limited capacity for transfer" so there is wiggle room there, but can this be quantified and compared to what is being seen today?
In the absence of concrete numbers, I would suspect he is wrong here. I mean, I still cannot mechanistically picture in my head how my intent, conveyed in high-level English, can get transformed into working code that fits just right into the rather bespoke surrounding code. Beyond coding, I've seen ChatGPT detect sarcasm in social media posts about truly absurd situations. In both cases, the test data is probably outside the distribution of the training data.
At some level, it is extracting abstract concepts from its training data, as well as from my prompt and the unusual test data, even applying appropriate value judgements to those concepts where suitable, and combining everything properly to generate a correct response. These are much higher-level concepts than the ones Marcus says deep learning has no grasp of.
Absent quantifiable metrics, on a qualitative basis at least I would hold this point against him.
On a separate note:
> "AI will write all the code" will also come off poorly.
On the contrary, I think it is already true (cf. agentic spec-driven development). Sure, there are the hyper-boosters who were expecting software engineers to be replaced entirely, but looking back, claims from Dario, Satya, Pichai and their ilk were all about "writing code" and not "creating software." They understand the difference and in retrospect were being deliberately careful in their wording while still aiming to create a splash.
Several OpenAI people said in 2023 that they were surprised by the public's acceptance, because they themselves thought that LLMs were not that impressive.
The public has now caught up with that view. Familiarity breeds contempt, in this case justifiably so.
EDIT: It is interesting that, in a submission about Sutskever, essentially citing Sutskever gets downvoted. You can do it here, but the whole of YouTube will still hate "AI".
Oh please. What LLMs are doing now was complete and utter science fiction just 10 years ago (2015).
He wasn't wrong though.
Interestingly, during that crazy period when Sutskever ultimately ended up leaving OpenAI, I thought perhaps he had shot himself in the foot to some degree (not that I have any insider information—just playing the naive observer from the outside).
The feeling I have now is that it was a fine decision for him to have made. It made a point at the time, perhaps moral, perhaps political. And now it seems, despite whatever cost there was for him at the time, the "golden years" of OpenAI (and LLM's in general) may have been over anyway.
To be sure, I happen to believe there is a lot of mileage for LLMs even in their current state—a lot of use-cases, integration we have yet to explore. But Sutskever I assume is a researcher and not a plumber—for him the LLM was probably over.
One wonders how long it will be before the next of these "breakthroughs". On one hand, they may come about serendipitously, and serendipity has no schedule. It harkens back to when A.I. itself was always "a decade away". You know, since the 1950's or so.
On the other hand, there are a lot more eyeballs on AI these days than there ever were in Minsky's* day.
(*Hate to even mention the man's name these days.)
> To be sure, I happen to believe there is a lot of mileage for LLMs even in their current state—a lot of use-cases, integration we have yet to explore. But Sutskever I assume is a researcher and not a plumber—for him the LLM was probably over.
Indeed. Humans are suckers for a quick answer delivered confidently. And the industry coalesced around LLMs once they could output competent, confident, corporate (aka HR-approved) English, which for many AI/DL/ML/NN researchers was actually a bit of a bummer. I say that because that milestone suddenly made "[AGI is] always a decade away" seem much more imminent. Thus the focus of investment in the space shifted from actual ML/DL/NN research to who could convert the largest pile of speculatively leveraged money into pallets of GPUs and the data to feed them, since "throw more compute/data at it" was a quicker and more reliable way to realize performance gains than investing in research. Yes, research would inevitably yield results, but it's incredibly hard to forecast how long research takes to yield tangible results, and harder still to promise that X dollars will produce result Y in Z time, compared with "X dollars buys Y compute deployed in Z time". With the immense speculation-backed FOMO, and the potential valuation/investment that could result from being "the leader" in any given regard, it's no wonder that BigTech chose to primarily invest in the latter, leaving those working in the former space to start considering looking elsewhere to continue actual research.
I've been conflicted on AI/ML efforts for years. On one hand, the output of locally run inference is astounding. There are plenty of models on HuggingFace that I can run on my Mac Studio and provide real value to me every single work day. On the other hand, while I have the experience to evaluate the output, some of my younger colleagues do not. They are learning, and when I have time to help them, I certainly do, but I wish they just didn't have access to LLMs. LLMs are miracle tools in the right hands. They are dangerous conveniences in the wrong hands.
Wasted money is a totally different topic. If we view LLMs as a business opportunity, they haven't yet paid off. To imply, however, that a massive investment in GPUs is a waste seems flawed. GPUs are massively parallel compute. Were the AI market to collapse, we can imagine these GPUs being sold at severe discounts, which would then likely spur some other technological innovation, just as the crypto market laid the groundwork for ML/AI. When a resource gets cheap, more people gain access to it and innovation occurs. Things that were previously cost prohibitive become affordable.
So, whether or not we humans achieve AGI or make tons of money off of LLMs is somewhat irrelevant. The investment is creating goods of actual value even if those goods are currently overpriced, and should the currently intended use prove to be poor, a better and more lucrative use will be found in the event of an AI market crash.
Personally, I hope that the AGI effort is successful, and that we can all have a robot house keeper for $30k. I'd gladly trade one of the cars in my household to never do dishes, laundry, lawnmowing, or household repairs again just as I paid a few hundred to never have to vacuum my floors (though I actually still do once a month when I move furniture to vacuum places the Roomba can't go, a humanoid robot could do that for me).
"creating goods of actual value"
and
"creating goods of actual value for any price"
I don't think it's controversial that these things are valuable; rather, the cost to produce these things is up for discussion, and that is the real problem here. If the price is too high now, then there will be real losses people experience down the line, and real losses have real consequences.
> On one hand, the output of locally run inference is astounding. There are plenty of models on HuggingFace that I can run on my Mac Studio and provide real value to me every single work day. On the other hand, while I have the experience to evaluate the output, some of my younger colleagues do not. They are learning, and when I have time to help them, I certainly do, but I wish they just didn't have access to LLMs. LLMs are miracle tools in the right hands. They are dangerous conveniences in the wrong hands.
This is weird to me. Surely you recognise that, just as they don't know what they don't know (which is presumably the problem when it hallucinates), you must also have the same issue; there's just no old greybeard around to wish you didn't have access.
Well, I'm the graybeard (literally and metaphorically). I know enough not to blindly trust the LLM, and I know enough to test everything whether written by human or machine. This is not always true of younger professionals.
What's the lifecycle length of GPUs? 2-4 years? By the time OpenAIs and Anthropics pivot, many GPUs will be beyond their half-life. I doubt there would be many takers for that infrastructure.
Especially given the humungous scale of infrastructure that the current approach requires. Is there another line of technology that would require remotely as much?
Note, I'm not saying there can't be. It's just that I don't think there are obvious shots at that target.
I don't think so about the gpus. It's a sunk cost that won't be repurposed easily--just look at what happened to Nortel. Did all those PBXs get repurposed? Nope--trash. Those data centers are going to eat it hard, that's my prediction. It's not a terrible thing, per se--"we" printed trillions the past few years and those events need a sink to get rid of all the excess liquidity. It's usually a big war, but not always. Last time it was a housing bubble. Everyone was going to get rich on real estate, but not really. It was just an exercise in finding bag holders. That's what this AI/data center situation amounts to as well--companies had billions in cash sitting around doing nothing, might as well spend it. Berkshire has the same problem--hundreds of billions with nowhere to be productively invested. It doesn't sound like a problem but it is.
My humble take on AGI is that we don't understand consciousness so how could we build something conscious except by accident? It seems like an extremely risky and foolish thing to attempt. Luckily, humans will fail at it.
I mean, we really don't need something to be conscious; we would just need it to be accurate, and the current models simply aren't built on a framework to be accurate. I think that it's possible, and already being done, to effectively instill accuracy. The issue there is that you no longer have the mass-marketable general-purpose machines that compete with the likes of Google / Facebook for eyeballs, which is where the huge money is. So you can either sell valuable inference to a target audience and make a few billion, or you can sell passable slop to 8 billion people and be worth a few trillion.
Just because something didn't work out doesn't mean it was a waste, and it isn't particularly clear that the LLM boom was wasted, or that it is over, or that it isn't working. I can't figure out what people mean when they say "AGI" any more, we appear to be past that. We've got something that seems to be general and seems to be more intelligent than an average human. Apparently AGI means a sort of Einstein-Tolstoy-Jesus hybrid that can ride a unicycle and is far beyond the reach of most people I know.
Also, if anyone wants to know what a real effort to waste a trillion dollars can buy ... https://costsofwar.watson.brown.edu/
> Just because something didn't work out doesn't mean it was a waste
Its all about scale.
If you spend $100 on something that didn't work out, that money wasn't wasted if you learned something amazing. If you spend $1,000,000,000,000 on something that didn't work out, the expectation is that you learn something close to 10,000,000,000x more than from the $100 spend. If the value of learning is several orders of magnitude less than the level of investment, there is absolutely tremendous waste.
For example: nobody qualifies spending a billion dollars on a failed project as value if your learning only resulted in avoiding future paper cuts.
While it doesn't seem we can agree on a meaning for AGI, I think a lot of people think of it as an intelligent entity that has 100% agency.
Currently we need to direct LLM's from task to task. They don't yet possess the capability of full real world context.
This is why I get confused when people talk about AI replacing jobs. It can replace work, but you still need skilled workers to guide them. To me, this could result in humans being even more valuable to businesses, and result in an even greater demand for labor.
If this is true, individuals need to race to learn how to use AI and use it well.
> Currently we need to direct LLM's from task to task.
Agent-loops that can work from larger scale goals work just fine. We can't let them run with no oversight, but we certainly also don't need to micro-manage every task. Most days I'll have 3-4 agent-loops running in parallel, executing whole plans, that I only check in on occasionally.
I still need to review their output occasionally, but I certainly don't direct them task to task.
I do agree with you we still need skilled workers to guide them, so I don't think we necessarily disagree all that much, but we're past the point where they need to be micromanaged.
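For what it's worth, here is a rough sketch of the workflow I mean. `run_agent_plan` and the plan files are hypothetical stand-ins, not any real tool's API; the shape of it is just "hand out whole plans, review results as they finish".

    # Hypothetical sketch of "several agent-loops in parallel, checked on occasionally".
    # run_agent_plan() is a stand-in for whatever coding agent you actually drive;
    # the plan filenames are invented.
    import concurrent.futures

    def run_agent_plan(plan_file):
        # Placeholder: in reality this would hand the whole plan to an agent
        # (CLI or API) and block until it reports back.
        return f"executed {plan_file} (placeholder result)"

    plans = ["refactor_auth.md", "migrate_db.md", "add_metrics.md"]

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(run_agent_plan, p): p for p in plans}
        for fut in concurrent.futures.as_completed(futures):
            # The occasional check-in: review each plan's outcome when it finishes,
            # rather than directing the agent task by task.
            print(futures[fut], "->", fut.result())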
If we can't agree on a definition of AGI, then what good is it to say we have "human-in-the-loop AGI"? The only folks that will agree with you will be using your definition of AGI, which you haven't shared (at least in this posting). So, what is your definition of AGI?
AI capabilities today are jagged and people look at what they want to.
Boosters: it can answer PhD-level questions and it helps me a lot with my software projects.
Detractors: it can't learn to do a task it doesn't already know how to do.
Boosters: But it can actually sometimes do things it wouldn't be able to do otherwise if you give it lots of context and instructions.
Detractors: I want it to be able to actually figure out and retain the context itself, without being given detailed instructions every time, and do so reliably.
Boosters: But look, in this specific case it sort of does that.
Detractors: But not in my case.
Boosters: you're just using it wrong. There must be something wrong with your prompting strategy or how you manage context.
etc etc etc...
> We've got something that seems to be general and seems to be more intelligent than an average human.
We've got something that occasionally sounds as if it were more intelligent than an average human. However, if we stick to areas of interest of that average human, they'll beat the machine in reasoning, critical assessment, etc.
And in just about any area, an average human will beat the machine wherever a world model is required, i.e., a generalized understanding of how the world works.
It's not to criticize the usefulness of LLMs. Yet broad statements that an LLM is more intelligent than an average Joe are necessarily misleading.
I like how Simon Wardley assesses how good the most recent models are. He asks them to summarize an article or a book which he's deeply familiar with (his own or someone else's). It's like a test of trust. If he can't trust the summary of the stuff he knows, he can't trust the summary that's foreign to him either.
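A minimal sketch of that kind of trust test, assuming a generic `ask_model` placeholder (not a real API) and a hand-written checklist of points the summary must contain:

    # Sketch of a "summarize something I already know" trust test.
    # ask_model() is a placeholder for whichever model/API you actually use;
    # the canned return value just keeps the sketch runnable.

    def ask_model(prompt):
        return "The article argues that deep learning is data hungry and struggles to transfer."

    def trust_score(article_text, must_mention):
        # Ask for a summary of a text you know deeply, then count how many of the
        # points you consider essential actually show up. Crude substring matching,
        # but it is a signal of whether to trust summaries of texts you don't know.
        summary = ask_model("Summarize this article in about 200 words:\n\n" + article_text)
        hits = [p for p in must_mention if p.lower() in summary.lower()]
        return len(hits) / len(must_mention)

    essential_points = ["data hungry", "transfer", "causation"]  # your own checklist
    print(trust_score("<full text of an article you know well>", essential_points))  # 2/3 here

The design choice is the one described above: if the score is low on material you know, don't trust its summaries of material you don't.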
AFAICT "AGI" is a placeholder for peoples fears and hopes for massive change caused by AI. The singularity, massive job displacement, et cetera.
None of this is a binary, though. We already have AGI that is superhuman in some ways and subhuman in others. We are already using LLM's to help improve themselves. We already have job displacement.
That continuum is going to continue. AI will become more superhuman in some ways, but likely stay subhuman in others. LLM's will help improve themselves. Job displacement will increase.
Thus the question is whether this rate of change will be fast or slow. Seems mundane, but it's a big deal. Humans can adapt to slow changes, but not so well to fast ones. Thus AGI is a big deal, even if it's a crap stand in for the things people care about.
> Just because something didn't work out doesn't mean it was a waste
Here i think it's more about opportunity cost.
> I can't figure out what people mean when they say "AGI" any more, we appear to be past that
What I ask of an AGI is to not hallucinate idiotic stuff. I don't care about being bullshitted too much if the bullshit is logical, but when I ask it to "fix mypy errors using pydantic" and, instead of declaring a type for a variable, it invents weird algorithms that make no sense and don't work (and the fix would have taken 5 minutes for any average dev)... I mean, Claude 4.5 and Codex have replaced my sed/search-and-replaces, write my sanity tests, write my commit comments, write my migration scripts (and most of my scripts), and make refactors so easy I now do one every month or so, but if this is AGI, I _really_ wonder what people mean by intelligence.
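To illustrate the kind of five-minute fix I mean (a made-up example, not the actual code in question): mypy is unhappy because a parsed value is untyped, and the correct, boring fix is just to declare a pydantic model and use it, not to invent a new algorithm.

    # Made-up example of the boring five-minute fix (assumes pydantic is installed).
    import json
    from pydantic import BaseModel

    class User(BaseModel):
        id: int
        email: str

    raw = '{"id": 1, "email": "a@example.com"}'

    # Before: data = json.loads(raw) gives mypy an untyped Any, so accesses get flagged.
    # After: declare the type once and the errors go away -- no new algorithm required.
    user = User(**json.loads(raw))
    print(user.email)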
> Also, if anyone wants to know what a real effort to waste a trillion dollars can buy
100% agree. Please Altman, Ilya and others, I will happily let you use whatever money you want if that money is taken from war profiteers and warmongers.
> Just because something didn't work out doesn't mean it was a waste
One thing to keep in mind is that most of the people who go around spreading unfounded criticism of LLMs, "Gen-AI" and AI in general aren't usually very deep into computer science, and even less into science itself. In their mind, if someone does an experiment and it doesn't pan out, that means "science itself failed", because they literally don't know how research and science work in practice.
Maybe true in general, but Gary Marcus is an experienced researcher and entrepreneur who’s been writing about AI for literally decades.
I’m quite critical, but I think we have to grant that he has plenty of credentials and understands the technical nature of what he’s critiquing quite well!
> Just because something didn't work out doesn't mean it was a waste, and it isn't particularly clear that the LLM boom was wasted, or that it is over, or that it isn't working
Agreed. Has there been waste? Inarguably. Has the whole thing been a waste? Absolutely not. There are lessons from our past that in an ideal world would have allowed us to navigate this much more efficiently and effectively. However, if we're being honest with ourselves, that's been true of any nascent technology (especially hyped ones) for as long as we've been recording history. The path to success is paved with failure, Hindsight is 20/20, History rhymes and all that.
> I can't figure out what people mean when they say "AGI" any more
We've been asking "What is intelligence" (and/or Sentience) for as long as we've been alive, and still haven't come to a consensus on that. Plenty people will confidently claim they have an answer, which is great, but it's entirely irrelevant if there's not a broad consensus on that definition or a well defined way to verify AI/people/anything against it. Point in case...
> we appear to be past that. We've got something that seems to be general and seems to be more intelligent than an average human
Hard disagree, specifically as it regards intelligence. They are certainly useful utilities when you use them right, but I digress. What are you basing that on? How can we be sure we're past a goal-post when we don't even know where the goal-post is? For starters, how much is speed (or latency, or IOPS/TPS, or however you wish to contextualize it) a function of "intelligence"? For a tangible example: if an AI came to a conclusion derived from 100 separate sources, and a human manually went through those same 100 sources and came to the same conclusion, is the AI more intelligent by virtue of completing that task faster? I can absolutely see (and agree with) how that is convenient/useful, but the question specifically is: does the speed at which it can provide answers (assuming both are correct/the same) make it smarter than, or as smart as, the human?
How do they rationalize and reason their way through new problems? How do we humans? How important is the reasoning, the "how" of how it arrives at answers to the questions we ask it, if the answers are correct? For a tangible example: what is happening when you ask an AI to compute the sum of 1 plus 1? What are we doing when we're asked to perform the same task? What about proving it to be correct? More broadly, in the context of AGI/intelligence, does it matter if the "path of reason" differs if the answers are correct?
What about how confidently it presents those answers (correct or not)? It's well known that we humans are incredibly biased towards confidence. Personally, I might start buying into the hype the day that AI starts telling me "I'm not sure" or "I don't know." Ultimately, until I can trust it to tell me it doesn't know/isn't certain, I won't trust it when it tells me it does know/is certain, regardless of how "correct" it may be. We'll get there one day, and until then I'm happy to use it for the utility and convenience it provides while doing my part to make it better and more useful.
Eh, tearing down a straw man is not an impressive argument from you either.
As a counter-point, LLMs still do embarrassing amounts of hallucinations, some of which are quite hilarious. When that is gone and it starts doing web searches -- or it has any mechanisms that mimic actual research when it does not know something -- then the agents will be much closer to whatever most people imagine AGI to be.
Have LLMs learned to say "I don't know" yet?
Can they, fundamentally, do that? That is, given the current technology.
Architecturally, they don't have a concept of "not knowing." They can say "I don't know," but it simply means that it was the most likely answer based on the training data.
A perfect example: an LLM citing chess rules and still making an illegal move: https://garymarcus.substack.com/p/generative-ais-crippling-a...
Heck, it can even say the move would have been illegal. And it would still make it.
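A toy illustration of that point, with invented numbers: at the output there is just a probability distribution over continuations, so "I don't know" only comes out when it happens to be the most likely next text, not because the model has estimated its own uncertainty.

    # Toy illustration (invented numbers): "I don't know" is just another continuation
    # competing on probability; there is no separate internal uncertainty estimate.
    import math

    def softmax(logits):
        z = sum(math.exp(v) for v in logits.values())
        return {k: math.exp(v) / z for k, v in logits.items()}

    # Hypothetical scores a model might assign to candidate answers for some question:
    logits = {
        "The capital of Freedonia is Fredville.": 2.1,  # confident-sounding fabrication
        "I don't know.": 1.3,
        "Could you clarify the question?": 0.4,
    }

    probs = softmax(logits)
    print(probs)
    print("chosen answer:", max(probs, key=probs.get))
    # The fabrication wins; "I don't know" is chosen only if it happens to score highest.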
> When that is gone and it starts doing web searches -- or it has any mechanisms that mimic actual research when it does not know something
ChatGPT and Gemini (and maybe others) can already perform and cite web searches, and it vastly improves their performance. ChatGPT is particularly impressive at multi-step web research. I have also witnessed them saying "I can't find the information you want" instead of hallucinating.
It's not perfect yet, but it's definitely climbing human percentiles in terms of reliability.
I think a lot of LLM detractors are still thinking of 2023-era ChatGPT. If everyone tried the most recent pro-level models with all the bells and whistles then I think there would be a lot less disagreement.
All the time, which you'd know very well if you'd spent much time with current-generation reasoning models.
I believe in a very practical definition of AGI. AGI is a system capable of RSI. Why? Because it mimics humans. We have some behaviours that are given to us from birth, but the real power of humans is our ability to learn and improve ourselves and the environment around us.
A system capable of self improvement will be sufficient for AGI imo.
Ah - recursive self improvement. I was thinking repetitive strain injury was odd. That's probably quite a good test, although LLMs may be able to improve themselves a bit and still not be very good. An interesting point for me is whether, if all humans went away, the AI/robots could keep going without us, which would require them to be able to maintain and build power plants, chip fabs and the like. A way to go on that one.
Self improvement doesn’t mean self improvement in any possible direction without any tradeoffs. Genetic algorithms can do everything an LLM can given enough computational resources and training, but being wildly inefficient humanity can’t actually use them to make a chatbot on any even vaguely relevant timeline.
There was a lot of talk about reaching "peak AI" in early summer of this year.
I guess there is some truth to it. The last big improvement to LLMs was reasoning. It gave the existing models additional capabilities (after some re-training).
We've reached the plateau of tiny incremental updates. Like with smartphones. I sometimes still use an iPhone 6s. There is no fundamental difference compared to the most current iPhone generation 10 years later. The 6s is still able to perform most of the tasks you need a smartphone to do. The new ones do it much faster, and everything works better, but the changes are not disruptive at all.
No surprise to see self-congratulation and more "I'm the only person who ever questioned genAI" nonsense as the key parts of this article. What a bore Marcus is.
Everyone misses the forest for the trees. Deep learning applications built from neural units in layered topologies excel at revealing and leveraging correlations (something people flippantly refer to as "pattern matching") at many scales in a data set. Commercial LLMs are merely deep learning applications purpose-built to reveal correlations in the corpus of internet-available language. The impressive aspect of LLMs is not that they seem to give such comprehensive answers to queries, but rather what that fact says about human language itself. Human language uses thousands of tokens (degrees of freedom) which in theory could be combined in an infinite number of ways to encode information. Yet LLMs show us that we really only use our tokens in a very limited, highly correlated manner. Taking it a step further, this also demonstrates the limits of deep learning... that an LLM requires a trillion parameters and $100B to characterize the much, much lower dimensionality of this data set should be a clear signal that LLMs, and likely all deep learning approaches based on data alone, are not a viable path to "intelligence". Anyway, I'm just a valet (yes, FSD fans, this still exists) so what do I know?
He probably makes quite good money as the go-to guy for saying AI is rubbish? https://champions-speakers.co.uk/speaker-agent/gary-marcus
But that's certainly not a nuanced / trustworthy analysis of things unless you're a top tier researcher.
> 1997 - Professor of Psychology and Neural Science
Gary Marcus is a mindless talking head "contrarian" at this point. He should get a real job.