I am surprised by how dismissive the whole post sounds. For example:
> OpenAI could in fact have a breakthrough that fundamentally changes the world
Well, it appears to me that OpenAI already has such a breakthrough: it had it roughly four years ago with GPT-2, and it's still scaling it.
Considering that it's not yet a year since the introduction of the first ChatGPT, and given the pace at which it's evolving, I would say that the current product is already showing great promise to fundamentally change the world. I would not be surprised if just incremental changes were enough to fulfill that prediction. The impact at this point seems more limited by the ability of society to absorb and process the technology rather than intrinsic limits of the technology itself.
One thing OpenAI has now that it didn't have four years ago is a lot more compute power at its disposal. Sam Altman has already said "I think we're at the end of the era where it's going to be these, like, giant, giant models." If that's actually true, then the GPT-X tech has largely hit a wall where throwing more compute at it won't yield the same increase in capability. Bill Gates predicted that GPT-5 won't be much better than GPT-4. So any "breakthrough" really could be incremental rather than game-changing.
I think there are two main sources of learning for AI: the web-scrape datasets, which contain our historical experiences and communications, and AI feedback generated from deployed agents. The web text is almost exhausted (or at least we can't scale it 100x more), but the feedback is just starting to ramp up.
Every day millions of chat sessions are recorded, and they are exceptional training examples. They capture the kinds of errors LLMs make and the kinds of demands people have, and they include a human reaction to each LLM message.
OpenAI's move to create "GPTs" shows they are actively working on improving the feedback signals by equipping the LLM with RAG, code execution and API access. In such a setup it is possible to use a model at level N to generate training data for a model at level N+1.
The keyword here is learning from feedback, which aligns with recent talk of using RL methods like AlphaZero with LLMs. An RL agent would create its own data as it goes. I think progress will be gradual, as we need to wait for the world to produce the learning feedback signal. Of course, in domains where we can speed that up, AI will progress faster.
Interesting thought: by making LLMs available to the public, they are going to assist people in many ways and create effects that will percolate into the next training set: LLM inference -> text -> effects in the world -> text -> LLM training. So there is already an implicit feedback loop when we retrain the base models. GPT-5 will train on data from a world influenced and shaped by GPT-4.
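To make "learning from feedback" a bit more concrete, here is a toy sketch of how logged chat sessions with human reactions could be turned into preference pairs for RLHF-style fine-tuning. The field names and the `preference_pairs` helper are invented for illustration; nothing here describes OpenAI's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    prompt: str     # what the user asked
    response: str   # what the LLM answered
    reaction: int   # e.g. +1 thumbs-up, -1 thumbs-down, 0 no signal

def preference_pairs(sessions):
    """Pair responses to the same prompt that got opposite human reactions."""
    by_prompt = {}
    for session in sessions:
        for turn in session:
            by_prompt.setdefault(turn.prompt, []).append(turn)
    pairs = []
    for prompt, turns in by_prompt.items():
        liked = [t.response for t in turns if t.reaction > 0]
        disliked = [t.response for t in turns if t.reaction < 0]
        for good in liked:
            for bad in disliked:
                pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})
    return pairs

# Example: two logged sessions for the same prompt, one liked and one disliked.
sessions = [
    [Turn("What is 2+2?", "4", +1)],
    [Turn("What is 2+2?", "5", -1)],
]
print(preference_pairs(sessions))
```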
The corollary of this is optimistic, though. If increasingly large models don't make a big difference, that hints that data quality matters a lot more than raw quantity of data. This is good news for open-source models, because it would be possible to run or train viable models on less expensive hardware, and having billions of dollars in GPUs isn't as much of a moat as they are suggesting.
He's a skeptic and completely reactionary. I had to unfollow him on Twitter because he always has to have a "take" on every AI headline, and he often contradicts himself, flipping between "AI is useless" and "AI is a huge threat".
I don't know, it is an opinion and it seems kind of well-founded (i.e. there's no evidence for groundbreaking research on OpenAI's part except for scaling things up).
It seems to me that the breakthrough claim was a desperate attempt by OpenAI staff to get their beloved boss back to OpenAI. If anything, it will most probably be incremental improvements rather than a breakthrough. That's not a bad thing, but let's call a spade a spade [1]. Ironically, the breakthrough happened somewhere else, at Google, and for reasons unknown as of now, Google has missed the boat on commercializing its very own invention.
Personally, I have recently been using ChatGPT-4 on a daily basis, and I considered myself a long-time and ardent user of Google products (mainly search) for the past two decades. However, the past several years have become increasingly frustrating, since it is getting more difficult to perform "search that matters" (going to trademark this motto). What frequently irks me the most is not doing random or targeted searches with Google; it is looking for something I discovered previously but have since forgotten, and that apparently really matters now. It has been quoted that we spend about 10% of our time looking for items we have lost or misplaced, and the same can be said of our acquired knowledge and information. Sometimes we crave knowledge or information we once had but find it very difficult or impossible to recall. ChatGPT-4, in particular the online search with Bing feature, is extremely useful in this regard, but at the same time it is very limited, since it is not a well-supported feature and has many failed attempts, perhaps due to limited online data scraping capability, and thus sub-par results. This specific feature, which I call "search that matters", is like Google search on steroids, and the fact that Google, with its DeepMind subsidiary, has failed to utilize and monetize this opportunity until now is just beyond me. If ChatGPT (call it ChatGPT-5 if you like) can perform this operation intuitively and seamlessly, it will be a game changer but not a breakthrough. Apparently, according to ChatGPT-4, you can have a game changer that is not a breakthrough.
The breakthrough, however, would be to fundamentally improve the AI or the LLM itself, as mentioned by Stephen Wolfram in his tutorial article on ChatGPT, not merely to enhance its existing operations [2]:
> When it comes to training (AKA learning) the different “hardware” of the brain and of current computers (as well as, perhaps, some undeveloped algorithmic ideas) forces ChatGPT to use a strategy that’s probably rather different (and in some ways much less efficient) than the brain. And there’s something else as well: unlike even in typical algorithmic computation, ChatGPT doesn’t internally “have loops” or “recompute on data”. And that inevitably limits its computational capability - even with respect to current computers, but definitely with respect to the brain.
[1] Eight Things to Know about Large Language Models: https://wp.nyu.edu/arg/eight-things-to-know-about-large-lang...
[2] What Is ChatGPT Doing and Why Does It Work: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...
I must agree with you. GPT-2 did not impress me much, but with GPT-3 and GPT-4 we constantly see significant progress of a kind not seen in nearly any other technology.
Even if GPT-5 turns out to be prohibitively expensive, if it still shows a leap like GPT-4's, that will be enough to consider OpenAI the most successful AI research effort in human history.
And as for expenses, we just need to wait some 10 years and GPT-3-class tech will be in every smartphone; in 30 years GPT-4 will also be affordable for everyone, and GPT-5 will be just an ordinary business workhorse.
Edit: Oh I see: https://azure.microsoft.com/en-us/pricing/details/cognitive-...
> I am surprised by how dismissive the whole post sounds.
I wouldn't be surprised, it's Gary Marcus. He's an academic with a lot of prestige to lose if the LLM approach is actually good/useful/insightful, who's only widely publicly known now because AI has had a backlash and media needed an expert to quote for "the other side". Same as the computational linguistics researchers who always get quoted for the same reason.
In general, academics in competing fields whose funding threatens to get tanked or eclipsed by research approaches that work on principles they have fundamental academic disagreements with are going to talk negatively about the tech, no matter what it is achieving. Where I think it can be valuable to listen to them is when they're giving the technology credit - generally they'll only do that when it's something really undeniable or potentially concerning.
> it's Gary Marcus. He's an academic with a lot of prestige to lose if the LLM approach is actually good/useful/insightful
I wouldn't bet on him losing his status as the perceived expert for the other side, or his prestige, no matter how poorly his gloomy LLM forecasts fare. It hasn't happened with any of his previous, consistently pessimistic predictions about deep learning since the beginning of its resurgence (which has clearly far surpassed its state in 2012, even if currently overhyped). To quote Gary Marcus quoting Gary Marcus:
> Yet deep learning may well be approaching a wall, much as I anticipated earlier, at beginning of the resurgence (Marcus, 2012)
This is such a case of a bad comment that seems clever and insightful. It boils down to saying we don't need to debate or even consider the content of his arguments, because we can assume he's only motivated by prestige and money (but without considering the second-order effects on his credibility and funding if he actually turns out to be proven substantially wrong in the future).
I don't know how right or wrong he is - none of us do. That's why it's all still being debated.
The one thing I know is that we can only truly understand a topic by fully understanding arguments for and against all the claims. I also know the pro-LLM set have way more money (double-digit billions as we saw just this week) and credibility to lose over this topic than Gary Marcus does.
I'm trying to refresh my memory on what Q-learning is, if I may think out loud here?
Q is an evaluation of how good a state is, assuming the agent acts optimally going forward. In chess, for example, sacrificing a rook to capture a queen would have a high Q value. Giving up the rook is bad on its own, but gaining the queen later is better. Q values are supposed to see past immediate consequences and reflect the long-term consequences.
How do we find a Q value, though? We can look several steps ahead, or we can look all the way to the end of the game, etc. These aren't real Q values, though, because optimal actions probably weren't taken. We can use the Bellman equation, which roughly nudges one Q value a small amount toward the reward plus the maximum Q value of the next possible states; the result is that high Q values gradually flow backward into states that can lead to good outcomes.
I'm trying to think how this would apply to LLMs, though. Where do the basic values come from? In chess the basic values are the number of pieces, or whether or not a side ultimately wins. What are the basic values in an LLM? What is the Q agent aiming to do?
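For anyone who wants the mechanics rather than the chess analogy, here is a minimal tabular Q-learning sketch on a toy corridor environment. It is purely my own illustration of the standard algorithm (it has nothing to do with whatever OpenAI's Q* is): the Bellman-style update pulls Q(s, a) toward the reward plus the discounted best Q value of the next state, so value gradually flows backward from good outcomes.

```python
import random

# Toy environment: 5 states in a row, actions move left (-1) or right (+1),
# reward 1.0 for reaching the rightmost state. Purely illustrative.
N_STATES, ACTIONS = 5, (-1, +1)
alpha, gamma, eps = 0.1, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for _ in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly take the best-known action, sometimes explore.
        if random.random() < eps:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        # Bellman-style update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt

print(max(Q[(0, a)] for a in ACTIONS))  # learned value of the start state
```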
I don't think your intuition about computer chess is going to help you here with transformer architecture.
Usually in transformer models [1], for each attention head there are 3 weight matrices, known as q(uery), k(ey) and v(alue). I was assuming that the q in Q* referred to the q vector, i.e. that "q-learning" meant training this vector. In a transformer model you don't have an objective function for any sort of state evaluation, so it can't be the Q you're thinking of.
If they've done something funky with the q vector, that could indeed be a breakthrough, since a lot of people feel that we are sort of running out of juice with scaling transformers as-is and need a new architecture to really get a step change in capability. That's pure speculation on my part though.
[1] Here's Vaswani et al., the paper that first set out the transformer architecture: https://arxiv.org/abs/1706.03762
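For reference, here is a bare-bones single-head scaled dot-product attention in numpy, just to pin down what the q/k/v projections being discussed here are. The shapes and weights are made up for illustration:

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    # Project each token into query, key and value spaces.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Similarity of every query to every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 tokens, model width 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(attention(x, Wq, Wk, Wv).shape)          # (4, 8)
```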
Current architectures have a lot of juice left across several axes, per the extraordinarily accurate scaling laws. We're nowhere close to a wall.
The industry has been focusing hard on 'System Two' approaches to augment our largely 'System One'-style models, where optimal decision policies make Q-learning a natural approach. The recent unlock here might be related to using the neural nets' general intelligence/flexibility to better broadly estimate their own states/rewards/actions (like humans). [EDIT: To be clear, the Q-learning stuff is my own speculation, whereas the 'System Two' stuff is well known.]
Serendipitously, Karpathy broadly discussed these two issues yesterday! Toward the lecture's end: https://www.youtube.com/watch?v=zjkBMFhNj_g
They’re not assuming that the Q refers to the Transformer’s “query”, they’re talking about the reward function in reinforcement learning, using chess as an example.
In computer chess, traditionally 'minimax' was the strategy that allowed you to efficiently probe ahead. The only requirement for this is that your evaluation function returns a single (scalar) value.
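As a reminder of how little minimax actually needs, here is a tiny depth-limited version. The game interface (`moves`, `apply_move`, `evaluate`) is invented for illustration; the only requirement, as noted above, is that `evaluate` returns a single scalar.

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Depth-limited minimax over a generic game interface (illustrative)."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)            # a single scalar score is all we need
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, evaluate) for m in legal)
    return max(scores) if maximizing else min(scores)
```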
> Q is an evaluation of how good a state is assuming the agent acts optimally going forward.
The Q-value represents the expected utility, or total reward, of taking a certain action in a given state and then following an optimal policy thereafter. So yes, it does evaluate how "good" or beneficial a state is, but specifically in conjunction with a chosen action and under the assumption that the agent will act optimally in the future.
So it is conditioned on state and action, not just on state.
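Written out in standard RL notation (this is the textbook definition, not anything specific to whatever OpenAI's Q* is), that is the Bellman optimality equation for the action-value function:

```latex
% Bellman optimality equation for the optimal action-value function.
Q^{*}(s, a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right]
```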
It's probably something to do with pathfinding with constraints in high-dimensional search spaces, I'd guess. Like Quake bots used to do in the late 90s.
Funny that the author cherry-picks the Rubik's cube "breakthrough" but not AlphaGo or the jump from GPT-2 to ChatGPT. I don't think anyone could've predicted the massive jump from pre-instruct GPTs; heck, even going from 3.5 to 4 is a clear step change.
If Altman was referring to Q* in his "pushing the veil of ignorance" comment then Q* is in OAI's top four breakthroughs
I think he’s using that to show that there is a precedent with this company to make misleading announcements that at first sound exciting but under the surface turn out to be overstated.
It does seem a bit odd to make a grandiose, vague announcement about a breakthrough outside of a major Jobs-esque unveiling. If you had some amazing new capability, what’s the motivation to hint at it publicly versus unveil it with a major announcement/demo? It makes me wonder if it’s an attempt to keep the public in awe with a dangling carrot. I’m also skeptical when companies’ and founders’ reputations seem to greatly outpace actual product deliveries.
AlphaGo and ChatGPT were groundbreaking and are more recent; if anything, those should set the precedent. He is deliberately ignoring them and dug out the most underwhelming one. Q* was a leak that came out while the company was without a CEO; their DevDay was two weeks ago.
That said, based on many comments here, the author is a skeptic to the point that people don't take him seriously.
I still think the "AI effect" is bullshit. A* was/is AI under a different meaning - it's what understood as "AI" in computer games, where the point is for the computer opponent to pretend to be smarter than a brick, and literally any trick, cheat or heuristic used to do that is part of "AI". A*? Obviously AI. Random walk? Yes. Fuzzy logic? Yes. State machines? Yes.
By way of [1], here [2] is a paper about A* also posted under CS/AI. It introduces "Q* search, a search algorithm that uses deep Q-networks to guide search" and uses it "to solve the Rubik's cube", the thing which Gary Marcus complained about OpenAI not having done with a neural network in 2019.
As pointed out by techbro92 [3], the authors do not seem to be affiliated with OpenAI, but AFAIK nobody claimed that OpenAI invented Q*. It's not hard to imagine OpenAI trying it out with their Rubik's-cube-solving robot hand and then starting to think about other things it might be applied to.
[1] https://news.ycombinator.com/item?id=38395959
[2] https://arxiv.org/abs/2102.04518
[3] https://news.ycombinator.com/item?id=38398243
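Very roughly, the idea in that paper is A*-style best-first search where a learned Q-network scores all of a node's actions in one forward pass, instead of evaluating every child state separately. Here is a loose sketch of that pattern; the `q_net` interface and the exact priority formula are my guesses for illustration, not the paper's actual code.

```python
import heapq
import itertools

def q_star_like_search(start, is_goal, actions, apply_action, q_net):
    """Best-first search guided by a learned Q-network (illustrative sketch).

    q_net(state) is assumed to return {action: estimated cost-to-go after
    taking that action} in a single call; that interface is a guess made up
    for this example.
    """
    tie = itertools.count()                        # tie-breaker so heapq never compares states
    frontier = [(0.0, next(tie), 0.0, start, [])]  # (priority, tie, path_cost, state, path)
    seen = set()
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in seen:
            continue
        seen.add(state)
        q_values = q_net(state)                    # one call scores every action of this node
        for a in actions(state):
            child = apply_action(state, a)
            priority = (g + 1) + q_values[a]       # cost so far + learned cost-to-go estimate
            heapq.heappush(frontier, (priority, next(tie), g + 1, child, path + [a]))
    return None
```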
There's no magic in 'AI', all of it can be described as something else. The whole of ML is 'just statistics', especially classification.
(That said I was and I suppose am a bit of a sceptic because of that. But mainly I think because to me it's more useful to call a spade a spade and say we're doing some advanced blah stats to blah. Helps explain, onboard, and motivate the next generation (who likes stats at school? But if that's how AI works, oh suddenly it's cool). Demystify it. If you're just searching over Levenshtein distance [to a technical audience] say that, don't start waffling about 'AI'.)
I first learned it from the famous book "Artificial Intelligence: A Modern Approach" by Russell and Norvig. I think 3 chapters are dedicated to search. A lot of AI can be framed as search, and a whole lot of historical AI, or mundane present AI, is straightforward search.
AI is the cutting edge of CS. When it's well understood it stops being AI. Some day LLM and neural nets will get the same reaction... who calls LLM "AI"? It's just a formula!
Is it really though? I feel like it's mostly non-technical people who think this, because back when I was a CS student, AI was just a big field of study that included many types of problem solving, from simple to complex, including all types of search such as A*.
Probably because of A*'s popularity for pathfinding in game development. Everything that imitates human-like behaviour in games is called "AI", even when most of it is just a big messy hairball of ad hoc if-else code under the hood ;)
Wait, does that mean Q* is just tree-of-thought reasoning? Like are we at the point where OpenAI is so closed that they NIH everyone else's ideas with different names?
The Q function is a fundamental part of reinforcement learning. It's a function that gives the expected reward for an agent given a specific action and state. Deep Q-learning, that is, using a neural network to estimate the Q function, has been around for a while now.
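A minimal sketch of that idea, with a toy network and made-up dimensions purely for illustration: the network maps a state vector to one estimated Q-value per action, and a single update regresses its prediction toward the usual TD target r + gamma * max_a' Q(s', a').

```python
import torch
import torch.nn as nn

# Toy deep Q-network: a 4-dimensional state in, one Q-value per action out.
# The dimensions and this single update step are illustrative only.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_update(state, action, reward, next_state, done):
    with torch.no_grad():
        best_next = 0.0 if done else q_net(next_state).max().item()
        target = reward + gamma * best_next       # TD target: r + gamma * max_a' Q(s', a')
    prediction = q_net(state)[action]             # the network's current estimate of Q(s, a)
    loss = (prediction - target) ** 2             # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example call with a made-up transition.
dqn_update(torch.randn(4), action=1, reward=1.0, next_state=torch.randn(4), done=False)
```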
I think Theta* is the ticket: https://en.wikipedia.org/wiki/Theta*
HN’s software doesn’t recognise an asterisk in a URL as part of the URL
The "AI effect"[1] describes how the field tends to get left with only the cutting-edge research.
[0]: http://ai.stanford.edu/~nilsson/OnlinePubs-Nils/PublishedPap...
[1]: https://en.wikipedia.org/wiki/AI_effect