highfrequency · 17 days ago
It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest. It's interesting to note that at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different companies clusters closer together. Right now GPT-5, Claude Opus, Grok 4, and Gemini 2.5 Pro all seem quite good across the board (i.e. they can all basically solve moderately challenging math and coding problems).

As a user, it feels like the race has never been as close as it is now. Perhaps dumb to extrapolate, but it makes me lean more skeptical about the hard take-off / winner-take-all mental model that has been pushed.

Would be curious to hear the take of a researcher at one of these firms - do you expect the AI offerings across competitors to become more competitive and clustered over the next few years, or less so?

jablongo · 16 days ago
It's also worth considering that past some threshold, it may be very difficult for us as users to discern which model is better. I don't think that's what's going on here, but we should be ready for it. For example, if you are an Elo 1000 chess player, would you yourself be able to tell if Magnus Carlsen or another grandmaster were better by playing them individually? To the extent that our AGI/SI metrics are based on human judgement, the clustering effect they create may be an illusion.
Wowfunhappy · 16 days ago
> For example, if you are an Elo 1000 chess player, would you yourself be able to tell if Magnus Carlsen or another grandmaster were better by playing them individually?

No, but I wouldn't be able to tell you what the player did wrong in general.

By contrast, the shortcomings of today's LLMs seem pretty obvious to me.

ohelno · 16 days ago
> it may be very difficult for us as users to discern which model is better

But one thing will stay consistent with LLMs for some time to come: they are programmed to produce output that looks acceptable, but they all unintentionally tend toward deception. You can iterate on that over and over, but there will always be some point where it will fail, and the weight of that failure will only increase as it deceives better.

Some things that seemed safe enough: Hindenburg, Titanic, Deepwater Horizon, Chernobyl, Challenger, Fukushima, Boeing 737 MAX.

torginus · 16 days ago
which is a thing with humans as well - I had a colleague with certified 150+ IQ, and other than moments of scary smart insight, he was not a superman or anything, he was surprisingly ordinary. Not to bring him down, he was a great guy, but I'd argue many of his good qualities had nothing to do with how smart he was.
blackkettle · 16 days ago
It's even more difficult because, while all the benchmarks provide some kind of 'averaged' performance metric for comparison, in my experience most users have pretty specific regular use cases, and pretty specific personal background knowledge. For instance I have a background in ML, 15 years experience in full stack programming, and primarily use LLMs for generating interface prototypes for new product concepts. We use a lot of react and chakraui for that, and I consistently get the best results out of Gemini pro for that. I tried all the available options and settled on that as the best for me and my use case. It's not the best for marketing boilerplate, or probably a million other use cases, but for me, in this particular niche it's clearly the best. Beyond that the benchmarks are irrelevant.
DoctorOetker · 16 days ago
We could first run some tests to find out whether reliable comparative performance tests can be constructed at all:

One can intentionally use a recent model and a much older one to figure out whether the tests are reliable, and in which domains they are reliable.

One can compute a model's joint probability for a sequence and compare how likely each model finds the same sequence.
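
A minimal sketch of what that scoring could look like, assuming the Hugging Face transformers library and two small public checkpoints purely for illustration (comparing across models with different tokenizers isn't quite apples-to-apples, so a per-token normalization would be fairer):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def joint_log_prob(model_name: str, text: str) -> float:
        # Sum of log p(t_i | t_1 .. t_{i-1}) over the sequence, under this model.
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits                       # [1, seq, vocab]
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        token_lp = log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
        return token_lp.sum().item()

    text = "The models keep clustering closer together."
    print(joint_log_prob("gpt2", text))
    print(joint_log_prob("distilgpt2", text))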

We could ask both to start talking about a subject, alternating so that each emits a token in turn. Then look at how the dumber and the smarter model judge the resulting text: does the smart one tend to pull the quality up, or does it get dragged down toward the dumber participant?

Given enough such tests to "identify the dummy vs. the smart one", we could verify them on pairs where there is common agreement (an extreme case: word2vec vs. a transformer) to assess the quality of each test, regardless of domain.

On the assumption that such or similar tests allow us to identify the smarter model, i.e. assuming we find plenty of such tests, we could demand that model makers publish open weights so that we can publicly verify the performance comparisons.

Another idea is self-consistency tests. A single forward inference over a context of, say, 2048 tokens is effectively predicting the conditional 2-gram, 3-gram, 4-gram, ... probabilities over the input tokens: each output token distribution is predicted from the preceding inputs. So with 2048 input tokens there are 2048 output distributions; the position 1 output is the predicted token (logit vector, really) estimated to follow the position 1 input, the position 2 output is the prediction following the first 2 inputs, and so on, until the last vector is the predicted next token following all 2048 input tokens: p(t_(i+1) | t_1 = a, t_2 = b, ..., t_i = z).

But that is just one way the next token can be predicted using the network. Another approach would be to use reverse-mode (RMAD) gradient descent while keeping the model weights fixed and treating only, say, the last 512 input vectors as variable: how well do the last 512 forward-prediction output vectors match the output vectors found by gradient descent on the best joint probability?

This could also be added as a loss term during training, as a form of regularization, which roughly turns the model into a kind of energy-based model.

spot5010 · 16 days ago
My guess is that more than the raw capabilities of a model, users would be drawn more to the model's personality. A "better" model would then be one that can closely adopt the nuances that a user likes. This is a largely uninformed guess, let's see if it holds up well with time.
tbrownaw · 16 days ago
> It's also worth considering that past some threshold, it may be very difficult for us as users to discern which model is better.

Even if they've saturated the distinguishable quality for tasks they can both do, I'd expect a gap in what tasks they're able to do.

tsunamifury · 16 days ago
This is the F1 vs. 911 problem. A 911 is just as fast as an F1 car to 60 mph (sometimes even faster), but an F1 car is better in the very high performance envelope: above 150 mph and in tight turns.

An average driver evaluating both would have a very hard time finding the F1's superior utility.

flir · 16 days ago
> For example, if you are an Elo 1000 chess player, would you yourself be able to tell if Magnus Carlsen or another grandmaster were better by playing them individually?

Yes, because I'd get them to play each other?

artursapek · 16 days ago
We’re judging them with benchmarks, not our own intuitions.
jv22222 · 16 days ago
I think Musk puts it well when he says the ultimate test is whether they can help improve the real world.
andrepd · 16 days ago
I could certainly tell if they played ??-level blunders, which LLMs do all the time.
unsupp0rted · 13 days ago
You don't have to be even good at chess to be able to tell when a game is won or lost, most of the time.

I don't need to understand how the AI made the app I asked for or cured my cancer, but it'll be pretty obvious when the app seems to work and the cancer seems to be gone.

I mean, I want to understand how, but I don't need to understand how, in order to benefit from it. Obviously understanding the details would help me evaluate the quality of the solution, but that's an afterthought.

snthpy · 16 days ago
That's a great point. Thanks.

somenameforme · 16 days ago
If AGI is ever achieved, it would open the door to recursive self improvement that would presumably rapidly exceed human capability across any and all fields, including AI development. So the AI would be improving itself while simultaneously also making revolutionary breakthroughs in essentially all fields. And, for at least a while, it would also presumably be doing so at an exponentially increasing rate.

But I think we're not even on the path to creating AGI. We're creating software that replicates and remixes human knowledge at a fixed point in time. And so it's a fixed target that you can't really exceed, which would itself already entail diminishing returns. Pair this with the fact that it's based on neural networks, which also invariably reach a point of sharply diminishing returns in essentially every field they're used in, and you have something that looks much closer to what we're doing right now, where all competitors eventually converge on something largely indistinguishable from each other in terms of ability.

stoneyhrm1 · 16 days ago
> revolutionary breakthroughs in essentially all fields

This doesn't really make sense outside computers. Since AI would be training itself, it needs to have the right answers, but as of now it doesn't really interact with the physical world. The most it could do is write code, and check things that have no room for interpretation, like speed, latency, percentage of errors, exceptions, etc.

But what other fields would it do this in? How can it make strides in biology? It can't dissect animals, and it can't figure out more about plants than what humans feed into the training data. Regarding math, math is human-defined. Humans said "addition does this", "this symbol means that", etc.

I just don't understand how AI could ever surpass anything humans have known before, when it lives by the rules we defined.

thinkingtoilet · 16 days ago
>And, for at least a while, it would also presumably be doing so at an exponentially increasing rate.

Why would you presume this? I think part of a lot of people's AI skepticism is talk like this. You have no idea. Full stop. Why wouldn't progress be linear? As new breakthroughs come, newer ones will be harder to come by. Perhaps it's exponential. Perhaps it's linear. No one knows.

jltsiren · 16 days ago
There is no particular reason to assume that recursive self-improvement would be rapid.

All the technological revolutions so far have accounted for little more than a 1.5% sustained annual productivity growth. There are always some low-hanging fruit with new technology, but once they have been picked, the effort required for each incremental improvement tends to grow exponentially.

That's my default scenario with AGI as well. After AGI arrives, it will leave humans behind very slowly.

dabockster · 16 days ago
> diminishing returns

I think this is a hard kick below the belt for anyone trying to develop AGI using current computer science.

Current AIs only really generate - no, regenerate - text based on their training data. They are only as smart as the data available to them. Even when an AI "thinks", it's really still processing existing data rather than reaching a genuinely new conclusion. It's the best text processor ever created - but it's still just a text processor at its core. And that won't change without more hard computer science being done by humans.

So yeah, I think we're starting to hit the upper limits of what we can do with Transformers technology. I'd be very surprised if someone achieved "AGI" with current tech. And, if it did get achieved, I wouldn't consider it "production ready" until it didn't need a nuclear reactor to power it.

esafak · 16 days ago
> If AGI is ever achieved, it would open the door to recursive self improvement ...

They are unrelated. All you need is a way for continual improvement without plateauing, and this can start at any level of intelligence. As it did for us; humans were once less intelligent.

Using the flagship to bootstrap the next iteration with synthetic data is standard practice now. This was mentioned in the GPT5 presentation. At the rate things are going I think this will get us to ASI, and it's not going to feel epochal for people who have interacted with existing models, but more of the same. After all, the existing models are already smarter than most humans and most people are taking it in their stride.

The next revolution is going to be embodiment. I hope we have the commonsense to stop there, before instilling agency.

moron4hire · 16 days ago
That's only assuming there are no fundamental limits or major barriers to computation. Back a hundred years ago at the dawn of flight, one could have said a very similar thing about aircraft performance. And for a time in the 1950s, it looked like aircraft speed was growing exponentially over time. But there haven't been any new airspeed records (at least, officially recorded) since 1986, because it turns out going Mach 3+ is fairly dangerous and approaching some rather severe materials and propulsion limitations, making it not at all economical.

I would also not be surprised if, in the process of developing something comparable to human intelligence (assuming the extreme computation, energy, and materials issues of packing that much computation and energy into a single system could be overcome), the AI also developed something comparable to human desire and/or mental health issues. There is a non-zero chance we could end up with AI that doesn't want to do what we ask it to do, or doesn't work all the time because it wants to do other things.

You can't just assume exponential growth is a foregone conclusion.

solumunus · 16 days ago
For some reason people presuppose superintelligence into AGI. What if AGI had diminishing returns around human-level intelligence? It would still have to deal with all the same knowledge gaps we have.
aldousd666 · 16 days ago
Those problems aren't just waiting on smarts/intelligence. Those would require experimentation in the real world. You can't solve chemistry by just thinking about it really hard. You still have to do experiments. A super intelligent machine may be better at coming up with experiments to do than we are, but without the right stuff to do them, it can't 'solve' anything of the like.
andsoitis · 16 days ago
> So the AI would be improving itself

Why would the AI want to improve itself? From whence would that self-motivation stem?

throwawayb2025 · 16 days ago
Recursive improvement without any physical change may be limited. If a physical change, like more GPUs or a different network configuration, is required to experiment, and then another change to learn from it, that might not be easy. Convincing humans to do it on the AGI's behalf may not be that simple. There might be multiple paths to try, and teams may not agree with each other, especially if the cost of a trial is high.
ieee2 · 16 days ago
AI can be trained on some special knowledge of person A and other special knowledge of person B. These two persons may never have met before, and therefore they cannot combine their knowledge to get some new knowledge or insight.

AI can do it fine as it knows A and B. And that is knowledge creation.

SkyMarshal · 15 days ago
> But I think we're not even on the path to creating AGI.

It seems like the LLM will be a component of an eventual AGI - its voice, so to speak - but not its mind. The mind still requires another innovation or breakthrough we haven't seen yet.

FuriouslyAdrift · 16 days ago
Math... lots and lots of math solutions. Like if it could figure out the numerical sign problem, it could quite possibly be able to simulate all of physics.
p0nce · 16 days ago
Well it could also self-improve increasingly slowly.
mycall · 16 days ago
You are missing the point where synthetic data, deterministic tooling (written by AI) and new discoveries by each model generation feeds into the next model. This iteration is the key to going beyond human intelligence.
beeflet · 16 days ago
Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.

I am not an AI researcher, but I have friends who do work in the field, and they are not worried about LLM-based AGI because of the diminishing returns on results vs amount of training data required. Maybe this is the bottleneck.

Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better. Whereas LLMs tend to regurgitate solutions to solved problems, where the solutions tend to be well-published in training data.

That being said, AGI is not a necessary requirement for AI to be totally world-changing. There are possibly applications of existing AI/ML/SL technology which could be more impactful than general intelligence. Search is one example where the ability to regurgitate knowledge from many domains is desirable.

JohnBooty · 16 days ago

    That being said, AGI is not a necessary requirement for AI to be totally world-changing
Yeah. I don't think I actually want AGI? Even setting aside the moral/philosophical/etc "big picture" issues I don't think I even want that from a purely practical standpoint.

I think I want various forms of AI that are more focused on specific domains. I want AI tools, not companions or peers or (gulp) masters.

(Then again, people thought they wanted faster horses before they rolled out the Model T)

novok · 16 days ago
They are moving beyond just big-transformer-blob text prediction. Mixture of Experts, for example, is not preassembled: it starts as x empty experts with an empty router, and the experts and the routing emerge naturally with training, mirroring the modular architecture we see in the brain. There is also work like the "Integrated Gated Calculator (IGC)" from Jan 2025, which builds a premade calculator network and integrates it directly into the larger neural network, getting around the entire issue of making LLMs do basic number computation and the clunkiness of generating "run tool" tokens. The model naturally learns to use the IGC built into itself because it very quickly beats any kind of memorized computation in the reward function.

Models are truly input-multimodal now. An image, audio, and text each go into separate input nodes, but they all feed into the same inner layers and output text. This also mirrors how brains work: multiple parts integrated into one whole.

Humans in some sense are not empty brains, there is a lot of stuff baked in our DNA and as the brain grows it develops a baked in development program. This is why we need fewer examples and generalize way better.

gunnaraasen · 16 days ago
Seems like the real innovation of LLM-based AI models is the creation of a new human-computer interface.

Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.

In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts for an AI interface. Keyboards will become a power-user interface and only used for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.

robotnikman · 16 days ago
There is also the fact that AI lacks long-term memory like humans have. If you consider context length to be long-term memory, it's incredibly short compared to that of a human. Maybe if it reaches into the billions or trillions of tokens we might have something comparable, or someone comes up with a new solution of some kind.
mikepurvis · 16 days ago
"LLMs tend to regurgitate solutions to solved problems"

People say this, but honestly, it's not really my experience— I've given ChatGPT (and Copilot) genuinely novel coding challenges and they do a very decent job at synthesizing a new thought based on relating it to disparate source examples. Really not that dissimilar to how a human thinks about these things.

rstuart4133 · 16 days ago
> That being said, AGI is not a necessary requirement for AI to be totally world-changing.

Depends on how you define "world changing" I guess, but this world already looks different to the pre-LLM world to me.

Asking LLMs things instead of consulting the output from other humans now takes up a significant fraction of my day. I don't google nearly as often, and I don't trust any image or video I see, as swathes of the creative professions have been replaced by output from LLMs.

It's funny - that final thing is the last thing I would have predicted. I always believed the one thing a machine could not match was human creativity, because the output of machines was always precise, repetitive and reliable. Then LLMs come along, randomly generating every token. Their primary weakness is that they are neither precise nor reliable, but they can turn out an unending stream of unique output.

gaptoothclan · 16 days ago
I remember reading that LLMs have consumed the internet's text data; I seem to remember there is an open dataset for that too. Other potential sources of data would be images (probably already consumed) and video - YouTube must have such a large set of data to consume - and perhaps Facebook or Instagram private content.

But even with these it does not feel like AGI. It sounds like the "fusion reactors are 20 years away" argument, except this is supposedly coming in 2 years, and they have not even got the core technology of how to build AGI.

topspin · 16 days ago
> Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.

I think you're on to it. Performance is clustering because a plateau is emerging. Hyper-dimensional search engines are running out of steam, and now we're optimizing.

anon7000 · 16 days ago
True. At a minimum, as long as LLMs don't include some kind of more strict representation of the world, they will fail in a lot of tasks. Hallucinations -- responding with a prediction that doesn't make any sense in the context of the response -- are still a big problem. Because LLMs never really develop rules about the world.

For example, while you can get it to predict good chess moves if you train it on enough chess games, it can't really constrain itself to the rules of chess. (https://garymarcus.substack.com/p/generative-ais-crippling-a...)

onlyrealcuzzo · 16 days ago
> Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better.

Aren't we the summation of intelligence from quintillions of beings over hundreds of millions of years?

Have LLMs really had more data?

t0lo · 16 days ago
To be smarter than human intelligence you need smarter than human training data. Humans already innately know right and wrong a lot of the time so that doesn't leave much room.
wyager · 16 days ago
> a stochastic model for predicting text

It's fascinating to me that so many people seem totally unable to separate the training environment from the final product

FollowingTheDao · 16 days ago
The bottleneck has nothing to do with money; it's the fact that they're using the empty-neuron theory to try to mimic human consciousness, and that's not how it works. Just look up microtubules and consciousness, and you'll get a better idea of what I'm talking about.

These AI computers aren’t thinking, they are just repeating.

Mistletoe · 16 days ago
What are the AI/ML/SL applications that could be more impactful than artificial general intelligence?
timeon · 16 days ago
> Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better.

That is because with LLMs there is no intelligence. It is Artificial Knowledge - AK, not AI. Real AI would be AGI. Not that it matters for the use cases we have, but marketing needs 'AI' because that is what we were expecting for decades. So yeah, I also do not think we will have AGI from LLMs - nor does it matter for what we are using them for.

justcallmejm · 16 days ago
It is definitively not possible. But the frontier models are no longer “just” LLMs, either. They are neurosymbolic systems (an LLM using tools); they just don’t say it transparently because it’s not a convenient narrative that intelligence comes from something outside the model, rather than from endless scaling.

At Aloe, we are model-agnostic and outperforming frontier models. It's the architecture around the LLM that makes the difference. For instance, our system using Gemini can do things that Gemini can't do on its own. All an LLM will ever do is hallucinate. If you want something with human-like general intelligence, keep looking beyond LLMs.

GolDDranks · 16 days ago
I think it's very fortunate, because I used to be an AI doomer. I still kinda am, but at least I'm now about 70% convinced that the current technological paradigm is not going to lead us to a short-term AI apocalypse.

The fortunate thing is that we managed to invent an AI that is good at _copying us_ instead of being a truly maverick agent, which kinda limits it to the "average human" output.

However, I still think that all the doomer arguments are valid, in principle. We very well may be doomed in our lifetimes, so we should take the threat very seriously.

margalabargala · 16 days ago
It won't lead us to an apocalypse apocalypse, but it may well lead us to an economic crisis.
baxtr · 16 days ago
The AI dooming was never a thing for me. And I still don’t get it.

I don’t see anything that would even point into that direction.

Curious to understand where these thoughts are coming from

hattmall · 16 days ago
I don't understand the doomer mindset. Like what is it that you think AI is going to do or be capable of doing that's so bad?
KoolKat23 · 16 days ago
The only thing holding it back is lack of compute, and a lack of live world interface.
makin · 16 days ago
Companies are collections of people, and these companies keep losing key developers to the others, I think this is why the clusters happen. OpenAI is now resorting to giving million dollar bonuses to every employee just to try to keep them long term.
caconym_ · 16 days ago
If there was any indication of a hard takeoff being even slightly imminent, I really don't think key employees of the company where that was happening would be jumping ship. The amounts of money flying around are direct evidence of how desperate everybody involved is to be in the right place when (so they imagine) that takeoff happens.
kevinventullo · 16 days ago
Key developers being the leading term doesn’t exactly help the AGI narrative either.
procaryote · 16 days ago
So they're struggling to solve the alignment problem even for their employees?
indigodaddy · 16 days ago
Even to just a random sysops person?
tsunamifury · 16 days ago
No, the core technology is reaching its limit already, and now it needs to proliferate into features and applications to sell.

This isn’t rocket science.

bloqs · 16 days ago
That kid at Meta negotiated $250m.
EthanHeilman · 16 days ago
> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

This seems to be a result of using overly simplistic models of progress. A company makes a breakthrough, the next breakthrough requires exploring many more paths. It is much easier to catch up than find a breakthrough. Even if you get lucky and find the next breakthrough before everyone catches up, they will probably catch up before you find the breakthrough after that. You only have someone run away if each time you make a breakthrough, it is easier to make the next breakthrough than to catch up.

Consider the following game:

1. N parties take turns rolling a D20. If anyone rolls 20, they get 1 point.

2. If any party is 1 or more points behind, they only need to roll a 19 or higher to get one point. That is, being behind gives you a slight advantage in catching up.

While points accumulate, most of the players end up with the same score.

I ran a simulation of this game for 10,000 turns with 5 players:

Game 1: [852, 851, 851, 851, 851]

Game 2: [827, 825, 827, 826, 826]

Game 3: [827, 822, 827, 827, 826]

Game 4: [864, 863, 860, 863, 863]

Game 5: [831, 828, 836, 833, 834]
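
For reference, a minimal sketch of that simulation (one assumption: "behind" means behind the current leader):

    import random

    def play(n_players=5, turns=10_000, seed=None):
        rng = random.Random(seed)
        scores = [0] * n_players
        for _ in range(turns):
            for i in range(n_players):
                behind = scores[i] < max(scores)   # 1+ points behind the leader
                threshold = 19 if behind else 20   # being behind makes scoring easier
                if rng.randint(1, 20) >= threshold:
                    scores[i] += 1
        return scores

    for g in range(1, 6):
        print(f"Game {g}:", play())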

alexey-salmin · 16 days ago
Supposedly the idea was, once you get closer to AGI it starts to explore these breakthrough paths for you providing a positive feedback loop. Hence the expected exponential explosion in power.

But yes, so far it feels like we are in the latter stages of the innovation S-curve for transformer-based architectures. The exponent may be out there but it probably requires jumping onto a new S-curve.

Sankozi · 16 days ago
You are forgetting that we are talking about AI. That AI will be used to speed up progress on making next, better AI that will be used to speed up progress on making next, better AI that ...
nerdix · 16 days ago
Not only do I think there will not be a winner take all, I think it's very likely that the entire thing will be commoditized.

I think it's likely that we will eventually hit a point of diminishing returns where the performance is good enough and marginal performance improvements aren't worth the high cost.

And over time, many models will reach "good enough" levels of performance, including models that are open weight. And given even more time, these open-weight models will be runnable on consumer-level hardware. Eventually, they'll be runnable on super cheap consumer hardware (something more akin to an NPU than a $2000 RTX 5090). So your laptop in 2035, with specialized AI cores and 1TB of LPDDR10 RAM, is running GPT-7-level models without breaking a sweat. Maybe GPT-10 can solve some obscure math problem that your model can't, but does it even matter? Would you pay for GPT-10 when running a GPT-7-level model does everything you need and is practically free?

The cloud providers will make money because there will still be a need for companies to host the models in a secure and reliable way. But a company whose main business strategy is developing the model? I'm not sure they will last without finding another way to add value.

joelthelion · 16 days ago
> Not only do I think there will not be a winner take all, I think it's very likely that the entire thing will be commoditized

This raises the question: why then do AI companies have these insane valuations? Do investors know something that we don't?

hnlmorg · 16 days ago
The reason AGI would create a singularity is because of its ability to self learn.

Presently we are still a long way from that. In my opinion we are at least as far away from AGI as 1970s mainframes were from LLMs.

I really don’t expect to see AGI in my lifetime.

adastra22 · 16 days ago
That is already happening. These labs are writing next gen models using next gen models, with greater levels of autonomy. That doesn’t get the hard takeoff people talk about because those hypotheticals don’t consider sources of error, noise, and drift.
mmcconnell1618 · 16 days ago
Self-learning opens new training opportunities but not at the scale or speed of current training. The world only operates at 1x speed. Today's models have been trained on written and visual content created by billions of humans over thousands of years.

You can only experience the world in one place in real time. Even if you networked a bunch of "experiencers" together to gather real time data from many places at the same time, you would need a way to learn and train on that data in real time that could incorporate all the simultaneous inputs. I don't see that capability happening anytime soon.

russellbeattie · 16 days ago
This is the key - right now each new model has had countless resources dedicated to training, then they are more or less set in stone until the next update.

These big models don't dynamically update as days pass by - they don't learn. A personal assistant service may be able to mimic learning by creating a database of your data or preferences, but your usage isn't baked back into the big underlying model permanently.

I don't agree with "in our lifetimes", but the difference between training and learning is the bright red line. Until there's a model which is able to continually update itself, it's not AGI.

My guess is that this will require both more powerful hardware and a few more software innovations. But it'll happen.

hathawsh · 16 days ago
There are areas where we seem to be much closer to AGI than most people realize. AGI for software development, in particular, seems incredibly close. For example, Claude Code has bewildering capabilities that feel like magic. Mix it with a team of other capable development-oriented AIs and you might be able to build AI software that builds better AI software, all by itself.
layer8 · 16 days ago
The ability to self-learn is necessary, but not necessarily sufficient. We don’t have much of an understanding of the intelligence landscape beyond human-level intelligence, or even besides it. There may be other constraints and showstoppers, for example related to computability.
Muromec · 16 days ago
We have the ability to self-learn right now, but we still suck at the basics.
runarberg · 16 days ago
I feel like the technological singularity has been pretty solidly shown to be junk science, like cold fusion, Malthusian collapse, or Lynn's IQ regression. Technologists have made numerous predictions and hypothetical scenarios, none of which have come to fruition, nor does that seem likely at any time in the future.

I think we should be treating AGI like Cold Fusion, phrenology, or even alchemy. It is not science, but science fiction. It is not going to happen and no research into AGI will provide anything of value (except for the grifters pushing the pseudo-science).

Davidzheng · 16 days ago
should be next year in math domain tbh
tedggh · 16 days ago
In my experience and use case Grok is pretty much unusable when working with medium size codebases and systems design. ChatGPT has issues too but at least I have figured out a way around most of them, like asking for a progress and todo summary and uploading a zip file of my codebase to a new chat window say every 100 interactions, because speed degrades and hallucinations increase. Super Grok seems extremely bad at keeping context during very short interactions within a project even when providing it with a strong foundation via instructions. For example if the code name for a system or feature is called Jupiter, Grok will many times start talking about Jupiter the planet.
weego · 16 days ago
I'm still stuck at the bit where just throwing more and more data to make a very complex encyclopedia with an interesting search interface that tricks us into believing it's human-like gets us to AGI when we have no examples and thus no evidence or understanding of where the GI part comes from.

It's all just hyperbole to attract investment and shareholder value and the people peddling the idea of AGI as a tangible possibility are charlatans whose goals are not aligned with whatever people are convincing themselves are the goals.

The fact that so many engineers have fallen for it so completely is stunning to me and speaks volumes about the underlying health of our industry.

keernan · 16 days ago
I believe the analogy of a LLM being "a very complex encyclopedia with an interesting search interface" to be spot on.

However, I would not be so dismissive of the value. Many of us are reacting to the complete oversell of 'the encyclopedia' as being 'the eve of AGI' - as rightfully we should. But, in doing so, I believe it would be a mistake to overlook the incredible impact - and economic displacement - of having an encyclopedia comprised of all the knowledge of mankind that has "an interesting search interface" that is capable of enabling humans to use the interface to manipulate/detect connections between all that data.

habinero · 16 days ago
Me too. Some of them are frauds, but most of the weird AI-as-messiah people really believe it as far as I can tell.

The tech is neat and it can do some neat things but...it's a bullshit machine fueled by a bullshit machine hype bubble. I do not get it.

Sharlin · 16 days ago
> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

Yes. And the fact they're instead clustering simply indicates that they're nowhere near AGI and are hitting diminishing returns, as they've been doing for a long time already. This should be obvious to everyone. I'm fairly sure that none of these companies has been able to use their models as a force multiplier in state-of-the-art AI research. At least not beyond a 1+ε factor. Fuck, they're just barely a force multiplier in mundane coding tasks.

ricardobayes · 16 days ago
AGI in 5/10 years is similar to "we won't have steering wheels in cars" or "we'll be asleep driving" in 5/10 years. Remember that? What happened to that? It looked so promising.
ralfd · 16 days ago
> "we'll be asleep driving" in 5/10 years. Remember that? What happened to that?

https://www.youtube.com/shorts/dLCEUSXVKAA

RealityVoid · 16 days ago
I mean, in certain US cities you can take a Waymo right now. It seems that adage where we overestimate change in the short term and underestimate change in the long term fits right in here.
baxtr · 16 days ago
I have been saying this before: S-curves look a lot like exponential curves in the beginning.

Thus, it’s easy to mistake one for the other - at least initially.
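
A quick numeric illustration with made-up constants: early on, a logistic curve and the exponential with the same growth rate are nearly indistinguishable.

    import math

    k, L, t0 = 1.0, 100.0, 10.0                       # toy rate, ceiling, midpoint
    for t in range(6):
        logistic = L / (1 + math.exp(-k * (t - t0)))  # the S-curve
        expo = L * math.exp(k * (t - t0))             # its early-time approximation
        print(t, round(logistic, 4), round(expo, 4))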

jjk166 · 16 days ago
Looks like a lot of players getting closer and closer to an asymptotic limit. Initially, small changes lead to big improvements, causing a firm to race ahead; as they go forward, performance gains from innovation become both more marginal and harder to find, let alone keep. I would expect them all to eventually reach the same point, where they are squeezing the most possible out of an AI under the current paradigm, barring a paradigm-shifting discovery before that asymptote is reached.
physix · 16 days ago
For those who happen to have a subscription to The Economist, there is a very interesting Money Talks podcast where they interview Anthropic's boss Dario Amodei[1].

There were two interesting takeaways about AGI:

1. Dario makes the remark that the term AGI/ASI is very misleading and dangerous. These terms are ill defined and it's more useful to understand that the capabilities are simply growing exponentially at the moment. If you extrapolate that, he thinks it may just "eat the majority of the economy". I don't know if this is self-serving hype, and it's not clear where we will end up with all this, but it will be disruptive, no matter what.

2. The Economist moderators however note towards the end that this industry may well tend toward commoditization. At the moment these companies produce models that people want but others can't make. But as chip making starts to hit its limits and the information space becomes completely harvested, capability growth might taper off and others will catch up, the quasi-monopoly profit potential melting away.

Putting that together, I think that although the cognitive capabilities will most likely continue to accelerate, albeit not necessarily along the lines of AGI, the economics of all this will probably not lead to a winner takes all.

[1] https://www.economist.com/podcasts/2025/07/31/artificial-int...

didibus · 16 days ago
There's already so many comparable models, and even local models are starting to approach the performance of the bigger server models.

I also feel like it's stopped being exponential already. I mean, in the last few releases we've only seen marginal improvements. Even this release feels marginal; I'd say it feels more like a linear improvement.

That said, we could see a winner take all due to the high cost of copying. I do think we're already approaching something where it's mostly about price and who released their model last. But the cost to train is huge, and at some point it won't make sense, and maybe we'll be left with 2 big players.

nopinsight · 16 days ago
1. FWIW, I watched clips from several of Dario’s interviews. His expressions and body language convey sincere concerns.

2. Commoditization can be averted with access to proprietary data. This is why all of ChatGPT, Claude, and Gemini push for agents and permissions to access your private data sources now. They will not need to train on your data directly. Just adapting the models to work better with real-world, proprietary data will yield a powerful advantage over time.

Also, the current training paradigm utilizes RL much more extensively than in previous years and can help models to specialize in chosen domains.

SecretDreams · 16 days ago
It's insane to me that anyone doesn't think the end game of this is commoditization.
j_timberlake · 16 days ago
I think you're reading way too much into OpenAI bungling its 15-month product lead, but also the whole "1 AGI company will take off" prediction is bad anyway, because it assumes governments would just let that happen. Which they wouldn't, unless the company is really really sneaky or superintelligence happens in the blink of an eye.
torginus · 16 days ago
I think OpenAI has committed hard to the 'product company' path, and will have a tough time going back to interesting science experiments that may or may not work, but are necessary for progress.
jacquesm · 16 days ago
Governments react at a glacial pace to new technological developments. They wouldn't so much as 'let it happen' as that it had happened and they simply never noticed it until it was too late. If you are betting on the government having your back in this then I think you may end up disappointed.
knodi123 · 16 days ago
* or governments fail to look far enough ahead, due to a bunch of small-minded short-sighted greedy petty fools.

Seriously, our government just announced it's slashing half a billion dollars in vaccine research because "vaccines are deadly and ineffective", and it fired a chief statistician because the president didn't like the numbers he calculated, and it ordered the destruction of two expensive satellites because they can observe politically inconvenient climate change. THOSE are the people you are trusting to keep an eye on the pace of development inside of private, secretive AGI companies?

highfrequency · 16 days ago
> OpenAI bungling its 15-month product lead

Do you mean from ChatGPT launch or o1 launch? Curious to get your take on how they bungled the lead and what they could have done differently to preserve it. Not having thought about it too much, it seems that with the combo of 1) massive hype required for fundraising, and 2) the fact that their product can be basically reverse engineered by training a model on its curated output, it would have been near impossible to maintain a large lead.

TheoGone · 16 days ago
LLMs are good at mimicking human intuition. They still suck at deep thinking.

LLMs pattern-match well. They are good at "fast" System 1 thinking, instantly generating intuitive, fluent responses.

LLMs are good at mimicking logic, not real reasoning. They simulate "slow", deliberate System 2 thinking when prompted to work step by step.

The core of an LLM is not understanding but just predicting the next most likely word in a sequence.

LLMs are good at both associative brainstorming (System 1) and creating works within a defined structure, like a poem (System 2).

Reasoning is the Achilles heel right now. An LLM's logic can seem plausible, but it's based on correlation, not deductive reasoning.

Davidzheng · 16 days ago
Correlation between text can implement any algorithm; it is just the architecture it's built on. It's like saying vacuum-tube computers can't reason because it's just vacuum tubes, not reasoning. What the architecture is doesn't matter. It's capable of expressing reasoning, as it is capable of expressing any program. In fact you can think of a Turing machine, and also any Markov chain, as a correlation function between two states which have a joint distribution exactly at the places where the second state is the next state of the first.
noduerme · 16 days ago
Here's a pessimistic view: A hard take-off at this point might be entirely possible, but it would be like a small country with nuclear weapons launching an attack on a much more developed country without them. E.g. North Korea attacking South Korea. In such a situation an aggressor would wait to reveal anything until they had the power to obliterate everything ten times over.

If I were working in a job right now where I could see and guide and retrain these models daily, and realized I had a weapon of mass destruction on my hands that could War Games the Pentagon, I'd probably walk my discoveries back too. Knowing that an unbounded number of parallel discoveries were taking place.

It won't take AGI to take down our fragile democratic civilization premised on an informed electorate making decisions in their own interests. A flood of regurgitated LLM garbage is sufficient for that. But a scorched earth attack by AGI? Whoever has that horse in their stable will absolutely keep it locked up until the moment it's released.

jacquesm · 16 days ago
Pessimistic is just another way to spell 'realistic' in this case. None of these actors are doing it for the 'good of the world' despite their aggressive claims to the contrary.
kristianc · 16 days ago
What I'm seeing is that as we get closer to supposed AGI, the models themselves are getting less and less general. They're in fact getting more specific and clustered around high-value use cases. It's kind of hard to see in this context what AGI is meant to mean.
lamontcg · 16 days ago
> they can all basically solve moderately challenging math and coding problems

Yesterday, Claude Opus 4.1 failed in trying to figure out that `-(1-alpha)` or `-1+alpha` is the same as `alpha-1`.

We are still a little bit away from AGI.

markasoftware · 16 days ago
This is what I don't get. How can GPT-5 ace obscure AIME problems while simultaneously falling into the trap of the most common fallacy about airfoils (despite there being copious training data calling it out as a fallacy)? And I believe you that in some context it failed to understand this simple rearrangement of terms; there's sometimes basic stuff I ask it that it fails at too.
shruggedatlas · 16 days ago
Is this a specific example from their demo? I just tried it and Opus 4.1 is able to solve it.
dom96 · 16 days ago
It doesn't take a researcher to realise that we have hit a wall and hit it more than a year ago now. The fact all these models are clustering around the same performance proves it.
m4x · 16 days ago
It's quite possible that the models from different companies are clustering together now because we're at a plateau point in model development, and we won't see much in terms of further advances until we make the next significant breakthrough.

I don't think this has anything to do with AGI. We aren't at AGI yet. We may be close or we may be a very long way away from AGI. Either way, current models are at a plateau and all the big players have more or less caught up with each other.

kmacdough · 16 days ago
What does AGI mean to you, specifically?

As is, AI is quite intelligent, in that it can process large quantities of diverse unstructured information and build meaningful insights. And that intelligence applies across an incredibly broad set of problems and contexts - enough that I have a hard time not calling it general. Sure, it has major flaws that are obvious to us, and it's much worse at many things we care about. But that doesn't make it not intelligent or general. If we want to set human intelligence as the baseline, we already have a word for that: superintelligence.

coderatlarge · 16 days ago
while the model companies all compete on the same benchmarks it seems likely their models will all converge towards similar outcomes unless something really unexpected happens in model space around those limit points…
kenny239 · 4 days ago
Not a researcher for long enough... but we are witnessing open-source efforts and Chinese models starting to fall one "level" behind the most advanced models, mainly due to a lack of compute, I think.

On the other hand, there are still some flaws in GPT-5. For example, when I use it for research it often needs multiple prompts to get at the topic I truly want, and sometimes it can feed me false information. So the reasoning part is not fully there yet?

ako · 16 days ago
I know there's an official AGI definition, but it seems to me that there's too much focus on the model as the thing where AGI needs to happen. That is just focusing on knowledge in the brain. No human knows everything; we as humans rely on ways to discover new knowledge: investigation, writing knowledge down so it can be shared, etc.

Current models, when they apply reasoning, have feedback loops that use tools for trial and error; they have a short-term memory (the context), or multiple short-term memories if you use agents, and a long-term memory (markdown, RAG). They can solve problems that aren't hardcoded in their brain/model, and they can store these solutions in their long-term memory for later use, or for sharing with other LLM-based systems.

AGI needs to come from a system that combines LLMs + tools + memory. I've had situations where it felt like I was working with an AGI. The LLMs seem advanced enough to be the kernel of an AGI system.
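
As a hand-wavy sketch of what "LLMs + tools + memory" means here (llm, tools, and memory are hypothetical placeholder objects, not any real API):

    def agent_loop(goal, llm, tools, memory, max_steps=50):
        # Short-term memory is the running context; long-term memory is whatever
        # persistent store is bolted on (markdown notes, a RAG index, ...).
        context = [f"Goal: {goal}", f"Notes: {memory.search(goal)}"]
        for _ in range(max_steps):
            action = llm.decide("\n".join(context), tools)   # reasoning step
            if action.name == "done":
                memory.store(goal, context)                  # keep the solution for reuse
                return action.result
            result = tools[action.name](**action.args)       # trial...
            context.append(f"{action.name} -> {result}")     # ...and error feeds back in
        return None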

The real challenge is how you are going to give these AGIs a mission/goal that they can pursue fairly independently without constant hand-holding. How does it know that it's doing the right thing? The focus currently is on writing better specifications, but humans aren't very good at creating specs for things that are uncertain. We also learn from trial and error, and this also influences specs.

porphyra · 16 days ago
It seems that the new tricks that people discover to slightly improve the model, be it a new reinforcement learning technique or whatever, get leaked/shared quickly to other companies and there really isn't a big moat. I would have thought that whoever is rich enough to afford tons of compute first would start pulling away from the rest but so far that doesn't seem to be the case --- even smaller players without as much compute are staying in the race.
atleastoptimal · 16 days ago
I think there are two competing factors. On one end, to get the same kind of "increase" in intelligence each generation requires an exponentially higher amount of compute, so while GPT-3 to GPT-4 was a sort of "pure" upgrade from just making it 10x bigger, gradually you lose the ability to get 10x the GPUs for a single model. The hill keeps getting steeper, so progress is slower without exponential increases (which is what is happening).

However, I do believe that once the genuine AGI threshold is reached it may cause a change in that rate. My justification is that while current models have gone from a slightly good copywriter in GPT-4 to very good copywriter in GPT-5, they've gone from sub-exceptional in ML research to sub-exceptional in ML research.

The frontier in AI is driven by the top 0.1% of AI researchers. Since improvement in these models is driven partially by the very peaks of intelligence, it won't be until models reach that level where we start to see a new paradigm. Until then it's just scale and throwing whatever works at the GPU and seeing what comes out smarter.

aydyn · 16 days ago
I think this is simply due to the fact that training an AGI-level AI currently requires almost grid-scale amounts of compute. So the current limitation is purely physical hardware. No matter how intelligent GPT-5 is, it can't conjure extra compute out of thin air.

I think you'll see the prophesied exponentiation once AI can start training itself at a reasonable scale. Right now it's not possible.

caycep · 16 days ago
I feel like the benchmark suites need to include algorithmic efficiency, i.e. can this thing solve your complex math or coding problem with 5,000 GPUs instead of 10,000? 500? Maybe just one Mac mini?
nomel · 16 days ago
Why? Cost is the only thing anyone will care about.
bmau5 · 16 days ago
The idea is that with AGI it will then be able to self improve orders of magnitude faster than it would if relying on humans for making the advances. It tracks that the improvements are all relatively similar at this point since they're all human-reliant.
Buttons840 · 16 days ago
The idea of the singularity--that AI will improve itself--assumes that intelligence is an important part of improving AI.

The AIs improve by gradient descent, still the same as ever. It's all basic math and a little calculus, and then making tiny tweaks to improve the model over and over and over.

There's not a lot of room for intelligence to improve upon this. Nobody sits down and thinks really hard, and the result of their intelligent thinking is a better model; no, the models improve because a computer continues doing basic loops over and over and over trillions of times.

That's my impression anyway. Would love to hear contrary views. In what ways can an AI actually improve itself?
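
For what it's worth, the loop in question really is about this simple. A toy version (fitting y = 2x + 1; obviously not any lab's actual training code):

    import torch

    w = torch.randn(2, requires_grad=True)            # [slope, intercept]
    xs = torch.linspace(-1, 1, 100)
    ys = 2 * xs + 1

    for step in range(1000):
        loss = ((w[0] * xs + w[1] - ys) ** 2).mean()  # how wrong are we?
        loss.backward()                               # a little calculus: the gradient
        with torch.no_grad():
            w -= 0.1 * w.grad                         # a tiny tweak, repeated
            w.grad.zero_()

    print(w.detach())                                 # heads toward [2.0, 1.0]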

mickael-kerjean · 16 days ago
I studied machine learning in 2012; gradient descent wasn't new back then either, but it was 5 years before the "Attention Is All You Need" paper. Progress might look continuous overall, but if you zoom in enough it is a bit more discrete, with breakthroughs needed to jump the discrete parts. The question to me now is "how many papers like Attention Is All You Need before a singularity?" I don't have that answer, but let's not forget: until they released ChatGPT, OpenAI was considered a joke by many people in the field, who asserted their approach was a dead end.
tejohnso · 16 days ago
I think the expectation is that it will be very close until one team reaches beyond the threshold. Then even if that team is only one month ahead, they will always be one month ahead in terms of time to catch up, but in terms of performance at a particular time their lead will continue to extend. So users will use the winner's tools, or use tools that are inferior by many orders of magnitude.

This assumes an infinite potential for improvement though. It's also possible that the winner maxes out after threshold day plus one week, and then everyone hits the same limit within a relatively short time.

petralithic · 16 days ago
It's the classic S-curve. A few years ago when we saw ChatGPT come out, we got started on the ramping up part of the curve but now we're on the slowing down part. That's just how technology goes in general.
jboggan · 16 days ago
We are not approaching the Singularity but an Asymptote
jama211 · 16 days ago
Well said. It’s clearly plateauing. It could be a localised plateau, or something more fundamental. Time will tell.
rvnx · 16 days ago
It's a very long presentation just to say that GPT-5 is slightly improved compared to GPT-4o
Lerc · 16 days ago
>It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest. It's interesting to note that at least so far, the trend has been the opposite

That seems hardly surprising considering the condition to receive the benefit has not been met.

The person who lights a campfire first will become warmer than the rest, but while they are trying to light the fire the others are gathering firewood. So while nobody has a fire, those lagging are getting closer to having a fire.

torginus · 16 days ago
My personal belief is that we are moving past the hype and starting to realize the true shape of what (LLM) AI can offer us, which is a darned lot. But still, it only works well when fed the right input and handled right, and that is an ongoing learning process on both sides: AI companies need to learn to train these things into user-interaction loops that match people's workflows, and people need to learn how to use these tools better.
jona777than · 16 days ago
You seem to have pinpointed where I believe a lot of the opportunity lies during this era (however long it lasts). Custom integration of these models into the specific workflows of existing companies can make a significant difference in what's possible for those companies, the smaller, more local ones especially. If people can leverage even a small percentage of what these models are capable of, that may be all they need for their use case. In that case, they wouldn't even need to learn to use these tools; much like electricity, they will just plug in or flip the switch and be in business (no pun intended).
radu_floricica · 16 days ago
The clustering you see is because they're all optimized for the same benchmarks. In the real world OpenAI is already ahead of the rest, and Grok doesn't even belong in the same group (not that it's not a remarkable achievement to start from scratch and have a working production model in 1-2 years, and integrate it with twitter in a way that works). And Google is Google - kinda hard for them not to be in the top, for now.
andreygrehov · 15 days ago
In my experience, Grok is miles ahead of ChatGPT. I canceled my OpenAI subscription in favor of Grok. I was one of the first OpenAI subscribers.
uoaei · 16 days ago
You can't reach the moon by climbing the tallest tree.

This misunderstanding is nothing more than the classic "logistic curves look like exponential curves at the beginning". All (Transformer-based, feedforward) AI development efforts are plateauing rapidly.

AI engineers know this plateau is there, but of course every AI business has a vested interest in overpromising in order to access more funding from naive investors.

gdiamos · 16 days ago
Scaling laws enabled an investment in capital and GPU R&D to deliver 10,000x faster training.

That took the world from autocomplete to Claude and GPT.

Another 10,000x would do it again, but who has that kind of money or R&D breakthrough?

The way scaling laws work, 5,000x and 10,000x give a pretty similar result. So why is it surprising that competitors land in the same range? It seems hard enough to beat your competitor by 2x let alone 10,000x
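
A back-of-the-envelope sketch of that last point, assuming loss follows a power law in compute; the constants below are invented purely for illustration:

  # If loss follows a power law, L(C) = a * C^(-alpha) + floor, then 5,000x
  # and 10,000x compute land close together. Constants are made up.
  def loss(compute: float, a: float = 10.0, alpha: float = 0.05, floor: float = 1.5) -> float:
      return a * compute ** -alpha + floor

  for scale in (1, 5_000, 10_000):
      print(f"{scale:>7,}x compute -> loss {loss(scale):.2f}")

Doubling the multiplier from 5,000x to 10,000x only shaves a few percent off the loss, which is why similarly resourced competitors land in the same range.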

willsmith72 · 16 days ago
But also, AI progress is non-linear. We're more likely to have an AI winter than AGI
brk · 16 days ago
AGI is so far away from happening that it is barely worth discussing at this stage.
lqstuart · 16 days ago
It’s frequently suggested by people with no background and/or a huge financial stake in the field
cchance · 16 days ago
They have to actually reach that threshold. Right now they're nudging forward, catching up to one another, and based on the jumps we've seen, the only one actually making huge jumps, sadly, is Grok, which I'm pretty sure is because they have 0 safety concerns and just run full tilt lol
netcan · 16 days ago
Its certainly an interesting race to watch.

Part of the fun is that predictions get tested on short enough timescales to "experience" in a satisfying way.

Idk where that puts me, in my guess at "hard takeoff." I was reserved/skeptical about hard takeoff all along.

Even if LLMs had improved at a faster rate... I still think bottlenecks are inevitable.

That said... I do expect progress to happen in spurts anyway. It makes sense that companies of similar competence and resources get to a similar place.

The winner take all thing is a little forced. "Race to singularity" is the fun, rhetorical version of the investment case. The implied boring case is facebook, adwords, aws, apple, msft... IE the modern tech sector tends to create singular big winners... and therefore our pre-revenue market cap should be $1trn.

tamimio · 16 days ago
Because AGI is a buzzword to milk more money from investors. It will never happen; we will only see slight incremental updates or enhancements, linear after some time, just like literally any tech bubble, from dot-com to smartphones to blockchain and others.
mritterhoff · 16 days ago
You think AGI is impossible? Why?
strongpigeon · 16 days ago
I think this is because of an expectation of a snowball effect once a model becomes able to improve itself. See talks about the Singularity.

I personally think it's a pretty reductive model for what intelligence is, but a lot of people seem to strongly believe in it.

econ · 16 days ago
People always say that when new technology comes along. Usually the best tech doesn't win. In fact, if you think you can build a company just by having a better offering, it's better not to bother. There is too much else involved.
morpheos137 · 16 days ago
There is zero reason or evidence to believe AGI is close. In fact, whether someone believes it is a good litmus test of their own intelligence.

What do you think AGI is?

How do we go from sentence composing chat bots to General Intelligence?

Is it even logical to talk about such a thing as abstract general intelligence when every form of intelligence we see in the real world is applied to specific goals, as behavioral machinery refined through evolution?

When LLMs start undergoing spontaneous evolution then maybe it is nearer. But now they can't. Also there is so much more to intelligence than language. In fact many animals are shockingly intelligent but they can't regurgitate web scrapings.

quatonion · 16 days ago
I know, right? If I didn't know any better I might think they are all customized versions of the same base model.

To be honest that is what you would want if you were digitally transforming the planet with AI.

You would want to start with a core so that all models share similar values in order they don't bicker etc, for negotiations, trade deals, logistics.

Would also save a lot of power so you don't have to train the models again and again, which would be quite laborious and expensive.

Rather each lab would take the current best and perform some tweak or add some magic sauce then feed it back into the master batch assuming it passed muster.

Share the work, globally for a shared global future.

At least that is what I would do.

mizzao · 16 days ago
I recently wrote a little post about this exact idea: https://parsnip.substack.com/p/models-arent-moats
throwmeaway222 · 16 days ago
AGI is either impossible over LLMs, or is more of an agentic flow, which means we might already be there, but the LLM is too slow and/or expensive for us to consider AGI feasible over agents.

AGI over LLMs is basically spending a billion tokens for the AI to answer the question "how do you feel?" with "fine".

Because it would mean simulating everything in the world over an agentic flow: considering all possible options, checking memory, checking the weather, checking the news... activating emotional agentic subsystems, checking state... saving state...

belter · 16 days ago
Nobody seems to be on the path to AGI as long as the model of today is as good as the model of tomorrow. And as long as there are "releases". You don't release a new human every few months...LLMs are currently frozen sequence predictors whose static weights stop learning after training.

They lack writable long-term memory beyond a context window. They operate without any grounded perception-action loop to test hypotheses. And they possess no executive layer for goal directed planning or self reflection...

Achieving AGI demands continuous online learning with consolidation.

lisperforlife · 16 days ago
I don't think models are fundamentally getting better. What is happening is that we are increasing the training set, so when users use it, they are essentially testing on the training set and find that it fits their data and expectations really well. However, the moat is primarily the training data, and that is very hard to protect as the same data can be synthesized with these models. There is more innovation surrounding serving strategies and infrastructure than in the fundamental model architectures.
SkyMarshal · 15 days ago
The inflection point is recursive self-improvement. Once an AI achieves that, and I mean really achieves it - where it can start developing and deploying novel solutions to deep problems that currently bottleneck its own capabilities - that's where one would suddenly leap out in front of the pack and then begin extending its lead. Nobody's there yet though, so their performance is clustering around an asymptotic limit of what LLMs are capable of.
dvfjsdhgfv · 16 days ago
> It's frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

This argument has so many weak points it deserves a separate article.

Cthulhu_ · 16 days ago
> Right now GPT-5, Claude Opus, Grok 4, Gemini 2.5 Pro all seem quite good across the board (ie they can all basically solve moderately challenging math and coding problems).

I wonder if that's because they have a lot of overlap in training sets and algorithms used, but more importantly, whether they use the same benchmarks and optimize for them.

As the saying goes, once a metric (or benchmark score in this case) becomes a target, it ceases to be a valuable metric.

jeffnappi · 16 days ago
We have no idea what AGI might look like. For example, it's entirely possible that if/when that threshold is reached, it will be power/compute constrained in such a way that its impact is softened. My expectation is that open models will eventually meet or exceed the capability of proprietary models, and to a degree that has already happened.

It's the systems around the models where the proprietary value lies.

logicchains · 16 days ago
>It's interesting to note that at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different company's gets clustered closer together

It's natural if you extrapolate from training loss curves; a training process with continually diminishing returns to more training/data is generally not something that suddenly starts producing exponentially bigger improvements.

rco8786 · 16 days ago
They’re all clustered together because they're asymptotically approaching the same local maximum, not getting closer to anything resembling "AGI".
nextlevelwizard · 16 days ago
Is it?

Nothing we have is anywhere near AGI and as models age others can copy them.

I personally think we are nearing the end of improvement for LLMs with current methods. We have already consumed all of the readily available data, so there is no more good-quality training material left. We either need new, novel approaches or have to hope that if enough compute is thrown at training, actual intelligence will spontaneously emerge.

flockonus · 16 days ago
If we're focusing on fast take-off scenario, this isn't a good trend to focus on.

An SGI would be self-improving along some function close to linear in time and resources. That depends almost exclusively on the software design; transformers, as they stand, have been shown to hit a wall, progressing only logarithmically with resources.

In other words, no, it has little to do with the commercial race.

babypuncher · 16 days ago
I would argue that this is because we are reaching the practical limits of this technology and AGI isn't nearly as close as people thought.
malshe · 16 days ago
> as time goes on and the models get better, the performance of the different company's gets clustered closer together

This could be partly due to normative isomorphism[1] according to the institutional theory. There is also a lot of movement of the same folks between these companies.

[1] https://youtu.be/VvaAnva109s

ants_everywhere · 16 days ago
The race has always been very close IMO. What Google had internally before ChatGPT first came out was mind blowing. ChatGPT was a let down comparatively (to me personally anyway).

Since then they've been about neck and neck with some models making different tradeoffs.

Nobody needs to reach AGI to take off. They just need to bankrupt their competitors since they're all spending so much money.

TheoGone · 10 days ago
Part of it is that the top LLM companies (OpenAI, Mistral) all copy and over-train on each other's models, often against e.g. Claude's or DeepSeek's TOS.
general1726 · 16 days ago
Because they are hitting the Compute Efficient Frontier. Models can't get much bigger and there is no more original data on the internet, so all models will eventually cluster around a similar CEF, as was described in this video 10 months ago:

https://www.youtube.com/watch?v=5eqRuVp65eY

fdsjgfklsfd · 16 days ago
I think they're just reaching the limits of this architecture and when a new type is invented it will be a much bigger step.
hodgehog11 · 16 days ago
Working on the theory side, I can say this is incredibly unlikely. At scale, once appropriately trained, all architectures begin to converge in performance.

It's not architectures that matter anymore, it's unlocking new objectives and modalities that open another axis to scale on.

koonsolo · 16 days ago
This confirms my suspicion that we are not at the exponential part of the curve, but the flattening one. It's easier to stay close to your competitors when everyone is on the flat part of the innovation curve.

The improvements they make are marginal. How long until the next AI breakthrough? Who can tell? Last time it took decades.

chasd00 · 16 days ago
I think the breakthroughs now will be the application of LLMs to the rest of the world. Discovering use cases where LLMs really shine and applying them while learning and sharing the use cases where they do not.
causality0 · 16 days ago
Mental modeling is one of the huge gaps in AI performance right now, in my opinion. I could describe in detail a very strange object or situation to a human being with a pen and paper, then ask them questions about it and expect answers that meet all my described constraints. AI just isn't good at that yet.
xpe · 16 days ago
> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

That's only one part of it. Some forecasters put probabilities on each of the four quadrants in the takeoff speed (fast or slow) vs. power distribution (unipolar or multipolar) table.

neehao · 11 days ago
Three points:

1. I have often wondered whether rapid tech progress makes underinvestment more likely.

2. Ben Evans frequently makes fun of the business value. Pretty clear a lot of the models are commoditized.

3. Strategically, the winners are the platforms where the data are. If you have data in Azure, that's where you will use your models. Exclusive licensing could pull people to your cloud from on-prem. So some gains may go to those companies ...

williamtrask · 16 days ago
Breakthroughs usually require a step-function change in data or compute. All the firms have proportional amounts. Next big jump in data is probably private data (either via de-siloing or robotics or both). Next big jump in compute is probably either analog computing or quantum. Until then... here we are.
FiniteIntegral · 16 days ago
I think part of this is due to the AI craze no longer being the wildest of wild wests. Investors, or at least heads of companies, believe in this as a viable economic engine, so they are properly investing in what's there. Or at least, the hype hasn't slapped them in the face just yet.
germandiago · 16 days ago
Is AGI even possible? I am skeptical of that. I think these systems can get really good at many tasks, and when used by a human expert in a field they can save lots of time, with the expert supervising and changing things here and there, like sculpting.

But I doubt we will ever see a fully autonomous, reliable AGI system.

xedrac · 16 days ago
Ultimately, what drives human creativity? I'd say it's at least partially rooted in emotion and desire. Desire to live more comfortably; fear of failure or death; desire for power/influence, etc... AI is void of these things, and thus I believe we will never truly reach AGI.
Zambyte · 16 days ago
No, AGI is not possible. It is perpetually defined as just beyond current capabilities.
citizenpaul · 16 days ago
Even at the beginning of the year people were still going crazy over new model releases. Now the various model update pages are starting to average times in the months since their last update rather than days/weeks. This is across the board. Not limited to a single model.
johnnienaked · 16 days ago
These companies are racing headlong into competitive equilibrium for a product yet to be identified.
dmezzetti · 16 days ago
LLMs are basically all the same at this point. The margins are razor thin.

The real take-off / winner-take-all potential is in retrieval and knowing how to provide the best possible data to the LLM. That strategy will work regardless of the model.
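
A minimal sketch of what retrieval-first looks like in practice; the embed() helper is a toy stand-in (hashed bag-of-words) for a real embedding model, and the resulting prompt works with any LLM:

  import numpy as np

  def embed(text: str) -> np.ndarray:
      # Toy stand-in for a real embedding model: hashed bag-of-words.
      v = np.zeros(256)
      for word in text.lower().split():
          v[hash(word) % 256] += 1.0
      return v

  def retrieve(query: str, passages: list[str], k: int = 3) -> list[str]:
      # Rank passages by cosine similarity to the query and keep the top k.
      q = embed(query)
      sim = lambda p: float(q @ embed(p)) / (np.linalg.norm(q) * np.linalg.norm(embed(p)) + 1e-9)
      return sorted(passages, key=sim, reverse=True)[:k]

  def build_prompt(query: str, passages: list[str]) -> str:
      context = "\n\n".join(retrieve(query, passages))
      return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"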

jasonwilk · 16 days ago
How marginally better was Google than Yahoo when it debuted? If one company can develop AGI first, some X timeline ahead of competitors, that alone could build a moat for a mass-market consumer product even if others later get to parity.
smiley1437 · 16 days ago
Google was not marginally better than Yahoo; its implementation of Markov chains in the PageRank algorithm was significantly better than Yahoo's or any other contemporary search engine's.

It's not obvious if a similar breakthrough could occur in AI

sylware · 16 days ago
LLMs probably won't be the models for "super intelligence".

But nowadays, how can corpos "justify" having their R&D spend gigantic amounts of resources (time + hardware + energy) on models which are not LLMs?

verytrivial · 16 days ago
Well, it is perhaps frequently suggested by those AI firms raising capital that once one of the AI companies reaches an AGI threshold... It's a rallying call: "Place your bets, gentlemen!"
TheoGone · 11 days ago
Part of it is that they all copy and over-train on each other's models, often against the TOS.
darepublic · 16 days ago
What is the AGI threshold? That the model can manage its own self improvement better than humans can? Then the roles will be reversed -- LLM prompting the meat machines to pave its way.
mirekrusin · 16 days ago
Diversity, where each new model release takes the crown until the next release, is healthy. Shame only US companies seem to be doing it; hopefully this will change, as the rest are not far off.
wouldbecouldbe · 16 days ago
It's all based on the theory of the singularity, where the AI can start training and relearning itself. But it looks like that's not possible with the current techniques.
menzoic · 16 days ago
The idea is that AGI will be able to self improve at an exponential rate. This is where the idea of take off comes from. That self improvement part isn’t happening today.
42lux · 16 days ago
If one achieves AGI and releases it everyone has AGI...
torginus · 16 days ago
Honestly for all the super smart people in the LessWrong singularity crowd, I feel the mental model they apply to the 'singularity' is incredibly dogmatic and crude, with the basic assumption that once a certain threshold is reached by scaling training and compute, we get human or superhuman level intelligence.

Even if we run with the assumption that LLMs can become human-level AI researchers, and are able to devise and run experiments to improve themselves, even then the runaway singularity assumption might not hold. Let's say Company A has this LLM, while company B does not.

- The automated AI researcher, like its human peers, still needs to test the ideas and run experiments, it might happen that testing (meaning compute) is the bottleneck, not the ideas, so Company A has no real advantage.

- It might also happen that AI training has some fundamental compute limit coming from information theory, analogous to the Shannon limit, and once again, more efficient compute can only approach this, not overcome it

m463 · 16 days ago
I kind of (naively?) hope that with robust competition, it will be like airlines or movie companies, where there are lots of players.
lasc4r · 16 days ago
These companies seem to think AGI will come from better LLMs, seems more like an AGI dead end that's plateaued to me.
louismerlin · 16 days ago
We joked yesterday with a colleague that it feels like the top AI companies are using the same white label backend.
eldenring · 16 days ago
A more powerful ASI, the market, is keeping everything in check. Meta's 10 figure offers are an example of this.
hoppp · 16 days ago
AGI will more probably come from Google DeepMind, with a Genie model that already looks like moves from The Matrix.
klik99 · 16 days ago
I've been saying for a while that if AGI is possible it's going to take another innovation, and that the transformer/LLM paradigm will plateau; innovations are hard to time. I used to get downvoted for saying that years ago, and now more people are realizing it. LLMs are awesome, but there is a limit; most of the interesting things in the next few years will be bolting on more functionality and agent stuff, introspection like Anthropic is working on, and smaller, less compute-hungry specialized models. There's still a lot to explore in this paradigm, but we're getting diminishing returns on newer models, especially when you factor in cost.
BizarroLand · 16 days ago
I bet that it will only happen when the ability to process and incorporate new information into the model without retraining the entire model is standard, AND when multiple AIs with slightly different datasets are set to work together to create a consensus response approach.

It's probably never going to work with a single process without consuming the resources of the entire planet to run that process on.
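
A toy sketch of the consensus part of that idea; the lambdas below are dummy stand-ins for real, separately trained model endpoints:

  from collections import Counter
  from typing import Callable

  def consensus(question: str, models: list[Callable[[str], str]]) -> str:
      # Ask every model the same question and keep the most common answer.
      answers = [ask(question) for ask in models]
      winner, _votes = Counter(answers).most_common(1)[0]
      return winner

  models = [lambda q: "42", lambda q: "42", lambda q: "41"]
  print(consensus("What is 6 * 7?", models))  # -> 42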

zaphirplane · 16 days ago
Cats and dogs kind of also cluster together with a couple of exceptions relative to humans ;)
coldtea · 16 days ago
>It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

Both the AGI threshold with LLM architecture and the idea of self-advancing AI are pie in the sky, at least for now. These are myths of the rationalist cult.

We'd more likely see reduced returns and smaller jumps between version updates, plus regression from all the LLM produced slop that will be part of the future data.

de6u99er · 16 days ago
This is just more of the same. My gut tells me DeepMind will crack AGI.
jtfrench · 16 days ago
My gut says similar. They've been on a roll. Genie 3 looks pretty wild.
felineflock · 16 days ago
Plot twist - once GPT reached AGI, this is exactly the strategy chosen for self-preservation. Appear to not lead by too much, only enough to make everyone think we're in a close race, play dumb when needed.

Meanwhile, keep all relevant preparations in secret...

jjk166 · 16 days ago
“If the humans see me actually doing my job, it helps keep suspicions from forming about faulty governor modules.”
grey-area · 16 days ago
Perhaps they’ve just reached the limit of what LLMs can achieve?
m3kw9 · 16 days ago
Because it hasn't taken off yet, so they all get to catch up.
newsclues · 16 days ago
We don’t seem to be closer to AGI however.
KoolKat23 · 16 days ago
In my opinion, it'll mirror the human world: there is a place for multiple different intelligent models, each with their own slightly different strengths/personalities. I mean, there are plenty of humans that can do the same task, but at the upper tier, multiple smart humans working together are needed to solve problems, as they each bring something different to the table. I don't see why this won't be the case with superintelligence at the cutting edge. A little bit of randomness and a slightly different point of view makes a difference. Two copies of the exact same model don't help, as one would already have thought of whatever the other was thinking.
aldousd666 · 16 days ago
So everyone is saying "this can't be AGI because it isn't recursively self-improving" or "we haven't solved all the world's chemistry and science yet"... but they're missing the point. Those problems aren't just waiting for humans to have more brain power. We actually have to do the experiments using real physical resources that aren't available to any model. So, while I don't believe we have necessarily reached AGI yet, the lack of "taking over" or "solving everything" is not evidence one way or the other.
vrighter · 16 days ago
they are improving exponentially... but the exponent is less than 1...
andrepd · 16 days ago
> once one of the AI companies reaches an AGI threshold

Why is this even an axiom, that this has to happen and it's just a matter of time?

I don't see any credible argument for the path LLM -> AGI, in fact given the slowdown in enhancement rate over the past 3 years of LLMs, despite the unprecedented firehose of trillions of dollars being sunk into them, I think it points to the contrary!

jbs789 · 16 days ago
Very well said.
arnorhs · 16 days ago
Meanwhile, I always just find myself arguing with every model while they ruthlessly try to gaslight me into believing whatever they are hallucinating.

I have had a bunch of positive experiences as well, but when it goes bad, it goes so horribly bad and off the rails.

shortrounddev2 · 16 days ago
Maybe because they haven't created an engine for AGI, but a really really impressive bullshit generator.
yieldcrv · 16 days ago
They use each other for synthesizing data sets. The only moat was the initial access to human generated data in hard to reach places. Now they use each other to reach parity for the most part.

I think user experience and pricing models are the best bets here. Right now everyone’s just passing down costs as they come; no real loss leaders except a free tier. I looked at reviews of some of the various wrappers on app stores, and people say “I hate that I have to pay for each generation and not know what I’m going to get”, so the market would like a service priced very differently. Is it economical? Many will fail, one will succeed. People will copy the model of that one.

hodgehog11 · 16 days ago
It's still not necessarily wrong, just unlikely. Once these developers start using the model to update itself, beyond an unknown threshold of capability, one model could start to skyrocket in performance above the rest. We're not in that phase yet, but judging from what the devs at the end were saying, we're getting uncomfortably (and irresponsibly) close.
surround · 17 days ago
GPT-5 knowledge cutoff: Sep 30, 2024 (10 months before release).

Compare that to

Gemini 2.5 Pro knowledge cutoff: Jan 2025 (3 months before release)

Claude Opus 4.1: knowledge cutoff: Mar 2025 (4 months before release)

https://platform.openai.com/docs/models/compare

https://deepmind.google/models/gemini/pro/

https://docs.anthropic.com/en/docs/about-claude/models/overv...

asboans · 16 days ago
It would be fun to train an LLM with a knowledge cutoff of 1900 or something
twh270 · 16 days ago
Someone tried this, I saw it one of the Reddit AI subs. They were training a local model on whatever they could find that was written before $cutoffDate.

Found the GitHub: https://github.com/haykgrigo3/TimeCapsuleLLM

ph4evers · 16 days ago
That’s been done to see if it could extrapolate and predict the future. Can’t find the link right now to the paper.
yanis_t · 16 days ago
Not sure we have enough data for any pre-internet date.
artursapek · 16 days ago
That would be hysterical
levocardia · 16 days ago
with web search, is knowledge cutoff really relevant anymore? Or is this more of a comment on how long it took them to do post-training?
mastercheif · 16 days ago
In my experience, web search often tanks the quality of the output.

I don't know if it's because of context clogging or that the model can't tell what's a high quality source from garbage.

I've defaulted to web search off and turn it on via the tools menu as needed.

MisterSandman · 16 days ago
It still is, not all queries trigger web search, and it takes more tokens and time to do research. ChatGPT will confidently give me outdated information, and unless I know it’s wrong and ask it to research, it wouldn’t know it is wrong. Having a more recent knowledge base can be very useful (for example, knowing who the president is without looking it up, making references to newer node versions instead of old ones)
WorldPeas · 16 days ago
The problem (the ease of fixing it is perhaps illusory) is that the model will choose solutions that are a year old, e.g. thinking database/logger versions from December '24 are new and usable in a greenfield project despite newer quarterly LTS releases superseding them. I try to avoid humanizing these models, but could it be that in training/post-training one could make the timestamp fed in via the system prompt actually be respected? I've begged models to choose "new" dependencies after $DATE but they all still snap back to 2024.
clickety_clack · 16 days ago
The biggest issue I can think of is code recommendations with out of date versions of packages. Maybe the quality of code has deteriorated in the past year and scraping github is not as useful to them anymore?
seanw265 · 16 days ago
Knowledge cutoff isn’t a big deal for current events. Anything truly recent will have to be fed into the context anyway.

Where it does matter is for code generation. It’s error-prone and inefficient to try teaching a model how to use a new framework version via context alone, especially if the model was trained on an older API surface.

diegocg · 16 days ago
I wonder if it might even be helpful, because it means they avoid the increasing amount of AI-generated content.

joshuacc · 16 days ago
Still relevant, as it means that a coding agent is more likely to get things right without searching. That saves time, money, and improves accuracy of results.

alfalfasprout · 16 days ago
It absolutely is; for example, in coding, where new design patterns or language features aren't easy to leverage.

Web search enables targeted info to be "updated" at query time. But it doesn't get used for every query and you're practically limited in how much you can query.

richardw · 16 days ago
Isn’t this an issue with eg Cloudflare removing a portion of the web? I’m all for it from the perspective of people not having their content repackaged by an LLM, but it means that web search can’t check all sources.
m3kw9 · 16 days ago
Web pages become part of the prompt, so you still need the model to analyze them.
stevage · 16 days ago
I've been having a lot of issues with chatgpt's knowledge of DuckDb being out of date. It doesn't think DuckDb enforces foreign keys, for instance.
roflyear · 16 days ago
Yes, totally. The model will not know about new versions of libraries, features recently deprecated, etc..
havefunbesafe · 16 days ago
Question: do web search results that GPT kick back get "read" and backpropagated into the model?
bearjaws · 16 days ago
Falling back to web search is a crutch; it's slower and often bloats the context, resulting in worse output.
CharlieDigital · 16 days ago
Yes, because it may not know that it needs to do a web search for the most relevant information.
LeoPanthera · 16 days ago
Gemini does cursory web searches for almost every query, presumably to fill in the gap between the knowledge cutoff and now.
verytrivial · 16 days ago
I had 2.5 Flash refuse to summarise a URL that had today's date encoded in it because "That web page is from the future so may not exist yet or may be missing" or something like that. Amusing.

2.5 Pro went ahead and summarized it (but completely ignored a # reference so summarised the wrong section of a multi-topic page, but that's a different problem.)

mock-possum · 16 days ago
I always pick Gemini if I want more current subjects / info
adhoc_slime · 16 days ago
A funny result of this is that GPT-5 doesn't understand the modern meaning of vibe coding (maximizing LLM code generation); it thinks it's "a state where coding feels effortless, playful, and visually satisfying" and offers more content around adjusting IDE settings and templating.
archon810 · 16 days ago
And GPT-5 nano and mini cutoff is even earlier - May 30 2024.
nialv7 · 16 days ago
Maybe OpenAI has a terribly inefficient data ingestion pipeline? (Wild guess.) Basically, taking in new data is tedious, so they do it infrequently and keep using old data for training.
xnx · 16 days ago
Does this indicate that OpenAI had a very long pretraining process for GPT5?
m3kw9 · 16 days ago
Maybe they have a long data cleanup process
m101 · 16 days ago
Perhaps they want to extract the logic/reasoning behind language rather than memorize facts, which can be retrieved with a search.
wayeq · 15 days ago
Does the knowledge cut off date still matter all that much since all these models can do real time searches and RAG?
lurking_swe · 16 days ago
the model can do web search so this is mostly irrelevant i think.
breadwinner · 16 days ago
That could mean OpenAI does not take any shortcuts when it comes to safety.
dotancohen · 16 days ago

  > GPT-5 knowledge cutoff: Sep 30, 2024
  > Gemini 2.5 Pro knowledge cutoff: Jan 2025
  > Claude Opus 4.1: knowledge cutoff: Mar 2025
A significant portion of the search results available after those dates is AI generated anyway, so what good would training on them do?

sumedh · 16 days ago
Latest tech docs about a library which you want to use in your code.
fidotron · 17 days ago
Going by the system card at: https://openai.com/index/gpt-5-system-card/

> GPT‑5 is a unified system . . .

OK

> . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).

So that's not really a unified system then, it's just supposed to appear as if it is.

This looks like they're not training the single big model but instead have gone off to develop special sub models and attempt to gloss over them with yet another model. That's what you resort to only when doing the end-to-end training has become too expensive for you.

hatthew · 16 days ago
I know this is just arguing semantics, but wouldn't you call it a unified system since it has a single interface that automatically interacts with different components? It's not a unified model, but it seems correct to call it a unified system.
fnordpiglet · 16 days ago
Altman et al. have been saying that the many-model interface in ChatGPT is confusing to users and that they want to move to a unified system that exposes a model which routes based on the task, rather than depending on users understanding how and when to do that themselves. Presumably this is what they've been discussing for some time. I don't know that it was intended to mean they would be working toward some unified inference architecture and model, although I'm sure goalposts will be moved to ensure it's insufficient.
sigmoid10 · 16 days ago
It's not a unified architecture transformer, but it is a unified system for chatting.
WorldPeas · 16 days ago
So OpenAI is in the business of GPT wrappers now? I'm guessing their open model is an escape hatch for those who wanted a "plain" model, though from my systematic testing it's not much better than Kimi K2.
andai · 16 days ago
> While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.

https://openai.com/index/introducing-gpt-5-for-developers/

Therenas · 16 days ago
Too expensive maybe, or just not effective anymore as they used up any available training data. New data is generated slowly, and is massively poisoned with AI generated data, so it might be useless.
fidotron · 16 days ago
I think that possibility is worse, because it implies a fundamental limit as opposed to a self imposed restriction, and I choose to remain optimistic.

If OpenAI really are hitting the wall on being able to scale up overall then the AI bubble will burst sooner than many are expecting.

ACCount36 · 16 days ago
That's a lie people repeat because they want it to be true.

People evaluate dataset quality over time. There's no evidence that datasets from 2022 onwards perform any worse than ones from before 2022. There is some weak evidence of an opposite effect, causes unknown.

It's easy to make "model collapse" happen in lab conditions - but in real world circumstances, it fails to materialize.

noosphr · 16 days ago
>This looks like they're not training the single big model but instead have gone off to develop special sub models and attempt to gloss over them with yet another model. That's what you resort to only when doing the end-to-end training has become too expensive for you.

The corollary to the bitter lesson strikes again: any hand-crafted system will outperform any general system at the same budget, by a wide margin.

fidotron · 16 days ago
That is, at best, wishful thinking.

In practice the whole point is the opposite is the case, which is why this direction by OpenAI is a suspicious indicator.

lacoolj · 17 days ago
Many tiny, specialized models is the way to go, and if that's what they're doing then it's a good thing.
fidotron · 17 days ago
Not at all, you will simply rediscover the bitter lesson [1] from your new composition of models.

[1] https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...

gekoxyz · 17 days ago
We already did this for object/face recognition; it works, but it's not the way to go. It's the way to go only if you don't have enough compute power (and data, I suspect) for an E2E network.
TheOtherHobbes · 17 days ago
It's a concept of a unified system.
bjornsing · 16 days ago
You could train that architecture end-to-end though. You just have to run both models and backprop through both of them in training. Sort of like mixture of experts but with two very different experts.
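
A rough PyTorch sketch of that setup, with tiny MLPs standing in for the two very different experts; the sigmoid router keeps everything differentiable so one backward pass updates the router and both experts:

  import torch
  import torch.nn as nn

  class RoutedPair(nn.Module):
      def __init__(self, dim: int = 32):
          super().__init__()
          self.fast = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
          self.deep = nn.Sequential(
              nn.Linear(dim, dim), nn.ReLU(),
              nn.Linear(dim, dim), nn.ReLU(),
              nn.Linear(dim, dim),
          )
          self.router = nn.Linear(dim, 1)  # learns how much of each expert to use

      def forward(self, x):
          w = torch.sigmoid(self.router(x))        # soft routing weight in (0, 1)
          return w * self.fast(x) + (1 - w) * self.deep(x)

  model = RoutedPair()
  opt = torch.optim.Adam(model.parameters(), lr=1e-3)
  x, y = torch.randn(8, 32), torch.randn(8, 32)
  loss = nn.functional.mse_loss(model(x), y)
  loss.backward()   # gradients reach the router and both experts
  opt.step()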
dang · 17 days ago
Related ongoing thread:

GPT-5 System Card [pdf] - https://news.ycombinator.com/item?id=44827046

illiac786 · 16 days ago
I do agree that the current evolution is moving further and further away from AGI, and more toward a spectrum of niche/specialisation.

It feels less and less likely that AGI is even possible with the data we have available. The one unknown is quantum computing: if we manage to get usable quantum computers, I am curious what that will do to AI.

FeepingCreature · 16 days ago
If(f) it's trained end to end, it's a unified system.
mafro · 16 days ago
This is a precursor to a future model which isn't simply a router.

From the system card:

"In the near future, we plan to integrate these capabilities into a single model."

Icathian · 16 days ago
Anyone who still takes predictive statements from leadership at AI companies as anything other than meaningless noise isn't even trying.

AgentMatrixAI · 16 days ago
I'm not really convinced. The benchmark blunder was really strange, the demos were quite underwhelming, and it appears this was reflected in a huge correction in the betting markets on who will have the best AI by the end of the year.

What excites me now is that Gemini 3.0 or some answer from Google is coming soon, and that will be the one I actually end up using. It seems like the last mover in the LLM race has the advantage.

Buttons840 · 16 days ago
Polymarket bettors are not impressed. Based upon the market odds, OpenAI had a 35% chance of having the best model (at year end), but those odds have dropped to 18% today.

(I'm mostly making this comment to document what happened for the history books.)

https://polymarket.com/event/which-company-has-best-ai-model...

vessenes · 16 days ago
After a few hours with GPT-5, I'd trade that spread. Not that I think OpenAI will win end of year. But I think GPT-5 is better than it looks on the benchmark side. It is very, very good at something we don't have a lot of benchmarks for: keeping track of where it's at. Codex is vastly better in practice than Claude Code or Gemini CLI right now.

On the chat side, it's also quite different, and I wouldn't be surprised if people need some time to get a taste and a preference for it. I ask most models to help me build a MacBook Pro charger in 15th-century Florence, with the instructions that I start with only my laptop and can only talk for four hours of chat before the battery dies. 5 was notable in that it thought through a bunch of second-order implications of plans and offered some unusual things, including a list of instructions for a foot-treadle-based split-ring commutator + generator, in 15th-century Florentine Italian(!). I have no way of verifying if the Italian was correct.

Upshot - I think they did something very special with long context and iterative task management, and I would be surprised if they don't keep improving 5, based on their new branding and marketing plan.

That said, to me this is one of the first 'product release' moments in the frontier model space. 5 is not so much a model release as a polished-up, holes-fixed, annoyances-reduced/removed, 10x faster type of product launch. Google (current polymarket favorite) is remarkably bad at those product releases.

Back to betting - I bet there's a moment this year where those numbers change 10% in oAIs favor.

apetresc · 16 days ago
How on Earth does that market have Anthropic at 2%, in a dead heat with the likes of Meta? If the market was about yesterday rather than 5 months from now I think Claude would be pretty clearly the front runner. Why does the market so confidently think they’ll drop to dead last in the next little while?
jstummbillig · 16 days ago
That bet does not seem to be very illuminating. Winner is likely who happens to release closest to end of year, no?
croemer · 16 days ago
Looking at LMArena, which Polymarket uses, I'm not surprised. Based on the little data there is (3k duels), it's possibly worse than Gemini: it lost to Gemini 2.5 Pro more than it won in direct duels. Not sure why the ELO is still higher; possibly GPT-5 did more clearly better against bad models, which I don't care about.
roflyear · 16 days ago
The Musk effect is pretty crazy. Or is there another explanation for why x can compete with Google?
boringg · 16 days ago
You don't actually hold polymarket odds with any significant weighting on actual outcomes do you?
m3kw9 · 16 days ago
It's not that they are not impressed; it's just that Google came out with steerable video gen.
riku_iki · 15 days ago
> Polymarket bettors are not impressed. Based upon the market odds, OpenAI had a 35% chance of having the best model (at year end)

who will decide the winner to resolve bets?

joshmlewis · 16 days ago
I am convinced. I've been giving it tasks the past couple hours that Opus 4.1 was failing on and it not only did them but cleaned up the mess Opus made. It's the real deal.
diego_sandoval · 16 days ago
In that same vein, I had just tried Opus 4.1 yesterday, and it successfully completed tasks that Sonnet 4 and Opus 4 failed at.
alfalfasprout · 16 days ago
Interesting, I've had the complete opposite experience. Opus 4.1 feels like a generational improvement compared to GPT-5.
energy123 · 16 days ago
And it's almost 10x cheaper via flex, and in #1 position on lmarena. It's not even close.
boomfunky · 16 days ago
The real last mover is Apple, because boy are they not moving.
manmal · 16 days ago
As an iOS dev, I really hope they acquire Anthropic before it’s too expensive.
echelon · 16 days ago
I really don't want the already trillion dollar mega monopoly to own the world.
blitzar · 16 days ago
I would rather the already trillion dollar mega monopoly own the world than "Open"Ai
someuser54541 · 16 days ago
Which betting markets were you referring to and where can they be viewed?
zamadatix · 16 days ago
Polymarket has a whole AI category https://polymarket.com/search/ai?_sort=volume of markets.

retinaros · 16 days ago
The demos were awful. It felt like watching sloppy vibe coded css UIs
m3kw9 · 16 days ago
Gpt5 high reasoning is a big step up from o3
minimaxir · 17 days ago
The marketing copy and the current livestream appear tautological: "it's better because it's better."

Not much explanation yet why GPT-5 warrants a major version bump. As usual, the model (and potentially OpenAI as a whole) will depend on output vibe checks.

WD-42 · 17 days ago
It has the last ~6 months' worth of flavor-of-the-month JavaScript libraries in its training set now, so it's "better at coding".

How is this sustainable.

sethops1 · 17 days ago
Who said anything about sustainable? The only goal here is to hobble to the next VC round. And then the next, and the next, ...
WXLCKNO · 16 days ago
It doesn't even have that, knowledge cutoff is in 2024.
jcgrillo · 17 days ago
Vast quantities of extremely dumb money
some-guy · 17 days ago
As someone who tries to push LLMs to the limits of hard coding tasks (mainly refactoring old codebases), with not much improvement since the last round of models, I'm finding that we are hitting the slowing-down part of the S-curve of quality. Obviously getting the same quality cheaper would be huge, but day to day the quality of the output isn't noticeably different to me.
camdenreslink · 16 days ago
I find it struggles to even refactor codebases that aren't that large. If you have a somewhat complicated change that spans the full stack, and has some sort of wrinkle that makes it slightly more complicated than adding a data field, then even the most modern LLMs seem to trip on themselves. Even when I tell it to create a plan for implementation and write it to a markdown file and then step through those steps in a separate prompt.

Not that it makes it useless, just that we seem to not "be there" yet for the standard tasks software engineers do every day.

didibus · 16 days ago
Agree. I think they'll need to move to performance now. If a model were comparable to Claude 4 but took 500ms or less per edit, the quicker feedback loop would be a big improvement.
krat0sprakhar · 17 days ago
> Not much explanation yet why GPT-5 warrants a major version bump

Exactly. Too many videos - too little real data / benchmarks on the page. Will wait for vibe check from simonw and others

collinmanderson · 17 days ago
> Will wait for vibe check from simonw

https://openai.com/gpt-5/?video=1108156668

2:40 "I do like how the pelican's feet are on the pedals." "That's a rare detail that most of the other models I've tried this on have missed."

4:12 "The bicycle was flawless."

5:30 Re generating documentation: "It nailed it. It gave me the exact information I needed. It gave me full architectural overview. It was clearly very good at consuming a quarter million tokens of rust." "My trust issues are beginning to fall away"

Edit: ohh he has blog post now: https://news.ycombinator.com/item?id=44828264

nicetryguy · 16 days ago
Yeah. We've entered the smartphone stage: "You want the new one because it's the new one."
ttroyr · 3 days ago
I think the biggest tell for me was having the leader of Cursor up vouching for the model, someone who has been a big proponent of Claude in Cursor for the last year. That doesn't seem like a light statement.
sbinnee · 16 days ago
When they were about to release GPT-4, I remember the hype was so high there were a lot of AGI debates. But it was quickly overshadowed by more advanced models.

People knew that GPT-5 wouldn't be an AGI or even close to that. It's just an updated version. GPT-N would become more or less like an annual release.

scosman · 17 days ago
There's a bunch of benchmarks on the intro page including AIME 2025 without tools, SWE-bench Verified, Aider Polyglot, MMMU, and HealthBench Hard (not familiar with this one): https://openai.com/index/introducing-gpt-5/

Pretty par for course evals at launch setup.

jennyholzer · 16 days ago
I didn't think GPT-4 warranted a major version bump. I do not believe that Open AI's benchmarks are legitimate and I don't think they have been for quite some time, if ever.
blablablerg · 16 days ago
For fun, I asked it how much better it is than GPT-4. It started a rap battle against itself :P

https://chatgpt.com/share/6895d5da-8884-8003-bf9d-1e191b11d3...

anthonypasq · 17 days ago
It's >o3 performance at GPT-4 price. Seems pretty obvious.
thegeomaster · 17 days ago
o3 pricing: $8/Mtok out

GPT-5 pricing: $10/Mtok out

What am I missing?

pram · 17 days ago
We’re at the audiophile stage of LLMs where people are talking about the improved soundstage, tonality, reduced sibilance etc
jaredcwhite · 17 days ago
Note GPT-5's subtle mouthfeel reminiscent of cranberries with a touch of bourbon.
javchz · 17 days ago
I can already see LLM sommeliers: "Yes, the mouthfeel and punch of GPT-5 is comparable to that of Grok 4, but its tenderness lacks the crunch of Gemini 2.5 Pro."
tuesdaynight · 16 days ago
You need to burn in your LLM by using it for 100 hours before you see its true performance.
virgil_disgr4ce · 16 days ago
Well, reduced sibilance is an ordinary and desirable thing. A better "audiophile absurdity" example would be $77,000 cables, freezing CDs to improve sound quality, using hospital-grade outlets, cryogenically frozen outlets (lol), the list goes on and on
ezst · 16 days ago
Always have been. This LLM-centered AI boom has been my craziest and most frustrating social experiment, propped up by the rhetoric (with no evidence to back it up) that this time we finally have the keys to AGI (whatever the hell that means), and infused with enough AstroTurfing to drive the discourse into ideological stances devoid of any substance (you must either be a true believer or a naysayer). On the plus side, it appears that this hype train is taking a bump with GPT-5.
satyrun · 17 days ago
Come on, we aren't even close to the level of audiophile nonsense like worrying about what cable sounds better.
catigula · 17 days ago
Informed audiophiles rely on Klippel output now
Q6T46nT668w6i3m · 17 days ago
It’s always been this way with LLMs.
doctoboggan · 17 days ago
Watching the livestream now, the improvement over their current models on the benchmarks is very small. I know they seemed to be trying to temper our expectations leading up to this, but this is much less improvement than I was expecting
827a · 17 days ago
I have a suspicion that while the major AI companies have been pretty samey and competing in the same space for a while now, the market is going to force them to differentiate a bit, and we're going to see OpenAI begin to lose the race toward extremely high levels of intelligence instead choosing to focus on justifying their valuations by optimizing cost and for conversational/normal intelligence/personal assistant use-cases. After all, most of their users just want to use it to cheat at school, get relationship advice, and write business emails. They also have Ive's company to continue investing in.

Meanwhile, Anthropic & Google have more room in their P/S ratios to continue to spend effort on logarithmic intelligence gains.

Doesn't mean we won't see more and more intelligent models out of OpenAI, especially in the o-series, but at some point you have to make payroll and reality hits.

juped · 16 days ago
I think this is pretty much what we've already seen happening, in fact.
davidhs · 16 days ago
> I know they seemed to be trying to temper our expectations leading up to this

Before the release of the model Sam Altman tweeted a picture of the Death Star appearing over the horizon of a planet.

blitzar · 16 days ago
Is he suggesting his company is designed with a womp rat sized opening that if you shoot a bullet into makes the whole thing explode?
softwaredoug · 16 days ago
He also said he had an existential crisis that he was completely useless now at work.
diogolsq · 16 days ago
Law of diminishing returns.

We’re talking about less than a 10% performance gain, for a shitload of data, time, and money investment.

gwd · 16 days ago
I'm not sure what "10% performance gain" is supposed to mean here; but moving from "It does a decent job 95% of the time but screws it up 5%" to "It does a decent job 98% of the time and screws it up 2%" to "It does a decent job 99.5% of the time and only screws it up 0.5%" are major qualitative improvements.
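
A rough sketch of why those jumps are qualitative: the per-step rate compounds over a multi-step task (the 20-step length here is arbitrary).

  for per_step in (0.95, 0.98, 0.995):
      print(f"{per_step:.1%} per step -> {per_step ** 20:.1%} chance of a clean 20-step run")

That's roughly 36% vs 67% vs 90%: the difference between failing most long runs and completing most of them.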
illiac786 · 16 days ago
Yeah I think that throwing more and more compute at the same training data produces smaller and smaller gains.

Maybe quantum compute would be significant enough of a computing leap to meaningfully move the needle again.

z7 · 17 days ago
GPT-5 is #1 on WebDev Arena with +75 pts over Gemini 2.5 Pro and +100 pts over Claude Opus 4:

https://lmarena.ai/leaderboard

virgildotcodes · 17 days ago
This same leaderboard lists a bunch of models, including 4o, beating out Opus 4, which seems off.
zamadatix · 16 days ago
"+100 points" sounds like a lot until you do the ELO math and see that means 1 out of 3 people still preferred Claude Opus 4's response. Remember, 1 out of 2 would place the models dead even.
degrews · 16 days ago
That eval hasn't been relevant for a while now. Performance there just doesn't seem to correlate well with real-world performance.
Too · 16 days ago
What does +75 arbitrary points mean in practice? Can we come up with units that relate to something in the real world.
anyg · 16 days ago
Also, the code demos are all using GPT-5 MAX on Cursor. Most of us will not be able to use it like that all the time. They should have showed it without MAX mode as well
Workaccount2 · 17 days ago
Sam said maybe two years ago that they want to avoid "mic drop" releases, and instead want to stick to incremental steps.

This is day one, so there is probably another 10-20% in optimizations that can be squeezed out of it in the coming months.

bigmadshoe · 17 days ago
Then why increment the version number here? This is clearly styled like a "mic drop" release but without the numbers to back it up. It's a really bad look when comparing the crazy jump from GPT3 to GPT4 to this slight improvement with GPT5.
yahoozoo · 16 days ago
He said that because even then he saw the writing on the wall that LLMs will plateau.
iLoveOncall · 16 days ago
> Sam said maybe two years ago that they want to avoid "mic drop" releases, and instead want to stick to incremental steps.

He also said that AGI was coming early 2025.

People that can't stop drinking the kool aid are really becoming ridiculous.

hodgehog11 · 16 days ago
The hallucination benchmarks did show major improvement. We know existing benchmarks are nearly useless at this point. It's reliability that matters more.
jama211 · 16 days ago
I’m more worried about how they still confidently reason through things incorrectly all the time, which isn’t quite the same as hallucination, but it’s in a similar vein.
lawlessone · 17 days ago
I'm sure I am repeating someone else, but it sounds like we're coming over the top of the S-curve.
Bluestein · 17 days ago
My thought exactly.

Diminishing returns.

... here's hoping it leads to progress.

wahnfrieden · 17 days ago
It is at least much cheaper and seems faster.

They also announced gpt-5-pro but I haven't seen benchmarks on that yet.

doctoboggan · 17 days ago
I am hoping there is a "One more thing" that shows the pro version with great benchmark scores
og_kalu · 17 days ago
I mean, that's just the consequence of releasing a new model every couple of months. If OpenAI had stayed mostly silent since the GPT-4 release (like they did for most iterations) and only now released 5, then nobody would be complaining about weak gains in benchmarks.
jononor · 17 days ago
If everyone else had stayed silent as well, then I would agree. But as it is right now they are juuust about managing to match the current pace of the other contenders. Which actually is fine, but they have previously set quite high expectations. So some will probably be disappointed at this.
moduspol · 17 days ago
Well it was their choice to call it GPT 5 and not GPT 4.2.
cardine · 16 days ago
If they had stayed silent since GPT-4, nobody would care what OpenAI was releasing as they would have become completely irrelevant compared to Gemini/Claude.
tylermw · 17 days ago
What's going on with this plot's y-axis?

https://bsky.app/profile/tylermw.com/post/3lvtac5hues2n

haffi112 · 17 days ago
It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.

Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.

danpalmer · 17 days ago
> like the presentation is rushed or made last minute

Or written by GPT-5?

herval · 16 days ago
They never compare with other vendors
ozgung · 16 days ago
Also, this coding deception rate bar chart tries to deceive us.

https://imgur.com/a/QkriFco

ileonichwiesz · 16 days ago
It’s beyond parody that they did something like this on a slide about deception. You couldn’t make this stuff up.
TrackerFF · 16 days ago
After reading around, it seems like they probably forgot to update/swap the slides before presentation. The graphs were correct on their website, as they launched. But the ones they used in the presentation were probably some older versions they had forgotten to fix.
rrrrrrrrrrrryan · 17 days ago
This is hilarious
moritzwarhier · 17 days ago
Probably created without thinking enabled. Lower % accuracy ensues, speaking from experience.
silverquiet · 17 days ago
Probably generated by AI.
Sateeshm · 17 days ago
If not, the person that made the chart just got $1.5M
lysecret · 17 days ago
Couldn’t believe it was real haha

artemonster · 17 days ago
[flagged]
dang · 17 days ago
Please don't post like this to Hacker News, regardless of how idiotic other people are or you feel they are.

You may not owe people who you feel are idiots better, but you owe this community better if you're participating in it.

https://news.ycombinator.com/newsguidelines.html

sundarurfriend · 17 days ago
Some people have hypothesized that GPT-5 is actually about cost reduction and internal optimization for OpenAI, since there doesn't seem to be much of a leap forward, but another element that they seem to have focused on that'll probably make a huge difference to "normal" (non-tech) users is making precise and specifically worded prompts less necessary.

They've mentioned improvements in that aspects a few times now, and if it actually materializes, that would be a big leap forward for most users even if underneath GPT-4 was also technically able to do the same things if prompted just the right way.

podgietaru · 16 days ago
I just don’t know that you’d name that 5.

The jump from 3 to 4 was huge. There was an expectation for similar outputs here.

Making it cheaper is a good goal - certainly - but they needed a huge marketing win too.

bcherry · 16 days ago
Yeah, I think they shot themselves in the foot a bit here by creating the o series. The truth is that GPT-5 _is_ a huge step forward, for the "GPT-x" models. The current GPT-x model was basically still 4o, with 4.1 available in some capacity. GPT-5 vs GPT-4o looks like a massive upgrade.

But it's only an incremental improvement over the existing o line. So people feel like the improvement from the current OpenAI SoTA isn't there to justify a whole version bump. They probably should have just called o1 GPT-5 last year.

fastball · 16 days ago
It’s a new major because they are using it to deprecate other models.
PUSH_AX · 16 days ago
This tells me we're hitting a ceiling.
techpineapple · 16 days ago
Did they really have another choice? If no big leap was on the horizon, are they just never going to release 5? I mean, from a marketing perspective.
hobofan · 16 days ago
It sounded like they were very careful to always mention that those improvements were for ChatGPT, so I'm very skeptical that they translate to the API versions of GPT-5.
whywhywhywhy · 16 days ago
Everything from them since the GPT-4 "Dev Day" downgrade has felt like cost reduction and internal optimization, tbqh.