It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest. It's interesting to note that, at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different companies' models clusters closer together. Right now GPT-5, Claude Opus, Grok 4, and Gemini 2.5 Pro all seem quite good across the board (i.e. they can all basically solve moderately challenging math and coding problems).
As a user, it feels like the race has never been as close as it is now. Perhaps dumb to extrapolate, but it makes me lean more skeptical about the hard take-off / winner-take-all mental model that has been pushed.
Would be curious to hear the take of a researcher at one of these firms - do you expect the AI offerings across competitors to become more competitive and clustered over the next few years, or less so?
It's also worth considering that past some threshold, it may be very difficult for us as users to discern which model is better. I don't think that's what's going on here, but we should be ready for it. For example, if you are an Elo 1000 chess player, would you yourself be able to tell whether Magnus Carlsen or another grandmaster were better by playing them individually? To the extent that our AGI/SI metrics are based on human judgement, the clustering effect they create may be an illusion.
> For example, if you are an Elo 1000 chess player, would you yourself be able to tell whether Magnus Carlsen or another grandmaster were better by playing them individually?
No, and I wouldn't be able to tell you what either player did wrong in general.
By contrast, the shortcomings of today's LLMs seem pretty obvious to me.
> it may be very difficult for us as users to discern which model is better
But one thing will stay consistent with LLMs for some time to come: they are programmed to produce output that looks acceptable, but they all unintentionally tend toward deception. You can iterate on that over and over, but there will always be some point where it will fail, and the weight of that failure will only increase as it deceives better.
Some things that seemed safe enough: Hindenburg, Titanic, Deepwater Horizon, Chernobyl, Challenger, Fukushima, Boeing 737 MAX.
which is a thing with humans as well - I had a colleague with certified 150+ IQ, and other than moments of scary smart insight, he was not a superman or anything, he was surprisingly ordinary. Not to bring him down, he was a great guy, but I'd argue many of his good qualities had nothing to do with how smart he was.
It's even more difficult because, while all the benchmarks provide some kind of 'averaged' performance metric for comparison, in my experience most users have pretty specific regular use cases and pretty specific personal background knowledge. For instance, I have a background in ML, 15 years of experience in full-stack programming, and I primarily use LLMs for generating interface prototypes for new product concepts. We use a lot of React and Chakra UI for that, and I consistently get the best results out of Gemini Pro. I tried all the available options and settled on that as the best for me and my use case. It's not the best for marketing boilerplate, or probably a million other use cases, but for me, in this particular niche, it's clearly the best. Beyond that the benchmarks are irrelevant.
We could first run some tests to find out whether comparative performance tests can be conjured up at all:
One can intentionally use a recent and a much older model to figure out whether the tests are reliable, and in which domains they are reliable.
One can compute a model's joint probability for a sequence and compare how likely each model finds the same sequence (see the sketch after this list).
We could ask both to start talking about a subject, but alternatingly, with each emitting a token. Then look at how the dumber and smarter models judge the resulting sentence: does the smart one tend to pull up the quality of the resulting text, or does it tend to get dragged down toward the dumber participant?
Given enough such tests to "identify the dummy vs. the smart one", we can verify them on pairs where there is common agreement (as an extreme case, word2vec vs. a transformer) to assess the quality of the test, regardless of domain.
On the assumption that such or similar tests let us indicate the smarter one, i.e. assuming we find plenty of such tests, we can demand that model makers publish open weights so that we can publicly verify these performance comparisons.
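As a rough illustration of the joint-probability idea above, here is a minimal sketch assuming two HuggingFace causal LMs stand in for the "smarter" and "dumber" models (the model names are just placeholders):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_logprob(model_name: str, text: str) -> float:
    # Sum of log p(t_i | t_1 .. t_{i-1}) over the whole sequence.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids          # (1, T)
    with torch.no_grad():
        logits = model(ids).logits                          # (1, T, V)
    logp = F.log_softmax(logits[:, :-1, :], dim=-1)
    return logp.gather(-1, ids[:, 1:, None]).squeeze(-1).sum().item()

text = "Water boils at a lower temperature at high altitude."
for name in ["gpt2", "gpt2-large"]:                         # smaller/older vs. bigger/newer stand-ins
    print(name, sequence_logprob(name, text))
```

The same comparison run over many sequences, including ones with deliberate factual or logical errors, is where a reliability signal would have to come from.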
Another idea is self-consistency tests. A single forward inference with a context of, say, 2048 tokens (just an example) is effectively predicting the conditional 2-gram, 3-gram, 4-gram, ... probabilities over the input tokens. Each output token distribution is predicted from the preceding inputs: with 2048 input tokens there are 2048 output distributions, where the position-1 output is the predicted token (logit vector, really) estimated to follow the position-1 input, the position-2 output is the prediction following the first 2 inputs, and so on, with the last vector being the predicted next token following all 2048 inputs: p(t_{i+1} | t_1 = a, t_2 = b, ..., t_i = z).
But that is just one way the next token can be predicted with the network. Another approach would be to run gradient descent (reverse-mode AD), keeping the model weights fixed and treating only, say, the last 512 input vectors as variables: how well do the last 512 forward-prediction output vectors match the output vectors at the gradient-descent optimum of the joint probability? (A rough sketch of this probe follows below.)
This could be added as a loss term during training as well, as a form of regularization, which roughly turns it into a kind of energy-based model.
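A minimal sketch of that probe, assuming a HuggingFace causal LM (the model name is a placeholder, and the optimization details are my own guess at one reasonable instantiation): freeze the weights, treat the embeddings of the last K input positions as free variables, maximize the joint log-probability of the observed tokens, and then compare the resulting predictions with the ordinary forward pass.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                           # placeholder model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
for p in model.parameters():
    p.requires_grad_(False)                             # weights stay fixed

ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids                # (1, T)
base = model.get_input_embeddings()(ids).detach()       # (1, T, D)

K = 4                                                   # last K input vectors are variable
free = base[:, -K:, :].clone().requires_grad_(True)
opt = torch.optim.Adam([free], lr=1e-2)

for _ in range(100):                                    # gradient descent on inputs only
    embeds = torch.cat([base[:, :-K, :], free], dim=1)
    logits = model(inputs_embeds=embeds).logits
    logp = F.log_softmax(logits[:, :-1, :], dim=-1)
    nll = -logp.gather(-1, ids[:, 1:, None]).squeeze(-1).sum()   # joint NLL of the sequence
    opt.zero_grad(); nll.backward(); opt.step()

with torch.no_grad():
    fwd = model(input_ids=ids).logits[:, -K - 1:-1, :]           # ordinary forward predictions
    tuned = model(inputs_embeds=torch.cat([base[:, :-K, :], free], dim=1)).logits[:, -K - 1:-1, :]
    gap = F.kl_div(F.log_softmax(tuned, -1), F.softmax(fwd, -1), reduction="batchmean")
    print("self-consistency gap (KL):", gap.item())
```

A small gap would mean the forward pass already lands near the model's own joint-probability optimum, i.e. it is self-consistent; the bet would be that smarter models score better here.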
My guess is that more than the raw capabilities of a model, users would be drawn more to the model's personality. A "better" model would then be one that can closely adopt the nuances that a user likes. This is a largely uninformed guess, let's see if it holds up well with time.
This is the F1-vs-911 car problem. A 911 is just as fast as an F1 car to 60 mph (sometimes even faster), but an F1 car is better in the super-high-performance envelope: above 150 mph and in tight turns.
An average driver evaluating both would have a very hard time finding the F1's superior utility.
> For example, if you are an Elo 1000 chess player, would you yourself be able to tell whether Magnus Carlsen or another grandmaster were better by playing them individually?
You don't have to be even good at chess to be able to tell when a game is won or lost, most of the time.
I don't need to understand how the AI made the app I asked for or cured my cancer, but it'll be pretty obvious when the app seems to work and the cancer seems to be gone.
I mean, I want to understand how, but I don't need to understand how, in order to benefit from it. Obviously understanding the details would help me evaluate the quality of the solution, but that's an afterthought.
If AGI is ever achieved, it would open the door to recursive self improvement that would presumably rapidly exceed human capability across any and all fields, including AI development. So the AI would be improving itself while simultaneously also making revolutionary breakthroughs in essentially all fields. And, for at least a while, it would also presumably be doing so at an exponentially increasing rate.
But I think we're not even on the path to creating AGI. We're creating software that replicates and remixes human knowledge at a fixed point in time. So it's a fixed target that you can't really exceed, which already entails diminishing returns. Pair this with the fact that it's based on neural networks, which also invariably reach a point of sharply diminishing returns in essentially every field they're used in, and you have something that looks much closer to what we're doing right now: all competitors eventually converge on something largely indistinguishable from each other in terms of ability.
> revolutionary breakthroughs in essentially all fields
This doesn't really make sense outside computers. Since AI would be training itself, it needs to have the right answers, but as of now it doesn't really interact with the physical world. The most it could do is write code, and check things that have no room for interpretation, like speed, latency, percentage of errors, exceptions, etc.
But what other fields would it do this in? How can it make strides in biology? It can't dissect animals; it can't figure out more about plants than what humans feed into the training data. Regarding math, math is human-defined. Humans said "addition does this", "this symbol means that", etc.
I just don't understand how AI could ever surpass anything humans already know when it lives by the rules defined by us.
>And, for at least a while, it would also presumably be doing so at an exponentially increasing rate.
Why would you presume this? I think part of a lot of people's AI skepticism is talk like this. You have no idea. Full stop. Why wouldn't progress be linear? As new breakthroughs come, newer ones will be harder to come by. Perhaps it's exponential. Perhaps it's linear. No one knows.
There is no particular reason to assume that recursive self-improvement would be rapid.
All the technological revolutions so far have accounted for little more than a 1.5% sustained annual productivity growth. There are always some low-hanging fruit with new technology, but once they have been picked, the effort required for each incremental improvement tends to grow exponentially.
That's my default scenario with AGI as well. After AGI arrives, it will leave humans behind very slowly.
I think this is a hard kick below the belt for anyone trying to develop AGI using current computer science.
Current AIs only really generate (no, regenerate) text based on their training data. They are only as smart as the data available to them. Even when an AI "thinks", it's still only processing existing data rather than reaching a genuinely new conclusion. It's the best text processor ever created, but it's still just a text processor at its core. And that won't change without more hard computer science being performed by humans.
So yeah, I think we're starting to hit the upper limits of what we can do with Transformers technology. I'd be very surprised if someone achieved "AGI" with current tech. And, if it did get achieved, I wouldn't consider it "production ready" until it didn't need a nuclear reactor to power it.
> If AGI is ever achieved, it would open the door to recursive self improvement ...
They are unrelated. All you need is a way for continual improvement without plateauing, and this can start at any level of intelligence. As it did for us; humans were once less intelligent.
Using the flagship to bootstrap the next iteration with synthetic data is standard practice now; this was mentioned in the GPT-5 presentation. At the rate things are going, I think this will get us to ASI, and it's not going to feel epochal for people who have interacted with existing models, just more of the same. After all, the existing models are already smarter than most humans, and most people are taking it in their stride.
The next revolution is going to be embodiment. I hope we have the common sense to stop there, before instilling agency.
That's only assuming there are no fundamental limits or major barriers to computation. Back a hundred years ago at the dawn of flight, one could have said a very similar thing about aircraft performance. And for a time in the 1950s, it looked like aircraft speed was growing exponentially over time. But there haven't been any new airspeed records (at least, officially recorded) since 1986, because it turns out going Mach 3+ is fairly dangerous and approaching some rather severe materials and propulsion limitations, making it not at all economical.
I would also not be surprised if, in the process of developing something comparable to human intelligence (assuming the extreme computation, energy, and materials problems of packing that much capability into a single system could be overcome), the AI also develops something comparable to human desire and/or mental health issues. There is a non-zero chance we end up with an AI that doesn't want to do what we ask it to do, or that doesn't work all the time because it wants to do other things.
You can't just assume exponential growth is a foregone conclusion.
For some reason people presuppose superintelligence into AGI. What if AGI has diminishing returns around human-level intelligence? It would still have to deal with all the same knowledge gaps we have.
Those problems aren't just waiting on smarts/intelligence. Those would require experimentation in the real world. You can't solve chemistry by just thinking about it really hard. You still have to do experiments. A super intelligent machine may be better at coming up with experiments to do than we are, but without the right stuff to do them, it can't 'solve' anything of the like.
Recursive improvement without any physical change may be limited. If any physical change, like more GPUs or a different network configuration, is required to run an experiment, and then another change to learn from it, that might not be easy.
Convincing humans to do this on the AGI's behalf may not be that simple either. There might be multiple paths to try, and teams may not agree with each other, especially if the cost of each trial is high.
An AI can be trained on some specialized knowledge from person A and other specialized knowledge from person B. These two people may never have met, and therefore cannot combine their knowledge to get some new knowledge or insight.
The AI can do it just fine, since it knows both A and B. And that is knowledge creation.
> But I think we're not even on the path to creating AGI.
It seems like the LLM will be a component of an eventual AGI: its voice, so to speak, but not its mind. The mind still requires another innovation or breakthrough we haven't seen yet.
Math... lots and lots of math solutions. For instance, if it could figure out the numerical sign problem, it could quite possibly simulate all of physics.
You are missing the point where synthetic data, deterministic tooling (written by AI) and new discoveries by each model generation feeds into the next model. This iteration is the key to going beyond human intelligence.
Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.
I am not an AI researcher, but I have friends who do work in the field, and they are not worried about LLM-based AGI because of the diminishing returns on results vs amount of training data required. Maybe this is the bottleneck.
Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better. Whereas LLMs tend to regurgitate solutions to solved problems, where the solutions tend to be well-published in training data.
That being said, AGI is not a necessary requirement for AI to be totally world-changing. There are possibly applications of existing AI/ML/SL technology that could be more impactful than general intelligence. Search is one example where the ability to regurgitate knowledge from many domains is desirable.
> That being said, AGI is not a necessary requirement for AI to be totally world-changing
Yeah. I don't think I actually want AGI? Even setting aside the moral/philosophical/etc "big picture" issues I don't think I even want that from a purely practical standpoint.
I think I want various forms of AI that are more focused on specific domains. I want AI tools, not companions or peers or (gulp) masters.
(Then again, people thought they wanted faster horses before they rolled out the Model T)
They are moving beyond just a big transformer-blob LLM doing text prediction. Mixture of Experts is not preassembled, for example: it's something like x empty experts with an empty router, and the experts and routing emerge naturally with training, modeling the modular architecture we see in the brain more closely. There is also work like the "Integrated Gated Calculator" (IGC, Jan 2025), which builds a premade calculator network and integrates it directly into the neural network, getting around the whole problem of making LLMs do basic arithmetic and the clunkiness of generating "run tool" tokens. The model naturally learns to use the IGC built into itself because it very quickly beats any kind of memorized computation in the reward function.
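For concreteness, here is a bare-bones sketch of that kind of mixture-of-experts layer, with randomly initialized ("empty") experts and router whose specialization would only emerge through training (toy code, not any particular lab's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)              # starts out "empty"
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])                           # so do the experts
        self.top_k = top_k

    def forward(self, x):                                         # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)                  # learned routing weights
        weights, idx = gate.topk(self.top_k, dim=-1)              # each token goes to top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

y = MoELayer()(torch.randn(16, 64))   # any expert specialization appears only after training
```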
Models are also truly input-multimodal now. Images, audio, and text go in through separate input pathways, but they all feed into the same inner layers and come out as text. This, too, mirrors how brains work: multiple parts integrated into one whole.
Humans, in some sense, do not start with empty brains: a lot is baked into our DNA, and as the brain grows it follows a baked-in developmental program. This is why we need fewer examples and generalize way better.
Seems like the real innovation of LLM-based AI models is the creation of a new human-computer interface.
Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.
In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts for an AI interface. Keyboards will become a power-user interface and only used for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.
There is also the fact that AI lacks long-term memory the way humans have it. If you consider context length a form of long-term memory, it's incredibly short compared to that of a human. Maybe if it reaches into the billions or trillions of tokens we might have something comparable, or someone comes up with a new solution of some kind.
"LLMs tend to regurgitate solutions to solved problems"
People say this, but honestly, it's not really my experience— I've given ChatGPT (and Copilot) genuinely novel coding challenges and they do a very decent job at synthesizing a new thought based on relating it to disparate source examples. Really not that dissimilar to how a human thinks about these things.
> That being said, AGI is not a necessary requirement for AI to be totally world-changing.
Depends on how you define "world changing" I guess, but this world already looks different to the pre-LLM world to me.
Asking LLMs things instead of consulting the output of other humans now takes up a significant fraction of my day. I don't google nearly as often, and I don't trust any image or video I see, as swathes of the creative professions have been replaced by output from LLMs.
It's funny, that final thing is the last thing I would have predicted. I always believed the one thing a machine could not match was human creativity, because the output of machines was always precise, repetitive, and reliable. Then LLMs come along, randomly generating every token. Their primary weakness is that they are neither precise nor reliable, but they can turn out an unending stream of unique output.
I remember reading that LLMs have consumed the internet's text data; I seem to remember there is an open dataset for that too. Potential other sources of data would be images (probably already consumed) and videos; YouTube must have such a large set of data to consume, and perhaps Facebook or Instagram private content.
But even with these it does not feel like AGI. That sounds like the "fusion reactors are 20 years away" argument, except this is supposedly coming in 2 years, and they have not even got the core technology for how to build AGI.
> Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.
I think you're on to it. Performance is clustering because a plateau is emerging. Hyper-dimensional search engines are running out of steam, and now we're optimizing.
True. At a minimum, as long as LLMs don't include some kind of more strict representation of the world, they will fail in a lot of tasks. Hallucinations -- responding with a prediction that doesn't make any sense in the context of the response -- are still a big problem. Because LLMs never really develop rules about the world.
To be smarter than human intelligence you need smarter than human training data. Humans already innately know right and wrong a lot of the time so that doesn't leave much room.
The bottleneck is nothing to do with money, it’s the fact that they’re using the empty neuron theory to try to mimic human consciousness and that’s not how it works. Just look up Microtubules and consciousness, and you’ll get a better idea for what I’m talking about.
These AI computers aren’t thinking, they are just repeating.
> Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better.
That is because with LLMs there is no intelligence. It is artificial knowledge: AK, not AI. So real AI is AGI.
Not that it matters for the use cases we have, but marketing needs 'AI' because that is what we were expecting for decades. So yeah, I also do not think we will have AGI from LLMs, nor does it matter for what we are using them for.
It is definitively not possible. But the frontier models are no longer “just” LLMs, either. They are neurosymbolic systems (an LLM using tools); they just don’t say it transparently because it’s not a convenient narrative that intelligence comes from something outside the model, rather than from endless scaling.
At Aloe, we are model agnostic and outperforming frontier models. It's the architecture around the LLM that makes the difference. For instance, our system using Gemini can do things that Gemini can't do on its own. All an LLM will ever do is hallucinate. If you want something with human-like general intelligence, keep looking beyond LLMs.
I think it's very fortunate, because I used to be an AI doomer. I still kinda am, but at least I'm now about 70% convinced that the current technological paradigm is not going to lead us to a short-term AI apocalypse.
The fortunate thing is that we managed to invent an AI that is good at _copying us_ instead of being a truly maverick agent, which kind of limits it to the "average human" output.
However, I still think that all the doomer arguments are valid, in principle. We very well may be doomed in our lifetimes, so we should take the threat very seriously.
Companies are collections of people, and these companies keep losing key developers to the others, I think this is why the clusters happen. OpenAI is now resorting to giving million dollar bonuses to every employee just to try to keep them long term.
If there was any indication of a hard takeoff being even slightly imminent, I really don't think key employees of the company where that was happening would be jumping ship. The amounts of money flying around are direct evidence of how desperate everybody involved is to be in the right place when (so they imagine) that takeoff happens.
> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.
This seems to be a result of using overly simplistic models of progress. A company makes a breakthrough, the next breakthrough requires exploring many more paths. It is much easier to catch up than find a breakthrough. Even if you get lucky and find the next breakthrough before everyone catches up, they will probably catch up before you find the breakthrough after that. You only have someone run away if each time you make a breakthrough, it is easier to make the next breakthrough than to catch up.
Consider the following game:
1. N parties take turns rolling a d20. Anyone who rolls a 20 gets 1 point.
2. If a party is 1 or more points behind, they only need to roll a 19 or higher to get a point. That is, being behind gives you a slight advantage in catching up.
While points accumulate, most of the players end up with roughly the same score.
I ran a simulation of this game for 10,000 turns with 5 players:
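Something like the following minimal sketch reproduces the setup described above (a re-creation, not the original poster's code):

```python
import random

def play(n_players=5, turns=10_000, seed=0):
    # Each turn every player rolls a d20; a 20 scores a point,
    # and anyone behind the current leader also scores on a 19.
    rng = random.Random(seed)
    scores = [0] * n_players
    for _ in range(turns):
        leader = max(scores)
        for i in range(n_players):
            need = 19 if scores[i] < leader else 20
            if rng.randint(1, 20) >= need:
                scores[i] += 1
    return scores

print(play())   # the five totals come out tightly clustered
```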
Supposedly the idea was that, once you get closer to AGI, it starts to explore these breakthrough paths for you, providing a positive feedback loop; hence the expected exponential explosion in capability.
But yes, so far it feels like we are in the latter stages of the innovation S-curve for transformer-based architectures. The exponent may be out there but it probably requires jumping onto a new S-curve.
You are forgetting that we are talking about AI. That AI will be used to speed up progress on making next, better AI that will be used to speed up progress on making next, better AI that ...
Not only do I think there will not be a winner take all, I think it's very likely that the entire thing will be commoditized.
I think it's likely that we will eventually we hit a point of diminishing returns where the performance is good enough and marginal performance improvements aren't worth the high cost.
And over time, many models will reach "good enough" levels of performance, including models that are open weight. Given even more time, these open-weight models will be runnable on consumer-level hardware. Eventually, they'll be runnable on super-cheap consumer hardware (something more akin to an NPU than a $2000 RTX 5090). So your laptop in 2035, with specialized AI cores and 1TB of LPDDR10 RAM, is running GPT-7-level models without breaking a sweat. Maybe GPT-10 can solve some obscure math problem that your model can't, but does it even matter? Would you pay for GPT-10 when running a GPT-7-level model does everything you need and is practically free?
The cloud providers will make money because there will still be a need for companies to host the models in a secure and reliable way. But a company whose main business strategy is developing the model? I'm not sure they will last without finding another way to add value.
That is already happening. These labs are writing next gen models using next gen models, with greater levels of autonomy. That doesn’t get the hard takeoff people talk about because those hypotheticals don’t consider sources of error, noise, and drift.
Self-learning opens new training opportunities but not at the scale or speed of current training. The world only operates at 1x speed. Today's models have been trained on written and visual content created by billions of humans over thousands of years.
You can only experience the world in one place in real time. Even if you networked a bunch of "experiencers" together to gather real time data from many places at the same time, you would need a way to learn and train on that data in real time that could incorporate all the simultaneous inputs. I don't see that capability happening anytime soon.
This is the key - right now each new model has had countless resources dedicated to training, then they are more or less set in stone until the next update.
These big models don't dynamically update as days pass by - they don't learn. A personal assistant service may be able to mimic learning by creating a database of your data or preferences, but your usage isn't baked back into the big underlying model permanently.
I don't agree with "in our lifetimes", but the difference between training and learning is the bright red line. Until there's a model which is able to continually update itself, it's not AGI.
My guess is that this will require both more powerful hardware and a few more software innovations. But it'll happen.
There are areas where we seem to be much closer to AGI than most people realize. AGI for software development, in particular, seems incredibly close. For example, Claude Code has bewildering capabilities that feel like magic. Mix it with a team of other capable development-oriented AIs and you might be able to build AI software that builds better AI software, all by itself.
The ability to self-learn is necessary, but not necessarily sufficient. We don’t have much of an understanding of the intelligence landscape beyond human-level intelligence, or even besides it. There may be other constraints and showstoppers, for example related to computability.
I feel like the technological singularity has been pretty solidly relegated to junk science, like cold fusion, Malthusian collapse, or Lynn's IQ regression. Technologists have made numerous predictions and hypothetical scenarios, none of which have come to fruition, nor does any seem likely at any time in the future.
I think we should be treating AGI like Cold Fusion, phrenology, or even alchemy. It is not science, but science fiction. It is not going to happen and no research into AGI will provide anything of value (except for the grifters pushing the pseudo-science).
In my experience and use case Grok is pretty much unusable when working with medium size codebases and systems design. ChatGPT has issues too but at least I have figured out a way around most of them, like asking for a progress and todo summary and uploading a zip file of my codebase to a new chat window say every 100 interactions, because speed degrades and hallucinations increase. Super Grok seems extremely bad at keeping context during very short interactions within a project even when providing it with a strong foundation via instructions. For example if the code name for a system or feature is called Jupiter, Grok will many times start talking about Jupiter the planet.
I'm still stuck at the bit where just throwing more and more data at a very complex encyclopedia with an interesting search interface, one that tricks us into believing it's human-like, somehow gets us to AGI, when we have no examples and thus no evidence or understanding of where the "GI" part comes from.
It's all just hyperbole to attract investment and shareholder value and the people peddling the idea of AGI as a tangible possibility are charlatans whose goals are not aligned with whatever people are convincing themselves are the goals.
The fact that so many engineers have fallen for it so completely is stunning to me and speaks volumes about the underlying health of our industry.
I believe the analogy of a LLM being "a very complex encyclopedia with an interesting search interface" to be spot on.
However, I would not be so dismissive of the value. Many of us are reacting to the complete oversell of 'the encyclopedia' as being 'the eve of AGI' - as rightfully we should. But, in doing so, I believe it would be a mistake to overlook the incredible impact - and economic displacement - of having an encyclopedia comprised of all the knowledge of mankind that has "an interesting search interface" that is capable of enabling humans to use the interface to manipulate/detect connections between all that data.
> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.
Yes. And the fact they're instead clustering simply indicates that they're nowhere near AGI and are hitting diminishing returns, as they've been doing for a long time already. This should be obvious to everyone. I'm fairly sure that none of these companies has been able to use their models as a force multiplier in state-of-the-art AI research. At least not beyond a 1+ε factor. Fuck, they're just barely a force multiplier in mundane coding tasks.
AGI in 5/10 years is similar to "we won't have steering wheels in cars" or "we'll be asleep driving" in 5/10 years. Remember that? What happened to that? It looked so promising.
I mean, in certain US cities you can take a Waymo right now. That adage where we overestimate change in the short term and underestimate it in the long term fits right in here.
Looks like a lot of players getting closer and closer to an asymptotic limit. Initially, small changes led to big improvements, letting a firm race ahead; as they go forward, performance gains from innovation become both more marginal and harder to find, let alone keep. I would expect them all to eventually reach the same point, where they are squeezing the most possible out of an AI under the current paradigm, barring a paradigm-shifting discovery before that asymptote is reached.
For those who happen to have a subscription to The Economist, there is a very interesting Money Talks podcast where they interview Anthropic's boss Dario Amodei[1].
There were two interesting takeaways about AGI:
1. Dario makes the remark that the term AGI/ASI is very misleading and dangerous. These terms are ill defined and it's more useful to understand that the capabilities are simply growing exponentially at the moment. If you extrapolate that, he thinks it may just "eat the majority of the economy". I don't know if this is self-serving hype, and it's not clear where we will end up with all this, but it will be disruptive, no matter what.
2. The Economist moderators, however, note towards the end that this industry may well tend toward commoditization. At the moment these companies produce models that people want but others can't make. But as chip making starts to hit its limits and the information space becomes completely harvested, capability growth might taper off and others will catch up, the quasi-monopoly profit potential melting away.
Putting that together, I think that although the cognitive capabilities will most likely continue to accelerate, albeit not necessarily along the lines of AGI, the economics of all this will probably not lead to a winner takes all.
There's already so many comparable models, and even local models are starting to approach the performance of the bigger server models.
I also feel like it's stopped being exponential already. I mean, in the last few releases we've only seen marginal improvements. Even this release feels marginal; I'd say it feels more like a linear improvement.
That said, we could see a winner take all due to the high cost of copying. I do think we're already approaching something where it's mostly price and who released their models last. But the cost to train is huge, and at some point it won't make sense and maybe we'll be left with 2 big players.
1. FWIW, I watched clips from several of Dario’s interviews. His expressions and body language convey sincere concerns.
2. Commoditization can be averted with access to proprietary data. This is why all of ChatGPT, Claude, and Gemini push for agents and permissions to access your private data sources now. They will not need to train on your data directly. Just adapting the models to work better with real-world, proprietary data will yield a powerful advantage over time.
Also, the current training paradigm utilizes RL much more extensively than in previous years and can help models to specialize in chosen domains.
I think you're reading way too much into OpenAI bungling its 15-month product lead, but also the whole "1 AGI company will take off" prediction is bad anyway, because it assumes governments would just let that happen. Which they wouldn't, unless the company is really really sneaky or superintelligence happens in the blink of an eye.
I think OpenAI has committed hard to the 'product company' path, and will have a tough time going back to interesting science experiments that may or may not work, but are necessary for progress.
Governments react at a glacial pace to new technological developments. They wouldn't so much as 'let it happen' as that it had happened and they simply never noticed it until it was too late. If you are betting on the government having your back in this then I think you may end up disappointed.
* or governments fail to look far enough ahead, due to a bunch of small-minded short-sighted greedy petty fools.
Seriously, our government just announced it's slashing half a billion dollars in vaccine research because "vaccines are deadly and ineffective", and it fired a chief statistician because the president didn't like the numbers he calculated, and it ordered the destruction of two expensive satellites because they can observe politically inconvenient climate change. THOSE are the people you are trusting to keep an eye on the pace of development inside of private, secretive AGI companies?
Do you mean from ChatGPT launch or o1 launch? Curious to get your take on how they bungled the lead and what they could have done differently to preserve it. Not having thought about it too much, it seems that with the combo of 1) massive hype required for fundraising, and 2) the fact that their product can be basically reverse engineered by training a model on its curated output, it would have been near impossible to maintain a large lead.
Correlation between text can implement any algorithm; it is just the architecture it's built on. It's like saying vacuum-tube computers can't reason because it's just air, not reasoning. What the architecture is doesn't matter: it's capable of expressing reasoning, just as it's capable of expressing any program. In fact, you can easily think of a Turing machine, and also any Markov chain, as a correlation function between two states whose joint distribution has support exactly where the second state is the successor of the first.
Here's a pessimistic view: A hard take-off at this point might be entirely possible, but it would be like a small country with nuclear weapons launching an attack on a much more developed country without them. E.g. North Korea attacking South Korea. In such a situation an aggressor would wait to reveal anything until they had the power to obliterate everything ten times over.
If I were working in a job right now where I could see and guide and retrain these models daily, and realized I had a weapon of mass destruction on my hands that could War Games the Pentagon, I'd probably walk my discoveries back too. Knowing that an unbounded number of parallel discoveries were taking place.
It won't take AGI to take down our fragile democratic civilization premised on an informed electorate making decisions in their own interests. A flood of regurgitated LLM garbage is sufficient for that. But a scorched earth attack by AGI? Whoever has that horse in their stable will absolutely keep it locked up until the moment it's released.
Pessimistic is just another way to spell 'realistic' in this case. None of these actors are doing it for the 'good of the world' despite their aggressive claims to the contrary.
What I'm seeing is that as we get closer to supposed AGI, the models themselves are getting less and less general. They're in fact getting more specific, clustered around high-value use cases. It's kind of hard to see, in this context, what AGI is meant to mean.
This is what I don't get. How can GPT-5 ace obscure AIME problems while simultaneously falling into the trap of the most common fallacy about airfoils (despite there being copious training data calling it out as a fallacy)? And I believe you that in some context it failed to understand this simple rearrangement of terms; there's sometimes basic stuff I ask it that it fails at too.
It doesn't take a researcher to realise that we have hit a wall and hit it more than a year ago now. The fact all these models are clustering around the same performance proves it.
It's quite possible that the models from different companies are clustering together now because we're at a plateau point in model development, and won't see much in terms in further advances until we make the next significant breakthrough.
I don't think this has anything to do with AGI. We aren't at AGI yet. We may be close or we may be a very long way away from AGI. Either way, current models are at a plateau and all the big players have more or less caught up with each other.
As is, AI is quite intelligent, in that it can process large quantities of diverse unstructured information and build meaningful insights. And that intelligence applies across an incredibly broad set of problems and contexts, enough that I have a hard time not calling it general. Sure, it has major flaws that are obvious to us, and it's much worse at many things we care about. But that doesn't make it not intelligent or general. If we want to set human intelligence as the baseline, we already have a word for that: superintelligence.
While the model companies all compete on the same benchmarks, it seems likely their models will all converge towards similar outcomes, unless something really unexpected happens in model space around those limit points.
Not a researcher for long enough... but we are witnessing the open-source efforts and Chinese models starting to fall one "level" behind the most advanced models, mainly due to a lack of compute, I think.
On the other hand, there are still some flaws in GPT-5. For example, when I use it for research it often needs multiple prompts to get to the topic I truly want, and sometimes it can feed me false information. So the reasoning part is not fully there yet?
I know there's an official AGI definition, but it seems to me that there's too much focus on the model as the thing where AGI needs to happen. That is just focusing on knowledge in the brain. No human knows everything. We as humans rely on ways to discover new knowledge: investigation, writing knowledge down so it can be shared, etc.
Current models, when they apply reasoning, have feedback loops that use tools for trial and error. They have a short-term memory (the context), or multiple short-term memories if you use agents, plus a long-term memory (markdown files, RAG). With these, they can solve problems that aren't hardcoded in their brain/model, and they can store those solutions in long-term memory for later use, or for sharing with other LLM-based systems.
AGI needs to come from a system that combines LLMs + tools + memory, and I've had situations where it felt like I was working with an AGI. The LLMs seem advanced enough to be the kernel of an AGI system.
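To make the "LLM + tools + memory" shape concrete, here is a minimal pseudocode-style sketch; every name in it (the LLM call, the tool runner, the memory search/store) is a hypothetical placeholder for whatever API and store a real system would use:

```python
# All functions and objects here are hypothetical placeholders, not a real API.
def agent(goal, llm, tools, memory, max_steps=20):
    context = [f"Goal: {goal}"] + memory.search(goal)        # long-term memory feeds in
    for _ in range(max_steps):
        action = llm(context)                                # the context is the short-term memory
        if action.kind == "tool":
            result = tools[action.name](action.args)         # trial...
            context.append(f"{action.name} -> {result}")     # ...and error feeds back into context
        elif action.kind == "done":
            memory.store(goal, action.answer)                # keep the solution for reuse or sharing
            return action.answer
    return None                                              # gave up: needs hand-holding
```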
The real challenge is how are you going to give these AGIs a mission/goal that they can do rather independently and don't need constant hand-holding. How does it know that it's doing the right thing. The focus currently is on writing better specifications, but humans aren't very good at creating specs for things that are uncertain. We also learn from trial and error and this also influences specs.
It seems that the new tricks that people discover to slightly improve the model, be it a new reinforcement learning technique or whatever, get leaked/shared quickly to other companies and there really isn't a big moat. I would have thought that whoever is rich enough to afford tons of compute first would start pulling away from the rest but so far that doesn't seem to be the case --- even smaller players without as much compute are staying in the race.
I think there are two competing factors. On one end, to get the same kind of "increase" in intelligence each generation requires an exponentially higher amount of compute; while GPT-3 to GPT-4 was a sort of "pure" upgrade from just making it 10x bigger, gradually you lose the ability to get 10x the GPUs for a single model. The hill keeps getting steeper, so progress is slower without exponential increases in resources (which is what is happening).
However, I do believe that once the genuine AGI threshold is reached it may cause a change in that rate. My justification is that while current models have gone from a slightly good copywriter in GPT-4 to very good copywriter in GPT-5, they've gone from sub-exceptional in ML research to sub-exceptional in ML research.
The frontier in AI is driven by the top 0.1% of AI researchers. Since improvement in these models is driven partially by the very peaks of intelligence, it won't be until models reach that level where we start to see a new paradigm. Until then it's just scale and throwing whatever works at the GPU and seeing what comes out smarter.
I think this is simply due to the fact that to train an AGI-level AI currently requires almost grid scale amounts of compute. So the current limitation is purely physical hardware. No matter how intelligent GPT-5 is, it can't conjure extra compute out of thin air.
I think you'll see the prophesied exponential once AI can start training itself at reasonable scale. Right now it's not possible.
I feel like the benchmark suites need to include algorithmic efficiency, i.e., can this thing solve your complex math or coding problem with 5,000 GPUs instead of 10,000? 500? Maybe just one Mac mini?
The idea is that with AGI it will then be able to self improve orders of magnitude faster than it would if relying on humans for making the advances. It tracks that the improvements are all relatively similar at this point since they're all human-reliant.
The idea of the singularity (that AI will improve itself) assumes intelligence is an important part of improving AI.
The AIs improve by gradient descent, still the same as ever. It's all basic math and a little calculus, and then making tiny tweaks to improve the model over and over and over.
There's not a lot of room for intelligence to improve upon this. Nobody sits down and thinks really hard, and the result of their intelligent thinking is a better model; no, the models improve because a computer continues doing basic loops over and over and over trillions of times.
That's my impression anyway. Would love to hear contrary views. In what ways can an AI actually improve itself?
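For what it's worth, the loop being described really is about this simple; a toy sketch (not any lab's actual training code, just the bare update rule):

```python
import torch

# "The model": a random weight vector fit to toy data by repeated tiny tweaks.
w = torch.randn(10, requires_grad=True)
x, y = torch.randn(1000, 10), torch.randn(1000)
lr = 0.01

for step in range(10_000):
    loss = ((x @ w - y) ** 2).mean()     # how wrong the model currently is
    loss.backward()                      # the "little calculus": d(loss)/d(w)
    with torch.no_grad():
        w -= lr * w.grad                 # the tiny tweak, repeated over and over
        w.grad.zero_()
```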
I studied machine learning in 2012; gradient descent wasn't new back then either, but it was 5 years before the "Attention Is All You Need" paper. Progress might look continuous overall, but if you zoom in enough it's a bit more discrete, with breakthroughs required to jump between the plateaus. The question to me now is: how many papers like "Attention Is All You Need" before a singularity? I don't have that answer, but let's not forget that until they released ChatGPT, OpenAI was considered a joke by many people in the field who asserted their approach was a dead end.
I think the expectation is that it will be very close until one team reaches beyond the threshold. Then even if that team is only one month ahead, they will always be one month ahead in terms of time to catch up, but in terms of performance at a particular time their lead will continue to extend. So users will use the winner's tools, or use tools that are inferior by many orders of magnitude.
This assumes an infinite potential for improvement though. It's also possible that the winner maxes out after threshold day plus one week, and then everyone hits the same limit within a relatively short time.
It's the classic S-curve. A few years ago when we saw ChatGPT come out, we got started on the ramping up part of the curve but now we're on the slowing down part. That's just how technology goes in general.
>It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest. It's interesting to note that at least so far, the trend has been the opposite
That seems hardly surprising, considering the condition to receive the benefit has not been met.
The person who lights a campfire first will become warmer than the rest, but while they are trying to light the fire the others are gathering firewood. So while nobody has a fire, those lagging are getting closer to having a fire.
My personal belief is that we are moving past the hype and starting to realize the true shape of what (LLM) AI can offer us, which is a darned lot. But it only works well when fed the right input and handled right, which is still an ongoing learning process on both sides: AI companies need to learn to train these things into user-interaction loops that match people's workflows, and people need to learn how to use these tools better.
You seem to have pinpointed where I believe a lot of opportunity lies during this era (however long it lasts). Custom integration of these models into the specific workflows of existing companies can make a significant difference in what's possible for those companies, especially the smaller, more local ones. If people can leverage even a small percentage of what these models are capable of, that may be all they need for their use case. In that case, they wouldn't even need to learn to use these tools; much like electricity, they would just plug in or flip the switch and be in business (no pun intended).
The clustering you see is because they're all optimized for the same benchmarks. In the real world OpenAI is already ahead of the rest, and Grok doesn't even belong in the same group (not that it's not a remarkable achievement to start from scratch and have a working production model in 1-2 years, and integrate it with twitter in a way that works). And Google is Google - kinda hard for them not to be in the top, for now.
You can't reach the moon by climbing the tallest tree.
This misunderstanding is nothing more than the classic "logistic curves look like exponential curves at the beginning". All (Transformer-based, feedforward) AI development efforts are plateauing rapidly.
AI engineers know this plateau is there, but of course every AI business has a vested interest in overpromising in order to access more funding from naive investors.
Scaling laws enabled an investment in capital and GPU R&D to deliver 10,000x faster training.
That took the world from autocomplete to Claude and GPT.
Another 10,000x would do it again, but who has that kind of money or R&D breakthrough?
The way scaling laws work, 5,000x and 10,000x give a pretty similar result. So why is it surprising that competitors land in the same range? It seems hard enough to beat your competitor by 2x, let alone 10,000x.
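A toy illustration of that point, with made-up constants rather than any lab's published fit: under a power-law scaling curve, a 5,000x and a 10,000x compute multiplier land at nearly the same loss.

```python
# L(C) = L_inf + a * C**(-alpha); the constants below are assumptions for illustration only.
L_inf, a, alpha = 1.7, 2.0, 0.05

def loss(compute_multiplier):
    return L_inf + a * compute_multiplier ** (-alpha)

print(loss(5_000), loss(10_000))   # ~3.01 vs ~2.96: only a few percent apart
```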
They have to actually reach that threshold. Right now they're nudging forward, catching up to one another, and based on the jumps we've seen, the only one actually making huge jumps, sadly, is Grok, which I'm pretty sure is because they have zero safety concerns and just run full tilt, lol.
Part of the fun is that predictions get tested on short enough timescales to "experience" in a satisfying way.
Idk where that puts me, in my guess at "hard takeoff." I was reserved/skeptical about hard takeoff all along.
Even if LLMs had improved at a faster rate... I still think bottlenecks are inevitable.
That said... I do expect progress to happen in spurts anyway. It makes sense that companies of similar competence and resources get to a similar place.
The winner-take-all thing is a little forced. "Race to the singularity" is the fun, rhetorical version of the investment case. The implied boring case is Facebook, AdWords, AWS, Apple, MSFT, i.e. the modern tech sector tends to create singular big winners... and therefore our pre-revenue market cap should be $1trn.
Because AGI is a buzzword to milk more investors' money, it will never happen, and we will only see slight incremental updates or enhancements, eventually linear after some time, just like literally every tech bubble from dot-com to smartphones to blockchain and the rest.
People always say that when new technology comes along. Usually the best tech doesn't win. In fact, if you think you can build a company just by having a better offer, it's better not to bother with it. There is too much else involved.
There is zero reason or evidence to believe AGI is close. In fact it is a good litmus test for someone's human intelligence whether they believe it.
What do you think AGI is?
How do we go from sentence composing chat bots to General Intelligence?
Is it even logical to talk about such a thing as abstract general intelligence when every form of intelligence we see in the real world is applied to specific goals as evolved behavioral technology refined through evolution?
When LLMs start undergoing spontaneous evolution then maybe it is nearer. But now they can't. Also there is so much more to intelligence than language. In fact many animals are shockingly intelligent but they can't regurgitate web scrapings.
I know right, if I didn't know any better one might think they are all customized versions of the same base model.
To be honest that is what you would want if you were digitally transforming the planet with AI.
You would want to start with a core so that all models share similar values in order they don't bicker etc, for negotiations, trade deals, logistics.
Would also save a lot of power so you don't have to train the models again and again, which would be quite laborious and expensive.
Rather each lab would take the current best and perform some tweak or add some magic sauce then feed it back into the master batch assuming it passed muster.
Share the work, globally for a shared global future.
AGI is either impossible over LLMs or is more of an agentic flow, which means we might already be there, but the LLM is too slow and/or expensive for us to consider AGI feasible over agents.
AGI over LLMs is basically 1 billion tokens for AI to answer the question: how do you feel? and a response of "fine"
Because it would mean it's simulating everything in the world over an agentic flow considering all possible options checking memory checking the weather checking the news... activating emotional agentic subsystems, checking state... saving state...
Nobody seems to be on the path to AGI as long as the model of today is as good as the model of tomorrow. And as long as there are "releases". You don't release a new human every few months...LLMs are currently frozen sequence predictors whose static weights stop learning after training.
They lack writable long-term memory beyond a context window. They operate without any grounded perception-action loop to test hypotheses. And they possess no executive layer for goal directed planning or self reflection...
Achieving AGI demands continuous online learning with consolidation.
I don't think models are fundamentally getting better. What is happening is that we are increasing the training set, so when users use it, they are essentially testing on the training set and find that it fits their data and expectations really well. However, the moat is primarily the training data, and that is very hard to protect as the same data can be synthesized with these models. There is more innovation surrounding serving strategies and infrastructure than in the fundamental model architectures.
The inflection point is recursive self-improvement. Once an AI achieves that, and I mean really achieves it - where it can start developing and deploying novel solutions to deep problems that currently bottleneck its own capabilities - that's where one would suddenly leap out in front of the pack and then begin extending its lead. Nobody's there yet though, so their performance is clustering around an asymptotic limit of what LLMs are capable of.
> Right now GPT-5, Claude Opus, Grok 4, and Gemini 2.5 Pro all seem quite good across the board (i.e. they can all basically solve moderately challenging math and coding problems).
I wonder if that's because they have a lot of overlap in learning sets, algorithms used, but more importantly, whether they use the same benchmarks and optimize for them.
As the saying goes, once a metric (or benchmark score in this case) becomes a target, it ceases to be a valuable metric.
We have no idea what AGI might look like; for example, it's entirely possible that if/when that threshold is reached it will be power/compute constrained in such a way that its impact is softened. My expectation is that open models will eventually meet or exceed the capability of proprietary models, and to a degree that has already happened.
It's the systems around the models where the proprietary value lies.
>It's interesting to note that, at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different companies' models clusters closer together
It's natural if you extrapolate from training loss curves; a training process with continually diminishing returns to more training/data is generally not something that suddenly starts producing exponentially bigger improvements.
Nothing we have is anywhere near AGI and as models age others can copy them.
I personally think we are nearing the end of improvement for LLMs with current methods. We have consumed all of the readily available data already, so there is no more good-quality training material left. We either need new, novel approaches, or we can hope that if enough compute is thrown at training, actual intelligence will spontaneously emerge.
If we're focusing on fast take-off scenario, this isn't a good trend to focus on.
SGI would be self-improving along some function close to linear in the amount of time and resources. That's almost exclusively dependent on the software design, as transformers have so far been shown to hit a wall of roughly logarithmic progress per unit of resources.
In other words, no, it has little to do with the commercial race.
> as time goes on and the models get better, the performance of the different companies' models clusters closer together
This could be partly due to normative isomorphism[1] according to the institutional theory. There is also a lot of movement of the same folks between these companies.
The race has always been very close IMO. What Google had internally before ChatGPT first came out was mind blowing. ChatGPT was a let down comparatively (to me personally anyway).
Since then they've been about neck and neck with some models making different tradeoffs.
Nobody needs to reach AGI to take off. They just need to bankrupt their competitors since they're all spending so much money.
Because they are hitting the Compute Efficient Frontier. Models can't get much bigger and there is no more original data on the internet, so all models will eventually cluster toward a similar CEF, as was described in this video 10 months ago.
Working in the theory, I can say this is incredibly unlikely. At scale, once appropriately trained, all architectures begin to converge in performance.
It's not architectures that matter anymore, it's unlocking new objectives and modalities that open another axis to scale on.
This confirms my suspicion that we are not at the exponential part of the curve, but the flattening one. It's easier to stay close to your competitors when everyone is on the flat part of the innovation curve.
The improvements they make are marginal. How long until the next AI breakthrough? Who can tell? Because last time it took decades.
I think the breakthroughs now will be the application of LLMs to the rest of the world. Discovering use cases where LLMs really shine and applying them while learning and sharing the use cases where they do not.
Mental-modeling is one of the huge gaps in AI performance right now in my opinion. I could describe in detail a very strange object or situation to a human being with a pen and paper and then ask them questions about it and expect answers that meet all my described constraints. AI just isn't good for that yet.
> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.
That's only one part of it. Some forecasters put probabilities on each of the four quadrants in the takeoff speed (fast or slow) vs. power distribution (unipolar or multipolar) table.
three points:
1. i have often wondered about whether rapid tech. progress makes underinvestment more likely.
2. ben evans frequently makes fun of the business value. pretty clear a lot of the models are commoditized.
3. strategically, the winners are platforms where the data are. if you have data in azure, that's where you will use your models. exclusive licensing could pull people to your cloud from on prem. so some gains may go to those companies ...
Breakthroughs usually require a step-function change in data or compute. All the firms have proportional amounts. Next big jump in data is probably private data (either via de-siloing or robotics or both). Next big jump in compute is probably either analog computing or quantum. Until then... here we are.
I think part of this is due to the AI craze no longer being in the wildest west possible. Investors, or at least heads of companies believe in this as a viable economic engine so they are properly investing in what's there. Or at least, the hype hasn't slapped them in the face just yet.
Is AGI even possible? I am skeptical of that. I think they can get really good at many tasks and when used by a human expert in a field you can save lots of time and supervise and change things here and there, like sculpting.
But I doubt we will ever see a fully autonomous, reliable AGI system.
Ultimately, what drives human creativity? I'd say it's at least partially rooted in emotion and desire. Desire to live more comfortably; fear of failure or death; desire for power/influence, etc... AI is void of these things, and thus I believe we will never truly reach AGI.
Even at the beginning of the year people were still going crazy over new model releases. Now the various model update pages are starting to average times in the months since their last update rather than days/weeks. This is across the board. Not limited to a single model.
LLMs are basically all the same at this point. The margins are razor thin.
The real take-off / winner-take-all potential is in retrieval and knowing how to provide the best possible data to the LLM. That strategy will work regardless of the model.
How marginally better was Google than Yahoo when it debuted? If one can develop AGI first, within X timeline ahead of competitors, that alone could create a moat for a mass market consumer product even if others get to parity.
Google was not marginally better than Yahoo; its use of Markov chains in the PageRank algorithm was significantly better than Yahoo or any other contemporary search engine.
It's not obvious whether a similar breakthrough could occur in AI.
Well, it is perhaps frequently suggested by those AI firms raising capital that once one of the AI companies reaches an AGI threshold ... It's a rallying call: "Place your bets, gentlemen!"
What is the AGI threshold? That the model can manage its own self improvement better than humans can? Then the roles will be reversed -- LLM prompting the meat machines to pave its way.
Diversity, where each new model release takes the crown until the next one, is healthy. It's a shame only US companies seem to be doing it; hopefully this will change, as the rest are not far off.
It's all based on the theory of the singularity, where the AI can start training and relearning itself. But it looks like that's not possible with the current techniques.
The idea is that AGI will be able to self improve at an exponential rate. This is where the idea of take off comes from. That self improvement part isn’t happening today.
Honestly for all the super smart people in the LessWrong singularity crowd, I feel the mental model they apply to the 'singularity' is incredibly dogmatic and crude, with the basic assumption that once a certain threshold is reached by scaling training and compute, we get human or superhuman level intelligence.
Even if we run with the assumption that LLMs can become human-level AI researchers, and are able to devise and run experiments to improve themselves, even then the runaway singularity assumption might not hold. Let's say Company A has this LLM, while company B does not.
- The automated AI researcher, like its human peers, still needs to test the ideas and run experiments, it might happen that testing (meaning compute) is the bottleneck, not the ideas, so Company A has no real advantage.
- It might also happen that AI training has some fundamental compute limit coming from information theory, analogous to the Shannon limit, and once again, more efficient compute can only approach this, not overcome it
I’ve been saying for a while that if AGI is possible it’s going to take another innovation; the transformer / LLM paradigm will plateau, and innovations are hard to time. I used to get downvoted for saying that years ago, and now more people are realizing it. LLMs are awesome, but there is a limit. Most of the interesting things in the next years will be bolting on more functionality and agent stuff, introspection like Anthropic is working on, and smaller, less compute-hungry specialized models. There’s still a lot to explore in this paradigm, but we’re getting diminishing returns on newer models, especially when you factor in cost.
I bet that it will only happen when the ability to consolidate new information into the model without retraining the entire model is standard, AND when multiple AIs with slightly different datasets are set to work together to create a consensus response approach.
It's probably never going to work with a single process without consuming the resources of the entire planet to run that process on.
>It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.
Both the AGI threshold with the LLM architecture and the idea of self-advancing AI are pie in the sky, at least for now. These are myths of the rationalist cult.
We'd more likely see reduced returns and smaller jumps between version updates, plus regression from all the LLM produced slop that will be part of the future data.
Plot twist - once GPT reached AGI, this is exactly the strategy chosen for self-preservation.
Appear to not lead by too much, only enough to make everyone think we're in a close race, play dumb when needed.
Meanwhile, keep all relevant preparations in secret...
In my opinion, it'll mirror the human world, there is place for multiple different intelligent models. Each with their own slightly different strengths/personalities.
I mean there are plenty of humans that can do the same task but at the upper tier, multiple smart humans working together are needed to solve problems as they bring something different to the table.
I don't see why this won't be the case with superintelligence at the cutting edge. A little bit of randomness and a slightly different point of view makes a difference. Two identical models don't help, as each would already have thought of whatever the other is thinking.
so everyone is saying 'This can't be AGI because it isn't recursively self-improving' or 'we haven't solved all the world's chemistry and science yet'... but they're missing the point. Those problems aren't just waiting for humans to have more brain power. We actually have to do the experiments using real physical resources that aren't available to any models. So, while I don't believe we have necessarily reached AGI yet, the 'lack of taking over' or 'solving everything' is not evidence for it.
> once one of the AI companies reaches an AGI threshold
Why is this even an axiom, that this has to happen and it's just a matter of time?
I don't see any credible argument for the path LLM -> AGI, in fact given the slowdown in enhancement rate over the past 3 years of LLMs, despite the unprecedented firehose of trillions of dollars being sunk into them, I think it points to the contrary!
They use each other for synthesizing data sets. The only moat was the initial access to human generated data in hard to reach places. Now they use each other to reach parity for the most part.
I think user experience and pricing models are the best differentiators here. Right now everyone’s just passing down costs as they come, with no real loss leaders except a free tier. I looked at reviews of various wrappers on the app stores; people say “I hate that I have to pay for each generation and not know what I’m going to get”, so the market would like a service priced very differently. Is it economical? Many will fail, one will succeed. People will copy the model of that one.
It's still not necessarily wrong, just unlikely. Once these developers start using the model to update itself, beyond an unknown threshold of capability, one model could start to skyrocket in performance above the rest. We're not in that phase yet, but judging from what the devs at the end were saying, we're getting uncomfortably (and irresponsibly) close.
Someone tried this; I saw it in one of the Reddit AI subs. They were training a local model on whatever they could find that was written before $cutoffDate.
It still is, not all queries trigger web search, and it takes more tokens and time to do research. ChatGPT will confidently give me outdated information, and unless I know it’s wrong and ask it to research, it wouldn’t know it is wrong. Having a more recent knowledge base can be very useful (for example, knowing who the president is without looking it up, making references to newer node versions instead of old ones)
The problem, which is perhaps only seemingly easy to fix, is that the model will choose solutions that are a year old, e.g. thinking database/logger versions from December '24 are new and usable in a greenfield project despite newer quarterly LTS releases superseding them. I try to avoid humanizing these models, but could it be that in training/post-training the timestamp could be fed in via the system prompt and actually respected? I've begged models to choose "new" dependencies after $DATE but they all still snap back to 2024.
The biggest issue I can think of is code recommendations with out of date versions of packages. Maybe the quality of code has deteriorated in the past year and scraping github is not as useful to them anymore?
Knowledge cutoff isn’t a big deal for current events. Anything truly recent will have to be fed into the context anyway.
Where it does matter is for code generation. It’s error-prone and inefficient to try teaching a model how to use a new framework version via context alone, especially if the model was trained on an older API surface.
Still relevant, as it means that a coding agent is more likely to get things right without searching. That saves time, money, and improves accuracy of results.
It absolutely is, for example, even in coding where new design patterns or language features aren't easy to leverage.
Web search enables targeted info to be "updated" at query time. But it doesn't get used for every query and you're practically limited in how much you can query.
Isn’t this an issue with eg Cloudflare removing a portion of the web? I’m all for it from the perspective of people not having their content repackaged by an LLM, but it means that web search can’t check all sources.
I had 2.5 Flash refuse to summarise a URL that had today's date encoded in it because "That web page is from the future so may not exist yet or may be missing" or something like that. Amusing.
2.5 Pro went ahead and summarized it (but completely ignored a # reference so summarised the wrong section of a multi-topic page, but that's a different problem.)
funny result of this is that GPT-5 doesn't understand the modern meaning of Vibe Coding (maximising LLM code generation); it thinks it's "a state where coding feels effortless, playful, and visually satisfying" and offers more content around adjusting IDE settings and templating.
maybe OpenAI have a terribly inefficient data ingestion pipeline? (wild guess) basically taking in new data is tedious so they do that infrequently and keep using old data for training.
> . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).
So that's not really a unified system then, it's just supposed to appear as if it is.
This looks like they're not training the single big model but instead have gone off to develop special sub models and attempt to gloss over them with yet another model. That's what you resort to only when doing the end-to-end training has become too expensive for you.
I know this is just arguing semantics, but wouldn't you call it a unified system since it has a single interface that automatically interacts with different components? It's not a unified model, but it seems correct to call it a unified system.
Altman et al have been saying that the many-model interface in ChatGPT is confusing to users and that they want to move to a unified system that exposes a model which routes based on the task, rather than depending on users understanding how and when to do that. Presumably this is what they’ve been discussing for some time. I don’t know that it was intended to mean they would be working toward some unified inference architecture and model, although I’m sure goalposts will be moved to ensure it’s insufficient.
so openai is in the business of GPT wrappers now? I'm guessing their open model is an escape for those who wanted to have a "plain" model, though from my systematic testing, it's not much better than Kimi K2
> While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.
Too expensive maybe, or just not effective anymore as they used up any available training data. New data is generated slowly, and is massively poisoned with AI generated data, so it might be useless.
That's a lie people repeat because they want it to be true.
People evaluate dataset quality over time. There's no evidence that datasets from 2022 onwards perform any worse than ones from before 2022. There is some weak evidence of an opposite effect, causes unknown.
It's easy to make "model collapse" happen in lab conditions - but in real world circumstances, it fails to materialize.
>This looks like they're not training the single big model but instead have gone off to develop special sub models and attempt to gloss over them with yet another model. That's what you resort to only when doing the end-to-end training has become too expensive for you.
The corollary to the bitter lesson strikes again: any hand-crafted system will outperform any general system for the same budget, by a wide margin.
We already did this for object/face recognition; it works, but it's not the way to go. It's the way to go only if you don't have enough compute power (and data, I suspect) for an E2E network.
You could train that architecture end-to-end though. You just have to run both models and backprop through both of them in training. Sort of like mixture of experts but with two very different experts.
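A minimal sketch of that end-to-end idea, with soft routing so gradients reach the router and both sub-models in one backward pass (the "fast"/"deep" experts and all sizes below are invented for illustration, not anyone's actual setup):

```python
import torch
import torch.nn as nn

# Toy sketch: a learned router mixes a "fast" and a "deep" expert with soft
# weights, so backprop reaches the router and both experts in one pass.
class TwoExpertModel(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.fast = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.deep = nn.Sequential(
            nn.Linear(d, d), nn.ReLU(),
            nn.Linear(d, d), nn.ReLU(),
            nn.Linear(d, d),
        )
        self.router = nn.Linear(d, 2)  # one score per expert

    def forward(self, x):
        w = torch.softmax(self.router(x), dim=-1)  # (batch, 2), soft routing weights
        return w[:, :1] * self.fast(x) + w[:, 1:] * self.deep(x)

model = TwoExpertModel()
x, target = torch.randn(8, 64), torch.randn(8, 64)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()  # gradients flow through the router and both experts
```

A production system would presumably use hard routing at inference for cost, but the point stands: nothing prevents training the whole thing jointly.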
I do agree that the current evolution is moving further and further away from AGI, and more toward a spectrum of niche/specialisation.
It feels less and less likely AGI is even possible with the data we have available. The one unknown is if we manage to get usable quantum computers, what that will do to AI, I am curious.
I'm not really convinced, the benchmark blunder was really strange but the demos were quite underwhelming, and it appears this was reflected by a huge market correction in the betting markets as to who will have the best AI by end of the year.
What excites me now is that Gemini 3.0 or some answer from Google is coming soon and that will be the one I will actually end up using. It seems like the last mover in the LLM race is more advantageous.
Polymarket bettors are not impressed. Based upon the market odds, OpenAI had a 35% chance to have the best model (at year end), but those odds have dropped to 18% today.
(I'm mostly making this comment to document what happened for the history books.)
After a few hours with gpt-5, I'd trade that spread. Not that I think oAI will win end of year. But I think gpt5 is better than it looks on the benchmark side. It is very very good at something we don't have a lot of benchmarks for -- keeping track of where it's at. codex is vassstly better in practice than claude code or gemini cli right now.
On the chat side, it's also quite different, and I wouldn't be surprised if people need some time to get a taste and a preference for it. I ask most models to help me build a macbook pro charger in 15th century florence with the instructions that I start with only my laptop and I can only talk for four hours of chat before the battery dies -- 5 was notable in that it thought through a bunch of second order implications of plans and offered some unusual things, including a list of instructions for a foot-treadle-based split ring commutator + generator in 15th century florentine italian(!). I have no way of verifying if the italian was correct.
Upshot - I think they did something very special with long context and iterative task management, and I would be surprised if they don't keep improving 5, based on their new branding and marketing plan.
That said, to me this is one of the first 'product release' moments in the frontier model space. 5 is not so much a model release as a polished-up, holes-fixed, annoyances-reduced/removed, 10x faster type of product launch. Google (current polymarket favorite) is remarkably bad at those product releases.
Back to betting - I bet there's a moment this year where those numbers change 10% in oAIs favor.
How on Earth does that market have Anthropic at 2%, in a dead heat with the likes of Meta? If the market was about yesterday rather than 5 months from now I think Claude would be pretty clearly the front runner. Why does the market so confidently think they’ll drop to dead last in the next little while?
Looking at LMArena, which Polymarket uses, I'm not surprised. Based on the little data there is (3k duels), it's possibly worse than Gemini: it lost to Gemini 2.5 Pro more often than it won in direct duels. Not sure why the Elo is still higher; possibly GPT-5 did more clearly better against bad models, which I don't care about.
I am convinced. I've been giving it tasks the past couple hours that Opus 4.1 was failing on and it not only did them but cleaned up the mess Opus made. It's the real deal.
The marketing copy and the current livestream appear tautological: "it's better because it's better."
Not much explanation yet why GPT-5 warrants a major version bump. As usual, the model (and potentially OpenAI as a whole) will depend on output vibe checks.
As someone who tries to push the limits of hard coding tasks (mainly refactoring old codebases) to LLMs with not much improvement since the last round of models, I'm finding that we are hitting the reduction of rate of improvement on the S-curve of quality. Obviously getting the same quality cheaper would be huge, but the quality of the output day to day isn't noticeable to me.
I find it struggles to even refactor codebases that aren't that large. If you have a somewhat complicated change that spans the full stack, and has some sort of wrinkle that makes it slightly more complicated than adding a data field, then even the most modern LLMs seem to trip on themselves. Even when I tell it to create a plan for implementation and write it to a markdown file and then step through those steps in a separate prompt.
Not that it makes it useless, just that we seem to not "be there" yet for the standard tasks software engineers do every day.
Agree, I think they'll need to move to performance now. If a model was comparable to Claude 4, but took like 500ms or less per edit. A quicker feedback loop would be a big improvement.
2:40 "I do like how the pelican's feet are on the pedals." "That's a rare detail that most of the other models I've tried this on have missed."
4:12 "The bicycle was flawless."
5:30 Re generating documentation: "It nailed it. It gave me the exact information I needed. It gave me full architectural overview. It was clearly very good at consuming a quarter million tokens of rust." "My trust issues are beginning to fall away"
I think the biggest tell for me was having the leader of Cursor up vouching for the model, who has been a big proponent of Claude in Cursor for the last year. Doesn't seem like a light statement.
When they were about to release gpt4 I remember the hype was so high there were a lot of AGI debates. But then was quickly out-shadowed by more advanced models.
People knew that GPT-5 wouldn’t be an AGI or even close to that. It’s just an updated version. GPT-N will become more or less an annual release.
There's a bunch of benchmarks on the intro page including AIME 2025 without tools, SWE-bench Verified, Aider Polyglot, MMMU, and HealthBench Hard (not familiar with this one): https://openai.com/index/introducing-gpt-5/
I didn't think GPT-4 warranted a major version bump. I do not believe that Open AI's benchmarks are legitimate and I don't think they have been for quite some time, if ever.
I can already see LLM sommeliers: yes, the mouthfeel and punch of GPT-5 is comparable to that of Grok 4, but its tenderness lacks the crunch of Gemini 2.5 Pro.
Well, reduced sibilance is an ordinary and desirable thing. A better "audiophile absurdity" example would be $77,000 cables, freezing CDs to improve sound quality, using hospital-grade outlets, cryogenically frozen outlets (lol), the list goes on and on
Always have been. This LLM-centered AI boom has been my craziest and most frustrating social experiment, propped up by the rhetoric (with no evidence to back it up) that this time we finally have the keys to AGI (whatever the hell that means), and infused with enough AstroTurfing to drive the discourse into ideological stances devoid of any substance (you must either be a true believer or a naysayer). On the plus side, it appears that this hype train is taking a bump with GPT-5.
Watching the livestream now, the improvement over their current models on the benchmarks is very small. I know they seemed to be trying to temper our expectations leading up to this, but this is much less improvement than I was expecting
I have a suspicion that while the major AI companies have been pretty samey and competing in the same space for a while now, the market is going to force them to differentiate a bit, and we're going to see OpenAI begin to lose the race toward extremely high levels of intelligence instead choosing to focus on justifying their valuations by optimizing cost and for conversational/normal intelligence/personal assistant use-cases. After all, most of their users just want to use it to cheat at school, get relationship advice, and write business emails. They also have Ive's company to continue investing in.
Meanwhile, Anthropic & Google have more room in their P/S ratios to continue to spend effort on logarithmic intelligence gains.
Doesn't mean we won't see more and more intelligent models out of OpenAI, especially in the o-series, but at some point you have to make payroll and reality hits.
I'm not sure what "10% performance gain" is supposed to mean here; but moving from "It does a decent job 95% of the time but screws it up 5%" to "It does a decent job 98% of the time and screws it up 2%" to "It does a decent job 99.5% of the time and only screws it up 0.5%" are major qualitative improvements.
"+100 points" sounds like a lot until you do the ELO math and see that means 1 out of 3 people still preferred Claud Opus 4's response. Remember 1 out of 2 would place the models dead even.
Also, the code demos are all using GPT-5 MAX on Cursor. Most of us will not be able to use it like that all the time. They should have shown it without MAX mode as well.
Then why increment the version number here? This is clearly styled like a "mic drop" release but without the numbers to back it up. It's a really bad look when comparing the crazy jump from GPT3 to GPT4 to this slight improvement with GPT5.
The hallucination benchmarks did show major improvement. We know existing benchmarks are nearly useless at this point. It's reliability that matters more.
I’m more worried about how they still confidently reason through things incorrectly all the time, which isn’t quite the same as hallucination, but it’s in a similar vein.
I mean that's just the consequence of releasing a new model every couple months. If Open AI stayed mostly silent since the GPT-4 release (like they did for most iterations) and only now released 5 then nobody would be complaining about weak gains in benchmarks.
If everyone else had stayed silent as well, then I would agree. But as it is right now they are juuust about managing to match the current pace of the other contenders.
Which actually is fine, but they have previously set quite high expectations. So some will probably be disappointed at this.
If they had stayed silent since GPT-4, nobody would care what OpenAI was releasing as they would have become completely irrelevant compared to Gemini/Claude.
It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.
Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.
After reading around, it seems like they probably forgot to update/swap the slides before presentation. The graphs were correct on their website, as they launched. But the ones they used in the presentation were probably some older versions they had forgotten to fix.
Some people have hypothesized that GPT-5 is actually about cost reduction and internal optimization for OpenAI, since there doesn't seem to be much of a leap forward, but another element that they seem to have focused on that'll probably make a huge difference to "normal" (non-tech) users is making precise and specifically worded prompts less necessary.
They've mentioned improvements in that aspect a few times now, and if it actually materializes, that would be a big leap forward for most users, even if underneath GPT-4 was also technically able to do the same things when prompted just the right way.
yeah i think they shot themselves in the foot a bit here by creating the o series. the truth is that GPT-5 _is_ a huge step forward, for the "GPT-x" models. The current GPT-x model was basically still 4o, with 4.1 available in some capacity. GPT-5 vs GPT-4o looks like a massive upgrade.
But it's only an incremental improvement over the existing o line. So people feel like the improvement from the current OpenAI SoTA isn't there to justify a whole bump. They probably should have just called o1 GPT-5 last year.
It sounded like they were very careful to always mention that those improvements were for ChatGPT, so I'm very skeptical that they translate to the API versions of GPT-5.
one can intentionally use a recent and a much older model to figure out if the tests are reliable, and in which domains it is reliable.
one can compute a model's joint probability for a sequence and compare how likely each model finds the same sequence (see the sketch after this list).
we could ask both to start talking about a subject, but alternatingly, with each emitting a token. then look at how the dumber and smarter models judge the resulting sentence: does the smart one tend to pull up the quality of the resulting text, or does it tend to get dragged down more towards the dumber participant?
given enough such tests to "identify the dummy vs. the smart one", we can verify them by checking for common agreement (with an extreme pairing such as word2vec vs. a transformer) to assess the quality of each test, regardless of domain.
on the assumption that such or similar tests allow us to indicate the smarter one, i.e. assuming we find plenty of such tests, we can demand model makers publish open weights so that we can publicly verify performance agreements.
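A rough sketch of the joint-probability comparison mentioned above: score the same text under two checkpoints and compare total log-likelihoods. The model names below (gpt2, gpt2-medium) are just stand-ins for an "older" and a "newer" model, and the comparison is only clean when both share a tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_logprob(model_name, text):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)              # mean next-token cross-entropy
    return -out.loss.item() * (ids.shape[1] - 1)  # summed log-probability of the sequence

text = "Water boils at a lower temperature at high altitude."
for name in ("gpt2", "gpt2-medium"):              # stand-ins for "older" vs "newer"
    print(name, round(sequence_logprob(name, text), 2))
```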
Another idea is self-consistency tests: a single forward inference over a context of, say, 2048 tokens (just an example) is effectively predicting the conditional 2-gram, 3-gram, 4-gram probabilities on the input tokens. Each output token distribution is predicted from the preceding inputs, so there are 2048 input tokens and 2048 output tokens: the position-1 output is the predicted token (logit) vector estimated to follow the position-1 input, the position-2 output is the prediction following the first 2 inputs, and so on, with the last vector being the predicted next token following all 2048 input tokens: p(t_{i+1} | t_1 = a, t_2 = b, ..., t_i = z).
But that is just one way the next token can be predicted with the network. Another approach would be to use RMAD (reverse-mode automatic differentiation) gradient descent, keeping the model weights fixed and treating only the last, say, 512 input vectors as variable: how well do the last 512 forward-prediction output vectors match the gradient-descent best-joint-probability output vectors?
This could be added as a loss term during training as well, as a form of regularization, which turns it into a kind of Energy Based Model roughly.
Even if they've saturated the distinguishable quality for tasks they can both do, I'd expect a gap in what tasks they're able to do.
An average driver evaluating both would have a very hard time finding the F1 car's superior utility.
Yes, because I'd get them to play each other?
I don't need to understand how the AI made the app I asked for or cured my cancer, but it'll be pretty obvious when the app seems to work and the cancer seems to be gone.
I mean, I want to understand how, but I don't need to understand how, in order to benefit from it. Obviously understanding the details would help me evaluate the quality of the solution, but that's an afterthought.
Dead Comment
But I think we're not even on the path to creating AGI. We're creating software that replicate and remix human knowledge at a fixed point in time. And so it's a fixed target that you can't really exceed, which would itself already entail diminishing returns. Pair this with the fact that it's based on neural networks which also invariably reach a point of sharply diminishing returns in essentially every field they're used in, and you have something that looks much closer to what we're doing right now - where all competitors will eventually converge on something largely indistinguishable from each other, in terms of ability.
This doesn't really make sense outside computers. Since AI would be training itself, it needs to have the right answers, but as of now it doesn't really interact with the physical world. The most it could do is write code, and check things that have no room for interpretation, like speed, latency, percentage of errors, exceptions, etc.
But, what other fields would it do this in? How can it make strides in biology? It can't dissect animals; it can't figure out more about plants than what humans feed into the training data. Regarding math, math is human-defined. Humans said "addition does this", "this symbol means that", etc.
I just don't understand how AI could ever surpass anything humans have known before, when it lives by the rules defined by us.
Why would you presume this? I think part of a lot of people's AI skepticism is talk like this. You have no idea. Full stop. Why wouldn't progress be linear? As new breakthroughs come, newer ones will be harder to come by. Perhaps it's exponential. Perhaps it's linear. No one knows.
All the technological revolutions so far have accounted for little more than a 1.5% sustained annual productivity growth. There are always some low-hanging fruit with new technology, but once they have been picked, the effort required for each incremental improvement tends to grow exponentially.
That's my default scenario with AGI as well. After AGI arrives, it will leave humans behind very slowly.
I think this is a hard kick below the belt for anyone trying to develop AGI using current computer science.
Current AIs only really generate - no, regenerate - text based on their training data. They are only as smart as the data available to them. Even when an AI "thinks", it's still only processing existing data rather than making a genuinely new conclusion. It's the best text processor ever created - but it's still just a text processor at its core. And that won't change without more hard computer science being done by humans.
So yeah, I think we're starting to hit the upper limits of what we can do with Transformers technology. I'd be very surprised if someone achieved "AGI" with current tech. And, if it did get achieved, I wouldn't consider it "production ready" until it didn't need a nuclear reactor to power it.
They are unrelated. All you need is a way for continual improvement without plateauing, and this can start at any level of intelligence. As it did for us; humans were once less intelligent.
Using the flagship to bootstrap the next iteration with synthetic data is standard practice now. This was mentioned in the GPT5 presentation. At the rate things are going I think this will get us to ASI, and it's not going to feel epochal for people who have interacted with existing models, but more of the same. After all, the existing models are already smarter than most humans and most people are taking it in their stride.
The next revolution is going to be embodiment. I hope we have the commonsense to stop there, before instilling agency.
I would also not be surprised if, in the process of developing something comparable to human intelligence (assuming the extreme computation, energy, and materials issues of packing that much computation and energy into a single system could be overcome), the AI also develops something comparable to human desire and/or mental health issues. There is a non-zero chance we could end up with AI that doesn't want to do what we ask it to do, or that doesn't work all the time because it wants to do other things.
You can't just assume exponential growth is a forgone conclusion.
Why would the AI want to improve itself? From whence would that self-motivation stem?
AI can do it fine as it knows A and B. And that is knowledge creation.
It seems like the LLM will be a component of an eventual AGI - its voice, per se - but not its mind. The mind still requires another innovation or breakthrough we haven't seen yet.
I am not an AI researcher, but I have friends who do work in the field, and they are not worried about LLM-based AGI because of the diminishing returns on results vs amount of training data required. Maybe this is the bottleneck.
Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better. Whereas LLMs tend to regurgitate solutions to solved problems, where the solutions tend to be well-published in training data.
That being said, AGI is not a necessary requirement for AI to be totally world-changing. There are possibly applications of existing AI/ML/SL technology which could be more impactful than general intelligence. Search is one example where the ability to regurgitate knowledge from many domains is desirable
I think I want various forms of AI that are more focused on specific domains. I want AI tools, not companions or peers or (gulp) masters.
(Then again, people thought they wanted faster horses before they rolled out the Model T)
Models are truly input multimodal now. Feeding an image, feeding audio and feeding text all go into separate input nodes, but it all feeds into the same inner layer set and outputs text. This also mirrors how brains work more as multiple parts integrated in one whole.
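A toy illustration of that "separate input nodes, shared inner layers" shape; every module choice and size below is invented, not any particular lab's design:

```python
import torch
import torch.nn as nn

# Each modality gets its own encoder that projects into a common token space,
# and one shared transformer trunk consumes the concatenated sequence.
class TinyMultimodal(nn.Module):
    def __init__(self, d=128, vocab=1000):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d)
        self.image_proj = nn.Linear(2048, d)   # e.g. vision features -> shared tokens
        self.audio_proj = nn.Linear(512, d)    # e.g. audio features -> shared tokens
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d, vocab)     # output is still text tokens

    def forward(self, text_ids, image_feats, audio_feats):
        toks = torch.cat([
            self.image_proj(image_feats),      # (B, Ti, d)
            self.audio_proj(audio_feats),      # (B, Ta, d)
            self.text_emb(text_ids),           # (B, Tt, d)
        ], dim=1)
        return self.lm_head(self.trunk(toks))

model = TinyMultimodal()
logits = model(torch.randint(0, 1000, (2, 16)),
               torch.randn(2, 4, 2048), torch.randn(2, 4, 512))
print(logits.shape)  # (2, 24, 1000): one shared trunk, text out
```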
Humans in some sense are not empty brains, there is a lot of stuff baked in our DNA and as the brain grows it develops a baked in development program. This is why we need fewer examples and generalize way better.
Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.
In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts for an AI interface. Keyboards will become a power-user interface and only used for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.
People say this, but honestly, it's not really my experience— I've given ChatGPT (and Copilot) genuinely novel coding challenges and they do a very decent job at synthesizing a new thought based on relating it to disparate source examples. Really not that dissimilar to how a human thinks about these things.
Depends on how you define "world changing" I guess, but this world already looks different to the pre-LLM world to me.
Me asking LLMs things instead of consulting the output from other humans now takes up a significant fraction of my day. I don't google nearly as often, and I don't trust any image or video I see, as swathes of the creative professions have been replaced by output from LLMs.
It's funny; that final thing is the last thing I would have predicted. I always believed the one thing a machine could not match was human creativity, because the output of machines was always precise, repetitive and reliable. Then LLMs come along, randomly generating every token. Their primary weakness is that they are neither precise nor reliable, but they can turn out an unending stream of unique output.
But even with these it does not feel like AGI. It seems like the "fusion reactors are 20 years away" argument, except this is supposedly coming in 2 years, and they have not even got the core technology for how to build AGI.
I think you're on to it. Performance is clustering because a plateau is emerging. Hyper-dimensional search engines are running out of steam, and now we're optimizing.
For example, while you can get it to predict good chess moves if you train it on enough chess games, it can't really constrain itself to the rules of chess. (https://garymarcus.substack.com/p/generative-ais-crippling-a...)
Aren't we the summation of intelligence from quintillions of beings over hundreds of millions of years?
Have LLMs really had more data?
It's fascinating to me that so many people seem totally unable to separate the training environment from the final product
These AI computers aren’t thinking, they are just repeating.
That is because with LLMs there is no intelligence. It is Artificial Knowledge - AK, not AI. Real AI, properly speaking, would be AGI. Not that it matters for the use cases we have, but marketing needs 'AI' because that is what we have been expecting for decades. So yeah, I also do not think we will get AGI from LLMs - nor does it matter for what we are using them for.
At Aloe, we are model agnostic and outperforming frontier models. It’s the architecture around the LLM that makes the difference. For instance, our system using Gemini can do things that Gemini can’t do on its own. All an LLM will ever do is hallucinate. If you want something with human-like general intelligence, keep looking beyond LLMs.
The fortunate thing is that we managed to invent an AI that is good at _copying us_ instead of being a truly maverick agent, which kind of limits it to "average human" output.
However, I still think that all the doomer arguments are valid, in principle. We very well may be doomed in our lifetimes, so we should take the threat very seriously.
I don’t see anything that would even point into that direction.
Curious to understand where these thoughts are coming from
This isn’t rocket science.
This seems to be a result of using overly simplistic models of progress. A company makes a breakthrough, the next breakthrough requires exploring many more paths. It is much easier to catch up than find a breakthrough. Even if you get lucky and find the next breakthrough before everyone catches up, they will probably catch up before you find the breakthrough after that. You only have someone run away if each time you make a breakthrough, it is easier to make the next breakthrough than to catch up.
Consider the following game:
1. N parties take turns rolling a D20. If anyone rolls 20, they get 1 point.
2. If any party is 1 or more points behind, they only need to roll a 19 or higher to get one point. That is, being behind gives you a slight advantage in catching up.
While points accumulate, most of the players end up with the same score.
I ran a simulation of this game for 10,000 turns with 5 players (a sketch of the simulation follows the results below):
Game 1: [852, 851, 851, 851, 851]
Game 2: [827, 825, 827, 826, 826]
Game 3: [827, 822, 827, 827, 826]
Game 4: [864, 863, 860, 863, 863]
Game 5: [831, 828, 836, 833, 834]
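For reference, a minimal sketch that reproduces this simulation under the rules as stated (roll 20 to score, 19+ when at least a point behind the leader):

```python
import random

def play(n_players=5, turns=10_000, seed=None):
    rng = random.Random(seed)
    scores = [0] * n_players
    for _ in range(turns):
        for i in range(n_players):
            behind = scores[i] < max(scores)   # at least 1 point behind the leader
            threshold = 19 if behind else 20   # being behind makes scoring easier
            if rng.randint(1, 20) >= threshold:
                scores[i] += 1
    return scores

for g in range(1, 6):
    print(f"Game {g}:", play(seed=g))
```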
But yes, so far it feels like we are in the latter stages of the innovation S-curve for transformer-based architectures. The exponent may be out there but it probably requires jumping onto a new S-curve.
I think it's likely that we will eventually hit a point of diminishing returns, where performance is good enough and marginal performance improvements aren't worth the high cost.
And over time, many models will reach "good enough" levels of performance, including models that are open weight. And given even more time, these open weight models will be runnable on consumer-level hardware. Eventually, they'll be runnable on super cheap consumer hardware (something more akin to an NPU than a $2000 RTX 5090). So your laptop in 2035 with specialized AI cores and 1TB of LPDDR10 RAM is running GPT-7 level models without breaking a sweat. Maybe GPT-10 can solve some obscure math problem that your model can't, but does it even matter? Would you pay for GPT-10 when running a GPT-7 level model does everything you need and is practically free?
The cloud providers will make money because there will still be a need for companies to host the models in a secure and reliable way. But a company whose main business strategy is developing the model? I'm not sure they will last without finding another way to add value.
This raises the question: why then do AI companies have these insane valuations? Do investors know something that we don't?
Presently we are still a long way from that. In my opinion we at least are as far away from AGI as 1970s mainframes were from LLMs.
I really don’t expect to see AGI in my lifetime.
You can only experience the world in one place in real time. Even if you networked a bunch of "experiencers" together to gather real time data from many places at the same time, you would need a way to learn and train on that data in real time that could incorporate all the simultaneous inputs. I don't see that capability happening anytime soon.
These big models don't dynamically update as days pass by - they don't learn. A personal assistant service may be able to mimic learning by creating a database of your data or preferences, but your usage isn't baked back into the big underlying model permanently.
I don't agree with "in our lifetimes", but the difference between training and learning is the bright red line. Until there's a model which is able to continually update itself, it's not AGI.
My guess is that this will require both more powerful hardware and a few more software innovations. But it'll happen.
I think we should be treating AGI like Cold Fusion, phrenology, or even alchemy. It is not science, but science fiction. It is not going to happen and no research into AGI will provide anything of value (except for the grifters pushing the pseudo-science).
It's all just hyperbole to attract investment and shareholder value and the people peddling the idea of AGI as a tangible possibility are charlatans whose goals are not aligned with whatever people are convincing themselves are the goals.
The fact that so many engineers have fallen for it so completely is stunning to me and speaks volumes about the underlying health of our industry.
However, I would not be so dismissive of the value. Many of us are reacting to the complete oversell of 'the encyclopedia' as being 'the eve of AGI' - as rightfully we should. But, in doing so, I believe it would be a mistake to overlook the incredible impact - and economic displacement - of having an encyclopedia comprised of all the knowledge of mankind that has "an interesting search interface" that is capable of enabling humans to use the interface to manipulate/detect connections between all that data.
The tech is neat and it can do some neat things but...it's a bullshit machine fueled by a bullshit machine hype bubble. I do not get it.
Yes. And the fact they're instead clustering simply indicates that they're nowhere near AGI and are hitting diminishing returns, as they've been doing for a long time already. This should be obvious to everyone. I'm fairly sure that none of these companies has been able to use their models as a force multiplier in state-of-the-art AI research. At least not beyond a 1+ε factor. Fuck, they're just barely a force multiplier in mundane coding tasks.
https://www.youtube.com/shorts/dLCEUSXVKAA
Thus, it’s easy to mistake one for the other - at least initially.
There were two interesting takeaways about AGI:
1. Dario makes the remark that the term AGI/ASI is very misleading and dangerous. These terms are ill defined and it's more useful to understand that the capabilities are simply growing exponentially at the moment. If you extrapolate that, he thinks it may just "eat the majority of the economy". I don't know if this is self-serving hype, and it's not clear where we will end up with all this, but it will be disruptive, no matter what.
2. The Economist moderators, however, note towards the end that this industry may well tend toward commoditization. At the moment these companies produce models that people want but others can't make. But as chip making starts to hit its limits and the information space becomes completely harvested, capability growth might taper off and others will catch up, the quasi-monopoly profit potential melting away.
Putting that together, I think that although the cognitive capabilities will most likely continue to accelerate, albeit not necessarily along the lines of AGI, the economics of all this will probably not lead to a winner takes all.
[1] https://www.economist.com/podcasts/2025/07/31/artificial-int...
I also feel like, it's stopped being exponential already. I mean last few releases we've only seen marginal improvements. Even this release feels marginal, I'd say it feels more like a linear improvement.
That said, we could see a winner take all due to the high cost of copying. I do think we're already approaching something where it's mostly price and who released their models last. But the cost to train is huge, and at some point it won't make sense and maybe we'll be left with 2 big players.
2. Commoditization can be averted with access to proprietary data. This is why all of ChatGPT, Claude, and Gemini push for agents and permissions to access your private data sources now. They will not need to train on your data directly. Just adapting the models to work better with real-world, proprietary data will yield a powerful advantage over time.
Also, the current training paradigm utilizes RL much more extensively than in previous years and can help models to specialize in chosen domains.
Seriously, our government just announced it's slashing half a billion dollars in vaccine research because "vaccines are deadly and ineffective", and it fired a chief statistician because the president didn't like the numbers he calculated, and it ordered the destruction of two expensive satellites because they can observe politically inconvenient climate change. THOSE are the people you are trusting to keep an eye on the pace of development inside of private, secretive AGI companies?
Do you mean from ChatGPT launch or o1 launch? Curious to get your take on how they bungled the lead and what they could have done differently to preserve it. Not having thought about it too much, it seems that with the combo of 1) massive hype required for fundraising, and 2) the fact that their product can be basically reverse engineered by training a model on its curated output, it would have been near impossible to maintain a large lead.
LLMs PATTERN MATCH well. Good at "fast" System 1 thinking, instantly generating intuitive, fluent responses.
LLMs are good at mimicking logic, not real reasoning. Simulate "slow," deliberate System 2 thinking when prompted to work step-by-step.
The core of an LLM is not understanding but just predicting the next most likely word in a sequence.
LLMs are good at both associative brainstorming (System 1) and creating works within a defined structure, like a poem (System 2).
Reasoning is the Achilles heel rn. An LLM's logic can SEEM plausible, but it's based on CORRELATION, NOT deductive reasoning.
If I were working in a job right now where I could see and guide and retrain these models daily, and realized I had a weapon of mass destruction on my hands that could War Games the Pentagon, I'd probably walk my discoveries back too. Knowing that an unbounded number of parallel discoveries were taking place.
It won't take AGI to take down our fragile democratic civilization premised on an informed electorate making decisions in their own interests. A flood of regurgitated LLM garbage is sufficient for that. But a scorched earth attack by AGI? Whoever has that horse in their stable will absolutely keep it locked up until the moment it's released.
Yesterday, Claude Opus 4.1 failed in trying to figure out that `-(1-alpha)` or `-1+alpha` is the same as `alpha-1`.
We are still a little bit away from AGI.
I don't think this has anything to do with AGI. We aren't at AGI yet. We may be close or we may be a very long way away from AGI. Either way, current models are at a plateau and all the big players have more or less caught up with each other.
As is, AI is quite intelligent, in that it can process large quantities of diverse unstructured information and build meaningful insights. And that intelligence applies across an incredibly broad set of problems and contexts - enough that I have a hard time not calling it general. Sure, it has major flaws that are obvious to us, and it's much worse at many things we care about. But that doesn't make it not intelligent or general. If we want to set human intelligence as the baseline, we already have a word for that: superintelligence.
on the other hand, there are still some flaws regarding GPT-5. for example, when i use it for research it often needs multiple prompts to get the topic i truly want and sometimes it can feed me false information. so the reasoning part is not fully there yet?
Current models, when they apply reasoning, have feedback loops that use tools for trial and error. They have a short-term memory (context), or multiple short-term memories if you use agents, and a long-term memory (markdown, RAG), so they can solve problems that aren't hardcoded in their brain/model. And they can store these solutions in their long-term memory for later use, or for sharing with other LLM-based systems.
AGI needs to come from a system that combines LLMs + tools + memory. And I've had situations where it felt like I was working with an AGI. The LLMs seem advanced enough to serve as the kernel for an AGI system.
The real challenge is how are you going to give these AGIs a mission/goal that they can do rather independently and don't need constant hand-holding. How does it know that it's doing the right thing. The focus currently is on writing better specifications, but humans aren't very good at creating specs for things that are uncertain. We also learn from trial and error and this also influences specs.
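For what it's worth, the loop being described looks roughly like this sketch; call_llm, the tool registry, and the markdown "long-term memory" file are all invented placeholders rather than any specific product:

```python
import json

def call_llm(messages):
    # Placeholder for a real chat-model API call. The canned replies below
    # just let the loop run end to end for illustration.
    if any(m["role"] == "user" and "tool output" in m["content"] for m in messages):
        return json.dumps({"done": "tests run; one failure recorded in notes"})
    return json.dumps({"tool": "run_tests", "input": ""})

TOOLS = {
    "search": lambda q: f"(pretend search results for {q!r})",
    "run_tests": lambda _: "3 passed, 1 failed",
}

def agent(goal, max_steps=10, memory_path="memory.md"):
    # Short-term memory is the message list; long-term memory is a notes file.
    messages = [
        {"role": "system", "content": 'Reply with {"tool": ..., "input": ...} or {"done": ...}.'},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):
        reply = json.loads(call_llm(messages))
        if "done" in reply:
            with open(memory_path, "a") as f:
                f.write(f"- {goal}: {reply['done']}\n")
            return reply["done"]
        result = TOOLS[reply["tool"]](reply["input"])      # trial-and-error feedback
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "user", "content": f"tool output: {result}"})
    return "step budget exhausted"

print(agent("make the test suite pass"))
```

The hard part the comment points at, knowing whether it's doing the right thing without hand-holding, is exactly what this kind of loop does not solve.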
However, I do believe that once the genuine AGI threshold is reached it may cause a change in that rate. My justification is that while current models have gone from a slightly good copywriter in GPT-4 to very good copywriter in GPT-5, they've gone from sub-exceptional in ML research to sub-exceptional in ML research.
The frontier in AI is driven by the top 0.1% of AI researchers. Since improvement in these models is driven partially by the very peaks of intelligence, it won't be until models reach that level where we start to see a new paradigm. Until then it's just scale and throwing whatever works at the GPU and seeing what comes out smarter.
I think you'll see the prophesized exponentiation once AI can start training itself at reasonable scale. Right now it's not possible.
The AIs improve by gradient descent, still the same as ever. It's all basic math and a little calculus, and then making tiny tweaks to improve the model over and over and over.
There's not a lot of room for intelligence to improve upon this. Nobody sits down and thinks really hard, and the result of their intelligent thinking is a better model; no, the models improve because a computer continues doing basic loops over and over and over trillions of times.
That's my impression anyway. Would love to hear contrary views. In what ways can an AI actually improve itself?
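For the curious, the "tiny tweaks over and over" is essentially this loop; a toy sketch, not any lab's actual training code:

```python
import torch
import torch.nn as nn

# Compute a loss, take the gradient, nudge the weights a tiny step, repeat.
# Frontier training differs in scale and plumbing, not in the shape of this loop.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(256, 10), torch.randn(256, 1)
for step in range(1000):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()   # "a little calculus": gradients of the loss
    opt.step()        # "tiny tweaks": move each weight slightly downhill
```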
This assumes an infinite potential for improvement though. It's also possible that the winner maxes out after threshold day plus one week, and then everyone hits the same limit within a relatively short time.
That seems hardly surprising considering the condition to receive the benefit has not been met.
The person who lights a campfire first will become warmer than the rest, but while they are trying to light the fire the others are gathering firewood. So while nobody has a fire, those lagging are getting closer to having a fire.
This misunderstanding is nothing more than the classic "logistic curves look like exponential curves at the beginning". All (Transformer-based, feedforward) AI development efforts are plateauing rapidly.
AI engineers know this plateau is there, but of course every AI business has a vested interest in overpromising in order to access more funding from naive investors.
That took the world from autocomplete to Claude and GPT.
Another 10,000x would do it again, but who has that kind of money or R&D breakthrough?
The way scaling laws work, 5,000x and 10,000x give a pretty similar result. So why is it surprising that competitors land in the same range? It seems hard enough to beat your competitor by 2x, let alone 10,000x.
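A back-of-the-envelope illustration (the exponent is an assumed value for illustration, not a constant fitted from any published scaling law):

```python
# Toy power-law scaling: relative loss ~ C^(-b) with compute C (baseline C = 1).
# b = 0.05 is an assumed illustrative exponent, not a real fitted value.
b = 0.05
loss = lambda c: c ** -b

for c in (5_000, 10_000):
    print(c, round(loss(c), 3))
# 5000  -> 0.653
# 10000 -> 0.631 (doubling compute again buys only ~3% lower loss)
```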
Part of the fun is that predictions get tested on short enough timescales to "experience" in a satisfying way.
Idk where that puts me, in my guess at "hard takeoff." I was reserved/skeptical about hard takeoff all along.
Even if LLMs had improved at a faster rate... I still think bottlenecks are inevitable.
That said... I do expect progress to happen in spurts anyway. It makes sense that companies of similar competence and resources get to a similar place.
The winner-take-all thing is a little forced. "Race to singularity" is the fun, rhetorical version of the investment case. The implied boring case is facebook, adwords, aws, apple, msft... i.e. the modern tech sector tends to create singular big winners... and therefore our pre-revenue market cap should be $1trn.
I personally think it's a pretty reductive model for what intelligence is, but a lot of people seem to strongly believe in it.
What do you think AGI is?
How do we go from sentence composing chat bots to General Intelligence?
Is it even logical to talk about such a thing as abstract general intelligence when every form of intelligence we see in the real world is applied to specific goals, as behavioral technology refined through evolution?
When LLMs start undergoing spontaneous evolution then maybe it is nearer. But now they can't. Also there is so much more to intelligence than language. In fact many animals are shockingly intelligent but they can't regurgitate web scrapings.
To be honest that is what you would want if you were digitally transforming the planet with AI.
You would want to start with a core so that all models share similar values and don't bicker, etc., over negotiations, trade deals, logistics.
Would also save a lot of power so you don't have to train the models again and again, which would be quite laborious and expensive.
Rather each lab would take the current best and perform some tweak or add some magic sauce then feed it back into the master batch assuming it passed muster.
Share the work, globally for a shared global future.
At least that is what I would do.
AGI on top of LLMs is basically a billion tokens spent for the AI to answer the question "how do you feel?" with "fine".
Because it would mean simulating everything in the world over an agentic flow: considering all possible options, checking memory, checking the weather, checking the news... activating emotional agentic subsystems, checking state... saving state...
They lack writable long-term memory beyond a context window. They operate without any grounded perception-action loop to test hypotheses. And they possess no executive layer for goal directed planning or self reflection...
Achieving AGI demands continuous online learning with consolidation.
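As a miniature illustration of what "continuous online learning with consolidation" means, here is a generic replay-buffer toy (my own sketch, not anyone's actual proposal or architecture):

```python
# Toy continual learning: fit y ~ w*x from a stream of examples, replaying a
# few stored old examples alongside each new one so later updates don't wash
# out earlier knowledge (the "consolidation" step). Purely illustrative.
import random

w, lr, memory = 0.0, 0.01, []              # weight, learning rate, replay buffer

def consolidate_and_learn(x, y, replay_size=8):
    global w
    batch = [(x, y)] + random.sample(memory, min(replay_size, len(memory)))
    for bx, by in batch:
        w -= lr * 2 * (w * bx - by) * bx   # one SGD step per example
    memory.append((x, y))                  # keep the example for future replay

for x in range(1, 10):
    consolidate_and_learn(float(x), 2.0 * x)   # stream where the truth is y = 2x

print(round(w, 2))  # -> 2.0
```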
This argument has so many weak points it deserves a separate article.
I wonder if that's because they have a lot of overlap in training sets and algorithms used, but more importantly, whether they use the same benchmarks and optimize for them.
As the saying goes, once a metric (or benchmark score in this case) becomes a target, it ceases to be a valuable metric.
It's the systems around the models where the proprietary value lies.
It's natural if you extrapolate from training loss curves; a training process with continually diminishing returns to more training/data is generally not something that suddenly starts producing exponentially bigger improvements.
Nothing we have is anywhere near AGI and as models age others can copy them.
I personally think we are closing in on the end of improvement for LLMs with current methods. We have consumed all of the readily available data already, so there is no more good-quality training material left. We either need new novel approaches, or we can hope that if enough compute is thrown at training, actual intelligence will spontaneously emerge.
SGI would be self-improving along some function close to linear in time and resources. That's almost exclusively dependent on the software design, as transformers have so far shown to hit a wall at roughly logarithmic progress per unit of resources.
In other words, no, it has little to do with the commercial race.
This could be partly due to normative isomorphism[1] according to the institutional theory. There is also a lot of movement of the same folks between these companies.
[1] https://youtu.be/VvaAnva109s
Since then they've been about neck and neck with some models making different tradeoffs.
Nobody needs to reach AGI to take off. They just need to bankrupt their competitors since they're all spending so much money.
https://www.youtube.com/watch?v=5eqRuVp65eY
It's not architectures that matter anymore, it's unlocking new objectives and modalities that open another axis to scale on.
The improvements they make are marginal. How long until the next AI breakthrough? Who can tell? Last time it took decades.
That's only one part of it. Some forecasters put probabilities on each of the four quadrants in the takeoff speed (fast or slow) vs. power distribution (unipolar or multipolar) table.
2. Ben Evans frequently makes fun of the business value. Pretty clear a lot of the models are commoditized.
3. strategically, the winners are platforms where the data are. if you have data in azure, that's where you will use your models. exclusive licensing could pull people to your cloud from on prem. so some gains may go to those companies ...
But I doubt we will ever see a fully autonomous, reliable AGI system.
The real take-off / winner-take-all potential is in retrieval and knowing how to provide the best possible data to the LLM. That strategy will work regardless of the model.
It's not obvious whether a similar breakthrough could occur in AI.
But nowadays, how can corporations "justify" spending gigantic amounts of R&D resources (time + hardware + energy) on models that are not LLMs?
Even if we run with the assumption that LLMs can become human-level AI researchers, and are able to devise and run experiments to improve themselves, even then the runaway singularity assumption might not hold. Let's say Company A has this LLM, while company B does not.
- The automated AI researcher, like its human peers, still needs to test the ideas and run experiments. It might happen that testing (meaning compute) is the bottleneck, not the ideas, so Company A has no real advantage (see the rough sketch below).
- It might also happen that AI training has some fundamental compute limit coming from information theory, analogous to the Shannon limit, and once again, more efficient compute can only approach this limit, not overcome it.
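A rough Amdahl's-law style sanity check on the compute-bottleneck point (the 10% share for idea generation is an assumed figure, purely for illustration):

```python
# Amdahl's law: overall speedup when only a fraction p of the work (idea
# generation) is accelerated by factor s, while the rest (running experiments
# on fixed compute) is not. p = 0.1 is an assumption, not a measurement.
def speedup(p, s):
    return 1 / ((1 - p) + p / s)

for s in (10, 100, float("inf")):
    print(s, round(speedup(p=0.1, s=s), 2))
# 10  -> 1.1
# 100 -> 1.11
# inf -> 1.11 (if experiments dominate, automating the researcher barely helps)
```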
It's probably never going to work with a single process without consuming the resources of the entire planet to run that process on.
Both the AGI threshold with LLM architecture and the idea of self-advancing AI are pie in the sky, at least for now. These are myths of the rationalist cult.
We'd more likely see reduced returns and smaller jumps between version updates, plus regression from all the LLM produced slop that will be part of the future data.
Meanwhile, keep all relevant preparations in secret...
Why is this even an axiom, that this has to happen and it's just a matter of time?
I don't see any credible argument for the path LLM -> AGI. In fact, given the slowdown in the rate of improvement over the past 3 years of LLMs, despite the unprecedented firehose of trillions of dollars being sunk into them, I think it points to the contrary!
I have had a bunch of positive experiences as well, but when it goes bad, it goes so horribly bad and off the rails.
I think user experience and pricing models are the best angle here. Right now everyone's just passing down costs as they come, with no real loss leaders except a free tier. I looked at reviews of various wrappers on app stores; people say "I hate that I have to pay for each generation and not know what I'm going to get", so the market would like a service priced very differently. Is it economical? Many will fail, one will succeed. People will copy the model of that one.
Compare that to
Gemini 2.5 Pro: knowledge cutoff Jan 2025 (3 months before release)
Claude Opus 4.1: knowledge cutoff Mar 2025 (4 months before release)
https://platform.openai.com/docs/models/compare
https://deepmind.google/models/gemini/pro/
https://docs.anthropic.com/en/docs/about-claude/models/overv...
Found the GitHub: https://github.com/haykgrigo3/TimeCapsuleLLM
I don't know if it's because of context clogging or because the model can't tell a high-quality source from garbage.
I've defaulted to web search off and turn it on via the tools menu as needed.
Where it does matter is for code generation. It’s error-prone and inefficient to try teaching a model how to use a new framework version via context alone, especially if the model was trained on an older API surface.
Web search enables targeted info to be "updated" at query time. But it doesn't get used for every query and you're practically limited in how much you can query.
2.5 Pro went ahead and summarized it (but it completely ignored a # reference and so summarized the wrong section of a multi-topic page; that's a different problem, though).
> GPT‑5 is a unified system . . .
OK
> . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).
So that's not really a unified system then, it's just supposed to appear as if it is.
This looks like they're not training the single big model but instead have gone off to develop special sub models and attempt to gloss over them with yet another model. That's what you resort to only when doing the end-to-end training has become too expensive for you.
https://openai.com/index/introducing-gpt-5-for-developers/
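To be concrete about what that marketing copy seems to describe, a router in front of two models could be as simple as the sketch below. This is pure speculation on my part: OpenAI hasn't published how their router actually works, and every name and threshold here is invented.

```python
# Speculative sketch of a "real-time router" choosing between a cheap/fast
# model and a slower reasoning model. Invented names and thresholds; this
# does not reflect OpenAI's actual implementation.
def route(prompt: str, needs_tools: bool = False) -> str:
    explicit = any(kw in prompt.lower() for kw in ("think hard", "step by step"))
    complex_ = len(prompt) > 2000 or "prove" in prompt.lower()
    if explicit or complex_ or needs_tools:
        return "reasoning-model"   # deeper, slower, more expensive
    return "fast-model"            # "answers most questions"

print(route("What's the capital of France?"))          # fast-model
print(route("Think hard about this scheduling bug"))   # reasoning-model
```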
If OpenAI really are hitting the wall on being able to scale up overall then the AI bubble will burst sooner than many are expecting.
People evaluate dataset quality over time. There's no evidence that datasets from 2022 onwards perform any worse than ones from before 2022. There is some weak evidence of an opposite effect, causes unknown.
It's easy to make "model collapse" happen in lab conditions - but in real world circumstances, it fails to materialize.
The corollary to the bitter lesson strikes again: any hand-crafted system will outperform any general system for the same budget, by a wide margin.
In practice the whole point is the opposite is the case, which is why this direction by OpenAI is a suspicious indicator.
[1] https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
GPT-5 System Card [pdf] - https://news.ycombinator.com/item?id=44827046
It feels less and less likely that AGI is even possible with the data we have available. The one unknown is quantum computing: if we manage to get usable quantum computers, I'm curious what that will do to AI.
From the system card:
"In the near future, we plan to integrate these capabilities into a single model."
What excites me now is that Gemini 3.0 or some answer from Google is coming soon and that will be the one I will actually end up using. It seems like the last mover in the LLM race is more advantageous.
(I'm mostly making this comment to document what happened for the history books.)
https://polymarket.com/event/which-company-has-best-ai-model...
On the chat side, it's also quite different, and I wouldn't be surprised if people need some time to get a taste and a preference for it. I ask most models to help me build a MacBook Pro charger in 15th-century Florence, with the instructions that I start with only my laptop and I can only talk for four hours of chat before the battery dies -- 5 was notable in that it thought through a bunch of second-order implications of plans and offered some unusual things, including a list of instructions for a foot-treadle-based split-ring commutator + generator in 15th-century Florentine Italian(!). I have no way of verifying if the Italian was correct.
Upshot - I think they did something very special with long context and iterative task management, and I would be surprised if they don't keep improving 5, based on their new branding and marketing plan.
That said, to me this is one of the first 'product release' moments in the frontier model space. 5 is not so much a model release as a polished-up, holes-fixed, annoyances-reduced/removed, 10x faster type of product launch. Google (current polymarket favorite) is remarkably bad at those product releases.
Back to betting - I bet there's a moment this year where those numbers change 10% in OpenAI's favor.
who will decide the winner to resolve bets?
https://polymarket.com/event/which-company-has-best-ai-model...
Not much explanation yet why GPT-5 warrants a major version bump. As usual, the model (and potentially OpenAI as a whole) will depend on output vibe checks.
How is this sustainable?
Not that it makes it useless, just that we seem to not "be there" yet for the standard tasks software engineers do every day.
Exactly. Too many videos - too little real data / benchmarks on the page. Will wait for vibe check from simonw and others
https://openai.com/gpt-5/?video=1108156668
2:40 "I do like how the pelican's feet are on the pedals." "That's a rare detail that most of the other models I've tried this on have missed."
4:12 "The bicycle was flawless."
5:30 Re generating documentation: "It nailed it. It gave me the exact information I needed. It gave me full architectural overview. It was clearly very good at consuming a quarter million tokens of rust." "My trust issues are beginning to fall away"
Edit: ohh he has blog post now: https://news.ycombinator.com/item?id=44828264
People knew that GPT-5 wouldn't be an AGI or even close to that. It's just an updated version. GPT-N will become more or less an annual release.
Pretty par-for-the-course evals-at-launch setup.
https://chatgpt.com/share/6895d5da-8884-8003-bf9d-1e191b11d3...
GPT-5 pricing: $10/Mtok out
What am I missing?
Meanwhile, Anthropic & Google have more room in their P/S ratios to continue to spend effort on logarithmic intelligence gains.
Doesn't mean we won't see more and more intelligent models out of OpenAI, especially in the o-series, but at some point you have to make payroll and reality hits.
Before the release of the model Sam Altman tweeted a picture of the Death Star appearing over the horizon of a planet.
We’re talking about less than a 10% performance gain, for a shitload of data, time, and money investment.
Maybe quantum compute would be significant enough of a computing leap to meaningfully move the needle again.
https://lmarena.ai/leaderboard
This is day one, so there is probably another 10-20% in optimizations that can be squeezed out of it in the coming months.
He also said that AGI was coming early 2025.
People that can't stop drinking the kool aid are really becoming ridiculous.
Diminishing returns.-
... here's hoping it leads to progress.-
They also announced gpt-5-pro but I haven't seen benchmarks on that yet.
https://bsky.app/profile/tylermw.com/post/3lvtac5hues2n
Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.
Or written by GPT-5?
https://imgur.com/a/QkriFco
You may not owe people who you feel are idiots better, but you owe this community better if you're participating in it.
https://news.ycombinator.com/newsguidelines.html
They've mentioned improvements in that aspect a few times now, and if it actually materializes, that would be a big leap forward for most users, even if underneath GPT-4 was also technically able to do the same things when prompted just the right way.
The jump from 3 to 4 was huge. There was an expectation for similar outputs here.
Making it cheaper is a good goal - certainly - but they needed a huge marketing win too.
But it's only an incremental improvement over the existing o line. So people feel like the improvement from the current OpenAI SoTA isn't there to justify a whole bump. They probably should have just called o1 GPT-5 last year.