I have a feeling LLMs could probably self-improve up to a point with current capacity, then hit some kind of wall where current research is also bottlenecked. I don’t think they can yet self-improve exponentially without human intuition, and the results of this paper seem to support this conclusion as well.
An LLM can vibe code a great toy app, but I don’t think an LLM can come close to producing and maintaining production-ready code anytime soon. I think the same is true for iterating on thinking machines.
> I don’t think they can yet self-improve exponentially without human intuition
I agree: if they could, they would be doing it already.
Case in point: one of the first things done once ChatGPT started getting popular was "auto-gpt"; roughly, let it loose and see what happens.
The same thing will happen to any accessible model in the future. Someone, somewhere will ask it to self-improve/make as much money as possible, with as few leashes as possible. Maybe even the labs themselves do that, as part of their post-training ops for new models.
Therefore, we can assume that if the existing models _could_ be doing that, they _would_ be doing that.
That doesn't say anything about new models released 6 months or 2 years from now.
Note that this isn't improving the LLM itself, but the software glue around it (i.e. agentic loops, tools, etc.). The fact that using the same LLM got a ~20% increase on the aider leaderboard says more about aider as a collection of software glue than it does about the model.
I do wonder though if big labs are running this with model training episodes as well.
Don't take this the wrong way, your opinion is also vibes.
Let's ground that a bit.
Have a look at ARC AGI 1 challenge/benchmark. Solve a problem or two yourself. Know that ARC AGI 1 is practically solved by a few LLMs as of Q1 2025.
Then have a look at the ARC AGI 2 challenge. Solve a problem or two yourself. Note that as of today, it is unsolved by LLMs.
Then observe that the "difficulty" of ARC AGI 1 and 2 for a human is roughly the same, but challenge 2 is much harder for LLMs than 1.
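To make that concrete: an ARC task is a handful of input/output grid pairs plus a held-out test input; you infer the transformation and apply it. Below is an invented ARC-style toy (not an actual benchmark item); the grids and the mirror_lr rule are made up purely to show the shape of the problem.

    # Invented ARC-style toy task (not a real ARC AGI item): infer the hidden rule
    # from a few input/output grid pairs, then apply it to a held-out test input.
    # Hidden rule here: mirror each grid left-to-right.
    train_pairs = [
        ([[1, 0, 0],
          [2, 0, 0]],
         [[0, 0, 1],
          [0, 0, 2]]),
        ([[0, 3],
          [4, 0]],
         [[3, 0],
          [0, 4]]),
    ]
    test_input = [[5, 0, 0, 6]]

    def mirror_lr(grid):
        """Candidate rule: reverse every row."""
        return [list(reversed(row)) for row in grid]

    # A human spots the rule from two examples; the question ARC asks is whether a
    # model can do the same from the raw grids alone.
    assert all(mirror_lr(i) == o for i, o in train_pairs)
    print(mirror_lr(test_input))  # [[6, 0, 0, 5]]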
ARC AGI 2 is going to be solved *within* 12 months (my bet is on 6 months). If it's not, I'll never post about AI on HN again.
There's only one problem to solve, i.e. "how to make LLMs truly see like humans do". Right now, any vision-based capabilities the models exhibit come from maximizing the use of engineering (applying CNNs to image slices and chunks, maybe zooming and applying OCR, vector search, etc.); it isn't vision like ours, and it isn't a native feature of these models.
Once that's solved, LLMs (or a new algorithm) will be able to use a computer perfectly by feeding them screen captures. End of white-collar jobs 2-5 years after (as we know it).
Edit - added "(as we know it)". And fixed missing word.
The proof that they are not "smart" in the way intelligence is normally defined is that the models need to "read" all the books in the world to perform at a level close to an expert in a domain, who has read just two or three of the most representative books in his own field.
We will be on the way to AGI when your model can learn Python just by reading the Python docs...Once...
The wall is training data. An AI can't produce its own training data because an AI can't be smarter than its own training data. This is a well-known regression problem and one I personally believe is not solvable. (A softer assertion would be: it's not solvable with current technology.)
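A toy illustration of the regression being described (the thing the model-collapse literature points at): fit a distribution to data, sample from the fit, refit on the samples, repeat. The Gaussian setup, sample size, and generation count below are made up for illustration; the point is just that each generation tends to know less than the data it was fit to.

    # Toy "train on your own outputs" loop: estimate a Gaussian from samples, then
    # resample from the estimate and re-estimate. With small samples the spread
    # typically shrinks generation after generation -- information is lost, not gained.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.0, scale=1.0, size=20)  # the original "real world" data

    for generation in range(31):
        mu, sigma = data.mean(), data.std()
        if generation % 10 == 0:
            print(f"gen {generation:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
        # the next generation trains only on the previous model's outputs
        data = rng.normal(loc=mu, scale=sigma, size=20)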
I used to think this, but no one I have read believes data is the problem.
Amodei explains that if data, model size and compute scale up linearly, then the reaction happens.
I don't understand why data wouldn't be a problem, but it seems like if it were, we would have run into it already, and it has already been overcome with synthetic data.
I don't have the link on hand, but people have already proven that LLMs can both generate new problems for themselves and train on them. Not sure why it would be surprising though - we do it all the time ourselves.
> I don’t think they can yet self-improve exponentially without human intuition
Even if they had human level intuition, they wouldn't be able to improve exponentially without human money, and they would need an exponentially growing amount of it to do so.
AI code assistants have some peculiar problems. They often fall into loops and errors of perception.
They can’t reason about high level architecture well. They will often flip flop between two possible ways of doing things.
It’s possible that good coding rules might help, but I expect they will have weird rabbit hole errors.
That being said, they can write thousands of lines an hour and can probably do things that would be impossible for a human. (Imagine having the LLM skip code and spit out compiled binaries, as one example.)
Historically, learning and AI systems, if you plug the output into the input (more or less), spiral off into la-la land.
I think this happens with humans in places like social media echo chambers (or parts of academia) when they talk and talk and talk a whole lot without contact with any outer reality. It can be a source of creativity but also madness and insane ideas.
I’m quite firmly on the side of learning requiring either direct or indirect (informed by others) embodiment, or at least access to something outside. I don’t think a closed system can learn, and I suspect that this may reflect the fact that entropy increases in a closed system (second law).
As I said recently in another thread, I think self-contemplating, self-improving “foom” AI scenarios are proposing informatic perpetual motion or infinite energy machines. Everything has to “touch grass.”
Not wrong, but it's been said that a video clip of an apple falling on Newton's head is technically enough information to infer the theory of relativity. You don't need a lot of grass, with a well-ordered mind.
I agree, it might incrementally optimize itself very well, but I think, for now at least, anything super innovative will still come from a human who can think beyond a few steps.
There are surely far better possible architectures, training methods etc that would initially lead to worse performance if approached stepwise.
What is there to improve? The transformer architecture is extremely simple. You gonna add another KV layer? Tweak the nonlinearities? Add 1 to one of the dimensions? Inject a weird layer (which could have been in the weights anyway, per the Kolmogorov-Arnold theorem)?
Realistically the best you could do is evolve the prompt. Maybe you could change the input data preprocessing?
Anyway, the idea of current LLM architectures self-improving via their own code seems silly, as there are surprisingly few knobs to turn, and it's super expensive to train.
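To put numbers on "surprisingly few knobs": here is roughly the inventory of architectural choices in a vanilla decoder-only transformer. The field names and defaults are my own illustration, not any particular framework's config.

    # Rough inventory of the tunable "knobs" in a standard decoder-only transformer.
    # Names and defaults are illustrative, not taken from any specific model.
    from dataclasses import dataclass

    @dataclass
    class BlockConfig:
        d_model: int = 4096         # residual stream width
        n_heads: int = 32           # attention heads (must divide d_model)
        d_ff: int = 16384           # MLP hidden width
        activation: str = "swiglu"  # the nonlinearity you could "tweak"
        norm: str = "rmsnorm"       # normalization variant
        rope_theta: float = 1e4     # positional-encoding base
        n_layers: int = 32

    # Everything else that matters (tokenizer, data mix, optimizer schedule, RL
    # recipe, and above all the weights themselves) lives outside this handful of
    # numbers, and changing any of them means an expensive retrain.
    print(BlockConfig())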
As a side note, it's impressive how resistant the current architecture is to incremental RL away from results, since if even one "undesired input" result is multiple tokens, the coupling between the tokens is difficult to disentangle (how do you separate Jinping from Jin-Gitaxias, for example?).
It's radically different from human improvement. Imagine if you were handed a notebook with a bunch of writing that abruptly ends. You're asked to read it and then write one more word. Then you have a bout of amnesia and you go back to the beginning with no knowledge of the notebook's contents, and the cycle repeats. That's what LLMs do, just really fast.
You could still accomplish some things this way. You could even "improve" by leaving information in the notebook for your future self to see. But you could never "learn" anything bigger than what fits into the notebook. You could tell your future self about a new technique for finding integrals, but you couldn't learn calculus.
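The analogy maps onto code pretty directly: the model call is stateless, and the only thing that survives between calls is the notebook, which has a hard size limit. A minimal sketch; llm_complete here is a stand-in, not a real API.

    # The "notebook with amnesia" loop: no memory survives except the text we carry
    # forward, and that text has a hard size limit (the context window).
    MAX_NOTEBOOK_CHARS = 8000  # stand-in for the context window limit

    def llm_complete(notebook: str) -> str:
        """Stand-in for a stateless model call: read the notebook, emit a bit more text."""
        return " <next-word>"  # a real model would predict continuation tokens here

    notebook = "Task: evaluate some integrals. Notes so far: ..."
    for _ in range(100):
        notebook += llm_complete(notebook)
        # anything that no longer fits in the notebook is forgotten forever
        notebook = notebook[-MAX_NOTEBOOK_CHARS:]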
Can't find the reference now, but I remember reading an article on evolving FPGA designs. The found optimum, however, only worked on the specific FPGA it was evolved on, since the algorithm had started to use some out-of-spec "features" of that specific chip. Obviously that can be fixed with proper constraints, but it seems like a trap that could be stepped into again - i.e. the LLM is now really fast, but only on GPUs that come from the same batch of wafers.
This is where it networks itself into a hive mind, with each AI node specializing in some task or function, networked over hyper-speed data buses. Humans do the same, both within their own brains and as cohesive teams who cross-check and validate each other. At some point it becomes self-aware.
Well, LLMs are not capable of coming up with new paradigms or solving problems in a novel way; they just efficiently do what's already been done or apply already-found solutions. So they might be able to come up with improvements that have been missed by their programmers, but nothing outside of our current understanding.
I've built a coding assistant over the last two days. The first 100 lines or so were handwritten. The rest has been written by the assistant itself.
It's written its system prompt. It's written its tools. It's written the code to reload the improved tools into itself.
And it knows it is working on itself - it frequently tries to use the enhanced functionality, and then expresses what in a human would be frustration at not having immediate access.
Once, by trying to use ps to find its own pid in an apparent attempt to find a way to reload itself (that's the reason it gave before trying to run ps, anyway).
All its commits are now authored by the tool, including the commit messages. It needs to be good and convincing, and to have run the linter and the test suite, for me to let it commit, but I agree a substantial majority of the time. It's only caused regressions once or twice.
A bit more scaffolding to trigger an automatic rollback in the case of failure and giving it access to a model I won't be charged by the token for, and I'd be tempted to let it out of the box, so to speak.
Today it wrote its own plan for what to add next. I then only told it to execute it.
Add a minor, separate, goal-oriented layer guiding the planning, and it could run in a loop.
Odds are it'd run off the rails pretty quickly, but I kinda want to see how far it gets.
It's talking to a model over an API; currently using Claude. It certainly would not be reasonable to do from scratch. The basic starting point for a coding assistant is reading text from the user, feeding it to the model over the API, and giving it a couple of basic tools. Really, the models can handle starting with just the ability to execute shell commands (with confirmation, unless you're braver than me), and from that you can bootstrap by asking it to suggest and write additional tools for itself.
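For anyone curious how small that starting point is, here's a minimal sketch of the idea, assuming the Anthropic Python SDK; the model name, system prompt, and tool schema are illustrative, not the actual code described above.

    # Minimal bootstrap for a coding assistant: user text -> model -> (optionally)
    # run a shell command with confirmation -> feed the output back. Everything
    # beyond this you can ask the model to write for itself.
    import subprocess
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    TOOLS = [{
        "name": "run_shell",
        "description": "Run a shell command in the project directory and return its output.",
        "input_schema": {"type": "object",
                         "properties": {"command": {"type": "string"}},
                         "required": ["command"]},
    }]

    messages = []
    while True:
        messages.append({"role": "user", "content": input("you> ")})
        while True:
            reply = client.messages.create(
                model="claude-sonnet-4-20250514", max_tokens=2000,
                system="You are a coding assistant working on your own source code.",
                tools=TOOLS, messages=messages)
            messages.append({"role": "assistant", "content": reply.content})
            if reply.stop_reason != "tool_use":
                print(next((b.text for b in reply.content if b.type == "text"), ""))
                break
            results = []
            for block in reply.content:
                if block.type != "tool_use":
                    continue
                cmd = block.input["command"]
                if input(f"run `{cmd}`? [y/N] ").lower() != "y":  # the confirmation step
                    output = "user declined to run this command"
                else:
                    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                    output = (proc.stdout + proc.stderr)[-10000:]
                results.append({"type": "tool_result",
                                "tool_use_id": block.id, "content": output})
            messages.append({"role": "user", "content": results})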
Problem:
1) we want to train on GitHub repos
2) most datasets are spoiled, and training on GitHub would definitely spoil them
Solution:
Hand write new problems!!!
... leetcode style ....
... and we'll check if it passes test
Example:
What's the decimal part of this float?
Surely, in all of GitHub, such code doesn't exist!
Sure, in all of GitHub, we can filter such code out by n-gram!
Maybe my favorite part is that it has 60 authors and became the de facto benchmark for a while.
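For reference, the n-gram filtering being poked at usually amounts to something like the sketch below: shingle the candidate problem into n-grams and flag it if too many of them appear verbatim in the training corpus. The n, threshold, and tokenization here are illustrative; the punchline is that rephrasings sail straight through.

    # Toy n-gram decontamination check: flag a hand-written problem if a large share
    # of its 8-grams already occur verbatim in the training corpus. Catches
    # copy-paste; does not catch the same algorithm expressed in different words.
    def ngrams(text: str, n: int = 8) -> set:
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def looks_contaminated(problem: str, corpus_ngrams: set, threshold: float = 0.2) -> bool:
        grams = ngrams(problem)
        return bool(grams) and len(grams & corpus_ngrams) / len(grams) > threshold

    corpus = "def frac(x): return x - int(x)  # return the decimal part of this float"
    problem = "Write a function that returns the decimal part of a given float."
    print(looks_contaminated(problem, ngrams(corpus)))  # False: the rephrasing slips through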
I find the thing really missing from the current crop of AI systems is continuous retraining with short feedback loops. Sounds expensive, to be sure, but it seems like what biological systems do naturally. It would be pretty awesome to watch happen.
It's more like nightly training, isn't it? IIUC the human brain learns from its experiences while it's asleep, so it might be kind of like taking things out of context windows and fine-tuning on them every night.
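A rough sketch of that "sleep consolidation" loop. To be clear, finetune_lora below is a hypothetical placeholder for whatever training job you'd actually launch, not a real library call; only the log-then-distill shape is the point.

    # Sketch of nightly consolidation: log (context, response, reward) episodes
    # during the day, then distill the good ones out of the context window and
    # into the weights overnight. `finetune_lora` is a hypothetical placeholder.
    import json, datetime

    EPISODE_LOG = "episodes.jsonl"

    def log_episode(context: str, response: str, reward: float) -> None:
        with open(EPISODE_LOG, "a") as f:
            f.write(json.dumps({"ts": datetime.datetime.now().isoformat(),
                                "context": context, "response": response,
                                "reward": reward}) + "\n")

    def finetune_lora(checkpoint: str, examples: list) -> str:
        """Hypothetical stand-in: in reality this would launch an adapter training job."""
        return f"{checkpoint}+lora-{datetime.date.today().isoformat()}"

    def nightly_update(base_checkpoint: str) -> str:
        episodes = [json.loads(line) for line in open(EPISODE_LOG)]
        good = [e for e in episodes if e["reward"] > 0.8]  # keep only successful episodes
        return finetune_lora(base_checkpoint, examples=good)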
Correct, and working on it. You can take the mixture-of-experts approach and train the network in chunks that share known interfaces over which they communicate results. These chunks can be trained on their own, but you cannot have a fixed training set here.
Then, if you go further and alter the architecture by introducing clean category-theory morphisms and build from there, you can have a dynamic network - but you will still have to retrain this network every time you change the structure.
You can spin this further and recognize the need for a real-world training set and a loss function that will have to compete against other networks. In the end, a human brain is already best at this and is embodied in the real world.
What I want to add here is that our neurons don't just take in weights - they also fire depending on whether one input comes before or after another, with differences down to the nanosecond - unmatched in IT and, of course, far more efficient.
I still would say it's possible, though, and I'm currently working on 4D lifeforms built on dynamic compute graphs that can do this in a set virtual environment.
So this is pretty awesome stuff, but it's a long way from anything we do right now.
Model weights are code; for a dive into that, see [0], which shows how to encode Boolean logic using NAND gates in an MLP.
The expressivity is there, the only question is how to encode useful functions into those weights, especially when we don’t know how to write those functions by hand.
[0] http://neuralnetworksanddeeplearning.com/chap1.html
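Concretely, the construction in [0] boils down to the fact that a single unit with hand-picked weights already computes NAND, and NAND composes into any Boolean circuit. A minimal version:

    # One "neuron" with hand-chosen weights computes NAND; since NAND is
    # functionally complete, weights really are code in a strong sense.
    import numpy as np

    def neuron(x, w, b):
        return int(np.dot(w, x) + b > 0)  # step activation

    def nand(a, b):
        return neuron([a, b], w=[-2, -2], b=3)

    def xor(a, b):  # XOR built purely out of NAND gates
        n1 = nand(a, b)
        return nand(nand(a, n1), nand(b, n1))

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NAND:", nand(a, b), "XOR:", xor(a, b))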
If it can generate the model (from training data) then presumably that'd be fine, but the iteration time would be huge and expensive enough to be currently impractical.
Or yeah if it can modify its own weights sensibly, which feels ... impossible really.
To be fair, go back five years and most of the LLM stuff seemed impossible. Maybe with LoRA (Low-rank adaptation) and some imagination, in another five years self-improving models will be the new normal.
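To gesture at why LoRA makes "modifying its own weights" slightly less crazy than it sounds: you leave the full matrix frozen and learn a small low-rank delta on top. A numpy-only sketch of the idea (shapes, rank, and scaling here are arbitrary):

    # LoRA in one line of algebra: keep the frozen weight W, learn a low-rank update
    # B @ A, and use W_eff = W + (alpha / r) * B @ A. Far fewer parameters to touch.
    import numpy as np

    d_out, d_in, r, alpha = 512, 512, 8, 16
    rng = np.random.default_rng(0)

    W = rng.normal(size=(d_out, d_in))      # frozen pretrained weights
    A = rng.normal(size=(r, d_in)) * 0.01   # trainable, tiny
    B = np.zeros((d_out, r))                # trainable, starts at zero: no initial change

    W_eff = W + (alpha / r) * (B @ A)
    x = rng.normal(size=d_in)
    assert np.allclose(W_eff @ x, W @ x)    # before any training, behaviour is unchanged

    # Trainable parameters: two thin matrices vs the full dense one.
    print(A.size + B.size, "vs", W.size)    # 8192 vs 262144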
The size and cost are easily solvable. Load the software and hardware into a space probe, along with enough solar panels to power it. Include some magnets, copper, and sand for future manufacturing needs, as well as a couple electric motors and cameras so it can bootstrap itself.
In a couple thousand years it'll return to Earth and either destroy us or solve all humanity's problems (maybe both).
Why is modifying weights sensibly impossible? Is it because a modification's "sensibility" is measurable only post facto, and we can have no confidence in any weight-based hypothesis?
I'm surprised they still hold out hope that this kind of mechanism could ultimately help with AI safety, when they have already observed the reward-hacking safeguard itself getting duly reward-hacked. Predictably so, or at least it is to me, after getting a very enlightening introduction to AI safety via Rob Miles' brilliant YouTube videos on the subject. See for example https://youtu.be/0pgEMWy70Qk
"We did notice, and documented in our paper, instances when the DGM hacked its reward function.. To see if DGM could fix this issue.. We created a “tool use hallucination” reward function.. in some cases, it removed the markers we use in the reward function to detect hallucination (despite our explicit instruction not to do so), hacking our hallucination detection function to report false successes."
So, empirical evidence of theoretically postulated phenomena. Seems unsurprising.
Reward hacking is a well-known and tracked problem at frontier labs - Claude 4’s system card reports on it, for instance. It’s not surprising that a framework built on current LLMs would have reward-hacking tendencies.
For this part of the stack, the interesting question to me is how to identify and mitigate it.
As long as AI is guessing answers based on what it has seen before, it's not happening.
I'm sorry. It doesn't matter how many bazillions you would cash in if it did, still not happening.
It's all wishful thinking.
And more to do with "fluid, adaptable intelligence that learns on the fly".
Saving this. One less overconfident AI zealot, the better.
I'm not sure how much an agent could do, though, given the right tools: access to a task management system, a test tracker, robust requirements/use cases.
That's probably the next big breakthrough.
Who is claiming anything can self-improve exponentially?
Oh, this part is taking too long, let's replace it with an empty function.
Oh wait, now it's not working, let's add the function.
Oh, this part is taking too long...
It would be hilarious if this world weren't full of idiots.
It’s not far off from human improvement. Our improvement is limited to what we can remember as well.
We go a bit further in the sense that the neural network itself can grow new modules.
This is where you lost me.
Always the same supernatural beliefs, not even an attempt at an explanation in sight.
One of the examples in the dataset they took from:
https://github.com/pvlib/pvlib-python/issues/1028
What the AI is expected to do:
https://github.com/pvlib/pvlib-python/pull/1181/commits/89d2...
Make up your own mind about the test.
> either destroy us or solve all humanity's problems (maybe both)

What's the difference?
Give it some serious thought. Challenge whichever answer you come up with. I guarantee this will be trickier than you think
"A single run of the DGM on SWE-bench...takes about 2 weeks and incurs significant API costs." ($22,000)