>What takes the long amount of time and the way to think about it is that it’s a march of nines. Every single nine is a constant amount of work. Every single nine is the same amount of work. When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.
I think this is an important way of understanding AI progress. Capability improvements often look exponential on a particular fixed benchmark, but the difficulty of the next step up is also often exponential, and so you get net linear improvement with a wider perspective.
The interview with Rich Sutton that I watched recently left me with the impression that AGI is not just a matter of adding more 9s.
The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.
This world model talk is interesting, and Yann LeCun has broached the same topic, but the fact is there are video diffusion models that are quite good at representing the "video world" and even generating temporally coherent, counterfactual representations of that "world" under different perturbations.
In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.
Animal brains such as our own have evolved to compress information about our world to aid in survival. LLMs and recent diffusion/conditional flow matching models have been quite successful in compressing the "text world" and the "pixel world" to score good loss metrics on training data.
It's incredibly difficult to compress information without having at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCun is a matter of semantics.
There is some evidence from Anthropic that LLMs do model the world. This paper[0] tracing their "thought" is fascinating. Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages.
It does have clusters of parameters that correlate with concepts, not just random "after word X, word Y tends to follow" statistics. Otherwise you would expect all of Chinese to be grouped in one place, all of French in another, all of English in another. This is empirically not the case.
I don't know whether to understand knowledge you have to have a model of the world, but at least as far as language, LLMs very much do seem to be doing some modeling.
[0]: https://www.anthropic.com/research/tracing-thoughts-language...
> that to understand knowledge you have to have a model of the world.
You have a small but important mistake: it's reciting (or even applying) knowledge that doesn't require one. To understand does actually require a world model.
Think of it this way: can you pass a test without understanding the test material? Certainly we all saw people we thought were idiots do well in class while we've also seen people we thought were geniuses fail. The test and understanding usually correlates but it's not perfect, right?
The reason I say understanding requires a world model (and I would not say LLMs understand) is because to understand you have to be able to detail things. Look at physics, or the far more detail-oriented math. Physicists don't conclude things just off of experimental results. It's an important part, but not the whole story. They also write equations, ones which are counterfactual. You can call this compression if you want (I would and do), but it's only that because of the generalization. But it also only has that power because of the details and nuance.
With AI, many of these people have been screaming for years (check my history) that what we're doing won't get us all the way there. Not because we want to stop the progress, but because we wanted to ensure continued and accelerated progress. We knew the limits and were saying "let's try to get ahead of this problem" but were told "that'll never be a problem. And if it is, we'll deal with it when we deal with it." It's why Chollet made the claim that LLMs have actually held AI progress back. Because the story that was sold was "AGI is solved, we just need to scale" (i.e. more money). I do still wonder how different things would be if those of us pushing back were able to continue and scale our work (research isn't free, so yes, people did stop us). We always had the math to show that scale wasn't enough, but it's easy to say "you don't need math" when you can see progress. The math never said no progress nor no acceleration; the math said there's a wall and it's easier to adjust now than when we're closer and moving faster. Sadly I don't think we'll ever shift the money over. We still evaluate success weirdly. Successful predictions don't matter. You're still heralded if you made a lot of money in VR and Bitcoin, right?
I think current AI is a human language/behavior mirror. A cat might believe they see another cat looking in a mirror, but you can’t create a new cat by creating a perfect mirror.
Model based reinforcement learning is a thing and it is kind of a crazy idea. Look up temporal difference model predictive control.
The fundamental idea behind temporal difference is that you can record any observable data stream over time and predict the difference between past and present based on your decision variables (e.g. camera movement, actuator movement, and so on). Think of it like the Minecraft clone called Oasis AI. The AI predicts the response to a user provided action.
Now imagine if it worked as presented. The data problem would be solved, because you are receiving a constant stream of data every single second. If anything, the RL algorithms are nowhere near where they need to be and continual learning has not been solved yet, but the best known way is through automatic continual learning ala Schmidhuber (co-inventor of LSTMs along with Hochreiter).
So, model based control is solved right? Everything that can be observed can be controlled once you have a model!
Wrong. Unfortunately. You still need the rest of reinforcement learning: an objective and a way to integrate the model. It turns out that reconstructing the observations is too computationally challenging and the standard computational tricks like U-Nets learn a latent representation that is optimized for reconstruction rather than for your RL objectives. There is a data exchange problem that can only realistically be solved by throwing an even bigger model at it, but here is why that won't work either:
Model predictive control tries to find the best trajectory over a receding horizon. It is inherently future oriented. This means that you need to optimize through your big model and that is expensive to do.
So you're going to have to take shortcuts by optimizing for a specific task. You reduce the dimension of the latent space and stop reconstructing the observations. The price? You are now learning a latent space for your particular task, which is less demanding. The dream of continual learning with infinite data shatters and you are brought down to earth: it's better than what came before, but not that much better.
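To make the "inherently future oriented" point concrete, here is a minimal random-shooting MPC sketch in Python. The two-line dynamics and the quadratic cost are toy stand-ins of my own, not anything from the comment above; the part to notice is that every control step rolls the model forward over the whole horizon for every candidate action sequence, which is exactly what gets expensive once the model is big.

    import numpy as np

    rng = np.random.default_rng(0)

    def dynamics(state, action):
        # Toy stand-in for a learned latent model: a damped point mass pushed by the action.
        pos, vel = state
        vel = 0.95 * vel + 0.1 * action
        return np.array([pos + vel, vel])

    def cost(state, action):
        # Quadratic cost: stay near the origin, don't push too hard.
        pos, vel = state
        return pos**2 + 0.1 * vel**2 + 0.01 * action**2

    def plan(state, horizon=15, candidates=256):
        # Random-shooting MPC: sample action sequences, roll the model forward over
        # the receding horizon, keep only the first action of the cheapest rollout.
        seqs = rng.uniform(-1.0, 1.0, size=(candidates, horizon))
        totals = np.zeros(candidates)
        for i, seq in enumerate(seqs):
            s = state.copy()
            for a in seq:
                totals[i] += cost(s, a)
                s = dynamics(s, a)
        return seqs[np.argmin(totals)][0]

    state = np.array([5.0, 0.0])
    for _ in range(30):
        action = plan(state)        # horizon * candidates model evaluations, every single step
        state = dynamics(state, action)
    print("final state:", state)

Each step costs horizon x candidates forward passes through the model; swap the toy dynamics for a large learned latent model and the expense of "optimizing through your big model" is obvious.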
The thing is, achieving say, 99.99999% reliable AI would be spectacularly useful even if it's a dead end from the AGI perspective.
People routinely conflate the "useful LLMs" and "AGI", likely because AGI has been so hyped up, but you don't need AGI to have useful AI.
It's like saying the Internet is dead end because it didn't lead to telepathy. It didn't, but it sure as hell is useful.
It's beneficial to have both discussions: whether and how to achieve AGI and how to grapple with it, and how to improve the reliability, performance, and cost of LLMs for more prosaic use cases.
It's just that they are separate discussions.
> The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.
That's the basic success of LLMs. They don't have much of a model of the world, and they still work. "Attention is all you need". Good Old Fashioned AI was all about developing models, yet that was a dead end.
There's been some progress on representation in an unexpected area. Try Perchance's AI character chat. It seems to be an ordinary chatbot. But at any point in the conversation, you can ask it to generate a picture, which it does using a Stable Diffusion type system. You can generate several pictures, and pick the one you like best. Then let the LLM continue the conversation from there.
It works from a character sheet, which it will create if asked. It's possible to start from an image and get to a character sheet and a story. The back and forth between the visual and textual domains seems to help.
For storytelling, such a system may need to generate the collateral materials needed for a stage or screen production - storyboards, scripts with stage directions, character summaries, artwork of sets, blocking (where everybody is positioned on stage), character sheets (poses and costumes), etc. Those are the modeling tools real productions use to keep a work created by many people on track. Those are a form of world model for storytelling.
I've been amazed at how good the results I can get from this thing are. You have to coax it a bit. It tends to stay stuck in a scene unless you push the plot forward. But give it a hint of what happens next and it will run with it.
[1] https://perchance.org/ai-character-chat
Absolutely. AGI isn't a matter of adding more 9s. It's a matter of solving more "???"s. And those require not just work but also a healthy serving of luck.
As I understand it, the breadth of LLMs was also something that was stumbled on kinda by accident: they got developed as translators and were just 'smarter' than expected.
Also, to understand the world you don't need language. People don't think in language. Thought is understanding. Language is knowledge transfer and expression.
I think this is a useful challenge to our normal way of thinking.
At the same time, "the world" exists only in our imagination (per our brain). Therefore, if LLMs need a model of a world, and they're trained on the corpus of human knowledge (which passed through our brains), then what's the difference, especially when LLMs are going back into our brains anyway?
A world model cannot exist; the context windows aren't even near big enough for that. Weird that every serious scientist agrees on AGI not being a thing in the next decades. LLMs are good if you train them for a specific thing. Not so much if you expect them to explain the whole world to you. This is not possible yet.
> LLMs seem to understand language therefore they've trained a model of the world.
This isn’t the claim, obviously. LLMs seem to understand a lot more than just language. If you’ve worked with one for hundreds of hours actually exercising frontier capabilities I don’t see how you could think otherwise.
(maybe 7)
To me, it's a matter of a very big checklist - you can keep adding tasks to the list, but if it keeps marching onwards checking things off your list, some day you will get there. Whether it's a linear or asymptotic march, only time will tell.
What "9" do you add to AGI? I don't think we even have the axes defined, let alone a way to measure them. "Mistakes per query?" It's like Cantor's diagonal test, where do we even start?
I don’t have a deep understanding of LLMs, but don’t they fundamentally work on tokens and generate a multi-dimensional statistical relationship map between tokens?
So it doesn’t have to be an LLM. You could theoretically have image tokens (though I don’t know in practice, but the important part is the statistical map).
And it’s not like my brain doesn’t work like that either. When I say a funny joke in response to people in a group, I can clearly observe my brain pull together related “tokens” (Mary just talked about X, X is related to Y, Y is relevant to Bob), filter them, sort them and then spit out a joke. And that happens in like less than a second.
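For what it's worth, the "statistical relationship map between tokens" intuition can be shown in its crudest possible form with a bigram counter. This is a toy of my own, nowhere near what a transformer actually learns (no embeddings, no attention, no long context), but it is the raw version of "predict the next token from statistics over what came before":

    from collections import Counter, defaultdict
    import random

    corpus = "the cat sat on the mat the cat ate the fish".split()

    # Count, for each token, how often every other token follows it.
    follows = defaultdict(Counter)
    for cur, nxt in zip(corpus, corpus[1:]):
        follows[cur][nxt] += 1

    def next_token(token):
        counts = follows[token]
        if not counts:   # dead end: this token was never followed by anything
            return None
        # Sample in proportion to how often each candidate followed `token`.
        return random.choices(list(counts), weights=counts.values())[0]

    tok = "the"
    out = [tok]
    for _ in range(6):
        tok = next_token(tok)
        if tok is None:
            break
        out.append(tok)
    print(" ".join(out))

An LLM replaces that lookup table with a learned, high-dimensional map conditioned on a long context, but the framing in the comment above - a statistical map over tokens - is basically right.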
I have a very surface level understanding of AI, and yet this always seemed obvious to me. It's almost a fundamental law of the universe that complexity of any kind has a long tail. So you can get AI to faithfully replicate 90% of a particular domain skill. That's phenomenal, and by itself can yield value for companies. But the journey from 90%-100% is going to be a very difficult march.
babies are already born with "the model of the world"
but a lot of experiments on babies/young kids tell otherwise
Some AI is like chess though, where they steadily advance in ELO ranking.
A marathon consists of two halves: the first 20 miles, and then the last 10k (6.2mi) when you're more sore and tired than you've ever been in your life.
This is 100% unrelated to the original article but I feel like there's an underreported additional first half. As a bigger runner who still loves to run, the first two or three miles before I have enough endorphins to get into the zen state that makes me love running is the first half, then it's 17 miles of this amazing meditative mindset. Then the last 10k sucks.
I suspect that is true for many difficult physical goals.
My dad told me that the first time you climb a mountain, there will likely be a moment not too distant from the top when you would be willing to just sit down and never move again, even at the risk to your own life. Even as you can see the goal not far away.
He also said that it was a dangerous enough situation that as a climb leader he'd start kicking you if he had to, if you sat down like that and refused to keep climbing. I'm not a climber myself, though, so this is hearsay, and my dad is long dead and unable to remind me of what details I've forgotten.
FWIW, Karpathy literally says, multiple times, that he thinks we never left the exponential - that all human progress over the last 4+ centuries averages out to that smooth ~2% growth rate exponential curve, that electricity and computing and AI are just ways we keep it going, and we'll continue on that curve for the time being.
It's the major point of contention between him and the host (who thinks growth rate will increase).
The thing about this, though - cars have been built before. We understand what's necessary to get those 9s. I'm sure there were some new problems that had to be solved along the way, but fundamentally, "build good car" is known to be achievable, so the process of "adding 9s" there makes sense.
But this method of AI is still pretty new, and we don't know its upper limits. It may be that there are no more 9s to add, or that any more 9s cost prohibitively more. We might be effectively stuck at 91.25626726...% forever.
Not to be a doomer, but I DO think that anyone who is significantly invested in AI really has to have a plan in case that ends up being true. We can't just keep on saying "they'll get there some day" and acting as if it's true. (I mean you can, just not without consequences.)
While you are right about the broader (and sort of ill-defined) chase toward 'AGI' - another way to look at it is the self-driving car - they got there eventually. And if you work on applications using LLMs, you can pretty easily see that Karpathy's sentiment is likely correct. You see it because you do it. Even simple applications are shaped like this, albeit each 9 takes less time for a simple app than it did for self-driving cars... it still feels about right.
It's a good way to think about lots of things. It's the Pareto principle, the 80/20 rule.
20% of your effort gets you 80% of the way. But most of your time is spent getting that last 20%. People often don't realize that this is fractal like in nature, as it draws from the power distribution. So of that 20% you still have left, the same holds true. 20% of your time (20% * 80% = 16% -> 36%) to get 80% (80% * 20% => 96%) again and again. The 80/20 numbers aren't actually realistic (or constant) but it's a decent guide.
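Just iterating the arithmetic from the paragraph above (the 80/20 numbers are, as noted, a guide rather than a law):

    # Each round: 20% of the *remaining* effort clears 80% of the *remaining* work.
    effort_spent, work_done, effort_left = 0.0, 0.0, 1.0
    for n in range(1, 5):
        effort_spent += 0.20 * effort_left
        effort_left -= 0.20 * effort_left
        work_done += 0.80 * (1.0 - work_done)
        print(f"round {n}: ~{effort_spent:.0%} effort -> ~{work_done:.2%} of the work")

which prints 20% -> 80%, 36% -> 96%, and so on - the same "again and again" compounding described above.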
It's also something tech has been struggling with lately. Move fast and break things is a great way to get most of the way there. But you also left a wake of destruction and tabled a million little things along the way. Someone needs to go back and clean things up. Someone needs to revisit those tabled things. While each thing might be little, we solve big problems by breaking them down into little ones. So each big problem is a sum of many little ones, meaning they shouldn't be quickly dismissed. And like the 9's analogy, 99.9% of the time is still 9hrs of downtime a year. It is still 1e6 cases out of 1e9. A million cases is not a small problem. Scale is great and has made our field amazing, but it is a double edged sword.
I think it's also something people struggle with. It's very easy to become above average, or even well above average, at something. Just trying will often get you above average. It can make you feel like you know way more, but the trap is that while in some domains above average is not far from mastery, in other domains above average is closer to no skill than it is to mastery. Like how having $100m puts your wealth closer to a homeless person than a billionaire. At $100m you feel way closer to the billionaire because you're much further up than the person with nothing, but the curve is exponential.
"I'm closer to LeBron than you are to me."
I also quite like the way he puts it. However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI. That's why we can expect fast acceleration to take off within two years.
I don't think we can be confident that this is how it works. It may very well be that our level of intelligence has a hard limit to how many nines we can add, and AGI just pushes the limit further, but doesn't make it faster per se.
It may also be that we're looking at this the wrong way altogether. If you compare the natural world with what humans have achieved, for instance, both things are qualitatively different, they have basically nothing to do with each other. Humanity isn't "adding nines" to what Nature was doing, we're just doing our own thing. Likewise, whatever "nines" AGI may be singularly good at adding may be in directions that are orthogonal to everything we've been doing.
Progress doesn't really go forward. It goes sideways.
Isn't that one of the measures of when it becomes an AGI? So that doesn't help you with however many nines we are away from getting an AGI.
Even if you don't like that definition, you still have the question of how many nines we are away from having an AI that can contribute to its own development.
I don't think you know the answer to that. And therefore I think your "fast acceleration within two years" is unsupported, just wishful thinking. If you've got actual evidence, I would like to hear it.
There's a massive planet-sized CITATION NEEDED here, otherwise that's weapons grade copium.
I doubt this. General intelligence will be a step change not a gentle ramp. If we get to an architecture intelligent enough to meaningfully contribute to AI development, we'll have already made it. It'll simply be a matter of scale. There's no 99% AGI that can help build 100% AGI but for some reason can't drive a car or cook a meal or work an office job.
> However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI.
If you look at it differently, assembly language may have been one nine, compilers may have been the next nine, successive generations of language until ${your favorite language} one more nine, and yet, they didn't get us noticeably closer to AGI.
> The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
— Tom Cargill, Bell Labs (September 1985)
https://dl.acm.org/doi/pdf/10.1145/4284.315122
Given the physical limits of the universe and our planet in particular, yeah, this is pretty much always true. The interesting question is: what is that limit, and: how many orders of magnitude are we away from leveling off?
I think the point Andrej was making here is that in some areas, such as self driving, the cost of failure is extremely high (maybe death), so 99.9% reliable doesn't cut it, and therefore doesn't mean you are almost done, or have done 99.9% of the work. It's "The last 10% is 90% of the work" recursively applied.
He was also pointing out that the same high cost of failure consideration applies to many software systems (depending on what they are doing/controlling). We may already be at the level where AI coding agents are adequate for some less critical applications, but yet far away from them being a general developer replacement. I see software development as something that uses closer to 100% of your brain than 10% - we may well not see AI coding agents approach human reliability levels until we have human level AGI.
The AI snake oil salesmen/CEOs like to throw out competitive coding or math olympiad benchmarks as if they are somehow indicative of the readiness of AI for other tasks, but reliability matters. Nobody dies or loses millions of dollars if you get a math problem wrong.
The thing is, the example of the "march of nines" is self-driving cars. These deal with roads, and roads are an interface between the chaos of the overall world and a system that has quite well-defined rules.
I can imagine other tasks on a human/rules-based "frontier" would have a similar quality. But I think there are others that are going to be inaccessible entirely "until AGI" (or something). Humanoid robots moving freely in human society would be an example, I think.
Academia has rediscovered itself
Drawn from Karpathy killing a bunch of people by knowingly delivering defective autonomous driving software instead of applying basic engineering ethics and refusing to deploy the dangerous product he was in charge of.
Signal attenuation, a byproduct of entropy, due to generational churn means there's little guarantee.
Occam's Razor; Karpathy knows the future or he is self selecting biology trying to avoid manual labor?
His statements have more in common with Nostradamus. It's the toxic positivity form of "the end is nigh". It's "Heaven exists you just have to do this work to get there."
Physics always wins and statistics is not physics. Gambler's fallacy: improvement of statistical odds does not improve probability. Probability remains the same; this is all promises from people who have no idea or interest in doing anything else with their lives, so stay the course.
Or perhaps Karpathy has a higher level understanding and can see a bigger picture?
You've said something about heaven. Are you able to understand this statement, for example: "Heaven is a memeplex, it exists."?
If it works 90% of the time, that means it fails 10% of the time. Getting to a 1% failure rate is a 10x improvement, and going from a 1% failure rate to a 0.1% failure rate is also a 10x improvement.
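Spelled out with yearly downtime (8,760 hours in a year), since each added nine is the same 10x cut in failure rate but a shrinking absolute change:

    hours_per_year = 24 * 365
    for nines in range(1, 6):
        failure_rate = 10 ** -nines            # 10%, 1%, 0.1%, ...
        reliability = 1 - failure_rate
        downtime_h = failure_rate * hours_per_year
        print(f"{reliability:.4%} reliable -> {downtime_h:8.2f} hours of failure per year")

90% reliable is ~876 hours of failure a year, and 99.9% is still ~9 hours - the same point made upthread about a thousandth of a billion cases still being a million cases.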
First time I'm hearing it called "march of nines" - did Tesla coin the term? I thought it was an Amazon thing.
But to be accepted by people, it has to be better than humans in the specific ways that humans are good at things. And less bad than humans in the ways that they're bad at things.
When automated solutions fail in strange, alien ways, it understandably freaks people out. Nobody wants to worry about whether a car will suddenly swerve into oncoming traffic because of a sensor malfunction. Comparing incidents-per-miles-driven might make sense from a utilitarian perspective, but it just isn't good enough for humans to accept replacement tech psychologically, so we do have to chase those 9s until they can handle all the edge cases at least as well as humans.
Humans adapt and add more nines the more they learn about something.
Humans also are liable in a lawful sense. This is a huge factor in any AI use case.
One of the most brilliant AI minds on the planet, and he's focused on education. How to make all the innovation of the last decade accessible so the next generation can build what we don't know how to do today.
No magical thinking here. No empty blather about how AI is going to make us obsolete with the details all handwaved away. Karpathy sees that, for now, better humans are the only way forward.
Also, speculation as to why AI coders are "mortally terrified of exceptions": it's the same thing OpenAI recently wrote about, trying to get an answer at all costs to boost some accuracy metric. An exception is a signal of uncertainty indicating that you need to learn more about your problem. But that doesn't get you points. Only a "correct answer" gets you points.
Frontier AI research seems to have yet to operationalize a concept of progress without a final correct answer or victory condition. That's why AI is still so bad at Pokemon. To complete open-ended long-running tasks like Pokemon, you need to be motivated to get interesting things to happen, have some minimal sense of what kind of thing is interesting, and have the ability to adjust your sense of what is interesting as you learn more.
It's nice seeing commentary from someone who is both knowledgable in AI and NOT trying to pump the AI bag.
Right now the median actor in the space loudly proclaims AGI is right around the corner, while rolling out pornbots/ads/in-chat-shopping, which generally seems at odds with a real belief that AGI is close (TAM of AGI must be exponentially larger than the former).
Zvi made this point the other day, and then this counterpoint, which I agree with more: if you think AGI is soon but you need to keep up the exponential datacenter growth for 2-3 years (or whatever "around the corner" means for the company in question), then a land-grab on consumer ARR is a faster way to short-term revenue (and therefore higher valuations at your next round).
OAI is also doing F100 and USG work; it takes longer to book the revenue though.
By selling porn and shopping you are in some sense weakening your position with regulators which you'll need when AGI starts displacing jobs - but you can also imagine thinking that this is a second order problem and winning the race is way more urgent.
It looks like Andrej's definition of "agent" here is an entity that can replace a human employee entirely - from the first few minutes of the conversation:
When you’re talking about an agent, or what the labs have in mind and maybe what I have in mind as well, you should think of it almost like an employee or an intern that you would hire to work with you. For example, you work with some employees here. When would you prefer to have an agent like Claude or Codex do that work?
Currently, of course they can’t. What would it take for them to be able to do that? Why don’t you do it today? The reason you don’t do it today is because they just don’t work. They don’t have enough intelligence, they’re not multimodal enough, they can’t do computer use and all this stuff.
They don’t do a lot of the things you’ve alluded to earlier. They don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re cognitively lacking and it’s just not working. It will take about a decade to work through all of those issues.
He’s not just talking about agents good enough to replace workers. He’s talking about whether agents are currently useful at all.
>Overall, the models are not there. I feel like the industry is making too big of a jump and is trying to pretend like this is amazing, and it’s not. It’s slop. They’re not coming to terms with it, and maybe they’re trying to fundraise or something like that. I’m not sure what’s going on, but we’re at this intermediate stage. The models are amazing. They still need a lot of work. For now, autocomplete is my sweet spot. But sometimes, for some types of code, I will go to an LLM agent.
>They kept trying to mess up the style. They’re way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.
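To make the "over-defensive" complaint concrete, here's a made-up illustration (mine, not Karpathy's code) of the two styles - the first is the kind of thing agents tend to emit, the second is the "I have a bunch of assumptions in my code, and it's okay" style:

    import json

    # Agent style: catch everything, return a default, silently hide the real failure.
    def load_config_defensive(path):
        try:
            with open(path) as f:
                return json.load(f)
        except Exception:
            return {}

    # Assumption style: if the file is missing or malformed, let the exception
    # surface immediately and point straight at the problem.
    def load_config(path):
        with open(path) as f:
            return json.load(f)

Same behavior on the happy path; the difference is whether a failure shows up as a loud signal or as a mysteriously empty dict three layers away from the bug - which connects to the comment elsewhere in the thread about exceptions being a signal that you need to learn more about your problem.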
I don't think he is saying agents are not useful at all, just that they are not anywhere near the capability of human software developers. Karpathy later says he used agents to write the Rust translation of algorithms he wrote in Python. He also explicitly says that agents can be useful for writing boilerplate or for code that can be very commonly found online. So I don't think he is saying they are not useful at all. Instead, he is just holding agents to a higher standard of working on a novel new codebase, and saying they don't pass that bar.
Tbh I think people underestimate how much software development work is just writing boilerplate or common patterns though. A very large percentage of the web development work I do is just writing CRUD boilerplate, and agents are great at it. I also find them invaluable for searching through large codebases, and for basic code review, but I see these use-cases discussed less even though they're a big part of what I find useful from agents.
Because that's the definition that is leading to all these investments, the promise that very soon they will reach it. If Altman said plainly that LLMs will never reach that stage, there would be a lot less investment into the industry.
Hard disagree. You don’t need AGI to transform countless workflows within companies; current LLMs can do it. A lot of the current investment is to help meet the demand for current-generation LLMs (and the use cases we know will keep opening up with incremental improvements). Are you aware of how intensely all the main companies that host leading models (Azure, AWS, etc.) are throttling usage due to not enough data center capacity? (E.g. at my company we have 100x more demand than we can get capacity for, and we’re barely getting started. We have a roadmap with 1000x+ the current demand and we’re a relatively small company.)
AGI would be more impactful of course, and some use cases aren’t possible until we have it, but that doesn’t diminish the value of current AI.
Quite telling -- thanks for the insightful comment as always, Simon. Didn't know that, even though I've been discussing this on and off all day on Reddit.
He's a smart man with well-reasoned arguments, but I think he's also a bit poisoned by working at such a huge org, with all the constraints that comes with. Like, this:
You can’t just tell them something and they’ll remember it.
It might take a decade to work through this issue if you just want to put a single LLM in a single computer and have it be a fully-fledged human, sure. And since he works at a company making some of the most advanced LLMs in the world, that perspective makes sense! But of course that's not how it's actually going to be (/already is).
LLMs are a necessary part of AGI(/"agents") due to their ability to avoid the Frame Problem[1], but they're far from the only needed thing. We're pretty dang good at "remembering things" with computers already, and connecting that with LLM ensembles isn't going to take anywhere close to 10 years. Arguably, we're already doing it pretty darn well in unified systems[2]...
If anyone's unfamiliar and finds my comment interesting, I highly recommend Minsky's work on the Society of Mind, which handled this topic definitively over 20 years ago. Namely;
> You can’t just tell them something and they’ll remember it.
I find it fascinating that this is the problem people consistently think we're a decade away on.
If you can't do this, you don't have employee-like AI agents, you have AI-enhanced scripting. It's basically the first thing you have to be able to do to credibly replace an actual human employee.
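For what it's worth, the "computers are already good at remembering, just wire that up to the model" position from a few comments up looks roughly like this in practice. Everything here is a stand-in: call_llm is a placeholder for whatever model API you use, and real systems do embedding-based retrieval rather than word overlap, but the shape of the loop is the point:

    memory = []  # remembered statements, persisted however you like

    def call_llm(prompt: str) -> str:
        # Placeholder for a real model call (hosted API, local model, ...).
        raise NotImplementedError

    def recall(query: str, k: int = 3) -> list[str]:
        # Toy retrieval: score by shared words; real systems would use embeddings.
        words = set(query.lower().split())
        ranked = sorted(memory, key=lambda m: len(words & set(m.lower().split())), reverse=True)
        return ranked[:k]

    def chat(user_msg: str) -> str:
        if user_msg.lower().startswith("remember:"):
            memory.append(user_msg.split(":", 1)[1].strip())
            return "Noted."
        facts = "\n".join(recall(user_msg))
        prompt = f"Known facts:\n{facts}\n\nUser: {user_msg}\nAssistant:"
        return call_llm(prompt)

Whether that counts as the model "remembering" in the sense Karpathy means (continual learning, weights actually updating) is exactly the disagreement; the sketch only shows why the plumbing side of "tell it something and it's retained" doesn't look like a ten-year problem.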
Andrej Karpathy seems to me like a national (world) treasure.
He has the ability to explain concepts and thoughts with analogies and generalizations and interesting sayings that allow you to keep interest in what he is talking about for literally hours - in a subject that I don't know that much about. Clearly he is very smart, as is the interviewer, but he is also a fantastic communicator and does not come across as arrogant or pretentious, but really just helpful and friendly. It's quite a remarkable and amazing skillset. I'm in awe.
Agreed. I'd also add he's intellectually honest enough to not overhype what's happening just to hype whatever he's working on or appear to be a thought leader. Just very clear, pragmatic, and intellectually honest thought about the reality of things.
It's good to see experts who share the scepticism about agents that I have. I don't doubt they will be useful in some settings, but they lean into all the current weak points of large language models and make them worse. Security, reproducibility, hallucinations, bias, etc etc.
With all these issues already being hard to manage, I just don't believe businesses are going to delegate processes to autonomous agents in a widespread manner. Literally anything that matters is going to get implemented in a controlled workflow that strips out all the autonomy, with a human checkpoint at every step. They may call them agents just to sound cool but it will be completely controlled.
Software people are all fooled by what is really a special case around software development: outcomes are highly verifiable and mistakes (in development) are almost free. This is just not the case out there in the real world.
> Literally anything that matters is going to get implemented in a controlled workflow that strips out all the autonomy, with a human checkpoint at every step.
Yea, there aren't a ton of problems (that I can see) in my current domain that could be solved by having unattended agents generating something.
I work in healthcare and there are a billion use cases right now, but none that don't require strict supervision. For instance, having an LLM processing history and physicals from potential referrals looking for patient problems/extracting historical information is cool, but it's nowhere near reliable enough to do anything but present that info back to the clinician to have them verify it.
Fully autonomous agents are marketing fluff right now, but there is like $10T of TAM from promoting most knowledge workers to a manager and automating the boring 80% of their work, and this doesn’t require full autonomy.
Karpathy’s definition of “agent” here is really AGI (probably somewhere between expert and virtuoso AGI https://arxiv.org/html/2311.02462v2). In my taxonomy you can have non-AGI short-task-timeframe agents. Eg in the METR evals, I think it’s meaningful to talk about agent tasks if you set the thing loose for 4-8h human-time tasks.
If the transcript is accurate, Karpathy does not actually ever, in this interview, say that AGI is a decade away, or make any concrete claims about how far away AGI is. Patel's title is misleading.
Hmm good point. I skimmed the transcript looking for an accurate, representative quote that we could use in the title above. I couldn't exactly find one (within HN's 80 char limit), so I cobbled together "It will take a decade to get agents to work", which is at least closer to what Karpathy actually said.
If anyone can suggest a more accurate and representative title, we can change it again.
Edit: I thought of using "For now, autocomplete is my sweet spot", which has the advantage of being an exact quote; but it's probably not clear enough.
Edit 2: I changed it to "It will take a decade to work through the issues with agents" because that's closer to the transcript.
Anybody have a better idea? Help the cause of accuracy out here!
>They don't have enough intelligence, they're not multimodal enough, they can't do computer use and all this stuff. They don't do a lot of the things you've alluded to earlier. They don't have continual learning. You can't just tell them something and they'll remember it. They're cognitively lacking and it's just not working.
>It will take about a decade to work through all of those issues. (2:20)
"The scalable method is you learn from experience. You try things, you see what works. No one has to tell you. First of all, you have a goal. Without a goal, there’s no sense of right or wrong or better or worse. Large language models are trying to get by without having a goal or a sense of better or worse. That’s just exactly starting in the wrong place."
and a bunch of similar things implying LLMs have no hope of reaching AGI
Please don't cross into personal attack. It's not what this site is for, and destroys what it is for.
Edit: please don't edit comments to change their meaning once someone has replied. It's unfair to repliers whose comments no longer make sense, and it's unfair to readers who can no longer understand the thread. It's fine, of course, to add to an existing comment in such a case, e.g. by saying "Edit:" or some such and then adding what else you want to say.
Huh, I'm surprised that he goes from "No AI" to "AI autocomplete" to "Vibecoding / Agents" (which I assume means no human review per his original coinage of the term.) This seems to preclude the chat-oriented / pair-programming model which I find most effective. Or even the plan-spec-codegen-review approach, which IME works extremely well for straightforward CRUD apps.
Things are more nuanced than what people have assumed, which seems to be "LLMs cannot handle novel code". The best I can summarize it as is that he was doing rather non-standard things that confused the LLMs which have been trained on vast amounts on very standard code and hence kept defaulting to those assumptions. Maybe a rough analogy is that he was trying to "code golf" this repo whereas LLMs kept trying to write "enterprise" code because that is overwhelmingly what they have been trained on.
I think this is where the chat-oriented / pair-programming or spec-driven model shines. Over multiple conversations (or from the spec), they can understand the context of what you're trying to do and generate what you really want. It seems Karpathy has not tried this approach (given his comments about "autocomplete being his sweet spot".)
For instance, I'm working on some straightforward computer vision stuff, but it's complicated by the fact that I'm dealing with small, low-resolution images, which does not seem well-represented in the literature. Without that context, the suggestions any AI gives me are sub-optimal.
However, after mentioning it a few times, ChatGPT now "remembers" this in its context, and any suggestion it gives me during chat is automatically tailored for my use-case, which produces much better results.
Put another way (not an AI expert so I may be using the terms wrong), LLMs will default to mining the data distribution they've been trained on, but with sufficient context, they should be able to adapt their output to what you really want.
Agency. If one studied the humanities they’d know how incredible a proposal “agentic” AI is. In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way. The notion of casual agency (I’m thinking of Jensen Huang’s generative > agentic > robotic insistence) is bonkers. Some things are not easily speedrunned.
(I did listen to a sizable portion of this podcast while making risotto (stir stir stir), and the thought occurred to me: “am I becoming more stupid by listening to these pundits?” More generally, I feel like our internet content (and meta content (and meta meta content)) is getting absolutely too voluminous without the appropriate quality controls. Maybe we need more internet death.)
> In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way.
I don't follow. If we, in some distant future, find a way to make humans functionally immortal, does that magically remove our agency? Or do we not have agency to begin with?
If your position on the "free will" question is that it doesn't exist, then sure I get it. But that seems incompatible with the death prerequisite you have put forward for it, because if it doesn't exist then surely it's a moot point to talk prerequisites anyway.
When I think of the term "agency" I think of a feedback loop whereby an actor is aware of their effect and adjusts behavior to achieve desired effects. To be a useful agent, one must operate in a closed feedback loop; an open loop does not yield results.
Consider the distinction between probabilistic and deterministic reasoning. When you are dealing with a probabilistic method (eg, LLMs, most of the human experience) closing the feedback loop is absolutely critical. You don't really get anything if you don't close the feedback loop, particularly as you apply a probabilistic process to a new domain.
For example, imagine that you learn how to recognize something hot by hanging around a fire and getting burned, and you later encounter a kettle on a modern stove-top and have to learn a similar recognition. This time there is no open flame, so you have to adapt your model. This isn't a completely new lesson, the prior experience with the open flame is invoked by the new experience and this time you may react even faster to that sensation of discomfort. All of this is probabilistic; you aren't certain that either a fire or a kettle will burn you, but you use hints and context to take a guess as to what will happen; the element that ties together all of this is the fact of getting burned. Getting burned is the feedback loop closing. Next time you have a better model.
Skillful developers who use LLMs know this: they use tests, or they have a spec sheet they're trying to fulfill. In short, they inject a brief deterministic loop to act as a conclusive agent. For the software developer's case it might be all tests passing, for some abstract project it might be the spec sheet being completely resolved. If the developer doesn't check in and close the loop, then they'll be running the LLM forever. An LLM believes it can keep making the code better and better, because it lacks the agency to understand "good enough." (If the LLM could die, you'd bet it would learn what "good enough" means.)
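A sketch of that injected deterministic loop, with the model side left as explicit placeholders (propose_patch and apply_patch are hypothetical, not any real API) - the test suite is the thing that closes the loop and defines "good enough":

    import subprocess

    def propose_patch(task: str, feedback: str) -> str:
        # Placeholder for an LLM call that returns updated source code.
        raise NotImplementedError

    def apply_patch(patch: str) -> None:
        # Placeholder: write the proposed code to the working tree.
        raise NotImplementedError

    def tests_pass() -> tuple[bool, str]:
        # Deterministic check: the suite either passes or it doesn't.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def solve(task: str, max_rounds: int = 5) -> bool:
        feedback = ""
        for _ in range(max_rounds):
            patch = propose_patch(task, feedback)   # probabilistic step
            apply_patch(patch)
            ok, feedback = tests_pass()             # the loop closes here
            if ok:
                return True                         # "good enough" is defined by the tests
        return False

Without that bounded, deterministic exit condition the probabilistic part will happily keep "improving" forever, which is the point about agency needing a closed loop.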
Where does dying come in? Nature evolved numerous mechanisms to proliferate patterns, and while everyone pays attention to the productive ones (eg, birth) few pay attention to the destructive (eg, death). But the destructive ones are just as important as the productive ones, for they determine the direction of evolution. In terms of velocity you can think of productive mechanisms as speed and destructive mechanisms as direction. (Or in terms of force you can think of productive mechanisms as supplying the energy and destructive mechanisms supplying the direction.) Many instances are birthed, and those that survive go on and participate in the next round. Dying is the closed feedback loop, shutting off possibilities and defining the bounds of the project.
I think this is an important way of understanding AI progress. Capability improvements often look exponential on a particular fixed benchmark, but the difficulty of the next step up is also often exponential, and so you get net linear improvement with a wider perspective.
The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to udnerstand language therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.
In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.
Animal brains such as our own have evolved to compress information about our world to aide in survival. LLMs and recent diffusion/conditional flow matching models have been quite successful in compressing the "text world" and the "pixel world" to score good loss metrics on training data.
It's incredibly difficult to compress information without have at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCunn is semantic.
It does have clusters of parameters that correlate with concepts, not just randomly "after X word tends to have Y word." Otherwise you would expect all of Chinese to be grouped in one place, all of French in another, all of English in another. This is empirically not the case.
I don't know whether to understand knowledge you have to have a model of the world, but at least as far as language, LLMs very much do seem to have modeling.
[0]: https://www.anthropic.com/research/tracing-thoughts-language...
Think of it this way: can you pass a test without understanding the test material? Certainly we all saw people we thought were idiots do well in class while we've also seen people we thought were geniuses fail. The test and understanding usually correlates but it's not perfect, right?
The reason I say understanding requires a world model (and I would not say LLMs understand) is because to understand you have to be able to detail things. Look at physics, or the far more detail oriented math. Physicists don't conclude things just off of experimental results. It's an important part, but not the whole story. They also write equations, ones which are counterfactual. You can call this compression if you want (I would and do), but it's only that because of the generalization. But it also only has that power because of the details and nuance.
With AI many of these people have been screaming for years (check my history) that what we're doing won't get us all the way there. Not because we want to stop the progress, but because we wanted to ensure continued and accelerate progress. We knew the limits and were saying "let's try to get ahead of this problem" but were told "that'll never be a problem. And if it is, we'll deal with it when we deal with it." It's why Chollet made the claim that LLMs have actually held AI progress back. Because the story that was sold was "AGI is solved, we just need to scale" (i.e. more money). I do still wonder how different things would be if those of us pushing back were able to continue and scale our works (research isn't free, so yes, people did stop us). We always had the math to show that scale wasn't enough, but it's easy to say "you don't need math" when you can see progress. The math never said no progress nor no acceleration, the math said there's a wall and it's easier to adjust now than when we're closer and moving faster. Sadly I don't think we'll ever shift the money over. We still evaluate success weirdly. Successful predictions don't matter. You're still heralded if you made a lot of money in VR and Bitcoin, right?
The fundamental idea behind temporal difference is that you can record any observable data stream over time and predict the difference between past and present based on your decision variables (e.g. camera movement, actuator movement, and so on). Think of it like the Minecraft clone called Oasis AI. The AI predicts the response to a user provided action.
Now imagine if it worked as presented. The data problem would be solved, because you are receiving a constant stream of data every single second. If anything, the RL algorithms are nowhere near where they need to be and continual learning has not been solved yet, but the best known way is through automatic continual learning ala Schmidhuber (co-inventor of LSTMs along with Hochreiter).
So, model based control is solved right? Everything that can be observed can be controlled once you have a model!
Wrong. Unfortunately. You still need the rest of reinforcement learning: an objective and a way to integrate the model. It turns out that reconstructing the observations is too computationally challenging and the standard computational tricks like U-Nets learn a latent representation that is optimized for reconstruction rather than for your RL objectives. There is a data exchange problem that can only realistically be solved by throwing an even bigger model at it, but here is why that won't work either:
Model predictive control tries to find the best trajectory over a receding horizon. It is inherently future oriented. This means that you need to optimize through your big model and that is expensive to do.
So you're going to have to take shortcuts by optimizing for a specific task. You reduce the dimension of the latent space and stop reconstructing the observations. The price? You are now learning a latent space for your particular task, which is less demanding. The dream of continual learning with infinite data shatters and you are brought down to earth: it's better than what came before, but not that much better.
People routinely conflate the "useful LLMs" and "AGI", likely because AGI has been so hyped up, but you don't need AGI to have useful AI.
It's like saying the Internet is dead end because it didn't lead to telepathy. It didn't, but it sure as hell is useful.
It's beneficial to have both discussions: whether and how to achieve AGI and how to grapple with it, and how to improve a reliability, performance and cost of LLMs for more prosaic use cases.
It's just that they are separate discussions.
That's the basic success of LLMs. They don't have much of a model of the world, and they still work. "Attention is all you need". Good Old Fashioned AI was all about developing models, yet that was a dead end.
There's been some progress on representation in an unexpected area. Try Perchance's AI character chat. It seems to be an ordinary chatbot. But at any point in the conversation, you can ask it to generate a picture, which it does using a Stable Diffusion type system. You can generate several pictures, and pick the one you like best. Then let the LLM continue the conversation continue from there.
It works from a character sheet, which it will create if asked. It's possible to start from an image and get to a character sheet and a story. The back and forth between the visual and textural domains seems to help.
For storytelling, such system may need to generate the collateral materials needed for a stage or screen production - storyboards, scripts with stage directions, character summaries, artwork of sets, blocking (where everybody is positioned on stage), character sheets (poses and costumes) etc. Those are the modeling tools real productions use to keep a work created by many people on track. Those are a form of world model for storytelling.
I've been amazed at how good the results I can get from this thing are. You have to coax it a bit. It tends to stay stuck in a scene unless you push the plot forward. But give it a hint of what happens next and it will run with it.
[1]https://perchance.org/ai-character-chat
As I understand it, to the breadth of LLMs was also something that was stumbled on kinda by accident, I understand they got developed as translators and were just 'smarter' than expected.
Also, to understand the world you don't need language. People don't think in language. Thought is understanding. Language is knowledge transfer and expression.
Deleted Comment
At the same time, "the world" exists only in our imagination (per our brain). Therefore, if LLMs need a model of a world, and they're trained on the corpus of human knowledge (which passed through our brains), then what's the difference, especially when LLMs are going back into our brains anyway?
This isn’t the claim, obviously. LLMs seem to understand a lot more than just language. If you’ve worked with one for hundreds of hours actually exercising frontier capabilities I don’t see how you could think otherwise.
(maybe 7)
So it doesn’t have to be LLM. You could theoretically have image tokens (though I don’t know in practice, but the important part is the statistical map).
And it’s not like my brain doesn’t work like that either. When I say a funny joke in response to people in a group, I can clearly observe my brain pull together related “tokens” (Mary just talked about X, X is related to Y, Y is relevant to Bob), filter them, sort them and then spit out a joke. And that happens in like less than a second.
babies are already born with "the model of the world"
but a lot of experiments on babies/young kids tell otherwise
Some AI is like chess though, where they steadily advance in ELO ranking.
A marathon consists of two halves: the first 20 miles, and then the last 10k (6.2mi) when you're more sore and tired than you've ever been in your life.
My dad told me that the first time you climb a mountain, there will likely be a moment not too distant from the top when you would be willing to just sit down and never move again, even at the risk to your own life. Even as you can see the goal not far away.
He also said that it was a dangerous enough situation that as a climb leader he'd start kicking you if he had to, if you sat down like that and refused to keep climbing. I'm not a climber myself, though, so this is hearsay, and my dad is long dead and unable to remind me of what details I've forgotten.
It's the major point of contention between him and the host (who thinks growth rate will increase).
But this method of AI is still pretty new, and we don't know it's upper limits. It may be that there are no more 9s to add, or that any more 9s cost prohibitively more. We might be effectively stuck at 91.25626726...% forever.
Not to be a doomer, but I DO think that anyone who is significantly invested in AI really has to have a plan in case that ends up being true. We can't just keep on saying "they'll get there some day" and acting as if it's true. (I mean you can, just not without consequences.)
20% of your effort gets you 80% of the way. But most of your time is spent getting that last 20%. People often don't realize that this is fractal like in nature, as it draws from the power distribution. So of that 20% you still have left, the same holds true. 20% of your time (20% * 80% = 16% -> 36%) to get 80% (80% * 20% => 96%) again and again. The 80/20 numbers aren't actually realistic (or constant) but it's a decent guide.
It's also something tech has been struggling with lately. Move fast and break things is a great way to get most of the way there. But you also left a wake of destruction and tabled a million little things along the way. Someone needs to go back and clean things up. Someone needs to revisit those tabled things. While each thing might be little, we solve big problems by breaking them down into little ones. So each big problem is a sum of many little ones, meaning they shouldn't be quickly dismissed. And like the 9's analogy, 99.9% of the time is still 9hrs of downtime a year. It is still 1e6 cases out of 1e9. A million cases is not a small problem. Scale is great and has made our field amazing, but it is a double edged sword.
I think it's also something people struggle with. It's very easy to become above average, or even well above average at something. Just trying will often get you above average. It can make you feel like you know way more but the trap is that while in some domains above average is not far from mastery in other domains above average is closer to no skill than it is to mastery. Like how having $100m puts your wealth closer to a homeless person than a billionaire. At $100m you feel way closer to the billionaire because you're much further up than the person with nothing but the curve is exponential.
"I'm closer to LeBron than you are to me."
It may also be that we're looking at this the wrong way altogether. If you compare the natural world with what humans have achieved, for instance, both things are qualitatively different, they have basically nothing to do with each other. Humanity isn't "adding nines" to what Nature was doing, we're just doing our own thing. Likewise, whatever "nines" AGI may be singularly good at adding may be in directions that are orthogonal to everything we've been doing.
Progress doesn't really go forward. It goes sideways.
Even if you don't like that definition, you still have the question of how many nines we are away from having an AI that can contribute to its own development.
I don't think you know the answer to that. And therefore I think your "fast acceleration within two years" is unsupported, just wishful thinking. If you've got actual evidence, I would like to hear it.
There's a massive planet-sized CITATION NEEDED here, otherwise that's weapons grade copium.
If you look at it differently, assembly language may have been one nine, compilers may have been the next nine, successive generations of language until ${your favorite language} one more nine, and yet, they didn't get us noticeably closer to AGI.
>The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
— Tom Cargill, Bell Labs (September 1985)
https://dl.acm.org/doi/pdf/10.1145/4284.315122
He was also pointing out that the same high-cost-of-failure consideration applies to many software systems (depending on what they are doing/controlling). We may already be at the level where AI coding agents are adequate for some less critical applications, yet still far from them being a general developer replacement. I see software development as something that uses closer to 100% of your brain than 10%; we may well not see AI coding agents approach human reliability levels until we have human-level AGI.
The AI snake oil salesmen/CEOs like to throw out competitive coding or math olympiad benchmarks as if they are somehow indicative of the readiness of AI for other tasks, but reliability matters. Nobody dies or loses millions of dollars if you get a math problem wrong.
I can imagine other tasks on a human/rules-based "frontier" would have a similar quality. But I think there are others that are going to be inaccessible entirely "until AGI" (or something). Humanoid robots moving freely in human society would be an example, I think.
Academia has rediscovered itself
Signal attenuation, a byproduct of entropy, due to generational churn means there's little guarantee.
Occam's Razor: does Karpathy know the future, or is he self-selecting biology trying to avoid manual labor?
His statements have more in common with Nostradamus. It's the toxic positivity form of "the end is nigh". It's "Heaven exists you just have to do this work to get there."
Physics always wins, and statistics is not physics. Gambler's fallacy: improving the statistical odds on paper does not improve the underlying probability. The probability remains the same; this is all promises from people who have no idea of, or interest in, doing anything else with their lives, so they stay the course.
Or perhaps Karpathy has a higher level understanding and can see a bigger picture?
You've said something about heaven. Are you able to understand this statement, for example: "Heaven is a memeplex, it exists." ?
First time hearing it called the "march of nines". Did Tesla coin the term? I thought it was an Amazon thing.
When automated solutions fail in strange, alien ways, it understandably freaks people out. Nobody wants to worry about whether a car will suddenly swerve into oncoming traffic because of a sensor malfunction. Comparing incidents per miles driven might make sense from a utilitarian perspective, but it just isn't good enough for humans to accept replacement tech psychologically, so we do have to chase those 9s until they can handle all the edge cases at least as well as humans.
No magical thinking here. No empty blather about how AI is going to make us obsolete with the details all handwaved away. Karpathy sees that, for now, better humans are the only way forward.
Also, speculation as to why AI coders are "mortally terrified of exceptions": it's the same thing OpenAI recently wrote about, trying to get an answer at all costs to boost some accuracy metric. An exception is a signal of uncertainty indicating that you need to learn more about your problem. But that doesn't get you points. Only a "correct answer" gets you points.
Frontier AI research seems to have yet to operationalize a concept of progress without a final correct answer or victory condition. That's why AI is still so bad at Pokemon. To complete open-ended long-running tasks like Pokemon, you need to be motivated to get interesting things to happen, have some minimal sense of what kind of thing is interesting, and have the ability to adjust your sense of what is interesting as you learn more.
Right now the median actor in the space loudly proclaims AGI is right around the corner, while rolling out pornbots/ads/in-chat-shopping, which generally seems at odds with a real belief that AGI is close (TAM of AGI must be exponentially larger than the former).
OAI is also doing F100 and USG work; it takes longer to book the revenue though.
By selling porn and shopping you are in some sense weakening your position with regulators which you'll need when AGI starts displacing jobs - but you can also imagine thinking that this is a second order problem and winning the race is way more urgent.
No, that would be a warning. An exception is a signal that something failed and it was impossible to continue.
Now I see why Karpathy was talking about RL up-weighting as if it were a destructive, straw-drawn line of a drug in an LLM's training.
>When you’re talking about an agent, or what the labs have in mind and maybe what I have in mind as well, you should think of it almost like an employee or an intern that you would hire to work with you. For example, you work with some employees here. When would you prefer to have an agent like Claude or Codex do that work?
>Currently, of course they can’t. What would it take for them to be able to do that? Why don’t you do it today? The reason you don’t do it today is because they just don’t work. They don’t have enough intelligence, they’re not multimodal enough, they can’t do computer use and all this stuff.
>They don’t do a lot of the things you’ve alluded to earlier. They don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re cognitively lacking and it’s just not working. It will take about a decade to work through all of those issues.
>Overall, the models are not there. I feel like the industry is making too big of a jump and is trying to pretend like this is amazing, and it’s not. It’s slop. They’re not coming to terms with it, and maybe they’re trying to fundraise or something like that. I’m not sure what’s going on, but we’re at this intermediate stage. The models are amazing. They still need a lot of work. For now, autocomplete is my sweet spot. But sometimes, for some types of code, I will go to an LLM agent.
>They kept trying to mess up the style. They’re way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.
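To illustrate the kind of defensiveness being described, here's an invented before/after (my own example, not code from the repo being discussed):

```python
import json
import logging

# The over-defensive style agents tend to produce (invented, illustrative):
def load_config_defensive(path: str) -> dict:
    try:
        with open(path) as f:
            try:
                return json.load(f)
            except json.JSONDecodeError:
                logging.warning("Malformed config, falling back to defaults")
                return {}
    except FileNotFoundError:
        logging.warning("Config missing, falling back to defaults")
        return {}
    except Exception as e:  # blanket catch "just in case"
        logging.error("Unexpected error: %s", e)
        return {}

# What a lean personal repo often wants: bake in the assumption and crash loudly.
def load_config_lean(path: str) -> dict:
    with open(path) as f:  # assumes the file exists and holds valid JSON
        return json.load(f)
```

Neither style is wrong in the abstract; the complaint is that the model keeps choosing the first in a codebase whose whole point is the second.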
Tbh I think people underestimate how much software development work is just writing boilerplate or common patterns though. A very large percentage of the web development work I do is just writing CRUD boilerplate, and agents are great at it. I also find them invaluable for searching through large codebases, and for basic code review, but I see these use-cases discussed less even though they're a big part of what I find useful from agents.
At least now we have a slight chance to prepare for the potential economic and social impacts.
AGI would be more impactful of course, and some use cases aren’t possible until we have it, but that doesn’t diminish the value of current AI.
He's a smart man with well-reasoned arguments, but I think he's also a bit poisoned by working at such a huge org, with all the constraints that comes with. Like, this:
It might take a decade to work through this issue if you just want to put a single LLM in a single computer and have it be a fully-fledged human, sure. And since he works at a company making some of the most advanced LLMs in the world, that perspective makes sense! But of course that's not how it's actually going to be (/already is). LLMs are a necessary part of AGI (/"agents") due to their ability to avoid the Frame Problem[1], but they're far from the only needed thing. We're pretty dang good at "remembering things" with computers already, and connecting that with LLM ensembles isn't going to take anywhere close to 10 years. Arguably, we're already doing it pretty darn well in unified systems[2]... (there's a toy sketch of the idea after the links below).
If anyone's unfamiliar and finds my comment interesting, I highly recommend Minsky's work on the Society of Mind, which handled this topic definitively over 20 years ago. Namely:
A short summary of "Connectionism and Society of Mind" for laypeople at DARPA: https://apps.dtic.mil/sti/tr/pdf/ADA200313.pdf
A description of the book itself, available via Amazon in 48h or via PDF: https://en.wikipedia.org/wiki/Society_of_Mind
By far my favorite paper on the topic of connectionist+symbolist syncreticism, though a tad long: https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa...
[1] https://plato.stanford.edu/entries/frame-problem/
[2] https://github.com/modelcontextprotocol/servers/tree/main/sr...
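Here's the toy sketch I mean. Everything below is hypothetical: `call_llm` is a stand-in for whatever model client you use, and the "memory" is just a JSON file rather than a real backend like the MCP memory server linked above.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # stand-in for a real memory backend

def call_llm(prompt: str) -> str:
    """Hypothetical stub; replace with your model client of choice."""
    raise NotImplementedError

def remember(fact: str) -> None:
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts))

def recall() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def ask(question: str) -> str:
    # Prepend everything stored so far, so the model "remembers" across sessions
    # even though each individual call is stateless.
    context = "\n".join(f"- {fact}" for fact in recall())
    return call_llm(f"Known facts:\n{context}\n\nQuestion: {question}")
```

Crude, obviously, but it illustrates why the "remembering things" half of the problem looks more like systems engineering around the model than a decade of research inside it.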
I find it fascinating that this is the problem people consistently think we're a decade away on.
If you can't do this, you don't have employee-like AI agents, you have AI-enhanced scripting. It's basically the first thing you have to be able to do to credibly replace an actual human employee.
In this case the specific definition matters because the title of the HN submission is "it will take a decade to work through the issues with agents."
He has the ability to explain concepts and thoughts with analogies, generalizations, and interesting sayings that keep you interested in what he is talking about for literally hours, in a subject I don't know that much about. Clearly he is very smart, as is the interviewer, but he is also a fantastic communicator and does not come across as arrogant or pretentious, just helpful and friendly. It's quite a remarkable and amazing skillset. I'm in awe.
(Link: https://www.youtube.com/user/badmephisto)
With all these issues already being hard to manage, I just don't believe businesses are going to delegate processes to autonomous agents in a widespread manner. Literally anything that matters is going to get implemented in a controlled workflow that strips out all the autonomy, with human checkpoints at every step. They may call them agents just to sound cool, but it will be completely controlled.
Software people are all fooled by what is really a special case around software development: outcomes are highly verifiable and mistakes (in development) are almost free. This is just not the case out there in the real world.
Yea, there aren't a ton of problems (that I can see) in my current domain that could be solved by having unattended agents generating something.
I work in healthcare and there are a billion use cases right now, but none that don't require strict supervision. For instance, having an LLM process histories and physicals from potential referrals, looking for patient problems and extracting historical information, is cool, but it's nowhere near reliable enough to do anything but present that info back to the clinician for verification.
Karpathy’s definition of “agent” here is really AGI (probably somewhere between expert and virtuoso AGI https://arxiv.org/html/2311.02462v2). In my taxonomy you can have non-AGI short-task-timeframe agents. Eg in the METR evals, I think it’s meaningful to talk about agent tasks if you set the thing loose for 4-8h human-time tasks.
If anyone can suggest a more accurate and representative title, we can change it again.
Edit: I thought of using "For now, autocomplete is my sweet spot", which has the advantage of being an exact quote; but it's probably not clear enough.
Edit 2: I changed it to "It will take a decade to work through the issues with agents" because that's closer to the transcript.
Anybody have a better idea? Help the cause of accuracy out here!
Andrej Karpathy — “We’re summoning ghosts, not building animals”
>They don't have enough intelligence, they're not multimodal enough, they can't do computer use and all this stuff. They don't do a lot of the things you've alluded to earlier. They don't have continual learning. You can't just tell them something and they'll remember it. They're cognitively lacking and it's just not working.
>It will take about a decade to work through all of those issues. (2:20)
"The scalable method is you learn from experience. You try things, you see what works. No one has to tell you. First of all, you have a goal. Without a goal, there’s no sense of right or wrong or better or worse. Large language models are trying to get by without having a goal or a sense of better or worse. That’s just exactly starting in the wrong place."
and a bunch of similar things implying LLMs have no hope of reaching AGI
Please don't cross into personal attack. It's not what this site is for, and destroys what it is for.
Edit: please don't edit comments to change their meaning once someone has replied. It's unfair to repliers whose comments no longer make sense, and it's unfair to readers who can no longer understand the thread. It's fine, of course, to add to an existing comment in such a case, e.g. by saying "Edit:" or some such and then adding what else you want to say.
Also they discuss the nanochat repo in the interview, which has become more famous for his tweet about him NOT vibe-coding it: https://www.dwarkesh.com/i/176425744/llm-cognitive-deficits
Things are more nuanced than what people have assumed, which seems to be "LLMs cannot handle novel code". The best I can summarize it as is that he was doing rather non-standard things that confused the LLMs, which have been trained on vast amounts of very standard code and hence kept defaulting to those assumptions. Maybe a rough analogy is that he was trying to "code golf" this repo, whereas LLMs kept trying to write "enterprise" code, because that is overwhelmingly what they have been trained on.
I think this is where the chat-oriented / pair-programming or spec-driven model shines. Over multiple conversations (or from the spec), they can understand the context of what you're trying to do and generate what you really want. It seems Karpathy has not tried this approach (given his comments about "autocomplete being his sweet spot").
For instance, I'm working on some straightforward computer vision stuff, but it's complicated by the fact that I'm dealing with small, low-resolution images, which does not seem well-represented in the literature. Without that context, the suggestions any AI gives me are sub-optimal.
However, after mentioning it a few times, ChatGPT now "remembers" this in its context, and any suggestion it gives me during chat is automatically tailored for my use-case, which produces much better results.
Put another way (not an AI expert so I may be using the terms wrong), LLMs will default to mining the data distribution they've been trained on, but with sufficient context, they should be able to adapt their output to what you really want.
(I did listen to a sizable portion of this podcast while making risotto (stir stir stir), and the thought occurred to me: “am I becoming more stupid by listening to these pundits?” More generally, I feel like our internet content (and meta content (and meta meta content)) is getting absolutely too voluminous without the appropriate quality controls. Maybe we need more internet death.)
I don't follow. If we, in some distant future, find a way to make humans functionally immortal, does that magically remove our agency? Or do we not have agency to begin with?
If your position on the "free will" question is that it doesn't exist, then sure I get it. But that seems incompatible with the death prerequisite you have put forward for it, because if it doesn't exist then surely it's a moot point to talk prerequisites anyway.
Consider the distinction between probabilistic and deterministic reasoning. When you are dealing with a probabilistic method (eg, LLMs, most of the human experience) closing the feedback loop is absolutely critical. You don't really get anything if you don't close the feedback loop, particularly as you apply a probabilistic process to a new domain.
For example, imagine that you learn how to recognize something hot by hanging around a fire and getting burned, and you later encounter a kettle on a modern stove-top and have to learn a similar recognition. This time there is no open flame, so you have to adapt your model. This isn't a completely new lesson, the prior experience with the open flame is invoked by the new experience and this time you may react even faster to that sensation of discomfort. All of this is probabilistic; you aren't certain that either a fire or a kettle will burn you, but you use hints and context to take a guess as to what will happen; the element that ties together all of this is the fact of getting burned. Getting burned is the feedback loop closing. Next time you have a better model.
Skillful developers who use LLMs know this: they use tests, or they have a spec sheet they're trying to fulfill. In short, they inject a brief deterministic loop to act as a conclusive agent. For the software developer's case it might be all tests passing, for some abstract project it might be the spec sheet being completely resolved. If the developer doesn't check in and close the loop, then they'll be running the LLM forever. An LLM believes it can keep making the code better and better, because it lacks the agency to understand "good enough." (If the LLM could die, you'd bet it would learn what "good enough" means.)
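A minimal sketch of that pattern, assuming pytest as the deterministic check and a hypothetical `generate_patch` helper standing in for the LLM call:

```python
import subprocess

MAX_ATTEMPTS = 5

def generate_patch(task: str, feedback: str) -> None:
    """Hypothetical stand-in: ask the model for code and write it to disk."""
    raise NotImplementedError

def tests_pass() -> tuple[bool, str]:
    # The deterministic part of the loop: the test suite, not the model, decides.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def solve(task: str) -> bool:
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        generate_patch(task, feedback)   # probabilistic step
        ok, feedback = tests_pass()      # closing the feedback loop
        if ok:
            return True                  # "good enough" is defined by the tests
    return False
```

Without that deterministic check the loop has no natural stopping point, which is exactly the "good enough" problem described above.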
Where does dying come in? Nature evolved numerous mechanisms to proliferate patterns, and while everyone pays attention to the productive ones (eg, birth) few pay attention to the destructive (eg, death). But the destructive ones are just as important as the productive ones, for they determine the direction of evolution. In terms of velocity you can think of productive mechanisms as speed and destructive mechanisms as direction. (Or in terms of force you can think of productive mechanisms as supplying the energy and destructive mechanisms supplying the direction.) Many instances are birthed, and those that survive go on and participate in the next round. Dying is the closed feedback loop, shutting off possibilities and defining the bounds of the project.
Every AI lab brags how "more agentic" their latest model is compared to the previous one and the competition, and everybody switches to the new model.