I'm 100% an AI skeptic. I don't think we're near AGI, at least not by any reasonable definition of it. I don't think we're remotely near a superintelligence or the singularity or whatever. But that doesn't matter. These tools will change how work is done. What matters is what he says here:
> Peeps, let’s do some really simple back-of-envelope math. Trust me, it won’t be difficult math.
> You get the LLM to draft some code for you that’s 80% complete/correct.
> You tweak the last 20% by hand.
> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.
The hard part is the engineering, yes, and now we can actually focus on it. 90% of the work that software engineers do isn't novel. People aren't sitting around coding complicated algorithms that require PhDs. Most of the work is doing pretty boring stuff. And even if LLMs can only write 50% of that, you're still going to get a massive boost in productivity.
Of course this isn't always going to be great. Debugging this code is probably going to suck, and even with the best engineering it'll likely accelerate big balls of mud. But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
> But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
Is that exciting though? A lot of the code we've currently got is garbage. We are already slogging through a morass of code. I could be wrong, but I don't think LLMs are going to change this trend.
It is a GREAT time to slowly get into the consulting side of our industry. There will be so much mess that I think there will be troves of opportunity to come in on a contract and fix messes. There always was, but there will be a ton more!
> A lot of the code we've currently got is garbage.
That is why I have never understood the harsh criticism of the quality of LLM-generated code. What do people think much of these models' training data was? Garbage in -> Garbage out.
I think it's exciting that more people will be able to build more things. I agree though that most code is garbage, and that it's likely going to get worse. The future can be both concerning and exciting.
Personal anecdote: I pushed some LLM generated code into Prod during a call with the client today. It was simple stuff, just a string manipulation function, about 10 LOC. Absolutely could have done it by hand but it was really cool to not have to think about the details - I just gave some inputs and desired outputs and had the function in hand and tested it. Instead of saying "I'll reply when the code is done" I was able to say "it's done, go ahead and continue" and that was a cool moment for me.
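For illustration only, since the comment doesn't show the actual function: a hypothetical ~10-line string helper of the kind described, specified purely by a few input/output pairs.

```python
# Hypothetical stand-in for the string helper in the anecdote above;
# the real function isn't shown in the comment.
def mask_account_number(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with '*', ignoring separators."""
    digits = value.replace(" ", "").replace("-", "")
    if len(digits) <= visible:
        return digits
    return "*" * (len(digits) - visible) + digits[-visible:]

# The "inputs and desired outputs" that would have gone into the prompt.
assert mask_account_number("1234-5678-9012-3456") == "************3456"
assert mask_account_number("42") == "42"
```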
That's the perfect application for LLM code! It is a well-defined problem, with known inputs and outputs, and probably many examples the LLM slurped up from open source code.
And that isn't a bad thing! Now we engineers need to focus on the high-level stuff, like what code to write in the first place.
> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.
To state the obvious -- only if all units of work are of equal difficulty.
The original calculation is not even right, because writing code probably isn't even 100% of the work software engineers do. A lot of the time they're debugging code or attending some kind of meeting. So if only 20% of their time is coding and the LLM does 4/5ths of that, they save what, 16% of their time, which works out to roughly 19% more productive overall?
Now factor in the expense of using LLMs and it's not worth it; the math isn't mathing. You can't even squeeze out a 2x improvement, let alone the 5-10x that has become expected in the tech industry.
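A quick sketch of that arithmetic (the numbers are the comment's, the helper is mine): apply the speedup only to the coding fraction of the job, Amdahl's-law style.

```python
# Back-of-envelope check of the comment's math: only the coding fraction of the
# job gets faster, so the overall gain is bounded by how much time coding takes.
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    remaining_time = (1 - coding_share) + coding_share / coding_speedup
    return 1 / remaining_time

print(overall_speedup(1.0, 5))  # 5.0   -- the article's implicit assumption
print(overall_speedup(0.2, 5))  # ~1.19 -- 16% of total time saved, ~19% more productive
```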
Also absent in these discussions is who maintains the 80% that wasn't written by them when it needs to either be fixed or adapted to a new use case. This is where I typically see "fast" AI prototypes fall over completely. Will that get better? I'm ambivalent, but I could concede 'probably'. At the end of the day, though, I don't think you can get away from the fact that for every non-trivial programming task there needs to be a human somewhere to verify/validate the work, or at the very least understand it enough to debug it. This seems like an inescapable fact.
My concern is that it's always harder to comprehend and debug code you didn't write, so there are hidden costs to using a copy-pasted solution even if you "get to the testing phase faster." Shaving percentages in one place could add different ones elsewhere.
Imagine if your manager announced a wonderful new productivity process: From now on a herd of excited interns will get the first stab at feature requests, and allll that the senior developers need to do is take that output and juuuust spruce it up a bit.
Then you have plenty of time for those TPS reports. (Now, later in my career, I have a lot more sympathy for Tom Smykowski, speaking with customers so the engineers don't have to.)
But wait, just hear me out for a second! What if we've also sufficiently lowered the definition of "GI" at the same time...
> Of course this isn't always going to be great. Debugging this code is probably going to suck, and even with the best engineering it'll likely accelerate big balls of mud. But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
This is quite literally what junior engineers are.
I've worked at many places filled to the brim with junior engineers. They weren't enjoyable places to work. Those businesses did not succeed. They sure did produce a lot of code though.
I'd also be very suspicious of this Pareto distribution; to me it implies that the 20% of work we're talking about is the hard part. Spitting out code without understanding it definitely sounds like the easy part, the part that doesn't require much if any thinking. I'd be much more interested to see a breakdown of TIME, not volume of code; how much time does using an LLM save (or not) in the context of a given task?
The thing with junior engineers is that they may eventually become proper seniors and you'll play your role in it by working with them. Nobody learns anything from working with LLMs.
You can't become a senior and skip being a junior.
This isn't about the promise of AI to me; it's more about this particular type of AI technology, which should more accurately be called LLMs, since there are other kinds.
Skepticism aside, looking at what's already been possible with similar tools: much like Microsoft Word introduces more and more grammar-related features, LLMs are reasonably decent for the use case of feeding them your input and speeding up iterations with you in the driver's seat. Whether it's text in a word processor or code in an editor, there are likely meaningful improvements here, enough not to dismiss the technology.
The inverse of that, where someone wants a magic seed prompt to do everything, could prove to be a bit more challenging: existing models are limited to what they were trained on, and the steps to move beyond that, where they actually learn, appear to be a few years away.
> But it's going to enable a lot of code to get written by a lot more people and it's super exciting.
I think it will cut both ways. I guess it depends on the type of work. I feel like expectations will fill to meet the space of the increased productivity. At least, I can see this being true in a professional sense.
If I could be five times more productive, i.e., work 80% less, then I would be quite excited. However, I imagine most of us will still be working just as much -- if not more.
Maybe I am just jaded, but I can't help but be reminded of the old saying, "The reward for digging the biggest hole is a bigger shovel."
For personal endeavors, I am nothing but excited for this technology.
Yes, but there will be far fewer of us working.
Which, in a healthy society, would actually be a good thing.
> You get the LLM to draft some code for you that’s 80% complete/correct.
> You tweak the last 20% by hand.
> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive
Except, it’s probably four times as hard to verify the 80% right code as it is to just write the code yourself in the first place.
Using AI is like trying to build a product with only extremely junior devs. You can get something working fairly quickly but sooner rather than later it's going to collapse in on itself and there won't be anyone around capable of fixing it. Moving past that requires someone with more experience to figure out what they did and eventually fix it, which ends up being far more expensive than correcting things along the way.
The error rate needs to come down significantly from where it currently is, and we'd also need AIs that cover each other's weaknesses - at the moment you can't just run another AI to fix the 20% of errors the first one generated, because they tend to have overlapping blind spots.
Having said all that, I think AI is hugely useful as a programming tool. Here are the places where I've gotten the most value:
- Better IDE auto-complete.
- Repetitive refactors that are beyond what's possible to do with a regex. For example, replacing one function with another whose usage is subtly different. LLMs can crack out those changes across a codebase and it's easy to review.
- Summarizing and extracting intent from dense technical content. I'm often implementing some part of a web standard, and being able to quickly ask an AI questions, and have it give references back to specific parts of the standard is super useful.
- Bootstrapping tests. The hardest part of writing tests in my experience is writing the first one - there's always a bit of boilerplate that's annoying to write. The AI is pretty good at generating a smoke test (a minimal sketch follows below) and then you can use that as a basis to do proper test coverage.
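As an illustration of that last bullet (made up, not from the thread): the kind of smoke test an LLM is good at bootstrapping, which you would then grow into proper coverage.

```python
# Illustrative only: a bootstrapped smoke test of the kind described above.
# `slugify` is a made-up function under test, not something from the thread.
import re

def slugify(title: str) -> str:
    """Lowercase a title and collapse runs of non-alphanumerics into dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_slugify_smoke():
    # The boring first test that's annoying to write by hand.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  already sluggish  ") == "already-sluggish"
    assert slugify("") == ""
```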
I'm sure there are more. The point is that it's currently incredibly valuable as a tool, but it's not yet capable of being an agent.
The problem is that now, when your colleagues/contributors push out shitty code that you have to fix... you don't know if they even tried or just put in bad code from an LLM.
Yep, this is a hell I've come to know over the past year. You get people trying to build things they really have no business building, not without at least doing their own research and learning first, but the LLM in their editor lets them spit out something that looks convincing enough to get shipped, unless someone with domain experience looks at it more closely and realizes it's all wrong. It's not the same as bodged-together code from Stack Overflow because the LLM code is better at getting past the sniff test, and magical thinking around generative AI leads people to put undue trust in it. I don't think we've collectively appreciated the negative impact this is going to have on software quality going forward.
> you don't know if they even tried or just put in bad code from an LLM
You don't need to. Focus on the work, not the person.
I recall a coworker screaming at a junior developer during a code review, and I set up a meeting later on to discuss his reaction. Why was he so upset? Because the junior developer was being lazy and not trying.
How did he know this?
He didn't. He used very dubious behavioral signals.
This was pre-LLM. As applicable now as it was then. If their code is shitty, let them know and leave it at that.
On the flip side, I often review code that I think has been written by an LLM. When the code is good, I don't really care if he wrote it or asked GPT to write it. As long as he's functioning well enough for the current job.
I've deployed code with Claude that was >100X faster, all-in (verification, testing, etc) than doing it myself. (I've been writing code professionally and successfully for >30 years).
>100X isn't typical, I'd guess typical in my case is >2X, but it's happened at large scale and saved me months of work in this single instance. (Years of work overall)
If you write code 100x faster, you could probably automate almost all of it away, as it seems to be super trivial, already-solved problems.
I also use Claude for my coding a lot, but I'm not sure about the real gains, or whether it even gives me a noticeable speed improvement, since I'm in a loop of "waiting" and then fixing bugs that the LLM made. Still, it is super useful for writing big docstrings and smaller tests. Maybe if I focus on some basic tasks like classical backend or frontend work it'll be more useful.
It's fascinating watching people in ostensibly leadership positions in our industry continue to fundamentally misunderstand how productivity works in knowledge work.
Totally. It used to be that LoC was widely accepted as a terrible metric to measure dev productivity by, but when talking about AI code-gen tools it's like it's all that matters.
This is pretty much it. I don't trust code produced by AI, tbh, because it almost always has logical errors, or it ignored part of the prompt so it's subtly wrong. I really like AI for prototyping and sketching, but then I usually take that prototype, maybe write some solid tests, and then write the real implementation.
If the code these LLMs produce has logical errors and bugs, I have formed the opinion that this is mostly user error. LLMs produce great code if they're used properly. By that I mean: strictly control the maximum level of complexity that you're asking the LLMs to solve.
GPT-4o can only handle rudimentary coding tasks. So I only ask it rudimentary questions. And it's rarely wrong or buggy. o1-2024-12-17 is much better, it can handle "mid" complexity tasks, and it's almost never wrong on those, but a step above and it will output buggy code.
This is subjective and not perfect, but when you use the models more you can get a sense for where it starts falling apart and stay below that threshold. This means chunking your problem into multiple prompts, where each chunk is nearing the maximum level of complexity that the model can handle.
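A rough sketch of that chunking idea; the comment doesn't show code, so the helper and prompts below are illustrative, and it assumes the OpenAI Python SDK with an API key in the environment.

```python
# Sketch of keeping each request below the model's complexity threshold and
# stitching the results together yourself, rather than one sprawling prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Two small, rudimentary asks instead of "build the whole feature":
dataclass_code = ask("gpt-4o", "Write a Python dataclass Invoice with id, total and due_date fields.")
parser_code = ask("gpt-4o", f"Given this dataclass, write a function that parses it from a CSV row:\n{dataclass_code}")
```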
I've built a pet project with LLMs recently: https://github.com/golergka/hn-blog-recommender. It's rough around the edges, with bugs, incomplete functionality and docs, but I've built it around 4 times as fast as I would have done it myself.
Basically 95% of the code is written by Aider. I still had to do QA, make high-level decisions, review code it wrote, assign tasks and tell it when to refactor stuff.
I would never use this workflow to write a program where absolute correctness is top priority, like a radiation therapy controller. But from this point on, I'll never waste time building pet projects, small web app and utilities myself.
Agreed. To add to this: The article is using a rather poor comparison because it makes the assumption that all sections of a program take an equal amount of time to write.
An LLM might be able to hammer out the unit tests, boilerplate react logic, and the express backend in 5 minutes, but if it collapses on the "20%" that you need actual assistance with, then the productivity boost is significantly less than suggested.
Yeah, this is almost literally the 80-20 rule[1]; often getting the first 80% of something done takes 20% of the work, and getting the last 20% takes 80% of the work. Debugging the last 20% is already the hard part for most things, and as you mention, it's going to be a lot harder debugging LLM-generated code.
[1]: https://en.wikipedia.org/wiki/Pareto_principle
I don't disagree, but here's another spin on what you observed.
Kernighan's law says you have to be twice as intelligent/skilled when debugging the code as you were when writing it.
An LLM-centric corollary would be: don't have the LLM write code at the boundary of its capabilities in your domain.
It's a little hard these days to know where that line lies, given the rate of improvements, but perhaps that's an intuition people are still developing with LLMs.
Verifying code is much more mental effort than writing it because you have to reconstruct the data structure and the logic of the code in order to show that the code you’re reading is correct.
You also absorbed no knowledge of the problem domain. I'm pretty positive on LLMs but with the caveat that the programmer still should be an active participant in the development of the solution.
> Except, it’s probably four times as hard to verify the 80% right code as it is to just write the code yourself in the first place.
Why would that be true in general? You think it just so happens that a new tool gives an X speedup in ability but costs an exactly equivalent X slowdown in other places?
No, the claim is just that it’s harder to read and verify existing code than to write new code. Depending on the level of certainty you want, verifying code is much more expensive than writing code, which is why there are so many bugs and why PR review is, imo, practically worthless.
Switching the job of programming from designing and writing code to reading and verifying code isn't a good trade-off in terms of making programming easier. And my experience so far with AI tools has done nothing to change my mind on this point.
The thing that everyone seems to miss here is it's not surprising which code falls into the 80% or 20%. I can guess if an AI will write something bug-free with almost 100% accuracy. For the remaining 20%, whatever, it gives me a fine enough starting point.
To debug code that doesn't work, you generally have to fully understand the goal and the code.
To write code that does work, you generally have to fully understand the goal and what code to use.
So unless you just flat-out don't know what code to use (ie. libraries, architecture, approach etc.) it's going to be no faster and probably less satisfying to fix LLM code as it would be to just do it yourself.
Where LLMs do shine in coding (in my limited experience) though is discovery. If you need to write something in a mostly unfamiliar language and ecosystem they can be great for getting off the ground. I'm still not convinced that the mental gear shifting between 'student asking questions' and 'researcher/analyst/coder mode' is worth the slightly more targeted responses vs. just looking up tutorials, but it's a more useful scenario.
I know how popular Steve Yegge is, and there is a bunch of stuff he's written in the past that I've enjoyed, but like, what is this? I know everyone says the Internet has ruined our attention spans, but I think it's fair to let the reader know what the point of your article is fairly early on.
As far as I can tell, this is a really unnecessarily long-winded article that is basically just "make sure you have good prompt context." And yes, when you're doing RAG, you need to make sure you pull the right data for your context.
It's an ad. The key point he works up to is "Put another way, you need a sidecar database. The data moat needs to be fast and queryable. This is a Search Problem!" And then of course this is hosted on the corporate blog of a code search platform. It's "make sure you have good prompt context" but trying to define that specifically as the thing they're selling.
Is this just a spam blog post for Sourcegraph?
This article really rubs me the wrong way. A company selling an AI product saying the following about AI sceptics:
> All you crazy MFs are completely overlooking the fact that software engineering exists as a discipline because you cannot EVER under any circumstances TRUST CODE.
is straight up insulting to me, because it effectively comes down to "use my product or you're a looney".
Also, while two years after this post (which should be labeled 2023) I've still barely tried to entirely offload coding to an LLM, the few times I did try have been pretty crap. I also really, really don't want to 'chat' with my codebase or editor. 'Chatting' to me feels about as slow as writing something myself, while I also don't get a good mental model 'for free'.
I am a moderately happy user of AI autocomplete (specifically Supermaven), but I only ever accept suggestions that are trivially correct to me. If it's not trivial, it might be useful as a guide of where to look in actual documentation if relevant, but just accepting it will lead me down a wrong path more often than not.
> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.
Cool, so I'm going to make five times more money? What's that you say, I'm not? So who is going to be harvesting the profits, then?
You will get paid 5x as much… assuming you are the only developer that uses this and the market for engineers is rational. However those other developers are going to be 5x as productive too so your rate should stay the same.
The company that pays you will get more code for less money and so they would make more profit if they are the only company taking advantage of this. But all companies will start to produce software at a lower cost so their profits per line of code will be squeezed to where they are now.
You will spend a lot less time writing boilerplate and searching docs or Stack Overflow. That frees you up to work on real problems, and since you'll be able to create products with less cost per line of code, you could potentially compete with your own offerings.
That's my take anyway.
It's a pretty good time to be a contractor, as long as you negotiate getting paid per project/milestone and not by the hour. What used to take three days now takes one, and you can use the extra time to book more clients.
Have any of these LLM coding assistant enthusiasts actually used them? They are ok at autocomplete and writing types. They cannot yet write "80% of your code" on anything shippable. They regularly miss small but crucial details. The more 'enthusiastic' ones, like Windsurf, subtly refactor code - removing or changing functionality which will have you scratching your head. Programming is a process of building mental models. LLMs are trying to replace something that is fundamental to engineering. Maybe they'll improve, but writing the code was never the hard part. I think the more likely outcome of LLMs is the broad lowering of standards than 5x productivity for all.
I'm trying to use Cursor for full-time building of applications for my personal needs ( years of experience in the field). It can be rough, but you see the glimpses of the future. In the early weeks, I am constantly having to think of new ways to minimize surprise: prompt-injection via dotfile (a collection of Dos and Don'ts), forcing the LLM to write itself jsdoc @instructions in problematic files, coming up with project overviews that state the goals and responsibilities of different parts. It's all very flaky, but there's a promise. I think it is a good time to try mastering the LLM as a helper, and to figure out how to play your best game using this ultra sharp tool.
My 2c is that an engineer will be needed in the loop to build anything meaningful. It's really hard to shape the play-dough of the codebase while having to give up so much control. I see other people who can't validate the LLM BS get themselves into a hole really quickly when using multi-file editing capabilities like the Composer feature.
One thing that kinda blew me away, even though it's embarrassingly predictable, is the "parallel edit" workflow, where the LLM can edit multiple files at the same time using a shared strategy. It almost never works currently for anything non-trivial, so the agent almost never uses it unless you ask it to (run the TypeScript check and use parallel edit to fix the bugs). It's also dumb, using the same auto-completion model that doesn't have access to Composer context, it feels like. But it shows how we can absorb the tax of waiting for the LLM to respond: by doing multiple things at once across many files.
And tests. Lots of tests. There's no way around it: the LLM can break things really subtly, and without tests on a big project that's impossible to fix. LLMs don't have the same fear of old files from previous ages; they'll happily start hacking on what's not broken. Cursor has a capable "code review" system with checkpoints and quick rollbacks. At this point in time we have to review most of what the LLM is spewing out. On the upside, the LLM can write decent tests, extract failing cases, etc. There's no excuse not to use tests. However, extra care needs to be taken with LLM-generated tests, to review whether their asserts are legitimate.
I tried to write an engine for a popular table game without personally understanding the rules, and I had a bunch of reference material. It was difficult when I couldn't validate the assertions. I think I failed the experiment and had to learn the game rules by debugging failing tests one by one. In many cases the culprit was a fixture generated by the LLM. We can use combined workflows where smarter models like o1-pro validate the fixtures; that typically works for well-known concepts and games.
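To illustrate the point above about reviewing the legitimacy of asserts (a made-up example, not from the thread): an LLM-drafted test can pass while quietly encoding the very bug it should catch.

```python
# Made-up example of why LLM-written asserts need review: the test below passes
# against the buggy implementation because the expected value was derived from
# the same wrong assumption, so it documents the bug instead of catching it.
def apply_discount(price: float, percent: float) -> float:
    return price - percent          # bug: subtracts a flat amount, not a percentage

def test_apply_discount():
    assert apply_discount(200.0, 10.0) == 190.0   # "looks right", but 10% off 200 is 180
```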
I use Cursor as well, totally understand what you mean about glimpses of the future. The composer and tab complete are good at writing isolated units of functionality - ie pure functions, especially when they are trivial and have been written before. I appreciate that it saves me a google search or time spent on rote work.
I do wonder how much more sophisticated LLMs would have to be to really excel at multi-file or cross-cutting whole-codebase editing. I have no way of really quantifying this, but I suspect it's a lot more than people think, and even then it might require a substantially different approach to making software. Kind of reminds me of the people trying to make full-length movies with video AI - cool trick, but who wants to sit through 1.5 hrs of that? The gulf between those videos and an actual feature-length film is _massive_, much like the difference between a prod codebase and a todo CRUD app. And it's not just about the level of detail, it's about coherence between small but important things and the overall structure.
I too use Cursor as my primary IDE for a variety of projects. This will sound cliche in the midst of the current AI hype, but the most critical component of getting good results with Cursor is properly managing the context and conversation length.
With large projects, I've found that a good baseline is about 300-500 lines of output, and around a dozen 500-line files as input. Inside those limits, Cursor seems to do pretty well, but if you overstuff the context window (even within the supposedly supported limits) Claude Sonnet 3.5 tends to forget important attributes of the input.
Another critical component is to keep conversations short. Don't reuse the same conversation for more than one task, and try to engineer prompts to minimize the amount/size of edits.
I definitely agree that Cursor isn't a silver bullet, but it's one of the best tools I've found for bootstrapping large projects as a solo dev.
That was a lot of words to describe... I think just "LLMs use context"? And the intro made me think he was mad about something, but I think he's actually excited? It's very confusing.
It's a Steve Yegge article. His writing is enjoyable, but it's a miracle he wrote an article short enough not to skim. He's excitable and verbose.
I will admit, I skimmed. No jury would convict.
Although he's talking up his product, it's an interesting premise. The idea is that cooking relevant context and parts of a codebase down into an LLM's context window is a useful product for coding assistants.
I think one important aspect of this is that many -- perhaps the bulk, now -- of people writing or talking on the Internet don't have much memory of a time when digital technology was underestimated -- or at least, did not get to their current positions by recognising when something was underestimated, and going with their own, overestimated beliefs. They probably got into technology because it was self-evidently a route to success, they were advised to do so, or they were as excited as anyone else by its overt promise. Their experience is dominated by becoming disillusioned with what they were sold on.
A different slice of time, and a different set of opportunities -- as Yegge says, the recollection that if you hadn't poo-pooed some prototype, you'd have $130 million based on an undeniably successful model -- gives you a different instinct.
I really do hear the -- utterly valid -- criticisms of the early Web and Internet when I hear people's skepticism of AI, including that it is overhyped. Overhyped or not, betting on the Internet in 1994 would have gotten you far, far, more of an insight (and an income) than observing all its early flaws and ignoring or decrying it.
This is somewhat orthogonal to whether those criticisms are correct!
To be fair to the skeptics, for every Amazon or Google, there were 10 other companies that were actually bad ideas or didn't survive the dot-com crash. It's also hard to say how much of the take here is apocryphal in terms of AWS being a plucky demo. It seems inevitable that someone would realize that with good enough bandwidth infrastructure, concentrating utility computing in data centers and increasing utilization of those resources through multi-tenancy is just a good idea that benefits both the capital owners and the renters. Research into distributed and shared computing had been around since the '80s; Amazon was just the first to realize the business potential at a time when hardware virtualization actually supported what they were trying to do.
> This is somewhat orthogonal to whether those criticisms are correct!
I think this is the most important take-away in evaluating the success of many technical businesses. At the end of the day, businesses succeed or fail due to lots of factors that have nothing to do with the soundness of the engineering or R&D that is at the core of the value proposition. This is an extremely frustrating fact for technically minded folks (I count myself among them).
Reminds me of the quote about how stock markets are like betting on the outcome of a beauty contest. The humans who control large purchasing decisions are likely much less technical than the average audience of this blog. The fact that this is frustrating doesn't make it any less true or important to internalize.