Software development jobs must be very diverse if even this anti-vibe-coding guy thinks AI coding definitely makes developers more productive.
In my work, the bigger bottleneck to productivity is that very few people can correctly articulate requirements. I work in backend API development, which is a completely different job from fullstack development that happens to include some backend work. If you ask PMs about backend requirements, they will dodge you, and if you ask front-end or web developers, they are waiting for you to provide them the API. The hardest part is understanding the requirements. It's not because of illiteracy. It's because software development is a lot more than coding and requires critical thinking to discover the requirements.
> Software development jobs must be very diverse if even this anti-vibe-coding guy thinks AI coding definitely makes developers more productive.
As a Professor of English who teaches programming to humanities students, the writer has had an extremely interesting and unusual academic career [1]. He sounds awesome, but I think it's fair to suggest he may not have much experience of large scale commercial software development or be particularly well placed to predict what will or will not work in that environment. (Not that he necessarily claims to, but it's implicit in strong predictions about what the "future of programming" will be.)
[1] https://stephenramsay.net/about/
Hard to say, but backing his claim that he's been programming since the '90s, his CV shows he was working on stuff clearly beyond basic undergraduate skill level since the early 2000s. I'd be willing to bet he has more years under his belt than most HN users. I mean, I'm considered old here, in my mid-30s, and this guy has been programming for most of my life. Though that doesn't explicitly imply experience, or more specifically, experience in what.
That said, I think people really underappreciate how diverse programmers actually are. I started in physics and came over when I went to grad school. While I wouldn't expect a physicist to do super well on leetcode problems, I've seen those same people write incredible code that's optimized for HPC systems, and they're really good at tracing bottlenecks (a skill that translates from physics really, really well). Hell, the best programmer I've ever met got that way because he was doing his PhD in mechanical engineering. He's practically the leading expert in data streaming for HPC systems and gained this skill because he needed more performance for his other work.
There's a lot of different types of programmers out there but I think it's too easy to think the field is narrow.
That was such a strange aspect. If you will excuse my use of the tortured analogy of comparing programming to woodworking: there is a lot of talk about hand tools versus power tools, but for people who aren't in a production capacity--not making cabinets for a living, not making furniture for a living--you see people choosing to exclusively use hand tools because they just enjoy it more. There isn't pressure that "you must use power tools or else you're in self-denial about their superiority." Well, at least for people who actually practice the hobby. You'll find plenty of armchair woodworkers in the comments section on YouTube. But I digress. For someone who claims to enjoy programming for the sake of programming, it was a very strange statement to make about coding.
I very much enjoy the act of programming, but I'm also a professional software developer. Incidentally, I've almost always worked in fields where subtly wrong answers could get someone hurt or killed. I just can't imagine either giving up my joy in the former case or abdicating my responsibility to understand my code in the latter.
And this is why the woodworking analogy falls down. The scale at which damage can occur due to the decision to use power tools over hand tools is, for most practical purposes, limited to just myself. With computers, we can share our fuck ups with the whole world.
That is the strangest thing I've heard today.
Exactly. I don't think people understand anymore why programming languages even came about. Lots of people don't understand why a natural language is not suitable for programming, and by extension for prompting an LLM.
I have done strict back-end, strict front-end, full stack, QA automation, and some devops as well. I worked in an all-Linux shop where we were encouraged by great senior devs to always strive for better software all around. I think you're right, it mostly depends on your mindset and how much you expose yourself to the craft. I can sometimes tackle obscure front-end things better than back-end issues, despite hating front-end but knowing enough to be dangerous. (My first job in tech really had me doing everything imaginable.)
I find the LLMs boost my productivity because I've always had a sort of architectural mindset. I love looking up projects that solve specific problems and keeping them in the back of my mind. Turns out I was building myself up for instructing LLMs on how to build software for me: they take what would be several months' worth of effort and spit it out in a few hours.
Speaking of vibe coding in archaic languages: I'm using LLMs to understand old Shockwave Lingo so I can rebuild a legacy game in a modern language. Maybe once I spin up my blog again I'll start documenting that fun journey.
Well, I think we can say C is archaic when most developers write in something that for one isn't C, two isn't a language itself written in C, or three isn't running on something written in C :)
Hehe. In the "someone should make a website"™ department: using a crap ton of legacy protocols and plugins, semi-interoperable with modern browsers, while offering legacy browsers loaded with legacy plugins something usable to test with, i.e.,
- SSL 2.0-TLS 1.1, HTTP/0.9-HTTP/1.1, ftp, WAIS, gopher, finger, telnet, rwho, TinyFugue MUD, UUCP email, SHOUTcast streaming some public domain radio whatever
- <blink>, <marquee>, <object>, XHTML, SGML
- Java <applet>, Java Web Start
- MSJVM/J++, ActiveX, Silverlight
- Flash, Shockwave (of course), Adobe Air
- (Cosmo) VRML
- Joke ActiveX control or toolbar that turns a Win 9x/NT-XP box into a "real" ProgressBar95. ;)
(Gov't mandated PSA: Run vintage {good,bad}ness with care.)
lol
The thing is that some imagined AI that can reliably produce reliable software will also likely be smart enough to come up with the requirements on its own. If vibe coding is that capable, then even vibe coding itself is redundant. In other words, vibe coding cannot possibly be "the future", because the moment vibe coding can do all that, vibe coding doesn't need to exist.
The converse is that if vibe coding is the future, that means we assume there are things the AI cannot do well (such as come up with requirements), at which point it's also likely it cannot actually vibe code that well.
The general problem is that once we start talking about imagined AI capabilities, both the capabilities and the constraints become arbitrary. If we imagine an AI that does X but not Y, we could just as easily imagine an AI that does both X and Y.
This is the most coherent comment in this thread. People who believe in vibe coding but not in generalizing it to “engineering”... brother the LLMs speak English. They can even hold conversations with your uncle.
Following similar thinking, there's no world in which AI becomes exactly capable of replacing all software developers and then stops there, miraculously saving the jobs of everyone else next to and above them in the corporate hierarchy. There may be a human, C-suite driven cost-cutting effort to pause progress there for some brief time, but if AI can do all dev work, there's no reason it can't do all office work to replace every human in front of a keyboard. Either we're all similarly affected, or else AI still isn't good enough, in which case fleets of programmers are still needed, and among those, the presumed "helpfulness" of AI will vary wildly. Not unlike what we see already.
My bet is that it will be good enough to devise the requirements.
They already can brainstorm new features and make roadmaps. If you give them more context about the business strategy/goals then they will make better guesses. If you give them more details about the user personas / feedback / etc they will prioritize better.
We're still just working our way up the ladder of systematizing that context, building better abstractions, workflows, etc.
If you were to start a new company with an AI assistant and feed it every piece of information (which it structures / summarizes / synthesizes etc. in a systematic way), even with finite context it's going to be damn good. I mean, just imagine a system that can continuously read and structure all the data from regular news, market reports, competitor press releases, public user forums, sales call transcripts, etc. It's the dream of "big data".
I agree with the first part, which is basically that 'being able to do a software engineer's full job' is ASI/AGI-complete.
But I think it is certainly possible that we reach a point/plateau where everything is just 'english -> code' compilation but that 'vibe coding' compilation step is really really good.
What do you mean "come up with the requirements"? Like if self-driving cars got so good that they didn't just drive you somewhere but decided where you should go?
Yup. I would never be able to give my Jira tickets to an LLM because they're too damn vague or incomplete. Getting the requirements first needs 4 rounds of lobbying with all stakeholders.
We had a client who'd create incredibly detailed Jira tickets. Their lead developer (also their only developer) would write exactly how he'd want us to implement a given feature, and what the expected output would be.
The guy is also a complete tool. I'd point out that what he described wasn't actually what they needed, and that the functionality was ... strange and didn't actually do anything useful. We'd be told to just do as we were told, seeing as they were the ones paying the bills. Sometimes we'd read between the lines and just deliver what was actually needed; then we'd be told to just do as we were told next time, and they'd use the code we wrote anyway. At some point we got tired of the complaining and just did exactly as the tasks described, complete with tests that showed that everything worked as specified. Then we were told that our deliveries didn't work, because that wasn't what they'd asked for, but they couldn't tell us where we had misunderstood the Jira task. Plus the tests showed that the code functioned as specified.
Even if the Jira tasks are in a state where it seems like you could feed them directly to an LLM, there's no context (or there's incorrect context), and how is a chatbot to know that the author of the task is a moron?
It can make a vague ticket precise, and that can be an easy platform for having discussions with stakeholders.
Agentic AI can now do 20 rounds of lobbying with all stakeholders, as long as it's over something like Slack.
I write a library which is used by customers to implement integrations with our platform. The #1 thing I think about is not
> How do I express this code in Typescript?
it's
> What is the best way to express this idea in a way that won't confuse or anger our users? Where in the library should I put this new idea? Upstream of X? Downstream of Y? How do I make it flexible so they can choose how to integrate this? Or maybe I don't want to make it flexible - maybe I want to force them to use this new format?
> Plus making sure that whatever changes I make are non-breaking, which means that if I update some function with new parameters, they need to be made optional, so now I need to remember, downstream, that this particular argument may or may not be `undefined` because I don't want to break implementations from customers who just upgraded the most recent minor or patch version
The majority of the problems I solve are philosophical, not linguistic
> very few people can correctly articulate requirements
The observation from Lean is that the faster you can build a prototype, the faster you can validate the real/unspoken/unclear requirements.
This applies for backends too. A lot of the “enterprise-y” patterns like BFFs (backends-for-frontends), hexagonal architecture, and so on, will make it really easy to compose new APIs from your building blocks. We don’t do this now because it’s too expensive to write all the boilerplate involved. But one BFF microservice per customer would be totally feasible for a sales engineer to vibe code, in the right architecture.
> the bigger bottleneck to productivity is that very few people can correctly articulate requirements.
One could argue that "vibe coding" forces you (eventually) to think in terms of requirements. There's a range of approaches, from "nitpick over every line written by AI" to "yolo this entire thing", but one thing they have in common is they all accelerate failure if the specs are not there. You very quickly find out you don't know where you're going.
I see this in my work as well, the biggest bottleneck is squeezing coherent, well-defined requirements out of PMs. It's easy to get a vision board, endless stacks of slides about priorities and direction, even great big nests of AWS / Azure thingnames masquerading as architecture diagrams. But actual "this is the functionality we want to implement and here are the key characteristics of it" detail? Absolutely scarce.
To be honest I've never worked in an environment that seemed too complex. On my side my primary blocker is writing code. I have an unending list of features, protocols, experiments, etc. to implement, and so far the main limit was the time necessary to actually write the damn code.
That sounds like papier-mâché more than bridge building, forever pasting more code on as ideas and time permit without the foresight to engineer or architect towards some cohesive long-term vision.
Most software products built that way seem to move fast at first but become monstrous abominations over time. If those are the only places you keep finding yourself in, be careful!
I don’t want to imply this is your case, because of course I’ve no idea how you work. But way too often, I’ve seen that the reason for so many separate features is:
A) as stated by the parent comment, the ones doing requirements management are doing a poor job of abstracting the requirements, and what could be done as one feature suddenly turns into 25;
B) in a similar manner to A, all solutions imply writing more and more code, and nobody ever refactors and abstracts parts away.
Is there anyone doing dev work that operates in an environment where people can clearly articulate what they want? I've never worked in a place like that in 20 years doing software.
> In my work, the bigger bottleneck to productivity is that very few people can correctly articulate requirements. [...] software development is a lot more than coding and requires critical thinking to discover the requirements.
Very well said. More often than not, the job isn't to translate the product requirements into compiling/correctly executing computer code, but rather to reveal the hidden contradictions in a seemingly straightforward natural-language feature specification.
Once these are ironed out, the translation into code quite often does become a somewhat mechanical exercise, at least in my line of work.
We're basically the lawyers the person finding the magic lamp should have consulted with before opening their mouth while facing the genie ;)
Lots of people hide the fact that they struggle with reading, and a lot of people hide, or try to hide, the fact that they don't understand something.
I don’t mind the coding, it’s the requirements gathering and status meetings I want AI to automate away. Those are the parts I don’t like and where we’d see the biggest productivity gains. They are also the hardest to solve for, because so much of it is subjective. It also often involves decisions from leadership which can come with a lot of personal bias and occasionally some ego.
I don't think the author would disagree with you. As you point out, coding is just one part of software development. I understand his point to be that the coding portion of the job is going to be very different going forward. A skilled developer is still going to need to understand frameworks and tradeoffs so that they can turn requirements into a potential solution. It's just that they might not be coding up the implementation.
Yeah, the hardest part is understanding the requirements. But it then still takes hours and hours and hours to actually build the damn thing.
Except that now it still takes me the same time to understand the requirements ... and then the coding takes 1/2 or 1/3 of the time. The coding also takes maybe 1/3 of the effort, so I leave my job less burned out.
Context: web app development agency.
I really don't understand this "if it does not replace me 100% it's not making me more productive" mentality. Yeah, it's not a perfect replacement for a senior developer ... but it is like putting the senior developer on a bike and pretending that it's not making them go any faster because they are still using their legs.
I constantly run into issues where features are planned and broken down outside-in, and it always makes perfect sense if you consider it in terms of the pure user interface and behaviour. It completely breaks down when you consider the API, or the backend, is a cross-cutting concern across many of those tidy looking tasks and cannot map to them 1:1 without creating an absolute mess.
Trying to insert myself, or the right backend people, into the process, is more challenging now than it used to be, and a bad API can make or break the user experience as the UI gets tangled in the web of spaghetti.
It hobbles the effectiveness of whatever you could get an LLM to do because you’re already starting on the backfoot, requirements-wise.
This is one reason I think spec driven development is never really going to work the way people claim it should. It's MUCH harder to write a truly correct, comprehensive, and useful spec than the code in many cases.
I like my requirements articulated so clearly and unambiguously that an extremely dumb electronic logic machine can follow every aspect of the requirements and implement them "perfectly" (limited only by the physical reliability of the machine).
This means your difficulty is not programming per se, but that you are working in a very suboptimal industry / company / system. With all due respect, you use programming at work, but true programming is the act of creating a system that you or your team designed and want to bring to life. Confusing the reality of writing code for a living at some company with what Programming with a capital P is produces a lot of misunderstanding.
>In my work, the bigger bottleneck to productivity is that very few people can correctly articulate requirements.
Agreed.
In addition, on the other side of the pipeline, code reviews are another bottleneck. We could have more MRs in review thanks to AI, but we can't really move at the speed of LLM output unless we blindly trust it (or trust another AI to do the reviews, at which point what are we doing here at all...)
> very few people can correctly articulate requirements
This is the new programming. Programming and requirements are both a form of semantics. One conveys meaning to a computer at a lower level, the other conveys it to a human at a higher level. Well now we need to convey it at a higher level to an LLM so it can take care of the lower-level translation.
I wonder if the LLM will eventually skip the programming part and just start moving bits around in response to requirements?
My solution as a consultant was to build some artifact that we could use as a starting point. Otherwise, you're sitting around spinning your wheels and billing big $ and the pressure is mounting. Building something at least allows you to demonstrate you are working on their behalf with the promise that it will be refined or completely changed as needed. It's very hard when you don't get people who can send down requirements, but that was like 100% of the places I worked. I very seldom ran into people who could articulate what they needed until I stepped up, showed them something they could sort of stand on, and then go from there.
Mythical Man Month had it all--build one to throw away.
On solo projects I see maybe 10x, on team projects maybe 2-3x in productivity. I think in big companies it's much, much less.
Highest gains are definitely in full-stack frameworks (like Next.js) with a database ORM, and in building large features in one go, not having to go back and forth with stakeholders or colleagues.
This feels like addressing a point TFA did not make. TFA talks mostly about vibe-coding speeding up coding, whereas your comment is about software development as a whole. As you point out, coding is just one aspect of engineering and we must be clear about what "productivity" we are talking about.
Sure, there are the overhypers who talk about software engineers getting entirely replaced, but I get the sense those are not people who've ever done software development in their lives. And I have not seen any credible person claiming that engineering as a whole can be done by AI.
On the other hand, the most grounded comments about AI-assisted programming everywhere are about the code, and maybe some architecture and design aspects. I personally, along with many other commenters here and actual large-scale studies, have found that AI does significantly boost coding productivity.
So yes, actual software engineering is much more than coding. But note that even if coding is, say, only 25% of engineering (there are actually studies about this), putting a significant dent in that is still a huge boost to overall productivity.
Convince your PMs to use an LLM to help "breadboard" their requirements. It's a really good use case. They can ask their dumb questions they are afraid to and an LLM will do a decent job of parsing their ideas, asking questions, and putting together a halfway decent set of requirements.
PMs wouldn't be able to ask the right questions. They have zero experience with developer experience (DevEx) and they only have experience with user experience (UX).
Sounds like you work with inexperienced PMs that are not doing their job, did you try having a serious conversation about this pattern with them? I'm pretty sure some communication would go a long way towards getting you on a better collaboration groove.
I've been doing API development for over ten years and worked at different companies. Most PMs are not technical, and it's the development team's job to figure out the technical specifications for the APIs we build. If you press the PMs, they will ask the engineering/development manager for the written technical requirements, and if the manager is not technical, they will assign it to the developers/engineers. Technical requirements for an API are really a system design question.
I'm the last guy to be enthused about any "ritualistic"-seeming businessy processes. Just let me code...
However, some things do need actual, well-defined, adhered-to processes where all parties are aware of and agree to the protocol.
I've found the same way. I just published an AI AUP for my company and most of it is teaching folks HOW to use AI.
and in reality, all the separate roles should be deprecated
we vibe requirements to our ticket tracker with an api key, vibe code ticket effort, and manage the state of the tickets via our commits and pull requests and deployments
just teach the guy the product manager is shielding you from not to micromanage and all the frictions are gone
in this same year I've worked at an organization that didn't allow AI use at all, and by Q2, Co-Pilot was somehow solving their data security concerns (gigglesnort)
in a different organization none of those restrictions are there and the productivity boost is an order of magnitude greater
This is like saying the typewriter won’t make a newspaper company more productive because the biggest bottlenecks are the research and review processes rather than the typing. It’s absolutely true, but it was still worth it to go up to typewriters, and the fact that people were spending less effort and time on the handwriting part helps all aspects of energy levels etc across their job.
The only class I've ever failed was a C++ class where the instructor was so terrible at explaining the tasks that I literally could not figure out what he wanted.
I had to retake it with the same instructor but by some luck I was able to take it online, where I would spend the majority of the time trying to decipher what he was asking me to do.
Ultimately I found that the actual ask was being given as a 3 second aside in a 50 minute lecture. Once I figured out his quirk I was able to isolate the ask and code it up, ended with an A+ in the class on the second take.
I would like to say that I learned a lot about programming from that teacher, but what I actually learned is what you're saying.
Smart, educated, capable people are broken when it comes to clearly communicating their needs to other people just slightly outside of their domain. If you can learn the skill of figuring out what the hell they're asking for and delivering that, that one skill will be more valuable to you in your career than competency itself.
If AI doesn't make you more productive you're using it wrong, end of story.
Even if you don't let it write a single line of code: from collecting information, inspecting code, reviewing requirements, reviewing PRs, finding bugs, hell, even researching information online, there are so many things it does well and fast that if you're not leveraging it, you're either in denial or have AI skill issues, period.
Not to refute your point but I’ve met overly confident people with “AI skills” who are “extremely productive” with it, while producing garbage without knowing, or not being able to tell the difference.
It sounds like you're the one in denial? AI makes some things faster, like working in a language I don't know very well. It makes other things slower, like working in a language I already know very well. In both cases, writing code is a small percentage of the total development effort.
I have successfully vibe-coded features in C. I still don't like C. The agent forgets to free memory just like a human would and has to go back and fix it later.
On the other hand, I've enjoyed vibe coding Rust more, because I'm interested in Rust and felt like my understanding improved along the way as I saw what code was produced.
A lot of coding "talent" isn't skill with the language, it's learning all the particularities of the dependencies: The details of the Smithay package in Rust, the complex set of GTK modules or the Wayland protocol implementation.
On a good day, AI can help navigate all that "book knowledge" faster.
Something I've noticed that I never really see called out is how easy it is to review rust code diffs. I spent a lot of my career maintaining company internal forks of large open source C programs, but recently have been working in rust. The things I spent a lot of time chasing down while reviewing C code diffs, particularly of newer team members, is if they paid attention to all the memory assumptions that were non-local to the change they made. Eg. I'd ask them "the way you called this function implies it _always_ frees the memory behind that char*. Is that the case?" If they didn't know the answer immediately I'd be worried and spend a lot more time investigating the change before approving.
With Rust, what I see is generally what I get. I'm not worried about heisenbug gotchas lurking in innocent-looking changes. If someone is going to be vibe coding, and truly doesn't care about the language the product ends up in, they might as well do it in a language that has rigid guardrails.
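To illustrate the difference (a minimal sketch, not from any codebase discussed here): in C, the "does this call free the buffer behind that char*?" question lives in documentation or in the callee's body, while in Rust the ownership transfer is visible in the signature itself, so the review question answers itself.

    // Ownership is visible in the signature (illustrative names).

    // Taking `String` by value moves ownership in; the buffer is
    // freed when `consume` returns, since it doesn't pass it on.
    fn consume(s: String) {
        println!("{}", s);
    } // `s` is dropped (freed) here

    // Borrowing with `&str` promises the caller keeps ownership.
    fn inspect(s: &str) -> usize {
        s.len()
    }

    fn main() {
        let owned = String::from("hello");
        println!("len = {}", inspect(&owned)); // still usable after a borrow
        consume(owned);                        // moved: freed inside `consume`
        // println!("{}", owned); // would not compile: use after move
    }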
How do LLMs deal with Rust (compared to other languages)? I think this might actually be the time to finally give the language a try. LLMs really lowered the barrier for staying productive while learning.
It's really funny how much better the AI is at writing python and javascript than it is C/C++. For one thing it proves the point that those languages really are just way harder to write. And another thing, it's funny that the AI makes the exact same mistakes a human would in C++. I don't know if it's that the AI was trained on human mistakes, or just that these languages have such strong wells of footguns that even an alien intelligence gets trapped in them.
So in essence I have to disagree with the author's suggestion to vibe code in C instead of Python. I think the Python usability features that were made for humans actually help the AI in the exact same ways.
There are all kinds of other ways that vibe coding should change one's design though. It's way easier now to roll your own version of some UI or utility library instead of importing one to save time. It's way easier now to drop down into C++ for a critical section and have the AI handle the annoying data marshalling. Things like that are the real unlock in my opinion.
> I don't know if it's that the AI was trained on human mistakes, or just that these languages have such strong wells of footguns that even an alien intelligence gets trapped in them.
First one. Most of the C code you can find out there is either one-liners or shit; there are fewer big projects for LLMs to train on, compared to Python and TypeScript.
And once we get to the embedded space, the LLMs are trained on manufacturer-written/autogenerated code, which is usually full of inaccuracies (mismatched comments), bugs, and bad practices.
> It's really funny how much better the AI is at writing python and javascript than it is C/C++. For one thing it proves the point that those languages really are just way harder to write.
I have not found this to be the case. I mean, yeah, they're really good with Python and yeah that's a lot easier, but I had one recently (IIRC it was the pre-release GPT5.1) code me up a simulator for a kind of a microcoded state machine in C++ and it did amazingly well - almost in one-shot. It can single-step through the microcode, examine IOs, allows you to set input values, etc. I was quite impressed. (I had asked it to look at the C code for a compiler that targets this microcoded state machine in addition to some Verilog that implements the machine in order for it to figure out what the simulator should be doing). I didn't have high expectations going in, but was very pleasantly surprised to have a working simulator with single-stepping capabilities within an afternoon all in what seems to be pretty-well written C++.
> I have successfully vibe-coded features in C. I still don't like C.
Same here. I've been vibe-coding in C for the sake of others in my group who only know C (no C++ or Rust). And I have to say that the agent did do pretty well with memory management. There were some early problems, but it was able to debug them pretty quickly (and certainly if I had had to dig into the intricacies of GDB to do that on my own, it would've taken a lot longer). I'm glad that it takes care of things like memory management and dealing with strings in C (things that I do not find pleasant).
Lately I have learned assembly more deeply and I sometimes let an AI code up the same thing I did just to compare.
Not that my own code is good but every single time assembly output from an optimizing compiler beats the AI as it "forgets" about all the little tricks involved.
However it may still be about how I prompt it. If I tell it to solve the actual challenge in assembly it does do that, it's just not good or efficient code.
On the other hand, because I take the time to proofread it, I learn from its mistakes just as I would from my own.
Why would I want to have an extra thing to maintain, on top of having to manually review, debug, and write tests for a language I don't like that much?
Sure. Or you can let the language do that for you and spend your tokens on something else. Like, do you want your LLM to generate LLVM bitcode? It could, right? But why wouldn't you let the compiler do that?
Well, GLib is terrible for anything important; it's really just for desktop apps.
When there is a memory error, GLib does not really handle it, it just aborts. OK for desktop, not OK for anything else.
There was a recent discussion, “Why AI Needs Hard Rules, Not Vibe Checks” (https://news.ycombinator.com/item?id=46152838).
We need as many checks as possible - and ideally ones that come for free (e.g., guaranteed by types, lifetimes, etc.) - which is why Rust might be the language for vibe coding.
Without checks and feedback, LLMs can easily generate unsafe code. So even if they can generate C or Assembly that works, they’re likely to produce code that’s riddled with incorrect edge cases, memory leaks, and so on.
Also, abstraction isn’t only for humans; it’s also for LLMs. Sure, they might benefit from different kinds of abstraction - but that doesn’t mean “oh, just write machine code” is the way to go.
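As a concrete example of a check that comes for free: the classic borrow-checker rejection below. This snippet is deliberately invalid, and the Rust compiler refuses it outright, whereas the equivalent C would compile and misbehave at runtime, leaving the bug for a reviewer (or user) to find.

    // Deliberately invalid: the compiler rejects this at build time.
    fn main() {
        let r;
        {
            let s = String::from("temporary");
            r = &s;        // borrow of `s` starts here
        }                  // `s` is dropped here while still borrowed
        println!("{}", r); // error[E0597]: `s` does not live long enough
    }

An LLM that is wired into `cargo check` gets this feedback immediately and can repair its own dangling reference before a human ever looks at the diff.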
It makes me imagine a programming language designed for LLMs but not humans, designed for rigorous specification of every function, variable, type, etc., valid inputs and outputs, tightly coupled to unit tests, mandatory explicit handling of every exception, etc.
Maybe it'll look like a lot of boilerplate but make it easy to read as opposed to easy to write.
The idea of a language that is extremely high-effort to write, but massively assists in guaranteeing correctness, could be ideal for LLMs.
I'm writing one of these, I'll post it on HN next year. The key to a language for LLMs is: make sure all the context is local, and explicit. If you have functions, use parameters for arguments instead of positions. If you have types, spell them out right there. Also, don't use too many tokens, so keywords are out. And that's just a start.
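To make "context local and explicit" concrete in an existing language (a hypothetical sketch, not this commenter's actual design): Rust has no named arguments, but the args-struct pattern forces every call site to spell out each parameter's name, with the types declared right there.

    // Hypothetical illustration: every argument is named at the call
    // site, and the types are spelled out next to the names.
    struct TransferArgs {
        from_account: u64,
        to_account: u64,
        amount_cents: u64,
    }

    fn transfer(args: TransferArgs) -> Result<(), String> {
        if args.amount_cents == 0 {
            return Err(String::from("amount must be positive"));
        }
        println!(
            "move {} cents: {} -> {}",
            args.amount_cents, args.from_account, args.to_account
        );
        Ok(())
    }

    fn main() {
        // The call site carries its own context; no positional guessing.
        let result = transfer(TransferArgs {
            from_account: 1,
            to_account: 2,
            amount_cents: 500,
        });
        println!("{:?}", result);
    }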
I think the ideal language for LLMs will look more like APL than C.
Feel like you can also achieve pretty correct programs using good tests. Like really good tests.
I’m thinking of generative tests (quickcheck style), fuzzing, erroring on invariants, and contract testing (see the test.contract Clojure library for a very cool contract-test setup!).
Really really good test suites can do stuff that even logically verified programs can’t do. They’re just a pain in the ass to write. Seems like a good use of LLMs, and you can keep using the same languages!
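For instance, a minimal quickcheck-style property in Rust, assuming the proptest crate as a dev dependency (function and test names are invented for illustration):

    use proptest::prelude::*;

    // The function under test: a decimal round-trip.
    fn roundtrip(n: i64) -> i64 {
        n.to_string().parse().unwrap()
    }

    proptest! {
        // proptest generates hundreds of random inputs and shrinks any
        // failing case down to a minimal counterexample.
        #[test]
        fn decimal_roundtrip_is_identity(n in any::<i64>()) {
            prop_assert_eq!(roundtrip(n), n);
        }
    }

The painful part is stating the property well, which is exactly the kind of grunt-plus-precision work an LLM can draft for a human to review.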
I find it's really nice to just have Claude run the compilers and linters when it's done making a change, as it often has some mistakes and will catch them at this step. It lets me step in for review after some trivially stupid thing is fixed up, rather than wasting my own time.
To go along with this, the ACM has a recent article on Automatically Translating C to Rust. It gets into the challenges of 'understanding code and structure' so that the end result reflects the intent of the code, not the actual execution paths.
> Rust doesn't prevent programs from having logic errors.
Sure, but it prevents memory safety issues, which C doesn't. As for logic bugs, what does prevent them? That's a bigger question but I'd suggest it's:
1. The ability to model your problem in a way that can be "checked". This is usually done via type systems, and Rust has an arguably good type system for this.
2. Tests that allow you to model your problem in terms of assertions. Rust has decent testing tooling but it's not amazing, and I think this is actually a strike against Rust to a degree. That said, proptest, fuzzing, debug assertions, etc, are all present and available for Rust developers.
There are other options like using external modeling tools like TLA+ but those are decoupled from your language, all you can ever do is prove that your algorithm as specified is correct, not the code you wrote - type systems are a better tool to some degree in that way.
I think that if you were to ask an LLM to write very correct code and then give it two languages, one with a powerful, expressive type system and testing utilities, and one without those, the LLM would be far more likely to produce buggy code in the language without those features.
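A tiny sketch of point 1, using the common newtype pattern (all names invented for illustration): the constructor enforces the invariant once, and the type system then guarantees downstream code can never receive an unvalidated value.

    // "Model your problem so it can be checked": a newtype whose
    // constructor enforces the invariant.
    struct NonEmptyName(String);

    impl NonEmptyName {
        fn new(raw: &str) -> Result<NonEmptyName, String> {
            let trimmed = raw.trim();
            if trimmed.is_empty() {
                Err(String::from("name must not be empty"))
            } else {
                Ok(NonEmptyName(trimmed.to_string()))
            }
        }
    }

    // Cannot be called with an unvalidated string: the type system
    // has turned a class of logic bug into a compile error.
    fn greet(name: &NonEmptyName) {
        println!("hello, {}", name.0);
    }

    fn main() {
        match NonEmptyName::new("  Ada  ") {
            Ok(name) => greet(&name),
            Err(e) => eprintln!("{}", e),
        }
    }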
> Rust doesn't prevent programs from having logic errors.
Like everything around Rust, this has been discussed ad nauseam.
Preventing memory safety bugs has a meaningful impact in reducing CVEs, even if it has no impact on logic bugs. (Which: I think you could argue the flexible and expressive type system helps with. But for the sake of this argument, let's say it provides no benefits.)
Modern medicine can't prevent or cure all diseases, so you might as well go back to drinking mercury then rubbing dog shit into your wounds.
Modern sewers sometimes back up, so you might as well just relieve yourself in a bucket and dump it onto the sidewalk.
Modern food preservation doesn't prevent all spoilage so you might as well just go back to hoping that meat hasn't been sitting in the sun for too many days.
You can't get a gutter ball if you put up the rails in a bowling lane. Rust's memory safety is the rails here.
You might get different "bad code" from AI, but if it can self-validate that some code it spits out has memory management issues at compile time, it helps the development. Same as with a human.
> abstraction isn’t only for humans; it’s also for LLMs.
Bingo. LLMs are language models, not models of software systems. Everything gets translated through natural language! So the quality of the abstraction still matters: code that can be described well in plain language wins.
>We need as many checks as possible - and ideally ones that come for free (e.g., guaranteed by types, lifetimes, etc.) - which is why Rust might be the language for vibe coding.
Checking preconditions and postconditions is much easier to do for a human than checking an implementation
The thing that would really make sense is a formally verified language like Coq, or a model-checking language like Promela.
You can then really just leave the implementation to the AI.
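Short of a full proof language, pre/postconditions can at least be encoded as executable contracts in an ordinary language. A minimal Rust sketch using debug assertions (not a proof, but the contract is far easier for a human to review than the implementation between them):

    // Executable pre/postconditions via debug_assert!, which compiles
    // out of release builds.
    fn midpoint(lo: u64, hi: u64) -> u64 {
        // Precondition
        debug_assert!(lo <= hi, "midpoint requires lo <= hi");

        // Overflow-safe midpoint (avoids the classic `(lo + hi) / 2` bug
        // that LLMs and humans alike keep reproducing).
        let m = lo + (hi - lo) / 2;

        // Postcondition
        debug_assert!(lo <= m && m <= hi);
        m
    }

    fn main() {
        println!("{}", midpoint(2, 8));                   // 5
        println!("{}", midpoint(u64::MAX - 1, u64::MAX)); // no overflow
    }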
I really tried to get into the vibe coding thing - just describe the thing I need in human language and let the agent figure it out. It was incredible at first. Then I realized that I am spending a lot of time writing clarifications because the agent either forgot or misinterpreted something. Then I realized that I am waiting an awful long time for each agent step to complete just to write another correction or clarification. Then I realized that this constant start-stop process is literally melting my brain and making me unable to do any real work myself. It's basically having the same effect as scrolling any other algorithmic feed. Now I am back to programming myself and only bouncing the boring bits off of ChatGPT.
> Then I realized that this constant start-stop process is literally melting my brain and making me unable to do any real work myself. It's basically having the same effect as scrolling any other algorithmic feed
Yes, it’s extremely soul sucking. With the added disadvantage of not teaching me anything.
One thing that helps is to write an AGENTS.md file that encodes the knowledge and tricks you have of the codebase, like running a single test (faster feedback cycles), common coding patterns, examples, etc.
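A hypothetical sketch of such a file (section names and commands are invented for illustration; substitute your own toolchain):

    # AGENTS.md (hypothetical sketch)

    ## Fast feedback
    - Run a single test: `cargo test my_test_name -- --exact`
    - Type-check without a full build: `cargo check`

    ## Conventions
    - Errors: return `Result<T, AppError>`; no `unwrap()` outside tests.
    - Follow the existing module layout; new modules get a doc comment
      and at least one unit test.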
I went full meta and sketched out a file, then had an expensive LLM go through the codebase and write such a file. I don't know if it's any good though, I only really use coding assistants to write unit tests.
One trick I have tried is asking the LLM to output a specification of the thing we are in the middle of building. A commenter above said humans struggle with writing good requirements - LLMs have trouble following good requirements - ALL of them - often forgetting important things while scrambling to address your latest concern.
Getting it to output a spec lets me correct the spec, reload the browser tab to speed things up, or move to a different AI.
I very much doubt the ability of LLMs to produce C code free of leaks and memory management faults, because they are trained on loads of bad code in that regard. They will not output code of the quality that maybe 1% of C developers could, if even that many. Fact is that even well-paid, professional C/C++ developers introduce memory management issues in such code bases (see the Chromium project's statistics on this). So the chances of getting good C programs from LLMs, which learn from far lower quality code than Chromium's, are probably very slim.
Vibe-coding a program that segfaults and you don't know why and you keep burning compute on that? Doesn't seem like a great idea.
>Is C the ideal language for vibe coding? I think I could mount an argument for why it is not, but surely Rust is even less ideal.
I've been using Rust with LLMs for a long time (mid-2023?) now; cargo check and the cargo package system make it very easy for LLMs to check their work and produce high quality code that almost never breaks, and always compiles.
My favorite use for LLMs with Rust is using them as a macro debugger; they provide better error messages than the errors Cargo can provide. It's cool to take a macro and ask the LLM to do an expansion of it, to see what it would look like. Or, to take Rust code and ask the LLM to create a macro for it.
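(For the purely mechanical part there's also cargo-expand, a third-party cargo subcommand; the LLM complements it by explaining why an expansion is wrong rather than just printing it. The module name below is a placeholder.)

    cargo install cargo-expand   # one-time install of the third-party tool
    cargo expand my_module       # print the macro-expanded source of a module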
I agree that an LLM may make mistakes.
But one advantage is that you can also allocate resources for it to try and find its own mistakes. You can do this with humans, but the grind wears away at them. Since this doesn't really happen with an LLM, it's pretty decent at catching its own mistakes too.
Well, most LLMs are fine-tuned over higher quality data; this is kind of how they've kept improving them, amongst other things.
The first pass is to learn the fundamentals of language, and then the model is refined on curated datasets, so you could refine them on high quality, curated C code.
> But if you look carefully, you will notice that it doesn’t struggle with undefined behavior in C. Or with making sure that all memory is properly freed. Or with off-by-one errors.
Doubt. These things have been trained to emulate humans, why wouldn't they make the same mistakes that humans do? (Yes, they don't make spelling errors, but most published essays etc. don't have spelling errors, whereas most published C codebases do have undefined behaviour).
It's incorrect to think because it is trained on buggy human code it will make these mistakes. It predicts the most likely token. Let's say 100 programmers write a function, most (unless it's something very tricky), won't forget to free that particular function. So the most likely tokens are those which do not leak.
In addition, this is not GPT 3. There's a massive amount of reinforcement learning at play, which reinforces good code, particularly verifiably good (which includes no leaks). And also a massive amount of synthetic data which can also be generated in a way that is provably correct.
> Let's say 100 programmers write a function, most (unless it's something very tricky), won't forget to free that particular function. So the most likely tokens are those which do not leak.
You don't free a function.
And this would only be true if the function has the same content with minor variations, which is why LLMs are better suited to very small examples. Bigger examples are less likely to be semantically similar, so there is less data to determine the "correct" next token.
> There's a massive amount of reinforcement learning at play, which reinforces good code, particularly verifiably good (which includes no leaks)
This is a really dubious claim. Where are you getting this? Do you have some information on how these models are trained on C code specifically? How do you know whether the code they train on has no leaks?
There are huge projects that everyone depends on that have memory bugs in them right now. And these are actual experts missing these bugs, what makes you think the people at OpenAI are creating safer data than the people whose livelihoods actually depend on it?
This thread is full of people sharing how easy it is to make memory bugs with an LLM, and that has been my experience as well.
I'm not very experienced with C++ at all but Sonnet in Copilot/Copilot Chat was able to create entire files with no memory errors on the first try for me, and it was very adept at hunting down memory errors (they were always my own fault) from even just vague descriptions of crashes.
How do you know? I can believe that they didn't show memory errors in a quick test run on a common architecture with a common compiler, much like most human-written code in the training corpus.
I've had issues with Claude and memory related bugs in C. Maybe small programs or prototypes it's fine if you can verify the output or all the expected inputs, but the moment the context is >50k lines or even doing something with pthreads, you run into the exact same problems as humans.
I think Claude would do much better with tools provided by modern C++ or Zig than C, frankly, anyways. Or even better, like the Rust people have helpfully mentioned, Rust.
> if vibe coding is the future of software development (and it is), then why bother with languages that were designed for people who are not vibe coding? Shouldn’t there be such a thing as a “vibe-oriented programming language?” VOP.
A language designed for vibe coding could certainly be useful, but what that means is the opposite of what the author thinks that means.
The author thinks that such a language wouldn't need to have lots of high-level features and structure, since those are things that exist for human comprehension.
But actually, the opposite is true. If you're designing a language for LLMs, the language should be extremely strict and wordy and inconvenient and verbose. You should have to organize your code in a certain way, and be forced to check every condition, catch every error, consider every edge case, or the code won't compile.
Such a language would aggravate a human, but a machine wouldn't care. And LLMs would benefit from the rigidness, as it would help prevent any confusion or hallucination from causing bugs in the finished software.
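Rust already gives a small taste of "won't compile unless every case is handled" (the enum below is invented for illustration): matches on enums must be exhaustive, and with the must-use lint denied, silently ignoring an error won't build either.

    // A taste of compiler-enforced completeness.
    #![deny(unused_must_use)] // ignoring a Result becomes a hard error

    enum PaymentState {
        Pending,
        Settled,
        Refunded,
    }

    // Exhaustive match: adding a new variant to the enum breaks this
    // function at compile time until the new case is written.
    fn describe(state: PaymentState) -> &'static str {
        match state {
            PaymentState::Pending => "waiting",
            PaymentState::Settled => "done",
            PaymentState::Refunded => "returned",
        }
    }

    fn risky() -> Result<(), String> {
        Err(String::from("nope"))
    }

    fn main() {
        println!("{}", describe(PaymentState::Pending));
        // risky(); // would not compile: unused `Result` is denied above
        let _ = risky(); // the caller is forced to acknowledge the error
    }

A human might find this level of enforcement tedious; a code-generating model just gets precise, machine-checkable feedback on every omission.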
I don't think there is a need for an output language here at all; the LLM can read and write bits into executables directly to flip transistors on and off. The real question is what the input language (i.e., prompts) looks like. There is still a need for humans to describe concepts for the machine to encode into the executable, because humans are the consumers of these systems.
> the LLM can read and write bits into executables directly to flip transistors on and off
No, that's the problem (same misconception the author has) - it can't. At least not reliably. If you give an LLM free rein with a non-memory safe output format, it will make the exact same mistakes a human would.
The point of a verbose language is to create extensive guardrails. Which the LLM won't be annoyed by, unlike a human developer.
Because I want to be able to review it, and extend it myself.
edit: Pure vibe coding is a joke or thought exercise, not a goal to aspire to. Do you want to depend on a product that has not been vetted by any human? And if it is your product, do you want the risk of selling it?
I can imagine a future where AI coders and AI QA bots do all the work but we are not there yet. Besides, an expressive language with safety features is good for bots too.
In my work, the bigger bottleneck to productivity is that very few people can correctly articulate requirements. I work in backend, API development, which is completely different from fullstack development with backend development. If you ask PMs about backend requirements, they will dodge you, and if you ask front-end or web developers, they are waiting for you to provide them the API. The hardest part is understanding the requirements. It's not because of illiteracy. It's because software development is a lot more than coding and requires critical thinking to discover the requirements.
As a Professor of English who teaches programming to humanities students, the writer has had an extremely interesting and unusual academic career [1]. He sounds awesome, but I think it's fair to suggest he may not have much experience of large scale commercial software development or be particularly well placed to predict what will or will not work in that environment. (Not that he necessarily claims to, but it's implicit in strong predictions about what the "future of programming" will be.)
[1] https://stephenramsay.net/about/
That said, I think people really under appreciate how diverse programmers actually are. I started in physics and came over when I went to grad school. While I wouldn't expect a physicist to do super well on leetcode problems I've seen those same people write incredible code that's optimized for HPC systems and they're really good at tracing bottlenecks (it's a skill that translates from physics really really well). Hell, the best programmer I've ever met got that way because he was doing his PhD in mechanical engineering. He's practically the leading expert in data streaming for HPC systems and gained this skill because he needed more performance for his other work.
There's a lot of different types of programmers out there but I think it's too easy to think the field is narrow.
I very much enjoy the act of programming, but I'm also a professional software developer. Incidentally, I've almost always worked in fields where subtly wrong answers could get someone hurt or killed. I just can't imagine either giving up my joy in the former case or abdicating my responsibility to understand my code in the latter.
And this is why the wood working analogy falls down. The scale at which damage can occur due to the decision to use power tools over hand tools is, for most practical purposes, limited to just myself. With computers, we can share our fuck ups with the whole world.
That is the strangest thing I've heard today.
I find the LLMs boost my productivity because I've always had a sort of architectural mindset, I love looking up projects that solve specific problems and keeping them on the back of my mind, turns out I was building myself up for instructing LLMs on how to build me software, and it takes several months worth of effort and spits it out in a few hours.
Speaking of vibe coding in archaic languages, I'm using LLMs to understand old Shockwave Lingo to translate it to a more modern language, so I can rebuild a legacy game in a modern language. Maybe once I spin up my blog again I'll start documenting that fun journey.
Well, I think we can say C is archaic when most developers write in something that for one isn't C, two isn't a language itself written in C, or three isn't running on something written in C :)
- SSL 2.0-TLS 1.1, HTTP/0.9-HTTP/1.1, ftp, WAIS, gopher, finger, telnet, rwho, TinyFugue MUD, UUCP email, SHOUTcast streaming some public domain radio whatever
- <blink>, <marquee>, <object>, XHTML, SGML
- Java <applet>, Java Web Start
- MSJVM/J++, ActiveX, Silverlight
- Flash, Shockwave (of course), Adobe Air
- (Cosmo) VRML
- Joke ActiveX control or toolbar that turns a Win 9x/NT-XP box into a "real" ProgressBar95. ;)
(Gov't mandated PSA: Run vintage {good,bad}ness with care.)
lol
The converse is that if vibe coding is the future, that means we assume there are things the AI cannot do well (such as come up with requirements), at which point it's also likely it cannot actually vibe code that well.
The general problem is that once we start talking about imagined AI capabilities, both the capabilities and the constraints become arbitrary. If we imagine an AI that does X but not Y, we could just as easily imagine an AI that does both X and Y.
They already can brainstorm new features and make roadmaps. If you give them more context about the business strategy/goals then they will make better guesses. If you give them more details about the user personas / feedback / etc they will prioritize better.
We're still just working our way up the ladder of systematizing that context, building better abstractions, workflows, etc.
If you were to start a new company with an AI assistant and feed it every piece of information (which it structures / summarizes synthesizes etc in a systematic way) even with finite context it's going to be damn good. I mean just imagine a system that can continuously read and structure all the data from regular news, market reports, competitor press releases, public user forums, sales call transcripts, etc etc. It's the dream of "big data".
But I think it is certainly possible that we reach a point/plateau where everything is just 'english -> code' compilation but that 'vibe coding' compilation step is really really good.
The guy is also a complete tool. I'd point out that what he described wasn't actually what they needed, and that there functionality was ... strange and didn't actually do anything useful. We'd be told to just do as we where being told, seeing as they where the ones paying the bills. Sometimes we'd read between the lines, and just deliver what was actually needed, then we'd be told just do as we where told next time, and they'd then use the code we wrote anyway. At some point we got tired of the complaining and just did exactly as the tasks described, complete with tests that showed that everything worked as specified. Then we where told that our deliveries didn't work, because that wasn't what they'd asked for, but couldn't tell us where we misunderstood the Jira task. Plus the tests showed that the code functioned as specified.
Even if the Jira tasks are in a state where it seems like you could feed them directly to an LLM, there's no context (or incorrect context) and how is a chatbot to know that the author of the task is a moron?
It can make a vague ticket precise and that can be an easy platform to have discussions with stakeholders.
Agentic AI can now do 20 rounds of lobbying with all stake holders as long as it’s over something like slack.
> How do I express this code in Typescript?
it's
> What is the best way to express this idea in a way that won't confuse or anger our users? Where in the library should I put this new idea? Upstream of X? Downstream of Y? How do I make it flexible so they can choose how to integrate this? Or maybe I don't want to make it flexible - maybe I want to force them to use this new format?
> Plus making sure that whatever changes I make are non-breaking, which means that if I update some function with new parameters, they need to be made optional, so now I need to remember, downstream, that this particular argument may or may not be `undefined` because I don't want to break implementations from customers who just upgraded the most recent minor or patch version
The majority of the problems I solve are philosophical, not linguistic
The observation from Lean is that the faster you can build a prototype, the faster you can validate the real/unspoken/unclear requirements.
This applies for backends too. A lot of the “enterprise-y” patterns like BFFs, hexagonal, and so on, will make it really easy to compose new APIs from your building blocks. We don’t do this now because it’s too expensive to write all the boilerplate involved. But one BFF microservice per customer would be totally feasible for a sales engineer to vibe code, in the right architecture.
One could argue that "vibe coding" forces you (eventually) to think in terms of requirements. There's a range of approaches, from "nitpick over every line written by AI" to "yolo this entire thing", but one thing they have in common is they all accelerate failure if the specs are not there. You very quickly find out you don't know where you're going.
I see this in my work as well, the biggest bottleneck is squeezing coherent, well-defined requirements out of PMs. It's easy to get a vision board, endless stacks of slides about priorities and direction, even great big nests of AWS / Azure thingnames masquerading as architecture diagrams. But actual "this is the functionality we want to implement and here are the key characteristics of it" detail? Absolutely scarce.
Most software products built that way seem to move fast at first but become monstrous abominations over time. If those are the only places you keep finding yourself in, be careful!
A) as stated by parent comment, the ones doing req. mngmt. Are doing a poor job of abstracting the requirements, and what could be done as one feature suddenly turns in 25.
B) in a similar manner as A, all solutions imply writing more and more code, and never refactor and abstract parts away.
Lots of people hide the fact that they struggle with reading and a lot of people hide or try to hide the fact they don’t understand something.
Very well said. More often than not, the job isn't to translate the product requirements into compiling/correctly executing computer code, but rather to reveal the hidden contradictions in a seemingly straightforward natural-language feature specification.
Once these are ironed out, the translation into code quite often does become a somewhat mechanical exercise, at least in my line of work.
We're basically the lawyers the person finding the magic lamp should have consulted with before opening their mouth while facing the genie ;)
Except that now it still takes me the same time to understand the requirements ... and then the coding takes 1/2 or 1/3 of the time. The coding also always takes 1/3 of the effort so I leave my job less burned out.
Context: web app development agency.
I really don't understand this "if it does not replace me 100% it's not making me more productive" mentality. Yeah, it's not a perfect replacement for a senior developer ... but it is like putting the senior developer on a bike and pretending that it's not making them go any faster because they are still using their legs.
Trying to insert myself, or the right backend people, into the process is more challenging now than it used to be, and a bad API can make or break the user experience as the UI gets tangled in a web of spaghetti.
It hobbles the effectiveness of whatever you could get an LLM to do, because you're already starting on the back foot, requirements-wise.
Agreed.
In addition, on the other side of the pipeline, code reviews are another bottleneck. We could have more MRs in review thanks to AI, but we can't really move at the speed of LLM output unless we blindly trust it (or trust another AI to do the reviews, at which point what are we doing here at all...).
This is the new programming. Programming and requirements are both a form of semantics. One conveys meaning to a computer at a lower level, the other conveys it to a human at a higher level. Well now we need to convey it at a higher level to an LLM so it can take care of the lower-level translation.
I wonder if the LLM will eventually skip the programming part and just start moving bits around in response to requirements?
The Mythical Man-Month had it all: build one to throw away.
The highest gains are definitely in full-stack frameworks (like Next.js) with a database ORM, building large features in one go without having to go back and forth with stakeholders or colleagues.
Sure, there are the overhypers who talk about software engineers getting entirely replaced, but I get the sense those are not people who've ever done software development in their lives. And I have not seen any credible person claiming that engineering as a whole can be done by AI.
On the other hand, the most grounded comments about AI-assisted programming everywhere are about the code, and maybe some architecture and design aspects. I personally, along with many other commenters here and actual large-scale studies, have found that AI does significantly boost coding productivity.
So yes, actual software engineering is much more than coding. But note that even if coding is, say, only 25% of engineering (there are actually studies about this), putting a significant dent in that is still a huge boost to overall productivity.
I'm the last guy to be enthused about any "ritualistic" seeming businessy processes. Just let me code...
However, some things do need well-defined, adhered-to processes, where all parties are aware of and agree on the protocol.
I've found the same thing. I just published an AI acceptable-use policy for my company, and most of it is teaching folks HOW to use AI.
We vibe requirements to our ticket tracker with an API key, vibe-code ticket effort, and manage the state of the tickets via our commits, pull requests, and deployments.
Just teach the guy the product manager is shielding you from not to micromanage, and all the friction is gone.
In this same year I've worked at an organization that didn't allow AI use at all, and by Q2, Copilot was somehow solving their data security concerns (gigglesnort).
In a different organization none of those restrictions exist, and the productivity boost is an order of magnitude greater.
I had to retake it with the same instructor, but by some luck I was able to take it online, where I would spend the majority of the time trying to decipher what he was asking me to do.
Ultimately I found that the actual ask was being given as a three-second aside in a 50-minute lecture. Once I figured out his quirk, I was able to isolate the ask and code it up, and ended with an A+ in the class on the second take.
I would like to say that I learned a lot about programming from that teacher, but what I actually learned is what you're saying.
Smart, educated, capable people are broken when it comes to clearly communicating their needs to other people just slightly outside of their domain. If you can learn the skill of figuring out what the hell they're asking for and delivering that, that one skill will be more valuable to you in your career than competency itself.
Which is what vibe coders are...
Even if you don't let it author or write a single line of code: from collecting information, inspecting code, reviewing requirements, reviewing PRs, finding bugs, hell, even researching information online, there are so many things it does well and fast that if you're not leveraging it, you're either in denial or have AI skill issues, period.
On the other hand, I've enjoyed vibe coding Rust more, because I'm interested in Rust and felt like my understanding improved along the way as I saw what code was produced.
A lot of coding "talent" isn't skill with the language; it's learning all the particularities of the dependencies: the details of the Smithay package in Rust, the complex set of GTK modules, or the Wayland protocol implementation.
On a good day, AI can help navigate all that "book knowledge" faster.
With rust, what I see is generally what I get. I'm not worried about heisenbug gotchas lurking in innocent looking changes. If someone is going to be vibe coding, and truly doesn't care about the language the product ends up in, they might as well do it in a language that has rigid guardrails.
So in essence I have to disagree with the author's suggestion to vibe code in C instead of Python. I think the Python usability features that were made for humans actually help the AI in the exact same ways.
There are all kinds of other ways that vibe coding should change one's design though. It's way easier now to roll your own version of some UI or utility library instead of importing one to save time. It's way easier now to drop down into C++ for a critical section and have the AI handle the annoying data marshalling. Things like that are the real unlock in my opinion.
The first one. Most of the C code you can find out there is either one-liners or shit; there are fewer big projects for the LLMs to train on, compared to Python and TypeScript.
And once we get to the embedded space, the LLMs are trained on manufacturer-written/autogenerated code, which is usually full of inaccuracies (mismatched comments), bugs, and bad practices.
I have not found this to be the case. I mean, yeah, they're really good with Python and yeah that's a lot easier, but I had one recently (IIRC it was the pre-release GPT5.1) code me up a simulator for a kind of a microcoded state machine in C++ and it did amazingly well - almost in one-shot. It can single-step through the microcode, examine IOs, allows you to set input values, etc. I was quite impressed. (I had asked it to look at the C code for a compiler that targets this microcoded state machine in addition to some Verilog that implements the machine in order for it to figure out what the simulator should be doing). I didn't have high expectations going in, but was very pleasantly surprised to have a working simulator with single-stepping capabilities within an afternoon all in what seems to be pretty-well written C++.
Same here. I've been vibe-coding in C for the sake of others in my group who only know C (no C++ or Rust). And I have to say that the agent did do pretty well with memory management. There were some early problems, but it was able to debug them pretty quickly (and certainly if I had had to dig into the intricacies of GDB to do that on my own, it would've taken a lot longer). I'm glad that it takes care of things like memory management and dealing with strings in C (things that I do not find pleasant).
Not that my own code is good, but every single time, the assembly output from an optimizing compiler beats the AI's, as it "forgets" all the little tricks involved. However, it may still be about how I prompt it. If I tell it to solve the actual challenge in assembly, it does do that; it's just not good or efficient code.
On the other hand, because I take the time to proofread it, I learn from its mistakes just as I would from my own.
I highly recommend people learn how to write their own agents. It's really not that hard. You can do it with any LLM, even ones that run locally.
E.g., you can automate things like checking that memory is freed.
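For example, here's a minimal sketch in Rust of the verification half of such an agent, assuming `cc` and valgrind are installed: compile the generated C, run it under valgrind, and capture the leak report that a real agent would feed back into the model's next prompt. The feedback step itself is stubbed out as a print.

```rust
use std::process::Command;

// A minimal sketch of an agent's "check memory freeing" step:
// compile LLM-generated C, run it under valgrind, and collect
// the report to hand back to the model on failure.
fn check_for_leaks(c_file: &str) -> Result<(), String> {
    // Compile the candidate code.
    let cc = Command::new("cc")
        .args([c_file, "-g", "-o", "candidate"])
        .output()
        .map_err(|e| e.to_string())?;
    if !cc.status.success() {
        return Err(String::from_utf8_lossy(&cc.stderr).into_owned());
    }

    // Run under valgrind; --error-exitcode makes detected errors
    // (including leaks, with --leak-check=full) fail the check.
    let vg = Command::new("valgrind")
        .args(["--leak-check=full", "--error-exitcode=1", "./candidate"])
        .output()
        .map_err(|e| e.to_string())?;
    if vg.status.success() {
        Ok(())
    } else {
        Err(String::from_utf8_lossy(&vg.stderr).into_owned())
    }
}

fn main() {
    match check_for_leaks("candidate.c") {
        Ok(()) => println!("no leaks detected"),
        Err(report) => println!("feed back to the model:\n{report}"),
    }
}
```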
Or, if you don't need to use C (e.g. for FFI or platform compatibility reasons), you could use a language with a compiler that does it for you.
Or, to quote Rick and Morty, "that's just Rust with extra steps!"
Without checks and feedback, LLMs can easily generate unsafe code. So even if they can generate C or Assembly that works, they’re likely to produce code that’s riddled with incorrect edge cases, memory leaks, and so on.
Also, abstraction isn’t only for humans; it’s also for LLMs. Sure, they might benefit from different kinds of abstraction - but that doesn’t mean “oh, just write machine code” is the way to go.
It makes me imagine a programming language designed for LLMs rather than humans: rigorous specification of every function, variable, and type; valid inputs and outputs; tight coupling to unit tests; mandatory explicit handling of every exception; etc.
Maybe it'll look like a lot of boilerplate but make it easy to read as opposed to easy to write.
The idea of a language that is extremely high-effort to write, but massively assists in guaranteeing correctness, could be ideal for LLMs.
I think the ideal language for LLMs will look more like APL than C.
I’m thinking of generative tests (quickcheck style), fuzzing, erroring on invariants, and contract testing (see the test.contract Clojure library for a very cool contract-test setup!).
Really really good test suites can do stuff that even logically verified programs can’t do. They’re just a pain in the ass to write. Seems like a good use of LLMs, and you can keep using the same languages!
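As a small illustration of the quickcheck style in Rust, here's a sketch using the proptest crate (assuming `proptest = "1"` under `[dev-dependencies]`): the property `decode(encode(s)) == s` is exercised against hundreds of generated strings, including ones no human would think to type. The `rle_*` functions are toy stand-ins for real code under test.

```rust
use proptest::prelude::*;

// Toy run-length encoding; imagine this is the code under test.
fn rle_encode(s: &str) -> Vec<(char, usize)> {
    let mut out: Vec<(char, usize)> = Vec::new();
    for c in s.chars() {
        match out.last_mut() {
            Some((last, n)) if *last == c => *n += 1,
            _ => out.push((c, 1)),
        }
    }
    out
}

fn rle_decode(pairs: &[(char, usize)]) -> String {
    pairs
        .iter()
        .flat_map(|&(c, n)| std::iter::repeat(c).take(n))
        .collect()
}

proptest! {
    // proptest generates hundreds of arbitrary strings and shrinks
    // any failing case down to a minimal counterexample.
    #[test]
    fn encode_decode_roundtrip(s in ".*") {
        prop_assert_eq!(rle_decode(&rle_encode(&s)), s);
    }
}
```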
Look at ShellCheck. It turns a total newbie into a shell master just by iteration.
https://cacm.acm.org/research/automatically-translating-c-to...
If LLMs produce code riddled with bugs in one language, they will in other languages as well. Rust isn't going to save you.
Sure, but it prevents memory safety issues, which C doesn't. As for logic bugs, what does prevent them? That's a bigger question but I'd suggest it's:
1. The ability to model your problem in a way that can be "checked". This is usually done via type systems, and Rust has an arguably good type system for this (see the newtype sketch after this comment).
2. Tests that allow you to model your problem in terms of assertions. Rust has decent testing tooling, but it's not amazing, and I think this is actually a strike against Rust to a degree. That said, proptest, fuzzing, debug assertions, etc., are all present and available for Rust developers.
There are other options like using external modeling tools like TLA+ but those are decoupled from your language, all you can ever do is prove that your algorithm as specified is correct, not the code you wrote - type systems are a better tool to some degree in that way.
I think that if you were to ask an LLM to write very correct code and gave it two languages, one with a powerful, expressive type system and testing utilities and one without, the LLM would be far more likely to produce buggy code in the language without those features.
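As a small illustration of point 1, a sketch with hypothetical newtypes: once units live in the type system, swapping two arguments stops being a latent logic bug and becomes a compile error.

```rust
// Hypothetical newtypes: modeling units in the type system turns a
// classic logic bug (swapped arguments) into a compile-time error.
#[derive(Debug, Clone, Copy)]
struct Meters(f64);
#[derive(Debug, Clone, Copy)]
struct Seconds(f64);
#[derive(Debug, Clone, Copy)]
struct MetersPerSecond(f64);

fn speed(distance: Meters, time: Seconds) -> MetersPerSecond {
    MetersPerSecond(distance.0 / time.0)
}

fn main() {
    let v = speed(Meters(100.0), Seconds(9.58));
    println!("{v:?}");
    // speed(Seconds(9.58), Meters(100.0)); // error: mismatched types
}
```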
Like everything around Rust, this has been discussed ad nauseam.
Preventing memory safety bugs has a meaningful impact in reducing CVEs, even if it has no impact on logic bugs. (Which: I think you could argue the flexible and expressive type system helps with. But for the sake of this argument, let's say it provides no benefits.)
Nobody ever claimed that. The claims are:
1. Rust drastically reduces the chance of memory errors. (Or eliminates them if you avoid unsafe code.)
2. Rust reduces the chance of other logic errors.
Rust doesn't have to eliminate logic errors to be a better choice than C or assembly. Significantly reducing their likelihood is enough.
Modern sewers sometimes back up, so you might as well just relieve yourself in a bucket and dump it onto the sidewalk.
Modern food preservation doesn't prevent all spoilage so you might as well just go back to hoping that meat hasn't been sitting in the sun for too many days.
You can't get a gutter ball if you put up the rails in a bowling lane. Rust's memory safety is the rails here.
You might get different "bad code" from AI, but if it can self-validate at compile time that some code it spits out has memory-management issues, that helps development. Same as with a human.
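To make the rails concrete, here's a minimal sketch of the kind of mistake that gets caught before anything runs; the C equivalent of the commented-out line would be a silent use-after-free:

```rust
fn main() {
    let s = String::from("hello");
    let t = s; // ownership moves to `t`; `s` is invalidated
    // println!("{s}"); // error[E0382]: borrow of moved value: `s`
    println!("{t}"); // fine: `t` is the sole owner
}
```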
https://security.googleblog.com/2025/11/rust-in-android-move...
That team claims that not having to deal with memory bugs saved them time. That time can be spent on other things (like fixing logic errors).
We're at the point of diminishing returns from scaling, and RL is the only way to see meaningful improvements.
It's very hard to improve much via RL without some way to tell whether the code works that doesn't require compilation.
Logic-based languages like Prolog take this to the logical extreme; I'd love to see people revisit that idea.
But none of them really have enough training data for LLMs to be any good at them.
Bingo. LLMs are language models, not models of software systems. Everything gets translated through natural language! So the quality of the abstraction still matters: code that can be described well in plain language wins.
Checking preconditions and postconditions is much easier for a human than checking an implementation.
The thing that would really make sense is a proof-oriented language like Coq or Promela.
You can then really just leave the implementation to the AI.
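A minimal sketch of that division of labor in plain Rust (short of a full proof language): a human writes and reviews the contract assertions, and the body between them is the part you could leave to the AI. The `insert_sorted` function is a made-up example.

```rust
// Precondition: `v` is sorted ascending.
// Postcondition: `v` is still sorted and now contains `x`.
// A reviewer checks the contract, not the (possibly generated) body.
fn insert_sorted(v: &mut Vec<i32>, x: i32) {
    debug_assert!(v.windows(2).all(|w| w[0] <= w[1]), "precondition: sorted");

    let pos = v.partition_point(|&e| e < x);
    v.insert(pos, x);

    debug_assert!(v.windows(2).all(|w| w[0] <= w[1]), "postcondition: sorted");
    debug_assert!(v.contains(&x), "postcondition: contains x");
}

fn main() {
    let mut v = vec![1, 3, 5];
    insert_sorted(&mut v, 4);
    println!("{v:?}"); // [1, 3, 4, 5]
}
```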
Yes, it’s extremely soul sucking. With the added disadvantage of not teaching me anything.
I went full meta and sketched out a file, then had an expensive LLM go through the codebase and write such a file. I don't know if it's any good though, I only really use coding assistants to write unit tests.
Getting it to output a spec lets me correct the spec, reload the browser tab to speed things up, or move to a different AI.
Vibe-coding a program that segfaults and you don't know why and you keep burning compute on that? Doesn't seem like a great idea.
>Is C the ideal language for vibe coding? I think I could mount an argument for why it is not, but surely Rust is even less ideal.
I've been using Rust with LLMs for a long time (mid-2023?) now; cargo check and the cargo package system make it very easy for LLMs to check their work and produce high quality code that almost never breaks, and always compiles.
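The loop being described is roughly the following: a minimal sketch, assuming it runs inside a cargo project, with the actual feed-back-to-the-model step stubbed out as a print.

```rust
use std::process::Command;

// A minimal sketch of the agent's check step: run `cargo check`
// and, on failure, capture the compiler diagnostics that would be
// fed back into the model's next attempt.
fn main() {
    let output = Command::new("cargo")
        .args(["check", "--message-format=short"])
        .output()
        .expect("failed to run cargo");

    if output.status.success() {
        println!("compiles cleanly");
    } else {
        let diagnostics = String::from_utf8_lossy(&output.stderr);
        println!("feed back to the model:\n{diagnostics}");
    }
}
```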
The first pass is to learn the fundamentals of the language, and then the model is refined on curated datasets, so you could refine it on high-quality, curated C code.
Doubt. These things have been trained to emulate humans; why wouldn't they make the same mistakes that humans do? (Yes, they don't make spelling errors, but most published essays etc. don't have spelling errors, whereas most published C codebases do have undefined behaviour.)
It's incorrect to think that because it is trained on buggy human code it will make these mistakes. It predicts the most likely token. Let's say 100 programmers write a function: most (unless it's something very tricky) won't forget to free that particular function. So the most likely tokens are those which do not leak.
In addition, this is not GPT-3. There's a massive amount of reinforcement learning at play, which reinforces good code, particularly verifiably good (which includes no leaks). And also a massive amount of synthetic data, which can be generated in a way that is provably correct.
You don't free a function.
And this would only be true if the function has the same content with minor variations, which is why LLMs are better suited to very small examples. Bigger examples are less likely to be semantically similar, so there is less data to determine the "correct" next token.
> There's a massive amount of reinforcement learning at play, which reinforces good code, particularly verifiably good (which includes no leaks)
This is a really dubious claim. Where are you getting this? Do you have some information on how these models are trained on C code specifically? How do you know whether the code they train on has no leaks?
There are huge projects that everyone depends on that have memory bugs in them right now. And these are actual experts missing these bugs; what makes you think the people at OpenAI are creating safer data than the people whose livelihoods actually depend on it?
This thread is full of people sharing how easy it is to make memory bugs with an LLM, and that has been my experience as well.
How do you know? I can believe that they didn't show memory errors in a quick test run on a common architecture with a common compiler, much like most human-written code in the training corpus.
I think Claude would do much better with the tools provided by modern C++ or Zig than with C, frankly, anyway. Or even better, as the Rust people have helpfully mentioned, Rust.
A language designed for vibe coding could certainly be useful, but what that means is the opposite of what the author thinks that means.
The author thinks that such a language wouldn't need to have lots of high-level features and structure, since those are things that exist for human comprehension.
But actually, the opposite is true. If you're designing a language for LLMs, the language should be extremely strict and wordy and inconvenient and verbose. You should have to organize your code in a certain way, and be forced to check every condition, catch every error, consider every edge case, or the code won't compile.
Such a language would aggravate a human, but a machine wouldn't care. And LLMs would benefit from the rigidness, as it would help prevent any confusion or hallucination from causing bugs in the finished software.
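Rust already gestures at what such strictness looks like. A minimal sketch of "the code won't compile unless every case is considered"; the `State` enum is a made-up example:

```rust
#[derive(Debug)]
enum State {
    Idle,
    Running,
    Failed(String),
}

// Matches on `State` must be exhaustive: add a variant later and
// every match like this one becomes a compile error until handled.
fn describe(s: &State) -> String {
    match s {
        State::Idle => "idle".to_string(),
        State::Running => "running".to_string(),
        State::Failed(reason) => format!("failed: {reason}"),
    }
}

fn main() {
    // Results are #[must_use]: silently dropping this parse result
    // triggers a compiler warning, nudging toward explicit handling.
    match "8080".parse::<u16>() {
        Ok(port) => println!("{}, port {port}", describe(&State::Running)),
        Err(e) => println!("{}", describe(&State::Failed(e.to_string()))),
    }
}
```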
No, that's the problem (same misconception the author has) - it can't. At least not reliably. If you give an LLM free rein with a non-memory safe output format, it will make the exact same mistakes a human would.
The point of a verbose language is to create extensive guardrails. Which the LLM won't be annoyed by, unlike a human developer.
Because I want to be able to review it, and extend it myself.
edit: Pure vibe coding is a joke or thought exercise, not a goal to aspire to. Do you want to depend on a product that has not been vetted by any human? And if it is your product, do you want the risk of selling it?
I can imagine a future where AI coders and AI QA bots do all the work but we are not there yet. Besides, an expressive language with safety features is good for bots too.
I'm getting too old for this shit.