I asked ChatGPT to convert a simple 12-line Python program to Lisp.
The result is broken by mismatched brackets. Yes. In Lisp.
I ask for correction and receive the errant bracket moved, still mismatched.
I ask again and the bracket gets removed. Fine??
No. A loop break has been mistranslated to a return from function, so execution is nonsense.
I wonder how much user faith in ChatGPT is based on examples in which the errors are not apparent ... to a certain kind of user.
Also, getting a template of 100s of LOC saves time compared to writing it from scratch, even if I know exactly what I'd need to write.
I suspect the typical ChatGPT user is using it for code that's beyond his ability to write. That being the whole point of his using it.
It follows that such code is likely beyond his ability to understand. But worse, beyond his ability to understand he doesn't understand it.
I rarely know exactly what I need to write and writing it usually isn't the problem. Sans LLM, there have always been tools and techniques you can lean on, like syntax highlighting, auto completion, your compiler feedback, or unit tests.
I find that writing code consumes only a small percentage of my time. And while writing code is obviously where bugs originate, it's not where the big problems come from.
Making software good is the easy part, making the right software is where the trouble lies. It's why I spend most time not coding, but talking to people. Figuring out what needs to be done, how, when and why.
What if you could write something like:
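(A purely hypothetical sketch - llm_implement and ask_llm are invented names here, not any real API:)

    def ask_llm(prompt):
        """Placeholder - swap in whatever LLM client you actually use."""
        raise NotImplementedError

    def llm_implement(fn):
        """Replace fn with an implementation generated from its name and docstring."""
        prompt = f"Write a Python function named {fn.__name__} that: {fn.__doc__}"
        source = ask_llm(prompt)      # assumed to return Python source text
        namespace = {}
        exec(source, namespace)       # no verification whatsoever
        return namespace[fn.__name__]

    @llm_implement
    def slugify(title):
        """Lowercase the title and replace runs of non-alphanumerics with '-'."""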
And the decorator reads the function name and optional docstring, runs it through an LLM and replaces the function with one implementing the desired behavior (hopefully correctly). I remember there was something like this for StackOverflow. Of course, AI data pipelines are a totally different conversation than code solutions.
Because almost every programmer these days has learned by a route that relies on incorporating code on trust. E.g. using someone else's compiler.
> I use code from LLMs on most work days ... not because I trust it, but because I understand it and tested it.
The snag there is that one can rarely verify test coverage is sufficient.
> the same as code from anyone, including from myself six months ago.
But it is not the same. ChatGPT delivers completely untested code. Not even syntax-checked in my case. Would you accept such code from any human?
> Trust doesn't enter into it. Clear and workable solutions do
Plenty of clear and workable solutions are discovered unsafe. I suspect that's much more likely when the code comes from a source such as ChatGPT that delivers code in a completely untested state.
There are way too many stories of people successfully accomplishing tasks by getting an LLM to give them a starting point, an outline, or even a complete solution that only needed a few fixups (hardly any different from debugging your own first draft) to say it doesn't or can't work. That ship has already sailed.
The fact that it doesn't produce finished, correct code, or even an appropriate outline, for every single case doesn't seem especially remarkable or damning to me. It is what it is, it's not what it's not.
(I haven't used any myself, but I'm not primarily a coder who is obligated to take advantage of every available tool, and don't like how any of the llm companies are training on open source code without either consent or attribution and then selling that. And don't like how ai and even mere speech to text voice interface before that is being added to everything in general.)
Really? You can't understand why people do things based on trust? Do you trust no one?
I've only used it for basic snippets where I know what the code I need is, but it's faster to describe it than write it out myself (for example, "Using Python and boto3, take all the files in the current directory that end in .csv or .pdf and upload them to a bucket named "reports". Prepend the S3 keys with "YYYY-MM-DD/" using today's date").
For simple things like that, it's been great.
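The kind of snippet I'd expect back looks roughly like this (my own hand-written sketch of the answer's shape, not actual ChatGPT output):

    # Upload local .csv/.pdf files to the "reports" bucket under a date prefix.
    from datetime import date
    from pathlib import Path

    import boto3

    s3 = boto3.client("s3")
    prefix = date.today().isoformat()  # YYYY-MM-DD

    for path in Path(".").iterdir():
        if path.is_file() and path.suffix in (".csv", ".pdf"):
            s3.upload_file(str(path), "reports", f"{prefix}/{path.name}")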
Obscure languages and operating systems are also a weakness - for example, I could not get llama3 to reply with FreeBSD commands even after repeatedly telling it not to give me Linux ones - there just aren't that many SO/Reddit/forum posts about BSD in comparison to Linux, and evidently that's what they've trained it on. There isn't even anything semantically odd about an rc.conf: it spits out correct statements, just for the wrong file in the wrong OS.
Everyone acknowledges things like multiplication and counting are hard for LLMs because they aren't statistically encoded. But counting Lisp brackets is basically the same problem. And unlike YAML it's not really self-healing or obviously correctable once it's fucked up (as you noticed). I think the implication is that there are clearly some syntaxes that are much less generatable simply because they require intense amounts of context.
(also, as far as counting… I'm sure far smarter people have looked at this, but maybe multiplication is something that could benefit from log-addition techniques, which might reduce the complexity quite a bit vs a naive "multiply column by column like a human fourth-grader" approach.)
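(The log-addition idea is just the old slide-rule identity log(a) + log(b) = log(a*b); a toy illustration only, not a claim about how any model works internally:)

    import math

    a, b = 37.0, 52.0
    product = math.exp(math.log(a) + math.log(b))  # ~1924.0, i.e. 37 * 52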
ChatGPT: "I've encountered numerous examples of Lisp code during my training, covering different applications and techniques within the language. Whether it's simple examples or more complex implementations, I've seen quite a bit of Lisp code."
> the general complexity of the format
Good then that this example has a very simple format.
Not a problem unique to this case, though (even without AI, you'll see diff breaking on JSON files for much the same reason, unless you go out of your way to avoid it) - brackets are semantic white noise.
I chose to. So I could give it another chance to succeed.
But do tell. What type of person does not have to ask it to fix its mismatched brackets?
Glad I'm not doing that, then. I understand what it is generating as soon as I try to run it - and get a "missing bracket" error. I've understood that what it has generated is crap. And that only good luck made that easily detectable.
Very different. An LLM lets him generate /far more/ bad untested code per unit of effort.
And SO encourages other humans to test it. Often an SO answer comes with evaluative comments and ratings that are all I need to reject it. If the LLM was "AI", it would at least provide the same, right?
> A better path is to ensure they are spending significantly more time reviewing, testing, and incrementally improving their code than they are typing/generating it
Then let's see the LLM-based workflow that achieves that...
LLMs will turn out to be nothing more than another glittering gadget. Except this one will obliterate any coding skills of future generations. Not to mention the gigatons of AI-generated garbage text, images and music it's filling the world with. What a waste.
ChatGPT: The larger number is 3.
User: If I have one apple, pick another apple and then get an apple from a friend, and Alice has two apples, who has more apples?
ChatGPT: You have more apples. You started with one apple, picked another, and got one more from a friend, so you have three apples. Alice has two apples.
[edit]:
User: what does good will hunting say to the know-it-all at the bar?
ChatGPT: After Will Hunting outsmarts the know-it-all student in the bar, he spots the student's female friend outside a shop later on. To tease the student further and show his wit, Will taps on the window to get her attention and, holding up his phone number, famously asks, "Do you like apples?" After she nods, Will triumphantly says, "Well, I got her number. How do you like them apples?" This line became quite iconic and is one of the memorable quotes from the movie.
Every time I've tried gpt-4 etc there are such problems. I don't use it for trivial things since then what's the point, and for anything non-trivial it screws up too much.
It is useful for translations and brainstorming, but not much else. Translating from natural language to code is a part of that, but not letting it code itself.
I mean... yeah. ChatGPT doesn't know what numbers are or what logic is. It wasn't "confused". It just doesn't do that.
You're using it for something it's not intended to be used for and, unsurprisingly, it's bad at it. If I said I'm giving up on using hammers because they were terrible for screwing screws into the wall, you'd tell me I'm using the wrong tool. This is no different. ChatGPT has its uses, but it isn't magic. Just like any tool it has limitations.
LLMs have fairly well understood failure modes. They have incredible strengths, too. And they differ very much by what you use. If the ChatGPT you were forced to use was GPT3.5 (and the behavior sounds very much like it was), you literally have no idea what modern LLMs can actually do.
https://hachyderm.io/@inthehands/112006855076082650
> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.
> Alas, that does not remotely resemble how people are pitching this technology.
Great. Solved the world shortage of bullshit. Oh wait...
Add to that the fact that matching parentheses is hard for humans to reliably do correctly—especially when typing code into a textbox on Stack Overflow—and now you have a model whose architecture struggles with knowing when to stop in highly repetitive sequences and that was trained on data that is disproportionately likely to have incorrect syntax for the language you're using.
Other languages don't have those features, and I've never found Copilot to give me a syntax error in, say, JavaScript.
That said, I also don't know any professional developer who trusts LLMs to code. Everyone I know who uses them uses them as a tool and takes responsibility for cleaning up their output.
FWIW, this example had one more ) than ( !
I'll post it as a comment to the root.
Lisp is just a uniquely hard language for a transformer to cope with—not that it doesn't make mistakes in other languages, but I've almost never seen a syntax mistake.
It's helpful to always be skeptical of code and to try to get an understanding of it.
Another way I think of it is I find it easier to revise than to write (both in English and code). So when I write something I just try to get my ideas out there and then skeptically reread it and recompose. LLMs let me just speed up that process.
Aaawwww hellz no. I hate looking at my old code. I've had too many of those surreal experiences where something breaks, I start looking at how to fix it, and am suddenly stunned to realize "this could never have worked in the first place." There's one or two of those I'm still puzzling over, decades later. :-(
I assume everyone else must be working on projects/languages that have far more representation in the training set, but for me, especially working in Rust, even trying to write interfaces into C libraries, I would constantly find myself dealing with hallucinated parameters or function calls, incorrectly implemented functions or just broken code.
At the very least if I do it myself, I'll have some intuition about how the code works.
I can't speak for anyone else, but I've used LLMs mainly for generating Java code. The generated code hasn't necessarily ever been perfect, but IME it's been "close enough" that, even counting having to make some manual changes, it's still faster than writing it myself from scratch.
That said, I'm typically only using it for relatively short snippets of code, where I "know" how to do the operation in question, but want to avoid having to stop and look up fiddly details that might not be cached "top of mind" because I haven't written that code in a while.
I'm sure there are use-cases where LLMs don't save time, but I'm equally sure that there are at least some where they do.
Also, as a general rule you shouldn't trust its code output, but it's a good way to brainstorm or get started on coding something. It can give useful tips or replace stack overflow. I almost never ask it to generate code for me, but I ask it questions about existing code very often. That seems to work very well for me.
By far, the most common problem is that I will ask ChatGPT 4 to write a small program that does X, and it will create a skeleton: lots of functions that implement at most some parameter validation and then contain comments like // Implement the rest of this function.
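An invented illustration of the shape of what comes back (in Python just for concreteness, not actual output):

    def export_report(records, path):
        if not records:
            raise ValueError("no records to export")
        # Implement the rest of this function
        pass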
With enough prodding, I can sometimes get it to actually fill in the function implementations one-by-one. But often times it will implement part of it and then put another one of those "implement the rest" comments in. Like for example, if I asked it to write a function to draw three circles, after prodding, it would write code to draw the first circle and then stop. Obviously, the answer here is for it to create a function that draws 1 circle and call it 3 times, but I can't fault it for taking my instructions very literally.
Even when I do get it to implement full functions, when I later ask it for the full code listing, it's back to printing out functions that have "implement this" comments in them.
Perhaps the fault is mine for asking it to do tasks that there aren't hundreds of implementations of on the internet.
Its knowledge of sveltekit still seems weighted to pre-1.0, and it hallucinates libraries for shadcn or radix-ui.
You have to prefix your prompts with "Please use modern JS", otherwise it sometimes won't use .filter or .reduce.
I agree it's a good replacement for Stackoverflow, but it is often pitched as a replacement for coding.
I've found using aider to be the best compromise: let it build something small and revert if it doesn't work.
It was.
> version 4
Paywalled.
> is much better at code.
I'll post the question as comment to root, for you to try if you want.