Interesting study, but unfortunately already outdated. I don't think anybody uses ChatGPT 3.5 for coding anymore.
From my personal experience, Claude 3.5 or GPT-4o work best. They're more like coding assistants, not really capable of writing anything beyond very simple programs on their own. They make a lot of mistakes, and you need to know how to debug the code they produce.
Claude is my favorite but it just randomly added a division by 100,000 to a line of code for no discernible reason. According to Claude it was "an oversight on [Claude's] part".
I was using 3.5 until about a month ago. Now I'm using 4o. 4o is better, but it's not a huge difference, which surprised me, as I expected a big improvement. I haven't tried the regular "4" model yet; I've heard it used to be quite good but maybe got worse.
In any case, these models are good for simple stuff as far as I can tell, but can't do anything hard or off the beaten path. They can give me ideas for solving hard issues, but any code they generate is killed by hallucinations and a general lack of anything resembling contextual knowledge. 4o can be quite obstinate too: even if I explain that I'm trying to solve an unusual problem and need out-of-the-box "thinking", it will try to force me back to some mainstream solution that I can't use.
I also use Amazon's Q and that is good for simple/repetitive automation but often generates extremely wacky stuff otherwise.
They are all apparently better at Python than anything else. I don't use that so maybe that's an issue.
Maybe my reading comprehension is awful, but I don't see it mentioned anywhere in the article that the "ChatGPT" from the study is the worst at coding of the 3 models that people commonly use.
It seems relevant to mention that the "ChatGPT" in the article isn't the one most of us are using for coding.
that was cracking me up... they took a data set that no doubt has tons of discussion and documentation out there and used that as the basis for the research...
And to add to this, they used a method for testing coding ability that programmers think very little of, since it doesn't show how good someone is at programming, only at leetcoding.
These days, I'm using Claude 3.5 to create WordPress plug-ins, with some successes and some failures. What is sure, though, is that Claude is miles better than ChatGPT at creating WordPress plug-ins. The last 10 times I tried to create one with ChatGPT, I had a total WordPress failure.
Definitely a pet peeve of mine that “ChatGPT” entered the popular lexicon as the name for all OpenAI models, when it’s just a web interface for running LLMs, not a specific model, and the different models have vastly different capabilities.
I also find that ChatGPT works well for documentation. I don't need help coding either; like you, it's my core skill.
However, I am not a great writer, and not a native English speaker, and I find that ChatGPT is better than I am at this. It is, after all, a large language model: it is really good with words, which is what it is designed to do, more so than problem solving. I usually feed it my code and let it document it; if it gets something wrong, I correct it, add some context, and so on. Essentially, ChatGPT is my editor (the job, not the software).
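The feed-it-code-and-let-it-document workflow described above can even be scripted. A minimal sketch using OpenAI's Python SDK; the model name, prompt wording, and file name are illustrative assumptions, not anything the comment specifies:

```python
# Sketch: ask a chat model to write documentation for a source file.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
# Prompt text and model name are illustrative, not from the comment.

def build_doc_prompt(source: str, context: str = "") -> list[dict]:
    """Build the chat messages asking the model to document `source`."""
    system = ("You are a technical writer. Document the following code: "
              "add a summary, parameter descriptions, and usage notes. "
              "Do not change the code's behavior.")
    user = source if not context else f"Context: {context}\n\n{source}"
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

def document_code(source: str, context: str = "") -> str:
    """Send the code to a chat model and return its documentation draft."""
    from openai import OpenAI  # imported lazily so the helper above works offline
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any chat-capable model works here
        messages=build_doc_prompt(source, context),
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    # Hypothetical usage: document a local module, with a bit of extra context.
    with open("my_module.py") as f:
        print(document_code(f.read(), context="Part of a small CLI tool."))
```

The draft still needs the correct-and-add-context pass the comment describes; the model's output is a starting point, not a finished doc.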
Scoff all you want, but this 40-year hobby coder's projects have gotten geometrically better in design and functionality, on OSs and languages I've never touched before, whether it's VBA, SwiftUI or DEBUG.COM.
This means they are simply overfitting the training dataset, which also increases the likelihood of their producing output nearly identical to their training data and violating copyright.
I do not understand why this was flagged; this research gives important information on LLMs that contrasts with the marketing fluff we're seeing out of big tech.
We knew that already, but it is good to have an academic publication to link to.
I don't need help coding. That's my core skill!
But I do need a lot of help with how packages/libraries/languages work, and ChatGPT can usually give a decent answer in a minute that could have taken me hours of deeply frustrating searches.
I look at hard-to-understand code pretty often. Maybe ChatGPT can be helpful there too.
I won't blindly trust either partner, and at least ChatGPT isn't insulted when I check if it is right :)
https://github.com/microsoft/MS-DOS/