Do variable names matter for AI code completion? (2025)

When GitHub Copilot suggests your next line of code, does it matter whether your variables are named "current_temperature" or just "x"?

I ran an experiment to find out, testing 8 different AI models on 500 Python code samples across 7 naming styles. The results suggest that descriptive variable names do help AI code completion.

Full paper: https://www.researchsquare.com/article/rs-7180885/v1

amelius · 5 months ago

Shouldn't the LLMs therefore train on code where the variable names have been randomized?

Perhaps it will make them more intelligent ...

jerf · 5 months ago

No. Variable names contain valuable information. That's why humans use them too.

AIs are finite. If they're burning brainpower on determining what "x" means, that's brainpower they're not burning on your actual task. It is no different than for humans. Complete with all the considerations about them being wrong, etc.

fenomas · 5 months ago

LLMs do see randomized identifiers whenever they encounter minified code. And you can get a bit of an idea how much they learn by giving an LLM some minified JS and asking it to restore it with meaningful var names.

When I tried it once the model did a surprisingly good job, though it was quite a while ago and with a small model by today's standards.

dingnuts · 5 months ago

No, they're more likely to predict the correct next token the closer the code is to the training set. So if you're doing something generic, short names will get the right predictions, and if you're doing something in a problem domain, an input that starts the sequence generation in a part of the model that was trained on that problem domain is going to be better.

empath75 · 5 months ago

They're trained on plenty of code with bad variable names.

But every time you make an AI think you are introducing an opportunity for it to make a mistake.

knome · 5 months ago

if you train them on randomized names, they'll also suggest them.

better to not, I think.

It's kinda funny that people are taking decades of good coding practices seriously now that they work with AI instead of humans.

roxolotl · 5 months ago

I was talking to a coworker about how they get the most out of Claude Code and they just went on to list every best practice they had never been willing to implement before. For some reason people are willing to produce design documentation, provide comments that explain why, write self-documenting code and so on now that they are using LLMs to generate code.

It's the same with the articles about how to work with these tools. A long list of coding best practices followed by a totally clueless "wow once I do all the hard work LLMs generate great code every time!"

nzach · 5 months ago

> For some reason people are willing to produce design documentation....

I'm assuming you wrote that just for dramatic effect but let me explain why I think this behavior is completely rational.

If you implement "feature X" you already learned everything you needed, so adding documentation is a task that doesn't bring you any benefits. You could argue that it would make your life easier when you have to do some maintenance in this code, but that is a pretty big time investment for something you may use some day.

But now with LLMs that reasoning changes dramatically. Having good documentation makes your life easier right now. And the same argument can be made for every good practice that people never bothered to follow: commit messages, tests, variable naming, ...

For example, where I work the 'developer experience' team created a bot that reads your merge requests and judges whether your changes are small enough to bypass the need for explicit approval from a coworker. One of the things it takes into account is how well documented the change is. If you have a small change without any context the bot won't approve your MR, but if you explain the bug and add some relevant tests it will approve your MR and allow you to skip the human code review.

And the result of this new bot is that people are starting to better document their changes, because this allows them to work faster.

So, I agree with GP that it is funny to see this play out. But it should not be surprising behavior for anyone who understands how software is written.

kingstnap · 5 months ago

"Context engineering" + "Prompt Engineering":

1. Having clear requirements with low ambiguity.
2. Giving a few input/output pairs showing how something should work (few-shot prompting).
3. Avoiding useless information. Be concise.
4. Avoiding contradictory information or distractors.
5. Breaking complex problems into more manageable pieces.
6. Providing goals and style guides.

A.K.A. it's just good engineering.
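As a rough illustration of points 1, 2 and 6, here is a minimal Python sketch of assembling such a prompt. The function `build_prompt` and its parameters are hypothetical names made up for this example, not part of any tool mentioned in the thread.

```python
def build_prompt(requirement: str, examples: list[tuple[str, str]], style_guide: str = "") -> str:
    """Assemble a short few-shot prompt from a requirement and input/output pairs."""
    parts = [f"Task: {requirement.strip()}"]          # one clear requirement
    if style_guide:
        parts.append(f"Style: {style_guide.strip()}")  # optional goals / style guide
    for given, expected in examples:                   # few-shot input/output pairs
        parts.append(f"Input: {given}\nOutput: {expected}")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Convert a snake_case identifier to camelCase.",
    [("user_name", "userName"), ("max_retry_count", "maxRetryCount")],
    style_guide="Reply with the identifier only.",
)
print(prompt)
```

Nothing here is specific to LLMs: it is the same requirement, examples and style guide you would hand to a human colleague.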

Groxx · 5 months ago

Obviously yes. They all routinely treat my "thingsByID" array like a dictionary - it's a compact array where ID = index though.

They even screw that up inside the tiny function that populates it. If anything IMO, they over-value names immensely (which makes sense, given how they work, and how broadly consistent programmers are with naming).
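To make the distinction concrete, here is a minimal sketch in Python rather than Go. `Thing`, `build_things_by_id` and the dense-ID assumption are all hypothetical, chosen only to illustrate a compact array where ID = index.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    id: int    # assumed dense: 0, 1, 2, ... with no gaps
    name: str


def build_things_by_id(things: list[Thing]) -> list[Thing]:
    """Return a list in which position i holds the Thing whose id is i."""
    things_by_id = [None] * len(things)
    for thing in things:
        things_by_id[thing.id] = thing  # index assignment into a list, not a dict insert
    return things_by_id


things = [Thing(2, "c"), Thing(0, "a"), Thing(1, "b")]
things_by_id = build_things_by_id(things)
print(things_by_id[1].name)  # prints "b"
```

The dictionary that the name suggests would be `{thing.id: thing for thing in things}`. The list version only works under the dense-ID assumption, which is exactly the detail a name alone does not convey.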

gnulinux · 5 months ago

Do you still have this problem if you add a comment before declaring the variable like "Note: thingsById is not a dictionary, it is an array. Each index of the array represents a blabla id that maps to a thing"

In my experience they do overvalue var names, but they value comments even more. So I tend to calibrate these things with more detailed comments.

bluefirebrand · 5 months ago

Can't you just write the code instead of the more detailed comments? What is the benefit of this approach?

partdavid · 5 months ago

I get what you're saying, but what's interesting to me is that this case is a mild signal that a subsequent developer could take the same erroneous implication. "Id" does in fact imply to me that entries are indexed by "Id", i.e., an attribute of the item being indexed, and that they are not array-like, in that they wouldn't all get different IDs by a deletion, for example.

DullPointer · 5 months ago

Curious if you get better results with something like “thingsByIdx” or “thingsByIndex,” etc.?

delifue · 5 months ago

Did you add the type annotation of it in code?

This is in Go, so both "yes" (it's defined with an explicit type in the file, sometimes the same func) and "yes but" (afaict next to no code-agent looks at type information that e.g. gopls has readily available, or even godoc).

yakubov_org · 5 months ago

nemo1618 · 5 months ago

Time for Hungarian notation to make a comeback? I've always felt it was unfairly maligned. It would probably give LLMs a decent boost to see the type "directly" rather than needing to look up the type via search or tool call.

socalgal2 · 5 months ago

It was and still is

https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...

Types help but they don't help "at a glance". In editors that have type info you have to hover over variables or look elsewhere in the code (even if it's up several lines) to figure out what you're actually looking at. In "app" Hungarian this problem goes away.

hmry · 5 months ago

I remember thinking this post was outdated when I first read it.

"Safe strings and unsafe strings have the same type - string - so we need to give them different naming conventions." I thought "Surely the solution is to give them different types instead. We have a tool to solve this, the type system."

"Operator overloading is bad because you need to read the entire code to find the declaration of the variable and the definition of the operator." I thought "No, just hit F12 to jump to definition. (Also, doesn't this apply to methods as well, not just operators?) We have a tool to solve this, the IDE."

If it really does turn out that the article's way is making a comeback 20 years later... How depressing would that be? All those advances in compilers and language design and editors thrown out, because LLMs can't use them?
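For reference, the "different types" alternative could look roughly like this in Python. `UnsafeStr`, `SafeStr`, `encode` and `render` are hypothetical names for this sketch, and `typing.NewType` is only enforced by a static type checker, not at runtime.

```python
import html
from typing import NewType

# Hypothetical types standing in for the article's naming prefixes.
UnsafeStr = NewType("UnsafeStr", str)  # raw user input
SafeStr = NewType("SafeStr", str)      # HTML-escaped, OK to render


def encode(raw: UnsafeStr) -> SafeStr:
    """Escape user input so it can be embedded in HTML."""
    return SafeStr(html.escape(raw))


def render(fragment: SafeStr) -> str:
    """Accepts only SafeStr; a type checker should flag render(UnsafeStr(...))."""
    return f"<p>{fragment}</p>"


user_input = UnsafeStr("<script>alert(1)</script>")
page = render(encode(user_input))
print(page)
```

The trade-off discussed above still applies: the safety lives in the declaration, so a reader (or a model) that only sees the call site has to look the type up.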

k__ · 5 months ago

ssalka · 5 months ago

The names of variables impart semantic meaning, which LLMs can pick up on and use as context for determining how variables should behave or be used. Seems obvious to me that `current_temperature` is a superior name to `x` – that is, unless we're doing competitive programming ;)

My first hypothesis was that shorter variable names would use fewer tokens and be better for context utilisation and inference speed. I would expand your competitive programming angle to the obfuscated C challenge ;)

Macha · 5 months ago

The problem is, unless you're doing green field development, that description of the existing desired functionality has to be somewhere, and I suspect a parallel markdown requirements document plus code with golfed variable names is going to require more context, not less.

r0s · 5 months ago

The purpose of code is for humans to read.

Until AI is compiling straight to machine language, code needs to be readable.

deadbabe · 5 months ago

Variable names don’t matter in small scopes.

rented_mule · 5 months ago

It certainly can matter in any scope. `x` or even `delay` will lead to more bugs down the line than `delay_in_milliseconds`. It can be incredibly frustrating to debug why `delay = 1` does not appear to lead to a delay if your first impression is that `delay` (or `x`) is in seconds.

The scope of the cognitive effort is the total context of the system. Yes it matters.
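A minimal Python sketch of the point, assuming a hypothetical helper called `wait`. The only real API used is `time.sleep`, which takes seconds.

```python
import time


def wait(delay_in_milliseconds: int) -> float:
    """Sleep for the given number of milliseconds and return the seconds slept."""
    delay_in_seconds = delay_in_milliseconds / 1000  # time.sleep expects seconds
    time.sleep(delay_in_seconds)
    return delay_in_seconds


slept = wait(1)  # unambiguous: one millisecond, not one second
print(slept)     # prints 0.001
```

With a bare `delay = 1` the unit lives only in the author's head. With the unit in the name, the conversion at the `time.sleep` call is hard to get wrong.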

quuxplusone · 5 months ago

"500 code samples generated by Magistral-24B" — So you didn't use real code?

The paper is totally mum on how "descriptive" names (e.g. process_user_input) differ from "snake_case" names (e.g. process_user_input).

The actual question here is not about the model but merely about the tokenizer: is it the case that e.g. process_user_input encodes into 5 tokens, ProcessUserInput into 3, and calcpay into 1? If you don't break down the problem into simple objective questions like this, you'll never produce anything worth reading.

ijk · 5 months ago

True - though in the actual case of your examples, calcpay, process_user_input, and ProcessUserInput all encode into exactly 3 tokens with GPT-4.

Which is the exact kind of information that you want to know.

In practice, I'd expect the performance difference to be relatively minimal, as input tokens tend to quickly get aggregated into more general concepts. But that's the kind of question that's worth getting metrics on: my intuition suggests one answer, but do the numbers hold up when you actually measure it?
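Measuring this is cheap. Below is a minimal Python sketch of a per-identifier token counter. `toy_encode` is a made-up stand-in that splits on underscores and case changes; it is not a real BPE tokenizer, and its counts say nothing about GPT-4.

```python
import re
from typing import Callable


def toy_encode(text: str) -> list[str]:
    """Stand-in tokenizer: splits on underscores and case changes. NOT a real BPE."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+|_", text)


def tokens_per_name(names: list[str], encode: Callable[[str], list]) -> dict[str, int]:
    """Count how many tokens each identifier produces under the given encoder."""
    return {name: len(encode(name)) for name in names}


counts = tokens_per_name(["process_user_input", "ProcessUserInput", "calcpay"], toy_encode)
print(counts)  # toy splitter only: {'process_user_input': 5, 'ProcessUserInput': 3, 'calcpay': 1}
```

To get numbers comparable to the ones quoted above, pass a real tokenizer's encode function instead of `toy_encode`, for example the one from the `tiktoken` package if you have it installed.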

Awesome! You should have written this blog post instead of that guy. :)

OutOfHere · 5 months ago

Section names (as a comment) help greatly in long functions. Section names can also help partially compensate for some of the ambiguity of variable names.

Another thing that matters massively in Python is highly accurate, clear, and sensible type annotations. In contrast, incorrect type annotations can throw off the LLM.
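A small sketch of both habits together, with section names as comments and accurate type annotations. `Reading` and `summarize` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    celsius: float


def summarize(readings: list[Reading]) -> dict[str, float]:
    """Return the mean temperature per sensor."""
    # --- Group readings by sensor ---
    by_sensor: dict[str, list[float]] = {}
    for reading in readings:
        by_sensor.setdefault(reading.sensor_id, []).append(reading.celsius)

    # --- Compute the mean per sensor ---
    return {sensor_id: sum(values) / len(values) for sensor_id, values in by_sensor.items()}


means = summarize([Reading("a", 20.0), Reading("a", 22.0), Reading("b", 10.0)])
print(means)  # prints {'a': 21.0, 'b': 10.0}
```

The annotations only help if they are right: a signature claiming `dict[str, int]` here would be the misleading kind.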