toprerules · a year ago
As a fellow "staff engineer," I find LLMs terrible at writing or teaching how to write idiomatic code, and they are actually causing me to spend more time reviewing than before, thanks to the influx of junior-to-senior engineers trying to sneak in LLM garbage.

In my opinion, using LLMs to write code is a Faustian bargain: you learn terrible practices and come to rely on code quantity, boilerplate, and nondeterministic outputs - all hallmarks of poor software craftsmanship. Until ML can actually go end to end from requirements to product and they fire all of us, you can't cut corners on building intuition as a human by forgoing reading and writing code yourself.

I do think there is a place for LLMs in generating ideas or exploring an untrusted knowledge base of information, but using code generated by an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on it for linting, debugging, or as a source of truth.

tokioyoyo · a year ago
I will probably get heavily crucified for this, but to the people who are ideologically opposed to AI-generated code: executives, directors, and managerial staff think the opposite. Being very anti-LLM code instead of trying to understand how it can improve your speed might be detrimental to your career.

Personally, I'm on the fence. But having conversations with others, and getting requests from execs to implement various AI utils in our processes… it makes me lean toward the safer side of job security, rather than dismiss it and be adamant against it.

feoren · a year ago
> executives, directors and managerial staff think the opposite

Executives, directors, and managerial staff have had their heads up their own asses since the dawn of civilization. Riding the waves of terrible executive decisions is unfortunately part of professional life. Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.

> Being very anti-LLM code instead of trying to understand how it can improve the speed might be detrimental for your career.

You're making the assumption that LLMs can improve your speed. That's the very assumption being questioned by GP. Heaps of low-quality code do not improve development speed.

subw00f · a year ago
I think I’m having the same experience as you. I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.

Don’t get me wrong—I’ve seen productivity gains both in LLMs explaining code/ideation and in actual implementation, and I use them regularly in my workflow now. I quite like it. But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display. They write a snake game one day using ChatGPT, and the next, they’re telling you that you might be too slow—despite a string of record-breaking quarters driven by successful product iterations.

I really don’t want to be a naysayer here, but it’s pretty demoralizing when these are the same people who decide your compensation and overall employment status.

hansvm · a year ago
"Yes, of course, I'm using AI at every single opportunity where I think it'll improve my output"

<<never uses AI>>

rectang · a year ago
This has been true for every heavily marketed development aid (beneficial or not) for as long as the industry has existed. Managing the politics and the expectations of non-technical management is part of career development.
aprilthird2021 · a year ago
> executives, directors and managerial staff think the opposite

The entire reason they hire us is to let them know if what they think makes sense. No one is ideologically opposed to AI generated code. It comes with lots of negatives and caveats that make relying on it costly in ways we can easily show to any executives, directors, etc. who care about the technical feasibility of their feelings.

v3xro · a year ago
As a former "staff engineer": these executives can go and have their careers, and leave those of us who want code we can understand and reason about, and who want to focus on quality software, well alone.
hinkley · a year ago
When IntelliJ was young, its autocomplete and automated refactoring were massive game changers. It felt like the dawn of a new age. But then, release after release, no new refactorings materialized. I don't know if they hit the Pareto limit or the people responsible moved on to new projects.

I think that's the sort of spot where better tools might be appropriate: I know what I want to do, but it's a mess to do it. I suspect tooling aimed there will facilitate growth instead of stunting it.

billy99k · a year ago
I've seen the exact opposite. Management at my company has been trying to shove AI into everything. They even said that this year we would be dropping all vendors that didn't have some form of AI in their workflow.
satellite2 · a year ago
I just don't fully understand this position at this level. Personally, I know exactly what the next 5 lines need to be, and whether I write them, or autocomplete does, or some AI writes them, doesn't matter - I'll only accept what I had in mind exactly. And with Copilot, for boilerplate and relatively trivial tasks, that happens pretty often. I feel I'm just saving time / old-age joint pain.
purerandomness · a year ago
If the next 5 lines of code are so predictable, do they really need to be written down?

If you're truly saving time by having an LLM write boiler plate code, is there maybe an opportunity to abstract things away so that higher-level concepts, or more expressive code could be used instead?
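
For instance, something like this is the kind of abstraction I mean (a made-up Python sketch; the FakeDB stand-in, names, and the decorator are all hypothetical, not anyone's real codebase):

```python
import logging
from functools import wraps

logger = logging.getLogger(__name__)


class FakeDB:
    """Stand-in for a real data-access layer (hypothetical)."""

    def get_user(self, user_id: int) -> dict:
        if user_id < 0:
            raise ValueError("no such user")
        return {"id": user_id}


db = FakeDB()


# Before: the same five lines of try/log/fallback boilerplate at every call
# site - exactly the kind of thing an autocompleter predicts over and over.
def fetch_user_verbose(user_id: int):
    try:
        return db.get_user(user_id)
    except Exception:
        logger.exception("fetch_user failed")
        return None


# After: the pattern is named once, so there is nothing left to predict.
def fallible(default=None):
    """Wrap a function so failures are logged and a default is returned."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.exception("%s failed", fn.__name__)
                return default
        return wrapper
    return decorator


@fallible(default=None)
def fetch_user(user_id: int):
    return db.get_user(user_id)


if __name__ == "__main__":
    print(fetch_user(1))   # {'id': 1}
    print(fetch_user(-1))  # None, with the failure logged
```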

axlee · a year ago
What's your stack? I have the complete opposite experience. LLMs are amazing at writing idiomatic code, less so at dealing with esoteric use cases.

And very often, if the LLM produces a poopoo, asking it to fix it again works just well enough.

Bjartr · a year ago
> asking it to fix it again works just well enough.

I've yet to encounter any LLM, from ChatGPT to Cursor, that doesn't choke within 10-20 minutes: it starts to repeat itself, says it changed code when it didn't, or gets stuck changing something back and forth. Like, just a handful of exchanges and it's worthless. Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?

beepbooptheory · a year ago
Simply put, it made my last job so nightmarish that for the first time in my career I absolutely dreaded even thinking about the codebase or having to work the next day. We can argue about the principle of it all day, or you can say things like "you are just doing it wrong," but ultimately it's the boots-on-the-ground experience that is going to leave the biggest impression on me, at least. It's just so bad to have to work alongside - either the model itself or your coworker with the best of intentions but no domain knowledge.

It's like having to forever be the most miserable detective in the world: no mystery, only clues. A method that never existed, three different types that express the same thing, the cheeky smile of the coworker who says he can move the whole backend onto an ORM in a day because he has Cursor, the manager who signs off on this, the deranged PR the next day. This continual sense that fewer and fewer people even know what's going on anymore...

"Can you make sure we support both Mongo and postgres?"

"Can you put this React component inside this Angular app?"

"Can you setup the kubernetes with docker compose?"

esafak · a year ago
Hiring standards are important, as are managers who get it. Your organization seems to be lacking in both.
the_mitsuhiko · a year ago
> but using code generated from an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on it as a linting, debugging, or source of truth tool.

That does not match my experience at all. You obviously have to use your brain to review it, but for a lot of problems LLMs produce close to perfect code in record time. It depends a lot on your prompting skills though.

xmprt · a year ago
Perhaps I suck at prompting but what I've noticed is that if an LLM has hallucinated something or learned a fake fact, it will use that fact no matter what you say to try to steer it away. The only way to get out of the loop is to know the answer yourself but in that case you wouldn't need an LLM.
codr7 · a year ago
I would say prompting skills relative to coding skills; and the more you rely on them, the less you learn.
brandall10 · a year ago
It's helpful to view working solutions and quality code as separate things to the LLM.

* If you ask it to solve a problem and nothing more, chances are the code isn't the best as it will default to the most common solutions in the training data.

* If you ask it to refactor some code idiomatically, it will apply most common idiomatic concepts found in the training data.

* If you ask it to do both at the same time you're more likely to get higher quality but incorrect code.

It's better to get a working solution first, then ask it to improve that solution, and rinse/repeat in smallish chunks of 50-100 LOC at a time. This is kinda why reasoning models are of some benefit: they allow a certain amount of reflection that ties together disparate portions of the training data into more cohesive, higher-quality responses.

jondwillis · a year ago
It isn't like you can't write tests or reason about the code, iterate on it manually, just because it is generated. You can also give examples of idioms or patterns you would like to follow. It isn't perfect, and I agree that writing code is the best way to build a mental model, but writing code doesn't guarantee intuition either. I have written spaghetti that I could not hope to explain many times, especially when exploring or working in a domain that I am unfamiliar with.
ajmurmann · a year ago
I described how I liked doing ping-pong pairing TDD with Cursor elsewhere. One of the benefits of that approach is that I write at least half the implementation and tests and review every single line. That means that there is always code that follows the patterns I want and it's right there for the LLM to see and base its work on.

Edit: fix typo in last sentence

doug_durham · a year ago
I've had exactly the opposite experience with generating idiomatic code. I find that the models have a lot of information on the standard idioms of a particular language. If I'm having to write in a language I'm new in, I find it very useful to have the LLM do an idiomatic rewrite. I learn a lot and it helps me to get up to speed more quickly.

qqtt · a year ago
I wonder if a big part of the disconnect is that people are talking about different models. The top-tier coding models (Sonnet, o1, DeepSeek) are all pretty good, but they require paid subscriptions to make use of, or 400GB of local memory in DeepSeek's case.

All the other distilled models, Qwen Coder, and similar are a large step below the above models on most benchmarks. If someone is running a small 20GB model locally, they will not have the same experience as those who run the top-of-the-line models.

arijo · a year ago
LLMs can work if you program above the code.

You still need to state your assertions with precision and keep a model of the code in your head.

It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.

elliotto · a year ago
> It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.

This is a fantastic quote and I will use it. I describe the future of coding as natural-language coding (or maybe syntax-agnostic coding). This does not mean that the LLM is a magic machine that understands all my business logic. It means what you've described: I can describe my function flow in abstracted English rather than having to adhere to a syntax.

icnexbe7 · a year ago
I've had some luck asking conceptual questions about how something works if I am using library X with protocol Y. I usually get an answer that is either actually useful or at least gets me on the right path to what the answer should be. For code, though, it will tell me to use nonexistent APIs from that library to implement things.
whatever1 · a year ago
The counterargument I hear is that since writing code is now so easy and cheap, there is no need to write pretty code that generalizes well. Just have the LLM write a crappy version and the necessary tests, and once your requirements change you toss everything and start fresh.
the_real_cher · a year ago
Code written by an LLM is really, really good if done right, i.e. reviewing every line of code as it comes out and prompt-guiding it in the right direction.

If you're getting junior devs just pooping out code and sending it to review, that's really bad and should be a PIP-able offense, in my opinion.

baq · a year ago
I iterated on 1k lines of React slop in 4h the other day: changed table components twice, handled errors, loading widgets, modals, you name it. It'd take me a couple of days easily to get maybe 80% of that done.

The result works ok, nobody cares if the code is good or bad. If it’s bad and there are bugs, doesn’t matter, no humans will look at it anymore - Claude will remix the slop until it works or a new model will rewrite the whole thing from scratch.

Realized while writing this that I should've added an extract of the requirements as a comment in the package's index.ts, or maybe a README.CURSOR.md.

mrtesthah · a year ago
My experience having Claude 3.5 Sonnet or Google Gemini 2.0 Exp-12-06 rewrite a complex function is that it slowly introduces slippage of the original intention behind the code, and the more rewrites or refactoring, the more likely it is to do something other than what was originally intended.

At the absolute minimum this should require including a highly detailed function specification in the prompt context and sending the output to a full unit test suite.
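
Something like this is what I mean by pinning the intent down before a rewrite (a minimal sketch; the function, spec, and tests are hypothetical):

```python
import pytest


def normalize_price(raw: str) -> float:
    """Spec: strip whitespace and a leading '$', reject negatives, round to 2 dp."""
    value = round(float(raw.strip().lstrip("$")), 2)
    if value < 0:
        raise ValueError("price cannot be negative")
    return value


# These tests encode the original intention. Any LLM refactor of
# normalize_price has to keep them green, which surfaces silent "slippage".
def test_strips_symbol_and_whitespace():
    assert normalize_price(" $19.999 ") == 20.0


def test_rejects_negative_prices():
    with pytest.raises(ValueError):
        normalize_price("-3.50")
```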

hollowturtle · a year ago
I'd pay to review one of your PRs. Maybe a substantial one, with proof of AI usage.
simonw · a year ago
On last resort bug fixes:

> I don’t do this a lot, but sometimes when I’m really stuck on a bug, I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”

The "reasoning" models are MUCH better than this. I've had genuinely fantastic results with this kind of thing against o1 and Gemini Thinking and the new o3-mini - I paste in the whole codebase (usually via my https://github.com/simonw/files-to-prompt tool) and describe the bug or just paste in the error message and the model frequently finds the source, sometimes following the path through several modules to get there.

Here's a slightly older example: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8... - finding a bug in some Django middleware
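
The shape of that workflow, as a rough sketch (this is not how files-to-prompt is actually implemented - just an illustration of the kind of prompt being assembled before it goes to the model; the function name is made up):

```python
from pathlib import Path


def build_debug_prompt(root: str, error_message: str, suffixes=(".py",)) -> str:
    """Concatenate a codebase plus an error message into one big prompt."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"--- {path} ---\n{path.read_text(encoding='utf-8', errors='ignore')}")
    parts.append(
        f"Here is the error I'm seeing:\n{error_message}\n"
        "Can you help me find the source of this bug?"
    )
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_debug_prompt(".", "TypeError: 'NoneType' object is not iterable")
    print(prompt[:500])  # the full thing goes to a reasoning model
```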

elliotto · a year ago
I have picked up the Cursor tool, which lets me throw in relevant files with a drop-down menu. Previously I was copy-pasting files into the ChatGPT browser page, but now I point Cursor at o1 and do it within the IDE.

One of my favourite things is to ask it whether it thinks there are any bugs - this helps a lot with validating any logic I might be exploring. I recently ported some code to a different environment with slightly different interfaces and it wasn't working. I asked o1 to carefully go over each implementation in detail and explain why it might be producing a different output. It thought for 2 whole minutes and gave me a report of possible causes - the third of which was entirely correct and had to do with how my environment was coercing pandas data types.

There have been 10 or so wow moments over the past few years where I've been shocked by the capabilities of GenAI, and that one made the list.

powersnail · a year ago
The "attach the entire file" part is very critical.

I've had the experience of seeing some junior dev posting error messages into ChatGPT, applying the suggestions of ChatGPT, and posting the next error message into ChatGPT again. They ended up applying fixes for 3 different kinds of bugs that didn't exist in the code base.

---

Another cause, I think, is that they didn't try to understand any of those (not the solutions, and not the problems those solutions are supposed to fix). If they had, they would have figured out that the solutions were mismatched with what they were witnessing.

There's a big difference between using LLM as a tool, and treating it like an oracle.

theshrike79 · a year ago
This is why in-IDE LLMs like Copilot are really good.

I just had a case where I was adding stuff to two projects, both open at the same time.

I added new fields to the backend project, then I swapped to the front-end side and the LLM autocomplete gave me 100% exactly what I wanted to add there.

And similar super-accurate autocompletes happen every day for me.

I really don't understand the people who complain about "AI slop" - what kind of projects are they writing?

jppope · a year ago
My experience is similar: great for boilerplate, great for autocomplete, starts to fall apart on complex tasks, and doesn't do much as far as business logic goes (how would it know?). All in all very useful, but not replacing a decent practitioner any time soon.

LLMs can absolutely bust out corporate docs crazy fast too... which is probably a reason to re-evaluate the value of those docs, though.

nicksergeant · a year ago
I've had kind of great experiences even doing complex tasks with lots of steps, as long as I tell it to take things slowly and verify each step.

I had a working and complete version of Apple MapKit JS rendering a map for an address (along with the server side token generation), and last night I told it I wanted to switch to Google Maps for "reasons".

It nailed it on the first try, and even gave me quick steps for creating the API keys in Google Dev Console (which is always _super_ fun to navigate).

As Simon has said elsewhere in these comments, it's all about the context you give it (a working example in a slightly different paradigm really couldn't be any better).

jppope · a year ago
Totally agree, what you are saying is aligned. The LLM needs you in the driver's seat; it can't do it without you.
theshrike79 · a year ago
Exactly. For unit/integration tests I've found it to be a pretty good assistant.

I have a project with a bunch of tests already. I pick a test file, write `public Task Test`, and wait a few seconds; in most cases it writes a pretty sane basis for a test - and in a few cases it has figured out an edge case I missed.

delduca · a year ago
> Disclaimer: I work for GitHub, and for a year I worked directly on Copilot.

Ah, now it makes sense.

brianstrimp · a year ago
Yeah, the submission heading should indicate that there is a high risk for a sales pitch in there.
foobazgt · a year ago
I wonder if the first bullet point, "smart autocomplete", is much less beneficial if you're already using a statically typed language with a good IDE. I already feel like IntelliJ's autocomplete reads my mind most of the time.
Klathmon · a year ago
LLM autocomplete is an entirely different beast.

Traditional autocomplete can finish the statement you started typing; LLMs often suggest whole lines before I even type anything, and sometimes even whole functions.

And static types can assist the LLM too. It's not like it's an either-or choice.

foobazgt · a year ago
The author says they do the literal opposite:

"Almost all the completions I accept are complete boilerplate (filling out function arguments or types, for instance). It’s rare that I let Copilot produce business logic for me"

My experience is similar, except I get my IDE to complete these for me instead of an LLM.

unregistereddev · a year ago
Does IntelliJ try to complete the word you are typing, or does it suggest an entire line of code? Because newer versions of IntelliJ incorporate LLMs to beef up autocomplete. You may already be using it.
mrguyorama · a year ago
I know I'm not using it because Intellij is constantly complaining that my version does not support the AI plugins.

The "dumb" autogenerated stuff is incredible. It's like going from bad autocomplete to Intellisense all over again.

The world of python tooling (at least as used by my former coworkers) put my expectations in the toilet.

foobazgt · a year ago
The new LLM-based completion in Intellij is not useful. :(
hansvm · a year ago
It's completely different. If I start writing an itertools library with comptime inlining and my favorite selection of other features, completing map/reduce/take/skip/... exactly how I want them to look, LLM autocomplete can finish the rest of the library exactly as I would have written it, even for languages it doesn't otherwise know well - outside of the interesting bits (in the context of itertools, that'd be utilities with memory tradeoffs, like tee and groupby).
VenturingVole · a year ago
My first thought upon reading this was about the observation that software engineers are deeply split on this: how can they be so negative? A mixture of emotions.

Then I reflected on how very true it was. In fact, as of writing this there are 138 comments, and I started simply scrolling through them to assess the negative/neutral/positive bias, based on a highly subjective personal assessment: 2/3 were negative, so I decided to stop.

As a profession, it seems many of us have become accustomed to dealing in absolutes when reality is subjective, judging LLMs prematurely with a level of perfectionism not even applied to fellow humans... or at least, if it were applied to humans, I'd be glad not to be their colleague.

Honestly, right now I would use this as a litmus test in hiring, and the majority would fail based on their closed-mindedness and inability to understand how to effectively utilise the tools at their disposal. It won't exist as a signal for much longer, sadly!

notTooFarGone · a year ago
It boils down to responsibility.

We need to trust machines more than humans because machines can't take responsibility. That code you pushed that broke prod - you can't point at the machine.

It is also about predictability and growth, in a sense. I can assess certain people, know what they will probably get wrong, and develop the person accordingly. If that person uses LLMs, it disguises that exposure of skill and leaves a very hard signal to read as a senior dev, hampering their growth.

VenturingVole · a year ago
I absolutely agree with your points - assuming that by "machines" you mean the code as opposed to the LLMs. As a "Staff+" IC mentoring and training a couple of apprentice-level developers, I've already asked on several occasions "why did you do this?" and had the response "oh, that's what the AI did." I'm very patient, but I've made clear that's something to never utter - at least not until one has the experience to deeply understand boundaries/constraints effectively.

I did see a paper recently on the impact of AI/LLMs and the danger to critical thinking skills - it's a very real issue, and I'm having to actively counter this seemingly natural tendency many have.

With respect to signals, mine was about attitude in general. I'd much rather work with someone who says "Yes, but.." than one who is outright dismissive.

Increasing awareness of the importance of context will be a topic for a long time to come!

mvdtnz · a year ago
> What about hallucinations? Honestly, since GPT-3.5, I haven’t noticed ChatGPT or Claude doing a lot of hallucinating.

See this is what I don't get about the AI Evangelists. Every time I use the technology I am astounded at the amount of incorrect information and straight up fantasy it invents. When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.

simonw · a year ago
> There is simply no way you're using the same technology as me with such wildly different results.

Prompting styles are incredibly different between different people. It's very possible that they are using the same technology that you are with wildly different results.

I think learning to use LLMs to their maximum effectiveness takes months (maybe even years) of effort. How much time have you spent with them so far?

mrguyorama · a year ago
> I have to wonder what is motivating them to lie.

Most of these people who aren't salesmen aren't lying.

They just cannot tell when the LLM is making up code. Which is very very sad.

That, or they could literally be replaced by a script that copy-pastes from Stack Overflow. My friend did that a lot, and it definitely helped ship features, but it doesn't make for maintainable code.

the_mitsuhiko · a year ago
> When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.

I don’t know what technology you are using but I know that I am getting very different results based on my own prompt qualities.

I also do not really consider hallucinations to be much of an issue for programming. They come up rarely, and they're caught by the type checker almost immediately. When there are hallucinations, it's often something very minor, like imagining a flag that doesn't exist.
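
A tiny made-up example of the kind of hallucination I mean, and why the type checker catches it straight away (the `render` helper and the `pretty` flag are both hypothetical):

```python
def render(template: str, *, minify: bool = False) -> str:
    """Render a template, optionally minifying the output (hypothetical helper)."""
    return template.replace("\n", "") if minify else template


# The model "remembers" a plausible-sounding flag that was never there.
# mypy reports: error: Unexpected keyword argument "pretty" for "render"
# and at runtime this raises TypeError before it can do any damage.
html = render("<p>\n  hi\n</p>", pretty=True)
```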

mvdtnz · a year ago
A lot of you guys, including the author, respond with these "you're holding it wrong" comments. But you never give examples of actual prompts that are somehow significantly different from what I use. The author gives a very small handful of his example prompts and I don't see anything in them that's fundamentally different from what I've tried. If anything, his prompts are especially lazy compared to what I use and what I have read as best practice among "prompt engineers":

"is this idiomatic C?"

"not just “how does X work”, but follow-up questions like “how does X relate to Y”. Even more usefully, you can ask “is this right” questions"

"I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”"

stuartd · a year ago
> is this idiomatic C?

This is how I use AI at work for maintaining Python projects, a language in which I am not really versed at all. Sometimes I might add, "this is how I would do it in …; how would I do this in Python?"

I find this extremely helpful and productive, especially as I have to pull the code onto a server to test it.

synthc · a year ago
This year I switched to a new job, using programming languages that I was less familiar with.

Asking an LLM to translate between languages works really well most of the time. It's also a great way to learn which libraries are the standard solutions for a language. It really accelerated my learning process.

Sure, there is the occasional too literal translation or hallucination, but I found this useful enough.

brianstrimp · a year ago
Have you noticed any difference in picking up the language(s) yourself? As in, do you think you'd be more fluent in them by now without all the help? Or perhaps less? Genuine question.