In my experience there are really only three true prompt engineering techniques:
- In-context learning (providing examples, a.k.a. one-shot or few-shot vs. zero-shot)
- Chain of Thought (telling it to think step by step)
- Structured output (telling it to produce output in a specified format like JSON)
Maybe you could add what this article calls Role Prompting to that. And RAG is its own thing where you're basically just having the model summarize the context you provide. But really, everything else just boils down to telling it what you want to do in clear, plain language.
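To make those three concrete, here's a minimal TypeScript sketch of all of them in a single request. The task, field names, and message shape (OpenAI-style chat messages) are illustrative assumptions on my part, not tied to any particular provider:

```typescript
// A rough sketch, not tied to any SDK: the message shape follows the common
// OpenAI-style chat format, but any chat-completion API takes something equivalent.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const messages: ChatMessage[] = [
  // Structured output: pin down the exact shape you want back.
  {
    role: "system",
    content:
      "You extract TODO items from commit messages. Respond only with JSON of the form " +
      '{"reasoning": string, "todos": string[]}.',
  },
  // In-context learning: a single worked example (one-shot).
  { role: "user", content: "fix login bug, still need to add rate limiting" },
  {
    role: "assistant",
    content:
      '{"reasoning": "rate limiting is mentioned as unfinished", "todos": ["add rate limiting"]}',
  },
  // Chain of thought: ask it to reason before it commits to an answer.
  {
    role: "user",
    content:
      "Think step by step about which parts are future work, then answer: " +
      "refactor parser; tests for edge cases are TODO; docs later",
  },
];

console.log(JSON.stringify(messages, null, 2));
```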
Dunno. I was working on a side project in TypeScript, and couldn’t think of the term “linear regression”. I told the agent, “implement that thing where you have a trend line through a dot cloud”, or something similarly obtuse, and it gave me a linear regression in one shot.
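For reference, the thing it reached for is only a few lines anyway; a least-squares fit in TypeScript looks roughly like this (an illustrative helper, not the code the agent actually produced):

```typescript
// Ordinary least squares: fit y = slope * x + intercept to a "dot cloud".
function linearRegression(points: Array<{ x: number; y: number }>) {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;
  let covXY = 0; // covariance numerator
  let varX = 0; // variance of x
  for (const p of points) {
    covXY += (p.x - meanX) * (p.y - meanY);
    varX += (p.x - meanX) ** 2;
  }
  const slope = covXY / varX;
  return { slope, intercept: meanY - slope * meanX };
}

// Trend line through a tiny dot cloud: slope ~2, intercept ~0.
console.log(linearRegression([
  { x: 1, y: 2 },
  { x: 2, y: 4.1 },
  { x: 3, y: 5.9 },
]));
```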
I’ve also found it’s very good at wrangling simple SQL, then analyzing the results in Bun.
I’m not doing heavy data processing, but so far, it’s remarkably good.
I see that as applying to niche platforms/languages without large public training datasets - if Rust were introduced today, the productivity differential would be so stacked against it that I'm not sure it would hypothetically survive.
Even role prompting is totally useless imo. Maybe it was a thing with GPT-3, but most of the LLMs already know they're "expert programmers". I think a lot of people are just deluding themselves with "prompt engineering".
Be clear with your requirements. Add examples, if necessary. Check the outputs (or reasoning trace if using a reasoning model). If they aren't what you want, adjust and iterate. If you still haven't got what you want after a few attempts, abandon AI and use the reasoning model in your head.
It's become more subtle, but it's still there. You can bias the model towards more "expert" responses with the right terminology. For example, a doctor asking a question will get a vastly different response than a layperson will. A query with emojis will get more emojis back. Etc.
I get the best results with Claude by treating the prompt like a pseudo-SQL language, with words like "consider" or "think deeply" acting like keywords in a programming language. I also make use of their XML tags[1] to structure my requests.
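As a rough illustration of that structure (the tag names are just ones I find convenient, in the spirit of the docs in [1], not an official schema):

```typescript
// Hypothetical XML-structured request; tag names and the reviewed function are invented.
const reviewPrompt = `
<instructions>
Review the function in <code>. Think deeply about edge cases before answering.
</instructions>

<code>
export function parseDuration(input: string): number {
  // ...
}
</code>

<output_format>
A bulleted list of potential bugs, most severe first.
</output_format>
`;

console.log(reviewPrompt);
```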
I wouldn't be surprised if, a few years from now, some sort of actual formalized programming language for "gencoding" AI emerges.
[1] https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
One thing I've had a lot of success with recently is a slight variation on role-prompting: telling the LLM that someone else wrote something, and I need their help assessing the quality of it.
When the LLM thinks _you_ wrote something, it's nice about it, and deferential. When it thinks someone else wrote it, you're trying to decide how much to pay that person, and you need to know what edits to ask for, it becomes much more cut-throat and direct.
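For example, something along these lines (the wording is obviously just illustrative):

```typescript
// Illustrative framing only: the same document, presented as a third party's work
// so the model reviews it instead of flattering the author.
const assessmentPrompt = `
A contractor sent me the design doc below. I'm deciding how much of their invoice
to approve and which revisions to request. Be blunt: list the weakest parts first.

<doc>
(paste the thing you actually wrote here)
</doc>
`;

console.log(assessmentPrompt);
```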
The main thing, I think, is people trying to do everything in "one prompt", one giant request throwing all the context at it. What you said is correct, but also: instead of making one massive request, break it down into parts and use multiple prompts with smaller contexts that, say, all have structured output you feed into each other.
Make prompts focused, with explicit output formats and examples, and don't overload the context. Then basically the three you said.
Chain-of-thought prompting loses much of its effectiveness on newer reasoning models like the OpenAI o-series and Claude Sonnet.
As an exercise for the reader, I encourage you all to try the examples vs. control prompts in prompt engineering papers for chain of thought prompting, and you’ll see that the latest models have either been trained to or instructed to reason by default now - the outputs are close enough to equivalent.
CoT prompting was probably much more effective a few years ago on older, less powerful models.
You may find some benefit in telling it exactly how you want it to reason about a problem, but note that you may actually be limiting its capabilities that way.
I’ve found that most of the time, I will let it use its default reasoning capabilities and guide those rather than supplying my own.
You use a two-phase prompt for this. Have it reason through the answer and respond with a clearly-labeled 'final answer' section that contains the English description of the answer. Then run its response through again in JSON mode with a prompt to package up what the previous model said into structured form.
The second phase can be with a cheap model if you need it to be.
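A rough sketch of that two-phase flow, where `callModel` is a stand-in for whatever chat client you actually use and the model names are placeholders:

```typescript
// Stand-in for your real chat-completion client (OpenAI, Anthropic, local model, ...).
async function callModel(model: string, prompt: string): Promise<string> {
  throw new Error("wire this up to your model API");
}

async function answerAsJson(question: string): Promise<unknown> {
  // Phase 1: let a capable model reason freely and end with a labeled answer.
  const reasoning = await callModel(
    "big-reasoning-model", // placeholder model name
    `${question}\n\nWork through this step by step, then finish with a section ` +
      `labeled "FINAL ANSWER:" containing your conclusion in plain English.`
  );
  const finalAnswer = reasoning.split("FINAL ANSWER:").pop() ?? reasoning;

  // Phase 2: a cheap model only packages the prose answer into structured form.
  const packaged = await callModel(
    "small-cheap-model", // placeholder model name
    `Return only JSON of the form {"answer": string}. ` +
      `Package up the following answer:\n\n${finalAnswer}`
  );
  return JSON.parse(packaged);
}
```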
Sometimes I get the feeling that making super long and intricate prompts reduces the cognitive performance of the model. It might give you a feel of control and proper engineering, but I'm not sure it's a net win.
My usage has converged to making very simple and minimalistic prompts and doing minor adjustments after a few iterations.
That's exactly how I started using them as well. 1. Give it just enough context, the assumptions that hold, and the goal. 2. Review the answer and iterate on the initial prompt. It's also the economical way to use them. I've been burned one too many times by using agents (they just spin and spin, burn 30 dollars on one prompt, and either mess up the code base or converge on the previously written code).
I also feel the need to caution others: letting the AI write lots of code in your project makes it harder to advance it, evolve it, and move on with confidence (code you didn't think through and write yourself doesn't stick as well in your memory).
I'd have to hunt for it, but there is evidence that using the vocabulary of an expert rather than a layman will produce better results. Which makes sense: spaces where people talk "normally" are more likely to be incorrect, whereas spaces where people speak in the professional vernacular are more likely to be correct, and the training will associate the vocabulary with those spaces.
At their heart, these are still just document-completion machines. Very clever ones, but still inherently trying to find a continuation that matches the part that came before.
This seems right to me. I often ask questions in two phases to take advantage of this: (1) ask how a professional in the field would phrase this question, then (2) paste that phrasing into a new chat.
For another kind of task, a colleague had written a very verbose prompt. Since I had to integrate it, I added some CRUD ops for prompts. For a test, I made a very short one, something like "analyze this as a <profession>". The output was pretty much comparable, except that the output for the longer prompt contained quite a few references to literal parts of that prompt. It wasn't incoherent, but it was as if the model (Gemini 2.5, btw) has a basic response for the task it extracts from the prompt and merges the superfluous bits in. It would seem that, at least for this particular task, the model cannot (easily) be made to "think" differently.
Yeah, I had this experience today: I had been running code review with a big, detailed prompt in CLAUDE.md, but then I ran it in a branch that didn't have that file yet and got better results.
> It might give you a feel of control and proper engineering
Maybe a super salty take, but I personally have never thought of anything involving an LLM as "proper engineering". "Flailing around", yes. "Trial and error", definitely. "Confidently wrong hallucinations", for sure. But "proper engineering" and "LLM" are two mutually exclusive concepts in my mind.
Same here: it starts with a relatively precise need, keeping a roadmap in mind rather than forcing one upfront. When it involves a technology I'm unfamiliar with, I also ask questions to understand what certain things mean before "copying and pasting".
I've found that with more advanced prompts, the generated code sometimes fails to compile, and tracing the issues backward can be more time consuming than starting clean.
I use specs in markdown for the more advanced prompts. I ask the LLM to refine the markdown first and add implementation steps, so I can review what it will do. When it starts implementing, I can always ask it to "just implement step 1, and update the document when done". You can also ask it to verify that the spec has been implemented correctly.
It already did. Programming languages are already very strict about syntax; professional jargon is the same way, and for the same reason: it eliminates ambiguity.
There is no such thing as "prompt engineering". Since when did the ability to write proper and meaningful sentences become engineering?
This is even worse than "software engineering". The unfortunate thing is that there will probably be job postings for such things and people will call themselves prompt engineers for their extraordinary abilities for writing sentences.
> Since when did the ability to write proper and meaningful sentences become engineering?
Since what's proper and meaningful depends on a lot of variables. Testing these, keeping track of them, logging and versioning take it from "vibe prompting" to "prompt engineering" IMO.
There are plenty of papers detailing this work. Some things work better than others ("do this and this" works better than "don't do this" - the pink elephant thing). Structuring is important. Style is important. Order of information is important. Re-stating the problem is important.
Then there are quirks within model families. If you're running an API-served model, you need internal checks to make sure the new version still behaves well on your prompts. These checks and tests are "prompt engineering".
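Concretely, those checks can be as boring as a table-driven test. A hedged sketch, with `callModel` standing in for your real client and the cases invented purely for illustration:

```typescript
// Stand-in for your real model client.
async function callModel(model: string, prompt: string): Promise<string> {
  throw new Error("wire this up to your model API");
}

// Each case pins an invariant the prompt must keep satisfying when the
// provider rolls out a new model version behind the same API.
const cases = [
  { input: "Refund order #123", mustMatch: /"intent":\s*"refund"/ },
  { input: "Where is my parcel?", mustMatch: /"intent":\s*"tracking"/ },
];

async function checkPrompt(promptTemplate: string, model: string): Promise<boolean> {
  let ok = true;
  for (const c of cases) {
    const output = await callModel(model, promptTemplate.replace("{{input}}", c.input));
    if (!c.mustMatch.test(output)) {
      console.error(`FAIL on "${c.input}": got ${output}`);
      ok = false;
    }
  }
  return ok;
}
```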
I feel a lot of people have a knee-jerk reaction to the hype and miss critical aspects because they want to dunk on it.
Yeah, if this catches on, we may well see the title "engineer" go the way of "manager" and "VP" over the last few decades... So, yeah, we may start seeing coffee engineers now :D
I would caution against thinking it's impossible even if it's not something you've personally experienced. Prompt engineering is necessary (but not sufficient) to creating high leverage outcomes from LLMs when solving complex problems.
Without it, the chances of getting to a solution are slim. With it, the chances of getting to 90% of a solution and needing to fine tune the last mile are a lot higher but still not guaranteed. Maybe the phrase "prompt engineering" is bad and it really should be called "prompt crafting" because there is more an element of craft, taste, and judgment than there is durable, repeatable principles which are universally applicable.
You're not talking to managers here; you can use plain English.
> Maybe the phrase "prompt engineering" is bad and it really should be called "prompt crafting" because there is more an element of craft, taste, and judgment than there is durable, repeatable principles which are universally applicable.
Yes, the biggest problem with the phrase is that "engineering" implies a well-defined process with predictable results (think of designing a bridge), and prompting doesn't check either of those boxes.
Let's say you have two teams of contractors. One from your native country (I'm assuming US here), working remotely and one from India, located in India.
Would you communicate with both in the exact same manner, without adjusting your messaging in any way?
Of course you would, that's exactly what "prompt engineering" is.
The language models are all a bit different and a bit fiddly at the moment, so getting quality output from each requires a specific input.
You can try it yourself, ask each of the big free-tier models to write a simple script in a specific language for you, every single one will have a different output. They all have a specific "style" they fall into.
I agree with yowlingcat's point but I see where you are coming from and also agree with you.
The way I see it, it's a bit like putting up a job posting for 'somebody who knows SSH'. While that is a useful skill, it's really not something you can specialize in since it's just a subset within linux/unix/network administration, if that makes sense.
There are so many prompting guides at the moment. Personally I think they are quite unnecessary. If you take the time to use these tools, build familiarity with them and the way they work, the prompt you should use becomes quite obvious.
It reminds me of how we had the same hype and FOMO when Google became popular. Books were being written on the subject, and you had to buy them or you would become a caveman in the near future. What happened is that anyone could learn the whole thing in a day, and that was it; no need to debate whether you would miss anything if you didn't know all those tools.
You're only proving the opposite: there's definitely a difference between an experienced Google user and someone who just puts in random words and expects to find what they need.
I think there are people for whom reading a prompt guide (or watching an experienced user) will be very valuable.
Many people just won't put any conscious thought into trying to get better on their own, though some of them will read or watch one thing on the topic. I will readily admit to picking up several useful tips from watching other people use these tools and from discussing them with peers. That's improvement that I don't think I achieve by solely using the tools on my own.
Many years ago there were guides on how to write user stories: “As a [role], I want to be able to do [task] so I can achieve [objective]”, because it was useful to teach high-level thinkers how to communicate requirements with less ambiguity.
It may seem simple, but in my experience even brilliant developers can miss or misinterpret unstructured requirements, through no fault of their own.
It's at least useful for seeing how other people are being productive with these tools. I also sometimes find a clever idea that improves on what I'm already doing.
And documenting the current state of this space as well. It's easy to have tried doing something a year ago and think they're still bad.
I also usually prefer researching an area before reinventing the wheel by trial and error myself. I appreciate it when people share what they've discovered with their own time, as I don't always have all the time in the world to explore it the way I would if I were still a teen.
A long time back, for my MS in CS, I took a science of programming course. Its approach to verification has helped me craft prompts when I do data engineering work. Basically:
Given input (…) and preconditions (…), write me Spark code that gives me postconditions (…). If you can formally specify the input, preconditions, and postconditions, you usually get good working code.
1. The Science of Programming, David Gries
2. Verification of concurrent and sequential systems
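For example, a specification-style prompt in that spirit might look like the following (the dataset, columns, and conditions are all invented for illustration):

```typescript
// Hypothetical pre/postcondition-style prompt; none of this refers to a real dataset.
const sparkPrompt = `
Input: a Spark DataFrame "events" with columns (user_id: string, ts: timestamp, amount: double).
Preconditions: ts is never null; amount may be negative (refunds).
Write Spark code whose output satisfies these postconditions:
  - exactly one row per (user_id, calendar day of ts)
  - a column "net_amount" equal to the sum of amount for that user and day
  - rows sorted by user_id, then day, ascending
`;

console.log(sparkPrompt);
```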
In my own experience, if the problem is not solvable by an LLM, no amount of prompt "engineering" will really help. The only way to solve it is to partially solve it yourself (breaking it down into sub-tasks / examples) and let it run its miles.
I'd love to be wrong though. Please share if anyone has a different experience.
I think part of the skill in using LLMs is getting a sense for how to effectively break problems down, and also getting a sense of when and when not to do it. The article also mentions this.
I think we'll also see ways of restructuring, organizing, and commenting code to improve interaction with LLMs. And I also expect LLMs to get better at doing this, and maybe at suggesting ways for programmers to break down problems it is struggling with.
I think the intent of prompt engineering is to get better solutions quicker, in formats you want. But yeah, ideally the model just "knows" and you don't have to engineer your question
I went into software instead, but IIRC sales and QA engineers were common jobs I heard about for people in my actual accredited (optical) engineering program. A quick search suggests it is common for sales engineers to have engineering degrees? Is this specifically about software (where "software engineers" frequently don't have engineering degrees either)?
I have a degree in software engineering and I'm still critical of its inclusion as an engineering discipline, just given the level of rigour that's applied to typical software development.
When it comes to "prompt engineering", the argument is even less compelling. It's like saying typing in a search query is engineering.
For real. Editing prompts bears no resemblance to engineering at all; there is no accuracy or precision. Say you have a benchmark to test against and you're trying to make an improvement. Will your change to the prompt make the benchmark go up? Down? Why? Can you predict? No, it is not a science at all. It's just throwing shit and examples at the wall with hopes and prayers.
Absolutely. It's not an appropriate way to describe developers in general either. That fight has been lost, I think, and that's all the more reason to push against this nonsense now.
Start out with TypeScript and have it answer data science questions - it won't know its way around.
Start out with Python and ask the same question - great answers.
LLMs can't (yet) really transfer knowledge between domains, you have to prime them in the right way.
Every day tech broism gets closer to a UFO sect.
My experience as well. I hesitate to admit this for fear of being labeled a luddite.
At the same time, I've seen the system prompts for a few agents (https://github.com/x1xhlol/system-prompts-and-models-of-ai-t...), and they are huge.
How does that work?
On the other hand, prompt tweaking can be learned in a few days just by experimenting.
That could be said about ordering coffee at a local coffee shop. Is there a "barista order engineering" guide we are all supposed to read?
> Re-stating the problem is important.
Maybe you can show us some examples?
I don't think you have to worry about that.
For the uneducated, law engineers are members of the Congress / Parliament / Bundestag / [add for your own country]
I get by just fine with pasting raw code or errors and asking plain questions; the models are smart enough to figure it out themselves.
> Calling someone a prompt engineer is like calling the guy who works at Subway an artist because his shirt says ‘Sandwich Artist.’
All jokes aside, I wouldn't get too hung up on the title; the term "engineer" has long since been diluted to the point of meaninglessness.
https://jobs.mysubwaycareer.eu/careers/sandwich-artist.htm
https://en.wikipedia.org/wiki/Audio_engineer
There are prompts to be used with APIs and inside automated workflows, and more besides.
Many prompt engineers do measure and quantitatively compare.