Even a simple prompt like this:
=
I have two potential solutions.
Solution A:
Solution B:
Which one is better and why?
=
Is biased. Some LLM tends to choose the first option and the other prefer the last one.
(Of course, humans suffer from the same kind of bias too: https://electionlab.mit.edu/research/ballot-order-effects)
You just want the signal from the object level question to drown out irrelevant bias (which plan was proposed first, which of the plan proposers are more attractive, which plan seems cooler etc.)
I asked 2 things.
1. Create a boilerplate Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured. It generated garbage devicetree which didn't even compile. When I pointed it out, it apologized and generated another one that didn't compile. It configured also non-existent drivers, and for some reason it enabled monkey test support (but not test support).
2. I asked it to create 7x10 monochromatic pixelmaps, as C integer arrays, for numeric characters, 0-9. I also gave an example. It generated them, but number eight looked like zero. (There was no cross in ether 0 nor 8, so it wasn't that. Both were just a ring)
What am I doing wrong? Or is this really the state of the art?
Trying two things and giving up. It's like opening a REPL for a new language, typing some common commands you're familiar with, getting some syntax errors, then giving up.
You need how to learn to use your tools to get the best out of them!
Start by thinking about what you'd need to tell a new Junior human dev you'd never met before about the task if you could only send a single email to spec it out. There are shortcuts, but that's a good starting place.
In this case, I'd specifically suggest:
1. Write a CLAUDE.md listing the toolchains you want to work with, giving context for your projects, and listing the specific build, test etc. commands you work with on your system (including any helpful scripts/aliases you use). Start simple; you can have claude add to it as you find new things that you need to tell it or that it spends time working out (so that you don't need to do that every time).
2. In your initial command, include a pointer to an example project using similar tech in a directory that claude can read
3. Ask it to come up with a plan and ask for your approval before starting