In my experience, the best strategy is to minimize your use of GHA itself: call out to binaries or shell scripts, so that as little of your setup as possible depends on the GHA world. That makes it easier to test locally, too.
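Concretely, the workflow file ends up being a thin shim. Here's a minimal sketch of what I mean; the trigger, job name, and `ci/test.sh` path are all placeholders, not a prescription:

```yaml
# Hypothetical thin workflow: all the real logic lives in ci/test.sh,
# which you can run locally with no GHA machinery at all.
name: ci
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh
```

The YAML never grows beyond checkout-and-run, so debugging happens in the script on your laptop, not in the Actions console.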
I think tons of interpersonal engineering issues boil down to a failure to apply this principle.
I agree completely that an LLM's first attempt to write a Semgrep rule is as likely as not to be horseshit. That's true of everything an LLM generates. But I'm talking about closed-loop LLM code generation. Unlike legal arguments and medical diagnoses, you can hook an LLM up to an execution environment and let it see what happens when the code it generates runs. It then iterates until it has something that works (there's a sketch of that loop below).
Which, when you think about it, is how a lot of human-generated code gets written too.
So my thesis here does not depend on LLMs getting things right the first time, or without assistance.
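To make the closed loop concrete, here's a minimal sketch under stated assumptions: `ask_llm` is a stand-in for whatever client call you actually use, and the prompts, timeout, and iteration cap are arbitrary choices, not anyone's real harness:

```python
import subprocess
import sys
import tempfile
from typing import Callable, Optional

def closed_loop(
    ask_llm: Callable[[str], str],  # stand-in for your actual LLM client
    task: str,
    max_iters: int = 5,
) -> Optional[str]:
    """Generate code, run it, feed the failure back, repeat until it works."""
    prompt = f"Write a Python script that does the following:\n{task}"
    for _ in range(max_iters):
        code = ask_llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        # Actually run the generated code and see what happens.
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=60
        )
        if result.returncode == 0:
            return code  # "works" by this crude test, anyway
        # The model sees its own failure and iterates.
        prompt = (
            f"This script failed:\n\n{code}\n\n"
            f"stderr was:\n{result.stderr}\n\nFix it and return the full script."
        )
    return None  # gave up
```

A real harness would sandbox the execution and check actual assertions rather than just an exit code, but the shape of the loop is the point.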
I only came down hard on that quote out of context because it felt somewhat standalone, and I want to broadcast this “fluency paradox” point a bit louder: I keep running into people who really need to hear it.
I know you know what’s up.
This is the thing with LLMs. When you’re not an expert, the output always looks incredible.
It’s similar to the fluency paradox: if you’re not a native speaker of a language, anyone you hear speaking it at a higher level than yourself sounds fluent to you, even if, say, they’re actually just a beginner.
The problem with LLMs is that they’re very good at appearing to speak “a language” at a higher level than you, even if they totally aren’t.
Fly.io seriously needs to get it together. Why it hasn’t happened yet is a mystery to me. They have a good product, but stability needs to be an absolute top priority for a hosting service. Everything else is secondary.