Over time I began to understand the mechanics of how LLMs operate at a deeper level. That naturally led me to the now-fading term “prompt engineering”. These days people talk more about “context engineering”, but the core idea is the same: we have to teach our own brains how the LLM works, so we can structure the context in a way that lets it deliver maximum value.
With my current work on GameByte, where AI builds studio-quality mobile games from prompts, this understanding has become crucial. When you explain the problem in a way that matches how the model processes information, even something as short as “3D platformer game” in the system prompt can be enough. The model will then ask the right follow-up questions and move you toward your goal without constant manual steering.
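To make that concrete, here is a minimal sketch of what “structuring the context” can look like in practice: a short system prompt carrying the intent, followed by compact project facts, then the actual request. The function name, field layout, and prompt text are my own illustration, not GameByte's actual format or any specific provider's API.

```python
# Illustrative only: assemble a chat-style message list where a short
# system prompt carries the intent and project facts are kept compact.

def build_context(system_prompt: str, project_facts: list[str], user_request: str) -> list[dict]:
    """System intent first, then project facts, then the user's request."""
    facts = "\n".join(f"- {fact}" for fact in project_facts)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Project facts:\n{facts}\n\nRequest: {user_request}"},
    ]

messages = build_context(
    system_prompt="3D platformer game",
    project_facts=["Engine: Unity", "Target: mobile"],
    user_request="Add a double-jump ability to the player controller.",
)
print(messages[0]["content"])  # prints "3D platformer game"
```

Even a message list this small gives the model enough framing to ask the right follow-up questions instead of guessing.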
Another lesson is that all the old pain points developers faced before AI are still pain points for LLMs. Spaghetti code, excessively long files, poor documentation, lack of comments and missing test cases all reduce their effectiveness. This is why Amazon’s recent “Kiro” and the spec-driven development approach resonate so well. They are basically formalizing best practices that those of us building with LLMs have learned over time.
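In the spirit of spec-driven development, a toy example: requirements and acceptance tests written down before any code is generated, plus a trivial gate that refuses to run an agent on an incomplete spec. The format and section names here are my own illustration, not Kiro's actual spec format.

```python
# Illustrative toy spec: requirements and acceptance tests captured
# up front, so the agent has tests to target and a reviewer has a checklist.

SPEC = """\
Feature: player inventory
Requirements:
- add_item(name) stores the item
- has_item(name) returns True only for stored items
Acceptance tests:
- after add_item("sword"), has_item("sword") is True
- has_item("shield") is False when it was never added
"""

REQUIRED_SECTIONS = ("Feature:", "Requirements:", "Acceptance tests:")

def spec_is_complete(spec: str) -> bool:
    """Gate an agent run on the spec containing every required section."""
    return all(section in spec for section in REQUIRED_SECTIONS)
```

A spec this explicit does double duty: it is the agent's context and the human's review checklist.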
And finally, LLMs do not particularly enjoy editing someone else’s messy code. Just like human developers, they perform much better when writing from scratch. If you clearly define the boundaries of the task and ask them to start fresh, the results are often significantly better.
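One way to apply the “start fresh” idea is to template the task framing so the boundaries are explicit every time. The helper below is hypothetical, a sketch of how I'd phrase such a prompt, not a prescribed format.

```python
# Hypothetical helper: frame a task as "write from scratch" with explicit
# responsibilities and hard constraints, instead of "patch this legacy file".

def fresh_start_prompt(module: str, responsibilities: list[str], constraints: list[str]) -> str:
    lines = [
        f"Write {module} from scratch. Do not modify any existing files.",
        "It must handle exactly these responsibilities:",
        *[f"  {i}. {r}" for i, r in enumerate(responsibilities, 1)],
        "Hard constraints:",
        *[f"  - {c}" for c in constraints],
    ]
    return "\n".join(lines)

prompt = fresh_start_prompt(
    "a SaveGameService module",
    ["serialize player state to JSON", "restore state on load"],
    ["no changes to existing files", "standard library only"],
)
```

The key is that both the scope (“exactly these responsibilities”) and the limits (“do not modify existing files”) are stated up front, so the model never wanders into the messy code you wanted it to avoid.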
If those factors aren’t set up well, you’ll hit walls no matter which agent you pick. I’ve seen teams switch tools thinking “this one will finally work”, only to run into the exact same issues, because the bottleneck was their workflow, not the AI’s raw ability.
Once you tune the prompts, scope tasks properly, and feed the model the right context, most modern coding agents perform surprisingly well. That’s why in our platform, some dev teams can ship entire game prototypes or complex features with LLMs, while others struggle to get a passing unit test out of the same tool.
I’ve worked with Supabase, Clerk, Keycloak, and Kratos on different projects. None of the open-source options truly deliver on “low management overhead”. You’ll always have to deal with updates, patches, and some manual babysitting.
If you refuse to compromise on your feature list, your realistic options shrink fast. In that case, Zitadel is a solid choice, but be ready for higher costs from day one. My advice is to trim the must-have list, go with a managed service, get real users, and revisit the decision when scale actually becomes a problem.
That can be fine for a lot of general use cases, but in specific domains like coding agents or high-precision summarization, automatic routing can actually make results worse than sticking with a model you know performs well for your workload.
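The workaround is simple: pin the models you have actually benchmarked, and only fall back to automatic routing for workloads you haven't measured. The model names and workload keys below are placeholders, not real model identifiers.

```python
# Sketch: pin known-good models per workload instead of trusting a router.
# All model names and workload keys here are placeholders.

PINNED_MODELS = {
    "coding-agent": "model-a-large",     # measured best on our repo benchmark
    "summarization": "model-b-precise",  # lowest error rate in our own evals
}

def pick_model(workload: str, router_default: str = "auto") -> str:
    """Use a pinned model for benchmarked workloads; otherwise defer
    to the provider's automatic routing."""
    return PINNED_MODELS.get(workload, router_default)
```

This keeps the convenience of routing for casual use while protecting the workloads where you already know which model wins.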