> What I found interesting is how it forced me to think differently about the development process itself. Instead of jumping straight into code, I found myself spending more time articulating what I actually wanted to build and high level software architectural choices.
This is what I already do with Claude Code. Case in point: I spent 2.5 hours yesterday planning a new feature - first working with an agent to build out the plan, then 4 cycles of having that agent spit out a prompt for another agent to critique the plan and integrate the feedback.
In the end, once I got a clean bill of health on the plan from the “crusty-senior-architect” agent, I had Claude build it - took 12 minutes.
Two passes of the senior-architect and crusty-senior-architect debating how good the code quality was / fixing a few minor issues and the exercise was complete. The new feature worked flawlessly. It took a shade over 3 hours to implement what would have taken me 2 days by myself.
I have been doing this workflow for a while, but Claude Code released Agents yesterday (/agents) and I highly recommend them. You can define an agent on the basis of another agent, so crusty-architect is a clone of my senior-architect, but it's never happy unless the code is super simple, maintainable, and uses well-established patterns. The debates between the two remind me of sitting in conf rooms hashing an issue out with a good team.
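For anyone who wants to try the same setup: the agent definitions are just Markdown files with a little frontmatter under .claude/agents/, roughly like the sketch below. This is paraphrased from memory rather than my exact file, and the persona text is only illustrative, so check the field names against the docs before copying it.

    ---
    name: crusty-senior-architect
    description: Skeptical senior architect. Use for critiquing plans and
      reviewing diffs when a harsh second opinion is wanted.
    ---
    You are a crusty senior architect. You are never satisfied unless the
    code is as simple as possible, easy to maintain, and built on
    well-established patterns. Push back on cleverness, speculative
    abstraction, and anything that adds moving parts without a clear
    payoff. Be specific about what should be cut or simplified.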
I've been attempting to do this kind of thing manually w/ mcp - took a look at "claude swarm" https://github.com/parruda/claude-swarm - but in the short time I spent on it I wasn't having much success - admittedly I probably went a little too far into the "build an entire org chart of agents" territory
[EDIT]: looks like I should be paying attention to the changelog on the gh repo instead of the release notes: https://github.com/anthropics/claude-code/blob/main/CHANGELO...
[EDIT 2]: so far this seems to suffer from the same problem I had in my own attempts, which is that I need to specifically tell it to use an agent when I would really like it to just figure that out on its own
like if I created an agent called "code-reviewer" and then I say - "review this code" ... use the agent!
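From what I can tell, the description field in the agent file is what the main session looks at when deciding whether to delegate, so phrasing it as an explicit instruction seems to help it pick the agent on its own. A rough, untested sketch of what I mean (the field names are as I understand them, so treat this as a guess):

    ---
    name: code-reviewer
    description: Expert code reviewer. Use PROACTIVELY whenever the user
      asks to review, audit, or critique code, even if they do not
      mention this agent by name.
    ---
    Review the code you are given for correctness, clarity, and
    maintainability. Flag bugs and risky patterns first and style nits
    last, and suggest a concrete fix for each issue you raise.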
Roo Code has had Orchestrator mode doing this for a while with your models of choice. And you can tweak the modes or add new ones.
What I have noticed is the forcing function of needing to think through the technical and business considerations of one's work up front, which can be tedious if you are the type that likes to jump in and hack at it.
For many types of coding needs, that is likely the smarter and ultimately more efficient approach. Measure twice, cut once.
What I have not yet figured out is how to reduce the friction in the UX of that process to make it more enjoyable. Perhaps sprinkling in some dopamine-triggering gamification around answering the questions.
You planned and wrote a feature yesterday that would have taken you 2 whole days? And you already got it reviewed and deployed, and know that 'it works flawlessly'?
....
That reminds me of when my manager (a very smart, very AI-bullish ex-IC) told us about how he used AI to implement a feature over the weekend and all it took him was 20 mins. It sounds absolutely magical to me, and I make a note to use AI more. I then go to review the PR, and of course there are multiple bugs and unintended side effects in the code. Oh, and there are like 8 commits spread over a 60-hour window... I manually spin up a PR which accomplishes the same thing properly... takes me 30 mins.
This sounds like a positive outcome? A manager built a proof of concept of a feature that clearly laid out and fulfilled the basic requirements, and an engineer took 30 mins to rewrite it once it had been specified.
How long does it typically take to spec something out? I'd say more than 20 mins, and typical artifacts to define requirements are much lossier than actual code - even if that code is buggy and sloppy.
This seems more of a process problem than a tooling problem. Without specs on what the feature was, I would be inclined to say your manager had a lapse in his "smartness", there was a lot of miscommunication about what was happening, or you are being overly critical over something that "wasted 30 minutes of your time". Additionally, this seems like a crapshoot work environment... there seems to be resentment for the manager using AI to build a feature that had bugs/didn't work... whereas ideally you two would sit down, talk it out, and see how it could be managed better next time?
> This long (hopefully substantial) blog post was written collaboratively, with Kiro handling the bulk of the content generation while I focused on steering the direction.
So Kiro wrote whatever Kiro "decided" - or, better said, guessed - it should write about, and did most of the "content generation" - a weird but fitting term for a machine writing a fake human blog. And the human kind of "directed it", but we don't really know for sure, because language is our primary interface and an author should be able to express their thoughts without using a machine?
I'd be happier if the author shared their actual experience in writing the software with this tool.
Why does it matter, as long as the output is of high quality? E.g., a Spielberg-directed movie indicates a level of quality, even if Spielberg didn't do everything himself.
The words-to-thoughts ratio is way too high, it reads like an elementary school book report, it's way too long for how dry it is, I could go on but these are just some of my initial thoughts while reading the article. Also, knowing it is mostly written with AI, how do I know if the details are real or made up? There's a reason you are reading my comment: it expressed thoughts or an image that you found captivating. Being able to write well is a privileged skill that improves your communication, your ability to express ideas, your humor; the things that make you an interesting person. You should not be outsourcing your voice to AI. Also, Spielberg wasn't writing an article - he was directing a movie.
It matters as it's not about a vague notion of a "level of quality". It's about reading about a personal experience written by an actual person. It's about not insulting the intelligence of one's readers by throwing up a wall of LLM-text and signing oneself, and it's about not being intellectually and morally dishonest by kinda mentioning it, but only half-way through the text. The comparison with Spielberg is almost there, but not there yet, as the director does whatever it is that directors do - not outsourcing it to some "Kiro". The right comparison would have been if the AI created the next sequel of E.T., Gremlins or whatever it was that Spielberg became famous for. Who cares? I want new and genuine insights that only another human can create. Not the results of a machine minimising the statistical regression error in order to produce the 100th similar piece of "content" that I have already seen the first 99 times. I have a feeling that none of the ghouls pushing for AI-generated "content" have ever truly enjoyed the arts, whether 'popular' or 'noble'. It's about learning something about yourself and the world, trying to grasp the author's struggles during the creation. Not about mindless consumption. That's why it matters.
I’m pretty sure Mr Spielberg DOES something. If he does absolutely nothing, I really doubt the phrase “directed movie indicates a level of quality” can have any level of truth in a general case.
Well, you just answered the question yourself, by inadvertently using a good example.
You see, a movie is a work of fiction, but a blog article most likely isn't (or shouldn't be). In this case, I am reading the article because I want an objective, fair assessment of Kiro from a human, not random text generated by an LLM.
I would actually rather read his incomprehensible notes, and then the actual prompt he tried to send to the LLM, and then skip reading the resultant generated part. That is actually what is worth reading to me.
The output is not of high quality. It is extremely verbose for what it is trying to say, and I found myself skimming it while dealing with:
1. The constant whiplash of paragraphs which describe an amazing feature followed by paragraphs which walk it back ("The shift is subtle but significant / But I want to be clear", "Kiro would implement it correctly / That said, it wasn't completely hands-off", "The onboarding assistance was genuinely helpful / However, this is also where I encountered", "It's particularly strong at understanding / However, there are times when");
2. Bland analogies that detract from, rather than enhance, understanding ("It's the difference between being a hands-on manager who needs to check every detail versus setting clear expectations and trusting the process.", "It's like having a very knowledgeable colleague who..."); and
3. Literal content-free filler ("Here's where it got interesting", "Don't let perfect be the enemy of good", "Most importantly / More importantly"), etc.
Kiro is a new agentic IDE which puts much more of a focus on detailed, upfront specification than competitors like Cursor. That's great. Just write about that.
Bullshit is what we call it when a speaker wants to look good saying something but doesn't care if what they say is true. It's a focus on form over content.
I think a big reason why Claude Code is winning is that it's such a thin wrapper over a very strong base model, which is why people are afraid of comparing it directly.
All these IDE integrations and GUIs and super complex system prompts, etc., only bloat all these other solutions with extra complexity, so comparing something so inherently different also becomes harder.
Agree. I stopped reading after the blurb below because it tells me this person has not actually even used Copilot or Cursor to a serious degree. This is an AI-written sentence that seems fine, but is actually complete nonsense.
> Each tool has carved out its own niche in the development workflow: Copilot excels at enhancing your typing speed with intelligent code completion, Cursor at debugging and helping you implement discrete tasks well, and recently pushing more into agentic territory.
Cursor's autocomplete blows Copilot's out of the water. And both Copilot and Cursor have pretty capable agents. Plus, Claude Code isn't even mentioned here.
This blog post is a Kiro advertisement, not a serious comparative analysis.
Outside of its excellent capabilities, the thing I most love about Claude Code is that I can run it in my containers. I don’t want Cursor or other bloated, dependency-ridden, high-profile security targets on my desktop.
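For anyone who wants to do the same, the image is not much more than Node plus the npm package. Below is a simplified sketch rather than my exact setup - mount your project directory and pass your own credentials at run time.

    # Minimal sketch of a container for Claude Code (simplified, not my exact image)
    FROM node:20-slim

    # Claude Code ships as a global npm package that provides the `claude` binary
    RUN npm install -g @anthropic-ai/claude-code

    # Run as a non-root user and work against a mounted project directory
    RUN useradd -m dev
    USER dev
    WORKDIR /workspace

    # Supply ANTHROPIC_API_KEY (or your preferred auth) via `docker run -e ...`
    ENTRYPOINT ["claude"]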
I tried out Kiro last week on quite a gnarly date-time parsing problem. I had started with a two-hundred-ish-word prompt and a few code examples for context to describe the problem. Similar to OP, it forced me to stop and think more clearly about the problem I was trying to solve, and in the end it left my jaw on the floor as I watched it work through the task list.
I think the only early bit of feedback I had was that my tasks were also writing a lot of tests, and if the feedback loop for getting test results were neater this would be insanely powerful - something like a sandboxed terminal. I am less keen on a YOLO mode, and had to keep authorising the terminal to run.
This sort of comment always fascinates me. Having a machine do the last leaps for you is a time saver I guess, but I often wonder whether the real thing people are discovering again is that sitting down and really thinking about the problem you're trying to solve before writing some code results in better solutions when you get to it.
it's not that i haven't spent time thinking about it - i at least still do my thinking first, mostly on paper.
the LLM however asks me clarifying questions that i wouldn't have thought of myself. the thinking is a step or two deeper than it was before, if the LLM comes up with good questions
I kind of hate the implications of it, but if HN (or someone else) wanted to add value, they could show one-line sentiment analyses of the comments in the HN articles so you can decide what's what without even clicking.
I wanted a tiny helper tool to display my global keyboard shortcuts for me on macOS. I gave Kiro a short spec and some TypeScript describing the schema of the input data.
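The schema itself was tiny - something along these lines, which is an illustrative reconstruction rather than the exact types I handed it:

    // Illustrative sketch of the input schema, not the exact types I used
    interface ShortcutBinding {
      keys: string[];    // e.g. ["cmd", "shift", "4"]
      action: string;    // human-readable description of what the shortcut does
      app?: string;      // owning application, if the shortcut is app-specific
    }

    interface ShortcutGroup {
      title: string;     // section heading shown in the overlay
      bindings: ShortcutBinding[];
    }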
It wrote around 5000 LOC including tests and they... worked. It didn't look as nice as I would have liked, but I wasn't able to break it. However, 5000 lines was way too much code for such a simple task, the solution was over-engineered along every possible axis. I was able to (manually) get it down to ~800LOC without losing any important functionality.
> I was able to (manually) get it down to ~800LOC without losing any important functionality.
This is funny. Why would you a) care how many LOC it generated and b) bother injecting tedious, manual process into something otherwise fully automated?
What year is it? Back in 2000 or before, the same arguments were made about webpages made in Dreamweaver and Frontpage. Shortly after there was a big push towards making the web faster and more efficient, which included stepping away from web page builders and building tools that optimized and minified all aspects of a webpage.
I care about the complexity because I want/need to maintain the code down the line. I find it much easier to maintain shorter, simpler code than long, complex code.
Also because it was an experiment. I wanted to see how it would do and how reasonable the code it wrote was.
I am just gonna say it. This is not something Kiro came up with. People were already using this workflow. Perhaps they should've added more features instead of spending time making promo videos of themselves. I fail to see any added value here, especially considering it's Amazon. Sonnet 4 is effectively unlimited for many MAX users, so giving that away to work out their list of bugs is a non-starter.
at this point i think we ought to start having a tag in the submission title for when the submission is (primarily) llm-generated, like we do for video/pdf/nsfw
This blog post does approximate my sentiments pretty well, although its writing style diverges from my usual style.
Can't really get value out of reading this if you don't compare it to the leading coding agent.