The comment about having "3 to 6 hours per day" to work directly with code is the key insight here. I run a small AI consultancy and use Claude Code daily to deliver client projects — chatbots, automation pipelines, API integrations — and the spec-driven approach described in this post is what makes it actually work at scale.
The pattern I've converged on: spend the first 30 minutes writing detailed markdown specs (inputs, outputs, edge cases, integration points), then let Claude Code chew through the implementation while I review, test, and iterate. For a typical automation project — say a WhatsApp bot that handles booking flows and integrates with a client's CRM — this cuts delivery time roughly in half compared to writing everything manually.
The biggest practical lesson: the spec quality is everything. A vague spec produces code you'll spend more time debugging than you saved. A good spec with explicit error handling expectations, API response formats, and state transitions produces code that's 80-90% production-ready on the first pass.
Where I disagree slightly with the parallel agent approach: for client-facing work where correctness matters more than speed, I've found 2-3 focused agents (one on backend, one on frontend, one on tests) more reliable than 6-8 competing agents that create merge conflicts. The overhead of resolving conflicts and ensuring consistency across parallel outputs eats into the productivity gains fast.
I've recently started adding a PROJECT.md to all my own projects to keep the direction consistent.
Just something that tells the LLM (and me, as I tend to forget) what the actual purpose of the project is and which features should be added next.
In many cases the direction tends to get lost and the AI starts adding features as if it were building a multi-user SaaS, or helpfully adding things that aren't in scope for the project because I have another project that already does that.
I went through a sort of bell curve with this type of workflow over the summer:
- Base Claude Code (released)
- Extensive, self-orchestrated, local specs & documentation; ie waterfall for many features/longer term project goals (summer)
- Base Claude Code (today)
Claude Code is getting better at orchestrating its own subagents for divide/conquer type work.
My problem with these extensive, self-orchestrated multi-agent / spec modes is the drift and rot across all the changes and integrated parts of an application, which a lot of the time end up in merge conflicts. Aside from my own decision-making cognitive space, it's also a lot to orchestrate and review in general. I spent a ton of time enforcing that Claude use the system I put in place, including documentation updates and continuous logging of work.
I feel extremely productive with a single Claude Code for a project. Maybe for minor features, I'll launch Claude Code in the web so that it can operate in an isolated space to knock them out and create a PR.
I will plan and annotate extensively for large features, but not many features or broad project specs all at the same time. Annotation and better planning UX, I think, are going to be increasingly important for now. The only augment of Claude Code I have is a hook for plan mode review: https://github.com/backnotprop/plannotator
The merge conflicts and cognitive load are indeed two big struggles with my setup. Going back to a single Claude instance, however, would mean I'm waiting for things to happen most of the time. What do you do while Claude is busy?
It is one of those things I look at and think: yeah, you are hyper productive... but cognitively it looks like being a pilot landing a plane all day long, and not what I signed up for. Where is my walk in the local park where I think through stuff and come up with a great idea :(
I use claude-code. It now spins up many agents on its own, sometimes switches models to save costs, can easily use 200+ tools concurrently, and uses multiple skills at the same time when needed. Its automation gets smarter and more parallel by the day; do we still need to outwit what claude-code probably already does itself? I still use tmux, but no longer for multiple agents, just for me to poke around at will. I let the plan/code/review/whatever be fully managed and parallelized by claude-code itself, and it's massively impressive.
This rings true, as I've noticed that with every new model update I'm leaving behind full workflows I've built. The article is really great, and I do admire the system, even if it is overengineered in places, but it already reads like last quarter's workflow. Now letting Codex 5.3 xhigh chug for 30 minutes on my super long dictated prompt seems to do the trick, and I'm hearing 5.4 is a meaningfully better model. Also, for fully autonomous scaffolding of new projects toward a first prototype, I have my own version of a very simple Ralph loop that gets fed a gpt-pro super-spec file.
The deny list section hit home. I keep seeing agents use unlink instead of rm, or spawn a python subprocess to delete files. Every new rule just taught the agent a new workaround.
Ended up flipping the model — instead of blocking bad actions, require proof of safety before any action runs. No proof, no action. Much harder to route around.
Nothing super fancy.
For me “proof” just means the agent has to make its intent explicit in a way I can check before running it.
For example:
1) If it wants to delete a file, it has to output the exact path it thinks it’s deleting. I normalize it and make sure it’s inside the project root. If not, I block it.
2) If it proposes a big change, I require a diff first instead of letting it execute directly.
3) After code changes, I run tests or at least a lint/type check before accepting it.
So it’s less about formal proofs and more about forcing the agent to surface assumptions in a structured way, then verifying those assumptions mechanically.
Still hacky, but it reduced the “creative workaround” behavior a lot.
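For what it's worth, the path check in (1) fits in a few lines of Python. This is only a sketch; the project root and the function name are made up for illustration:

```python
import os

PROJECT_ROOT = os.path.realpath("/home/me/project")  # assumption: your repo root

def is_safe_delete(declared_path: str) -> bool:
    """Allow a delete only if the agent-declared path resolves inside the project root."""
    resolved = os.path.realpath(declared_path)  # collapses ../, ./, and symlinks
    return os.path.commonpath([PROJECT_ROOT, resolved]) == PROJECT_ROOT

# No declared path, no delete; an out-of-root target is blocked before anything runs.
assert is_safe_delete("/home/me/project/build/tmp.o")
assert not is_safe_delete("/home/me/project/../../../etc/passwd")
```

The important part is that the check runs on the agent's declared intent, not on whatever shell command it eventually emits.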
I'd love to see what is being achieved by these massive parallel agent approaches. If it's so much more productive, where is all the great software that's being built with it? What is the OP building?
Most of what I'm seeing is AI influencers promoting their shovels.
> If it's so much more productive, where is all the great software that's being built with it?
This is such a new and emerging area that I don't understand how this is a constructive comment on any level.
You can be skeptical of the technology in good faith, but I think one shouldn't be against people being curious and engaging in experimentation. A lot of us are actively trying to see what exactly we can build with this, and I'm not an AI influencer by any means. How do we find out without trying?
I still feel like we're at a "building tools to build tools" stage in multi-agent coding, with a lot of interesting projects springing up to see if they can get many agents to effectively coordinate on a project. If anything, it would be useful to understand what failed and why, so one can have an informed opinion.
I don't think it is unreasonable to ask where all the great AI-built software is. There have been comments here on HN about people becoming 30 to 50 times more productive than before.
To put a statement like that into perspective (50 times more productive): in the first week of the year you would accomplish about as much as in the whole previous year put together.
The hard part about extracting patterns right now is that they shift every 2-4 months (it was every 6-12 months in 2024-2025). What works for you today might be obsolete in May.
I just avoided $1.8 million/year in review time w/ parallel agents for a code review workflow.
We have 500+ custom rules that are context-sensitive, because I work on a large, performance-sensitive C++ codebase with cooperative multitasking. Many of the good patterns are non-intuitive, and commercial code review tools don't get 100% coverage of the rules, so reviews took a lot of senior engineering time.
Anyways, I set up a massively parallel agent infrastructure in CI that chunks the review guidelines into tickets, adds them to a queue, and has agents spit out GitHub code review comments. A manager agent then validates the comments/suggestions using scripts and posts the review. Since these are coding agents, they can autonomously gather context or run code to validate their suggestions.
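The chunking front end of a pipeline like that is simple to sketch. Everything here is a stand-in: the rule names, the chunk size, and the in-process queue would all be replaced by whatever your CI actually uses:

```python
from queue import Queue

# Stand-in for the 500+ context-sensitive review rules.
RULES = [f"rule-{i}" for i in range(500)]

def chunk_rules(rules, chunk_size=25):
    """Split the guideline list into review tickets, one per agent."""
    return [rules[i:i + chunk_size] for i in range(0, len(rules), chunk_size)]

ticket_queue = Queue()
for ticket in chunk_rules(RULES):
    ticket_queue.put(ticket)  # each agent pops a ticket and reviews the diff against it

# 500 rules at 25 per ticket -> 20 tickets
assert ticket_queue.qsize() == 20
```

The manager agent then consumes the per-ticket comments and filters them before anything reaches the PR.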
It instantly reduced mean time to merge by 20% in an A/B test. Assuming 50% of engineers' time goes to review, my org would've needed 285 more review hours a week for the same effect. It's super high signal as well: it catches far more than any human can and never gets tired.
Likewise, we can scale this to any arbitrary review task, so I'm looking at adding benchmarking and performance tuning suggestions for menial profiling tasks like "what data structure should I use".
This is what Google uses in their internal review systems - at least their AI team does this.
Heard a presentation from one of their AI engineers where they had a few slides about using multi-agent systems with different focuses looking through the code before a single human is pinged to look at the pull request.
That sounds like a completely made up bullshit number that a junior engineer would put on a resume. There’s absolutely no way you have enough data to state that with anything approaching the confidence you just did.
It's for personal use, and I wouldn't call it great software, but I used Claude Code Teams in parallel to create a Fluxbox-compatible window compositor for Wayland [1].
Overall effort was a few days of agentic vibe-coding over a period of about 3 weeks. It would have been faster, but the parallel agents burn through tokens extremely quickly and hit Max plan limits in under an hour.
Even if somebody shows you what they've built with it, you're none the wiser. All you'll know is that it seemingly works well enough for a greenfield project.
The jury is still very far out on how agentic development affects mid/long term speed and quality. Those feedback cycles are measured in years, not weeks. If we bother to measure at all.
People in our field generally don't do what they know works, because by and large, nobody really knows, beyond personal experiences, and I guess a critical mass doesn't even really care. We do what we believe works. Programming is a pop culture.
I am now releasing software for projects that have spent years on the back-burner. From my perspective, agent loops have been a success. It makes the impractical pipe-dream doable.
I'm using Claude Code (loving it) and haven't dipped into the agentic parallel worker stuff yet.
Where does one get started?
How do you manage multiple agents working in parallel on a single project? Surely not the same working directory tree, right? Copies? Different branches / PRs?
You can't use your Claude Code login and have to pay API prices, right? How expensive does it get?
Does good design up front matter as much if an AI can refactor in a few hours something that would take a good developer a month? Refactoring is one of those tasks that's tedious, and too non-trivial for automation, but seems perfect for an AI. Especially if you already have all the tests.
I work for Snowflake and the code I'm building is internal. I'm exploring open sourcing my main project which I built with this system. I'd love to share it one day!
The long tail of deployable software always strikes at some point, and monetization is not the first thing I think of when I look at my personal backlog.
I also am a tmux+claude enjoyer, highly recommended.
There are dozens and dozens of these submitted to Show HN, though increasingly without the title prefix now. This one doesn't seem any more interesting than the others.
I picked up a number of things from others sharing their setups. While I agree some aspects of these are repetitive (like using md files for planning), I do find useful things here and there.
I'm experimenting with building an agent swarm to take a very large existing app (internal to the company I work for, built over the past two decades) and reverse-engineer documentation from the code. I can then use that documentation as the basis for my teams to refactor big chunks of old, no-longer-owned-by-anyone features and to build new features with AI more effectively. The initial work of building a large-scale understanding of exactly what we actually run in prod is a massively parallelizable task that should be a good fit for documentation-writing agents. Early days, but so far my experiments seem to be working out.
Obviously no users will see a benefit directly but I reckon it'll speed up delivery of code a lot.
From personal experience, software that was developed with agents does not hit the road because:
a) learning and adapting is at first more effort, not less,
b) learning with experiments is faster,
c) experiencing the acceleration first hand is demoralising,
d) distribution/marketing is on an accelerated declining efficiency trajectory (if you want to keep it human-generated)
e) maintenance effort is not decelerating as fast as creation effort
Yet I believe your statement is wrong in the first place. A lot of new code is already created with AI assistance, and part of the acceleration in AI itself can be attributed to increased use of AI in software engineering (from research to planning to execution).
In my view, these agent teams have really only become mainstream in the last ~3 weeks, since Claude Code released them. Before that they were out there but much more niche, like in Factory or Ralphie Wiggum.
There is a component to this that keeps a lot of the software being built with these tools underground: there are a lot of very vocal people who are quick with downvotes and criticisms of things built with AI tooling, criticisms that wouldn't have been applied to the same result (or even a poorer one) if it had been generated by a human.
This is largely why I haven't released one of the tools I've built for internal use: an easy status dashboard for operations people.
Things I've done with agent teams:
- Added a first-class ZFS backend to Ganeti
- Rebuilt our "icebreaker" app that we use internally (largely to add special effects and make it more fun)
- Built a "filesystem swiss army knife" for Ansible
- Converted a Lambda function that does image manipulation and watermarking from Pillow to pyvips, and had it build versions in Go, Rust, and Zig for comparison's sake
- Built tooling for regenerating our cache of watermarked images using new branding
- Had it connect to a pair of MS SQL test servers and identify why log shipping was broken between them
- Built an Ansible playbook to deploy a new AWS account
- Made a simple video poker web app (a demo for the local users group; someone there was asking how to get started with AI)
- Had it brainstorm and build 3 versions of a crossword-themed daily puzzle (just to see what it'd come up with; my wife and I are enjoying TiledWords and I wanted to see what AI would do)
Those are the most memorable things I've used the agent teams to build in the last 3 weeks. Many of those things are internal tools or just toys, as another reply said. Some of those are publicly released or in progress for release. Most of these are in addition to my normal work, rather than as a part of it.
Further, my POV is that coding agents crossed a chasm only last December, with the Opus 4.5 release. Only since then have these kinds of agent-team setups actually worked. It's early days for agent orchestration.
Most software is mundane, run-of-the-mill CRUD feature sets. Just yesterday I rolled out 5 new web pages and revamped a landing page in under an hour; that would have easily taken 3-4 days of back and forth.
There is a lot of similar coding happening.
This is the space where AI coding truly shines: repetitive work, all the wiring and routing around adding links, SEO elements, and what not.
Either way, you can try incorporating AI coding into your flow and see where it takes you.
You're not wrong. The current bottleneck is validation. If you use orchestration to ship faster, you have less time to validate what you're building, and the quality goes down.
If you have a really big test suite to build against, you can do more, but we're still a ways off from dark software factories being viable. I guessed ~3 years back in mid 2025 and people thought I was crazy at the time, but I think it's a safe time frame.
People are building for themselves. However, I'd also reference www.Every.to
They built the popular compound-engineering plugin and have shipped a set of production grade consumer apps. They offer a monthly subscription and keep adding to that subscription by shipping more tools.
There are so many more iOS apps being published that it takes a week to get a dev account, review times are longer, and app volume is way up. It's not really a thing you're going to notice (or not) if you're just going by vibes.
The bigger question for me is how to use this efficiently as a team of engineers. Most workflow tools I've seen so far focus on making a single engineer get more out of a Claude/Codex subscription, but not much on how teams as a whole can become more productive.
My hunch is to experiment not as a team but individually. With teams you want a bit more stability in terms of workflows. A lot of this stuff involves people handcrafting workflows on top of tools and models that are dramatically changing nearly constantly. That kind of chaos is not something you want at the team level.
I'm mostly sticking to a Codex workflow. I transitioned from the CLI to their app when they released it a few weeks ago, and I'm pretty happy with that. I've had to order extra tokens a few times, but most weeks I get by on the $20 ChatGPT Plus subscription. That's not really compatible with burning hundreds/thousands on lots of parallel agents in any case.
I also have a hunch that there are fast diminishing returns on that kind of spending. At least, I seem to get a lot of value out of just $20/month. A lot of that more extreme burn might just be tool churn / inefficiency.
With teams, you should basically organize around CI/CD, pull requests, and code reviews (with or without AI assists). Standard stuff; you should be doing that anyway. But doubling down on making this process fast and efficient pays off. With LLMs, the addition is codifying/documenting key skills in your repositories: how to do things with your code base, your ways of working. A key thing in teams is to own and iterate on that material and not let it rot. PRs against it should be well reviewed and coordinated, not just sneaked in.
Otherwise, AI usage just increases the volume of PRs and changes. Most of these tools in any case work a lot better if you have a good harness around your workflow that allows it to run linting/tests, etc. If you have good CI, this shouldn't be hard to express in skill form. The issue then becomes making sure the team gets good at producing high quality PRs and processing them efficiently. If you are dealing with a lot of conflicts, PR scope creep, etc. that's probably not optimal.
A lot of the coordination via issue trackers can also be done with agents. If you have the gh CLI set up, an agent can actually create, label, and otherwise act on GitHub issues. That opens the door to using LLMs for broader product management, something I've been meaning to experiment with more, and for bigger teams it could be something to lean on. LLMs filing lots of issues is only helpful if you have the means to stay on top of them, though; that requires workflows where most issues are short-lived (time to some kind of resolution), which is not something many teams are good at currently.
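As a concrete sketch of the gh-based filing: the helper names below are made up, but `gh issue create` and its `--title`/`--body`/`--label` flags are real, and this is all an agent needs to be able to call:

```python
import subprocess

def build_issue_cmd(title: str, body: str, labels: list[str]) -> list[str]:
    """Assemble a `gh issue create` invocation; gh must already be authenticated."""
    cmd = ["gh", "issue", "create", "--title", title, "--body", body]
    for label in labels:
        cmd += ["--label", label]
    return cmd

def file_issue(title: str, body: str, labels: list[str]) -> None:
    subprocess.run(build_issue_cmd(title, body, labels), check=True)

# e.g. an agent surfacing a flaky test it noticed while working on something else:
# file_issue("Flaky test: test_login_timeout", "Fails ~1 in 20 CI runs.", ["ci", "flaky"])
```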
I certainly don't run 6 at a time, but even with just 1: if it's doing anything visual, how are folks hooking up screenshots so it can self-verify? And how do you keep an eye on it?
The only solution I've seen on a Mac is doing it on a separate monitor.
I couldn't find a solution here, and since I've built similar things in the past I took a crack at it using CGVirtualDisplay.
I ended up adding a lot of productivity features and polished it until it felt good.
Curious if there are similar solutions out there I just haven't seen.
For macOS, generically, you can run `screencapture -o -l $WINDOW_ID output.png` to screenshot any window. You can list window IDs belonging to a PID with a few lines of Swift (that any agent will generate). Hook this up together and give it as a tool to your agents.
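Wrapped as an agent tool, that command is little more than subprocess glue. The function names here are arbitrary, and window-ID discovery is left to the Swift snippet mentioned above:

```python
import subprocess

def build_capture_cmd(window_id: int, out_path: str = "output.png") -> list[str]:
    """`screencapture -o` skips the window shadow; `-l` targets one window ID."""
    return ["screencapture", "-o", "-l", str(window_id), out_path]

def screenshot_window(window_id: int, out_path: str = "output.png") -> str:
    """Capture a single macOS window and return the image path for the agent to read."""
    subprocess.run(build_capture_cmd(window_id, out_path), check=True)
    return out_path
```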
For major, in depth refactors and large scale architectural work, it's really important to keep the agents on-track, to prevent them from assuming or misunderstanding important things, or whatever — I can't imagine what it'd be like doing parallel agents. I don't see how that's useful. And I'm a massive fan of agentic coding!
It's like OpenClaw for me — I love the idea of agentic computer use; but I just don't see how something so unsupervised and unsupervisable is remotely a useful or good idea.
I have found that with a good plan we are able to do big refactors quite a bit faster. The approach: our /create-plan command starts high level, and only once we agree on that does it fill in the details. It also determines in which pull requests it plans to deliver the work. The size estimation of the PRs is never correct, but it gives a good-enough phase split for the next step, which is letting it rip with a "Ralph loop" (just a bash while loop around `claude -p --yolo`), with instructions to use jj (or git) and some other must-read skills.
This lets us review the end result and correct it with a review, which then gets incorporated while Claude reworks the actual small PRs that we can easily review and touch up.
I must say jj helps massively in staying sane and rebasing a lot. Claude fixes the conflicts fine.
We have been able to push ~5K of changes in a couple of days, whilst reviewing all the code and making sure it's on par with our quality requirements, and without writing a line of code ourselves.
I would have never attempted these large scale refactors, and we would have been stuck with the tech debt forever in the past.
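For reference, a "Ralph loop" in the sense above is barely more than a retry loop around the CLI. A Python sketch, where the prompt, the done-marker convention, and the iteration cap are all illustrative (`--dangerously-skip-permissions` is the real flag people nickname `--yolo`):

```python
import subprocess

PROMPT = "Read PLAN.md, pick the next unfinished phase, implement it, and commit."

def run_claude(prompt: str) -> str:
    """One unattended pass of the agent; returns whatever it printed."""
    result = subprocess.run(
        ["claude", "-p", prompt, "--dangerously-skip-permissions"],
        capture_output=True, text=True,
    )
    return result.stdout

def ralph_loop(run=run_claude, max_iterations: int = 20) -> int:
    """Re-run the agent until it reports the plan complete; returns passes used."""
    for i in range(1, max_iterations + 1):
        # The plan file asks the agent to print this marker when nothing is left to do.
        if "ALL PHASES DONE" in run(PROMPT):
            return i
    return max_iterations
```

Injecting `run` keeps the loop testable without burning tokens.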
- Research
- Scan the web
- Text friends
- Side projects
- Take walks outside
etc
Curious if you've tried anything similar.
1. https://github.com/ecliptik/fluxland
I actually had a manager once who would say Done-Done-Done. He’s clearly seen some shit too.
Obviously no users will see a benefit directly but I reckon it'll speed up delivery of code a lot.
a) learning and adapting is at first more effort, not less, b) learning with experiments is faster, c) experiencing the acceleration first hand is demoralising, d) distribution/marketing is on an accelerated declining efficiency trajectory (if you want to keep it human-generated) e) maintenance effort is not decelerating as fast as creation effort
Yet, I believe your statement is wrong, in the first place. A lot of new code is created with AI assistance, already and part of the acceleration in AI itself can be attributed to increased use of ai in software engineering (from research to planning to execution).
There is a component to this that keeps a lot of the software being built with these tools underground: There are a lot of very vocal people who are quick with downvotes and criticisms about things that have been built with the AI tooling, which wouldn't have been applied to the same result (or even poorer result) if generated by human.
This is largely why I haven't released one of the tools I've built for internal use: an easy status dashboard for operations people.
Things I've done with agent teams: Added a first-class ZFS backend to ganeti, rebuilt our "icebreaker" app that we use internally (largely to add special effects and make it more fun), built a "filesystem swiss army knife" for Ansible, converted a Lambda function that does image manipulation and watermarking from Pillow to pyvips, also had it build versions of it in go, rust, and zig for comparison sake, build tooling for regenerating our cache of watermarked images using new branding, have it connect to a pair of MS SQL test servers and identify why logshipping was broken between them, build an Ansible playbook to deploy a new AWS account, make a web app that does a simple video poker app (demo to show the local users group, someone there was asking how to get started with AI), having it brainstorm and build 3 versions of a crossword-themed daily puzzle (just to see what it'd come up with, my wife and I are enjoying TiledWords and I wanted to see what AI would come up with).
Those are the most memorable things I've used the agent teams to build in the last 3 weeks. Many of those things are internal tools or just toys, as another reply said. Some of those are publicly released or in progress for release. Most of these are in addition to my normal work, rather than as a part of it.
Most software is mundane, run-of-the-mill CRUD work. Just yesterday I rolled out 5 new web pages and revamped a landing page in under an hour; that would have easily taken 3-4 days of back and forth.
There is a lot of similar coding happening.
This is the space where AI coding truly shines: repetitive work, all the wiring and routing around adding links, SEO elements, and whatnot.
Either way, you can try incorporating AI coding into your own workflow and see where it takes you.
https://git.ceux.org/cashflow.git/
If you have a really big test suite to build against, you can do more, but we're still a ways off from dark software factories being viable. I guessed ~3 years back in mid 2025 and people thought I was crazy at the time, but I think it's a safe time frame.
They built the popular compound-engineering plugin and have shipped a set of production-grade consumer apps. They offer a monthly subscription and keep adding to it by shipping more tools.
Any ideas?
I'm mostly sticking to a Codex workflow. I transitioned from the CLI to their app when they released it a few weeks ago and I'm pretty happy with that. I've had to order extra tokens a few times, but most weeks I get by on the $20 ChatGPT Plus subscription. That's not really compatible with burning hundreds or thousands on lots of parallel agents in any case.
I also have a hunch that there are fast diminishing returns on that kind of spending. At least, I seem to get a lot of value out of just $20/month. A lot of that more extreme burn might just be tool churn and inefficiency.
With teams, you should basically organize around CI/CD, pull requests, and code reviews (with or without AI assists). Standard stuff; you should be doing that anyway. But doubling down on making this process fast and efficient pays off. With LLMs, the addition would be codifying and documenting key skills in your repositories: how to do things with your code base, and your ways of working. A key thing in teams is to own and iterate on that material rather than letting it rot. PRs against it should be well reviewed and coordinated, not just sneaked in.
Otherwise, AI usage just increases the volume of PRs and changes. Most of these tools in any case work a lot better if you have a good harness around your workflow that allows it to run linting/tests, etc. If you have good CI, this shouldn't be hard to express in skill form. The issue then becomes making sure the team gets good at producing high quality PRs and processing them efficiently. If you are dealing with a lot of conflicts, PR scope creep, etc. that's probably not optimal.
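The "harness" point can be made concrete with a tiny check entrypoint that a skill doc points the agent at before it opens a PR. This is only a sketch under assumptions: `ruff` and `pytest` here are placeholders for whatever linter and test runner your project actually uses.

```shell
#!/usr/bin/env sh
# Hypothetical single "check" entrypoint an agent (or CI) can run.
# Substitute your project's real lint/test commands for ruff/pytest.
status=0

# Run a tool only if it is installed, so the script degrades gracefully
# on machines that lack part of the toolchain; record any failure.
run_check() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" || status=1
  else
    echo "skip: $1 not installed"
  fi
}

run_check ruff check .
run_check pytest -q

if [ "$status" -eq 0 ]; then
  echo "checks passed"
else
  echo "checks failed"
fi
```

The point is less the specific tools than having one stable command the skill doc can name, so the agent doesn't have to rediscover your workflow every session.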
A lot of the coordination that happens via issue trackers can also be done with agents. If you have the gh CLI set up, an agent can actually create, label, and otherwise act on GitHub issues. That opens the door to using LLMs for broader product management, something I've been meaning to experiment with more; for bigger teams it could be something to lean on. But LLMs filing lots of issues is only helpful if you have the means to stay on top of them, which requires workflows where most issues are short-lived (time to some kind of resolution). That's not something many teams are good at currently.
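As a dry-run sketch of what that gh usage looks like: `gh issue create`, `gh issue list`, and the `--title`/`--label`/`--state` flags are real gh CLI options, but the repo name and label below are made up, and the commands are echoed rather than executed.

```shell
# Hypothetical helper: print (not run) the gh commands an agent would use
# to file an issue. "acme/app" and the "agent-filed" label are invented;
# drop the echo once you trust the output.
file_issue() {
  echo gh issue create --repo acme/app \
    --title "\"$1\"" --label agent-filed
}

file_issue "Flaky checkout test on CI"

# Triage side: list everything the agent has filed that is still open.
echo gh issue list --repo acme/app --label agent-filed --state open
```

Labeling agent-filed issues separately is what makes the "stay on top of it" part tractable: you can review, close, or promote them in one query.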
The only solution I've seen on a Mac is doing it on a separate monitor.
I couldn't find a solution here and have built similar things in the past so I took a crack at it using CGVirtualDisplay.
I ended up adding a lot of productivity features and polishing it until it felt good.
Curious if there are similar solutions out there I just haven't seen.
https://github.com/jasonjmcghee/orcv
It's like OpenClaw for me: I love the idea of agentic computer use, but I just don't see how something so unsupervised and unsupervisable is remotely useful or a good idea.