Why has Google totally overcomplicated their subscription models?
Looking at "Google AI Ultra" it looks like I get this Jules thing, Gemini App, Notebook, etc. But if I want Gemini CLI, then I've got to go through the GCP hellscape of trying to create subscriptions, billing accounts then buying Google Code Assist or something, but then I can't get the Gemini app.
Then of course, this Google AI gives me YouTube Premium for some reason (no idea how that's related to anything).
It looks like there are two different entities inside Google that provide AI products.
In a professional context, for example: at my company we use both Google Workspace and GCP.
With Google Workspace, our subscription includes Gemini, Veo 3, Jules, etc. Everything is included in the subscription model, rate-limited but otherwise unlimited. The main entry point is gemini.google.com.
However, whenever we need to use the API, we have to go through GCP. It gives us access to some more advanced models, like Veo 3 instead of Veo 3 Fast, and more features. Usage is unlimited, but pay-as-you-go. The main entry point is GCP Vertex AI.
And the two teams, Google Workspace and GCP, are quite separate. They often don't really know what the other team provides.
To add to the confusion, you can also just use Gemini via API (without Vertex AI). It shows up as a separate item in billing.
In the latest (of three different) Go SDKs, you can use either Vertex AI or Gemini, but not every feature exists in both. Gemini can use uploaded files as attachments, and Vertex AI can use RAG stores, for example. Gemini uses API-key-based authentication, while Vertex AI uses the traditional credentials. All in the same SDK.
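For anyone who hasn't touched it, here's a minimal sketch of what that split looks like in the unified Go SDK (assuming the current google.golang.org/genai package; the field and constant names are from memory, so double-check against the package docs):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()

	// Gemini API backend: API-key based auth (e.g. a key from AI Studio).
	gemini, err := genai.NewClient(ctx, &genai.ClientConfig{
		APIKey:  "YOUR_API_KEY",
		Backend: genai.BackendGeminiAPI,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Vertex AI backend: traditional GCP credentials (ADC) plus project/location.
	vertex, err := genai.NewClient(ctx, &genai.ClientConfig{
		Backend:  genai.BackendVertexAI,
		Project:  "my-gcp-project",
		Location: "us-central1",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Same call surface on both clients, even though feature support differs
	// (uploaded-file attachments vs RAG stores, as noted above).
	for _, c := range []*genai.Client{gemini, vertex} {
		resp, err := c.Models.GenerateContent(ctx, "gemini-2.5-pro",
			genai.Text("Say hello."), nil)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(resp.Text())
	}
}
```

Same program either way; which backend you construct is what decides the auth model and where the billing shows up, which is the confusing part.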
This sounds like par for the course for Google indeed: multiple teams creating the same products and competing, without centralized product management or oversight.
Happens with their other products as well, e.g. their Meet / Chat / Voice / Hangouts products.
OT question about Google Workspaces:
What's the difference between My Drive, Shared Drives, and "Workspaces"? When would I want to use each in a team setup?
And God forbid you were an early Google for Domains adopter and have your own Google Workspace account because nothing fucking works right for those poor saps.
You think that's bad? I had my own Google Workspace account with Google Domains and then foolishly linked my Google Fi cellphone to it.
Trying to get that stuff resolved was such a pain that I eventually had to ask a friend who knew someone that worked at Google for assistance. Their support team had absolutely no public contact info available. I eventually managed to get my data and migrate the services I actually use (Google Fi and Youtube) to a non-workspace account.
The funny thing is that a few months later they tried to send a $60 bill to collections because they reopened the account for 2 days for me to migrate things off. I was originally going to pay it to just get them off my back, but Google's own collections agency wouldn't let me pay through card or check or anything. The only way I could pay was to "Log into your Google Workspace account" which NO LONGER EXISTED.
Now it's just an amusing story about incompetence to look back on, but at the time it was stressful because I almost lost my domains, cell phone number, and email addresses all at once. Now I never trust anything to a single company.
Because their main business is selling ads and maintaining their stranglehold on that market via Analytics, Chrome, Chromebooks, Android, SSO via Google, etc.
The dev-focused products are a sideshow amongst different fiefdoms at Google. They will never get first billing or focus.
But unlike some other pieces of the Ultra subscription, you can't share YouTube Premium with family. So now I have both, and Google has suggested a few times that I shouldn't be doing that.
> Then of course, this Google AI gives me YouTube Premium for some reason (no idea how that's related to anything).
One of the common tests I've seen for the Google models specifically is understanding of YT videos: Summarization, transcription, diarization, etc. One of their APIs allows you to provide a YT video ID rather than making you responsible for downloading the content yourself.
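To make that concrete, a rough sketch of what this looks like with the same Go SDK (hedged: the FileData/FileURI field names and the model name are from memory, and the URL is a placeholder):

```go
// Ask Gemini to work directly on a public YouTube video by URL, so you never
// download the video yourself. Reuses a *genai.Client built as in the
// earlier snippet (Gemini API backend).
func summarizeVideo(ctx context.Context, client *genai.Client, videoURL string) (string, error) {
	contents := []*genai.Content{{
		Role: "user",
		Parts: []*genai.Part{
			{Text: "Summarize this video and list the main topics covered."},
			{FileData: &genai.FileData{FileURI: videoURL}}, // e.g. a youtube.com/watch?v=... link
		},
	}}
	resp, err := client.Models.GenerateContent(ctx, "gemini-2.5-flash", contents, nil)
	if err != nil {
		return "", err
	}
	return resp.Text(), nil
}
```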
Tangentially, last week I asked Gemini's research mode to write up a strategy guide for a videogame based on a 20-episode "masterclass" youtube series. It did a surprisingly good job.
I was wondering about this too, and apparently they're working on integrating it, so the Google AI Pro/Ultra subscriptions will also give API/CLI credits or something -- https://github.com/google-gemini/gemini-cli/issues/1427
We've been trying to understand Google Workspace subscriptions, but it's a complete mess. It's not even clear which plans include Gmail and which don't. Google used to be the simple-but-great company; why do you feel so stranded when subscribing to their products now?
When you enter Google Cloud you end up in a shroud of darkness in which admins can’t see the projects created by users.
I'm the admin for our Google Workspace, and I can't even tell whether we have access to Google AI Studio or not. Their tutorials are complete bullshit, and the docs are just plain wrong because they reference things that aren't reflected in the platform.
I had to switch back to English because their automated translations are so awful. Did they really not think to have at least one person review each document once before releasing it to the public?!
It's a 450-billion-dollar company, and they can't see that they've added so many layers of confusion that 99% of their users won't need. The biggest issue is that they won't solve this anytime soon; they've dug themselves into a bottomless pit.
I subscribed to Ultra to give Deep Think a try, but I won't extend it even a day for all the other packaging, which feels like it was put together by someone working for a competitor. Who does these things? The fragmentation is crazy, as mentioned. Chinese deep agents may be entrenched (just kidding); it's crazy that someone up the ladder is so off point and doesn't worry about losing their job.
I installed Gemini CLI today, went to AI studio, got a free Gemini 2.5 Pro API key (without setting up billing or a credit card) and auth'd with Gemini CLI using the key. Took like 30 seconds. Unfortunately the results were much poorer than what I've been getting with Claude Code.
You're forgetting the 'Google Developer Premium' subscription ($300/y) which also bundles Gemini Code Assist / CLI, some Vertex credits, but none of the other Gemini things
One can make an argument that other Gemini stuff shouldn't be in there because it's not dev related, but Jules at least should
Oh geez, another subscription. This Google Developer Premium does look closer to something I would pay for, but really I just want something that gives me everything in a single subscription that I can actually use, like Claude, OpenAI, or most other services on the planet.
I'd guess Google is just a million disconnected teams with their own products and product managers and pricing schemes, and no one thinking about the overall strategy or cohesion.
I think it's possible that that may be an additional benefit (for Google), but to me it seems overwhelmingly more likely that the main explanation here is Conway's Law.
They block me from subscribing because I have a custom domain for my personal email. I’d gladly give them money but they say “Sign up with your personal email” when I try to subscribe. Such poor design
At this point, I'm not sure whether Google is just full of people trolling the world, or people so smart and unworldly that they are trolling themselves. The website for Jules has a section for plans, yet it mentions neither the price, nor which actual plan they're talking about, nor where I can find those f**ing plans; there isn't even a link. This is just ridiculous. Has Google already replaced all their people with AI?
I've been actually kind of enjoying using Jules as a way of "coding" my side project (a React Native app) from my phone.
I have very limited spare time these days, but sometimes on my walk to work I can think of an idea/feature, plan out what I want it to do (and sometimes use the GitHub app to revise the existing code), then send out a few jobs. By the time I get home in the evening I've got a few PRs to review. Most of the code is useless to me, but it usually runs, and it means I can jump straight into testing out the idea before going back and writing it properly myself.
Next step is to add automatic builds to each PR, so that on the way home I can just check out the different branches on my phone instead of waiting until I'm home to run the iOS simulator :D
I'm not sure how this translates to React Native; AFAICT the build chains for apps are less optimized. But using Vercel for deployment and Neon for the DB if needed, I've really been digging the ability for any branch/commit/PR to be deployed to a live site I can preview.
Coming from the Python ecosystem, I've found the commit -> deployed-code toolchain very easy, which for this kind of vibe coding really reduces friction when you're using it to explore functional features, many of which you will discard.
It moves the decision surface on what the right thing to build is to _after_ you have built it, which is quite interesting.
I will caveat this by saying this flow only works seamlessly if the feature is simple enough for the LLM to one-shot, but for the right thing it's an interesting flow.
I hooked up a GitHub repo that's long been abandoned by me and I've just been tinkering with menial stuff - updating dependencies, refactoring code without changing any actual implementation details, minor feature or style updates. It mostly works well for those use cases. I don't know if I'd give it anything important to develop though.
This is exactly why we built superconductor.dev, which has live app preview for each agent. We support Claude Code as well as Gemini, Codex, Amp. If you want to check it out just mention HN in your signup form and I’ll prioritize you :)
I tried Jules multiple times during the preview, almost once every week, and it's pretty terrible. Out of all the cloud coding assistants, it's the worst. I honestly thought it was just an experiment that got abandoned and never expected it to actually become a real product, similar to how GH Copilot Spaces was an experiment and turned into Copilot agent.
It does what it wants: it often just "finishes" a task prematurely, and asking follow-ups does nothing besides making it ramble for a bit. The env sometimes doesn't persist and stuff just stops working. For a while it just failed instantly because the model was dead or something.
Out of the dozen times I tried it, I think I merged maybe one of its PRs. The rest I trashed and reassigned to a different agent.
My ranking:
- Claude Code (through gh action), no surprise there
- ChatGPT Codex
- GitHub Copilot Agent
- Jules
I will try it again today to see if the full release changed anything (they give a 3-month free trial to previous testers), but if it's the same, I wouldn't use or pay for Jules. Just use Codex or the GitHub agent. Sorry for the harsh words.
Alright, I wanted to give Jules another fair try to see if it improved, but it's still terrible.
- It proposed a plan. I gave it some feedback on the plan, then clicked approve. I came back a few minutes later to "jules is waiting for input from you", but there was nothing to approve or click, it just asked "let me know if you're happy with the plan and I'll get started". I told Jules "I already approved the plan, get started" and it finally started
- I have `bun` installed through the environment config. The "validate config" button successfully printed the bun version, so it's installed. But when Jules tries to use bun, I get `-bash: bun: command not found` and it wastes a ton of time trying to install bun. Then bun was available, until it asked me for feedback; when I replied, bun went missing again. Now for whatever reason it prefixes every command with "chmod +x install_bun.sh && ./install_bun.sh", so every step it takes installs bun again.
- It did what I asked, then saw that the tests broke (none were breaking beforehand, our main branch is stable), and instead of fixing them it told me "they're unrelated to our changes". I told it to fix everything; it was unable to. I'm using the exact same setup instructions as with Copilot Agent, Codex, and Claude Code. Only Jules is failing.
- I thought I'd take over and see what it did, but because it didn't "finish", it didn't publish a branch. I asked it to push a branch; it started doing something and sat in "Thinking" for a while, apparently running the failing tests and lint again. But eventually it published the branch.
At this point I gave up. I don't have time to debug why bun is missing when it's available in the env configuration, or why it vanished between steps, or to figure out why only Jules isn't able to properly run our test suite. It took forever for a relatively small change, and each feedback iteration takes forever.
I'm sure it'll be great one day, and I'll continue to re-visit it, but for now I'll stick with the other 3 when I need an async agent
Similar experience. I would put Codex over Claude personally due to the better rate limits (which I haven't hit once yet, even on extensive days), but Jules was not very good: too messy, and I prefer alternative outputs to creating a pull request. Like in Codex you can copy a git patch, which is so incredibly useful for adding personal tweaks before committing.
I still need to try them, but I'm having a hard time envisioning async agents being nearly as useful to me as something local like Claude Code because of how often I need to intervene and ensure it is working correctly.
Won't the loop be pretty long-tail if you're using async agents? Like don't you have to pull the code, then go through a whole build/run/test cycle? Seems really tedious vs live-coding locally where I have a hot environment running and can immediately see if the agent goes off the rails.
We use async agents heavily. The key is to have proper validation loops, tests, and strong, well-phrased system prompts, so the agent can quickly see if something is broken or if it broke a convention.
We have proper issue descriptions that go into detail about what needs to be done, where the changes need to be made, and why. Break epics/stories down into smaller issues that can be chopped off easily. Not really different from a normal clean project workflow, really.
Now for most of the tickets we just assign them to agents, and 10 minutes later pull requests appear. The pull requests get screened with Gemini Code Assist or Copilot Agent to find obvious issues, and GitHub Actions check lint, formatting, tests, etc. Each branch gets pushed to its own separate test environment.
We review the code, test the implementation, when done, click merge. Finished.
I can focus on bigger, more complex things, while the agents fix bugs, handle small features, do refactorings, and so on in the background. It's very liberating. I am convinced that most companies/environments will end up with a similar setup and this will become the norm. There really isn't a reason not to use async agents.
Yeah sure if you give a giant epic to an agent it will probably act out of line, but you don't really have these issues when following a proper project management flow
"I honestly thought it was just an experiment that got abandoned and never expected it to actually become a real product, similar to how GH Copilot Spaces was an experiment and turned into Copilot agent."
My guess is that this is a play for the future. They know that current-day AIs can't really handle this in general... but if you wait for the AI that can and then try to implement this functionality you might get scooped. Why not implement it and wait for the AIs to catch up, is probably what they are thinking.
I'm skeptical LLMs can ever achieve this no matter how much we pour into them, but I don't expect LLMs to be the last word in AI either.
The daily task limit went down from 60 to 15 (edit: on the free plan) with this release. Personally I wasn't close to exhausting the limit because I had to spend time going back and forth, and fixing its code.
That's odd cuz my daily task limit went up to 100.
Are you on Google Pro or using it free?
Also, I've found that even with 60, over an entire full day/night of using it for different things, I never went over 10 tasks and didn't feel like I was losing anything. To be clear, I've used this every weekend for months and I mean that I've never gone over 10 on any one day, not overall.
15 should be plenty, especially if you aren't paying for it.
I will likely never use 100 even on my busiest of weekends
I tried it with this prompt a month or two ago, and again now:
"Write a basic raytracer in Rust that will help me learn the fundamentals of raytracing and 3D graphics programming."
Last time it apparently had working, or at least compiling, code, but it refused to push the changes to the branch so I could actually look at it. I asked it, cajoled it, guilted it, threatened it; it just would not push the damn code. So I have no idea if it worked.
This time it wrote some code, but it wrote two main.rs files in separate directories. It split the code randomly across the two directories and then got very confused about why it didn't run right. I explained the problem and it got very lost running around the whole filesystem trying to run the program or cargo in random directories, then gave up.
These tools are marketed as if you just drop the Jira text in and it spits out a finished PR. So it's notable that they don't work the way they are advertised.
There exist tools that can zero-shot complex tasks (Claude/Codex). The bar has been raised and Jules doesn't stack up. Knowing Google, it will probably improve in due time.
Good to see competition for Codex. I think cloud-based async agents like Codex and Jules are superior to the Claude Code/Aider/Cursor style of local integration. It's much safer to have them completely isolated from your own machine, and the loop of sending them commands, doing your own thing on your PC and then checking back whenever is way better than having to set up git worktrees or any other type of sandbox yourself
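For contrast, the "set it up yourself" part for local agents is roughly this much ceremony: one git worktree per task, so parallel agents don't trample each other's checkouts. A sketch driving plain git from Go (the branch naming and the agent hand-off are made-up conventions):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

// run executes a command in dir and streams its output.
func run(dir, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Dir = dir
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	task := "fix-login-timeout" // hypothetical task name
	worktree := "../wt-" + task

	// Isolated checkout on its own branch, so the agent can't touch your
	// working copy.
	if err := run(".", "git", "worktree", "add", "-b", "agent/"+task, worktree); err != nil {
		log.Fatal(err)
	}
	fmt.Println("point your local agent (Claude Code, Aider, ...) at", worktree)

	// When you're done reviewing/merging:
	//   git worktree remove ../wt-fix-login-timeout
	//   git branch -D agent/fix-login-timeout
}
```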
Codex/Jules are taking a very different approach than CC/Cursor.
There used to be this thesis in software of [Cathedral vs Bazaar](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar); the modern version of it is that you either:
1) build your own cathedral and bring the user to your house. It is a more controlled environment and deployment is easier, but the upside is more limited, and it also shows the model can't perform out-of-distribution. OpenAI has taken this approach for all of its agentic offerings, whether ChatGPT Agent or Codex.
2) go Bazaar, where you bring the agent to the user and let it interact with 1000 different apps/things/variables in their environment. It is 100x more difficult to pull off, and you need better models that are more adaptable, but the payoff is higher. The issues that you raised (env setup/config/etc.) are temporary and fixable.
This is the actual essence of CATB (the quote below is from Wikipedia), which has very little to do with your analogy:
-----
> The software essay contrasts two different free software development models:
> The cathedral model, in which source code is available with each software release, but code developed between releases is restricted to an exclusive group of software developers. GNU Emacs and GCC were presented as examples.
> The bazaar model, in which the code is developed over the Internet in view of the public. Raymond credits Linus Torvalds, leader of the Linux kernel project, as the inventor of this process. Raymond also provides anecdotal accounts of his own implementation of this model for the Fetchmail project
CATB was about how to organize people to tackle major community/collaborative efforts in a social system that is basically anarchy.
Both situations you've described are Cathedrals in the CATB sense: all dev costs are centralized and communities are impoverished by repeating the same dev work over and over and over and over.
It's safer to have them completely isolated, but it's slower and more expensive.
Sometimes I only realize that CC is going nuts and stop it before it goes too far (and consumes too much). With this async setup, you may come back after a couple of hours and see utter madness (and millions of tokens burned).
Completely agree. I also want to tightly control the output, and the more it just burns and burns, the more I become overwhelmed by a giant pile of work to review.
A tight feedback loop is best for me. The opposite of these async models. At least for now.
I think the Github-PR model for agent code suggestions is the path of least resistance for getting adoption from today's developers working in an existing codebase. It makes sense: these developers are already used to the idea and the ergonomics of doing code reviews this way.
But pushing this existing process - which was designed for limited participation of scarce people - onto a use-case of managing a potentially huge reservoir of agent suggestions is going to get brittle quickly. Basically more suggestions require a more streamlined and scriptable review workflow.
Which is why I think working in the command line with your agents - similar to Claude and Aider - is going to be where human maintainers can most leverage the deep scalability of async and parallel agents.
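For what it's worth, "streamlined and scriptable" doesn't have to mean much more than wrapping the gh CLI. A sketch of a terminal triage loop over agent-opened PRs (the "agent" label is an assumed convention; gh must be installed and authenticated):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// gh shells out to the GitHub CLI and streams its output.
func gh(args ...string) {
	cmd := exec.Command("gh", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		// gh pr checks exits non-zero when checks fail; just note it and move on.
		fmt.Fprintln(os.Stderr, "gh", strings.Join(args, " "), "->", err)
	}
}

func main() {
	// The queue of agent suggestions, assuming agent PRs carry an "agent" label.
	gh("pr", "list", "--label", "agent", "--state", "open")

	in := bufio.NewReader(os.Stdin)
	for {
		fmt.Print("PR number to review (empty to quit): ")
		line, _ := in.ReadString('\n')
		num := strings.TrimSpace(line)
		if num == "" {
			return
		}
		gh("pr", "checks", num) // CI status at a glance
		gh("pr", "diff", num)   // full diff in the terminal
	}
}
```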
> is way better than having to set up git worktrees or any other type of sandbox yourself
I've built up a helper library that does this for you, for either Aider or Claude, here: https://github.com/sutt/agro. And for FOSS purposes, I want to prevent MS, OpenAI, etc. from controlling the means of production for software, where you need to use their infra for sandboxing your dev environment.
And I've been writing about how to use CLI tricks to review the outputs on some case studies as well: https://github.com/sutt/agro/blob/master/docs/case-studies/i...
FWIW you can run Claude Code async via GitHub Actions and have it work on issues that you @-mention it from. There's even a slash command in Claude Code that will automatically set up your repository with the GitHub Action config to do this.
What kind of things are you coding while “on the road”? Phone addiction aside, the UX of tapping prompts into my phone and either collaborating with an agent, or waiting for a background agent to do its thing, is not very appealing.
I also just got an email tonight for early access to try CC in the browser. "Submit coding tasks from the web." "Pick up where Claude left off by teleporting tasks to your terminal" I'm most interested to see how the mobile web UI/UX is. I frequently will kick something off, have to handle something with my toddler, and wish I could check up on or nudge it quickly from my phone.
Getting the environment set up in the cloud is a pain vs just running in your environment, imo. I think we'll probably see both for the foreseeable future, but I am betting on the worse-is-better of CLI tools and IDE integrations winning over the next 2 years.
It took me like half an afternoon to get set up for my workplace's monorepo, but our stack is pretty much just Python and MongoDB so I guess that's easier. I agree, it's a significant trade-off, it just enables a very convenient workflow once it's done, and stuff like having it make 4 different versions with no speed loss is mind-blowing.
One nice perk on the ChatGPT Team and Enterprise plans is that Codex environments can be shared, so my work setting this up saved my coworkers a bunch of time. I pretty much just showed how it worked to my buddy and he got going instantly. No special environment instructions required.
> I think cloud-based async agents like Codex and Jules are superior to the Claude Code/Aider/Cursor style of local integration
Ideally, I feel like a combination of both would be a productive setup. I prefer the UI of Codex, where I can hand off boring stuff while I work on other things, but the machines they run Codex on are just too damn slow: compiling Rust takes forever, and it needs to continuously refetch/recompile dependencies instead of leveraging caching, compared to my local machine.
If I could have a UI + tools + state locally while the LLM inference is the only remote point, the whole workflow would end up so much faster.
I've tried using Jules for a side project, and the code quality it emits is much worse than GH Copilot (using Claude Sonnet), Gemini CLI, and Claude Code (which is odd, since it should have the same model as Gemini CLI). It also had a tendency to get confused in a monorepo: it would keep trying to `cd backend && $DO_STUFF` even when it was already in backend, and iterate by trying to change `$DO_STUFF` rather than figure out that it was already in the backend directory.
I just tried Jules for the first time and it did a fantastic job on reworking a whole data layer. Probably better than I would have expected from Copilot. So.. I'm initially impressed. We'll see how it holds up. I was really impressed with Copilot, but after a lot of use there are times when it gets really bogged down and confused and you waste all the time you would have saved. Which is the story of AI right now.
I used it to make a small change (adding colorful terminal output) to a side project. The PR was great. I am seeing that LLM coding agents excel at various things and suck at others quite randomly. I do appreciate the ease of simply writing a prompt and then sitting back while it generates a PR. That takes very little effort and so the pain of a failure isn't significant. You can always re-prompt.
I like the term "asynchronous coding agent" for this class of software. I found a couple of other examples of it in use, which makes me hope it's going to stick:
Looking at "Google AI Ultra" it looks like I get this Jules thing, Gemini App, Notebook, etc. But if I want Gemini CLI, then I've got to go through the GCP hellscape of trying to create subscriptions, billing accounts then buying Google Code Assist or something, but then I can't get the Gemini app.
Then of course, this Google AI gives me YouTube Premium for some reason (no idea how that's related to anything).
From a professional context for example, we are using in my company both Google Workspaces and GCP.
With Google Workspaces, we have including in our subscription Gemini, Veo 3, Jules, etc. Everything is included in a subscription models, rate-limited but unlimited. The main entrypoint is gemini.google.com
However, everytime we need to use the API, then we need to use GCP. It gives us access to some more advances models like Veo3 instead of Veo3-fast, and more features. It's unlimited usage, but pay-as-you-go. The main entrypoint is GCP Vertex AI
And both teams, Google Workspaces and GCP are quite separate. They often don't know really well what the others teams provides.
In the (latest, of three different) Go SDK, you can use either Vertex AI or Gemini. But not all features exist in either. Gemini can use uploaded files as attachments, and Vertex AI can use RAG stores, for example. Gemini uses API key based authentication, while Vertex AI uses the traditional credentials. All in the same SDK.
It's a mess.
"Can't share the subscription because the other person in your family is in another country."
Okay guess I'll change countr- "No you can't change your Google Workspace account's country."
To communicate with the Jules team join https://discord.gg/googlelabs
Not sure why folks continue to zero-shot things.
It's interesting that most people seem to prefer local code, I love that it allows me to code from my mobile phone while on the road.
It might be worth trying again.
"Jules now uses the advanced thinking capabilities of Gemini 2.5 Pro to develop coding plans, resulting in higher-quality code outputs"