theredsix commented on Show HN: Open-source browser for AI agents   github.com/theredsix/agen... · Posted by u/theredsix
mahendra0203 · 2 days ago
Freezing JS execution between actions is the kind of obvious idea that nobody did properly until now. Kudos for actually forking Chromium instead of hacking around Playwright like everybody else.

But here's my thought: you're solving the "stale state" problem by making the browser deterministic. Real websites aren't deterministic. WebSocket pushes, long-polling, background fetches, animations that don't finish: freezing execution doesn't pause the server. The moment you unfreeze, the world may have moved.

90.5% on Mind2Web is great. But Mind2Web tasks are mostly "fill a form, click submit." The brutal failures happen on SPAs with optimistic UI updates, where the DOM says "saved" but the network request hasn't finished. Does ABP handle that case, or does the freeze just delay the confusion?

Genuine question, not trying to tear this down. The architecture is smart. I just wonder if "make the browser simpler for the agent" eventually hits a wall where you need to make the agent smarter about async instead.

theredsix · 2 days ago
The freeze does sometimes capture in-between states. What I've seen the agent do in those cases is recognize that it's in between states and call browser_wait(). Where the agent goes off the rails isn't a snapshot in the middle of a state transition (it's smart enough to know to retry in that case); it's when the DOM changes after the agent believes the page has settled.

For async, lots of people smarter than me are working on the smarter-agent problem. There's a latency floor with inference, though, due to prompt processing and output generation. Without tools like ABP, the LLM is always aiming at a moving target.
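
To make the "wait until the page has settled" behaviour concrete, here is a minimal Python sketch. It is not ABP's implementation: `take_snapshot` and `wait` are hypothetical stand-ins for whatever the agent harness uses to read the frozen DOM and to call something like browser_wait(), and "settled" is assumed to mean that consecutive snapshots hash to the same value.

```python
import hashlib
from typing import Callable, Tuple


def dom_fingerprint(dom: str) -> str:
    """Hash a serialized DOM so snapshots can be compared cheaply."""
    return hashlib.sha256(dom.encode("utf-8")).hexdigest()


def wait_until_settled(
    take_snapshot: Callable[[], str],  # hypothetical: returns the frozen DOM as text
    wait: Callable[[], None],          # hypothetical: lets the page run for a while
    stable_rounds: int = 2,
    max_rounds: int = 10,
) -> Tuple[str, bool]:
    """Return (last snapshot, settled?).

    The page counts as settled once `stable_rounds` consecutive snapshots
    are identical. After `max_rounds` waits the loop gives up and reports
    whatever it saw last.
    """
    last = take_snapshot()
    stable = 1
    for _ in range(max_rounds):
        if stable >= stable_rounds:
            return last, True
        wait()
        current = take_snapshot()
        if dom_fingerprint(current) == dom_fingerprint(last):
            stable += 1
        else:
            stable = 1  # the DOM moved, start counting again
        last = current
    return last, stable >= stable_rounds
```

A loop like this only narrows the window. It cannot rule out the failure described above, where the DOM changes after the page already looked settled, because a server push can still arrive once the check has passed.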

siva7 · 2 days ago
Call me impressed between all that vibe-coded crap nowadays and this vibe-coded masterpiece
theredsix · 2 days ago
*bows
KurSix · 3 days ago
Google is never going to upstream Chromium code that lets an external API arbitrarily freeze V8 and the render loop, purely based on the security model and stability requirements of a consumer browser. Your only real path forward is maintaining a custom patchset on top of stable releases, exactly like Brave or Electron do. Just be prepared that Claude won't save you when they inevitably rewrite the Blink architecture again
theredsix · 2 days ago
It's a long shot but getting ABP to be a first party citizen alongside CDP would be my dream!
multidude · 3 days ago
The stale state problem is real and underappreciated. I've been running browser automation through OpenClaw and the failure modes you describe — modal appears after screenshot, dropdown covers the target element — are exactly what causes silent failures that are hard to debug. The agent "succeeds" from its perspective because it acted on the last known state.

The freeze-then-capture approach is interesting. Curious how it handles pages with aggressive anti-bot detection that fingerprints headless Chromium forks — that's the other failure mode I keep hitting.

theredsix · 2 days ago
Right now, it's evading all the anti-bot detectors I've tested it on. I believe that's because it runs in headful mode and I've removed all detectable CDP signatures. Input events are also simulated at the system level (typing is at 200 WPM), so it's very hard for a page's JavaScript to detect that it's not running in a human-operated Chrome. A lot of headless detection relies on WebGPU capabilities being disabled, since a modern computer is very unlikely not to support them. You could also wire up one of the Heretic models as a dedicated CAPTCHA solver; I recommend Qwen 3.5 27b Heretic! https://huggingface.co/coder3101/Qwen3.5-27B-heretic
KurSix · 3 days ago
Finally someone realized that CDP just doesn't cut it for agents and dug straight into the engine. Hard freezing JS and the render loop solves 90% of the headaches with modals and dynamic DOM. Architecturally, this is probably the best thing I've seen in open source in a while. The only massive red flag is maintaining the fork - manually merging Chromium updates is an absolute meat grinder
theredsix · 2 days ago
Maintaining the fork isn't so bad: the core Chromium changes are only a few hundred lines, and I was able to extend existing concepts like debugger pausing and virtual time emulation while riding on Mojo IPC for cross-thread communication.
seanrrr · 3 days ago
> Pause JavaScript + virtual time

Very cool! Sometimes when I try to debug things with the Chrome DevTools MCP, Claude clicks something, too many things happen at once, and it comes to the wrong conclusions about the state of things. So it sounds like this should give it a more accurate slice of time / snapshot of things.

theredsix · 3 days ago
Exactly! This kind of race condition is precisely the category of problem ABP will solve.
jlu · 3 days ago
Glad to know that, but being able to run the browser in headless mode would be very helpful in an agentic setting (think parallel agents operating browsers in the background). Since you are already patching Chromium, that might be a great addition to the feature list :)
theredsix · 3 days ago
Yes agreed, added to the roadmap!
jazzyjackson · 3 days ago
Have you thought about ways to let the agent select a portion of the page to read into context instead of just pumping in the entire markup or inner text?

I had good luck letting Claude use an xml parser to get a tree of the file, and then write xpath selections to grab what it needed
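
A rough sketch of that outline-then-select workflow, using only Python's standard library. The sample markup and the helper names `outline` and `select_text` are made up for illustration, and it assumes the page is well-formed XML; real-world HTML usually needs a forgiving parser first. Note that `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page (an assumption for this sketch).
PAGE = """<html><body>
<nav><a href="/home">Home</a></nav>
<main>
  <h1>Orders</h1>
  <table id="orders">
    <tr><td>1001</td><td>shipped</td></tr>
    <tr><td>1002</td><td>pending</td></tr>
  </table>
</main>
<footer>Contact</footer>
</body></html>"""


def outline(elem, depth=0, max_depth=3):
    """Return an indented list of tag names: a cheap map of the page."""
    label = elem.tag + (f"#{elem.get('id')}" if elem.get("id") else "")
    lines = ["  " * depth + label]
    if depth < max_depth:
        for child in elem:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


def select_text(root, xpath):
    """Return the visible text of every element matching the XPath."""
    results = []
    for elem in root.findall(xpath):
        parts = [t.strip() for t in elem.itertext() if t.strip()]
        results.append(" ".join(parts))
    return results


root = ET.fromstring(PAGE)
print("\n".join(outline(root)))                          # step 1: show the model the tree
print(select_text(root, ".//table[@id='orders']/tr"))    # step 2: pull only what it asked for
```

The point of the two steps is that only the outline and the selected rows enter the context, not the full markup.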

theredsix · 3 days ago
Hmm, like adding an optional CSS selector for targeting?
jlu · 3 days ago
Have you considered removing all headless traits so that the agent won't be easily detected, just like what Browserbase did here?

https://www.browserbase.com/blog/chromium-fork-for-ai-automa...

theredsix · 3 days ago
It runs in headful mode and all control signals are passed in as system events, so it bypasses the problems Browserbase identified.
theredsix · 3 days ago
I've consolidated most of the changes in chrome/browser/abp and used shims for the other modifications, so rebasing is light and can be handled by Claude. I'd love to get this upstreamed. An intro to the Chromium maintenance team would be greatly appreciated!

u/theredsix

Karma: 128 · Cake day: March 10, 2017