The freeze-then-capture approach is interesting. Curious how it handles pages with aggressive anti-bot detection that fingerprints headless Chromium forks — that's the other failure mode I keep hitting.
Very cool! Sometimes when I debug with the Chrome DevTools MCP, Claude clicks something, too many things happen at once, and it comes to the wrong conclusions about the state of the page. Sounds like this should give it a more accurate snapshot of a single point in time.
I had good luck letting Claude use an XML parser to get a tree of the file and then write XPath selections to grab what it needed.
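A rough sketch of that workflow, assuming Python's standard-library `xml.etree.ElementTree` (which only supports a limited XPath subset). The document and its element names are made up for illustration, and `outline` is a hypothetical helper, not part of any tool mentioned here:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag and attribute names are invented for this example.
DOC = """
<page>
  <form id="login">
    <input name="user" type="text"/>
    <input name="pass" type="password"/>
    <button id="submit">Sign in</button>
  </form>
  <footer><a href="/help">Help</a></footer>
</page>
""".strip()


def outline(elem, depth=0):
    """Return an indented outline of the tree, one line per element."""
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    lines = ["  " * depth + elem.tag + (f" [{attrs}]" if attrs else "")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)

# Step 1: give the model a compact view of the structure.
tree_view = "\n".join(outline(root))

# Step 2: the model writes XPath-style selections against that view.
input_names = [e.get("name") for e in root.findall(".//form[@id='login']/input")]
button_text = root.find(".//button[@id='submit']").text
```

The outline keeps the context small, and the selections only pull back the nodes that matter. For real-world HTML you would likely need a more forgiving parser with fuller XPath support.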
https://www.browserbase.com/blog/chromium-fork-for-ai-automa...
But here's my thought: you're solving the "stale state" problem by making the browser deterministic. Real websites aren't deterministic. Think WebSocket pushes, long-polling, background fetches, animations that don't finish. Freezing execution doesn't pause the server. The moment you unfreeze, the world may have moved.
90.5% on Mind2Web is great. But Mind2Web tasks are mostly "fill a form, click submit." The brutal failures happen on SPAs with optimistic UI updates, where the DOM says "saved" but the network request hasn't finished. Does ABP handle that case, or does the freeze just delay the confusion?
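To make the optimistic-UI case concrete, here is a toy model of the check I have in mind. This is not ABP's API. `PageSnapshot`, `pending_requests`, and `save_confirmed` are hypothetical names, and it assumes the snapshot can report which requests are still in flight:

```python
from dataclasses import dataclass, field


@dataclass
class PageSnapshot:
    """Hypothetical frozen snapshot: visible status text plus in-flight request URLs."""
    status_text: str
    pending_requests: set = field(default_factory=set)


def save_confirmed(snap: PageSnapshot, save_url: str) -> bool:
    """Trust "saved" only if the DOM says so AND the save request has settled."""
    dom_says_saved = snap.status_text.strip().lower() == "saved"
    request_settled = save_url not in snap.pending_requests
    return dom_says_saved and request_settled


# Optimistic update: the UI already shows "Saved" while the request is still pending.
optimistic = PageSnapshot("Saved", {"/api/save"})
# Same UI text, but the request has completed.
settled = PageSnapshot("Saved", set())

optimistic_ok = save_confirmed(optimistic, "/api/save")  # False
settled_ok = save_confirmed(settled, "/api/save")        # True
```

If the frozen snapshot only captures the DOM, both cases look identical to the agent, which is the confusion I'm asking about.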
Genuine question, not trying to tear this down. The architecture is smart. I just wonder if "make the browser simpler for the agent" eventually hits a wall where you need to make the agent smarter about async instead.
For async, lots of people smarter than me are working on the smarter-agent problem. There's a latency floor with inference, though, due to prompt processing and output generation. Without tools like ABP, the LLM is always aiming at a moving target.