cosinusalpha (u/cosinusalpha)

cosinusalpha commented on Show HN: Webctl – Browser automation for agents based on CLI instead of MCP github.com/cosinusalpha/w... · Posted by u/cosinusalpha

the_mitsuhiko · a month ago

At this point I'm fully down the path of the agent just maintaining his own tools. I have a browser skill that continues to evolve as I use it. Beats every alternative I have tried so far.

cosinusalpha · a month ago

Do you experience any context pollution with that approach?

desireco42 · a month ago

I see, nice. Is there a way to run multiple sessions?

cosinusalpha · a month ago

Yes, you can create isolated environments using the "--session NAME" flag.

It isolates cookies and local storage for that specific run. Since it's a V1 release, there might be some edge cases in the session isolation - if you hit any, please open an issue!
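
A minimal sketch of what per-session isolation means, assuming each session name maps to its own cookie and local storage containers. `SessionState` and `SessionRegistry` are hypothetical names for illustration, not webctl's actual code:

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Storage owned by exactly one named session (hypothetical model)."""
    cookies: dict = field(default_factory=dict)
    local_storage: dict = field(default_factory=dict)


class SessionRegistry:
    """Hands out one independent SessionState per session name."""

    def __init__(self):
        self._sessions = {}

    def get(self, name="default"):
        # Create the state on first use; later calls return the same object.
        if name not in self._sessions:
            self._sessions[name] = SessionState()
        return self._sessions[name]


registry = SessionRegistry()
registry.get("work").cookies["sid"] = "abc"
# The "personal" session never sees cookies written by "work".
personal_cookies = registry.get("personal").cookies
```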

gregpr07 · a month ago

Creator of Browser Use here. This is cool, a really innovative approach with ARIA roles. One idea we have been playing around with a lot is just giving the LLM raw HTML and a really good way to traverse it: no heuristics, just BS4. It seems to work well, but it is much more expensive than the current prod-ready [index]<div ... notation

cosinusalpha · a month ago

Thanks!

I actually tried raw HTML when I was exploring solutions. It worked for "one-off" tasks, but I ran into major issues with replayability on modern SPAs.

In React apps, the raw DOM structure and auto-generated IDs shift so frequently that a script generated from "Raw HTML" often breaks 10 minutes later. I found ARIA/semantics to be the only stable contract that persists across re-renders.
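
To illustrate the point, here is a rough sketch using only the Python standard library. The two HTML strings are made-up renders of the same view, and the role/name logic is a heavy simplification of real accessible-name computation (it only knows `role`, `aria-label`, the implicit role of `<button>`, and plain text content). It is not how webctl resolves elements:

```python
from html.parser import HTMLParser

# Two hypothetical renders of one view: generated IDs and nesting both change.
RENDER_A = '<div id=":r1:"><button id=":r2:">Save</button></div>'
RENDER_B = '<section><div id=":r9:"><button id=":rb:">Save</button></div></section>'


class RoleIndex(HTMLParser):
    """Collects role, name, and id for elements that carry a role."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._open = None  # role-bearing element still waiting for its end tag

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        role = a.get("role") or ("button" if tag == "button" else None)
        if role and self._open is None:
            self._open = {"tag": tag, "role": role,
                          "name": a.get("aria-label") or "", "id": a.get("id")}

    def handle_data(self, data):
        # Fall back to the first non-empty text when there is no aria-label.
        if self._open is not None and not self._open["name"]:
            self._open["name"] = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self.found.append(self._open)
            self._open = None


def index(html):
    parser = RoleIndex()
    parser.feed(html)
    parser.close()
    return parser.found


def by_role(html, role, name):
    return [e for e in index(html) if e["role"] == role and e["name"] == name]


def by_id(html, element_id):
    return [e for e in index(html) if e["id"] == element_id]
```

A locator recorded against the first render as `id=":r2:"` finds nothing in the second render, while the role and name pair matches in both.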

You mentioned the raw HTML approach is "expensive". Did you feed the full HTML into the context, or did you create a BS4 "tool" for the LLM to query the raw HTML dynamically?

cosinusalpha · a month ago

I actually think the CLI approach helps with those boundaries. Because webctl commands are discrete and pipeable (e.g. webctl snapshot | llm | webctl click), the "authority" is reset at every step of the pipeline. It feels easier to audit a text stream of commands than a socket connection that might be accumulating invisible context.
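
A small sketch of why a pipeline like that is easy to audit, assuming every stage is a plain text-in, text-out step. The stage functions below are hypothetical stand-ins: they do not reproduce real webctl or LLM output, and real stages would be separate processes joined by pipes:

```python
def run_pipeline(stages, text=""):
    """Run stages in order; each one sees only the previous stage's text."""
    audit_log = []
    for name, stage in stages:
        text = stage(text)
        audit_log.append((name, text))  # the full hand-off is recorded
    return text, audit_log


def fake_snapshot(_unused):
    # Stand-in for a page snapshot; the format is invented for this sketch.
    return 'button "Save"\nbutton "Cancel"'


def fake_llm(snapshot_text):
    # Stand-in for the model: pick the first line mentioning "Save".
    target = next(line for line in snapshot_text.splitlines() if "Save" in line)
    return f"click {target}"


final, log = run_pipeline([("snapshot", fake_snapshot), ("llm", fake_llm)])
```

Nothing reaches a stage except the text handed to it, so the log is a complete record of what each step could have acted on.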

TheTaytay · a month ago

I really like this idea!

I’d like to see this other browser plugin’s API be exposed via your same CLI, so I don’t have to only control a separate browser instance. https://github.com/remorses/playwriter (I haven’t investigated enough to know how feasible it is, but as I was reading about your tool, I immediately wanted to control existing tabs from my main browser, rather than “just” a debug-driven separate browser instance.)

cosinusalpha · a month ago

Thanks! To clarify: webctl allows you to manually interact with the browser window at any time. It even returns "manual interaction" breakpoints to stdout if it detects an SSO/login wall.

But I agree, attaching to the OS "daily driver" instance specifically would be a nice addition.

binalpatel · a month ago

Cool to see lots of people independently come to "CLIs are all you need". I'm still not sure if it's a short-term bandaid because agents are so good at terminal use or if it's part of a longer-term trend, but it's definitely felt much more seamless to me than MCPs.

(my one of many contributions: https://github.com/caesarnine/binsmith)

cosinusalpha · a month ago

I am also not sure if MCP will eventually be fixed to allow more control over context, or if the CLI approach really is the future for Agentic AI.

Nevertheless, I prefer the CLI for other reasons: it is built for humans and is much easier to debug.

desireco42 · a month ago

How are you holding the session if every command is issued through the CLI? I assume this is essential for automation.

cosinusalpha · a month ago

A background daemon holds the session state between different CLI calls. This daemon is started automatically on the first webctl call and auto-closes after a timeout period of inactivity to save resources.
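
Roughly, the idle-timeout part can be pictured like the sketch below. `SessionDaemon`, its methods, and the 300-second default are hypothetical and only for illustration. The real daemon's timeout value and internals are not stated in this thread:

```python
import time


class SessionDaemon:
    """Keeps state between CLI calls and reports when it has been idle too long."""

    def __init__(self, idle_timeout_s=300.0, clock=time.monotonic):
        # idle_timeout_s is an assumed default, not webctl's actual value.
        self.state = {}
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_used = clock()

    def handle(self, key, value=None):
        """Serve one CLI call: optionally store a value, return what is stored."""
        self._last_used = self._clock()  # any call resets the idle timer
        if value is not None:
            self.state[key] = value
        return self.state.get(key)

    def should_exit(self):
        """True once no call has arrived for at least the idle timeout."""
        return self._clock() - self._last_used >= self._idle_timeout_s
```

The clock is injectable so the timeout logic can be exercised without actually waiting.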

grigio · a month ago

Is there a benchmark? There are a lot of scraping agents nowadays.

cosinusalpha · a month ago

I don't have an objective benchmark yet. I tried several existing solutions, especially the MCP servers for browser automation, and none of them were able to reproducibly solve my specific task.

An objective benchmark is a great idea, especially to compare webctl against other similar CLI-based tools. I'll definitely look into how to set that up.

philipbjorge · a month ago

This looks remarkably similar to https://github.com/vercel-labs/agent-browser

How is it different?

cosinusalpha · a month ago

To be honest, I hadn't seen that one yet!

The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.
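
For a sense of what such a target carries, here is a tiny sketch that splits a string like the one above into fields. `parse_query` is a hypothetical helper, and the assumption that targets are whitespace-separated `key=value` pairs with optional quotes is mine, not a statement about webctl's parser:

```python
import shlex


def parse_query(query):
    """Split 'role=button name="Save"' style text into a dict of fields."""
    fields = {}
    for token in shlex.split(query):  # shlex handles the quoted name
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        fields[key] = value
    return fields


target = parse_query('role=button name="Save"')
```

The result holds only a role and a human-visible name, with no DOM path or generated ID that could shift between renders.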

Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.

u/cosinusalpha

Karma: 62 · Cake day: January 13, 2026