Posted by u/MagMueller 6 months ago
Launch HN: Browser Use (YC W25) – open-source web agents (github.com/browser-use/br...)
Hey HN, we’re Gregor and Magnus, the founders of browser-use (https://browser-use.com/), an easy way to connect AI agents with the browser. Our agent library is open-source (https://github.com/browser-use/browser-use) and we have what is the biggest open-source community for browser agents. And now we have a cloud offering—hence our Launch HN today!

Check out this video to see it in action: https://preview.screen.studio/share/r1h4DuAk. There are lots more demos at https://github.com/browser-use/browser-use on how we control the web with prompts.

We started coding a decade ago with Selenium bots and macros to automate tasks. Then we both moved into ML. Last November, we asked ourselves, “How hard could it be to build the interface between LLMs and the web?”

We launched on Show HN (https://news.ycombinator.com/item?id=42052432) and have since been addressing various challenges of browser automation, such as:

- Automation scripts break when the website changes
- Automation scripts are annoying to build
- Captchas and rate limits
- Parsing errors and API key management
- And perhaps worst of all, login screens

People use us to fill out their forms, extract data behind login walls, or automate their CRM. Others take the XPaths browser-use clicked on to build their scripts faster, or directly rerun the actions of browser-use deterministically. We’re currently working on robust task reruns, agent memory for long tasks, parallelization for repetitive tasks, and many other sweet improvements.

One interesting aspect is that some companies now want to change their UI to be more agent-friendly. Some developers even replace ugly UIs with nice ones and use browser-use to copy data over.

Besides the open-source library, we have an API. We host the browser and LLMs for you and help with proxy rotation, persistent sessions, and running multiple instances in parallel. We price at $30/month, significantly lower than OpenAI’s Operator.

On the open-source side, browser-use remains free. You can use any LLM, from Gemini to Sonnet, Qwen, or even DeepSeek-R1. It’s licensed under MIT, giving you full freedom to customize it.

We’d love to hear from you—what automation challenges are you facing? Any thoughts, questions, experiences are welcome!

arjunchint · 6 months ago
Have you inspected or thought through the security of your open source library?

You are using debugger tools such as CDP, launching playwright without a sandbox, and guiding users to launch Chrome in debugger mode to connect to browser-use on their main browser.

The debugging tools you use have active exploits that Google doesn't fix because they are supposed to be for debugging and not for production/general use. Combined with your other two design choices, this lets an exploit escalate and infect the user's main machine.

Have you considered not using all these debugging permissions to productionize your service?

h4ny · 6 months ago
Thank you! It's constructive, helps the people who are making things while giving them the benefit of the doubt, keeps users safe, and educates those who have enough technical understanding of the topic.
rainforest · 6 months ago
Could you go into a bit more detail about this? Why is exposing devtools to the agent a problem? What's the attack vector? That the agent might do something malicious to exfil saved passwords?
arjunchint · 6 months ago
Forget the agent: browser-use's published setup instructions for use with your own Chrome profile and passwords [https://docs.browser-use.com/customize/real-browser, https://github.com/browser-use/browser-use/blob/495714e2dd38...] launch a Chrome session with Remote Debugging enabled.

These tools they are guiding users to set up and execute are "inherently insecure" [https://issues.chromium.org/issues/40056642].

So if you go to a site that can take advantage of these loopholes, then your browser is likely to be compromised, and the attack could escalate from there.
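
To see whether a local Chrome is currently exposing a debugging port, a minimal sketch could look like the following. It assumes port 9222, the value commonly passed to `--remote-debugging-port` in examples; the helper name `is_port_listening` is made up for illustration, and an open port only tells you that something is listening, not that it is Chrome.

```python
import socket


def is_port_listening(host: str = "127.0.0.1", port: int = 9222,
                      timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: 9222 is only the port commonly used in
    remote-debugging examples, not a guaranteed default on your machine.
    """
    try:
        # A successful connect means a listener exists; close it right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if is_port_listening():
        print("Something is listening on 127.0.0.1:9222 (possibly remote debugging).")
    else:
        print("Nothing is listening on 127.0.0.1:9222.")
```

Running this before and after starting the browser gives a quick sanity check on what your setup actually exposes.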

jerpint · 6 months ago
If they’re launching a cloud-based service, doesn’t this effectively remove the risk of running it locally?
arjunchint · 6 months ago
Their key offering is an open source solution that you can run on your own laptop and Chrome browser, but their approach to doing this presents a huge security risk.

They do have a cloud offering that should not have these risks, but then you have to enter your passwords into their cloud browser environment, which presents a different set of risks. Their cloud offering is basically similar to Skyvern or even a higher-cost subscription tier we have at rtrvr.ai

mohsen1 · 6 months ago
Cross origin is still a problem when those settings are off
101008 · 6 months ago
This is very very important. It's completely unusable if this isn't solved. The agent could access a website that takes control of your machine.
gregpr07 · 6 months ago
How would that work? Can you control the browser without debug mode? Especially in production, the browsers are running in single-instance Docker containers anyway, so the file system is not accessible... are there exploits that can do harm from a virtual machine?
gloosx · 6 months ago
Yes, you can control the browser without debug mode, and the common way to do it is ChromeDriver[1].

[1] https://developer.chrome.com/docs/chromedriver/get-started

arjunchint · 6 months ago
Yes, I was able to figure out a secure way to control the browser with AI Agents at rtrvr.ai without using debugger permissions/tools so it is most definitely possible.

By "in production" I meant how you are advising your users to set up the local installation. Even if you launch browser-use locally within a container, if you're restarting the user's Chrome in debug mode and controlling it with CDP from within the container, then the door is wide open to exploits and the container doesn't do anything?!

lostmsu · 6 months ago
Embed a WebView instead of launching browser?
cookiengineer · 6 months ago
I've been following your progress for a while now and I'm super impressed how far you've got already.

Are you working on unifying the tools that the LLM uses with the MCP / model context protocol?

As far as I understand, lots of other providers (like Bolt/Stackblitz etc) are migrating towards this. Currently, there aren't many tools available in the upstream specification other than File I/O and some minor interactions for system-use - but it would be pretty awesome if tools and services (like say, a website service) could be reflected there as it would save a lot of development overhead for the "LLM bindings".

Very interesting stuff you're building!

gregpr07 · 6 months ago
hmm, I thought about this a lot. But tbh I think MCP is sort of a gimmick... probably the better way is for agents just to understand the http apis directly. Maybe I'm wrong, very happy to be convinced differently. Do you think an MCP server for the cloud version would be useful?
ec109685 · 6 months ago
MCP seems nicer than requiring LLM hosts execute arbitrary curl calls to endpoints since it packages a tool into a dedicated plugin that users can opt into.
nostrebored · 6 months ago
strong agree with this -- I don't understand outside of integration with Claude Desktop why to use MCP rather than a dedicated API endpoint.
rahimnathwani · 6 months ago
You can write a wrapper to use it with MCP, or use one someone else has created:

https://github.com/Saik0s/mcp-browser-use/blob/main/README.m...

agotterer · 6 months ago
I’m excited about the space and intend to keep an eye on you guys. I actually gave the open-source version of browser-use a try last week and ran into two problems:

The first, it refused to correctly load the browser tab and would get stuck in a loop trying. I was able to manually override this behavior for the purpose of prototyping.

The second, it hallucinated form input values. I provided it strict instructions on how to fill out a form and when it didn’t know what to do with an address field, it just wrote 123 Main St instead of not being able to complete the form.

The thing I really want, and haven’t found in any of the browser agents I’ve tried, is a feedback loop. I don’t personally know what the final format looks like. But I want to know that what I expected to happen inside the browser actually happened, and I want it to be verifiable. Otherwise I feel like I'm sending requests into a black hole.
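
One way to picture such a feedback loop is a post-run check that compares what the agent was told to enter with what actually ended up in the form. This is only a sketch under assumptions: `FieldCheck` and `verify_form` are hypothetical names, not part of browser-use, and the `actual` values are assumed to have been read back from the page by some other means.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldCheck:
    name: str
    expected: Optional[str]  # None means "no value was provided for this field"
    actual: str
    ok: bool
    reason: str


def verify_form(expected: Dict[str, Optional[str]],
                actual: Dict[str, str]) -> List[FieldCheck]:
    """Compare instructed values with the values read back from the page."""
    checks = []
    for name, want in expected.items():
        got = actual.get(name, "")
        if want is None:
            # The agent had nothing to enter here, so any value is invented.
            ok = got == ""
            reason = "left blank as required" if ok else "value was invented"
        else:
            ok = got.strip() == want.strip()
            reason = "matches" if ok else "differs from instructions"
        checks.append(FieldCheck(name, want, got, ok, reason))
    return checks


# Example: the address was never supplied, yet the form contains one.
expected = {"name": "Ada Lovelace", "address": None}
actual = {"name": "Ada Lovelace", "address": "123 Main St"}
for check in verify_form(expected, actual):
    print(check.name, "OK" if check.ok else "FAIL", "-", check.reason)
```

A report like this could be returned alongside the agent's result, so a failed or invented field is visible instead of silently accepted.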

gregpr07 · 6 months ago
Yeah - one version had that weird bug - we fixed it already (I think; if not, I'd be super happy if you told me how to replicate it).

Hmm really? Maybe you could use the sensitive data api to make it more deterministic? https://docs.browser-use.com/customize/sensitive-data

How would you imagine a perfect feedback loop?

tnolet · 6 months ago
How are you different from https://www.browserbase.com/ and their Stagehand framework? [0]

[0] https://github.com/browserbase/stagehand

Deleted Comment

baal80spam · 6 months ago
At first glance, browser-use is compatible with more models, and has (much) more GitHub stars ;)

Coincidentally I played with it over the last weekend using Gemini model. It's quite promising!

gregpr07 · 6 months ago
Yeah, we are much bigger and work on a higher level. Stagehand works step by step; we are trying to make end-to-end web agents.
alex1115alex · 6 months ago
Awesome job launching guys! We used Browser Use last week to order burgers from our smart glasses:

https://x.com/caydengineer/status/1889835639316807980

One thing I'm hoping for is an increase in speed. Right now, the agent is slow for complex tasks, so we're still in an era where it might be better to codify popular tasks (eg: sending a WhatsApp message) instead of handling them with browser automation. Have y'all looked into Groq / Cerebras?

MagMueller · 6 months ago
One option could be for the main apps like WhatsApp to have defined custom actions, which are almost like an API to the service. I think the interplay between LLM and automation scripts will succeed here:

Agent call 1: Send WhatsApp message (to=Magnus, text=hi)
Inside, you open WhatsApp and search for Magnus (without LLM)

Agent call 2: Select contact from all possible Magnus contacts
Script 3: Type the message and click send

So in total, 2 calls - with Gemini, you could already achieve this in 10-15 seconds.
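
The interplay described above can be sketched roughly as follows. Everything here is a stand-in: `CONTACTS`, `search_contacts`, and `send_message` are hypothetical names, the "LLM" is just a callable passed in as `pick`, and the scripted steps are stubbed rather than driving a real browser.

```python
from typing import Callable, Dict, List

# Stand-in for the contact list a real script would read from the app.
CONTACTS = ["Magnus M.", "Magnus (work)", "Gregor"]


def search_contacts(query: str) -> List[str]:
    """Scripted step: deterministic search, no LLM involved."""
    return [c for c in CONTACTS if query.lower() in c.lower()]


def send_message(to: str, text: str,
                 pick: Callable[[List[str]], str]) -> Dict[str, object]:
    """Custom action: the LLM is consulted only to disambiguate the contact."""
    llm_calls = 1  # call 1: the agent chose this action and its arguments
    matches = search_contacts(to)  # script: open the app and search
    if not matches:
        raise LookupError(f"no contact matches {to!r}")
    if len(matches) == 1:
        contact = matches[0]  # unambiguous, so no second LLM call is needed
    else:
        contact = pick(matches)  # call 2: choose among the candidates
        llm_calls += 1
    # script: type the message and click send (stubbed as a record here)
    return {"contact": contact, "text": text, "llm_calls": llm_calls}


# A trivial "LLM" that always picks the first candidate.
result = send_message("Magnus", "hi", pick=lambda options: options[0])
print(result)
```

The point of the design is that the slow, non-deterministic part is confined to the one decision a script cannot make on its own.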

gregpr07 · 6 months ago
That was such a cool demo man! We are working on speed, we are already 3-4x faster than operator with gpt4o
OsrsNeedsf2P · 6 months ago
Does anyone have experience comparing this to Skyvern[0]? I originally thought the $30/month would be the killer feature, but it's only $30 worth of credits. Otherwise they both seem to have the same offering

[0] https://www.skyvern.com/

gregpr07 · 6 months ago
I think our cloud is much simpler (just one prompt and go). But it's also sort of a different service. The main differences come from the open source side - we are essentially building more of a framework for anyone to use and they are just a web app.
jackienotchan · 6 months ago
AI agents have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents and don't stick to any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN.

Do you have any built-in features that address these issues?
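
For reference, the basic robots.txt check is available in Python's standard library. The sketch below parses an inline example file rather than fetching one; the user agent string `example-agent/0.1` and the rules themselves are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example rules; a real agent would fetch the site's own /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

AGENT = "example-agent/0.1"  # hypothetical user agent string


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)
print(policy.can_fetch(AGENT, "https://example.com/private/data"))  # False
print(policy.can_fetch(AGENT, "https://example.com/pricing"))       # True
print(policy.crawl_delay(AGENT))                                    # 2
```

An agent that respects these answers would skip disallowed URLs and wait at least the crawl delay between requests to the same host.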

MagMueller · 6 months ago
Yes, some hosting services have experienced a 100%-1000% increase in hosting costs.

On most platforms, browser use only requires the interactive elements, which we extract, and does not need images or videos. We have not yet implemented this optimization, but it will reduce costs for both parties.
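
As a rough illustration of "only the interactive elements", the sketch below keeps links, buttons, and form controls from an HTML string and ignores images and videos. It uses only the standard library and is an assumption about the general idea, not how browser-use actually extracts elements; the tag set and class name are chosen for the example.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumed set of tags treated as interactive for this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collect (tag, attributes) for interactive elements only."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, Dict[str, Optional[str]]]] = []

    def handle_starttag(self, tag, attrs):
        # Images, videos, and layout tags are skipped entirely.
        if tag in INTERACTIVE_TAGS:
            self.elements.append((tag, dict(attrs)))


def extract_interactive(html: str) -> List[Tuple[str, Dict[str, Optional[str]]]]:
    collector = InteractiveElementCollector()
    collector.feed(html)
    return collector.elements


page = ('<div><img src="big.png"><a href="/x">X</a><input name="q">'
        '<video src="v.mp4"></video><button id="go">Go</button></div>')
print(extract_interactive(page))
```

Skipping the download of media resources altogether would be a separate step at the network level; this only shows the filtering of the markup.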

Our goal is to abstract backend functionality from webpages. We could cache this, and only update the cache if eTags change.
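
The ETag idea could look something like this sketch. The `ETagCache` class and the `fetch` callable are hypothetical; in real HTTP the stored ETag would be sent in an `If-None-Match` header and the server would answer `304 Not Modified` when nothing changed. Here a fake fetcher stands in for the network.

```python
from typing import Callable, Dict, List, Optional, Tuple

# fetch(url, etag) -> (status, etag, body); status 304 means "not modified".
Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[str], Optional[str]]]


class ETagCache:
    """Re-download a page only when the server reports a new ETag."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._store: Dict[str, Tuple[str, str]] = {}  # url -> (etag, body)

    def get(self, url: str) -> str:
        cached = self._store.get(url)
        status, etag, body = self._fetch(url, cached[0] if cached else None)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged: reuse the cached body
        if body is None:
            raise ValueError(f"no body returned for {url}")
        if etag is not None:
            self._store[url] = (etag, body)
        return body


# Fake server used only to demonstrate the behaviour.
server = {"etag": "v1", "body": "<button>Buy</button>"}
full_downloads: List[str] = []


def fake_fetch(url: str, etag: Optional[str]):
    if etag == server["etag"]:
        return 304, etag, None
    full_downloads.append(url)
    return 200, server["etag"], server["body"]


cache = ETagCache(fake_fetch)
cache.get("https://example.com/")
cache.get("https://example.com/")
print(len(full_downloads))  # 1
```

The second request still reaches the server, but it transfers no body, which is where the savings for both sides would come from.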

Websites that really don't want us will come up with audio captchas and new creative methods.

Agents are different from bots. Agents are intended as a direct user clone and could also bring revenue to websites.

erellsworth · 6 months ago
>Websites that really don't want us will come up with audio captchas and new creative methods.

Which you or other AIs will then figure a way around. You literally mention "extract data behind login walls" as one of your use cases so it sounds like you just don't give a shit about the websites you are impacting.

It's like saying, "If you really don't want me to break into your house and rifle through your stuff, you should just buy a more expensive security system."

deadfece · 6 months ago
In my experience these web agents are relatively expensive to run and are very slow. Admittedly I don’t browse HN frequently but I’d be interested to read some of these agent abuse stories, if any stand out to you. (I’ve been googling for ai agent website abuse stories and not finding anything so far)

Deleted Comment

kimoz · 6 months ago
A UI based on browser-use:

https://github.com/browser-use/web-ui