suchintan commented on Web Bench: a new way to compare AI browser agents   blog.skyvern.com/web-benc... · Posted by u/suchintan
vasusen · 3 months ago
Thank you so much for creating this, folks! A browser navigation agent is a key part of our AI QA setup at Donobu (https://donobu.com/). We found the WebVoyager benchmarks severely lacking for complex e2e test cases like logged-in dashboards, onboarding forms, etc.

While the extraction/2FA flows aren't super relevant to us, this saves us the time of building our own set of benchmarks. Really appreciate it, and we hope we can contribute to making this a really large set.

suchintan · 3 months ago
That would be amazing!!
gitmagic · 3 months ago
Would love to see how Nelly [0] performs on this benchmark.

[0] https://nelly.is

suchintan · 3 months ago
Very cool. The benchmark can be found here if you want to take a look at it: https://github.com/Halluminate/WebBench
neveroddoreven · 3 months ago
I had no idea WebVoyager only spanned 15 websites lol... the 452 figure you have still seems a little low though - do you have plans to expand it? It seems like you'd want as many sites as possible to improve the real-world accuracy of agents, given the long-tail nature of website traffic.
suchintan · 3 months ago
We definitely plan to expand it. I want to get to ~10,000 for a reasonable benchmark.

15 blew my mind -- it's too easy to overfit that dataset

suchintan commented on Show HN: MCP Server to let agents control the browser   github.com/Skyvern-AI/sky... · Posted by u/suchintan
neveroddoreven · 5 months ago
I used to doubt the usefulness of most MCP servers, but with something as powerful as general-purpose browser use this makes a lot of sense. I would rather use something more open like this than a completely proprietary browser agent like ChatGPT's. Great work!
suchintan · 5 months ago
Definitely. It gives you a lot more control over which models power these browser agents, which is the important part.
suchintan · 5 months ago
Hey HN, we were playing around with MCPs over the weekend and thought it would be cool to build an MCP that lets Claude / Cursor / Windsurf control your browser: https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

Just for context, we’re building Skyvern, an open source AI Agent that can control and interact with browsers using prompts, similar to OpenAI’s Operator.

The MCP Server lets you:

- Have Claude navigate to docs websites / Stack Overflow and look up information like the top posts on Hacker News - https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

- Have Cursor apply for jobs / fill out contact forms / log in + download files / etc. - https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

- Connect Windsurf to take over your Chrome while running Skyvern in “local” mode - https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

We built this mostly for fun, but we can see this being integrated into AI agents to give them custom access to browsers and execute complex tasks like booking appointments, downloading your electricity statements, looking up freight shipment information, etc.

suchintan commented on Show HN: Vigilant - Blazing fast dev-friendly logs   vigilant.run/home... · Posted by u/benshumaker
izakfr · 7 months ago
CTO of Vigilant here. Does anyone have ideas on how to do versioning for an SDK that you change often? Would putting new features in a "beta" release version make sense?
suchintan · 7 months ago
I've seen people use semantic versioning to version APIs and SDKs:

https://semver.org/

We at Skyvern are still doing patch versions only
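
On the "beta" question: under the precedence rules on semver.org, a pre-release tag such as `1.2.0-beta.1` sorts below the plain `1.2.0`, so beta features can ship without outranking a stable release. Here is a rough Python sketch of that ordering. `precedence_key` is a hypothetical helper written for illustration, not part of any SDK mentioned in this thread, and it does only loose validation.

```python
import re

# Loose SemVer pattern: MAJOR.MINOR.PATCH, optional -prerelease, optional +build.
# Assumption: this sketch does not enforce every rule from semver.org
# (for example, it tolerates leading zeros).
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def precedence_key(version: str):
    """Return a sort key that follows SemVer precedence (hypothetical helper)."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, pre = match.groups()
    core = (int(major), int(minor), int(patch))
    if pre is None:
        # A normal release outranks any pre-release of the same core version.
        return core + (1, ())
    # Numeric identifiers compare as integers and rank below alphanumeric ones.
    identifiers = tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in pre.split(".")
    )
    return core + (0, identifiers)


versions = ["1.2.0", "1.2.0-beta.2", "1.2.0-beta.11", "1.1.9", "1.2.0-beta"]
ordered = sorted(versions, key=precedence_key)
# ordered == ["1.1.9", "1.2.0-beta", "1.2.0-beta.2", "1.2.0-beta.11", "1.2.0"]
```

Build metadata (the part after `+`) is ignored when ordering, which matches the spec.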

u/suchintan

Karma: 533 · Cake day: July 21, 2022
About
Founder of Skyvern - YC S23

We help companies automate workflows on the web using our open source browser automation tool

Check us out here: https://github.com/Skyvern-AI/Skyvern
