suchintan (u/suchintan)

suchintan commented on Web Bench: a new way to compare AI browser agents blog.skyvern.com/web-benc... · Posted by u/suchintan

vasusen · 3 months ago

Thank you so much for creating this, folks! A browser navigation agent is a key part of our AI QA setup at Donobu (https://donobu.com/). We found the WebVoyager benchmarks severely lacking for complex e2e test cases like logged-in dashboards, onboarding forms, etc.

While the extraction/2FA flows aren't super relevant to us, this saves us the time of building our own set of benchmarks. Really appreciate it, and we hope we can contribute to making this a really large set.

suchintan · 3 months ago

That would be amazing!!

suchintan commented on Web Bench: a new way to compare AI browser agents blog.skyvern.com/web-benc... · Posted by u/suchintan

gitmagic · 3 months ago

Would love to see how Nelly [0] performs on this benchmark.

[0] https://nelly.is

suchintan · 3 months ago

Very cool. The benchmark can be found here if you want to take a look at it: https://github.com/Halluminate/WebBench

suchintan commented on Web Bench: a new way to compare AI browser agents blog.skyvern.com/web-benc... · Posted by u/suchintan

neveroddoreven · 3 months ago

I had no idea WebVoyager only spanned 15 websites lol... the 452 figure you have still seems a little low though - do you have plans to expand it? It seems like you'd want as many sites as possible to improve the real-world accuracy of agents, given the long-tail nature of website traffic.

suchintan · 3 months ago

We definitely plan to expand it. I want to get to ~10,000 for a reasonable benchmark.

15 blew my mind -- it's too easy to overfit that dataset

suchintan commented on Show HN: MCP Server to let agents control the browser github.com/Skyvern-AI/sky... · Posted by u/suchintan

neveroddoreven · 5 months ago

I used to doubt the usefulness of most MCP servers, but with something as powerful as general-purpose browser use this makes a lot of sense. I would rather use something more open like this than a completely proprietary browser agent like ChatGPT's. Great work!

suchintan · 5 months ago

Definitely. It gives you a lot more control over which models power these browser agents, which is the important part.

suchintan commented on Show HN: MCP Server to let agents control the browser github.com/Skyvern-AI/sky... · Posted by u/suchintan

suchintan · 5 months ago

Hey HN, we were playing around with MCPs over the weekend and thought it would be cool to build an MCP that lets Claude / Cursor / Windsurf control your browser: https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

Just for context, we’re building Skyvern, an open source AI Agent that can control and interact with browsers using prompts, similar to OpenAI’s Operator.

The MCP Server can:

- Let Claude navigate to docs websites / Stack Overflow and look up information like the top posts on Hacker News - https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

- Let Cursor apply for jobs / fill out contact forms / log in + download files / etc. - https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

- Connect Windsurf to take over your Chrome while running Skyvern in “local” mode - https://github.com/Skyvern-AI/skyvern/tree/main/integrations...

We built this mostly for fun, but we can see it being integrated into AI agents to give them custom access to browsers so they can execute complex tasks like booking appointments, downloading your electricity statements, looking up freight shipment information, etc.

suchintan commented on Show HN: Vigilant - Blazing fast dev-friendly logs vigilant.run/home... · Posted by u/benshumaker

izakfr · 7 months ago

CTO of Vigilant here. Does anyone have ideas on how to do versioning for an SDK that you change often? Would putting new features in a "beta" release version make sense?

suchintan · 7 months ago

I've seen people use semantic versioning to version APIs and SDKs

https://semver.org/

We at Skyvern are still doing patch versions only
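
For anyone who hasn't used it, here is a rough sketch of how semver bumps and a "beta" pre-release tag could fit together. This is only an illustration, not Skyvern's or Vigilant's actual release tooling: the `Version` class and its `bump` / `beta` methods are hypothetical names, and the parser is simplified (it ignores build metadata and does not reject leading zeros, which the full spec does).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern: MAJOR.MINOR.PATCH with an optional "-prerelease" suffix.
# Assumption: build metadata ("+abc") and leading-zero checks are left out.
_SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


@dataclass(frozen=True)
class Version:
    """Hypothetical helper for illustrating semver-style bumps."""

    major: int
    minor: int
    patch: int
    prerelease: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "Version":
        match = _SEMVER_RE.match(text)
        if match is None:
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
        major, minor, patch, pre = match.groups()
        return cls(int(major), int(minor), int(patch), pre)

    def bump(self, part: str) -> "Version":
        # major = breaking change, minor = new backwards-compatible feature,
        # patch = backwards-compatible bug fix. Lower parts reset to 0 and
        # any pre-release tag is dropped.
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part!r}")

    def beta(self, n: int = 1) -> "Version":
        # Tag this version as a pre-release, e.g. 1.5.0 -> 1.5.0-beta.1.
        return Version(self.major, self.minor, self.patch, f"beta.{n}")

    def __str__(self) -> str:
        core = f"{self.major}.{self.minor}.{self.patch}"
        return f"{core}-{self.prerelease}" if self.prerelease else core


current = Version.parse("1.4.2")
next_beta = current.bump("minor").beta()  # new feature, shipped as a beta first
next_fix = current.bump("patch")          # bug fix only
```

Under this scheme a new feature would first go out as a pre-release of the next minor version, and the `-beta.N` suffix gets dropped once it stabilizes, so users pinned to stable versions never pick it up by accident.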