DanyWin (u/DanyWin) - Readit News

DanyWin commented on LaVague: Open-source Large Action Model to automate Selenium browsing github.com/lavague-ai/LaV... · Posted by u/DanyWin

pawnty · a year ago

A benchmark is helpful to show the success rate.

DanyWin · a year ago

Yes, we are working on that! We are preparing to release a feature that lets people enable telemetry and contribute to a decentralized, open dataset for training and evaluating models that generate Selenium code.

DanyWin commented on LaVague: Open-source Large Action Model to automate Selenium browsing github.com/lavague-ai/LaV... · Posted by u/DanyWin

atonse · a year ago

Right, so in general I can see this being used by development teams themselves, because we don't want to sit there and manually write tests.

I'd love to tell it to just log in to my own website, click on certain pieces of functionality and repeat that. Especially with more casual day to day tasks.

Heck, we could even auto-generate tests from a bug report (where the steps to reproduce are written in plain English by non-technical testers).

That means less time for a dev to actually reproduce those steps, right?

DanyWin · a year ago

Exactly! In the future, testers could just write tests in natural language.

Every time we detect, for instance with a vision model, that the interface changed, we ask the Large Action Model to recompute the appropriate code and execute it.

Regarding generating tests from bug reports: totally possible! For now we focus on having a good mapping from low-level instructions ("click on X") -> code, but once we solve that, we can have another AI take bug reports -> low-level instructions, and use the previously trained LLM!

Really like your use case and would love to chat more about it if you are open. Could you come on our Discord and ping me? https://discord.gg/SDxn9KpqX9
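
As a rough illustration of the two-stage idea above (bug report -> low-level instructions -> code), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `bug_report_to_test`, `fake_to_steps`, and `fake_to_code` are hypothetical names, not part of LaVague, and the stand-in stages only mimic what a model would return.

```python
from typing import Callable, List

# Hypothetical stage signatures: in practice each would be backed by a model.
ReportToSteps = Callable[[str], List[str]]  # bug report -> low-level instructions
StepToCode = Callable[[str], str]           # one instruction -> Selenium code


def bug_report_to_test(report: str, to_steps: ReportToSteps, to_code: StepToCode) -> str:
    """Chain the two stages and join the generated snippets into one script."""
    steps = to_steps(report)
    snippets = [f"# {step}\n{to_code(step)}" for step in steps]
    return "\n\n".join(snippets)


# Stand-in stages so the sketch runs without any model (illustrative only).
def fake_to_steps(report: str) -> List[str]:
    # Treat every "- ..." line of the report as one low-level instruction.
    lines = [line.strip() for line in report.splitlines()]
    return [line[1:].strip() for line in lines if line.startswith("-")]


def fake_to_code(step: str) -> str:
    # A real system would ask the LLM; here we emit a fixed Selenium-style line.
    prefix = "click on "
    target = step[len(prefix):] if step.startswith(prefix) else step
    return f'driver.find_element(By.LINK_TEXT, "{target}").click()'


if __name__ == "__main__":
    report = "Steps to reproduce:\n- click on Login\n- click on Settings"
    print(bug_report_to_test(report, fake_to_steps, fake_to_code))
```

Keeping the two stages as separate callables means the instruction-to-code model can be improved first, as described above, and the bug-report stage can be added later without changing it.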

DanyWin commented on LaVague: Open-source Large Action Model to automate Selenium browsing github.com/lavague-ai/LaV... · Posted by u/DanyWin

pants2 · a year ago

Similar example: Amazon disabled the ability to download your order history, leading to angry customers complaining[1] that they now have to click through item-by-item to get all of their orders for taxes or spend tracking. There are independently developed extensions[2] that do automated scraping, but they have to be actively maintained to keep up with changes in the site. A tool like LaVague would save a lot of headaches for this and similar tasks.

1. https://www.amazonforum.com/s/question/0D56Q0000BMJvWOSQ1/do...

2. https://chromewebstore.google.com/detail/amazon-order-histor...

DanyWin · a year ago

Very interesting indeed!

We are thinking of developing an extension that would connect the browser to LaVague, so that actions can be sent to the extension and executed locally, thus bypassing their barriers.

DanyWin commented on LaVague: Open-source Large Action Model to automate Selenium browsing github.com/lavague-ai/LaV... · Posted by u/DanyWin

valine · a year ago

This is cool. Was looking for model weights, but it seems like maybe it will work with a variety of different models. This is like a RAG/agent app built on top of your typical llama. Am I reading that right?

DanyWin · a year ago

You are exactly right! As I wanted to have a solution that works with many LLMs out of the box, I focused on chain-of-thought prompting and few-shot learning.

Lots of papers show that fine-tuning only helps with steerability and form (https://arxiv.org/abs/2402.05119), so I thought it would be sufficient to provide just the right examples, and it did work!

We do intend to create a decentralized dataset to further train models, and maybe have a 2B or 7B model working well.
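
To make the few-shot, chain-of-thought approach mentioned above more concrete, here is a minimal sketch of how such a prompt could be assembled. The `Example` fields, the `build_prompt` helper, and the prompt wording are all hypothetical; the actual prompts used by LaVague are not shown in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    html: str         # page snippet shown to the model
    instruction: str  # natural-language action
    reasoning: str    # short chain-of-thought explanation
    code: str         # Selenium code the model should produce


def build_prompt(examples: List[Example], html: str, instruction: str) -> str:
    """Assemble worked examples followed by the new query (assumed layout)."""
    parts = ["You write Selenium (Python) code for the given task."]
    for ex in examples:
        parts.append(
            f"HTML:\n{ex.html}\nInstruction: {ex.instruction}\n"
            f"Reasoning: {ex.reasoning}\nCode:\n{ex.code}"
        )
    # The last block is left open so the model continues with reasoning, then code.
    parts.append(f"HTML:\n{html}\nInstruction: {instruction}\nReasoning:")
    return "\n\n---\n\n".join(parts)


if __name__ == "__main__":
    shot = Example(
        html='<button id="go">Go</button>',
        instruction="click on Go",
        reasoning="The button has the id 'go', so select it by id.",
        code='driver.find_element(By.ID, "go").click()',
    )
    print(build_prompt([shot], '<a href="/docs">Docs</a>', "click on Docs"))
```

Because the prompt is plain text, the same string can be sent to any instruction-following LLM, which fits the goal of working with many models out of the box.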

DanyWin commented on LaVague: Open-source Large Action Model to automate Selenium browsing github.com/lavague-ai/LaV... · Posted by u/DanyWin

shadowgovt · a year ago

This has the potential to be a step towards the missing scripting language for graphical interfaces, which is great.

DanyWin · a year ago

Thanks! Funny thing: we did not use vision models, only text, with the HTML of the current page as input. However, we intend to add them to boost performance.
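
As a hedged sketch of what a text-only, HTML-based context could look like, the snippet below reduces a page to its interactive elements using only the Python standard library. This is an assumption for illustration, not LaVague's actual preprocessing; the `InteractiveElements` class, its `KEEP` filter, and `page_context` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class InteractiveElements(HTMLParser):
    """Collect start tags of elements a user can act on (hypothetical filter)."""

    KEEP = {"a", "button", "input", "select", "textarea"}

    def __init__(self) -> None:
        super().__init__()
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            # Attributes without a value (e.g. "disabled") are skipped for brevity.
            attr_text = " ".join(f'{k}="{v}"' for k, v in attrs if v is not None)
            self.found.append(f"<{tag} {attr_text}>" if attr_text else f"<{tag}>")


def page_context(html: str) -> str:
    """Return one line per interactive element, suitable for pasting into a prompt."""
    parser = InteractiveElements()
    parser.feed(html)
    return "\n".join(parser.found)


if __name__ == "__main__":
    page = '<div><p>Hi</p><a href="/login">Log in</a><input type="text" name="q"></div>'
    print(page_context(page))
```

Trimming the page this way keeps the prompt short, at the cost of dropping visible text and layout, which is the kind of information a vision model could later supply.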