Posted by u/jonasnelle a year ago
Show HN: Autotab – Programmable AI browser for turning web tasks into APIs
Hey HN, we're Alexi and Jonas, the co-founders of Autotab (https://autotab.com). Autotab is a Chrome-based browser you can teach to do complex tasks, with a simple API for running them from your app or backend.

Here is a walkthrough of how it works: https://youtu.be/63co74JHy1k, and you can try it for free at https://autotab.com by downloading the app.

Why a dedicated editor?

The number one blocker we've found in building more flexible, agentic automations is, BY FAR, performance quality (https://www.langchain.com/stateofaiagents#barriers-and-chall...). For all the talk of cost, latency, and safety, the fact is most people are still just struggling to get agents to work. The keys to solving reliability are better models, yes, but also intent specification. Even humans don't zero-shot these tasks from a prompt. They need to be shown how to perform them, and then they refine their approach through question-asking and feedback over time. It is also quite difficult to formulate complete requirements on the spot from memory.

The editor makes it easy to build the specification up as you step through your workflow, while generating successful task trajectories for the model. This is the only way we've been able to get the reliability we need for production use cases.

But why build a browser?

Autotab started as a Chrome extension (with a Show HN post! https://news.ycombinator.com/item?id=37943931). As we iterated with users, we realized that we needed to focus on creating the control surface for intent specification, and that being stuck in a Chrome side panel wasn't going to work. We also knew that we needed a level of control for the model that we couldn't get without owning the browser. In Autotab, the browser becomes a canvas on which the user and the model take turns showing and explaining the task.

Key features:

1. Self-healing automations that don't break when sites change

2. Dedicated authoring tool that builds memory for the model while defining steps for the automation

3. Control flows and deep configurability to keep automations on track, even when navigating complex reasoning tasks

4. Works with any website (no site-specific APIs needed)

5. Runs securely in the cloud or locally

6. Simple REST API + client libraries for Python, Node

We'd love to get any early feedback from the HN community, ideas for where you'd like the product to go, or experiences in this space. We will be in the comments for the next few hours to respond!

slfnflctd · a year ago
If I understand this correctly, it looks like the promise I saw in that 'Record Macro' button in my Excel toolbar in the 1990s might finally be coming to fruition in a wider and more capable sense! A pleasant surprise effect of the new AI situation if true.

I noticed in another comment that you said some steps can be made 'optional' (e.g. clicking through a modal). In my ancient Excel macro adventure, what I learned was that I had to tweak the heck out of the VBA code that Record button generated, which led to me just straight writing VBA for everything and eventually abandoning the Record feature entirely. I had a similar experience later on with AutoHotKey. What are the analogous aspects of Autotab to this? Also, to what extent is hand-manipulating the underlying automation possible and/or necessary to get optimal results?

jonasnelle · a year ago
Indeed! A little secret: Internally we call the skills/workflows in Autotab macros :)

Currently there is a bit of a learning curve for training Autotab to be really reliable in hard cases. We expect we’ll be able to decrease it significantly in the next few months, as we get models to do more of the thinking about how to best codify a given task solution/workflow. As an intuition pump for why we expect such rapid progress: in the scenario you described, you’d just have a model write the VBA code for you.

pugio · a year ago
I love the idea - owning the browser definitely seems like the right approach.

I tried it out on a workflow I've been manually piecing together and it gave me a bunch of "Error encountered, contact support" messages when doing things like clicking on a form input field, or even a button.

The more complex "Instruction" block worked correctly instead (literally things like "click the 'Sign In' button"), but then I ran out of the 5 minutes of free run time when trying to go through the full flow. I expect this kind of thing will be fixed soon, as it grows.

In terms of ultimate utility, what I really want is something which can export scripts that run entirely locally, but fall back to the more dynamic AI-enhanced version when an error is encountered. I would want Autotab to generate the workflow, which I could then run on my own hardware in bulk.

Anyway, great work! This is definitely the best implementation I've seen of that glimpsed future of capable AI web browsing agents.

alexirobbins · a year ago
sorry you encountered that issue! what website was the form on? we'll see if we can catch the error!

curious what you mean by generating the workflow that you run on your own hardware? Is this different than running Autotab locally?

pugio · a year ago
Hah, looks like you guys found my account error via my profile email, nice! Thanks for fixing that bug. I'll try again tomorrow when the fix is pushed.

My other request is probably not in line with your business model. I get the sense that Autotab is always communicating with some server on your end, probably for the various bits of AI functionality. What I was asking for is the ability to export the actions/workflow as, say, a Python script (like a Selenium script, or even better, a script which drives your browser) which performs the actions in the Autotab workflow.

I need AI understanding when creating the workflow, or healing in case of an error, but I don't always need it when just executing a prepared script. In those (non AI needed) cases, I don't really want to use up my runtime minutes just because I'm executing a previously generated workflow.
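
A minimal sketch of the pattern being requested here, in Python. It is not an Autotab feature or API: `run_step`, `click`, and `heal` are hypothetical names, and the "page" is a toy set of selectors. The point is only that the AI-assisted path runs when a recorded selector fails, and is skipped otherwise.

```python
from typing import Callable


def run_step(selector: str,
             click: Callable[[str], None],
             heal: Callable[[str], str]) -> str:
    """Run one recorded step locally; ask the healer only if the selector fails."""
    try:
        click(selector)            # fast path: plain local execution
        return selector
    except LookupError:
        repaired = heal(selector)  # slow path: hypothetical AI-assisted repair
        click(repaired)
        return repaired            # caller can persist this for the next run


# Toy stand-ins so the sketch runs on its own.
PAGE = {"#email", "#sign-in-v2"}   # elements present on the "current" page
heal_calls = []                    # records every time the healer was needed


def click(selector: str) -> None:
    if selector not in PAGE:
        raise LookupError(selector)


def heal(selector: str) -> str:
    heal_calls.append(selector)
    return "#sign-in-v2"


# Selectors exported from a previously recorded workflow (assumed format).
script = ["#email", "#sign-in"]
script = [run_step(s, click, heal) for s in script]
```

In this toy run only the stale `#sign-in` selector reaches `heal`; the `#email` step never leaves the local fast path.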

rava-dosa · a year ago
Really exciting to see this approach to automation and intent specification! We’ve been working with similar challenges at Origins AI, where we focus on deep tech solutions.

I can’t overstate how much having a robust system for breaking down tasks and iterating on them has helped us.

For one of our recent projects, we had to integrate complex workflows with third-party systems, and it was clear that reliability came down to how well we could define and refine intent over time.

I’m especially curious about your self-healing automations. That’s an area where we’ve found a lot of value using models that can adapt to subtle UI changes, but it’s always a tradeoff with latency. Would love to hear more about how you balance that in production!

Looking forward to trying Autotab and seeing how it compares with some of the internal tools we’ve built!

jonasnelle · a year ago
Agree on the tradeoff between ability to handle novel situations and speed/cost. Autotab uses a “ladder of compute” system that escalates to the minimal level of compute required to solve a given subtask. I wrote a longer comment about this on another thread.

MattDaEskimo · a year ago
Very neat in theory but I'm failing to find any technical details.

At which layer is the automation happening? Inside the browser, using dev tools? Multiple layers?

What is the self-healing mechanic? I'm guessing invoking an LLM to find what happened and fix it?

I guess what I'm wondering is: is this some sort of hybrid between computer use and dev tools usage?

jonasnelle · a year ago
Autotab is definitely a hybrid approach, because when it comes to deciding where on the page to take an action, Autotab has to be fast & cheap (humans are both of those) while also being robust to changes. The solution we use is a "ladder of compute" where Autotab uses everything from really fast heuristics and local models up to the biggest frontier models, depending on how difficult the task is.

For instance, if Autotab is trying to click the "submit" button on a sparse page that looks like previous versions of that page, that click might take a few hundred milliseconds. But if the page is very noisy, and Autotab has to scroll, and the button says "next" on it because the flow has an additional step added to it, Autotab will probably escalate to a bigger model to help it find the right answer with enough certainty to proceed.

There is a certain cutoff in that hierarchy of compute that we decided to call "self-healing" because latency is high enough that we wanted to let users know it might take a bit longer for Autotab to proceed to the next step.
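
As a rough illustration of the escalation idea described above (not Autotab's actual implementation), here is a minimal Python sketch. The `Rung` type, the three toy locators, the confidence threshold of 0.9, and the `slow` flag marking the "self-healing" cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Rung:
    name: str
    # Returns (target, confidence), or None if this rung finds nothing.
    locate: Callable[[str], Optional[Tuple[str, float]]]
    slow: bool = False  # rungs past the cutoff reported as "self-healing"


def find_target(page: str, ladder: List[Rung], threshold: float = 0.9):
    """Try rungs from cheapest to most expensive; stop at the first confident answer."""
    for rung in ladder:
        result = rung.locate(page)
        if result is None:
            continue
        target, confidence = result
        if confidence >= threshold:
            return target, rung.name, rung.slow
    raise LookupError("no rung located the target with enough certainty")


# Toy stand-ins for real locators (all hypothetical).
def heuristic(page: str):
    return ("#submit", 0.99) if "submit" in page else None


def local_model(page: str):
    return ("button.next", 0.6) if "next" in page else None


def frontier_model(page: str):
    return ("button.next", 0.95) if "next" in page else None


LADDER = [
    Rung("heuristic", heuristic),
    Rung("local-model", local_model),
    Rung("frontier-model", frontier_model, slow=True),
]
```

With these stand-ins, a page containing "submit" is resolved by the cheapest rung, while a page where the button now says "next" falls through the low-confidence local model and is only resolved by the slowest rung, which is the case a UI might surface as self-healing.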

thelastparadise · a year ago
So no computer use (pixel-level understanding).

That's disappointing as the devtools approach always has limitations.

Kura agents, Runner H, and scrapybara will all end up more reliable than you.

Carrok · a year ago
You say "try it for free" but your website has no pricing information at all. Is this free for just a while? Free forever? What is your monetization strategy?

Can I point it at my own LLM or am I locked into using OpenAI?

alexirobbins · a year ago
We have unlimited free editing, so you can fully try everything out and know your skill will work before we ask you to subscribe. You also get 5 minutes of free runtime. Subscriptions start at $39/month with 300 minutes of runtime included.

Right now we do not let you BYO LLM, but it's something we would love to provide an option for where possible!

Carrok · a year ago
5 minutes seems like barely enough time to complete any given task, let alone actually try it out. $40/mo for a capped plan seems steep, but maybe I'm not your target customer. Best of luck!

jonasnelle · a year ago
Good point, will add pricing information to our website ASAP. We had skipped that one in the push to launch (it is only available in the app at the moment).

adamkhakhar · a year ago
This is awesome! What is your most common use case? Have you thought of competing with https://scribehow.com/ in the documentation space?

jonasnelle · a year ago
Thanks! Our most common use cases are repetitive tasks people have at work: think updating HubSpot with analytics data from an internal tool, or reconciling payments between an invoicing system, a payment system, and a CRM.

Haven’t done a lot with Scribe-like documentation cases. Given the pace at which this technology is developing we’re focused on making Autotab really good at the most economically valuable tasks.

_1tem · a year ago
How on earth does this help with reconciling payments? Can Autotab also recognize "this transaction belongs to this invoice" or does it just copy and paste all transaction and invoice data into a spreadsheet for manual reconciliation?

alex_c · a year ago
The functionality looks very very cool. But the privacy policy raises an eyebrow - am I overreacting?

Usage Information. To help us understand how you use our Services and to help us improve them, we automatically receive information about your interactions with our Services, like the pages or other content you view, the searches you conduct, and the dates and times of your visits.

Desktop Activity on our Services. In order to provide the Services, we need to collect recordings of your desktop activity while using our Services, which may include audio and video screen recordings, your cookies, photos, local storage, search history, advertising interactions, and keystrokes.

Information from Cookies and Other Tracking Technologies. We and our third-party partners collect information using cookies, pixel tags, SDKs, or other tracking technologies. Our third-party partners, such as analytics partners, may use these technologies to collect information about your online activities over time and across different services.

[...]

How We Disclose the Information We Collect

Affiliates. We may disclose any information we receive to any current or future affiliates for any of the purposes described in this Privacy Policy.

Vendors and Service Providers. We may disclose any information we receive to vendors and service providers retained in connection with the provision of our Services.

alexirobbins · a year ago
We work with Fortune 500 companies and have HIPAA-compliant offerings, so we are very sensitive to privacy and security concerns. Fundamentally the models need to operate on whatever browser tasks users ask Autotab to perform, and we need to use frontier vision models like GPT-4o and Claude to reliably perform them (model providers are the affiliates in question). If you have specific concerns, we're happy to answer them.

alienallys · a year ago
Your response doesn't seem to address the privacy concerns raised. Why is the policy so broad and invasive? There's no mention of how you handle PII collected as telemetry.

handfuloflight · a year ago
I see it's able to perform data extraction, but what if you wanted to enter data from another system, or data generated by an LLM during the workflow?

jonasnelle · a year ago
Data from external systems can be provided to Autotab in the form of CSV files or string inputs, which can be passed to the API to parametrize skills. However, in most cases, ingesting data into Autotab is easiest by just having Autotab navigate to the website where the data is present.

Autotab has a structured type system underlying the workflows, so any data processed in the course of an automation can be referenced in later steps. It's a bit like a fuzzy programming language for automation, and the model generates schemas to ensure data flows reliably through the series of steps.

For example, users often start by collecting information in one system (using an extract step as you mentioned), then cross-reference it in another, and then submit some data by having Autotab type it into a third system. In Autotab, you can just type @ to reference a variable; each step has access to data from previous steps.

At the end, you can get a dump of all of Autotab's data from a run as a JSON file, or turn specific arrays of data into CSV files using a table step.
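
To make the data flow above concrete, here is a small Python sketch of the general idea, not Autotab's type system or export format. The `run_data` dictionary, the `resolve` and `to_csv` helpers, and the sample invoice rows are all hypothetical: earlier steps store values, later steps reference them with `@name`, and the run ends with a JSON dump plus a CSV for one array.

```python
import csv
import io
import json
import re


def resolve(template: str, variables: dict) -> str:
    """Replace each @name with the value stored by an earlier step."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"step references unknown variable @{name}")
        return str(variables[name])
    return re.sub(r"@(\w+)", lookup, template)


def to_csv(rows: list) -> str:
    """Turn a list of same-shaped dicts into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


run_data = {}

# Step 1 (extract): collect rows from the first system (sample data).
run_data["invoices"] = [
    {"id": "INV-1", "amount": 120},
    {"id": "INV-2", "amount": 80},
]

# Step 2 (cross-reference): derive a value from the earlier step's output.
run_data["total"] = sum(row["amount"] for row in run_data["invoices"])

# Step 3 (type into a third system): reference earlier data with @.
note = resolve("Reconciled total: @total", run_data)

# End of run: full JSON dump, plus one array as a CSV table.
dump = json.dumps(run_data, indent=2)
table = to_csv(run_data["invoices"])
```

Raising on an unknown `@name` instead of leaving it in the text is a deliberate choice in this sketch, so that a broken reference fails at the step that uses it.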

grugagag · a year ago
I don’t know what your intention is, but I imagine that’s how more and more people are going to push LLM slop into all corners of the internet. It’ll be easy to do in massive quantities.