How does it know when to stop and ask a human to intervene?
More on this soon! How would you imagine this would be useful?
Run the workflow again and it’ll run through that cached trajectory as best as it can, falling back to computer use if needed.
I assume you send a screenshot to Claude to get the next action to take. How are you able to cut out that step by working deterministically? What is the deterministic part, and how do you figure it out?
But during the replayed actions, we do bring in smaller LLMs to check whether anything unexpected happened (like a popup). If so, we fall back to computer use to take it home.
Does that make sense? At the end of the day, our agent compiles down to PyAutoGUI, with smart fallback to the agent if needed.
The whole idea of Cyberdesk is that the prompt is the source of truth. Once the system learns a task via CUA, it follows that cache most of the time until it has to fall back to CUA, which follows the prompt. That anomaly is then cached too.
So over time, the system just learns, and gets cheaper and faster.
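To make the replay-and-fallback idea concrete, here is a minimal sketch of how such a loop could look. It is not Cyberdesk's actual code: `Workflow`, `run`, `check`, and `agent` are hypothetical names, the steps are plain strings rather than real GUI actions, and how anomalies are really cached is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Workflow:
    prompt: str                      # source of truth for the agent
    cache: List[str]                 # previously learned steps (hypothetical format)
    known_anomalies: Dict[str, List[str]] = field(default_factory=dict)


def run(
    workflow: Workflow,
    execute: Callable[[str], None],          # deterministic executor (stand-in for GUI automation)
    check: Callable[[str], Optional[str]],   # cheap check: anomaly label, or None if all is fine
    agent: Callable[[str, str], List[str]],  # expensive computer-use fallback (assumed interface)
) -> List[str]:
    """Replay cached steps, calling the agent only for anomalies not seen before."""
    log: List[str] = []
    for step in workflow.cache:
        anomaly = check(step)
        if anomaly is not None:
            if anomaly not in workflow.known_anomalies:
                # Unknown anomaly: ask the agent once, then remember its fix.
                workflow.known_anomalies[anomaly] = agent(workflow.prompt, anomaly)
                log.append(f"agent:{anomaly}")
            else:
                log.append(f"cached-fix:{anomaly}")
            for fix in workflow.known_anomalies[anomaly]:
                execute(fix)
        # Back to deterministic replay once the anomaly is handled.
        execute(step)
        log.append(f"replay:{step}")
    return log
```

In this sketch the first run that hits a popup pays for one agent call, and later runs reuse the stored fix, which is one way the "gets cheaper and faster" effect could come about.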
Also, to run this at large scale, does it become prohibitively expensive to run thousands of custom workflows on a daily basis? I assume this runs in the cloud.
And yes, we've found the computer use models are quite reliable.
Great questions on scale: we designed our engine so that in the happy path it makes very few LLM calls. The agent runs deterministically, checking for anomalies only at critical spots (if one occurs, we fall back to computer use to take it home). If none occur, our system can complete an entire task end to end for on the order of less than $0.0001.
So it's a hybrid system at the end of the day. This results in really low costs at scale, as well as speed and reliability improvements (since in the happy path, we run exactly what has worked before).
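As a rough back-of-the-envelope illustration of why the hybrid approach stays cheap, the expected cost can be modeled as the happy-path cost plus the fallback cost weighted by how often anomalies occur. Only the happy-path figure (under $0.0001) comes from this thread; the fallback cost and anomaly rate below are made-up placeholders, and `expected_cost` is a hypothetical helper.

```python
def expected_cost(
    n_runs: int,
    happy_cost: float,     # cost of one fully deterministic run
    fallback_cost: float,  # extra cost when computer use takes over (assumed value)
    anomaly_rate: float,   # fraction of runs that hit an anomaly (assumed value)
) -> float:
    """Expected total cost of n_runs under a simple two-outcome model."""
    if not 0.0 <= anomaly_rate <= 1.0:
        raise ValueError("anomaly_rate must be between 0 and 1")
    per_run = happy_cost + anomaly_rate * fallback_cost
    return n_runs * per_run


# Illustrative numbers only: 1,000 runs, $0.0001 happy path (upper bound from
# the thread), a hypothetical $0.05 fallback, and a hypothetical 2% anomaly rate.
total = expected_cost(1000, 0.0001, 0.05, 0.02)
print(f"${total:.2f}")
```

Under this model, the total is dominated by how often the fallback fires, so caching anomalies matters more for cost than shaving the happy path further.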
My PC is just good enough to run a DeepSeek distill. Is that on par with the requirements for your model?
So if you come across a local model that can do that well, let us know! We're also keeping a close watch.
Is it possible to verify that?
What would happen in this case?
After the focused action is done, it’ll go right back to deterministic!