sgtwompwomp commented on Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps    · Posted by u/mahmoud-almadi
throw03172019 · 9 days ago
Thanks. In the OpenDental example, if the task is to update a different patient, does it fall back to computer use because the “search results” are in different places? I.e., a search for “John” may return two results: John and Johnson.

What would happen in this case?

sgtwompwomp · 4 days ago
Yup, this is a case where you always want an agent to do that step. So in the prompt you just say “do a focused_action to select the search result with John”, and then the pathfinder agent will cache in its memory that it should delegate that step to a mini computer use agent, just for that particular task.

After the focused action is done, it’ll go right back to deterministic!
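
To make the focused_action idea concrete, here is a minimal Python sketch of a cached workflow where only the flagged step goes to an agent. This is not Cyberdesk's actual code: `Step`, `run_workflow`, `replay`, and `mini_agent` are hypothetical names, and the two callables stand in for the real replay engine and the mini computer use agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    kind: str     # "click", "type", or "focused_action" (hypothetical step kinds)
    payload: str  # coordinates, text to type, or an instruction for the agent


def run_workflow(
    steps: List[Step],
    replay: Callable[[Step], None],
    mini_agent: Callable[[str], None],
) -> List[str]:
    """Replay cached steps, delegating only focused_action steps to an agent."""
    log = []
    for step in steps:
        if step.kind == "focused_action":
            # Dynamic step: the agent looks at the current screen and decides.
            mini_agent(step.payload)
            log.append("agent")
        else:
            # Deterministic step: repeat exactly what worked last time.
            replay(step)
            log.append("replay")
    return log
```

The point of the design is that a search result list can change between runs, so that one step is always handled dynamically, while everything before and after it stays on the cheap deterministic path.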

iamcreasy · 14 days ago
Congratulations on the launch!

How does it know when to stop and ask a human to intervene?

sgtwompwomp · 12 days ago
Our agent would have a tool to essentially bring in the human. We haven't built this yet, but the closest thing we do have is that our agent can declare a task as failed if it determines it can’t proceed (based on your instructions).

More on this soon! How would you imagine this would be useful?

throw03172019 · 15 days ago
How do you determine which actions use the “cached” agent to reduce cost and increase speed?
sgtwompwomp · 12 days ago
So once you run a workflow (prompt) once, then that trajectory is cached.

Run the workflow again and it’ll run through that cached trajectory as best as it can, falling back to computer use if needed.

mattfrommars · 13 days ago
Thanks for the reply. I got lost at this part: “Great questions on scale: the whole way we designed our engine is that in the happy path, we actually use very little LLM usage. The agent runs deterministically, only checking at various critical spots whether anomalies occurred (if they did, we fall back to computer use to take it home)”

I assume you send a screenshot to Claude for the next action to take. How are you able to reduce this exact step by working deterministically? What is the deterministic part, and how do you figure it out?

sgtwompwomp · 12 days ago
So what I meant is this: When you run our Cyberdesk agent the first time, it runs with the computer use agent. But then once that’s complete, we cache every exact step it took to successfully complete that task (every click, type, scroll) and then simply replay that the next time.

But during that replay, we do bring in smaller LLMs to check whether anything unexpected happened (like a popup). If so, we fall back to computer use to take it home.

Does that make sense? At the end of the day, our agent compiles down to PyAutoGUI, with smart fallback to the agent if needed.
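
As a rough illustration of the replay-then-fallback loop described above, here is a minimal Python sketch. It is an assumption-laden outline, not the real engine: `replay_with_fallback`, `execute`, `screen_ok`, and `fallback_agent` are hypothetical names, and for simplicity it checks the screen before every step, whereas the comment above says checks only happen at certain critical spots.

```python
from typing import Callable, List, Tuple

# A cached action is (kind, args), e.g. ("click", (120, 340)) or ("type", ("John",)).
Action = Tuple[str, tuple]


def replay_with_fallback(
    actions: List[Action],
    execute: Callable[[Action], None],
    screen_ok: Callable[[int], bool],
    fallback_agent: Callable[[int], None],
) -> str:
    """Replay cached actions; hand off to the agent at the first anomaly.

    In a real setup, `execute` could dispatch to PyAutoGUI calls such as
    pyautogui.click(x, y), pyautogui.write(text), or pyautogui.scroll(amount).
    `screen_ok` stands in for the smaller LLM check, and `fallback_agent`
    for the full computer use agent (both are assumptions of this sketch).
    """
    for index, action in enumerate(actions):
        if not screen_ok(index):
            # Something unexpected (e.g. a popup): let the agent finish the task
            # starting from this step.
            fallback_agent(index)
            return "fallback"
        execute(action)
    return "replayed"
```

In the happy path the loop never calls the agent, which is where the cost and speed benefit would come from.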

gerdesj · 17 days ago
AutoIt must be a good 20 years old: https://www.autoitscript.com/site/
sgtwompwomp · 17 days ago
Unfortunately, these scripting tools are just untenable when dealing with so many desktop flows that all have changing UIs and random popups. You end up having to repair all of them all the time. In fact, there's a whole consulting industry out there just to do this all day.

The whole idea of Cyberdesk is that the prompt is the source of truth. Once the system learns a task via CUA (the computer use agent), it follows that cache most of the time until it has to fall back to CUA, which follows the prompt. And that anomaly is cached too.

So over time, the system just learns, and gets cheaper and faster.
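
One way to picture "the anomaly is cached too" is a small memory that only calls the agent the first time a given anomaly shows up. This Python sketch is purely illustrative: `AnomalyMemory`, `solve_with_agent`, and the string labels for anomalies are hypothetical, and how the real system recognizes a repeated anomaly is not described here.

```python
from typing import Callable, Dict, List


class AnomalyMemory:
    """Remembers how each anomaly was resolved so the agent is needed only once."""

    def __init__(self, solve_with_agent: Callable[[str], List[str]]):
        self._solve = solve_with_agent           # expensive: stands in for CUA
        self._known: Dict[str, List[str]] = {}   # anomaly label -> recovery steps
        self.agent_calls = 0

    def handle(self, anomaly: str) -> List[str]:
        if anomaly not in self._known:
            # First sighting: pay for the agent once and store what it did.
            self.agent_calls += 1
            self._known[anomaly] = self._solve(anomaly)
        # Later sightings: reuse the stored recovery steps.
        return self._known[anomaly]
```

Under this assumption, each new kind of popup costs one agent call, and repeat occurrences are handled from memory, which matches the "cheaper and faster over time" claim.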

MetaWhirledPeas · 17 days ago
Can it do assertions? This could be useful for testing old software.
sgtwompwomp · 17 days ago
Yup, a few of our clients need to verify something in the software, so we support an agentic step where we look at the screen and can verify whether something exists, whether a step was completed, etc!
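
For testing use cases, a verification step like this could be wrapped as an assertion. The Python sketch below assumes a hypothetical `check_screen` callable that answers a yes/no question about the current screen (in practice, presumably a vision model call); `assert_on_screen` and `VerificationFailed` are made-up names for illustration only.

```python
from typing import Callable


class VerificationFailed(Exception):
    """Raised when an expected condition is not visible on screen."""


def assert_on_screen(condition: str, check_screen: Callable[[str], bool]) -> None:
    """Fail the run if the screen check says the condition does not hold.

    `check_screen` is an assumed stand-in for an agentic step that looks at
    a screenshot and returns True or False for the described condition.
    """
    if not check_screen(condition):
        raise VerificationFailed(f"Expected on screen: {condition}")
```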
mattfrommars · 17 days ago
Looks great for automating workloads for Windows desktop applications. I'd love to understand more deeply how your application works. So the set of commands your backend sends is click, scroll, screenshot. Does it send a command to, say, type characters into an input field? How is it able to pinpoint a text field from a screenshot? Is an LLM reliable enough to pinpoint the x and y to click on a field?

Also, to run this at large scale, does it become prohibitively expensive to run thousands of custom workflows on a daily basis? I assume this runs in the cloud.

sgtwompwomp · 17 days ago
Thanks! And yes, so our pathfinder agents utilize Sonnet 4's precise coordinate generation capabilities. You give it a screenshot, give it a task, and it can output exact coordinates of where to click on an input field, for example.

And yes we've found the computer use models are quite reliable.

Great questions on scale: the whole way we designed our engine is that in the happy path, we actually use very little LLM usage. The agent runs deterministically, only checking at various critical spots whether anomalies occurred (if they did, we fall back to computer use to take it home). If not, our system can complete an entire task end to end at a cost on the order of $0.0001 or less.

So it's a hybrid system at the end of the day. This results in really low costs at scale, as well as speed and reliability improvements (since in the happy path, we run exactly what has worked before).
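
To see why the hybrid approach stays cheap at scale, here is a back-of-the-envelope Python sketch of the average cost per run. Only the roughly $0.0001 happy-path figure comes from the comment above; the fallback cost and fallback rate used in the usage note are made-up placeholders, and `expected_cost_per_run` is a hypothetical helper.

```python
def expected_cost_per_run(
    replay_cost: float,
    fallback_cost: float,
    fallback_rate: float,
) -> float:
    """Average cost of one run, given how often a run falls back to the agent.

    replay_cost:   cost of a run that stays on the deterministic path
    fallback_cost: total cost of a run that needs the computer use agent
    fallback_rate: fraction of runs that fall back, between 0 and 1
    """
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return (1.0 - fallback_rate) * replay_cost + fallback_rate * fallback_cost
```

For example, `expected_cost_per_run(0.0001, 0.10, 0.02)` plugs in the happy-path figure plus an assumed $0.10 per fallback run and an assumed 2% fallback rate. With numbers like these the average is dominated by how often fallbacks happen, so caching anomalies matters more than shaving the replay cost further.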

rm_-rf_slash · 17 days ago
Out of curiosity, what would the minimum specs need to be in order to run this locally?

My PC is just good enough to run a DeepSeek distill. Is that on par with the requirements for your model?

sgtwompwomp · 17 days ago
There isn't a viable computer use model that can be run locally yet, unfortunately. I'm extremely excited for the day that happens though. Essentially the key capability that makes a model a computer use model is precise coordinate generation.

So if you come across a local model that can do that well, let us know! We're also keeping a close watch.

hermitcrab · 17 days ago
>none of the information getting sent to their LLMs ever get retained

Is it possible to verify that?

sgtwompwomp · 17 days ago
Yup! We have signed certificates that explicitly state this, with all LLM providers we use.

u/sgtwompwomp

Karma: 8 · Cake day: March 5, 2022