And they usually try to sell you their courses/mentorships.
I don't know if I'm suspicious because they seem fake, or because they came out of nowhere and are earning in a month what I make in a year.
Step 1: Vibe-code 5 generic trash apps (e.g. AI interior designers, GPT wrappers)
Step 2: Launch, spending $15k on Google/Meta ads
Step 3: Make back ~$5k in revenue
Step 4: Spam Twitter and LinkedIn with clickbaity "here's how I reached 60K ARR in 3 milliseconds" posts to attract ex-crypto bros and hustlers
Step 5: Use your newfound following for sponsored content, courses, and hopefully some actual organic growth for your aforementioned trash apps
1) The funny thing about determinism is that deciding when to break it should itself be deterministic; it's kind of a recursive problem. Agents are inherently very tough to guardrail on an action space as big as CUA's. The guys from Browser Use realized this as well and built workflow-use. You could also try RL or finetuning per task, but that isn't viable (economically or technically) currently.
2) As you know, it's a very client-facing, heavily customized solution space. Tough to scale as a fresh startup unless you really niche down on some specific workflows. You might find this interesting; it reflects my thoughts on the space as well: https://x.com/erikdunteman/status/1923140514549043413 (he is also building in the deterministic-agent space now, funnily enough)
3) It actually gets annoyingly expensive with Claude if you break caching, which you have to at some point if you feed in every screenshot. You mentioned you use multiple models (I'm guessing UI-TARS / OmniParser?), but in the comments you said Claude?
4) Ultimately the big bet in the RPA space, as again you know, is that the TAM won't shrink much as more SAPs, ERPs, etc. implement APIs. Of course the big money will always be in the ancient apps that won't, but in that space UiPath and the others have a chokehold (and their agentic tech was actually surprisingly good when I had a look 3 months ago).
Good luck in any case! It feels like one of those spaces where we're definitely still a touch too early, but it's such a big market that there's plenty of room for a lot of people.
I find it interesting how marketers are trying to frame minimal prompting as a good thing, a direction to optimize for. Even when I talk to a senior engineer, I try to be as specific as possible to avoid ambiguities. Pushing models to just do whatever they think is best is a weird direction: there are so many subtle assumptions and understandings of the architecture that exist only in my head or a colleague's head. Meanwhile, I've found that a very good workflow is asking Claude Code to come back with clarifying questions and then a plan before it starts executing.
I also see this in a lot of the undergrads I work with. The top 10% are even better with LLMs: they know much more and they're more productive. But the rest have just resorted to turning in obvious slop with no care. I still haven't read a good solution for how to correctly incentivize/restrict the use of LLMs, in academia or at work. I suspect it's just the old reality that quality work isn't desirable to the vast majority, and LLMs are magnifying that.
I coded up the demo myself and didn't anticipate how disruptive the intermittent warning messages about waiting users would become. The demo is quite resource-intensive: each session currently requires its own H100 GPU, and I'm already using a dispatcher-worker setup with 8 parallel workers. Unfortunately, demand exceeded my setup, causing significant lag, and I had to limit sessions to 60 more seconds when others are waiting. Additionally, the underlying diffusion model itself is slow to run, resulting in a frame rate typically below 2 fps, further compounded by network bottlenecks.
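For anyone curious, a dispatcher-worker setup like the one described can be sketched with an asyncio queue. All names here (`serve_frames`, `dispatch`, etc.) are hypothetical stand-ins, not the demo's actual code; the real workers each drive an H100 running the diffusion model:

```python
import asyncio

NUM_WORKERS = 8        # the demo uses 8 parallel workers, one GPU each
GRACE_SECONDS = 60     # remaining time a session gets once others are waiting

async def serve_frames(session_id: int) -> None:
    """Hypothetical stand-in for the model's frame-generation loop."""
    await asyncio.sleep(0.01)  # real inference runs below 2 fps

async def run_session(session_id: int, queue: asyncio.Queue) -> None:
    if queue.empty():
        # No one waiting: let the session run freely.
        await serve_frames(session_id)
    else:
        # Others queued: cap the session, then evict on timeout
        # so the next user gets a GPU.
        try:
            await asyncio.wait_for(serve_frames(session_id), GRACE_SECONDS)
        except asyncio.TimeoutError:
            pass

async def worker(queue: asyncio.Queue, finished: list) -> None:
    while True:
        session_id = await queue.get()
        await run_session(session_id, queue)
        finished.append(session_id)
        queue.task_done()

async def dispatch(n_sessions: int) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    finished: list = []
    workers = [asyncio.create_task(worker(queue, finished))
               for _ in range(NUM_WORKERS)]
    for sid in range(n_sessions):
        queue.put_nowait(sid)
    await queue.join()          # block until every queued session is served
    for w in workers:
        w.cancel()
    return finished
```

Usage would be something like `asyncio.run(dispatch(20))`; the grace-period eviction is what produces the "60 more seconds" warnings mentioned above.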
As for model capabilities, NeuralOS is indeed quite limited at this point (as acknowledged in my paper abstract). That's why the demo interactions shown in my tweet were minimal (opening Firefox, typing a URL).
Overall, this is meant as a proof-of-concept demonstrating the potential of generative, neural-network-powered GUIs. It's fully open-source, and I hope others can help improve it going forward!
Thanks again for the honest feedback.
What they did was:
1) Prompt an LLM for a generic description of potential buffer overflows in strcpy() and generic demonstration code for a buffer overflow (with no connection to curl or even OpenSSL at all).
2) Present some stack traces and grep results that show usage of strcpy() in curl and OpenSSL.
3) Simply claim that the strcpy() usages from 2) somehow indicate a buffer overflow, with no additional evidence.
4) When called out, just pretend that the demonstration code from 1) was the evidence, even though it's obviously just a textbook example and doesn't call any code from curl.
It's not that they found some potentially dangerous code in curl and just didn't go all the way to proving an overflow; that would have had at least some value.
The entire thing is just bullshit made to look like a vulnerability report. There is nothing behind it at all.
Edit: Oh, cherry on top: the demonstrator doesn't even use strcpy(), nor any other kind of buffer overflow. It tries to construct some shellcode in a buffer, then gives up and literally calls execve("/bin/sh")...
The worst part is that once the poor maintainers ask for clarification, these people go on the offensive and become aggressive. Imagine the nerve: using LLMs to try to gaslight an actual expert into believing they made a mistake, and then acting annoyed or angry when the expert asks normal questions.