Obviously it doesn't always end badly. But we get a massively skewed view from survivor bias.
My life turned out pretty damn well once I got a plain ordinary job working for someone else. But I don't kid myself: when it comes to starting a startup, I did fail. The main lesson I learned was that I was always going to.
I hear this a lot, and I think it's good advice: the only person who should actually start a startup is the one who sees all this and does it anyway.
Should we not mimic our biology as closely as possible rather than trying to model how we __think__ it works (i.e. chain of thought, etc.). This is how neural networks got started, right? Recreate something nature has taken millions of years developing and see what happens. This stuff is so interesting.
This is a lot more than an agent that can use your computer as a tool (and understands how to do that). It's basically an autonomous reasoning agent: you give it a goal, and it will then use reasoning, as well as its access to your computer, to achieve that goal.
Take a look at their demo of using this for coding.
https://www.youtube.com/watch?v=vH2f7cjXjKI
This seems to be an OpenAI GPT-o1 killer. It may be using an agent to do reasoning (it's still not clear exactly what is under the hood), as opposed to GPT-o1 supposedly being a model (though still basically a loop around an LLM), but the reasoning it is able to achieve in pursuit of a real-world goal is very impressive. It would be mind-boggling if we hadn't had the last few years to get used to this escalation of capabilities.
It's also interesting to consider this from the POV of Anthropic's focus on AI safety. On their website they have a bunch of advice on how to stay safe by sandboxing, limiting what it has access to, etc., but at the end of the day this is a very capable AI able to use your computer and browser to do whatever it deems necessary to achieve a requested goal. How far are we from paperclip optimization, or at least autonomous AI hacking?
Give examples of how the LLM should respond. Always give it a default response as well (e.g. "If the user response does not fall into any of these categories, say x").
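A minimal sketch of that tip as a system prompt (the classification task, category names, and examples here are all made up for illustration):

```python
# Few-shot examples plus an explicit fallback instruction, per the advice
# above. The task and categories are hypothetical.
system_prompt = """Classify the user's message as BUG, FEATURE, or QUESTION.

Examples:
User: "The app crashes when I click save." -> BUG
User: "Could you add dark mode?" -> FEATURE
User: "How do I export my data?" -> QUESTION

If the message does not fall into any of these categories, say OTHER.
Respond with the single category word only."""
```

The explicit "say OTHER" line is the default response: without it, the model tends to force every input into one of the listed categories.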
> I can manually add validation on the response but then it breaks streaming and hence is visibly slower in response.
I've had this exact issue (streaming + JSON). Here's how I approached it:

1. Instruct the LLM to return the key "test" in its response.
2. Make the streaming call.
3. Build your JSON response up as a string as you get chunks from the stream.
4. Once you detect "test" in that string, start sending all subsequent chunks wherever you need.
5. Once you get the closing quote, end the stream.
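A rough sketch of those steps, with the key name "test" from above. The stream is simulated with a plain list of chunks rather than a real SDK call; in real use you'd forward value characters downstream as they arrive instead of collecting them.

```python
def stream_json_value(chunks, key="test"):
    """Scan streamed chunks for '"key": "' and return the value text,
    stopping at the closing quote, without parsing the full JSON."""
    buffer = ""
    marker = f'"{key}"'
    in_value = False
    escaped = False
    collected = []
    for chunk in chunks:
        if not in_value:
            buffer += chunk
            idx = buffer.find(marker)
            if idx == -1:
                continue  # steps 3-4: accumulate until the key appears
            after = buffer[idx + len(marker):]
            quote = after.find('"', after.find(":") + 1) if ":" in after else -1
            if quote == -1:
                continue  # colon / opening quote not streamed yet
            in_value = True
            chunk = after[quote + 1:]  # everything past the opening quote
        # steps 4-5: pass along value characters until the closing quote
        for ch in chunk:
            if escaped:
                collected.append(ch)  # simplified: escapes kept literally
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                return "".join(collected)  # step 5: closing quote ends it
            else:
                collected.append(ch)
    return "".join(collected)
```

For example, `stream_json_value(['{"te', 'st": "Hel', 'lo wor', 'ld"}'])` returns `"Hello world"` even though the key itself is split across chunks.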