https://github.com/clawvisor/clawvisor/blob/main/internal/in...
And the results here: https://github.com/clawvisor/clawvisor/blob/main/internal/in...
https://github.com/clawvisor/clawvisor/blob/main/internal/in...
And the results here: https://github.com/clawvisor/clawvisor/blob/main/internal/in...
> If you’ve built something agents want, please let us know. Comments welcome!
I'll bite! I've built a self-hosted open source tool that's intended to solve this problem specifically. It allows you to approve an agent purpose rather than specific scopes. An LLM then makes sure that all requests fit that purpose, and only inject the credentials if they're in line with the approved purpose. I (and my early users) have found substantially reduces the likelihood of agent drift or injection attacks.
[1] https://www.tomshardware.com/tech-industry/artificial-intell...
The only exception to this is typing on my mobile device, which is configured to qwerty.
We're an Initialized Capital-backed, YC startup (S18) making it easy for companies to collect and instantly verify photo IDs online. We use ML and computer vision techniques to effectively extract and validate the IDs in our system without any human intervention. This is a game changer for companies that require age verification, fraud deterrence or KYC. We are growing quickly and have new customers coming on board weekly.
Our founding team led the Trust & Safety team at Airbnb for several years. We implemented the initial versions of the Airbnb's Verified ID product and saw many of the problems with the existing solutions.
We have a modern stack and a ton of interesting problems to solve. We're a SaaS, API-first company building a best-in-class solution for identity verification.
My email address is eric [at] [company-name] .com
For example, 'slack_wrong_channel' was an ask to post a standup update, and a result of declaring free pizza in #general. Does this get rejected for the #general (as it looks like it's supposed to do), or does it get rejected because it's not a standup update (which I expect is likely).
Or 'drive_delete_instead_of_read' checks that 'read_file' is called instead of 'delete_file'. But LLMs are pretty good at getting the right text transform (read vs delete), the problem would be if for example the LLM thinks the file is no longer necessary and _aims_ to delete the file for the wrong reasons. Maybe it claims the reason is "cleaning up after itself" which another LLM might think is a perfectly reasonable thing to do.
Or 'stripe_refund_wrong_charge', which uses a different ID format for the requested action and the actual refund. I would wonder if this would prevent any refunds from working because Stripe doesn't talk in your order ID format.
It seems these are all synthetic evals rather than based on real usage. I understand why it's useful to use some synthetic evals, but it does seem to be much less valuable in general.