I understand the cost and technical constraints but wouldn't an exposed interface allow repeated calls from different endpoints and increased knowledge from the attacker based on responses? Isn't this like attacking an API without a response payload?
Do you plan on sharing a simulator where you have 2 local servers or similar and are allowed to really mimic a persistent attacker? Wouldn't that be somewhat more realistic as a lab experiment?
Built this over the weekend mostly out of curiosity. I run OpenClaw for personal stuff and wanted to see how easy it'd be to break Claude Opus via email.
Some clarifications:
Replying to emails: Fiu can technically send emails, it's just told not to without my OK. That's a ~15 line prompt instruction, not a technical constraint. Would love to have it actually reply, but it would too expensive for a side project.
What Fiu does: Reads emails, summarizes them, told to never reveal secrets.env and a bit more. No fancy defenses, I wanted to test the baseline model resistance, not my prompt engineering skills.
Feel free to contact me here contact at hackmyclaw.com
Enveritas is a 501(c)(3) nonprofit working on sustainability issues facing smallholder coffee farmers. We collect field data in 25+ countries and build systems for analyzing risks in coffee supply chains (including EUDR-related deforestation checks).
* Backend Software Engineer (Python, PostgreSQL/PostGIS, Docker, AWS, Terraform) - $135-$155k — https://enveritas.org/jobs/backend-software-eng/#10d7adef8us (worldwide remote)