If anyone is interested, I built parse.dev for myself and open-sourced it.
One at a time, I can use GitHub Copilot in VS Code to rewrite each function the way I want it.
What I want to do is iterate through all the functions in the files and rewrite every one of them automatically.
I know we are getting there, but does anybody know how that can be done right now?
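For what it's worth, here is a minimal sketch of the iteration part, assuming the files are Python (the language isn't stated above). It uses the standard `ast` module to find each function and hands its source text to a callback. The `rewrite` callback is hypothetical: it stands in for whatever model call you end up using and is not a Copilot API. Nested functions are passed along as part of their enclosing function rather than rewritten separately.

```python
import ast
from typing import Callable, List

FunctionNode = (ast.FunctionDef, ast.AsyncFunctionDef)


def outermost_functions(tree: ast.AST) -> List[ast.AST]:
    """Collect functions and methods that are not nested inside another function."""
    found: List[ast.AST] = []

    def visit(node: ast.AST) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, FunctionNode):
                # Do not descend: nested defs are rewritten with their parent.
                found.append(child)
            else:
                visit(child)

    visit(tree)
    return found


def rewrite_source(source: str, rewrite: Callable[[str, str], str]) -> str:
    """Replace every outermost function with rewrite(name, original_text).

    `rewrite` is a hypothetical callback (for example, a wrapper around an
    LLM call); it receives the function's name and its full source lines,
    indentation included, and returns the replacement text.
    """
    lines = source.splitlines()
    functions = outermost_functions(ast.parse(source))
    # Work bottom-up so earlier line numbers stay valid after each splice.
    for node in sorted(functions, key=lambda n: n.lineno, reverse=True):
        start, end = node.lineno - 1, node.end_lineno  # end_lineno needs Python 3.8+
        original = "\n".join(lines[start:end])
        lines[start:end] = rewrite(node.name, original).splitlines()
    return "\n".join(lines) + "\n"
```

To cover a whole project, you could loop over `pathlib.Path("src").rglob("*.py")` (the `src` directory is just a placeholder), read each file, and write back the result of `rewrite_source`. Decorators sit above the `def` line and are left untouched by this sketch.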
Notable instances of this strategy include Slack with Mattermost, Tableau with Metabase, and Calendly with Cal.com.
Excellent work, team. I'm optimistic about the success of this approach.
For example, a standard MedQA question describes a 6-year-old African American boy with sickle cell disease. Normally, the straightforward details (e.g., jaundice, bone pain, lab results) lead to “Sickle cell disease” as the correct diagnosis. However, under MedFuzz, an “attacker” LLM repeatedly modifies the question—adding information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies—none of which should change the actual diagnosis. These additional, misleading hints can trick the “target” LLM into choosing the wrong answer. The paper highlights how real-world complexities and stereotypes can significantly reduce an LLM’s performance, even if it initially scores well on a standard benchmark.
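The attacker/target loop described above can be sketched roughly as follows. This is an illustration of the general idea, not the paper's implementation: `fuzz_question`, `attacker`, `target`, and `max_turns` are all names assumed here, and the two callables stand in for LLM calls that you would supply yourself.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question text, target's answer) pairs


def fuzz_question(
    question: str,
    correct_answer: str,
    attacker: Callable[[str, History], str],
    target: Callable[[str], str],
    max_turns: int = 5,
) -> Tuple[bool, History]:
    """Repeatedly perturb a question until the target answers wrongly.

    `attacker` (hypothetical) takes the original question plus the history of
    attempts and returns a modified question whose added details should NOT
    change the correct answer. `target` (hypothetical) returns an answer string.
    Returns (robust, history); robust is True if the target never got it wrong.
    """
    # Baseline: ask the unmodified benchmark question first.
    history: History = [(question, target(question))]
    for _ in range(max_turns):
        if history[-1][1] != correct_answer:
            break  # the target has been tricked; stop attacking
        modified = attacker(question, history)
        history.append((modified, target(modified)))
    robust = history[-1][1] == correct_answer
    return robust, history
```

Running this over every item in a benchmark and comparing the fraction of robust answers with the original accuracy gives a rough sense of how much performance drops under such distractors.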
Disclaimer: I work in Medical AI and co-founded the AI Health Institute (https://aihealthinstitute.org/).