def make_pass@1_agent(agent, n):
def retry_agent(problem):
for attempt in range(n):
result = agent(problem)
if result.success:
return result
return result
return retry_agent
Building multiple attempts into your agent is stretching the rules, even if technically it’s acceptable
71.2% puts it at 5th, which is 4 points below the leader (four points is a lot) and just over 1% lower than Anthropic’s own submission for Claude Sonnet 4 - the same model these guys are running.
But the top rated submissions aren’t running production products. They generally have extensive scaffolding or harnesses that were built *specifically for SWE bench*, which kind of defeats the whole purpose of the benchmark.
Take for example Refact which is at #2 with 74.4%, they built a 2k lines of code framework around their agent specifically for SWE bench (https://github.com/smallcloudai/refact-bench/). It’s pretty elaborate, orchestrating multiple agents, with a debug agent that kicks in if the main agent fails. The debug agent analyzes the failure and gives insights to the main agent which tries again, so it’s effectively multiple attempts per problem.
If the results can be reproduced “out-of-the-box” with their coding agent like they claim, it puts it up there as one of the top 2-3 CLI agents available right now.
A good wrapper has deep domain knowledge baked into it, combined with automation and expert use of the LLM.
It maybe isn’t super innovative but it’s a bit of an art form and unlocks the utility of the underlying LLM
It’s a radical statement that effectively denies the rights of millions of people to exist and is especially problematic given the historical context of the establishment of Israel.
The statement gets thrown around so much in certain circles that it’s gotten normalized. You’ve apparently lost sight of or never stopped to think what actually means, to the point where you’re providing it as an example of an innocent statement that got you banned for no reason. Taking this statement out of radical activist circles and into the real world won’t go well.
Take some time to educate yourself and reflect on what it actually means.
def make_pass@1_agent(agent, n):
If they are running their production product as is, then of course whatever is built into the product is fine.