As others have noted, the prompt/eval is also garbage. It’s measuring a non-representative sub-task with a weird prompt that isn’t how you’d use agents in, say, Claude Code. (See the METR evals if you want a solid eval giving evidence that they are getting better at longer-horizon dev tasks.)
This is a recurring fallacy with AI that needs a name: “AI is dumber than humans on some sub-task, therefore it must be dumb.” The correct way of using these tools is to understand the contours of their jagged intelligence and carefully buttress the weak spots, to enable the super-human areas to shine.
Make of that what you will…
Which do you think is easier, insurgency or leaving Twitter?
I didn't just protest against this.
I voted, ran mutual aid networks, donated, and canvassed for progressive candidates. And when that failed I canvassed for moderates that I didn't want. I went to churches and asked why they were pushing this garbage. My aunt and uncle were told they weren't welcome in their local church unless they voted for Trump and disowned all family members who didn't. The church has a Trump sign right on the front lawn. When I went to ask about that being illegal they looked at me like I had 3 heads. My aunt and uncle had to move cities.
My friend at the No Kings protest had police fire a beanbag into his eye socket. He was in his second-story apartment watching the protest from above; I was right next to him. We weren't even on the street and they fired right into his eye. He was forced to take a plea deal even though the cops had body cameras. The footage was never handed over, despite a FOIA request. He got no money. He already didn't have health insurance.
It comes off as incredibly hollow when I hear Europeans say "nobody tried to stop this" or "It's amazing how little resistance they've faced". We had millions of people protesting without a peep in the news.
I really don't want to hear talk about how we should use the 2nd Amendment from people who can't even quit Twitter.
Take care