codex + skills finetunes Qwen3-0.6B to +6 on humaneval and beats the base score on the first run.
I reran the experiment from this week, but used codex's new skills integration. Like claude code, codex consumes the full skill into context and doesn't start with failing runs. Its first run beats the base score, and on the second run it beats claude code.
https://xcancel.com/ben_burtenshaw/status/200023306951767675...
That said, it's not a perfect comparison because of the Codex model mismatch between runs.
The author seems to be doing a lot of work on skills evaluation.
The majority in this country is "didn't vote". Multitudes of reasons for this.
They forgot.
They don't care.
They missed the registration deadline.
They're homeless and have no address.
They can't get proper papers, even though they are US born.
They're in prison/jail.
The candidates suck, so they don't vote.
They can't afford to take time off work.
They've been gerrymandered, so their votes are significantly degraded.
To think that, due to election game rules and FPTP, a minority of the minority somehow reflects a majority? I wholly reject that.
[0] https://www.brookings.edu/wp-content/uploads/2017/01/vitalst...