floodfx commented on Many SWE-bench-Passing PRs would not be merged   metr.org/notes/2026-03-10... · Posted by u/mustaphah
bisonbear · 3 days ago
I've been working on building out "evals for your repo" based on the theory that commonly used benchmarks like SWE-bench are broken as they are not testing the right / valuable things, and are baked into the training data (see OpenAI's research on this here https://openai.com/index/why-we-no-longer-evaluate-swe-bench...)

Interestingly, I had a similar finding: on the 3 open-source repos I ran evals on, the models (5.1-codex-mini, 5.3-codex, 5.4) all had relatively similar test scores, but on other metrics, such as code quality or equivalence to the original PR the task was based on, they differed massively. Posted results here if anyone is curious: https://www.stet.sh/leaderboard

floodfx · 3 days ago
Working on that too. Lmk if you’re up for a chat?
floodfx commented on A67z   a67z.com/... · Posted by u/dvrp
floodfx · 4 months ago
I salute this artistry
floodfx commented on Can you save on LLM tokens using images instead of text?   pagewatch.ai/blog/post/ll... · Posted by u/lpellis
floodfx · 4 months ago
Why are completion tokens more with image prompts yet the text output was about the same?
floodfx commented on Go Primitive in Java, or Go in a Box   donraab.medium.com/go-pri... · Posted by u/ingve
nchmy · 4 months ago
Am I wrong to have expected this to be about using Golang in Java, in some way?
floodfx · 4 months ago
Had same expectation and read through half the article before realizing it was not Golang related.
floodfx commented on Tour de France confronts a new threat: Are cyclists using tiny motors?   washingtonpost.com/world/... · Posted by u/bookofjoe
LeifCarrotson · 7 months ago
The difference between the top 0.0000001% of humanity and second place is very, very small. Fractions of a watt. Adding just 10W would be game changing, and modern lipos and brushless motors add far, far more power than their weight penalty subtracts.
floodfx · 7 months ago
10W for a sustained time perhaps but these are looong climbs. Col de la Loze is 26.4km with an average gradient of 6.5%.
floodfx commented on Tour de France confronts a new threat: Are cyclists using tiny motors?   washingtonpost.com/world/... · Posted by u/bookofjoe
ahi · 7 months ago
I replied directly to OP, but it applies here as well. Cycling is far more specialized than other sports, so the payoff for doping is greater.
floodfx · 7 months ago
Why is the payoff greater in cycling than in other sports? Salary of the top riders? Compared to, say, NBA players, pro cyclists make relatively little. Tadej Pogacar (the best and top-paid cyclist) makes about €8M in salary per year. Steph Curry (the highest-paid NBA player) makes $55M in salary per year.
floodfx commented on Tour de France confronts a new threat: Are cyclists using tiny motors?   washingtonpost.com/world/... · Posted by u/bookofjoe
floodfx · 7 months ago
Bikers and their teams are known for removing as much weight as possible from their bikes. Would love to see the math for weight/power/time ratio for a motor like this. Would it be worth it considering you'd have to expend additional watts lugging it around all stage? My guess is probably not. Especially on a mountain stage which is where the tour is really won or lost.
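A rough way to frame the weight/power question, as a minimal sketch rather than a real model: assume all power on a steady climb goes into gaining height (ignoring air drag, rolling resistance, and battery limits), so climb time is proportional to mass divided by power. The climb figures are the Col de la Loze numbers quoted elsewhere in this thread; the rider power, system mass, motor output, and motor weight below are hypothetical placeholders, not measurements.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def climb_time_s(power_w: float, mass_kg: float, distance_m: float, grade: float) -> float:
    """Time to ride a steady climb when all power goes into gaining height.

    Deliberately ignores air drag, rolling resistance, and drivetrain losses.
    """
    slope = math.sin(math.atan(grade))  # grade is rise/run, e.g. 0.065 for 6.5%
    return mass_kg * G * slope * distance_m / power_w


def break_even_watts(power_w: float, mass_kg: float, added_mass_kg: float) -> float:
    """Extra watts needed just to cancel the added mass on the climb."""
    return power_w * added_mass_kg / mass_kg


if __name__ == "__main__":
    # Climb figures from the thread: 26.4 km at 6.5% average gradient.
    distance_m, grade = 26_400.0, 0.065
    # Hypothetical values, not measurements:
    rider_power_w, system_mass_kg = 400.0, 75.0
    motor_power_w, motor_mass_kg = 10.0, 1.0

    base = climb_time_s(rider_power_w, system_mass_kg, distance_m, grade)
    assisted = climb_time_s(rider_power_w + motor_power_w,
                            system_mass_kg + motor_mass_kg, distance_m, grade)
    print(f"unassisted: {base / 60:.1f} min, assisted: {assisted / 60:.1f} min")
    print(f"break-even: {break_even_watts(rider_power_w, system_mass_kg, motor_mass_kg):.1f} W")
```

Under these assumptions a motor only helps on the climb if its output exceeds the break-even figure for its weight. Whether a battery could sustain that output for a whole stage is a separate question this sketch does not address.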
floodfx commented on What if we made advertising illegal?   simone.org/advertising/... · Posted by u/smnrg
floodfx · a year ago
I think a better thing to do would be to outlaw algorithmic feeds where monetization is via advertising. If subscription-based, that is fine. The incentive for subscription-based monetization is to keep you long enough to continue subscribing. For ads, it is to keep you on as long as possible, which trends toward divisive, fear- or anger-inducing content.
floodfx commented on Show HN: Learn where countries are on the world map with Spaced Repetition   map.koljapluemer.com... · Posted by u/blackbrokkoli
floodfx · a year ago
Love this! Played Worldle (https://worldle.teuteuf.fr/) for a while and this would have been helpful.

Great application of spaced repetition beyond cards.

floodfx commented on The housing theory of everything (2021)   worksinprogress.co/issue/... · Posted by u/lifeisstillgood
tptacek · a year ago
I think YIMBYs really like to cast NIMBYs as their evil adversaries, but the problem is systemic. Any policy change, be it "what can be built on this lot", or "what social services do we fund", or, in particular for my muni, "how do we deal with leaf collection in Autumn" will generate three cohorts of people:

(i) People who don't like the change

(ii) People who don't care about the change (most people)

(iii) People who do like the change

People who don't like the change (i), regardless of the amplitude of their dislike, will turn out and give public comment and put up yard signs.

People who like the change (iii) will turn out and give public comment only if they are weirdos like me, with off-the-charts amplitude for their feelings.

The net result is that the only public opinion that is legible to staff and electeds is opposite. Again: regardless of what the change is.

floodfx · a year ago
Insightful!

Makes me think a bit about how negative content engages more people. Is it the same with people who don't like a change? Does disliking a change activate people more than liking it does?

u/floodfx

Karma: 264 · Cake day: December 17, 2008
About
https://twitter.com/floodfx

donnie at floodx dot com
