Roritharr (u/Roritharr)

Roritharr commented on Qodo CLI agent scores 71.2% on SWE-bench Verified qodo.ai/blog/qodo-command... · Posted by u/bobismyuncle

bluelightning2k · 22 days ago

Just looked them up. Their pricing is built around buying "coins", with no transparency as to what those get you. Hard pass.

Roritharr · 22 days ago

You realize that you can self-host their stuff? https://github.com/smallcloudai/refact

Roritharr commented on Qodo CLI agent scores 71.2% on SWE-bench Verified qodo.ai/blog/qodo-command... · Posted by u/bobismyuncle

gronky_ · 22 days ago

I’ve been running a bunch of coding agents on benchmarks recently as part of consulting, and this is actually much more impressive than it seems at first glance.

71.2% puts it at 5th, which is 4 points below the leader (four points is a lot) and just over 1 point lower than Anthropic’s own submission for Claude Sonnet 4, the same model these guys are running.

But the top-rated submissions aren’t running production products. They generally have extensive scaffolding or harnesses that were built *specifically for SWE-bench*, which kind of defeats the whole purpose of the benchmark.

Take Refact, for example, which is at #2 with 74.4%: they built a 2k-line framework around their agent specifically for SWE-bench (https://github.com/smallcloudai/refact-bench/). It’s pretty elaborate, orchestrating multiple agents, with a debug agent that kicks in if the main agent fails. The debug agent analyzes the failure and gives insights to the main agent, which tries again, so it’s effectively multiple attempts per problem.
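To make the "effectively multiple attempts" point concrete, here is a minimal sketch of a main-agent/debug-agent retry loop. This is not Refact's actual code: `solve_with_debug_retry`, the agent callables, `run_tests`, and `max_attempts` are all hypothetical stand-ins, assuming each agent is just a function and that a test run reports pass/fail plus a log.

```python
from typing import Callable, Optional


def solve_with_debug_retry(
    problem: str,
    main_agent: Callable[[str, Optional[str]], str],   # hypothetical: (problem, hint) -> patch
    debug_agent: Callable[[str, str, str], str],       # hypothetical: (problem, patch, log) -> hint
    run_tests: Callable[[str], tuple[bool, str]],      # hypothetical: patch -> (passed, log)
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Return (patch, attempts_used); patch is None if every attempt failed."""
    hint: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        patch = main_agent(problem, hint)       # main agent proposes a patch
        passed, log = run_tests(patch)          # evaluate the patch
        if passed:
            return patch, attempt
        # Debug agent analyzes the failure and hands insights back to the main agent.
        hint = debug_agent(problem, patch, log)
    return None, max_attempts


# Toy stand-ins (hypothetical): the first attempt fails, the hinted retry passes.
def toy_main(problem: str, hint: Optional[str]) -> str:
    return "patch-v2" if hint else "patch-v1"


def toy_debug(problem: str, patch: str, log: str) -> str:
    return f"{patch} failed: {log}"


def toy_tests(patch: str) -> tuple[bool, str]:
    ok = patch == "patch-v2"
    return ok, "" if ok else "AssertionError in test_x"


result = solve_with_debug_retry("issue-123", toy_main, toy_debug, toy_tests)
print(result)  # ('patch-v2', 2)
```

Scored per problem, a harness like this only reports the final outcome, so a single-attempt agent and a loop like the one above look the same on the leaderboard even though the loop got several tries.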

If the results can be reproduced “out-of-the-box” with their coding agent like they claim, it puts it up there as one of the top 2-3 CLI agents available right now.

Roritharr · 22 days ago

Finally someone mentions Refact! I was in contact with the team and am really rooting for them.

Roritharr commented on Claude Code SDK docs.anthropic.com/en/doc... · Posted by u/sync

jdmoreira · 3 months ago

Not sure about that golden end state. Mine would be being in a room surrounded by screens with AI agents coding, designing, testing, etc. I would be there in the center giving guidance, direction, applying taste, etc… All conversational; I wouldn’t need to touch the keyboard 99% of the time.

That's what I want and look forward to one day.

Roritharr · 3 months ago

Is this a me thing, or a millennial thing?

I hate using voice for anything. I hate getting voice messages, I hate creating them. I get cold sweats just thinking about having to direct 10 AI Agents via voice. Just give me a keyboard and a bunch of screens, thanks.

Roritharr commented on The Gang Has a Mid-Life Crisis chris-martin.org/2025/the... · Posted by u/dralley

K0balt · 4 months ago

This is a sort of perplexing subject for me. I grew up pretty poor. We had a well, but not running water. We flushed with a bucket, bathed out of a trash can-cum-water barrel. We subsistence hunted. We had vehicles that mostly ran, most of the time.

Yet I can see that I was, in fact, born into privilege.

Not a privilege of money, but a privilege of priority, skills, and acceptance of risk.

My parents prioritized one single thing above all others. Land. They bought land. Remote land, useless land, land wherever it was cheap.

They could have fixed the car, but instead bought an acre of land. We would go 100 miles from the nearest town to eke out a parcel of land in some Godforsaken place I haven’t been to since.

Because of that, and the skills I learned because I had to do everything myself, I have never had to pay rent. Because I knew how to live without luxury, I built a cabin on my parents’ land when I was 16, with salvaged lumber and fixtures and wire and things I got from demolishing houses. I raised three children in various iterations of that house, which eventually grew to 600 square feet.

By that time I was successful in infotech, so we bought and rebuilt (ourselves) a 63 foot steel schooner and finished raising our children at many ports in the world, so that they would grow up with the same privilege of mind, but with broader horizons.

But I never forgot land. Land, not a house, land. Land is the key. Just a couple hundred square meters is fine.

You can still do exactly what I did today. You can buy land cheaply in many places in the world, including the USA. I just bought a half acre in Montana for $1200, with road access. (I sometimes buy cheap land sight unseen halfway across the world when drunk and bored at 3am; the results are kinda hit and miss, but it makes for a good excuse to travel and see what happens.) On eBay there are many owner-financed deals with nominal or zero down, and payments from 50 to a few hundred dollars a month.

You can still tear down old structures for people and get building supplies. You can get furniture and appliances curbside or on Craigslist, etc. I don’t need to, but I sometimes still do.

Every opportunity I took advantage of is still practical today. You can still buy land on fast-food wages; you just won’t be able to live near a big city while you do it. That was impossible in my youth, too. The sacrifices were substantial, the discomfort at times severe.

Nothing has changed except the expectations that people have about life and what they can or cannot do.

I was born into privilege for sure, but it was a privilege of a culture of independence and a deep understanding of the value of owning outright a place to stand.

Except for those born into poverty in a truly hopeless place in the world, we suffer mostly from our attitudes, our lack of knowledge, and our lack of belief in our ability to do reasonable things that other people don’t believe we can do, because they are not willing to.

Roritharr · 4 months ago

This is gut-wrenching to read from Germany.

I grew up with a mentality of "you can't do that, there's a rule against that" and had to slowly break out of it as much as I could. Just knowing that there are people like you out there makes me happy. I applaud your freedom.

Roritharr commented on Meta AI App built with Llama 4 about.fb.com/news/2025/04... · Posted by u/friggeri

Roritharr · 4 months ago

Interesting that they tie the glasses so closely into this new app, while all of the glasses' AI functionality is still disabled in the EU.

Roritharr commented on Kagi for Kids help.kagi.com/kagi/plans/... · Posted by u/ryanjamurphy

Roritharr · 5 months ago

What Kagi or anyone could work on is an actually working version of YouTube Kids.

I literally blocked all of YouTube with Pi-hole after my son started reading the Bible, because a Minecraft influencer had started preaching throughout most of his videos, to the point that my son became a bit too interested in the topic.

Not that I'm a rabid atheist or would deny my child such a thing, but if THAT can enter my 8-year-old's brain during his short allowed time browsing by himself, I'm worried about what else is coming his way through it.

I'd love to give him access to valuable videos within rules I describe in natural language and can test myself, but nothing like this exists.

Roritharr commented on I use Cursor daily - here's how I avoid the garbage parts nickcraux.com/blog/cursor... · Posted by u/striat

laborcontract · 6 months ago

If you’re going to pay on the margin, why not use those incremental dollars running the same requests on Cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider APIs with Cline always does a much better job for me.

Roritharr · 6 months ago

Good callout, will try! I haven't considered switching tools; it's mostly the convenience of just continuing, instead of stopping midway through and switching out the tools. But I also only code intermittently, a couple of days a week at most these days, because it's only part of what I do, so I get to experiment with new tooling much less than I'd like.

Roritharr commented on I use Cursor daily - here's how I avoid the garbage parts nickcraux.com/blog/cursor... · Posted by u/striat

laborcontract · 6 months ago

Cursor's current business model produces a fundamental conflict between the well-being of the user and the financial well-being of the company. We're starting to see these cracks form as LLM providers are relying on scaling through inference-time compute.

Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

If you prune out context from the initial prompt, instead of reasoning on richer context, the LLM reasons only on the prompt itself (with no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, thus explaining Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.

On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.

Cursor is being hailed as one of the greatest SaaS growth stories, but their $20/mo all-you-can-eat business model puts them in such a bad place.

Roritharr · 6 months ago

This is only surface-level. Cursor already has quotas for their paid plans and usage-based pricing for their larger models; I run through the quota and fall over to usage-based pricing every month.

IMO most of their incentive for context pruning comes not just from reducing the token count, but from the belief that you only have to find "the right way"™ to build that context window automatically to reach a coding panacea. They just aren't there yet.

Roritharr commented on Why is it so hard to buy things that work well? (2022) danluu.com/nothing-works/... · Posted by u/janandonly

tmountain · 9 months ago

It makes me really appreciate tools that DO work. Things like: the Linux kernel, Vim, PostgreSQL, the Golang compiler, etc. Interestingly, the aforementioned tools come from different ecosystems and levels of financial backing, but all of them have been reliable tools for me for many years, all are complex, and yes... they all have bugs, but of acceptable severity and manageability.

Roritharr · 9 months ago

For me the most interesting case is HeidiSQL. I find it easily the most useful SQL GUI client. It crashes pretty frequently, but not frequently enough for me to stop using it in favor of the alternatives.

I often wondered how to strike the right balance on these things, since apparently all options can lead to success.

Roritharr commented on Show HN: LlamaPReview – AI GitHub PR reviewer that learns your codebase github.com/marketplace/ll... · Posted by u/Jet_Xu

Roritharr · 10 months ago

Where's the AI running? Where are you sending the code? Are you keeping some of it?

I hate to be the compliance guy, but even from a startup perspective you'd at least want to mention what you promise to do here.