NitpickLawyer commented on How to build a coding agent   ghuntley.com/agent/... · Posted by u/ghuntley
faangguyindia · 2 hours ago
Anyone can build a coding agent that works on a) a fresh code base b) when you have an unlimited token budget.

Now build it for an old codebase; let's see how precisely it edits or removes features without breaking the whole codebase.

Let's see how many tokens it consumes per bug fix or feature addition.

NitpickLawyer · 26 minutes ago
There's "swe re-bench", a benchmark that tracks model release dates, and you can see how the model did for "real-world" bugs that got submitted on github after the model was released. (obviously works best for open models).

There are a few models that solve 30-50% of (new) tasks pulled from real-world repos. So ... yeah.

NitpickLawyer commented on How to build a coding agent   ghuntley.com/agent/... · Posted by u/ghuntley
cryptoz · an hour ago
Oh that's wild, I did suspect that but didn't know it outright. Mind-blowing that Google would release that kind of thing; I had wondered why it sucked so much haha. Okay, so what is a good representation of the current state of coding agents? Which one should I try that does a better job at code modifications?
NitpickLawyer · 29 minutes ago
Claude Code is the strongest atm, but Roo Code or Cline (VS Code extensions) can also work well. Roo with gpt5-mini (so cheap, pretty fast) does diff-based edits w/ good coordination over a task, and finishes most tasks that I tried. It even calls them "surgical diffs" :D
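To make "diff-based edits" concrete, here is a minimal sketch of the kind of search/replace edit tool a coding agent might expose. The function name, edit format, and example file are hypothetical, not Roo's actual implementation:

```python
from pathlib import Path

def apply_search_replace(file_path: str, search: str, replace: str) -> bool:
    """Apply one targeted edit: the model supplies the exact original snippet
    and its replacement; the edit is rejected unless the snippet matches once."""
    path = Path(file_path)
    text = path.read_text()
    if text.count(search) != 1:  # ambiguous or stale anchor: make the model re-anchor
        return False
    path.write_text(text.replace(search, replace, 1))
    return True

# Hypothetical usage: a "surgical" one-line change proposed by the model
# (assumes app/utils.py exists in the repo being edited).
ok = apply_search_replace(
    "app/utils.py",
    search="def retry(n=3):",
    replace="def retry(n=5):",
)
print("applied" if ok else "rejected; ask the model for a better anchor")
```

The point of the single-match check is that the agent never guesses: if the anchor is ambiguous or stale, the edit bounces back to the model instead of silently patching the wrong spot.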
NitpickLawyer commented on Building A16Z's Personal AI Workstation   a16z.com/building-a16zs-p... · Posted by u/ProofHouse
refulgentis · 14 hours ago
It's been such a mind-boggling decline in intellect, combined with really odd and intense conspiratorial behavior around crypto, that I looked into it a bit a few months ago.

My weak, uncited understanding from back then is that they're poorly positioned: in our set they're still the guys who write you a big check for software, but in the VC set they're a joke, i.e. they misunderstood carpet-bombing investment as something that scales and went all in on way too many crypto firms. Now they have embarrassed themselves with a ton of assets that need to get marked down; they're clearly behind the other bigs, but there's no forcing function to do the markdowns.

So we get primal screams about politics and LLM-generated articles about how a $9K video card is the perfect blend of price and performance.

There are other comments effusively praising them for their unique technical expertise. I maintain a llama.cpp client on every platform you can think of, and nothing in this article makes any sense. If you're training, you wouldn't do it on only four $9K GPUs that you own. If you're inferencing, you're not getting much more out of this than you would from a ~$2K Framework desktop.

NitpickLawyer · 13 hours ago
> If you're inferencing, you're not getting much more out of this than you would from a ~$2K Framework desktop.

I was with you up till here. Come on! CPU inferencing is not it; even Macs struggle with bigger models and longer contexts (especially visible when agentic stuff gets past 32k tokens).

The PRO6000 is the first GPU from their "workstation" series that actually makes sense to own.

NitpickLawyer commented on Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing   arxiv.org/abs/2508.12631... · Posted by u/omarsar
datadrivenangel · 2 days ago
Paper and repo do not mention routing latency, which I think is a concern.

Also the paper has some pie chart crimes on page 6.

NitpickLawyer · 2 days ago
Just from a brief look at the repo, they seem to be doing semantic embeddings w/ Qwen3-Embedding-8B, which should reach prompt-processing throughput in the high thousands of tokens/s on recent hardware. With a sufficiently large dataset collected after using it for a while, you could probably fine-tune a smaller model as well (4B and 0.6B variants are available from the same family).
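A rough sketch of what that kind of embedding-based routing could look like; it assumes the Qwen3 embedding checkpoints load via sentence-transformers (the 0.6B sibling is used here to keep it cheap), and the route buckets and example prompts are made up:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: the smaller sibling from the same family, loaded via sentence-transformers.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# Hypothetical route buckets, each anchored by a few example prompts.
routes = {
    "cheap-model": ["summarize this email", "translate one sentence to French"],
    "strong-model": ["refactor this multi-file codebase", "prove this inequality"],
}
route_vecs = {
    name: model.encode(examples, normalize_embeddings=True)
    for name, examples in routes.items()
}

def pick_route(prompt: str) -> str:
    """Route the prompt to the bucket whose anchor prompts it is closest to."""
    q = model.encode([prompt], normalize_embeddings=True)[0]
    scores = {name: float(np.max(vecs @ q)) for name, vecs in route_vecs.items()}
    return max(scores, key=scores.get)

print(pick_route("fix the race condition in our job scheduler"))
```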
NitpickLawyer commented on Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing   arxiv.org/abs/2508.12631... · Posted by u/omarsar
bachittle · 2 days ago
I’m fascinated by this new paradigm. We’ve more or less perfected Mixture-of-Experts inside a single model, where routing happens between subnetworks. What GPT-5 auto (and this paper) are doing is a step further: “LLM routing” across multiple distinct models. It’s still rough right now, but it feels inevitable that this will get much better over time.
NitpickLawyer · 2 days ago
> It’s still rough right now, but it feels inevitable that this will get much better over time.

Yeah, the signals they get will improve things over time. You can do a lot of heavy lifting with embedding models nowadays, get "satisfaction" signals from chats, and adjust your router based on those. It will be weird at first and some people will complain, but at the end of the day you don't need IMO-gold levels of thinking to write a fitness plan that the user most likely won't even follow :)

Signal gathering is likely the driver of most of the subsidised model offerings we see today.
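A toy sketch of that feedback loop: keep a running satisfaction score per (bucket, model) pair and prefer the cheapest model that still clears a quality floor. The bucket names, cost table, and threshold here are all made up:

```python
from collections import defaultdict

COST = {"cheap-model": 1.0, "strong-model": 8.0}   # made-up relative prices
QUALITY_FLOOR = 0.6                                # minimum acceptable satisfaction rate

# Weak prior of 0.5 so new (bucket, model) pairs aren't trusted or ruled out immediately.
scores = defaultdict(lambda: {"good": 1, "total": 2})

def record_feedback(bucket: str, model: str, satisfied: bool) -> None:
    """Update the running satisfaction rate from thumbs-up/down style signals."""
    s = scores[(bucket, model)]
    s["good"] += int(satisfied)
    s["total"] += 1

def choose_model(bucket: str) -> str:
    """Pick the cheapest model whose satisfaction rate clears the floor."""
    for model in sorted(COST, key=COST.get):
        s = scores[(bucket, model)]
        if s["good"] / s["total"] >= QUALITY_FLOOR:
            return model
    return max(COST, key=COST.get)  # nothing clears the floor: use the strongest model

record_feedback("fitness-plan", "cheap-model", satisfied=True)
print(choose_model("fitness-plan"))  # picks cheap-model once its rate clears the floor
```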

NitpickLawyer commented on What is going on right now?   catskull.net/what-the-hel... · Posted by u/todsacerdoti
rickreynoldssf · 2 days ago
I think a lot of the impressions of AI generating slop is a case of garbage in/garbage out. You need to learn HOW to ask for things. Just asking "write code to do X" is wrong in most cases. You have to provide some specifications and expectations just like working with a junior engineer. You also can't ask "write me a module that does X". You need to design the module yourself and maybe ask AI for help with each specific individual endpoint.

These juniors you're complaining about are going to get better at making these requests of AI and blow right past all the seniors who yell at clouds running AI.

NitpickLawyer · 2 days ago
> I think a lot of the impressions of AI generating slop is a case of garbage in/garbage out.

I've been coding for 25 years, and what I feel reading posts & comments like the ones in this thread is what I felt in the first few days of that black-blue/white-gold dress thing. I legitimately felt like half the people were trolling.

It's the same with LLM assisted coding. I can't possibly be getting such good results when all the rest are getting garbage, right? Impostor syndrome? Are they trolling?

But yeah, I agree fully with you. You need to actively try everything yourself, and this is what I recommend to my colleagues and friends. Try it out. See what works and what doesn't. Focus on what works, and put it in markdown files. Avoid what doesn't work today, but be ready because tomorrow it might work. Use flows. Use plan / act accordingly. Use the correct tools (context7 is a big one). Use search before planning. Search, write it to md files, add it to the repo. READ the plans carefully. Edit before you start a task. Edit, edit, edit. Use git worktrees, use tools that you'd be using anyway in your pipelines. Pay attention to the output. Don't argue; go back to step 1 and plan better. See what works for context and what doesn't. Add things, remove things. Have examples ready. Use examples properly. There's sooo much to learn here.

NitpickLawyer commented on Analysis of the GFW's Unconditional Port 443 Block on August 20, 2025   github.com/net4people/bbs... · Posted by u/Welteneroberer
NitpickLawyer · 2 days ago
> It appears that the SYN packet triggered three forged RST+ACK packets, each with a relative sequence number 0, as well as incremental TCP window sizes of 1980, 1981, and 1982.

Ah, there's a joke there, if only they used 10 packets :D

> I observed that existing cross-border Internet connections were not affected, while both new IPv4 and IPv6 connections were reset.

Interesting, that would suggest it wasn't an intentional kill-switch (as rumoured initially, testing the capability), but rather likely a misbehaving device / service.

There's also a comment about Pakistan having issues in the same window (though a larger outage for them), so it might be an update/config change affecting the same equipment family?
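For the curious, here's a rough scapy sketch of how one might watch for that RST+ACK fingerprint on the wire; the window values come from the quoted analysis, while the capture filter and the assumption that you can sniff with sufficient privileges are mine:

```python
from scapy.all import TCP, sniff

SUSPECT_WINDOWS = {1980, 1981, 1982}  # window sizes reported for the forged resets

def flag_forged_rst(pkt) -> None:
    """Print any RST+ACK whose TCP window matches the reported fingerprint."""
    if pkt.haslayer(TCP):
        tcp = pkt[TCP]
        if tcp.flags.R and tcp.flags.A and tcp.window in SUSPECT_WINDOWS:
            print(f"possible injected RST+ACK: win={tcp.window} seq={tcp.seq}")

# BPF filter keeps the capture to RST+ACK packets only; requires root/admin.
sniff(filter="tcp[tcpflags] & (tcp-rst|tcp-ack) == (tcp-rst|tcp-ack)",
      prn=flag_forged_rst, store=False)
```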

NitpickLawyer commented on In the long run, LLMs make us dumber   desunit.com/blog/in-the-l... · Posted by u/speckx
Refreeze5224 · 2 days ago
I imagine his memory, and the memories of people who memorized instead of writing, were better. So by that metric, writing is making people dumber. It's just not all that relevant today, and we don't prioritize memorization to the extent Plato and the ancient Greeks probably did.
NitpickLawyer · 2 days ago
> It's just not all that relevant today, and we don't prioritize memorization to the extent Plato and the ancient Greeks probably did.

Funny enough, that's kinda what we're seeing with LLMs. We're past the "regurgitate the training set" phase now, and we're more interested in mixing and matching stuff in the context window so we get to a desired goal (i.e. tool use, search, "thinking" and so on). How about that...

NitpickLawyer commented on The contrarian physics podcast subculture   timothynguyen.org/2025/08... · Posted by u/Emerson1
cauch · 2 days ago
Even if we are generous and accept that GU was more criticized than other bullshit papers, the claim still needs proof that the difference in treatment is due to some real bias and not a simple fluctuation.

"I saw 2 persons being judged by a judge, and turned out they were both guilty of the same crime, but the first one got less than the second one. The first one had the same letter in second position in their family name as the judge, so it's the proof that judges are biased favorably towards people who have the same second letter"

But then, the problem is that "their own bullshit papers" is doing a lot of heavy lifting here. The point of Hossenfelder is that String Theory is as bad as GU. But is it really the case? Hossenfelder keeps saying it's true, but a lot of people are not convinced by her arguments and provide convincing reasons for not being convinced. The same kinds of reasons don't apply to GU, which already shows that GU and String Theory are not on the same level. Even if String Theory has some flaw or is misguided in some aspect, does that mean the level of rejection in an unbiased world would obviously be the same as for any other bullshit theory?

Another unfair aspect is that a lot of "bullshit theories from within the sector" die without any publicity. They stop quickly because, from within the sector, it is harder for them to surface without being criticized early. For example, you could have 100 bullshit theories "within the sector" where 3 survive and surface without being criticized as much as GU, while the other 97 were criticized "as much" as GU early on, which stopped them from growing. Then you can just point at one of the 3 and say "look, there is a bullshit theory there, it's proof that scientists never confront bullshit theories when they come from within". Without being able to quantify properly how GU-like theories are treated when they are "within", it is just impossible to conclude that "when it is from within, it is less criticized".

NitpickLawyer · 2 days ago
I think I get your point. Unfortunately I'm in no way able to speak to string theory beyond what I know from pop culture, so it's way out of my league. I only commented on this thread because, after reading the blog and having watched the video, it felt like I got something else from the video. Perhaps being "in" you pick up other nuances. That makes sense.
NitpickLawyer commented on The contrarian physics podcast subculture   timothynguyen.org/2025/08... · Posted by u/Emerson1
NitpickLawyer · 2 days ago
I ... Uh... That's not what capitalism means. Sorry. We have plenty of capitalist countries w/ great healthcare and social programs. Capitalism means "free market" + rules. If you're unhappy about something, fix that; don't throw the baby out with the bathwater.
