pcwelder (u/pcwelder)

Readit News

pcwelder commented on How to build a coding agent ghuntley.com/agent/... · Posted by u/ghuntley

faangguyindia · 2 days ago

Anyone can build a coding agent that works a) on a fresh codebase and b) with an unlimited token budget.

Now build it for an old codebase. Let's see how precisely it edits or removes features without breaking the whole codebase.

Let's see how many tokens it consumes per bug fix or feature addition.

pcwelder · 2 days ago

Agree. To reduce costs:

1. Precompute frequently used knowledge and surface it early. For example: repository structure, OS information, system time.

2. Anticipate the next tool calls. If a match is not found while editing, return the closest matching snippet instead of simply failing. If the read-file tool gets a directory, return the directory contents.

3. Parallel tool calls. Claude needs either a batch tool or special scaffolding to promote parallel tool calls. A single tool call per turn is very expensive.

Are there any other such general ideas?
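A rough sketch of idea 2, in case it helps make it concrete. This is only an illustration, not how any particular agent implements it: the function names `closest_snippet`, `edit_file`, and `read_path` are made up here, and the similarity measure (Python's `difflib`) is just one reasonable choice.

```python
import difflib
import os


def closest_snippet(text: str, target: str) -> str:
    """Return the window of lines in `text` most similar to `target`."""
    lines = text.splitlines()
    size = max(1, len(target.splitlines()))
    best_ratio, best_window = -1.0, ""
    for start in range(max(1, len(lines) - size + 1)):
        window = "\n".join(lines[start:start + size])
        ratio = difflib.SequenceMatcher(None, window, target).ratio()
        if ratio > best_ratio:
            best_ratio, best_window = ratio, window
    return best_window


def edit_file(text: str, old: str, new: str) -> tuple[str, str]:
    """Replace `old` with `new`; on a miss, report the nearest snippet."""
    if old in text:
        return text.replace(old, new, 1), "ok"
    # Hypothetical fallback: give the model something to retry with.
    hint = closest_snippet(text, old)
    return text, f"No exact match. Closest snippet:\n{hint}"


def read_path(path: str) -> str:
    """Read a file, or list the entries if `path` is a directory."""
    if os.path.isdir(path):
        entries = sorted(os.listdir(path))
        return f"{path} is a directory. Contents:\n" + "\n".join(entries)
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

In both fallbacks the tool result already contains what the model would most likely have asked for next, which saves a round trip.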

pcwelder commented on Let's properly analyze an AI article for once nibblestew.blogspot.com/2... · Posted by u/pabs3

pcwelder · 17 days ago

>I found, a required sample size for just one thousand people would be 278

It's interesting to note that for a billion people this number changes to a whopping ... 385. Doesn't change much.

I was curious: with a sample size of 22 (assuming an unbiased sample, yada yada), the margin of error when estimating the proportion of people satisfying a criterion is 22%.

While bad, if done properly, it may still be insightful.
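For anyone who wants to check the numbers above, here is a small sketch. It assumes the usual textbook setup, which the comment does not spell out: 95% confidence (z = 1.96), a 5% margin of error for the sample-size figures, worst-case proportion p = 0.5, and a finite population correction. The function names are made up for illustration. Under these assumptions the margin for 22 samples comes out at about 21%, close to the figure quoted above.

```python
import math

Z_95 = 1.96  # z-score for 95% confidence (assumed)


def required_sample_size(population: int, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's formula with a finite population correction."""
    n0 = Z_95**2 * p * (1 - p) / margin**2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def margin_of_error(sample_size: int, p: float = 0.5) -> float:
    """Normal-approximation margin of error for a proportion."""
    return Z_95 * math.sqrt(p * (1 - p) / sample_size)


print(required_sample_size(1_000))          # 278
print(required_sample_size(1_000_000_000))  # 385
print(round(margin_of_error(22), 3))        # 0.209
```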

pcwelder commented on Getting good results from Claude Code dzombak.com/blog/2025/08/... · Posted by u/ingve

libraryofbabel · 18 days ago

Yeah, agree that the benchmarks don't really seem to reflect the community consensus. I wonder if part of it is the better symbiosis between the agent (Claude Code) and the Opus and Sonnet models it uses, which supposedly are fine-tuned on Claude Code tool calls? But agree, there is probably some additional secret sauce in the training, perhaps to do with RL on multi-step problems...

pcwelder · 17 days ago

I get similar accuracy to Claude Code using the Claude desktop app with a file+bash MCP (different tools, same performance).

My guess for why GPT-5 scores higher on benchmarks is that they evaluate on well-defined tasks with all instructions given at the start.

Real life is multi-turn, with multiple sets of prompts to adhere to. This is where Claude is likely better.

pcwelder commented on Why build a domain-specific agent for front end tasks? kombai.com/why... · Posted by u/pcwelder

alganet · a month ago

Why not simply call it "specialist"? Are you trying to make some close connection to "Domain Specific Languages" somehow?

pcwelder · a month ago

To be absolutely honest, this wasn't a very conscious choice :-)

A direct similarity with domain-specific languages isn't evident to me. Rather, I find the messaging similar to some "agents" from other domains, e.g. https://www.harvey.ai/

Posted by u/pcwelder a month ago

Why build a domain-specific agent for front end tasks? kombai.com/why...

pcwelder commented on LLMs are bad at returning code in JSON aider.chat/2024/08/14/cod... · Posted by u/pcwelder

pcwelder · a month ago

PSA: don't generate code using tools (and MCPs) if you're using Gemini or OpenAI; both ask LLMs to generate JSON directly for function calling. Claude uses XML, so it escapes the issue.
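A small illustration of why JSON is awkward for code, as a sketch only. The `content` field and tag names are hypothetical, and the tagged form assumes a format that passes the text through verbatim (strict XML would still need `<` and `&` escaped).

```python
import json

code = 'def greet(name):\n    print(f"Hello, {name}!")\n'

# As a JSON tool-call argument, the model must emit every newline and
# double quote in escaped form, all on one line.
as_json = json.dumps({"content": code})

# In a tag-delimited format, the code appears as written between the tags.
as_tagged = f"<content>\n{code}</content>"

print(as_json)
print(as_tagged)
```

The first print shows the code squeezed into a single escaped string; the second shows it unchanged.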

Posted by u/pcwelder a month ago

LLMs are bad at returning code in JSON aider.chat/2024/08/14/cod...

pcwelder commented on Rethinking CLI interfaces for AI notcheckmark.com/2025/07/... · Posted by u/Bogdanp

pcwelder · a month ago

Losing the sense of cwd is why I append it to the output of each command run in the wcgw MCP [1].

The model rarely gets the directory wrong after that.
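Roughly, the idea looks like the sketch below. This is not the actual wcgw code; `run_with_cwd` and the `[cwd: ...]` trailer format are made up for illustration.

```python
import os
import subprocess


def run_with_cwd(command: str, cwd: str) -> str:
    """Run a shell command and append the working directory to its output."""
    cwd = os.path.abspath(cwd)
    result = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True
    )
    output = result.stdout + result.stderr
    # The trailer reminds the model where the command actually ran.
    return f"{output.rstrip()}\n[cwd: {cwd}]"


print(run_with_cwd("echo hello", "."))
```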

I won't be surprised if Claude Code does the same soon.

However, they do have an env flag: CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1

This should also fix the wrong-directory behavior.

[1] https://github.com/rusiaaman/wcgw

pcwelder commented on Grok: Searching X for "From:Elonmusk (Israel or Palestine or Hamas or Gaza)" simonwillison.net/2025/Ju... · Posted by u/simonw

eightysixfour · 2 months ago

He is saying he gave them a prompt to tell them they are built by xAI.

pcwelder · a month ago

Yes, thanks for clarifying. I specified in the system prompt that they're built by xAI and other system instructions from Grok 4.

u/pcwelder

Karma: 330 · Cake day: November 27, 2019