samdjstephens (u/samdjstephens)

samdjstephens commented on Many SWE-bench-Passing PRs would not be merged metr.org/notes/2026-03-10... · Posted by u/mustaphah

tobr · 18 hours ago

I wonder why they fail this specific way. If you just let them do stuff, everything quickly turns into spaghetti. They seem to overlook obvious opportunities to simplify things, or to see a pattern and follow through. The default seems to be to add more, rather than rework or adjust what’s already in place.

samdjstephens · 16 hours ago

I suspect it has something to do with a) the average quality of code in open source repos and b) the way the reward signal is applied in RL post-training: does the model face any consequences for a brittle implementation of a task?

I wonder if these RL runs can extend over multiple sequential evaluations, where poor design in an early task hampers performance later on, as measured by the number of tokens required to add new functionality without breaking existing functionality.

samdjstephens commented on Semantic ablation: Why AI writing is generic and boring theregister.com/2026/02/1... · Posted by u/benji8000

doomslayer999 · 23 days ago

Literally the worst thing that happened to the internet after addictive doomscroll feeds and ads everywhere.

And the worst part is that no one will ever make a new internet because of the founder effect. We are basically in the worst timeline.

samdjstephens · 23 days ago

Maybe. Another potential, more positive, timeline is that semantically ablated content filling everyone’s feeds turns people off, and slowly kills the social feed paradigm.

samdjstephens commented on Software engineers should be a little bit cynical seangoedecke.com/a-little... · Posted by u/zdw

AttentionBlock · 2 months ago

> It’s a cynical way to view the C-staff of a company. I think it’s also inaccurate: from my limited experience, the people who run large tech companies really do want to deliver good software to users.

I strongly disagree with this statement. What C-staff care about is shareholder value. What middle management care about is empire building and promotions.

> for instance, to make it possible for GitHub’s 150M users to use LaTeX in markdown - you need to coordinate with many other people at the company, which means you need to be involved in politics.

You presented your point in a misleading way. I would classify this as collaboration/communication rather than politics.

Politics is when you need to tick off useless boxes for your promo, when you try to take credit for work you haven't helped with, when you throw your colleague under the bus, when you get an undeserved performance rating because the manager thinks you are his good boy. There's a lot more. I didn't read any of your previous blogs, but all of these things are what engineers dread when we refer to politics.

samdjstephens · 2 months ago

Politics is accruing and deploying political capital within an organisation - or less abstractly, building relationships and using them.

What you’re describing is a particular form of manipulative and divisive politics which is performed by insecure, desperate or selfish people.

Many engineers are not good at building relationships (the job of coding isn’t optimal for it, after all), so painting the people who are good at it as narcissistic may be comforting, but it isn’t correct.

samdjstephens commented on The port I couldn't ship ammil.industries/the-port... · Posted by u/cjlm

gortok · 3 months ago

While there's not a lot of meat on the bone for this post, one section of it reflects the overall problem with the idea of Claude-as-everything:

> I spent weeks casually trying to replicate what took years to build. My inability to assess the complexity of the source material was matched by the inability of the models to understand what it was generating.

When the trough of disillusionment hits, I anticipate this will become collective wisdom, and we'll tailor LLMs to the subset of uses where they can be more helpful than hurtful. Until then, we'll try to use AI to replace in weeks what took us years to build.

samdjstephens · 3 months ago

If LLMs stopped improving today, I’m sure you would be correct; as it is, I think it’s very hard to predict what the future holds and where the advancements will take us.

I don’t see a particularly good reason why LLMs wouldn’t be able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well.

samdjstephens commented on Tiny electric motor can produce more than 1,000 horsepower supercarblondie.com/elect... · Posted by u/chris_overseas

fainpul · 4 months ago

> 59kW/kg

At this point why don't we get rid of the k prefix and write 59W/g?

Edit:

I was half joking, but various answers mention kW being standard for motors, kg being the SI unit for mass, etc. All true, but as used here in a combined unit, which means "power density", it would still make sense IMO. It's not like the "59" tells you that it's a strong motor and hence you want kW to compare it to other motors. You can't; it's just a ratio (power to weight). W/g just reads much nicer in my head. Or we could come up with a name, like for other units. Let's call it "fainpul" (short fp), for example :)

59 fp is a new record for electric motors!

samdjstephens · 4 months ago

kg is the SI base unit for mass; I think that would be why.
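For reference, both notations give the same number, because the two factors of 10^3 cancel; the choice between them is purely a matter of convention:

```latex
59\,\frac{\mathrm{kW}}{\mathrm{kg}}
  = 59 \times \frac{10^{3}\,\mathrm{W}}{10^{3}\,\mathrm{g}}
  = 59\,\frac{\mathrm{W}}{\mathrm{g}}
```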

samdjstephens commented on Claude Skills are awesome, maybe a bigger deal than MCP simonwillison.net/2025/Oc... · Posted by u/weinzierl

samdjstephens · 5 months ago

It seems to me that MCP and Skills are solving two different problems and provide solutions that complement each other quite nicely.

MCP is about integration of external systems and services. Skills are about context management - providing context on demand.

As Simon mentions, one issue with MCP is token use. Skills seem like a straightforward way to manage that problem: just put the MCP tool list inside a skill, where it uses no tokens until required.

samdjstephens commented on OpenAI reaches agreement to buy Windsurf for $3B bloomberg.com/news/articl... · Posted by u/swyx

lolinder · 10 months ago

> Looking for a moat in the technology is always a bit of a trap - it’s in the traction, the brand awareness, the user data etc.

Traction, brand awareness, and user data do not favor Windsurf over GitHub Copilot. The few of us who follow all the new developments are aware that Windsurf has been roughly leading the pack in terms of capabilities, but do not underestimate the power of being bundled into both VS Code and GitHub by default. Everyone else is an upstart by comparison and needs some form of edge to make up for it, and without a moat it will be very hard for them to maintain their edge long enough to beat GitHub's dominance.

samdjstephens · 10 months ago

Definitely take that point. But this valuation is perhaps more about how much that traction, brand and data is worth to OpenAI, who cannot buy Copilot. $3bn doesn’t seem so disproportionate in that context especially given the amount of money being attracted to the space.

samdjstephens commented on OpenAI reaches agreement to buy Windsurf for $3B bloomberg.com/news/articl... · Posted by u/swyx

retornam · 10 months ago

I'm skeptical about this VSCode fork commanding a $3 billion valuation when it depends on API services it doesn't own. What's their moat here?

For comparison, JetBrains generates over $400 million in annual revenue and is valued around $7 billion. They've built proprietary technology and deep expertise in that market over decades.

If AI (terminology aside) replaces many professional software engineers and programmers like some of its fierce advocates say it would, wouldn't their potential customer base shrink?

Professionals typically drive enterprise revenue, while hobbyists—who might become the primary users—generally don't support the same business model or spending levels.

What am I missing here?

samdjstephens · 10 months ago

Just consider what it fundamentally is: a company at the leading edge of a product category that has found absurdly strong technology/use-case fit, and is growing insanely fast.

Looking for a moat in the technology is always a bit of a trap - it’s in the traction, the brand awareness, the user data etc.

samdjstephens commented on DeepSeek-R1 github.com/deepseek-ai/De... · Posted by u/meetpateltech

widdershins · a year ago

Yeesh, that shows a pretty comprehensive dearth of humour in the model. It did a decent examination of characteristics that might form the components of a joke, but completely failed to actually construct one.

I couldn't see a single idea or wordplay that actually made sense or elicited anything like a chuckle. The model _nearly_ got there with 'krill' and 'kill', but failed to actually make the pun that it had already identified.

samdjstephens · a year ago

Yeah, it's very interesting... It appears to lead itself astray: the way it looks at several situational characteristics, gives each a "throw-away" example, and only then mushes all those examples together to make a joke seems to be its downfall in this particular case.

Also, I can't help but think that if it had written out a few example jokes about animals, rather than simply "thinking" about jokes, it might have come up with something better.

samdjstephens commented on TSMC's Arizona Plant to Start Making Advanced Chips spectrum.ieee.org/tsmc-ar... · Posted by u/rbanffy

ksec · a year ago

I just want to add that the term "ADVANCED", in terms of foundry nodes, now has an official meaning: anything sub-7nm, with specific export rules in place, especially for China. This was referenced in an ASML presentation not so long ago.

It is also important to point out that the achievement here is how fast TSMC managed to get things up and running, even without the home-ground advantage. Intel couldn't replicate this time frame even for their own Intel 7nm fab. And of course, TSMC's greatest record is that, with enough planning and permissions done beforehand, they managed to have a fab built and running within 18 months in Taiwan (arguably closer to 12 months).

This also means that, unless a miracle happens or the US government is unfair about certain things, the chances of Intel, with its current team, management, board members and investors, catching up with TSMC as a foundry in terms of capacity, price, and lead time are close to zero. (I am sorry, but I lost all faith and hope now that Pat Gelsinger is out.)

Once TSMC 2nm hits the ground later this year, TSMC US will also start their 3nm work, if they haven't started already.

samdjstephens · a year ago

It’s about demand, isn’t it? TSMC have red-hot demand, so it’s not hard to understand their urgency in setting up new fabs, wherever they may be. Intel don’t have the same incentive: their incentive is to take the money (because, why wouldn’t you), build newer fabs and hope for some breakthrough in demand. The urgency is not there; being complete before there is demand could be detrimental.