TLDR: I think OpenAI may have taken the medal for best available open weight model back from the Chinese AI labs. Will be interesting to see if independent benchmarks resolve in that direction as well.
The 20B model runs on my Mac laptop using less than 15GB of RAM.
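For anyone who wants to try reproducing that locally, here's a minimal sketch using Ollama's Python client. Note the assumptions: that you're running it through Ollama at all, and that the weights are published under the `gpt-oss:20b` tag; neither is a statement about how the parent commenter actually ran it.

```python
# Minimal sketch: chat with a locally served 20B model via Ollama's Python client.
# Assumes Ollama is installed and running, and that the model has been pulled
# under the tag "gpt-oss:20b" (an assumption about how the weights are published).
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response["message"]["content"])
```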
A lot of the advantage is that it can make forward progress when I can’t. I can check to see if an agent is stuck, and sometimes reprompt it, in the downtime between meetings or after lunch before I start whatever deep thinking session I need to do. That’s pure time recovered for me. I wouldn’t have finished _any_ work with that time previously.
I don’t need to optimize my time around babysitting the agent. I can do that in the margins. Watching the agents is low-context work. That adds the ability to generate working solutions during time that was previously unavailable for that kind of work.
Either way, I'm happy that you are getting so much out of the tools. Perhaps I need to prompt harder, or perhaps the codebase I work on has deviated too much from the stuff LLMs like and simply isn't a good candidate. Regardless, I appreciate talking to you!
I'm trying to write a piece to comfort those who feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still while others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is an extremely natural one to cite when talking about the origin of this post.
You can keep being mad at me for not providing a detailed target list; I've said several times that that's not the point of this. You can keep refusing to actually elaborate on how you use AI day to day and solve its problems. That's fine. I don't care. I care a lot more about talking to the people who are actually engaging with me (such as your friend) and helping me understand what they are doing. Right now, if you keep not actually contributing to the conversation, you're just kinda being a salty guy with an almost unfathomable 408,000 karma going through every HN thread every single day and making hot takes.
Now that LLMs have actually fulfilled that dream — albeit by totally different means — many devs feel anxious, even threatened. Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
I think Colton’s article nails the emotional side of this: imposter syndrome isn’t about the actual 10x productivity (which mostly isn't real), it’s about the perception that you’re falling behind. Meanwhile, this perception is fueled by a shift in what “software engineering” looks like.
LLMs are effectively the ultimate CASE tools — but they arrived faster, messier, and more disruptively than expected. They don’t require formal models or diagrams. They leap straight from natural language to executable code. That’s exciting and unnerving. It collapses the old rites of passage. It gives power to people who don’t speak the “sacred language” of software. And it forces a lot of engineers to ask: What am I actually doing now?
This may be a definition problem, then. I don't think "the agent did a dumb thing that it can't reason out of" is a hallucination. To me a hallucination is a pretty specific failure mode: the model invents something that doesn't exist. Models still do that for me, but the build-test loop sets them right on that nearly perfectly. So I guess the model is still hallucinating but the agent isn't, so the output is unaffected. So I don't care.
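To make that loop concrete, here's a minimal sketch of the feedback cycle being described. `ask_model` and `apply_patch` are hypothetical stand-ins for whatever model API and patching mechanism a real agent uses, and `make test` is just a placeholder for the project's actual build command; this is the shape of the idea, not any specific agent's implementation.

```python
import subprocess


def ask_model(prompt: str) -> str:
    """Hypothetical helper: send the prompt to your LLM, get a code patch back."""
    raise NotImplementedError


def apply_patch(patch: str) -> None:
    """Hypothetical helper: write the model's patch into the working tree."""
    raise NotImplementedError


def build_test_loop(task: str, max_attempts: int = 5) -> bool:
    """Reprompt with real build/test errors until they pass or we give up."""
    prompt = task
    for _ in range(max_attempts):
        apply_patch(ask_model(prompt))
        # Run the project's build and tests; a hallucinated API fails here.
        result = subprocess.run(["make", "test"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # the loop has "set the model right"
        # Feed the actual errors back so the next attempt can correct itself.
        prompt = f"{task}\n\nYour last attempt failed:\n{result.stdout}\n{result.stderr}"
    return False
```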
For the "agent is dumb" scenario, I aggressively delete and reprompt. This is something I've actually gotten much better at with time and experience, both so that it doesn't happen often and so that I can course-correct quickly. I find it works nearly as well for teaching me about the problem domain as my own mistakes do, but is much faster to get to.
But if I were going to be pithy: aggressively deleting work output from an agent is part of its value proposition. Agents don't get offended and they don't need explanations why. Of course, they don't learn well either; that's on you.
Deleting and re-prompting is fine. I do that too. But even one cycle of that often means the whole prompting exercise takes me longer than if I just wrote the code myself.
If your organization is routinely spending 3 months on a code review, it sounds like there's probably a 10 to 100x improvement you can extract from fixing your process before you even start using AI.
You're rebutting a claim about your rant that, if it ever did exist, has been backed away from and disowned several times.
From [0]:
> > Wait, now you're saying I set the 10x bar? No, I did not.
>
> I distinctly did not say that. I said your article was one of the ones that made me feel anxious. And it's one of the ones that spurred me to write this article.
and from [1]:
> I'm trying to write a piece to comfort those who feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still while others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is an extremely natural one to cite when talking about the origin of this post.
[0] <https://news.ycombinator.com/item?id=44799049>
[1] <https://news.ycombinator.com/item?id=44804434>
My post is about how those types of claims are unfounded and make people feel anxious unnecessarily. He just doesn't want to confront that he wrote an article that directly says these words and that those words have an effect. He wants to use strong language without any consequences. So he's trying to nitpick the things I say and ignore my requests for further information. It's kinda sad to watch, honestly.