regularfry (u/regularfry)

regularfry commented on We tasked Opus 4.6 using agent teams to build a C Compiler anthropic.com/engineering... · Posted by u/modeless

whinvik · 3 days ago

It's weird to see the expectation that the result should be perfect.

All said and done, that it's even possible is remarkable. Maybe these all go into training the next Opus or Sonnet and we start getting models that can create efficient compilers from scratch. That would be something!

regularfry · 3 days ago

This is firmly where I am. "The wonder is not how well the dog dances, it is that it dances at all."

regularfry commented on We tasked Opus 4.6 using agent teams to build a C Compiler anthropic.com/engineering... · Posted by u/modeless

cheema33 · 3 days ago

As others have pointed out, humans train on existing codebases as well. And then use that knowledge to build clean room implementations.

regularfry · 3 days ago

What they don't do is read the product they're clean-rooming. That's kinda disqualifying. Impossible to know if the GCC source is in 4.6's training set but it would be kinda weird if it wasn't.

Deleted Comment

regularfry commented on GPT-5.3-Codex openai.com/index/introduc... · Posted by u/meetpateltech

edem · 3 days ago

So can I use this from Opencode? Because Anthropic started to enforce their TOS to kill the Opencode integration

regularfry · 3 days ago

I've tried opus 4.5 in opencode via the GitHub Copilot API, mostly to see if it works at all. I don't think that broke any terms of service? But also I haven't checked how much more expensive I made it for myself over just calling them directly.

regularfry commented on New York’s budget bill would require “blocking technology” on all 3D printers blog.adafruit.com/2026/02... · Posted by u/ptorrone

crote · 4 days ago

Prusa had been moving towards proprietary licensing (if they release files at all) for a while now, due to their open source design files being used to undercut the original with cheaper clones.

regularfry · 4 days ago

I seriously doubt it's the undercutting that's the problem here. When they release a new model they can't keep up with demand anyway, they max out production capacity on legitimate orders.

I think, if anything, the problem is when people buy a cheap clone and blame Prusa when it fails.

regularfry commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

segmondy · 5 days ago

you do realize claude opus/gpt5 are probably like 1000B-2000B models? So trying to have a model that's < 60B offer the same level of performance will be a miracle...

regularfry · 5 days ago

There is (must be - information theory) a size/capacity efficiency frontier. There is no particular reason to think we're anywhere near it right now.

regularfry commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

codazoda · 5 days ago

I can't get Codex CLI or Claude Code to use small local models and to use tools. This is because those tools use XML and the small local models have JSON tool use baked into them. No amount of prompting can fix it.

In a day or two I'll release my answer to this problem. But, I'm curious, have you had a different experience where tool use works in one of these CLIs with a small local model?

regularfry · 5 days ago

Surely the answer is a very small proxy server between the two?
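For illustration, the core of such a proxy is just a translation step in each direction. This is a rough sketch only: the XML shape (`<tool_call name="..."><arg name="...">`) and the JSON shape (`{"name": ..., "arguments": {...}}`) are both made up for the example and are not the actual wire formats of Codex CLI, Claude Code, or any particular local model. The HTTP forwarding part is left out.

```python
import json
import xml.etree.ElementTree as ET


def xml_call_to_json(xml_text: str) -> str:
    """Convert a (hypothetical) XML tool call into a (hypothetical) JSON one."""
    root = ET.fromstring(xml_text)
    if root.tag != "tool_call":
        raise ValueError(f"expected <tool_call>, got <{root.tag}>")
    # Each <arg name="..."> child becomes one entry in the arguments object.
    arguments = {arg.get("name"): (arg.text or "") for arg in root.findall("arg")}
    return json.dumps({"name": root.get("name"), "arguments": arguments}, sort_keys=True)


def json_call_to_xml(json_text: str) -> str:
    """Convert the (hypothetical) JSON tool call back into the XML shape."""
    call = json.loads(json_text)
    root = ET.Element("tool_call", {"name": call["name"]})
    for key, value in call["arguments"].items():
        arg = ET.SubElement(root, "arg", {"name": key})
        arg.text = str(value)  # argument values are flattened to text
    return ET.tostring(root, encoding="unicode")


example_xml = '<tool_call name="read_file"><arg name="path">main.c</arg></tool_call>'
example_json = xml_call_to_json(example_xml)
```

A real proxy would wrap these two functions in a small HTTP handler sitting between the CLI and the model endpoint, and would need to deal with streaming and malformed output, which this sketch ignores.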

regularfry commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

noveltyaccount · 5 days ago

> separate models for /plan and /build

I had not considered that, seems like a great solution for local models that may be more resource-constrained.

regularfry · 5 days ago

You can configure aider that way. You get three, in fact: an architect model, a code editor model, and a quick model for things like commit messages. Although I'm not sure if it's got doc searching capabilities.

regularfry commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

yorwba · 5 days ago

SWE-Bench Pro consists of 1865 tasks. https://arxiv.org/abs/2509.16941 Qwen3-Coder-Next solved 44.3% (826 or 827) of these tasks. To solve a single task, it took between ≈50 and ≈280 agent turns, ≈150 on average. In other words, a single pass through the dataset took ≈280000 agent turns. Kimi-K2.5 solved ≈84 fewer tasks, but also only took about a third as many agent turns.
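The arithmetic above can be checked quickly. A small sketch, using only the figures quoted in the comment (1865 tasks, 44.3% solved, roughly 150 turns per task); the function names are just for illustration:

```python
TOTAL_TASKS = 1865        # SWE-Bench Pro size, as quoted above
SOLVE_RATE_PER_MILLE = 443  # 44.3%, reported to one decimal place
AVG_TURNS = 150           # approximate average agent turns per task, as quoted


def solved_candidates(total: int, rate_per_mille: int) -> list[int]:
    """Return every solved-task count whose rate rounds to the reported figure."""
    return [n for n in range(total + 1)
            if round(1000 * n / total) == rate_per_mille]


def total_turns(total: int, avg_turns: int) -> int:
    """Estimate agent turns for one pass over the dataset."""
    return total * avg_turns


candidates = solved_candidates(TOTAL_TASKS, SOLVE_RATE_PER_MILLE)
turns = total_turns(TOTAL_TASKS, AVG_TURNS)
print(candidates, turns)
```

Both 826 and 827 round to 44.3%, which is why the count is ambiguous, and 1865 × 150 = 279,750, which matches the ≈280000 figure.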

regularfry · 5 days ago

If this is genuinely better than K2.5 even at a third the speed then my openrouter credits are going to go unused.

regularfry commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

noveltyaccount · 5 days ago

I think I like coding models that know a lot about the world. They can disambiguate my requirements and build better products.

regularfry · 5 days ago

I generally prefer a coding model that can google for the docs, but separate models for /plan and /build is also a thing.

u/regularfry

Karma: 8936 · Cake day: January 6, 2009