Readit News
dust42 commented on Claude Code: connect to a local model when your quota runs out   boxc.net/blog/2026/claude... · Posted by u/fugu2
Gravey · 4 days ago
Would you mind sharing your hardware setup and use case(s)?
dust42 · 4 days ago
The brand new Qwen3-Coder-Next runs at 300 Tok/s PP and 40 Tok/s TG on an M1 64GB with a 4-bit MLX quant. Together with Qwen Code (a fork of Gemini CLI) it is actually pretty capable.

Before that I used Qwen3-30B, which is good enough for some quick JavaScript or Python, like 'add a new endpoint /api/foobar which does foobaz'. It is also very decent for a quick summary of code.

It does 530 Tok/s PP and 50 Tok/s TG. If you have it spit out lots of code that is just a copy of the input, e.g. 'add a new endpoint /api/foobar which does foobaz and return the whole file', then it drops to about 200 Tok/s.
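
To give a sense of the scale of task I mean, here is roughly the shape of output I expect from a prompt like that - a minimal Flask sketch where the route and the 'foobaz' logic are just placeholders from my example, not code from any real project:

    # Hypothetical result of "add a new endpoint /api/foobar which does foobaz"
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/api/foobar", methods=["POST"])
    def foobar():
        payload = request.get_json(silent=True) or {}
        # "foobaz" stands in for whatever small transformation was asked for
        result = {key: str(value).upper() for key, value in payload.items()}
        return jsonify(result), 200

    if __name__ == "__main__":
        app.run(port=5000)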

dust42 commented on French streamer unbanked by Qonto after criticizing Palantir and Peter Thiel   twitter.com/Ced_haurus/st... · Posted by u/hocuspocus
gruez · 4 days ago
>It seems pretty in character and it's not like there is another more plausible reason being offered.

In character with what, that Thiel is a mustache-twirling villain? Do other companies backed by him have a history of banning his critics?

>It seems pretty in character and it's not like there is another more plausible reason being offered.

By his own admission, neobanks have a history of banning clients arbitrarily without recourse. My guess is it's run-of-the-mill incompetence, not oppression of Thiel's critics.

dust42 · 4 days ago
Well, he destroyed Gawker. Not that I think they were good people. But it was definitely a personal vendetta.
dust42 commented on French streamer unbanked by Qonto after criticizing Palantir and Peter Thiel   twitter.com/Ced_haurus/st... · Posted by u/hocuspocus
bhouston · 4 days ago
That can't actually be what happened, can it? That would be insane.

It should be against the law to privately retaliate like this.

dust42 · 4 days ago
There are now quite a few cases in Europe where the EU or local governments have been de-banking individuals. No court, no judge needed. A much more efficient way to shut down critics. We ain't need no people who delegitimize those in power.
dust42 commented on AliSQL: Alibaba's open-source MySQL with vector and DuckDB engines   github.com/alibaba/AliSQL... · Posted by u/baotiao
mavamaarten · 5 days ago
So you pasted someone's comment in an LLM and posted the output here. Cool. Not really.
dust42 · 5 days ago
He's Chinese, and if you had looked into his comment history you'd know this is not someone who uses LLMs for karma farming. Looking at his blog, he has a long history of posting about database topics going back to before GPT existed.

Should I ever participate in a Chinese-speaking forum, I'd certainly use an LLM for translation as well.

dust42 commented on Qwen3-Coder-Next   qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen
cgearhart · 5 days ago
Any notes on the problems with MLX caching? I’ve experimented with local models on my MacBook and there’s usually a good speedup from MLX, but I wasn’t aware there’s an issue with prompt caching. Is it from MLX itself or LMstudio/mlx-lm/etc?
dust42 · 5 days ago
It is the buffer implementation. Take a conversation [u1 10kTok] -> [a1] -> [u2] -> [a2]. If you branch between the a1 answer and the u2 message, MLX reprocesses the u1 prompt of, let's say, 10k tokens, while llama.cpp does not.

I just tested the GGUF and MLX builds of Qwen3-Coder-Next, with llama.cpp and now with LM Studio. As I branch very often, it is highly annoying for me, to the point of being unusable. Qwen3-30B is then much more usable on a Mac - but by far not as powerful.
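
To make the branching case concrete, here is a minimal sketch in plain Python (not MLX or llama.cpp code) of what prefix reuse means: only the tokens after the first divergence should need prompt processing, which is what llama.cpp's cache gives you and what the MLX server currently does not:

    # Minimal sketch of KV-cache prefix reuse when branching a conversation.
    # cached: tokens already processed for [u1][a1][u2]...
    # new:    tokens for the branched conversation [u1][a1][u2']...
    def common_prefix_len(cached, new):
        n = 0
        for a, b in zip(cached, new):
            if a != b:
                break
            n += 1
        return n

    cached_tokens = list(range(10_000)) + [101, 102, 103]    # u1 (~10k) + a1 + old u2
    branched_tokens = list(range(10_000)) + [101, 102, 999]  # same u1 + a1, new u2'

    reusable = common_prefix_len(cached_tokens, branched_tokens)
    to_process = len(branched_tokens) - reusable
    print(f"reusable from cache: {reusable}, needs prompt processing: {to_process}")
    # With prefix reuse only a handful of tokens are reprocessed;
    # without it, the whole ~10k-token prefix goes through PP again.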

dust42 commented on Qwen3-Coder-Next   qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen
ttoinou · 5 days ago
I can run nightmedia/qwen3-next-80b-a3b-instruct-mlx at 60-74 tps using LM Studio. What did you try? What benefit do you get from KV caching?
dust42 · 5 days ago
KV caching means that when you have a 10k-token prompt, all follow-up questions return immediately - this is standard with all inference engines.

Now if you are not happy with the last answer, you may want to simply regenerate it or change your last question - this is branching the conversation. Llama.cpp is capable of re-using the KV cache up to that point, while MLX is not (I am using the MLX server from the MLX community project). I haven't tried with LM Studio. Maybe worth a try, thanks for the heads-up.
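
For reference, this is roughly how the branching looks against a local OpenAI-compatible server - just a sketch, the base URL, port and model name are whatever your llama.cpp / LM Studio / MLX server happens to expose:

    # Branching a conversation against a local OpenAI-compatible endpoint.
    # Assumes a server (llama-server, LM Studio, an MLX server, ...) on port 8080
    # and a placeholder model name; adjust both for your setup.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    history = [{"role": "user", "content": "<long 10k-token prompt, u1>"}]
    a1 = client.chat.completions.create(model="local-model", messages=history)
    history.append({"role": "assistant", "content": a1.choices[0].message.content})

    # Branch: instead of appending, replace the follow-up question (u2 -> u2').
    branch = history + [{"role": "user", "content": "a different follow-up, u2'"}]
    a2 = client.chat.completions.create(model="local-model", messages=branch)

    # llama.cpp can reuse the KV cache for the shared [u1][a1] prefix here;
    # the MLX server I use reprocesses the whole prompt instead.
    print(a2.choices[0].message.content)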

dust42 commented on Qwen3-Coder-Next   qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen
simonw · 5 days ago
This GGUF is 48.4GB - https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/... - which should be usable on higher end laptops.

I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude code well enough to be useful.

Maybe this will be the one? This Unsloth guide from a sibling comment suggests it might be: https://unsloth.ai/docs/models/qwen3-coder-next

dust42 · 5 days ago
Unfortunately Qwen3-Next is not well supported on Apple silicon; it seems the Qwen team doesn't really care about Apple.

On an M1 64GB, Q4_K_M on llama.cpp gives only 20 Tok/s, while on MLX it is more than twice as fast. However, MLX has problems with KV-cache consistency and especially with branching. So while in theory it is twice as fast as llama.cpp, it often does the prompt processing all over again, which completely trashes performance, especially with agentic coding.

So the agony is deciding whether to endure half the possible speed but get much better KV caching in return, or to have twice the speed but then often sit through prompt processing again.
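
Back-of-the-envelope with the numbers above (a rough sketch; the 20k-token context is just an assumed agentic-coding turn, not a measurement):

    # Rough per-turn latency after a branch: MLX (fast, but reprocesses the
    # prompt) vs llama.cpp (slower TG, but KV cache reused). Illustrative only.
    context_tokens, output_tokens = 20_000, 500

    mlx_pp, mlx_tg = 300, 40   # Tok/s; prompt is reprocessed on branch
    lcpp_tg = 20               # Tok/s; prompt already cached

    mlx_turn = context_tokens / mlx_pp + output_tokens / mlx_tg   # ~79 s
    lcpp_turn = output_tokens / lcpp_tg                           # ~25 s
    print(f"MLX: {mlx_turn:.0f}s  llama.cpp (cached): {lcpp_turn:.0f}s")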

But who knows, maybe Qwen will give them a hand? (hint, hint)

dust42 commented on Two kinds of AI users are emerging   martinalderson.com/posts/... · Posted by u/martinald
somat · 7 days ago
Isn't this true of any greenfield project, with or without generative models? The first few days are amazingly productive, and then features and fixes get slower and slower. And you get to see how good an engineer you really are, as your initial architecture starts straining under the demands of changing real-world requirements and you hope it holds together long enough to ship something.

"I could make that in a weekend"

"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"

dust42 · 7 days ago
From personal experience I'd like to add that the last 5% takes 95% of the time - at least if you are working on a makeover of an old legacy system.
dust42 commented on Waymo robotaxi hits a child near an elementary school in Santa Monica   techcrunch.com/2026/01/29... · Posted by u/voxadam
dyauspitr · 10 days ago
It’s great handling of the situation. They should release a video as well.
dust42 · 10 days ago
Indeed. Rather than have the company tell me that they did great, I'd rather watch the video and make up my own mind.
dust42 commented on AISLE’s autonomous analyzer found all CVEs in the January OpenSSL release   aisle.com/blog/aisle-disc... · Posted by u/mmsc
teiferer · 12 days ago
> the problem is two fold

No, the biggest problem at the root of all this is complexity. OpenSSL is a garbled mess. No matter AI or not, such software should not be the security backbone of the internet.

People writing and maintaining software need to optimize for simplicity, readability, maintainability. Whether they use an LLM to achieve that is secondary. The humans in the loop must understand what's going on.

dust42 · 12 days ago
> People writing and maintaining software need to optimize for simplicity, readability, maintainability. Whether they use an LLM to achieve that is secondary. The humans in the loop must understand what's going on.

In a perfect world, that is.

u/dust42

Karma: 370 · Cake day: February 7, 2025