anon373839 (u/anon373839)

anon373839 commented on OpenClaw is what Apple intelligence should have been jakequist.com/thoughts/op... · Posted by u/jakequist

huwsername · 5 days ago

I don’t believe this was ever confirmed by Apple, but there was widespread speculation at the time[1] that the delay was due to the very prompt injection attacks OpenClaw users are now discovering. It would be genuinely catastrophic to ship an insecure system with this kind of data access, even with an ‘unsafe mode’.

These kinds of risks can only be _consented to_ by technical people who correctly understand them, let alone borne by them, but if this shipped there would be thousands of Facebook videos explaining to the elderly how to disable the safety features and open themselves up to identity theft.

The article also confuses me because Apple _are_ shipping this, it’s pretty much exactly the demo they gave at WWDC24, it’s just delayed while they iron this out (if that is at all possible). By all accounts it might ship as early as next week in the iOS 26.4 beta.

[1]: https://simonwillison.net/2025/Mar/8/delaying-personalized-s...

anon373839 · 5 days ago

Exactly. Apple operates at a scale where it's very difficult to deploy this technology for its sexy applications. The tech is simply too broken and flawed at this point. (Whatever Apple does deploy, you can bet it will be heavily guardrailed.) With ~2.5 billion devices in active use, they can't take the Tesla approach of letting AI drive cars into fire trucks.

anon373839 commented on Claude Code: connect to a local model when your quota runs out boxc.net/blog/2026/claude... · Posted by u/fugu2

paxys · 5 days ago

> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.

anon373839 · 5 days ago

It's true that open models are a half-step behind the frontier, but I can't say that I've seen "sheer intelligence" from the models you mentioned. Just a couple of days ago Gemini 3 Pro was happily writing naive graph traversal code without any cycle detection or safety measures. If nothing else, I would have thought these models could nail basic algorithms by now?

anon373839 commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

Kostic · 6 days ago

I would not go below q8 if comparing to sonnet.

anon373839 · 6 days ago

Yeah. Q2 in any model is just severely damaged, unfortunately. Wish it weren’t so.

anon373839 commented on Qwen3-Coder-Next qwen.ai/blog?id=qwen3-cod... · Posted by u/danielhanchen

cgearhart · 6 days ago

Any notes on the problems with MLX caching? I’ve experimented with local models on my MacBook and there’s usually a good speedup from MLX, but I wasn’t aware there’s an issue with prompt caching. Is it from MLX itself or LMstudio/mlx-lm/etc?

anon373839 · 6 days ago

There’s this issue/outstanding PR: https://github.com/lmstudio-ai/mlx-engine/pull/188#issuecomm...

anon373839 commented on The Codex App openai.com/index/introduc... · Posted by u/meetpateltech

IhateAI · 7 days ago

I'm not making any guesses, I happen to know for a fact what it costs. Please go try to sell inference and compete on price. You actually have no clue what you're talking about. I knew when I sent that response I was going to get "but Kimi!"

anon373839 · 7 days ago

The numbers you stated sound off ($500k capex + electricity per 3 concurrent requests?). Especially now that the frontier has moved to ultra sparse MoE architectures. I’ve also read a couple of commodity inference providers claiming that their unit economics are profitable.

anon373839 commented on Trinity large: An open 400B sparse MoE model arcee.ai/blog/trinity-lar... · Posted by u/linolevan

nl · 12 days ago

Progress has not become linear. We've just hit the limits of what we can measure and explain easily.

One year ago coding agents could barely do decent auto-complete.

Now they can write whole applications.

That's much more difficult to show than an ELO score based on how people like emjois and bold text in their chat responses.

Don't forget Llama4 led Lmarena and turned out to be very weak.

anon373839 · 12 days ago

Much of these gains can be attributed to better tooling and harnesses around the models. Yes, the models also had to be retrained to work with the new tooling, but that doesn’t mean there was a step change in their general “intelligence” or capabilities. And sure enough, I’m seeing the same old flaws as always: frontier models fabricating info not present in the context, having blindness to what is present, getting into loops, failing to follow simple instructions…

anon373839 commented on LM Studio 0.4 lmstudio.ai/blog/0.4.0... · Posted by u/jiqiren

PlatoIsADisease · 12 days ago

I originally used local models as a somewhat therapeutic/advice thing. I didn't want to give openAI all my dirt.

But then I decided I'm just a chemical reaction and a product of my environment, so I gave chatGPT all my dirt anyway.

But before, I cared about my privacy.

anon373839 · 12 days ago

> But then I decided I'm just a chemical reaction

That doesn’t address the practical significance of privacy, though. The real risk isn’t that OpenAI employees will read your chats for personal amusement. The risk is that OpenAI will exploit the secrets you’ve entrusted to them, to manipulate you, or to enable others to manipulate you.

The more information an unscrupulous actor has about you, the more damage they can do.

anon373839 commented on LM Studio 0.4 lmstudio.ai/blog/0.4.0... · Posted by u/jiqiren

nubg · 12 days ago

are parallel requests "free"? or do you half performance when sending two requests in parallel?

anon373839 · 12 days ago

I have seen ~1,300 tokens/sec of total throughout with Llama 3 8B on a MacBook Pro. So no, you don’t halve the performance. But running batched inference takes more memory, so you have to use shorter contexts than if you weren’t batching.

anon373839 commented on Bypassing Gemma and Qwen safety with raw strings teendifferent.substack.co... · Posted by u/teendifferent

dvt · 21 days ago

One AI smell is "it's not just X <stop> it's Y." Can be done with semicolons, em dashes, periods, etc. It's especially smelly when Y is a non sequitur. For example what, exactly, is a "high-utility response to harmful queries?" It's gibberish. It sounds like it means something, but it doesn't actually mean anything. (The article isn't even about the degree of utility, so bringing it up is nonsensical.)

Another smell is wordiness (you would get marked down for this phrase even in a high school paper): "it’s a fragile state that evaporates the moment you deviate from the expected prompt formatting." But more specifically, the smelly words are "fragile state," "evaporates," "deviate" and (arguably) "expected."

anon373839 · 21 days ago

I think this is 100% in your mind. The article does not in any way read to me as having AI-generated prose.

anon373839 commented on Anthropic Explicitly Blocking OpenCode gist.github.com/R44VC0RP/... · Posted by u/ryanvogel

pton_xd · a month ago

I do admit to feeling some schadenfreude over them reacting to their product being leeched by others.

I get it though, Anthropic has to protect their investment in their work. They are in a position to do that, whereas most of us are not.

anon373839 · a month ago

> protect their investment

Viewed another way, the preferential pricing they're giving to Claude Code (and only Claude Code) is anticompetitive behavior that may be illegal.