Well, you're getting the ability to maintain a context bigger than 8K or so, for one thing.
I was with you up till here. Come on! CPU inferencing is not it; even Macs struggle with bigger models and longer contexts (esp. visible when agentic stuff gets > 32k tokens).
The PRO6000 is the first GPU from their "workstation" series that actually makes sense to own.
The Framework Desktop thing is that it has unified memory shared with the GPU, so, much like an M-series Mac, it can inference disproportionately large models.
The funny part is that they still make money. It seems like once you’ve got the connections, being a VC is a very easy job these days.
My weak, uncited understanding from back then is that they're poorly positioned: in our set they're still the guys who write you a big check for software, but in the VC set they're a joke. They mistook carpet-bombing investment for something that scales and went all in on way too many crypto firms. Now they've embarrassed themselves with a ton of assets that need to get marked down; they're clearly behind the other bigs, but there's no forcing function to actually do the markdowns.
So we get primal screams about politics and LLM-generated articles about how a $9K video card is the perfect blend of price and performance.
There are other comments effusively praising them for their unique technical expertise. I maintain a llama.cpp client on every platform you can think of, and nothing in this article makes any sense. If you're training, you wouldn't do it on only four $9K GPUs that you own. If you're inferencing, you're not getting much more out of this than you would out of a ~$2K Framework Desktop.
I thought the APIs in use generally interface with backend systems supporting logit manipulation, so there is no need to reject and re-inference anything; it's guaranteed right the first time because any token that would be invalid has a 0% chance of being produced.
I guess for the closed commercial systems that's speculative, but all the discussion of the internals of the open source systems I've seen has indicated that, and I don't know why the closed systems would be less sophisticated.
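To make the guaranteed-right-the-first-time mechanism concrete, here's a minimal sketch of full-vocabulary logit masking (Python; `valid_ids` is a made-up stand-in for whatever set of grammar-legal next tokens a real backend's compiled grammar exposes):

    import numpy as np

    def constrained_sample(logits, valid_ids, temperature=1.0):
        # Force every grammar-invalid token's logit to -inf, so its
        # probability after softmax is exactly 0 and it can never be drawn.
        masked = np.full_like(logits, -np.inf)
        masked[valid_ids] = logits[valid_ids]
        z = (masked - masked[valid_ids].max()) / temperature
        probs = np.exp(z)          # exp(-inf) == 0 for the masked-out ids
        probs /= probs.sum()
        return np.random.default_rng().choice(len(probs), p=probs)

    logits = np.array([2.0, 1.0, 0.5, -1.0])
    print(constrained_sample(logits, valid_ids=[0, 2]))  # only ever 0 or 2

No reject/re-inference loop: the invalid tokens simply carry zero probability.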
There is a substantial performance cost to nuking logits across the full vocabulary; the open source internals discussion may have glossed over that for clarity (see the github.com/llama.cpp/... quote below). The cost is high enough that the default in the API* is to not artificially lower the other logits at all, and to only do that if the first inference attempt yields a token that's invalid under the compiled grammar.
Similarly, I was hoping to be on target w/r/t what strict mode is in an API; I am sort of describing the "outer loop" of sampling.
* blissfully, you do not have to implement it manually anymore - it is a parameter in the sampling params member of the inference params
* "the grammar constraints applied on the full vocabulary can be very taxing. To improve performance, the grammar can be applied only to the sampled token... and only if the token doesn't fit the grammar, the grammar constraints are applied to the full vocabulary and the token is resampled." https://github.com/ggml-org/llama.cpp/blob/54a241f505d515d62...
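To sketch that quoted two-phase strategy (Python again; `fits_grammar` and `valid_ids` are hypothetical stand-ins for the compiled grammar's checks, not llama.cpp's actual API):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x[np.isfinite(x)].max())
        return e / e.sum()

    def lazy_grammar_sample(logits, fits_grammar, valid_ids):
        rng = np.random.default_rng()
        # Fast path: sample unconstrained, then just *check* the one
        # sampled token against the grammar.
        tok = rng.choice(len(logits), p=softmax(logits))
        if fits_grammar(tok):
            return tok  # common case: no full-vocabulary grammar pass
        # Slow path: the sample was rejected, so pay the full cost once:
        # mask the vocabulary down to grammar-legal tokens and resample.
        masked = np.full_like(logits, -np.inf)
        masked[valid_ids] = logits[valid_ids]
        return rng.choice(len(logits), p=softmax(masked))

The expensive full-vocabulary pass only runs on the rare rejection, which is the whole performance argument above.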
One of the funnier info scandals of 2025 has been that only Claude was even close to properly trained on JSON file edits until o3 was released, and even then o3 needed a bespoke format. Geminis have required using a non-formalized diff format from Aider; it wasn't until June that Gemini could do diff-string-in-JSON better than 30% of the time, and not until GPT-5 that an OpenAI model could. (Though v4a, as OpenAI's bespoke edit format is called, is fine, because it at least worked well in tool calls. Gemini's was a clown show: you had to post-process regular text completions to parse out any diffs.)
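For anyone who hasn't watched models fail at this: "diff-string-in-JSON" means the entire unified diff travels as one JSON string value inside the tool-call arguments (field names below are made up for illustration), so every newline and quote has to be escaped exactly right or the edit doesn't parse:

    import json

    # Hypothetical tool-call arguments: the whole patch is one JSON string.
    args = {
        "path": "src/app.py",
        "diff": '--- a/src/app.py\n+++ b/src/app.py\n@@ -1 +1 @@\n-print("hi")\n+print("hello")\n',
    }
    print(json.dumps(args))
    # {"path": "src/app.py", "diff": "--- a/src/app.py\n+++ b/src/app.py\n@@ -1 +1 @@\n-print(\"hi\")\n+print(\"hello\")\n"}

One mangled \n or unescaped quote and the whole edit is garbage, which is presumably a big part of why the success rates were so low.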
She also fails to mention that the Baby's First Years study unfortunately overlapped with the Covid pandemic, introducing an enormous confounding factor (made all the more significant since the study measured child welfare), not to mention the Covid payments, which likely dwarfed the $333/mo study payments and would have been received by both the control and treatment groups.
https://www.denverbasicincomeproject.org/research
https://newrepublic.com/article/199070/government-cash-payme...
This is a brand new publication and I really wish they had skipped her; it made the whole endeavour seem unserious and extremely online to me (which it is! but I wouldn't have noticed otherwise, so I guess I'm grateful?)
It also made me appreciate how little editorial staff there is left: so many publications, especially online, are stripped down to the point that it's freelance bloggers who kinda stay the same no matter what, rather than people growing as writers.
My two most scarring facepalms in recent memory:
- [my home] Oakland is safe and the Feds coming in isn't needed, but... the real problem with them being here would be Duh Dems complaining; people know crime is real and bad and hate being lied to.
- It'd be bad if we deported people for their views, but then again, we don't know what we don't know about the level of terrorist support provided by these people who complain about Gaza.
We are rounding up a surface street with a 45 limit into a highway at 60 and then pretending it's obviously unsafe to cross. That's obviously wrong, given the crosswalks.
Also, we have zero idea whether the child was allowed to jaywalk. We know they were on the phone with the older one at least at some point. That's all.
It's a tragedy, but it's hard to get my head around the idea that it's manslaughter that both parents are culpable for. As noted in coverage, it's an odd gap compared to how unsecured guns are treated.
As Mr. Hildebrand used to say, when you assume, you make...
(also note the article specifically frames this speccing-out as being about training :) it's not just me suggesting it)