Well, you're getting the ability to maintain a context bigger than 8K or so, for one thing.
I was with you up till here. Come on! CPU inferencing is not it; even Macs struggle with bigger models and longer contexts (esp. visible when agentic stuff gets > 32k tokens).
The PRO6000 is the first GPU from their "workstation" series that actually makes sense to own.
The Framework Desktop thing is that it has unified memory shared with the GPU, so, much like an M-series Mac, it can inference disproportionately large models.
The funny part is that they still make money. It seems like once you’ve got the connections, being a VC is a very easy job these days.
My weak, uncited understanding from back then is that they're poorly positioned: in our set they're still the guys who write you a big check for software, but in the VC set they're a joke. They mistook carpet-bombing investment for something that scales and went all in on way too many crypto firms. Now they've embarrassed themselves with a ton of assets that need to get marked down; they're clearly behind the other bigs, but there's no forcing function to actually do the markdowns.
So we get primal screams about politics and LLM-generated articles about how a $9K video card is the perfect blend of price and performance.
There are other comments effusively praising them for their unique technical expertise. I maintain a llama.cpp client on every platform you can think of, and nothing in this article makes any sense. If you're training, you wouldn't do it on only four $9K GPUs that you own. If you're inferencing, you're not getting much more out of this than you would out of a ~$2K Framework Desktop.
I thought the APIs in use generally interface with backend systems supporting logit manipulation, so there is no need to reject and re-inference anything; it's guaranteed right the first time because any token that would be invalid has a 0% chance of being produced.
I guess for the closed commercial systems that's speculative, but all the discussion of the internals of the open source systems I've seen has indicated that, and I don't know why the closed systems would be less sophisticated.
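To make the guaranteed-right-the-first-time mechanism concrete, here's a minimal sketch of full-vocabulary logit masking (Python; `valid_ids` is a made-up stand-in for whatever set of grammar-legal next tokens a real backend's compiled grammar exposes):

    import numpy as np

    def constrained_sample(logits, valid_ids, temperature=1.0):
        # Force every grammar-invalid token's logit to -inf, so its
        # probability after softmax is exactly 0 and it can never be drawn.
        masked = np.full_like(logits, -np.inf)
        masked[valid_ids] = logits[valid_ids]
        z = (masked - masked[valid_ids].max()) / temperature
        probs = np.exp(z)          # exp(-inf) == 0 for the masked-out ids
        probs /= probs.sum()
        return np.random.default_rng().choice(len(probs), p=probs)

    logits = np.array([2.0, 1.0, 0.5, -1.0])
    print(constrained_sample(logits, valid_ids=[0, 2]))  # only ever 0 or 2

No reject/re-inference loop: the invalid tokens simply carry zero probability.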
There is a substantial performance cost to nuking logits across the full vocabulary; the open source internals discussion may have glossed over that for clarity (see the github.com/llama.cpp/... quote below). The cost is high enough that the default in the API* is to not artificially lower the other logits at all, and to only do that if the first inference attempt yields a token that's invalid under the compiled grammar.
Similarly, I was hoping to be on target w/r/t what strict mode is in an API; I am sort of describing the "outer loop" of sampling.
* blissfully, you do not have to implement it manually anymore - it is a parameter in the sampling params member of the inference params
* "the grammar constraints applied on the full vocabulary can be very taxing. To improve performance, the grammar can be applied only to the sampled token... and only if the token doesn't fit the grammar, the grammar constraints are applied to the full vocabulary and the token is resampled." https://github.com/ggml-org/llama.cpp/blob/54a241f505d515d62...
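To sketch that quoted two-phase strategy (Python again; `fits_grammar` and `valid_ids` are hypothetical stand-ins for the compiled grammar's checks, not llama.cpp's actual API):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x[np.isfinite(x)].max())
        return e / e.sum()

    def lazy_grammar_sample(logits, fits_grammar, valid_ids):
        rng = np.random.default_rng()
        # Fast path: sample unconstrained, then just *check* the one
        # sampled token against the grammar.
        tok = rng.choice(len(logits), p=softmax(logits))
        if fits_grammar(tok):
            return tok  # common case: no full-vocabulary grammar pass
        # Slow path: the sample was rejected, so pay the full cost once:
        # mask the vocabulary down to grammar-legal tokens and resample.
        masked = np.full_like(logits, -np.inf)
        masked[valid_ids] = logits[valid_ids]
        return rng.choice(len(logits), p=softmax(masked))

The expensive full-vocabulary pass only runs on the rare rejection, which is the whole performance argument above.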
One of the funnier info scandals of 2025 has been that only Claude was even close to properly trained on JSON file edits until o3 was released, and even then o3 needed a bespoke format. Geminis have required using a non-formalized diff format from Aider; it wasn't until June that Gemini could do diff-string-in-JSON better than 30% of the time, and not until GPT-5 that an OpenAI model could. (Though v4a, as OpenAI's bespoke edit format is called, is fine, because it at least worked well in tool calls. Gemini's was a clown show: you had to post-process regular text completions to parse out any diffs.)
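For anyone who hasn't watched models fail at this: "diff-string-in-JSON" means the entire unified diff travels as one JSON string value inside the tool-call arguments (field names below are made up for illustration), so every newline and quote has to be escaped exactly right or the edit doesn't parse:

    import json

    # Hypothetical tool-call arguments: the whole patch is one JSON string.
    args = {
        "path": "src/app.py",
        "diff": '--- a/src/app.py\n+++ b/src/app.py\n@@ -1 +1 @@\n-print("hi")\n+print("hello")\n',
    }
    print(json.dumps(args))
    # {"path": "src/app.py", "diff": "--- a/src/app.py\n+++ b/src/app.py\n@@ -1 +1 @@\n-print(\"hi\")\n+print(\"hello\")\n"}

One mangled \n or unescaped quote and the whole edit is garbage, which is presumably a big part of why the success rates were so low.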
She also fails to mention that the Baby's First Years study unfortunately overlapped with the Covid pandemic, introducing an enormous confounding factor (made all the more significant since the study measured child welfare), not to mention the Covid payments, which likely dwarfed the $333/mo study payments and would have been received by both the control and treatment groups.
https://www.denverbasicincomeproject.org/research
https://newrepublic.com/article/199070/government-cash-payme...
This is a brand new publication and I really wish they had skipped her; it made the whole endeavour seem unserious and extremely online to me (which it is! but I wouldn't have noticed otherwise, so I guess I'm grateful?)
It also made me appreciate how little editorial staff there is left: so many publications, especially online, are stripped down to the point that it's freelance bloggers who kinda stay the same no matter what, rather than people growing as writers.
My two most scarring facepalms in recent memory:
- [my home] Oakland is safe and the Feds coming in isn't needed, but... the real problem with them being here would be Duh Dems complaining; people know crime is real and bad and hate being lied to.
- It'd be bad if we deported people for their views, but then again, we don't know what we don't know about the level of terrorist support provided by these people who complain about Gaza.
We are rounding up a surface street with a 45 limit into a highway at 60 and then pretending it's obviously unsafe to cross. That's obviously wrong, given the crosswalks.
Also, we have zero idea whether the child was allowed to jaywalk. We know they were on the phone with the older one at least at some point. That's all.
It's a tragedy, but it's hard to get my head around the idea that it's manslaughter that both parents are culpable for. As noted in coverage, it's an odd gap compared to how unsecured guns are treated.
As Mr. Hildebrand used to say, when you assume, you make...
(also note the article specifically frames this speccing-out as being about training :) it's not just me suggesting it)