It seems it's only going to cover the first book, though (which makes sense, given how difficult the other two would be to film). The real magic for me was in book 3. It was inspiring to see someone think so far out, so boldly.
"The main claim [...] is both somewhat obvious and previously already stated"
"Many pieces of writing are overly assertive and inaccurate."
"I do not think it deserves spending half a page demonstrating that {0^n 1^n} is not a regular language."
LLaVA 1.5 is very good, at least at describing images. http://llava.hliu.cc/
I wanted to call this out, though, as it makes the case that to improve any component (and really make it production-worthy), you need an evaluation system:
> Intrinsic evaluation is like a unit test, while extrinsic evaluation is like an integration test. You do need both. It’s very common to start building an evaluation set, and find that your ideas about how you expect the component to behave are much vaguer than you realized. You need a clear specification of the component to improve it, and to improve the system as a whole. Otherwise, you’ll end up in a local maximum: changes to one component will seem to make sense in themselves, but you’ll see worse results overall, because the previous behavior was compensating for problems elsewhere. Systems like that are very difficult to improve.
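To make the unit-test / integration-test analogy concrete, here's a rough sketch of the two kinds of checks (the `extract_entities` / `answer_question` components and the example data are hypothetical, not from the article):

```python
# Intrinsic evaluation: exercise one component in isolation against a
# small labelled set, like a unit test. (extract_entities is hypothetical.)
EXTRACTION_EXAMPLES = [
    ("Send the invoice to Acme Corp by Friday.", {"Acme Corp"}),
    ("Ping Dana and Lee about the Q3 report.", {"Dana", "Lee"}),
]

def test_extraction_component():
    for text, expected in EXTRACTION_EXAMPLES:
        assert extract_entities(text) == expected

# Extrinsic evaluation: score the whole pipeline on the end task, like an
# integration test, so a component change that "looks better" in isolation
# but hurts overall behavior gets caught. (answer_question is hypothetical.)
END_TO_END_EXAMPLES = [
    ("Who should receive the invoice?", "Acme Corp"),
]

def test_full_pipeline():
    for question, expected in END_TO_END_EXAMPLES:
        assert expected in answer_question(question)
```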
I think this makes sense from the perspective of a team with deeper ML expertise.
What it doesn't mention is that this is an enormous effort, made even larger when you don't have existing ML expertise. I've been finding this one out the hard way.
I've found that if you have "hard criteria" to evaluate (e.g., getting the LLM to produce a given structure rather than open-ended output for a chat app), you can quantify improvements by using observability tools (SLOs!) and iterating in production. Ship changes daily, track versions of what you're doing, and keep on top of behavior over time. It's arguably a lot less "clean", but it's way faster, and because it works on real-world usage data, it's really effective. An ML engineer might call that some form of "online test", but I don't think the term really applies.
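As a rough sketch of what I mean, assuming structured JSON output (the required keys, the SLO threshold, and `emit_metric` are placeholders for whatever your observability stack provides):

```python
import json

REQUIRED_KEYS = {"title", "due_date", "priority"}  # the "hard criteria"

def check_llm_output(raw: str, prompt_version: str) -> bool:
    """Validate one production response and record the result as a metric."""
    try:
        parsed = json.loads(raw)
        ok = REQUIRED_KEYS <= parsed.keys() and parsed["priority"] in ("low", "med", "high")
    except (json.JSONDecodeError, AttributeError):
        ok = False
    # emit_metric is a stand-in for your metrics client. Tagging the prompt
    # version lets you compare success rates across the changes you ship
    # daily, and an SLO (say, 99% valid outputs over 7 days) turns "is the
    # new prompt better?" into a question your dashboards already answer.
    emit_metric("llm.output.valid", value=int(ok), tags={"prompt_version": prompt_version})
    return ok
```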
At any rate, there are use cases where you really do need evaluations. The more important correct output is, the more it's worth investing in evals. I would argue that if bad outputs have high consequences, then maybe LLMs aren't the right tech for the job yet, but that'll probably change in a few years. And hopefully making evaluations will get easier too.
Wow, interesting. Do you have an example of this?
I've realized that LLMs are fairly good at string-processing tasks that would otherwise need a really complex regex, so I can see the point in those cases.
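For example, pulling dates out of free-form text: the regex grows a new alternation for every format you encounter, while the LLM version is a single instruction (`call_llm` below is just a placeholder for whichever client you use):

```python
import re

text = "Kickoff is Jan 3rd, review on 2024-02-14, ship by the 1st of March."

# Regex route: workable, but every new date format means another branch.
DATE_RE = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}"                       # 2024-02-14
    r"|[A-Z][a-z]{2,8}\.? \d{1,2}(?:st|nd|rd|th)?"  # Jan 3rd
    r"|\d{1,2}(?:st|nd|rd|th)? of [A-Z][a-z]+)\b"   # 1st of March
)
print(DATE_RE.findall(text))

# LLM route: one instruction also covers formats you didn't anticipate.
prompt = (
    "Extract every date mentioned in the text below and return a JSON list "
    "of ISO 8601 strings (assume the year is 2024).\n\n" + text
)
# print(call_llm(prompt))  # hypothetical helper wrapping your LLM API
```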
And basically all servers will have 8x A100s (maybe 4x). Nobody bothers with a single A100 (of course, in a VM you might have access to only one).
For those wondering: no, this is not the norm. My lab at CMU doesn't own any A100s (we have A6000s).
Kind of like the industry -> PhD decision.