Readit News
axiom92 commented on Grok 4 Launch [video]   twitter.com/xai/status/19... · Posted by u/meetpateltech
Mystery-Machine · 2 months ago
Did no one notice that their voice demo was staged and prerecorded, with several cuts and several different videos patched together?
axiom92 · 2 months ago
The demo was done live (as was everything else).
axiom92 commented on The Dark Forest hypothesis is absurd   noahpinion.blog/p/the-dar... · Posted by u/paulpauper
romanhn · 2 years ago
Just a PSA that 3 Body Problem is coming in March as a Netflix series. I was pleasantly surprised to see a trailer come out recently.
axiom92 · 2 years ago
Looks pretty cool https://www.youtube.com/watch?v=mogSbMD6EcY

It seems it's only going to cover the first book, though (which makes sense, given how difficult the other two would be to film). The real magic for me was in book 3. It was inspiring to see someone think so far out, so boldly.

axiom92 commented on Turing Complete Transformers: Two Transformers Are More Powerful Than One   openreview.net/forum?id=M... · Posted by u/georgehill
qsort · 2 years ago
These reviews are brutal. It's basically science-speak for "the paper is utter trash".

"The main claim [...] is both somewhat obvious and previously already stated"

"Many pieces of writing are overly assertive and inaccurate."

"I do not think it deserves spending half a page demonstrating that {0^n 1^n} is not in the regular language."

axiom92 · 2 years ago
Welcome to one of the most hated parts of academia.
axiom92 commented on Fuyu-8B: A multimodal architecture for AI agents   adept.ai/blog/fuyu-8b... · Posted by u/averylamp
GaggiX · 2 years ago
>This is by far the best open source vlm model

LLaVA 1.5 is very good, at least at describing images. http://llava.hliu.cc/

axiom92 · 2 years ago
Right, but having no separate image encoder, plus being half the size, could be very helpful for many applications.
axiom92 commented on Against LLM Maximalism   explosion.ai/blog/against... · Posted by u/pmoriarty
phillipcarter · 2 years ago
So I think this is an excellent post. Indeed, LLM maximalism is pretty dumb. They're awesome at specific things and mediocre at others. In particular, I get the most frustrated when I see people try to use them for tasks that need deterministic outputs and the thing you need to create is already known statically. My hope is that it's just people being super excited by the tech.

I wanted to call this out, though, as it makes the case that to improve any component (and really make it production-worthy), you need an evaluation system:

> Intrinsic evaluation is like a unit test, while extrinsic evaluation is like an integration test. You do need both. It’s very common to start building an evaluation set, and find that your ideas about how you expect the component to behave are much vaguer than you realized. You need a clear specification of the component to improve it, and to improve the system as a whole. Otherwise, you’ll end up in a local maximum: changes to one component will seem to make sense in themselves, but you’ll see worse results overall, because the previous behavior was compensating for problems elsewhere. Systems like that are very difficult to improve.

I think this makes sense from the perspective of a team with deeper ML expertise.

What it doesn't mention is that this is an enormous effort, made even larger when you don't have existing ML expertise. I've been finding this one out the hard way.

I've found that if you have "hard criteria" to evaluate (i.e., getting the LLM to produce a given structure rather than an open-ended output for a chat app) you can quantify improvements using Observability tools (SLOs!) and iterating in production. Ship changes daily, track versions of what you're doing, and keep on top of behavior over a period of time. It's arguably a lot less "clean" but it's way faster, and because it's working on the real-world usage data, it's really effective. An ML engineer might call that some form of "online test" but I don't think it really applies.
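A minimal sketch of what checking one of those "hard criteria" might look like (the schema and names here are made up for illustration, not taken from the post):

    import json

    REQUIRED_KEYS = {"title", "tags", "summary"}  # hypothetical schema

    def passes_hard_criteria(llm_output: str) -> bool:
        """Return True only if the LLM produced the structure we asked for."""
        try:
            parsed = json.loads(llm_output)
        except json.JSONDecodeError:
            return False
        return isinstance(parsed, dict) and REQUIRED_KEYS <= parsed.keys()

    # In production, each call would increment a pass/fail counter,
    # and the pass rate over time is the SLO you track.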

At any rate, there are other use cases where you really do need evaluations. The more important correct output is, the more it's worth investing in evals. I would argue that if bad outputs have high consequences, then maybe LLMs aren't the right tech for the job in the first place, but that'll probably change in a few years. And hopefully making evaluations will get easier too.

axiom92 · 2 years ago
> tasks that need deterministic outputs and the thing you need to create is already known statically

Wow, interesting. Do you have any example for this?

I've realized that LLMs are fairly good at string-processing tasks that a really complex regex might also handle, so I can see the point for those.
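For instance (a made-up illustration), pulling dates out of free text in mixed formats is the kind of task where the regex gets hairy fast:

    import re

    # Matches "Jan 5, 2023", "2023-01-05", and "5/1/2023"; each new
    # format the data throws at you grows the pattern, which is where
    # prompting an LLM instead starts to look attractive.
    DATE_RE = re.compile(
        r"(?:[A-Z][a-z]{2} \d{1,2}, \d{4})"
        r"|(?:\d{4}-\d{2}-\d{2})"
        r"|(?:\d{1,2}/\d{1,2}/\d{4})"
    )

    print(DATE_RE.findall("Due Jan 5, 2023; or 2023-01-05; or 5/1/2023"))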

axiom92 commented on How Is LLaMa.cpp Possible?   finbarr.ca/how-is-llama-c... · Posted by u/birriel
redox99 · 2 years ago
Because 175B parameters (350GB for the weights in FP16; let's say a bit over 400GB for actual inference) fit very comfortably on 8xA100 (640GB VRAM total).

And basically all servers will have 8xA100 (maybe 4xA100). Nobody bothers with a single A100 (of course, in a VM you might have access to only one).
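Quick back-of-envelope (assuming 2 bytes per FP16 parameter and 80GB A100s; the extra ~50GB of headroom is a rough guess for KV cache and activations):

    params = 175e9
    weights_gb = params * 2 / 1e9   # 350.0 GB of FP16 weights
    vram_gb = 8 * 80                # 640 GB total across 8x A100-80GB
    print(weights_gb, vram_gb)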

axiom92 · 2 years ago
> And basically all servers will have 8xA100

For those wondering: no, this is not the norm. My lab at CMU doesn't own any A100s (we have A6000s).

axiom92 commented on Never waste a midlife crisis   austinkleon.com/2023/07/1... · Posted by u/herbertl
axiom92 · 2 years ago
> The options seemed to be: If I went for it, I’d be penniless, and if I didn’t go for it, I’d be bitter. I’d be bitter going forward. Penniless certainly beats bitter. So I made the decision.

Kind of like the industry -> PhD decision.

u/axiom92
Karma: 541 · Cake day: February 8, 2015
About: https://madaan.github.io