Badly needs inline chat to be complete.
So, what's next? Maybe the answer isn't one giant model trained from scratch (looking at you, Emu3/Chameleon style). The trend, hinted at by the likes of GPT-4o and demonstrated by MetaQuery, looks modular.
Prediction 1: Modularity Wins. Forget monolithic monsters. The smart play seems to be connecting the best pre-trained parts:
Grab a top-tier MLLM (like Qwen, Llama-VL) as the "brain." It already understands vision and language incredibly well.
Plug it into a SOTA generator (diffusion like Stable Diffusion/Sana, or a killer visual tokenizer/decoder if you prefer LLM-native generation) as the "hand." MetaQuery showed this works shockingly well even while keeping the MLLM frozen, and it's way cheaper and faster than training from zero (rough sketch below).
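To make that wiring concrete, here is a minimal PyTorch-style sketch of a MetaQuery-flavored setup: frozen MLLM "brain," a small trainable bridge, frozen generator "hand." Everything here is illustrative; the class names, dimensions, the `inputs_embeds`/`last_hidden_state` interface, and the `diffusion(cond=...)` call are placeholder assumptions, not any paper's actual API.

```python
import torch
import torch.nn as nn

class ModularT2I(nn.Module):
    """Frozen MLLM "brain" + trainable bridge + frozen diffusion "hand".
    Hypothetical wiring in the spirit of MetaQuery; not its actual code."""

    def __init__(self, mllm, diffusion, num_queries=64, mllm_dim=4096, cond_dim=2048):
        super().__init__()
        self.mllm = mllm.eval()        # pre-trained MLLM, kept frozen
        self.diffusion = diffusion     # pre-trained diffusion decoder, also frozen
        for p in self.mllm.parameters():
            p.requires_grad_(False)
        for p in self.diffusion.parameters():
            p.requires_grad_(False)
        # Learnable query tokens appended to the prompt; only these and the
        # connector get trained.
        self.queries = nn.Parameter(torch.randn(num_queries, mllm_dim) * 0.02)
        self.connector = nn.Linear(mllm_dim, cond_dim)

    def forward(self, prompt_embeds):
        # prompt_embeds: (B, T, mllm_dim) embedded instruction tokens
        B = prompt_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        x = torch.cat([prompt_embeds, q], dim=1)
        with torch.no_grad():
            # Assumed HF-style interface; swap in your MLLM's real call.
            h = self.mllm(inputs_embeds=x).last_hidden_state
        # Hidden states at the query positions become the control signal.
        cond = self.connector(h[:, -q.size(1):, :])
        # Hand the conditioning to the generator (placeholder call).
        return self.diffusion(cond=cond)
```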
Prediction 2: Pre-trained Everything. Why reinvent the wheel? Leverage existing SOTA MLLMs and generators. The real work shifts from building the core components to connecting them efficiently. Expect more focus on clever adapters, connectors, and interfaces (MetaQuery's core idea, ILLUME+'s adapters). This lowers the barrier to entry and speeds up iteration.
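And the connector doesn't have to be fancy. The single linear layer in the sketch above could be swapped for a couple of transformer layers plus a projection. Again purely illustrative, with made-up dimensions and depth:

```python
import torch.nn as nn

class Connector(nn.Module):
    """Tiny trainable bridge from MLLM hidden states to the generator's
    conditioning space. Hypothetical; depth and widths are illustrative."""

    def __init__(self, mllm_dim=4096, cond_dim=2048, depth=2, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=mllm_dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.proj = nn.Linear(mllm_dim, cond_dim)

    def forward(self, query_states):  # (B, num_queries, mllm_dim)
        return self.proj(self.encoder(query_states))
```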
Prediction 3: Generation Heads Don't Matter (as much). Understanding Does. LLM head (predicting visual tokens, as in Emu3/ILLUME+) vs. diffusion head (driving a diffusion decoder, as in MetaQuery or ILLUME+'s optional path)? This might become a flexible choice based on speed/quality needs, not a fundamental religious war. ILLUME+'s optional diffusion decoder already hints at this.
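One way to picture why the head is a pluggable choice: the same MLLM-side control states could feed either a token head or a diffusion-conditioning head behind a single tiny interface. Names, shapes, and the `decoder` call below are invented for the example, not taken from any of these systems.

```python
import torch.nn as nn

class VisualTokenHead(nn.Module):
    """Emu3/ILLUME+-style flavor: predict discrete visual codebook tokens."""
    def __init__(self, mllm_dim=4096, codebook_size=16384):
        super().__init__()
        self.to_logits = nn.Linear(mllm_dim, codebook_size)

    def forward(self, control_states):          # (B, N, mllm_dim)
        return self.to_logits(control_states)   # logits over visual codes

class DiffusionCondHead(nn.Module):
    """MetaQuery-style flavor: map control states to diffusion conditioning."""
    def __init__(self, mllm_dim=4096, cond_dim=2048):
        super().__init__()
        self.proj = nn.Linear(mllm_dim, cond_dim)

    def forward(self, control_states):
        return self.proj(control_states)        # cross-attention conditioning

# The rest of the stack doesn't care which head is plugged in:
def render(control_states, head, decoder):
    return decoder(head(control_states))
```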
The real bottleneck isn't the pixel renderer; it's the quality of the control signal. This is where the MLLM brain shines. Diffusion models are amazing renderers but dumb reasoners. A powerful MLLM can:
Understand complex, nuanced instructions.
Inject world knowledge and common sense (MetaQuery showed this: a frozen MLLM guided the diffusion model to draw things that require reasoning).
Potentially output weighted or prioritized control signals (inspired by how constraining attention maps, as in Leffa, boosts detail control; the MLLM could provide that kind of high-level guidance. A toy sketch follows this list).
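Purely speculative, but here is a toy sketch of what "weighted control signals" could look like: the MLLM side also emits a per-token importance score that scales the conditioning before it reaches the generator's cross-attention. This is an illustration of the idea only, not Leffa's or MetaQuery's actual mechanism.

```python
import torch
import torch.nn as nn

class WeightedControl(nn.Module):
    """Toy idea: emit a scalar importance per control token and scale the
    conditioning with it, so high-priority tokens get more influence in the
    generator's cross-attention. Not taken from any paper."""

    def __init__(self, mllm_dim=4096, cond_dim=2048):
        super().__init__()
        self.proj = nn.Linear(mllm_dim, cond_dim)
        self.weight = nn.Linear(mllm_dim, 1)

    def forward(self, control_states):                  # (B, N, mllm_dim)
        cond = self.proj(control_states)                 # (B, N, cond_dim)
        w = torch.sigmoid(self.weight(control_states))   # (B, N, 1) in [0, 1]
        return cond * w                                  # down-weight low-priority tokens
```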
The Payoff: Understanding-Driven Control. This modular, understanding-first approach unlocks:
Truly fine-grained editing.
Generation based on knowledge and reasoning, not just text matching.
Complex instruction following for advanced tasks (subject locking, style mixing, etc.).
Hurdles: Still need better/faster interfaces, good control-focused training data (MetaQuery's mining idea is key), better evals than FID/CLIP, and faster inference.
TL;DR: Future text-to-image looks modular. Use the best pre-trained MLLM brain, connect it smartly to the best generator hand (diffusion or token-based). Let deep understanding drive precise creation. Less focus on one model to rule them all, more on intelligent integration.
Could you make this more explicit? What modularity is hinted at by 4o? The OpenAI blog post you cite (and anything else I've casually heard about it) seems to only imply the opposite.
I have a theory that a disproportionately large obstacle to Nix understanding and adoption is its use of ; in a way that is just subtly different, right in the uncanny valley, from what ; means in any other language.
The default autogenerated configuration file everyone starts from immediately hits you with:
environment.systemPackages = with pkgs; [ foo ];
How is that supposed to read as a single expression in a pure functional language?