qumpis commented on Beyond Diffusion: Inductive Moment Matching   lumalabs.ai/news/inductiv... · Posted by u/outrun86
bearobear · 10 months ago
Last author here (I also did the DDIM paper, https://arxiv.org/abs/2010.02502). I know this is going to be very tricky math-wise (and in the paper we just wrote the most general thing to make reviewers happy), so I tried to explain the idea more simply in the blog post (https://lumalabs.ai/news/inductive-moment-matching).

If you look at how a single step of the DDIM sampler interacts with the target timestep, it is actually just a linear function. This is obviously quite inflexible if we want it to represent a flexible function where we can choose any target timestep. So we just add the target timestep as an argument to the neural network and train it with a moment-matching objective.
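Roughly, in code (a toy sketch with a hypothetical noise schedule, not the paper's implementation): for a fixed x_t and noise prediction eps, the jump to any target time s is just a linear combination whose coefficients depend on s; the network itself never sees s.

    import numpy as np

    def alpha(t):
        # Hypothetical cosine-style noise schedule on t in [0, 1].
        return np.cos(0.5 * np.pi * t) ** 2

    def ddim_step(x_t, eps, t, s):
        # eps is the network's noise prediction, which only sees (x_t, t).
        x0_hat = (x_t - np.sqrt(1.0 - alpha(t)) * eps) / np.sqrt(alpha(t))
        # For fixed (x_t, eps) this is linear no matter which target s we
        # pick -- the network's output cannot adapt to s.
        return np.sqrt(alpha(s)) * x0_hat + np.sqrt(1.0 - alpha(s)) * eps

    x_t, eps = np.random.randn(4), np.random.randn(4)
    print(ddim_step(x_t, eps, t=0.8, s=0.4))

    # The change discussed above, schematically: a network f_theta(x_t, t, s)
    # (hypothetical name) that takes the target time as an input and is
    # trained with a moment-matching objective instead.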

In general, I feel that analyzing a method's inference-time properties before training it can be helpful not only for diffusion models but also for LLMs, including various recent diffusion LLMs, which prompted me to write a position paper in the hopes that others develop cool new ideas (https://arxiv.org/abs/2503.07154).

qumpis · 10 months ago
What happens if we don't add any moment-matching objective? E.g., at train time, just fit a diffusion model that predicts the target given any pair of timesteps (t, t')? Why is moment matching critical here?

Also, regarding linearity, why is it inflexible? It seems quite convenient that a simple linear interpolation is used for reconstruction; besides, even in DDIM, the direction towards the final target changes at each step as the images become less noisy. In standard diffusion models, or even flow matching, denoising is always equal to the prediction of the original data plus a direction from the current timestep to the timestep t'. Just to be clear, it is intuitive that such models are inferior in few-step generation since they don't optimise for test-time efficiency (in terms of the tradeoff of quality vs. compute), but it's unclear what inflexibility exists there beyond this limitation.

Presumably there's no expected benefit in quality if all timesteps are used in denoising?

qumpis commented on I always knew I was different, I didn't know I was a sociopath   wsj.com/health/wellness/i... · Posted by u/erehweb
qumpis · 2 years ago
Fascinating article. I wonder what made her take the steps to find out what's "different" about her, and most of all, why the need to fix it arose. Is it to "fit in", understandably? It somehow felt alien to me, and this shows my ignorance on the topic, that people lacking in the empathy department would attempt to understand the reasons and act "good" towards others even if that feeling is only understood intellectually.
qumpis commented on Stable Diffusion 3   stability.ai/news/stable-... · Posted by u/reqo
feoren · 2 years ago
It kinda makes sense, doesn't it? What are the largest convolutions you've heard of -- 11 x 11 pixels? Not much more than that, surely? So how much can one part of the image influence another part 1000 pixels away? But I am not an expert in any of this, so an expert's opinion would be welcome.
qumpis · 2 years ago
Yes, it makes some sense. Many popular convnets operate on 3x3 kernels, but the number of channels increases per layer. This, coupled with the fact that the receptive field grows per layer and lets convnets essentially see the whole image relatively early in the model's depth (especially with pooling operations, which increase the receptive field rapidly), makes this intuition questionable. Transformers, on the other hand, operate on attention, which allows them to weight each patch dynamically, but it's not clear to me that this lets them attend to all parts of the image in a way that's meaningfully different from convnets.
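For a rough sense of how fast that receptive field grows, here is some back-of-the-envelope arithmetic over a hypothetical stack of 3x3 convolutions with occasional 2x2 pooling (my own sketch, not any particular architecture):

    # Standard receptive-field recursion: rf += (kernel - 1) * jump; jump *= stride.
    layers = [(3, 1), (3, 1), (2, 2),
              (3, 1), (3, 1), (2, 2),
              (3, 1), (3, 1), (2, 2),
              (3, 1), (3, 1)]

    rf, jump = 1, 1
    for i, (k, stride) in enumerate(layers, 1):
        rf += (k - 1) * jump
        jump *= stride
        print(f"after layer {i:2d}: receptive field = {rf}x{rf} pixels")

    # Eleven cheap layers already give each unit a 68x68-pixel view of the
    # input, so small kernels alone don't cap how far information can travel.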
qumpis commented on Stable Diffusion 3   stability.ai/news/stable-... · Posted by u/reqo
ttul · 2 years ago
It’s the transformer making the difference. Original Stable Diffusion uses convolutions, which are bad at capturing long-range spatial dependencies. The diffusion transformer chops the image into patches, mixes them with a positional embedding, and then just passes that through multiple transformer layers, as in an LLM. At the end, the model unpatchifies (yes, that term is in the source code) the patched tokens to generate output as a 2D image again.

The transformer layers perform self-attention between all pairs of patches, allowing the model to build a rich understanding of the relationships between areas of an image. These relationships extend into the dimensions of the conditioning prompts, which is why you can say “put a red cube over there” and it actually is able to do that.
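A shape-level sketch of that patchify, attend, unpatchify flow (hypothetical sizes, text conditioning left out, not the SD3 code):

    import torch
    import torch.nn as nn

    B, C, H, W, P, D = 2, 4, 32, 32, 2, 256          # latent image, patch size, width

    x = torch.randn(B, C, H, W)                      # latent "image"
    patches = x.unfold(2, P, P).unfold(3, P, P)      # (B, C, H/P, W/P, P, P)
    tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, (H // P) * (W // P), C * P * P)

    embed = nn.Linear(C * P * P, D)
    pos = torch.randn(1, (H // P) * (W // P), D)     # positional embedding
    h = embed(tokens) + pos                          # patch tokens

    block = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
    h = block(h)                                     # self-attention between all patch pairs

    unpatch = nn.Linear(D, C * P * P)                # "unpatchify" back to 2D
    out = unpatch(h).reshape(B, H // P, W // P, C, P, P)
    out = out.permute(0, 3, 1, 4, 2, 5).reshape(B, C, H, W)

In a real DiT-style model the conditioning would enter via extra tokens or modulation on top of this, which is presumably where the prompt-following behaviour comes from.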

I suspect that the smaller model versions will do a great job of generating imagery but may not follow the prompt as closely; that’s just a hunch, though.

qumpis · 2 years ago
Convolutions are bad at long-range spatial dependencies? What makes you say that? Any chance you have a reference?
qumpis commented on A* tricks for videogame path finding   timmastny.com/blog/a-star... · Posted by u/azhenley
markisus · 2 years ago
It would be interesting to see these classical methods compared to a simple RL method that optimizes a tiny neural net or decision tree which is given access to the map, player, and monster positions. I think it would cost a tiny bit more compute but have fewer annoying edge cases.
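As a toy sketch of that direction (entirely hypothetical, and using a tabular Q-learner instead of a neural net or decision tree to keep it tiny): a grid world where the state is just the player's position and the learner picks one of four moves; walls or monster positions could be folded into the state or the reward in the same way.

    import numpy as np

    rng = np.random.default_rng(0)
    SIZE, GOAL = 10, (9, 9)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # up, down, left, right
    Q = np.zeros((SIZE, SIZE, len(ACTIONS)))            # one value per (cell, action)

    def step(pos, a):
        x = min(max(pos[0] + ACTIONS[a][0], 0), SIZE - 1)
        y = min(max(pos[1] + ACTIONS[a][1], 0), SIZE - 1)
        new = (x, y)
        return new, (1.0 if new == GOAL else -0.01), new == GOAL

    for episode in range(500):
        pos, done = (0, 0), False
        while not done:
            a = int(rng.integers(4)) if rng.random() < 0.1 else int(Q[pos].argmax())
            new, r, done = step(pos, a)
            # One-step Q-learning update.
            Q[pos][a] += 0.1 * (r + 0.95 * Q[new].max() * (not done) - Q[pos][a])
            pos = new

    # Greedy rollout from the start: A* would compute this path directly,
    # here it falls out of the learned values.
    pos, path = (0, 0), [(0, 0)]
    while pos != GOAL and len(path) < 100:
        pos, _, _ = step(pos, int(Q[pos].argmax()))
        path.append(pos)
    print(path)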
qumpis · 2 years ago
I haven't seen RL with decision trees! It sounds really interesting. Any classic results worth looking into?
qumpis commented on Reindeer sleep and eat simultaneously   smithsonianmag.com/scienc... · Posted by u/gmays
ModernMech · 2 years ago
The logic was that I could have 36 hours uninterrupted, and then sleep for like 12, averaging 6 per 24 hours, which seemed reasonable.

It’s a recipe for burnout, but I did get a lot done. A good protocol for very tight important deadlines, but it took its toll.

qumpis · 2 years ago
Was the toll noticeable only back in those days, or does it carry into the future as well?
qumpis commented on Reindeer sleep and eat simultaneously   smithsonianmag.com/scienc... · Posted by u/gmays
ModernMech · 2 years ago
In grad school I developed a sleep every other day lifestyle. Would not recommend.
qumpis · 2 years ago
Was it more productive, objectively speaking, than keeping a consistent schedule and not dealing with the constant drain of severely lacking sleep?
qumpis commented on Augmenting long-term memory (2018)   augmentingcognition.com/l... · Posted by u/MovingTheLimit
marviel · 2 years ago
For the past few months, I've spent most of my nights and weekends working on an LLM-based learning management system called Reasonote, which aims to hybridize the SRS strategies of Anki with the smoothness of Duolingo. It focuses on a curiosity-driven approach, where users can aggregate "Skills" into their library as they participate in Activities. Activities can be anything -- flashcards, quizzes, games, chatrooms -- and can be dynamically generated, or manually created if you prefer.

I'll be releasing more information soon, but since this article's content seems aligned with my mission (build the Young Lady's Illustrated Primer, or something better), I wanted to give it a mention.

If you'd like to beta test, or collaborate, please send me a note directly, explaining what skills you most want to learn -- luke (at) lukebechtel.com

qumpis · 2 years ago
Can you give some example use cases of your application? I wonder how it scales to structurally complex information processing, e.g. digesting scientific topics.
qumpis commented on What’s behind the Freud resurgence?   chronicle.com/article/the... · Posted by u/pepys
jdietrich · 2 years ago
From an empirical perspective, the theories of psychotherapy are profoundly uninteresting. Debate has raged for decades about the dodo bird verdict - the hypothesis that all psychotherapies have equivalent outcomes - but the data has rendered that debate moot.

Psychotherapy is effective, but only marginally better than placebo. The differences in efficacy between various modes of psychotherapy are statistically insignificant and undoubtedly clinically insignificant. People receiving psychoanalysis improve at basically the same rate as people receiving CBT or IPT or ACT or a raft of other interventions that loosely resemble psychotherapy. In the most basic sense, it doesn't matter whether Freud was a genius or a fraud; entertaining his theories, even if only to criticise them, is a distraction at best.

The bottom line is that people tend to feel a bit better when they talk about their problems with someone who is attentive, supportive and non-judgemental. That's a valuable insight, but it's inherently limited and it's never going to yield the kind of treatments that we want and need.

https://onlinelibrary.wiley.com/doi/10.1002/wps.20941

qumpis · 2 years ago
So therapy is barely better than placebo, per the paper. This seems like an astounding conclusion. Has anyone read the paper in detail and formed a deeper opinion?
qumpis commented on Mamba: Linear-Time Sequence Modeling with Selective State Spaces   arxiv.org/abs/2312.00752... · Posted by u/anigbrowl
ttul · 2 years ago
This paper introduces a new class of models called selective state space models (S6 or selective SSMs). The key ideas and results are:

1. SSMs are a type of recurrent model that can scale linearly with sequence length, making them more efficient than Transformers. However, prior SSMs struggled with discrete sequence modeling tasks like language.

2. This paper augments SSMs with a "selection mechanism" that allows the model dynamics to depend on the input, giving it the ability to selectively remember or forget information. This makes SSMs effective on tasks requiring discrete reasoning.

3. They design an efficient parallel scan algorithm to implement selective SSMs on GPUs. Despite the recurrence, this achieves up to 5x higher throughput than Transformers in benchmarks.

4. They simplify prior SSM architectures into a new model called Mamba. On language modeling, Mamba matches or exceeds Transformers of 2-3x its size, while retaining linear scaling. It also achieves state-of-the-art results on audio, genomics, and synthetic tasks requiring long-term reasoning.

This work makes SSMs truly competitive with Transformers through selectivity and efficient engineering. Mamba matches or beats Transformers on major modalities while being substantially more efficient in computation, memory, and scaling to long sequences. If replicated, it's arguably the first linear-time architecture with Transformer-quality performance!!
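A rough sequential sketch of the selection mechanism from point 2 (hypothetical shapes and layer names, and without the parallel scan from point 3 that makes it fast in practice): the recurrence looks like a linear RNN, except the step size and the input/output matrices are functions of the current input, which is what lets the state selectively keep or overwrite information.

    import torch
    import torch.nn as nn

    B, L, D, N = 2, 16, 8, 4                # batch, sequence length, channels, state size

    x = torch.randn(B, L, D)
    A = -torch.rand(D, N)                   # per-channel (diagonal) state matrix
    to_dt = nn.Linear(D, D)                 # input-dependent step size
    to_B = nn.Linear(D, N)                  # input-dependent input matrix
    to_C = nn.Linear(D, N)                  # input-dependent output matrix

    h = torch.zeros(B, D, N)
    ys = []
    for t in range(L):
        xt = x[:, t]                                        # (B, D)
        dt = torch.nn.functional.softplus(to_dt(xt))        # (B, D), positive step sizes
        # Discretize with the input-dependent step: a large step mostly
        # overwrites the state, a small step mostly keeps it.
        A_bar = torch.exp(dt.unsqueeze(-1) * A)             # (B, D, N)
        B_bar = dt.unsqueeze(-1) * to_B(xt).unsqueeze(1)    # (B, D, N)
        h = A_bar * h + B_bar * xt.unsqueeze(-1)            # selective state update
        ys.append((h * to_C(xt).unsqueeze(1)).sum(-1))      # read out (B, D)
    y = torch.stack(ys, dim=1)                              # (B, L, D)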

Can't wait to see the code!

qumpis · 2 years ago
How does this differ from RNNs and their gating mechanism?

u/qumpis · karma 228 · joined August 27, 2021