Readit News
psb217 commented on DeepSeek OCR   github.com/deepseek-ai/De... · Posted by u/pierre
mapleshamrock · 2 months ago
Couldn't you do something like add a bidirectional encoder after your embedding lookup table, compressing your text into some smaller token-count semantic space before feeding it to your transformer blocks, to get a similar effect, then?
psb217 · 2 months ago
Yes, you can get good compression of a long sequence of "base" text tokens into a shorter sequence of "meta" text tokens, where each meta token represents the information from multiple base tokens. But, grouping a fixed number of base tokens into each meta token isn't ideal, since that won't align neatly with sensible semantic boundaries, like words, phrases, sentences, etc. So, the trick is how to decide which base tokens should be grouped into each meta token...

This sort of "dynamic chunking" of low-level information, perhaps down to the level of raw bytes, into shorter sequences of meta tokens for input to some big sequence processing model is an active area of research. Eg, one neat paper exploring this direction is: "Dynamic Chunking for End-to-End Hierarchical Sequence Modeling" [1], from one of the main guys behind Mamba and other major advances in state-space models.

[1] - https://arxiv.org/abs/2507.07955
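
To make the chunking idea concrete, here is a toy sketch in PyTorch (an illustration only, not the method from [1]): a learned boundary predictor scores each position, and each resulting chunk of base-token embeddings is mean-pooled into a single meta token. The model size, threshold, and pooling choice are all assumptions for demonstration.

    import torch
    import torch.nn as nn

    class DynamicChunker(nn.Module):
        """Toy dynamic chunker: pool variable-length chunks of base tokens
        into meta tokens. The hard threshold below is not differentiable;
        real systems need relaxations or straight-through tricks."""
        def __init__(self, d_model: int):
            super().__init__()
            self.boundary = nn.Linear(d_model, 1)  # per-position boundary score

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (seq_len, d_model) base-token embeddings for one sequence
            is_boundary = torch.sigmoid(self.boundary(x)).squeeze(-1) > 0.5
            is_boundary[0] = True  # the first position always opens a chunk
            starts = torch.nonzero(is_boundary).squeeze(-1).tolist()
            ends = starts[1:] + [x.shape[0]]
            # One meta token per chunk: mean of that chunk's base embeddings
            return torch.stack([x[s:e].mean(dim=0) for s, e in zip(starts, ends)])

    chunker = DynamicChunker(d_model=64)
    base = torch.randn(128, 64)   # 128 base-token embeddings
    meta = chunker(base)          # (n_chunks, 64), with n_chunks <= 128
    print(meta.shape)

Making those boundary decisions trainable end-to-end, rather than hard-thresholded as here, is exactly the sort of problem the dynamic-chunking literature tackles.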

psb217 commented on DeepSeek OCR   github.com/deepseek-ai/De... · Posted by u/pierre
krackers · 2 months ago
But naively, wouldn't you expect the representation of a piece of text in terms of vision tokens to take roughly the same number of bits as (or more than) the representation as textual tokens? You're changing representation, sure, but that by itself doesn't give you any compute advantages unless there is some sparsity/compressibility you can take advantage of in the domain you transform to, right?

So I guess my question is: where is the juice being squeezed from? Why does the vision token representation end up being more efficient than text tokens?

psb217 · 2 months ago
The trick is that the vision tokens are continuous-valued vectors, while the text tokens are elements from a small discrete set (which are converted into continuous-valued vectors by a lookup table). So, vision tokens can convey significantly more bits per token than text tokens. This allows them to pack the content of multiple text tokens into a single vision token.
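
Some rough back-of-the-envelope arithmetic makes the gap plain (the vocabulary size and vector shape below are illustrative assumptions, not DeepSeek's actual numbers):

    import math

    vocab_size = 100_000                           # typical LLM vocab (assumption)
    bits_per_text_token = math.log2(vocab_size)    # ~16.6 bits, an upper bound

    # A vision token is a continuous vector, e.g. 1024 float16 dims (assumption)
    nominal_bits_per_vision_token = 1024 * 16      # 16384 bits of raw capacity

    print(f"{bits_per_text_token:.1f} vs {nominal_bits_per_vision_token} bits")

The effective information per vision token is of course far below that raw capacity, but the headroom is large enough to pack several text tokens' worth of content into one vector.
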
psb217 commented on The maths you need to start understanding LLMs   gilesthomas.com/2025/09/m... · Posted by u/gpjt
libraryofbabel · 3 months ago
Way back when, I did a masters in physics. I learned a lot of math: vectors, a ton of linear algebra, thermodynamics (aka entropy), multi-variable and then tensor calculus.

This all turned out to be mostly irrelevant in my subsequent programming career.

Then LLMs came along and I wanted to learn how they work. Suddenly the physics training is directly useful again! Backprop is one big tensor calculus calculation, minimizing… entropy! Everything is matrix multiplications. Things are actually differentiable, unlike most of the rest of computer science.

It’s fun using this stuff again, all except the tensor calculus on curved spacetime; I haven’t had to reach for that yet.
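
For anyone who wants to see that point in miniature, here's a toy numpy sketch (dimensions and learning rate made up): one linear layer plus softmax trained to minimize cross-entropy, where both the forward pass and the backprop update are plain matrix multiplications.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 10))        # 32 examples, 10 features
    y = rng.integers(0, 3, size=32)      # 3 classes
    W = np.zeros((10, 3))                # the model: one linear layer

    for _ in range(200):
        logits = X @ W                                # forward: a matrix multiply
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)             # softmax
        loss = -np.log(p[np.arange(32), y]).mean()    # cross-entropy
        grad_logits = p.copy()
        grad_logits[np.arange(32), y] -= 1            # d(loss)/d(logits)
        W -= 0.1 * (X.T @ grad_logits / 32)           # backward: another matrix multiply

    print(f"final cross-entropy: {loss:.3f}")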

psb217 · 3 months ago
That past work will pay off even more when you start looking into diffusion and flow-based models for generating images, videos, and sometimes text.

psb217 commented on Fei-Fei Li: Spatial intelligence is the next frontier in AI [video]   youtube.com/watch?v=_PioN... · Posted by u/sandslash
coldtea · 6 months ago
>there is really only one usable dataset: the world itself, which cannot be compacted or fed into a computer at high speed.

Why wouldn't it be? If the world is ingested via video and lidar sensors, what's the hangup in recording that input and then replaying it faster?

psb217 · 6 months ago
I think there's an implicit assumption here that interaction with the world is critical for effective learning. In that case, you're bottlenecked by the speed of the world... when learning with a single agent. One neat thing about artificial computational agents, in contrast to natural biological agents, is that they can share the same brain and share lived experience, so the "speed of reality" bottleneck is much less of an issue.
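
A toy sketch of that point, with made-up numbers: several copies of an agent step through their own copies of an environment for the same wall-clock time, but all experience lands in one shared buffer and informs one shared update.

    import random

    def step(state: float) -> float:
        # Stand-in for one interaction with a real environment
        return random.gauss(state, 1.0)

    n_agents = 8
    shared_buffer = []
    for t in range(100):                  # 100 "world" time steps...
        for agent_id in range(n_agents):  # ...yield 800 shared transitions
            shared_buffer.append(step(state=0.5))

    # One shared update benefits every agent at once
    shared_value = sum(shared_buffer) / len(shared_buffer)
    print(len(shared_buffer), round(shared_value, 3))
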
psb217 commented on The Death of the Middle-Class Musician   thewalrus.ca/the-death-of... · Posted by u/pseudolus
Kinrany · 6 months ago
You can of course create wealth in such a way that inequality stays the same. Not all types of wealth are finite for practical purposes.
psb217 · 6 months ago
But, if our current system for net wealth creation empirically tends to also produce wealth concentration, it makes sense to consider ways of modifying the system to mitigate some of the wealth concentration while maintaining as much of the wealth creation as possible.
psb217 commented on Sam Altman says Meta offered OpenAI staffers $100M bonuses   bloomberg.com/news/articl... · Posted by u/EvgeniyZh
namblooc · 6 months ago
I was never involved in doing ML myself, even through my CS studies. However, from the outside it looks... not that complicated? How do they justify these salaries? Where do they see it coming back to them in terms of revenue?
psb217 · 6 months ago
Most of the people pursued in these "AI talent wars" are folks deeply involved in training or developing infrastructure for training LLMs at whatever level is currently state-of-the-art. Due to the resources required for projects that can provide this sort of experience, the pool of folks with this experience is limited to those with significant clout in orgs with money to burn on LLM projects. These people are expensive to hire, and can kind of run through a loop of jumping from company to company in an upward compensation spiral.

Ie, the skills aren't particularly complicated in principle, but the conditions needed to acquire them aren't widely available, so the pool of people with the skills is limited.

psb217 commented on Meta invests $14.3B in Scale AI to kick-start superintelligence lab   nytimes.com/2025/06/12/te... · Posted by u/RyanShook
CamperBob2 · 6 months ago
I don't know about "useful" but this answer from o3-pro was nicely-inspired, I thought: https://chatgpt.com/share/684c805d-ef08-800b-b725-970561aaf5...

I wonder if the comparison is actually original.

psb217 · 6 months ago
Comparing the process of research to tending a garden or raising children is fairly common. This is an iteration on that theme. One thing I find interesting about this analogy is that there's a strong sense of the model's autoregressiveness here in that the model commits early to the gardening analogy and then finds a way to make it work (more or less).

The sorts of useful analogies I was mostly talking about are those that appear in scientific research involving actionable technical details. Eg, diffusion models came about when folks with a background in statistical physics saw some connections between the math for variational autoencoders and the math for non-equilibrium thermodynamics. Guided by this connection, they decided to train models to generate data by learning to invert a diffusion process that gradually transforms complexly structured data into a much simpler distribution -- in this case, a basic multidimensional Gaussian.
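
A minimal sketch of that inversion idea (a DDPM-style parameterization with an illustrative noise schedule, not the original paper's exact setup): the forward process gradually noises data toward a standard Gaussian, and a denoiser would be trained to predict the added noise so the process can be run in reverse at generation time.

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)          # noise schedule (illustrative)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    def diffuse(x0: torch.Tensor, t: int):
        """Sample x_t ~ q(x_t | x_0): close to x0 for small t, ~N(0, I) as t -> T."""
        eps = torch.randn_like(x0)
        xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
        return xt, eps

    x0 = torch.randn(4, 8) * 3 + 5     # stand-in for complexly structured data
    xt, eps = diffuse(x0, t=T - 1)
    print(xt.mean().item(), xt.std().item())   # roughly 0 and 1 near t = T

    # Training would regress a denoiser's output against eps:
    #   loss = ((model(xt, t) - eps) ** 2).mean()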

I feel like these sorts of technical analogies are harder to stumble on than more common "linguistic" analogies. The latter can be useful tools for thinking, but tend to require some post-hoc interpretation and hand waving before they produce any actionable insight. The former are more direct bridges between domains that allow direct transfer of knowledge about one class of problems to another.

psb217 commented on Meta invests $14.3B in Scale AI to kick-start superintelligence lab   nytimes.com/2025/06/12/te... · Posted by u/RyanShook
zozbot234 · 6 months ago
> It's a high bar, but I think that's fair for declaring superintelligence.

I have to disagree because the distinction between "superficial similarities" and genuinely "useful" analogies is pretty clearly one of degree. Spend enough time and effort asking even a low-intelligence AI about "dumb" similarities, and it'll eventually hit a new and perhaps "useful" analogy simply as a matter of luck. This becomes even easier if you can provide the AI with a lot of "context" input, which is something that models have been improving at. But either way it's not superintelligent or superhuman, just part of the general 'wild' weirdness of AIs as a whole.

psb217 · 6 months ago
I think you misunderstood what I meant about setting a high bar. First, passing the bar is a necessary but not sufficient condition for superintelligence. Secondly, by "fair for" I meant it's fair to set a high bar, not that this particular bar is the one fair bar for measuring intelligence. It's obvious that usefulness of an analogy generator is a matter of degree. Eg, a uniform random string generator is guaranteed to produce all possible insightful analogies, but would not be considered useful or intelligent.

I think you're basically agreeing with me. Ie, current models are not superintelligent. Even though they can "think" super fast, they don't pass a minimum bar of producing novel and useful connections between domains without significant human intervention. And, our evaluation of their abilities is clouded by the way in which their intelligence differs from our own.

u/psb217

Karma: 331 · Cake day: September 2, 2008