Readit News
glowcoil commented on Modular Manifolds   thinkingmachines.ai/blog/... · Posted by u/babelfish
aghilmort · 3 months ago
Our ECC construction induces an emergent modular manifold during KVQ computation.

Suppose we use 3 codeword lanes per codeword, which is our default. Each lane of tokens is based on some prime, p, so collectively the lanes form a CRT-driven codeword (Chinese Remainder Theorem). This is discretely equivalent to labeling every k tokens with a globally unique indexing grammar.
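The CRT indexing idea can be sketched as follows. The primes and group size here are illustrative stand-ins, not the actual construction; the point is just that residues across the prime lanes uniquely identify an index below the product of the primes:

```python
from math import prod

# Illustrative primes for the 3 codeword lanes (hypothetical values).
PRIMES = (3, 5, 7)

def lane_labels(index, primes=PRIMES):
    """Residue of a token-group index in each prime lane."""
    return tuple(index % p for p in primes)

def crt_index(residues, primes=PRIMES):
    """Recover the unique index < prod(primes) from its lane residues (CRT)."""
    M = prod(primes)
    total = 0
    for r, p in zip(residues, primes):
        m = M // p
        total += r * m * pow(m, -1, p)  # pow(m, -1, p) is the modular inverse
    return total % M

# Every index below 3*5*7 = 105 gets a globally unique residue triple.
assert crt_index(lane_labels(42)) == 42
assert len({lane_labels(i) for i in range(105)}) == 105
```

With these lanes, any run of up to 105 token groups is uniquely labeled by its residue triple, which is the "globally unique indexing grammar" in miniature.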

That interleaving also corresponds to a triple of adjacent orthogonal embeddings, since those tokens still retain a random Gaussian embedding. The net effect is that we slice the latent space into a spaced chain of modular manifolds every k content tokens.

We also refer to that interleaving as Stiefel frames, for reasons similar to those in the post. We began work this spring or so to inject that net construction inside the model, with early results in a similar direction to what the post describes. That's another way of saying this sort of approach lets us make that chained atlas (wc?) of modular manifolds as tight as possible within the dimensional limits of the embedding, floating-point precision, etc.

We somewhat tongue-in-cheek refer to this as the retokenization group at the prompt level, re: renormalization group / tensor nets / etc. Relayering group is the same net intuition, or perhaps reconnection group at the architecture level.

glowcoil · 3 months ago
I'm sorry, but even if I am maximally charitable and assume that everything you are saying is meaningful and makes sense, it still has essentially nothing to do with the original article. The original article is about imposing constraints on the weights of a neural network, during training, so that they lie on a particular manifold inside the overall weight space. The "modular" part is about being able to specify these constraints separately for individual layers or modules of a network and then compose them together into a meaningful constraint for the global network.

You are talking about latent space during inference, not weight space during training, and you are talking about interleaving tokens with random Gaussian tokens, not constraining values to lie on a manifold within a larger space. Whether or not the thing you are describing is meaningful or useful, it is basically unrelated to the original article, and you are not using the term "modular manifold" to refer to the same thing.

glowcoil commented on Modular Manifolds   thinkingmachines.ai/blog/... · Posted by u/babelfish
aghilmort · 3 months ago
Interesting. Modular manifolds are precisely what hypertokens use for prompt compiling.

Specifically, we linearize the emergent KVQ operations of an arbitrary prompt in any arbitrary model by way of interleaving error-correcting code (ECC).

ECC tokens are out-of-band tokens, e.g., from Unicode's Private Use Area (PUA), interleaved with raw context tokens. This construction induces an in-context associative memory.

Any sort of interleaved labeling basis, e.g., A1, quick brown fox, A2, jumped lazy dog, induces a similar effect, chaining recall & reasoning more reliably.
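That interleaving step is simple to sketch. The A1/A2 marker format comes from the example above; the group size k is an assumption for illustration:

```python
def interleave_labels(tokens, k=3, prefix="A"):
    """Insert an out-of-band label (A1, A2, ...) before every k content tokens."""
    out = []
    for i in range(0, len(tokens), k):
        out.append(f"{prefix}{i // k + 1}")  # label for this group of k tokens
        out.extend(tokens[i:i + k])
    return out

tokens = "quick brown fox jumped lazy dog".split()
assert interleave_labels(tokens) == [
    "A1", "quick", "brown", "fox",
    "A2", "jumped", "lazy", "dog",
]
```

In the actual scheme the labels would be PUA code points or rare token combos rather than plain A1/A2 strings, but the interleaving pattern is the same.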

This trick works because PUA tokens are generally untrained, hence their initial embedding is still random Gaussian w.h.p. Similar effects can be achieved by simply using token combos unlikely to exist, which are often more effective in practice, since PUA tokens like emojis or Mandarin characters are often 2, 3, or 4 tokens after tokenization vs. codeword combos like zy-qu-qwerty every k content tokens, where k can be variable.

Building attention architecture using modular manifolds in white-/gray-box models, as this new work does, vs. prompt-based black-box injection is a natural next step, so we can at least anecdotally validate what they're building ahead of the next paper or two.

Which is all to say, absolutely great to see others building in this way!

glowcoil · 3 months ago
The original article discusses techniques for constraining the weights of a neural network to a submanifold of weight space during training. Your comment discusses interleaving the tokens of an LLM prompt with Unicode PUA code points. These are two almost completely unrelated things, so it is very confusing to me that you are confidently asserting that they are the same thing. Can you please elaborate on why you think there is any connection at all between your comment and the original article?
glowcoil commented on There is no memory safety without thread safety   ralfj.de/blog/2025/07/24/... · Posted by u/tavianator
pizlonator · 5 months ago
False.

Java got this right. Fil-C gets it right, too. So, there is memory safety without thread safety. And it’s really not that hard.

Memory safety is a separate property unless your language chooses to gate it on thread safety. Go (and some other languages) have such a gate. Not all memory safe languages have such a gate.

glowcoil · 5 months ago
I would recommend reading beyond the title of a post before leaving replies like this, as your comment is thoroughly addressed in the text of the article:

> At this point you might be wondering, isn’t this a problem in many languages? Doesn’t Java also allow data races? And yes, Java does allow data races, but the Java developers spent a lot of effort to ensure that even programs with data races remain entirely well-defined. They even developed the first industrially deployed concurrency memory model for this purpose, many years before the C++11 memory model. The result of all of this work is that in a concurrent Java program, you might see unexpected outdated values for certain variables, such as a null pointer where you expected the reference to be properly initialized, but you will never be able to actually break the language and dereference an invalid dangling pointer and segfault at address 0x2a. In that sense, all Java programs are thread-safe.

And:

> Java programmers will sometimes use the terms “thread safe” and “memory safe” differently than C++ or Rust programmers would. From a Rust perspective, Java programs are memory- and thread-safe by construction. Java programmers take that so much for granted that they use the same term to refer to stronger properties, such as not having “unintended” data races or not having null pointer exceptions. However, such bugs cannot cause segfaults from invalid pointer uses, so these kinds of issues are qualitatively very different from the memory safety violation in my Go example. For the purpose of this blog post, I am using the low-level Rust and C++ meaning of these terms.

Java is in fact thread-safe in the sense of the term used in the article, unlike Go, so it is not a counterexample to the article's point at all.

glowcoil commented on Implementing a Tiny CPU Rasterizer   lisyarus.github.io/blog/p... · Posted by u/atan2
Joker_vD · a year ago
I disagree. This problem exists even when the fog is completely absent and also distorts the objects at the sides of the screen regardless of the fog's presence or absence. I guess you could use fog, rendered in a particular way, to make it less noticeable but it's still there. So the root cause is the perspective projection.

Now, I've googled a bit on my own, trying all kinds of search phrases, and apparently it is a known problem that the perspective projection, when a wide FOV (about 75 degrees and up) is used, will distort objects at the sides of the screen. One of the solutions appears to be a post-processing pass called "Panini projection" which undoes that damage at the sides of the screen. From what I understand, it uses a cylinder (not a sphere) as the projection surface instead of a plane.

glowcoil · a year ago
You originally described a problem where fog had a different falloff in world space at the edges of the screen compared to the center of the screen. The root cause of that is not the perspective projection; it's how the fog is being rendered.

The issue you are describing now is called perspective distortion (https://en.wikipedia.org/wiki/Perspective_distortion), and it is something that also happens with physical cameras when using a wide-angle lens. There is no single correct answer for dealing with this: similarly to the situation with map projections, every projection is a compromise between different types of distortion.

Anyway, if you're writing a ray tracer it's possible to use whatever projection you want, but if you're using the rasterizer in the GPU you're stuck with rectilinear projection and any alternate projection has to be approximated some other way (such as via post-processing, like you mention).

glowcoil commented on Implementing a Tiny CPU Rasterizer   lisyarus.github.io/blog/p... · Posted by u/atan2
nkrisc · a year ago
But it might more accurately reflect human visual perception. I can’t think of any case where your peripheral vision will perceive things that you won’t perceive by looking directly at them.
glowcoil · a year ago
The only physically accurate answer for where to put the far plane is "behind everything you want to be visible". It fundamentally does not make any sense to change the shape of the far plane to "more accurately reflect human visual perception" because there is no far plane involved in human visual perception, period.
glowcoil commented on Implementing a Tiny CPU Rasterizer   lisyarus.github.io/blog/p... · Posted by u/atan2
Joker_vD · a year ago
> Light can travel from any distance to reach your eye.

Yes, but it won't necessarily register, so unless we're talking about an insanely bright distant object that should be visible through fog from any distance and which also is not part of the skybox, this particular problem practically never arises. The flat far plane, on the other hand, is glaringly obvious in e.g. Minecraft, or any Unity game with first-person view and fog.

glowcoil · a year ago
You're describing a problem with a particular method of fog rendering. The correct way to address that would be to change how fog is rendered. The perspective projection and the far plane are simply not the correct place to look for a solution to this.
glowcoil commented on A simple proof that pi is irrational [pdf] (1946)   ams.org/journals/bull/194... · Posted by u/pipopi
kazinator · a year ago
The problem is that the sin and cos functions the proof relies on are embroiled with pi. It's not obvious whether that is a confounding issue.
glowcoil · a year ago
It's not. Why would it be?
glowcoil commented on A Review of the Odin Programming Language   graphitemaster.github.io/... · Posted by u/gingerBill
naniwaduni · 3 years ago
> The compiler is already doing that when it performs any of the optimizations I mentioned above. When the compiler takes a stack-allocated variable (whose address is never directly taken) and promotes it to a register, removes dead stores to it, or constant-folds it out of existence, it does so under the assumption that the program is not performing aliasing loads and stores to that location on the stack. In other words, it is leaving the behavior of a program that performs such loads and stores undefined, and in doing so it is directly enabling some of the most basic, pervasive optimizations that we expect a compiler to perform.

No, that's C-think. Yes, when you take a stack-allocated variable and do those transformations, you must assume away the possibility that there are aliasing accesses to its location on the stack. Thus, those are not safe optimizations for the compiler to perform on a stack-allocated variable.

It's not something you have to do. The model of treating each variable as stack-allocated until proven (potentially fallaciously) otherwise is distinctly C brain damage.

> If that is indeed what you want, then what you want is something closer to a macro assembler than a high-level language with an optimizing compiler like C. It's a valid thing to want, but you can't have your cake and eat it too.

This is a false dichotomy advanced to discredit compilers outside the nothing-must-be-faster-than-C paradigm, and frankly a pretty absurd claim. There are plenty of "high-level" but transparent language constructs that can be implemented without substantially assuming non-aliasing. It's totally possible to lexically isolate raw pointer accesses and optimize around them. There is a history of computing before C! Heck, there are C compilers with "optimization" sets that don't behave as pathologically awfully as mainstream modern compilers do when you turn the "optimizations" off; you have to set a pretty odd bar for "optimizing compiler" to make that look closer to a macro assembler.

It's okay if your compiler can't generate numerical code faster than Fortran. That's not supposed to be the minimum bar for an "optimizing" compiler.

glowcoil · 3 years ago
> The model of treating each variable as stack-allocated until proven (potentially fallaciously) otherwise is distinctly C brain damage.

OK, let's consider block-local variables to have indeterminate storage location unless their address is taken. It doesn't substantively change the situation. Sometimes the compiler will store that variable in a register, sometimes it won't store it anywhere at all (if it gets constant-folded away), and sometimes it will store it on the stack. In the last case, it will generate and optimize code under the assumption that no aliasing loads or stores are being performed at that location on the stack, so we're back where we started.

glowcoil commented on A Review of the Odin Programming Language   graphitemaster.github.io/... · Posted by u/gingerBill
hsn915 · 3 years ago
No, the compiler is not "already" doing that. Odin uses LLVM as a backend (for now) and it turns off some of those UB-driven optimizations (as mentioned in the OP).

Some things are defined by the language, some things are defined by the operating system, some by the hardware.

It would be silly for Odin to say "you can't access a freed pointer" because it would have to presume to know ahead of time how you utilize memory. It does not. In Odin, you are free to create an allocator where the `free` call is a no-op, or it just logs the information somewhere without actually reclaiming the 'freed' memory.

I can't speak for gingerBill but I think one of the reasons to create the language is to break free from the bullying of spec lawyers who get in the way of systems programming and suck all the joy out of it.

> it does so under the assumption that the program is not performing aliasing loads and stores to that location on the stack

If you write code that tries to get a pointer to the first variable in the stack, and guess the stack size and read everything in it, Odin does not prevent that, it also (AFAIK) does not prevent the compiler from promoting local variables to registers.

Again, go back to the twitter thread. An explicit example is mentioned:

https://twitter.com/TheGingerBill/status/1496154788194668546

If you reference a variable, the language spec guarantees that it will have an address you can take, so there's that. But if you use that address to try to get at other stack variables indirectly, then the language does not define what happens in a strict sense, but it's not 'undefined' behavior. It's a memory access to a specific address. The behavior depends on how the OS and the hardware handle that.

The compiler does not get to look at that and say "well this looks like undefined behavior, let me get rid of this line!".

glowcoil · 3 years ago
> If you write code that tries to get a pointer to the first variable in the stack, and guess the stack size and read everything in it, Odin does not prevent that, it also (AFAIK) does not prevent the compiler from promoting local variables to registers.

This is exactly what I described above. Odin does not define the behavior of a program which indirectly pokes at stack memory, and it is thus able to perform optimizations which exploit the fact that that behavior is left undefined.

> The compiler does not get to look at that and say "well this looks like undefined behavior, let me get rid of this line!".

This is a misleading caricature of the relationship between optimizations and undefined behavior. C compilers do not hunt for possible occurrences of undefined behavior so they can gleefully get rid of lines of code. They perform optimizing transformations which are guaranteed to preserve the behavior of valid programs. Some programs are considered invalid (those which execute invalid operations like out-of-bounds array accesses at runtime), and those same optimizing transformations are simply not required to preserve the behavior of such programs. Odin does not work fundamentally differently in this regard.

If you want to get rid of a particular source of undefined behavior entirely, you either have to catch and reject all programs which contain that behavior at compile time, or you have to actually define the behavior (possibly at some runtime cost) so that compiler optimizations can preserve it. The way Odin defines the results of integer overflow and bit shifts larger than the width of the operand is a good example of the latter.
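The "define the behavior, possibly at some runtime cost" option can be sketched in a few lines. The exact semantics below (wrap-around addition, oversized shifts yielding 0) are illustrative choices, not a claim about Odin's precise rules:

```python
BITS = 32
MASK = (1 << BITS) - 1

def wrapping_add(a, b):
    # Overflow wraps modulo 2**32 instead of being left undefined.
    return (a + b) & MASK

def checked_shl(x, s):
    # A shift amount >= the operand width yields 0 instead of being
    # undefined; the branch is the runtime cost of defining it.
    if s >= BITS:
        return 0
    return (x << s) & MASK

assert wrapping_add(MASK, 1) == 0   # 0xFFFFFFFF + 1 wraps to 0
assert checked_shl(1, 40) == 0      # oversized shift is defined, not UB
assert checked_shl(1, 4) == 16
```

Because every input now has a defined result, an optimizer is obliged to preserve that result, which is exactly why this closes off the "compiler assumes it can't happen" class of transformations.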

C does have a particularly broad and programmer-hostile set of UB-producing operations, and I applaud Odin both for entirely removing particular sources of UB (integer overflow, bit shifts) and for making it easier to avoid it in general (bounds-checked slices, an optional type). These are absolutely good things. However, I consider it misleading and false to claim that Odin has no UB whatsoever; you can insist on calling it something else, but that doesn't change the practical implications.
