Readit News
markisus commented on iPhone Typos? It's Not Just You – The iOS Keyboard Is Broken [video]   youtube.com/watch?v=hksVv... · Posted by u/walterbell
baseballdork · 5 days ago
> I've never noticed the "censorship issue"

Really? If you swipe "kill" and then try "yourself" or "myself" does it ever get it right or provide it as one of the options? Doing it right now myself and I can't get it to do either. I have manually entered those words and hit the "myself" in the suggestion box to try and convince it that that's an acceptable correction to no avail.

> I inevitably do the "wrong thing" and fall victim to the editing again, or tap something wrong, or.. I don't know

Every. Time. I like to think that I'm not an idiot and can generally pattern recognize, but it just feels so inconsistent that I'm always doing the wrong thing.

markisus · 5 days ago
I’ve confirmed this on my iPhone as well.

Using swipe, no space bar after "kill": "Kill maps", "Jill myself", "Jill myself"

Using swipe, manually pressing space bar after "kill": "Kill mussels", "Kill mussels", "Kill mussels"

markisus commented on The universal weight subspace hypothesis   arxiv.org/abs/2512.05117... · Posted by u/lukeplato
bigbuppo · 8 days ago
Wouldn't this also mean that there's an inherent limit to that sort of model?
markisus · 7 days ago
On the contrary, I think it demonstrates an inherent limit to the kind of tasks / datasets that human beings care about.

It's known that large neural networks can even memorize random data. The number of random datasets is unfathomably large, and the weight space of neural networks trained on random data would probably not live in a low dimensional subspace.

It's only the interesting-to-human datasets, as far as I know, that drive the neural network weights to a low dimensional subspace.

markisus commented on The universal weight subspace hypothesis   arxiv.org/abs/2512.05117... · Posted by u/lukeplato
modeless · 8 days ago
This seems confusingly phrased. When they say things like "500 Vision Transformers", what they mean is 500 finetunes of the same base model, downloaded from the huggingface accounts of anonymous randos. These spaces are only "universal" to a single pretrained base model AFAICT. Is it really that surprising that finetunes would be extremely similar to each other? Especially LoRAs?

I visited one of the models they reference and huggingface says it has malware in it: https://huggingface.co/lucascruz/CheXpert-ViT-U-MultiClass

markisus · 7 days ago
Each fine tune drags the model weights away from the base model in a certain direction.

Given 500 fine tune datasets, we could expect the 500 drag directions to span a 500-dimensional space. After all, 500 random vectors in a high-dimensional space are likely to be nearly mutually orthogonal.
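A quick numerical check of that near-orthogonality claim (my own sketch, not from the thread): pairwise cosine similarities of random unit vectors in high dimensions concentrate around 0 at scale roughly 1/sqrt(d).

```python
# Sketch: 500 random directions in d = 10,000 dimensions are nearly
# orthogonal -- every pairwise cosine similarity is close to 0.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 500
V = rng.standard_normal((n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # normalize to unit vectors

G = V @ V.T                              # Gram matrix of pairwise cosines
off_diag = G[~np.eye(n, dtype=bool)]     # drop the diagonal of 1s
print(np.abs(off_diag).max())            # typically around 0.05, far from 1
```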

The paper shows, however, that the 500 drag directions live in a ~40 dimensional subspace.

Another way to say it is that you can compress fine tune weights into a vector of 40 floats.

Imagine if, one day, fine tunes on huggingface were not measured in gigabytes, megabytes, or even kilobytes. Suppose you started to see listings like 160 bytes. Would that be surprising?

I’m leaving out the detail that the basis direction vectors themselves would have to be on your machine and each basis direction is as big as the model itself. And I’m also taking for granted that the subspace dimension will not increase as the number of fine tune datasets increases.
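To make the compression idea concrete, here is a toy sketch (my own, with made-up dimensions; only the subspace dimension ~40 and the 40-floats-per-fine-tune figure echo the numbers above): if the stacked weight deltas really lie in a 40-dimensional subspace, an SVD recovers a shared basis and each fine tune reduces to 40 coefficients.

```python
# Hypothetical sketch, not the paper's method: simulate 500 fine-tune
# weight deltas that lie in a 40-dimensional subspace, recover that
# subspace by SVD, and compress one delta to 40 floats.
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 10_000, 500, 40                # weight dim, fine-tunes, subspace dim
basis = rng.standard_normal((k, d))      # shared directions (each as big as the model)
coeffs = rng.standard_normal((n, k))
deltas = coeffs @ basis                  # each row: one fine-tune's weight delta

# Recover the subspace from the deltas alone.
U, S, Vt = np.linalg.svd(deltas, full_matrices=False)
rank = int((S > 1e-8 * S[0]).sum())      # numerical rank of the stacked deltas

# Compress one fine-tune to k floats, then reconstruct it.
z = deltas[0] @ Vt[:k].T                 # 40 floats (160 bytes as float32)
recon = z @ Vt[:k]
err = np.linalg.norm(recon - deltas[0]) / np.linalg.norm(deltas[0])
print(rank, err)                         # rank 40, relative error near 0
```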

I agree that the authors’ decision to use random models from Hugging Face is unfortunate. I’m hopeful that this paper will inspire follow-up work that trains large models from scratch.

markisus commented on Spinlocks vs. Mutexes: When to Spin and When to Sleep   howtech.substack.com/p/sp... · Posted by u/birdculture
markisus · 9 days ago
Where do lock free algorithms fall in this analysis?
markisus commented on Google Titans architecture, helping AI have long-term memory   research.google/blog/tita... · Posted by u/Alifatisk
okdood64 · 9 days ago
Oh yes, I believe that's right. What's some frontier research Meta has shared in the last couple years?
markisus · 9 days ago
Their VGGT, DINOv3, and Segment Anything models are pretty impressive.
markisus commented on SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds   deepmind.google/blog/sima... · Posted by u/meetpateltech
wordpad · a month ago
Why? Physics of large discrete objects (such as a robot) isn't very complicated.

I thought it's fast accurate OCR that's holding everything back.

markisus · a month ago
The problem becomes complicated once the large discrete objects are not actuated. Even worse if the large discrete objects are not consistently observable because of occlusions or other sensor limitations. And almost impossible if the large discrete objects are actuated by other agents with potentially adversarial goals.

Self driving cars, an application in which physics is simple and arguably two dimensional, have taken more than a decade to get to a deployable solution.

markisus commented on How does gradient descent work?   centralflows.github.io/pa... · Posted by u/jxmorris12
DoctorOetker · 2 months ago
Fascinating, do the gained insights allow one to directly compute the central flow in order to speed up convergence? Or is this preliminary exploration to understand how it had been working?

They explicitly ignore momentum and exponentially weighted moving average, but that should result in the time-averaged gradient descent (along the valley, not across it). But that requires multiple evaluations, do any of the expressions for the central flow admit fast / computationally efficient central flow calculation?

markisus · 2 months ago
The authors somewhat address your questions in the accompanying paper https://arxiv.org/abs/2410.24206

> We emphasize that the central flow is a theoretical tool for understanding optimizer behavior, not a practical optimization method. In practice, maintaining an exponential moving average of the iterates (e.g., Morales-Brotons et al., 2024) is likely a computationally feasible way to estimate the optimizer’s time-averaged trajectory.

They analyze the behavior of RMSProp (Adam without momentum) using their framework to come up with simplified mathematical models that are able to predict actual training behavior in experiments. It looks like their mathematical models explain why RMSProp works, in a way that is more satisfying than the usual hand-waving explanations.
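The EMA-of-iterates idea the quoted passage suggests is simple to sketch (my own toy example, not the paper's code): the raw iterates bounce across the valley while the EMA tracks the smooth path along it.

```python
# Minimal sketch: estimate an optimizer's time-averaged trajectory with an
# exponential moving average (EMA) of the iterates themselves.
def ema_trajectory(iterates, beta=0.9):
    """Return the running EMA of a sequence of parameter values."""
    avg, out = None, []
    for x in iterates:
        avg = x if avg is None else beta * avg + (1 - beta) * x
        out.append(avg)
    return out

# Toy 1-D run: iterates oscillate with amplitude 0.5 around a drifting
# center t/100 (mimicking bouncing across a slowly descending valley).
xs = [t / 100 + (-1) ** t * 0.5 for t in range(200)]
smoothed = ema_trajectory(xs)
# The final EMA sits near the center of oscillation; the raw iterate does not.
print(abs(xs[-1] - 199 / 100), abs(smoothed[-1] - 199 / 100))
```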

markisus commented on How does gradient descent work?   centralflows.github.io/pa... · Posted by u/jxmorris12
markisus · 2 months ago
First I thought this would be just another gradient descent tutorial for beginners. But the article goes quite deep into gradient descent dynamics, looking into third order approximations of the loss function and eventually motivating a concept called "central flows." Their central flow model was able to predict loss graphs for various training runs across different neural network architectures.

markisus commented on LoRA Without Regret   thinkingmachines.ai/blog/... · Posted by u/grantpitt
markisus · 2 months ago
Can someone explain the bit counting argument in the reinforcement learning part?

I don’t get why a trajectory would provide only one bit of information.

Each step of the trajectory is at least giving information about what state transitions are possible.

An infinitely long trajectory can explore the whole state space if there are no absorbing states. Such a trajectory would provide a massive amount of information about the system, even if we ignored the final reward.
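To put rough numbers on the comparison (my own framing, not the blog post's): a scalar pass/fail reward carries at most 1 bit per episode, while the trajectory itself, viewed as a sequence of T choices among |A| actions, can carry up to T * log2(|A|) bits about the environment.

```python
# Back-of-envelope information counts for the RL bit-counting question.
import math

def reward_bits(num_outcomes=2):
    # e.g. a binary success/failure reward: log2(2) = 1 bit per episode
    return math.log2(num_outcomes)

def trajectory_bits(steps, num_actions):
    # upper bound, assuming every step is maximally informative
    return steps * math.log2(num_actions)

print(reward_bits())              # 1.0
print(trajectory_bits(1000, 16))  # 4000.0
```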

u/markisus · Karma: 868 · Cake day: July 29, 2012