Readit News
tbalsam commented on NanoChat – The best ChatGPT that $100 can buy   github.com/karpathy/nanoc... · Posted by u/huseyinkeles
varunneal · 5 months ago
Muon was invented by Keller Jordan (and then optimized by others) for the sake of this speedrunning competition. Even though it was invented less than a year ago, it has already been widely adopted as SOTA for model training
tbalsam · 5 months ago
This is the common belief but not quite correct! The Muon update was proposed by Bernstein as the result of a theoretical paper suggesting concrete realizations of the theory, and Keller implemented it and added practical things to get it to work well (input/output AdamW, aggressive coefficients, post-Nesterov, etc).

Both share equal credit I feel (also, the paper's co-authors!), both put in a lot of hard work for it, though I tend to bring up Bernstein since he tends to be pretty quiet about it himself.

(Source: am experienced speedrunner who's been in these circles for a decent amount of time)
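For the curious, the core of the update Bernstein proposed (and Keller implemented) is: take the momentum buffer and approximately orthogonalize it with a few Newton-Schulz iterations before applying it. A rough NumPy sketch follows; the coefficients and hyperparameters are from memory of the public implementation, so treat this as an illustrative sketch, not the authoritative code:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    Coefficients are the aggressive ones from Keller Jordan's public Muon
    implementation (a tuning choice; they trade exact orthogonality for speed).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # The Frobenius norm upper-bounds the spectral norm, so this scaling
    # puts all singular values into the iteration's basin of convergence.
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, G, M, lr=0.02, momentum=0.95):
    """One Muon update for a 2D weight matrix W, gradient G, momentum buffer M."""
    M = momentum * M + G               # SGD-style momentum accumulation
    update = G + momentum * M          # Nesterov-style lookahead ("post-Nesterov")
    W = W - lr * newton_schulz_orthogonalize(update)
    return W, M
```

(Embedding/head parameters are handled by AdamW in the real setup; Muon is only applied to the 2D hidden-layer matrices.)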

tbalsam commented on Fp8 runs ~100 tflops faster when the kernel name has "cutlass" in it   github.com/triton-lang/tr... · Posted by u/mmastrac
_joel · 5 months ago
It's probably been duck typed
tbalsam · 5 months ago
shocked quack
tbalsam commented on Problem solving using Markov chains (2007) [pdf]   math.uchicago.edu/~shmuel... · Posted by u/Alifatisk
0xDEAFBEAD · 7 months ago
What's the easiest way to reliably check if a Youtube channel was sold to private equity? Is that info always a matter of public record?
tbalsam · 7 months ago
I'm not entirely sure, to be honest. In the linked video they point out that it's often not in the private equity group's financial interest to announce that a channel has been sold to them.

How that plays out in practice, I'm not sure. With some sleuthing it would probably be possible to dig up at least some of it, but beyond that I honestly don't know.

tbalsam commented on Problem solving using Markov chains (2007) [pdf]   math.uchicago.edu/~shmuel... · Posted by u/Alifatisk
stronglikedan · 7 months ago
Veritasium is quality content. Those eyes don't hurt nothing either.
tbalsam · 7 months ago
Unfortunately they sold out to private equity within the last few years (which tends to gloss over fundamentals and pump out mass content, trading on the brand's earlier quality for credence), so beware of the quality of more recent videos:

https://youtu.be/hJ-rRXWhElI?si=Zdsj9i_raNLnajzi

tbalsam commented on Measuring AI Ability to Complete Long Tasks   spectrum.ieee.org/large-l... · Posted by u/pseudolus
LorenDB · 8 months ago
Why would you benchmark the LLMs at 50% success? I'd expect 100% success, or nearly so, to make an LLM a practical replacement for a human. 50% success is far too unreliable.

Edit: notice that I said "100%, or nearly so". I realize that 100% is an unrealistic metric for an LLM, but come on, the robots should be at least as competent as the humans they replace, and ideally much more so.

tbalsam · 8 months ago
There are versions of this kind of benchmark with a higher threshold; however, raising it only seems to shift the timelines by a linear amount, so you're only buying 1-2 years or so, depending on what you want that success rate to be.
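To make the threshold idea concrete: the "time horizon" number is typically computed by fitting success rate against log task length and solving for the length where the fitted curve crosses the chosen threshold. A toy sketch of that shape (this is my own minimal fit, not METR's actual code):

```python
import numpy as np

def horizon_at_threshold(durations_min, success_rates, p=0.5):
    """Time horizon at success probability p from (task length, success rate) pairs.

    Fits P(success) = sigmoid(a + b*log2(t)) by plain gradient descent on
    cross-entropy, then solves for the task length t* where P(success) = p.
    """
    x = np.log2(np.asarray(durations_min, dtype=float))
    y = np.asarray(success_rates, dtype=float)
    a, b = 0.0, -1.0
    for _ in range(20000):
        pred = 1.0 / (1.0 + np.exp(-(a + b * x)))
        a -= 0.05 * np.mean(pred - y)        # gradient of mean cross-entropy wrt a
        b -= 0.05 * np.mean((pred - y) * x)  # gradient of mean cross-entropy wrt b
    logit_p = np.log(p / (1.0 - p))
    return 2.0 ** ((logit_p - a) / b)        # solve a + b*log2(t*) = logit(p)
```

Since b < 0, raising p just slides the crossing point left along the same fitted curve, which is one way to see why a stricter threshold shifts the numbers rather than changing the trend.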
tbalsam commented on Measuring AI Ability to Complete Long Tasks   spectrum.ieee.org/large-l... · Posted by u/pseudolus
revskill · 8 months ago
Is there any limit ?
tbalsam · 8 months ago
The only limit is yourself

Source: One of the most classic internet websites, zombo.com (sound on)

tbalsam commented on Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken   github.com/M4THYOU/TokenD... · Posted by u/matthewolfe
saretup · 8 months ago
And while we’re at it, let’s move away from Python altogether. In the long run it doesn’t make sense just because it’s the language ML engineers are familiar with.
tbalsam · 8 months ago
No! This is not good.

Iteration speed trumps all in research. Most of what Python does is launch GPU operations; if you're seeing slowdowns from Python-land, you're doing something terribly wrong.

Python is an excellent (and yes, fast!) language for orchestrating and calling ML stuff. If C++ code is needed, call it as a module.
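The pattern is simple: the hot loop lives in compiled code and Python just dispatches to it. A minimal ctypes sketch (real ML stacks use pybind11 or torch extensions for this, but the shape is identical):

```python
import ctypes
import ctypes.util

# Find and load the C math library; fall back to the current process's own
# symbols if find_library comes up empty (as it can in slim containers).
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# Declare the C signature so ctypes converts floats correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # -> 1.4142135623730951
```

The Python-side call overhead here is microseconds; if the compiled routine (or GPU kernel) does milliseconds of work per call, the orchestration language is irrelevant.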

tbalsam commented on Look Ma, No Bubbles: Designing a Low-Latency Megakernel for Llama-1B   hazyresearch.stanford.edu... · Posted by u/ljosifov
tbalsam · 9 months ago
This is (and was) the dream of Cerebras, and I am very glad to see it embraced, even if only in small part, on a GPU. It's wild to see how much performance is left on the table for these things, and crazy to think how much can be done by a few bold individuals pushing the SOTA in areas like this (not just in kernels, either; in other areas as well!)

My experience has been that getting over the daunting factor of a big, wide world full of noise and marketing, and simply committing to a problem, learning it, and slowly bootstrapping over time, tends to yield phenomenal results in the long run for most applications. And if not, there's often an adjacent or side field you can pivot to and still make immense progress.

The big players may have the advantage of scale, but there is so, so much that can be done still if you look around and keep a good feel for it. <3 :)

u/tbalsam

Karma: 1006 · Cake day: January 14, 2021
About
hi. ;P