Readit News
faraaz98 commented on The Llama 4 herd   ai.meta.com/blog/llama-4-... · Posted by u/georgehill
vessenes · 5 months ago
This was an idea that sounded somewhat silly until it was shown it worked. The idea is that you encourage through training a bunch of “experts” to diversify and “get good” at different things. These experts are say 1/10 to 1/100 of your model size if it were a dense model. So you pack them all up into one model, and you add a layer or a few layers that have the job of picking which small expert model is best for your given token input, route it to that small expert, and voila — you’ve turned a full run through the dense parameters into a quick run through a router and then a 1/10 as long run through a little model. How do you get a “picker” that’s good? Well, it’s differentiable, and all we have in ML is a hammer — so, just do gradient descent on the decider while training the experts!
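
The routing step described above can be sketched in a few lines of plain Python. This is only an illustration, not how Llama 4 or any real model implements it: it assumes top-1 routing, a purely linear router, and experts that are single small matrices. The names `TinyMoE`, `route`, and `forward` are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyMoE:
    """Hypothetical toy mixture-of-experts layer with top-1 routing."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # The "picker": one score per expert for a given token vector.
        self.router = rand_matrix(num_experts, dim)
        # Each expert is a small dim x dim linear map (an assumption for brevity).
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]

    def route(self, token):
        """Return (index of the chosen expert, routing probabilities)."""
        probs = softmax(matvec(self.router, token))
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs

    def forward(self, token):
        """Run the token through only the chosen expert."""
        best, probs = self.route(token)
        out = matvec(self.experts[best], token)
        # Scaling by the gate probability is what would let gradients reach
        # the router if this were trained with gradient descent.
        return [probs[best] * y for y in out], best


moe = TinyMoE(dim=4, num_experts=8)
output, chosen = moe.forward([1.0, 0.5, -0.3, 2.0])
print("expert used:", chosen, "output length:", len(output))
```

Only one of the eight expert matrices is multiplied per token, which is where the compute saving comes from; the router itself is a single small matrix multiply.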

This generally works well, although there are lots and lots of caveats. But it is (mostly) a free lunch, or at least a discounted lunch. I haven’t seen a ton of analysis on what different experts end up doing, but I believe it’s widely agreed that they tend to specialize. Those specializations (especially if you have a small number of experts) may be pretty esoteric / dense in their own right.

Anthropic’s interpretability team would be the ones to give a really high quality look, but I don’t think any of Anthropic’s current models are MoE.

Anecdotally, I feel MoE models sometimes exhibit slightly less “deep” thinking, but I might just be biased towards more weights. And they are undeniably faster and better per second of clock time, GPU time, memory, or bandwidth usage (on all of these) than dense models with similar training regimes.

faraaz98 · 5 months ago
I've been calling for this approach for a while. It's kinda similar to how the human brain has areas that are good at specific tasks
faraaz98 commented on There's too much content, so I built an AI knowledge assistant   faraazahmad.github.io/blo... · Posted by u/faraaz98
_aavaa_ · 5 months ago
An AI assistant to generate abstracts for research papers, and to summarize videos.

While a cool technical project, I think the author falls for the mistaken belief that “if only I approached this in a smarter fashion I could do everything before I die”.

If you have 600 videos queued up, perhaps you need to start pruning them rather than trying to consume more content.

There is no end state here, just a treadmill of ever increasing videos and articles. If you can now get through 600 videos give it a few weeks and it’ll be 6,000 videos, and you’ll be back where you started: information overload.

faraaz98 · 5 months ago
Author here. You're right. That IS my goal though, to get rid of stuff I don't want quicker.

> Then I can take one glance at the summary and skim through the video in my first watch. If I find it interesting, I would watch it completely, else I discard it from my list.

faraaz98 commented on Alphabet spins out Taara – Internet over lasers   x.company/blog/posts/taar... · Posted by u/tadeegan
bsimpson · 6 months ago
Can you imagine if someone just turned off the sky?
faraaz98 · 6 months ago
Who are you? Cixin Liu?
faraaz98 commented on Tiny JITs for a Faster FFI   railsatscale.com/2025-02-... · Posted by u/hahahacorn
jimmaswell · 7 months ago
My impression is that a Rails app is an unmaintainable dynamically-typed ball of mud that might give you the fast upfront development to get to a market or get funded but will quickly fall apart at scale, e.g. Twitter fail whale. And Ruby is too full of "magic" that quickly makes it too hard to tell what's going on or accidentally make something grossly inefficient if you don't understand the magic, which defeats the point of the convenience. Is this perception outdated, and if so what changed?
faraaz98 · 7 months ago
Twitter fail whale was more a skill issue than a Rails shortcoming. If you read the book Hatching Twitter, you'll quickly see they weren't great at code.
faraaz98 commented on Misty: A secure distributed actor language   mistysystem.com/... · Posted by u/m90
faraaz98 · 8 months ago
TL;DR for Erlang users?
faraaz98 commented on Premature Graying of Hair: Review with Updates   pmc.ncbi.nlm.nih.gov/arti... · Posted by u/luu
faraaz98 · 10 months ago
It can also be caused by autoimmune thyroid disease.
faraaz98 commented on Building a browser using Servo as a web engine   servo.org/blog/2024/09/11... · Posted by u/TangerineDream
Narishma · a year ago
If they put the URL in the title bar, where does the page title go?
faraaz98 · a year ago
On the URL bar. The URL shows when you click on it

u/faraaz98

Karma: 163 · Cake day: January 1, 2020