While this is a cool technical project, I think the author falls for the mistaken belief that “if only I approached this in a smarter fashion, I could do everything before I die”.
If you have 600 videos queued up, perhaps you need to start pruning them rather than trying to consume more content.
There is no end state here, just a treadmill of ever-increasing videos and articles. If you can now get through 600 videos, give it a few weeks and it’ll be 6,000 videos, and you’ll be back where you started: information overload.
> Then I can take one glance at the summary and skim through the video in my first watch. If I find it interesting, I would watch it completely, else I discard it from my list.
This generally works well, although there are lots and lots of caveats. But it is (mostly) a free lunch, or at least a discounted lunch. I haven’t seen a ton of analysis on what different experts end up doing, but I believe it’s widely agreed that they tend to specialize. Those specializations (especially if you have a small number of experts) may be pretty esoteric / dense in their own right.
Anthropic’s interpretability team would be the ones to give a really high-quality look, but I don’t think any of Anthropic’s current models are MoE.
Anecdotally, I feel MoE models sometimes exhibit slightly less “deep” thinking, but I might just be biased toward more weights. And they are undeniably faster and cheaper than dense models with similar training regimes on every axis: wall-clock time, GPU time, memory, and bandwidth.
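For anyone unfamiliar with why MoE buys that speed: each token only activates a few experts instead of the whole network. A toy sketch of top-k routing (all names are mine, and the experts here are just single linear maps, not real FFN blocks):

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Toy top-k MoE layer: send each token to its k highest-scoring
    experts and mix their outputs by softmax gate weight."""
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                     # softmax over only the k chosen experts
        for w, e in zip(probs, topk[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # each "expert" is a linear map here
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
y = moe_forward(x, rng.normal(size=(d, n_experts)),
                rng.normal(size=(n_experts, d, d)))
print(y.shape)  # (3, 8)
```

The point is the flop count: with 4 experts and k=2, each token touches half the expert weights, which is where the per-token speedup over an equally sized dense layer comes from. The specialization people observe falls out of the gate learning to route similar tokens to the same experts.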