While this is a cool technical project, I think the author falls for the mistaken belief that “if only I approached this in a smarter fashion, I could do everything before I die”.
If you have 600 videos queued up, perhaps you need to start pruning them rather than trying to consume more content.
There is no end state here, just a treadmill of ever-increasing videos and articles. If you can now get through 600 videos, give it a few weeks and it’ll be 6,000 videos, and you’ll be back where you started: information overload.
> Then I can take one glance at the summary and skim through the video in my first watch. If I find it interesting, I would watch it completely, else I discard it from my list.
This generally works well, although there are lots and lots of caveats. But it is (mostly) a free lunch, or at least a discounted lunch. I haven’t seen a ton of analysis on what different experts end up doing, but I believe it’s widely agreed that they tend to specialize. Those specializations (especially if you have a small number of experts) may be pretty esoteric / dense in their own right.
Anthropic’s interpretability team would be the ones to give a really high-quality look, but I don’t think any of Anthropic’s current models are MoE.
Anecdotally, I feel MoE models sometimes exhibit slightly less “deep” thinking, but I might just be biased toward more weights. And they are undeniably faster and cheaper than dense models with similar training regimes on every axis: wall-clock time, GPU time, memory, and bandwidth.
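For anyone unfamiliar with why MoE buys that speed: each token only activates a few experts instead of the whole network. A toy sketch of top-k routing (all names are mine, and the experts here are just single linear maps, not real FFN blocks):

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Toy top-k MoE layer: send each token to its k highest-scoring
    experts and mix their outputs by softmax gate weight."""
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                     # softmax over only the k chosen experts
        for w, e in zip(probs, topk[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # each "expert" is a linear map here
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
y = moe_forward(x, rng.normal(size=(d, n_experts)),
                rng.normal(size=(n_experts, d, d)))
print(y.shape)  # (3, 8)
```

The point is the flop count: with 4 experts and k=2, each token touches half the expert weights, which is where the per-token speedup over an equally sized dense layer comes from. The specialization people observe falls out of the gate learning to route similar tokens to the same experts.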