New things are being tested and yielding results every month in modelling. We've deviated quite a bit from the original multi-head attention.
Depends what you're doing. In my case I'm saving microseconds on the step time of an LLM used by hundreds of millions of people.
5. 1 != 6
I have doubts about #2. Weren't Big Tech companies paying senior engineers $300K+ - in 2025-adjusted dollars - back in 2013?
If you know any history, #4 is how many new areas of technology go. A couple of ordinary guys built the first working airplane in their bicycle shop. Intel was founded with less than $1M, and fabbed its own chips. Compare that with what the ante would be today to get into either of those industries.
Yes, but big tech got bigger. Google had a quarter of its current workforce back then, for instance, Meta a tenth, etc. It has become much easier to get into those companies.