I think the bottleneck for this is political more than physical (at least in the US). All of the large local projects around me are expensive because of massive amounts of red tape (environmental studies, zoning, planning) and political patronage systems. After the kickbacks, political donations, promises to only work 8 hours a day, only use union labor, hire x police officers for y hours a month in overtime security positions, use xyz contractor, etc., the actual labor and materials seem to be a small part of the cost. Hell, if these robots work, they will be made illegal.
I believe SchemeFlow [0] is working on solving some of these problems, particularly the insane reporting requirements. But of course, that still leaves the unions...
For argument's sake, suppose we live in a world where many high-quality models can be run on-device. Is there any concern from companies/model developers about exposing their proprietary weights to the end user? It's generally not difficult to intercept traffic (weights) sent to an app, or just reverse-engineer the app itself.
Can someone familiar with LLM performance please tell me how important this is to the overall perf? I'm interested in looking into optimizing tokenizers, and have not yet run the measurements. I would have assumed that the cost is generally dominated by matmuls, but I am encouraged by the reception of this post in the comments.
To echo the other replies, the tokenizer is definitely not the bottleneck. It just happens to be the first step in inference, so it's what I did first.
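For anyone who wants to run the measurement rather than take it on faith, here is a minimal sketch of timing each stage and reporting its share of the total. `time_stages`, `shares`, `toy_tokenize`, and `toy_forward` are all hypothetical names made up for this example, and the two toy stages are stand-ins, not a real tokenizer or model; swap in your own callables to get numbers that mean something.

```python
import time
from typing import Callable, Dict


def time_stages(stages: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Return the best-of-`repeats` wall-clock time in seconds for each named stage."""
    results: Dict[str, float] = {}
    for name, fn in stages.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def shares(timings: Dict[str, float]) -> Dict[str, float]:
    """Convert absolute timings into fractions of the summed time."""
    total = sum(timings.values())
    return {name: (t / total if total > 0 else 0.0) for name, t in timings.items()}


# Hypothetical stand-in for tokenization: whitespace splitting of a long string.
def toy_tokenize() -> list:
    return ("the quick brown fox " * 2000).split()


# Hypothetical stand-in for model compute: a naive 60x60 matrix multiply.
def toy_forward() -> list:
    n = 60
    a = [[1.0] * n for _ in range(n)]
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    timings = time_stages({"tokenize": toy_tokenize, "forward": toy_forward})
    for name, frac in shares(timings).items():
        print(f"{name}: {frac:.1%}")
```

Best-of-N is used instead of the mean so that one-off scheduler noise does not inflate a stage; the toy numbers say nothing about any real model.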
Cool. Would it be possible to eliminate that little vocab format conversion step I see in the test against tiktoken? It would be nice to have a fully compatible drop-in replacement without having to think about details. It would also be nice to have examples that work the other way around: initialize tiktoken as you normally would, including any specialized extension of the standard tokenizers, then use that initialized tokenizer to initialize a new tokendagger and test that the results are identical.
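Roughly what I have in mind for the identity test, as a sketch: a helper that takes any two encode callables and reports every input where they disagree. `find_mismatches` and the two toy encoders are hypothetical names invented here. With tiktoken, the reference side could be `tiktoken.get_encoding("cl100k_base").encode`; I am deliberately not assuming anything about how the tokendagger side is constructed.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Encoder = Callable[[str], Sequence[int]]


def find_mismatches(
    reference: Encoder, candidate: Encoder, texts: Iterable[str]
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, reference_ids, candidate_ids) for every text where the two encoders differ."""
    mismatches = []
    for text in texts:
        expected = list(reference(text))
        actual = list(candidate(text))
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches


# Toy reference encoder (hypothetical): one token id per UTF-8 byte.
def byte_encoder(text: str) -> List[int]:
    return list(text.encode("utf-8"))


# Toy candidate with a deliberate bug (hypothetical): silently drops spaces.
def buggy_encoder(text: str) -> List[int]:
    return [b for b in text.encode("utf-8") if b != 0x20]


if __name__ == "__main__":
    samples = ["hello", "hello world", "", "naïve café"]
    for text, expected, actual in find_mismatches(byte_encoder, buggy_encoder, samples):
        print(f"mismatch on {text!r}: expected {expected}, got {actual}")
```

A useful sample set would include empty strings, non-ASCII text, long runs of whitespace, and any special tokens the extended tokenizer adds, since those are where drop-in replacements tend to diverge.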
[0] https://www.schemeflow.com/