We completely rewrote the inference engine and applied a few tricks. This is a summarization task with Llama 3.2 1B in float16, and most of the time we're much faster than MLX. Let me know in the comments if you want to test the inference and I'll post a link.
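For anyone who wants a reference point, here is a minimal sketch of the MLX baseline side of such a comparison, using the mlx-lm Python package. The `load`/`generate` calls are mlx-lm's public API; the model repo, prompt, and generation parameters are illustrative assumptions, not the poster's actual benchmark setup.

```python
import time

# Baseline sketch using the mlx-lm package (pip install mlx-lm).
# The model repo below is an assumption; any Llama 3.2 1B checkpoint
# converted for MLX in float16 should work the same way.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct")

# Toy summarization prompt, standing in for the real workload.
prompt = ("Summarize in one sentence: MLX is an array framework "
          "for machine learning research on Apple silicon.")

start = time.perf_counter()
output = generate(model, tokenizer, prompt=prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# Rough throughput estimate from the decoded output.
tokens = len(tokenizer.encode(output))
print(output)
print(f"{tokens} tokens in {elapsed:.2f}s ({tokens / elapsed:.1f} tok/s)")
```

Timing the whole `generate` call like this folds prompt processing and decoding together, so it only gives a coarse end-to-end number; a fairer comparison would separate prefill from per-token decode speed.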
I would love to understand how universal these models can become.