Appreciate all the takes so far; the team is reading this thread for feedback. Feel free to pile on with bugs or feature requests.
In certain areas, perhaps, but Google Workspace at $14/month gives you not only Gemini Pro but also 2 TB of storage, full privacy, email with a custom domain, and more. College students get the AI Pro plan for free. I recently looked over all the options for folks like me and my family, and Google is clearly the right choice; it's not particularly close.
"zero-shot accuracy retention at 4- and 3-bit compression to be on par with or better than state-of-the-art methods, while maintaining performance comparable to FP16 baselines."
My reading of that: FP16-level accuracy at Q3/Q4 size and memory bandwidth, which is a huge advantage.
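To make the size/bandwidth side of that concrete, here's a back-of-envelope sketch (my own arithmetic, not from the paper; it assumes weights dominate memory and ignores activations, KV cache, and quantization metadata like scales):

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed just for the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Rough weight-only footprints for the two model sizes discussed below.
for bits in (16, 4, 3):
    print(f"LLaMA 3 8B  at {bits:>2}-bit: {weight_footprint_gb(8e9, bits):6.1f} GB")
    print(f"LLaMA 3 70B at {bits:>2}-bit: {weight_footprint_gb(70e9, bits):6.1f} GB")
# 8B:  16 GB -> 4 GB -> 3 GB; 70B: 140 GB -> 35 GB -> 26.25 GB
```

Since decode speed is largely bound by how many bytes you stream per token, the ~4x shrink at 4-bit is roughly a ~4x bandwidth win too.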
* LLaMA 3 8B: baseline 72.26, 4-bit 71.31, 3-bit 62.79
* LLaMA 3 70B: baseline 79.51, 4-bit 78.06, 3-bit 74.68
These results seem comparable to modern quantization methods—for example, the ~4-bit results for smaller LLaMA models listed here: https://ai.meta.com/blog/meta-llama-quantized-lightweight-mo...
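For intuition about what "4-bit" and "3-bit" mean here, this is a generic per-row round-to-nearest quantizer, not the paper's actual method (real schemes use smarter scale search, grouping, etc.), just a minimal sketch of the basic idea:

```python
import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-row round-to-nearest quantization, then dequantize.
    Illustrative only; state-of-the-art methods are considerably smarter."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for signed 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
for bits in (4, 3):
    err = np.abs(quantize_dequantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs reconstruction error: {err:.4f}")
```

The error grows as bits shrink, which is the same pattern as the accuracy numbers above: small loss at 4-bit, a bigger drop at 3-bit.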
Congrats to Apple and Meta; it makes sense that they did the research, since this will go toward efficient serving of LLMs on phones. And it's very easy to implement.