> 17 days ago
Anywaaay...
I'm literally asking, quite honestly, if this is just an 'after the fact' update literally weeks later, that they uploaded a bunch of models, or if there is something more significant about this I'm missing.
Last time we only released the quantized GGUFs. Only llama.cpp users could use it (+ Ollama, but without vision).
Now, we released the unquantized checkpoints, so anyone can quantize themselves and use in their favorite tools, including Ollama with vision, MLX, LM Studio, etc. MLX folks also found that the model worked decently with 3 bits compared to naive 3-bit, so by releasing the unquantized checkpoints we allow further experimentation and research.
TL;DR. One was a release in a specific format/tool, we followed-up with a full release of artifacts that enable the community to do much more.
This stuff is on my mind all the time eating out or from plastic-impregnated cardboard food packaging lining, etc. I’m worried about reproductive impact on future generations and overall personal health, etc.
- Encoder based models have much faster inference (are auto-regressive) and are smaller. They are great for applications where speed and efficiency are key. - Most embedding models are BERT-based (see MTEB leaderboard). So widely used for retrieval. - They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoberta) to generate quality scores for documents. Something similar is done for FineWeb Edu