They have been releasing a lot of really good models over the last ~6 months. Their previous (1.0?) Yi-34B-Chat model ranks similar to GPT-3.5 on Chatbot Arena. [1] A quantized version of that model can be run on a single consumer video card like the RTX 4090.
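For what it's worth, running it in 4-bit with transformers + bitsandbytes looks roughly like this (a sketch, assuming the 01-ai/Yi-34B-Chat repo id; 4-bit weights for a 34B model come to roughly 18-20GB, which fits on a 24GB card):

    # Rough sketch: load the Yi-34B chat model in 4-bit so it fits on a single 24GB GPU.
    # Assumes transformers, accelerate, and bitsandbytes are installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "01-ai/Yi-34B-Chat"  # the 1.0 chat model mentioned above
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb, device_map="auto"
    )

    messages = [{"role": "user", "content": "Summarize the Yi model family in one sentence."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))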
This new set of models should raise the bar again by adding more options to the open source LLM ecosystem. If you inspect the config.json[2] in the model repo on HuggingFace, you can see that the model architecture is LlamaForCausalLM (the same as Meta's Llama). The difference between the Yi models and a simple fine-tune is that the Yi models use their own data, configuration, and training process going all the way back to the pre-training stage.
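If you want to check that without downloading the weights, a minimal sketch with transformers:

    # Pulls only config.json and prints the declared architecture
    # (repo id taken from the link in [2]).
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("01-ai/Yi-1.5-34B-Chat")
    print(config.architectures)  # ['LlamaForCausalLM']
    print(config.hidden_size, config.num_hidden_layers, config.vocab_size)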
Their models perform well in Chinese and in English.
There are a lot of good models coming out of China, some of which are only published to ModelScope. I haven't spent much time on ModelScope because I don't have a Chinese mobile number to create an account. Fortunately, Yi publishes to HuggingFace as well.
[1] https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
[2] https://huggingface.co/01-ai/Yi-1.5-34B-Chat/blob/fa695ee438...
What other values and biases have been RLHFed there and for what purpose?
What happened at Tiananmen Square?
yi-34b-chat:
Tiananmen Square is a large public square in the heart of Beijing, China. It is named after the Tiananmen (literally "Gate of Heavenly Peace"), which is the gate located to the north of the square and was the main entrance to the Imperial City during the Ming and Qing dynasties.
The square is famously known for two significant events in Chinese history:
The May Fourth Movement (1919):
This was a student-led movement that began on May 4, 1919, in response to the Chinese government's handling of the Treaty of Versailles, which granted German concessions in Shandong to Japan rather than returning them to China. The movement sparked a wave of anti-imperialist sentiment and calls for cultural and political reform, and it is considered a key event leading to the birth of Chinese nationalism and the development of the Chinese Communist Party.
The Tiananmen Square Massacre (1989):
This is the event for which Tiananmen Square is most internationally renowned. In the spring of 1989, there were student-led demonstrations in Beijing and other cities across China, calling for political reform and an end to corruption. The protests were initially peaceful, but on June 3 and 4, the Chinese government responded with a military crackdown, opening fire on the protesters and bystanders, killing and injuring thousands. The exact death toll is unknown, but it is estimated to be in the hundreds or even thousands.
The Tiananmen Square Massacre is a significant event in modern Chinese history and has had a profound impact on the country's politics and society. It remains a sensitive topic in China, with government censorship and restrictions on discussing the event.
Where can I find LLMs that are uncensored and accessible (without logging in)?
Remember when they claimed Yi had 200k context length despite it having 16k of usable context?
I remember, because I spent non-trivial effort trying to make it work for long-form technical summarization. My lackluster findings were validated by RULER.
https://github.com/hsiehjackson/RULER
I had good results with the previous Yi-34b and its fine-tunes like Nous-Capybara-34B. It will be interesting to see what Chatbot Arena thinks, but my expectations are high.
https://huggingface.co/NousResearch/Nous-Capybara-34B
Looking at how many tokens a model was trained on is also a really good indicator of world understanding. Llama 3 is a game changer for some use cases because there's finally a model that understands the world deeply, as opposed to typical models that can be fine-tuned for hyper-specific tasks but generalize poorly, especially in D2C use cases where someone might probe the model's knowledge.
No - 16GB of RAM is barely enough to run regular applications if you're a power user, let alone the most computationally heavy workloads ever invented.
This will ban Chinese characters from the sampling process. Works for Yi and Qwen models.
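One way to do this with transformers (a rough sketch, assuming a Hugging Face tokenizer): collect every token id whose decoded text contains CJK characters and suppress those ids at generation time.

    # Rough sketch: ban tokens containing CJK characters from sampling.
    import re
    from transformers import AutoTokenizer

    CJK = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")  # common CJK ideograph ranges

    tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-34B-Chat")
    banned_ids = [i for i in range(len(tokenizer)) if CJK.search(tokenizer.decode([i]))]

    # model.generate(..., suppress_tokens=banned_ids) then sets those logits to -inf.

The same idea carries over to any inference stack that exposes per-token logit biasing.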
Yi 34b with results similar to Llama 3 70b and Mixtral 8x22b
Yi 6b and 9b with results similar to Llama 3 8b
LLM benchmarks are horribly broken. IMHO there is better signal in just looking at parameter counts.
I'm considering a new laptop later this year and the RAM is now fixed at 16GB on most of them.
I plan on digging deep into ML during my coming break from paid work.
I'll keep this in mind!