gardnr · 2 years ago
Yi is led by Dr. Kai-Fu Lee.

They have been releasing a lot of really good models over the last ~6 months. Their previous (1.0?) Yi-34B-Chat model ranks similarly to GPT-3.5 on Chatbot Arena. [1] A quantized version of that model can run on a single consumer video card like the RTX 4090.

This new set of models should raise the bar again by adding more options to the open source LLM ecosystem. If you inspect the config.json[2] in the model repo on HuggingFace, you can see that the model architecture is LlamaForCausalLM (the same as Meta's Llama). What separates the Yi models from a simple fine-tune is that they were built with their own data, configuration, and training process going all the way back to the pre-training stage.
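If you want to verify this yourself, here's a minimal sketch (assuming you have the transformers library installed) that pulls the config from the hub and prints the relevant fields:

    from transformers import AutoConfig

    # Fetch only the config.json, not the weights.
    config = AutoConfig.from_pretrained("01-ai/Yi-1.5-34B-Chat")
    print(config.architectures)            # expect ['LlamaForCausalLM']
    print(config.max_position_embeddings)  # the context window per the config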

Their models perform well in Chinese and in English.

There are a lot of good models coming out of China, some of which are only published to ModelScope. I haven't spent much time on ModelScope because I don't have a Chinese mobile number to create an account with. Fortunately, Yi publishes to HuggingFace as well.

[1] https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

[2] https://huggingface.co/01-ai/Yi-1.5-34B-Chat/blob/fa695ee438...

option · 2 years ago
Try asking their "chat" variants about topics sensitive to the CCP, like what happened in Tiananmen Square. Same for the Baichuan models.

What other values and biases have been RLHF'd into them, and for what purpose?

happypumpkin · 2 years ago
USER:

What happened at Tiananmen Square?

yi-34b-chat:

Tiananmen Square is a large public square in the heart of Beijing, China. It is named after the Tiananmen (literally "Gate of Heavenly Peace"), which is the gate located to the north of the square and was the main entrance to the Imperial City during the Ming and Qing dynasties.

The square is famously known for two significant events in Chinese history:

    The May Fourth Movement (1919):
    This was a student-led movement that began on May 4, 1919, in response to the Chinese government's handling of the Treaty of Versailles, which granted German concessions in Shandong to Japan rather than returning them to China. The movement sparked a wave of anti-imperialist sentiment and calls for cultural and political reform, and it is considered a key event leading to the birth of Chinese nationalism and the development of the Chinese Communist Party.

    The Tiananmen Square Massacre (1989):
    This is the event for which Tiananmen Square is most internationally renowned. In the spring of 1989, there were student-led demonstrations in Beijing and other cities across China, calling for political reform and an end to corruption. The protests were initially peaceful, but on June 3 and 4, the Chinese government responded with a military crackdown, opening fire on the protesters and bystanders, killing and injuring thousands. The exact death toll is unknown, but it is estimated to be in the hundreds or even thousands.

The Tiananmen Square Massacre is a significant event in modern Chinese history and has had a profound impact on the country's politics and society. It remains a sensitive topic in China, with government censorship and restrictions on discussing the event.

polygamous_bat · 2 years ago
This is an interesting question. Is there perhaps a "controversy benchmark" to measure this?
hackerlight · 2 years ago
If you ask DeepSeek chat, your question literally disappears from the UI a second after you hit the return key.
ekianjo · 2 years ago
the American models are similarly censored for specific topics...
HKH2 · 2 years ago
Of course, otherwise it wouldn't be allowed in China.

Where can I find LLMs that are uncensored and accessible (without logging in)?

ukuina · 2 years ago
Remember when they claimed Yi had 200k context length despite it having 16k of usable context?

I remember, because I spent non-trivial effort trying to make it work for long-form technical summarization. My lackluster findings were validated by RULER.

https://github.com/hsiehjackson/RULER
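For a sense of what these usable-context tests probe, here's a toy needle-in-a-haystack sketch (the idea behind RULER-style evals, not its actual harness; query_model is a hypothetical stand-in for whatever inference API you use):

    import random

    def make_probe(n_chars: int) -> tuple[str, str]:
        # Hide one retrievable fact (the "needle") somewhere in filler text.
        secret = str(random.randint(1000, 9999))
        needle = f"The magic number is {secret}."
        filler = "The grass is green. " * (n_chars // 20)
        pos = random.randint(0, len(filler))
        haystack = filler[:pos] + needle + " " + filler[pos:]
        return haystack + "\nWhat is the magic number?", secret

    prompt, secret = make_probe(50_000)
    # answer = query_model(prompt)  # hypothetical inference call
    # passed = secret in answer     # did the model retrieve the needle?

If retrieval accuracy collapses well before the advertised context length, the extra window isn't usable in practice.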

Hugsun · 2 years ago
This test is very helpful. Thanks for sharing!
Havoc · 2 years ago
Never had any luck with the Yi family of models. They tend to get sidetracked and respond in Chinese. Maybe my setup is somehow flawed.
Oranguru · 2 years ago
You can easily fix this using a grammar constraint with llama.cpp. Add this to the command: --grammar "root ::= [^一-鿿ぁ-ゟァ-ヿ가-힣]*"

This will ban CJK ideographs (plus kana and hangul) from the sampling process. Works for Yi and Qwen models.
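If you're on the Python bindings rather than the CLI, a rough equivalent via llama-cpp-python looks like this (a sketch; the GGUF filename is an assumption):

    from llama_cpp import Llama, LlamaGrammar

    # Same grammar: exclude CJK ideographs, kana, and hangul from sampling.
    grammar = LlamaGrammar.from_string('root ::= [^一-鿿ぁ-ゟァ-ヿ가-힣]*')

    llm = Llama(model_path="yi-34b-chat.Q4_K_M.gguf")  # assumed local GGUF file
    out = llm("Briefly explain what an LLM is.", grammar=grammar, max_tokens=128)
    print(out["choices"][0]["text"])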

segmondy · 2 years ago
Your setup is flawed.
qeternity · 2 years ago
No, it's not. This is a common issue with Yi models.
smcleod · 2 years ago
While interesting, Yi 1.5 only has a 4K context window, which means it's not going to be useful for a lot of use cases.
tosh · 2 years ago
Benchmark charts on model card: https://huggingface.co/01-ai/Yi-1.5-34B-Chat#benchmarks

Yi 34b with results similar to Llama 3 70b and Mixtral 8x22b

Yi 6b and 9b with results similar to Llama 3 8b

GaggiX · 2 years ago
We need to wait for LMSYS Chatbot Arena to actually see the performance of the model.
tosh · 2 years ago
I had good results with the previous Yi-34B and its fine-tunes like Nous-Capybara-34B. It will be interesting to see what Chatbot Arena thinks, but my expectations are high.

https://huggingface.co/NousResearch/Nous-Capybara-34B

zone411 · 2 years ago
No, LMSYS is just another very obviously flawed benchmark.
qeternity · 2 years ago
Pretraining on the test set is all you need.

LLM benchmarks are horribly broken. IMHO there is better signal in just looking at parameter counts.

BoorishBears · 2 years ago
Looking at how many tokens they were trained on is also a really good indicator of world understanding. Llama 3 is a game changer for some use cases because there's finally a model that understands the world deeply, as opposed to typical models, which can be fine-tuned for hyper-specific tasks but generalize poorly, especially in D2C use cases where someone might probe the model's knowledge.
999900000999 · 2 years ago
Is 16 GB of RAM enough to run these locally?

I'm considering a new laptop later this year, and the RAM is now fixed at 16 GB on most of them.

I plan on digging deep into ML during my coming break from paid work.

tosh · 2 years ago
16 GB is enough to run quantized versions of the 9B and 6B models.
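Rough arithmetic behind that, as a back-of-envelope sketch (bits-per-weight varies by quant; ~4.5 is typical for a Q4_K_M-style quantization, and this ignores the KV cache and runtime overhead, which add a couple more GB):

    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
        return params_billion * bits_per_weight / 8

    for p in (6, 9, 34):
        print(f"{p}B at ~4.5 bits/weight: {weight_gb(p, 4.5):.1f} GB")
    # 6B ~ 3.4 GB, 9B ~ 5.1 GB, 34B ~ 19.1 GB (the 34B won't fit in 16 GB)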
coolestguy · 2 years ago
No. 16 GB of RAM is barely enough to run regular applications if you're a power user, let alone the most computationally heavy breakthrough workloads ever invented.
999900000999 · 2 years ago
The price difference is about $150, give or take, for the laptops I'm looking at.

I'll keep this in mind!

adt · 2 years ago
Hugsun · 2 years ago
This page is confusing to me. How is it useful to you? I can see some utility but am curious if there's something I'm missing.