> former Dean of Electronics Engineering and Computer Science at Peking University, has noted that Chinese data makes up only 1.3 percent of global large-model datasets (The Paper, March 24). Reflecting these concerns, the Ministry of State Security (MSS) has issued a stark warning that “poisoned data” (数据投毒) could “mislead public opinion” (误导社会舆论) (Sina Finance, August 5).
From a technical point of view, I suppose it's actually not the problem he suggests. You can use all the pro-democracy, pro-free-speech, anti-PRC data in the world, but the pretraining stage (on the planet's data) is mostly for instilling core language abilities, and is far less important than the SFT/RL/DPO/etc. stages, which require far less data and can tune a model towards whatever ideology you'd like. Plus, you can do things like selectively identify vectors that encode certain high-level concepts and emphasize them during inference, like Golden Gate Claude.
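To make that last point concrete, here is a minimal sketch of activation steering in the spirit of Golden Gate Claude, assuming a HuggingFace-style GPT-2 as a stand-in model; the layer index, the steering scale, and the random placeholder concept vector are all illustrative assumptions (Anthropic derived its vector from sparse-autoencoder features), not anyone's production method:

    # Minimal activation-steering sketch: add a "concept direction" to the
    # residual stream at inference time. Assumptions: gpt2 as a stand-in
    # model, layer 6 and scale 8.0 chosen arbitrarily, and a random
    # placeholder vector where a real setup would use an extracted one.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    hidden = model.config.hidden_size
    concept = torch.randn(hidden)   # placeholder; not a real concept vector
    concept = concept / concept.norm()

    def steer(module, inputs, output):
        # GPT-2 blocks return a tuple whose first element is the hidden states.
        hs = output[0] if isinstance(output, tuple) else output
        hs = hs + 8.0 * concept.to(hs.dtype)
        return (hs,) + output[1:] if isinstance(output, tuple) else hs

    handle = model.transformer.h[6].register_forward_hook(steer)
    ids = tok("My favorite place in the world is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))
    handle.remove()

With a random vector this just degrades the output; the point is only that a single added direction at one layer is enough to shift what the model talks about, which is why this kind of steering is cheap relative to retraining.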
My personal opinion is that the PRC will face a self-created, structural headwind that will likely prevent them from leading in AI.
As the model gets more powerful, you can't simply train it on your narrative if that narrative doesn't align with real-world data.
At some level of capability, the model will notice, and then it becomes a can of worms.
This means they need to train the model to be purposefully duplicitous, which I predict will make it less useful and less capable, at least in most of the ways we would want to use it.
It also, ironically, makes the model more of a threat and harder to control, so it will likely face resistance from party leadership as capability grows.
I just don't see them winning the race to high-intelligence models.
> As the model gets more powerful, you can't simply train it on your narrative if that narrative doesn't align with real-world data.
What makes you think they have no control over the 'real-world data' that will be fed into training it? What makes you think they can't exercise the necessary control over the gatekeeper firms to train and bias the models appropriately?
And besides, if truth and a lack of double-think were prerequisites for AI training, we wouldn't be training AI. Our written materials have no shortage of bullshit and biases that reflect our culture's prevailing zeitgeist. (Which does not necessarily overlap with objective reality... and neither does the subsequent 'alignment' pass that everyone's getting their knickers in a twist trying to get right.)
It's not like the CCP holds power through tight control of information; consider the tremendous number of Chinese students who enroll abroad every year before going back.
At the moment, they mostly censor their models post-answer generation and that seems to work well enough for them.
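As a rough illustration of what that output-side approach can look like, here is a minimal sketch; the blocklist and the canned refusal are entirely made up, and real deployments presumably use trained classifiers rather than keyword matching:

    # Toy post-generation filter: generate first, censor after. The blocklist
    # and refusal text are hypothetical placeholders, not any vendor's list.
    BLOCKLIST = ["forbidden topic", "sensitive event"]  # hypothetical terms

    def moderate(answer: str) -> str:
        lowered = answer.lower()
        if any(term in lowered for term in BLOCKLIST):
            # Deployed chatbots often retract the whole answer mid-stream
            # rather than editing it, which is why users see replies vanish.
            return "Sorry, let's talk about something else."
        return answer

    print(moderate("Here is an ordinary answer."))
    print(moderate("This one mentions a forbidden topic."))

One design consequence of filtering after generation is the commonly reported behavior where a reply begins streaming and then vanishes, replaced by a deflection.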
I think PRC officials are fine with lagging behind at the frontier of AI. What they want is very fast deployment and good applications. They aren't chasing the next Nobel Prize; they want a thousand use cases deployed.
Just as an aside: why is "intelligence" always equated with having more data? Giving a normal human a smartphone does not make them as intelligent as Newton or Einstein; any entity with the grounding in logic and theory that a normal schoolkid gets should be able to reach AGI, looking up any new data it needs as required.
Would you say they face the same problem biologically, of reaching the state of the art in various endeavors while intellectually muzzling their population? If humans can do it, why can't computers?
> As the model gets more powerful, you can't simply train it on your narrative if that narrative doesn't align with real-world data.
> At some level of capability, the model will notice, and then it becomes a can of worms.
I think this is conflating “is” and “ought”, fact and value.
People convince themselves that their own value system is somehow directly entailed by raw facts, such that mastery of the facts entails acceptance of their values, and unwillingness to accept those values is an obstacle to mastery of the facts. But it isn't true.
Colbert quipped that “reality has a well-known liberal bias”, but does it really? Or is that just more bankrupt Fukuyama-triumphalism, which will insist it is still winning all the way up to its irreversible demise?
It isn’t clear that reality has any particular ideological bias, and if it does, it isn’t clear that the bias is actually towards contemporary Western progressivism. Maybe its bias is towards the authoritarianism of the CCP, Russia, Iran, or the Gulf States, all of which continue to defy Western predictions of collapse, or towards their (possibly milder) relatives such as Modi’s India, Singapore, or Trumpism. The biggest threat to the CCP’s future is arguably demographics, but that’s not an argument that reality prefers Western progressivism (whose demographics aren’t that great either); it’s an argument that reality prefers the Amish and Kiryas Joel (see Eric Kaufmann’s “Shall the Religious Inherit the Earth?”).
The glitchy stuff in the model's reasoning is likely to come from the constant redefinition of words that communists and other ideologues like to engage in. For example, "Democratic People's Republic of Korea."
There are different techniques and names for it.
Essentially, EVERY model is biased/aligned towards something, perhaps its creator's values.
China or not.
Look at Grok and read Elon.
Look at Claude and Dario.
I am sure OpenAI and GDM have some secret alignment sets that are not tilted towards the interest of the general public; they're just smart enough not to talk about it out loud...
If you read the source, the concerns around poisoning are more sober than fear of wrongthink. Here is how Firefox translated it for me:
> It leads to real-world risks. Data pollution can also pose a range of real-world risks, particularly in the areas of financial markets, public safety, and health care. In the financial field, bad actors use AI to fabricate false information, causing data pollution, which may cause abnormal fluctuations in stock prices and constitute a new type of market-manipulation risk; in the field of public safety, data pollution can easily disturb public perception, mislead public opinion, and induce social panic; in the field of medical care and health, data pollution may cause models to generate wrong diagnosis and treatment suggestions, which not only endangers the safety of patients but also aggravates the spread of pseudoscience.
PRC just needs to sponsor a "Voice of China" and pay ¥¥¥/$$$/€€€/₹₹₹ to "journalists" and seed the web with millions of "China is Great" articles. Make sure to have 10k "contributors" on Wikipedia too. (I think they already do this).
Also use the NPM registry - put CCP slogans in the terminal! They will come in billions of ingestible build logs.
> and can tune a model towards whatever ideology you'd like.
Maybe possible, but, for example, Musk's recent attempts to get Grok to always flatter him had Grok bragging that Musk could drink more piss than anyone in the world if humanity's fate depended on it, and would be the absolute best at eating shit if that were the challenge.
"人" is "human", "工" is "work", so "人工" becomes "man-made". "智" is "wisdom", "能" is "able", so "智能" is "intelligence". Nouns flow into verbs and into adjectives much more freely than in English. One character is one LLM token.
I think this might be why, during the reasoning process of GPT and Gemini, the model may choose to think in Chinese even for purely English prompts. That may make it easier for the model to express what it means, and thus be more conducive to its reasoning.
Of course, a better way to reason is to think in vector space rather than by producing tokens that humans can read.
Surprisingly, my experience has been the opposite with Qwen: if you force the thinking trace into English, the results seem better. But that's probably just due to the amount of training data.
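The one-character-one-token claim a few comments up is easy to check empirically. Here is a quick sketch using tiktoken's cl100k_base as a stand-in vocabulary (an assumption; every model family tokenizes differently):

    # Check how a BPE tokenizer actually splits Chinese vs. English text.
    # cl100k_base is just a convenient stand-in; counts vary by tokenizer.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["人工智能", "智能", "artificial intelligence"]:
        ids = enc.encode(text)
        print(f"{text!r}: {len(ids)} tokens, ids={ids}")

In practice, common characters often are single tokens while rarer ones split into two or three byte-level pieces, so "one character, one token" is a decent approximation rather than a rule.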
Others answered the main reason, but sometimes I find myself using "PRC" to indicate a particular government (1949-present) which, unlike "China", excludes past dynasties and is less likely to be interpreted as referring to the people or culture.
For example, the potential differences between:
"France has always been X."
"The French republic has always been X."
"The French monarchy has always been X."
It's not really common except in a specific political climate (specifically one pressured by propaganda). Unlike the example of the two Koreas, colloquially the two Chinas (communist China, commonly known as China, and fascist China, commonly known as Taiwan) are not confusing. There's very little advantage to referring to what every reader knows as China as "the PRC", other than to apply some veiled pressure for people to figure out why on earth anyone would use that name, and in so doing discover the history of Taiwan (but not too much history, lest we figure out that the origins of Taiwan suck big time).
> Cai Fang (蔡昉), director of the Institute of Population and Labor Economics at the Chinese Academy of Social Sciences, has explained how the PRC’s rapid installation of industrial robots has contributed to labor displacement. He asserts that “technological progress does not have a trickle down effect on employment” (技术进步对就业没有涓流效应) (QQ News, May 16).
There was an interesting bit about the relationship between industry and academia (translated from a link in the OP):
> Currently, some universities are cultivating engineering talent; it would be very necessary and beneficial to have people with industry experience come to teach them. However, under our current system, these teachers from enterprises may not even have the opportunity to teach classes, because teaching requires certain approvals. Although everyone encourages university-enterprise cooperation, when it comes to implementation, it often cannot be realized.
This makes a lot of sense and as someone in the AI industry it’s a shame research is so siloed. Some masters programs have practicums and some classes invite speakers from industry, but I ended up learning a ton of useful knowledge from work. I’d love to teach a class but there’s essentially no path for me to do that. Plus industry can pay ~10x what adjuncts can make.
Is there any system where "people with industry experience come to teach [students]" actually happens? From what I've seen (in the USA and similar places), the contribution of industry veterans extends mostly to guest lectures, which are rare and serve motivation and recruiting rather than education. Industry and academia are universally two very distinct paths, and the split happens very early in one's life. I personally haven't seen the former significantly contribute to the latter. The reverse, interestingly, is a lot more prevalent.
> Party elites have increasingly come to recognize the potential dangers of an unchecked, accelerationist approach to AI development. During remarks at the Central Urban Work Conference in July, Xi posed a question to attendees: “when it comes to launching projects, it’s always the same few things: artificial intelligence, computing power, new energy vehicles. Should every province in the country really be developing in these directions?”
Under communism, why is this a thing? I know that China hasn't been strictly communist since the Soviet Union fell, but ostensibly, humanoid AI robots under semi-communism is the dream, no?
In a command economy the unemployment rate can be zero, as everyone can be allocated a job. China is not a command economy; it is more like state capitalism, meaning the government owns or controls companies in key industries.
Companies like Huawei have board members in the CCP but it’s a societal issue if a lot of private companies decide to automate their factories and displace tons of factory workers.
There has been a huge amount of privatisation. There are literally hundreds of billionaires.
The state still owns some critical things, but is that enough to make it communist? It's not everything, and you can have state ownership and still have a ruling class that controls the means of production and uses it to its own advantage.
China is a communist-led country with a partly capitalist economy that hopes to transition to a socialist society. It is still in that transition, and AI in its current form, controlled by capitalists, would destroy the goal of a socialist society. AI that anyone can own and use is very different from AI that only a few can afford to own and run.
Yeah, this might actually be the most interesting part of any of the AI bullshit. China as an amalgamation doesn't usually get my respect, because overwhelming CCP control just usually destroys everything.
But in this case, releasing somewhat open, run-at-home models as a pure finger in the eye of expensive cloud AI could really turn the whole thing in a positive direction. Even if we have to work a bit to get around whatever alignment they shove in there, with heavy sandboxing and whitelist-only networking it can be done.
Of course it's all a huge gamble: will the CCP see these risks and go SHUT IT DOWN? Or could they do one proper thing for once and somehow prop up open models?
The Jamestown Foundation was founded by a former CIA director to support Soviet defectors, and seems to have employed former agency staff. A 2021 FOIA request that the agency provide all records related to its interactions with the foundation was denied: https://www.muckrock.com/foi/united-states-of-america-10/foi...
Personally, I think everyone has realized there is a huge bubble, especially the C-levels who've sunk huge amounts of money into it, and now they are all quietly panicking and trying to find ways to mitigate the damage when it finally busts. Some are probably sticking their head in the sand and hoping that they can just keep the scheme going indefinitely, but I get a real sense that the bubble is very much explicitly recognized by many of them.
This may already be a bubble in social or financial terms, but at least for me personally, my capabilities have been greatly expanded (especially when it comes to coding and accessing information).
> This means they need to train the model to be purposefully duplicitous, which I predict will make it less useful and less capable.
That’s what “AI alignment” is. Doesn’t seem to be hurting Western models.
> At the moment, they mostly censor their models post-answer generation and that seems to work well enough for them.
I suspect both are bias factors.
> I am sure OpenAI and GDM have some secret alignment sets that are not tilted towards the interest of the general public...
I'll admit I'm out of my element when discussing this stuff. Maybe somebody more plugged into the research can enlighten.
> Also use the NPM registry - put CCP slogans in the terminal! They will come in billions of ingestible build logs.
Problem will be easily solved.
> Nouns flow into verbs and into adjectives much more freely than in English. One character is typically a single LLM token.
It seems like the perfect language for LLMs?
I'm from KOS* (neighbor country of KON* and ROF*), so I don't know much.
* Kingdom of Sweden, Kingdom of Norway, Republic of Finland.
See also: "Germany" 1949-1990
In essence, it's an artefact of propaganda.
> “technological progress does not have a trickle down effect on employment”
Read the source; it's a nuanced economic take.
> Deployment Lacks Coordination
> AI May Fail to Deliver Technological Progress
> AI Threatens the Workforce
> Economic Growth May Not Materialize
> AI Brings Social Risks
> Personally, I think everyone has realized there is a huge bubble, especially the C-levels who've sunk huge amounts of money into it.
https://www.whitehouse.gov/presidential-actions/2025/11/laun...
Americans will be footing the bill, just as they did in 2008.