Surprised nobody has pointed this out yet — this is not a GPT 4.5 level model.
The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.
The problem is that the benchmarks they included in the average are cherry-picked.
They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.
This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.
Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
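A toy sketch of the averaging concern described above. The scores are invented for illustration and are NOT the values from the ERNIE chart; `model_a`, `model_b` and the benchmark names are hypothetical stand-ins. It only shows how a few large wins on a language-specific subset can flip a plain average even when a model trails on the hard benchmarks.

```python
from statistics import mean

# Hypothetical scores (NOT taken from the ERNIE-4.5 chart).
# Each entry: benchmark name -> (model_a, model_b, is_language_specific)
SCORES = {
    "hard_1": (50.0, 60.0, False),
    "hard_2": (55.0, 65.0, False),
    "saturated_1": (90.0, 90.0, False),
    "saturated_2": (88.0, 88.0, False),
    "lang_specific_1": (90.0, 70.0, True),
    "lang_specific_2": (85.0, 65.0, True),
}


def averages(scores, include_language_specific=True):
    """Return (mean_a, mean_b) over the selected benchmarks."""
    rows = [
        (a, b)
        for a, b, lang_specific in scores.values()
        if include_language_specific or not lang_specific
    ]
    return mean(a for a, _ in rows), mean(b for _, b in rows)


all_a, all_b = averages(SCORES)
shared_a, shared_b = averages(SCORES, include_language_specific=False)

print(f"all benchmarks:    A={all_a:.2f}  B={all_b:.2f}")
print(f"shared benchmarks: A={shared_a:.2f}  B={shared_b:.2f}")
```

With these made-up numbers, A leads on the full average but trails once the language-specific subset is excluded, which is why per-benchmark or per-category results say more than a single mean.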
To try to avoid the inevitable long arguments about which benchmarks or sets of them are universally better: there is no such thing anymore. And even within benchmarks, we're increasingly squinting to see the difference.
Do the benchmarks reflect real-world usability? My feeling is that the benchmark result numbers stop working above 75%.
In a real problem you may need to get 100 things right in a chain, which means a 99% chance of getting each single one correct results in only a 37% chance of getting the correct end result (0.99^100 ≈ 0.37). But creating a diverse test that can correctly identify 99% correct results in complex domains sounds very hard, since the answers are often nuanced in details where correctness is hard to define and determine. From working in complex domains as a human, it often is not very clear if something is right or wrong or in a somewhat undefined and underexplored grey area. Yet we have to operate in those areas and then over many iterations converge on a result that works.
Not sure how such complex domains should be benchmarked and how we objectively would compare the results.
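The compounding arithmetic in the comment above can be checked in a few lines. This is a minimal sketch that assumes every step in the chain is independent and has the same success rate, which real tasks will not satisfy exactly; the function names are made up for illustration.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed."""
    return p_step ** n_steps


def required_step_accuracy(p_target: float, n_steps: int) -> float:
    """Per-step accuracy needed to reach p_target over n independent steps."""
    return p_target ** (1.0 / n_steps)


end_to_end = chain_success(0.99, 100)
needed = required_step_accuracy(0.90, 100)

print(f"99% per step over 100 steps: {end_to_end:.3f}")  # 0.366
print(f"per-step accuracy for 90% end-to-end: {needed:.5f}")
```

Under these assumptions, reaching 90% end-to-end over 100 steps needs roughly 99.9% per step, which is the regime where a benchmark would have to reliably tell 99% from 99.9%.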
GPT-4.5's advantages are supposed to be in aspects that aren't being captured well in current benchmarks, so the claim would be shaky even if ERNIE's benchmarks actually showed better performance.
It doesn't really matter what nationality or ethnicity you are, but if you communicate with the model in Chinese you might get better results from this model.
Then again, if they've misrepresented the strength of the model overall, there might be some other shenanigans with their results. The fact that their results show their model is worse than GPT-4.5 on 2 Chinese language benchmarks, while it's so much stronger on some of the others, is a bit weird.
I guess this is the end of OpenAI? No more dreaming of Universal Basic Compute for AI, Multi Trillion for Fabs and Semi?
This is just like everything in China. They will find ways to drive down cost to below anyone previously imagined, subsidised or not. And even just competing among themselves with DeepSeek vs ERNIE and Open sourcing them meant there is very little to no space for most.
Both the DRAM and NAND industries for Samsung / Micron may soon be gone. I thought this was going to happen sooner, but it finally seems to be happening. GPU and CPU designs are already in the pipeline with RISC-V, IMG and ARM-China. OLED is catching up, LCD is already taken over. Batteries we know. The only thing left is foundries.
Huawei may release its own Open Source PC OS soon. We are slowly but surely witnessing the collapse of Western Tech scene.
> We are slowly but surely witnessing the collapse of Western Tech scene
Generally, I’ve found that almost no founders or friends I speak with have any vision for the future anymore. They care only about making money and do not care how. It’s a spectacular collapse of vision and purpose—these people have always existed but it feels incredibly pervasive now.
With that, I realize your comment is much broader than AI so below is too domain specific but…
VC has been investing in AI as-if it were a winner takes all market, but it has been obvious that isn’t the case.
Not only that, but the massive amount of cash thrown to anyone with even marginal credentials has undermined the constraints that often lead to innovation.
There is 0 reason that Safe Superintelligence should be raising for the second time at a 30 B valuation with no product.
> VC has been investing in AI as-if it were a winner takes all market, but it has been obvious that isn’t the case.
I really don't see how this was supposed to go, and I've never heard an explanation.
I don't see any kind of coherent vision from any of these types.
Most normal folks (i.e not SV/HN types that seem to desire to replace their marketable programming skills with LLM output) really don't "want" LLMs in any real sense.
Sure, people use them like a search engine, kids cheat on homework with them etc, but there's not this overwhelming universal desire for them like there was for, say iPhones.
I never once have heard any sort of proposed roadmap for how LLMs were supposed to work as a product.
They were just going to get, uh "really good" and take everyone's office job or something?*
Normal F500 organizations that are obviously a target for LLM use (via hyperscaler sales) have yet to see a clear path to "revolutionizing" their workforce or whatever via LLMs. It's just not there. Costs are too high, there's no obvious use case, "hallucinations" are a real impediment, etc.
I'll add that many of the public use cases for this (i.e. those a hyperscaler would blog about as a sales promo) are seriously weak ("we reduced onboarding time by 20% with $MODEL").
I would really like to hear a proposal for how this is all supposed to come together. Does anyone have a concrete plan for the future for all this stuff?
*I'll note, this is NOT the way to sell a product to the masses, either.
Addendum: I'm not an "LLM hater" by any means. I pay for GH Copilot, and have been running local LLMs since it's been a thing (granted, with limited hardware and limited quality). I intend to wait a bit and buy better hardware, with one motivating factor being running local LLMs in a year or so when the open-source offerings stabilize.
This is one of the finest and most accurate things that I have read in a long time.
This really could be a blog post, which I encourage you to make! (I would prefer GitHub Pages, but if you really want, I have a domain name on Cloudflare and I am more than willing to host the static page of such a blog on my own domain name for absolutely free. Let's go, Cloudflare!)
It's just facts. Pure facts.
""
Generally, I’ve found that almost no founders or friends I speak with have any vision for the future anymore. They care only about making money and do not care how. It’s a spectacular collapse of vision and purpose—these people have always existed but it feels incredibly pervasive now.
""
Why did I read it in a monotonous way as if a student from the future understands the current scenario.
I felt as if it was the same level of sadness in my heart as that when you listen to some video which has raining background and he reads the dark comedy (something like burialgoods oats shitposting but this time more serious and real!)
Currently saving this on the Wayback Machine just for this comment. The internet needs to preserve this comment, no matter what.
There used to be significant alignment between engineers, founders and certain VCs: lots of excitement around building software that genuinely made things better/easier/cheaper. Each group naturally wanted a different thing out of this arrangement but each camp was on the same page.
Now I feel like everything is more top-down. The tech sector feels less like market capitalism and more like something being centrally planned: we all must chase trends that come from various industry thought-leaders. And it all must be done a very specific way (this in particular is why Chinese companies are likely going to disrupt the AI market: they’re free from this burden)
VCs are happy to throw money at something if they believe they can corner a market. It just doesn't have to do much with reality. Until of course, it becomes self fulfilling. But in this case it seems like it's not going to happen because nobody has a moat.
> I’ve found that almost no founders or friends I speak with have any vision for the future anymore.
I think in general there is a feeling that the time to get your bag is rapidly shrinking.
Once everything is built by these things there will be no reason to create anything as the platform owners (big tech) will be able to take everything for themselves and no longer have to share 70% with those pesky creators/small business/startups etc.
Founder risk has been nil for a long time either because they pay themselves six figures out the gate or because the job market has been hot enough that they can market utter failure to get another job.
There’s a lot of opportunity to make low cost software that out competes big tech just because it doesn’t demand 10000x returns on every if statement.
I’d encourage Europeans to start replacing American software vendors with small teams today. You won’t become the next American oligarch but you’ll be able to clean up millions from the incompetent Americans.
It's similar to how China dominated manufacturing in prior decades.
They have massive amounts of low cost labor and, unfortunately, the US has pretty large walls up preventing mass in-migration of white collar workers.
H1B is capped and also more of a lottery than a points based system.
If the US allowed mass white collar immigration, wages would decline materially which would make our industries more competitive for the next generation of software.
Right now the system is geared around protectionism (intended or not) and wage inflation for US local workers.
The current market wages in software are far, far above what a global equilibrium would be, though I, and I'm sure most others here, have benefited from it in the short term.
To be clear, established companies with an existing market are fine for now and can do well with high wages.
But the next generation of companies that are chasing smaller markets and margins, ones that require more elbow grease to out-compete are underserved.
e.g. the entire DeepSeek team was paid less than a few Meta engineers (with 7 figure comp each)
"The firm offers 14-month pay for various positions and the highest offer is for deep learning researchers for artificial general intelligence (AGI), with a monthly salary between 80,000 yuan ($10,983) and 110,000 yuan, which could mean an annual income of up to 1.54 million yuan, the report said."
> Huawei may release its own Open Source PC OS soon. We are slowly but surely witnessing the collapse of Western Tech scene.
The US has been so used to being number one that not being number one equates to “collapse”.
No, the US will NOT collapse. They just won't be number one in economic/military/technological might, similar to how many countries like the UK, Japan, and more get by without being the number one economic superpower.
It will be (arguably already is) societally rough though. The west has been riding the asian cheap labor for decades (and the cheap colonial labor before that). People are not gonna be happy falling down the "value chain".
This is the old way of doing it, and probably the way the US is going to go, to the detriment of its own population. I would posit that since we are talking about digital goods, there is a better way:
Require open source / open weights from any company that used data it doesn't own to train its models. If Chinese companies do not comply, their copyright becomes void in the US, and these models are very easy to copy. Treat advances in architecture as a utility, and let the utilization of those architectures be the market for companies to compete in.
A copyright exemption would just put them at the level of deepseek officially, but they've been working around that anyway in practice. I'm not sure that change would make any difference.
When it comes to hardware who pioneered all those technologies? Definitely not China. They’ve stolen unimaginable amounts of IP and will continue to do so. But yes you’re right, they surprise everyone with how well they can scale the stolen innovation.
Possible, but if you look at the graduate students and lecturers behind many of these IPs you will find they are Chinese (or Russians or Iranians).
This is the paradox for those who champion barring Chinese students from the US to prevent them from stealing IP: they don't see that at least 50% of this IP is generated by students from China. In a way, they will be handing the CCP a gift by incentivising those students to remain in China.
>They’ve stolen unimaginable amounts of IP and will continue to do so.
All AI models are built on the back of massive amounts of "IP stealing". Either we consider IP to be valid and then all western companies in this space are just as bad, or we go with the direction the western companies are claiming and then China is not doing anything wrong.
These sour grapes comments are so goofy, and honestly a little racist. The millions of Chinese engineers working out in China are extremely talented, and to downplay their achievements like this and to chalk them all up as thieves is ridiculous. They have the skills, the man power, and the vision, and they’re eating the West’s lunch regardless of your feelings on how fair it is.
All developing markets "steal" until they've caught up with the competition. Just look at the US and how they "stole" innovations and tech from Europe.
It is entirely irrelevant who pioneered the tech. This is why no one gives a crap about xerox anymore.
Dismissing Chinese tech is foolish. They are tech leaders in many areas and moving to new ones every day. Solar, Nuclear, Batteries, EVs, Drones, Robotics etc. They have no one to copy in those fields because they have left the rest of the world behind.
By the 1890s both the US and Germany had surpassed Britain when it came to industrial output, I don’t think it was any consolation for the Brits that they had invented it (almost) all.
I’m replying to myself to address multiple other replies.
First of all, it’s really sad to see people saying it doesn’t matter the journey of how one achieves success, and all that matters is your current state.
Brushing the CCP's countless acts of IP theft under the rug is like saying it doesn't matter that the Trump family committed financial fraud for decades. All that matters is that they've managed to become billionaires today. Would the Trump family be anywhere near as wealthy if they hadn't cheated for so long? Would China be much further behind than they are now if the CCP hadn't stolen so much IP? I see multiple people here implying those questions are irrelevant, which I absolutely disagree with. Ignoring all that history is a huge injustice to everyone else who didn't resort to that kind of behavior.
I also want to be clear I’m not trying to make some ridiculous claim that Chinese individuals have been working independently to hack and steal IP. It’s their government and that same government is to blame for the many people like me who really look down on them despite what they’ve ultimately been able to achieve. They undoubtedly have huge numbers of brilliant citizens. When I make comments about China’s shameful history of tech IP theft, I’m talking about their government.
> We are slowly but surely witnessing the collapse of Western Tech scene.
I think you're witnessing it getting back in touch with reality rather than collapsing. Multi-trillion out of a JSX generator was too much from the beginning. You folks just don't know what to do with all the money you have.
> I think you're witnessing it getting back in touch with reality rather than collapsing.
You're witnessing the USA tech scene getting back to reality. Software engineers in other western countries looked at the salaries the tech scene was paying in the USA, and scratched their heads.
It's a collapse from fictional reality to real reality, but a collapse nonetheless.
Sometimes reality acts more weird than fiction itself. I have just now decided to call this "fictional reality"
Like yesterday when I realized that nuclear bombs weren't that far away from the creation of chemical resonance & they happened after world war I and I think , just really 5-6 years before nuclear bombs but still!
It actually gave me a lot of hope because I felt that a lot of people were focusing on AI , so I can use AI (sometimes , if I want) to focus on a passion project that I want , to maybe earn some money.
I have also thought of creating AI projects but that too for fun. I don't know two shits but I just want to know what the hype is about from a theoretical standpoint.
Correct answer, never think about the future in terms of linear extrapolations. It's a non-linear differential equation with lots of variables and expect complex feedback loops. Systems react to change.
> We are slowly but surely witnessing the collapse of Western Tech scene
Is economy a zero sum game now?
Isn't economic development supposed to be a good thing?
Can the West only exist in a world of poverty and underdevelopment?
What user experience are you talking about? Chatbot? Or software in general? Because TikTok blows Facebook out of the water. For chatbots in English-speaking communities, sure, I also prefer Claude over DeepSeek in terms of project support and UI. But this is because they are focusing on Chinese communities; Doubao has much better features that are used by Chinese users. It's not really comparable even if all US chatbots were accessible in China. Once LLM tech slows, I am sure Chinese chatbots will beat the American ones in terms of user experience.
So far. China has been focused on becoming a world's factory for 30 years. They started moving up the food chain fairly recently.
Give it another generation, and if China does not walk off the ledge with either government or societal issues (which, granted, is where they are slowly going IMO), they will own the UX and design as well. My 2c.
A lot of consumer tech with very competitive UX is coming out of China. They are also getting very strongly into e.g. web frontend tech. I see no reason why the west would have any special advantage in this.
Software is more trivial than hardware. That’s why you see bootcamps for software but not for hardware. China can easily eclipse the US on the software front. And they have.
Let's just hope they contribute back as much as the west has contributed to spreading knowledge and knowledge-tools to the world instead of just free-riding on it and then pretending it wasn't foundational. Linux, Wikipedia, RISC-V, ...
It's not just the Chinese who are lacking the acknowledgment of these contributions.
China's complete supply chain is ramping up. Previously they could have reached the frontier but were hampered by sanctions on tooling. Now indigenous Chinese tools are catching up.
Or (speaking as an American) we could just criminalize the use of Chinese LLMs.
/s ... but... maybe not?
Wherever you sit on the political spectrum the fact that this idea will almost certainly be seriously discussed in the coming months should be concerning to you.
They effectively are banned for a lot of the commercial world already. I cannot imagine most businesses American or European would be willing to use Chinese services.
China is building an entirely independent semiconductor supply chain, and if they are not competitive now, they will be in the near future. US sanctions forced them into turbocharging their efforts.
I have two SO-DIMM sticks of a domestically produced Chinese brand named FASPEED, with chips bearing logos and markings that I don't recognize from anywhere else. These sticks' mfg. datestamp is "44-24" and from what I've learned they cost little in China but come with a salted price tag when sold through channels aimed at Western customers. I'm not sure if they come in fast-enough variants to compete, and not sure about the quality or longevity otherwise. FASPEED makes SSDs, too, but I have no data on those. I also have an M.2 SSD from another Chinese brand called XINCUU which I previously had never heard about. The label of that SSD is in parity with expectations of Chinese business morals - it claims to be PCIe NVMe with "1000 MB/s speed" but is in reality a SATA device, and it does not perform even close to the ~550 MB/s limit of SATA 3.0. Both of these run unusually warm for DDR4 memory and M.2 flash storage, leading me to believe they are wholly designed and produced in China.
The vast majority of conversations with AI are irrelevant to censorship. Well, I can only speak for myself, but surely you can see that questions like "phone A vs phone B" and "how do I use feature X on product Y", or almost any programming question, aren't concerned with censorship.
Is censorship a thing for models? Of course. Does it matter? Probably not, unless you either specifically have chats on those specific topics, or if you are trying to create a meme.
The concern of censorship is way overblown by some people. Most users only care about "does it work?", then some "is the answer correct?", and at the bottom is "is the answer censored, and according to what ideology?". Seriously, think about these models/products like a normal person.
Based on what Altman says and leaked reports, OpenAI is actually losing money on every new user. Unlike traditional software, maintaining a SOTA AI service doesn't scale. The conundrum he faces is that he can either quantize models and slash R&D to try to turn a profit now but lose the SOTA race, or keep pumping in money and hope the rest bleed out. He's opted for the second, having raised 10B in 2023 and 6.6B in 2024, with reports of another raise in 2025. He's probably trying desperately for the middle ground, where an explosion of high-priced subscriptions replacing workers massively boosts his revenue. So he's also reportedly projecting that revenue will grow 4x to 13B this year.
What's interesting about Baidu's AI model Ernie is that Baidu and its founder, Robin Li, have been working on AI for a long time. Robin Li has a strong background in AI research going back many years. Also notable is that some of the key early research on scaling laws—important for understanding how AI models improve as they get bigger—was done by Baidu's AI lab. This shows Baidu's significant role in the ongoing development of AI.
Here’s a true story I find funny about scaling laws at Baidu.
From 2016-17 I did a projection using our scaling law equation with my coauthors about how many GPUs it would take to train an LLM with a step function in capability. Joel Hestness in particular did excellent experimental work to enable this.
I came out with a projection of about a $1 Billion GPU budget.
Baidu was in the middle of downsizing the US research center (SVAIL) in favor of AI in China and I was participating in the layoff of many of my colleagues while trying to keep the lights on long enough to finish our scaling law experiments, which I personally thought would change the world.
I actually wrote a report to Robin explaining the implications of scaling laws and asking for a $1 billion budget to train a Baidu LLM in 2016 and sat on it through 2017.
But I never sent it because I thought it would never have been supported in that environment. I sometimes wonder what Robin would have thought about it and how the world may have been different if Baidu had released ChatGPT.
We may be about to find out because the AI moat filled with simple algorithms and scale seems to be much more shallow than the processor and systems moat.
I have a huge amount of respect for Dario and Ilya for carrying on scaling laws at OpenAI or it may have never seen the light of day.
If there is one problem for the AI community to solve by 2030 I think it is the moat problem.
Do most people feel the way you do? This is one factor out of a multitude representing China's rise as a superpower that will eclipse the US in technological, economic and military might.
I'm excited, but most people are patriotic, and I feel things like this, or even the whole situation with BYD producing better cars than Tesla, are something people take as an attack on their identity. If not an attack, it definitely represents an erosion of their patriotic identity.
Unfortunately Trump can’t slap a tariff on this. Maybe he can ban it like he was going to do with TikTok? The US really needs to get off its high horse and not associate its identity with being the sole economic super power in the world.
It's not about patriotism. Many people outside the US, myself included, see a problem with authoritarian superpowers per se. Although now that the US is rapidly drifting towards authoritarianism, that just seems like an inevitable future to prepare for.
Like 95% of the planet, I'm not American. Like 82% of the planet, I'm not Chinese.
BYD being better than Tesla isn't a matter of patriotism in most of the world. DeepSeek and Baidu can spend as long as they want playing musical chairs/rap battles with Anthropic and OpenAI, it makes no odds to me which wins.
America and China both have politics that have no reason to care for people like me, nor people like my friends. That this is for different reasons, and that they differ in their penalties for being an out-group, doesn't matter when I'm a foreigner to both: my antecedents are who the 13 Colonies rebelled against, and more recent antecedents forced unwanted opium sales on China.
I think (hope) most folks care less about the “attack on patriotic identity” and are more concerned that what is essentially a dictatorship is rising in power significantly. History has shown dictatorships rarely end well for the general populace and the rest of the world.
Democracy has its flaws, but one of the features that most people prefer is that it can significantly change how it looks and operates to reflect the will of its people without violence.
It has nothing to do with just giving up and going 'Wellp, I guess China wins.'
China and the US are obviously very different culturally in just about every way possible. This difference makes for great competition. Someone in another topic mentioned something that seemed pretty insightful to me - in that where LLM companies failed in the US was in basically becoming clones of each other, whereas DeepSeek (and now perhaps Baidu) were going in a different way, and that way turned out to be better.
US companies will inevitably copy these strategies, one way or the other, as will Chinese companies copy what ends up working well from the US (see their latest rockets looking more than a little inspired by Starship). And the true competitiveness ensures in the end that the main people who will win will not be whichever guy ended up founding an AI company first, but you and I. It's how capitalism is supposed to work - companies beat themselves down into a race to the bottom, and society reaps the rewards. It only gets really messed up when there's no "real" competition, which is an increasingly frequent state of affairs. But that definitely will not be the case here.
Expect the same thing from India in the future as well. Their economy is advancing rapidly, and soon enough we're going to have another 1.4 billion people able to fully utilize the outliers such a population entails to similarly drive things forward in their own unique way. It's a great future for the world as a whole.
I feel like Deepseek had such good media reception, and SOTA models are so close that "GPT4x performance at y% the price" is an easy tagline that companies will be using in the coming 6 months. It's an easy goal to achieve because of diminishing returns in compute and game-able benchmarking, cherry-picking, distilling etc.
Not to say there can't be actual interesting improvements in performance/cost, but in many cases it will be more of a marketing angle.
Just tried it. Not sure exactly what model is behind the scenes but it was cringe. I provided specs for a coding task, it told me that the specs are possible but too complex so it just gave me an alternative naive way of doing it. I use LLMs as a tool so I'm trying to be very exact with my requirements and wording, this felt like it was basically negotiating the requirements with me...kinda annoyed me, lol. My suspicion is that it was trained too much on chinese forums and the data was not refined enough.
You get one free question answered without a login. You can dismiss the login prompt which appears after submitting your question and use copy/paste with keyboard shortcuts or browser debug tools to retrieve the full answer (including the part hidden with CSS rules). Either use an XPath of '//div[@id="answer_text_id"]//text()' or copy the text/event-stream response for the API call to https://yiyan.baidu.com/eb/chat/conversation/v2 once the SSE session has closed.[1] Clear cookies and site data and you'll get a new session and can keep going.
It can take about 20 seconds to return all tokens so it appears likely the login prompt is there to minimise resource consumption.
I'm trying to figure out the same thing. They make claims about it being totally free, but everything is in Chinese and you appear to need a Chinese mobile number to register.
Surely, this is as inevitable as not being able to use Wechat as an American.
The models aren't what worry me anyway. China is going to kick our ass when it comes to AI integration into society and the economy.
Imagine the difficulties faced by America vs China in integrating AI into healthcare.
We are just too worried about winning this AI model sporting event even though the entire concept is flawed and doomed to failure. We actually have to figure out how to use these models for more than how many Rs are in strawberry. That appears to be the actual hard part.
Of course, none of this is helped by having wasted an entire generation of some of America's best minds on javascript programming for obscene profit.
GPT 4.5 is not a reasoning model. Reasoning models clearly outperform it. Even OpenAI's o3-mini is smarter while being orders of magnitude cheaper. Those 2 should be compared, in my opinion.
GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models.
>GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models
It's not a failed experiment, it's a very good experiment, because it produced a very useful piece of information for the world (that there's limited return to further size scaling).
Outperform in what way? Reasoning models may be able to solve problems correctly a bigger percentage of time, but they burn many tokens to get there. So they’re much less efficient, both in latency and ultimately environmental cost.
The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.
The problem is that the benchmarks they included in the average are cherry-picked.
They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.
This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.
Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
In a real problem you may need to get 100 things right in a chain which means a 99% chance of getting each single one correct results in only 37% change of getting the correct end result. But creating a diverse test that can correctly identify 99% correct results in complex domains sounds very hard since the answers are often nuanced in details where correctness is hard to define and determine. From working in complex domains as a human, it often is not very clear if something is right or wrong or in a somewhat undefined and underexplored grey area. Yet we have to operate in those areas and then over many iterations converge on a result that works.
Not sure how such complex domains should be benchmarked and how we objectively would compare the results.
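For anyone who wants to check the 37% figure: assuming the 100 steps are independent and each succeeds with the same probability (a simplification, since real steps are usually correlated), the end-to-end success rate is just the per-step rate raised to the number of steps. A quick sketch in Python; the function name is made up for illustration:

```python
def chain_success_probability(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds.

    Assumes each step succeeds with the same probability `per_step`
    and that steps are independent (an illustrative simplification).
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Compare a few per-step accuracies over a 100-step chain.
    for p in (0.90, 0.99, 0.999):
        total = chain_success_probability(p, 100)
        print(f"per-step {p:.3f} -> end-to-end {total:.1%}")
```

With 99% per step the chain comes out at roughly 37%, and it takes about 99.9% per step to get the whole chain above 90%, which is why small differences near the top of a benchmark can matter a lot more than they look.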
Then again, if they've misrepresented the strength of the model overall, there might be some other shenanigans with their results. The fact that their results show their model is worse than GPT-4.5 on 2 Chinese language benchmarks, while it's so much stronger on some of the others, is a bit weird.
This is just like everything in China. They will find ways to drive down cost to below anything anyone previously imagined, subsidised or not. And even just competing among themselves, with DeepSeek vs ERNIE, and open-sourcing them means there is very little to no space for most.
Both the DRAM and NAND industries for Samsung / Micron may soon be gone. I thought this was going to happen sooner, but it seems to be finally happening. GPU and CPU designs are already in the pipeline with RISC-V, IMG and ARM-China. OLED is catching up, LCD is already taken over. Batteries we know. The only thing left is foundries.
Huawei may release its own Open Source PC OS soon. We are slowly but surely witnessing the collapse of Western Tech scene.
Generally, I’ve found that almost no founders or friends I speak with have any vision for the future anymore. They care only about making money and do not care how. It’s a spectacular collapse of vision and purpose—these people have always existed but it feels incredibly pervasive now.
With that, I realize your comment is much broader than AI so below is too domain specific but…
VC has been investing in AI as if it were a winner-takes-all market, but it has been obvious that isn’t the case.
Not only that, but the massive amount of cash thrown to anyone with even marginal credentials has undermined the constraints that often lead to innovation.
There is 0 reason that Safe Superintelligence should be raising for the second time at a 30 B valuation with no product.
I really don't see how this was supposed to go, and I've never heard an explanation.
I don't see any kind of coherent vision from any of these types.
Most normal folks (i.e., not SV/HN types who seem to desire to replace their marketable programming skills with LLM output) really don't "want" LLMs in any real sense.
Sure, people use them like a search engine, kids cheat on homework with them etc, but there's not this overwhelming universal desire for them like there was for, say iPhones.
I never once have heard any sort of proposed roadmap for how LLMs were supposed to work as a product.
They were just going to get, uh "really good" and take everyone's office job or something?*
Normal F500 organizations that are obviously a target for LLM use (via hyperscaler sales) are yet to see a clear path to "revolutionizing" their workforce or whatever via LLMs; it's just not there. Costs are too high, there's no obvious use case, "hallucinations" are a real impediment, etc.
I'll add that many of the public use cases for this (i.e., those a hyperscaler would blog about as a sales promo) are seriously weak ("we reduced onboarding time by 20% with $MODEL").
I would really like to hear a proposal for how this is all supposed to come together. Does anyone have a concrete plan for the future for all this stuff?
*I'll note, this is NOT the way to sell a product to the masses, either.
Addendum: I'm not an "LLM hater" by any means. I pay for GH Copilot, and have been running local LLMs since that's been a thing (granted, with limited hardware and limited quality). I intend to wait a bit and buy better hardware, with one motivating factor being running local LLMs in a year or so when the open-source offerings stabilize.
This is one of the finest and most accurate things that I have read in a long time.
This really could be a blog post, which I encourage you to write! (I would prefer GitHub Pages, but if you really want, I have a domain name on Cloudflare and I am more than willing to host the static page of such a blog on my own domain for absolutely free. Let's go, Cloudflare!)
It's just facts. Pure facts. "Generally, I’ve found that almost no founders or friends I speak with have any vision for the future anymore. They care only about making money and do not care how. It’s a spectacular collapse of vision and purpose; these people have always existed but it feels incredibly pervasive now."
Why did I read it in a monotone, as if a student from the future were explaining the current situation? I felt the same sadness in my heart as when you listen to a video with a rain background where someone reads dark comedy (something like burialgoods oats shitposting, but this time more serious and real!)
Currently saving this on the Wayback Machine just for this comment. The Internet needs to preserve this comment, no matter what.
Now I feel like everything is more top-down. The tech sector feels less like market capitalism and more like something being centrally planned: we all must chase trends that come from various industry thought-leaders. And it all must be done a very specific way (this in particular is why Chinese companies are likely going to disrupt the AI market: they’re free from this burden)
I think in general there is a feeling that the time to get your bag is rapidly shrinking.
Once everything is built by these things there will be no reason to create anything as the platform owners (big tech) will be able to take everything for themselves and no longer have to share 70% with those pesky creators/small business/startups etc.
There’s a lot of opportunity to make low-cost software that outcompetes big tech just because it doesn’t demand 10000x returns on every if statement.
I’d encourage Europeans to start replacing American software vendors with small teams today. You won’t become the next American oligarch but you’ll be able to clean up millions from the incompetent Americans.
They have massive amounts of low cost labor and, unfortunately, the US has pretty large walls up preventing mass in-migration of white collar workers.
H1B is capped and also more of a lottery than a points based system.
If the US allowed mass white collar immigration, wages would decline materially which would make our industries more competitive for the next generation of software.
Right now the system is geared around protectionism (intended or not) and wage inflation for US local workers.
The current market wages in software are far, far above what a global equilibrium would be. Though I, and I'm sure most others here, have benefited from it in the short term.
To be clear, established companies with an existing market are fine for now and can do well with high wages.
But the next generation of companies that are chasing smaller markets and margins, ones that require more elbow grease to out-compete are underserved.
e.g. the entire DeepSeek team was paid less than a few Meta engineers (with 7 figure comp each)
"The firm offers 14-month pay for various positions and the highest offer is for deep learning researchers for artificial general intelligence (AGI), with a monthly salary between 80,000 yuan ($10,983) and 110,000 yuan, which could mean an annual income of up to 1.54 million yuan, the report said."
The US has been so used to being number one that not being number one equates to “collapse”.
No, the US will NOT collapse. They just won’t be number one in economic/military/technological might. Similar to how many countries like the UK, Japan, and more have carried on without being the number one economic superpower.
- WIPO copyright exemption
- Anti-China protectionist measures
- Hard-line hardware export control
- Multi-billion dollar government contracts
Require open source / open weights from any company that used data it doesn't own to train its models. If Chinese companies do not comply, their copyright becomes void in the US, and these models are very easy to copy. Treat advances in architecture as a utility, and let the utilization of those architectures be the market for companies to compete in.
Why should people involved in some hyped company deserve all this "socialism for the rich" from the state?
This is the paradox for those who are championing barring Chinese students from the US to prevent them from stealing IP: they don't see that at least 50% of this IP is generated by students from China. In a way they will be handing the CCP a gift by incentivising those students to remain in China.
All AI models are built on the back of massive amounts of "IP stealing". Either we consider IP to be valid and then all western companies in this space are just as bad, or we go with the direction the western companies are claiming and then China is not doing anything wrong.
Dismissing Chinese tech is foolish. They are tech leaders in many areas and moving to new ones every day. Solar, Nuclear, Batteries, EVs, Drones, Robotics etc. They have no one to copy in those fields because they have left the rest of the world behind.
Not sure that matters anymore in the new world order.
Frontier tokens are largely fungible now. The details of how they came about don't make them any less useful.
something something software patents
First of all, it’s really sad to see people saying it doesn’t matter the journey of how one achieves success, and all that matters is your current state.
Brushing the CCP’s countless acts of IP theft under the rug is like saying it doesn’t matter that the Trump family committed financial fraud for decades. All that matters is that they’ve managed to become billionaires today. Would the Trump family be anywhere near as wealthy if they hadn’t cheated for so long? Would China be much further behind than they are now if the CCP hadn’t stolen so much IP? I see multiple people here implying those questions are irrelevant, which I absolutely disagree with. Ignoring all that history is a huge injustice to everyone else who didn’t resort to that kind of behavior.
I also want to be clear I’m not trying to make some ridiculous claim that Chinese individuals have been working independently to hack and steal IP. It’s their government and that same government is to blame for the many people like me who really look down on them despite what they’ve ultimately been able to achieve. They undoubtedly have huge numbers of brilliant citizens. When I make comments about China’s shameful history of tech IP theft, I’m talking about their government.
I think you're witnessing it getting back in touch with reality rather than collapsing. Multi-trillion valuations out of a JSX generator were too much from the beginning. You folks just don't know what to do with all the money you have.
You're witnessing the USA tech scene getting back to reality. Software engineers in other western countries looked at the salaries the tech scene was paying in the USA, and scratched their heads.
Sometimes reality acts more weird than fiction itself. I have just now decided to call this "fictional reality"
Like yesterday when I realized that nuclear bombs weren't that far away from the creation of chemical resonance & they happened after world war I and I think , just really 5-6 years before nuclear bombs but still!
It actually gave me a lot of hope because I felt that a lot of people were focusing on AI, so I can use AI (sometimes, if I want) to focus on a passion project that I want, to maybe earn some money.
I have also thought of creating AI projects but that too for fun. I don't know two shits but I just want to know what the hype is about from a theoretical standpoint.
These things are not actually useful. They hyper-optimized it for the coding use case but it still sucks balls at it.
Is the economy a zero-sum game now? Isn't economic development supposed to be a good thing? Can the West only exist in a world of poverty and underdevelopment?
Give it another generation and, if China does not walk off the ledge with either government or societal issues (which, granted, is where they are slowly going IMO), they will own the UX and design as well. My 2c.
So to compete on software the West might encounter unexpected difficulties. You need good platform docs to develop good software.
Probably not a problem for big companies.
It's not just the Chinese who are lacking the acknowledgment of these contributions.
/s ... but... maybe not?
Wherever you sit on the political spectrum the fact that this idea will almost certainly be seriously discussed in the coming months should be concerning to you.
I bet the Chinese AI will tell you the "chinese point of view" just like the one from USA tells the US one.
Also who knows what was censored or added there.
Is censorship a thing for models? Of course. Does it matter? Probably not, unless you either specifically have chats on those specific topics, or if you are trying to create a meme.
The concern of censorship is way overblown by some people. Most users only care about "does it work?", then some "is the answer correct?", and at the bottom is "is the answer censored, and according to what ideology?". Seriously, think about these models/products like a normal person.
https://research.baidu.com/Blog/index-view?id=89
I am excited to see Baidu catch up. It feels like they have earned it by being very early.
From 2016-17 I did a projection using our scaling law equation with my coauthors about how many GPUs it would take to train an LLM with a step function in capability. Joel Hestness in particular did excellent experimental work to enable this.
I came out with a projection of about a $1 Billion GPU budget.
Baidu was in the middle of downsizing the US research center (SVAIL) in favor of AI in China and I was participating in the layoff of many of my colleagues while trying to keep the lights on long enough to finish our scaling law experiments, which I personally thought would change the world.
I actually wrote a report to Robin explaining the implications of scaling laws and asking for a $1 billion budget to train a Baidu LLM in 2016 and sat on it through 2017.
But I never sent it because I thought it would never have been supported in that environment. I sometimes wonder what Robin would have thought about it and how the world may have been different if Baidu had released ChatGPT.
We may be about to find out because the AI moat filled with simple algorithms and scale seems to be much more shallow than the processor and systems moat.
I have a huge amount of respect for Dario and Ilya for carrying on scaling laws at OpenAI or it may have never seen the light of day.
If there is one problem for the AI community to solve by 2030 I think it is the moat problem.
I’m excited, but most people are patriotic, and I feel things like this, or even the whole situation with BYD producing better cars than Tesla, are something people take as an attack on their identity. If not an attack, it definitely represents an eroding of their patriotic identity.
Unfortunately Trump can’t slap a tariff on this. Maybe he can ban it like he was going to do with TikTok? The US really needs to get off its high horse and not associate its identity with being the sole economic superpower in the world.
BYD being better than Tesla isn't a matter of patriotism in most of the world. DeepSeek and Baidu can spend as long as they want playing musical chairs/rap battles with Anthropic and OpenAI, it makes no odds to me which wins.
America and China both have politics that have no reason to care for people like me, nor people like my friends. That they do so for different reasons and differ in penalties for being an out-group doesn't matter when I'm a foreigner to both, when my antecedents are who the 13 Colonies rebelled against and more recent antecedents forced unwanted opium sales on China.
Democracy has its flaws, but one of the features that most people prefer is that it can significantly change how it looks and operates to reflect the will of its people without violence.
China and the US are obviously very different culturally in just about every way possible. This difference makes for great competition. Someone in another topic mentioned something that seemed pretty insightful to me - in that where LLM companies failed in the US was in basically becoming clones of each other, whereas DeepSeek (and now perhaps Baidu) were going in a different way, and that way turned out to be better.
US companies will inevitably copy these strategies, one way or the other, as will Chinese companies copy what ends up working well from the US (see their latest rockets looking more than a little inspired by Starship). And the true competitiveness ensures in the end that the main people who will win will not be whichever guy ended up founding an AI company first, but you and I. It's how capitalism is supposed to work - companies beat themselves down into a race to the bottom, and society reaps the rewards. It only gets really messed up when there's no "real" competition, which is an increasingly frequent state of affairs. But that definitely will not be the case here.
Expect the same thing from India in the future as well. Their economy is advancing rapidly, and soon enough we're going to have another 1.4 billion people able to fully utilize the outliers such a population entails to similarly drive things forward in their own unique way. It's a great future for the world as a whole.
and don't start on some dictator BS. the US does/has done as many, if not more, bad things as china.
https://x.com/Baidu_Inc/status/1890292032318652719
Not to say there can't be actual interesting improvements in performance/cost, but in many cases it will be more of a marketing angle.
Comparison models: https://x.com/Baidu_Inc/status/1901094083508220035/photo/1
It can take about 20 seconds to return all tokens so it appears likely the login prompt is there to minimise resource consumption.
[1] https://developer.mozilla.org/en-US/docs/Web/API/Server-sent...
The models aren't what worry me anyway. China is going to kick our ass when it comes to AI integration into society and the economy.
Imagine the difficulties faced by America vs China in integrating AI into healthcare.
We are just too worried about winning this AI model sporting event even though the entire concept is flawed and doomed to failure. We actually have to figure out how to use these models for more than how many Rs are in strawberry. That appears to be the actual hard part.
Of course, none of this is helped by having wasted an entire generation of some of America's best minds on javascript programming for obscene profit.