It's fascinating to me that someone in their early 20s, during the 2008 worldwide economic recession, could have that much financial success.
The fact that this guy could see that massive data analysis was a winning investment strategy, and then out-compete others with way more experience in financial markets, is impressive.
I'd be curious about the markets he initially invested in. Was this a market inefficiency specifically in China in the late 2000s?
I've always assumed that quantitative analysis requires PhD-level knowledge of markets and mathematics, but maybe I'm being way too conservative?
Can we stop this drawn-out narrative that DeepSeek is at the level of Gemini or o3? It's brilliant in its own way, but for some reason a lot of journalists think it's still on par with American frontier models.
It's funny: R1 came out and matched 4o/o1 at the time. You could claim it was very slightly behind, but it was basically even.
It's been 6 months? Gemini's big upgrade was 2 months ago, and o3 is even more recent.
It's just funny that US companies only barely got ahead in the last couple of months, and already it's a "drawn-out narrative" that they aren't ahead.
For all we know, R2 drops tomorrow. If it's ahead or even, how are we supposed to think about the narrative?
IMO it's not really that much of a stretch to say they're fairly close together. I'd want to wait 6 more months in which the US stayed significantly ahead before I'd start complaining about narratives. I know things move fast, but that's all the more reason to wait and see.
But didn't R1 use OpenAI/Google models to generate the data it was trained on? So R1 could only exist because those models predated it.
I'd reference something like https://llm-stats.com/ which suggests that the story is ... muddled. On the one hand, DeepSeek is clearly not leading. On the other hand, they aren't really "behind" in any sense I care about. They'd have had world-leading performance with their models this time last year.
The field is really moving too quickly to talk with much certainty about "dominance" or being "ahead". My observation is that projects I care about on GitHub come with a Chinese README, and many interesting speakers at conferences have strong Chinese accents. But although I know a good researcher personally, it isn't apparent to me whether these are Chinese nationals or Americans of recent Chinese descent.
Americans can raise more cash. They are still pretty unbeatable on that front. So until that changes they will always be ahead no matter what happens on the tech front.
Journalists give their readers what they want, and what they want is a discussion about a US-China race or "AI". There is also an equity-ownership aspect, because tech stocks in China tend to be the primary market in the green within the larger SSE and Hang Seng, and a DeepSeek/AI story makes China-oriented emerging-market ETFs much more enticing. It's the same reason you see much more financial reporting about India in American business news now that Indian equities are available in emerging-market ETFs.
That said, DeepSeek is a decent model and was the forcing function needed to give a reality check to a number of AI startups (and has had the positive effect of making it easier for startups I've helped incubate to make the case for their own domain-specific foundation-model strategy). Its impact shouldn't be understated.
In absolute scores, no one is leading. They all plateaued around the same level. The difference is that models are optimized in different ways. This makes R1 useful/ahead for some people but not for others.
However, on cost, R1 beats the Western models by miles.
I use Qwen 2.5; it works better for my tasks than larger models.
(But I use it for actual work, not for chatting with imaginary friends. Maybe you really do need a "frontier model" if you want to monetize imaginary friends. I won't know or care.)
"DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of
the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that
the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs
associated with prior research and ablation experiments on architectures, algorithms, or data."https://arxiv.org/pdf/2412.19437
They don't even claim to have spent $5M (since they own their GPUs instead of renting them by the hour), it's a purely notional figure suitable for an academic paper. But when R1 got released and started generating hype, it was the only dollar figure anyone had to go on, so it got interpreted as more significant than it is.
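The paper's figure is just the product of the reported GPU-hours and an assumed rental rate, so the arithmetic can be made explicit (the $2/hour rate is the paper's assumption, not an actual invoice):

```python
# Reproduce the notional training-cost figure from the DeepSeek-V3 paper:
# reported GPU-hours multiplied by an assumed H800 rental rate.
gpu_hours = 2_788_000        # total GPU-hours reported for the final run
rate_per_gpu_hour = 2.00     # assumed rental price in USD (paper's assumption)

cost = gpu_hours * rate_per_gpu_hour
print(f"Notional training cost: ${cost / 1e6:.3f}M")  # → $5.576M
```

Change either input and the headline number moves with it, which is exactly why it's a notional figure rather than a spend.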
The accuracy of this comparison is highly speculative. One should not ignore the possibility that dominant firms in the market are inflating their cost figures to block new entrants and extract more capital from investors through such narratives. When you compare electricity prices in China with those in the U.S., such a large gap would require a truly extraordinary breakthrough to be justified. After all, these are privately held commercial firms, and they are not obligated to disclose their financials accurately.
The provided source for every concrete figure on DS in that article is "we are confident that", "we believe" or something equivalent. How is it any better researched than any other article with a conflicting set of beliefs?
I'll put my tinfoil hat on and say it plays to the current US-vs-China "propaganda" tune: the US is winning on all fronts, but the ice is thin, and we have to support local tech behemoths to the fullest extent to secure our position in this world-defining struggle.
It's not US vs. China. It increasingly looks like China vs. a conglomerate of multinational companies with foreign-born billionaire CEOs whose HQs happen to be located in the USA.
The article specifically says "it's likely this sum referred only to the final training run—a data-refinement process that transforms a model’s previous prototypes into a complete product—but many people perceived it as an insanely low budget for the entire project." The article also delves into the SemiAnalysis report, as well as denials from ex-DeepSeek employees.
Bloomberg still has not retracted (or even really commented on) the Supermicro spy chip story, preferring to hope people just forget about it if they maintain total silence. They're fine if you need to look up where the Nasdaq closed yesterday, but don't expect serious tech reporting from them.
> not retracted (or even really commented on) the Supermicro spy chip story
They doubled down on it. They did a follow-up claiming that a cybersecurity researcher from a US-based firm had been called in to investigate suspicious traffic at a US telecom. The investigators claimed to have logs and a bunch of other evidence. The investigators also claimed that Bloomberg was misleading people by focusing on SuperMicro, as they'd reportedly seen the same with other manufacturers too.
You’re really naive. Bloomberg is one of the better news outlets, and I would put no weight on Supermicro’s denials, as they have a pattern of lying about financials and have a sweepingly broad supply chain vulnerable to sabotage and counterfeiting.
The Federal government and some banks hire companies to do supply chain integrity inspection and management. They find bad parts all of the time, especially in the channel.
There’s a pretty obvious reason why they wouldn’t want to talk about a detected case of foreign espionage embedded in servers after publishing.
It's important to remember that, whatever the story behind DeepSeek, it's hard to believe they accomplished this feat with the same or more resources than the American companies. Which is to say, it's safe to assume they had at most the same resources as their American counterparts, and likely fewer, yet created a model that is just as good. So regardless of the narrative, they deserve respect for that, let alone for the amount of open-source information and weights they provided.
Feels to me that it's Google which has done the most recently to optimize the cost/performance-ratio of these models and no one seems to be talking about it.
> The fact that this guy could see that massive data analysis was a winning investment strategy and then out-compete others with way more experience in financial markets is impressive.
> I'd be curious about the markets he initially invested in. Was this a market inefficiency specifically in China in the late 2000s?
> I've always assumed that quantitative analysis requires PhD-level knowledge of markets and mathematics, but maybe I'm being way too conservative?
It would mean some harsh years at first, but it’s a good time to hit the market.
I remember being told I’d never be successful, or make as much money as my parents.
I only wish I hadn’t listened to those people so long.
A good market helps you become a bog standard boring wage slave, maybe get a mortgage, etc.
The outsized success folks will go out and get what they need regardless, they aren’t waiting for it to come to them.
Incredible that people would see the notion of having a moderately successful white collar career (maybe get a mortgage etc) as "boring wage slave".
This is nonsense. Other than the local mafia, almost all extremely successful folk live in extremely affluent markets.
> It's been 6 months? Gemini's big upgrade was 2 months ago, and o3 is even more recent.
> It's just funny that US companies only barely got ahead in the last couple of months, and already it's a "drawn-out narrative" that they aren't ahead.
> For all we know, R2 drops tomorrow. If it's ahead or even, how are we supposed to think about the narrative?
> IMO it's not really that much of a stretch to say they're fairly close together. I'd want to wait 6 more months in which the US stayed significantly ahead before I'd start complaining about narratives. I know things move fast, but that's all the more reason to wait and see.
I hope that R2 releases tomorrow and you enjoy some presumed clairvoyance for a minute.
It makes for interesting television.
For that reason, it probably won't stop anytime soon.
Western models have been proven to be heavily censored, under the guise of fighting antisemitism for example.
>DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4
"DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data." https://arxiv.org/pdf/2412.19437
They don't even claim to have spent $5M (since they own their GPUs instead of renting them by the hour), it's a purely notional figure suitable for an academic paper. But when R1 got released and started generating hype, it was the only dollar figure anyone had to go on, so it got interpreted as more significant than it is.
Typical models are now trained on clusters of roughly 20K GPUs. Even if you get a volume discount you still need cabling, switches, etc…
The minimum entry price to play in the game at this level is about 200-500 million dollars.
Meta spent something like $10B on their AI compute, for comparison.
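A quick consistency check of those bands (my own back-of-envelope arithmetic; the ~20K GPU count and the $200-500M entry price come from the comments above, the per-GPU split is derived from them):

```python
# Derive the implied all-in price per GPU (hardware plus cabling,
# switches, facilities) from a ~20K-GPU cluster and the quoted
# $200-500M entry price. These are the comment's figures, not sourced data.
num_gpus = 20_000
for total_usd in (200e6, 500e6):
    per_gpu = total_usd / num_gpus
    print(f"${total_usd / 1e6:.0f}M cluster -> ${per_gpu:,.0f} per GPU all-in")
```

The implied $10-25K per GPU all-in is at least the right order of magnitude for datacenter-class accelerators, so the quoted bands are internally consistent.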
If you believe that Deepseek was released to undercut US AI value (duh) it makes no sense to take the official line as the absolute truth.
[edit] Also, they seem to be saying R1 is a base model, which it is not. Very sloppy.
> They doubled down on it. They did a follow-up claiming that a cybersecurity researcher from a US-based firm had been called in to investigate suspicious traffic at a US telecom. The investigators claimed to have logs and a bunch of other evidence. The investigators also claimed that Bloomberg was misleading people by focusing on SuperMicro, as they'd reportedly seen the same with other manufacturers too.
https://www.servethehome.com/dude-dell-hpe-ami-american-mega...
But yeah, saying the chips are everywhere is BS.