"TensorWave is a cloud provider specializing in AI workloads. Their platform leverages AMD’s Instinct™ MI300X accelerators, designed to deliver high performance for generative AI workloads and HPC applications."
The H100 SXM has 52% of the transistors the MI300X has and half the RAM, yet the MI300X achieves *only* 33% higher throughput than the H100. The MI300X launched 6 months ago; the H100 launched 20 months ago.
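For what it's worth, taking the parent's ratios at face value, here is a quick back-of-the-envelope sketch of what "throughput per transistor" looks like (the percentages are the commenter's, not measured figures):

```python
# Relative figures from the comment above: H100 has ~52% of the MI300X's
# transistors, ~half the HBM, and the MI300X is ~33% faster overall.
h100_transistors = 0.52      # relative to MI300X = 1.0
h100_memory = 0.50           # relative to MI300X = 1.0
h100_throughput = 1 / 1.33   # MI300X throughput = 1.0, so H100 ~= 0.75

print("H100 throughput per transistor (MI300X = 1.0):",
      round(h100_throughput / h100_transistors, 2))  # ~1.45
print("H100 throughput per GB of HBM (MI300X = 1.0):",
      round(h100_throughput / h100_memory, 2))        # ~1.5
# On these numbers the older chip still extracts more throughput per transistor
# and per GB of memory, which seems to be the point being made.
```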
AMD's deep learning libraries were very bad the last time I checked; nobody uses AMD in that space for that reason. Nvidia has a quasi-monopoly, and that's the main reason for the price difference IMHO.
Isn't SXM5 higher bandwidth? It's 900 GB/s of bidirectional bandwidth per GPU across 18 NVLink 4 channels. The NVLs are on PCIe 5, and even with NVLink only get to 600 GB/s of bandwidth across 3 NVLink bridges (and only between pairs of cards)?
I haven't done a head-to-head, and I suppose it depends on whether tensor parallelism actually scales linearly or not, but my understanding is that since the NVLs are just PCIe/NVLink-paired H100s, you're not really getting much if any benefit on something like vLLM.
I think the more interesting critique might be the slightly odd choice of Mixtral 8x7B vs, say, a more standard Llama 2/3 70B (or just testing multiple models, including some big ones like 8x22B or DBRX).
Also, while I don't have a problem with vLLM, as TensorRT gets easier to set up it might become a factor in comparisons (since they punted on FP8/AMP in these tests). Inferless published a shootout a couple of months ago comparing a few different inference engines: https://www.inferless.com/learn/exploring-llms-speed-benchma...
Price/perf does tell a story, but I think it's one that's mostly about Nvidia's platform dominance and profit margins more than intrinsic hardware advantages. On the spec sheet MI300X has a memory bandwidth and even raw FLOPS advantage but so far it has lacked proper software optimization/support and wide availability (has anyone besides hyperscalers and select partners been able to get them?)
Fair? The H100 NVL is two H100s in a single package, which probably costs as much as two H100s or more.
If so, OK, it's fair to compare 1 MI300X with 1 H100 NVL, but then price (and TCO) should be added to the metrics in the conclusion. Also, the NVL is a 2x PCIe 5.0, quad-slot part, so not the same thing.
I am not sure about system compatibility, and whether and how you can stack 8 of those in one system (like you can with the non-NVL parts and the MI300X), so it's a bit of a different (and more niche) beast.
Well, that's the beauty of specifying exactly how you ran your benchmark: it's easy to reproduce and confirm or disprove (assuming you have the hardware).
Also, stuff like this makes it hard to take the results seriously:
* To make an accurate comparison between the systems with different settings of tensor parallelism, we extrapolate throughput for the MI300X by 2.
* All inference frameworks are configured to use FP16 compute paths. Enabling FP8 compute is left for future work.
They did everything they could to make sure AMD comes out faster.
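To make the objection concrete, here is a minimal sketch (hypothetical numbers, not the article's results) of what the "extrapolate MI300X throughput by 2" step amounts to:

```python
# Hypothetical throughputs in tokens/s, for illustration only.
h100_pair_tp2 = 2400.0       # measured on two H100s with tensor parallelism (tp=2)
mi300x_single_tp1 = 1500.0   # measured on one MI300X (tp=1)

# The article's normalization: double the single-GPU MI300X figure so it can be
# compared against the two-GPU H100 figure.
mi300x_extrapolated = mi300x_single_tp1 * 2

print(f"H100 x2 (measured):       {h100_pair_tp2:.0f} tok/s")
print(f"MI300X x2 (extrapolated): {mi300x_extrapolated:.0f} tok/s")
# The extrapolation is only fair if two MI300Xs would scale perfectly linearly
# (zero tensor-parallel communication overhead), which is exactly the assumption
# being questioned here.
```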
It's not just the query (if you're running a chatbot, which many of us are not). It's the entire context window. It's not uncommon to have a system prompt that is > 512 tokens alone.
I would like to see benchmarks for 512, 1024, 4096 and 8192 token inputs.
Including the initialization prompt and your history if you have one?
I use ChatGPT for a very simple task, mapping chat messages to one of 5 supported function calls, and the function definitions alone already take up about 200 tokens, I think.
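For a sense of where those ~200 tokens go, here is a rough sketch of one OpenAI-style tool definition (the function name and fields are made up) with an approximate count via tiktoken; the API's exact token accounting differs, so treat the number as a ballpark:

```python
import json
import tiktoken  # pip install tiktoken

# One hypothetical tool definition; a real setup would carry five of these.
tools = [{
    "type": "function",
    "function": {
        "name": "set_reminder",
        "description": "Create a reminder for the user at a given time.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "when": {"type": "string", "description": "ISO 8601 timestamp"},
            },
            "required": ["text", "when"],
        },
    },
}]

enc = tiktoken.get_encoding("cl100k_base")
n = len(enc.encode(json.dumps(tools)))
print(n, "tokens (approx.) for a single definition; five of them add up fast")
```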
It's not just the current prompt, but the whole conversation, if possible. Or, if you want the AI to summarise an article, the article has to fit in.
If I understood that correctly, context length is something like session storage or short term memory. If it's too small the AI starts to forget what it's talking about.
IMO the relevant benchmark for now is a mixed stream of requests with 50 (20%), 500 (50%), 2000 (10%) and 50k (20%) input tokens, ignore EOS and decode until you get around 300 output tokens.
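Something like the sketch below would generate that mix (synthetic filler prompts; a real harness would use real documents and the model's own tokenizer):

```python
import random

random.seed(0)
LENGTHS = [50, 500, 2_000, 50_000]   # input tokens
WEIGHTS = [0.2, 0.5, 0.1, 0.2]       # share of requests at each length

def make_request_stream(n_requests: int):
    requests = []
    for _ in range(n_requests):
        n_tokens = random.choices(LENGTHS, weights=WEIGHTS, k=1)[0]
        requests.append({
            "prompt": " ".join(["word"] * n_tokens),  # stand-in for n_tokens of input
            "max_tokens": 300,    # decode ~300 output tokens
            "ignore_eos": True,   # keep decoding past EOS, as proposed above
        })
    return requests

stream = make_request_stream(1_000)
avg = sum(len(r["prompt"].split()) for r in stream) / len(stream)
print(f"average input length: {avg:.0f} tokens")
```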
I try to be optimistic about this. Competition is absolutely needed in this space - $NVDA market cap is insane right now, about $0.6 trillion more than the entire Frankfurt Stock Exchange.
It's more about how little the Frankfurt Stock Exchange is worth. And European devs keep wondering why our wages are lower than in the US for the same work. That's why.
The DAX is only 40 companies, most of which make real products rather than advertising mechanisms. Making real physical things just doesn't scale, and never will.
While I would enjoy a US tech salary, I'm not sure we want a world where all manufacturing is set aside to focus on the attention economy.
Nvidia deserves to be valued much higher than any company on the DAX (maybe all of them together, as it currently is) - but how much of that current value is real rather than an AI speculation bubble?
Wages are a proxy for how valuable your work is, but not a measure of it. To support a high salary, something has to happen: either the product sold is very expensive or it's being subsidized by investors. No company can indefinitely pay its employees more than they generate by selling the product they work on.
The Frankfurt Stock Exchange, or the DAX, is mostly irrelevant. Germany has a strong, family-owned Mittelstand; those companies are not publicly traded and thus not listed. Plus, we have some giants that are also not publicly listed but belong to the richest Germans (the discount grocers Lidl and Aldi, but also the automotive supplier Bosch).
Same thing was said about Nvidia's crypto bubbles, and then look what happened.
Jensen isn't stupid. He's making accelerators for anything so that they'll be ready to catch the next bubble that depends on crazy compute power that can't be done efficiently on CPUs. They're so far the only semi company beating Moore's law by a large margin due to their clever scaling tech while everyone else is like "hey look our new product is 15% more efficient and 15% more IPC than the one we launched 3 years ago".
They may be overvalued now but they definitely won't crash back to their "just gaming GPUs" days.
I'm an AI scientist and train a lot of models. Personally, I think AMD is undervalued relative to Nvidia. No, the chips aren't as fast as Nvidia's latest, and yes, there are some hoops to jump through to get things working. But for most workloads in most industries (ignoring for the moment that AI is likely a poor use of capital), it will be much more cost effective and achieve about the same results.
The market (and selling price) reflects the perceived value of Nvidia's solution vs AMD's - comprehensively including tooling, software, TCO and manageability.
Also curious how many companies are dropping that much money on those kinds of accelerators just to run 8x 7B-param models in parallel... You're also talking about being able to train a 14B model on a single accelerator. I'd be curious to see how "full-accelerator train and inference" workloads would look, i.e. training a 14B-param model, then inference throughput on a 4x14B workload.
AMD (and almost every other inference claim maker so far, Intel and Apple specifically) have consistently cherry-picked the benchmarks to claim a win and ignored the remainder, which all show Nvidia in the lead - and they've used mid-gen comparison models, as many commenters here have pointed out about this article.
The MI300X wins in some inference workloads; the H100 wins in training and some other inference workloads (FP8 inference with TensorRT-LLM; ROCm is young but growing fast).
For LLMs in a single system (8x accelerators), the MI300X has very competitive inference TCO vs the H100.
Also:
"AMD Instinct MI300X Offers The Best Price To Performance on GPT-4 According To Microsoft, Red Team On-Track For 100x Perf/Watt By 2027"
https://wccftech.com/amd-instinct-mi300x-best-price-performa...
The market and the selling price also include sales strategies. Penetrating a sector dominated by a strong player with somewhat "smart" sales strategies [1], and with a growing but certainly less mature product (especially the software), requires suitable pricing and allocation strategies.
The price of the H100 reflects, and has reflected, the fact that there is a total monopoly in the training sector.
AMD is successfully attacking the inference sector, increasing its advantage with the MI325 and aiming at training from 2025 with the MI350 (plus Infinity Fabric interconnect and the other kinds of interconnect arriving for the various topologies), which will probably have an advantage over Blackwell, then fall behind against Rubin and come back ahead with the MI400.
At least, that's how it looks, as long as ROCm continues to improve.
Personally, I am happy to see some competition in the sector, and especially around open-source software.
[1] https://www.techspot.com/news/102056-nvidia-allegedly-punish...
This stuff is the actual reason nvidia is under antitrust investigation.
Boo hoo, a GTX 670 that cost you $399 in 2012 now costs $599 - grow up, do the inflation calculation, and realize you're being a child. Gamers get the best deal on bulk silicon on the planet, R&D subsidized by enterprise, fantastic blue-sky research that takes years for competitors to (not even) match, and it's still never enough. "Gamers" have justified every single cliche and stereotype over the last 5 years - absolutely inveterate manbabies.
(Hardware Unboxed put out a video today with the headline+caption combo "are gamers entitled"/"are GeForce GPUs gross", and that's what passes for reasoned discourse among the most popular channels. They've been trading segments back and forth with GN that are just absolute "how bad is Nvidia" / "real bad, but what do you guys think???" tier shit, lmao.)
https://i.imgur.com/98x0F1H.png
This stuff is the real shit: Nvidia has been leaning on partners to maintain their segmentation, micromanaging shipment releases to maintain price levels (cartel behavior), punishing customers and suppliers with "you know what will happen if you cross us", literally putting it in writing with GPP (big mistake), playing fuck-fuck games with not letting the drivers be run in a datacenter, etc. You see how that's a little different from a GPU going from an inflation-adjusted $570 to $599 over 10 years?
(And what's worse, the competition can't even keep up with that; they're falling off even harder now that Moore's law has really kicked the bucket and they have to do architectural work every gen just to make progress, instead of getting free shrinks etc... let alone having to develop software! /gasp)
In entirely unrelated news... Gigabyte suddenly has a 4070 Ti Super with a blower cooler. Oh, and it's single-slot with an end-fire power connector. All three forbidden features at once - very subtle, extremely law-abiding.
https://videocardz.com/newz/gigabyte-unveils-geforce-rtx-407...
And literally, gamers can't help but think this whole FTC case is all about themselves anyway...
MI300X production is ramping. In the latest earnings report, Lisa Su said 1H2024 was production-capped, while 2H2024 has increased production (and still some left to sell), probably thanks to improved CoWoS and HBM3(e?) supply.
Large orders for those accelerators are placed months ahead.
Meanwhile, the MI300X VMs on Microsoft Azure are fully booked...
https://techcommunity.microsoft.com/t5/azure-high-performanc...
"Scalable AI infrastructure running the capable OpenAI models
These VMs, and the software that powers them, were purpose-built for our own Azure AI services production workloads. We have already optimized the most capable natural language model in the world, GPT-4 Turbo, for these VMs. ND MI300X v5 VMs offer leading cost performance for popular OpenAI and open-source models."
I'm wondering if the tensor parallelism setting has any impact on performance. My naive guess is yes, but I'm not sure.
According to the article:
"""
AMD Configuration: Tensor parallelism set to 1 (tp=1), since we can fit the entire model Mixtral 8x7B in a single MI300X’s 192GB of VRAM.
NVIDIA Configuration: Tensor parallelism set to 2 (tp=2), which is required to fit Mixtral 8x7B in two H100’s 80GB VRAM.
"""
I personally find such comparisons unfair. A good comparison should optimize for each device configuration, which means using a model within the VRAM limit, quantizing to 8 bits where it boosts performance, etc., and avoiding the shortcomings of both devices unless necessary.
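For reference, this is roughly what the quoted configuration difference looks like with vLLM's offline API (a sketch only; exact argument names can shift between vLLM versions, and each config would of course run on its own machine):

```python
from vllm import LLM, SamplingParams

MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# MI300X box: the full FP16 model fits in one GPU's 192 GB, so no sharding.
llm = LLM(model=MODEL, tensor_parallel_size=1, dtype="float16")

# H100 box: 80 GB per GPU, so the weights must be sharded across two devices.
# llm = LLM(model=MODEL, tensor_parallel_size=2, dtype="float16")

outputs = llm.generate(["Write a haiku about GPUs."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```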
AMD seemingly has better hardware - but not the production capacity to compete with Nvidia yet. It will be interesting to see margins compress when real competition catches up.
Everybody thinks it’s CUDA that makes Nvidia the dominant player. It’s not - almost 40% of their revenue this year comes from mega corporations that use their own custom stack to interact with GPUs. It’s only a matter of time before competition catches up and gives us cheaper GPUs.
Are you conflating CUDA the platform with the C/C++-like language that people write into files ending in .cu? Because while some people are indeed not writing .cu files, absolutely no one is skipping the rest of the "stack" (nvcc/PTX/SASS/runtime/driver/etc.).
Some people emit LLVM IR (maybe PTX) directly instead of using the C/C++ frontend to CUDA. That's absolutely the only optional part of the stack, and also basically the most trivial (i.e., it's not the frontend that's hard but the target codegen).
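As a concrete example of that "no .cu files, but still the full stack" path: Triton kernels are written in Python, lowered by the Triton compiler to its own IR/LLVM IR/PTX, and still go through ptxas, SASS and the normal NVIDIA driver to run. A minimal sketch (assumes a CUDA GPU with PyTorch and Triton installed; this is the standard vector-add example, not anyone's production code):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)  # no .cu file anywhere, but PTX/SASS/driver still did the work
```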
Yes it is - but Nvidia has larger contracts _right now_. Nvidia has been investing more money in producing more GPUs for longer, so it’s only natural that they have an advantage now.
But now that there’s a larger incentive to produce GPUs, their moat will eventually fall.
TSMC runs at 100% capacity for top tier processes - their bottleneck is more foundries. These take time to build. So the question becomes - how long can Nvidia remain dominant? It could be quarters or it could be years before any real competitor convinces large customers to switch over.
Microsoft and Google are producing their own AI hardware too - nobody wants to depend solely on Nvidia, but they’re currently forced to if they want to keep up.
A good start for AMD. I am also enthusiastic about another non-Nvidia inference option: Groq (which I sometimes use).
Nvidia relies on TSMC for manufacturing. Samsung is building competing manufacturing infrastructure, which is also a good thing, so Taiwan is not a single point of failure.
I suggest taking the report with a grain of salt.
They do the standard AMD comparison. The fair comparison would be against [...]
Price tells a story: if AMD's performance were on par with Nvidia's, they would not sell their cards for a quarter of the price. AMD has work to do.
I don't think it should be ignored, especially when the power consumption is similar.
What were your thoughts on Zen (1) vs Intel's offerings then? AMD offered more bang for the buck then too.
Fun weekend project for anybody.
What would be a suitable input length in your opinion?
And why isn't this a good one? Are real-life queries shorter, or longer?
If I count one word as one token, then in my case most of the queries are less than 128 words.
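The word-count heuristic is usually in the right ballpark for English, but it's easy to check with a real tokenizer; a quick sketch using tiktoken's cl100k_base encoding (the GPT-4-era vocabulary):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
query = ("Summarise the attached incident report and list the three most "
         "likely root causes.")
print(len(query.split()), "words")
print(len(enc.encode(query)), "tokens")
# English prose tends to come out around 0.75 words per token; code and
# non-English text usually cost noticeably more tokens per word.
```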
Nah, then I'll get my very good wagie pennies here and have plenty of jobs available, plus good health insurance and whatnot.
But there's a long list of German companies not on the DAX (though the German DAX really deserves to be worth less than Nvidia).
The Nvidia problem will sort itself out naturally in the coming months/years.
lol completely made up.
Source: I work at one of these "mega corps". Hell, if you don't believe me, go look at how many CUDA kernels PyTorch has: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/n....
> Everybody thinks it’s CUDA that makes Nvidia the dominant player.
it 100% does
That's just a question of negotiating with TSMC or their few competitors.
(Also, didn't TSMC start production at some factories in the US and/or EU?)
I mean, Nvidia uses TSMC, and so does AMD.