m_a_g · 2 years ago
"TensorWave is a cloud provider specializing in AI workloads. Their platform leverages AMD’s Instinct™ MI300X accelerators, designed to deliver high performance for generative AI workloads and HPC applications."

I suggest taking the report with a grain of salt.

nabla9 · 2 years ago
The salt is in plain sight.

They do the standard AMD comparison:

  8x AMD MI300X (192GB, 750W) GPU  
  8x H100 SXM5 (80GB, 700W) GPU
The fair comparison would be against

  8x H100 NVL (188GB, <800W) GPU
Price tells a story. If AMD's performance were on par with Nvidia's, they would not sell their cards for 1/4 of the price.

nabla9 · 2 years ago

                 MTr
  ------------------
  H100 SXM5   80,000 
  MI300X     153,000
  H100 NVL   160,000


The H100 SXM5 has 52% of the transistors the MI300X has and half the RAM, yet the MI300X achieves *ONLY* 33% higher throughput than the H100. The MI300X was launched 6 months ago, the H100 20 months ago.

AMD has work to do.

fleischhauf · 2 years ago
AMD's deep learning libraries were very bad the last time I checked; nobody uses AMD in that space for that reason. Nvidia has a quasi-monopoly, and that's the main reason for the price difference IMHO.
lhl · 2 years ago
Isn't SXM5 higher bandwidth? It's 900 GB/s of bidirectional bandwidth per GPU across 18 NVLink 4 channels. The NVLs are on PCIe 5, and even w/ NVLink they only get to 600 GB/s of bandwidth across 3 NVLink bridges (and only between pairs of cards)?

I haven't done a head-to-head, and I suppose it depends on whether tensor parallelism actually scales linearly or not, but my understanding is that since the NVLs are just PCIe/NVLink-paired H100s, you're not really getting much if any benefit on something like vLLM.

I think the more interesting critique might be the slightly odd choice of Mixtral 8x7B vs. say a more standard Llama 2/3 70B (or just testing multiple models, including some big ones like 8x22B or DBRX).

Also, while I don't have a problem w/ vLLM, as TensorRT gets easier to set up it might become a factor in comparisons (since they punted on FP8/AMP in these tests). Inferless published a shootout a couple months ago comparing a few different inference engines: https://www.inferless.com/learn/exploring-llms-speed-benchma...

Price/perf does tell a story, but I think it's mostly one about Nvidia's platform dominance and profit margins rather than intrinsic hardware advantages. On the spec sheet the MI300X has a memory bandwidth and even a raw FLOPS advantage, but so far it has lacked proper software optimization/support and wide availability (has anyone besides hyperscalers and select partners been able to get them?).

ebalit · 2 years ago
But the price should be a factor. Your fair comparison would match a ~$60k setup against a ~$20k one, according to prices we can find online.

I don't think it should be ignored, especially when the power consumption is similar.

fvv · 2 years ago
Fair? The H100 NVL is two H100s in a single package, which probably costs 2x an H100 or more.

If so, OK, it's fair to compare 1 MI300X with 1 H100 NVL, but then price (and TCO) should be added to the conclusions. Also, the NVL is a dual-PCIe-5.0, quad-slot part, so it's not the same thing.

I'm not sure about system compatibility, or if and how you can stack 8 of those in one system (like you can with the non-NVL H100 and the MI300X), so it's a bit of a different (and more niche) beast.

sangnoir · 2 years ago
> Price tells a story. If AMD's performance were on par with Nvidia's, they would not sell their cards for 1/4 of the price

What were your thoughts on Zen (1) vs Intel's offerings then? AMD offered more bang for the buck then too.

winux-arch · 2 years ago
Price tells the story. Yes, but for electricity prices, not card prices, and there they're much closer to each other!
resource_waste · 2 years ago
Thx! Anyone who says Nvidia isn't king needs a reality check.
epolanski · 2 years ago
Well, that's the beauty of specifying exactly how you ran your benchmark: it's easy to reproduce and disprove or confirm (assuming you've got the hardware).
scotty79 · 2 years ago
As easy as getting yourself 8 H100 and 8 MI300X.

Fun weekend project for anybody.

Deleted Comment

impulser_ · 2 years ago
If they used Nvidia's chip would this somehow make the blog post better?
aurareturn · 2 years ago
For one, they didn't use TensorRT in the test.

Also, stuff like this makes it hard to take the results seriously:

  * To make an accurate comparison between the systems with different settings of tensor parallelism, we extrapolate throughput for the MI300X by 2.

  * All inference frameworks are configured to use FP16 compute paths. Enabling FP8 compute is left for future work.

They did everything they could to make sure AMD comes out faster.

qeternity · 2 years ago
Why the hell are we doing 128-input-token benchmarks in 2024? This is not representative of most workloads, and prefill perf is incredibly important.
ta12653421 · 2 years ago
For understanding:

What would be a suitable input length in your opinion?

And why isn't this a good one: are real-life queries shorter, or longer?

If I count one word as a token, then in my case most of the queries are less than 128 words.

qeternity · 2 years ago
I think today 512 tokens is a minimum.

It's not just the query (if you're running a chatbot, which many of us are not). It's the entire context window. It's not uncommon to have a system prompt that is > 512 tokens alone.

I would like to see benchmarks for 512, 1024, 4096 and 8192 token inputs.
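
For concreteness, a rough sketch of how I'd sweep those input lengths with vLLM (the model, batch size and output length here are my own assumptions, not anything from the article):

  # Hypothetical sweep of input lengths; measures generated tokens/s per length.
  import time
  from transformers import AutoTokenizer
  from vllm import LLM, SamplingParams

  MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed model
  tok = AutoTokenizer.from_pretrained(MODEL)
  llm = LLM(model=MODEL, tensor_parallel_size=2)  # set tp for your hardware

  def prompt_of_length(n_tokens: int) -> str:
      # Repeat a single filler token; re-tokenization makes this approximate.
      filler_id = tok.encode("the", add_special_tokens=False)[0]
      return tok.decode([filler_id] * n_tokens)

  params = SamplingParams(temperature=0.0, max_tokens=256, ignore_eos=True)
  for n in (512, 1024, 4096, 8192):
      prompts = [prompt_of_length(n)] * 32  # batch of 32 same-length requests
      start = time.time()
      outs = llm.generate(prompts, params)
      gen = sum(len(o.outputs[0].token_ids) for o in outs)
      print(f"input={n:5d} tok: {gen / (time.time() - start):.1f} gen tok/s")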

Gasp0de · 2 years ago
Including the initialization prompt and your history, if you have one? I use ChatGPT for a very simple task, mapping chat messages to one of 5 supported function calls, and the function definitions alone already take up about 200 tokens, I think.
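
A rough way to sanity-check that number with tiktoken (the schema below is made up, and OpenAI serializes tool definitions internally before counting, so this is only an approximation):

  # Approximate token count of a single function/tool definition.
  import json
  import tiktoken

  tool = {  # hypothetical schema, not my real one
      "name": "route_message",
      "description": "Map a chat message to one of the supported actions.",
      "parameters": {
          "type": "object",
          "properties": {
              "action": {"type": "string",
                         "enum": ["refund", "cancel", "status", "escalate", "other"]},
              "order_id": {"type": "string"},
          },
          "required": ["action"],
      },
  }

  enc = tiktoken.encoding_for_model("gpt-4")
  print(len(enc.encode(json.dumps(tool))))  # rough count; the real prompt format differs
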
stefs · 2 years ago
It's not just the current prompt, but the whole conversation, if possible. Or, if you want the AI to summarise an article, the article has to fit in.

If I understood that correctly, context length is something like session storage or short term memory. If it's too small the AI starts to forget what it's talking about.

rfoo · 2 years ago
IMO the relevant benchmark for now is a mixed stream of requests with 50 (20%), 500 (50%), 2000 (10%) and 50k (20%) input tokens, ignore EOS and decode until you get around 300 output tokens.
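
Roughly, in code (my reading of the numbers above, not an established benchmark):

  # Sample a mixed request stream: (input_tokens, output_tokens) pairs.
  import random

  random.seed(0)
  MIX = [(50, 0.20), (500, 0.50), (2000, 0.10), (50_000, 0.20)]  # (input len, probability)
  OUTPUT_TOKENS = 300  # decode ~300 tokens per request, ignoring EOS

  def sample_requests(n: int):
      lengths = [l for l, _ in MIX]
      weights = [w for _, w in MIX]
      return [(random.choices(lengths, weights)[0], OUTPUT_TOKENS) for _ in range(n)]

  reqs = sample_requests(1000)
  mean_in = sum(i for i, _ in reqs) / len(reqs)
  print(f"{len(reqs)} requests, mean input length ~ {mean_in:.0f} tokens")
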
spacecadet · 2 years ago
In most cases that's not enough.
sva_ · 2 years ago
I try to be optimistic about this. Competition is absolutely needed in this space - $NVDA market cap is insane right now, about $0.6 trillion more than the entire Frankfurt Stock Exchange.
Rinzler89 · 2 years ago
It's more about how little the Frankfurt Stock Exchange is worth. And European devs keep wondering why our wages are lower than in the US for the same work. That's why.
CapeTheory · 2 years ago
The DAX is only 40 companies, most of which make real products rather than advertising mechanisms. Making real physical things just doesn't scale, and never will.

While I would enjoy a US tech salary, I'm not sure we want a world where all manufacturing is set aside to focus on the attention economy.

Nvidia's value deserves to be much higher than any company on the DAX (maybe all of them together, as it currently is) - but how much of that current value is real rather than an AI speculation bubble?

braiamp · 2 years ago
Wages are a proxy for how valuable your work is, but not a measure of how valuable your work is. To support a high salary, something has to happen: either the product sold is very expensive or it's being subsidized by investors. No company can indefinitely pay its employees more than they are able to generate selling the product they worked on.
dailykoder · 2 years ago
Okay, so you are saying I should move to America, where apparently a lot of people struggle hard to even get a job?

Nah, I'll get my very good wagie pennies here and have plenty of jobs available, plus good health insurance and whatnot.

raverbashing · 2 years ago
Yes

But there's a long list of German companies not on the DAX

(though Germany's DAX really does deserve to be worth less than Nvidia)

Refusing23 · 2 years ago
Stock value doesn't reflect a company's income or ability to pay its workers.

Deleted Comment

littlecranky67 · 2 years ago
The Frankfurt Stock Exchange or the DAX is mostly irrelevant. Germany has a strong, family-owned Mittelstand; those companies are not publicly traded and thus not listed. Plus, we have some giants that are also not publicly listed but belong to the richest Germans (the discount grocers Lidl and Aldi, but also automotive supplier Bosch).
threeseed · 2 years ago
We are in the middle of an LLM bubble.

The Nvidia problem will sort itself out naturally in the coming months/years.

chx · 2 years ago
As someone put it: we are in the 3D-glasses phase of AI. Remember when all TVs came with them?
Rinzler89 · 2 years ago
The same thing was said about Nvidia's crypto bubbles, and look what happened.

Jensen isn't stupid. He's making accelerators for anything and everything, so that they'll be ready to catch the next bubble that depends on crazy compute power that can't be delivered efficiently on CPUs. They're so far the only semi company beating Moore's law by a large margin, thanks to their clever scaling tech, while everyone else is like "hey look, our new product is 15% more efficient and has 15% more IPC than the one we launched 3 years ago".

They may be overvalued now, but they definitely won't crash back to their "just gaming GPUs" days.

sva_ · 2 years ago
Oof, I really didn't intend to start a flamewar.
CarRamrod · 2 years ago
Those are rookie numbers
mistymountains · 2 years ago
I'm an AI scientist and I train a lot of models. Personally I think AMD is undervalued relative to Nvidia. No, the chips aren't as fast as Nvidia's latest, and yes, there are some hoops to jump through to get things working. But for most workloads in most industries (ignoring for the moment that AI is likely a poor use of capital), it will be much more cost effective and achieve about the same results.
tgtweak · 2 years ago
The market (and selling price) reflects the perceived value of Nvidia's solution vs. AMD's, comprehensively including tooling, software, TCO and manageability.

Also curious how many companies are dropping that much money on those kinds of accelerators just to run 8x 7B-param models in parallel... You're also talking about being able to train a 14B model on a single accelerator. I'd be curious to see how "full-accelerator train and inference" workloads would look, i.e. training a 14B-param model and then measuring inference throughput on a 4x 14B workload.

AMD (and almost every other maker of inference claims so far... Intel and Apple specifically) have consistently cherry-picked the benchmarks they claim a win on and ignored the remainder, which all show Nvidia in the lead - and they've used mid-gen comparison models, as many commenters here have pointed out about this article.

fvv · 2 years ago
The MI300X wins in some inference workloads; the H100 wins in training and some other inference workloads (FP8 inference with TensorRT-LLM; ROCm is young but growing fast).

For single-system (8x accelerator) LLM serving, the MI300X has very competitive inference TCO vs. the H100.

Also:

AMD Instinct MI300X Offers The Best Price To Performance on GPT-4 According To Microsoft, Red Team On-Track For 100x Perf/Watt By 2027

https://wccftech.com/amd-instinct-mi300x-best-price-performa...

lostmsu · 2 years ago
wccftech is an untrustworthy source.
fvv · 2 years ago
The market and the selling price also include sales strategy: penetrating a sector dominated by a strong player with somewhat "smart" sales strategies [1], and doing it with a growing but certainly less mature product (especially the software), requires suitable pricing and allocation strategies.

1. https://www.techspot.com/news/102056-nvidia-allegedly-punish...

fvv · 2 years ago
The price of the H100 reflects, and has reflected, the fact that there is a total monopoly in the training sector.

AMD is successfully attacking the inference sector, increasing its advantage with the MI325 and aiming at training from 2025 with the MI350 (plus Infinity Fabric and the other interconnect types arriving for the various topologies), which will probably have an advantage over Blackwell, then fall back against Rubin, and come back ahead with the MI400.

At least, that is how it seems, as long as ROCm continues to improve.

Personally I am happy to see some competition in the sector, and especially on open-source software.

paulmd · 2 years ago
This stuff is the actual reason nvidia is under antitrust investigation.

boo boo, a GTX 670 that cost you $399 in 2012 now costs $599 - grow up, do the inflation calculation, and realize you’re being a child. gamers get the best deal on bulk silicon on the planet, R&D subsidized by enterprise, fantastic blue-sky research that takes years for competitors to (not even) match, and it’s still never enough. ”Gamers” have justified every single cliche and stereotype over the last 5 years, absolutely inveterate manbabies.

(Hardware Unboxed put out a video today with the headline+caption combo "are gamers entitled"/"are GeForce GPUs gross", and that's what passes for reasoned discourse among the most popular channels. They've been trading segments back and forth with GN that are just absolute "how bad is Nvidia" "real bad, but what do you guys think???" tier shit, lmao.)

https://i.imgur.com/98x0F1H.png

this stuff is real shit, nvidia has been leaning on partners to maintain their segmentation, micromanaging shipment release to maintain price levels (cartel behavior), punishing customers and suppliers with “you know what will happen if you cross us”, literally putting it in writing with GPP (big mistake), playing fuck fuck games with not letting the drivers be run in a datacenter, etc. You see how that’s a little different than a gpu going from an inflation-adjusted $570 to $599 over 10 years?

(And what's worse, the competition can't even keep up with that; they're falling off even harder now that Moore's law has really kicked the bucket and they have to do architectural work every gen just to make progress, instead of getting free shrinks etc... let alone having to develop software! /gasp)

In entirely unrelated news… gigabyte suddenly has a 4070 ti super with a blower cooler. Oh, and it’s single-slot with end-fire power connector. All three forbidden features at once - very subtle, extremely law-abiding.

https://videocardz.com/newz/gigabyte-unveils-geforce-rtx-407...

and literally gamers can’t help but think this whole ftc case is all about themselves anyway…

fvv · 2 years ago
MI300X production is ramping. In the latest earnings report, Lisa Su said 1H2024 is production-capped, while 2H2024 has increased production (and still some left to sell), probably thanks to improved CoWoS and HBM3(e?) supply.

Large orders for those accelerators are placed months ahead.

Meanwhile, the MI300X instances on Microsoft Azure are fully booked...

https://techcommunity.microsoft.com/t5/azure-high-performanc...

"Scalable AI infrastructure running the capable OpenAI models These VMs, and the software that powers them, were purpose-built for our own Azure AI services production workloads. We have already optimized the most capable natural language model in the world, GPT-4 Turbo, for these VMs. ND MI300X v5 VMs offer leading cost performance for popular OpenAI and open-source models."

michaelnny · 2 years ago
I'm wondering if the tensor parallelism settings have any impact on the performance. My naive guess is yes, but I'm not sure.

According to the article:

  AMD Configuration: Tensor parallelism set to 1 (tp=1), since we can fit the entire model Mixtral 8x7B in a single MI300X’s 192GB of VRAM.

  NVIDIA Configuration: Tensor parallelism set to 2 (tp=2), which is required to fit Mixtral 8x7B in two H100’s 80GB VRAM.
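
To make that concrete, the two setups roughly correspond to something like this in vLLM (the model name and exact engine flags here are my assumptions, not taken from the article):

  # Rough sketch of the two configurations described above; model name assumed.
  from vllm import LLM

  MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumption: the Instruct variant

  # MI300X: ~90GB of FP16 weights fit in one GPU's 192GB, so no sharding is needed.
  llm_mi300x = LLM(model=MODEL, tensor_parallel_size=1)

  # H100: 80GB per GPU, so the weights must be sharded across two GPUs.
  llm_h100 = LLM(model=MODEL, tensor_parallel_size=2)

Note that tp=1 and tp=2 differ in more than raw speed (KV-cache headroom per GPU, all-reduce communication overhead), which is why the "extrapolate by 2" step mentioned elsewhere in the thread is doing a lot of work.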

renonce · 2 years ago
I personally find such comparisons unfair. A good comparison should optimize for each device configuration, which means using a model within the VRAM limit, quantizing to 8 bits where it boosts performance, etc., and avoiding the shortcomings of both devices unless necessary.
huntertwo · 2 years ago
AMD seemingly has better hardware, but not the production capacity to compete with Nvidia yet. It will be interesting to see margins compress when real competition catches up.

Everybody thinks it’s CUDA that makes Nvidia the dominant player. It’s not - almost 40% of their revenue this year comes from mega corporations that use their own custom stack to interact with GPUs. It’s only a matter of time before competition catches up and gives us cheaper GPUs.

almostgotcaught · 2 years ago
> their own custom stack to interact with GPUs

lol completely made up.

Are you conflating CUDA the platform with the C/C++-like language that people write in files ending with .cu? Because while some people are indeed not writing .cu files, absolutely no one is skipping the rest of the "stack" (nvcc/PTX/SASS/runtime/driver/etc.).

Source: I work at one of these "mega corps". Hell, if you don't believe me, go look at how many CUDA kernels PyTorch has: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/n....

> Everybody thinks it’s CUDA that makes Nvidia the dominant player.

It 100% does.

pastaguy1 · 2 years ago
Can you explain the CUDA-less stack a little more or provide a source?
almostgotcaught · 2 years ago
Some people emit LLVM IR (maaaaybe PTX) directly instead of using the C/C++ frontend to CUDA. That's absolutely the only optional part of the stack and also basically the most trivial (i.e., it's not the frontend that's hard but the target codegen).
Refusing23 · 2 years ago
> but not the production capacity to compete with Nvidia yet.

That's just a question of negotiating with TSMC or its few competitors.

(Also, didn't TSMC start building some fabs in the US and/or the EU?)

I mean, Nvidia uses TSMC, and so does AMD.

huntertwo · 2 years ago
Yes it is - but Nvidia has larger contracts _right now_. Nvidia has been investing more money in producing more GPUs for longer, so it’s only natural that they have an advantage now.

But now that there’s a larger incentive to produce GPUs, their moat will eventually fall.

TSMC runs at 100% capacity for top tier processes - their bottleneck is more foundries. These take time to build. So the question becomes - how long can Nvidia remain dominant? It could be quarters or it could be years before any real competitor convinces large customers to switch over.

Microsoft and Google are producing their own AI hardware too - nobody wants to depend solely on Nvidia, but they’re currently forced to if they want to keep up.

mark_l_watson · 2 years ago
A good start for AMD. I am also enthusiastic about another non-Nvidia inference option: Groq (which I sometimes use).

Nvidia relies on TSMC for manufacturing. Samsung is building competing manufacturing infrastructure, which is also a good thing, so Taiwan is not a single point of failure.