I'm genuinely rooting for AMD to develop a competitive alternative to NVIDIA. Currently, NVIDIA dominates the machine learning landscape, and there doesn't seem to be a justifiable reason for the price discrepancy between the RTX 4090 and the A100.
It's not so black and white. The A100 is much more difficult to assemble even though it is older. The silicon is far more specialized. You pay a price for such a "fat" node and for server hardware guarantees.
At the same time, the cost is outrageous. It's not a low-volume product.
Also, a 48GB 4090 (or 3090) would be trivial. So would a 48GB 7900. It's not done, for purely anticompetitive reasons that AMD and Nvidia are unfortunately both happy to go along with.
48GB 3090: https://www.nvidia.com/en-us/design-visualization/rtx-a6000/
I'm always surprised that people bring up cost as a main factor determining price, especially here on HN. Maybe it would be in a commoditized market with many competitors, which the GPU market clearly isn't.
As any Economics 101 lecture would tell you, price is determined by supply and demand, and in this case Nvidia is essentially maximizing its profit based on the demand and the segmentation of the market.
> Also, a 48GB 4090 (or 3090) would be trivial. So would a 48GB 7900. It's not done, for purely anticompetitive reasons that AMD and Nvidia are unfortunately both happy to go along with.
Is this why Intel started taking dGPU production more seriously in recent years?
https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...
One might speculate that he's pivoting to Intel. They're not well developed in application terms, but that's a piece he can develop, and with OneAPI that's a lot of potential bang for the buck. And Intel has actual ML accelerators, has some relatively powerful GPGPU stuff, and is currently in a position of being forced to offer a lot of bang for the buck to drive adoption, all of which makes sense for what he's trying to do.
But AMD wants you to basically write and debug their runtime for them, and that's not worth it. After fighting the installer on the officially supported system config and then filing a couple of bugs for the demo apps reproducibly crashing the kernel, it's just not worth the time.
ROCm is unserious even when you’re operating on supported hardware. This is the experience most people have with it.
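On the OneAPI point above, here is a minimal, hedged sketch of what the Intel route looks like from PyTorch, using intel-extension-for-pytorch and its "xpu" device. The package is real, but whether your particular GPU is supported and the exact optimize() behaviour are assumptions to verify, not a recommendation:

    # Hedged sketch: run a PyTorch module on an Intel GPU via the "xpu" device.
    # Assumes the XPU build of intel-extension-for-pytorch is installed and an
    # Arc / Data Center GPU is present; details may differ by version.
    import torch
    import intel_extension_for_pytorch as ipex  # registers the "xpu" device

    model = torch.nn.Linear(1024, 1024).to("xpu").eval()
    model = ipex.optimize(model)          # Intel-specific kernel/graph optimizations
    x = torch.randn(64, 1024, device="xpu")
    with torch.no_grad():
        print(model(x).shape)             # torch.Size([64, 1024])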
>a justifiable reason for the price discrepancy between the RTX 4090 and the A100
1. They're a publicly listed company operating in a free market, in an industry they themselves helped develop, whose purpose is to make returns for their investors, not to be liked by the public. They don't need to justify their pricing to their buyers. They're not selling essentials for survival like insulin, baby formula, or housing; they can charge as much as the market will bear for their consumer electronics products. Don't like the pricing? Don't buy it. Simple. Buy from the competition instead, or buy older generations off the second-hand market that fit your budget.
2. The price justification is that cutting-edge silicon is, and always will be, in short supply. Buyers of that silicon in A100 form, like datacenters, use it to make money, so it's an investment that will yield returns, and they can justify spending far more to outbid the gamers who buy the same silicon in RTX 4090 form. Gamers don't use it to make money, they use it to play games, so for them the product is worth less and it earns Nvidia smaller margins than selling to datacenters does. It's basic price segmentation that's been going on for decades.
I also don't like the GPU pricing situation but that's the market reality I can't change and downvoting the messenger won't change it either. My 2 cents.
It's because you're responding as if OP said it wasn't justifiable legally. They're just saying that AMD is, in their estimation, making a bad business decision.
Yeah, but the issue is that they could make more money if they actually made affordable ML hardware and didn't purposefully gimp the gaming cards.
LLMs running on standalone affordable boxes could become a ubiquitous home accessory, not to mention the potential for things like being able to leverage learning from widespread use to retrain the models, like Tesla does with their fleet.
If you want to go and run inference/train on one of these fresh and sweet LLMs using bitsandbytes/PEFT, which is really where the excitement is at, you pretty much have to use CUDA. This is the story now. And the story for the innovation before it was the same: use CUDA, or wait for AMD to catch up a year late and with a worse version of everything.
I mean, sure, you could compile your stuff to XLA or just, I dunno, set up 800 of these cards and train the whole thing on ROCm. But then would you really, really use AMD instead of some TPUs?
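For concreteness, this is roughly what the bitsandbytes/PEFT path referred to above looks like — a minimal sketch with the checkpoint name and LoRA hyperparameters as placeholders rather than recommendations, and it assumes a CUDA GPU because that's what bitsandbytes targets:

    # Minimal sketch of 4-bit loading (bitsandbytes) + a LoRA adapter (PEFT).
    # Placeholder model id and hyperparameters; requires a CUDA GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model_id = "huggyllama/llama-7b"  # placeholder checkpoint

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # bitsandbytes 4-bit quantization
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",                      # place layers on the available GPU(s)
    )

    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)         # only the small LoRA weights train
    model.print_trainable_parameters()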
AMD made Machine Learning either impossible, unsupported or a chore on most of their hardware for years.
Their stack sucks and no one wanted to implement it, and in fact they rarely supported most of their own GPUs.
Yes, you have GPUs. But we also need drivers. And software. People have been yelling at AMD about this for years and years.
Instead, AMD has made it clear that Machine Learning is not a priority for the company. Hence, they deserve the lack of traction. Investing in AMD hardware for ML has been a mistake at literally every point in recent history. Imagine how dumb you'd look if you had bought a bunch of (insert last-gen card which is no longer supported by their stack).
Releasing a GPGPU card now?
Honestly, why bother?
No one is gonna buy it.
Can someone who works at AMD please print this out, roll it up, and smack some senior managers on the nose with it?
NVIDIA is about to walk off with a trillion dollars because nobody at AMD “gets it”.
With no meaningful competition, NVIDIA will gouge as hard as they can. Such as charging $50K for a card that’s not too different to a 4090 but with more memory.
AMD took aim at HPC and shipped Frontier. The ROCm stack is quite HPC-themed because that was the driving project. Porting it to run AI models is a work in progress, but the back end is the same compiler stack that was written for graphics (largely games consoles, iiuc) and then upgraded for HPC; it'll get there.
> NVIDIA is about to walk off with a trillion dollars because nobody at AMD “gets it”.
I think they "get it" OK. Whether or not they can formulate a viable strategy and execute it is one question, but they get the idea that "AI is important" and they know where they stand vis-a-vis NVIDIA.
This is another reason I'm willing to invest some time and money in working with AMD products for AI/ML. History has shown us their ability to go toe-to-toe with a seemingly unassailable industry titan before, and they came out in pretty good shape then.
https://www.forbes.com/sites/iainmartin/2023/05/31/lisa-su-s...
ROCm is getting there, slowly.
Been playing with Stable Diffusion on 6750XT 12GB for a while now with minimal issues (low memory issues sometimes on higher res, but it's less of a problem using grid upscaler).
Also, good news: running Olive and the ONNX converter for Stable Diffusion shows something like a 120% speedup on Nvidia and a 50% speedup on AMD from the numbers I see. Some UIs are beginning to support the ONNX format. ONNX being the format that has Microsoft's support behind it for AI models, to my understanding.
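For reference, the ONNX route described above can be exercised in a few lines via Hugging Face Optimum's ONNX Runtime pipeline. The model id and execution provider below are assumptions (DirectML for AMD on Windows; ROCm/CUDA providers elsewhere), and Olive's extra graph optimizations are a separate step on top of this:

    # Sketch: Stable Diffusion through ONNX Runtime via optimum.
    # Provider choice is an assumption; swap in your platform's execution provider.
    from optimum.onnxruntime import ORTStableDiffusionPipeline

    pipe = ORTStableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # placeholder model id
        export=True,                        # convert the PyTorch weights to ONNX
        provider="DmlExecutionProvider",    # e.g. DirectML for AMD on Windows
    )
    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("astronaut.png")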
There's more to machine learning and AI than just training the latest LLMs, though. Speaking for myself, I am very interested in supporting AMD and the ROCm ecosystem, whatever AMD's past sins may be. I'm building a machine now which will be based on a high-end consumer GPU from AMD. I'm not going for something like this as it's almost certainly way out of my budget, but perhaps in the future.
So basically, I'm betting that AMD has had (or is having) a change of heart and is genuinely committed to AI/ML on their products. Time may ultimately prove me wrong, but so be it if that proves to be the case.
And FWIW, one reason for my optimism is that, whatever you think about the state of ROCm today, they are clearly investing heavily into the platform and continually working on it. You can see that just from looking at the activity on their Github repos:
https://github.com/orgs/ROCmSoftwarePlatform/repositories
There is constant activity and has been for some time, which I take as a good sign. Yes, it's just one signal among many that one could consider, but I think it's an important one.
Yeah but you see, we have been here before. "This time we are serious about ML/AI..." And if you went and bought an AMD card then, you'd have been wrong.
My example about LLMs was just to show that AMD is simply not part of the conversation. Three months earlier you could have made the same point about another approach.
And still, if you actually had to risk money, you and I both know you'd never invest in AMD hardware for AI or start developing something high-stakes on it.
I mean, look at geohot: he tried and just gave up on AMD entirely.
This is a naive question, but how hard would it be for AMD to make their cards/firmware CUDA compatible? Feels like that's what they would need to do to sell hardware in the space, other than banking on sufficiently severe shortages.
CUDA runs in the layer above firmware. Compiling CUDA to the amdgpu ISA could be done but might invite lawsuits.
There's a language called HIP which is a fairly close approximation of CUDA. You can probably convert one to the other with determination and regex. The GPUs themselves are fundamentally different in ways that hopefully don't matter to your application (warp synchronisation is the big one in my opinion, but I suspect CUDA applications ignore it and just live with the race conditions).
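To illustrate the "determination and regex" point, a toy Python sketch of the textual CUDA-to-HIP renaming idea. AMD ships real hipify tools for this; the mapping below only covers a handful of obvious runtime calls and is purely illustrative:

    import re

    # Toy CUDA -> HIP source translation: most runtime API calls just swap the
    # "cuda" prefix for "hip"; kernel launch syntax (<<<...>>>) is unchanged.
    CUDA_TO_HIP = {
        r"\bcudaMalloc\b": "hipMalloc",
        r"\bcudaMemcpy\b": "hipMemcpy",
        r"\bcudaMemcpyHostToDevice\b": "hipMemcpyHostToDevice",
        r"\bcudaMemcpyDeviceToHost\b": "hipMemcpyDeviceToHost",
        r"\bcudaFree\b": "hipFree",
        r"\bcudaDeviceSynchronize\b": "hipDeviceSynchronize",
    }

    def naive_hipify(source: str) -> str:
        for pattern, replacement in CUDA_TO_HIP.items():
            source = re.sub(pattern, replacement, source)
        return source

    print(naive_hipify("cudaMalloc(&d_x, n * sizeof(float)); cudaDeviceSynchronize();"))
    # -> hipMalloc(&d_x, n * sizeof(float)); hipDeviceSynchronize();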
The issue with AMD and AI is, as always, the software stack. Even if the hardware is great, ROCm simply doesn't have industry traction and accessibility.
It doesn't have the traction for now. Cloud providers (MS, Google, AMZ, etc.) are quickly tiring of paying Nvidia monopoly premiums for their GPU hardware. Google has already invested in TPUs, and it wouldn't surprise me at all if they got together to fund ROCm development or even went so far as to develop their own NN ASICs.
CUDA is great, but it's not strictly necessary for much of the latest AI/ML developments.
ROCm is so terrible the cloud providers rolled out their own chips rather than use AMD which has perfectly good GPUs and the worst software stack ever.
ROCm still doesn't support consumer GPUs; that means people building random things (as opposed to more serious work things) won't be using their stack, so none of the innovation will be there.
It may be possible to use it with consumer GPUs anyway, but many won't try because it's not officially supported.
https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h...
https://developer.nvidia.com/cuda-gpus
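On the "possible to use it with consumer GPUs anyway" point: the workaround people commonly report is spoofing the GFX version so the ROCm runtime treats an unsupported consumer card as a supported one. A hedged sketch below; the 10.3.0 value is what's typically quoted for RDNA2 cards, and it's an unofficial hack, not AMD guidance:

    import os

    # Must be set before the ROCm runtime initializes, i.e. before importing torch.
    # Commonly reported override for officially unsupported RDNA2 consumer GPUs.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch  # ROCm builds of PyTorch still expose the familiar "cuda" device name

    print("GPU visible:", torch.cuda.is_available())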
Intel, AMD, Google, Amazon, etc should team up to create some sort of standards/consortium around an open source CUDA alternative, something that anyone who can fabricate chips could use, and the consortium could have their own team of devs/researchers to make improvements / next gen versions of their CUDA alternative.
Something like the way Chrome vs Chromium is, or even a foundation like the Linux Foundation, where you have multiple distros contributing packages etc. back into the ecosystem.
> Cloud providers (ms, google, amz, etc) are quickly tiring of paying Nvidia monopoly premiums for their gpu hardware.
I think cloud providers love exclusivity (Nvidia's MSRP is significantly higher than the pricing available to clouds), and based on pricing compared to competitors like Lambda Labs they have the highest profit margin on GPU instances. Also, based on availability, they likely have the highest utilisation. They definitely wouldn't want to commoditize the space. Google already has TPUs that it could scale and sell to everyone, but doing so would make the margins significantly smaller.
One thing AMD can do is work with ggml to get llama.cpp running on AMD GPUs. Compiling a modern ML framework is quite complex due to the number of ops involved. However, running LLMs does not need a lot of ops, just einsum & relu & softmax. Getting LLMs working with llama.cpp could be done by a team within a week or so.
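To make the "not a lot of ops" claim concrete, here is a rough numpy sketch of a single decoder block: essentially matmuls plus a softmax, a normalization, and a nonlinearity (real models add a causal mask, rotary embeddings, and SiLU/GELU instead of ReLU, but the op inventory stays small). Shapes and weights are random placeholders:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def rmsnorm(x, eps=1e-5):
        return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

    def decoder_block(x, wq, wk, wv, wo, w1, w2):
        # attention: matmuls + softmax (causal mask omitted for brevity)
        h = rmsnorm(x)
        q, k, v = h @ wq, h @ wk, h @ wv
        att = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        x = x + (att @ v) @ wo
        # feed-forward: matmuls + relu (real models use SiLU/GELU here)
        h = rmsnorm(x)
        return x + np.maximum(h @ w1, 0.0) @ w2

    d, seq = 64, 8
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((d, d)) * 0.02 for _ in range(6)]
    print(decoder_block(rng.standard_normal((seq, d)), *weights).shape)  # (8, 64)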
I agree that AMD should be dedicating an engineer to making sure all the top popular ML/AI projects can run on their hardware, but maybe they also need to spin up a wiki listing compatibility...
AMD GPU acceleration via CLBlast was merged back in mid-May in llama.cpp:master - it works and gives a boost (although not all AMD GPUs have been tuned for CLBlast - this is something that AMD should be doing, tbh: https://github.com/RadeonOpenCompute/ROCm/issues/2161)
There is also a hipBLAS fork, which is slightly faster (~10% on my old Radeon VII) which maybe someone at AMD should be supporting to make its way into master: https://github.com/ggerganov/llama.cpp/pull/1087
I'll also note that exllama merged ROCm support and it runs pretty impressively - it runs 2X faster than the hipBLAS llama.cpp, and in fact, on exllama, my old Radeon VII manages to run inference >50% faster than my old (roughly equal class/FP32 perf) GTX 1080 Ti (GCN5 has 2X fp16, and 2X memory bandwidth, so there's probably more headroom even) for a relatively easy port. That's really impressive: https://github.com/turboderp/exllama/pull/7
(It's worth noting that for the latter, all you need to do is install the ROCm version of PyTorch and "it just works," which is refreshing: https://pytorch.org/get-started/locally/)
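As a quick sanity check that the ROCm wheel really is the one in use (a sketch; device index 0 assumed):

    import torch

    # On a ROCm build, torch.version.hip is set and the AMD GPU still appears
    # under the familiar "cuda" device name.
    print("HIP runtime:", torch.version.hip)
    print("Device:", torch.cuda.get_device_name(0))

    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum().item())  # runs on the AMD GPU via ROCm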
More like a dozen engineers over a couple of weeks. Make sure popular LLMs can run on their hardware, as well as Stable Diffusion and other popular projects, and then they will see consumers flock to their hardware.
By consumers I mean something like what "gamers" have been during all these past decades (and still are): those who won't be using it for business cases (what Quadro used to be) but for their hobby; this, but oriented towards AI, which is currently limited to LLMs, image generation, and ASR.
If they focus on this, the community will start helping, maybe even cleaning up their mess of repositories they have on GitHub.
There are two benefits over Nvidia: They have more VRAM and they have an open source software stack.
They just need to get the basics working for all those hobbyists, those who want to run little projects on their home hardware.
MI300A looks really nice, if only anyone could buy it and do a DIY "Mac Studio"-like Linux machine. Imagine just swapping it for the next generation in 2 or 3 years and keeping all the rest: peripherals, case, motherboard, etc.
And the x86/GPU version would be an awesome workstation. Just add M.2 and ethernet (and USB-C with video) and we are in business.
As for a Hackintosh, I'd imagine an Nvidia Grace or Grace Hopper as a good option, even though the 500+W TDP would require a MacPro-sized heatsink. And it can have up to 960GB of RAM.
edit: got a lot of details confused between the MI300 and the Grace.
Honestly, I find the x86+GPU parts more interesting. One could make a very Apple-like PC with a SoM and very little glue around it (a bit like what you can do with some top-of-the-line Xeons that have HBM chiplets). And with between 1 and 6 x86 chiplets, the MI300 could have between 8 and 96 Zen 4 cores.
These things are so interesting it's a shame they aren't cheap.
But TBH the hybrid design is less interesting than you think, just because nothing really takes advantage of it. Hence Intel canceled their datacenter APU in favor of a pure Falcon Shores GPU due to a lack of interest from customers.
AMD's Halo Strix is the future APU with a 256-bit-wide memory interface. It boggles my mind that, through a multi-year GPU shortage, AMD didn't bring a wider memory interface to iGPUs. Obviously they can do it, since the PS5 and Xbox Series X have been shipping for some time.
https://videocardz.com/newz/intel-arrow-lake-p-with-320eu-gp...
(Sorry, I cannot find the AMD rumor link atm)
Looks like someone has already ported llama to Apple's Metal 3 and is getting 5 tok/s on a 65B model.
> just because nothing really takes advantage of it.
This is precisely why AMD, Intel, and Nvidia should think about making workstation-class machines with the lowest end of these - because until more people have one to play with, there won't be much to do with them.
An AMD Ryzen APU + RAM soldered onto the motherboard is very similar to what Apple's doing.
Afaik, because the memory controller is part of the CPU, the CPU-RAM connection on the consumer chips is entirely passive - just copper traces on the motherboard with no ICs in between?
IIRC the APUs aren't exactly typical APUs with shared memory, but more a CPU with a few GPU cores attached, operating independently memory-wise (but I might be wrong).
AMD ROCm vs Nvidia CUDA has been discussed to death, but I'm curious how AMD fares compared to some of the AI training accelerator vendors. I think it would be much more damning if AMD were worse than some upstart, because the upstart wouldn't have the huge resource advantage and decade long head start of Nvidia. From my limited experience it seems like Google TPU and Cerebras are much nicer to use for AI training, from the standpoint of driver and software stability, documentation, and ecosystem support.
Perhaps that's not a fair comparison. From what I know AMD and NVIDIA use GPGPU cores (now with AI-focused instructions) plus separate AI-specific accelerator blocks. Conceptually, GPGPU + NPU on one die. NPUs can be much simpler than general-purpose GPUs. So AMD's driver and software stack likely needs to be an order of magnitude more complex than the NPU vendors' in order to accommodate other non-AI use cases. But to an end user it doesn't really matter why it sucks, only that it does.
...Not precisely. AMD implements the "AI" matrix instructions in the shaders themselves, not as big separate blocks like Nvidia. But unlike Nvidia, the instructions are different on the consumer (RDNA3) and server lines. I don't know anything about building ROCm, but supporting RDNA must indeed make things more difficult.
Intel takes this approach too.