A lot of people think AMD should support these translation layers but I think it's a bad idea. CUDA is not designed to be vendor agnostic and Nvidia can make things arbitrarily difficult both technically and legally. For example I think it would be against the license agreement of cuDNN or cuBLAS to run them on this. So those and other Nvidia libraries would become part of the API boundary that AMD would need to reimplement and support.
Chasing bug-for-bug compatibility is a fool's errand. The important users of CUDA are open source. AMD can implement support directly in the upstream projects like pytorch or llama.cpp. And once support is there it can be maintained by the community.
Are you aware of HIP? It's officially supported and, for code that avoids obscure features of CUDA like inline PTX, it's pretty much a find-and-replace to get a working build:
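For a sense of what that find-and-replace amounts to, here's a toy sketch (illustrative, not from any particular project, assuming a stock ROCm install with hipcc): only the runtime prefix changes, while the kernel body and launch syntax stay as they were in the CUDA original.

    // Hypothetical example of the "find-and-replace" port: the CUDA original
    // used cudaMalloc / cudaDeviceSynchronize / cudaFree; only those names change.
    #include <hip/hip_runtime.h>   // was: #include <cuda_runtime.h>

    __global__ void scale(float* x, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // identical to the CUDA version
        if (i < n) x[i] *= a;
    }

    int main() {
        const int n = 1 << 20;
        float* d_x = nullptr;
        hipMalloc(reinterpret_cast<void**>(&d_x), n * sizeof(float));  // was: cudaMalloc
        scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);                 // launch syntax unchanged
        hipDeviceSynchronize();                                        // was: cudaDeviceSynchronize
        hipFree(d_x);                                                  // was: cudaFree
        return 0;
    }

Build it with hipcc (assuming that toolchain is installed) and it runs on an AMD card.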
if you're talking about building anything, that is already too hard for ML researchers.
you have to be able to pip install something and just have it work, reasonably fast, without crashing, and also it has to not interfere with 100 other weird poorly maintained ML library dependencies.
Support this, reimplement that, support upstream efforts, I don't really care. Any of those would cost a couple million dollars and be worth a trillion to AMD shareholders.
Is it weird how the comments here are blaming AMD and not Nvidia? Sure, the obvious argument is that Nvidia has no practical motivation to build an open platform. But there are counterexamples that suggest otherwise (Android). And there is a compelling argument that long term, their proprietary firmware layer will become an insufficient moat to their hardware dominance.
Who’s the root cause? The company with the dominant platform that refuses to open it up, or the competitor who can’t catch up because they’re running so far behind? Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard. No matter what they do, they’ll need to have a compatibility layer. And in that case maybe it makes sense for them to invest in the best one that emerges from the community.
>Nvidia can make things arbitrarily difficult both technically and legally.
I disagree. AMD can simply not implement those APIs, similar to how game emulators implement the most-used APIs first and sometimes never bother with obscure ones. It would only matter that NVIDIA added, e.g., patented APIs to CUDA if those APIs were useful, in which case AMD should have a way to do them anyway. Unless NVIDIA comes up with a new patented API which is both useful and impossible to implement in any other way, which would be bad for AMD in any event. On the other hand, if AMD starts supporting CUDA and people start using AMD cards, then developers will be hesitant to use APIs that only work on NVIDIA cards. Right now AMD is losing billions of dollars on this. Then again, they barely seem capable of supporting ROCm on their cards, much less CUDA.
You have a fair point in terms of cuDNN and cuBLAS but I don't know that that kind of ToS is actually binding.
Agreed. Rather than making CUDA the standard, AMD should push/drive an open standard that can run on any hardware.
We have seen this succeed multiple times: FreeSync vs G-Sync, DLSS vs FSR, and (not AMD, but) Vulkan vs DirectX & Metal.
All of the big tech companies are obsessed with ring-fencing developers behind the thin veil of "innovation" - where really it's just good for business (I swear it should be regulated because it's really bad for consumers).
A CUDA translation layer is okay for now but it does risk CUDA becoming the standard API. Personally, I am comfortable with waiting on an open standard to take over - ROCm has serviced my needs pretty well so far.
Just wish GPU sharing with VMs was as easy as CPU sharing.
> AMD should push/drive an open standard that can be run on any hardware.
AMD has always been notoriously bad at the software side, and they frequently abandon their projects when they're almost usable, so I won't hold my breath.
We actually also saw this historically with OpenGL.
OpenGL comes from an ancient company whispered about by the elderly programmers (30+ years old) known as SGI. Originally it was CLOSED SOURCE and called IRIS GL, after SGI's IRIS workstations, which were cool looking with bright popping color plastic and faux granite keyboards. Good guy SGI opened it up to become what we call "OpenGL" (get it, now it's open), and then it stuck.
That's all to say NVIDIA could pull an SGI and open their stuff, but they're going more Sony-style and trying to monopolize. Oh, and SGI also gave us another piece of ancient lore, the SGI STL, their influential implementation of the Standard Template Library, which is the original granddaddy of Boost-style template metaprogramming.
OpenCL was released in 2009. AMD has had plenty of time to push and drive that standard. But OpenCL offered a worse developer experience than CUDA, and AMD wasn't up to the task in terms of hardware, so it made no real sense to go with OpenCL.
> against the license agreement of cuDNN or cuBLAS to run them on this
They don’t run either of them, they instead implement an equivalent API on top of something else. Here’s a quote: “Open-source wrapper libraries providing the "CUDA-X" APIs by delegating to the corresponding ROCm libraries. This is how libraries such as cuBLAS and cuSOLVER are handled.”
> CUDA is not designed to be vendor agnostic and Nvidia can make things arbitrarily difficult [...] technically.
(Let's put the legal questions aside for a moment.)
Nvidia changes GPU architectures every generation or few generations, right? How does CUDA work across those, and how can it stay forwards compatible in the future, if it's not designed to be technology agnostic?
PTX is meant to be portable across GPU microarchitectures. That said, Nvidia owns the entire spec, so they can just keep adding new instructions that their GPUs now support but AMD GPUs don't.
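As a rough illustration of that moving target (my own generic sketch, nothing beyond standard CUDA warp intrinsics): code typically gates on __CUDA_ARCH__ and only uses the newer instruction where it exists, so any compatibility layer ends up having to handle both branches.

    // Illustrative only: __reduce_add_sync is an sm_80+ warp intrinsic; older
    // targets (or a non-NVIDIA backend) fall back to shuffle arithmetic.
    // The fallback assumes a 32-lane warp, which is itself an NVIDIA-ism.
    __device__ int warp_sum(int v) {
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 800)
        return __reduce_add_sync(0xffffffffu, v);
    #else
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        return __shfl_sync(0xffffffffu, v, 0);   // broadcast lane 0's total
    #endif
    }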
One way is to make sure the hardware team does certain things to support an easy transition to new architectures; we have seen this with Apple Silicon, for example!
Well, they kinda have that with their hipify tool, although that's for porting CUDA code to AMD's HIP, which supports both AMD and NVIDIA. It handles CUDA C code and libraries with AMD equivalents like cuDNN, cuBLAS and cuRAND, but it doesn't support porting CUDA C inline PTX assembly. AMD has its own inline GCN assembler, but seems to discourage its use.
There are also versions of PyTorch, TensorFlow and JAX with AMD support.
PyTorch's torch.compile can generate Triton (OpenAI's GPU compiler) kernels, with Triton also supporting AMD.
CUDA is the juice that built Nvidia in the AI space and allowed them to charge crazy money for their hardware. Being able to run CUDA on cost-effective AMD hardware could be a big leap forward, allowing more people to do research and breaking Nvidia's stranglehold on VRAM. Nvidia will never open source their own platform unless their hand is forced. I think we all should support this endeavor and contribute where possible.
Before starting on x86, AMD signed an agreement with Intel that gave them an explicit license to it. And x86 was a whole lot smaller and simpler back in 1982. A completely different and incomparable situation.
Was there a large entity steering the x86 spec alone, with a huge feature lead over its competition, free to steer the spec any way it chose? Also, hardware is not open-source software: get the big players on board and they can implement the spec they want every generation. Software has more moving parts and more unaligned parties involved.
Isn't cuDNN a much better case for reimplementing than CUDA? It has much more freedom in how things actually happen: cuDNN itself chooses different implementations at runtime and does fusing. It seems way more generic, and a reimplementation could use the best AMD-targeted kernel rather than whichever one the original would have picked.
I really hope they do what you suggested. With some innovative product positioning (GPUs with a lot of memory, for example) they could dethrone Nvidia if it doesn't change strategy.
That said, easier said than done. You need very specialized developers to build a CUDA equivalent and have people start using it. AMD could do it with a more open development process leveraging the open source community. I believe this will happen at some point anyway by AMD or someone else. The market just gets more attractive by the day and at some point the high entry barrier will not matter much.
So why should AMD skimp on their ambitions here? This would be a most sensible investment: few risks and high gains if successful.
That is why an open standard should be created, one that isn't locked to a particular piece of hardware and that allows modular support for different hardware to interface with supported drivers.
Given AMD's prior lack of interest I'll take whatever options there are. My daily driver has a Vega 10 GPU and it's been quite frustrating not to be able to easily leverage it for basic ML tasks, to the point that I've been looking at buying an external Nvidia GPU instead just to try out some of the popular Python libraries.
Ya, honestly better to leave that to third parties who can dedicate themselves to it and maybe offer support or whatever. Let AMD work on good first party support first.
I don't really see how any code that depends heavily on the underlying hardware can "just work" on AMD. Most serious CUDA code is aware of register file and shared memory sizes, wgmma instructions, optimal tensor core memory & register layouts, tensor memory accelerator instructions, etc...
Presumably that stuff doesn't "just work" but they don't want to mention it?
A lot of our hw-aware bits are parameterized where we fill in constants based on the available hw. Doable to port, same as we do whenever new Nvidia architectures come out.
But yeah, we have tricky bits that inline PTX, and... that will be more annoying to redo.
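The parameterized approach looks roughly like this (a generic sketch, not the commenter's actual code, with an arbitrary illustrative split): sizes come from the device properties at runtime rather than being hard-coded for one vendor's parts, so a CUDA-on-AMD layer just has to report honest values.

    #include <cuda_runtime.h>

    // Hypothetical helper: choose a tile size from whatever shared memory the
    // current device reports, instead of baking in an NVIDIA-specific number.
    static int pick_tile_elems(int device) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, device);
        size_t budget = prop.sharedMemPerBlock / 2;   // leave some headroom (illustrative)
        return static_cast<int>(budget / sizeof(float));
    }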
It's speculation, but I think it's similar to CPUs: nobody guarantees the code will run exactly the way you set it up. You may want to use a specific register, but if the processor thinks another register can fulfill the task, it'll use that one while still behaving as if your code executed as expected. Maybe AMD's GPUs can sufficiently simulate the behavior of Nvidia hardware that higher abstractions are unaware something different is happening under the hood.
Pretty much. Compilers can do a lot more than people give them credit for. At least AMD documents their hardware, so it is actually possible to know the low-level details. PTX can obfuscate that surprisingly badly for Nvidia targets.
Makes sense to expect this kind of thing to be open source. The whole point of providing improved compatibility is to make people’s lives easier, and open source is usually an important feature to ensure wide compatibility. It also means projects can live on after the creators move to other things, people can submit patches for important features or bug fixes, and generally makes the system much more useful.
I don't find it wrong for someone to attempt to make money back on their time and the experience of doing the work. I don't mind people offering it back as open source either. However, I do have a problem with people expecting everything to be open/free, especially those who then go on a crusade chastising those who do try to make money.
> Makes sense to expect this kind of thing to be open source. The whole point of providing improved compatibility is to make people’s lives easier, and open source is usually an important feature to ensure wide compatibility. It also means projects can live on after the creator
AMD just bought a company working on similar things for more than $600M.
We're going to be publishing more details in later blog posts and documentation about how this works and how we've built it.
Yes, we're not open source, however our license is very permissive. It's both in the software distribution and viewable online at https://docs.scale-lang.com/licensing/
> I don't see a "buy now" button or a PCIe version anywhere here
"Buy now" buttons and online shopping carts are not generally how organizations looking to spend serious money on AI buy their hardware.
They have a long list of server hardware partners, and odds are you'd already have an existing relationship with one or more of them, and they'd provide a quote.
They even go one step further and show off some of their partners' solutions: https://www.amd.com/en/graphics/servers-instinct-deep-learni...
FWIW I believe Supermicro and Exxact actually do have web-based shopping carts these days (e.g. https://www.exxactcorp.com/Exxact-TS4-185328443-E185328443), so maybe you could skip the quotation and buy directly if you were so motivated? Seems kind of weird at this price point.
The main cause of Nvidia's crazy valuation is AMD's unwillingness to invest in making its GPUs as useful as Nvidia's for ML.
Maybe AMD fears antitrust action, or maybe there is something about its underlying hardware approach that would limit competitiveness, but the company seems to have left billions of dollars on the table during the crypto mining GPU demand spike and now during the AI boom demand spike.
I like to watch YouTube retrospectives on old failed tech companies - LGR has some good ones.
When I think of AMD ignoring machine learning, I can't help imagine a future YouTuber's voiceover explaining how this caused their downfall.
There's a tendency sometimes to think "they know what they're doing, they must have good reasons". And sometimes that's right, and sometimes that's wrong. Perhaps there's some great technical, legal, or economic reason I'm just not aware of. But when you actually look into these things, it's surprising how often the answer is indeed just shortsightedness.
They could end up like BlackBerry, Blockbuster, Nokia, and Kodak. I guess it's not quite as severe, since they will still have a market in games and therefore may well continue to exist, but it will still be looked back on as a colossal mistake.
Same with Toyota ignoring electric cars.
I'm not an investor, but I still have stakes in the sense that Nvidia has no significant competition in the machine learning space, and that sucks. GPU prices are sky high and there's nobody else to turn to if there's something about Nvidia you just don't like or if they decide to screw us.
In fairness to AMD, they bet on crypto, and nvidia bet on AI. Crypto was the right short term bet.
Also, ignoring is a strong word: I'm staring at a little << $1000, silent, 53-watt mini-PC with an AMD SoC. It has an NPU comparable to an M1's. In a few months, with the Ryzen 9000 series, NPUs for devices of its class will bump from 16 TOPS to 50 TOPS.
I'm pretty sure the Linux taint bit is off, and everything just worked out of the box.
Toyota is extremely strong in the hybrid car market, and with ravenous competition in electric cars and slowing demand, Toyota may have made the right decision after all.
There's also just the idea of endeavour - Nvidia tried something, and it worked. Businesses (or rather their shareholders) take risks with their capital sometimes, and it doesn't always work. But in this case it did.
> I think this could be cultural differences, AMD's software department is underfunded and doing poorly for a long time now.
Rumor is that ML engineers (which AMD really needs) are expensive, and AMD doesn't want to pay them more than the rest of its SWEs for fear of pissing off the existing SWEs. So AMD is caught in a bind: it can't pay to get top MLE talent, and it can't just sit by and watch NVDA eat its lunch.
AMD fears anti-collusion action. Remember, the CEOs of the two companies are just barely far enough apart in kinship not to be automatically considered to be colluding with each other.
The companies' CEOs are related. My conspiracy theory is that they don't want to step on each other's toes. Not sure if that works with fiduciary duty, though.
It does not conflict. Fiduciary duty for a for-profit organisation is not "profit at all costs", it's "you have to care about the company (care), you have to do good business (good faith) and you can't actively waste investors' and shareholders' money to intentionally lose out (loyalty)".
If they are found colluding due to nepotism, both will get a very swift revocation of business licence and a huge prison term. Remember they are just one step of kinship away from presumed collusion.
I worked for Spectral Compute a few years ago. Very smart and capable technical team.
At the time, not only did they target AMD (with less compatibility than they have now), but they also outperformed the default LLVM PTX backend, and even NVCC, when compiling for Nvidia GPUs!
I don't understand how AMD has messed up so badly that I feel like celebrating a project like this. Features of my laptop are just physically there but not usable, particularly in Linux. So frustrating.
AMD hardware works fine, the problem is that the major research projects everyone copies are all developed specifically for Nvidia.
Now AMD is spinning up CUDA compatibility layer after CUDA compatibility layer. It's like trying to beat Windows by building another ReactOS/Wine. It's an approach doomed to fail unless AMD somehow manages to gain vastly more resources than the competition.
Apple's NPU may not be very powerful, but many models have been altered specifically to run on them, making their NPUs vastly more useful than most equivalently powerful iGPUs. AMD doesn't have that just yet, they're always catching up.
It'll be interesting to see what Qualcomm will do to get developers to make use of their NPUs on the new laptop chips.
I don't know if I would call it a mess-up. AMD still has a massive market in server chips, and their ARM stuff is on the horizon. We all assume that graphics cards are the way forward for ML, which may not be the case in the future.
Nvidia were just ahead in this particular category due to CUDA, so AMD may have just let them run with it for now.
Same boat: AMD CPU but nothing else. I feel like a moderate improvement in their FOSS support and drivers would open up new hardware revenue, to say nothing of the AI channel.
It’s great that there is a page about current limitations [1], but I am afraid that what most people describe as “CUDA” is a small subset of the real CUDA functionality. Would be great to have a comparison table for advanced features like warp shuffles, atomics, DPX, TMA, MMA, etc. Ideally a table, mapping every PTX instruction to a direct RDNA counterpart or a list of instructions used to emulate it.
You're right that most people only use a small subset of CUDA: we prioritized support for features based on what was needed for various open-source projects, as a way to try to capture the most common things first.
A complete API comparison table is coming soon, I believe. :D
In a nutshell:
- DPX: Yes.
- Shuffles: Yes. Including the PTX versions, with all their weird/wacky/insane arguments.
- Atomics: Yes, except the 128-bit atomics Nvidia added very recently.
- MMA: in development, though of course we can't fix the fact that nvidia's hardware in this area is just better than AMD's, so don't expect performance to be as good in all cases.
- TMA: On the same branch as MMA, though it'll just be using AMD's async copy instructions.
> mapping every PTX instruction to a direct RDNA counterpart or a list of instructions used to emulate it.
We plan to publish a compatibility table of which instructions are supported, but a list of the instructions used to produce each PTX instruction is not in general meaningful. The inline PTX handler works by converting the PTX block to LLVM IR at the start of compilation (at the same time the rest of your code gets turned into IR), so it then "compiles forward" with the rest of the program. As a result, the actual instructions chosen vary on a case-by-case basis due to the whims of the optimiser. This design in principle produces better performance than a hypothetical solution that turned PTX asm into AMD asm, because it conveniently eliminates the optimisation barrier an asm block typically represents. Care, of course, is taken to handle the wacky memory consistency concerns that this implies!
We're documenting which ones are expected to perform worse than on NVIDIA, though!
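For context, this is the kind of thing that inline PTX handling has to consume (a generic example, not SCALE's own code): the same warp butterfly written with the portable intrinsic and as a raw shfl.sync block.

    // Portable intrinsic form: any CUDA-compatible backend can lower this.
    __device__ int bfly_intrinsic(int v, int lane_mask) {
        return __shfl_xor_sync(0xffffffffu, v, lane_mask);
    }

    // Hand-written PTX doing roughly the same thing; on NVIDIA this is close to
    // what the intrinsic compiles to, but a translation layer has to re-lower it.
    __device__ int bfly_ptx(int v, int lane_mask) {
        int r;
        asm volatile("shfl.sync.bfly.b32 %0, %1, %2, 0x1f, 0xffffffff;"
                     : "=r"(r)
                     : "r"(v), "r"(lane_mask));
        return r;
    }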
> You're right that most people only use a small subset of cuda
This is true first and foremost for the host-side API. From my StackOverflow and NVIDIA forums experience - I'm often the first and only person to ask about any number of nooks and crannies of the CUDA Driver API, with issues which nobody seems to have stumbled onto before; or at least - not stumbled and wrote anything in public about it.
> Chasing bug-for-bug compatibility is a fool's errand. The important users of CUDA are open source. AMD can implement support directly in the upstream projects like pytorch or llama.cpp.
https://github.com/ROCm/HIP
Don't believe me? Include this at the top of your CUDA code, build with hipcc, and see what happens:
https://gitlab.com/StanfordLegion/legion/-/blob/master/runti...
It's incomplete because I'm lazy but you can see most things are just a single #ifdef away in the implementation.
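The linked header is Legion's own; as a generic, heavily trimmed sketch of the idea (hypothetical names, not the actual file), such a shim mostly boils down to spelling the CUDA runtime names in terms of HIP whenever hipcc is doing the compiling:

    // Sketch of a CUDA-or-HIP compatibility header, in the spirit of the one
    // linked above. Real projects map many more symbols; this is just the shape.
    #ifdef __HIPCC__
      #include <hip/hip_runtime.h>
      #define cudaError_t            hipError_t
      #define cudaSuccess            hipSuccess
      #define cudaStream_t           hipStream_t
      #define cudaMalloc             hipMalloc
      #define cudaFree               hipFree
      #define cudaMemcpy             hipMemcpy
      #define cudaMemcpyHostToDevice hipMemcpyHostToDevice
      #define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
      #define cudaDeviceSynchronize  hipDeviceSynchronize
      #define cudaGetLastError       hipGetLastError
    #else
      #include <cuda_runtime.h>
    #endif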
> you have to be able to pip install something and just have it work, reasonably fast, without crashing
https://github.com/ROCm/HIPIFY
> We have seen this succeed multiple times: FreeSync vs G-Sync, DLSS vs FSR, and (not AMD, but) Vulkan vs DirectX & Metal.
I'll definitely agree with you on FreeSync and Vulkan, but DLSS and XeSS are both better than FSR.
https://youtube.com/watch?v=el70HE6rXV4
> That's all to say NVIDIA could pull an SGI and open their stuff, but they're going more Sony-style and trying to monopolize.
Zero impact on Switch, Playstation, XBox, Windows, macOS, iOS, iPadOS, Vision OS.
Karol Herbst is working on Rusticl, which is mesa's latest OpenCL implementation and will pave the way for other things such as SYCL.
Pretty sure APIs are not copyrightable, e.g. https://www.law.cornell.edu/supremecourt/text/18-956
Edit: not sure why I just sort of expect projects to be open source or at least source available these days.
> Yes, we're not open source, however our license is very permissive.
Make it open source with a long delay, so paying users get the latest updates.
Make the git repo from "today - N years" open source, where N is something like 1 or 2.
That way, students can learn on old versions, and when they grow into professionals they can pay for access to the cutting-edge builds.
Win win win win
( https://breckyunits.com/earlySource.html)
https://www.amd.com/en/products/accelerators/instinct/mi300/...
Another big AMD fuckup in my opinion. Nobody is going to drop millions on these things without being able to test them out first.
First rule of sales: If you have something for sale, take my money.
"Buy now" buttons and online shopping carts are not generally how organizations looking to spend serious money on AI buy their hardware.
They have a long list of server hardware partners, and odds are you'd already have an existing relationship with one or more of them, and they'd provide a quote.
They even go one step further and show off some of their partners' solutions:
https://www.amd.com/en/graphics/servers-instinct-deep-learni...
FWIW I believe Supermicro and Exxact actually do have web-based shopping carts these days, so maybe you could skip the quotation and buy directly if you were so motivated? Seems kind of weird at this price point.
https://www.exxactcorp.com/Exxact-TS4-185328443-E185328443
Compare SWE pay at the two:
* https://www.levels.fyi/companies/amd/salaries/software-engin...
* https://www.levels.fyi/companies/nvidia/salaries/software-en...
And it's probably even more lopsided now: Nvidia was paying much more long before this, and their growing stock attracts even more talent.
https://www.tomshardware.com/news/jensen-huang-and-lisa-su-f...
[1]: https://docs.scale-lang.com/manual/differences/