LibreCUDA – Launch CUDA code on Nvidia GPUs without the proprietary runtime

Cool to see one of these in C, particularly if it can be binary compatible. Why not s/libreCuInit/cuInit?

If you are interested in open source runtimes, tinygrad has them in Python for both AMD and NVIDIA, speaking directly to the kernel through ioctls and poking the command queues.

https://github.com/tinygrad/tinygrad/blob/master/tinygrad/ru...

JonChesterfield · a year ago

That's interesting. This looks like you've bypassed the rocm userspace stack entirely. I've been looking for justification to burn libhsa.so out of the dependency graph for running llvm compiled kernels on amdgpu for ages now. I didn't expect roct to be similarly easy to drop but that's a clear sketch of how to build a statically linked freestanding x64 / gcn blob. Excellent.

(I want a reference implementation of run-simple-stuff which doesn't fall over because of bugs in libhsa so that I know whatever bug I'm looking at is in my compiler / the hardware / the firmware)

georgehotz · a year ago

We didn't just bypass all of ROCm, we bypassed HSA!

The HSA parsing MEC firmware running on the GPUs is riddled with bugs, fortunately you can bypass 90% of it using PM4, which is pretty much direct sets of the GPU registers. That's what tinygrad does.

AMD's software is a really sad state. They don't have consumer GPUs in CI, they have no fuzz testing, and instead of root causing bugs they seem to just twiddle things until the application works.

Between our PM4 backend and disabling CWSR, our AMD GPUs are now pretty stable.

mike64_t · a year ago

Binary compatibility is possible, but not my main concern just yet. The CUDA api is missing length parameters left and right, often for highly problematic things such as "how long is this ELF file" and "how many parameters does this kernel need". I will definitely write wrapper headers at some point, but I don't want those hacks in the actual source code...

ZoomerCretin · a year ago

Incredible! Any plans to support SASS instructions for Nvidia GPUs, or only PTX?

georgehotz · a year ago

We'll get there as we push deeper into assemblies. RDNA3 probably first, since it's documented and a bit simpler.

I think the point of open cuda is to run it on non NVIDIA gpus. Once you have to buy NVIDIA gpus what’s the point. If we had true you competition I think it would be far easier to buy devices with more vram and thus we might be able to run llama 405b someday locally.

Once you already bought the NVIDIA cards what’s the point

kelnos · a year ago

Some people believe being able to build on fully-open software stacks has value in and of itself. (I happen to be one of those people.)

Another benefit could be support for platforms that nvidia doesn't care to release CUDA SDKs for.

IgorPartola · a year ago

Hear hear. Yes practically if you need to run a workload on a closed source system or if that’s your only option to get the performance then you have to do what you have to do. But in the long run open source wins because once an open source alternative exists it is just the better option.

As a bonus, with open source platforms you are much less subject to whims of company licensing. If tomorrow Nvidia decided to change their licensing strategy and pricing, how many here will be affected by it? OSS doesn’t do that. And even if the project goes in a random direction you don’t like, someone likely forks it to keep going in the right direction (see pfsense/opnsense).

lmpdev · a year ago

The point might not necessarily be for consumers

Linus wasn’t writing Linux for consumers (arguably the Linux kernel team still isn’t), he needed a Unix-like kernel on a platform which didn’t support it

Nvidia is placed with CUDA in a similar way to how Bell was with Unix in the late 1980s. I’m not sure if a legal “CUDA Wars” is possible in the way the Unix Wars was, but something needs to give

Nvidia has a monopoly and many organisations and projects will come about to rectify it, I think this is one example

The most interesting thing to see moving forward is where the most just place is to draw the line for Nvidia they deserve remuneration for CUDA, but the question is how much? The axe of the Leviathan (US government) is slowly swinging towards them, and I expect Nvidia to pre-emptively open up CUDA just enough to keep them (and most of us) happy

After a certain point for a technology so low in the “stack” of the global economy, more powerful actors than Nvidia will have to step in and clear the IP bottleneck

Tech giants are powerful and influence people more than the government, but I think people forget how powerful the government can be when push comes to shove over such an important piece of technology

—————

PS my comparison of CUDA to Unix isn’t perfect, mostly as Nvidia has a hardware monopoly as it stands, but as they don’t fab it themselves it’s just a design/information at the end of the day. There’s nothing physically preventing other companies producing CUDA hardware, just obvious legal and business obstacles

Perhaps a better comparison would be Texas Instruments trying to monopolise integrated circuits (they never tried). But if Fairchild Semiconductors hadn’t’ve independently discovered ICs, we might have seen a much slower logistic curve than we have had with Moore’s law (assuming competition is proportional to innovation)

talldayo · a year ago

> I expect Nvidia to pre-emptively open up CUDA just enough to keep them (and most of us) happy

Besides how they've "opened" their drivers by moving all the proprietary code on-GPU, I don't expect this to happen at all. Nvidia has no incentive to give away their IP, and the antitrust cases that people are trying to build against them border on nonsense. Nvidia monopolizes CUDA like Amazon monopolizes AWS, their "abuse" is the specialization they offer to paying customers... which harms the market how?

What really makes me lament the future is the fact that we had a chance to kill CUDA. Khronos wanted OpenCL to be a serious competitor, and if it wasn't specifically for the fact that Apple and AMD stopped funding it we might have a cross-platform GPU compute layer that outperforms CUDA. Today's Nvidia dominance is a result of the rest of the industry neglecting their own GPGPU demand.

Nvidia only "wins" because their adversaries would rather fight each other than work together to beat a common competitor. It's an expensive lesson for the industry about adopting open standards when people ask you to, or you suffer the consequences of having nothing competitive.

segmondy · a year ago

Some of us are running llama 405B locally already. All my GPUs are ancient Nvidai GPUs. IMO, the point of an open cuda is to force Nvidia to stop squeezing us. You get more performance for the buck for AMD. If I could run cuda on AMD, I would have bought new AMD gpus instead. Have enough people do that and Nvidia might take note and stop squeezing us for cash.

oaththrowaway · a year ago

What are you using P100s or something?

smokel · a year ago

> the point of an open cuda is to force Nvidia to stop squeezing us

Nobody is forcing you to buy GPUs.

Your logic is flawed in the sense that enough people could also simply write alternatives to Torch, which, by the way, is already open source.

londons_explore · a year ago

The NVidia software stack has the "no use in datacenters" clause. Is this a workaround for that?

mike64_t · a year ago

It seems to me at least, yes. You still need ptxas, but this piece of Software technically isn't deployed in the datacenter, if you AOT compile your kernels. Its usage seems more than fine, especially considering you could just run it on a system without Nvidia GPUs or old Tesla GPUs while still targeting eg. sm_89. If using ptxas compiled kernels in the datacenter counts as indirect datacenter usage, I don't know. Also, technically you are never presented with the GeForce software license during the CUDA download and installation process, which sparks the question if it is even applicable. In this case, all you would need is the open source driver and you could stuff as many consumer GPUs in your datacenter as you want. However, it technically governs all software downloadable from nvidia.com. I'm no legal expert if this matters, but I would assume consumers would be fine, but companies may be held to a higher standard of seeking out licenses which might govern what they are about to use.

why_only_15 · a year ago

Specifically the clause is that you cannot use their consumer cards (e.g. RTX 4090) in datacenters.

paulmd · a year ago

use the open kernel driver, which is MIT/GPL and thus cannot impose usage restrictions.

it's worth noting that "NVIDIA software stack" is an imprecise term. the driver is the part that has the datacenter usage term, and the open-kernel-driver bypasses that. the CUDA stack itself does not have the datacenter driver clause, the only caveat is that you can't run it on third-party hardware. So ZLUDA/GpuOcelot is still verboten, if you are using the CUDA libraries.

https://docs.nvidia.com/cuda/eula/index.html

Q6T46nT668w6i3m · a year ago

CUDA is ubiquitous in science and an open source alternative to the CUDA runtime is useful, even if the use is limited to verifying expected behavior.

jedberg · a year ago

Step 1: Run on NVIDIA gpus until it works just as well as real CUDA.

Step 2: Port to other GPUs.

At least I assume that is the plan.

chii · a year ago

> Step 2: Port to other GPUs.

why not do this first? because the existing closed sourced CUDA already runs well on nvidia chips. Replicating it with an open stack, while ideologically useful, is going to sap resources away from the porting of it to other GPUs (where the real value can be had - by stopping the nvidia monopoly on ai chips).

kstenerud · a year ago

I think the point of Linux is to run it on non-Intel CPUs. Once you have to buy Intel CPUs what's the point.

lambdaone · a year ago

You have it exactly backwards. The original goal of Linux was to create a Unix-like operating system on Linus Torvald's own Intel 80386 PC. Once the original Linux had been created, it was then ported to other CPUs. The joy of a portable operating system is that you can run it on any CPU, including Intel CPUs.

Deleted Comment

btbuildem · a year ago

> Once you already bought the NVIDIA cards what’s the point

Good luck getting a multi-user GPU setup going, for example.

It super sucks when the hardware is capable, but licensing doesn't "allow" it.

jokoon · a year ago

I guess this framework was made by amd engineers.

Anyway I wonder why amd never challenged nvidia on that market... It smells a bit like amd and nvidia secretly agreed to not compete against each other.

Opencl exists but is abandoned.

heavyset_go · a year ago

The closed platform is not without its pitfalls.

actionfromafar · a year ago

Yeah like running Linux on a MacBook…