georgehotz · a year ago
Cool to see one of these in C, particularly if it can be binary compatible. Why not s/libreCuInit/cuInit?

If you are interested in open source runtimes, tinygrad has them in Python for both AMD and NVIDIA, speaking directly to the kernel through ioctls and poking the command queues.

https://github.com/tinygrad/tinygrad/blob/master/tinygrad/ru...

https://github.com/tinygrad/tinygrad/blob/master/tinygrad/ru...
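
To make "speaking directly to the kernel" concrete, here is a toy C sketch of the same idea on the AMD side (not tinygrad's actual code): open /dev/kfd and issue an ioctl from the linux/kfd_ioctl.h uapi header, with no ROCm userspace libraries involved.

```c
/* Toy sketch of the "straight to the kernel" approach: talk to the amdgpu
 * compute driver through the /dev/kfd ioctl interface from the Linux uapi
 * headers, bypassing all ROCm userspace. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

int main(void) {
    int fd = open("/dev/kfd", O_RDWR | O_CLOEXEC);
    if (fd < 0) { perror("open /dev/kfd"); return 1; }

    /* Simplest possible ioctl: query the KFD interface version. A real
     * runtime would go on to create queues and map doorbells the same way. */
    struct kfd_ioctl_get_version_args args = {0};
    if (ioctl(fd, AMDKFD_IOC_GET_VERSION, &args) < 0) {
        perror("AMDKFD_IOC_GET_VERSION");
        close(fd);
        return 1;
    }
    printf("KFD interface version %u.%u\n", args.major_version, args.minor_version);
    close(fd);
    return 0;
}
```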

JonChesterfield · a year ago
That's interesting. This looks like you've bypassed the ROCm userspace stack entirely. I've been looking for justification to burn libhsa.so out of the dependency graph for running LLVM-compiled kernels on amdgpu for ages now. I didn't expect roct to be similarly easy to drop, but that's a clear sketch of how to build a statically linked freestanding x64/GCN blob. Excellent.

(I want a reference implementation of run-simple-stuff which doesn't fall over because of bugs in libhsa so that I know whatever bug I'm looking at is in my compiler / the hardware / the firmware)

georgehotz · a year ago
We didn't just bypass all of ROCm, we bypassed HSA!

The HSA-parsing MEC firmware running on the GPUs is riddled with bugs; fortunately you can bypass 90% of it using PM4, which is pretty much direct writes of the GPU registers. That's what tinygrad does.
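
As a rough illustration of what those "direct writes of the GPU registers" look like: a PM4 type-3 SET_SH_REG packet is just a header dword, a register offset, and the values. The opcode and offset constants below follow Mesa's sid.h; treat the exact values as assumptions, this is a hedged sketch, not tinygrad's code.

```c
/* Hedged sketch of building a PM4 type-3 SET_SH_REG packet. Constants per
 * Mesa's sid.h / AMD's SI programming guide; exact values are assumptions. */
#include <stdint.h>

#define PKT3(op, count)  ((3u << 30) | ((uint32_t)(count) << 16) | ((uint32_t)(op) << 8))
#define PKT3_SET_SH_REG  0x76
#define SI_SH_REG_OFFSET 0x2C00 /* start of the shader register space */

/* Emit `n` consecutive shader register writes starting at `reg` into a
 * command buffer; returns the number of dwords written. */
static unsigned set_sh_regs(uint32_t *cmds, uint32_t reg,
                            const uint32_t *vals, unsigned n) {
    unsigned w = 0;
    cmds[w++] = PKT3(PKT3_SET_SH_REG, n);      /* header: type 3, opcode, count */
    cmds[w++] = (reg - SI_SH_REG_OFFSET) >> 2; /* register offset, in dwords */
    for (unsigned i = 0; i < n; i++)
        cmds[w++] = vals[i];
    return w; /* caller bumps the queue write pointer and rings the doorbell */
}
```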

AMD's software is in a really sad state. They don't have consumer GPUs in CI, they have no fuzz testing, and instead of root-causing bugs they seem to just twiddle things until the application works.

Between our PM4 backend and disabling CWSR, our AMD GPUs are now pretty stable.

mike64_t · a year ago
Binary compatibility is possible, but not my main concern just yet. The CUDA API is missing length parameters left and right, often for highly problematic things such as "how long is this ELF file" and "how many parameters does this kernel need". I will definitely write wrapper headers at some point, but I don't want those hacks in the actual source code...
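
For illustration, such a wrapper header might look like the sketch below: the same shape as the driver API, but with the implicit lengths made explicit. The libreCu* names and signatures here are hypothetical, not the project's actual API.

```c
/* Hypothetical wrapper header, purely illustrative: driver-API shape, but
 * with the lengths CUDA leaves implicit made explicit. These libreCu*
 * names and signatures are made up, not the project's real interface. */
#include <stddef.h>

typedef int   libreCuResult;
typedef void *libreCuModule;
typedef void *libreCuFunction;

/* The real cuModuleLoadData(CUmodule *, const void *image) passes no image
 * size; the loader must infer the ELF length from the ELF headers. */
libreCuResult libreCuModuleLoadData(libreCuModule *module,
                                    const void *image, size_t image_len);

/* The real cuLaunchKernel passes void **kernelParams with no count; the
 * parameter layout has to be recovered from kernel metadata instead. */
libreCuResult libreCuLaunchKernelEx(libreCuFunction f,
                                    void **params, size_t num_params);
```

A binary-compatible shim could then export the official cu* names and reconstruct the missing lengths (e.g. by parsing the ELF section headers) before calling into these.
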
ZoomerCretin · a year ago
Incredible! Any plans to support SASS instructions for Nvidia GPUs, or only PTX?
georgehotz · a year ago
We'll get there as we push deeper into assemblies. RDNA3 probably first, since it's documented and a bit simpler.
daft_pink · a year ago
I think the point of an open CUDA is to run it on non-NVIDIA GPUs. Once you have to buy NVIDIA GPUs, what's the point? If we had true competition, I think it would be far easier to buy devices with more VRAM, and thus we might be able to run Llama 405B locally someday.

Once you've already bought the NVIDIA cards, what's the point?

kelnos · a year ago
Some people believe being able to build on fully-open software stacks has value in and of itself. (I happen to be one of those people.)

Another benefit could be support for platforms that Nvidia doesn't care to release CUDA SDKs for.

IgorPartola · a year ago
Hear hear. Yes, practically speaking, if you need to run a workload on a closed-source system, or if that's your only option to get the performance, then you have to do what you have to do. But in the long run open source wins, because once an open source alternative exists it is just the better option.

As a bonus, with open source platforms you are much less subject to the whims of company licensing. If tomorrow Nvidia decided to change their licensing strategy and pricing, how many here would be affected by it? OSS doesn't do that. And even if the project goes in a random direction you don't like, someone will likely fork it to keep going in the right direction (see pfSense/OPNsense).

lmpdev · a year ago
The point might not necessarily be for consumers

Linus wasn't writing Linux for consumers (arguably the Linux kernel team still isn't); he needed a Unix-like kernel on a platform which didn't support it.

Nvidia is placed with CUDA in a similar way to how Bell was with Unix in the late 1980s. I'm not sure if a legal "CUDA Wars" is possible in the way the Unix Wars were, but something needs to give.

Nvidia has a monopoly, and many organisations and projects will come about to rectify it; I think this is one example.

The most interesting thing to watch moving forward is where the most just place to draw the line for Nvidia is. They deserve remuneration for CUDA, but the question is how much? The axe of the Leviathan (the US government) is slowly swinging towards them, and I expect Nvidia to pre-emptively open up CUDA just enough to keep them (and most of us) happy.

After a certain point for a technology so low in the “stack” of the global economy, more powerful actors than Nvidia will have to step in and clear the IP bottleneck

Tech giants are powerful and influence people more than the government, but I think people forget how powerful the government can be when push comes to shove over such an important piece of technology

—————

PS: my comparison of CUDA to Unix isn't perfect, mostly because Nvidia has a hardware monopoly as it stands; but since they don't fab it themselves, it's just design/information at the end of the day. There's nothing physically preventing other companies from producing CUDA hardware, just obvious legal and business obstacles.

Perhaps a better comparison would be Texas Instruments trying to monopolise integrated circuits (they never tried). But if Fairchild Semiconductor hadn't independently discovered ICs, we might have seen a much slower logistic curve than we have had with Moore's law (assuming competition is proportional to innovation).

talldayo · a year ago
> I expect Nvidia to pre-emptively open up CUDA just enough to keep them (and most of us) happy

Besides how they've "opened" their drivers by moving all the proprietary code on-GPU, I don't expect this to happen at all. Nvidia has no incentive to give away their IP, and the antitrust cases that people are trying to build against them border on nonsense. Nvidia monopolizes CUDA like Amazon monopolizes AWS, their "abuse" is the specialization they offer to paying customers... which harms the market how?

What really makes me lament the future is the fact that we had a chance to kill CUDA. Khronos wanted OpenCL to be a serious competitor, and if it wasn't specifically for the fact that Apple and AMD stopped funding it we might have a cross-platform GPU compute layer that outperforms CUDA. Today's Nvidia dominance is a result of the rest of the industry neglecting their own GPGPU demand.

Nvidia only "wins" because their adversaries would rather fight each other than work together to beat a common competitor. It's an expensive lesson for the industry: adopt open standards when people ask you to, or suffer the consequences of having nothing competitive.

segmondy · a year ago
Some of us are running Llama 405B locally already. All my GPUs are ancient Nvidia GPUs. IMO, the point of an open CUDA is to force Nvidia to stop squeezing us. You get more performance for the buck with AMD. If I could run CUDA on AMD, I would have bought new AMD GPUs instead. Have enough people do that and Nvidia might take note and stop squeezing us for cash.
oaththrowaway · a year ago
What are you using P100s or something?
smokel · a year ago
> the point of an open CUDA is to force Nvidia to stop squeezing us

Nobody is forcing you to buy GPUs.

Your logic is flawed in the sense that enough people could also simply write alternatives to Torch, which, by the way, is already open source.

londons_explore · a year ago
The Nvidia software stack has the "no use in datacenters" clause. Is this a workaround for that?
mike64_t · a year ago
It seems to me at least, yes. You still need ptxas, but that piece of software technically isn't deployed in the datacenter if you AOT-compile your kernels. Its usage seems more than fine, especially considering you could just run it on a system without Nvidia GPUs, or with old Tesla GPUs, while still targeting e.g. sm_89. Whether using ptxas-compiled kernels in the datacenter counts as indirect datacenter usage, I don't know.

Also, technically you are never presented with the GeForce software license during the CUDA download and installation process, which raises the question of whether it is even applicable. In that case, all you would need is the open source driver, and you could stuff as many consumer GPUs in your datacenter as you want. However, it technically governs all software downloadable from nvidia.com. I'm no legal expert on whether this matters, but I would assume consumers would be fine, while companies may be held to a higher standard of seeking out licenses which might govern what they are about to use.
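
As a sketch of that AOT split, using the standard driver API for concreteness: ptxas runs offline on a build box (e.g. `ptxas -arch=sm_89 kernel.ptx -o kernel.cubin`), and the machine with the GPUs only loads the resulting cubin, so no Nvidia compiler binary sits in the datacenter. File and kernel names below are illustrative.

```c
/* Sketch of AOT deployment: ptxas produced kernel.cubin offline; this
 * process only loads the precompiled binary through the driver API.
 * "kernel.cubin" and "my_kernel" are hypothetical names. */
#include <stdio.h>
#include <cuda.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    if (cuInit(0) != CUDA_SUCCESS) { fprintf(stderr, "cuInit failed\n"); return 1; }
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* No JIT happens here: the cubin already contains sm_89 machine code. */
    if (cuModuleLoad(&mod, "kernel.cubin") != CUDA_SUCCESS) {
        fprintf(stderr, "cuModuleLoad failed\n");
        return 1;
    }
    cuModuleGetFunction(&fn, mod, "my_kernel");

    /* ... allocate buffers and cuLaunchKernel(fn, ...) as usual ... */

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```
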
why_only_15 · a year ago
Specifically the clause is that you cannot use their consumer cards (e.g. RTX 4090) in datacenters.
paulmd · a year ago
Use the open kernel driver, which is MIT/GPL and thus cannot impose usage restrictions.

It's worth noting that "Nvidia software stack" is an imprecise term. The driver is the part that has the datacenter usage term, and the open kernel driver bypasses that. The CUDA stack itself does not have the datacenter clause; the only caveat is that you can't run it on third-party hardware. So ZLUDA/GpuOcelot is still verboten if you are using the CUDA libraries.

https://docs.nvidia.com/cuda/eula/index.html

Q6T46nT668w6i3m · a year ago
CUDA is ubiquitous in science and an open source alternative to the CUDA runtime is useful, even if the use is limited to verifying expected behavior.
jedberg · a year ago
Step 1: Run on NVIDIA gpus until it works just as well as real CUDA.

Step 2: Port to other GPUs.

At least I assume that is the plan.

chii · a year ago
> Step 2: Port to other GPUs.

Why not do this first? Because the existing closed-source CUDA already runs well on Nvidia chips. Replicating it with an open stack, while ideologically useful, is going to sap resources away from porting it to other GPUs (where the real value can be had, by stopping the Nvidia monopoly on AI chips).

kstenerud · a year ago
I think the point of Linux is to run it on non-Intel CPUs. Once you have to buy Intel CPUs what's the point.
lambdaone · a year ago
You have it exactly backwards. The original goal of Linux was to create a Unix-like operating system on Linus Torvalds's own Intel 80386 PC. Once the original Linux had been created, it was then ported to other CPUs. The joy of a portable operating system is that you can run it on any CPU, including Intel CPUs.

btbuildem · a year ago
> Once you've already bought the NVIDIA cards, what's the point?

Good luck getting a multi-user GPU setup going, for example.

It super sucks when the hardware is capable, but licensing doesn't "allow" it.

jokoon · a year ago
I guess this framework was made by AMD engineers.

Anyway, I wonder why AMD never challenged Nvidia on that market... It smells a bit like AMD and Nvidia secretly agreed not to compete against each other.

OpenCL exists but is abandoned.

heavyset_go · a year ago
The closed platform is not without its pitfalls.
actionfromafar · a year ago
Yeah like running Linux on a MacBook…
wackycat · a year ago
I have limited experience with CUDA, but will this help solve the CUDA/cuDNN dependency version nightmare that comes with running various ML libraries like TensorFlow or ONNX?
bstockton · a year ago
In my experience, over 10 years of building models with libraries using CUDA under the hood, this problem has nearly gone away in the past few years. Setting up CUDA on new machines, and even getting multi-GPU/multi-node configurations working with NCCL and PyTorch DDP, for example, is pretty slick. Have you experienced this recently?
jokethrowaway · a year ago
Yes, especially if you are trying to run various different projects you don't control.

Some will need specific versions of CUDA.

Right now I've masked CUDA from upgrades on my system, and I'm stuck on an old version to support some projects.

I also had plenty of problems with gpu-operator for deploying on k8s: that Helm chart is so buggy (or maybe just not great at handling some corner cases? No clue) that I ended up swapping Kubernetes distributions a few times (no chance to make it work on microk8s; on k3s it almost works) and eventually ended up installing the drivers + runtime locally and then just exposing them through the containerd config.

trueismywork · a year ago
That's Torch's bad software distribution problem. No one can solve it apart from the Torch distributors.
amelius · a year ago
By the way, can anyone explain why libcudnn takes on the order of gigabytes on my hard drive?
lldb · a year ago
Primarily because it has specialized functions for various matrix sizes which are selected at runtime.
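
A toy sketch of why that balloons a binary (nothing like cuDNN's actual internals, all names made up): one pre-tuned kernel per problem-size bucket, times every supported GPU architecture, with a runtime heuristic picking among them. Multiply this pattern by hundreds of ops and shapes and you get gigabytes.

```c
/* Toy illustration of size-specialized kernel dispatch; not cuDNN code. */
#include <stdio.h>

typedef void (*gemm_kernel_t)(int m, int n, int k);

/* Stand-ins for shape-specialized kernels; a real library ships hundreds
 * of these per supported GPU generation, which is where the space goes. */
static void gemm_128x128(int m, int n, int k) { printf("128x128 tile path (%d,%d,%d)\n", m, n, k); }
static void gemm_64x64(int m, int n, int k)   { printf("64x64 tile path (%d,%d,%d)\n", m, n, k); }
static void gemm_small(int m, int n, int k)   { printf("small-matrix path (%d,%d,%d)\n", m, n, k); }

/* Runtime selection by problem size. */
static gemm_kernel_t pick_gemm(int m, int n) {
    if (m >= 128 && n >= 128) return gemm_128x128;
    if (m >= 64 && n >= 64)   return gemm_64x64;
    return gemm_small;
}

int main(void) {
    pick_gemm(256, 256)(256, 256, 64);
    pick_gemm(48, 48)(48, 48, 64);
    return 0;
}
```
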
JonChesterfield · a year ago
Very nice! That's essentially all I want from a CUDA runtime. It should be possible to run the LLVM libc unit tests against this, which might then justify a corresponding AMD library that takes the same direct-to-syscall approach.
shmerl · a year ago
Since ZLUDA was taken down (at the request of AMD, of all parties), it would be better to have a ZLUDA replacement as a general-purpose way of breaking the CUDA lock-in, i.e. something not tied to Nvidia hardware.
KeplerBoy · a year ago
That's a problem on a different level of the CUDA stack.

Having a compiler that takes a special C++ or Python dialect and compiles it to GPU-suitable LLVM IR and then to a GPU binary is one thing (and there's progress on that side: Triton, Numba, soonish Mojo); being able to launch that binary without going through the Nvidia driver is another problem.

codedokode · a year ago
Can't Vulkan compute be used to execute code on the GPU without relying on proprietary libs? Why not?
nsajko · a year ago
> [...] there's progress [...]

Don't forget about Julia!

shmerl · a year ago
Yeah, the latter one is more useful for effective lock-in breaking.
SushiHippie · a year ago
> At this point, one more hostile corporation does not make much difference. I plan to rebuild ZLUDA starting from the pre-AMD codebase. Funding for the project is coming along and I hope to be able to share the details in the coming weeks. It will have a different scope and certain features will not come back. I wanted it to be a surprise, but one of those features was support for NVIDIA GameWorks. I got it working in Batman: Arkham Knight, but I never finished it, and now that code will never see the light of the day:

So if I understand it correctly, there's something in the works.

https://github.com/vosen/ZLUDA

shmerl · a year ago
Ah, that's good. Hopefully it will get back on track then.

snihalani · a year ago
For a non-CUDA n00b, what problem does this solve?
queuebert · a year ago
Two obvious problems that come to mind are

1. Replacing the extremely bloated official packages with a lightweight distribution that provides only the common functionality.

2. Paving the way for GPU support on *BSD.

einpoklum · a year ago
It doesn't solve problem (1); even when complete, this will replace the CUDA driver and its associated library, which is a very small part of CUDA. As for (2), this is just CUDA, not GPU use in general. I wonder whether nouveau is relevant for the BSDs (I have no idea...).
heyoni · a year ago
Like anything open source, it allows you to know and see exactly what your machine is doing. I don't want to speculate too much, but I remember there being discussions around whether or not Nvidia could embed licensing checks and such at the firmware level.
samstave · a year ago
> licensing checks and such at the firmware level.

Could you imagine an age where the Nvidia firmware does LLM/AI/GPU license checking before it does operations on your vectors? (Hello Oracle on a Sun E650, my old friend.) (Worse would be a DRM check against deep-faking or other Globalist WEF guardrails.)

(Oracle had (has) an age-old function where, if you bought a license for a single proc and threw it into a dual-proc Sun Enterprise server with an extra proc or so, it knew you had several hundred K to spend on an additional E650, so why not have an extra ~$80K for an additional per-proc Oracle license? Rather than make the app actually USE the additional proc, as there were no changes to Oracle's garbage FU Maxwell.)

KeplerBoy · a year ago
What's a CUDA ELF file?

Is it binary SASS code, so one would still need an open source ptxas alternative?

mike64_t · a year ago
Yes. The Nvidia SASS ISAs are not documented, and emitting them is nontrivial: Nvidia GPUs don't handle pipeline hazards in hardware, which requires the compiler to correctly schedule instructions to avoid race conditions. The only available code that does this can be found in Mesa, but even they say "//this is bs and we know it" in a comment above their instruction latencies, which you also can't easily figure out.

Replacing ptxas is highly nontrivial. I will attempt to do so, but it increasingly looks like ptxas is here to stay. I started working on an nvcc + CUDA SDK replacement, which already works surprisingly well for a day of work.

However, ptxas is in my sights. But to my knowledge, nobody who wasn't fed Nvidia documentation under license has ever successfully accomplished this.

snvzz · a year ago
Moving to HIP on LibreCUDA should probably be the first step for projects that depend on CUDA to gain platform freedom.