vouwfietsman · 8 months ago
Certainly impressive that this is possible!

However, for my use cases (running on arbitrary client hardware) I generally distrust any abstractions over the GPU API, as the entire point is to leverage the low-level details of the GPU. Treating those details as a nuisance leads to bugs and performance loss, because each target is meaningfully different.

To overcome this, a similar system should be brought forward by the vendors. However, since they failed to settle their arguments, I imagine the platform differences are significant. There are exceptions to this (e.g. ANGLE), but they only arrive at stability by limiting the feature set (and so performance).

It's good that this approach at least allows conditional compilation; that helps for sure.

LegNeato · 8 months ago
Rust is a system language, so you should have the control you need. We intend to bring GPU details and APIs into the language and core / std lib, and expose GPU and driver stuff to the `cfg()` system.

(Author here)
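
For a taste of what that can look like, here's a minimal sketch (`target_arch = "spirv"` is the cfg rust-gpu sets today; the richer GPU/driver cfgs described above are still aspirational):

    // Share one function between CPU and GPU builds via cfg.
    #[cfg(target_arch = "spirv")]
    use spirv_std::glam::Vec4;
    #[cfg(not(target_arch = "spirv"))]
    use glam::Vec4;

    pub fn brighten(color: Vec4, amount: f32) -> Vec4 {
        color + Vec4::splat(amount)
    }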

vouwfietsman · 8 months ago
> Rust is a system language, so you should have the control you need

I don't think this argument is promising. It's not about the power of the language, but the power of the abstractions you provide over the various GPU APIs.

In fact, I could argue one of the main selling points of Rust (memory safety) has limited applicability in GPU land, because lifetimes are not a thing like they are in CPU land.

I'm sure there's other benefits here, not least the tooling, but certainly the language is not the main selling point...

Voultapher · 8 months ago
Who is "we" here? I'm curious to hear more about your ambitions, since surely pulling in wgpu or something similar seems out of scope for the traditionally lean Rust stdlib.
markman · 8 months ago
I wish I could say that my lack of understanding of the contents of this article was just ignorance, but unfortunately it makes my brain want to explode. There is a backhanded compliment in there somewhere. What I mean is: you're a smart mofo.
shmerl · 8 months ago
Do you get any interest from big players like AMD? I'm surprised that they didn't start such an initiative, but I guess they can just as well back yours.
ants_everywhere · 8 months ago
Genuine question since you seem to care about the performance:

As an outsider, where we are with GPUs looks a lot like where we were with CPUs many years ago. And (AFAIK) the solution there was three-part compilers: a front end, a middle layer where the optimizations happen, and a back end that transforms the optimized code to run directly on the hardware. A major upside is that the compilers get smarter over time, because the abstractions are more evergreen than the hardware targets.

Is that sort of thing possible for GPUs? Or is there too much diversity in GPUs to make it feasible/economical? Or is that obviously where we're going and we just don't have it working yet?

nicoburns · 8 months ago
The status quo in GPU-land seems to be that the compiler lives in the GPU driver and is largely opaque to everyone other than the OS/GPU vendors. Sometimes there is an additional layer of compiler in user land that compiles into the language the driver-compiler understands.

I think a lot of people would love to move to the CPU model where the actual hardware instructions are documented and relatively stable between different GPUs. But that's impossible to do unless the GPU vendors commit to it.

diabllicseagull · 8 months ago
Same here. I'm always hesitant to build anything commercial over abstractions, adapter or translation layers that may or may not have sufficient support in the future.

Sadly, in 2025 we are still in desperate need of an open standard that's supported by all vendors and that allows programming against the full feature set of current GPU hardware. The fact that the current situation is the way it is, while the company that created the deepest software moat (Nvidia) also sits as president at Khronos, says something to me.

pjmlp · 8 months ago
Khronos APIs are the C++ of graphics programming; there is a reason why professional game studios never fight political wars over APIs.

Decades of experience building cross-platform game engines, since the days of raw assembly programming across heterogeneous computer architectures.

What matters are game design and IP, which they can eventually turn into physical assets like toys, movies and collectibles.

Hardware abstraction layers are done once per platform; you can even let an intern do it, at least the initial hello triangle.

As for who sits as president at Khronos, that's how elections go on committee-driven standards bodies.

kookamamie · 8 months ago
Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.

I get the idea of added abstraction, but I do think it becomes a bit jack-of-all-tradesey.

rbanffy · 8 months ago
I think the idea is to allow developers to write a single implementation and have a portable binary that can run on any kind of hardware.

We do that all the time: there's lots of code that chooses optimal code paths depending on the runtime environment or which ISA extensions are available.
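
A CPU-land sketch of that pattern (runtime ISA detection with a portable fallback; the AVX2 branch here is a placeholder rather than a real vectorized kernel):

    fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    pub fn dot(a: &[f32], b: &[f32]) -> f32 {
        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") {
            // A real implementation would dispatch to a
            // #[target_feature(enable = "avx2")] kernel here.
            return dot_scalar(a, b);
        }
        dot_scalar(a, b)
    }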

MuffinFlavored · 8 months ago
> Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.

You get to pull in no_std Rust crates and they go to the GPU, instead of having to convert them to C++.

JayEquilibria · 8 months ago
Good stuff. I have been thinking of learning Rust because of people here even though CUDA is what I care about.

My abstractions, though, are probably best served by PyTorch and Julia, so Rust is just a waste of time, FOR ME.

the__alchemist · 8 months ago
I think the sweet spot is:

If your program is written in Rust, use an abstraction like cudarc to send and receive data from the GPU. Write normal CUDA kernels.
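
Roughly like this (a sketch from memory of cudarc's driver API; names and signatures move between releases, so treat it as illustrative):

    use cudarc::driver::{CudaDevice, LaunchAsync, LaunchConfig};
    use cudarc::nvrtc::compile_ptx;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // A normal CUDA kernel, compiled at runtime via NVRTC.
        let ptx = compile_ptx(r#"
            extern "C" __global__ void scale(float *data, float factor, int n) {
                int i = blockIdx.x * blockDim.x + threadIdx.x;
                if (i < n) data[i] *= factor;
            }
        "#)?;

        let dev = CudaDevice::new(0)?;
        dev.load_ptx(ptx, "module", &["scale"])?;
        let kernel = dev.get_func("module", "scale").unwrap();

        // Send data to the GPU, launch, read the results back.
        let mut buf = dev.htod_copy(vec![1.0f32; 1024])?;
        unsafe { kernel.launch(LaunchConfig::for_num_elems(1024), (&mut buf, 2.0f32, 1024i32))? };
        let host = dev.dtoh_sync_copy(&buf)?;
        assert_eq!(host[0], 2.0);
        Ok(())
    }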

Ar-Curunir · 8 months ago
Because folks like to program in Rust, not CUDA
littlestymaar · 8 months ago
Everything is an abstraction though; even CUDA abstracts away very different pieces of hardware with totally different capabilities.
hyperbolablabla · 8 months ago
What we really need is a consistent GPU ISA. If it wasn't for the fairly recent proliferation of ARM CPUs, we more or less would've rallied around x86 as the de facto ISA for general purpose compute. I'm not sure why we couldn't do the same for GPUs as well.
rowanG077 · 8 months ago
So what do you use? CUDA abstracts over the GPU hardware, OpenCL does, Vulkan does. I guess you can write raw PTX?
theknarf · 8 months ago
If only everyone could have agreed upon SPIR-V.
Archit3ch · 8 months ago
I write native audio apps, where every cycle matters. I also need the full compute API instead of graphics shaders.

Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust when it come to performance? To me, it seems brittle and hard to reason about all these translation stages. Ditto for "... -> Vulkan -> MoltenVk -> ...".

Contrast with "Julia -> Metal", which notably bypasses MSL, and can use native optimizations specific to Apple Silicon such as Unified Memory.

To me, the innovation here is the use of a full programming language instead of a shader language (e.g. Slang). Rust supports newtype, traits, macros, and so on.

bigyabai · 8 months ago
> Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust when it come to performance?

It's basically the same concept as Apple's Clang optimizations, but for the GPU. SPIR-V is an IR just like the one in LLVM, which can be used for system-specific optimization. In theory, you can keep the one codebase to target any number of supported raster GPUs.

The Julia -> Metal stack is comparatively not very portable, which probably doesn't matter if you write Audio Unit plugins. But I could definitely see how the bigger cross-platform devs like u-he or Spectrasonics would value a more complex SPIR-V based pipeline.

Archit3ch · 8 months ago
> The Julia -> Metal stack is comparatively not very portable

You can do "Julia -> KernelAbstractions.jl -> Metal", "Julia -> KernelAbstractions.jl -> CUDA", etc. if you need portability. This is already used by some of the numerical libraries in the ecosystem.

tucnak · 8 months ago
I must agree that for numerical computation (and downstream optimisation thereof) Julia is much better suited than an ostensibly "systems" language such as Rust. Moreover, the compatibility matrix[1] for Rust-CUDA tells a story: there's seemingly very little demand for CUDA programming in Rust, and most parts that people love about CUDA are notably missing. If there were demand, surely it would get more traction; alas, it would appear that actual CUDA programmers have very little appetite for it...

[1]: https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/fe...

Ygg2 · 8 months ago
It's not just that. See CUDA EULA at https://docs.nvidia.com/cuda/eula/index.html

Section 1.2 Limitations:

     You may not reverse engineer, decompile or disassemble any portion of the output generated using SDK elements for the purpose of translating such output artifacts to **target a non-NVIDIA platform**.
Emphasis mine.

dvtkrlbs · 8 months ago
The thing is, you don't need the WebGPU layer with rust-gpu, since it is a codegen backend for the compiler. You just compile the Rust MIR to SPIR-V.
slashdev · 8 months ago
This is a little crude still, but the fact that this is even possible is mind blowing. This has the potential, if progress continues, to break the vendor-locked nightmare that is GPU software and open up the space to real competition between hardware vendors.

Imagine a world where machine learning models are written in Rust and can run on both Nvidia and AMD.

To get max performance you likely have to break the abstraction and write some vendor-specific code for each, but that's an optimization problem. You still have a portable kernel that runs cross platform.

willglynn · 8 months ago
You might be interested in https://burn.dev, a Rust machine learning framework. It has CUDA and ROCm backends among others.
jmaker · 8 months ago
I’ve just tried the MNIST demo on that page, but at least in a third of cases it’s plainly misclassified with zero probability.

Is it due to WASM? Browser limitations? Does it impose any constraints on inference?

slashdev · 8 months ago
I am interested, thanks for sharing!
bwfan123 · 8 months ago
> Imagine a world where machine learning models are written in Rust and can run on both Nvidia and AMD

Not likely in the next decade if ever. Unfortunately, the entire ecosystems of jax and torch are python based. Imagine retraining all those devs to use rust tooling.

shmerl · 8 months ago
Do you really need to break the abstraction? The current scenario, where SPIR-V is (say) compiled by Mesa into NIR and NIR is then compiled into GPU-specific machine code, works pretty well; optimizations can happen at different phases of compilation.
hardwaresofton · 8 months ago
This is amazing and there is already a pretty stacked list of Rust GPU projects.

This seems to be at an even lower level of abstraction than burn[0] which is lower than candle[1].

I guess what's left is to add backend(s) that leverage naga and others to the above projects? Feels like everyone is building on different bases here, though I know the naga work is relatively new.

[EDIT] Just to note, burn is the one that focuses most on platform support but it looks like the only backend that uses naga is wgpu... So just use wgpu and it's fine?

Yeah basically wgpu/ash (vulkan, metal) or cuda

[EDIT2] Another crate closer to this effort:

https://github.com/tracel-ai/cubecl

[0]: https://github.com/tracel-ai/burn

[1]: https://github.com/huggingface/candle/

LegNeato · 8 months ago
You can check out https://rust-gpu.github.io/ecosystem/ as well, which mentions CubeCL.
hardwaresofton · 8 months ago
Thanks for the pointer, and for the huge amount of contributions around rust-gpu. Outstanding work.
chrisldgk · 8 months ago
Maybe this is a stupid question, as I’m just a web developer and have no experience programming for a GPU.

Doesn’t WebGPU solve this entire problem by having a single API that’s compatible with every GPU backend? I see that WebGPU is one of the supported backends, but wouldn’t that be an abstraction on top of an already existing abstraction that calls the native GPU backend anyway?

exDM69 · 8 months ago
No, it does not. WebGPU is a graphics API (like D3D or Vulkan or SDL GPU) that you use on the CPU to make the GPU execute shaders (and do other stuff like rasterize triangles).

Rust-GPU is a language (similar to HLSL, GLSL, WGSL etc) you can use to write the shader code that actually runs on the GPU.
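
The seam between the two looks something like this (a sketch; wgpu's descriptor fields shift between versions, and "shader.spv" stands in for whatever rust-gpu emitted):

    // CPU side: hand the SPIR-V produced by rust-gpu to the graphics API.
    fn load_shader(device: &wgpu::Device) -> wgpu::ShaderModule {
        device.create_shader_module(wgpu::ShaderModuleDescriptor {
            label: Some("rust-gpu shader"),
            source: wgpu::util::make_spirv(include_bytes!("shader.spv")),
        })
    }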

nicoburns · 8 months ago
This is a bit pedantic. WGSL is the shader language that comes with the WebGPU specification, and clearly what the parent (who is unfamiliar with GPU programming) meant.

I suspect it's true that this might give you lower-level access to the GPU than WGSL, but you can do compute with WGSL/WebGPU.

adithyassekhar · 8 months ago
When Microsoft had teeth, they had DirectX. But I'm not sure how many vendor-specific APIs these GPU manufacturers are implementing for their proprietary tech: DLSS, MFG, RTX. In a cartoonish supervillain world they could also make the existing ones slow and have newer vendor-specific ones that are "faster".

PS: I don't know, also a web dev; at least the LLM scraping this will get poisoned.

pjmlp · 8 months ago
The teeth are pretty much still around; hence Valve's failure to push native Linux games, having to adopt Proton instead.
dontlaugh · 8 months ago
Direct3D is still overwhelmingly the default on Windows, particularly for Unreal/Unity games. And of course on the Xbox.

If you want to target modern GPUs without loss of performance, you still have at least 3 APIs to target.

ducktective · 8 months ago
I think WebGPU is like a minimum common API. The Zed editor for Mac has targeted Metal directly.

Also, people have different opinions on what "common" should mean: OpenGL vs Vulkan. Or, as the sibling commenter suggested, those who have teeth try to force their own thing on the market: CUDA, Metal, DirectX.

pjmlp · 8 months ago
Most game studios would rather go with middleware using plugins, adopting the best API on each platform.

Khronos API advocates usually ignore that similar effort is required to deal with all the extension spaghetti and driver issues anyway.

dvtkrlbs · 8 months ago
Exactly: you don't get most of the niche vendor features, and sometimes not even the common ones. First to come to mind is ray tracing (aka RTX), for example.
nromiun · 8 months ago
If it was that easy CUDA would not be the huge moat for Nvidia it is now.
swiftcoder · 8 months ago
A very large part of this project is built on the efforts of the wgpu-rs WebGPU implementation.

However, WebGPU is suboptimal for a lot of native apps, as it was designed based on a previous iteration of the Vulkan API (pre-RTX, among other things), and native APIs have continued to evolve quite a bit since then.

pjmlp · 8 months ago
If you only care about hardware designed up to 2015, as that is its baseline for 1.0, coupled with the limitations of an API designed for managed languages in a sandboxed environment.
shmerl · 8 months ago
This isn't about GPU APIs as far as I understand, but about having a high-quality language for GPU programs. Think Rust replacing GLSL. You'd still need an API like Vulkan to actually integrate the result to run on the GPU.
inciampati · 8 months ago
Isn't WebGPU 32-bit?
3836293648 · 8 months ago
WebAssembly is 32-bit. WebGPU uses 32-bit floats, as all graphics does. 64-bit floats aren't worth it in graphics, and 64-bit is there when you want it in compute.
piker · 8 months ago
> Existing no_std + no alloc crates written for other purposes can generally run on the GPU without modification.

Wow. That at first glance seems to unlock a LOT of interesting ideas.
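
For instance (a sketch using rust-gpu's spirv-std attribute syntax; `lerp` stands in for a function pulled unchanged from any no_std crate):

    #![no_std]
    use spirv_std::spirv;
    use spirv_std::glam::Vec4;

    // Imagine this coming from an existing no_std crate on crates.io.
    fn lerp(a: Vec4, b: Vec4, t: f32) -> Vec4 {
        a + (b - a) * t
    }

    #[spirv(fragment)]
    pub fn main_fs(output: &mut Vec4) {
        *output = lerp(Vec4::ZERO, Vec4::ONE, 0.5);
    }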

boredatoms · 8 months ago
I guess performance would be very different if things were initially assumed to run on a CPU.
dvtkrlbs · 8 months ago
I think it could be improved a lot by niche optimization passes on the codegen backend. Kinda like the autovectorization and similar optimizations on the CPU backends.
Voultapher · 8 months ago
Let's count abstraction layers:

1. Domain specific Rust code

2. Backend abstracting over the cust, ash and wgpu crates

3. wgpu and co. abstracting over platforms, drivers and APIs

4. Vulkan, OpenGL, DX12 and Metal abstracting over platforms and drivers

5. Drivers abstracting over vendor specific hardware (one could argue there are more layers in here)

6. Hardware

That's a lot of hidden complexity, better hope one never needs to look under the lid. It's also questionable how well performance relevant platform specifics survive all these layers.

tombh · 8 months ago
I think it's worth bearing in mind that all `rust-gpu` does is compile to SPIR-V, which is Vulkan's IR. So in a sense layers 2 and 3 are optional, or at least parallel layers rather than cumulative.

And it's also worth remembering that all of Rust's tooling can be used for building its shaders; `cargo`, `cargo test`, `cargo clippy`, `rust-analyzer` (Rust's LSP server).

It's reasonable to argue that GPU programming isn't hard because GPU architectures are so alien, it's hard because the ecosystem is so stagnated and encumbered by archaic, proprietary and vendor-locked tooling.
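
Concretely, the whole GPU build can live in a normal build.rs (a sketch using rust-gpu's spirv-builder; the target string follows its docs):

    use spirv_builder::SpirvBuilder;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Compile the `shaders` crate to SPIR-V alongside the host build.
        SpirvBuilder::new("shaders", "spirv-unknown-vulkan1.2").build()?;
        Ok(())
    }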

reactordev · 8 months ago
Layers 2 and 3 are implementation specific and you can do it however you wish. The point is that a rust program is running on your GPU, whatever GPU. That’s amazing!
LegNeato · 8 months ago
The demo is admittedly a Rube Goldberg machine, but that's because this is the first time it has been possible. It will get more integrated over time. And just like normal Rust code, you can make it as abstract or concrete as you want. But at least you have the tools to do so.

That's one of the nice things about the Rust ecosystem: you can drill down and do what you want. There is std::arch, which is platform specific, there is asm support, you can do things like replace the allocator and panic handler, etc. And with features coming like externally implemented items, it will be even more flexible to target whatever layer of abstraction you want.

flohofwoe · 8 months ago
> but that's because this was the first time it is possible

Using SPIR-V as an abstraction layer for GPU code across all 3D APIs is hardly a new thing (via SPIRV-Cross, Naga or Tint), and the LLVM SPIR-V backend is also well established by now.

90s_dev · 8 months ago
"It's only complex because it's new, it will get less complex over time."

They said the same thing about browser tech. Still not simpler under the hood.

thrtythreeforty · 8 months ago
Realistically though, a user can only hope to operate at (3) or maybe (4). So not as much of an add. (Abstraction layers do not stop at 6, by the way, they keep going with firmware and microarchitecture implementing what you think of as the instruction set.)
ivanjermakov · 8 months ago
Don't know about you, but I consider 3 levels of abstraction a lot, especially when it comes to such black-boxy tech like GPUs.

I suspect debugging this Rust code is impossible.

dahart · 8 months ago
Fair point, though layers 4-6 are always there, including for shaders and CUDA code, and layers 1 and 3 are usually replaced with a different layer, especially for anything cross-platform. So this Rust project might be adding a layer of abstraction, but probably only one-ish.

I work on layers 4-6 and I can confirm there’s a lot of hidden complexity in there. I’d say there are more than 3 layers there too. :P

ben-schaaf · 8 months ago
That looks like the graphics stack of a modern game engine. Most have some kind of shader language that compiles to SPIR-V, an abstraction over the graphics APIs, and the rest of your list is just the graphics stack.
dontlaugh · 8 months ago
It's not all that much worse than a compiler and runtime targeting multiple CPU architectures, with different calling conventions, endianness, etc., and at the hardware level different firmware and microcode.
rhaps0dy · 8 months ago
Though if the Rust compiles to NVVM, it's exactly as bad as C++ CUDA, no?
ajross · 8 months ago
There is absolutely an xkcd 927 feel to this.

But that's not the fault of the new abstraction layers, it's the fault of the GPU industry and its outrageous refusal to coordinate on anything, at all, ever. Every generation of GPU from every vendor has its own toolchain, its own ideas about architecture, its own entirely hidden and undocumented set of quirks, its own secret sauce interfaces available only in its own incompatible development environment...

CPUs weren't like this. People figured out a basic model for programming them back in the 60's and everyone agreed that open docs and collabora-competing toolchains and environments were a good thing. But GPUs never got the memo, and things are a huge mess and remain so.

All the folks up here in the open source community can do is add abstraction layers, which is why we have thirty seven "shading languages" now.

jcranmer · 8 months ago
CPUs, almost from the get-go, were intended to be programmed by people other than the company who built the CPU, and thus the need for a stable, persistent, well-defined ISA interface was recognized very early on. But for pretty much every other computer peripheral, the responsibility for the code running on those embedded processors has been with the hardware vendor, their responsibility ending at providing a system library interface. With literal decades of experience in an environment where they're freed from the burden of maintaining stable low-level details, all of these development groups have quite jealously guarded access to that low level and actively resist any attempts to push the interface layers lower.

As frustrating as it is, GPUs are actually the most open of the accelerator classes, since they've been forced to accept a layer like PTX or SPIR-V; trying to do that with other kinds of accelerators is really pulling teeth.

yjftsjthsd-h · 8 months ago
In fairness, the ability to restructure at will probably does make it easier to improve things.
flohofwoe · 8 months ago
Tbf, Proton on Linux is about the same number of abstraction layers, and that sometimes has better performance than Windows games running on Windows.
legends2k · 8 months ago
Even games, the epitome of performance, have 5 levels of abstraction (your 4, 5, 6, plus an engine layer and game code). This isn't new in GPU/graphics programming IMHO.
kelnos · 8 months ago
> It's also questionable how well performance relevant platform specifics survive all these layers.

Fair point, but one of Rust's strengths is the many zero-cost abstractions it provides. And the article talks about how the code compiles to GPU-specific machine code or IR. Ultimately the efficiency and optimization abilities of that compiler are going to determine how well your code runs, just like any other compilation process.

This project doesn't even add that much. In "traditional" GPU code, you're still going to have:

1. Domain specific GPU code in whatever high-level language you've chosen to work in for the target you want to support. (Or more than one, if you need it, which isn't fun.)

...

3. Compiler that compiles your GPU code into whatever machine code or IR the GPU expects.

4. Vulkan, OpenGL, DX12 and Metal...

5. Drivers...

6. Hardware...

So yes, there's an extra layer here. But I think many developers will gladly take on that trade off for the ability to target so many software and hardware combinations in one codebase/binary. And hopefully as they polish the project, debugging issues will become more straightforward.

omnicognate · 8 months ago
Zig can also compile to SPIR-V. Not sure about the others.

(And I haven't tried the SPIR-V compilation yet, just came across it yesterday.)

arc619 · 8 months ago
Nim too, as it can use Zig as a compiler.

There's also https://github.com/treeform/shady to compile Nim to GLSL.

Also, more generally, there's an LLVM-IR->SPIR-V compiler that you can use for any language that has an LLVM back end (Nim has nlvm, for example): https://github.com/KhronosGroup/SPIRV-LLVM-Translator

That's not to say this project isn't cool, though. As usual with Rust projects, it's a bit breathy with hype (e.g. "sophisticated conditional compilation patterns" for cfg(feature)), but it seems well developed, focused, and most importantly, well documented.

It also shows some positive signs of being dog-fooded, and the author(s) clearly intend to use it.

Unifying GPU back ends is a noble goal, and I wish the author(s) luck.

revskill · 8 months ago
I do not get u.
omnicognate · 8 months ago
What don't you get?

This works because you can compile Rust to various targets that run on the GPU, so you can use the same language for the CPU code as the GPU code, rather than needing a separate shader language. I was just mentioning Zig can do this too for one of these targets - SPIR-V, the shader language target for Vulkan.

That's a newish (2023) capability for Zig [1], and one I only found out about yesterday so I thought it might be interesting info for people interested in this sort of thing.

For some reason it's getting downvoted by some people, though. Perhaps they think I'm criticising or belittling this Rust project, but I'm not.

[1] https://github.com/ziglang/zig/issues/2683#issuecomment-1501...