NVIDIA Transitions Fully Towards Open-Source Linux GPU Kernel Modules

There is little point in NVIDIA open-sourcing only the driver portion for their cards, since they rely heavily on proprietary firmware and userspace libraries (most importantly!) to do the real work. Firmware is a relatively small issue: the situation is mostly the same for AMD and Intel, since encapsulation reduces the work done on the driver side, and open-sourcing firmware could allow people to make some really unanticipated modifications that might seriously threaten even commercial card sales. Nonetheless, AMD at least still keeps a fair share of the work in the driver compared to NVIDIA. The userspace library is the worst problem, since it handles a lot of GPU-control functionality and the graphics APIs, and it is still kept closed-source.

The best we can hope for is that improvements to NVK and Red Hat's Nova driver put pressure on NVIDIA to release their userspace components.

gpderetta · 2 years ago

It is meaningful because, as you note, it enables a fully opensource userspace driver. Of course the firmware is still proprietary and it increasingly contains more and more logic.

sscarduzio · 2 years ago

Which in a way is good because the hardware will more and more perform identically on Linux as on Windows.

matheusmoreira · 2 years ago

Doesn't seem like a bad tradeoff so long as the proprietary stuff is kept completely isolated with no access to any other parts of my system.

bayindirh · 2 years ago

The GLX libraries are the elephant(s) in the room. Open-source kernel modules mean nothing without these libraries. On the other hand, AMD and Intel use "platform GLX" natively, and with great success.

pabs3 · 2 years ago

The firmware is also signed, so you can't even do reverse engineering to replace it.

paulmd · 2 years ago

the open kernel driver also fundamentally breaks the limitation about GeForce GPUs not being licensed for use in the datacenter. that provision is a driver provision, and CUDA does not follow the same license as the driver... really the only significant limitation is that you aren't allowed to use the CUDA toolkit to develop for non-NVIDIA hardware, plus some license notice requirements if you redistribute the sample projects or other sample source code. and yeah, they paid to develop it, it's proprietary source code, that's reasonable overall.

https://docs.nvidia.com/cuda/eula/index.html

ctrl-f "datacenter": none

so yeah, I'm not sure where the assertions of "no progress" and "nothing meaningful" and "this changes nothing" come from, other than pure fanboyism/anti-fans. before, you couldn't write a libre CUDA userland even if you wanted to; the kernel side wasn't there. And now you can, and this allows reclocking of supported gpus even with nouveau-style libre userlands. Which of course don't grow on trees, but it's still progress.

honestly it's kinda embarrassing that grown-ass adults are still getting their positions from what is functionally just some sick burn in a 2012 viral video or whatever, to the extent they actively oppose the company moving in the direction of libre software at all. but I think with the "linus torvalds" citers, you just can't reason those people out of a position that they didn't reason themselves into. Not only is it an emotionally-driven (and fanboy-driven) mindset, but it's literally not even their own position to begin with, it's just something they're absorbing from youtube via osmosis.

Apple debates and NVIDIA debates always come down to the anti-fans bringing down the discourse. It's honestly sad. https://paulgraham.com/fh.html

it also generally speaks to the long-term success and intellectual victory of the GPL/FSF that people see proprietary software as somehow inherently bad and illegitimate... even when source is available, in some cases. Like CUDA's toolchain and libraries/ecosystem is pretty much the ideal example of a company paying to develop a solution that would not otherwise have been developed, in a market that was (at the time) not really interested until NVIDIA went ahead and proved the value. You don't get to retcon every single successful software project as being retroactively open-source just because you really really want to run it on a competitor's hardware. But people now have this mindset that if it's not libre then it's somehow illegitimate.

Again, most CUDA stuff is distributed as source, if you want to modify and extend it you can do so, subject to the terms of the CUDA license... and that's not good enough either.

AshamedCaptain · 2 years ago

I really don't know where this crap about "Moving everything to the firmware" is coming from. The kernel part of the nvidia driver has always been small, and this is the only thing they are open-sourcing (they have been announcing it for months now......). The immense majority of the user-space driver is still closed and no one has seen any indications that this may change.

I see no indications either that nvidia or any of the other manufacturers has moved any respectable amount of functionality to the firmware. If you look at the open-source drivers you can even confirm for yourself that the firmware does practically nothing: the binary blobs for AMD cards are minuscule, for example, and the days of ATOMBIOS are long gone. The drivers are literally generating bytecode-level binaries for the shader units in the GPU; what do you expect the firmware could even do at this point? Re-optimize the compiler output?

There was an example of a GPU that did move everything to the firmware -- the videocore on the raspberry pi, and it was clearly a completely distinct paradigm, as the "driver" would almost literally pass through OpenGL calls to a mailbox, read by the secondary ARM core (more powerful than the main ARM core!) that was basically running the actual driver as "firmware". Nothing I see on nvidia indicates a similar trend, otherwise RE-ing it would be trivial, as happened with the VC.
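To make the contrast concrete, here is a toy Python model of the "driver as a mailbox" paradigm described above: the host-side "driver" only forwards API calls into a queue, and a worker thread standing in for the secondary core does the actual work. Every name here (`firmware_core`, `thin_host_driver`, the call tuples) is hypothetical; this is not how the real VideoCore mailbox interface is structured, just an illustration of where the logic lives.

```python
import queue
import threading

def firmware_core(mailbox: queue.Queue, executed: list) -> None:
    """Stands in for the secondary core that runs the actual driver as "firmware"."""
    while True:
        call = mailbox.get()
        if call is None:  # sentinel value: shut the worker down
            break
        name, args = call
        # State tracking, shader compilation, etc. would happen here, on the other core.
        executed.append((name, args))

def thin_host_driver(mailbox: queue.Queue, name: str, *args: int) -> None:
    """The host-side "driver": it passes the API call through almost untouched."""
    mailbox.put((name, args))

mailbox: queue.Queue = queue.Queue()
executed: list = []
worker = threading.Thread(target=firmware_core, args=(mailbox, executed))
worker.start()
thin_host_driver(mailbox, "glClear", 0x4000)
thin_host_driver(mailbox, "glDrawArrays", 4, 0, 3)
mailbox.put(None)
worker.join()
print(executed)  # [('glClear', (16384,)), ('glDrawArrays', (4, 0, 3))]
```

When almost nothing happens on the host side, every interesting call crosses one observable channel, which is why that design was comparatively easy to reverse engineer.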

ploxiln · 2 years ago

https://lwn.net/Articles/953144/

> Recently, though, the company has rearchitected its products, adding a large RISC-V processor (the GPU system processor, or GSP) and moving much of the functionality once handled by drivers into the GSP firmware. The company allows that firmware to be used by Linux and shipped by distributors. This arrangement brings a number of advantages; for example, it is now possible for the kernel to do reclocking of NVIDIA GPUs, running them at full speed just like the proprietary drivers can. It is, he said, a big improvement over the Nouveau-only firmware that was provided previously.

> There are a number of disadvantages too, though. The firmware provides no stable ABI, and a lot of the calls it provides are not documented. The firmware files themselves are large, in the range of 20-30MB, and two of them are required for any given device. That significantly bloats a system's /boot directory and initramfs image (which must provide every version of the firmware that the kernel might need), and forces the Nouveau developers to be strict and careful about picking up firmware updates.
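As a rough back-of-the-envelope check of the /boot and initramfs bloat described in the quote, here is a small Python helper using the quoted figures (20-30 MB per file, two files per device). The number of firmware versions the initramfs has to carry is an assumption you supply, and the function name is hypothetical.

```python
def gsp_firmware_footprint_mb(versions: int,
                              files_per_version: int = 2,
                              mb_per_file: tuple = (20.0, 30.0)) -> tuple:
    """Estimate (low, high) megabytes of firmware an initramfs must carry.

    Defaults follow the LWN figures: two files per device, 20-30 MB each.
    `versions` is how many firmware versions the kernel might need.
    """
    low, high = mb_per_file
    total_files = versions * files_per_version
    return (total_files * low, total_files * high)

# Hypothetical example: an initramfs that must cover three firmware versions.
print(gsp_firmware_footprint_mb(3))  # (120.0, 180.0)
```

Even a single version adds 40-60 MB, which is why picking up firmware updates has to be done carefully.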

phendrenad2 · 2 years ago

There IS meaning because this makes it easier to install Nvidia drivers. At least, it reduces the number of failure modes. Now the open-source component can be managed by the kernel team, while the closed-source portion can be changed as needed, not dictated by kernel API changes.

matheusmoreira · 2 years ago

Why is the user space component required? Won't they provide sysfs interfaces to control the hardware?

cesarb · 2 years ago

It's something common to all modern GPUs, not just NVIDIA: most of the logic is in a user space library loaded by the OpenGL or Vulkan loader into each program. That library writes a stream of commands into a buffer (plus all the necessary data) directly into memory accessible to the GPU, and there's a single system call at the end to ask the operating system kernel to tell the GPU to start reading from that command buffer. That is, other than memory allocation and a few other privileged operations, the user space programs talk directly to the GPU.
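The flow described above (user space encodes commands into GPU-visible memory, then makes one privileged call to start execution) can be sketched as a toy Python model. The opcodes, the 32-bit little-endian encoding, and `FakeKernel.submit` are hypothetical stand-ins; real GPUs each have their own command formats, and real submission goes through a driver-specific ioctl.

```python
import struct

class CommandBuffer:
    """Toy model of a GPU-visible buffer that the user-space driver fills directly."""
    # Hypothetical opcodes; every real GPU has its own command encoding.
    OPCODES = {"SET_STATE": 1, "BIND_BUFFER": 2, "DRAW": 3}

    def __init__(self) -> None:
        self.data = bytearray()  # stands in for memory mapped into both CPU and GPU

    def emit(self, op: str, *args: int) -> None:
        # Each command: 32-bit opcode, 32-bit argument count, then 32-bit arguments.
        self.data += struct.pack(f"<II{len(args)}I", self.OPCODES[op], len(args), *args)

def decode(raw: bytes) -> list:
    """What the GPU's command processor does: walk the stream command by command."""
    out, off = [], 0
    while off < len(raw):
        op, n = struct.unpack_from("<II", raw, off)
        off += 8
        args = struct.unpack_from(f"<{n}I", raw, off)
        off += 4 * n
        out.append((op, args))
    return out

class FakeKernel:
    """Stands in for the kernel driver: the only privileged step is submission."""
    def __init__(self) -> None:
        self.syscalls = 0

    def submit(self, buf: CommandBuffer) -> list:
        self.syscalls += 1               # the single system call at the end
        return decode(bytes(buf.data))   # the "GPU" starts reading the buffer

cmds = CommandBuffer()
cmds.emit("SET_STATE", 7, 1)
cmds.emit("BIND_BUFFER", 0x1000)
cmds.emit("DRAW", 3, 1)
kernel = FakeKernel()
executed = kernel.submit(cmds)
print(kernel.syscalls, executed)  # 1 [(1, (7, 1)), (2, (4096,)), (3, (3, 1))]
```

The three `emit` calls never enter the kernel; only `submit` does, which is the sense in which user space talks to the GPU directly.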

An "OS-agnostic" component: this is the component of each kernel module that is independent of operating system. A "kernel interface layer": this is the component of each kernel module that is specific to the Linux kernel version and configuration.

How is the NVIDIA driver situation on Linux these days? I built a new desktop with an AMD GPU since I didn't want to deal with all the weirdness of closed source or lacking/obsolete open source drivers.

jcranmer · 2 years ago

I built my new-ish computer with an AMD GPU because I trusted in-kernel drivers better than out-of-kernel DKMS drivers.

That said, my previous experience with the DKMS driver stuff hasn't been bad. If you use Nvidia's proprietary driver stack, then things should generally be fine. The worst issues are that Nvidia has (historically, at least; it might be different for newer cards) refused to implement some graphics features that everybody else uses, which means that you basically need entirely separate codepaths for Nvidia in window managers, and some of them have basically said "fuck no" to doing that.

mepian · 2 years ago

The current stable proprietary driver is a nightmare on Wayland with my 3070, constant flickering and stuttering everywhere. Apparently the upcoming version 555 is much better, I'm sticking with X11 until it comes out. I never tried the open-source one yet, not sure if it supports my GPU at all.

bcrescimanno · 2 years ago

The 555 version is the current version. It was officially released on June 27.

https://www.phoronix.com/news/NVIDIA-555.58-Linux-Driver

gmokki · 2 years ago

I switched to Wayland 10 years ago when it became an option on Fedora. The first thing I had to do was drop NVIDIA and switch to an Intel GPU, and for the past 5 years an AMD GPU. It makes a big difference if the hardware is supported by the upstream kernel.

Maybe NVIDIA drivers have kind of worked on the 12-month-old kernels that Ubuntu uses on average.

misterbishop · 2 years ago

this is resolved in 555 (currently running 555.58.02). my asus zephyrus g15 w/ 3060 is looking real good on Fedora 40. there's still optimizations needed around clocking, power, and thermals. but the graphics presentation layer has no issues on wayland. that's with hybrid/optimus/prime switching, which has NEVER worked seamlessly for me on any laptop on linux going back to 2010. gnome window animations remain snappy and not glitchy while running a game. i'm getting 60fps+ running baldurs gate 3 @ 1440p on the low preset.

llmblockchain · 2 years ago

I have a 3070 on X and it has been great.

anon291 · 2 years ago

I've literally never had an issue in decades of using NVIDIA and linux. They're closed source, but the drivers work very consistently for me. NVIDIA's just the only option if you want something actually good and to run ML workloads as well.

sqeaky · 2 years ago

> but the drivers work very consistently for me

The problem with comments like this is that you never know if you will be me or you on your graphics card or laptop.

I have tried nvidia a few times and kept getting burnt. AMD just works. I don't get the fastest ML machine, but I am just a tinkerer there and OpenCL works fine for my little toy apps and my 7900XTX blazes through every wine game.

If you need it professionally then you need it, warts and all. For any casual user, that 10% extra gaming performance needs to be weighed against reliability.

pizza234 · 2 years ago

Up to a couple of years ago, before permanently moving to AMD GPUs, I couldn't even boot Ubuntu with an Nvidia GPU. This was because Ubuntu booted by default with Nouveau, which didn't support a few/several series (I had at least two different series).

The cards worked fine with binary drivers once the system was installed, but AFAIR, I had to integrate the binary driver packages in the Ubuntu ISO in order to boot.

I presume that the situation is much better now, but needing binary drivers can be a problem in itself.

resoluteteeth · 2 years ago

Are you using wayland or are you still on x11? My experience was that the closed source drivers were fine with x11 but a nightmare with wayland.

bobajeff · 2 years ago

I did, when my card stopped being supported by all the distros because it was too old, and the legacy driver didn't work quite the same.

Keyframe · 2 years ago

Me too. Now I have a laptop with a discrete nvidia GPU and an eGPU with a 3090 in it, a desktop with a 4090, and another laptop with another discrete nvidia GPU. All switching combinations work, acceleration works, and game performance is on par with Windows (even with Proton, to within a small percentage, or sometimes even better). All out of the box with stock Ubuntu and the driver installed from the Nvidia site.

The only "trick" is that I'm still on X11 and will probably stay. Note that I did try Wayland on a few occasions, but I steered away (mostly due to other issues with it at the time).

isatty · 2 years ago

Likewise. Rock solid for decades in intel + nvidia proprietary drivers even when doing things like hot plugging for passthroughs.

l33tman · 2 years ago

Same here, been using the nvidia binary drivers on a dozen computers with various other HW and distros for decades with never any problems whatsoever.

drdaeman · 2 years ago

3090 owner here.

Wayland is an even worse mess than it normally is. It used to flicker really badly before 555.58.02, less so with the latest driver, but it still has some glitches with games. A bunch of older Electron apps still fail to render anything and require hardware acceleration to be disabled. I gave up trying to make it all work: I can't get rid of all the flicker and drawing issues, plus Wayland seems to be a real pain in the ass with HiDPI displays.

X11 sort of works, but I had to entirely disable DPMS or one of my monitors never comes back online after going to sleep. I thought it was my KVM messing up, but that happened even with a direct connection... no idea what's going on there.

CUDA works fine, save for the regular version compatibility hiccups.

senectus1 · 2 years ago

4070ti super here, X11 is fine, i have zero issues.

Wayland is mostly fine, though i get some window frame glitches when maximizing windows to the monitor, and another issue that i'm pretty sure is Wayland: it has only happened a couple of times, and it locks the whole device up. I can't prove it yet.

adrian_b · 2 years ago

I am not using Wayland and I do not have any intention to use it, therefore I do not care for any problems caused by Wayland not supporting NVIDIA and demanding that NVIDIA must support Wayland.

I am using only Linux or FreeBSD on all my laptop, desktop or server computers.

On desktop and server computers I did not ever have the slightest difficulty with the NVIDIA proprietary drivers, either for OpenGL or for CUDA applications or for video decoding/encoding or for multiple monitor support, with high resolution and high color depth, on either Gentoo/Funtoo Linux or FreeBSD, during the last two decades. I also have AMD GPUs, which I use for compute applications (because they are older models, which still had FP64 support). For graphics applications they frequently had annoying bugs, unlike NVIDIA (however my AMD GPUs have been older models, preceding RDNA, which might be better supported by the open-source AMD drivers).

The only computers on which I had problems with NVIDIA on Linux were those laptops that used the NVIDIA Optimus method of coexistence with the Intel integrated GPUs. Many years ago I needed a couple of days to properly configure the drivers and additional software so that the NVIDIA GPU was selected when desired, instead of the Intel iGPU. I do not know if any laptops with NVIDIA Optimus still exist. The laptops that I bought later had video outputs directly from the NVIDIA GPU, so there was no difference between them and desktops, and the NVIDIA drivers worked flawlessly.

Both on Gentoo/Funtoo Linux and FreeBSD I never had to do anything else but to give the driver update command and everything worked fine. Moreover, NVIDIA has always provided a nice GUI application "NVIDIA X Server Settings", which provides a lot of useful information and which makes very easy any configuration tasks, like setting the desired positions of multiple monitors. A few years ago there was nothing equivalent for the AMD or Intel GPU drivers, but that might have changed meanwhile.
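For the hybrid Optimus laptops mentioned above, current NVIDIA drivers offer PRIME render offload, where an application is sent to the NVIDIA GPU per launch via environment variables documented in NVIDIA's driver README (`__NV_PRIME_RENDER_OFFLOAD=1` and `__GLX_VENDOR_LIBRARY_NAME=nvidia` for GLX). A minimal Python launcher, assuming a driver and X server recent enough to support render offload; the helper names are hypothetical:

```python
import os
import subprocess
from typing import Mapping, Optional

def prime_offload_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and add the variables that request NVIDIA render offload."""
    env = dict(os.environ if base is None else base)
    env["__NV_PRIME_RENDER_OFFLOAD"] = "1"       # ask for rendering on the NVIDIA GPU
    env["__GLX_VENDOR_LIBRARY_NAME"] = "nvidia"  # make GLVND load NVIDIA's GLX library
    return env

def run_on_nvidia(cmd: list) -> int:
    """Launch `cmd` with render offload requested and return its exit status."""
    return subprocess.run(cmd, env=prime_offload_env()).returncode

env = prime_offload_env({"PATH": "/usr/bin"})
print(sorted(env))  # ['PATH', '__GLX_VENDOR_LIBRARY_NAME', '__NV_PRIME_RENDER_OFFLOAD']
```

If offload is working, `run_on_nvidia(["glxinfo", "-B"])` should report NVIDIA as the OpenGL vendor.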

tadasv · 2 years ago

great. rtx 4090 works out of the box after installing drivers from non-free. That's on debian bookworm.

littlecranky67 · 2 years ago

I got my nvidia 1060 back during the crypto crisis, when the prices of AMD GPUs were inflated due to miners. Hesitant and skeptical about Linux support, I have upgraded the same machine with that GPU since 2016 from Ubuntu 14.04 to 18.04 and now 24.04, without any nvidia driver issues whatsoever. When I read about issues with nvidia's drivers, it is mostly people on rare distros or rolling-release ones, which change kernel versions very frequently, where the binary driver then fails to recompile. For LTS distros you will likely have no issues.

jppittma · 2 years ago

4070 worked out of the box on my arch system. I used the closed source drivers and X11 and I've not encountered a single problem.

My prediction is that it will continue to improve if only because people want to run nvidia on workstations.

tgsovlerkhgsel · 2 years ago

My experience with an AMD iGPU on Linux was so bad that my next laptop will be Intel. Horrible instability to the point where I could reliably crash my machine by using Google Maps for a few minutes, on both Chrome and Firefox. It got fixed eventually - with the next Ubuntu release, so I had a computer where I was afraid to use anything with WebGL for half a year.

mathfailure · 2 years ago

Depends on the driver version: version 550 results in a black screen (you have to kill and restart the X server) after waking up from sleep. Version 535 doesn't have this bug. Don't know about 555.

Also tearing is a bitch. Still. Even with ForceCompositionPipeline.
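For reference, the ForceCompositionPipeline setting mentioned above is usually applied at runtime through `nvidia-settings` by assigning a metamode. A small Python helper that builds that command line, assuming a single display driven by `nvidia-auto-select` at offset `+0+0` (multi-monitor setups need one entry per display); the helper name is hypothetical:

```python
import shlex

def force_composition_cmd(metamode: str = "nvidia-auto-select +0+0") -> list:
    """Build the nvidia-settings call that enables ForceCompositionPipeline for `metamode`."""
    assignment = f"CurrentMetaMode={metamode} {{ ForceCompositionPipeline = On }}"
    return ["nvidia-settings", "--assign", assignment]

cmd = force_composition_cmd()
print(shlex.join(cmd))
# nvidia-settings --assign 'CurrentMetaMode=nvidia-auto-select +0+0 { ForceCompositionPipeline = On }'
```

Passing the list to `subprocess.run(cmd, check=True)` applies it to the running X server; it typically does not persist across X restarts unless the same metamode is also written into the X configuration.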

art0rz · 2 years ago

I've been running Arch with KDE under Wayland on two different laptops both with NVIDIA GPUs using proprietary drivers for years and have not run into issues. Maybe I'm lucky? It's been flawless for me.

lyu07282 · 2 years ago

The experiences always vary quite a lot, it depends so much on what you do with it. For example discord doesn't support screen sharing with Wayland, it's just one small example but those can add up over time. Another example is display rotation which was broken in kde for a long time (recently fixed).

DaoVeles · 2 years ago

I have never had an issue with them. That said, I typically go mid-range on cards, so they are usually on an architecture that has been hardened by a year or two in the high end.

devwastaken · 2 years ago

KDE plasma 6 + Nvidia beta 555 works well. Have to make .desktop files to launch some applications explicitly Wayland.
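As an illustration of the .desktop approach, here is a Python sketch that writes a launcher forcing native Wayland. It assumes the application is Chromium- or Electron-based, since those accept `--ozone-platform=wayland`; other toolkits use different switches or environment variables. The function names and the `code` binary in the example are placeholders.

```python
from pathlib import Path

def wayland_desktop_entry(name: str, binary: str) -> str:
    """Return the text of a .desktop launcher that starts `binary` as a native Wayland client."""
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name} (Wayland)",
        # Chromium/Electron flag selecting the Wayland backend instead of XWayland.
        f"Exec={binary} --ozone-platform=wayland %U",
    ]
    return "\n".join(lines) + "\n"

def install_entry(file_stem: str, text: str, apps_dir: Path) -> Path:
    """Write the launcher into `apps_dir` (per-user launchers usually live in ~/.local/share/applications)."""
    apps_dir.mkdir(parents=True, exist_ok=True)
    path = apps_dir / f"{file_stem}.desktop"
    path.write_text(text)
    return path

entry = wayland_desktop_entry("Code", "code")
print(entry)
```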

green-salt · 2 years ago

Whatever pop_os uses has been quite stable for my 4070.

tormeh · 2 years ago

Pop uses X by default because of Nvidia.

segmondy · 2 years ago

plug, install then play, I got 3 different Nvidia GPU sets and all running without any issue, nothing crazy to do but follow installation instructions.

anonym29 · 2 years ago

To some of us, running any closed source software in userland qualifies as quite crazy indeed.

shanoaice · 2 years ago

bradyriddle · 2 years ago

I remember Nvidia getting hacked pretty bad a few years ago. IIRC, the hackers threatened to release everything they had unless they open sourced their drivers. Maybe they got what they wanted.

[0] https://portswigger.net/daily-swig/nvidia-hackers-allegedly-...

justinclift · 2 years ago

For Nvidia, the most likely reason they've strongly avoided Open Sourcing their drivers isn't anything like that.

It's simply a function of their history. They used to have high priced professional level graphics cards ("Nvidia Quadro") using exactly the same chips as their consumer graphics cards.

The BIOS of the cards was different, enabling different features. So people wanting those features cheaply would buy the consumer graphics cards and flash the matching Quadro BIOS to them. Worked perfectly fine.

Nvidia naturally wasn't happy about those "lost sales", so began a game of whack-a-mole to stop BIOS flashing from working. They did stuff like adding resistors to the boards to tell the card whether it was a Geforce or Quadro card, and when that was promptly reverse engineered they started getting creative in other ways.

Meanwhile, they couldn't really Open Source their drivers because then people could see what the "Geforce vs Quadro" software checks were. That would open up software countermeasures being developed.

---

In the most recent few years the professional cards and gaming cards now use different chips. So the BIOS tricks are no longer relevant.

Which means Nvidia can "safely" Open Source their drivers now, and they've begun doing so.

Note that this is a copy of my comment from several months ago, as it's just as relevant now as it was then: https://news.ycombinator.com/item?id=38418278

SuperNinKenDo · 2 years ago

Very interesting, thanks for the perspective. I suspect that the loss of face they recently suffered during the transition to Wayland, which happened around the time this motivation evaporated, also plays a part, though.

I swore off ever again buying Nvidia, or any laptops that come with Nvidia, after all this. Maybe in 10 years they'll have managed to right the brand perceptions of people like myself.

1oooqooq · 2 years ago

interesting timing to recall that story. now the same trick is used for h100 vs whatever the throttled-for-embargo-wink-wink Chinese version is called.

but those companies are really averse to open sourcing because they can't be sure they own all the code. it's decades of copy pasting reference implementations after all

CamperBob2 · 2 years ago

The explanation could also be as simple as fear of patent trolls.

dralley · 2 years ago

I doubt it. It's probably a matter of constantly being prodded by their industry partners (i.e. Red Hat), constantly being shamed by the community, and reducing the amount of maintenance they need to do to keep their driver stack updated and working on new kernels.

The meat of the drivers is still proprietary, this just allows them to be loaded without a proprietary kernel module.

chillfox · 2 years ago

Nvidia has historically given zero fucks about the opinions of their partners.

So my guess is it's to do with LLMs. They are all in on AI, and having more of their code be part of training sets could make tools like ChatGPT/Claude/Copilot better at generating code for Nvidia GPUs.

p_l · 2 years ago

I suspect it's mainly the reduced maintenance and reduction of workload needed to support, especially with more platforms coming to be supported (not so long ago there was no ARM64 nvidia support, now they are shipping their own ARM64 servers!)

What really changed the situation is that Turing-architecture GPUs bring a new, more powerful management CPU, which has enough capacity to essentially run the OS-agnostic parts of the driver that used to be provided as a blob on Linux.

kabes · 2 years ago

It's hard to believe one of the highest valued companies in the world cares about being shamed for not having open source drivers.

nicce · 2 years ago

Kernel modules are not the user-space drivers, which are still proprietary.

Ooops. Missed that part.

Re-reading that story is kind of wild. I don't know how valuable what they allegedly got would be (silicon, graphics and chipset files) but the hackers accused Nvidia of 'hacking back' and encrypting their data.

Reminds me of a story I heard about Nvidia hiring a private military to guard their cards after entire shipments started getting 'lost' somewhere in asia.

porphyra · 2 years ago

Much of the black magic has been moved from the drivers to the firmware anyway.

nicman23 · 2 years ago

they did release it. a magic drive i have seen, but totally do not own, has it

creata · 2 years ago

Huh. Sway and Wayland was such a nightmare on Nvidia that it convinced me to switch to AMD. I wonder if it's better now.

(IIRC the main issue was https://gitlab.freedesktop.org/xorg/xserver/-/issues/1317 , which is now complete.)

snailmailman · 2 years ago

Better as of extremely recently. Explicit sync fixes most of the issues with flickering that I’ve had on Wayland. I’ve been using the latest (beta?) driver for a while because of it.

I’m using Hyprland though so explicit sync support isn’t entirely there for me yet. It’s actively being worked on. But in the last few months it’s gotten a lot better
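A toy model of what explicit sync changes, using Python threads in place of real GPU sync objects (the `Fence` class and the function names are hypothetical): the client hands the compositor a fence together with the buffer, and the compositor waits on that fence before reading. Without such a handoff, and without working implicit sync, the compositor can present a buffer the GPU has not finished rendering, which is roughly where the flickering came from.

```python
import threading
import time

class Fence:
    """Stand-in for a GPU sync object: signalled once the rendering work completes."""
    def __init__(self) -> None:
        self._done = threading.Event()

    def signal(self) -> None:
        self._done.set()

    def wait(self) -> None:
        self._done.wait()

def client_render(buffer: list, fence: Fence) -> None:
    time.sleep(0.05)        # pretend the GPU is still busy with this frame
    buffer[0] = "frame 1"   # rendering finishes...
    fence.signal()          # ...and only then is the fence signalled

def compositor_present(buffer: list, fence: Fence) -> str:
    fence.wait()            # explicit sync: block on the fence handed over with the buffer
    return buffer[0]

buffer = ["stale contents"]
fence = Fence()
renderer = threading.Thread(target=client_render, args=(buffer, fence))
renderer.start()
presented = compositor_present(buffer, fence)
renderer.join()
print(presented)  # frame 1
```

Dropping the `fence.wait()` call would almost always make the compositor return "stale contents", the toy equivalent of showing an unfinished frame.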

JasonSage · 2 years ago

> Better as of extremely recently.

Yup. Anecdotally, I see a lot of folks trying to run wine/games on Wayland reporting flickering issues that are gone as of version 555, which is the most recent release save for 560 coming out this week. It's a good time to be on the bleeding edge.

joecool1029 · 2 years ago

It's still buggy with sway on nvidia. I really thought the 555 driver would iron out the last of the issues, but it still has further to go. I've since switched to KDE Plasma 6 on Wayland and it's been great, not buggy at all.

XorNot · 2 years ago

Easy Linux use is what keeps me firmly on AMD. This move may earn them a customer.

modzu · 2 years ago

why switch to amd and not just switch to X? :D

whalesalad · 2 years ago

once you go Wayland you usually don’t go back :)

account42 · 2 years ago

Why not both?

sillywalk · 2 years ago

From the github repo[0]:

Most of NVIDIA's kernel modules are split into two components:

When packaged in the NVIDIA .run installation package, the OS-agnostic component is provided as a binary:

[0] https://github.com/NVIDIA/open-gpu-kernel-modules

Those were the "classic" drivers.

The new open-source ones effectively move the majority of the OS-agnostic component to run as a blob on the GPU.

arghwhat · 2 years ago

Not quite - it moves some logic to the GSP firmware, but the user-space driver is still a significant portion of code.

The exciting bit there is the work on NVK.

hypeatei · 2 years ago

jcalvinowens · 2 years ago

Throwing the tarball over the wall and saying "fetch!" is meaningless to me. Until they actually contribute a driver to the upstream kernel, I'll be buying AMD.

aseipp · 2 years ago

You can just use Nouveau and NVK for that if you just need workstation graphics (and the open-gpu-modules source code/separate GSP release has been a big uplift to Nouveau too, at least.)

Nouveau is great, and I absolutely admire what the community around it has been able to achieve. But I can't imagine choosing that over AMD's first class upstream driver support today.

neop1x · 2 years ago

IIRC hardware video decoding of HEVC didn't work for me with nouveau

einpoklum · 2 years ago

The title of this statement is misleading:

NVIDIA is not transitioning to open-source drivers for its GPUs; most or all user-space parts of the drivers (and most importantly for me, libcuda.so) are closed-source; and as I understand from others, most of the logic is now in a binary blob that gets sent to the GPU.

Now, I'm sure this open-sourcing has its uses, but for people who want to do something like a different hardware backend for CUDA with the same API, or to clear up "corners" of the API semantics, or to write things in a different language without going through the C API, this does not help us.

floam · 2 years ago

NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules

NVIDIA Transitions Towards Fully Open-Source GPU Kernel Modules?

slashdave · 2 years ago

Not much point in a "partially" open-source kernel module.

But “fully towards” is pretty ambiguous, like an entire partial implementation.

Anyhow I read the article, I think they’re saying fully as in exclusively, like there eventually will not be both a closed source and open source driver co-maintained. So “fully open source” does make more sense. The current driver situation IS partially open source, because their offerings currently include open and closed source drivers and in the future the closed source drivers may be deprecated?

j4hdufd8 · 2 years ago

haven't read it but probably the former

throwadobe · 2 years ago

"towards" basically negates the "fully" before it for all real intents and purposes