A few years back I wrote an N64 emulator graphics module for the Raspberry Pi because I was unhappy with the poor performance of the existing ones, and I would really have wanted this. Drawcall overhead with Broadcom's official drivers is extreme: you can't have more than a single-digit number of drawcalls per frame and still maintain 60 FPS. I ended up having to go to ubershaders for everything just to maintain 30 FPS. I'm certain the hardware was capable of much more, but the drivers were holding it back.
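To put the "single-digit number of drawcalls" remark in perspective, here is a rough back-of-the-envelope sketch. It assumes (my simplification, not something measured) that driver overhead is the only cost in a frame; the function names are made up for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def per_call_budget_ms(fps: float, draw_calls: int) -> float:
    """Average time each draw call may take if the calls alone fill the frame.

    Assumption: nothing else (emulation, game logic, GPU work) costs any time,
    so this is an optimistic upper bound on the per-call budget.
    """
    if draw_calls <= 0:
        raise ValueError("draw_calls must be positive")
    return frame_budget_ms(fps) / draw_calls


if __name__ == "__main__":
    for fps in (60, 30):
        print(fps, "FPS:", round(frame_budget_ms(fps), 2), "ms per frame,",
              round(per_call_budget_ms(fps, 10), 2), "ms per call at 10 calls")
```

At 60 FPS a frame lasts about 16.7 ms, so if ten draw calls already blow the budget, each call is costing on the order of a millisecond or more under this simplification. That would also be consistent with ubershaders helping, since they trade many small calls for a few large ones.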
Only partially true on the Pi4: The complete OpenGL stack is now open source using Mesa (it was optional on previous models). Video decoding is slowly moving over from the closed MMAL stack to KMS and V4L2. The boot firmware is still closed though.
Do you mean hardware module? Have you got any blogs or docs about this?
Came across the same thing doing a baremetal kernel (fun!) and this never crossed my mind!
I wonder whether compositors will support this driver, so the desktop environment can take advantage of it. I made several attempts to get a smoother desktop experience[1] on the RPi 3:
• LXDE + Openbox on Raspbian + X server
• Xfce4 + VC4 + X server + Arch Linux ARM + USB SSD
• Enlightenment + Wayland + Arch Linux ARM + USB SSD
Although Enlightenment on Wayland with OpenGL was the smoothest of them all, it's not usable (frequent crashes on the RPi), and since the framebuffer was limited to 2048x2048, none of them supported my 2560x1080 monitor.
Xfce4 + VC4 on Arch Linux is more usable, but still not as stable as default Raspbian. I didn't see any productivity merit in continuing this adventure, so I reclaimed the memory from the GPU and reverted to a headless setup [Arch + SSD] running motion eye, processing three 720p camera streams simultaneously at an average of ~50% CPU on all 4 cores when not watching the feed live (but with motion active).
I found Arch to be quite stable on mine, though I didn't use a desktop most of the time. When I did, i3 was by far the fastest, so maybe give that a try.
I agree regarding Arch ARM; maybe it's just VC4 that's causing the issue in the desktop environment. I'll give i3 a try if I pursue this, but IMHO RPi models before the Pi4 are best suited for headless operation.
Very nice, this can speed up some of the old Pis still hanging around quite significantly if software authors start making use of Vulkan on ARM.
I wonder how Nvidia is looking at this, with their terrible anti-open-source mindset. I hope that one of their engineers writing a video driver, with experience gained at their company, doesn't bring the author any repercussions.
It just makes me wonder whether there could be any conflict of interest here, and who owns the copyright. I know some of the big corps assume ownership of whatever their employees produce, even outside working hours. Or maybe the project was signed off?
How hard would it be to reverse engineer CUDA and make something like WINE that translates all CUDA operations into OpenCL or directly into third-party GPU instructions?
Obviously wouldn't be as fast as on NVIDIA hardware but it would potentially be much faster than the CPU versions of those pieces of software.
That’s actually it with the CUDA stuff. Nvidia made a massive investment, and is still doing so, so the work got done. OpenCL just doesn’t have the same cash backing it up.
Oh, also, this is basically what AMD's HIP does. The issue is that the project hasn't been moving very fast.
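For anyone wondering what "basically what HIP does" looks like in practice: as far as I understand it, HIP works at the source level rather than as a WINE-style runtime shim. Its API mirrors the CUDA runtime API closely enough that much of the porting is a rename. Below is a toy sketch of that idea only. The mapping covers a handful of real name pairs, but the `toy_hipify` function is hypothetical and is not how AMD's actual hipify tools are implemented.

```python
import re

# A tiny, illustrative subset of CUDA runtime names and their HIP counterparts.
# The real API surface is far larger; this table is only for demonstration.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


if __name__ == "__main__":
    cuda_snippet = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_buf, n);\n"
        "cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_buf);\n"
    )
    print(toy_hipify(cuda_snippet))
```

Note that this only helps when you have the CUDA source. It does nothing for applications that ship precompiled kernels, which is where the reverse-engineering effort would come in.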
Prepare for loads of reverse-engineering, as there are binary-only CUDA kernels in applications. This does tend to break when new GPUs come out, but a software update takes care of that. I assume some do that for obfuscation, while others are certainly using it for performance, see e.g. https://github.com/bryancatanzaro/nervana-lib-gpu-performanc... for some practical-ish examples.
It's easier to replace the CUDA kernels with properly designed OpenCL kernels instead.
Edit: As for third-party GPUs: that's what the driver does. It compiles OpenGL and OpenCL code into GPU machine code. In the case of Mesa, this is based on reverse engineering.
Exactly like the iPhone story of performance decreasing "for battery life" on older devices each time the next device model is rolled out.
[1]https://abishekmuthian.com/getting-smoother-desktop-experien...
The issue here is manpower, and not obfuscation in any way.
If it's your software, yes. But not if you're trying to fix an already-written framework or run other people's machine learning models.
Zink is an OpenGL implementation on top of Vulkan, and that might actually be a good way to do full OpenGL on these devices.