This has already been done multiple times without using LLMs. For HIP, there are tools like hipify-clang, hipify-perl, and hipify (the Python-based tool in PyTorch). For SYCL, there is SYCLomatic.
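At their core, tools like hipify-perl are source-to-source rewriters that substitute the CUDA API surface with its HIP counterpart. A minimal sketch of that approach in Python (the mapping table below is a tiny illustrative subset, not the real tool's table):

```python
import re

# Illustrative subset of the CUDA -> HIP renames hipify-perl performs;
# the real substitution table covers thousands of identifiers.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def hipify(source: str) -> str:
    # Longest names first, so cudaMemcpyHostToDevice is not
    # clobbered by the shorter cudaMemcpy substitution.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = re.sub(re.escape(cuda_name), CUDA_TO_HIP[cuda_name], source)
    return source

cuda_src = "#include <cuda_runtime.h>\ncudaMalloc(&p, n); cudaMemcpy(p, h, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_src))
# prints: #include <hip/hip_runtime.h>
#         hipMalloc(&p, n); hipMemcpy(p, h, n, hipMemcpyHostToDevice);
```

This works precisely because, for the portable subset of CUDA, the two APIs are near-mirror images; the hard part is everything outside that subset.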
The devil is in the details, though; at some point, all projects encounter non-portable code due to different instruction sets. For example, if the hardware does not support Warpgroup Level Multiply-and-Accumulate or a specific minifloat format, it is actively harmful to translate the code 'as is.' These platforms require software redesign, which is not something that LLMs are currently capable of handling.
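That failure mode can be made concrete: when a kernel leans on hardware-specific constructs, the best a mechanical translator can do is detect them and stop, because there is no correct rename target. A hedged sketch (the patterns are illustrative examples of Hopper-only features, and `find_nonportable` is a hypothetical helper, not part of any real tool):

```python
import re

# Illustrative constructs with no direct equivalent on other hardware:
# Hopper's warpgroup MMA (wgmma) PTX instructions and fp8 minifloat types.
NONPORTABLE_PATTERNS = [
    r"wgmma\.mma_async",   # warpgroup-level multiply-and-accumulate (inline PTX)
    r"__nv_fp8_e4m3",      # 8-bit minifloat storage type
]

def find_nonportable(source: str) -> list[str]:
    """Return the non-portable constructs found. A translator should
    stop and flag these for redesign rather than emit 'translated'
    code that silently miscompiles or crawls on the target."""
    return [p for p in NONPORTABLE_PATTERNS if re.search(p, source)]

kernel = 'asm("wgmma.mma_async.sync.aligned ..."); __nv_fp8_e4m3 x;'
print(len(find_nonportable(kernel)))  # prints: 2
```

Everything such a scan flags is exactly the part that needs a human (or something far beyond current LLMs) to redesign the algorithm for the target hardware.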
You would need to translate CUDA already compiled for an Nvidia GPU into, e.g., OpenCL that runs fast on an AMD GPU, which is close to AGI-level difficulty.
I can understand AMD not actively working on this solution, since it gains them little while costing a hefty amount of developer time, especially given that Nvidia can arbitrarily break the implementation with every update. What I don't understand is why they would take it down completely. Is there some safety concern, or are we in the "14 competing standards" xkcd again?
This move primarily makes AMD's legal team look frightened of Nvidia, which seems like a bad signal to send on every axis.
https://github.com/vosen/ZLUDA/releases
https://github.com/vosen/ZLUDA/forks?include=active&page=1&p...
It seems the last commit to master was 9e56862.
Seems like the lock-in here actually isn't that powerful. Fundamentally, it's math implemented in a C-like language.