> Using Microsoft Olive and DirectML instead of the PyTorch pathway results in the AMD 7900 XTX going from a measly 1.87 iterations per second to 18.59 iterations per second!
So the headline should be Microsoft Olive vs. PyTorch and not AMD vs. Nvidia.
The results of the usual benchmarks are inconclusive between the 7900 XTX and the 4080, and Nvidia is only somewhat more expensive, yet CUDA is much more popular than anything AMD is allowed to support. So I'd say this makes sense as an AMD vs. Nvidia comparison as well.
I'm not sure which customer who's willing to spend $1000-1200 to do ML workflows isn't willing to spend $1600 to get another 20%+ of performance and have the fastest card available.
I’m not saying people have unlimited budgets but it just seems like the choice most people in that price range would make.
If it's completely down to Olive and DirectML, then Nvidia should be able to use them for similar performance improvements. If not, then AMD is still a defining factor. A quick search didn't bring up anything definitive on the question, though, so I guess we'll have to wait for someone to try it out (or someone with faster Google-fu than me).
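For anyone who wants to try that head-to-head themselves: what an Olive workflow produces is just an ONNX model, and ONNX Runtime lets you choose the execution provider when the session is created, so one script can time both the DirectML and the CUDA paths. A rough sketch, with the caveat that "model.onnx" is a placeholder path and the dummy inputs are illustrative rather than the real SD UNet signature:

    # Rough timing sketch: run an Olive-exported ONNX model through ONNX Runtime
    # with either the DirectML or the CUDA execution provider.
    # "model.onnx" is a placeholder; the dummy inputs are illustrative only.
    import time
    import numpy as np
    import onnxruntime as ort

    def benchmark(model_path, provider, runs=20):
        sess = ort.InferenceSession(model_path, providers=[provider])
        feeds = {}
        for inp in sess.get_inputs():
            # Replace dynamic dimensions with 1 so we can build dummy tensors.
            shape = [d if isinstance(d, int) else 1 for d in inp.shape]
            feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
        sess.run(None, feeds)  # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            sess.run(None, feeds)
        return runs / (time.perf_counter() - start)  # iterations per second

    if __name__ == "__main__":
        for ep in ("DmlExecutionProvider", "CUDAExecutionProvider"):
            if ep in ort.get_available_providers():
                print(ep, round(benchmark("model.onnx", ep), 2), "it/s")

Same model, same script, only the provider string changes, which is about as apples-to-apples as it gets.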
Been watching this quite closely.
As far as I can summarise, the 7900 XTX is the first (and only) desktop GPU from AMD that _might_ be worth buying. (They own the console gaming space, but that's a different story.)
Not Nvidia-beating, due to the CUDA issue, but a massive leap in the right direction.
Intel is also making _some_ progress with its Arc range.
It's going to be happy days for us users if/when AMD/Intel are competitive and cut some of that monopoly margin off Nvidia's pricing, but there's a way to go yet.
Well the problem is that Automatic1111 is not fast...
Other diffusers-based UIs with PyTorch Triton will net you a 40%+ performance gain (roughly the torch.compile route sketched below).
Facebook's AITemplate inference in VoltaML will be at least twice as fast as A1111 on a 3080, with support for LoRAs, ControlNet and such. It supports AMD Instinct cards too.
What I am getting at is that people don't really care about A1111 performance on a 3080 because, for the most part, it's fast enough.
Segment Anything is the big one missing from other UIs, but IMO most of the other extensions are pretty niche, especially given how hackable diffusers is compared to the A1111/Comfy SAI backend.
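To make the Triton point concrete: the faster diffusers-based UIs are presumably doing some variant of compiling the UNet, which PyTorch 2.x lowers through TorchInductor/Triton. A minimal sketch, assuming a stock SD 1.5 checkpoint and a CUDA or ROCm build of PyTorch 2.x:

    # Sketch: compile the UNet of a diffusers pipeline with torch.compile
    # (TorchInductor/Triton). Model id and settings are just common defaults.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")  # on ROCm builds, "cuda" maps to the AMD GPU

    # The UNet dominates the per-step cost, so it is the main compilation target.
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

    # The first call is slow while Triton kernels are generated; benchmark later runs.
    image = pipe("an astronaut riding a horse", num_inference_steps=30).images[0]
    image.save("out.png")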
The comments point out that AMD's strong showing in the table required the use of Microsoft Olive, and someone in the article comments implies that if you use Microsoft Olive with Nvidia instead of PyTorch with Nvidia, you'll see Nvidia jump in performance as well, largely rendering the supposed leap by AMD irrelevant. Is that true? Can folks chime in?
I've been running pytorch and rocm (5.6 has support for gfx1100 if you compile it yourself) for at least 3 months at 18 it/s on a 7900 XTX. This has been possible for quite a while.
Could someone fill me in on what's actually new here, other than the specific technology used?
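For anyone trying to reproduce that setup: ROCm builds of PyTorch expose the card through the usual torch.cuda API, so a quick sanity check that the self-compiled stack actually sees the 7900 XTX looks something like this (a sketch; nothing 7900-specific beyond the device name it should print):

    # Sanity check that a ROCm build of PyTorch can see and use the GPU.
    # On ROCm, the torch.cuda namespace is reused, so "cuda" means the AMD card.
    import torch

    print("GPU available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))  # should report the 7900 XTX
        x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
        y = x @ x  # trivial matmul to confirm kernels actually run
        torch.cuda.synchronize()
        print("Matmul OK:", y.shape)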
Wait, why are they comparing Microsoft Olive on AMD to PyTorch on Nvidia? Nvidia supposedly shipped support for Olive recently, so there should be no problem getting a head-to-head comparison: https://www.tomshardware.com/news/nvidia-geforce-driver-prom...
Edit: of course I am talking about gaming here since you mentioned consoles
For me the problem is technology and not raw performance.
DLSS and RT are massive for someone with a 4K screen, but there's no 4K gaming hardware outside of League of Legends, lol.
Unless you need RT for gaming, most of AMD's cards are better value.
Nobody "needs" RT for gaming. It's still in a gimmicky phase and worth neither the performance hit nor the way it looks on screen.
You also have to keep in mind that some latest-gen AMD GPUs don't even officially support ROCm on Linux. That's absurd.
AMD has the choice to invest more staff into ML support; they're choosing not to.
This is a very strange comparison.
I mean, they announced it with a more than 2x speedup for SD in May:
https://blogs.nvidia.com/blog/2023/05/23/microsoft-build-nvi...