Xilinx routinely had more I/O (SerDes, 100/200/400G MACs on-die) and, at times, more HBM bandwidth than contemporary GPUs. Their parts also offer deterministic latency and perfectly acceptable DSP primitives.
The gap has always been the software.
Of course, NVidia wasn’t such an obvious hit either: they flubbed the tablet market due to yield issues, and it really only went exponential in 2014. I invested heavily in NVidia from 2007 to 2014 because of the CUDA edge they had, but sold my $40K of stock at my cost basis.
I currently do DSP for radar, and implemented the same system on FPGA and in CUDA between 2020 and 2023. I know for a fact that in 2022 the FFT performance of a $9,000 FPGA was equal to that of a $16,000 A100, which also needed a $10,000 host computer. (The types on the FPGA were fixed point instead of float, so it is not apples-to-apples, but it was definitely application-equivalent.)
Most of those efforts stem from the underlying notion that “…this is all a problem with the tooling!”
This approaches the problem space through a very software-centric lens. Gateware design isn’t software. If you really boil it down, it’s wiring together logic gates. Treating it as a tooling problem overestimates how much you actually know about the silicon. Plainly: no open source toolchain is going to have insight into Xilinx’s internal fanout or propagation delay specs. You’re reliant on Xilinx to encode these into their tools for you.
As a result: “Vendor tools are God in FPGA land. You don’t go against God.” (Quoted from the staff FPGA engineer on my team.)
Even once you get that LED blinking, changing the clock speed for that blinking LED should be near instantaneous, but more likely it requires rebuilding the whole project. Fundamentally the vendors don’t view their chips as something designed to run programs, and this legacy hardware design mentality plagues their whole business.
Something important here: Xilinx could and should have been where NVidia is today. They were certainly aware of the competitive accelerated computing market as early as 2005, and fundamentally failed to make a software architecture competitive with CUDA.
Before CUDA even existed I interned at Xilinx, working on the beginnings of their HLS C compiler. My (decade older) fraternity brother led the C compiler team at Altera. We almost went into making a spreadsheet compiler for FPGA (my master's thesis) together, but 2007 ended up being a terrible year to sell accelerated computing to Wall Street.
And that was before Claude Code.
It hits the request per minute limit instantly and then you wait a minute.
I was excited, then I read this:
> Send up to 1,000 messages per day—enough for 3–4 hours of uninterrupted vibe coding.
I don't mind paying for services I use. But it's hard to take this seriously when the claim in the first paragraph contradicts the fine print.
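For what it's worth, here is a quick back-of-the-envelope check of the quoted fine print. It uses only the two numbers in the quote (1,000 messages, 3–4 hours). The constant and function names are made up for illustration, and it assumes the whole daily cap is spent evenly over one session.

```python
# Back-of-the-envelope: what average rate does the quoted daily cap imply?
DAILY_MESSAGE_CAP = 1_000   # from the quoted fine print
SESSION_HOURS = (3, 4)      # "3–4 hours" from the same quote


def implied_messages_per_minute(cap: int, hours: float) -> float:
    """Average messages per minute if the whole cap is spent in `hours`."""
    return cap / (hours * 60)


# Hypothetical helper mapping: session length in hours -> implied rate
rates = {h: implied_messages_per_minute(DAILY_MESSAGE_CAP, h) for h in SESSION_HOURS}

for hours, rate in rates.items():
    print(f"{hours} h session: {rate:.2f} messages/min")
```

That works out to roughly four to six messages a minute, sustained for the whole session, before the cap runs out.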
In San Francisco we just call it “a Waymo”