I tried to do a clockless fully async bus interface in around 1988 in a chip I was designing at Masscomp for a fast data acquisition system. Never got built, but it was fun trying, and it would've been really fast. "Lower design complexity" though: hahaha! Nope.
I worked with Alain Martin at Caltech, and I always loved the idea of asynchronous circuits. When I became an FPGA engineer, I realized the big problem with both FPGAs and asynchronous logic: the tooling doesn't generalize well to other domains, so you have to be a narrow specialist to make progress.
If someone could convert synchronous Verilog to async circuits under the hood, they might see huge gains in speed and power use for their circuits, but that is a huge uphill climb.
There is an FPGA company, Achronix, that claimed to do this. Their FPGA architecture was apparently asynchronous, and they had tools that compiled synchronous designs onto it. Don't know how good their tech was, but they got bought by Intel and are still making it AFAIR.
Their async tech worked fine, but FPGAs likely don't get the same benefit as ASICs from async logic. Not to mention debugging is hard, so if you're not committed to a design, you may not want to put in the effort.
The Achronix folks are going strong today (still independent), but with a much more conventional FPGA. The on-chip network may be async, but they hide it well. I hope they have lots of success in the future.
Achronix is still an independent FPGA company
No longer focused on asynchronous FPGA technology
Currently shipping high performance synchronous FPGA technology on 7nm
Learn more about the Speedster7t FPGA: https://www.achronix.com/product/speedster7t-fpgas
It's a classic idea. There were some early asynchronous mainframes built from discrete logic. It might come back. It's an idea that comes around when you can't make the clock speed any higher.
It's one of those things from the department of "we can make it a little faster at the cost of much greater complexity, higher cost, and lower reliability". That's appropriate to weapons systems and auto racing.
Having a common clock reference (per core) is essential for reducing latency between components. If you have to poll or await some other component arbitrarily, there will necessarily be extra overhead and delays in these areas. There will also need to be extra logic area dedicated to these activities. Make no mistake, just because there's no central clock doesn't mean you are magically off the hook. You still need to logically serialize the instruction stream(s).
Even for low power applications, you would probably use less battery getting the work done quickly in a clocked CPU and then falling back to a lower power state ASAP. Allowing the pipeline effects to take hold in a modern clocked CPU should quickly offset any relative overhead. Heterogeneous compute architecture is also an excellent and proven approach.
Certainly, there are many things that happen in a CPU that should not necessarily be bound by a synchronous clock domain (e.g. ripple adder). But, for these areas where an async CPU is a clear win, would we actually see any gains in practice using real software? Feels like there are a lot of other strategic factors that wash out any specific wins.
My understanding--which seems to coincide with this article and which Wikipedia seems to agree with (not that that necessarily means much for this)--is that in an asynchronous circuit latency would be lower, not higher, as the clock is required to wait for the worst-case performance while a clock-less system can proceed immediately once only the required inputs have arrived (or even attempt to speculate on partial inputs, something which would offer no value if you would have to end up waiting for the next tick anyway).
This is correct. It happens at multiple levels. Oversimplified:
* An async add operation takes variable time based on the number of carries, whereas a sync one is set to the worst-case.
* The clock for an ALU is set for the worst-case even when doing something faster (e.g. a NAND rather than an ADD)
* If you have multiple logic stages handled in one clock cycle, the problem is compounded. The clock is set by the slowest stage for all components in the system.
* If your system is doing nothing, you're still clocking it. Clocks are adjusted, but not at a nanosecond-by-nanosecond level.
All-in-all async gives a nice power boost and a nice performance boost (not enough of a boost to displace an entrenched ecosystem, mind you, but a nice boost nonetheless).
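To put a rough number on the first bullet, here is a minimal sketch (not from the article) that measures how far a carry actually ripples when adding random operands. The function names and the choice of uniformly random inputs are my own assumptions; real workloads are not uniform, so treat the result as illustrative only.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of bit positions that a single carry travels through."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Generate: a carry starts here regardless of the carry-in.
            carry, run = 1, 1
        elif (x ^ y) and carry:
            # Propagate: the incoming carry ripples one position further.
            run += 1
        else:
            # Kill (or nothing to propagate): the chain ends.
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def average_chain(width: int, trials: int = 10_000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random operands (assumption)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        total += longest_carry_chain(a, b, width)
    return total / trials


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, average_chain(width))
```

A clocked ripple adder has to budget for a chain as long as the full width, while a self-timed one only waits for the chain that actually occurs, which for random inputs grows roughly with the logarithm of the width. This ignores the cost of detecting completion, which is the catch.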
True, but asynchronous circuits need a lot of extra gates and signals for detecting when an operation is completed and notifying the next stage that it can proceed.
It is very difficult to estimate which of the 2 approaches will need less area and power for some given requirements.
It is likely that for a sufficiently complex device an asynchronous implementation will use less power, but the effort to design bug-free complex asynchronous logic is much higher than for synchronous designs, which is probably the main reason why very few commercial asynchronous devices have existed.
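For anyone wondering where the extra gates and signals come from, here is a toy model of dual-rail encoding, one common way to make completion detectable. This is a sketch under my own assumptions (the helper names and the particular rail convention are illustrative, not taken from the article or any specific design).

```python
from typing import List, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail)

NULL: Rail = (0, 0)  # spacer: no data on this bit yet


def encode(value: int, width: int) -> List[Rail]:
    """Encode an integer as dual-rail bits: (1, 0) means 1, (0, 1) means 0."""
    return [(1, 0) if (value >> i) & 1 else (0, 1) for i in range(width)]


def is_complete(word: List[Rail]) -> bool:
    """Data is valid once every bit has exactly one rail asserted."""
    return all(t ^ f for t, f in word)


def is_empty(word: List[Rail]) -> bool:
    """All bits are back at the spacer, so the next value may be sent."""
    return all(t == 0 and f == 0 for t, f in word)


def decode(word: List[Rail]) -> int:
    """Recover the integer; only meaningful when is_complete(word) holds."""
    if not is_complete(word):
        raise ValueError("word is not yet complete")
    return sum(t << i for i, (t, _f) in enumerate(word))


if __name__ == "__main__":
    word = encode(5, 4)
    print(len(word) * 2, "wires for 4 bits of data")
    print(is_complete(word), decode(word))
    word[2] = NULL  # one bit has not arrived yet
    print(is_complete(word))
```

Every data bit costs two wires, and `is_complete` stands in for a tree of gates that has to sit on every stage's output. That is the area and power overhead being weighed against the clock tree you no longer need.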
We had the pleasure of hosting Dr. Manohar at a CIRCT weekly discussion session earlier this year. He presented much more recent work if anyone is interested. The talk and discussion were recorded here: https://sifive.zoom.us/rec/play/Bg99_niHh9OG_8uE_nhaz6otxvA0...
Has there been much progress? I remember hearing a lot about asynchronous logic circuits back in the 90s, but don't hear about much in the way of breakthroughs since then.
Asynchronous would work better, but we're unlikely to get there -- too big a change.
It's like:
* having ECC everywhere
* having a single display standard (as opposed to HDMI/DisplayPort/USB-C/DVI/VGA/...)
* some kind of architecture where a single bad expansion card (USB, PCIe, etc.) can't crash a whole computer
... and so on
On one hand, no brainer. On the other hand, it hasn't happened.
NVidia is breaking ground on the move to SIMD/MIMD-style architectures, as predicted at the same time, and only because it gives a 30x boost in performance. Async will probably net us a 50% performance boost or something.
I don't think async can make things faster, but it can make them more energy efficient, and the incentives for that are still close to none, as our economic models reward waste until all EROEI is depleted.
But you need to add the ability to switch things off dynamically, meaning cores on the CPU/GPU. So far the industry has solved this with big.LITTLE, but that requires all software to change. It's going to take time that we unfortunately do not have, as hardware is closing the ownership model.
I will raise an important distinction: asynchronous logic != dynamic logic.
There can be dynamic synchronous logic, and vice versa.
Dynamic vs. static determines whether the circuit as such needs to be driven by some constant pacing input, whether an embedded clock or an external clock, vs. not needing one to arrive at a settled state (to latch).
If you are to speak strictly, asynchronous vs. synchronous determines whether that pacing input is external, or recovered from input.
Domino is _one_ version of asynchronous, but that's using a different notion of asynchronous than the article. Because of the ambiguity, we talk today of clock-less logic, which comes in variants, most notably delay-insensitive and quasi-delay-insensitive. The latter is faster, but less immune to noise (and has terrible timing analysis issues).
MiniMIPS isn't that different from a conventional out-of-order superscalar microarchitecture. The article even says:
> However, the MiniMIPS pipeline structure can execute instructions out-of-order with respect to each other because instructions that take different times to execute are not artificially synchronized by a clock signal.
> If you have to poll or await some other component arbitrarily, there will necessarily be extra overhead and delays in these areas.
You don't poll. You have a lot of small input-clocked domains, each of which works at the speed at which data arrives.
https://authors.library.caltech.edu/43698/1/25YearsAgo.pdf
It was the original paper for this that got me interested in building silicon tools.
EDIT: talk begins around 7 minutes.