A Case for Asynchronous Computer Architecture (2000) [pdf]

darkstarsys · 4 years ago

I tried to do a clockless fully async bus interface in around 1988 in a chip I was designing at Masscomp for a fast data acquisition system. Never got built, but it was fun trying, and it would've been really fast. "Lower design complexity" though: hahaha! Nope.

pclmulqdq · 4 years ago

I worked with Alain Martin at Caltech, and I always loved the idea of asynchronous circuits. When I became an FPGA engineer, I realized the big problem with both FPGAs and asynchronous logic: the tooling doesn't generalize well to other domains, so you have to be a narrow specialist to make progress.

If someone could convert synchronous verilog to async circuits under the hood, they may see huge gains in speed and power use for their circuits, but that is a huge uphill climb.

ajb · 4 years ago

There is an FPGA company, Achronix, that claimed to do this. Their FPGA architecture was apparently asynchronous, and they had tools that compiled synchronous designs onto it. Don't know how good their tech was, but they got bought by Intel and are still making it AFAIR.

pclmulqdq · 4 years ago

Their async tech worked fine, but FPGAs likely don't get the same benefit as ASICs from async logic. Not to mention debugging is hard, so if you're not committed to a design, you may not want to put in the effort.

The Achronix folks are going strong today (still independent), but with a much more conventional FPGA. The on-chip network may be async, but they hide it well. I hope they have lots of success in the future.

achronix · 4 years ago

Achronix is still an independent FPGA company. No longer focused on asynchronous FPGA technology. Currently shipping high-performance synchronous FPGA technology on 7nm. Learn more about the Speedster7t FPGA: https://www.achronix.com/product/speedster7t-fpgas

Animats · 4 years ago

It's a classic idea. There were some early asynchronous mainframes built from discrete logic. It might come back. It's an idea that comes around when you can't make the clock speed any higher.

It's one of those things from the department of "we can make it a little faster at the cost of much greater complexity, higher cost, and lower reliability". That's appropriate to weapons systems and auto racing.

jacquesm · 4 years ago

I think the long term driver won't be speed but power consumption, something that Asynchronous Computing has the potential to materially improve.

bob1029 · 4 years ago

Having a common clock reference (per core) is essential for reducing latency between components. If you have to poll or await some other component arbitrarily, there will necessarily be extra overhead and delays in these areas. There will also need to be extra logic area dedicated to these activities. Make no mistake, just because there's no central clock doesn't mean you are magically off the hook. You still need to logically serialize the instruction stream(s).

Even for low-power applications, you would probably use less battery getting the work done quickly in a clocked CPU and then falling back to a lower power state ASAP. Allowing the pipeline effects to take hold in a modern clocked CPU should quickly offset any relative overhead. Heterogeneous compute architecture is also an excellent and proven approach.

Certainly, there are many things that happen in a CPU that should not necessarily be bound by a synchronous clock domain (e.g. ripple adder). But for these areas where an async CPU is a clear win, would we actually see any gains in practice using real software? Feels like there are a lot of other strategic factors that wash out any specific wins.

saurik · 4 years ago

My understanding--which seems to coincide with this article and which Wikipedia seems to agree with (not that that necessarily means much for this)--is that in an asynchronous circuit latency would be lower, not higher, as the clock is required to wait for the worst-case performance while a clock-less system can proceed immediately once only the required inputs have arrived (or even attempt to speculate on partial inputs, something which would offer no value if you would have to end up waiting for the next tick anyway).

blagie · 4 years ago

This is correct. It happens at multiple levels. Oversimplified:

* An async add operation takes variable time based on the number of carries, whereas a sync one is set to the worst-case.

* The clock for an ALU is set for the worst-case even when doing something faster (e.g. an ADD rather than a NAND)

* If you have multiple logic stages handled in one clock cycle, the problem is compounded. The clock is set by the slowest stage for all components in the system.

* If your system is doing nothing, you're still clocking it. Clocks are adjusted, but not at a nanosecond-by-nanosecond level.

All-in-all async gives a nice power boost and a nice performance boost (not enough of a boost to displace an entrenched ecosystem, mind you, but a nice boost nonetheless).
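
A minimal Python sketch of the first point above (not from the thread; the function names and the random-operand model are assumptions for illustration). It measures how far a carry actually ripples in a ripple-carry add, which is the delay a self-timed adder would wait for, and compares it with the full width that a clocked design must budget for:

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Longest run of consecutive bit positions that one carry travels through.

    A chain starts where both input bits are 1 (carry generated) and
    extends while exactly one input bit is 1 (carry propagated).
    """
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Carry generated locally: a new chain starts here.
            carry, run = 1, 1
        elif (x ^ y) and carry:
            # Incoming carry is passed along: the chain grows by one bit.
            run += 1
        else:
            # Carry is absorbed, or there was none to propagate.
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def average_chain(samples: int, width: int = 32, seed: int = 0) -> float:
    """Mean longest carry chain over uniformly random operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        total += longest_carry_chain(a, b, width)
    return total / samples


if __name__ == "__main__":
    print("worst case:", longest_carry_chain(1, 0xFFFFFFFF))
    print("average over random inputs:", average_chain(10_000))
```

Under this simplified model the worst case spans all 32 bits, while the average over random operands is much shorter. Real operand distributions and real adder designs (carry-lookahead, for instance) would change the numbers.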

baybal2 · 4 years ago

Clock distribution eats a lot of power at gigahertz frequencies, and a lot of gates.

> If you have to poll or await some other component arbitrarily, there will necessarily be extra overhead and delays in these areas.

You don't poll. You have a lot of small input-clocked domains which work at a speed with which data comes.

adrian_b · 4 years ago

True, but asynchronous circuits need a lot of extra gates and signals for detecting when an operation is completed and notifying the next stage that it can proceed.

It is very difficult to estimate which of the 2 approaches will need less area and power for some given requirements.

It is likely that for a sufficiently complex device an asynchronous implementation will use less power, but the effort to design bug-free complex asynchronous logic is much higher than for synchronous designs, which is probably the main reason why very few commercial asynchronous devices have existed.
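
To make the completion-detection overhead above concrete, here is a small Python model of dual-rail encoding, one common scheme in clockless design. The thread does not say which encoding is meant, so the scheme and all names here are assumptions. Each data bit uses two wires, and a word counts as "arrived" only when every bit has left the null state:

```python
from typing import Dict, List, Optional, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail)
NULL: Rail = (0, 0)     # neither rail raised: bit has not arrived yet


def encode(value: int, width: int) -> List[Rail]:
    """Encode an integer as dual-rail bits, least significant bit first."""
    return [(1, 0) if (value >> i) & 1 else (0, 1) for i in range(width)]


def is_complete(word: List[Rail]) -> bool:
    """Completion detection: one OR per bit, then combine across all bits."""
    return all(t | f for t, f in word)


def decode(word: List[Rail]) -> Optional[int]:
    """Return the integer value, or None while any bit is still null."""
    if not is_complete(word):
        return None
    return sum(t << i for i, (t, _f) in enumerate(word))


def overhead(width: int) -> Dict[str, int]:
    """Wire and per-bit gate count for this model (combining tree excluded)."""
    return {"wires": 2 * width, "or_gates": width}


if __name__ == "__main__":
    word = encode(5, 4)
    print(decode(word))   # all bits valid
    word[2] = NULL        # one bit still in flight
    print(decode(word))   # not yet complete
    print(overhead(32))
```

Even in this toy model the data wires double and every bit needs its own detection gate before the next stage can be notified, which is the cost being weighed against clock distribution.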

fivelessminutes · 4 years ago

This seems to be from 20 years ago; the most recent citation is from 2000 and it describes a MIPS chip built on a 1998 process.

matja · 4 years ago

And not even a mention of AMULET (https://en.wikipedia.org/wiki/AMULET_microprocessor)

Taniwha · 4 years ago

Nor this one:

https://authors.library.caltech.edu/43698/1/25YearsAgo.pdf

It was the original paper for this that got me interested in building silicon tools

nickdothutton · 4 years ago

Came here to say this.

mikeurbach · 4 years ago

We had the pleasure of hosting Dr. Manohar at a CIRCT weekly discussion session earlier this year. He presented much more recent work, if anyone is interested. The talk and discussion were recorded here: https://sifive.zoom.us/rec/play/Bg99_niHh9OG_8uE_nhaz6otxvA0...

EDIT: talk begins around 7 minutes.

mahami · 4 years ago

Yes, but I thought that it could be interesting to look at research on the topic from 20 years ago to compare it with present progress.

UncleOxidant · 4 years ago

Has there been much progress? I remember hearing a lot about asynchronous logic circuits back in the 90s, but don't hear about much in the way of breakthroughs since then.

dgellow · 4 years ago

Could you add the publication year in the title of your submission?

blagie · 4 years ago

Asynchronous would work better, but we're unlikely to get there -- too big a change.

It's like:

* having ECC everywhere

* having a single display standard (as opposed to HDMI/DisplayPort/USB-C/DVI/VGA/...)

* some kind of architecture where a single bad expansion card (USB, PCIe, etc.) can't crash a whole computer

... and so on

On one hand, no brainer. On the other hand, it hasn't happened.

NVidia is breaking ground on the move to SIMD/MIMD-style architectures, as predicted at the same time, and only because it gives a 30x boost in performance. Async will probably net us a 50% performance boost or something.

bullen · 4 years ago

I don't think async can make things faster, but it can make them more energy efficient, and the incentives for that are still close to none, as our economic models reward waste until all EROEI is depleted.

But you need to add the ability to switch things off dynamically, meaning cores on the CPU/GPU. So far the industry has solved this with big.LITTLE, but that requires all software to change; it's going to take time that we unfortunately do not have, as hardware is closing the ownership model.

baybal2 · 4 years ago

I will raise an important distinction: asynchronous logic != dynamic logic.

There can be dynamic synchronous logic, and vice versa.

Dynamic vs. static determines whether the circuit as such needs to be driven by a constant pacing input, whether an embedded clock or an external clock, vs. not needing it to arrive at a settled state (to latch).

If you are to speak strictly, asynchronous vs. synchronous determines whether that pacing input is external, or recovered from input.

SavantIdiot · 4 years ago

Do you mean domino logic?

FullyFunctional · 4 years ago

Domino is _one_ version of asynchronous, but that's using a different notion of asynchronous than the article. Because of the ambiguity, we talk today of clock-less logic, which comes in variants, most notably delay-insensitive and quasi-delay-insensitive. The latter is faster, but less immune to noise (and has terrible timing analysis issues).

CalChris · 4 years ago

Mini-MIPS isn't that different from a conventional out-of-order superscalar microarchitecture. The article even says:

  However, the MiniMIPS pipeline structure can execute instructions out-of-order with respect to each other because instructions that take different times to execute are not artificially synchronized by a clock signal.