Readit News
kseniase commented on GPU's Rival? What Is Language Processing Unit (LPU)   turingpost.com/p/fod41... · Posted by u/kseniase
anon291 · 2 years ago
There's a YouTube video somewhere (linked below). Intuitively, the speedup comes from the fact that the pipelining is controlled by software rather than hardware. There's no memory management unit; memory operations and timing are controlled entirely by software. That would be terrible for CPUs, where register renaming, out-of-order execution, etc. really matter, but for AI models it doesn't matter much: tensor accesses are easily optimized by software that knows exactly how the chip runs. Essentially, the chip can read memory, write memory, do vector operations, multiply matrices, and do permutations all in the same cycle, and it can have several of these operations in flight at the same time.

The compiler obviously knows how to schedule all of this to produce good timing. That's why the utilization is so high: every piece of the chip is being used. With a GPU, by contrast, you have to play 'code golf' and need a deep understanding of the architecture and the execution engine to figure out when a memory access is going to cause a pipeline stall.
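To make that concrete, here is a toy Python sketch of what compile-time static scheduling looks like when every latency is known in advance. The unit names, latencies, and instruction format are made up for illustration; this is not Groq's actual ISA, just the general idea that the compiler, not the hardware, decides what every unit does on every cycle.

```python
# Toy sketch of static scheduling on a fully deterministic chip.
# Unit names and latencies are invented for illustration only.

# Fixed, compile-time-known latency (in cycles) for each functional unit.
LATENCY = {"MEM_READ": 2, "MEM_WRITE": 2, "VECTOR_ALU": 1, "MATMUL": 4, "PERMUTE": 1}

def schedule(ops):
    """Place each op at the earliest cycle where its unit is free and all of
    its inputs have been produced. Because every latency is known exactly,
    the result is a cycle-accurate plan with no runtime stalls or hazard checks."""
    ready_at = {}                          # value name -> cycle its producer finishes
    unit_free = {u: 0 for u in LATENCY}    # unit -> next cycle it can issue
    plan = []
    for unit, inputs, output in ops:
        start = max([unit_free[unit]] + [ready_at[i] for i in inputs])
        unit_free[unit] = start + 1                # one issue slot per unit per cycle
        ready_at[output] = start + LATENCY[unit]   # result is ready at a known cycle
        plan.append((start, unit, output))
    return plan

# A tiny tensor pipeline: load two tiles, multiply, add, permute, store.
ops = [
    ("MEM_READ",   [],           "a"),
    ("MEM_READ",   [],           "b"),
    ("MATMUL",     ["a", "b"],   "c"),
    ("VECTOR_ALU", ["c"],        "d"),
    ("PERMUTE",    ["d"],        "e"),
    ("MEM_WRITE",  ["e"],        "out"),
]

for cycle, unit, out in schedule(ops):
    print(f"cycle {cycle:2d}: {unit:10s} -> {out}")
```

Note that the two loads are issued on consecutive cycles and overlap in flight, while the matmul is placed exactly when its inputs land; nothing is decided at runtime.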

If you understand the ISA of the GroqChip, you understand completely how the chip works. There are no pipeline stalls, no waiting for memory, no memory hierarchy. Even the interconnects run at the same speed as the chip (or something like that; it's all in the paper, so public information), so when they network chips together, the latency between any two subunits across chips is already known from the network topology and the chip timings.
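The same determinism extends across the network. As a hedged illustration (the ring topology and the 30-cycle hop latency below are made-up numbers, not Groq's), a compiler could compute the exact arrival cycle of any cross-chip transfer from the topology alone:

```python
# Toy sketch: with fixed link timings, cross-chip latency is a pure function
# of the schedule and the topology. All numbers here are invented.

HOP_LATENCY_CYCLES = 30   # assumed fixed chip-to-chip link latency
RING = [0, 1, 2, 3]       # assumed topology: 4 chips connected in a ring

def hops(src, dst, n=len(RING)):
    """Shortest number of ring hops between two chips."""
    d = abs(src - dst) % n
    return min(d, n - d)

def arrival_cycle(send_cycle, src, dst):
    """Cycle at which data sent from `src` on `send_cycle` lands on `dst`.
    Known at compile time, so the consumer can be scheduled to start then."""
    return send_cycle + hops(src, dst) * HOP_LATENCY_CYCLES

print(arrival_cycle(send_cycle=100, src=0, dst=3))   # 100 + 1*30 = 130
```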

TL;DR: It's all very deterministic, and this works well for tensor operations. Parallelism is determined at compile time.

Source: https://www.youtube.com/watch?v=pb0PYhLk9r8. This is exactly how the chip works. It's not some simplified architectural or marketing diagram; it's literally how it works.

kseniase · 2 years ago
I re-posted your comment here: https://www.turingpost.com/p/fod41
kseniase commented on GPU's Rival? What Is Language Processing Unit (LPU)   turingpost.com/p/fod41... · Posted by u/kseniase
anon291 · 2 years ago
As a former Groq engineer, I'd say the LPU branding is not really technically accurate. These are general-purpose compute chips with excellent software for transformer models. Insofar as the GroqChip is general-purpose, it is basically a highly parallel, high-performance, deterministic execution engine. It 'rivals' GPUs in the sense that obviously both are used for AI execution. The whitepapers cited are basically a perfect model of how the chips operate. As you can see, they are general-purpose tensor units.

That being said, yes the software is impressive and the engineering team at Groq is top-notch.

Unfortunately, the chips are mainly aimed at inference, and it seems like a lot of the large investments at the moment are being driven by training.

EDIT: In the spirit of full disclosure, I suppose I should point out that I own a lot of Groq shares, so I have every interest in their success.

kseniase · 2 years ago
Do you think the article explains well what an LPU is?
kseniase commented on GPU's Rival? What Is Language Processing Unit (LPU)   turingpost.com/p/fod41... · Posted by u/kseniase
kseniase · 2 years ago
With breakthroughs in inference and long-context understanding, we are officially entering a new era in LLMs.
kseniase commented on OpenAI Chronicle: What Drives ChatGPT Creators   turingpost.com/p/openaich... · Posted by u/kseniase
kseniase · 2 years ago
A new series about 13 GenAI unicorns, starting with OpenAI: Sam Altman's vision, how OpenAI makes money, what drove ChatGPT's success, and the push for AI regulation.
