Readit News
camel-cdr · 2 years ago
I think there are a few legitimate problems with x86 besides decode complexity:

* x86_64 traditionally only has 16 registers and destructive instructions. This is apparently also a problem for Intel; hence they introduced APX, which goes to 32 GPRs and adds three-operand instructions. But who will use APX in the next 10 years outside specialized server software and Gentoo users? Consider that most applications still target x86-64-v1, and runtime dispatch isn't really practical for this.

* Piling on ever larger SIMD extensions. This is not so much a burden on the implementation as on the software landscape. Intel and AMD traditionally seem to implement SIMD with execution units that can do an operation at the largest SIMD width, and use those to also execute the smaller SIMD widths. That seems reasonable, and is probably a good tradeoff given the decode complexity, but it also means that not using the latest SIMD extension wastes a lot of compute power. If decode were easier, you could've stuck with one SIMD width and just added more execution ports, which would mean that older software also gets a speedup on newer generations. This is kinda what AMD did for Zen 4, where they implemented AVX512 on top of AVX2 instead of the other way around. Or how ARM implementations that only support 128-bit-wide NEON can keep up with AVX2 (256-bit) and sometimes AVX512 (512-bit).

* All the legacy things may not cost much in terms of die space, because they are implemented in microcode, but I'd imagine they have significant development and verification costs.

Edit:

Another aspect regarding decode complexity is that the latest ARM cores have started removing the uop cache; this would be pretty much impossible for a fast x86 implementation. I wonder how ARM's decode width without a uop cache and x86's decode width with a uop cache scale for wider and wider cores.

zer00eyz · 2 years ago
>> Consider that most applications still target x86-64-v1, and runtime dispatch isn't really practical for this.

https://developers.redhat.com/articles/2024/01/02/exploring-...

Red Hat 9 went v2 and they are pushing for v3 now...

I think the drums are beating for this to change in a lot of places. Debian will end up being the holdout, and (as a Debian consumer) I don't know how I feel about this...

dralley · 2 years ago
Since Red Hat has a 10-year lifecycle, it can be a bit more aggressive than Debian: the last version will still have 7+ years of support for whatever hardware is phased out in the new version.

The target platforms are also a bit narrower.

dan-robertson · 2 years ago
I read that the hardware required to decode an x86 instruction isn’t actually that bad (potentially less than for certain fixed-width encodings). The problem is that it is hard to decode several instructions from the stream at once.
sweetjuly · 2 years ago
This is a problem we know how to solve though. Techniques like predecoding and to a greater extent uop caches mean that you only have to do the harder parts of decode on (relatively) rare misses. On hits, you can avoid the entire problem since you already have this information stored or abstracted away.
jiggawatts · 2 years ago
> But who will use APX in the next 10 years?

There is a decent chance that the bytecode runtimes will enable this instruction set in their JIT output. This would include .NET, JVM and V8.

pixelpoet · 2 years ago
I'd like to imagine x86 OpenCL targets for this as well; unfortunately, OpenCL codegen on AMD and Intel has historically been pretty weak compared to offline compilation with Clang, GCC, and even MSVC.
adrian_b · 2 years ago
> not using the latest SIMD extension wastes a lot of compute power

Not really. The unused parts of the SIMD execution units are powered down.

The only problem is that Intel has not introduced instructions for switching the execution width, and attempts to detect the intentions of the executing program can never be perfect.

Therefore, whenever a program starts executing wider instructions, there is a large delay until the wide execution units are powered on, and during that time the speed is much lower than it could be. When a program stops using wider instructions, there is an even greater delay until the execution units are partially powered off again, and during that time energy is wasted.

All these inefficiencies could have been trivially avoided if Intel had not tried to be so clever and had instead given programmers control over the power state, because only they know in advance what their program will do in the future.

camel-cdr · 2 years ago
> Not really. The unused parts of the SIMD execution units are powered down

True, and that helps with frequency and power, but what I meant by "wastes a lot of compute power" is that e.g. you have an AVX2-capable execution unit that can do 256/32 = 8 float adds in parallel, but only a single SSE instruction can be scheduled to it, hence you only get 128/32 = 4 float adds via SSE on that execution unit.

MaximilianEmel · 2 years ago
This would have been much better if it was just Casey.
Zhyl · 2 years ago
The Primeagen feels like he's bringing Linux, neovim, programming et al to a new audience, in the same way that Low Level Learning [0] brings rust, assembly and reverse engineering to a younger audience.

I don't think his style is for me personally, but I definitely feel the world is a better place with his voice in it.

[0] https://www.youtube.com/channel/UC6biysICWOJ-C3P4Tyeggzg

znpy · 2 years ago
I followed the Primeagen briefly, and then unsubscribed. He occasionally does super interesting content like this video, but on average his content is useless garbage.

Some examples: reactions to other videos rating programming languages; reading out articles (like the one about a guy shutting down some cloud services and instantly realising half a million bucks in savings for the company) while commenting out loud but adding little to nothing useful; and some Neovim videos where he basically copy-pastes some code here and there and some functionality appears (without any actual explanation of how the pasted code works).

I like the guy, but i’ve realised that:

1. His content is not a good use of my time (wrt my interests and my skill level)

2. When he makes actually good content (like this one in this case) it’ll reach me anyway, somehow.

Hence I unsubscribed.

mort96 · 2 years ago
This video would certainly have been better if it was just Casey making a well-structured video about how CPUs work and how the article is wrong (and where the article is right).

And that video would've reached his already existing audience of people who already know they're interested in that stuff. Being on Prime's stream hopefully exposes a lot of new people to both him and the topic as a whole.

soupbowl · 2 years ago
I like Primeagen, he brings entertainment to boring topics, I understand others might not like his style. I think we can leave it at that and not accuse him of drug abuse, lol.
peterfirefly · 2 years ago
Entertainment? More like a lot of empty excitement.
lloydatkinson · 2 years ago
Without a doubt. The other one is absolutely exhausting to listen to.
Havoc · 2 years ago
Definitely a stream best enjoyed in small doses

klodolph · 2 years ago
I got tired of the video because they never got to the point. Here’s the article:

https://hackaday.com/2024/03/21/why-x86-needs-to-die/

The article is, uh, well it’s bad. I’m a little hesitant to respond point-by-point to an article like this because it just sucks up so much time, and there’s just so much wrong with the article on so many levels. Here’s the response article:

https://chipsandcheese.com/2024/03/27/why-x86-doesnt-need-to...

The video appears to go into detail about things like how pipelines work, but it’s a long video that just doesn’t get to the point fast enough. I think the video is somewhat tearing into the article and ripping it apart, but I hear some comments in the video which I think are not that great, like at 1:01:25:

> RISC is reduced instruction set computer, just means not very many instructions

RISC is better understood not in terms of the acronym itself, but as a label for the way CPUs were designed post 1980 or so.

jcranmer · 2 years ago
What really needs to die here isn't x86, it's the entire concept of RISC versus CISC.

RISC is a collection of ideas about how to build chips that came about in the late 70s and early 80s. CISC is a retronym for things that didn't have those ideas. Some ideas worked out. Some ideas have failed. It's no longer a useful concept for talking about computer architecture; let's instead talk about the actual microarchitecture of the chips.

(Also, yowsers, that original article really is bad.)

klodolph · 2 years ago
Yeah, I completely agree. RISC vs CISC needs to be buried.

It’s interesting to hear stories from the actual designers of CISC chips, because they talk about the things that they did to try and make their chips faster and more useful. “Lots of addressing modes” is one of them, and it kinda turns out the future is to have fewer addressing modes.

And the reasoning for it is kinda interesting. Like, imagine that you have a memory indirect addressing mode, and decide to throw an increment on there for good measure. That’s cool and all, but most compilers never figured out how to generate those instructions, and you can get two page faults from a single damn instruction. Kinda makes it hard to deal with exceptions correctly.
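To make that concrete, here is an illustrative VAX-flavored sketch (not exact syntax from any particular assembler) of everything a memory-indirect autoincrement operand commits a single instruction to:

```
; ADDL3 @(R1)+, R2, R3          ; illustrative VAX-style syntax
;   1. read the pointer stored at address R1      (can fault: page A)
;   2. read the longword that pointer points to   (can fault: page B)
;   3. add it to R2, write the sum to R3
;   4. advance R1 past the pointer
; To keep exceptions precise, the CPU must be able to unwind and
; restart all of this if step 1 or step 2 faults mid-instruction.
```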

There are also a few bad ideas that came out of the natural evolution of RISC doctrines. Non-interlocked instructions and hazards, VLIW for general-purpose CPUs, a few others. I think in general, if you expose too little of the details of how your CPU works you sabotage anyone trying to write fast code, and if you expose too much of the details, you sabotage your ability to ship a new CPU that works differently.

hayley-patton · 2 years ago
RISC more accurately stands for "load-store architecture", which is very odd because RISC has four letters and there are only three words in load-store architecture.
lloydatkinson · 2 years ago
It seems there's a cult following of pseudo-intellectuals/dev celebrities who all seem to have only worked at "faang" like companies. Everything about it is a bit weird.

In this case, the one named like prime number, seems to spend extraordinary amounts of time waffling on about nothing in particular on twitch streams with a relentless torrent of energy, and then this feeds the echo chamber of their fans.

Nothing of substance is touched on more than briefly. I am not really sure what the point was of this whole article and guest appearance.

SJC_Hacker · 2 years ago
Primeagen is like the textbook definition of a brogrammer. He's not an idiot, but I suspect there was more to him getting hired at FAANG than just his technical ability.

wudangmonk · 2 years ago
TLDR but I'm responding to let you know that I'm grumpy, got that? I am grumpy, carry on.
klodolph · 2 years ago
Yes, I didn’t watch the entire hour-long video recorded from a live stream.

This is not a tight, edited, informational video. It’s not even a scripted presentation. It’s two guys reading an article and reacting to it. They spend some time talking about Three-Body Problem, and Twitch streaming.

petermcneeley · 2 years ago
The Mill computing project is basically a criticism of x64. I wonder what article Ivan Godard would have written.
jiggawatts · 2 years ago
I’ve been following the Mill project for many years now, and it’s a bit disappointing that they’ve made more YouTube videos than chips.
mhh__ · 2 years ago
I can't tell if they're extremely bad at executing or whether they've hit a wall on something and the design doesn't actually work.

I think if they'd started now (or more recently) and been much more open-source-y they'd be in a much better position, e.g. it's interesting enough that random autists would work on their toolchain

aviat · 2 years ago
The article needs to mention that Intel's x86S proposal will phase out 16- and 32-bit compatibility.

In that sense, classic x86 is going to die.

JoeAltmaier · 2 years ago
Maybe not much to mourn? We used to write lots of assembler, kernels and such. Now they're in C++. As long as you can get at some intrinsics (special instructions to manipulate certain registers e.g. system flags, interrupt condition) you don't really want to mess with assembler any more.

I remember when 'PC Compatible' was absolutely required to be a player in the market. Then, laptops came out and it was suddenly not a thing. Nothing a PC did could be done on a laptop (no accessory cards, new code to use laptop features etc) and the lure of portable overrode any marketing hype.

As markets grow, the decisions we make and the optimal product positions all change. You have to reinvent yourself to stay vital.

Michael Dell wrote about this. Every time Dell doubled in size, he had to reinvent the corporate processes. Things that were manual had to be automated, the job got too big for a single person (support, inventory etc).

And things that used to be automated, had to be made bespoke - remember the days when you could 'dial up' a custom Dell computer, every feature? They'd make just the one you wanted. All done on a website, that I imagine resulted in a ticket in front of an assembly line worker.

Then they dropped that, probably because there were no more workers on the line, that got automated too. Then you had options, but from a short list. And only the popular options survived.

I'm just surprised that hardware can persist so long. For instance x86 started in 1978! Most folks using it were not even born back then.

We could really, really use more innovation in the hardware space. For the first ten years it was all about putting mainframe features on silicon, not a lot to invent, just the die getting smaller so more fit, caches and paging and DMA and multiple busses.

Did anything happen after that? I'm not sure it did. What might have happened? Well, real security for instance. Keeping the kernel space 'secret' is a botched idea, we all know secrecy isn't security. Witness Spectre, Downfall and so on.

And why hasn't more of the kernel migrated into silicon? Waiting on a signal, communicating between processes, blocking on an interrupt, mapping memory to user space for i/o and on and on. All possible in silicon, all would avoid thousands of machine cycles and blowing process caches.

What do we get with each new generation? A little faster, a little less power. More compromising kernel bugs. Sigh.

ac130kz · 2 years ago
I've read the article itself, and I have to disagree with the author on speculative-execution vulnerabilities; ARM chips got a ton of them too.

grzeshru · 2 years ago
The video title is “x86 needs to die” so the HN re-word is highly editorialized.
tredre3 · 2 years ago
Which is good because the HN title tells me exactly what it is whereas the original title tells me nothing.
aCoreyJ · 2 years ago
The video title is the title of the article they are reacting to and mostly not agreeing with