Yeah this isn't quite what happened. Firstly, Intel didn't start Itanium, HP did, as a successor to their HP Precision (PA-RISC) line. I forget how they got together; it became a collaboration between Intel and HP, but HP started it and had the architecture largely defined before Intel got involved.
Secondly, it's true that AMD hammered the nails into the coffin, but AMD wouldn't have mattered if Itanic had been faster, cheaper, and on time. Itanic was a disaster partly because of an overly complicated design by committee and partly because of the fundamentally flawed assumption that you don't need dynamic scheduling (AKA OoO processing).
I have an Itanium in the garage, a monument to hubris.
UPDATE: I forgot to mention that from the outside it might seem that Intel had a singular vision, but the reality is that there were massive political battles internally and the company was largely split into IA-64 and x86 camps.
UPDATE2: Itanium was massively successful at one thing: it killed off Alpha and a few other CPUs, based purely on what Intel claimed.
I've always thought that killing off Alpha in favour of pushing Itanium was one of the worst things Intel/HP could have done. Not only was Alpha more advanced architecturally, it was actively implemented and mature. With active development by HP, it could have easily snowballed into the standard cloud hardware platform.
Alpha's fate, like that of the other proprietary RISC architectures that focused on the lucrative but ultimately small workstation market, was sealed. With exponentially increasing R&D and manufacturing costs, massive industry consolidation was inevitable. That it was Itanium that delivered the coup de grâce to Alpha was merely the final insult; it would have happened anyway without Itanium.
And it wasn't like the Alpha was some embodiment of perfection either. E.g. that mindbogglingly crazy memory consistency model.
Itanium was Intel's second VLIW design. The first was the i860, which was a mixed bag: eye-poppingly fast if the instruction bundles were handcrafted by a human, or dog slow if a compiler emitted the code.
Perhaps there was a belief back then that compilers could easily be optimised or uplifted to generate fast and efficient code, and that did not turn out to be the case. Project management and mismanagement certainly did not help either.
I wonder how a VLIW architecture would pan out today given advances in compilers in the last three decades, and whether an ML-assisted VLIW backend could deliver on the old dream.
GPUs today have a more VLIW-like architecture than CPUs and almost every neural network accelerator is a VLIW chip of some kind. It's worked out really well.
The big problem is SMT, since it's hard to share a VLIW core between processes, while a superscalar core shares really well.
> fundamentally flawed assumption (that you don't need dynamic scheduling, AKA OoO processing)
I'm still unconvinced this is fundamental. It certainly was flawed back then, but compiler theory has improved a LOT since then; we have polyhedral optimization, for example, which we didn't have access to... You could probably optimize delay-line technology that way.
If you know how long data will take to go to/from memory then you can schedule pretty well.
If you don't know whether some value will hit in the L1 or the L3 cache there's wild variance on how long it'll take so you have to do something else in the meantime. On x64, that's the pipeline and speculation. On a GPU, you swap to another fibre/warp until the memory op finished.
Fundamentally the hardware knows how long the memory access took, the software can only guess how long it will take. That kills effective ahead of time scheduling on most architectures.
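To make that concrete, here's a toy Python model of an in-order, statically scheduled machine. All the latency numbers and hit probabilities are invented for illustration; the point is only that the compiler schedules for one fixed latency while the actual latency is a runtime distribution, so the static schedule is always paying for the misses it couldn't predict.

```python
import random

# Hypothetical numbers: the compiler schedules every load as if it hits
# L1 (4 cycles); at run time some loads actually miss to L2/L3/DRAM.
ASSUMED_LATENCY = 4
ACTUAL = {"L1": 4, "L2": 14, "L3": 50, "DRAM": 200}
HIT_PROB = [("L1", 0.90), ("L2", 0.06), ("L3", 0.03), ("DRAM", 0.01)]

def where_it_hits():
    """Randomly pick which cache level a load hits, per HIT_PROB."""
    r, acc = random.random(), 0.0
    for level, p in HIT_PROB:
        acc += p
        if r < acc:
            return level
    return "DRAM"

def ideal_cycles(n_loads):
    # What the compiler planned for: every load takes ASSUMED_LATENCY.
    return n_loads * ASSUMED_LATENCY

def static_schedule_cycles(n_loads):
    # In-order VLIW: any load slower than planned stalls the whole
    # bundle stream until the data actually arrives.
    return sum(ACTUAL[where_it_hits()] for _ in range(n_loads))

random.seed(0)
n = 10_000
planned, actual = ideal_cycles(n), static_schedule_cycles(n)
print(f"planned {planned} cycles, actual {actual} cycles "
      f"({actual / planned:.1f}x slower than scheduled)")
```

Even with a 90% L1 hit rate, the rare DRAM misses dominate the schedule, which is exactly the gap an OoO core (or a GPU's warp switching) papers over at runtime.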
Applications tend to exhibit a lot of dynamic behaviour. Consider any graph munger. I do expect there is an interesting subset of applications, particularly those that have fairly narrow scope and that have been highly optimised, which could be effectively statically scheduled. But for general-purpose computation, I don't buy it.
Perhaps as part of the trend towards increasingly heterogeneous architectures, we'll see big VLIW coprocessors for power efficiency in certain serial workloads (GPUs are massively parallel but only slightly VLIW coprocessors; and they do have dynamic scheduling).
> It certainly was flawed back then, but compiler theory has improved a LOT since then
Which would be fine if there was a time machine available to send back what we know now to the past. But at the time, having sucky compilers for what you wanted to (try to) accomplish was a bad idea.
The same decision now may be good, but then it was a mistake. It'd be like trying to have a Moon landing project in the 1920s: we got there eventually, but certain things are only possible when the time 'is right'.
This is the same argument that was being made in the 90s, ironically, that compilers were now good enough that it would work.
The thing is, on regular, array-based codes, it's great. And it was then, too -- the polyhedral approach was all being developed at exactly that time, but maybe it's not clear in hindsight because the terminology hadn't settled yet. Ancourt and Irigoin's "Scanning Polyhedra With Do Loops" was published in 1991. Lots of the unification of loop optimization was labeled affine optimization or described as based on Presburger arithmetic. But that is the technology that they were depending on to make it work.
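For anyone who hasn't seen it, the kind of transformation those affine/polyhedral frameworks prove legal looks like this sketch (sizes and data are arbitrary): a loop nest with purely affine array accesses can have its iteration space re-tiled or reordered, because the dependence analysis is exact at compile time.

```python
# A matrix-vector product: all accesses are affine functions of the
# loop counters, so a polyhedral compiler knows the full dependence
# structure and can legally retile the iteration space.
N = 64
A = [[(i * 31 + j * 7) % 10 for j in range(N)] for i in range(N)]
x = [(j * 3) % 10 for j in range(N)]

def matvec_naive(A, x):
    y = [0] * N
    for i in range(N):
        for j in range(N):
            y[i] += A[i][j] * x[j]
    return y

def matvec_tiled(A, x, T=8):
    # Same iteration space traversed in T x T tiles (better locality on
    # real hardware); legal because any order of the j-sums per i gives
    # the same result, which the dependence analysis proves.
    y = [0] * N
    for ii in range(0, N, T):
        for jj in range(0, N, T):
            for i in range(ii, min(ii + T, N)):
                for j in range(jj, min(jj + T, N)):
                    y[i] += A[i][j] * x[j]
    return y

assert matvec_naive(A, x) == matvec_tiled(A, x)
```

That guarantee evaporates the moment an access is `A[idx[j]]` instead of `A[j]`, which is the next comment's point.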
But most code we run is not Fortran in another guise. The dependences aren't known at compile time. That issue hasn't changed much.
The one change now is that workloads that were called "scientific computing" then are now being run for machine learning. But now it doesn't make sense to run regular, FP-intensive codes on a CPU at all, because of GPUs and ML accelerators. So what's left for CPUs that excel on that workload? I'm not sure there is a niche there.
The halting problem means that for typical programs, you can't prove control flow. Proving optimization means not only proving control flow, but then knowing which branches get taken most often so you can optimize. There may be a small subset of programs for which this is true, but the rest leave the compiler completely blinded.
OoO execution does an end run around this by examining the program as it runs and adapting to the current reality rather than a simulated reality. This is the same reason a JIT can do optimizations that a compiler cannot do. The ability to look 500-700 instructions into the future to bundle them together into a kind of VLIW dynamically is a very powerful feature.
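A toy sketch of that dynamic bundling, with a made-up five-instruction stream: each cycle the "core" issues up to `width` instructions whose inputs are already computed, in whatever order the runtime state allows, rather than an order fixed ahead of time.

```python
def dynamic_schedule(instrs, width=4):
    """instrs: list of (name, deps), deps being the set of instruction
    names this one needs. Returns the names issued each cycle."""
    done, cycles = set(), []
    pending = list(instrs)
    while pending:
        # Readiness is *observed*, not predicted: pick any instructions
        # whose dependencies have completed, up to the issue width.
        ready = [ins for ins in pending if ins[1] <= done][:width]
        if not ready:
            raise ValueError("dependency cycle in instruction stream")
        cycles.append([name for name, _ in ready])
        done.update(name for name, _ in ready)
        pending = [ins for ins in pending if ins not in ready]
    return cycles

# a, b, c are independent; d needs a; e needs d.
program = [("a", set()), ("b", set()), ("c", set()),
           ("d", {"a"}), ("e", {"d"})]
print(dynamic_schedule(program))  # [['a', 'b', 'c'], ['d'], ['e']]
```

A real OoO core does this over a window of hundreds of in-flight instructions, with latencies (cache misses, branch resolutions) feeding back into the "done" set as they actually complete.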
As to compiler theory, it really isn't that advanced. Our "cutting edge" compilers are doing glorified find-and-replace for their final optimizations (peephole optimization).
Look at SIMD and auto-vectorization. There are so many questions the compiler can't prove the answer to that even trivial vectorization opportunities any programmer would spot go unused, to the point where the entire area of research has resulted in basically zero improvement in real-world code.
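Here's a minimal sketch of the classic blocker, using an invented example: a loop with a carried dependence. "Vectorizing" it means reading a batch of inputs up front, and if the compiler can't prove the reads don't overlap the writes, doing so silently changes the answer, so it has to give up.

```python
def scalar_shift_add(buf, lo, hi):
    # buf[i] += buf[i-1], element by element: each iteration reads the
    # value the previous iteration just wrote (a loop-carried dependence).
    for i in range(lo, hi):
        buf[i] += buf[i - 1]
    return buf

def vectorized_shift_add(buf, lo, hi):
    # What a vector unit would do if the compiler wrongly assumed no
    # dependence: snapshot all the buf[i-1] inputs at once, then add.
    snapshot = buf[lo - 1:hi - 1]
    for k, i in enumerate(range(lo, hi)):
        buf[i] += snapshot[k]
    return buf

print(scalar_shift_add([1, 1, 1, 1], 1, 4))      # [1, 2, 3, 4]
print(vectorized_shift_add([1, 1, 1, 1], 1, 4))  # [1, 2, 2, 2]
```

In C the same problem shows up as pointer aliasing: unless the programmer promises `restrict` (or the compiler emits runtime overlap checks), vectorization is off the table.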
Because even if you get compilers to optimize for your current VLIW, that makes it harder to make improvements down the line.
The current way, while arguably pretty wasteful given all the micro-optimizing the CPU does on the incoming instruction stream, allows designers to expand the hardware almost freely to meet demand without requiring compilers to produce different code.
I've heard a rumour that supposedly one of the lead designers of the IA-64 architecture had died prematurely mid-project. And that would have left the project without the man with the vision. Hence, design by committee.
I thought the biggest problem with Itanium was the fact that it was optimising the wrong thing; it maximised single thread performance by going all in on speculative execution, but it turns out that optimising joules per instruction is much more important.
Certainly one of the issues with Itanium was it was fighting the last war.
It was trying to optimize for instruction-level parallelism when power efficiency and thread level parallelism were coming into vogue. Arguably companies like Sun overoptimized for the latter too soon but it was the direction things were going.
A senior exec at Intel told me at the time the focus on frequency in the case of Netburst was driven by Microsoft being uncomfortable with highly multi-core designs--and I have no reason to doubt that was one of the drivers. There was a lot of discussion around the challenges of parallelism, especially on the desktop, at the time. It generally wasn't the problem the hand wringing suggested it would be.
> the fundamentally flawed assumption (that you don't need dynamic scheduling, AKA OoO processing)
Could this have worked better with JIT-compiled applications, e.g. Java given a sufficiently clever JVM, where assumptions can be dynamically adjusted at runtime?
This was the grand hope but it never panned out. It’s possible now that a sufficiently brilliant compiler could make a difference since there was nothing like LLVM at the time and GCC was far less sophisticated. One of the many acts of self-sabotage Intel committed was insisting on their hefty license fees for icc, which meant that almost all real-world comparisons were made using code compiled using GCC or MSVC, which were not as effective optimizing for Itanium. There’s no way they made enough in revenue to balance out all of those lost sales.
The other point in favor of this approach now is that far more code is using high-level libraries. Back then there was still the assumption that distributing packages was hard and open source was distrusted in many organizations so you had many codebases with hand rolled or obsolete snapshots of things we’d get from a package manager now. It’s hard to imagine that wouldn’t make a difference now if Intel was smart enough to contribute optimizations upstream.
Wasn’t that always the issue with Itanium - that it could have been fast with a sufficiently clever compiler? The problem seemed to be no one was clever enough to write that compiler.
PA-RISC seemed fairly neat from the somewhat limited information you can find online (there's a 1.1 and 2.0 ISA manual on kernel.org). Were there major issues with the ISA? Killing your entire product line and starting over has, of course, rarely worked.
The PA-RISC 1.1 ISA encoded specific implementation details that didn't age well, like the branch delay slot and instruction address queues. It also required in-order memory accesses, because there was no support for cache-coherent I/O.
HP-UX 11 was one of the UNIXes I worked on (1999 - 2002, 2005), and the only issue I had, at least at the first employer, was their ongoing transition to 64 bits, and that the C compiler we had available (aC) was a mix of K&R C with some ISO/ANSI C compliance.
I don't recall any issues with the ISA, and we really liked using its early container capabilities (HP Vault).
> but AMD wouldn't have mattered if Itanic had been faster, cheap, and on time
I dunno about that.
My own personal opinion is that Intel has never been able to re-architect their way out of the fact that the cornerstone of their success is that they are selling x86, and their customers mostly don't care about the theoretical advantages of the bright shiny new thing. They just want to run their software just like they always have. There's a reason why IA-64 has joined iAPX432, i860, StrongARM and i960 as Intel footnotes (outside of the embedded market).
When they were philosophizing about what Itanic should look like, the only thing that x86 obviously needed from a market perspective was a bigger address space. And AMD was smart enough to deliver on that, and here we are.
Itanium didn't kill off Alpha. Intel x86 pricing did. But the most important unheeded lesson in those days was software compatibility. We went from the days of each computer having its own word length, instruction set, heck, even data format (remember the endian wars?) to source compatibility to binary compatibility. We learned that for most usage, software stability that allowed taking advantage of Moore's law was seriously more valuable in most cases than gaining a bit more performance or price/performance by changing architectures.
Intel kept the X86 price at a point where no bean counter would favor investing in new architectures. Fortunately AMD broke the headlock on x86.
Well Intel's original plan was to keep x86 32 bit, forcing anyone that needed more into IA64. Fortunately AMD came out with x86-64, and when it was clear that IA64 wasn't going to be competitive, Intel brought x86-64 to their chips.
If you look at the SPEC CPU benchmarks, Itanium was not bad at all in terms of instructions per cycle. IIRC in FP performance it could beat a Netburst Pentium 4 running at twice its clock speed, and it even compared favorably to Core. I.e. if Intel had produced Itaniums that ran at the same clock speed as Core CPUs, it would have been the world's fastest single-threaded number cruncher.
So I don't really buy the "Itanium is bad architecture" story.
Its fate was probably decided around 1999-2000. At that point Itanium still looked pretty good against the Pentium 3 and Pentium 4. And the name "IA-64" indicates Intel didn't plan to make 64-bit Pentiums. So eventually Pentiums would fill the low-end segment while the rest would be occupied by 64-bit Itaniums.
AMD killed that plan by releasing the AMD64 architecture. It was an obvious upgrade to x86, so it would clearly do better in the market than IA-64. So Intel decided to go for x86-64 too, and Itanium was doomed at that point. They didn't even bother making Itaniums with the same clock speed as Xeons.
So it's definitely possible that if AMD had decided to stick to 32 bits at that time, Intel would have pushed an optimized IA-64. Also, AMD64 could have been worse than it is. E.g. if they had only increased the register size but kept the number of registers the same, IA-64 could still have come out on top.
> AMD killed that plan by releasing AMD64 architecture. It was an obvious upgrade to x86, so it would clearly do better in the market than IA-64.
One of the best aspects of the Opteron was that it also happened to be a fantastic 32-bit CPU in addition to AMD64. This was a period where a lot of software, even FOSS, wasn't 64-bit clean. There was a lot of pointer arithmetic hiding deep in libraries that assumed pointers would always be 32 bits.
The Opteron running a 32-bit OS at least as well as a 32-bit Athlon was a huge point in its favor. So your existing system running on new Opteron hardware ran fine and you could mix and match Xeon and Opterons in a fleet. Then switch over to 64-bit on the Opterons for (hopefully) better performance.
One thing to remember is that FP performance with well-scheduled code was by far its strongest performing area, and Intel put a lot of work into tuning their compiler for those specific tests. The problem was that it fell off heavily the less your code is like that, especially for the branchy code most business apps depend on.
The other big problem was that the x86 compatibility story was worse than the earlier hopes. That meant it not only wasn't competitive with the current-generation competition but often even the previous generation or worse - note losing to the original Pentium or even a 486 here:
Now, they could have improved that but statistically nobody was going to pay considerably more for lower performance in the hopes that a future update would improve matters.
The Athlon and Opteron weren’t just fast, they also had flawless 32-bit support so even if your 64-bit software update never happened you could justify the purchase based on their price/performance.
Itanium can be regarded as a huge success. Maybe some don't remember, but the minicomputers and unices were where all the money was. Sure, Intel had the process edge, and had completely cornered the microcomputer market. But PCs had slim margins and didn't really matter in the bigger scope of important data processing.
Many of the computer nerds watched in awe as vendor after vendor dropped their hugely expensive and engineering heavy custom CPU architectures and lined up behind Intel. IBM was the only big player who didn't swallow the bait. "Even if they fail, that's a huge success" was a common observation at the time.
And sure enough. I don't think they failed on purpose, but business wise it was a win-win situation. The x86 architecture would have won anyway, because of the sheer scale, but the Itanium wreckage hastened it. Everyone needed to move, so why not move to x86/Linux directly?
> In some ways Itanium was the most successful bluff ever played in the tech industry. In much the same way that Reagan's Star Wars bankrupted the Soviet Union, it got almost every single competitor to fold. Back at the beginning of the project, Intel was nowhere in high-end & 64-bit computing. There was HP (PA-RISC), Sun (Sparc), Dec (Alpha), IBM (Power), MIPS (SGI). Intel wisely picked the partner with the stupidest management (Carly) to give up their competitive edge and announce to analysts that Intel's vision/roadmap was so awesome that RISC was dead and that they were going to follow the bidding of their master Intel for their 64-bit plan. Wall Street bought into the story so much that almost everyone else with competitive chips folded their strong hands to Itanium's bluff - SGI spun off MIPS and MIPS decided to leave the high-end space. Compaq undervalued Alpha and let it die. Sun tried to become a software company, and if it weren't for Fujitsu making modern Sparcs, Sparc would be dead.
> Basically, with nothing but PR and Carly's stupidity, Intel wiped out over half of the high-end computing processor market.
> Thankfully AMD had the vision to see through the bluff and saw the opportunity for 64-bit computing that worked; and thankfully IBM didn't have someone like Carly around, so they saw the value in retaining competitive advantages; or the computing world would be a pretty bleak place right now..
For anyone who doesn't know but is curious, "Carly" refers to Hewlett Packard (HP) CEO Carly Fiorina. Fiorina oversaw HP's acquisition of Compaq, which had previously acquired DEC, IIRC. Reportedly, the HP-Compaq acquisition was opposed by many, including board members and family. (This was before my time, but I read a lot of trade rags as a kid, and then later occasionally heard insider stories.)
HP was legendary for culture, like "management by walking around, and talking to the people on the ground", which was different from Fiorina's style.
Compaq was the most noteworthy IBM-compatible PC company, before Dell's dorm room dirt-cheap generic PC clones business skyrocketed into an empire.
DEC was the maker of the PDPs and VAX-based minicomputers on which much of the field of Computer Science was arguably developed, and later MIPS- and then Alpha-based workstations and servers, while also still developing VAXen (the plural form of the word).
All those proprietary CPU ISAs listed (PA-RISC, SPARC, Alpha, POWER), when they were introduced on engineering/graphics workstation computers, were especially exciting, because -- separate from the technical architecture itself -- they would briefly probably be the fastest workstation in your shop. All of these made MS-DOS/Windows PCs and Macs look like toys by comparison (though, eventually, Windows NT 3.51 started to be semi-credible if you just needed to run a single big-ticket application program). And you didn't know what exciting new development would be next.
Maybe it was like if, today, several makers of top-end gaming GPUs resulted in a leapfrogging on a cadence of every few/several months. And if they had different strengths, and, incidentally, curious exclusive game software to explore. Or like the very recent succession of Stable Diffusion, ChatGPT, etc., and wondering what the next big wow will be, what they've done with it, and what you can do with it.
When I knew some Linux developers working on Itanium, some were already calling it "Itanic". (I didn't read much into the name at the time, because there were a lot of joke derogatory names for brands and technologies.) Later, I thought "Itanic" was because it was a huge expensive thing that was doomed to sink. The theory in the TFA sounds like most competing ship companies gave up on their own engine designs when they heard how great the Titanic would be.
The HP-Intel joint effort to develop what became the Itanium was announced 5 years before Fiorina became the CEO of HP. During that time she was working at AT&T/Lucent and had zero input into HP's strategy.
The comparison to Star Wars is certainly apt. It doesn't need to work, in the engineering sense, to be useful. Sometimes economics can trump engineering.
An important part of the PR machinery was that by picking up a hot topic from academia, they got absolutely everyone to talk about VLIW as the next generation of RISC. And everyone already knew that RISC was superior and x86 was a toy, which was also mostly true at the time.
In the end, what won was huge caches and huge OoO pipelines. Linus Torvalds had some strong and well-known opinions on this, which turned out to be mostly right.
It was a humiliating failure, not a success. The trend against those other architectures was already clear: as processor complexity went up, the costs of building them skyrocketed and most of those companies had no plan for the kind of volume you’d need to support them. Don’t forget that applied on both the hardware and software sides: a competitive compiler and optimized libraries were important.
This is why Itanium got traction: everyone knew that you needed volume to stay in the game. IBM had a strategy to get that with Apple & Motorola (PowerPC started in 1991), but HP did not have anything like that for PA-RISC. DEC might have gotten there if they’d had a more aggressive partner for the lower-end Alpha strategy but the merger killed any chance of that.
Since x86 was rising so fast, it might not be clear why Intel got involved. That goes back to the licensing rights: they couldn’t prevent companies like AMD from competing directly with them. Itanium was the attempt to close off that line of competition legally and they were willing to attack their own product margins to do it.
Doesn't quite add up. SGI folded their CPUs (thanks Mr Belluzzo) before Itanium was even released, Sun offered their x86 server about the same time (and kept their SPARCs).
HP was the only casualty to Itanium, but that was self-inflicted.
Yes, but the CPU design pipeline is years long. So years earlier, the Alpha, Sparc, MIPS, PA-RISC and related teams had to decide whether the R&D for a next-generation CPU made sense in the face of the announced Itanium, which most believed was going to dominate the server industry.
When the Itanium shipped years late and slower than expected it was too late for any of the competition (except Power) to recover. Granted the x86-64s were ramping up and they would have all had tough competition, even without Itanium.
To be completely honest, MIPS CPU's had never been known for their speed – not until the release of the 64-bit MIPS architecture anyway when they finally became competitive with other RISC CPU's, but it was a complete ISA redesign.
Sun was in a somewhat similar boat with the SPARC v8 architecture, and they were rather late with UltraSPARC (SPARC v9 ISA). Yet, they managed to hold out longer due to having a switched memory controller and a very wide memory bus, which allowed them to become the best hardware appliance to run the Oracle database (despite being less performant), and divert the cash flow into the UltraSPARC development. UltraSPARC I was underwhelming, and with UltraSPARC II they finally caught up with other RISC vendors and gradually started outperforming some (e.g. MIPS) in some areas.
Amusingly, the 512-bit wide memory bus has made a comeback in Apple M1 Max laptops (laptops!), and M1 Ultra has a 1024-bit wide memory bus.
Why HP went all in with ditching their own perfectly fine PA-RISC 2.0 architecture is an enigma to me tho.
So long as one puts big, fat "giga-money-losing" and "humiliating" disclaimers on "success", then yes.
Vs. - what if, instead of Itanium, Intel had more-quietly designed and delivered good, high-performance x86-64 CPU's? I'm thinking that, by bottom-line metrics, would have been a vastly more successful business strategy.
Of course the whole story is way more complex, but I have been saying for years that the Itanium might have succeeded if AMD had not extended x86 to 64 bits. The 64-bit extensions not only fixed some of the x86 problems (increased register count, pushed 64-bit double floats) but made x86 a choice for the more serious compute platforms and servers.
Back then, the whole professional world had switched to 64 bits, both from a performance and memory size perspective. That is why the dotcom era basically ran on Sparc Suns. The Itanium was way late, but it still was Intel's only offering in the domain. Until x86-64 came and very quickly entered the professional compute centers. The performance race in the consumer space then sealed the deal by providing faster CPUs than the classic RISC processors of the time, including Itanium.
It is a bit sad to see it go, I wonder how well the architecture would have performed in modern processes. After all, an iPhone has a much larger transistor count than those "large" and "hot" Itaniums.
The industry was already moving away from the big 64-bit SMP machines made by Sun, SGI & IBM. In many cases a cluster of 32-bit x86 machines made more sense than one expensive big machine with high-priced support contracts and parts. 32-bit x86 machines already supported more than 4GB total memory with PAE; it was just that one process couldn't use more than 4GB. Other 64-bit chips were already well established (SPARC, POWER, MIPS), though most of their users probably couldn't easily move to a new CPU architecture. For other users, by the time they needed the bigger machines, x86 64-bit was already available, including from Intel themselves. AMD was limited to 8 sockets from what I remember, so there was still a small market for big Itanium systems (like SGI's Altix).
At the time there were computers containing Alpha chips with quite a PC-ish design. I nearly bought one, so they were semi-affordable. They ran Linux well. So it seems a bit more likely that these might have succeeded if AMD hadn't extended x86.
What you also have to remember is that Itanic was a very weird architecture. It's hard to write compilers for it, and it made the cardinal error of baking microarchitectural decisions into the ISA.
Yeah, by mid-2000s most RISC workstations (Alpha, Sun, SGI, PowerPC tho not IBM POWER and certainly not the top of the range datacentre class servers) had converged on the mainstream PC architecture, e.g. a PCI bus, EIDE disk controllers, standard PC memory (168-pin DIMM's and such). I had a Sun Ultra-10 as the sole «PC» at home for some years running Solaris and later Linux. After quickly getting fed up with the standard Sun keyboard, I bought a generic, no-name USB PCI card, plugged it into my Ultra-10 and connected a Microsoft Natural keyboard. It just worked, with Solaris not even requiring extra drivers or kernel modules by virtue of being a USB keyboard.
In that case, another more proven RISC architecture like Alpha would have replaced x86. At that time, the only reason why x86 was still "competitive" was the enormous amount of x86 software (specifically Windows software). If Microsoft would have to switch to another ISA anyway, there wasn't really a reason to bet on something as risky as Itanium.
That came after AMD64. It was Intel trying to prevent brand damage by implying their AMD-compatible CPUs were not knock-offs of a competitor that had plagued them with knock-offs.
Actually, it was originally called x86-64. The AMD64 branding came around about the time the chips were actually released.
Back around 2002 or 2003, I sent away to AMD for a set of reference manuals, got back a nice five-volume set which said "x86-64" everywhere and a little note in the box saying "where it says x86-64, read AMD64"
Does anyone remember that when the first Intel processors with AMD64 support came out, Intel called it "IA32e" to downplay its importance?
To differentiate it from the
IA32=x86 (32bit)
IA64=Itanium
AMD was more like the last straw that broke the camel's back.
Itanium was already in trouble. It was hot (really hot) and underperforming. It wasn't selling really well, since it was too expensive. Of the UNIX vendors, only HP was left standing behind Itanium. IBM had already long pulled out of Itanium and also out of Monterey, the UNIX that would unify all unixes.
AMD64 was the light that suddenly came shining and everybody knew that was where everybody was going.
Itanium is something that sounds good in theory but didn't quite deliver in practice. It is my impression that this is how things usually turn out when you do top-down design of a new product. Maybe my impression is wrong and it has nothing to do with top-down design and it's just that the majority of products fail. Maybe 99 out of 100 fail and you have to launch 100 to have a statistical chance to get 1 winner. Intel didn't launch 100 different processor designs. They surely have a dozen designs that got killed early but then they focused most of their energy on one design and that one failed. Not enough good luck probably.
I'm sure you're right in general - but I would say that a counterexample, or the 1 in 100 top-down design of a new product that did succeed, is the iPhone.
If I remember well what I learned in college, the organization of x86 processors is superscalar, which means that the processor uses an internal statistics mechanism to predict the code that's going to be executed next. Itanium, on the other hand, used the VLIW architecture, so that the instructions are optimized for a long vector of execution already at compilation time.
I always found the idea behind the VLIW processor architecture to be a quite good one to be honest, but I read many engineers in many places saying that it's a bad one and it was doomed from the beginning.
The article says that the death of Itanium is mostly due to the disinvestment in IA-64 caused by the threat of AMD overtaking the x86 market. Even though the competition for the x86 market probably benefited all of us, I still find it a bit sad that we lost this architectural diversity, and sometimes I wonder how well the Itanium would perform today if it hadn't been killed.
Superscalar just means the processor can execute more than one instruction at a time (in parallel, not just pipelined). The main distinction between VLIW and mainstream x86 is in how the instruction execution is scheduled: x86 cores track dependencies between instructions, and most support reordering instructions to deal with stalls from things like cache misses or the next instruction being of the wrong type for any of the free execution ports. VLIW relies on the compiler to schedule instructions, so that the hardware does not need to do dependency tracking.
And the problem is that such a thing doesn't really work on mainstream CPUs, because memory access instructions take a highly variable amount of cycles depending on which cache level the data is in, which the compiler cannot know, and generating code for all possibilities leads to exponential blowup of the source code size.
It's not clear how Intel thought it could possibly work.
> ...I read many engineers in many places saying that it's a bad [idea]...
I read an article several years ago from an engineer at...ooh, I think it was DEC or IBM. He said that during development of the Itanium, the Intel guys had talked to them and they advised most strongly that Intel drop the project, because they had been down that road and thought it was a dead end.
VLIW is considered Multiple Instruction Multiple Data (MIMD): in each line of assembly you can issue something like 4 (or 8) instructions, each with a different target, and it will work as long as there aren't dependency issues.
GPUs are still Single Instruction Multiple Data (SIMD): for every vector operation (adding vectors, taking a dot product) you are only executing a single op at a time.
SIMD is really close to the RISC/CISC paradigm, and there are various extensions for other types of SIMD processing in the ISAs used today. VLIW is a much different set of assumptions, requiring the compiler to encode the same instruction-level parallelism that a superscalar chip extracts via its architectural features (pipelines/branch prediction/et cetera).
Can GPUs really be considered VLIW just because they might share an instruction pointer across a couple of 'micro cores'? AFAIK they even went back from SIMD to scalar a long time ago.
.. and then Nvidia brought their GPUs to the clusters and ate both Intel’s and AMDs lunch with a better version of VLIW. The CUDA programming model and hardware IMO is quite successful at abstracting vectors the right way[TM], separating the problem of the grid setup from that of the kernel code (which you can still largely program in a scalar way if you’re not after the last ~30% of performance). IMO a shame that OpenCL never got good.
I have an Itanium on a shelf somewhere-- while I was using it I got to do some assembly-level debugging to track down and report a GCC bug I hit. Well worth the $50 or whatever I paid for it in entertainment.
Nice story, but it doesn't fully add up:
"When Intel started Itanium development work in the mid 1990s"
That's more than a decade after AMD released their first x86-compatible CPU. Intel was very aware of the threat before they started work on Itanium. They even tried to hold back AMD with a lawsuit, which they lost, ultimately allowing AMD to release their 80386-compatible CPU.
I find it more likely that it failed for reasons similar to why the iAPX 432 and i860 failed: there just wasn't a market.
Why would AMD do that today though? They are in a privileged position to have an x86 license and zillions of customers asking for an x86. AMD once made some ARM chips and realised they didn’t need to.
Agreed. The only reason I can think of for AMD to set up an alternative instruction set would be a sudden rise in competition from other players. If big advances are made in the RISC-V space that make the architecture a cost effective alternative to amd64 (including the cost of porting software to RISC-V) then I can see them setting themselves up to allow running software on both platforms at native speeds.
I don't think AMD is currently limited by their instruction set. Even if they are, there may be an argument to move to ARM instead of RISC-V to take advantage of the software already ported because of Apple's transition and the Graviton chips. Windows already runs on ARM but hasn't been announced to run on RISC-V, after all.
Does the core-area math check out on that? Is x86 instruction decoding a small enough part of today's cores that a chip including both a RISC-V decoder and an x86 one wouldn't pay a big penalty?
It's also not clear that would be a gain for AMD. x86 has a lot of lock-in and AMD is one of two viable suppliers of it. Helping along a RISC-V transition would create opportunities for attackers that don't need to license x86. AMD doesn't seem to have a reason for that right now. But maybe that's an Innovator's Dilemma kind of situation and they should be cannibalizing the present to set up their future.
Perhaps there was a belief back then that compilers could easily be optimised or uplifted to generate fast and efficient code, and that did not turn out to be the case. Project management and mismanagement certainly did not help either.
I wonder how a VLIW architecture would pan out today given the advances in compilers over the last three decades, and whether an ML-assisted VLIW backend could deliver on the old dream.
The big problem is SMT, since it's hard to share a VLIW core between processes, while a superscalar core shares really well.
I'm still unconvinced this is fundamental. It certainly was flawed back then, but compiler theory has improved a LOT since; we have things like polyhedral optimization that we didn't have access to... You could probably optimize delay-line technology that way.
If you don't know whether some value will hit in the L1 or the L3 cache there's wild variance on how long it'll take so you have to do something else in the meantime. On x64, that's the pipeline and speculation. On a GPU, you swap to another fibre/warp until the memory op finished.
Fundamentally the hardware knows how long the memory access took, the software can only guess how long it will take. That kills effective ahead of time scheduling on most architectures.
Perhaps, as part of the trend towards increasingly heterogeneous architectures, we'll see big VLIW coprocessors for power efficiency in certain serial workloads (GPUs are, in a sense, massively parallel collections of little VLIW-ish coprocessors; however, they do have dynamic scheduling).
Which would be fine if there was a time machine available to send back what we know now to the past. But at the time having sucky compilers for what you want to (try to) accomplish was a bad idea.
The same decision now may be good, but then it was a mistake. It'd be like trying to have a Moon landing project in the 1920s: we got there eventually, but certain things are only possible when the time 'is right'.
The thing is, on regular, array-based codes, it's great. And it was then, too -- the polyhedral approach was all being developed at exactly that time, but maybe it's not clear in hindsight because the terminology hadn't settled yet. Ancourt and Irigoin's "Scanning Polyhedra With Do Loops" was published in 1991. Lots of the unification of loop optimization was labeled affine optimization or described as based on Presburger arithmetic. But that is the technology that they were depending on to make it work.
But most code we run is not Fortran in another guise. The dependences aren't known at compile time. That issue hasn't changed much.
The one change now is that workloads that were called "scientific computing" then are now being run for machine learning. But now it doesn't make sense to run regular, FP-intensive codes on a CPU at all, because of GPUs and ML accelerators. So what's left for CPUs that excel on that workload? I'm not sure there is a niche there.
OoO execution does an end run around this by examining the program as it runs and adapting to the current reality rather than a simulated one. This is the same reason a JIT can do optimizations that a compiler cannot. The ability to look 500-700 instructions into the future and bundle them together into a kind of VLIW dynamically is a very powerful feature.
As to compiler theory, it really isn't that advanced. Our "cutting edge" compilers are doing glorified find-and-replace for their final optimizations (peephole optimization).
Look at SIMD and auto-vectorization. There are so many questions the compiler can't prove the answer to that even trivial vectorization, which any programmer would identify at a glance, can't be applied by the compiler, to the point where the entire area of research has produced basically zero improvement in real-world code.
The current way, while arguably pretty wasteful given all the micro-optimizing the CPU does on the incoming instruction stream, allows designers to expand the hardware nearly freely to meet needs without requiring compilers to produce different code.
Also the Alpha was the utter opposite of design by committee. A quick skim of the alpha ISA would show you that.
It was trying to optimize for instruction-level parallelism when power efficiency and thread level parallelism were coming into vogue. Arguably companies like Sun overoptimized for the latter too soon but it was the direction things were going.
A senior exec at Intel told me at the time the focus on frequency in the case of Netburst was driven by Microsoft being uncomfortable with highly multi-core designs--and I have no reason to doubt that was one of the drivers. There was a lot of discussion around the challenges of parallelism, especially on the desktop, at the time. It generally wasn't the problem the hand wringing suggested it would be.
Could this have worked better with JIT-compiled applications, e.g. Java given a sufficiently clever JVM, where assumptions can be dynamically adjusted at runtime?
(Edit: As opposed to an AOT compiler.)
The other point in favor of this approach now is that far more code is using high-level libraries. Back then there was still the assumption that distributing packages was hard and open source was distrusted in many organizations so you had many codebases with hand rolled or obsolete snapshots of things we’d get from a package manager now. It’s hard to imagine that wouldn’t make a difference now if Intel was smart enough to contribute optimizations upstream.
I don't recall any issues with the ISA, and we really liked using its early container capabilities (HP Vault).
I dunno about that.
My own personal opinion is that Intel has never been able to re-architect their way out of the fact that the cornerstone of their success is that they are selling x86, and their customers mostly don't care about the theoretical advantages of the bright shiny new thing. They just want to run their software just like they always have. There's a reason why IA-64 has joined iAPX432, i860, StrongARM and i960 as Intel footnotes (outside of the embedded market).
When they were philosophizing about what Itanic should look like, the only thing that x86 obviously needed from a market perspective was a bigger address space. And AMD was smart enough to deliver on that, and here we are.
Intel kept the x86 price at a point where no bean counter would favor investing in new architectures. Fortunately AMD broke the headlock on x86.
So I don't really buy the "Itanium is bad architecture" story.
Its fate was probably decided around 1999-2000. At that point Itanium still looked pretty good against the Pentium 3 and Pentium 4. And the name "IA-64" indicates Intel didn't plan to make 64-bit Pentiums. So eventually Pentiums would fill the low-end segment while the rest would be occupied by 64-bit Itaniums.
AMD killed that plan by releasing the AMD64 architecture. It was an obvious upgrade to x86, so it would clearly do better in the market than IA-64. So Intel decided to go for x86-64 too, and Itanium was doomed at that point. They didn't even bother making Itaniums with the same clock speed as Xeons.
So it's definitely possible that if AMD had decided to stick with 32 bits at that time, Intel would have pushed an optimized IA-64. Also, AMD64 could have been worse than it is. E.g. if they had decided to increase only the register size but keep the number of registers the same, IA-64 could still have come out on top.
One of the best aspects of the Opteron was that it also happened to be a fantastic 32-bit CPU in addition to AMD64. This was a period when a lot of software, even FOSS, wasn't 64-bit clean. There was a lot of pointer arithmetic hiding deep in libraries that assumed pointers would always be 32 bits.
The Opteron running a 32-bit OS at least as well as a 32-bit Athlon was a huge point in its favor. So your existing system running on new Opteron hardware ran fine and you could mix and match Xeon and Opterons in a fleet. Then switch over to 64-bit on the Opterons for (hopefully) better performance.
The other big problem was that the x86 compatibility story was worse than the earlier hopes. That meant that it not only wasn’t competitive with the current generation competition but often even the previous or worse - note losing to the original Pentium or even a 486 here:
https://tweakers.net/reviews/204/8/intel-itanium-sneak-previ...
Now, they could have improved that but statistically nobody was going to pay considerably more for lower performance in the hopes that a future update would improve matters.
The Athlon and Opteron weren’t just fast, they also had flawless 32-bit support so even if your 64-bit software update never happened you could justify the purchase based on their price/performance.
I love monuments to hubris so I too have one in my garage-- a four-node SGI Altix 350.
Many of us computer nerds watched in awe as vendor after vendor dropped their hugely expensive, engineering-heavy custom CPU architectures and lined up behind Intel. IBM was the only big player who didn't swallow the bait. "Even if they fail, that's a huge success" was a common observation at the time.
And sure enough. I don't think they failed on purpose, but business wise it was a win-win situation. The x86 architecture would have won anyway, because of the sheer scale, but the Itanium wreckage hastened it. Everyone needed to move, so why not move to x86/Linux directly?
> In some ways Itanium was the most successful bluff ever played in the tech industry. In much the same way that Reagan's Star Wars bankrupted the Soviet Union, it got almost every single competitor to fold. Back at the beginning of the project, Intel was nowhere in high-end & 64-bit computing. There was HP (PA-RISC), Sun (SPARC), DEC (Alpha), IBM (Power), MIPS (SGI). Intel wisely picked the partner with the stupidest management (Carly) to give up their competitive edge and announce to analysts that Intel's vision/roadmap was so awesome that RISC was dead and that they would follow the bidding of their master Intel for their 64-bit plan. Wall Street bought into the story so much that almost everyone else with competitive chips folded their strong hands to Itanium's bluff - SGI spun off MIPS and MIPS decided to leave the high-end space. Compaq undervalued Alpha and let it die. Sun tried to become a software company, and if it weren't for Fujitsu making modern SPARCs, SPARC would be dead.
> Basically, with nothing but PR and Carly's stupidity, Intel wiped out over half of the high-end computing processor market.
> Thankfully AMD had the vision to see through the bluff, and saw the opportunity for 64-bit computing that worked; and thankfully IBM didn't have someone like Carly around, so they saw the value in retaining competitive advantages; or the computing world would be a pretty bleak place right now..
HP was legendary for culture, like "management by walking around, and talking to the people on the ground", which was different from Fiorina's style.
Compaq was the most noteworthy IBM-compatible PC company, before Dell's dorm room dirt-cheap generic PC clones business skyrocketed into an empire.
DEC was the maker of the PDPs and VAX-based minicomputers on which much of the field of Computer Science was arguably developed, and later MIPS- and then Alpha-based workstations and servers, while also still developing VAXen (the plural form of the word).
All those proprietary CPU ISAs listed (PA-RISC, SPARC, Alpha, POWER), when they were introduced on engineering/graphics workstation computers, were especially exciting, because -- separate from the technical architecture itself -- they would briefly probably be the fastest workstation in your shop. All of these made MS-DOS/Windows PCs and Macs look like toys by comparison (though, eventually, Windows NT 3.51 started to be semi-credible if you just needed to run a single big-ticket application program). And you didn't know what exciting new development would be next.
Maybe it was like if, today, several makers of top-end gaming GPUs resulted in a leapfrogging on a cadence of every few/several months. And if they had different strengths, and, incidentally, curious exclusive game software to explore. Or like the very recent succession of Stable Diffusion, ChatGPT, etc., and wondering what the next big wow will be, what they've done with it, and what you can do with it.
When I knew some Linux developers working on Itanium, some were already calling it "Itanic". (I didn't read much into the name at the time, because there were a lot of derogatory joke names for brands and technologies.) Later, I thought "Itanic" was because it was a huge expensive thing that was doomed to sink. The theory in TFA sounds like most competing ship companies gave up on their own engine designs when they heard how great the Titanic would be.
An important part of the PR machinery was that by picking up a hot topic from academia, they got absolutely everyone to talk about VLIW as the next-generation RISC. And everyone already knew that RISC was superior and x86 was a toy, which was also mostly true at the time.
In the end, what won was huge caches and huge OoO pipelines. Linus Torvalds had some strong and well-known opinions on this, which turned out to be mostly right.
This is why Itanium got traction: everyone knew that you needed volume to stay in the game. IBM had a strategy to get that with Apple & Motorola (PowerPC started in 1991), but HP did not have anything like that for PA-RISC. DEC might have gotten there if they’d had a more aggressive partner for the lower-end Alpha strategy but the merger killed any chance of that.
Since x86 was rising so fast, it might not be clear why Intel got involved. That goes back to the licensing rights: they couldn’t prevent companies like AMD from competing directly with them. Itanium was the attempt to close off that line of competition legally and they were willing to attack their own product margins to do it.
HP was the only casualty of Itanium, but that was self-inflicted.
When the Itanium shipped years late and slower than expected it was too late for any of the competition (except Power) to recover. Granted the x86-64s were ramping up and they would have all had tough competition, even without Itanium.
Sun was in a somewhat similar boat with the SPARC v8 architecture, and they were rather late with UltraSPARC (SPARC v9 ISA). Yet, they managed to hold out longer due to having a switched memory controller and a very wide memory bus, which allowed them to become the best hardware appliance to run the Oracle database (despite being less performant), and divert the cash flow into the UltraSPARC development. UltraSPARC I was underwhelming, and with UltraSPARC II they finally caught up with other RISC vendors and gradually started outperforming some (e.g. MIPS) in some areas.
Amusingly, the 512-bit wide memory bus has made a comeback in Apple M1 Max laptops (laptops!), and M1 Ultra has a 1024-bit wide memory bus.
Why HP went all-in on ditching their own perfectly fine PA-RISC 2.0 architecture is an enigma to me, though.
So long as one puts big, fat "giga-money-losing" and "humiliating" disclaimers on "success", then yes.
Vs. - what if, instead of Itanium, Intel had more-quietly designed and delivered good, high-performance x86-64 CPU's? I'm thinking that, by bottom-line metrics, would have been a vastly more successful business strategy.
Meanwhile, ARM was designing little low-power RISC toys that were obviously no danger to Intel at all.
Back then, the whole professional world had switched to 64 bits, both for performance and for memory-size reasons. That is why the dotcom era basically ran on SPARC Suns. The Itanium was way late, but it was still Intel's only offering in that domain, until x86-64 came along and very quickly entered the professional compute centers. The performance race in the consumer space then sealed the deal by providing faster CPUs than the classic RISC processors of the time, including Itanium.
It is a bit sad to see it go, I wonder how well the architecture would have performed in modern processes. After all, an iPhone has a much larger transistor count than those "large" and "hot" Itaniums.
What you also have to remember is that Itanic was a very weird architecture. It's hard to write compilers for it, and it made the cardinal error of baking microarchitectural decisions into the ISA.
Had that not been the case, Intel alongside its OS partners would have managed to push it through no matter what.
That came after AMD64. It was Intel trying to prevent brand damage by saying their AMD compatible cpus were not knock-offs of a competitor that had plagued them with knock-offs.
Back around 2002 or 2003, I sent away to AMD for a set of reference manuals, got back a nice five-volume set which said "x86-64" everywhere and a little note in the box saying "where it says x86-64, read AMD64"
Itanium was already in trouble. It was hot (really hot) and underperforming. It wasn't selling really well, since it was too expensive. Of the UNIX vendors, only HP was left standing behind Itanium. IBM had already long pulled out of Itanium and also out of Monterey, the UNIX that would unify all unixes.
AMD64 was the light that suddenly came shining and everybody knew that was where everybody was going.
Even the M1 could be argued to be close, it’s a very wide machine.
https://en.wikipedia.org/wiki/Itanium#/media/File:Itanium_Sa...
In the modern landscape, it'd be closer to a v86 mode, where the hypervisor is RISC-V, but user x86 applications can run full speed.
All the legacy PC platform crap could be done away with, replaced by the RISC-V standard OS-A profile.