Readit News
jepler · 8 years ago
g-d those are some deceptive graphs! Take a look at the slide titled "Pushing the performance envelope". A 16% improvement (LMBench memcpy) is displayed so that it looks like a 164% improvement (size of bar increased from 50px to 132px)!
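A quick sanity check of the numbers in this comment (the 50px/132px bar heights and 16% figure are as quoted above):

```python
# A 16% performance gain (LMBench memcpy) drawn with bars that
# grow from 50px to 132px.
real_improvement = 0.16
bar_before, bar_after = 50, 132  # pixel heights quoted above

apparent_improvement = (bar_after - bar_before) / bar_before
print(f"apparent: {apparent_improvement:.0%}")  # prints "apparent: 164%"
```

So the bar chart visually exaggerates the improvement by roughly a factor of ten (1.64 / 0.16 ≈ 10).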
microcolonel · 8 years ago
Yeah, I noticed that immediately. It's kinda sad, too, given how the numbers are actually pretty good to begin with.

Why was the marketing department so ashamed about double digit percentage improvements in decent benchmarks? Do they think their customers are too stupid to appreciate that? Too stupid to notice the charts?

snvzz · 8 years ago
Might have to do with the pressure they're feeling from the unavoidable: RISC-V replacing ARM.
ant6n · 8 years ago
This is an obvious troll, but I'd like to point out that if RISC-V were to become big, ARM could probably come up with one of the best implementations.
cm2187 · 8 years ago
Stupid question: if ARM keeps aggressively adding instructions and co-processors, how long before it becomes bloated and power hungry like the x86 architecture? Isn't its simplicity the strength of the ARM platform?
Symmetry · 8 years ago
Co-processors don't matter for power consumption if they aren't being used, because you can always just turn off the power to those areas of the chip. People talk about a coming era of "dark silicon" where transistors keep getting cheaper without getting more efficient. That means we'll keep adding more and more specialized structures to chips which are very fast and power efficient when in use compared to a general purpose CPU, but each sits there dark and silent most of the time.

Back in the day the advantages of RISC were that since everybody had small teams of designers you could spend less time implementing instructions and more time optimizing and adding features like pipelining. And also that RISC, when introduced, could be squeezed onto a single chip whereas CISCs needed multiple chips, which resulted in big speed penalties. These days everybody has humongous teams and much better tooling and more transistors than you can shake a stick at. The complex decoding of x86 instructions gives you a 5-10% power penalty on big cores (and a larger one on small cores). And the stricter memory ordering requirements of x86 might also be a disadvantage; I've heard contradictory things. But in general the performance differences between ISAs are pretty small at the moment.

brigade · 8 years ago
No. All ISAs add instructions that make sense for some important use case and can be executed efficiently (generally with a throughput of at least 1 per cycle.) x86's "power hungry" issues are basically that determining how many bytes an instruction takes is a significant fraction of fully decoding it, thus the existence of the µop cache. And, to a much lesser extent, partial register accesses. ARM has already shown that a simpler scheme handles the former well (Thumb-2), and the latter AArch64 explicitly fixed relative to ARMv7; it isn't going to regress now that everyone knows it's an issue.

Verification is more expensive for x86 sure, but any CPU approaching the complexity of modern OoOE monsters isn't going to be drastically cheaper to verify just because of the ISA.

Terribledactyl · 8 years ago
To add to the sibling comments, it might be better to think of ARM less like Intel and more like Lego for building processors. Standard kits with sensible collections, but you can always just ask for the IP blocks and DIY. If a vendor discovers something isn't working for their application, they can turn it off, build a core without it, or tell the compiler not to use that feature set. It's very modular.

Additionally: x86 grew up on devices that were basically always plugged in. ARM has a huge install base that's significantly power- or heat-constrained.

johansch · 8 years ago
I think this issue has been overblown. The x86 ISA isn't necessarily more power hungry per 'computational unit' than the ARM ISA (except for some lower-end cases):

https://www.extremetech.com/extreme/188396-the-final-isa-sho...

In particular: https://www.extremetech.com/wp-content/uploads/2014/08/Avera...

pedroaraujo · 8 years ago
It's not like ARM randomly slaps on features just for the fun of it. These features are carefully modeled and studied before being released into the market.
Aissen · 8 years ago
Of course every design team wants to think it works this way. But the big.LITTLE mess (at the OS level), and the fact that Apple is beating them at their own game (like Qualcomm was, at the time), is proof that it's not that simple.
kevin_thibedeau · 8 years ago
Intel does the same. Didn't prevent them from producing P4 or Itanium.
psi-squared · 8 years ago
It's worth noting that, if you need something which runs on really low power, ARM have their R and M series processors. So even if the A series did become really power-hungry, the other two lines presumably wouldn't.
pja · 8 years ago
E.g., the processor in the BBC MicroBit pulls about 10mA @ 2.5V or so. Of course, it only runs at 32MHz or so & has a whopping 16KB of RAM, but you can run one off a pair of AA batteries for a week or two without ever sleeping.
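A back-of-envelope check of the battery-life claim. The ~2000 mAh AA capacity is an assumption (typical alkaline cell), not from the comment; the 10 mA draw is as quoted:

```python
# Battery-life estimate for the ~10 mA draw quoted above.
aa_capacity_mah = 2000  # typical alkaline AA capacity (assumed)
draw_ma = 10            # current draw quoted in the comment

# Two AAs in series double the voltage but not the capacity.
hours = aa_capacity_mah / draw_ma
days = hours / 24
print(f"{days:.1f} days")  # prints "8.3 days"
```

Roughly eight days at a constant 10 mA, which matches the "week or two without ever sleeping" claim once you account for capacity variation and any sleep at all.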
dbancajas · 8 years ago
If it wants to reach x86 levels of performance, then the answer is yes. In fact, it could be less efficient simply because x86 has two decades of optimizations going for it. Notice that they keep comparing against the previous gen and not state-of-the-art x86 performance numbers?
Symmetry · 8 years ago
I'd heard ahead of time that they were allowing heterogeneous clusters, and I was wondering how that would work. Private L2s should do it; you really need to design the L2 to match the profile of the supported CPU(s), but with L3 the latencies are higher and there's less need to specialize.
jumpkickhit · 8 years ago
Nice write-up.

Also, I was curious when consumers might see this in their products; the last line in the article says late 2017/early 2018.

DCKing · 8 years ago
If current trends continue, Huawei (through their subsidiary HiSilicon) will be the first to launch a product with the new ARM IP.

The Cortex A72 was announced in April 2015, Huawei launched the Kirin 950 (4x Cortex A72 + 4x Cortex A53) in November 2015 as part of the Huawei Mate 8. The Cortex A73 was announced in May 2016, Huawei launched the Kirin 960 (4x Cortex A73 + 4x Cortex A53) in November 2016 as part of the Huawei Mate 9.

So yeah, my guess is the Huawei Mate 10 with a Kirin 970 (4x Cortex A75 + 4x Cortex A55) in November this year. The market has become very iterative and predictable, and this ARM announcement confirms it. Don't actually give Huawei any money for this hardware though; their software update policies are horrible. The more interesting and useful implementations will come from Samsung, Qualcomm and maybe Nvidia in 2018.

mtgx · 8 years ago
Cortex-A75 really seems like a chip designed to enter the PC market (Chromebooks/Windows on ARM), slow and steady.