Japan Captures TOP500 Crown with Arm-Powered Supercomputer

pininja · 6 years ago

The link is working for me; here are the details on the winner.

“The new top system, Fugaku, turned in a High Performance Linpack (HPL) result of 415.5 petaflops, besting the now second-place Summit system by a factor of 2.8x. Fugaku is powered by Fujitsu’s 48-core A64FX SoC, becoming the first number one system on the list to be powered by ARM processors. In single or further reduced precision, which are often used in machine learning and AI applications, Fugaku’s peak performance is over 1,000 petaflops (1 exaflops). The new system is installed at RIKEN Center for Computational Science (R-CCS) in Kobe, Japan.”

timClicks · 6 years ago

Wow, so we've hit exascale. Back in 2012 I heard claims that we would reach exascale by 2020. I didn't believe them.

zekrioca · 6 years ago

0.5 exascale
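
For reference, the unit arithmetic behind that remark can be checked with a quick sketch. It uses only the two figures quoted above (415.5 petaflops on HPL and the 2.8x lead); the variable names are made up for illustration, and the implied Summit number is derived from the rounded 2.8x factor, so treat it as approximate.

```python
PETA = 10**15
EXA = 10**18

fugaku_hpl_pflops = 415.5   # HPL result quoted above
lead_over_summit = 2.8      # rounded factor quoted above

# Convert petaflops to exaflops.
fugaku_hpl_exaflops = fugaku_hpl_pflops * PETA / EXA

# Back out the runner-up's score from the rounded ratio (approximate).
implied_summit_pflops = fugaku_hpl_pflops / lead_over_summit

print(f"Fugaku HPL: {fugaku_hpl_exaflops:.4f} exaflops")
print(f"Implied Summit HPL: about {implied_summit_pflops:.0f} petaflops")
```

So in double precision HPL it is a bit over 0.4 exaflops; the "over 1 exaflops" figure in the quote is for reduced precision.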

throwaway_pdp09 · 6 years ago

There seems to be a difference between peak and useful. Be careful of marketing and hype.

(I am not an HPC guy, just IMO).

theaustinseven · 6 years ago

It goes further than that. HPL is already a bad benchmark since it just prioritizes the narrow requirements of HPL (double precision multiplication and bandwidth). HPCC (https://icl.utk.edu/hpcc/) is generally regarded as a better benchmark of the real value of a particular cluster for scientific use.

Animats · 6 years ago

The end of the US semiconductor industry is now in sight.

The only US owned state of the art fabs in the US belong to Intel. Intel survives because they have a high margin on x86 CPUs. Today, TSMC announced 5nm, and the top supercomputer is ARM-based.

Apple seems to be going ARM. Chromebooks are ARM. Microsoft now offers Windows on ARM, on the Surface Pro X. Mobile never used x86. x86 is on the way out. What's left for Intel?

(Micron is still a major force in DRAM, amazingly.)

floatboth · 6 years ago

> The only US owned state of the art fabs in the US belong to Intel

Is the "US owned" clarification to exclude Global Foundries' New York fab? :D

> Chromebooks are ARM

Maybe half of them.

> What's left for Intel?

Fabricating others' designs like TSMC?

But also, Intel isn't going away any time soon; it just won't be a monopoly anymore.

Animats · 6 years ago

Global Foundries New York fab (Fab 8), from Wikipedia:

Technology: 28 nm and 14 nm. 7 nm planned. However, in August 2018, GlobalFoundries made the decision to suspend 7 nm development and planned production, citing the unaffordable costs to outfit Fab 8 for 7 nm production. GlobalFoundries held open the possibility of resuming 7 nm operations in the future if additional resources could be secured.

So, not a state of the art fab. Couldn't afford to keep up.

btian · 6 years ago

I can't disagree more. The US semiconductor industry is more vibrant than ever.

Intel reported record quarter every quarter for the last 2-3 years.

Fabless semiconductors are doing better than ever - nVidia, AMD, Apple, Qualcomm, Google TPU etc.

Most of the high performance ARM SoCs come from Apple and Qualcomm, both American companies.

IanCutress · 6 years ago

>Intel reported record quarter every quarter for the last 2-3 years.

Due to high demand from people needing 30%+ more processing power after they lost 30% perf due to Spectre/Meltdown. When they get the manufacturing and supply up to a reasonable standard, they'll start building inventory and start selling cheaper chips again. Margin and ASP will decrease, and investors will sign out.

rasz · 6 years ago

Commodore reported 7-year record revenue/profit 4 years before going bankrupt. https://dfarq.homeip.net/commodore-financial-history-1978-19...

tom_mellior · 6 years ago

> What's left for Intel?

According to https://en.wikipedia.org/wiki/Usage_share_of_operating_syste..., about 80% to 90% of the desktop and laptop computer market (counting the share of Windows + Linux devices in these categories). Intel won't starve.

freeflight · 6 years ago

Which is the result of AMD not having been able to compete for close to a decade.

Expect to see these numbers change drastically over the next years as Zen 2 finally turned the ship around on that by not only making AMD CPUs competitive, but in many cases the straight up better, yet still more affordable, choice.

Which is already reflected in current trends: Barely any consumer-level hardware outlets still recommend Intel builds, which is down to lack of PCIe 4.0 support and only very expensive Intel CPUs being able to outperform AMD CPUs in fringe-use cases like single-core performance in gaming, while still demanding a hefty price-premium.

A premium that many people are simply not willing to pay for anymore.

As a small data point, just look at the top 10 CPUs on price comparison websites, like the German pcgameshardware [0]: 8 of the top 10 CPUs are AMD.

That does not mean Intel will starve, but it very much puts them in the position that AMD has been in these past years: that of the underdog fighting an uphill battle to regain relevancy in the consumer sector.

[0] https://preisvergleich.pcgameshardware.de/?o=4

pankajdoharey · 6 years ago

Intel is a 50-year-old company; do you think they will just sit on their hands? If the majority of the industry shifts towards the ARM ISA, Intel will evolve. What stops Intel from Licensing the ARM Core and build a industry leading ARM Chip? I think no-one with the exception of Apple in the semiconductor industry has more resources than Intel to build a world class ARM CPU. Intel is just trying to drag x86 out as far as possible because it can monopolise the architecture: AMD and VIA are the only other two vendors with a license to build x86 processors.

Animats · 6 years ago

> What stops Intel from Licensing the ARM Core and build a industry leading ARM Chip?

That others can compete directly on price. Intel can probably do it technically, but will not have the margins they had with x86.

harpratap · 6 years ago

> I think no-one with the exception of Apple in the semiconductor industry has more resources than Intel to build a world class ARM CPU

Are you forgetting Amazon's Graviton? And Nuvia seems to be in a good position to dethrone Apple in the best ARM CPU race too.

sitharus · 6 years ago

Intel already license the ARM core IP and used to build their own ARM chips under the XScale brand. They sold the line off in 2007 IIRC but retained the ARM architectural license.

The only thing that stops them is their will to do it.

Lio · 6 years ago

> What's left for Intel?

Well I'd like to see them go all in on Desktop Linux.

But then I'm a dreamer.

mey · 6 years ago

What would that look like? Shifting to becoming a software developer and leading the charge on Desktop Linux? A vertical integrator like Apple, Microsoft Surface, System 76 (yes, those are varying degrees of success)?

I really like the NUC products. They have the knowledge and skills to do everything (cpu, ram, soc, radio/wifi/cell (or used to), storage) in house except industrial design (maybe they do). I have never personally had a software experience from Intel that I remotely enjoyed.

Edit: Actually there is software I have used from Intel that I have really enjoyed, the BIOS for the NUC. So I stand corrected.

ganzuul · 6 years ago

The EUV machines are made in the US.

rasz · 6 years ago

Netherlands is 7000km off your guess.

Jimmy Kimmel Can You Name a Country? strikes again https://www.youtube.com/watch?v=kRh1zXFKC_o

gnufx · 6 years ago

Perhaps it's worth pointing out some context. Given the remarkable predecessor, K Computer, this was only a matter of time. (I heard a great early talk on K, and I wish I knew the speaker's name to give credit; the speaker was obviously working quite hard in English, but was flawless, ending with, basically, "we did it all ourselves, largely de novo".) It seems that given the current circumstances, they haven't kept to schedule -- it was supposed to be operating next year.

There's a lot that is non-mainstream in this, as with K, partly influenced by the K experience. Unusually, it's all apparently specifically designed for the job, from the processor to the operating system (only partly GNU/Linux). Notably, despite the innovation, it should still run anything that can reasonably be built for aarch64 straight off and use the whole node, even if it doesn't run particularly fast; contrast GPU-based systems. (With something like simde, you may even be able to run typical x86-specific code.) However, the amount of memory/core is surprising -- even less than Blue Gene Q -- and I wonder how that works out for large-scale materials science work for which it's obviously prepared. Also note Fujitsu's consideration of reliability, though the oft-quoted theory of failure rates in exascale-ish machines was obviously wrong, otherwise as the Livermore CTO said, he'd be out of a job.

The bad news for anyone potentially operating a similar system in a university, for instance, is that the typical nightmare proprietary software is apparently becoming available for it...

jabl · 6 years ago

> However, the amount of memory/core is surprising

I think it's a limitation of the technology. HBM2 provides amazing bandwidth, but capacity is quite limited. And it's not like DIMM slots where you can just insert more of them, the memory chips are bonded to the substrate chiplet-style.

This is very similar to high-end GPUs, which also use HBM2 memory, e.g. NVIDIA A100 has 40 GB.

guicho271828 · 6 years ago

FYI, https://github.com/fujitsu/A64FX

m_mueller · 6 years ago

Maybe you mean Matsuoka sensei? He has been the director of RIKEN AICS for a couple of years and is a known media figure in Japan.

jabl · 6 years ago

Satoshi Matsuoka is an "international rock star" in HPC circles. But I don't think he was involved with the K computer; before RIKEN he was IIRC at Tokyo Tech doing their "Tsubame" GPU clusters.

ViralBShah · 6 years ago

Given that today's HPC architectures are mostly power constrained, and a majority of the FLOPS often come from GPUs (for their flop/watt ratios), this direction is not surprising.

ARM has been making major strides in the high performance area. The new AWS Graviton processors are pretty nice from what I have heard. And then there's the ARM in Mac. Yup, and Julia will run on all of these!

While I say all of this, I should also point out that the top500 benchmark pretty much is not representative of most real-life workloads, and is largely based on your ability to solve the largest dense linear solve you possibly can - something almost no real application does.

(The website is down, so I haven't been able to look at the specs of the actual machine).
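
To make the "dense linear solve" point concrete, here is a minimal sketch of the kind of problem HPL times: solve a dense Ax=b by Gaussian elimination with partial pivoting. This is not HPL's code (the real benchmark is a blocked, distributed LU factorization); the function names are hypothetical, and the operation count is the usual textbook leading-order estimate, not necessarily the exact formula the benchmark reports with.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def dense_solve_flops(n):
    """Textbook estimate: about 2/3 n^3 for the factorization plus 2 n^2 for the solves."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


# 2x + y = 3 and x + 3y = 5
x = solve_dense([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print(x)
print(dense_solve_flops(1000))
```

The work grows like n^3 while the data only grows like n^2, which is why a big enough dense problem keeps the floating point units busy in a way that most real applications do not.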

fhqghds · 6 years ago

Get ready for a surprise then: all those FLOPS are coming from the ARM cores.... This beast has no GPUs:

https://postk-web.r-ccs.riken.jp/spec.html

Merrill · 6 years ago

It looks like this is not an ARM core, but a Fujitsu implementation of the Armv8-A instruction set and the Fujitsu-developed Scalable Vector Extension. Most likely the latter is doing all the heavy lifting.

https://www.fujitsu.com/global/about/resources/news/press-re...

>A64FX is the world's first CPU to adopt the Scalable Vector Extension (SVE), an extension of Armv8-A instruction set architecture for supercomputers. Building on over 60 years' worth of Fujitsu-developed microarchitecture, this chip offers peak performance of over 2.7 TFLOPS, demonstrating superior HPC and AI performance.
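
A back-of-the-envelope check of that "over 2.7 TFLOPS" figure, as a sketch. The 48 cores and the 512-bit vector width are mentioned in this thread; the two FMA pipelines per core and the 1.8 GHz clock are my assumptions and are not stated anywhere here, so swap in the real values if you have them.

```python
cores = 48                # from the TOP500 quote at the top of the thread
simd_bits = 512           # vector width mentioned elsewhere in the thread
fma_pipes_per_core = 2    # ASSUMPTION: not stated in the thread
clock_ghz = 1.8           # ASSUMPTION: not stated in the thread

lanes_fp64 = simd_bits // 64  # double precision elements per vector
# A fused multiply-add counts as 2 floating point operations per lane.
flops_per_cycle_per_core = fma_pipes_per_core * lanes_fp64 * 2
peak_tflops = cores * flops_per_cycle_per_core * clock_ghz / 1000.0

print(f"{peak_tflops:.2f} TFLOPS peak (double precision)")
```

Under those assumptions the number lands just above 2.7 TFLOPS, which at least matches the press release in order of magnitude and then some.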

leeter · 6 years ago

So looking at AnandTech's breakdown, the CPUs are closer to a Knights Landing 'CPU/GPU' than a traditional CPU (currently). They also have a ton of HBM2 right next to the dies, so this should be insanely fast, as they can feed those cores very very quickly regardless of how fast each core is by clock and pipeline. That should massively reduce stalls.

ViralBShah · 6 years ago

That's pretty cool! That probably means that applications will have an easier time. Looks like it has 512-bit SIMD.

I wonder what BLAS they are using, and if the contributions are open sourced.

d_tr · 6 years ago

I am really happy to have come across this post, mainly due to this fact.

stephencanon · 6 years ago

Worth noting that Fugaku has no GPU/accelerator; all the compute is located in-core (cpu). The core itself has some GPU-like qualities, of course, since it's more optimized for semi-uniform compute throughput than a "normal" CPU is.

gpderetta · 6 years ago

Fujitsu has been building its own HPC CPUs for a long time; whether they use the ARM architecture or SPARC probably doesn't matter much to them. They know how to make them fast.

bashinator · 6 years ago

Yup, one of my first jobs out of college was at HAL.

calaphos · 6 years ago

> While I say all of this, I should also point out that the top500 benchmark pretty much is not representative of most real-life workloads, and is largely based on your ability to solve the largest dense linear solve you possibly can - something almost no real application does.

They also publish the HPCG benchmark with sparse matrices. Unsurprisingly, the flops are an order of magnitude lower across the board. The Fujitsu chip scales a whole lot better than the usual Nvidia GPUs though.

Symmetry · 6 years ago

I'll count myself as someone surprised, given that GPUs are often better tuned to HPC code, that Fujitsu was able to do so well with an Intel Phi approach of just using larger vector units on general purpose CPUs. I wouldn't have thought you could make an out of order core efficiently support scatter/gather the way this thing seems to, though I guess it's possible that the vector unit is in order. Well, the proof is in the pudding and hats off to Fujitsu and ARM.

wenc · 6 years ago

> based on your ability to solve the largest dense linear solve you possibly can - something almost no real application does.

Sounds right.

I was going to say what about large-scale optimization problems? But I realized that most typically only require sparse linear solves.

Second-order methods such as Newton's method do require the solution of dense Ax=b systems. But the most visible/popular application of large-scale optimization today, neural networks, typically uses SGD, which requires no dense linear solves at all.
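
To make the SGD point concrete: each step is just a gradient evaluation on one sample followed by a scaled subtraction, so no Ax=b system is ever assembled or solved. A minimal sketch on a made-up least-squares toy problem (the function name, data and learning rate are all hypothetical, and samples are visited in a fixed order rather than shuffled so the run is reproducible):

```python
def sgd_least_squares(xs, ys, lr=0.05, epochs=500):
    """Fit y = w*x + b with per-sample gradient steps; no linear solve anywhere."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y   # residual on this one sample
            w -= lr * err * x       # gradient of 0.5*err^2 with respect to w
            b -= lr * err           # gradient of 0.5*err^2 with respect to b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
w, b = sgd_least_squares(xs, ys)
print(w, b)
```

The inner loop is only multiplies and adds on the current sample, which is why this kind of workload says little about dense solve performance.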

tasogare · 6 years ago

> And then there's the ARM in Mac.

Are you posting from the future or referring to the T2 chips?

kohtatsu · 6 years ago

It was officially announced at the end of the WWDC event today.

dman · 6 years ago

Really wish Fujitsu sold a developer kit with an A64fx chip - it's the only shipping ARM chip with SVE that I know of and I would love to get my hands on one to play with.

vt240 · 6 years ago

There are some architecture manuals on github to peruse.

https://github.com/fujitsu/A64FX

loudmax · 6 years ago

No kidding!

I don't have any sense of how much these cost to manufacture. There ought to be a market for an A64fx based rackmount server system. If the price isn't outrageous, I'd love to see these sold as an SBC.

timthorn · 6 years ago

Something like the Fujitsu PRIMEHPC FX700?

gpderetta · 6 years ago

> it's the only shipping ARM chip with SVE

IIRC the extension was specifically designed by Fujitsu for their needs, so it is not surprising.

throwaway5792 · 6 years ago

Not quite, although they were deeply involved for much of it.

gnufx · 6 years ago

I'm pretty sure there's an emulator, which is how you usually do early development.

gnufx · 6 years ago

See https://github.com/RIKEN-RCCS/riken_simulator

calaphos · 6 years ago

Even more impressive than the Linpack result (2.8x faster than the runner-up) is the HPCG result, at 4.6x the result of Summit in second place.

That benchmark is built around sparse matrices, which are a much more realistic depiction of HPC workloads. Fugaku seems to scale a lot better with irregular access patterns than the Nvidia GPUs in the other systems.
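
For anyone wondering what "irregular access patterns" means here, a minimal sketch of a sparse matrix-vector product in CSR (compressed sparse row) format. HPCG's actual kernels are more involved (it is a preconditioned conjugate gradient solver); this only illustrates the access pattern, and the function name and the tiny matrix are made up.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored non-zeros of this row are visited.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through an index array: an indirect (gather) access.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)
```

Each non-zero contributes one multiply-add but needs a value, an index and a gathered element of x, so memory bandwidth and latency dominate rather than raw flops.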

ksec · 6 years ago

Well the link is dead. But I am guessing it is from Fujitsu A64FX, a 512 bit SIMD extension for ARM.

Edit: Turns out I was right. Maybe this link is better.

https://www.anandtech.com/show/15869/new-1-supercomputer-fuj...

floatboth · 6 years ago

To be clear, the extension -- Scalable Vector Extension -- is for any width between 128 bits and 2048 bits, in multiples of 128. (It's in the name!) The implementation in Fujitsu A64FX seems to be 512 bit specifically.
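
A rough model of what "scalable" buys you, as a sketch. Real SVE code would be written in C with ACLE intrinsics or in assembly; this is plain Python that only mimics the idea of a vector-length-agnostic loop driven by a predicate (modelled loosely on SVE's WHILELT instruction). The function names are hypothetical.

```python
def whilelt(i, n, lanes):
    """Predicate: lane j is active while i + j < n."""
    return [i + j < n for j in range(lanes)]


def vla_add(a, b, vector_bits):
    """Add two lists of 64-bit values without hard-coding the vector width."""
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 in [128, 2048]")
    lanes = vector_bits // 64   # 64-bit elements per vector
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        pred = whilelt(i, n, lanes)
        # Inactive lanes are skipped, so no separate scalar tail loop is needed.
        for j, active in enumerate(pred):
            if active:
                out[i + j] = a[i + j] + b[i + j]
        i += lanes
    return out


a = [float(v) for v in range(11)]
b = [10.0] * 11
results = {bits: vla_add(a, b, bits) for bits in (128, 512, 2048)}
print(results[512])
```

The same loop gives the same answer at 128, 512 or 2048 bits; only the number of iterations changes, which is the point of writing code against the extension rather than against one implementation's width.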

Symmetry · 6 years ago

For those interested in more details they did a presentation at Hot Chips. The slides are here:

https://www.hotchips.org/hc30/2conf/2.13_Fujitsu_HC30.Fujits...