jedbrown · 10 years ago
Does anyone have numbers on memory bandwidth and latency?

The x1 cost per GB is about 2/3 that of r3 instances, but if you provision the same amount of memory via r3 instances you get 4x as many memory channels, so the cost per memory channel is more than twice as high for x1 as for r3. DRAM is valuable precisely because of its speed, but that speed is not cost-effective with the x1. As such, the x1 is really for applications that can't scale with distributed memory. (Nothing new here, but this point is often overlooked.)

Similarly, you get a lot more SSDs with several r3 instances, so the aggregate disk bandwidth is also more cost-effective with r3.

sun_n_surf · 10 years ago
Not sure I quite understand your math here. The largest R3 instance is the r3.8xlarge with 244 GB of memory; 4 times that would only get you to about 1 TB. Also, this: "DRAM is valuable precisely because of its speed" is wrong (https://en.wikipedia.org/wiki/Dynamic_random-access_memory).
jedbrown · 10 years ago
1. 4 of those R3 instances cost less than the X1 but offer nearly double the bandwidth. The X1 is cheaper per GB, but much more expensive per GB/s.

2. If DRAM were not faster than NVRAM/SSD, nobody would use it. "Speed" involves both bandwidth and latency. Latency is probably similar or higher for the X1 instances, but I haven't seen numbers. We can make better estimates about realizable bandwidth based on the system stats.
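
If anyone spins one of these up, even a crude triad loop gives a ballpark for realizable bandwidth. A rough sketch (not the official STREAM benchmark; the array size is an arbitrary placeholder, compile with -O2):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1UL << 27)   /* 128M doubles per array, 1 GiB each */

    int main(void) {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        if (!a || !b || !c) return 1;
        /* touch everything first so page faults don't land in the timed loop */
        for (size_t i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];       /* triad: two reads, one write */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        /* three arrays of N doubles cross the memory bus once; printing a[]
         * keeps the compiler from discarding the stores */
        printf("~%.1f GB/s (check: %g)\n",
               3.0 * N * sizeof(double) / secs / 1e9, a[N / 2]);
        return 0;
    }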

lovelearning · 10 years ago
This is probably a dumb question, but what does the hardware of such a massive machine look like? Is it just a single server box with a single motherboard? Are there server motherboards out there that support 2 TB of RAM, or is this some kind of distributed RAM?
zokier · 10 years ago
For example, Dell sells 4U servers straight out of their webshop which max out at 96x32GB (that's 3TB) of RAM with 4 CPUs (max 18 cores/CPU => 72 cores total). They seem to have some (training?) videos on YouTube that show the internals if you are curious:

https://www.youtube.com/watch?v=vS47RVrfBvE main system board

https://www.youtube.com/watch?v=_poMPOUGRa0 memory risers

schlarpc · 10 years ago
Don't know what hardware AWS is using, but Ark has server boards supporting 1.5TB, which is close enough to make 2TB believable: http://ark.intel.com/products/94187/Intel-Server-Board-S2600...

Edit: Supermicro has several 2TB boards, and even some 3TB ones: http://www.supermicro.com/products/motherboard/Xeon1333/#201...

(Disclaimer: AWS employee, no relation to EC2)

yuhong · 10 years ago
This would require expensive 64GB DDR4 LR-DIMMs though.
technologia · 10 years ago
We have some supermicros that have about 12TB RAM, but the built in fans sound like a jumbo jet taking off so consider the noise pollution for a second there.
jsmthrowaway · 10 years ago
Er, are you summing a TwinBlade chassis? You have to be.

6TB is about where single machines currently top out due to the hardware constraints of multiple vendors and architecture, and memory bandwidth starts being an issue. You have to throw 96x64GB at the ones that exist so wave buh bye to a cool half a million USD or so. If you're sitting on a 12TB box I want a SKU (I want one!).

I don't actually think Supermicro makes a 6TB SKU, even. That's Dell and HP land.

cbg0 · 10 years ago
> Are there server motherboards out there that support 2 TB of RAM

Sure, http://www.supermicro.com/products/motherboard/Xeon/C600/X10... supports 3TB in a 48 x 64GB DIMM configuration.

ereyes01 · 10 years ago
Once upon a time I hacked on the AIX kernel which ran on POWER hardware (I think they're up to POWER8 or higher now). In my time there the latest hardware was POWER7-based. It maxed out at 48 cores (with 4-way hyperthreading giving you 192 logical cores) and a max of I think 32TB RAM. Not the same hardware as mentioned in the OP, but pretty big scale nonetheless.

This shows a logical diagram of how they cobble all these cores together: http://www.redbooks.ibm.com/abstracts/tips0972.html?Open

I've seen these both opened up and racked up. They are basically split into max 4 rackmount systems, each I think was 2U IIRC. The 4 systems (max configuration) are connected together by a big fat cable, which is the interconnect between nodes in the Redbook I've linked above. The RAM was split 4 ways among the nodes, and NUMA really matters in these systems, since memory local to your nodes is much faster to access than memory across the interconnect.

This is what I observed about 5-6 years ago. I'm sure things have miniaturized further since then...
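
If you're curious how much that local-vs-remote penalty matters in practice, a rough libnuma sketch like this (Linux, link with -lnuma; the node ids, buffer size, and stride are placeholders) makes it visible on any multi-socket box:

    #include <numa.h>
    #include <stdio.h>
    #include <time.h>

    #define N (256UL << 20)   /* 256 MB per buffer */

    static double touch(volatile char *p) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i += 64)   /* one access per cache line */
            p[i]++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    }

    int main(void) {
        if (numa_available() < 0) { puts("no NUMA support"); return 1; }

        numa_run_on_node(0);                                   /* pin to node 0 */
        char *local  = numa_alloc_onnode(N, 0);                /* same node */
        char *remote = numa_alloc_onnode(N, numa_max_node());  /* far node */
        if (!local || !remote) return 1;

        printf("local:  %.3fs\n", touch(local));
        printf("remote: %.3fs\n", touch(remote));
        numa_free(local, N);
        numa_free(remote, N);
        return 0;
    }

On a single-node box the two timings simply come out the same.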

dekhn · 10 years ago
yeah, sure, you can get a quad xeon 2U server with 2TB of RAM for around $40K. Here's a sample configurator: https://www.swt.com/rq2u.php change the RAM and CPUs to your preference and add some flash.
rconti · 10 years ago
No insight into what Amazon uses, but we've got HP DL980s (G7s, so they're OLD) with 4TB of RAM, and just started using Oracle X5-8 x86 boxes with 6TB of RAM and 8 sockets. I believe 144 cores/288 threads.
eip · 10 years ago
http://www.thinkmate.com/system/rax-xt24-4460-10g

4 CPUs, 60 cores, 120 threads (cloud cores), 3TB RAM, 90TB SSD, 4 x 40Gb Ethernet, 4 RU. $120K.

Same price as the AWS instance for one year of on demand.

rodgerd · 10 years ago
I can stick 1.5 TB and two sockets in blades right now. Blades. Servers can carry a lot more, and it's not even especially expensive.
lovelearning · 10 years ago
Yeah, just realized my knowledge of server hardware is hopelessly outdated. They seem to be a couple of orders of magnitude more powerful than what I assumed was available.
zymhan · 10 years ago
4 physical CPUs and 1.9TB of RAM is doable in a 4U server for sure, and possibly in a 2U. So, it just looks like a big server.
lossolo · 10 years ago
Intel processors support up to 1536 GB of RAM, so basically 1.5 TB per processor.
wyldfire · 10 years ago
How flipping awesome is it that some very large portion (90% or so?) could probably all be one nice contiguous block of mine from x86_64 userspace with a quick mmap() and mlockall().
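
Roughly this, as a sketch (the ~1.5 TiB figure is a placeholder for an X1-sized box; you'd also need the RLIMIT_MEMLOCK / CAP_IPC_LOCK headroom and overcommit settings to let it succeed):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 1536UL << 30;   /* ~1.5 TiB, adjust to taste */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* fault in and pin everything currently (and subsequently) mapped */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { perror("mlockall"); return 1; }

        printf("mapped and locked %zu bytes at %p\n", len, p);
        return 0;
    }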
rzzzt · 10 years ago
I think I have picked this up from an earlier thread discussing huge servers: http://yourdatafitsinram.com/

One of the links on the top points to a server with 96 DIMM slots, supporting up to 6 TB of memory in total.

mbesto · 10 years ago
IDK about AWS, but for SAP HANA, this is done via blades. I've seen 10 TB+.
KSS42 · 10 years ago
My guess is that it is not really DRAM but flash memory on a DIMM, like this product from Diablo Technologies:

http://www.diablo-technologies.com/memory1/

fra · 10 years ago
Your guess is wrong. It's DRAM plain and simple.
MasterScrat · 10 years ago
As a reference the archive of all Reddit comments from October 2007 to May 2015 is around 1 terabyte uncompressed.

You could do exhaustive analysis on that dataset fully in memory.

jedberg · 10 years ago
Your point is accurate, but I'd like to point out that the dataset isn't actually all the comments on Reddit -- it's only what they could scrape, which is limited to 1000 comments per account. So basically it's missing a lot of the historical comments of the oldest accounts.

I only point this out to try and correct a common error I see. You're absolutely right that it is awesome that the entire data set can be analyzed in RAM!

MasterScrat · 10 years ago
Are you sure? The dataset is from here: https://archive.org/details/2015_reddit_comments_corpus

Looking at the thread from the release, I see no explanation of how he got the data, but I see several people commenting that they finally have a way to get comments beyond the 1000 per account: https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_eve...

samstave · 10 years ago
It would be interesting to see the distribution of the 1000 comments from each account over a period of 12 months. Some people go dormant - like vacation, or depression, or lack of interest in topics - then cluster a bunch of comments when, say, they are on a drunken rage binge.

Or what time of day the accounts most frequently comment. (I'd bet there is an interesting grouping of those that post while at work during the day, and those who post from home at night.)

or what subreddits people comment in most during the day vs which /r/ they post to at night ;)

ers35 · 10 years ago
You may be interested in an SQLite version of the dataset that is 553 GB vs. the 908 GB JSON: https://archive.org/details/2015_reddit_comments_corpus_sqli...

The storage format of a dataset can make a big difference in memory usage.

flamedoge · 10 years ago
I would like to know how much of that is memes and shitposts
ChuckMcM · 10 years ago
That is pretty remarkable. One of the limitations of doing one's own version of mass analytics is the cost of acquiring, installing, configuring, and then maintaining the hardware. Generally I've found AWS to be more expensive but you get to "turn it on, turn it off" which is not something you can do when you have to pay monthly for data center space.

It makes for an interesting exercise to load in your data, do your analytics, and then store out the meta data. I wonder if the oil and gas people are looking at this for pre-processing their seismic data dumps.

ddorian43 · 10 years ago
Why does everyone (really!) compare AWS to colocation? I've never heard an aws-believer ever mention dedicated servers.

Why don't you compare AWS to building your own CPU?

ChuckMcM · 10 years ago
I suspect it is because "everyone" (which is to abscond with your definition) believes that colocation is an alternative to AWS (well, the EC2 part anyway). I would be interested to hear how you see them as not being comparable.

On your definition of "aws-believer": is that someone who feels that AWS is a superior solution for deploying a web-facing application in all cases? Does your definition include economics? (Like $/month vs. requests/month vs. latency?)

Can I assume that you consider comparing AWS to building your own CPU as an apples to oranges comparison? I certainly do, because I define a CPU to be a small component part of a distributed system hosting a web facing application.

samstave · 10 years ago
Just curious - but wouldn't the GPU-based instances be more efficient for the oil and gas people?

Or load a data set in this monster and then use GPU workers to hit it?

koolba · 10 years ago
GPUs work when the data is small and the calculation can be parallelized. Random access to memory from a GPU would be slow. It's more like a separate computer (or lots of separate computers) to which you can send a small program to execute and get the result back.
biot · 10 years ago
More food for thought: how many neurons + synapses can one model with that amount of RAM?
dastbe · 10 years ago
Are you using seismic to describe what the data is, how big the data is, or both?
ChuckMcM · 10 years ago
Here is a good link: http://www.seismicsurvey.com.au/

Basically you take waves that are transiting the area of interest and do transforms on them to ascertain the structure underground. Dave Hitz of NetApp used to joke that these guys have a great compression algorithm: they can convert a terabyte of data into 1 bit (oil/no-oil).

One of the challenges is that the algorithms are running in a volume of space, so 'nearest neighbor' in terms of samples has more than 8 vectors.

In the early 2000's they would stream their raw data off tape cartridges into a beowulf type cluster, process it, and then store the post processed (and smaller) data to storage arrays. Then that post processed data would go through its own round of processing. One of their challenges was that they ended up duplicating the data on multiple nodes because they needed it for their algorithm and it was too slow to fetch it across the network.

A single system image with a TB of memory would let them go back to some of their old mainframe algorithms which, I'm told, were much easier to maintain.

1024core · 10 years ago
Spot instances are about $13 - $19/hr, depending on zone. Not available in NorCal, Seoul, Sydney and a couple of other places.
snewman · 10 years ago
Do you mean on-demand instances? The announcement says "Spot bidding is on the near-term roadmap." And $13 / hour is the on-demand price in US East.
dharma1 · 10 years ago
Indeed, doesn't look like it's there yet. Based on that I guess the spot prices will be around $1-3/h - not bad, if you have a workload that can be interrupted.
dman · 10 years ago
Going to comment out the deallocation bits in all my code now.
PeCaN · 10 years ago
You jest, but sometimes that's exactly what you need for short-lived programs¹. Bump alloc and free on exit is super fast if your space complexity is bounded.

¹ http://www.drdobbs.com/cpp/increasing-compiler-speed-by-over...
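
A toy version of the idea, just to make it concrete (the slab size and 16-byte alignment are arbitrary):

    #include <stddef.h>
    #include <stdlib.h>

    static unsigned char *arena;
    static size_t arena_off, arena_cap;

    void arena_init(size_t cap) {
        arena = malloc(cap);
        arena_cap = arena ? cap : 0;
        arena_off = 0;
    }

    void *bump_alloc(size_t n) {
        size_t aligned = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
        if (arena_off + aligned > arena_cap) return NULL;
        void *p = arena + arena_off;
        arena_off += aligned;
        return p;
    }

    int main(void) {
        arena_init((size_t)1 << 30);               /* one 1 GiB slab up front */
        int *xs = bump_alloc(1000 * sizeof *xs);
        if (!xs) return 1;
        for (int i = 0; i < 1000; i++) xs[i] = i;
        /* no free(): the whole arena goes away when the process exits */
        return 0;
    }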

sedachv · 10 years ago
JonL White actually wrote a serious paper about just this idea in 1980: http://dl.acm.org/citation.cfm?id=802797
tracker1 · 10 years ago
Memory leaks be damned... Seriously, that is just huge.
kylehotchkiss · 10 years ago
Add some bitcoin mining with the power you still have afterwards
pritambarhate · 10 years ago
Question for those who have used monster servers before:

Can PostgreSQL/MySQL use this type of hardware efficiently and scale up vertically? Also, can Memcached/Redis use all this RAM effectively?

I am genuinely interested in knowing this. Most of the time I work on small apps and don't have access to anything more than 16GB of RAM on a regular basis.

chucky_z · 10 years ago
Postgres scales great up to 256GB, at least with 9.4. After that it'll use the memory, but there's no real benefit. I don't know about MySQL. SQL Server scales linearly with memory even up to and past the 1TB point. I did encounter some NUMA node spanning speed issues, but numactl tuning fixed that.

I set up a handful of pgsql and Windows servers around this size. SQL Server at the time scaled better with memory. Pgsql never really got faster after a certain point, but with a lot of cores it handled tons of connections gracefully.

anarazel · 10 years ago
I've very successfully used shared_buffers of 2TB, without a lot of problems. You'd better enable huge pages, but that's a common optimization.
alfalfasprout · 10 years ago
I don't work on 2TB+ memory servers, but one of my servers is close to 1TB of RAM.

PostgreSQL scales nicely here. Main thing you're getting is a huge disk cache. Makes repeated queries nice and fast. Still I/O bound to some extent though.

Redis will scale nicely as well. But it won't be I/O bound.

Honestly, if you really need 1TB+ it's usually going to be for numerically intensive code. This kind of code is generally written to be highly vectorizable so the hardware prefetcher will usually mask memory access latency and you get massive speedups by having your entire dataset in memory. Algorithms that can memoize heavily also benefit greatly.
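
To make the prefetcher point concrete, here's a rough sketch contrasting a streaming pass with a dependent random walk over the same array (the size is an arbitrary placeholder); on most boxes the gap is one to two orders of magnitude:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1UL << 26)   /* 64M entries of size_t, ~512 MB */

    static double secs(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9;
    }

    int main(void) {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;

        /* Sattolo's algorithm: a single-cycle random permutation to chase */
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;

        size_t sum = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++) sum += next[i];   /* streaming reads */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("sequential:  %.2fs\n", secs(t0, t1));

        size_t cur = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++) cur = next[cur];  /* dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("random walk: %.2fs (ignore: %zu)\n", secs(t0, t1), sum + cur);

        free(next);
        return 0;
    }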

adwf · 10 years ago
I've used Postgres out to the terabyte+ range with no problems, so it all works fine. Of course, whenever you approach huge data sizes like this, it tends to change how you access the data a little. E.g. do more threads equal more user connections, or more parallel computation? Generally though, databases aren't really hindered by CPU so much as by the amount of memory in the machine, and this new instance is huge.

No idea about MySQL, people tend to scale that out rather than up.

jfindley · 10 years ago
For MySQL, it depends a bit what you're hoping to get out of scaling.

Scaling for performance reasons: Past a certain point, many workloads become difficult to scale due to limitations in the database process scheduler and various internals such as auto increment implementation and locking strategy. As you scale up, it's common to spend increasing percentages of your time sitting on a spinlock, with the result that diminishing returns start to kick in pretty hard.

Scaling for dataset size reasons: Still a bit complex, but generally more successful. For example, to avoid various nasty effects from having to handle IO operations on very large files, you need to start splitting your tables out into multiple files, and the sharding key for that can be hard to get right. But MySQL

In short, it's not impossible, but you need to be very careful with your schema and query design. In practice, this rarely happens because it's usually cheaper (in terms of engineering effort) to scale out rather than up.

vegancap · 10 years ago
Finally, an instance made for Java!
granos · 10 years ago
I dislike developing in Java. I am not a fanboy by any stretch of the imagination. That being said, someone who takes the time to understand how the JVM works and how to configure their processes with a proper operator's mindset can do amazing things in terms of resource usage.

It's easy to poke at Java for being a hog when in reality it's just poor coding and operating practices that lead to bloated runtime behavior.

placeybordeaux · 10 years ago
For a long time I wondered if it was a failing of the language or the culture.

After spending 4 days trying to diagnose a problem with HBase given the two errors "No region found" and "No table provided", and finally figuring out it was due to a version mismatch, I now believe it is the culture.

At the very least you should be printing a WARN when you connect to an incompatible version.

Kristine1975 · 10 years ago
So much this. Back in 2001 I used IntelliJ IDEA on a PC with 128MB of RAM. It worked perfectly, and it was the first IDE I used that checked my code while I was writing it. The much less evolved JBuilder on the other hand stopped every couple seconds for garbage collection.

Both were written in Java.

And don't get me started on Forte (developed by Sun itself, no less). It was even slower and more memory-hungry than JBuilder.

abraae · 10 years ago
I love Java. We shifted from C++ a year after it arrived on the scene. Since then, I've never needed to learn a new language in any depth. To me, that's a good thing and shows the longevity of the language.
yongjik · 10 years ago
> ...can do amazing things in terms of resource usage.

Sorry, but you just made my day. :P

sievebrain · 10 years ago
You jest, but think about how unbelievably painful it'd be to write a program that uses >1TB of RAM in C++ .... any bug that causes a segfault, div by zero, or really any kind of crash at all would mean you'd have to reload the entire dataset into RAM from scratch. That's gonna take a while no matter what.

You could work around it by using shared memory regions and the like but then you're doing a lot of extra work.

With a managed language and a bit of care around exception handling, you can write code that's pretty much invincible without much effort because you can't corrupt things arbitrarily.

Also, depending on the dataset in question you might find that things shrink. The latest HotSpots can deduplicate strings in memory as they garbage collect. If your dataset has a lot of repeated strings then you effectively get an interning scheme for free. I don't know if G1 can really work well with over 1TB of heap, though. I've only ever heard of it going up to a few hundred gigabytes.

Kristine1975 · 10 years ago
>With a managed language and a bit of care around exception handling, you can write code that's pretty much invincible without much effort because you can't corrupt things arbitrarily.

The JVM has crashed on me in the past (as in a hard crash, not a Java exception). Less often than the C++ programs I write do? Yes, but of course I wouldn't test a program on a 1TB dataset before ironing out all the kinks.

>The latest HotSpots can deduplicate strings in memory as they garbage collect

Obviously when working with huge datasets I would implement some kind of string deduplication myself. Most likely even a special string class and a memory allocation scheme optimized for write-once, read-many access and cache friendliness.

Or I would use memory mapping for the input file and let the OS's virtual memory management sort it out.

0xfaded · 10 years ago
mmap is not "a lot of extra work".
tosseraccount · 10 years ago
Use shared memory.
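
Roughly like this, as a sketch with POSIX shm (the region name and size are made up): a crashed worker just re-mmaps the object instead of reloading from disk.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t len = (size_t)1 << 30;   /* 1 GiB for the demo */

        int fd = shm_open("/dataset", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        if (ftruncate(fd, (off_t)len) != 0) { perror("ftruncate"); return 1; }

        char *data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* parse/load once; the pages survive this process dying, and the
         * object only disappears on shm_unlink("/dataset") or reboot */
        data[0] = 42;
        printf("first byte: %d\n", data[0]);
        return 0;
    }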
scaleout1 · 10 years ago
When you suddenly realize that your "big" data is not really that big! Who needs a Hadoop/Spark cluster when you can run one of these bad boys?
tracker1 · 10 years ago
That was kind of my thought as well... I worked on a small-to-mid-sized classifieds site (about 10-12 unique visitors a month on average) and even then the core dataset was about 8-10GB, with some log-like data hitting around 4-5GB/month. This is freakishly huge. I don't know enough about different platforms to even digest how well you can utilize that much memory. Though it would be a first to genuinely have way more hardware than you'll likely ever need for something.

IIRC, the images for the site were closer to 7-8TB, but I don't know how typical that is for other types of sites, and caching every image on the site in memory is pretty impractical... just the same... damn.

samstave · 10 years ago
Heh, but I wonder what the default per account limits are on launching these... prolly (1) per account.
saosebastiao · 10 years ago
All I can think about is the 30 minute garbage collection pauses.
stcredzero · 10 years ago
Actually, as far as VMs go, the JVM is fairly spare in comparison with earlier versions of Ruby and Python -- on a per-object basis. (Because of its Smalltalk roots. Yes, I had to get that in there. Drink!) That said, I've seen those horrors of cargo-cult imitation of the Gang of Four patterns, resulting in my having to instantiate 7 freaking objects to send one JMS message.

If practice in recent decades has taught us anything, it's that performance is found in intelligently using the cache. In a multi-core concurrent world, our tools should be biased towards pass by value, allocation on the stack/avoiding allocating on the heap, and avoiding chasing pointers and branching just to facilitate code organization.

EDIT: Or, as placeybordeaux puts it more succinctly in a nephew comment, "VM or culture? It's the culture."

EDIT: It just occurred to me -- Programming suffers from a worship of Context-Free "Clever"!

Whether or not a particular pattern or decision is smart is highly dependent on context. (In the general sense, not the function-call one.) The difficulty with programming is that often the context is very involved and hard to convey in media. As a result, a whole lot of arguments are made for or against patterns/paradigms/languages using largely context-free examples.

This is why we end up in so many meaningless arguments akin to, "What is the ultimate bladed weapon?" That's simply a meaningless question, because the effectiveness of such items is very highly dependent on context. (Look up Matt Easton on YouTube.)

The analogy works in terms of the degree of fanboi nonsense.

aaronkrolik · 10 years ago
A small word of caution: I'd strongly recommend against using a huge Java heap size. Java GC is stop-the-world, and a huge heap can lead to hour-long GC sessions. It's much better to store data in a memory-mapped file that is off-heap, and access it accordingly. Still very fast.
Xorlev · 10 years ago
Good advice. Even with G1GC it's hard to run heaps that large. However, not to be overly pedantic, Java GC has many different algorithms and many avoid STW collection for as long as possible and do concurrent collection until it's no longer possible. I don't think it's fair to just call it stop the world.
tracker1 · 10 years ago
I know that you are probably going to be modded into oblivion, but can Java address this much memory in a single application? I'm genuinely curious, as I would assume, depending on the OS, that you'd have to run several (many) processes in order to even address that much RAM effectively.

Still really cool to see something like this, I didn't even know you could get close to 2TB of ram in a single server at any kind of scale.

fulafel · 10 years ago
Bigger iron has been at 64-512 TB for a while:

http://www.cray.com/blog/the-power-of-512-terabytes-of-share...

http://www.enterprisetech.com/2014/10/06/ibm-takes-big-workl...

Or significantly higher if you don't restrict yourself to single-system-image, shared-memory machines - there are at least two systems with 1300-1500 TB of memory on the Top500 list.

wmfiv · 10 years ago
Not using the out-of-the-box solutions. But while I haven't done this personally, my understanding is Azul Zing will allow you to efficiently use multi-TB heaps in Java.
astral303 · 10 years ago
Java can address heaps up to about 32GB with the -XX:+UseCompressedOops flag enabled. With that flag off, you can address as much as 64 bits will allow. http://stackoverflow.com/questions/2093679/max-memory-for-64...

Do a little research before implying that there's no way that Java can address gigantic heaps.

0xmohit · 10 years ago
and Scala too.

Scala _beats_ Java in most of the benchmarks: http://benchmarksgame.alioth.debian.org/u64q/scala.html

igouy · 10 years ago
> _beats_ Java

Not according to that data!