This wasn't mentioned in the article, but CXL stands for Compute Express Link, an open-standard CPU-to-device interconnect; I believe they're on v3.0 now. First I had heard of it as well. Bandwidth ranges from 3.9GB/s on the low end up to 121.0GB/s with the latest x16 3.0 spec over a serial connection (x16 at PCIe 6.0's 64 GT/s signaling is 128GB/s raw per direction, so roughly 121GB/s after protocol overhead).
I'm surprised it isn't better known on HN. From what I've read over the years, this is a standard developed to enable flexible server configurations in the data center. As far as I understand it, the universal interconnect allows adding and removing devices without special allowances for each device type.
It might work for laptops, but it is not the main goal and it might be only Framework who will do something in that direction.
These days the bulk of us doing dev work don't get to play in the actual DC; it's all so abstracted away. Frankly, I'm looking to get a used rack or two at home here, just so I can play with the HW a bit.
(I do embedded and not server stuff mostly these days anyways, but generally interested in systems stuff, so)
CXL is useful for many more things beyond just memory expansion (which is a "type 3 device"; type 1 is a caching device like a SmartNIC, and type 2 is an accelerator with its own attached memory).
The purpose of CXL is to provide memory coherency between hosts and CXL devices. To quote the spec on type 2 devices:
> CXL Type 2 devices, in addition to fully coherent cache, also have memory, for example DDR, High-Bandwidth Memory (HBM), etc., attached to the device. These devices execute against memory, but their performance comes from having massive bandwidth between the accelerator and device-attached memory. The main goal for CXL is to provide a means for the Host to push operands into device-attached memory and for the Host to pull results out of device-attached memory such that it does not add software and hardware cost that offsets the benefit of the accelerator.
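To make that concrete, here's a minimal sketch (mine, not from the spec) of what the push/pull flow can look like from the host side, assuming the device-attached memory is exposed through Linux device-DAX. The /dev/dax0.0 path and the buffer layout are placeholder assumptions; the real details depend on your kernel and device driver:

    /* Host-side "push operands / pull results" sketch over a
     * device-DAX mapping of the accelerator's attached memory. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_SIZE (2UL * 1024 * 1024) /* one 2MiB dax alignment */

    int main(void) {
        int fd = open("/dev/dax0.0", O_RDWR); /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        /* Load/store access: no read()/write(), no page cache. Since the
         * link is cache coherent, plain stores land in device memory
         * without explicit DMA setup. */
        uint8_t *mem = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        /* "Push operands": write input where the accelerator expects it. */
        memcpy(mem, "input operands", 15);

        /* ...device-specific doorbell/kick would go here... */

        /* "Pull results": read the output back with ordinary loads. */
        printf("first result byte: 0x%02x\n", mem[4096]);

        munmap(mem, REGION_SIZE);
        close(fd);
        return 0;
    }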
There are some cool things you can do with CXL, like resurrecting the whole persistent memory idea with low-latency flash, making hardware offload devices more capable since you now get free cache coherency, and a whole bunch of other stuff.
But yes, it's really not for consumer use-cases. The applications I've seen colleagues work on are mostly enterprise stuff like cool RDMA integrations, cache-coherent flash, and more I can't talk about here.
I've got to imagine that at some point in the future, once it's been common on server stuff for a while, it'll end up as part of a next-gen "thunderbolt"-style connection, simply because by then the silicon will be a commodity and it'll be cheaper to use the same thing everywhere. I'm imagining a docking station that doubles the RAM and adds a powerful GPU/NPU/TPU for a workstation. Since the RAM would be close to the accelerator over CXL, they could talk directly and make full-bandwidth use of it, even if the laptop only uses the remote RAM as a "very very very fast swap" or some other lower tier. I'm not so sure it'd be useful for persistent storage at that level, but it'd make for some really cool options for a portable workstation, since you wouldn't need everything inside the device.
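For the "very fast swap" tier, you arguably wouldn't even need new APIs: Linux already surfaces CXL type 3 memory as a CPU-less NUMA node, so placement is just NUMA policy. A hedged sketch with libnuma (the node number 2 is an assumption; check numactl -H on the real box, and build with -lnuma):

    /* Treat a CXL memory expander as "far RAM" via NUMA placement. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support\n");
            return 1;
        }
        int cxl_node = 2;          /* hypothetical expander node */
        size_t sz = 1UL << 30;     /* 1 GiB */

        /* Cold/overflow data goes to the far (CXL) node... */
        char *cold = numa_alloc_onnode(sz, cxl_node);
        /* ...hot data stays on the local node. */
        char *hot = numa_alloc_local(sz);
        if (!cold || !hot) { fprintf(stderr, "alloc failed\n"); return 1; }

        memset(cold, 0, sz);  /* touch pages so they're actually placed */
        memset(hot, 0, sz);

        numa_free(cold, sz);
        numa_free(hot, sz);
        return 0;
    }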
CXL 3.0 is where it gets interesting: that's where you start to get switched fabrics, where many hosts can talk to many devices. Having that pool of RAM & GPUs be usable on demand by whoever needs it has some attractive possibilities. Also, one can just imagine having pools of data in CXL memory that a host can just attach to & read, which seems like a cool possibility.
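A purely speculative sketch of that attach-and-read idea: assume a fabric manager has already mapped a shared region into this host and exposed it as a (hypothetical) /dev/dax1.0, and a writer on another host publishes records behind a version counter. The struct layout is invented for illustration:

    /* Reader side of a cross-host shared CXL memory pool. */
    #include <fcntl.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct shared_hdr {
        _Atomic uint64_t version;  /* bumped by the publishing host */
        uint64_t payload;
    };

    int main(void) {
        int fd = open("/dev/dax1.0", O_RDONLY); /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        struct shared_hdr *hdr = mmap(NULL, 1UL << 21, PROT_READ,
                                      MAP_SHARED, fd, 0);
        if (hdr == MAP_FAILED) { perror("mmap"); return 1; }

        /* Acquire-load the version, then read the data: hardware
         * coherency across the fabric means no network hop, just a
         * (far) memory load. */
        uint64_t v = atomic_load_explicit(&hdr->version,
                                          memory_order_acquire);
        printf("version %llu, payload %llu\n",
               (unsigned long long)v, (unsigned long long)hdr->payload);

        munmap(hdr, 1UL << 21);
        close(fd);
        return 0;
    }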
It does kind of upset me a bit that CXL 3.0 still seems purely host-to-switch-to-device oriented. If you have your formerly-PCIe slots on your cores speaking CXL and doing directory-based memory coherency over a fabric, I'd really, really love to be able to talk to other hosts. Maybe that happens & is possible in 3.0, but it feels like CXL isn't paving that cowpath, isn't making it obvious, and that there will be a bunch of nasty proprietary ways to bridge computers & chat over CXL that are all non-standard. I wish CXL had been more direct about making itself & its upcoming switched fabric viable & interesting for host-to-host.
Yeah optane seemed like it had so much potential -- with the right abstractions, it neatly dealt with the eternal problem of the separation between RAM and disk. Much of DB and storage system engineering comes down to decisions around when to persist what, and optane was very promising in allowing for new architectures that had much better performance at lower complexity.
Alas, optane's dead now. I do know people actively working on resurrecting a lot of pmem work on low-latency flash, however, and it seems like this is one area with a lot of momentum behind it.
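For flavor, the pmem programming model boils down to "the store is the write path, the flush is the commit decision". A minimal sketch of my own, using a plain mmap'd file so it runs anywhere; on real persistent memory you'd map with MAP_SYNC and flush cache lines (e.g. PMDK's pmem_persist) instead of calling msync:

    /* "When to persist what" collapses to a store plus one flush. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LEN 4096

    int main(void) {
        int fd = open("records.db", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, LEN) < 0) { perror("setup"); return 1; }

        char *db = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
        if (db == MAP_FAILED) { perror("mmap"); return 1; }

        /* No write() path, no buffer pool: updating the "database" is
         * a store, and durability is one flush of the dirtied range. */
        strcpy(db, "balance=42");
        if (msync(db, LEN, MS_SYNC) < 0) { perror("msync"); return 1; }

        munmap(db, LEN);
        close(fd);
        return 0;
    }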
Rack-level disaggregated compute/memory/storage/accelerator architectures allow hardware to be dynamically partitioned and aggregated to suit concurrent workloads as they evolve over time, and make it easier to achieve cost efficiency as well as incremental, continuous, non-disruptive hardware upgrades.
It's been a long day; I spent a few seconds wondering what this interesting new "best CPUs" technical architecture was before realizing the article is SEO'd blogspam.
But with Genoa, for example, socket-to-socket latency has climbed to 220ns, and going across nodes within a socket is 110ns. I feel like CXL will be less than a 2x hit, if only because cores themselves are having higher and higher latencies. https://chipsandcheese.com/2023/07/17/genoa-x-server-v-cache...
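Numbers like those come from dependent-load (pointer-chasing) benchmarks. A rough sketch of the technique below; run it pinned to different nodes (e.g. numactl --membind=N) to compare local DRAM vs. the remote socket vs. a CXL expander:

    /* Each load depends on the previous one, so total time divided by
     * iterations approximates raw memory latency. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (64 * 1024 * 1024 / sizeof(void *)) /* 64 MiB, beats L3 */
    #define ITERS (50 * 1000 * 1000UL)

    int main(void) {
        void **buf = malloc(N * sizeof(void *));
        size_t *idx = malloc(N * sizeof(size_t));
        if (!buf || !idx) return 1;

        /* Build a random single cycle so the prefetcher can't help. */
        for (size_t i = 0; i < N; i++) idx[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i < N; i++)
            buf[idx[i]] = &buf[idx[(i + 1) % N]];

        struct timespec t0, t1;
        void **p = (void **)buf[0];
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned long i = 0; i < ITERS; i++)
            p = (void **)*p;             /* serialized dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns/load (%p)\n", ns / ITERS, (void *)p);
        free(idx); free(buf);
        return 0;
    }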
https://www.computeexpresslink.org/about-cxl
https://en.wikipedia.org/wiki/Compute_Express_Link
Persistent-memory databases are such a neat idea; it's a shame the hardware for them isn't commonplace.
Wasn't Optane (the hardware) trying to use CXL?
Maybe switch to the source article? https://www.servethehome.com/fadu-cxl-2-0-switch-and-pcie-ge...
What are you going to do with the other 999,999,360KB?
Anything to do with data processing, machine learning, LLMs: hundreds of gigabytes of RAM can be incredibly nice.
Especially as pandas recommends RAM equal to 5-10x the size of the dataset, so a 100-200GB dataset can already call for a terabyte.
When you're an individual, not having to think about managing a cluster...
Edit: One of the lucky 10,000 (the famous Bill Gates "640K" joke). Got it. I walk away cultured.
OK? What laptops even support it? How much capacity is there ("over 1TB" is kind of vague)? So many questions...
[0] Emulating CXL Shared Memory Devices in QEMU https://memverge.com/cxl-qemuemulating-cxl-shared-memory-dev...
[1] CXL support in QEMU https://www.qemu.org/docs/master/system/devices/cxl.html
I have hopes. I can't help it. Cool shit is inspiring. I, however, am not holding my breath.