AMD's Strix Halo under the hood

For me the question is: what does this mean for future of desktop CPUs? High bandwidth unified memory seems very compelling for many applications, but the GPU doesn't have as much juice as a separate unit. Are we going to see more these supposedly laptop APUs finding their way into desktops, and essentially a bifurcation of desktops into APUs and discrete CPU/GPUs? Or will desktop CPUs also migrate to becoming APUs?

Tepix · 6 months ago

iGPUs have been getting ever closer to entry level and even mid-range GPUs.

In addition there's a interest in having a lot of memory for LLM acceleration. I expect both CPUs to get more LLM acceleration capabilities and desktop pc memory bandwidth to increase from its current rather slow dual channel 64bit DDR5-6000 status quo.

We're already hearing the first rumors for Medusa Halo coming in 2026 with 50% more bandwidth than Strix Halo.

Tepix · 6 months ago

The sentence "In addition there's a interest in having a lot of memory for LLM acceleration" was supposed to say "In addition there's a interest in having a lot of memory bandwidth for LLM acceleration" but it's too late to edit it now.

kllrnohj · 6 months ago

> iGPUs have been getting ever closer to entry level and even mid-range GPUs.

Not really closer. igpus got good enough to kill the low the discreet market basically entirely, but they haven't been "getting closer" to discreet cards. Both CPU SoCs and discreet GPUs have access to the same manufacturing nodes and memory technologies, and the simple physical reality is that they can just be bigger and use more power as a separate physical entity, along with memory better optimized for its workloads.

Gravityloss · 6 months ago

This has been the case for decades now.

GPU:s have existed about 30 years. Embedded ones for 20 years or so? Why are the embedded GPU:s always so stunted?

DCKing · 6 months ago

Strix Halo is impressive, but it isn't AMD going all out on the concept. Strix Halo's die area (300mm2 ish) is roughly the same as estimates for Apple's M3 Pro die area. The M3 Max and M3 Ultra are twice or four times the size.

In a next iteration AMD could look into doubling or quadrupling the memory channels and GPU die area like as Apple has done. AMD is already a pioneer in the chiplet technology Apple is also using to scale up. So there's lots of room to grow for even higher costs.

Dead Comment

c2h5oh · 6 months ago

APUs are going to replace low end video cards, because they no longer make economical or technical sense.

Historically those cards had narrow memory bus and about a quarter or less video memory of high end (not even halo) cards from the same generation.

That narrow memory bus puts their max memory bandwidth at a comparable level to desktop DDR5 with 2 DIMMs. At the same time quarter of high end is just 4GB VRAM which is not enough for low details for many games and prevents upscaling/frame gen from working.

From manufacturing standpoint low end GPUs aren't great either - memory controllers, video output and a bunch of other non-compute components don't scale with process node.

At the same time unified memory and bypassing PCIE benefits igpus greatly. You don't have to build an entire card, power delivery, cooler - you just slightly beef up existing ones.

tl;dr; sub-200 dollas GPUs are dead and will be replaced by APUs. I won't be surprised if they will start nibbling at lower mid-range market too in the near future.

rcarmo · 6 months ago

My main gaming rig (for admittedly not very intensive games) has been a 7000 series Ryzen APU with a 780M, and my next one will also be an APU. It makes zero economic sense to build a discrete CPU system for casual gaming, even if I believe that APU prices will be artificially inflated to "cozy up" to low-end discrete GPU prices for a while to maximize profits.

kllrnohj · 6 months ago

> tl;dr; sub-200 dollas GPUs are dead and will be replaced by APUs

that already happened like 5+ years ago. The GT 1030 never got an update, so Nvidia hasn't made an entry level GPU since. Intel kinda did with ARC, but that was almost more a dev board

sambull · 6 months ago

That new 'desktop' from framework appears to be just that with the AMD Ryzen Al Max 385

nrp · 6 months ago

We have both Max 385 and Max+ 395 versions.

Symmetry · 6 months ago

Having a system level cache for low latency transfer of data between CPU and GPU could be very compelling for some applications even if the overall GPU power is lower than a dedicated card. That doesn't seem to be the case here, though?

noelwelsh · 6 months ago

Strix Halo has unified memory, which is the same general architecture as Apple's M series chips. This means the CPU and GPU share the same memory, so there is no need to copy CPU <-> GPU.

phkahler · 6 months ago

>> Are we going to see more these supposedly laptop APUs finding their way into desktops, and essentially a bifurcation of desktops into APUs and discrete CPU/GPUs?

I sure hope so. We could use a new board form factor that omits the GPU slot. Although my case puts the power connector and button over that slot on the back so it's not completely wasted, but the board area is. This has seemed like a good idea for a long time to me.

This can also be a play against nVidia. When mainstream systems use "good enough" integrated GPUs and get rid of that slot, there is no place for nVidia except in high-end systems.

adrian_b · 6 months ago

There is no need for a new board form factor, because they have existed for many decades.

Below the mini-ITX format with a GPU slot, there are 3 standard form factors that are big enough for a full-featured personal computer that is more powerful than most laptops: nano-ITX (120 mm x 120 mm, for 5" by 5" cases; half the area of mini-ITX), 3.5" (from the size of the 3.5 inch HDDs, approximately the same area with nano-ITX, but rectangular instead of square) and the 4" x 4" NUC format introduced by Intel.

With a nano-ITX or 3.5" board you can make a computer not bigger than 1 liter that can ensure a low noise even for a 65 W power dissipation for the CPU+iGPU and that can have a generous amount of peripheral ports, to cover all needs.

Keeping the low noise condition, one could increase the maximum power-dissipation to 150 W for the CPU+iGPU in a somewhat bigger case, but certainly still smaller than 2.5 liter.

I expect that we will see such mini-PCs with Strix Halo, the only question is whether their price would be low enough to make them worthwhile.

The fabrication cost for Strix Halo must be lower than for a combo of CPU with discrete GPU, but the initial offerings with it attempt to make the customer pay more for the benefit of having a more compact system, which for many people will not be enough motivation to accept a higher price.

icegreentea2 · 6 months ago

The bifurcation is already happening. The last few years have seen lots of miniPC/NUC like products being released.

One of (many) factors that were holding back this form factor was the gap in iGPU/GPU performance. However with the frankly total collapse of the low end GPU market in the last 3-4 years, there's a much larger opening for iGPUs.

I also think that within the gaming space specifically, a lot of the chatter around the Steam Deck helped reset expectations. Like if everyone else is having fun playing games at 800p low/medium, then you suddenly don't feel so bad playing at maybe 1080p medium on your desktop.

juancn · 6 months ago

I would love a unified memory architecture, even for external GPUs.

Pay for memory once, and avoid all the copying around between CPU/GPU/NPU for mixed algorithms, and have the workload define the memory distribution.

adra · 6 months ago

Framework made a tiny desktop form factor version with this chip in it, so we'll if it gets much traction (at least among enthusiasts).

I really want LPDDR5X (and future better versions) to become standard on desktops, alongside faster and more-numerous memory controllers to increase overall bandwidth. Why hasn't CAMM gotten anywhere?

I also really want an update to typical form factors and interconnects of desktop computers. They've been roughly frozen for decades. Off the top of my head:

- Move to single-voltage power supplies at 36-57 volts.

- Move to bladed power connectors with fewer pins.

- Get rid of the "expansion card" and switch to twinax ribbon interconnects.

- Standardize on a couple sizes of "expansion socket" instead, putting the high heat-flux components on the bottom side of the board.

- Redesign cases to be effectively a single ginormous heatsink with mounting sockets to accept things which produce heat.

- Kill SATA. It's over.

- Use USB-C connectors for both power and data for internal peripherals like disks. Now there's no difference between internal and external peripherals.

gjsman-1000 · 6 months ago

> Why hasn't CAMM gotten anywhere?

Framework asked AMD if they could use CAMM for their new Framework Desktop.

AMD actually humored the request and did some engineering, with simulations. According to Framework, the memory bandwidth on the simulations was less than half of the soldered version.

This completely defied the entire point of the chip - the massive 256 bit bus ideal for AI or other GPU-heavy tasks, which allows this chip to offer the features it does.

This is also why Framework has apologized for non upgradability, but said it can’t be helped, so enjoy fair and reasonable RAM prices. Previously, it had been speculated that CAMM had a performance penalty, but Framework’s engineer on video saying it was that bad was fairly shocking.

arghwhat · 6 months ago

I do not believe they were asking for CAMM as replacement for soldered RAM, but as an upgrade for DIMMs in desktop.

CAMM is touted as being better than DIMMs when it comes to signal integrity and possible speed. Soldered of course beat any socket, in-package beats any soldered RAM, and on-die beat any external component.

That AMD Strix Halo is unable to maintain signal integrity for any socketed RAM is a Strix Halo problem, not a socket problem. They probably backed themselves a bit into a corner with other parts of the design sacrifying tolerances on the memory side, and it's a lot easier to push motherboard design requirements than redoing a chip.

If this wasn't a Strix Halo issue, then they would have been able to run with socketed memory with a lower memory clock. All CPUs, this one included, has variable memory clocks that could be utilized and perform memory training as even the PCB traces to the chip cause significant signal degradation.

Tuna-Fish · 6 months ago

The problem was specifically routing the 256-bit LPDDR5X out of the chip into the CAMM2 connector. This is hard to do with such a wide bus, because LPDDR5X wasn't originally designed for it.

LPDDR6X is designed for it, and an use CAMM2.

sunshowers · 6 months ago

I'm curious how much the CUDIMM thing Intel is doing, where the RAM has its own clock, can help in the CAMM context. The Zen 4/5 memory controller doesn't support it but a future one might.

simoncion · 6 months ago

> - Move to single-voltage power supplies at 36-57 volts.

Why? And why not 12V? Please be specific in your answers.

> - Get rid of the "expansion card" and switch to twinax ribbon interconnects.

If you want that, it's available right now. Look for a product known as "PCI Express Riser Cable". Given that the "row of slots to slot in stiff cards" makes for nicely-standardized cases and card installation procedures that are fairly easy to understand, I'm sceptical that ditching slots and moving to riser cables for everything would be a benefit.

> - Kill SATA. It's over.

I disagree, but whatever. If you just want to reduce the number of ports on the board, mandate Mini SAS HD ports that are wired into a U.2 controller that can break each port out into four (or more) SATA connectors. This will give folks who want it very fast storage, but also allow the option to attach SATA storage.

> - Use USB-C connectors for both power and data for internal peripherals like disks.

God no. USB-C connectors are fragile as all hell and easy to mishandle. I hate those stupid little almost-a-wafer blades.

> - Standardize on a couple sizes of "expansion socket" instead...

What do you mean? I'm having trouble envisioning how any "expansion socket" would work well with today's highly-variably-sized expansion cards. (I'm thinking especially of graphics accelerator cards of today and the recent past, which come in a very large array of sizes.)

> - Redesign cases to be effectively a single ginormous heatsink with mounting sockets...

See my questions to the previous quote above. I currently don't see how this would work.

wtallis · 6 months ago

Graphics cards have finally converged on all using about the same small size for the PCB. The only thing varying is the size of the heatsink, and due to the inappropriate nature of the current legacy form factor (which was optimized for large PCBs) the heatsinks grow along the wrong dimension and are louder and less effective than they should be.

Deleted Comment

arghwhat · 6 months ago

> Why? And why not 12V? Please be specific in your answers.

Higher voltages improve transmission efficiency, in particularly for connectors, as long as sufficient insulation is easy to maintain. Datacenters are looking at 48V for a reason.

Nothing comes for free though, and it makes for slightly more work for the various buck converters.

> God no. USB-C connectors are fragile as all hell and easy to mishandle. I hate those stupid little almost-a-wafer blades.

They are numerous orders of magnitude more rugged than any internal connector you've used - most of them are only designed to handle insertion a handful of times (sometime connectors even only work once!), vs. ten thousand times for the USB-C connector. In that sense, a locking USB-C connector would be quite superior.

... on that single metric. It would be ridiculously overcomplicated, driving up part costs when a trivial and stupidly cheap connector can do the job sufficiently. Having to run off 48V to push 240W and have no further power budget at all also increase complexity, cost and add limitations.

USB-C is meant for end-user things where everything has to be crammed into the same, tiny connector, where it does great.

wmf · 6 months ago

There's a rumor that future desktops will use LPDDR6 (with CAMMs presumably) instead of DDR6. Of course CAMMs will be slower so they might "only" run at ~8000 GT/s while soldered LPDDR6 will run at >10000.

Tuna-Fish · 6 months ago

LPDDR6 won't go that low, even on CAMM2. The interface is designed for up to 14.4Gbps, with initial modules aiming for 10.6Gbps.

nsteel · 6 months ago

What's the advantage of LPDDR6 Vs DDR6 given the latter will be faster ?