buildbot · a year ago
AVX-512 in a single cycle vs. 2 cycles is big if the clock speed can be maintained anywhere near 5GHz. The doubling of L1 cache bandwidth is also interesting! It's possibly needed to actually feed an AVX-512-rich instruction stream, I guess.
adrian_b · a year ago
For most instructions, both Intel and AMD CPUs with AVX-512 support are able to do two 512-bit instructions per clock cycle. There is no difference between Intel and AMD Zen 4 for most 512-bit AVX-512 instructions.

I expect that this will remain true for Zen 5 and the next Intel CPUs.

The only important differences in throughput between Intel and AMD were for the 512-bit load and store instructions from the L1 cache and for the 512-bit fused multiply-add instructions, where Intel had double throughput in its more expensive models of server CPUs.

I interpret AMD's announcement as saying that Zen 5 now has double the transfer throughput between the 512-bit registers and the L1 cache and also a double-width 512-bit FP multiplier, so it now matches the Intel AVX-512 throughput per clock cycle in all important instructions.
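As a rough sanity check on what full-width AVX-512 FMA throughput means, here is a back-of-the-envelope model (the pipe count and clock are assumptions for illustration, not vendor specs):

```python
# Per-core peak FP32 throughput for a hypothetical core with two
# full-width 512-bit FMA pipes. All parameters are assumed values.
LANES_FP32 = 512 // 32   # 16 fp32 lanes per 512-bit register
FLOPS_PER_FMA = 2        # a fused multiply-add counts as 2 FLOPs
FMA_PIPES = 2            # assumed: two 512-bit FMA pipes per core
CLOCK_GHZ = 5.0          # assumed sustained clock

peak_gflops = LANES_FP32 * FLOPS_PER_FMA * FMA_PIPES * CLOCK_GHZ
print(peak_gflops)  # 320.0 GFLOP/s per core under these assumptions
```

Halving the FMA pipe count halves this figure, which is why the FMA width matters so much for dense FP kernels.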

Aardwolf · a year ago
> There is no difference between Intel and AMD Zen 4 for most 512-bit AVX-512 instructions.

Except that Intel hasn't had any AVX-512 in consumer CPUs for years already, so there's really nothing to compare against in this target market.

tempnow987 · a year ago
The difference is Intel chips that support AVX-512 run $1,300 - $11,000 with MUCH higher total system costs, whereas AMD actually DOES support AVX-512 on all its chips, so you can get AVX-512 for dirt cheap. The whole Intel instruction support story feels garbage here. Weren't they the ones to introduce this whole 512 thing in the first place?
DEADMINCE · a year ago
> The only important differences in throughput between Intel and AMD

Not exactly related, but AMD also has a much better track record when it comes to speculative execution attacks.

xattt · a year ago
I see the discussion of instruction fusion for AVX512 in Intel chips. Can someone explain the clock speed drop?
camel-cdr · a year ago
AVX-512 never took more than 2 cycles. Zen 4 used the 256-bit-wide execution units of AVX2 (except for shuffle), but there is more than one 256-bit-wide execution unit, so you still got your one-cycle throughput.
dzaima · a year ago
More importantly for the "2 cycles" question, Zen 4 can get one cycle latency for double-pumped 512-bit ops (for the ops where that's reasonable, i.e. basic integer/bitwise arith).

Having all 512-bit pipes would still be a massive throughput improvement over Zen 4 (as long as pipe count is less than halved), if that is what Zen 5 actually does; things don't stop at 1 op/cycle. Though a rather important question with that would be where that leaves AVX2 code.
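A toy model of the double-pumping being discussed (purely illustrative; real hardware splits ops at the micro-op level, not in software): a 512-bit lane-wise add issued as two 256-bit halves produces the same result as a full-width unit, it just occupies the pipe twice:

```python
MASK64 = (1 << 64) - 1  # wrap-around 64-bit lane arithmetic

def add_lanes(a, b):
    # Lane-wise 64-bit add across a list of lanes (one "wide" op).
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def add512_double_pumped(a, b):
    # Issue one 512-bit (8 x 64-bit lane) op as two 256-bit halves,
    # as Zen 4 is described as doing.
    return add_lanes(a[:4], b[:4]) + add_lanes(a[4:], b[4:])

a = [MASK64, 1, 2, 3, 4, 5, 6, 7]
b = [1] * 8
print(add512_double_pumped(a, b) == add_lanes(a, b))  # True
```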

SomeoneFromCA · a year ago
I wish it had AVX512 Fp16.
api · a year ago
At what point do these become competitive with GPUs for AI cost wise if GPUs retain their nutty price premium?
bloaf · a year ago
I’ve been running some LLMs on my 5600x and 5700g cpus, and the performance is… ok but not great. Token generation is about “reading out loud” pace for the 7&13 B models. I also encounter occasional system crashes that I haven’t diagnosed yet, possibly due to high RAM utilization, but also possibly just power/thermal management issues.

A 50% speed boost would probably make the CPU option a lot more viable for home chatbot, just due to how easy it is to make a system with 128gb RAM vs 128gb VRAM.

I personally am going to experiment with the 48gb modules in the not too distant future.
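For CPU token generation the first-order ceiling is memory bandwidth divided by the bytes streamed per token (the weights dominate). A sketch with assumed numbers (dual-channel DDR5-5600, a 13B model at 4-bit quantization):

```python
# Crude upper bound: each generated token streams essentially all
# weights through the CPU once, so tokens/s <= bandwidth / model size.
# All figures below are illustrative assumptions.
channels = 2
transfers_per_s = 5600e6   # DDR5-5600
bytes_per_transfer = 8     # 64-bit channel
bandwidth = channels * transfers_per_s * bytes_per_transfer  # ~89.6 GB/s

params = 13e9
bytes_per_param = 0.5      # 4-bit quantized weights
model_bytes = params * bytes_per_param  # 6.5 GB

ceiling = bandwidth / model_bytes
print(round(ceiling, 1))   # ~13.8 tokens/s; real systems land well below this
```

This is also why more memory channels tend to help large models more than extra compute does: the bound above doesn't move with core count.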

hajile · a year ago
Does Zen5 do FP math in a single cycle?
dzaima · a year ago
Almost certainly Zen 5 won't have single-cycle FP latency (I haven't heard of anything doing such even for scalar at modern clock rates (though maybe that does exist somewhere); AMD, Intel, and Apple all currently have 3- or 4-cycle latency). And Zen 4 already has a throughput of 2 FP ops/cycle for up to 256-bit arguments.

The thing discussed is that Zen 4 does 512-bit SIMD ops via splitting them into two 256-bit ones, whereas Zen 5 supposedly will have hardware doing all 512 bits at a time.
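The latency-vs-throughput distinction matters mostly for dependent chains. A toy cycle count, using assumed figures in line with the comment (4-cycle FP latency, 2 FP ops/cycle throughput):

```python
def cycles(n_ops, latency, throughput, dependent):
    # Dependent ops serialize on latency; independent ops are limited
    # only by issue throughput (startup/drain effects ignored).
    return n_ops * latency if dependent else n_ops / throughput

LAT, TPUT = 4, 2  # assumed: 4-cycle FP latency, 2 FP ops per cycle
print(cycles(1000, LAT, TPUT, dependent=True))   # 4000: one long chain
print(cycles(1000, LAT, TPUT, dependent=False))  # 500.0: independent ops
```

This is why hand-written kernels keep several independent accumulators in flight: it hides the latency and lets the pipes run at full throughput.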

dhx · a year ago
How are the 24x PCIe 5.0 lanes (~90GB/s) of the 9950X allocated?

The article makes it appear as:

* 16x PCIe 5.0 lanes for "graphics use" connected directly to the 9950X (~63GB/s).

* 1x PCIe 5.0 lane for an M.2 port connected directly to the 9950X (~4GB/s). Motherboard manufacturers seemingly could repurpose "graphics use" PCIe 5.0 lanes for additional M.2 ports.

* 7x PCIe 5.0 lanes connected to the X870E chipset (~28GB/s). Used as follows:

  * 4x USB 4.0 ports connected to the X870E chipset (~8GB/s).

  * 4x PCIe 4.0 ports connected to the X870E chipset (~8GB/s).

  * 4x PCIe 3.0 ports connected to the X870E chipset (~4GB/s).

  * 8x SATA 3.0 ports connected to the X870E chipset (some >~2.4GB/s part of ~8GB/s shared with WiFi 7).

  * WiFi 7 connected to the X870E chipset (some >~1GB/s part of ~8GB/s shared with 8x SATA 3.0 ports).
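The per-lane figures above can be sanity-checked from the signaling rate and line encoding (PCIe 3.0 and later use 128b/130b; the rate doubles each generation):

```python
def lane_gb_per_s(gt_per_s):
    # 128b/130b encoding: 128 payload bits per 130 line bits,
    # 8 bits per byte, one direction.
    return gt_per_s * (128 / 130) / 8

for gen, rate in {"3.0": 8, "4.0": 16, "5.0": 32}.items():
    print(f"PCIe {gen}: {lane_gb_per_s(rate):.2f} GB/s per lane")
# PCIe 5.0 -> ~3.94 GB/s/lane, so x16 ~= 63 GB/s and x4 ~= 15.8 GB/s
```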

wtallis · a year ago
These processors will use the existing AM5 socket, so they fundamentally cannot make major changes to lane counts and allocations, only per-lane speeds. They're also re-using the CPU's IO die from last generation and re-using the same chipset silicon, which further constrains them to only minor tweaks.

Typical use cases and motherboards give an x16 slot for graphics, x4 each to at least one or two M.2 slots for SSDs, and x4 to the chipset. Last generation and this generation, AMD's high-end chipset is actually two chipsets daisy-chained, since they're really not much more than PCIe fan-out switches plus USB and SATA HBAs.

Nobody allocates a single PCIe lane to an SSD slot, and the link between the CPU and chipset must have a lane width that is a power of two; a seven-lane link is not possible with standard PCIe.

Also, keep in mind that PCIe is packet-switched, so even though on paper the chipset is over-subscribed with downstream ports that add up to more bandwidth than the uplink to the CPU provides, it won't be a bottleneck unless you have an unusual hardware configuration and workload that actually tries to use too much IO bandwidth with the wrong set of peripherals simultaneously.

dhx · a year ago
Thanks for the description. The article was confusing as to whether the CPU's stated 24x PCIe 5.0 lanes included those required for the chipset. Given that the same AM5 socket is used and X870E is similar to the X670E, this appears to not be the case, and instead the 9950X would have 28x PCIe 5.0 lanes, 4 of which are connected to the daisy-chained chipset and 24 then remain available to the motherboard vendor (nominally as 16 for graphics, 8 for NVMe). I also hadn't realised the CPU would offer 4x USB 4.0 ports directly.

Block diagram for AM5 (X670E/X670): https://www.techpowerup.com/review/amd-ryzen-9-7950x/images/...

Block diagram for AM4 (X570): https://www.reddit.com/r/Amd/comments/bus60i/amd_x570_detail...

adrian_b · a year ago
Cheap small computers with Intel Alder Lake N CPUs, like the Intel N100, frequently allocate a single PCIe lane to each SSD slot.

However you are right that such a choice is very unlikely for computers using AMD CPUs or Intel Core CPUs.

lmz · a year ago
It's usually x16/x4/x4 for GPU/M.2/Chipset. You can check the diagrams from the current x670 boards for info.
wtallis · a year ago
AMD's previous socket was usually x16/x4/x4 for GPU/M.2/chipset. For AM5, they added another four lanes, so it's usually x16/x4+x4/x4 for GPU/(2x)M.2/chipset, unless the board is doing something odd with providing lanes for Thunderbolt ports or something like that.
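That typical AM5 allocation adds up to a 28-lane CPU budget (the split below is the common board layout described in the thread, not a spec):

```python
# Typical AM5 CPU lane budget; the exact allocation is board-dependent.
alloc = {"GPU slot (x16)": 16, "M.2 slot 1 (x4)": 4,
         "M.2 slot 2 (x4)": 4, "chipset uplink (x4)": 4}
print(sum(alloc.values()))  # 28 CPU lanes in total
```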
paulmd · a year ago
Siena would make a very practical HEDT socket - it's basically half of a bergamo, 6ch DDR5/96x pcie 5.0. It's sort of an unfortunate artifact of the way server platforms have gone that HEDT has fizzled out, they're mostly just too big and it isn't that practical to fit into commodity form-factors anymore, etc. a bigass socket sp3 and 8ch was already quite big, now it's 12ch for SP5 and you have a slightly smaller one at SP6. But still, doing 1DPC in a commodity form factor is difficult, you really need an EEB sort of thing for things like GENOAD8X etc let alone 2 dimms per channel etc, which if you do like a 24-stick board and a socket you don't fit much else.

https://www.anandtech.com/show/20057/amd-releases-epyc-8004-...

2011/2011-3/2066 were actually a reasonable size. Like LGA3678 or whatever as a hobbyist thing doesn't seem practical (the W-3175X stuff) and that was also 6ch, and Epyc/TR are pretty big too etc. There used to exist this size-class of socket that really no longer gets used, there aren't tons of commercial 3-4-6 channel products made anymore, and enthusiast form-factors are stuck in 1980 and don't permit the larger sockets to work that well.

The C266 being able to tap off IOs as SAS3/12gbps or pcie 4.0 slimsas is actually brilliant imo, you can run SAS drives in your homelab without a controller card etc. The Asrock Rack ones look sick, EC266D4U2-2L2Q/E810 lets you basically pull all of the chipset IO off as 4x pcie 4.0x4 slimsas if you want. And actually you can technically use MCIO retimers to pull the pcie slots off, they had a weird topology where you got a physical slot off the m.2 lanes, to allow 4x bifurcated pcie 5.0x4 from the cpu. 8x nvme in a consumer board, half in a fast pcie 5.0 tier and half shared off the chipset.

https://www.asrockrack.com/general/productdetail.asp?Model=E...

Wish they'd do something similar with AMD and mcio preferably, like they did with the GENOAD8X. But beyond the adapter "it speaks SAS" part is super useful for homelab stuff imo. AMD also really doesn't make that much use of the chipset, like, where are the x670E boards that use 2 chipsets and just sling it all off as oculink or w/e. Or mining-style board weird shit. Or forced-bifurcation lanes slung off the chipset into a x4x4x4x4 etc.

https://www.asrockrack.com/general/productdetail.asp?Model=G...

All-flash is here, all-nvme is here, you just frustratingly can't address that much of it per system, without stepping up to server class products etc. And that's supposed to be the whole point of the E series chipset, very frustrating. I can't think of many boards that feel like they justify the second chipset, and the ones that "try" feel like they're just there to say they're there. Oh wow you put 14 usb 3.0 10gbps ports on it, ok. How about some thunderbolt instead etc (it's because that's actually expensive). Like tap those ports off in some way that's useful to people in 2024 and not just "16 sata" or "14 usb 3.0" or whatever. M.2 NVMe is "the consumer interface" and it's unfortunately just about the most inconvenient choice for bulk storage etc.

Give me the AMD version of that board where it's just "oops all mcio" with x670e (we don't need usb4 on a server if it drives up cost). Or a miner-style board with infinite x4 slots linked to actual x4s. Or the supercarrier m.2 board with a ton of M.2 sticks standing vertically etc. Nobody does weird shit with what is, on paper, a shit ton of pcie lanes coming off the pair of chipsets. C'mon.

Super glad USB4 is a requirement for X870/X870E, thunderbolt shit is expensive but it'll come down with volume/multisourcing/etc, and it truly is like living in the future. I have done thunderbolt networking and moved data ssd to ssd at 1.5 GB/s. Enclosures are super useful for tinkering too now that bifurcation support on PEG lanes has gotten shitty and gpus keep getting bigger etc. An enclosure is also great for janitoring M.2 cards with a simple $8 adapter off amazon etc (they all work, it's a simple physical adapter).

vardump · a year ago
I think that decision is ultimately made by the mainboard vendor.
irusensei · a year ago
I'll probably wait one or two years before getting into anything with DDR5. I blew some money on an AMD laptop in 2021. At the time it was a monster with decent expansion options: RX 6800M, Ryzen 9 5900HX. I've maxed it out with 64GB DDR4 and 2x 4TB PCIe 3.0 NVMe. Runs Linux very well.

But now I'm seeing lots of things I'm locked out of: faster Ethernet standards, the fun that comes with tons of GPU memory (no USB4, can't add 10GbE either), faster and larger memory options, AV1 encoding. It's just sad that I bought a laptop right before those things were released.

Should have gone with a proper PC. Not making this mistake again.

isoos · a year ago
It sounds like you need a desktop workstation with replaceable extension cards, and not a mostly immutable laptop, which has different strengths.
irusensei · a year ago
Agreed but it will need to wait for now.
worthless-trash · a year ago
You will find that this is the cost of any laptop, any time you buy it there is always new tech around the corner and there isn't much you can do about it.
simcop2387 · a year ago
(disclosure I own a 13in one)

Yeah, the closest I see to being better about it is Frame.work laptops, and even then it's not as good a story as desktops, just the best story for upgrading a laptop right now. Other than that, buying one and making sure you have at least two Thunderbolt (or compatible) ports on separate buses is probably the best you can do, since that'd mean two 40Gb/s links for expansion; it's not portable, but it would let you get things like 10GbE adapters or fast external storage without compromising too much on capability.

kmfrk · a year ago
Waiting for CAMM2 to get wider adoption could be interesting:

https://x.com/msigaming/status/1793628162334621754

Hopefully won't be too long now.

Delmololo · a year ago
Either it was a shitty investment from the beginning, or you actually use it very regularly and it would be worth it anyway to slowly start thinking about something new.
gautamcgoel · a year ago
Surprisingly not that much to be excited about IMO. AMD isn't using TSMC's latest node and the CPUs only officially support DDR5 speeds up to 5600MHz (yes, I know that you can use faster RAM). The CPUs are also using the previous-gen graphics architecture, RDNA2.
mrweasel · a year ago
> AMD isn't using TSMC's latest node

Staying on an older node might ensure AMD the production capacity they need/want/expect. If they had aimed for the latest 3nm they'd have to get in line behind Apple and Nvidia. That would be my guess: why aim for 3nm if you can't get fab time and you're still gaining a 15% speed increase?

arvinsim · a year ago
TBH, CPUs nowadays are mostly good enough for the consumer, even at mid or low tiers.

It's the GPUs that are just getting increasingly inaccessible, price wise.

michaelt · a year ago
Yes - with more and more users moving to laptops and wanting a longer battery life, raw peak performance hasn't moved much in a decade.

A decade ago, Steam's hardware survey said 8GB was the most popular amount of RAM [1] and today, the latest $1600 Macbook Pro comes with.... 8GB of RAM.

In some ways that's been a good thing - it used to be that software got more and more featureful/bloated and you needed a new computer every 3-5 years just to keep up.

[1] https://web.archive.org/web/20140228170316/http://store.stea...

MenhirMike · a year ago
Given that there are only 2 CUs in the GPU (and fairly low clock speeds), does the architecture matter much? Benchmarks were kinda terrible, and it looks to me like the intent of the built-in GPU is hardware video encoding, home server use, or emergency BIOS access and the like. Compared to the desktop CPUs, even the lowest-end mobile 8440U has 4 CUs, going up to 12 CUs on the higher end. Or go with Strix Point, which does have an RDNA 3.5 GPU (with 12 or 16 CUs) in it.

I guess you _can_ game on those 2 CU GPUs, but it really doesn't seem to be intended for that.

spixy · a year ago
Better efficiency with three external 4K 120Hz monitors?
cschneid · a year ago
Yeah, I'm glad they started including built in gpu so there's something there, but beyond booting to a desktop I wouldn't use this graphics for anything else. But if you're just running a screen and compiling rust, that's all you need. Or in my case, running a home server / NAS.
AlfeG · a year ago
> DDR5 speeds up to 56000MHz (yes, I know that you can use faster RAM)

Not sure that I actually CAN. 56 GHz is already a lot.

gautamcgoel · a year ago
Fixed, thanks :)
re-thc · a year ago
> The CPUs are also using the previous-gen graphics architecture, RDNA2

Faster GPU is reserved for APUs. These graphics are just here for basic support.

diffeomorphism · a year ago
Nah, you can get RDNA3.5 if you want to (not sure why you want that in a (home)server though)

https://www.anandtech.com/show/21419/amd-announces-the-ryzen...

Arrath · a year ago
Well perhaps I will stop holding out and just get the 7800x3d, if the 9000 generation won't be too terribly groundbreaking.
doikor · a year ago
> The CPUs are also using the previous-gen graphics architecture, RDNA2.

The GPU on these parts is there mostly for being able to boot into the BIOS or OS for debugging. Basically, when things go wrong and you want to figure out what is broken (remove the GPU from the machine and see if things work).

matharmin · a year ago
These are decent GPUs for anything other than heavy gaming. I'm driving two 4k screens with it, and even for some light gaming (such as factorio) it's completely fine.
moooo99 · a year ago
Hard disagree on that one. I've been daily-driving an RDNA2 graphics unit for 1.5 years now and it's absolutely sufficient. I mostly do office work and occasionally play Minecraft, and I don't see any reason why you'd want to waste money on a dGPU for that kind of load.
nickjj · a year ago
Another advantage of having an integrated GPU is you can do a GPU pass-through and let a VM directly and fully use your dedicated GPU.

This could be a thing if you're running native Linux but some games only work on Windows which you run in a VM instead of dual booting.

ffsm8 · a year ago
I have to disagree. They work great for video playback and office work. So media server, and workstations are fine without a dedicated gpu
ubercore · a year ago
> The GPU on these parts is there mostly for being able to boot into BIOS or OS for debugging.

That's wildly not true. Transcoding, gaming, multiple displays, etc. They are often used as any other GPU would be used.

TacticalCoder · a year ago
> The GPU on these parts is there mostly for being able to boot into BIOS or OS for debugging.

Not at all. I drive a 38" monitor with the iGPU of the 7700X. If you don't game and don't run local AI models it's totally fine.

And... No additional GPU fans.

My 7700X build is so quiet it's nearly silent. I can barely hear its Noctua NH-U12S cooler/fan ramping up under full load, and that's how it should be.

JonChesterfield · a year ago
They also mean you can drive monitors using the builtin GPU while using dedicated ones for compute.
btgeekboy · a year ago
Yeah - I've been waiting to see what this release would entail as I kinda want to build a SFF PC. But now that I know what's in it, and since they didn't come out with anything really special chipset-wise, I'll probably just see if I can get some current-gen stuff at discounted prices during the usual summer sales.
aurareturn · a year ago
It's because x86 chips are no longer leading in the client space. ARM chips are. Specifically, Apple chips. Though Qualcomm has huge potential to leapfrog AMD/Intel chips in a few generations too.
KingOfCoders · a year ago
[If you're a laptop user, scroll down the thread for laptop Rust compile times, M3 Pro looks great]

You're misguided.

Apple has excellent Notebook CPUs. Apple has great IPC. But AMD and Intel have easily faster CPUs.

https://opendata.blender.org/benchmarks/query/?compute_type=...

Blender Benchmark

      AMD Ryzen 9 7950X (16 core)         560.8
      Apple M2 Ultra (24 cores)           501.82
      Apple M3 Max (12 cores)             408.27
      Apple M3 Pro                        226.46
      Apple M3                            160.58
It depends on what you're doing.

I'm a software developer using a compiler that 100%s all cores. I like fast multicore.

      Apple Mac Pro, 64gb, M2 Ultra, $7000
      Apple Mac mini, 32gb, M2 Pro, 2TB SSD, $2600
[Edit2] Compare to: 7950x is $500 and a very fast SSD is $400, fast 64gb is $200, very good board is $400 so I get a very fast dev machine for ~$1700 (0,329 p/$ vs. mini 0,077 p/$)

[Edit] Made a c&p mistake, the mini has no ultra.
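The points-per-dollar figures above can be reproduced; note the Mac mini number implies a Blender score of roughly 200 for the M2 Pro, which is not in the table and is treated here as an assumption:

```python
def points_per_dollar(score, price):
    # Blender benchmark score divided by total build cost.
    return score / price

print(round(points_per_dollar(560.8, 1700), 3))  # ~0.33 for the 7950X build
print(round(points_per_dollar(200.0, 2600), 3))  # 0.077 for the mini (assumed score)
```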

krasin · a year ago
Until ARM has proper UEFI support, it's not a practical desktop/server, with a few notable exceptions (Mac, Raspberry Pi), and only because there's so much support from the respective vendors.

I know that there's some work happening about UEFI+ARM (https://developer.arm.com/Architectures/Unified%20Extensible...), but its support is very rare. The only example I can recall is Ampere Altra: https://www.jeffgeerling.com/blog/2023/ampere-altra-max-wind...

mmaniac · a year ago
AMD seem to be playing it safe with this desktop generation. Same node, similar frequencies, same core counts, same IOD, X3D chips only arriving later... IPC seems like the only noteworthy improvement here. 15% overall is good but nothing earth-shattering.

The mobile APUs are way more interesting.

TacticalCoder · a year ago
> 15% overall is good but nothing earth shattering

Interestingly though the 9700X seems to be rated at 65W TDP (compared to a 105 TDP for the 7700X). I run my 7700X in "eco mode" where I lowered the TDP to max 95 W (IIRC, maybe it was 85 W: I should check in the BIOS).

So it looks like 15% more performance overall with less power consumption.

hajile · a year ago
AMD used to give decently accurate TDP, but Intel started giving unrealistically optimistic TDP ratings, so AMD joined the game. Their TDP is more of an aspiration than a reality (ironically, Intel seems to have gotten more accurate with their new base and turbo power ratings).

9700x runs 100MHz higher on the same process as the 7700x. If they are actually running at full speed, I don't see how 9700x could possibly be using less power with more transistors at a higher frequency. They could get lower power for the same performance level though if they were being more aggressive about ramping down the frequency (but it's a desktop chip, so why would they?).

vegabook · a year ago
Geekbench for the Ryzen 9 7950X is 2930 max, so if we're generous and give the 9950X a 15% uplift we'll be at 3380, which is still 400 points or so behind Apple silicon for a much higher clock speed and several times the power draw. Also the max memory bandwidth at 70GB/s or so is basically pathetic, trounced by ASi.
mmaniac · a year ago
You're comparing apples and oranges. Ryzen has never had the lead with 1T performance, but emphasises nT and core counts instead. Memory bandwidth is largely meaningless for desktop CPUs but matters a lot for a SoC with a big GPU.

Strix Halo appears to be AMD's competitor to Apple SoCs which will feature a much bigger iGP and much greater memory bandwidth. When we hear more about that, comparisons will be apt.

KingOfCoders · a year ago
(Not a gamer)

For me as a developer Geekbench Clang benchmarks:

    M2 Ultra   233.9 Klines/sec
    7950x      230.3 Klines/sec  
    14900K     215.3 Klines/sec  
    M3 Max     196.5 Klines/sec

re-thc · a year ago
> which is still 400 points or so behind apple silicon for a much higher clock speed and several times the power draw

Not a fair comparison. If we're going by Geekbench, as per the announcement, it's +35%. The 15% is a geomean. It might not be better, but it's definitely not far off Apple.

In a similar manner, outside of Geekbench the geomean of M3 vs M4 isn't that great either.

ArtTimeInvestor · a year ago
Do all CPUs and GPUs these days involve components made by TSMC?

If so, is it unique for a whole industry to rely on one company?

danieldk · a year ago
Arguably that single company is ASML. There are more fabs (e.g. Intel), but AFAIK cutting-edge nodes all use ASML EUV chip fabrication machines?
Cu3PO42 · a year ago
Intel still fabs their own CPUs; their dedicated Xe graphics are made by TSMC, though.

Nvidia 30-series was fabbed by Samsung.

So there is some competition in the high-end space, but not much. All of these companies rely on buying lithography machines from ASML, though.

Wytwwww · a year ago
>Intel still fabs their own CPUs

Isn't Lunar Lake made by TSMC? Supposedly they have comparable efficiency to AMD/Apple/Qualcomm at the cost of making their fab business even less profitable

unwind · a year ago
As far as I know, Intel is still very much a fab company.
pjc50 · a year ago
This is probably a lot more common than you might think. How much of an "entire industry", or indeed industry as a whole, relies on Microsoft?
icf80 · a year ago
wait till china invades
brokencode · a year ago
Good thing TSMC is getting billions of dollars of government subsidies to build fabs all around the world including in the US and Japan.
preisschild · a year ago
As long as Biden wins the election, they won't.

Because the US will defend Taiwan.

gattr · a year ago
I normally rebuild my workstation every ~4 years (recently more for fun than out of actual need for more processing power), might finally do it again (preferably a recent 8-/12-core Ryzen). My most recent major upgrade was in 2017 (Core i5 3570K -> Ryzen 7 1700X), with a minor fix in 2019 (Ryzen 7 2700, since 1700X was suffering from the random-segfaults-during-parallel-builds issue).
Night_Thastus · a year ago
Same. I'm on a 10700K and thinking about an upgrade. I'll wait for X3D parts to come out (assuming they're doing that this gen, not sure if we've got confirmation), and compare vs 15th-gen Intel once it's out in September-ish.
tracker1 · a year ago
Might be worth considering a Ryzen 9 5900XT (just launched as well) for a drop in upgrade. Been running a 5950X since close to launch and still pretty happy with it.
mananaysiempre · a year ago
Would it really be smart to build an AM4 desktop at this point though?
ComputerGuru · a year ago
Interesting unveil; it seems everyone was expecting more significant architectural changes though they still managed to net a decent IPC improvement.

In light of the "very good but not incredible" generation-over-generation improvement, I guess we can now play the "can you get more performance for less dollars buying used last-gen HEDT or Epyc hardware or with the newest Zen 5 releases?" game (NB: not "value for your dollar" but "actually better performance").