Remnant44 (u/Remnant44)

Remnant44 commented on Ask HN: Why hasn't x86 caught up with Apple M series? · Posted by u/stephenheron

ozgrakkurt · 12 days ago

But is the uOp trace cache free? It surely doesn’t magically decode and put stuff in there without cost

Remnant44 · 12 days ago

For sure. For what it's worth, though, I have run across several references to ARM designs also implementing uop caches as a power optimization versus just running the decoders, so I'm inclined to say that whatever its cost, it pays for itself. I am not a chip designer though!

Remnant44 commented on Ask HN: Why hasn't x86 caught up with Apple M series? · Posted by u/stephenheron

Tuna-Fish · 12 days ago

> and the latency is half that of going out to a DRAM slot.

No, it's not. DRAM latency on Apple Silicon is significantly higher than on the desktop, mainly because they use LPDDR which has higher latencies.

Remnant44 · 12 days ago

I was going to mention this as well.

Source: chipsandcheese.com memory latency graphs

Remnant44 commented on Ask HN: Why hasn't x86 caught up with Apple M series? · Posted by u/stephenheron

trashface · 13 days ago

I may be out of date or wrong, but I recall that when the M1 came out there were some claims that x86 could never catch up, because there is an instruction decoding bottleneck (instructions are all variable size), which the M1 does not have, or can do in parallel. Because of that bottleneck x86 needs to use other tricks to get speed and those run hot.

Remnant44 · 13 days ago

ARM instructions are fixed size, while x86 are variable. This makes a wide decoder fairly trivial for ARM, while it is complex and difficult for x86.
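
To make the decoding point concrete, here is a toy sketch. It does not use real ARM or x86 encodings: the 4-byte width, the one-byte length prefix, and both function names are made up for illustration. With a fixed width, every instruction start is known up front, so each decoder slot can work independently. With variable widths, the start of each instruction depends on the lengths of all the ones before it.

```python
FIXED_WIDTH = 4  # bytes per instruction in the toy fixed-width ISA (assumption)


def fixed_width_starts(code: bytes) -> list[int]:
    """Start offsets are pure arithmetic; no instruction bytes are inspected."""
    return list(range(0, len(code), FIXED_WIDTH))


def variable_width_starts(code: bytes) -> list[int]:
    """Toy variable-length ISA (hypothetical encoding): the first byte of
    each instruction holds that instruction's total length in bytes."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]
        if length == 0:
            raise ValueError("invalid zero-length instruction")
        # The next start is only known after reading this instruction.
        offset += length
    return starts
```

The loop in `variable_width_starts` is a serial dependency chain, which is the part that makes a wide variable-length decoder harder to build than a wide fixed-length one.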

However, this doesn't really hold up as the cause of the difference. The Zen 4/5 chips, for example, source the vast majority of their instructions from their uOp cache, where the instructions have already been decoded. This also saves power: even on ARM, decoders take power.

People have been trying to figure out the "secret sauce" ever since the M chips were introduced. In my opinion, it's a combination of:

1) The Apple engineers did a superb job creating a well-balanced architecture

2) Being close to the memory subsystem, with lots of bandwidth and buffers deep enough to actually use it, is great. For example, my old M2 Pro MacBook has more than twice the memory bandwidth of the current best desktop CPU, the Zen 5 9950X. That's absurd, but here we are...

3) AMD and Intel bias heavily toward the costly side of the watts vs performance curve. Even the compact Zen cores are optimized more for area than for wattage. I'm curious what a true low-power Zen core (akin to the Apple E cores) would do.
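
For a rough sense of the bandwidth gap in point 2, here is a back-of-the-envelope calculation. The bus widths and transfer rates are assumptions taken from commonly quoted specs (256-bit LPDDR5-6400 for the M2 Pro, dual-channel 128-bit DDR5-5600 for the 9950X), not figures from this thread. They give theoretical peaks, not measured numbers, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000  # MB/s -> GB/s


# Assumed configurations (see the note above); not measured values.
m2_pro = peak_bandwidth_gb_s(256, 6400)       # 204.8 GB/s
zen5_desktop = peak_bandwidth_gb_s(128, 5600)  # 89.6 GB/s
ratio = m2_pro / zen5_desktop                  # a bit under 2.3x
```

Under those assumptions the ratio comes out a little above 2x, which lines up with the "more than twice" claim. Faster desktop memory kits would narrow the gap.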

Remnant44 commented on GitHub Copilot Coding Agent github.blog/changelog/202... · Posted by u/net01

ukuina · 4 months ago

Likely a Five Worlds scenario.

https://www.joelonsoftware.com/2002/05/06/five-worlds/

Remnant44 · 4 months ago

Man, I miss Joel's blog. So much developer wisdom that is still relevant even if aged now.

Remnant44 commented on Link Time Optimizations: New Way to Do Compiler Optimizations johnnysswlab.com/link-tim... · Posted by u/signa11

Remnant44 · 4 months ago

Link time optimization is definitely not new, but it is incredibly powerful. I have personally had situations where being unable to inline functions from a static library without LTO cut performance in half.

It's easy to dismiss a basic article like this, but it covers a discovery that every junior engineer will make at some point, and it's useful to talk about those too!

Remnant44 commented on xAI's Grok 3 comes to Microsoft Azure techcrunch.com/2025/05/19... · Posted by u/mfiguiere

Analemma_ · 4 months ago

Just speaking for myself here, but my most natural-sounding conversations with people don't involve them launching into rants about white genocide in Africa regardless of conversation context, but maybe I'm setting my bar too high.

Remnant44 · 4 months ago

Just like talking to Grandpa!

Remnant44 commented on Implicit UVs: Real-time semi-global parameterization of implicit surfaces [pdf] baptiste-genest.github.io... · Posted by u/ibobev

Remnant44 · 4 months ago

Great timing on this paper. I actually just started tackling a problem that is essentially exactly what is under discussion here (creating a coherent UV set for implicit geometry), so I'm very much looking forward to reading it in depth.

At first glance, it seems to strike a good balance between concept and implementation follow-through, something that is notoriously not always there in CG papers :) And it's also refreshing to read something that is not neuro-AI-generation of this or that for a change!

Remnant44 commented on 21 GB/s CSV Parsing Using SIMD on AMD 9950X nietras.com/2025/05/09/se... · Posted by u/zigzag312

g-mork · 4 months ago

It also appears to be reporting whole-CPU vs. single thread, 1.3 GB/sec is not impressive for single thread perf

Remnant44 · 4 months ago

I mean... A single 9950x core is going to struggle to do more than 16 GB/second of direct mem copy bandwidth. So being within an order of magnitude of that seems reasonable

Remnant44 commented on U.S. Economy Contracts at 0.3% Rate in First Quarter wsj.com/economy/us-gdp-q1... · Posted by u/bko

myflash13 · 4 months ago

Risk. It's called "doing business". Anyone who has built anything great took risks because they had a vision and believed in it. They weren't "rent seeking" and looking for guaranteed returns.

Remnant44 · 4 months ago

You're missing half the equation.

It's Risk vs Reward. Anyone who built something great took huge risks, for a huge return.

There is no return here. At best, you've sunk a ton of capital into a low-profit business that is propped up only by government subsidy.

Remnant44 commented on A cheat sheet for why using ChatGPT is not bad for the environment simonwillison.net/2025/Ap... · Posted by u/edward

spcebar · 4 months ago

How does people using it offset the amount of energy used to train it? If I use three hundred pounds of flour learning to make pizza, the subsequent three hundred pounds of flour I use making delicious pizzas doesn't make the first 300 go away. Am I misunderstanding the numbers?

Remnant44 · 4 months ago

It doesn't make it go away. Using your analogy: if you used 300 lb to learn and then only made 10 lb of pizza after that, it would be a pretty poor use of resources.

If you instead went on to make millions of pizzas for people using 30,000 lb of flour, the 300 lb you used to learn looks like a pretty reasonable investment.
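
The same comparison as plain arithmetic, using only the flour figures from the analogy above. The function name is made up for illustration, and this says nothing about actual training or inference energy numbers.

```python
def learning_share(learning_lb: float, production_lb: float) -> float:
    """Fraction of all flour used that went to learning rather than to pizzas."""
    return learning_lb / (learning_lb + production_lb)


hobbyist = learning_share(300, 10)      # about 0.97: nearly all flour went to learning
pizzeria = learning_share(300, 30_000)  # about 0.01: learning is roughly 1% of the total
```

The one-time cost never disappears, but its share of the total shrinks as the amount of use grows.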