Readit News
BeeOnRope commented on Cloud VM benchmarks 2026   devblog.ecuadors.net/clou... · Posted by u/dkechag
zackify · 6 days ago
I just ran some massive tests on our own CI. I use AMD Turin for this on GCP, which was noted as one of the fastest in the article.

The most insane part here is that the AMD EPYC 4565P can beat the Turins used by the cloud providers by as much as 2x in single-core performance.

Our tests took 2 minutes on GCP and 1 minute flat on the 4565P, with its boost to 5.1 GHz holding steady vs only 4.1 GHz on the GCP machines.

GCP charges $130 a month for 8 vCPUs, and that's for spot instances that can be killed at any moment.

My 4565P is a $500 CPU... 32 vCPUs... racked in a datacenter. The whole machine cost under $2k.

I am trying hard to convince more people to rack their own hardware, especially for CI runners. With the cloud provider charging $130/mo for 3x fewer vCPUs, you break even in a couple of months, so it doesn't matter if the machine dies a few months later. On top of that you're getting fully dedicated hardware and 2x the performance. Anyways... glad to see I chose the right CPU type for gcloud, even though nothing comes close to the cost/perf of self-racking.

BeeOnRope · 5 days ago
> The most insane part here is that the AMD EPYC 4565P can beat the Turins used by the cloud providers by as much as 2x in single-core performance.

That is ... hard to believe for a CPU-bound task. Do you have any open benchmark which can reproduce that?

BeeOnRope commented on What every compiler writer should know about programmers (2015) [pdf]   complang.tuwien.ac.at/kps... · Posted by u/tosh
somat · 25 days ago
Apologies for the flippant one-liner. You made a good point and deserve more than that.

On the one hand, having the optimizer save you from your own bad code is a huge draw. This is my desperate hope with SQL: I can write garbage queries and the optimizer will save me from myself.

But... someone put that code there, spent time and effort to get that machinery into place with the expectation that it would do something, and then the optimizer takes it away with no hint. That does not feel right either, especially when the program now behaves differently when "optimized" vs unoptimized.

BeeOnRope · 25 days ago
What I mean is that we look at a function in isolation and see that it doesn't have any "dead code", e.g.:

  int factorial(int x) {
    if (x < 0) throw invalid_input();
    int r = 1;
    for (int i = 2; i <= x; ++i) r *= i;
    return r;
  }
This doesn't have any dead code on static examination. At compile time, however, this function may be compiled multiple times, e.g. as factorial(5), or as factorial(x) where x is known to be non-negative by range analysis. In those cases, the `if (x < 0)` check is simply pruned away as "dead code", and you definitely want this! It's not a minor thing; it's a core component of an optimizing compiler.

This same pruning is also responsible for the objectionable removal of code in the examples of compilers working at cross-purposes to programmers, but it's not easy to have the former behavior without the latter. That's also why something like -Wdead-code is hard to implement in a way that wouldn't produce constant false positives.

BeeOnRope commented on What every compiler writer should know about programmers (2015) [pdf]   complang.tuwien.ac.at/kps... · Posted by u/tosh
somat · 25 days ago
That's the problem
BeeOnRope · 25 days ago
Why is that a problem? Inlining and optimization aren't minor aspects of compiling to native code; they are responsible for order-of-magnitude speedups.

My point is that it is easy to say "don't remove my code" while looking at a simple single-function example, but in actual compilation huge portions of a function are "dead" after inlining, constant propagation and other optimizations, and that's before even talking about C-specific UB or other shenanigans. You don't want to throw that machinery out.

BeeOnRope commented on What every compiler writer should know about programmers (2015) [pdf]   complang.tuwien.ac.at/kps... · Posted by u/tosh
rurban · 25 days ago
This was 2015, and we still have no -Wdeadcode, a warning about the removal of "dead code", i.e. what compilers think is dead code. If a programmer writes code, it is never dead. It is written. It had purpose. If the compiler thinks it is wrong, it needs to warn about it.

The only dead code is code generated by macros.

BeeOnRope · 25 days ago
Dead code is extremely common in C or C++ after inlining and other optimizations.
BeeOnRope commented on How many registers does an x86-64 CPU have? (2020)   blog.yossarian.net/2020/1... · Posted by u/tosh
noelwelsh · a month ago
I'm not sure we're talking about the same paper. Here's the one I'm referring to:

https://smlnj.org/compiler-notes/k32.ps

E.g. "Our strategy is to pre-allocate a small set of memory locations that will be treated as registers and managed by the register allocator."

There are more recent publications on "compiler controlled memory" that mostly seem to focus on GPUs and embedded devices.

BeeOnRope · 25 days ago
Relevant section:

> Compiler controlled memory: There is a mechanism in the processor where frequently accessed memory locations can be as fast as registers. In Figure 2, if the address of u is the same as x, then the last load μ-op is a nop. The internal value in register r25 is forwarded to register r28, by a process called write-buffer feedforwarding. That is to say, provided the store is pending or the value to be stored is in the write-buffer, then loading from a memory location is as fast as accessing external registers.

I think it oversells the benefit. Store forwarding is a thing, but it does not erase the cost of the load or store, certainly not on the last ~20 years of chips, and I don't think on the PII (the target of the paper) either.

The load and store still effectively occur in terms of port usage, so the usual throughput limits apply. There is a latency benefit of a few cycles. Perhaps the L1 cache access itself is also omitted, which could help with bank conflicts, though on later uarches there were few to none of those, so you're left with perhaps a small power benefit.

BeeOnRope commented on How many registers does an x86-64 CPU have? (2020)   blog.yossarian.net/2020/1... · Posted by u/tosh
throwaway17_17 · a month ago
Why does having more registers lead to spilling? I would assume, probably incorrectly, that more registers means less spill. Are you talking about calls inside other calls, which cause the outer scope's arguments to be preemptively spilled so the inner scope's data can be placed in registers?
BeeOnRope · a month ago
More registers leads to less spilling, not more, unless the compiler is making some really bad choices.

An easy way to see that is that the system with more registers can always use the same register allocation as the one with fewer, ignoring the extra registers, if that's profitable (i.e., it's not forced into using extra caller-saved registers if it doesn't want to).

BeeOnRope commented on Converting a $3.88 analog clock from Walmart into a ESP8266-based Wi-Fi clock   github.com/jim11662418/ES... · Posted by u/tokyobreakfast
stevenjgarner · a month ago
$3.88? Walmart.com uses dynamically variable pricing that includes geographic and per-user variance; my price is $5.92:

https://www.walmart.com/ip/Mainstays-Basic-Indoor-8-78-Black...

BeeOnRope · a month ago
User variance? Any evidence?
BeeOnRope commented on Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust   github.com/j178/prek... · Posted by u/fortuitous-frog
BeeOnRope · a month ago
How does prek handle pre-push hooks? I.e., how does it determine the list of modified files?

This is a long-standing sore point in pre-commit; see https://github.com/pre-commit/pre-commit/issues/860 and the linked duplicates (some of which are not duplicates).
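For context, a sketch of what such a hook has to compute: the files that differ between the remote tip and the local ref being pushed. Everything here is illustrative; a real pre-push hook receives `<local-ref> <local-sha> <remote-ref> <remote-sha>` lines on stdin rather than assuming `origin/main`:

```shell
#!/bin/sh
# Build a throwaway repo with a fake remote-tracking ref, then ask the
# question a pre-push hook must answer: which files would this push change?
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git -c user.email=e@x -c user.name=t commit -q --allow-empty -m base
git update-ref refs/remotes/origin/main HEAD  # pretend this is the remote tip
echo hello > newfile.txt
git add newfile.txt
git -c user.email=e@x -c user.name=t commit -q -m change
# Files the push would introduce on the remote:
git diff --name-only "origin/main...HEAD"
```

The hard part pre-commit struggles with is exactly the step faked here: knowing which remote ref to diff against for branches that don't exist on the remote yet.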

BeeOnRope commented on Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust   github.com/j178/prek... · Posted by u/fortuitous-frog
candiddevmike · a month ago
Why wouldn't I just call the same shell script in CI and locally though? What's the benefit here? All I'm seeing is circular logic.
BeeOnRope · a month ago
If you had a shell script hook, yes you would also run that in CI.

Are you asking what advantage pre-commit has over a shell script?

Mostly just functionality: running multiple hooks, running them in parallel, deciding which hooks to run based on the files in the commit, "decoding" the commit to a list of files, offering a bunch of canned hooks, and offering the ability to write and install non-shell hooks in a standard way.
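For example, a minimal `.pre-commit-config.yaml` pulling in a couple of the canned hooks (the `rev` shown is just a plausible pin; check the hooks repo for current tags):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0   # example pin; use a current tag
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```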

u/BeeOnRope

Karma: 2627 · Cake day: August 7, 2017