For example, if you're running experiments in one big batch overnight, making that faster doesn't seem very helpful. But with a big enough improvement, you can now run several batches of experiments during the day, which is much more productive.
But when I do design code, dataflow, and memory layout in a SIMD-friendly way, it's pretty much the same for x86_64 and ARM.
Then I can just use `a + b` and `f32x4` (or its C equivalent) instead of `_mm_add_ps` and `__m128` (x86_64) or `vaddq_f32` and `float32x4_t` (ARM).
Portable SIMD means I don't need to write this code twice and memorize arcane runes for basic arithmetic operations.
For more specialized stuff you have intrinsics.
However, we still end up writing different code for different target SoCs: the microarchitectures differ, and we want to maximize throughput and take advantage of any ISA support for dedicated instructions or types.
One big challenge is targeting in-order cores: the compiler often does a terrible job of register allocation (we need pretty much all the architectural registers to cover vector instruction latencies), so the model breaks down somewhat there and we have to drop to inline assembly.
Reasoning about performance can be hard anyway, though.
For the uninitiated, most high-performance CPUs of recent years:
- Are massively out-of-order: the core will run any operation whose inputs are all ready in the next available slot of the right type.
- Have multiple functional units. A recent Apple CPU can and will run 5+ integer ops, 3+ loads/stores, and 3+ floating-point ops per cycle if it can feed them all, and it may do register renames on the fly for free.
- Have pipelined functional units: you can feed one op into the front of a pipe each cycle, but the result may not be available for consumption until 3-20 cycles later (latency depends on the type of op and whether it can bypass into the next op).
- Speculate on branch outcomes, and when they get one wrong they must flush the pipeline and re-execute down the correct path.
- Are subject to assorted hazards that can add or subtract cycles relative to what the same code gets in a different situation.