This is perhaps overly generous to pure-human authorship. These days, when I write code I like to think I know what it does. I still wouldn't call most of it "crafted like poetry". When I was just learning though, I wrote plenty of code 100% without AI (in fairness, it didn't exist) that I had little understanding of, and it was only "deliberate" in that I deliberately cajoled it into passing the tests.
Or put differently: don't conflate human authorship with quality; people can write garbage without needing AI help.
actv = A[_:1] & B[_:1]
sign = A[_:0] ^ B[_:0]
dot = pop_count(actv & !sign) - pop_count(actv & sign)
It can probably be made more efficient by taking a column-first format.
Since we are in CPU land, we mostly deal with dot products that match the cache size. I don't assume we have a tiled matmul instruction, which would be unlikely to support this weird 1-bit format anyway.
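For anyone who wants to try the pseudocode above, here is a minimal C sketch. The layout is an assumption on my part: each vector is stored as two separate bit planes packed into 64-bit words, a "sign" plane (bit set means negative) and an "actv" plane (bit set means nonzero). The function name `ternary_dot` and its parameters are hypothetical, and `__builtin_popcountll` is the GCC/Clang builtin, so other compilers would need their own popcount.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two ternary vectors ({-1, 0, +1} per element).
 * Assumed layout: one bit per element, 64 elements per word.
 *   sign plane: bit set => element is negative
 *   actv plane: bit set => element is nonzero
 * n_words is the number of 64-bit words in each plane. */
int64_t ternary_dot(const uint64_t *a_sign, const uint64_t *a_actv,
                    const uint64_t *b_sign, const uint64_t *b_actv,
                    size_t n_words)
{
    int64_t dot = 0;
    for (size_t i = 0; i < n_words; i++) {
        /* A product is nonzero only where both inputs are nonzero. */
        uint64_t actv = a_actv[i] & b_actv[i];
        /* The product is negative where exactly one input is negative. */
        uint64_t sign = a_sign[i] ^ b_sign[i];
        dot += __builtin_popcountll(actv & ~sign); /* +1 contributions */
        dot -= __builtin_popcountll(actv & sign);  /* -1 contributions */
    }
    return dot;
}
```

With a = [+1, -1, 0] and b = [+1, +1, +1] packed into the low bits of a single word, this returns 0 (one +1 product and one -1 product).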
A * x * x * x * x * x * x + B * x * x * x * x + C * x * x + D
(10 muls, 3 muladds) instead of the faster
tmp = x * x;
((A * tmp + B) * tmp + C) * tmp + D
(1 mul, 3 muladds)
typedef double v2d __attribute__ ((vector_size (16)));
v2d packed = { x, x };
packed = fma(packed, As, Bs);
packed = fma(packed, Cs, Ds);
// ...
return x * packed[0] + packed[1];
Something like that.
Actually, one project I was thinking of doing was creating SLP-vectorized versions of libm functions, since plenty of programs spend a lot of time in libm calls on single inputs, but the implementation is usually a bunch of scalar instructions.
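To make the polynomial snippets above concrete, here is a rough, compilable C sketch. It shows the naive and Horner forms of the x^2 polynomial, plus one possible reading of the two-lane idea: split a degree-7 polynomial into odd and even halves, run both Horner recurrences in the two lanes of a vector, and recombine as x * odd + even. The function names and the `c[8]` coefficient array are hypothetical, and the vector type relies on the GCC/Clang `vector_size` extension. Whether the multiply-adds actually become FMA instructions depends on the target and compiler flags, so treat the operation counts as best case.

```c
/* P(x) = A*x^6 + B*x^4 + C*x^2 + D, written out term by term. */
double poly_naive(double x, double A, double B, double C, double D)
{
    return A * x * x * x * x * x * x + B * x * x * x * x + C * x * x + D;
}

/* Same polynomial in Horner form over tmp = x^2. */
double poly_horner(double x, double A, double B, double C, double D)
{
    double tmp = x * x;
    return ((A * tmp + B) * tmp + C) * tmp + D;
}

/* Two doubles in one 16-byte vector (GCC/Clang extension). */
typedef double v2d __attribute__((vector_size(16)));

/* Evaluates c[0] + c[1]*x + ... + c[7]*x^7 using an even/odd split:
 *   P(x) = x * Podd(x^2) + Peven(x^2)
 * Lane 0 carries the odd coefficients, lane 1 the even ones. */
double poly7_two_lane(double x, const double c[8])
{
    v2d x2 = { x * x, x * x };
    v2d acc = { c[7], c[6] };
    acc = acc * x2 + (v2d){ c[5], c[4] };
    acc = acc * x2 + (v2d){ c[3], c[2] };
    acc = acc * x2 + (v2d){ c[1], c[0] };
    return x * acc[0] + acc[1];
}
```

The two-lane version does the same number of vector operations as a scalar Horner loop over half the coefficients, which is the appeal for single-input libm-style calls. I have not benchmarked it.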
And on the platforms that have an NPU with a usable programming model and good vendor support, the NPU absolutely does get used for those tasks. More fragmented platforms like Windows PCs are least likely to make good use of their NPUs, but it's still common to see laptop OEMs shipping the right software components to get some of those tasks running on the NPU. (And Microsoft does still seem to want to promote that; their AI PC branding efforts aren't pure marketing BS.)
Downside: harder to think about it all.
Upside: a rocket may hit the datacenter.
From what I remember about Figma, its sync approach can be called a CRDT. Google Docs got its sync algorithm before CRDTs were even known (yep, I remember those days!).
Edit: I had an excerpt here which I completely misread. Sorry.