Readit News
dosshell commented on Zig breaking change – Initial Writergate   github.com/ziglang/zig/pu... · Posted by u/Retro_Dev
audunw · 2 months ago
This is a standard library change, not a syntax change

I think the main big thing that’s left for 1.0 is resurrecting async/await, and that’s a huge thing, because arguably very few languages, if any, have gotten it truly right.

As the PR description mentions: “This is part of a series of changes leading up to "I/O as an Interface" and Async/Await Resurrection.”

So this work is partially related to getting async/await right. And getting IO right is a very important part of that.

I think it’s a good idea for Zig to try to avoid a Python 3 situation after they reach 1.0. The project seems fairly focused to me, but they’re trying to solve some difficult problems. And they spend more time working on the compiler and compiler infrastructure than other languages do, which is also good. Working on their own backend is actually critical for the language itself, because part of what’s holding Zig back from doing async right is limitations and flaws in LLVM.

dosshell · 2 months ago
>> because part of what’s holding Zig back from doing async right is limitations and flaws in LLVM

This was interesting! Do you have a link or something where I can read more about it?

dosshell commented on Nobel Prize in Physics awarded to John Hopfield and Geoffrey Hinton [pdf]   nobelprize.org/uploads/20... · Posted by u/drpossum
kelahcim · a year ago
Kahneman was awarded the Nobel prize in economic sciences even though his work was, in fact, all about psychology.
dosshell · a year ago
Note that there is no Nobel Prize in economic sciences.

There is only a similarly named prize in memory of Alfred Nobel, which somehow is allowed to be part of the Nobel Prize celebration.

I guess my opinion is in the minority, but I don't like that another prize hijacks the Nobel Prize.

dosshell commented on The Performance Impact of C++'s `final` Keyword   16bpp.net/blog/post/the-p... · Posted by u/hasheddan
dosshell · a year ago
I agree with you. Thinking more about it, it should take the same time. I remember learning this in ~2016, and I did a performance test on Skylake that confirmed it (Windows, VS2015). I think I remember that I only tested with addsd/addss. Definitely not x87. But as always, if the result cannot be reproduced... I stand corrected until then.
dosshell · a year ago
I tried to reproduce it on Ivy Bridge (Windows, VS2012) and failed (mulss and mulsd) [0]. Single and double precision take the same time. I also found a behavior where the first batch of iterations takes more time regardless of precision. It is possible that this is what tricked me last time.

[0] https://gist.github.com/dosshell/495680f0f768ae84a106eb054f2...

Sorry for the confusion and spreading false information.
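
For reference, a minimal sketch of this kind of timing loop (illustrative only, not the exact code from the gist; the iteration count is made up):

    #include <chrono>
    #include <cstdio>

    // Time a dependent chain of scalar multiplies; on x86-64 this compiles
    // to mulss for float and mulsd for double.
    template <typename T>
    double time_muls(long iters) {
        volatile T seed = static_cast<T>(1.0000001); // volatile blocks constant folding
        T x = seed;
        auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < iters; ++i)
            x *= seed;
        auto t1 = std::chrono::steady_clock::now();
        volatile T sink = x; // keep the loop from being optimized away
        (void)sink;
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        const long iters = 100000000;
        time_muls<double>(iters); // warm-up: the first batch of iterations runs slower
        std::printf("float : %.3f s\n", time_muls<float>(iters));
        std::printf("double: %.3f s\n", time_muls<double>(iters));
    }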

dosshell commented on The Performance Impact of C++'s `final` Keyword   16bpp.net/blog/post/the-p... · Posted by u/hasheddan
jcranmer · a year ago
Um... no. This is 100% completely and totally wrong.

x86-64 requires the hardware to support SSE2, which has native single-precision and double-precision floating-point instructions (e.g., scalar multiply is MULSS and MULSD, respectively). Both the single-precision and the double-precision instructions take the same time, except for DIVSS/DIVSD, where the 32-bit float version is slightly faster (about 2 cycles lower latency, and a reciprocal throughput of 3 versus 5 per Agner's tables).

You might be thinking of the x87 floating-point unit, where all arithmetic is done internally using 80-bit floating-point types. But all x86 chips in roughly the last 20 years have had SSE units, which are faster anyway. Even in the days when x87 was the main floating-point unit, it wasn't any slower, since all floating-point operations took the same time independent of format. It might be slower if you insisted that compiled code strictly follow IEEE 754 rules, but the solution everybody adopted was to not do that, which is why things like Java's strictfp and C's FLT_EVAL_METHOD were born. Even in that case, however, 32-bit floats would likely be faster than 64-bit, for the simple fact that 32-bit floats can safely be emulated in 80-bit without fear of double rounding, but 64-bit floats cannot.
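
The codegen is easy to verify; the assembly noted in the comments below is the typical gcc/clang -O2 output for x86-64:

    // With gcc or clang at -O2 on x86-64, each of these compiles to a
    // single scalar SSE2 instruction (shown in the trailing comments):
    float  mulf(float a, float b)   { return a * b; }  // mulss xmm0, xmm1
    double muld(double a, double b) { return a * b; }  // mulsd xmm0, xmm1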

dosshell · a year ago
I agree with you. Thinking more about it, it should take the same time. I remember learning this in ~2016, and I did a performance test on Skylake that confirmed it (Windows, VS2015). I think I remember that I only tested with addsd/addss. Definitely not x87. But as always, if the result cannot be reproduced... I stand corrected until then.
dosshell commented on The Performance Impact of C++'s `final` Keyword   16bpp.net/blog/post/the-p... · Posted by u/hasheddan
sgerenser · a year ago
I think this is only true if using x87 floating point, which anything computationally intensive generally avoids these days in favor of SSE/AVX floats. In the latter case, for a given vector width, the CPU can process twice as many 32-bit floats as 64-bit floats per clock cycle.
dosshell · a year ago
Yes, as I wrote, it is only true for one float value.

SIMD/MIMD will benefit from working on smaller widths. This is not only because they do more work per clock, but because memory is slow. Super slow compared to the CPU. Optimization is a lot about avoiding cache misses.

(But remember that a cache line is 64 bytes, so reading a single value smaller than that takes the same time. So in theory it does not matter when comparing one f32 against one f64.)
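
A minimal illustration of the width argument, using AVX intrinsics (compile with -mavx; the function names are made up):

    #include <immintrin.h>

    // One 256-bit AVX multiply processes 8 floats but only 4 doubles, and a
    // 64-byte cache line holds 16 floats but only 8 doubles.
    __m256  mul_f32x8(__m256 a, __m256 b)   { return _mm256_mul_ps(a, b); }
    __m256d mul_f64x4(__m256d a, __m256d b) { return _mm256_mul_pd(a, b); }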

dosshell commented on The Performance Impact of C++'s `final` Keyword   16bpp.net/blog/post/the-p... · Posted by u/hasheddan
tombert · a year ago
I don't do much C++, but I have definitely found that engineers will just assert that something is "faster" without any evidence to back that up.

Quick example: I got in an argument with someone a few years ago who claimed that in C# a `switch` was better than an `if (x == 1) ... else if (x == 2) ...` because switch was "faster", and he rejected my PR. I mentioned that that doesn't appear to be true; we went back and forth until I did a compile-then-decompile of a minimal test with equality-based ifs and showed that the compiler actually converts equality-based ifs to `switch` behind the scenes. The guy accepted my PR after that.
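
A C++ analogue of that experiment (hypothetical code, not the original C# test) is easy to check in a compiler explorer:

    // Compiled with gcc or clang at -O2, these two typically produce the
    // same machine code, so neither is inherently "faster".
    int with_ifs(int x) {
        if (x == 1) return 10;
        else if (x == 2) return 20;
        else if (x == 3) return 30;
        return 0;
    }

    int with_switch(int x) {
        switch (x) {
            case 1: return 10;
            case 2: return 20;
            case 3: return 30;
            default: return 0;
        }
    }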

But there's tons of this stuff like this in CS, and I kind of blame professors for a lot of it [1]. A large part of becoming a decent engineer [2] for me was learning to stop trusting what professors taught me in college. Most of what they said was fine, but you can't assume that; what they tell you could be out of date, or simply never correct to begin with, and as far as I can tell you have to always test these things.

It doesn't help that a lot of these "it's faster" arguments are often reductive, because the thing is only faster in extremely minimal tests. Sometimes a microbenchmark will show that something is faster, and there's value in that, but that code can also be a small percentage of the total program; compilers are obscenely good at optimizing nowadays, it can be difficult to determine when something will be optimized, and your assertion that something is "faster" might not actually be true in a non-trivial program.

This is why I don't really like doing any kind of major optimizations before the program actually works. I try to keep the program in a reasonable Big-O and I try to minimize network calls because of latency, but I don't bother with any kind of micro-optimizations in the first draft. I don't mess with bitwise tricks, I don't concern myself with which version of a particular data structure is a millisecond faster, I don't focus too much on whether I can get away with a smaller sized float, etc. Once I know that the program is correct, I benchmark to see if any kind of micro-optimizations will actually matter, and often they really don't.

[1] That includes me up to about a year ago.

[2] At least I like to pretend I am.

dosshell · a year ago
> I can get away with a smaller sized float

When talking about not assuming optimizations...

32-bit float is slower than 64-bit float on reasonably modern x86-64.

The reason is that 32-bit float is emulated using 64-bit.

Of course, if you have several floats, you need to optimize for the cache.

dosshell commented on Hidden dependencies in Linux binaries   thelittleengineerthatcoul... · Posted by u/thunderbong
qwertox · a year ago
If static and dynamic libraries use the same interface, shouldn't they be detectable in both cases? Or is it removed at compile time?
dosshell · a year ago
First, IANACC (I'm not a compiler programmer), but this is my understanding:

What do you mean by interface?

A dynamic library is handled very differently from a static one. It is loaded into the process's virtual memory address space, and the loader keeps a tree of the loaded libraries there. (I would guess this program walks that tree, but there may be better ways, which I do not know of, that this program uses.)

In the GNU/Linux world, a static library is more or less a collection of object files. The linker, to the best of my knowledge, will not treat the content of static libraries differently from your own code. LTO can take place. In the final ELF, the static library will be indistinguishable from your own code.

My experience with the symbol table in ELF files is limited, and I do not know if it could help unwrap static library dependencies. (A debug symbol table would of course help.)
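
As a sketch of the dynamic side, here is a minimal glibc-specific program that lists the shared objects mapped into the current process (only my guess at how such a tool could work):

    #include <link.h>
    #include <cstdio>

    // Ask the glibc dynamic loader for every shared object mapped into this
    // process. Statically linked code never shows up here: it was merged
    // into the executable at link time.
    static int print_so(struct dl_phdr_info* info, size_t, void*) {
        std::printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "(main executable)");
        return 0; // zero means: continue iterating
    }

    int main() {
        dl_iterate_phdr(print_so, nullptr);
    }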

dosshell commented on Hidden dependencies in Linux binaries   thelittleengineerthatcoul... · Posted by u/thunderbong
quotemstr · a year ago
We're in this situation because we're using a model of dynamic linking that's decades out of date. Why aren't we using process-isolated sandboxed components talking over io_uring-based low-latency IPC to express most software dependencies? The vast majority of these dependencies absolutely do not need to be co-located with their users.

Consider liblzma: would liblzma-as-a-service really be that bad, especially if the service client and service could share memory pages for zero-copy data transfer, just as we already do for, e.g. video decode?

Or consider React Native: RN works by having an application thread send a GUI scene to a renderer thread, which then adjusts a native widget tree to match what the GUI thread wants. Why do these threads have to be in the same process? You're doing a thread switch anyway to jump from the GUI thread to the renderer thread: is switching address spaces at the same time going to kill you? Especially if the two threads live on different cores and nothing has to "switch"?

Both dynamic linking and static linking should be rare in modern software ecosystems. We need to instead reinvigorate the idea of agent-based component systems with strongly isolated components.
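
A minimal sketch of the page-sharing half of that idea, using plain POSIX shared memory (the region name is made up, and the signalling side, e.g. io_uring, is omitted):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>

    int main() {
        // Create a named region that a separate service process could map too;
        // data written here crosses the process boundary without a copy.
        const char* name = "/demo_zero_copy"; // illustrative name
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        if (fd < 0) return 1;
        if (ftruncate(fd, 4096) != 0) return 1;
        void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;
        std::strcpy(static_cast<char*>(p), "payload visible to the service");
        // A real system would now signal the service (e.g., over a pipe or io_uring).
        munmap(p, 4096);
        close(fd);
        shm_unlink(name);
    }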

dosshell · a year ago
This is very interesting! Are there any movements towards this?

Wouldn't it open up a new attack vector where processes could read each other's data?

dosshell commented on Hidden dependencies in Linux binaries   thelittleengineerthatcoul... · Posted by u/thunderbong
ris · a year ago
> Meanwhile, when I use CUDA instead of Vulkan, I get serenity back. CUDA FTW!

Just because the complexity is hidden from you doesn't mean it's not there. You have no idea what is statically bundled into the CUDA libs.

dosshell · a year ago
I agree with you; hidden is worse.

But we do know what it cannot statically link to: any GPL library, which many indirect dependencies are.

dosshell commented on Ask HN: Most efficient way to fine-tune an LLM in 2024?    · Posted by u/holomorphiclabs
dosshell · a year ago
I know this is maybe not the answer you want, but if you are just interested in getting the job done, there are companies that are experts at this, for example:

https://fortune.com/2024/03/11/adaptive-startup-funding-falc...

u/dosshell

Karma: 1285 · Cake day: June 27, 2014
About
Sweden · Eye tracking · C++ lover