The second one is your problem. Haswell is over a decade old now, and almost nobody owns a CPU that old. -O3 makes a lot of architecture-dependent decisions, and tying yourself to an antique architecture gives you very bad results.
https://godbolt.org/z/oof145zjb
So no, Haswell is not the problem. LLVM just doesn't know about the dependency thing.
Still, it's all LLVM, so perhaps unsafe Rust for Fil-space could be a thing, and a useful one for catching what would otherwise be UB (Fil-C defines everything, so there is no UB, but I'm assuming you want to eventually run the code outside of Fil-space).
Now I actually wonder if Fil-C has an escape hatch somewhere for syscalls it does not understand, etc. Well, it doesn't do inline assembly, so I shouldn't expect much... I wonder how far one would need to extend the asm clobber syntax for it to come remotely close to working.
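For reference, here is roughly what the existing clobber syntax can express today on x86-64 Linux (a plain GCC/Clang sketch, nothing Fil-C specific):

    // Minimal raw write(2) via inline asm on x86-64 Linux (GCC/Clang syntax).
    // The clobber list is as precise as the syntax gets: rcx/r11 are trashed
    // by the syscall instruction itself, and "memory" is a blunt "anything
    // may have changed" marker.
    #include <cstddef>

    long raw_write(int fd, const void *buf, std::size_t len) {
        long ret;
        __asm__ volatile("syscall"
                         : "=a"(ret)                            /* rax: return value   */
                         : "a"(1), "D"(fd), "S"(buf), "d"(len)  /* nr=1, rdi, rsi, rdx */
                         : "rcx", "r11", "memory");             /* clobbered by syscall */
        return ret;
    }

Note that even this maximally explicit form only says "some memory may have changed"; there's no way to express which objects the kernel actually touched, which I'd guess is what Fil-C would really need to know.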
https://medicalxpress.com/news/2025-06-acetaminophen-discove...
Perhaps it's the first approved by the FDA, I don't know. In many countries, metamizole is the first-line drug for postoperative pain.
(It should be noted that metamizole may very rarely cause agranulocytosis. It is suspected that the risk varies depending on the genetic makeup of the population, which would explain why it is banned in some countries but available OTC in others.)
Tangential: China technically banned metamizole due to the agranulocytosis scare, but somehow small clinics always have fresh stocks of the stuff. And their stocks don't look like my metamizole for horses! It's pressed with the usual magnesium stearate instead of whatever rock-hard binder they use for animal drugs in China.
There was a time when BLAST-ing a DNA or protein sequence you had was like doing a Google search on it: it simply told you where the sequence might come from. This is especially useful when your research is about figuring out what that specific sequence is doing. It won't give you the answer immediately (otherwise, why bother doing the research at all), but it certainly gives context: sequence similarity often hints at similar or related functions.
As an analogy: imagine StackOverflow suddenly going down, and you don't know *if* it's ever coming back up.
Even non-cryptographic functions can benefit from the same kinds of threat modeling and careful attention to selection criteria that are common in crypto. The risk of skipping that is ending up with something as catastrophically flawed as boost's hash_combine [0] (since fixed in [1]).
[0] https://www.boost.org/doc/libs/1_70_0/doc/html/hash/referenc...
[1] https://github.com/boostorg/container_hash/commit/40ec854466...
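To make the flaw concrete, here's a small sketch of my own (not from the linked pages) using the pre-fix mixing step documented at [0], plus a quick avalanche check: flipping one input bit should flip about half of the 64 output bits in a well-mixed hash, but here it flips only a couple on average, which is roughly the kind of structural weakness the rewrite in [1] targets.

    // The old boost::hash_combine mixing step, as documented at [0].
    #include <bitset>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    static std::size_t combine(std::size_t seed, std::size_t h) {
        seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }

    int main() {
        // Avalanche check: flip bit 0 of the second value and count how many
        // output bits change. A well-mixed 64-bit hash would average ~32.
        double flipped = 0;
        for (std::size_t v = 0; v < 1000; ++v) {
            std::size_t h1 = combine(combine(0, 42), v);
            std::size_t h2 = combine(combine(0, 42), v ^ 1);
            flipped += std::bitset<64>(h1 ^ h2).count();
        }
        std::printf("avg bits flipped: %.2f (want ~32)\n", flipped / 1000);
    }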
The idea of bitwise reproducibility for floating-point computations is completely laughable in any part of the DL landscape. Meanwhile, in just about every other area that uses FP computation, it's been the de facto standard for decades.
From NVIDIA not guaranteeing bitwise reproducibility even on the same GPU: https://docs.nvidia.com/deeplearning/cudnn/backend/v9.17.0/d...
To frameworks somehow being even worse, where the best you can do is rank the frameworks by how bad they are (with TensorFlow far down at the bottom and JAX currently at the top) and use the least bad one.
This is a huge issue for anyone serious about developing novel models, and I see no one talking about it, let alone trying to solve it.
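The root cause is mundane: floating-point addition is not associative, so any kernel that lets the reduction order vary (atomics, different work splits, different SM counts) can legitimately return different bits on every run. A tiny illustration:

    // Floating-point addition is not associative, so summation order changes
    // the result; parallel reductions that don't fix their order inherit this.
    #include <cstdio>

    int main() {
        float a = 1e8f, b = -1e8f, c = 1.0f;
        std::printf("(a + b) + c = %g\n", (a + b) + c);  // 1: cancellation happens first
        std::printf("a + (b + c) = %g\n", a + (b + c));  // 0: c is absorbed into b
    }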
Not that strongly for the more parallel things; it's quite similar to the situation with atomics in cuDNN. cuBLAS, for example, has a similar issue with multi-stream handling, though that one can be overcome with proper workspace allocation: https://docs.nvidia.com/cuda/cublas/index.html?highlight=Rep....
Still better than cuDNN, where some operations just don't have a reproducible version. The other fields are at least trying; DL doesn't seem to be.
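For reference, a sketch of the workspace approach as I read the cuBLAS reproducibility notes (error handling omitted, and the workspace size is an arbitrary assumption on my part): give each stream its own handle and a dedicated workspace via cublasSetWorkspace, so the library doesn't fall back to shared internal scratch space.

    // Hedged sketch: one handle plus one user-provided workspace per stream.
    // Error handling omitted; the 32 MiB size is an arbitrary assumption.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    cublasHandle_t make_reproducible_handle(cudaStream_t stream) {
        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSetStream(handle, stream);
        void *ws = nullptr;
        const size_t ws_bytes = 32u << 20;  // fixed scratch size across runs
        cudaMalloc(&ws, ws_bytes);
        cublasSetWorkspace(handle, ws, ws_bytes);  // keeps cuBLAS off its shared pool
        return handle;
    }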
On that note, Intel added reproducible BLAS to oneMKL on CPU and GPU last year: https://www.intel.com/content/www/us/en/developer/archive/tr...
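I haven't used the newer GPU path from that article, so this sketch only shows the long-standing CPU-side CNR (Conditional Numerical Reproducibility) interface, which may differ from the oneMKL GPU one:

    // Classic MKL CNR setup on CPU. MKL_CBWR_COMPATIBLE pins the code path
    // to a fixed ISA so results don't change across machines; strict
    // run-to-run reproducibility still assumes a fixed thread count.
    #include <mkl.h>

    int main() {
        if (mkl_cbwr_set(MKL_CBWR_COMPATIBLE) != MKL_CBWR_SUCCESS)
            return 1;
        // ... BLAS calls from here on use the pinned, reproducible path ...
        return 0;
    }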