I haven't tested Zen 4 or 5, but I haven't heard anything that indicates they should be a lot better.
This is a significant problem on AMD; Intel and Apple seem to be better.
When did this change? In my testing years ago (while I was writing Rosetta 2, so Ice Lake-era Intel), Intel only allowed a load to forward from a single store, and partial forwarding (i.e. mixed cache/register) carried a huge penalty, whereas AMD at least allowed partial forwarding (or had a considerably lower penalty than Intel).
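For concreteness, here is a minimal C sketch (the function and names are mine, not from the thread) of the access pattern in question: two narrow stores followed immediately by a wide load that spans both. Whether that load is serviced out of the store buffer (forwarding from more than one in-flight store) or has to wait for the stores to reach the cache is exactly the microarchitectural difference described above. In a real benchmark you would also have to keep the compiler from optimizing the copies away.

#include <stdint.h>
#include <string.h>

// Two adjacent 4-byte stores, then an 8-byte load that overlaps both.
// On cores that can only forward from a single in-flight store, the
// wide load may stall; on cores with partial forwarding it may not.
uint64_t combine_halves(uint32_t lo, uint32_t hi) {
    uint8_t buf[8];
    memcpy(buf, &lo, sizeof lo);      // store #1: lower 4 bytes
    memcpy(buf + 4, &hi, sizeof hi);  // store #2: upper 4 bytes
    uint64_t out;
    memcpy(&out, buf, sizeof out);    // wide load spanning both pending stores
    return out;
}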
#include <immintrin.h> // provides _MM_SET_FLUSH_ZERO_MODE / _MM_SET_DENORMALS_ZERO_MODE

#if defined(__AVX512F__) || defined(__AVX2__)
void configure_x86_denormals(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         // Flush results to zero
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); // Treat denormal inputs as zero
}
#endif
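Worth noting: these intrinsics set bits in MXCSR, which is per-thread state, so in a multi-threaded program each worker thread needs to call configure_x86_denormals() before doing its floating-point work.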
It had a 50x performance impact on Intel. As benchmarked on `c7i.metal-48xl` instances:

- `f64` throughput grew from 0.2 to 8.2 TFLOPS.
- `f32` throughput grew from 0.6 to 15.1 TFLOPS.
Here is that section in the repo with more notes on AVX-512, AMX, and other instructions: <https://github.com/ashvardanian/less_slow.cpp/blob/8f32d65cc...>.

Saturation breaks the successor relation S(x) != x. Sometimes you want that, but it's extremely situational, and you rarely want saturation precisely at the type max. Saturation is better served by functions in C.
Trapping is fine conceptually, but it means all your arithmetic operations can now error. That's a severe ergonomic issue, it isn't particularly well defined on many systems, and it introduces a bunch of thorny issues with optimizations. Again, better as functions in C.
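As a rough sketch of what such functions might look like in C (the helper names are illustrative, not an existing API), using the GCC/Clang overflow builtins:

#include <stdbool.h>
#include <stdint.h>

// Saturating add: clamp to INT32_MAX / INT32_MIN instead of wrapping.
static int32_t sat_add_i32(int32_t a, int32_t b) {
    int32_t sum;
    if (!__builtin_add_overflow(a, b, &sum))
        return sum;
    // Signed addition can only overflow when both operands share a sign.
    return (a > 0) ? INT32_MAX : INT32_MIN;
}

// Checked ("trapping-style") add: the caller decides what to do on overflow.
static bool checked_add_i32(int32_t a, int32_t b, int32_t *out) {
    return !__builtin_add_overflow(a, b, out);
}

C23 standardizes the checked form as ckd_add in <stdckdint.h>.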
On the other hand, wrapping is the mathematical basis for CRCs, error-correcting codes, cryptography, bitwise math, and more. There are no wasted bits, it's the natural implementation in hardware, it's familiar to students from a young age as "clock arithmetic", compilers can easily insert debug-mode checks for it (the way Rust does when you forget to use Wrapping<T>), etc.
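For illustration, two small C snippets (mine, not from the thread) where the mod-2^N behavior is the point rather than a bug:

#include <stdint.h>

// Sequence-number comparison (the trick TCP uses): wrapping subtraction
// makes "a is after b" work correctly even across the 2^32 boundary.
static int seq_after(uint32_t a, uint32_t b) {
    return (int32_t)(a - b) > 0;
}

// A multiplicative hash step: the multiply is *meant* to discard the high
// bits, which is exactly what Rust's Wrapping<u64> spells out explicitly.
static uint64_t mix64(uint64_t h, uint64_t x) {
    return (h ^ x) * 1099511628211ULL; // wraps modulo 2^64 by design
}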
It's obviously not perfect either: it has the same problem as all fixed-size representations of diverging from the infinite-precision math people are actually trying to do, but I don't think the alternatives would be better.
The natural implementation in hardware is that addition of two N-bit numbers produces an N+1-bit number. Most architectures even expose this extra bit as a carry bit.
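In C, that extra bit is directly observable through the overflow builtins (a small sketch assuming GCC or Clang; the helper name is mine):

#include <stdbool.h>
#include <stdint.h>

// 64-bit + 64-bit really produces a 65-bit result; the builtin reports
// the top bit, i.e. the carry flag the hardware already computes.
static uint64_t add_with_carry(uint64_t a, uint64_t b, bool *carry_out) {
    uint64_t sum;
    *carry_out = __builtin_add_overflow(a, b, &sum);
    return sum;
}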
I might be wrong here, so please feel free to correct me if so, but I don't think borrowing was a concept, per se, of the language itself.
As you mention, the concept the Rust designers took from Cyclone was explicit lifetimes.
Borrow checking provides two features (but in my opinion in a very un-ergonomic way): (1) temporal memory safety, i.e. prevention of use-after-free; and (2) freedom from data races (though not from race conditions in general).
I'm still wobbly on PLT legs though; I'm sure there's a pro or ten who could step in and elaborate.
https://homes.cs.washington.edu/~djg/papers/cyclone_memory.p...
or the more detailed discussion throughout this journal paper:
https://homes.cs.washington.edu/~djg/papers/cyclone_scp.pdf
As their citations indicate, the idea of borrowing appeared immediately in the application of substructural logics to programming languages, going back to Wadler's "Linear types can change the world!". It's just too painful without it.
[0] https://en.wikipedia.org/wiki/Diaconescu%27s_theorem
https://github.com/leanprover/lean4/blob/ad1a017949674a947f0...