Hey, really cool work! I love the idea of focusing on key decision points. I was curious, though: since confidence can be non-monotonic during CoT [1], how does binary search handle cases where confidence rises and falls several times? It seems like there might be more than one "pivotal" token, so I wonder if there's a plan to support multi-token pivots or to use a different approach than binary search?
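To make the concern concrete, here is a toy sketch. It is not your implementation: the confidence values, the 0.5 threshold, and both function names are made up for illustration, and I'm assuming a "pivot" is a position where confidence crosses a threshold. Binary search keeps the invariant "low on the left, high on the right" and so lands on a single crossing, while a plain linear scan shows the others.

```python
def binary_search_pivot(conf, threshold=0.5):
    """Return one index i with conf[i-1] < threshold <= conf[i].

    Assumes conf[0] < threshold <= conf[-1]. If conf is not monotonic,
    this still terminates but reports only ONE of the crossings.
    """
    lo, hi = 0, len(conf) - 1
    # Invariant: conf[lo] < threshold <= conf[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if conf[mid] >= threshold:
            hi = mid
        else:
            lo = mid
    return hi


def all_crossings(conf, threshold=0.5):
    """Linear scan: every index where conf crosses the threshold, in either direction."""
    return [
        i
        for i in range(1, len(conf))
        if (conf[i - 1] < threshold) != (conf[i] < threshold)
    ]


# Hypothetical per-token confidence that goes up and down several times.
conf = [0.2, 0.7, 0.3, 0.8, 0.4, 0.9]

print(binary_search_pivot(conf))  # 3
print(all_crossings(conf))        # [1, 2, 3, 4, 5]
```

In this toy case binary search reports index 3 and never looks at the crossings at 1, 2, 4, and 5, which is the situation I'm asking about.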
I’m not familiar with Zig and would appreciate an explanation of how this works. My understanding is that cache behavior is managed by the CPU, and that programmers influence it only indirectly, through the sequence of instructions (i.e., access patterns). Is that accurate? Also, is this approach specific to Zig, or could it be achieved in C or Rust as well? Thanks.