martianlantern (u/martianlantern)

martianlantern commented on DeepSeek-v3.1-Base huggingface.co/deepseek-a... · Posted by u/meetpateltech

Are there any benchmarks or comparisons against gpt-oss? I believe it far exceeds gpt-oss or even GPT-5; otherwise they wouldn't have released it.

martianlantern commented on Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training huggingface.co/blog/codel... · Posted by u/codelion

martianlantern · 10 days ago

Hey, really cool work! I love the idea of focusing on key decision points. I was curious, though: since confidence can be non-monotonic during CoT [1], how does binary search handle cases where confidence goes up and down multiple times? It seems like there might be more than one "pivotal" token, so is there a plan to support multi-token pivots or to use an approach other than binary search?

[1] - https://arxiv.org/abs/2505.14489
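A toy illustration of that concern, with made-up confidence values. It assumes a bisection-style search that only descends into a range of prefix positions when the confidence at the range's two endpoints differs by at least a threshold. That pruning rule is an assumption for illustration, not necessarily what PTS does. Under it, a rise followed by a fall inside one range cancels out and the range gets skipped, while a full linear scan finds every jump. `bisect_pivots` and `scan_pivots` are hypothetical names.

```python
def bisect_pivots(probs, lo, hi, threshold):
    """Bisect [lo, hi], descending only where the endpoints differ by >= threshold."""
    if abs(probs[hi] - probs[lo]) < threshold:
        return []  # endpoints look similar, so the whole range is pruned
    if hi - lo == 1:
        return [hi]  # the jump happens at token hi
    mid = (lo + hi) // 2
    return bisect_pivots(probs, lo, mid, threshold) + bisect_pivots(probs, mid, hi, threshold)


def scan_pivots(probs, threshold):
    """Check every adjacent pair of positions (cost grows linearly with length)."""
    return [i for i in range(1, len(probs)) if abs(probs[i] - probs[i - 1]) >= threshold]


# Made-up success-probability trace: up at token 2, down at 4, up again at 7.
probs = [0.5, 0.5, 0.9, 0.9, 0.4, 0.4, 0.4, 0.8]

print(bisect_pivots(probs, 0, len(probs) - 1, 0.2))  # [2]: the drop and the later rise are missed
print(scan_pivots(probs, 0.2))                       # [2, 4, 7]
```

The range [3, 7] starts at 0.9 and ends at 0.8, so its endpoints differ by only 0.1 and bisection never looks inside it, even though it contains two large jumps.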

martianlantern commented on We Hit 100% GPU Utilization–and Then Made It 3× Faster by Not Using It daft.ai/blog/embedding-mi... · Posted by u/DISCURSIVE

martianlantern · 10 days ago

There's no explanation of how they achieved that speedup :( It would have been better if they had also written a post about that.

martianlantern commented on A highly performant cache for very small data github.com/jeremytregunna... · Posted by u/tanelpoder

martianlantern · 11 days ago

I’m not familiar with Zig and would appreciate an explanation of how this works. My understanding is that cache behavior is managed by the CPU, and programmers only influence it indirectly through the sequence of instructions (i.e., access patterns). Is that accurate? Also, is this approach specific to Zig, or could it be achieved in C or Rust as well? Thanks
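The access-pattern point can be sketched with a toy model: a fully associative LRU cache simulated in Python, with an assumed 64-byte line size, 8-byte elements, and a 32-line capacity. The same 64x64 row-major matrix is read in two orders. Real CPUs use multi-level, set-associative caches with hardware prefetchers, so this shows the principle only. It is not a model of any specific CPU or of how the linked project works. `count_misses` and the traversal helpers are hypothetical names.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 8   # assumed element size (e.g. a 64-bit float)
N = 64           # the matrix is N x N, stored row-major


def count_misses(addresses, capacity_lines):
    """Count misses in a toy fully associative cache with LRU eviction."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)  # hit: mark line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def row_major_addresses():
    # Walk along each row: consecutive accesses share a cache line.
    return [(r * N + c) * ELEM_BYTES for r in range(N) for c in range(N)]


def column_major_addresses():
    # Walk down each column: every access lands on a different cache line.
    return [(r * N + c) * ELEM_BYTES for c in range(N) for r in range(N)]


row = count_misses(row_major_addresses(), capacity_lines=32)
col = count_misses(column_major_addresses(), capacity_lines=32)
print(row, col)  # 512 4096
```

The program never touches the cache directly. Only the order of the loads changes, and that alone moves the miss count from one per line (512) to one per element (4096).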

martianlantern commented on SortBench: Benchmarking LLMs based on their ability to sort lists arxiv.org/abs/2504.08312... · Posted by u/wslh

martianlantern · 15 days ago

But why?