Readit News
martianlantern commented on DeepSeek-v3.1-Base   huggingface.co/deepseek-a... · Posted by u/meetpateltech
martianlantern · 8 days ago
Are there any benchmarks or comparisons against gpt-oss? I believe it far exceeds gpt-oss or even GPT-5, otherwise they wouldn't have released it.
martianlantern commented on Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training   huggingface.co/blog/codel... · Posted by u/codelion
martianlantern · 10 days ago
Hey, really cool work, love the idea of focusing on key decision points. I was curious though: since confidence can be non-monotonic during CoT [1], how does binary search handle cases where there are multiple ups and downs in confidence? It seems like there might be more than one "pivotal" token, so I wonder if there's a plan to support multi-token pivots or to use a different approach than binary search?

[1] - https://arxiv.org/abs/2505.14489
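
To make the concern concrete, here is a minimal sketch. It is not the PTS implementation; the confidence values, the 0.5 threshold, and the helper names `find_crossing` and `all_crossings` are made up for illustration. It only shows that a plain binary search over a non-monotonic curve returns one upward crossing even when several exist.

```python
def find_crossing(conf, threshold):
    """Binary search for one index i with conf[i-1] < threshold <= conf[i].

    Assumes conf[0] < threshold <= conf[-1]. The loop keeps the invariant
    conf[lo] < threshold <= conf[hi], so it always ends on a real crossing,
    but it says nothing about other crossings elsewhere in the sequence.
    """
    lo, hi = 0, len(conf) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if conf[mid] >= threshold:
            hi = mid
        else:
            lo = mid
    return hi


def all_crossings(conf, threshold):
    """Linear scan that reports every upward crossing of the threshold."""
    return [i for i in range(1, len(conf))
            if conf[i - 1] < threshold <= conf[i]]


# Hypothetical per-token confidence that rises, dips, then rises again.
conf = [0.2, 0.7, 0.3, 0.4, 0.8, 0.9]

print(find_crossing(conf, 0.5))   # a single crossing index
print(all_crossings(conf, 0.5))   # every crossing index
```

On this toy curve the binary search lands on the second rise and never looks at the first one, which is the multi-pivot case I am asking about.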

martianlantern commented on We Hit 100% GPU Utilization–and Then Made It 3× Faster by Not Using It   daft.ai/blog/embedding-mi... · Posted by u/DISCURSIVE
martianlantern · 10 days ago
There's no explanation of how they achieved that speedup :( It would have been better if they had also written a post on that.
martianlantern commented on A highly performant cache for very small data   github.com/jeremytregunna... · Posted by u/tanelpoder
martianlantern · 11 days ago
I’m not familiar with Zig and would appreciate an explanation of how this works. My understanding is that cache behavior is managed by the CPU, and programmers only influence it indirectly through the sequence of instructions (i.e., access patterns). Is that accurate? Also, is this approach specific to Zig, or could it be achieved in C or Rust as well? Thanks
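
To show what I mean by "access patterns", here is a toy model, not the linked project's code. It assumes 8-byte elements, 64-byte cache lines, and a row-major 8x8 array, and the function names are made up. It only counts how often consecutive accesses land on a different cache line; it does not simulate a real CPU cache.

```python
def row_major_offsets(rows, cols, elem_size=8):
    """Byte offsets visited when walking a row-major array row by row."""
    return [(r * cols + c) * elem_size
            for r in range(rows) for c in range(cols)]


def col_major_offsets(rows, cols, elem_size=8):
    """Byte offsets visited when walking the same array column by column."""
    return [(r * cols + c) * elem_size
            for c in range(cols) for r in range(rows)]


def line_switches(offsets, line_size=64):
    """Count consecutive accesses that fall on different cache lines."""
    lines = [off // line_size for off in offsets]
    return sum(1 for a, b in zip(lines, lines[1:]) if a != b)


# Same data, same amount of work, different traversal order.
row_switches = line_switches(row_major_offsets(8, 8))
col_switches = line_switches(col_major_offsets(8, 8))
print(row_switches, col_switches)
```

Both loops touch the same 64 elements; only the order differs, and that order is the kind of thing I assume the programmer controls.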

u/martianlantern

Karma: 38 · Cake day: July 26, 2025