boroboro4 commented on NeurIPS 2025 Best Paper Awards   blog.neurips.cc/2025/11/2... · Posted by u/ivansavz
energy123 · 14 days ago
I am not sure how to interpret the first paper's results.

If we use a random number generator then we will converge to 100% correct answers under pass@n in the limit.

A random number generator will eventually match or outperform every model (for large enough n) whenever the models sample with top-p less than 1: those models will most likely have some bias that makes certain correct CoTs mathematically impossible, because the required tokens are too improbable and get filtered out by top-p. So the other models asymptote below 100%, while the RNG reaches 100% in an almost-surely sense.

Under this paper's logic doesn't that mean that the random number generator is a superior reasoner?
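A toy sketch of the asymptote argument above (all probabilities are made up, nothing is taken from the paper): any sampler with a nonzero per-sample chance p of producing a correct answer has pass@n = 1 - (1 - p)^n, which tends to 1, while a model whose top-p cutoff excludes the tokens a correct CoT needs is modeled here as p = 0 and never gets there.

```python
# Toy illustration of the pass@n asymptote argument (probabilities are made up).
# pass@n = 1 - (1 - p)^n, where p is the per-sample chance of a correct answer.
def pass_at_n(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

p_random = 1e-6    # random token generator: tiny but nonzero chance of a correct CoT
p_filtered = 0.0   # model whose top-p filter removes the tokens a correct CoT needs

for n in (10**3, 10**6, 10**9):
    print(f"n={n:>10}  RNG: {pass_at_n(p_random, n):.4f}  filtered model: {pass_at_n(p_filtered, n):.4f}")
# The RNG column climbs toward 1.0 as n grows; the filtered model stays at 0.0.
```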

boroboro4 · 13 days ago
To me intellect has two parts to it: "creativity" and "correctness". From this perspective a random sampler is infinitely "creative": over (infinite) time it can come up with an answer to any given problem. And from this perspective it does feel natural that base models are more "creative" (because that's what's being measured in the paper), while RL models are more "correct" (that's the slope of the curve from the paper).
boroboro4 commented on Several core problems with Rust   bykozy.me/blog/rust-is-a-... · Posted by u/byko3y
anon-3988 · 25 days ago
> Arc<Mutex<Box<T>>> is complex

This is not a language problem; it's simply the nature of the problem, no? It's like saying full adders are complex, can't we design something simpler? No, full adders are the way they are because addition in binary is complicated.

What you are saying is that this problem is not your kind of problem, which is fine. Not everyone needs to face the complexity of optimizing full adders. And so we created abstractions. The question is, how good is that abstraction?

C++ is like using FP math to do binary addition.

boroboro4 · 25 days ago
If you use a lot of Arc<Mutex<Box<T>>>, languages with a proper runtime (like Go or Java) are probably going to be more performant; in the end, they are built with those abstractions in mind. So the question isn't only how much of this is the nature of the problem, but also how common the problem is, and whether Rust is the right way to solve it.
boroboro4 commented on America is getting an AI gold rush instead of a factory boom   washingtonpost.com/busine... · Posted by u/voxleone
trenchpilgrim · 2 months ago
No they don't. 5-8 years is common. The source for the 3 year number is an unnamed random person claiming to be a Google engineer, and Google specifically reached out to all the journalists publishing that claim with this response.

> Recent purported comments about Nvidia GPU hardware utilization and service life expressed by an “unnamed source” were inaccurate, do not represent how we utilize Nvidia’s technology, and do not represent our experience.

boroboro4 · 2 months ago
Data centers might be, but GPUs not really. No one needs 8-year-old GPUs, and hardly even 5-year-old ones.
boroboro4 commented on No science, no startups: The innovation engine we're switching off   steveblank.com/2025/10/13... · Posted by u/chmaynard
cgh · 2 months ago
Also, I believe in the US ordinary dividends are taxed at the income tax rate which is much higher than the capital gains rate.
boroboro4 · 2 months ago
It doesn’t make sense to compare ordinary dividends to capital gains in general - either compare ordinary dividends to short-term gains, or qualified dividends to long-term gains.
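As a rough worked example of that pairing (the rates are assumptions for a high US bracket, where ordinary dividends and short-term gains take the ordinary-income rate while qualified dividends and long-term gains take the capital-gains rate; not tax advice):

```python
# Hedged illustration of the like-for-like comparison (assumed rates:
# 37% ordinary income, 20% long-term capital gains).
ORDINARY_RATE = 0.37  # ordinary dividends AND short-term capital gains
LTCG_RATE = 0.20      # qualified dividends AND long-term capital gains

amount = 10_000
print("ordinary dividend / short-term gain tax:", amount * ORDINARY_RATE)  # 3700.0
print("qualified dividend / long-term gain tax:", amount * LTCG_RATE)      # 2000.0
```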
boroboro4 commented on Bcachefs removed from the mainline kernel   lwn.net/Articles/1040120/... · Posted by u/Bogdanp
ChocolateGod · 3 months ago
> After all, Linus would never break userspace right?

But bcachefs never lived in userspace even before it was merged

boroboro4 · 3 months ago
In my opinion any filesystem lives in user space too, through the implicit contract between the filesystem and the data stored on disk, no?
boroboro4 commented on Java 25's new CPU-Time Profiler   mostlynerdless.de/blog/20... · Posted by u/SerCe
porridgeraisin · 3 months ago
ChatGPT
boroboro4 · 3 months ago
Thank you for pointing it out. I went through their comments and they are all like this :-( They have substance while being very obviously AI generated.
boroboro4 commented on Defeating Nondeterminism in LLM Inference   thinkingmachines.ai/blog/... · Posted by u/jxmorris12
eldenring · 3 months ago
Very impressive! I guess this still wouldn't affect their original example

> For example, you might observe that asking ChatGPT the same question multiple times provides different results.

even with 0.0 temperature, due to MoE models routing at the batch level, and you're very unlikely to get a deterministic batch.

> Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.

The router also leaks batch-level information across sequences.

boroboro4 · 3 months ago
> even with 0.0 temperature, due to MoE models routing at the batch level, and you're very unlikely to get a deterministic batch.

I don’t think this is correct - MoE routing happens on a per-token basis. It can be non-deterministic and batch-related if you try to balance expert load within a batch, but that’s a performance optimization (just like everything in the blog post) and not the way the models are trained to work.
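A minimal sketch of what per-token routing means (hypothetical shapes, not any particular model's or framework's code): each token's own router logits pick its experts, so nothing about the rest of the batch enters the decision unless an extra load-balancing step is bolted on top.

```python
# Illustrative per-token top-k MoE routing (hypothetical sizes, not a real model).
import torch

def route(hidden, w_router, k=2):
    # hidden: [num_tokens, d_model]; w_router: [d_model, num_experts]
    logits = hidden @ w_router                        # each token scored independently
    probs = torch.softmax(logits, dim=-1)
    weights, expert_ids = torch.topk(probs, k, dim=-1)
    return weights, expert_ids                        # a function of the token itself only

hidden = torch.randn(32, 128)    # 32 tokens, from however many sequences
w_router = torch.randn(128, 8)   # 8 experts
weights, expert_ids = route(hidden, w_router)
# Batch-level load balancing (expert capacity limits, token dropping) would be an
# extra optional step on top of this, which is where batch dependence can creep in.
```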

boroboro4 commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
davidatbu · 3 months ago
IIUC, Triton uses Python syntax, but it has a separate compiler (which is kinda what Mojo is doing, except Mojo's syntax is a superset of Python's instead of a subset, like Triton's). I think it's fair to describe it as a different language (otherwise we'd also have to describe Mojo as "Python"). Triton's website and repo describe it as "the Triton language and compiler" (as opposed to, I dunno, "Write GPU kernels in Python").

Also, flash attention is at v3-beta right now? [0] And it requires one of CUDA/Triton/ROCm?

[0] https://github.com/Dao-AILab/flash-attention

But maybe I'm out of the loop? Where do you see that flash attention 4 is written in Python?

boroboro4 · 3 months ago
From this perspective PyTorch is a separate language, at least as soon as you start using torch.compile (only a subset of PyTorch Python will be compilable). That’s the strength of Python: it’s great for describing things and later analyzing them (and compiling, for example).

Just to be clear here: you use Triton from plain Python, and it runs the compilation internally.

Similarly, I’m pretty sure not all of Mojo can be used to write kernels? I might be wrong here, but it would be very hard to fit general-purpose code into kernels (and, to be frank, pointless: constraints bring speed).

As for flash attention there was a leak: https://www.reddit.com/r/LocalLLaMA/comments/1mt9htu/flashat...
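A minimal sketch of what "you use Triton from plain Python" looks like (the standard vector-add example, assuming the triton package and a CUDA-capable PyTorch install): the kernel is just a decorated Python function that gets JIT-compiled on first launch.

```python
# Minimal Triton vector add, launched from ordinary Python code.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                 # guard the last partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)  # compiled on first launch
```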

boroboro4 commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
davidatbu · 3 months ago
Good point. But the overall point about Mojo availing a different level of abstraction as compared to Python still stands: I imagine that no amount of magic/operator-fusion/etc in `torch.compile()` would let one get reasonable performance for an implementation of, say, flash-attn. One would have to use CUDA/Triton/Mojo/etc.
boroboro4 · 3 months ago
But Python already operates on an entirely different level of abstraction: you mention Triton yourself, and there is a new Python CUDA API too (the one similar to Triton). More than that, flash attention 4 is actually written in Python.

Somehow Python managed to be both a high-level and a low-level language for GPUs…

boroboro4 commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
davidatbu · 3 months ago
Last I checked, all of PyTorch, TensorFlow, and JAX sit at a layer of abstraction above GPU kernels. They avail GPU kernels (basically as nodes in the computational graph you mention), but they don't let you write GPU kernels.

Triton, CUDA, etc, let one write GPU kernels.

boroboro4 · 3 months ago
torch.compile sits at both the computation-graph level and the GPU-kernel level, and can fuse your operations using the Triton compiler. I think something similar applies to JAX and TensorFlow by way of XLA, but I’m not 100% sure.
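A small sketch of that layering (illustrative only; behavior depends on the backend and hardware): torch.compile traces the Python function into a graph, and with the default Inductor backend on a CUDA device it code-generates fused Triton kernels for a pointwise chain like this instead of launching each op separately.

```python
# Illustrative use of torch.compile; on CUDA the default Inductor backend
# emits a fused Triton kernel for the add + GELU + scale chain below.
import torch

def fused_bias_gelu(x, bias):
    return torch.nn.functional.gelu(x + bias) * 2.0

compiled = torch.compile(fused_bias_gelu)

x = torch.randn(1024, 1024, device="cuda")
bias = torch.randn(1024, device="cuda")
out = compiled(x, bias)   # first call triggers tracing and kernel codegen
```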

u/boroboro4 · Karma: 551 · Joined December 24, 2021