chillee commented on Helion: A high-level DSL for performant and portable ML kernels   pytorch.org/blog/helion/... · Posted by u/jarbus
markush_ · 2 months ago
Interesting choice from PyTorch to release yet another DSL. On the positive side, it's one more point in the design space; on the other hand, it makes it even more difficult to choose the right technology among Triton, Gluon, CuTe, ThunderKittens, and a few others.
chillee · 2 months ago
I think unlike Gluon/CuTe/ThunderKittens (which distinguish themselves from Triton by being lower level and giving you more control, and are thus less performance portable and harder to write), Helion distinguishes itself from Triton by being higher level and easier to write.

IMO, this is something that makes sense for PyTorch to release, as "neutral ground" in the industry.

chillee commented on Helion: A high-level DSL for performant and portable ML kernels   pytorch.org/blog/helion/... · Posted by u/jarbus
bwfan123 · 2 months ago
I don't get the point of Helion as compared to its alternatives like Gluon.

For best performance, I would presume one needs low-level access to hardware knobs. And these kernel primitives are written once and reused. So what is the point of a DSL that dumbs things down as a wrapper around Triton?

chillee · 2 months ago
What's the point of Triton compared to Gluon? What's the point of PyTorch compared to Triton?

One of the main values of Triton is that it significantly expanded the scope of folks who can write kernels - I think Helion could expand the scope even more.

chillee commented on Helion: A high-level DSL for performant and portable ML kernels   pytorch.org/blog/helion/... · Posted by u/jarbus
maknee · 2 months ago
How does this compare against other DSLs?
chillee · 2 months ago
If you think of Triton as a "baseline", most other DSLs are lower-level than Triton, whereas this is higher-level.
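
For concreteness, this is roughly what the Triton "baseline" level of abstraction looks like for a toy vector add (standard tutorial-style code, not a Helion kernel, and it needs a GPU to actually run) - you still manage program IDs, block sizes, offsets, and masking yourself, which is exactly the bookkeeping a higher-level DSL can hide or autotune:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-sized chunk of the tensors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard against the ragged last block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

Lower-level DSLs expose even more than this (shared memory, warps, async copies); a higher-level one lets you write something closer to the PyTorch-style math and leaves the tiling choices to the compiler.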
chillee commented on Amazon has mostly sat out the AI talent war   businessinsider.com/amazo... · Posted by u/ripe
shagie · 4 months ago
Revenue... yes. Profit is still an open question.

https://www.cnbc.com/2025/08/08/chatgpt-gpt-5-openai-altman-...

> Last year, OpenAI expected about $5 billion in losses on $3.7 billion in revenue. OpenAI’s annual recurring revenue is now on track to pass $20 billion this year, but the company is still losing money.

> “As long as we’re on this very distinct curve of the model getting better and better, I think the rational thing to do is to just be willing to run the loss for quite a while,” Altman told CNBC’s “Squawk Box” in an interview Friday following the release of GPT-5.

Selling compute for less than it costs you will get you as much revenue as you're willing to pay for.

chillee · 4 months ago
Their gross profits are very high even though they're not making operating profit.
chillee commented on Amazon has mostly sat out the AI talent war   businessinsider.com/amazo... · Posted by u/ripe
shagie · 4 months ago
I am reminded of the Uncomfortable Amazon Truths ( https://news.ycombinator.com/item?id=20980025 ) by Corey Quinn.

While the tweets are protected now, https://news.ycombinator.com/item?id=20980557 quotes the one I recall...

      - Nobody has figured out how to make money from AI/ML other than by selling you a pile of compute and storage for your AI/ML misadventures.
https://threadreaderapp.com/thread/1173367909369802752.html maintains the entire chain of tweets.

chillee · 4 months ago
Clearly not true anymore given OpenAI and Anthropic's revenue growth.
chillee commented on Are OpenAI and Anthropic losing money on inference?   martinalderson.com/posts/... · Posted by u/martinald
martinald · 4 months ago
Thanks for the correction (author here). I'll update the article - very fair point on compute on input tokens which I messed up. Tbh I'm pleased my napkin math was only 7x off the laws of physics :).

Even rerunning the math on my use cases with way higher input token cost doesn't change much though.

chillee · 4 months ago
The assumption of 32 parallel sequences is also arbitrary and significantly changes your conclusions. For example, if they run with 256 parallel sequences, that would make both prefill and decode 8x cheaper in your calculations.
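
To make that concrete, here's a rough back-of-the-envelope sketch (all numbers are my own illustrative assumptions, not the article's): in a bandwidth-bound decode regime, per-token cost falls roughly linearly with the number of concurrent sequences, so 32 vs 256 is the whole 8x.

    # Illustrative assumptions only: ~37B active params at ~1 byte/param,
    # 8 GPUs per instance, ~3.35 TB/s of HBM bandwidth each, $2/GPU-hour.
    WEIGHT_BYTES = 37e9
    GPUS = 8
    HBM_BW = 3.35e12          # bytes/s per GPU
    GPU_HOURLY = 2.0          # $/GPU-hour

    def decode_cost_per_mtok(concurrent_seqs):
        # Bandwidth-bound decode: each step streams the (sharded) weights once
        # and emits one token per concurrent sequence. KV-cache traffic ignored.
        step_time = WEIGHT_BYTES / (HBM_BW * GPUS)          # seconds per step
        tokens_per_sec = concurrent_seqs / step_time
        cost_per_sec = GPU_HOURLY * GPUS / 3600
        return cost_per_sec / tokens_per_sec * 1e6          # $ per 1M tokens

    print(decode_cost_per_mtok(32), decode_cost_per_mtok(256))  # ~8x apart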

The claim that long context lengths are required for attention to be compute-bound is also quite misleading.

chillee commented on Are OpenAI and Anthropic losing money on inference?   martinalderson.com/posts/... · Posted by u/martinald
Den_VR · 4 months ago
So, bottom line, do you think it’s probable that either OpenAI or Anthropic are “losing money on inference?”
chillee · 4 months ago
No. In some sense, the article comes to the right conclusion haha. But it's probably >100x off on its central premise about output tokens costing more than input.
chillee commented on Are OpenAI and Anthropic losing money on inference?   martinalderson.com/posts/... · Posted by u/martinald
chillee · 4 months ago
This article's math is wrong on many fundamental levels. One of the most obvious ones is that prefill is nowhere near bandwidth bound.

If you compute out the MFU the author gets, it's 1.44 million input tokens per second * 37 billion active params * 2 (FMA) / 8 [GPUs per instance] = ~13 PFLOP/s per GPU. That's approximately 7x the absolute peak FLOPS of the hardware. Obviously, that's impossible.
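
Spelled out (the peak figure is my assumption of ~2 PFLOP/s of dense FP8 for an H100-class GPU):

    # Sanity check of the arithmetic above; numbers other than the article's
    # throughput and the 37B active params are assumptions.
    input_tokens_per_sec = 1.44e6
    active_params = 37e9
    flops_per_param = 2            # one multiply-accumulate per parameter
    gpus_per_instance = 8

    required = input_tokens_per_sec * active_params * flops_per_param
    per_gpu = required / gpus_per_instance
    print(per_gpu / 1e15)          # ~13.3 PFLOP/s required per GPU
    print(per_gpu / 2e15)          # ~6.7x over assumed peak -> impossible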

There are many other issues with this article, such as assuming only 32 concurrent requests(?), only 8 GPUs per instance as opposed to the more efficient/standard prefill-decode disaggregated setups, and assuming that attention computation is the main thing that makes models compute-bound. It's a bit of an indictment of HN's understanding of LLMs that most people are bringing up issues with the article that aren't any of the fundamental misunderstandings here.

chillee commented on Tokasaurus: An LLM inference engine for high-throughput workloads   scalingintelligence.stanf... · Posted by u/rsehrlich
refibrillator · 7 months ago
The code has few comments but gotta love when you can tell someone was having fun!

https://github.com/ScalingIntelligence/tokasaurus/blob/65efb...

I’m honestly impressed that a pure python implementation can beat out vLLM and SGLang. Granted they lean on FlashInfer, and of course torch.compile has gotten incredibly powerful in the last few years. Though dynamic shapes have still been a huge thorn in my side, I’ll need to look closer at how they pulled it off…

chillee · 7 months ago
I mean, vLLM and SGLang are both essentially "pure Python" as well. But yeah, in ML you rarely need C++ to get good performance for most of the systems people are writing.
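
As a toy illustration (my own example, nothing to do with Tokasaurus internals): naive pure-Python PyTorch code run through torch.compile gets its elementwise ops fused into a single kernel, which is a big part of why Python-only engines can stay competitive.

    import torch

    @torch.compile
    def rmsnorm(x, weight, eps=1e-6):
        # Written naively; the compiler fuses these elementwise ops into one kernel.
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(variance + eps) * weight

    x = torch.randn(8, 4096, device="cuda" if torch.cuda.is_available() else "cpu")
    w = torch.ones(4096, device=x.device)
    print(rmsnorm(x, w).shape)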
chillee commented on Blender-made movie Flow takes Oscar   reuters.com/lifestyle/flo... · Posted by u/boguscoder
tzs · 10 months ago
That's kind of surprising. Academy members are not required to watch all the nominees for Best Animated Feature before voting. In fact, they are not required to watch any of them.

Several years ago, I remember that after a year when the movie that won Best Animated was not the one those in the animation industry overwhelmingly thought was sure to win, some animation industry magazine surveyed Academy members, asking which movie they voted for and why.

What they found was that a large number of the voters thought of animated movies as just for little kids and hadn't actually watched any of the nominees. They picked their vote by whatever they remembered children in their lives watching.

E.g., if they were parents of young children, they'd vote for whatever movie that their kids kept watching over and over. If they no longer had children at home they would ask grandkids or nieces or nephews "what cartoon did you like last year?" and vote for that.

Another factor was that a lot of these people would vote for the one they had heard the most about.

That gives Disney a big advantage. How the heck did Flow overcome that?

Inside Out 2 had a much wider theatrical release in the US, was widely advertised, made $650 million domestic, is the second highest grossing animated movie of all time so far worldwide, and streams on Disney+.

All that should contribute to making it likely that those large numbers of "vote even though they don't watch animated movies" Academy members would have heard of it.

Flow had a small US theatrical release at the end of the year. I didn't see any advertising for it. I'd expect a lot of Academy members hadn't heard of it.

As a guess, maybe Moana 2 is the movie that the kids are repeat streaming. That was not a nominee so maybe those "vote for what my kid watched" voters didn't vote this year and so we actually got a year where quality non-Disney movies had a chance?

chillee · 10 months ago
A couple things:

1. The Academy has had a significant increase in young voters over the past 10 years or so. Generally speaking, young voters are more likely to treat animation as a "serious" medium.

2. These interviews were always somewhat overstated. Of course some voters have stupid rationales, but I don't think this dominates the academy.

3. Disney's Inside Out 2 was nowhere close to winning the award this year - Flow's biggest competition was The Wild Robot, which grossed far more than Flow but far less than Inside Out 2.

If you look at the past couple of years, The Boy and the Heron (Studio Ghibli) won over Across the Spider-Verse (with Pixar's Elemental nowhere close) in 2023, Guillermo del Toro's Pinocchio won over Puss in Boots: The Last Wish (with Pixar's Turning Red nowhere close) in 2022, etc.

I'm curious what year you're thinking about above. Perhaps Toy Story 4 over Klaus in 2019?

u/chillee

Karma: 1261 · Cake day: August 10, 2016
About
horace.io / twitter.com/cHHillee