For best performance, I would presume one needs low-level access to hardware knobs. And these kernel primitives are written once and reused. So what is the point of a DSL that dumbs things down as a wrapper around Triton?
One of the main values of Triton is that it significantly expanded the scope of folks who can write kernels - I think Helion could expand the scope even more.
IMO, this is something that makes sense for PyTorch to release, as "neutral ground" in the industry.