davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
boroboro4 · 6 days ago
But Python is already operating fully at a different level of abstraction - you mention Triton yourself, and there is a new Python CUDA API too (the one similar to Triton). On top of this, Flash Attention 4 is actually written in Python.

Somehow Python managed to be both a high-level and a low-level language for GPUs…

davidatbu · 6 days ago
IIUC, Triton uses Python syntax, but it has a separate compiler (which is kinda what Mojo is doing, except Mojo's syntax is a superset of Python's instead of a subset, like Triton's). I think it's fair to describe it as a different language (otherwise, we'd have to describe Mojo as "Python" too). Triton's website and repo describe it as "the Triton language and compiler" (as opposed to, I dunno, "write GPU kernels in Python").
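
For a sense of what that looks like, here's a minimal sketch adapted from Triton's vector-add tutorial (the names are the tutorial's; assumes triton and a CUDA-capable torch install). The body is Python syntax, but the @triton.jit decorator hands it to Triton's compiler, not CPython:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the final, partial block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)  # enough instances to cover n elements
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

None of that kernel body is executable by the regular Python interpreter - which is the sense in which it's a separate language wearing Python syntax.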

Also, flash attention is at v3-beta right now? [0] And it requires one of CUDA/Triton/ROCm?

[0] https://github.com/Dao-AILab/flash-attention

But maybe I'm out of the loop? Where do you see that flash attention 4 is written in Python?

davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
tomovo · 8 days ago
While I appreciate all his work on LLVM, Chris Lattner's Swift didn't work out so well for me, so I'm cautious about this.

Swift has some nice features. However, the super slow compilation times and cryptic error messages really erase any gains in productivity for me.

- "The compiler is unable to type-check this expression in reasonable time?" On an M3 Pro? What the hell!?

- To find an error in SwiftUI code I sometimes need to comment everything out block by block to narrow it down and find the culprit. We're getting laughs from Kotlin devs.

davidatbu · 7 days ago
Fwiw, Chris has mentioned both of those as lessons he took from Swift that he'd like to avoid for Mojo.
davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
JonChesterfield · 7 days ago
ML seems to be doing just fine with Python and CUDA.
davidatbu · 7 days ago
Yeah, the rate of progress in AI definitely makes it seem that way from the outside to me too.

But having never written CUDA, I have to rely on authority to some extent for this question. And it seems to me like few are in a better position to opine on whether there's a better story to be had for the software-hardware boundary in ML than the person who wrote MLIR and Swift for TensorFlow (along with making that work on TPUs and GPUs), ran ML at Tesla for some time, was a VP at SiFive, etc.

davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
boroboro4 · 7 days ago
Torch.compile sits at both the computation-graph level and the GPU-kernel level, and can fuse your operations by using the Triton compiler. I think something similar applies to JAX and TensorFlow by way of XLA, but I'm not 100% sure.
davidatbu · 7 days ago
Good point. But the overall point about Mojo availing a different level of abstraction as compared to Python still stands: I imagine that no amount of magic/operator-fusion/etc in `torch.compile()` would let one get reasonable performance for an implementation of, say, flash-attn. One would have to use CUDA/Triton/Mojo/etc.
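
For concreteness, this is the kind of thing operator fusion does handle well - a minimal sketch, assuming PyTorch 2.x and a CUDA device (the function name and shapes are just illustrative):

    import torch

    def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
        # A chain of elementwise ops: torch.compile's Triton backend can fuse
        # these into one generated kernel instead of three separate launches.
        return torch.nn.functional.gelu(x + bias) * 2.0

    compiled = torch.compile(bias_gelu)
    x = torch.randn(1024, 1024, device="cuda")
    bias = torch.randn(1024, device="cuda")
    out = compiled(x, bias)  # first call triggers tracing and code generation

Fusion like this speeds up elementwise chains; it doesn't write a new algorithm like flash-attn for you.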
davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
bjourne · 7 days ago
Yes, they kinda do. The computational graph you specify is completely different from the execution schedule it is compiled into. Whether it's 1, 2, or N kernels is irrelevant as long as it runs fast. Mojo being an HLL is conceptually no different from Python. Whether it will, in the future, become better for DNNs, time will tell.
davidatbu · 7 days ago
I assume HLL = high-level language? Mojo definitely avails lower-level facilities than Python. Chris has even described Mojo as "syntactic sugar over MLIR". (For example, the native integer type is defined in library code as a struct.)

> Whether it's 1, 2, or N kernels is irrelevant.

Not sure what you mean here. But new kernels are written all the time (flash-attn is a great example). One can't do that in plain Python. E.g., flash-attn was originally written in CUDA C++, and now exists in Triton.

davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
monkeyelite · 7 days ago
That's right, and saying otherwise actually robs him of credit for his real talent and ability.

> really hard and long term software project

That’s kind of what I mean. He commits to projects and gets groups of people talking about them and interested.

Imagine how hard it was to convince everyone at Apple to use his language - and how many other smart engineers' projects were not chosen. It's not even clear the engineering merits were there for that one.

davidatbu · 7 days ago
So I think a demonstrative example of your claim would be if you knew someone who is as accomplished with regard to compilers, language design, and tackling really hard long-term projects, but not as good at self-promotion, and could elaborate on what the lack of that skill set cost them.

The only other person I know of who has started, and led to maturity, multiple massive and infrastructural software projects is Fabrice Bellard. I've never run into him self-promoting (podcasts, HN, etc.), and yet his projects are widely used and foundational.

It seems to me like the evidence points to "if you tackle really hard, long term, and foundational software projects successfully, people will use it, regardless of your ability to self promote."

davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
Alexander-Barth · 7 days ago
Actually, in Julia you can write kernels with a subset of the Julia language:

https://cuda.juliagpu.org/stable/tutorials/introduction/#Wri...

With KernelAbstractions.jl you can actually target CUDA and ROCm:

https://juliagpu.github.io/KernelAbstractions.jl/stable/kern...

For Python (or rather, Python-like), there is also Triton (and probably others):

https://pytorch.org/blog/triton-kernel-compilation-stages/

davidatbu · 7 days ago
Chris's claim (at least with regard to Triton) is that it avails 80% of the performance, and they're aiming for closer to 100%.
davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
itsn0tm3 · 7 days ago
I'm pretty sure in their early communications they stated very clearly that Mojo was going to be a superset of Python. Seems like they back-pedaled a bit in that regard.
davidatbu · 7 days ago
Yeah this is slightly confusing for me as well. Even in this very podcast, being a superset of Python was mentioned as a goal (albeit a long term one).
davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
bjourne · 7 days ago
> Why can't there be a language which is both as easy to write as Python, but can still express GPU kernels for ML applications? That's what Mojo is trying to be through clever use of LLVM MLIR.

It already exists. It is called PyTorch/JAX/TensorFlow. These frameworks already contain sophisticated compilers for turning computational graphs into optimized GPU code. I dare say that they don't leave enough performance on the table for a completely new language to be viable.

davidatbu · 7 days ago
Last I checked, all of PyTorch, TensorFlow, and JAX sit at a layer of abstraction above GPU kernels. They avail GPU kernels (basically as nodes in the computational graph you mention), but they don't let you write GPU kernels.

Triton, CUDA, etc., let one write GPU kernels.
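
To illustrate the difference (a hedged sketch, assuming PyTorch >= 2.0 and a CUDA device): at the framework level you call an attention kernel as a single node; you don't author its inner loop.

    import torch
    import torch.nn.functional as F

    # (batch, heads, seq_len, head_dim)
    q = torch.randn(1, 8, 128, 64, device="cuda")
    k = torch.randn(1, 8, 128, 64, device="cuda")
    v = torch.randn(1, 8, 128, 64, device="cuda")

    # One graph node. The fused kernel it dispatches to (e.g. a
    # flash-attention implementation) was authored separately in CUDA/Triton -
    # you consume it here, you don't write it.
    out = F.scaled_dot_product_attention(q, k, v)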

davidatbu commented on ML needs a new programming language – Interview with Chris Lattner   signalsandthreads.com/why... · Posted by u/melodyogonna
bobajeff · 7 days ago
I don't think Mojo can solve the two-language problem. Maybe if it were going to be a superset of Python? Anyway, I think that was actually Julia's goal, not Mojo's.
davidatbu · 7 days ago
Being a Python superset is literally a goal of Mojo mentioned in the podcast.

Edit: from other posts on this page, I've realized that being a superset of Python is now regarded as a nice-to-have by Modular, not a must-have. They realized it's harder than they initially thought, basically.
