dgacmu · 3 years ago
Out of curiosity, I rewrote their prime counter example to use a sieve instead of being a silly maximally-computation-dense example.

To make it work with Taichi, I had to change the declaration of the sieve from sieve = [0] * N to sieve = ti.field(ti.i8, shape=N), but the rest of the code remained the same.

Ordinary Python:

time elapsed: 0.444s

Taichi ignoring compile time, I believe:

time elapsed: 0.119s

A slightly more realistic example than the really toy one where they show a 10x+ improvement, with results that aren't too bad. I'd take a 3x improvement for tiny changes. Pretty neat!

(I tried some other trivial changes, like using np.int8, and it was slower. One can obviously make this a ton faster, but I was interested in seeing how the toy example fared if we just made it slightly more memory-bound.)
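For reference, a sketch of what the np.int8 variant could look like (my reconstruction of the idea, not the exact code timed above):

```python
import math
import numpy as np

N = 1_000_000
SN = math.isqrt(N)

# Same sieve, backed by a NumPy int8 array instead of a Python list.
sieve = np.zeros(N, dtype=np.int8)

def init_sieve():
    for i in range(2, SN + 1):
        if sieve[i] == 0:
            # Slice assignment marks every multiple of i at once.
            # Scalar element-by-element indexing into a NumPy array is
            # slower than indexing a plain list, which may be why the
            # straight port came out slower.
            sieve[i * 2 :: i] = 1

def count_primes() -> int:
    # Indices 0 and 1 are neither prime nor marked, hence the -2.
    return int(N - 2 - sieve.sum())
```

The win with NumPy comes from the vectorized slice assignment, not from the smaller element type.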

One negative: throwing list comprehensions in made the Python version faster (about 0.3 seconds), as well as shorter and arguably more "Pythonic", while simultaneously breaking the port to Taichi.
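A sketch of the kind of comprehension-based variant being described (my reconstruction; the generator expression is the sort of construct that broke the Taichi port):

```python
import math

N = 1_000_000
SN = math.isqrt(N)
sieve = [False] * N

def init_sieve():
    for i in range(2, SN + 1):
        if not sieve[i]:
            # Slice assignment replaces the inner while loop.
            sieve[i * 2 :: i] = [True] * len(range(i * 2, N, i))

def count_primes() -> int:
    # Generator expression: shorter and faster in CPython,
    # but not something a ti.kernel knows how to compile.
    return sum(1 for i in range(2, N) if not sieve[i])
```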

vishal0123 · 3 years ago
I tried implementing the same and I'm getting 500ms vs 20ms, but with a wrong answer on the first Taichi call and correct answers on subsequent calls. I guess I found a bug in Taichi: https://imgur.com/a/lpK2iVF

Could you share your code as well?

    N = 1000000

    isnotprime = [0] * N

    def count_primes(n: int) -> int:
        count = 0
        for k in range(2, n):
            if isnotprime[k] == 0:
                count += 1
                for l in range(2, n // k):
                    isnotprime[l * k] = 1

        return count

    import taichi as ti
    ti.init(arch=ti.cpu)

    isnotprime = ti.field(ti.i8, shape=(N, ))

    @ti.kernel
    def count_primes(n: ti.i32) -> int:
        count = 0
        for k in range(2, n):
            if isnotprime[k] == 0:
                count += 1
                for l in range(2, n // k):
                    isnotprime[l * k] = 1

        return count

dgacmu · 3 years ago
Python:

    import time
    import math
    import numpy as np
    
    N = 1000000
    SN = math.floor(math.sqrt(N))
    sieve = [False] * N
    
    def init_sieve():
        for i in range(2, SN):
            if not sieve[i]:
                k = i*2
                while k < N:
                    sieve[k] = True
                    k += i
            
    
    def count_primes(n: int) -> int:
        return (N-2) - sum(sieve)
    
    start = time.perf_counter()
    init_sieve()
    print(f"Number of primes: {count_primes(N)}")
    print(f"time elapsed: {time.perf_counter() - start}/s")
Taichi:

    import taichi as ti
    import time
    import math
    ti.init(arch=ti.cpu)
    
    N = 1000000
    SN = math.floor(math.sqrt(N))
    sieve = ti.field(ti.i8, shape=N)
    
    @ti.kernel
    def init_sieve():
        for i in range(2, SN):
            if sieve[i] == 0:
                k = i*2
                while k < N:
                    sieve[k] = 1
                    k += i
            
    @ti.kernel
    def count_primes(n: int) -> int:
        count = 0
        for i in range(2, N):
            if (sieve[i] == 0):
                count += 1
        return count
    
    start = time.perf_counter()
    init_sieve()
    print(f"Number of primes: {count_primes(N)}")
    print(f"time elapsed: {time.perf_counter() - start}/s")
(The difference of using 0 vs False is tiny; I had just been poking at the python code to think about how I'd make it more pythonic and see if that made it worse to do taichi)

bombolo · 3 years ago
But how fast does it go with pypy and no changes to the code?
v3ss0n · 3 years ago
Interested as well
garyrob · 3 years ago
Might be worthwhile to run the same code with an appropriate numba decorator. My guess is that you'd get at least as much speed up but without having to change the sieve declaration, but I'm not sure.

sh1mmer · 3 years ago
> No barrier to entry for Python users: Taichi shares almost the same syntax as Python. Apply a single Taichi decorator, and your functions are automatically turned into optimized machine code.

It looks super interesting except “almost the same syntax as Python” part here seems like such a foot gun for everything from IDE integration to subtle bugs and more.

I was super into the idea of a strict Python subset that gets JIT compiled inline based on just a decorator.

icefo · 3 years ago
I helped someone who had to use Taichi code written by a PhD student, and it was a bit weird. It looks a lot like Python, but you have to code like you would in CUDA (e.g., control flow); there is no magic.

For this we had to calculate forces to animate some kind of polygon with a lot of joints, and we could not just call scipy from the Taichi code. I had to implement a very dirty polynomial equation solver in Taichi for the demo.
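For flavor, a "dirty" kernel-style solver usually amounts to Newton iteration written with bare loops and arithmetic. A minimal Python sketch (not the actual Taichi code, which isn't shown here):

```python
def poly_root(coeffs, x0, iters=50):
    """Newton's method for p(x) = coeffs[0] + coeffs[1]*x + ...
    Nothing but loops and arithmetic -- the style of code a
    Taichi/CUDA kernel can actually execute."""
    x = x0
    for _ in range(iters):
        # Evaluate p(x) and p'(x) together via Horner's scheme.
        p, dp = 0.0, 0.0
        for c in reversed(coeffs):
            dp = dp * x + p
            p = p * x + c
        if dp == 0.0:
            break  # flat spot: stop rather than divide by zero
        x -= p / dp
    return x
```

For example, `poly_root([-2.0, 0.0, 1.0], 1.0)` converges to the positive root of x^2 - 2.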

farcaster · 3 years ago
I was playing earlier today with Triton[0], from OpenAI. Like Taichi, it makes it super easy to write native GPU code from Python, but it really does feel like something very experimental for now. (I know the use case is very different.)

[0] https://openai.com/research/triton

robertlagrant · 3 years ago
Triton is clearly a popular name for GPU access and inference[0].

[0] https://developer.nvidia.com/nvidia-triton-inference-server

coldtea · 3 years ago
>but it really does feel like something very experimental for now

Meaning?

- their approach is still bizarre and exploratory and they still don't know how to structure their APIs and are making it up as they go?

or:

- there are still some rough edges, bugs, and no full documentation yet?

as those are quite different cases...

Archit3ch · 3 years ago
>CPUs and AMD GPUs are not supported at the moment

CUDA-only, no mention of Metal.

xiaodai · 3 years ago
Please don't retrofit more stuff to make Python work. Move over to Julia already. You can call Python from Julia.
Buttons840 · 3 years ago
It's been a while, but last time I was doing serious work in Julia things were a little janky. For example, the REPL would segfault sometimes if I Ctrl-C'ed during heavy computations. And Flux at first seems like it will work on any code, which seems amazing, but then you find out at runtime that one of the operations you used isn't supported and get a runtime error. PyTorch might not work on regular Python code, but at least I know the APIs provided by PyTorch will work, even though they are a subset of what can be done in regular Python.

Still, most things worked in Julia, and there have been many improvements since then so I suspect the few remaining rough spots are being smoothed out. In the future I will be happy if I get to work with Julia more.

xiaodai · 3 years ago
Yeah. It's a chicken-and-egg thing. Imagine if the resources applied to PyTorch had been spent implementing it in Julia.

But then there's not enough users... so the cycle continues until one day Julia hits critical mass and a tipping point is reached.

montebicyclelo · 3 years ago
I think the Julia people underestimate how many people like the syntax of Python, and how much they like it. If they'd stuck closer to that (whitespace, plus many other things), they might have won more people over.
acmj · 3 years ago
It is not just syntax. While Python has Java-like OOP from afar, Julia has a distinct language design. It doesn't have a concept of "class" in the traditional sense; it instead has multiple dispatch, which is very flexible but sometimes too flexible to control. I found Julia harder to write for an average programmer. Furthermore, the time-to-first-plot problem had pissed off many early adopters (I know a few, and they won't come back) and apparently remains a problem for some [1].

Julia is a great niche language for what it is good at. It will survive but won't gain much popularity.

[1] https://discourse.julialang.org/t/very-slow-time-to-first-pl...

jarbus · 3 years ago
As someone who switched from Python to Julia: Julia syntax supports things like list comprehensions, but also many better things, like broadcasting, that Python really needs. The only thing Python has over Julia is that Julia requires an "end" keyword where Python uses indentation. But Julia has so much more than Python (like macros) that it's just better syntax-wise.
amval · 3 years ago
One of the features that I dislike the most about Julia is how list comprehensions have been directly ported from Python. This was a conscious choice they made out of pragmatism, not because of the merits of the design.

I don't doubt that what you say is true, but to me it comes down more to lack of familiarity with other languages than to any actual merits of Python's syntax and semantics. Frankly, I'm glad they didn't take more from Python.

nhgiang · 3 years ago
New language developers really overestimate most people's willingness to learn new languages
cwp · 3 years ago
Yes. It's quite amazing, really. Also, people really hate using more than one language. Web developers will twist themselves into incredible knots to avoid having to write HTML, CSS, and JavaScript in the same project.
schemescape · 3 years ago
It’s not even the language that causes the most friction, it’s the runtime, libraries, package manager, build system, foreign function interfaces, and on and on…
xiaodai · 3 years ago
Not overestimate; I think they know the challenges and the stickiness of the Python diehards.

But there's a better way, there's a better way.

Python used to be less popular than Perl, but look at where Perl is now vs. Python. Things do take time to change, though.

Archit3ch · 3 years ago
As someone in the Julia ecosystem, I wouldn't know where to begin with writing performant Python. Is it Numba? Taichi? TorchScript? NumPy? Cython? PyPy? None of these seem to work together (besides calling each other, I suppose).
montebicyclelo · 3 years ago
Wow, it seems like you have a number of good options to choose from. What a horrible situation, to have a language with a really rich ecosystem of powerful libraries.
ActorNightly · 3 years ago
"Performant code" covers a really broad range of code.

If you look at ML, Python is completely fine, because all the processing that happens in matrix multiplication, even on CPUs, far, far, FAR outweighs all the setup work in volume of operations.

On the other hand, if the majority of your application relies heavily on processing speed (i.e., you need compare/jump operations rather than just the add/multiply/load/store of GPUs), Python is going to be slow. In that case, if you want custom performant code, you write C extensions for the performance-critical parts and launch them from higher-level Python code.

That being said, there is generally a library (like Taichi) that already does this for you.

catchnear4321 · 3 years ago
The problem is I have no interest in calling Python from Julia, since I can just use Python.
alphanullmeric · 3 years ago
Ah yes, the language that claims to look like Python and run like C while not being particularly close in either aspect.
v3ss0n · 3 years ago
That's quite silly. The Julia ecosystem is practically non-existent. If we were to move, Nim would be the closest fit. PyPy is shaping up nicely on the C-extension side, and when it's fully compatible we will just use PyPy.

Julia's language features are really weak.

packetlost · 3 years ago
I wouldn't consider Julia a replacement for Python
Iwan-Zotow · 3 years ago
And it's an ugly thing.

They didn't get range composition right.

okaleniuk · 3 years ago
Tried that. It was fun. Didn't see any benefit though, went back to Python.

I don't care much about what the interface over LLVM looks like. As long as I get the same result in the end, I'd rather stick to whatever has more users.

ActorNightly · 3 years ago
I used to hate Python too, but I completely 180'd in the past few years as I learned more about computer science.

If you look at compute in general, it can pretty much be summed up as add/multiply/load/store/compare/jump (straight from Jim Keller). All the other instructions are more specialized versions of those, with some having dedicated hardware in CPUs.

If you need to do those six things as fast as possible on a single piece of hardware, you are most likely writing a video game. Thus video game development is pretty much C/C++, with a bit of Swift/C# sprinkled about.

Once the single-piece-of-hardware requirement goes away (i.e., you are writing a distributed system to serve a web app), hardware is cheaper than developer salary, and network latency dominates speed anyway. This is the reason Python took off: it's super quick to write and deploy applications, and instead of paying a developer $10k+ a month, you can spend half that on more EC2s that handle the load just fine, even if the end user has to wait 1.5 seconds instead of 1.1 for a result.

If you don't need compare/jump, your program is essentially better suited to running on GPUs. OpenCL/CUDA came about because people realized that a lot of applications simply need to do math without making any decisions along the way, and GPUs are much better at that. The paradigm is that you write kernels and load them onto the GPU, and this can be driven from any language, since you really just need to launch the code once. This is why Python, despite being slow, is the primary language for ML.

Then there is multiply/add only, which you probably know best in the form of the ASICs for bitcoin mining that blew GPUs out of the water. When you don't have memory controllers and just load/store from predefined locations, your speed goes through the roof. This is the future of ML chips as well, where the compiler looks a lot like the Verilog/HDL compilers for FPGAs.

Furthermore, with ML, compare/jump and even load/store are being rolled into multiply/add. You have seemingly complex models like GPT that make decisions, but without any branching. Technically speaking, a NAND gate is all you need to build a general-purpose CPU, and you can simulate a NAND gate with a couple of neurons. So you can build an entire general-purpose CPU from multiply/add.
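The NAND claim can be sketched with nothing more than multiply/add and a threshold. A minimal Python sketch (the weights are just one choice that happens to work; with a hard step, a single unit suffices here):

```python
def neuron(inputs, weights, bias):
    # A "neuron" stripped to its multiply/add core plus a step threshold.
    s = bias
    for x, w in zip(inputs, weights):
        s += x * w
    return 1 if s > 0 else 0

def nand(a, b):
    # Weights chosen so the sum stays positive unless both inputs are 1:
    # (0,0) -> 3, (0,1)/(1,0) -> 1, (1,1) -> -1.
    return neuron([a, b], [-2.0, -2.0], 3.0)
```

Since NAND is functionally complete, composing such units can in principle build any Boolean circuit, which is the universality point being made above.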

So in the end, it's absolutely worth investing in Python and making it better. Languages like Julia are currently better suited to performant tasks, but the necessity of writing performant CPU code is slowly going away. It's better to have a general-purpose language that lets you put ideas into code as quickly as possible, and then reach for specialized tools for the performance-critical parts.

jalino23 · 3 years ago
Julia is so nice, but WHY did they have to decide on 1-based indexing?? Like there's literally no reasonnnnnnnnnnnnnnnn
xiaodai · 3 years ago
Let's see: Fortran, R, MATLAB.

All the serious numerical languages have that. It's more natural.

0-based indexing is only good for calculating memory offsets, nothing else. Like in Go, `vec[a:b]` indexes `a` to `b-1`, purely because that's more convenient with 0-indexing. That `b-1` is hugely confusing and a big gotcha for the layman.
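For what it's worth, Python shares the same half-open convention, which makes the trade-off being debated easy to see in a few lines:

```python
vec = [10, 20, 30, 40, 50]

# Half-open ranges: vec[1:3] takes indices 1 and 2, never index 3.
assert vec[1:3] == [20, 30]

# What 0-based half-open ranges buy you: lengths and splits compose
# cleanly, with no off-by-one adjustments.
a, b = 1, 3
assert len(vec[a:b]) == b - a
assert vec[:b] + vec[b:] == vec
```

Whether that composability outweighs the `b-1` gotcha is exactly the disagreement here.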

inconceivable · 3 years ago
wow. I ran their prime number Python accelerator example with a 10,000,000 upper bound:

    (taichi) [X@X taichi]$ python primes.py
    [Taichi] version 1.4.1, llvm 15.0.4, commit e67c674e, linux, python 3.9.14
    [Taichi] Starting on arch=x64
    Number of primes: 664579
    time elapsed: 93.54279175889678/s
    Number of primes: 664579
    time elapsed: 0.5988388371188194/s

ipnon · 3 years ago
It seems like Taichi is a fast language, but it also can never be overstated how slow Python is on contemporary architectures.
nerdponx · 3 years ago
Right, I'd like to see a comparison to Nim or Julia, or another compiled high-level language that isn't particularly performance-oriented, like Haskell, Clojure, Common Lisp, or even Ruby with its new JIT(s). Or, for that matter, Python with Numba or one of the other JIT implementations (PyPy, Pyston, Cinder).
ActorNightly · 3 years ago
It doesn't support 3.11 yet. 3.10 is slow compared to 3.11.
gus_massa · 3 years ago
Is that improvement caused by the program being automatically parallelized, or by the code being compiled/JITed? A 150x improvement is too much for either alone, so I suspect both contribute.
inconceivable · 3 years ago
they say the example i used is JIT compiled into machine code. i haven't looked into the codebase yet but i presume that means it just un-pythons it back into C? not sure.

fwiw, i tried the gpu target (cuda) and it was faster than vanilla, but slower than accelerated cpu target by about 4x.

adgjlsfhk1 · 3 years ago
And if you run it on a number bigger than 2^64, does it error because Taichi automatically assumes Python's bigints are int64s?
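Whether Taichi truncates or errors I don't know, but the failure mode being asked about is easy to sketch with the standard library: Python ints are arbitrary precision, and a fixed 64-bit slot must either reject or wrap them.

```python
import struct

big = 2**64 + 3  # a perfectly ordinary Python int

# A fixed 64-bit slot has two options: reject the value outright...
try:
    struct.pack("<Q", big)  # "Q" = unsigned 64-bit, range-checked
    overflowed = False
except struct.error:
    overflowed = True

# ...or silently wrap it modulo 2**64, as C arithmetic would.
wrapped = big & (2**64 - 1)
print(overflowed, wrapped)  # True 3
```

Either behavior would surprise someone passing a bigint into a kernel that assumes int64.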

calebm · 3 years ago
This looks amazing! (particularly for some of my interests - https://gods.art)