I'll take this opportunity to point out that if you're doing anything NumPy-related that seems too slow, you should run Numba on it. In my case we were doing a lot of cosine distance calculations, and our inference time sped up 10x simply by running the cosine distance function from NumPy through Numba. It's as easy as adding a decorator.
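For reference, a minimal sketch of what that looks like (my own toy version, not our production code; the explicit loop is deliberately un-NumPy-like because Numba compiles loops to fast machine code):

    import numpy as np
    from numba import njit

    @njit(cache=True)
    def cosine_distance(u, v):
        # Explicit loop: slow in pure Python, fast once Numba JIT-compiles it.
        dot = 0.0
        nu = 0.0
        nv = 0.0
        for i in range(u.shape[0]):
            dot += u[i] * v[i]
            nu += u[i] * u[i]
            nv += v[i] * v[i]
        return 1.0 - dot / np.sqrt(nu * nv)

    a = np.random.rand(512)
    b = np.random.rand(512)
    print(cosine_distance(a, b))  # first call compiles, later calls are fast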
Taichi vs. Numba: As its name indicates, Numba is tailored for NumPy. Numba is recommended if your functions involve vectorized operations on NumPy arrays. Compared with Numba, Taichi enjoys the following advantages:
Taichi supports multiple data types, including struct, dataclass, quant, and sparse, and allows you to adjust memory layout flexibly. This feature is extremely desirable when a program handles massive amounts of data. Numba, however, performs best only when dealing with dense NumPy arrays.
Taichi can call different GPU backends for computation, making large-scale parallel programming (such as particle simulation or rendering) as easy as winking. But it would be hard even to imagine writing a renderer in Numba.
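For anyone who hasn't seen it, a minimal Taichi kernel looks roughly like this (my sketch, not from the article; ti.init falls back to CPU if no GPU backend is available):

    import taichi as ti

    ti.init(arch=ti.gpu)  # picks CUDA/Vulkan/Metal if present, otherwise falls back to CPU

    n = 1024
    pixels = ti.field(dtype=ti.f32, shape=(n, n))

    @ti.kernel
    def fill():
        for i, j in pixels:  # outermost loop is automatically parallelized on the backend
            pixels[i, j] = (i + j) / (2 * n)

    fill()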
Except some people don't read the article, and anyone who already assumes NumPy is "very" optimized might gloss over that line without reading much into it. That line also doesn't say that you might get a 10x speed-up using Numba. I remember when I first came across Numba: I searched HN for references and didn't find many stories or comments praising it, so I skipped it initially. Having HN comments might be useful for future HN'ers.
There is Numba, and then there is Nuitka if you want to compile to a binary. I'm not sure the two work together. But Taichi may work with Nuitka for runtime optimization as a binary.
The equivalent of Nuitka for numerical code is Pythran, which compiles Python to highly optimized C++ code. I have been getting the best speedups with the fewest changes when using it, compared to Numba or Cython (haven't tested Taichi yet).
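For the curious, Pythran usage is roughly this (a sketch under my assumptions; the export comment declares the argument types, and the file compiles ahead of time to a native extension module):

    # cosdist.py -- compile with: pythran -O3 -march=native cosdist.py
    # pythran export cosine_distance(float64[], float64[])
    import numpy as np

    def cosine_distance(u, v):
        # Ordinary NumPy code; Pythran translates it to C++ at compile time.
        return 1.0 - np.dot(u, v) / np.sqrt(np.dot(u, u) * np.dot(v, v))

After compiling, "import cosdist" picks up the native module instead of the pure-Python file.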
Do jax/tensorflow/pytorch work with numba? I.e. can you pass one of their arrays through a numba function and have it (a) not crash (b) support backprop?
1) "Diffussion" is species vs time equals species spatial laplacian.
2) The "reaction" equations are non-painfully derived from Baez stochastic Petri nets/chemical reaction networks in [1] (species vs time = multivariate polynomial in species, "space dependant rate equation")
So Reaction-Diffusion is just adding up. Species vs time = species spatial laplacian plus multivariate polynomial in species. One more for the toolbox!
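In symbols, my reading of the above (u_i are the species concentrations, D_i the diffusion coefficients, and f_i the polynomial coming from the reaction network):

    % reaction-diffusion: rate of change = diffusion term + reaction term
    \frac{\partial u_i}{\partial t} = D_i \nabla^2 u_i + f_i(u_1, \ldots, u_n)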
It's just alternating between sharpening and smoothing. Sharpening hallucinates new information; smoothing diffuses and erases it. So there is this constant interplay, with new hallucinations laid down over previous ones.
Mathematicians and biologists have a hammer, so everything looks like a nail.
I'm interested in understanding this comment but I don't know where to start. I love the way Reaction Diffusion simulations look and I've coded it up a few times. But I don't understand what you mean by "species vs time". (Some of the other technical language seems more Google-able, but "species vs time" isn't turning up anything obvious.)
I think GP is referring to the heat equation. "Species" is the concentration of the "stuff" that you're describing mathematically. Call that u(x, t), where x is a spatial coordinate, t is time, and u is a real-valued function.
Then the diffusive part says
du(x, t)/dt = \nabla_x^2 u(x, t).
The \nabla^2 term is the Laplacian: a multivariate form of the second derivative.
The equation says that a short time from now, u(x, t) will change in proportion to the average value of u, calculated over a small ball surrounding the point x, minus the value of u at the point x itself.
If there's less "stuff" in the points that neighbour x than at x itself, the function will decrease over time. Similarly if there's more stuff at the neighbours of x, u(x, t) will increase. This is the basis of diffusive behaviour.
(Edit: I think the equation in the article is wrong, unless I've misunderstood something: they have a delta (first derivative) when they should have a nabla (laplacian))
An extremely interesting area. I keep wanting to use it for something but haven't had a good use case yet, nor frankly do I think I really understand it.
This code is recursive and generates set partitions for large N values (N larger than 12). It essentially works by skipping small partitions and small subsets to target desirable set partitions; solutions that don't skip those suffer from combinatorial explosion.
I did not write this code. I want to test it later with Taichi, but I'm curious whether Taichi can run it faster.
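The SO link above is truncated, so this is only a toy sketch of the idea as I understand it: recursive partition generation that prunes any branch whose blocks can no longer all reach a minimum size k.

    def partitions_min_size(items, k):
        # Assign each item to an existing block or a new block, pruning branches
        # where the remaining items can't bring every block up to size k.
        def rec(i, blocks):
            remaining = len(items) - i
            deficit = sum(max(0, k - len(b)) for b in blocks)
            if deficit > remaining:
                return  # prune: too many undersized blocks left to fill
            if i == len(items):
                yield [list(b) for b in blocks]
                return
            x = items[i]
            for b in blocks:
                b.append(x)
                yield from rec(i + 1, blocks)
                b.pop()
            blocks.append([x])
            yield from rec(i + 1, blocks)
            blocks.pop()
        yield from rec(0, [])

    for p in partitions_min_size([1, 2, 3, 4], 2):
        print(p)  # only partitions whose blocks all have at least 2 elements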
Slightly off topic but the choice of name is interesting given that Tai chi is well-known for its slow movements and being practiced by the elderly at the park.
I practice tai chi and I'm not elderly (though not exactly young either :-) ). Tai chi is actually very hard to do well because it requires a lot of flexibility (I mean a lot).
You should see what tai chi lessons in Chinese colleges look like: full of students who don't like any kind of sport but must choose a PE class. And yes, I was one of them.
Unfortunately, Pythran is missing from the comparison. Pythran works in a lot of cases and is easy to use: you just declare Python types. I would like to see a comparison with Taichi, as Taichi also seems interesting.
What do you mean, disappointing? I have consistently been getting the best results with Pythran. That said, it is strongly focused on numerical code, so your mileage may vary for other code. You should also add compiler optimisation flags to get the best performance.
Regarding Nuitka, AFAIK its goal is not speed-up, and performance gains are pretty modest in most cases.
I thought it was about a parsing issue in Python when doing "import taichi as ti" vs. "import taichi". No, it's just presenting Taichi, a Python package for parallel computation.
EDIT: title of the thread was "Accelerate Python code 100x by import taichi as ti" like TFA
Me too - it wouldn't be unheard of in a language where referencing multiple.levels.of.variable in a loop is orders of magnitude slower than doing "a = multiple.levels.of.variable" outside the loop and referencing a inside of it.
*may have been fixed in recent versions of Python - I heard of this many years ago!
Isn’t that expected behaviour, as you’re only looking up “a” once when you do it outside the loop, while doing it every time when inside the loop?
Because any reference in the whole hierarchy could change during the looping (e.g. one could say “multiple.levels = {}” at some point), the interpreter really would need to check it every time unless it can somehow “prove” that these changes will never happen / haven’t happened.
Just keeping a reference to “a” is semantically very different, and I’d consider that a normal optimisation.
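A quick way to see the effect being discussed (my own sketch; the gap is real in CPython, though how large depends on the version):

    import timeit

    class Levels:
        pass

    multiple = Levels()
    multiple.levels = Levels()
    multiple.levels.of = Levels()
    multiple.levels.of.variable = 42

    def lookup_in_loop():
        total = 0
        for _ in range(1_000_000):
            total += multiple.levels.of.variable  # three attribute lookups per iteration
        return total

    def hoisted():
        a = multiple.levels.of.variable  # looked up once, before the loop
        total = 0
        for _ in range(1_000_000):
            total += a
        return total

    print(timeit.timeit(lookup_in_loop, number=1))
    print(timeit.timeit(hoisted, number=1))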
https://en.m.wikipedia.org/wiki/Nuitka
1) "Diffussion" is species vs time equals species spatial laplacian.
2) The "reaction" equations are non-painfully derived from Baez stochastic Petri nets/chemical reaction networks in [1] (species vs time = multivariate polynomial in species, "space dependant rate equation")
So Reaction-Diffusion is just adding up. Species vs time = species spatial laplacian plus multivariate polynomial in species. One more for the toolbox!
[1] https://arxiv.org/abs/1209.3632
Mathematicians and biologists have a hammer, so everything looks like a nail.
How much faster can the code in those SO answers be?
https://stackoverflow.com/questions/73473074/speed-up-set-pa...
Was Nuitka better? Pythran is quite simple to install and use in Jupyter.