Taichi: [Perf] Comparing Taichi with Numba

Created on 10 Jun 2020 · 3 Comments · Source: taichi-dev/taichi

A Zhihu user's comment at https://zhuanlan.zhihu.com/p/145222094 shows that Numba is much faster than Taichi on GPUs for my calc_pi example. I have also heard that Numba supports CUDA to some degree.
I am not sure whether this also applies to other applications. If we can reproduce this performance gap on other examples, it is a warning that we may lose users to Numba for Python-embedded parallel computation.
Lacking Numba knowledge, I failed to write a Numba version of simple_uv.py.
Here's a NumPy-only version:

import taichi as ti
import numpy as np

res = 1280, 720


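def paint():
    # UV coordinates in [0, 1] along each image axis
    a = np.linspace(0, 1, res[1])
    b = np.linspace(0, 1, res[0])
    # Both grids have shape (res[0], res[1])
    a, b = np.meshgrid(a, b)
    c = np.zeros((*res, 1))  # constant third channel
    a = a.reshape((*res, 1))
    b = b.reshape((*res, 1))
    # Stack into a (res[0], res[1], 3) image
    return np.concatenate((a, b, c), axis=2)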


gui = ti.GUI('UV', res)
while not gui.get_event(ti.GUI.ESCAPE):
    pixels = paint()
    gui.set_image(pixels)
    gui.show()

This gets ~32 fps on my machine, while Taichi/x64 gets ~51 fps and Taichi/OpenGL gets ~34 fps (because of copying overhead).
For a Numba parallelization example, see https://github.com/numba/numba/issues/3336.
For the Numba docs, see http://numba.pydata.org/numba-doc/latest.
An article about Numba: https://www.jianshu.com/p/69d9d7e37bc5.
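
For comparison, a Numba counterpart of paint() might look like the sketch below. It is untested against the frame rates quoted above and is not taken from the linked articles. The name paint_numba and its width/height parameters are assumptions made for illustration, and the try/except fallback exists only so the snippet still runs as plain Python when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: fall back to plain Python so the sketch runs without Numba.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def paint_numba(width, height):
    # Hypothetical helper: fills the same UV image as the NumPy paint() above.
    pixels = np.zeros((width, height, 3))
    for i in prange(width):  # rows are distributed across threads by Numba
        for j in range(height):
            pixels[i, j, 0] = j / (height - 1)  # varies along the second axis
            pixels[i, j, 1] = i / (width - 1)   # varies along the first axis
    return pixels
```

Inside the GUI loop, `pixels = paint_numba(*res)` would replace the `paint()` call. The first call includes JIT compilation time, so any frame-rate measurement should start after a warm-up call.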

archibate

👍2

All 3 comments

I am also curious about the advantage of taichi compared to numba.
As far as I know, taichi supports sparse computation, which is common and useful in simulation. But for now most examples don't seem to use sparse computation.

lvjiahui on 10 Jun 2020

I am not too surprised by this. The calc_pi example does a lot of atomic adds, which are really slow in Taichi right now. Run the profiler on the mgpcg example and you'll see that the reductions take a very long time. We need some thread-local/shared-memory optimizations to make these faster.

KLozes on 10 Jun 2020

👍2
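
The thread-local idea mentioned in the comment above can be sketched in plain Python: each worker accumulates into its own local variable, and the partial results are combined once at the end instead of every iteration updating one shared accumulator. This is only a structural illustration. The midpoint-rule integral of 4 / (1 + x^2) is an assumed stand-in and not necessarily what calc_pi computes, the function names are hypothetical, and Python threads will not show a real speed-up here because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(start, stop, n):
    # Thread-local accumulator: no shared state is touched inside the loop.
    local = 0.0
    h = 1.0 / n
    for i in range(start, stop):
        x = (i + 0.5) * h
        local += 4.0 / (1.0 + x * x)
    return local * h


def calc_pi_local(n, workers=4):
    # Split [0, n) into one contiguous chunk per worker.
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: partial_sum(b[0], b[1], n), bounds)
        # One combine step per worker instead of one atomic add per iteration.
        return sum(parts)
```

With this structure the number of writes to shared state drops from n to the number of workers, which is the effect a thread-local or shared-memory reduction aims for on a GPU or multicore backend.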

Yeah, I wouldn't worry too much about that - we are adding Thread Local Storage IR to address the reduction performance issue very soon.

Also, TBH, we haven't done a systematic performance study since switching to LLVM - there's a lot of room for performance improvements...

yuanming-hu on 10 Jun 2020

🎉6
