Taichi: [benchmark] A simple shader spends too much time in set_image & to_numpy

Created on 3 Jun 2020  ·  2 Comments  ·  Source: taichi-dev/taichi

Describe the bug
As mentioned in https://forum.taichi.graphics/t/homework-0-volumetric-clouds/331/5, the main overhead appears to be in to_numpy and set_image, according to the Taichi profiler.

Log/Screenshots

CUDA Profiler
[  3.01%] paint                                        min   0.873 ms   avg   0.880 ms    max   0.904 ms   total   0.048 s [     54x]
[ 61.51%] set_image                                    min  16.123 ms   avg  18.003 ms    max  20.595 ms   total   0.972 s [     54x]
[ 35.48%] to_numpy                                     min  10.060 ms   avg  10.383 ms    max  12.256 ms   total   0.561 s [     54x]
x64 Profiler
[  8.97%] paint                                        min   1.605 ms   avg   2.198 ms    max   3.506 ms   total   0.174 s [     79x]
[ 14.85%] to_numpy                                     min   2.205 ms   avg   3.639 ms    max   6.288 ms   total   0.287 s [     79x]
[ 76.18%] set_image                                    min  16.575 ms   avg  18.664 ms    max  21.561 ms   total   1.474 s [     79x]
OpenGL (mesa) Profiler
[  0.25%] paint                                        min   0.057 ms   avg   0.064 ms    max   0.092 ms   total   0.007 s [    112x]
[ 27.97%] to_numpy                                     min   5.478 ms   avg   7.261 ms    max  11.470 ms   total   0.813 s [    112x]
[ 71.78%] set_image                                    min  17.327 ms   avg  18.632 ms    max  28.779 ms   total   2.087 s [    112x]
OpenGL (NVIDIA) Profiler
[  0.19%] paint                                        min   0.062 ms   avg   0.067 ms    max   0.097 ms   total   0.006 s [     86x]
[ 52.33%] to_numpy                                     min  16.925 ms   avg  18.609 ms    max  28.687 ms   total   1.600 s [     86x]
[ 47.48%] set_image                                    min  15.515 ms   avg  16.886 ms    max  19.896 ms   total   1.452 s [     86x]
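
To put the averages above in perspective, here is a small sketch that sums the three profiled stages per backend and converts the sum into an upper bound on frame rate. The numbers are copied from the profiler output above; the dictionary layout and variable names are only illustrative. Since gui.show() is not profiled, the real frame time is probably somewhat higher.

```python
# Average time per call in ms, copied from the profiler logs above.
avg_ms = {
    "CUDA":            {"paint": 0.880, "to_numpy": 10.383, "set_image": 18.003},
    "x64":             {"paint": 2.198, "to_numpy": 3.639,  "set_image": 18.664},
    "OpenGL (mesa)":   {"paint": 0.064, "to_numpy": 7.261,  "set_image": 18.632},
    "OpenGL (NVIDIA)": {"paint": 0.067, "to_numpy": 18.609, "set_image": 16.886},
}

frame_ms = {}   # total profiled time per frame
max_fps = {}    # upper bound on frames per second (gui.show() is not included)
copy_share = {} # fraction of the frame spent outside the paint kernel

for backend, stages in avg_ms.items():
    total = sum(stages.values())
    frame_ms[backend] = total
    max_fps[backend] = 1000.0 / total
    copy_share[backend] = (total - stages["paint"]) / total
    print(f"{backend:<16} {total:7.3f} ms/frame  "
          f"<= {max_fps[backend]:5.1f} FPS  "
          f"{copy_share[backend]:6.1%} spent in to_numpy + set_image")
```

On every backend the kernel itself is a small fraction of the frame; the data transfer and image conversion dominate.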

To Reproduce

import taichi as ti
import numpy as np

ti.init(arch=ti.cpu, print_ir=True)

res = 1280, 720
pixels = ti.Vector(3, dt=ti.f32, shape=res)

@ti.kernel
def paint(size_x: ti.template(), size_y: ti.template()):
  for i, j in pixels:
    u = i / size_x
    v = j / size_y
    pixels[i, j] = [u, v, 0]

paint(res[0], res[1])  # warm-up call, so JIT compilation is not included in the profile
pixels.to_numpy()  # warm-up call, so the to_numpy kernel is compiled before profiling

if __name__ == '__main__':
  gui = ti.GUI('UV', res)

  while not gui.get_event(ti.GUI.ESCAPE):

    ti.profiler_start('paint')
    paint(res[0], res[1])
    ti.sync()
    ti.profiler_stop()

    ti.profiler_start('to_numpy')
    arr = pixels.to_numpy()
    ti.sync()
    ti.profiler_stop()

    ti.profiler_start('set_image')
    gui.set_image(arr)
    ti.sync()
    ti.profiler_stop()

    gui.show()

ti.profiler_print()

It can also be reproduced in v0.5.11.

GAMES201 potential bug

All 2 comments

Diving into set_image:

x64 Profiler
[  9.59%] paint                                        min   1.602 ms   avg   2.187 ms    max   4.732 ms   total   0.276 s [    126x]
[ 16.85%] to_numpy                                     min   2.560 ms   avg   3.845 ms    max   6.964 ms   total   0.484 s [    126x]
[  0.04%] cook_image                                   min   0.005 ms   avg   0.008 ms    max   0.028 ms   total   0.001 s [    126x]
[ 30.74%] astype                                       min   5.534 ms   avg   7.013 ms    max   9.685 ms   total   0.884 s [    126x]
[ 37.10%] reshape                                      min   7.886 ms   avg   8.465 ms    max  10.017 ms   total   1.067 s [    126x]
[  5.69%] set_img                                      min   1.133 ms   avg   1.299 ms    max   2.400 ms   total   0.164 s [    126x]

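So it seems to be NumPy's fault: astype and reshape together account for about 68% of the total time, while the actual set_img call takes under 6%.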
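
A rough way to check how expensive such full-image NumPy passes are, independent of Taichi, is to time them on an array of the same size. The sketch below is not the actual set_image implementation. The three steps (a dtype conversion, padding RGB to RGBA, and a contiguity check) are assumptions about the kind of work hidden behind the astype and reshape entries, and time_step is a hypothetical helper.

```python
import time
import numpy as np

def time_step(label, fn, repeats=5):
    """Run fn several times; return (last result, best wall-clock time in ms)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    print(f"{label:<12} {best:8.3f} ms")
    return result, best

# Same resolution and channel count as the reproduction script.
width, height = 1280, 720
arr = np.random.rand(width, height, 3).astype(np.float32)

# Assumed step 1: astype copies the whole array by default,
# even when the dtype is already float32.
converted, t_astype = time_step("astype", lambda: arr.astype(np.float32))

# Assumed step 2: padding RGB to RGBA allocates and fills a new array.
padded, t_pad = time_step(
    "pad to RGBA",
    lambda: np.concatenate(
        [converted, np.zeros((width, height, 1), dtype=np.float32)], axis=2
    ),
)

# Assumed step 3: ascontiguousarray returns its input unchanged
# when the data is already C-contiguous.
contiguous, t_contig = time_step(
    "contiguous", lambda: np.ascontiguousarray(padded)
)
```

If the copying steps each cost several milliseconds on your machine, that would be consistent with the profile above: every extra full-image pass in NumPy costs more than the paint kernel itself.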

How about rewriting the GUI system in OpenGL with a GLFW backend, then assigning the buffer with code like ti.GUI.init(buffer)? I guess there might be zero overhead when passing tensors between different GPU backends.
