Describe the bug
As mentioned in https://forum.taichi.graphics/t/homework-0-volumetric-clouds/331/5.
According to the Taichi profiler, the main overhead seems to be in to_numpy and set_image.
Log/Screenshots
CUDA Profiler
[ 3.01%] paint min 0.873 ms avg 0.880 ms max 0.904 ms total 0.048 s [ 54x]
[ 61.51%] set_image min 16.123 ms avg 18.003 ms max 20.595 ms total 0.972 s [ 54x]
[ 35.48%] to_numpy min 10.060 ms avg 10.383 ms max 12.256 ms total 0.561 s [ 54x]
x64 Profiler
[ 8.97%] paint min 1.605 ms avg 2.198 ms max 3.506 ms total 0.174 s [ 79x]
[ 14.85%] to_numpy min 2.205 ms avg 3.639 ms max 6.288 ms total 0.287 s [ 79x]
[ 76.18%] set_image min 16.575 ms avg 18.664 ms max 21.561 ms total 1.474 s [ 79x]
OpenGL (mesa) Profiler
[ 0.25%] paint min 0.057 ms avg 0.064 ms max 0.092 ms total 0.007 s [ 112x]
[ 27.97%] to_numpy min 5.478 ms avg 7.261 ms max 11.470 ms total 0.813 s [ 112x]
[ 71.78%] set_image min 17.327 ms avg 18.632 ms max 28.779 ms total 2.087 s [ 112x]
OpenGL (NVIDIA) Profiler
[ 0.19%] paint min 0.062 ms avg 0.067 ms max 0.097 ms total 0.006 s [ 86x]
[ 52.33%] to_numpy min 16.925 ms avg 18.609 ms max 28.687 ms total 1.600 s [ 86x]
[ 47.48%] set_image min 15.515 ms avg 16.886 ms max 19.896 ms total 1.452 s [ 86x]
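For scale, here is a rough back-of-the-envelope estimate. pixels holds 1280 x 720 three-component float32 vectors, so each to_numpy call returns about 11 MB. The sketch below divides that size by the average to_numpy times reported above to get an effective throughput per backend. It assumes each call copies exactly one float32 frame and ignores any other work to_numpy does, so treat the numbers as an order-of-magnitude guide only.

```python
# Size of one frame of `pixels`: 1280x720 vectors of 3 float32 components.
WIDTH, HEIGHT, CHANNELS, BYTES_PER_F32 = 1280, 720, 3, 4
frame_bytes = WIDTH * HEIGHT * CHANNELS * BYTES_PER_F32

# Average to_numpy times in milliseconds, copied from the profiler output above.
avg_to_numpy_ms = {
    "CUDA": 10.383,
    "x64": 3.639,
    "OpenGL (mesa)": 7.261,
    "OpenGL (NVIDIA)": 18.609,
}


def effective_bandwidth_gb_s(nbytes, ms):
    """Bytes moved per second, in GB/s (1 GB = 1e9 bytes)."""
    return nbytes / (ms / 1e3) / 1e9


print(f"frame size: {frame_bytes} bytes ({frame_bytes / 1e6:.2f} MB)")
for backend, ms in avg_to_numpy_ms.items():
    print(f"{backend:16s} {effective_bandwidth_gb_s(frame_bytes, ms):.2f} GB/s")
```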
To Reproduce
import taichi as ti
import numpy as np

# Change arch (e.g. ti.cuda, ti.opengl) to reproduce the other profiles above.
ti.init(arch=ti.cpu, print_ir=True)

res = 1280, 720
pixels = ti.Vector(3, dt=ti.f32, shape=res)


@ti.kernel
def paint(size_x: ti.template(), size_y: ti.template()):
    for i, j in pixels:
        u = i / size_x
        v = j / size_y
        pixels[i, j] = [u, v, 0]


# Warm-up calls, so that lazy JIT compilation is not included in the profile.
paint(res[0], res[1])
pixels.to_numpy()

if __name__ == '__main__':
    gui = ti.GUI('UV', res)
    while not gui.get_event(ti.GUI.ESCAPE):
        ti.profiler_start('paint')
        paint(res[0], res[1])
        ti.sync()
        ti.profiler_stop()

        ti.profiler_start('to_numpy')
        arr = pixels.to_numpy()
        ti.sync()
        ti.profiler_stop()

        ti.profiler_start('set_image')
        gui.set_image(arr)
        ti.sync()
        ti.profiler_stop()

        gui.show()
    ti.profiler_print()
It can also be reproduced in v0.5.11.
Diving into set_image:
x64 Profiler
[ 9.59%] paint min 1.602 ms avg 2.187 ms max 4.732 ms total 0.276 s [ 126x]
[ 16.85%] to_numpy min 2.560 ms avg 3.845 ms max 6.964 ms total 0.484 s [ 126x]
[ 0.04%] cook_image min 0.005 ms avg 0.008 ms max 0.028 ms total 0.001 s [ 126x]
[ 30.74%] astype min 5.534 ms avg 7.013 ms max 9.685 ms total 0.884 s [ 126x]
[ 37.10%] reshape min 7.886 ms avg 8.465 ms max 10.017 ms total 1.067 s [ 126x]
[ 5.69%] set_img min 1.133 ms avg 1.299 ms max 2.400 ms total 0.164 s [ 126x]
It seems to be NumPy's fault: astype and reshape alone take about 68% of the profiled time.
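The breakdown does not show exactly which lines of set_image the astype and reshape labels cover, so the following is only a plain NumPy illustration of why such operations can be expensive on a 1280x720x3 frame. A random float32 array stands in for pixels.to_numpy(), and the transpose is an assumption added purely to demonstrate the copying case; it is not taken from set_image. The point is that astype always allocates a full copy, while reshape returns a cheap view only when the memory layout allows it.

```python
import time

import numpy as np

RES = (1280, 720)


def time_op(fn, repeat=20):
    """Return the average wall-clock time of fn() in milliseconds."""
    fn()  # warm-up run, excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat * 1e3


# Stand-in for pixels.to_numpy(): a float32 RGB image of shape (1280, 720, 3).
img = np.random.rand(*RES, 3).astype(np.float32)

# astype copies by default (copy=True), even when the dtype is unchanged.
converted = img.astype(np.float32)
assert not np.shares_memory(converted, img)

# reshape returns a view when the layout allows it...
flat_view = img.reshape(-1)
assert np.shares_memory(flat_view, img)

# ...but a transposed (non-contiguous) array has to be copied to be reshaped.
transposed = img.transpose(1, 0, 2)
flat_copy = transposed.reshape(-1)
assert not np.shares_memory(flat_copy, img)

print(f"astype (copy):         {time_op(lambda: img.astype(np.float32)):.3f} ms")
print(f"reshape (view):        {time_op(lambda: img.reshape(-1)):.3f} ms")
print(f"reshape (forced copy): {time_op(lambda: transposed.reshape(-1)):.3f} ms")
```

Comparing the three timings on the same machine shows how much of the cost comes from allocating and filling new arrays rather than from the arithmetic itself.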
How about rewriting the GUI system in OpenGL with a GLFW backend, and then assigning the buffer with code like ti.GUI.init(buffer)? I guess there might be zero overhead when passing tensors between different GPU backends.
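An OpenGL/GLFW GUI would need changes inside Taichi itself. A much smaller, CPU-only version of the same idea (hand the GUI one persistent buffer instead of building new arrays every frame) can be sketched with NumPy's out= arguments. DisplayBuffer and the 8-bit RGB target are hypothetical choices for illustration and do not describe what ti.GUI does internally.

```python
import numpy as np


class DisplayBuffer:
    """Hypothetical helper: converts float RGB frames into one reused uint8 buffer."""

    def __init__(self, res, channels=3):
        shape = (*res, channels)
        # Both arrays are allocated once and reused for every frame.
        self._scratch = np.empty(shape, dtype=np.float32)
        self.pixels = np.empty(shape, dtype=np.uint8)

    def update(self, rgb):
        # Clamp to [0, 1] directly into the float32 scratch buffer.
        np.clip(rgb, 0.0, 1.0, out=self._scratch)
        # Scale to [0, 255] in place.
        self._scratch *= 255.0
        # Cast (truncating) into the existing uint8 buffer; no new array is created.
        np.copyto(self.pixels, self._scratch, casting="unsafe")
        return self.pixels


res = (1280, 720)
display = DisplayBuffer(res)
for level in (0.0, 0.5, 1.0):
    # Stand-in for pixels.to_numpy() on each frame.
    frame = np.full((*res, 3), level, dtype=np.float32)
    out = display.update(frame)
    assert out is display.pixels  # the same buffer is returned every frame
```

The trade-off is that callers must treat the returned array as overwritten on the next update, which is the usual contract for a persistent frame buffer.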