Taichi: Using a condition in a kernel crashes Taichi

Created on 24 Feb 2020 · 9 Comments · Source: taichi-dev/taichi

I'm not sure if I'm using taichi incorrectly, but the following program crashes:

import taichi as ti

ti.init(arch=ti.cuda) # Run on GPU by default

n = 100
pixels = ti.var(dt=ti.i32, shape=(n, n))

@ti.kernel
def paint(t: ti.i32):
    if t < 100:
        for i, j in pixels:
            pixels[i, j] = 1

gui = ti.GUI("Automata", (n, n))
for t in range(n):
    paint(t)
    gui.set_image(pixels)
    gui.show()

[Release mode]
[Taichi version 0.5.2, cuda 10.1, commit 4d56959a]
[E 02/23/20 18:21:46.543] [statements.inc.h:visit@19] Not supported.
[E 02/23/20 18:21:46.543] Received signal 6 (Aborted)
***********************************
* Taichi Compiler Stack Traceback *
***********************************
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::signal_handler(int)
/lib64/libc.so.6(+0x361e0) [0x7f468dc031e0]
/lib64/libpthread.so.0: raise
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::IRVisitor::visit(taichi::Tlang::StructForStmt*)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::CodeGenLLVM::visit(taichi::Tlang::Block*)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::CodeGenLLVM::visit(taichi::Tlang::IfStmt*)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::CodeGenLLVM::visit(taichi::Tlang::Block*)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::CodeGenLLVMGPU::visit(taichi::Tlang::OffloadedStmt*)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::CodeGenLLVM::visit(taichi::Tlang::Block*)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::GPUCodeGen::codegen_llvm()
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::KernelCodeGen::compile(taichi::Tlang::Program&, taichi::Tlang::Kernel&)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::Program::compile(taichi::Tlang::Kernel&)
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::Kernel::compile()
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::Kernel::operator()()
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so(+0xd78034) [0x7f466558c034]
/home/sci/karthik/.virtualenvs/deep/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so(+0xb25494) [0x7f4665339494]
/usr/lib64/libpython3.6m.so.1.0: _PyCFunction_FastCallDict
/usr/lib64/libpython3.6m.so.1.0: _PyObject_FastCallDict
/usr/lib64/libpython3.6m.so.1.0: _PyObject_Call_Prepend
/usr/lib64/libpython3.6m.so.1.0: PyObject_Call
/usr/lib64/libpython3.6m.so.1.0(+0x147c40) [0x7f468e2ecc40]
/usr/lib64/libpython3.6m.so.1.0: _PyObject_FastCallDict
/usr/lib64/libpython3.6m.so.1.0(+0x18884d) [0x7f468e32d84d]
/usr/lib64/libpython3.6m.so.1.0: _PyEval_EvalFrameDefault
/usr/lib64/libpython3.6m.so.1.0: PyEval_EvalCodeEx
/usr/lib64/libpython3.6m.so.1.0(+0x1103c3) [0x7f468e2b53c3]
/usr/lib64/libpython3.6m.so.1.0: PyObject_Call
/usr/lib64/libpython3.6m.so.1.0: _PyEval_EvalFrameDefault
/usr/lib64/libpython3.6m.so.1.0: _PyFunction_FastCallDict
/usr/lib64/libpython3.6m.so.1.0: _PyObject_FastCallDict
/usr/lib64/libpython3.6m.so.1.0: _PyObject_Call_Prepend
/usr/lib64/libpython3.6m.so.1.0: PyObject_Call
/usr/lib64/libpython3.6m.so.1.0(+0x147c40) [0x7f468e2ecc40]
/usr/lib64/libpython3.6m.so.1.0: _PyObject_FastCallDict
/usr/lib64/libpython3.6m.so.1.0(+0x18884d) [0x7f468e32d84d]
/usr/lib64/libpython3.6m.so.1.0: _PyEval_EvalFrameDefault
/usr/lib64/libpython3.6m.so.1.0: PyEval_EvalCodeEx
/usr/lib64/libpython3.6m.so.1.0: PyEval_EvalCode
/usr/lib64/libpython3.6m.so.1.0(+0x208d62) [0x7f468e3add62]
/usr/lib64/libpython3.6m.so.1.0: PyRun_FileExFlags
/usr/lib64/libpython3.6m.so.1.0: PyRun_SimpleFileExFlags
/usr/lib64/libpython3.6m.so.1.0: Py_Main
python(main+0x1e8) [0x564d106f5d18]
/lib64/libc.so.6: __libc_start_main
python(_start+0x2a) [0x564d106f5e8a]

Removing the condition from the kernel lets the program run:

@ti.kernel
def paint(t: ti.i32):
    for i, j in pixels:
        pixels[i, j] = 1

Additionally, the condition in this code never evaluates to false, even when the t passed in is less than 50:

@ti.kernel
def paint(t: ti.i32):
    if t > 50:
        return
    for i, j in pixels:
        pixels[i, j] = 1

Is this a bug, or am I misunderstanding how conditions work in taichi?

potential bug stale

medakk

All 9 comments

Hi,

For your first case,

if t < 100:
    for i, j in pixels:
        pixels[i, j] = 1

The for i, j in pixels: loop is what Taichi calls a struct-for loop (see https://taichi.readthedocs.io/en/latest/hello.html#parallel-for-loops). There are two things Taichi does not support here:

Taichi only parallelizes loops at the outermost scope automatically. In this case the outermost scope is an if, so the entire kernel would map to a 1x1x1 CUDA kernel.
Taichi only allows struct-for loops at the outermost scope. Internally, Taichi applies a series of transformations to figure out how to parallelize this kind of loop correctly. That's why removing the if made the program work.

TL;DR: Use struct-for loops only at the outermost scope.

As for the third case:

@ti.kernel
def paint(t: ti.i32):
    if t > 50:
        return
    for i, j in pixels:
        pixels[i, j] = 1

I printed out the IR and it looks something like this:

[T 02/24/20 11:19:00.434] [codegen_x86.cpp:lower_llvm@22] Initial IR:
==========
kernel {
  if (arg[0] > 100) {
  }
}
# That's it, statements following the return are gone...

I guess Taichi cannot handle return properly yet, perhaps similar to how ti.func cannot handle multiple return values (https://taichi.readthedocs.io/en/latest/syntax.html#functions). We will need @yuanming-hu to confirm.

FYI, you can print out the Taichi IR with something like:

ti.init(..., print_ir=True)
ti.set_logging_level(ti.TRACE)  # finest grained logging level

@yuanming-hu, do you think we should clarify in the docs that it is the loop at the outermost scope that gets parallelized, not the outermost loop?

k-ye on 24 Feb 2020

👍3

As @k-ye said, return is not yet supported in Taichi kernels. However, you can write the result to a global scalar and read it from Python scope.

Also,

@ti.kernel
def paint(t: ti.i32):
    if t < 100:
        for i, j in pixels:
            pixels[i, j] = 1

paint(t)

could be transformed into a working version:

@ti.kernel
def paint(t: ti.i32):
    for i, j in pixels:
        pixels[i, j] = 1

if t < 100:
    paint(t)
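
If the condition has to stay inside the kernel, one possible alternative (not confirmed in this thread, so treat it as an assumption) is to keep the loop at the outermost scope and move the if into the loop body. The sketch below uses a plain dict keyed by (i, j) as a stand-in for the Taichi field so that it runs without Taichi installed; in a real program paint would carry the @ti.kernel decorator with t: ti.i32, and pixels would be the ti.var field from the original post.

```python
n = 4

# Stand-in for the Taichi field (assumption for illustration only):
# a dict keyed by (i, j), so `for i, j in pixels` and `pixels[i, j]`
# read the same way as in the kernels above.
pixels = {(i, j): 0 for i in range(n) for j in range(n)}


def paint(t):  # in Taichi: @ti.kernel, with the signature paint(t: ti.i32)
    # The loop stays at the outermost scope of the function body.
    for i, j in pixels:
        # The condition is evaluated per element inside the loop.
        if t < 100:
            pixels[i, j] = 1
```

Every element still evaluates the condition in this form, so checking t in Python scope, as in the version just above, avoids launching the kernel at all when t >= 100.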

yuanming-hu on 24 Feb 2020

👍2

@yuanming-hu, do you think we should clarify in the docs that it is the loop at the outermost scope that gets parallelized, not the outermost loop?

Good idea! I'm rushing to remove the unified memory dependency right now and will work on this later. If you have a chance, please go ahead.

yuanming-hu on 24 Feb 2020

👍1

OK, SG :-)

k-ye on 24 Feb 2020

Got it! Thank you for the detailed responses!

medakk on 24 Feb 2020

🎉1

I'm trying to split some of my code into a ti.func, but the generated IR seems incorrect (compare the if statements in the original code with those in the IR):


@ti.func
def rule_30(a: ti.i32, b: ti.i32, c: ti.i32) -> ti.i32:
    if a == 1:
        if b == 0 and c == 0:
            return 1
        else:
            return 0
    else:
        if b == 0 and c == 0:
            return 0
        else:
            return 1

@ti.kernel
def paint(t: ti.i32):
    for i in range(1, n-1):
        a = cells[t-1, i-1] 
        b = cells[t-1, i] 
        c = cells[t-1, i+1] 

        cells[t, i] = rule_30(a, b, c)

IR: (there is no IR for the ti.func "rule_30". Presumably it was optimized out and inlined into the kernel)

[T 02/23/20 21:09:59.335] [/home/karthik/.virtualenvs/automata/lib/python3.6/site-packages/taichi/lang/kernel.py:__call__@347] Compiling kernel paint_c8_0_...
[T 02/23/20 21:09:59.343] [codegen_x86.cpp:lower_llvm@620] Initial IR:
==========
kernel {
  for @tmp7 in range((cast<int32> 1), (cast<int32> 199)) {
    $1 = alloca @tmp9
    @tmp9 = gbl load #@tmp0[(arg[0] - 1), (@tmp7 - 1)]
    $3 = alloca @tmp10
    @tmp10 = gbl load #@tmp0[(arg[0] - 1), @tmp7]
    $5 = alloca @tmp11
    @tmp11 = gbl load #@tmp0[(arg[0] - 1), (@tmp7 + 1)]
    $7 = alloca @tmp12
    @tmp12 = @tmp9
    $9 = alloca @tmp13
    @tmp13 = @tmp10
    $11 = alloca @tmp14
    @tmp14 = @tmp11
    if (@tmp12 == 1) {
      if ((@tmp13 == 0) & (@tmp14 == 0)) {
        #@tmp0[arg[0], @tmp7] = 1
      }
    }
  }
}

If I copy and paste the function's body into paint(), I get the correct IR:

@ti.kernel
def paint(t: ti.i32):
    for i in range(1, n-1):
        a = cells[t-1, i-1] 
        b = cells[t-1, i] 
        c = cells[t-1, i+1] 

        if a == 1:
            if b == 0 and c == 0:
                cells[t, i] = 1
            else:
                cells[t, i] = 0
        else:
            if b == 0 and c == 0:
                cells[t, i] = 0
            else:
                cells[t, i] = 1

IR:

[T 02/23/20 21:14:34.735] [/home/karthik/.virtualenvs/automata/lib/python3.6/site-packages/taichi/lang/kernel.py:__call__@347] Compiling kernel paint_c8_0_...
[T 02/23/20 21:14:34.744] [codegen_x86.cpp:lower_llvm@620] Initial IR:
==========
kernel {
  for @tmp7 in range((cast<int32> 1), (cast<int32> 199)) {
    $1 = alloca @tmp9
    @tmp9 = gbl load #@tmp0[(arg[0] - 1), (@tmp7 - 1)]
    $3 = alloca @tmp10
    @tmp10 = gbl load #@tmp0[(arg[0] - 1), @tmp7]
    $5 = alloca @tmp11
    @tmp11 = gbl load #@tmp0[(arg[0] - 1), (@tmp7 + 1)]
    if (@tmp9 == 1) {
      if ((@tmp10 == 0) & (@tmp11 == 0)) {
        #@tmp0[arg[0], @tmp7] = 1
      } else {
        #@tmp0[arg[0], @tmp7] = 0
      }
    } else {
      if ((@tmp10 == 0) & (@tmp11 == 0)) {
        #@tmp0[arg[0], @tmp7] = 0
      } else {
        #@tmp0[arg[0], @tmp7] = 1
      }
    }
  }
}

Here is the complete program:

import taichi as ti

ti.init(arch=ti.cuda, print_ir=True) # Run on GPU by default
ti.set_logging_level(ti.TRACE)

n = 200
cells = ti.var(dt=ti.i32, shape=(n, n))

img_n = 700
pixels = ti.var(dt=ti.f32, shape=(img_n, img_n))

@ti.kernel
def gen_image():
    for i, j in pixels:
        x = (n * i) // img_n
        y = (n * j) // img_n
        pixels[i, j] = cells[x, y]

@ti.kernel
def init_cells():
    for i, j in cells:
        cells[i, j] = 0

    cells[0, n//2] = 1

@ti.func
def rule_30(a: ti.i32, b: ti.i32, c: ti.i32) -> ti.i32:
    if a == 1:
        if b == 0 and c == 0:
            return 1
        else:
            return 0
    else:
        if b == 0 and c == 0:
            return 0
        else:
            return 1

@ti.kernel
def paint(t: ti.i32):
    for i in range(1, n-1):
        a = cells[t-1, i-1] 
        b = cells[t-1, i] 
        c = cells[t-1, i+1] 

        cells[t, i] = rule_30(a, b, c)

gui = ti.GUI("Automata", (img_n, img_n))

init_cells()
for i in range(1, n):
    paint(i)
    gen_image()
    gui.set_image(pixels)
    gui.show()

gui.set_image(pixels)
# gui.wait_key()

medakk on 24 Feb 2020

Multi-return value in ti.func is not supported yet: https://taichi.readthedocs.io/en/latest/syntax.html#functions. Could you try using a local variable to hold the return value?
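
A minimal sketch of that suggestion, assuming the function is otherwise unchanged: each branch assigns to a local variable (named result here, a hypothetical name) and the function has a single return at the end. The @ti.func decorator is left out so the snippet runs as plain Python; it would need to be added back before calling rule_30 from a kernel.

```python
def rule_30(a, b, c):  # in Taichi: decorate with @ti.func
    # Define the local before the branches so every path assigns to it.
    result = 0
    if a == 1:
        if b == 0 and c == 0:
            result = 1
        else:
            result = 0
    else:
        if b == 0 and c == 0:
            result = 0
        else:
            result = 1
    # Single exit point instead of one return per branch.
    return result
```

The kernel call cells[t, i] = rule_30(a, b, c) stays the same.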

k-ye on 24 Feb 2020

Oops, I misunderstood what was meant by "multi-return value" in functions. Thank you!

medakk on 24 Feb 2020

🎉3

Warning: The issue has been out-of-update for 50 days, marking stale.