Taichi: [Bug] [ir] "continue" not working properly in non-dense struct-for

Created on 2 Jul 2020  路  7Comments  路  Source: taichi-dev/taichi

Describe the bug
I found that continue is not working well in struct-for on bitmasked structure.

To Reproduce

import taichi as ti
ti.init()
ti.core.toggle_advanced_optimization(False)

x = ti.var(ti.f32)
ti.root.bitmasked(ti.i, 32).place(x)

@ti.kernel
def error():
    for i in x:
        if i == 0:
            continue
        print(i)

@ti.kernel
def good():
    for i in x:
        if i == 0:
            pass
        else:
            print(i)


x[0] = 3
x[1] = 3
x[2] = 3
x[5] = 3
x[10] = 3
print('error:')
error()
print('good:')
good()

Log/Screenshots

[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-nd61fh7e
[Taichi] <dev mode>, llvm 10.0.0, commit 67c6ea62, python 3.8.3
[Taichi] Starting on arch=x64
error:
good:
1
2
5
10

If you have local commits (e.g. compile fixes before you reproduce the bug), please make sure you first make a PR to fix the build errors and then report the bug.

ir potential bug

Most helpful comment

Thank you guys for all the discussions here! I'll take these into consideration when I refactor the LLVM backend/runtime...

All 7 comments

I found this is caused by ContinueStmt::as_return().

@ti.kernel
def error():
    for i in x:
        if i == 0:
            continue
        print(i)

is compiled as the same as

@ti.kernel
def error():
    for i in x:
        if i == 0:
            return
        print(i)

which should be OK if the for is parallel. But it seems that the loop is serial. Why?

The IR looks pretty good:

[I 07/02/20 22:58:34.340] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_a4464fe7c75e1f42a3a490ee54c7ec3e>::operator ()@23] Simplified III:
kernel {
  $0 = offloaded clear_list S1bitmasked
  $1 = offloaded listgen S0root->S1bitmasked
  $2 = offloaded struct_for(S1bitmasked) block_dim=adaptive
  body {
    <i32 x1> $3 = loop $2 index 0
    <i32 x1> $4 = const [0]
    <i32 x1> $5 = const [1]
    <i32 x1> $6 = cmp_eq $3 $4
    <i32 x1> $7 = bit_and $5 $6
    $8 : if $7 {
      $9 continue (scope=$2)
    }
    print $3, "
"
  }
}

For a better feeling of return, try

@ti.kernel
def error():
    for i in x:
        if i == 2:
            continue
        print(i)

This is a problem that is subject to backend implementation.

We translated the top-level continue as return because in most cases, each kernel function handles one index group. However, the struct-for function on LLVM has a loop that can handle more than one indices, see https://github.com/taichi-dev/taichi/blob/ad3fe31c0d5e0762efe1da61859629420b61a884/taichi/codegen/codegen_llvm.cpp#L1314-L1319

So in this case, we have two options:

  1. Make the loop body into a function. This will make the LLVM struct-for's impl consistent with the other backends' impl
  2. Treat continue in LLVM struct-for as a special case.

I prefer the second option: it makes sense to me if we treat continue in LLVM struct-for as a real continue.

I prefer the second option: it makes sense to me if we treat continue in LLVM struct-for as a real continue.

Talked more on this later yesterday. The problem was that range_for does follow this way to create a function wrapper: https://github.com/taichi-dev/taichi/blob/4e37b9a3d93ac85297b9db2a1b15e16d165d4fa3/taichi/backends/cpu/codegen_cpu.cpp#L41-L51

so we cannot use continue in that case.

I think @yuanming-hu will do some refactoring to the LLVM backend to make this more consistent. Maybe we should wait for that before this fix. Meanwhile, maybe I can rename as_return() to from_kernel() or something, so that each backend impl can handle the top-level continue in their most suited ways.

Thank you guys for all the discussions here! I'll take these into consideration when I refactor the LLVM backend/runtime...

Was this page helpful?
0 / 5 - 0 ratings