The following code should print 1, 2, 3 on every line. However, it prints 0, 0, 0 instead.
It seems this only started happening after I updated to v0.6.18.
```python
import taichi as ti

ti.init(arch=ti.cpu)

real = ti.f32
mat = ti.var(real, shape=(3, 16))  # 3 x 16 field of f32

@ti.kernel
def do_something():
    for r in range(4):  # each r fills columns 4*r .. 4*r+3
        c = r * 4
        for i in ti.static(range(4)):  # unrolled at compile time
            mat[0, c], mat[1, c], mat[2, c] = 1.0, 2.0, 3.0
            c += 1
    for c in range(16):
        print(mat[0, c], mat[1, c], mat[2, c])  # expected: 1, 2, 3 on every line

if __name__ == "__main__":
    do_something()
```
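Because `ti.static(range(4))` unrolls the inner loop at compile time, the four assignments and the four `c += 1` updates should run in program order, so every column `c` in `0..15` ends up holding `(1.0, 2.0, 3.0)`. As a reference for the intended result, here is a plain-Python model of the kernel (no Taichi involved; `expected_mat` is a hypothetical helper name):

```python
def expected_mat(rows=3, cols=16, blocks=4, width=4):
    """Plain-Python model of do_something(): what mat should hold afterwards."""
    mat = [[0.0] * cols for _ in range(rows)]  # fields start zero-initialized
    for r in range(blocks):
        c = r * width
        for _ in range(width):  # stands in for the unrolled ti.static loop
            mat[0][c], mat[1][c], mat[2][c] = 1.0, 2.0, 3.0
            c += 1
    return mat


if __name__ == "__main__":
    mat = expected_mat()
    for c in range(16):
        print(mat[0][c], mat[1][c], mat[2][c])  # each line: 1.0 2.0 3.0
```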
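Minimal reproducible program (MRP):

```python
import taichi as ti

ti.init(arch=ti.cpu, advanced_optimization=False, print_ir=True)

x = ti.var(ti.i32, shape=4)

@ti.kernel
def func():
    c = 0
    for i in ti.static(range(4)):  # unrolled: four stores, four increments
        x[c] = 1  # should write x[0], x[1], x[2], x[3] in turn
        c += 1

func()
print(x.to_numpy())  # expected: [1 1 1 1]
```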
IR (printed with `print_ir=True`):

```
[I 07/12/20 10:38:27.916] [compile_to_offloads.cpp:operator()@24] Offloaded:
kernel {
  $0 = offloaded
  body {
    <i32 x1> $1 = alloca
    <i32 x1> $2 = const [0]
    <i32 x1> $3 : local store [$1 <- $2]
    <i32 x1> $4 = const [1]
    <i32 x1> $5 = local load [ [$1[0]]]
    <i32*x1> $6 = global ptr [S2place_i32], index [$5] activate=true
    <i32*x1> $7 : global store [$6 <- $4] # <--- 1
    <i32 x1> $8 = alloca
    <i32 x1> $9 = atomic add($1, $4)
    <i32 x1> $10 : local store [$8 <- $9]
    <i32 x1> $11 = local load [ [$1[0]]]
    <i32*x1> $12 = global ptr [S2place_i32], index [$11] activate=true
    <i32*x1> $13 : global store [$12 <- $4] # <--- 2
    <i32 x1> $14 = alloca
    <i32 x1> $15 = atomic add($1, $4)
    <i32 x1> $16 : local store [$14 <- $15]
    <i32 x1> $17 = local load [ [$1[0]]]
    <i32*x1> $18 = global ptr [S2place_i32], index [$17] activate=true
    <i32*x1> $19 : global store [$18 <- $4] # <--- 3
    <i32 x1> $20 = alloca
    <i32 x1> $21 = atomic add($1, $4)
    <i32 x1> $22 : local store [$20 <- $21]
    <i32 x1> $23 = local load [ [$1[0]]]
    <i32*x1> $24 = global ptr [S2place_i32], index [$23] activate=true
    <i32*x1> $25 : global store [$24 <- $4] # <--- 4
    <i32 x1> $26 = alloca
    <i32 x1> $27 = atomic add($1, $4)
    <i32 x1> $28 : local store [$26 <- $27]
  }
}
```
```
[I 07/12/20 10:38:27.917] [compile_to_offloads.cpp:operator()@24] Optimized by CFG:
kernel {
  $0 = offloaded
  body {
    <i32 x1> $1 = alloca
    <i32 x1> $2 = const [0]
    <i32 x1> $3 : local store [$1 <- $2]
    <i32 x1> $4 = const [1]
    <i32*x1> $5 = global ptr [S2place_i32], index [$2] activate=true
    <i32*x1> $6 : global store [$5 <- $4] # <--- 1
    <i32 x1> $7 = atomic add($1, $4)
    <i32 x1> $8 = local load [ [$1[0]]]
    <i32*x1> $9 = global ptr [S2place_i32], index [$8] activate=true
    <i32 x1> $10 = atomic add($1, $4)
    <i32 x1> $11 = local load [ [$1[0]]]
    <i32*x1> $12 = global ptr [S2place_i32], index [$11] activate=true
    <i32 x1> $13 = atomic add($1, $4)
    <i32 x1> $14 = local load [ [$1[0]]]
    <i32*x1> $15 = global ptr [S2place_i32], index [$14] activate=true
    <i32*x1> $16 : global store [$15 <- $4] # <--- 4
  }
}
```
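Comparing the two dumps, the stores marked `# <--- 2` and `# <--- 3` are gone after the CFG pass, so only `x[0]` (index `$2 = 0`) and `x[3]` (index `$14`, loaded after three increments) are written. One plausible reading, offered as a toy model and not as a description of Taichi's actual CFG pass, is that dead-store elimination treated every `x[local load $1]` as the same address even though an `atomic add($1, $4)` changes `$1` between the loads. The sketch below runs a backward dead-store elimination over a simplified, hypothetical instruction format for the MRP, once with a naive alias key (the name of the loaded variable) and once with a key that tracks the variable's value through the increments:

```python
# Toy model (hypothetical IR format), not Taichi's CFG pass.
#   ("set", var, k)   local store of constant k into var
#   ("add", var, k)   atomic add of constant k to var
#   ("store", var)    global store x[<load var>] = 1
PROGRAM = [("set", "c", 0)] + [("store", "c"), ("add", "c", 1)] * 4


def index_keys(program, track_adds):
    """Return an alias key for the index of every ("store", var)."""
    known = {}  # var -> constant value, when the pass can prove it
    keys = []
    for op in program:
        if op[0] == "set":
            known[op[1]] = op[2]
        elif op[0] == "add":
            if track_adds and op[1] in known:
                known[op[1]] += op[2]  # value-aware: var is now var + k
            else:
                known.pop(op[1], None)  # naive: var's value is forgotten
        else:  # "store"
            # Unknown values collapse onto one symbolic key ("load", var);
            # treating that key as "same address" is the suspected bug.
            keys.append(known.get(op[1], ("load", op[1])))
    return keys


def dead_store_elim(keys):
    """Backward pass: a store is dead if a later store has the same key.
    (Safe here only because the toy program never reads x in between.)"""
    live, seen = [], set()
    for i in reversed(range(len(keys))):
        if keys[i] not in seen:
            live.append(i)
            seen.add(keys[i])
    return sorted(live)


def execute(program, live, n=4):
    """Interpret the program, performing only the stores listed in `live`."""
    x, env, store_no = [0] * n, {}, 0
    for op in program:
        if op[0] == "set":
            env[op[1]] = op[2]
        elif op[0] == "add":
            env[op[1]] += op[2]
        else:
            if store_no in live:
                x[env[op[1]]] = 1
            store_no += 1
    return x


if __name__ == "__main__":
    for track in (False, True):
        live = set(dead_store_elim(index_keys(PROGRAM, track)))
        print("track_adds =", track, "->", execute(PROGRAM, live))
    # track_adds = False -> [1, 0, 0, 1]
    # track_adds = True -> [1, 1, 1, 1]
```

The naive key reproduces the pattern in the optimized IR (only the first and last stores survive), while tracking the increments keeps all four stores.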
@xumingkuan
I see. A systematic solution would be to implement `value_diff` and use it to improve `alias_analysis`, but that will take a lot of time. I'll write a hotfix for now.
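A rough illustration of why knowing the *difference* between two index values helps alias analysis (this is an assumption about the general idea, not the design of Taichi's `value_diff`; `Linear`, `Alias`, `alias` and `can_kill_earlier_store` are hypothetical names): express each index as a symbolic base plus a constant offset. In the MRP the four stores write `x[c0]`, `x[c0 + 1]`, `x[c0 + 2]` and `x[c0 + 3]`, so any two of them differ by a nonzero constant and provably never alias, which means none of them can be removed as a dead store.

```python
# Hypothetical sketch; not Taichi's value_diff / alias_analysis.
from dataclasses import dataclass
from enum import Enum


class Alias(Enum):
    NO = "no alias"      # provably different addresses
    MUST = "must alias"  # provably the same address
    MAY = "may alias"    # unknown: an optimization must stay conservative


@dataclass(frozen=True)
class Linear:
    """Index expressed as base + offset, e.g. c0 after `offset` increments."""
    base: str    # symbolic starting value, e.g. "c0"
    offset: int  # constant distance from that base


def alias(a: Linear, b: Linear) -> Alias:
    if a.base != b.base:
        return Alias.MAY  # different bases: their difference is unknown
    diff = a.offset - b.offset  # the "value diff" of the two indices
    return Alias.MUST if diff == 0 else Alias.NO


def can_kill_earlier_store(earlier: Linear, later: Linear) -> bool:
    """Dead-store elimination may drop `earlier` only on a MUST alias."""
    return alias(earlier, later) is Alias.MUST


if __name__ == "__main__":
    stores = [Linear("c0", k) for k in range(4)]  # x[c0], ..., x[c0 + 3]
    print([alias(stores[i], stores[3]).value for i in range(3)])
    # ['no alias', 'no alias', 'no alias'] -> no earlier store is dead
```

The design choice that matters is the conservative `MAY` case: whenever the difference cannot be computed, the pass has to keep both stores rather than assume they hit the same address.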
BTW, I thought the CFG pass was part of `advanced_optimization`, and I did turn that off?
Oh, we run a CFG optimization pass even when `advanced_optimization=False`...
Let's design optimization levels later. If there are 3 levels, which level do you think CFG should belong to?
I don't know... Currently, as discussed at https://github.com/taichi-dev/taichi/pull/1470#issue-447815733, it's non-trivial (and probably even ill-defined) to implement something like `optimization_level=0`. We're also still refactoring the IR. Maybe we should design optimization levels once the IR is more mature.