Taichi: [Bug] [ir] a simple for loop in function fails

Created on 2 Jun 2020 · 6 Comments · Source: taichi-dev/taichi

Describe the bug

import taichi as ti

ti.init(print_ir=True)

m = ti.var(ti.f32, 3)
x = ti.var(ti.f32, ())

@ti.func
def func(a):
  for j in range(1):
    a = a

@ti.kernel
def kern1():
  a = x[None]
  for i in m:
    func(a)

@ti.kernel
def kern2():
  a = x[None]
  for i in m:
    for j in range(1):
      a = a


kern1() # Error
kern2() # OK

Log/Screenshots

[Taichi] mode=release
[Taichi] version 0.6.7, supported archs: [cpu, cuda, opengl], commit ca4d9dda, python 3.8.2
[I 06/02/20 18:33:39.403] [compile_to_offloads.cpp:operator()@21] Initial IR:
kernel {
  $0 = alloca @tmp4
  @tmp4 = gbl load #@tmp2[]
  $2 : for @tmp5 where S2place_f32 active {
    $3 = alloca @tmp7
    @tmp7 = @tmp4
    $5 : for @tmp8 in range((cast_value<int32> 0), (cast_value<int32> 1)) {
      @tmp7 = @tmp7
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Lowered:
kernel {
  $0 = alloca
  <f32 x1> $1 = global ptr [S4place_f32], index [] activate=true
  $2 = global load $1
  $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    $5 = loop $4 index 0
    $6 = alloca
    $7 = local load [ [$0[0]]]
    $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      $14 = loop $13 index 0
      $15 = local load [ [$6[0]]]
      $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Typechecked:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      <i32 x1> $14 = loop $13 index 0
      <f32 x1> $15 = local load [ [$6[0]]]
      <f32 x1> $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Loop Vectorized:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      <i32 x1> $14 = loop $13 index 0
      <f32 x1> $15 = local load [ [$6[0]]]
      <f32 x1> $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Loop Split:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      <i32 x1> $14 = loop $13 index 0
      <f32 x1> $15 = local load [ [$6[0]]]
      <f32 x1> $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Simplified I:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = const [1]
    $11 : for in range($9, $10, step 1) {
      <i32 x1> $12 = loop $11 index 0
      <f32 x1> $13 = local load [ [$6[0]]]
      <f32 x1> $14 : local store [$6 <- $13]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Dense struct-for demoted:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  <i32 x1> $4 = const [0]
  <i32 x1> $5 = const [4]
  $6 : for in range($4, $5, step 1) {
    <i32 x1> $7 = const [0]
    <i32 x1> $8 = loop $6 index 0
    <i32 x1> $9 = const [-1]
    <i32 x1> $10 = bit_extract($8 + 0, 0~2)
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = mul $10 $11
    <i32 x1> $13 = add $7 $12
    <i32 x1> $14 = const [3]
    <i32 x1> $15 = cmp_lt $13 $14
    <i32 x1> $16 = bit_and $9 $15
    <i32 x1> $17 = alloca
    <i32 x1> $18 : local store [$17 <- $13]
    $19 : if $16 {
      <i32 x1> $20 = local load [ [$17[0]]]
      <f32 x1> $21 = alloca
      <f32 x1> $22 = local load [ [$0[0]]]
      <f32 x1> $23 : local store [$21 <- $22]
      <i32 x1> $24 = const [0]
      <i32 x1> $25 = const [1]
      $26 : for in range($24, $25, step 1) {
        <i32 x1> $27 = loop $26 index 0
        <f32 x1> $28 = local load [ [$21[0]]]
        <f32 x1> $29 : local store [$21 <- $28]
      }
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Constant extracted:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32 x1> $6 = alloca
  <f32*x1> $7 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $8 = global load $7
  <f32 x1> $9 : local store [$6 <- $8]
  <i32 x1> $10 = const [0]
  <i32 x1> $11 = const [4]
  $12 : for in range($10, $11, step 1) {
    <i32 x1> $13 = loop $12 index 0
    <i32 x1> $14 = bit_extract($13 + 0, 0~2)
    <i32 x1> $15 = mul $14 $3
    <i32 x1> $16 = add $5 $15
    <i32 x1> $17 = cmp_lt $16 $2
    <i32 x1> $18 = bit_and $4 $17
    <i32 x1> $19 = alloca
    <i32 x1> $20 : local store [$19 <- $16]
    $21 : if $18 {
      <i32 x1> $22 = local load [ [$19[0]]]
      <f32 x1> $23 = alloca
      <f32 x1> $24 = local load [ [$6[0]]]
      <f32 x1> $25 : local store [$23 <- $24]
      $26 : for in range($1, $0, step 1) {
        <i32 x1> $27 = loop $26 index 0
        <f32 x1> $28 = local load [ [$23[0]]]
        <f32 x1> $29 : local store [$23 <- $28]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Store forwarded:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32*x1> $6 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $7 = global load $6
  <i32 x1> $8 = const [0]
  <i32 x1> $9 = const [4]
  $10 : for in range($8, $9, step 1) {
    <i32 x1> $11 = loop $10 index 0
    <i32 x1> $12 = bit_extract($11 + 0, 0~2)
    <i32 x1> $13 = mul $12 $3
    <i32 x1> $14 = add $5 $13
    <i32 x1> $15 = cmp_lt $14 $2
    <i32 x1> $16 = bit_and $4 $15
    $17 : if $16 {
      <f32 x1> $18 = alloca
      <f32 x1> $19 : local store [$18 <- $7]
      $20 : for in range($1, $0, step 1) {
        <i32 x1> $21 = loop $20 index 0
        <f32 x1> $22 = local load [ [$18[0]]]
        <f32 x1> $23 : local store [$18 <- $22]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Access lowered:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32*x1> $6 = global ptr [S4place_f32], index [] activate=true
  <gen*x1> $7 = get root
  <i32 x1> $8 = linearized(ind {}, stride {})
  <gen*x1> $9 = [S0root][root]::lookup($7, $8) activate = false
  <gen*x1> $10 = get child [S0root->S3dense] $9
  <i32 x1> $11 = linearized(ind {}, stride {})
  <gen*x1> $12 = [S3dense][dense]::lookup($10, $11) activate = false
  <f32*x1> $13 = get child [S3dense->S4place_f32] $12
  <f32 x1> $14 = shuffle $13[0]
  <f32 x1> $15 = global load $14
  <i32 x1> $16 = const [0]
  <i32 x1> $17 = const [4]
  $18 : for in range($16, $17, step 1) {
    <i32 x1> $19 = loop $18 index 0
    <i32 x1> $20 = bit_extract($19 + 0, 0~2)
    <i32 x1> $21 = mul $20 $3
    <i32 x1> $22 = add $5 $21
    <i32 x1> $23 = cmp_lt $22 $2
    <i32 x1> $24 = bit_and $4 $23
    $25 : if $24 {
      <f32 x1> $26 = alloca
      <f32 x1> $27 : local store [$26 <- $15]
      $28 : for in range($1, $0, step 1) {
        <i32 x1> $29 = loop $28 index 0
        <f32 x1> $30 = local load [ [$26[0]]]
        <f32 x1> $31 : local store [$26 <- $30]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] DIE:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32*x1> $6 = global ptr [S4place_f32], index [] activate=true
  <gen*x1> $7 = get root
  <i32 x1> $8 = linearized(ind {}, stride {})
  <gen*x1> $9 = [S0root][root]::lookup($7, $8) activate = false
  <gen*x1> $10 = get child [S0root->S3dense] $9
  <i32 x1> $11 = linearized(ind {}, stride {})
  <gen*x1> $12 = [S3dense][dense]::lookup($10, $11) activate = false
  <f32*x1> $13 = get child [S3dense->S4place_f32] $12
  <f32 x1> $14 = shuffle $13[0]
  <f32 x1> $15 = global load $14
  <i32 x1> $16 = const [0]
  <i32 x1> $17 = const [4]
  $18 : for in range($16, $17, step 1) {
    <i32 x1> $19 = loop $18 index 0
    <i32 x1> $20 = bit_extract($19 + 0, 0~2)
    <i32 x1> $21 = mul $20 $3
    <i32 x1> $22 = add $5 $21
    <i32 x1> $23 = cmp_lt $22 $2
    <i32 x1> $24 = bit_and $4 $23
    $25 : if $24 {
      <f32 x1> $26 = alloca
      <f32 x1> $27 : local store [$26 <- $15]
      $28 : for in range($1, $0, step 1) {
        <f32 x1> $29 = local load [ [$26[0]]]
        <f32 x1> $30 : local store [$26 <- $29]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Simplified II:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <f32*x1> $3 = global ptr [S4place_f32], index [] activate=true
  <gen*x1> $4 = get root
  <gen*x1> $5 = [S0root][root]::lookup($4, $1) activate = false
  <gen*x1> $6 = get child [S0root->S3dense] $5
  <gen*x1> $7 = [S3dense][dense]::lookup($6, $1) activate = false
  <f32*x1> $8 = get child [S3dense->S4place_f32] $7
  <f32 x1> $9 = global load $8
  <i32 x1> $10 = const [4]
  $11 : for in range($1, $10, step 1) {
    <i32 x1> $12 = loop $11 index 0
    <i32 x1> $13 = bit_extract($12 + 0, 0~2)
    <i32 x1> $14 = cmp_lt $13 $2
    $15 : if $14 {
      <f32 x1> $16 = alloca
      <f32 x1> $17 : local store [$16 <- $9]
      $18 : for in range($1, $0, step 1) {
        <f32 x1> $19 = local load [ [$16[0]]]
        <f32 x1> $20 : local store [$16 <- $19]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Access flagged:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <f32*x1> $3 = global ptr [S4place_f32], index [] activate=false
  <gen*x1> $4 = get root
  <gen*x1> $5 = [S0root][root]::lookup($4, $1) activate = false
  <gen*x1> $6 = get child [S0root->S3dense] $5
  <gen*x1> $7 = [S3dense][dense]::lookup($6, $1) activate = false
  <f32*x1> $8 = get child [S3dense->S4place_f32] $7
  <f32 x1> $9 = global load $8
  <i32 x1> $10 = const [4]
  $11 : for in range($1, $10, step 1) {
    <i32 x1> $12 = loop $11 index 0
    <i32 x1> $13 = bit_extract($12 + 0, 0~2)
    <i32 x1> $14 = cmp_lt $13 $2
    $15 : if $14 {
      <f32 x1> $16 = alloca
      <f32 x1> $17 : local store [$16 <- $9]
      $18 : for in range($1, $0, step 1) {
        <f32 x1> $19 = local load [ [$16[0]]]
        <f32 x1> $20 : local store [$16 <- $19]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Constant folded:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <f32*x1> $3 = global ptr [S4place_f32], index [] activate=false
  <gen*x1> $4 = get root
  <gen*x1> $5 = [S0root][root]::lookup($4, $1) activate = false
  <gen*x1> $6 = get child [S0root->S3dense] $5
  <gen*x1> $7 = [S3dense][dense]::lookup($6, $1) activate = false
  <f32*x1> $8 = get child [S3dense->S4place_f32] $7
  <f32 x1> $9 = global load $8
  <i32 x1> $10 = const [4]
  $11 : for in range($1, $10, step 1) {
    <i32 x1> $12 = loop $11 index 0
    <i32 x1> $13 = bit_extract($12 + 0, 0~2)
    <i32 x1> $14 = cmp_lt $13 $2
    $15 : if $14 {
      <f32 x1> $16 = alloca
      <f32 x1> $17 : local store [$16 <- $9]
      $18 : for in range($1, $0, step 1) {
        <f32 x1> $19 = local load [ [$16[0]]]
        <f32 x1> $20 : local store [$16 <- $19]
      }
    }
  }
}
[I 06/02/20 18:33:39.406] [compile_to_offloads.cpp:operator()@21] Offloaded:
kernel {
  $0 = offloaded  {
    <i32 x1> $1 = const [1]
    <i32 x1> $2 = const [0]
    <i32 x1> $3 = const [3]
    <f32*x1> $4 = global ptr [S4place_f32], index [] activate=false
    <gen*x1> $5 = get root
    <gen*x1> $6 = [S0root][root]::lookup($5, $2) activate = false
    <gen*x1> $7 = get child [S0root->S3dense] $6
    <gen*x1> $8 = [S3dense][dense]::lookup($7, $2) activate = false
    <f32*x1> $9 = get child [S3dense->S4place_f32] $8
    <f32 x1> $10 = global load $9
    <i32 x1> $11 = const [4]
  }
  $12 = offloaded range_for(0, 4) block_dim=adaptive {
    <i32 x1> $13 = loop $12 index 0
    <i32 x1> $14 = bit_extract($13 + 0, 0~2)
    <i32 x1> $15 = const [3]
    <i32 x1> $16 = cmp_lt $14 $15
    $17 : if $16 {
      <f32 x1> $18 = alloca
      <f32 x1> $19 : local store [$18 <- $10]
      <i32 x1> $20 = const [0]
      <i32 x1> $21 = const [1]
      $22 : for in range($20, $21, step 1) {
        <f32 x1> $23 = local load [ [$18[0]]]
        <f32 x1> $24 : local store [$18 <- $23]
      }
    }
  }
}
[E 06/02/20 18:33:39.406] [verify.cpp:basic_verify@39] stmt 19 cannot have operand 10.


***********************************
* Taichi Compiler Stack Traceback *                                                          
***********************************                                                          
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)                                                                                   
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::basic_verify(taichi::lang::Stmt*)                                              
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::LocalStoreStmt*)                                           
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::BasicStmtVisitor::visit(taichi::lang::IfStmt*)                                             
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::irpass::analysis::verify(taichi::lang::IRNode*)                                            
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::irpass::compile_to_offloads(taichi::lang::IRNode*, taichi::lang::CompileConfig const&, bool, bool, bool, bool, bool)                                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Kernel::lower(bool)                                                                        
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Program::compile(taichi::lang::Kernel&)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Kernel::compile()                                                                          
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Kernel::operator()()                                                                       
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x6ec234) [0x7f989ebd6234]                                                                               
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x655d50) [0x7f989eb3fd50]                                                                               
/usr/lib/libpython3.8.so.1.0: PyCFunction_Call                                               
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall                                           
/usr/lib/libpython3.8.so.1.0(+0xfeb1d) [0x7f98aeaaeb1d]                                      
/usr/lib/libpython3.8.so.1.0: PyObject_Call                                                  
/usr/lib/libpython3.8.so.1.0(+0xb121b) [0x7f98aea6121b]                                      
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall                                           
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0: PyObject_Call                                                  
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0: _PyObject_FastCallDict                                         
/usr/lib/libpython3.8.so.1.0: _PyObject_Call_Prepend                                         
/usr/lib/libpython3.8.so.1.0(+0x23d0e9) [0x7f98aebed0e9]                                     
/usr/lib/libpython3.8.so.1.0: PyObject_Call                                                  
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode                                                
/usr/lib/libpython3.8.so.1.0(+0x2668c8) [0x7f98aec168c8]
/usr/lib/libpython3.8.so.1.0(+0x26aba3) [0x7f98aec1aba3]
/usr/lib/libpython3.8.so.1.0: PyRun_FileExFlags
/usr/lib/libpython3.8.so.1.0: PyRun_SimpleFileExFlags
/usr/lib/libpython3.8.so.1.0: Py_RunMain
/usr/lib/libpython3.8.so.1.0: Py_BytesMain
/usr/lib/libc.so.6: __libc_start_main
python(_start+0x2e) [0x55a034ce805e]
Traceback (most recent call last):
  File "bug.py", line 27, in <module>
    kern1()
  File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 533, in wrapped
    return primal(*args, **kwargs)
  File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 464, in __call__
    return self.compiled_functions[key](*args)
  File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 428, in func__
    t_kernel()
RuntimeError: [verify.cpp:basic_verify@39] stmt 19 cannot have operand 10.
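Reading the final "Offloaded" IR, the failure appears to be a scoping violation. `$19 : local store [$18 <- $10]` sits inside the second offloaded task (`$12 = offloaded range_for`), while `$10 = global load $9` is defined inside the first one (`$0 = offloaded`), and a value defined in one offloaded task is not visible in another. Below is a toy Python model of that kind of check, not Taichi's verify.cpp. The `Stmt` class, the `verify`/`check` helpers, and the simplified statement ids are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Stmt:
    """Toy IR statement: an id, the ids it reads, and an optional nested body."""
    id: int
    operands: List[int] = field(default_factory=list)
    body: Optional[List["Stmt"]] = None


class VerifyError(RuntimeError):
    pass


def verify(block: List[Stmt], outer: Optional[List[Set[int]]] = None) -> None:
    """Raise if any operand is not defined earlier in this block or an enclosing one."""
    scopes = (outer or []) + [set()]  # innermost scope last; `outer` is not mutated
    for stmt in block:
        for op in stmt.operands:
            if not any(op in scope for scope in scopes):
                raise VerifyError(f"stmt {stmt.id} cannot have operand {op}.")
        # Visible to later statements and to its own body (e.g. "loop $12 index 0").
        scopes[-1].add(stmt.id)
        if stmt.body is not None:
            verify(stmt.body, scopes)


def check(block: List[Stmt]) -> Optional[str]:
    """Return the verifier message, or None if the block is well-scoped."""
    try:
        verify(block)
    except VerifyError as err:
        return str(err)
    return None


# Simplified final IR of kern1: each offloaded task is its own scope.
kern1 = [
    Stmt(0, body=[                    # $0 = offloaded { ... }
        Stmt(9),                      # $9  = get child ...  (operands omitted)
        Stmt(10, [9]),                # $10 = global load $9
    ]),
    Stmt(12, body=[                   # $12 = offloaded range_for(0, 4)
        Stmt(13, [12]),               # $13 = loop $12 index 0
        Stmt(16, [13]),               # $16 = cmp_lt ...  (simplified)
        Stmt(17, [16], body=[         # $17 : if $16 { ... }
            Stmt(18),                 # $18 = alloca
            Stmt(19, [18, 10]),       # $19 : local store [$18 <- $10]
        ]),
    ]),
]

print(check(kern1))  # stmt 19 cannot have operand 10.
```

In this model, moving the definition of `$10` into the second task (or reloading it there) makes `check` return `None`.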

To Reproduce
Run the above code.

If you have local commits (e.g. compile fixes before you reproduce the bug), please make sure you first make a PR to fix the build errors and then report the bug.

Labels: ir, potential bug, welcome contribution

All 6 comments

If this is a syntax error on my part, could we improve the error message? I can't do my homework now.

@xumingkuan It seems function arguments can't be gtmp (global temporary variables)? Here is the final IR of both kernels:

kernel "kern1" {
  $0 = offloaded  {
    <i32 x1> $1 = const [1]
    <i32 x1> $2 = const [0]
    <i32 x1> $3 = const [3]
    <f32*x1> $4 = global ptr [S4place_f32], index [] activate=false
    <gen*x1> $5 = get root
    <gen*x1> $6 = [S0root][root]::lookup($5, $2) activate = false
    <gen*x1> $7 = get child [S0root->S3dense] $6
    <gen*x1> $8 = [S3dense][dense]::lookup($7, $2) activate = false
    <f32*x1> $9 = get child [S3dense->S4place_f32] $8
    <f32 x1> $10 = global load $9
    <i32 x1> $11 = const [4]
  }
  $12 = offloaded range_for(0, 4) block_dim=adaptive {
    <i32 x1> $13 = loop $12 index 0
    <i32 x1> $14 = bit_extract($13 + 0, 0~2)
    <i32 x1> $15 = const [3]
    <i32 x1> $16 = cmp_lt $14 $15
    $17 : if $16 {
      <f32 x1> $18 = alloca
      <f32 x1> $19 : local store [$18 <- $10]
      <i32 x1> $20 = const [0]
      <i32 x1> $21 = const [1]
      $22 : for in range($20, $21, step 1) {
        <f32 x1> $23 = local load [ [$18[0]]]
        <f32 x1> $24 : local store [$18 <- $23]
      }
    }
  }
}

kernel "kern2" {
  $0 = offloaded  {
    <i32 x1> $1 = const [1]
    <i32 x1> $2 = const [0]
    <i32 x1> $3 = const [3]
    <f32*x1> $4 = global tmp var (offset = 0 B)
    <f32 x1> $5 = const [0.0]
    <f32*x1> $6 : global store [$4 <- $5]
    <f32*x1> $7 = global ptr [S4place_f32], index [] activate=false
    <gen*x1> $8 = get root
    <gen*x1> $9 = [S0root][root]::lookup($8, $2) activate = false
    <gen*x1> $10 = get child [S0root->S3dense] $9
    <gen*x1> $11 = [S3dense][dense]::lookup($10, $2) activate = false
    <f32*x1> $12 = get child [S3dense->S4place_f32] $11
    <f32 x1> $13 = global load $12
    <f32*x1> $14 = global tmp var (offset = 0 B)
    <f32*x1> $15 : global store [$14 <- $13]
    <i32 x1> $16 = const [4]
  }
  $17 = offloaded range_for(0, 4) block_dim=adaptive {
    <i32 x1> $18 = loop $17 index 0
    <i32 x1> $19 = bit_extract($18 + 0, 0~2)
    <i32 x1> $20 = const [3]
    <i32 x1> $21 = cmp_lt $19 $20
    $22 : if $21 {
      <i32 x1> $23 = const [0]
      <i32 x1> $24 = const [1]
      $25 : for in range($23, $24, step 1) {
        <f32*x1> $26 = global tmp var (offset = 0 B)
        <f32 x1> $27 = global load $26
        <f32*x1> $28 = global tmp var (offset = 0 B)
        <f32*x1> $29 : global store [$28 <- $27]
      }
    }
  }
}
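The comparison suggests why only kern1 fails. In kern2, the first offloaded task writes `a` to a `global tmp var (offset = 0 B)` and the range_for task reads it back with `global load`, whereas in kern1 the range_for task refers to `$10` from the first task directly. Below is a hypothetical sketch of that kind of promotion, assuming a simplified model where each offloaded task is a flat list of statements and tasks run in order. `promote_cross_task_values`, the op strings, and the 4-byte slot size are illustrative, not Taichi's implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class Stmt:
    id: int
    op: str
    operands: List[int] = field(default_factory=list)


def promote_cross_task_values(tasks: List[List[Stmt]]) -> List[List[Stmt]]:
    """Route values used across offloaded tasks through a global temporary buffer.

    Toy model: each task is a flat list of statements, and tasks run one after
    another, so a value stored to the buffer in an earlier task can be loaded
    back in a later one.
    """
    owner: Dict[int, int] = {s.id: t for t, task in enumerate(tasks) for s in task}
    fresh = count(max(owner, default=-1) + 1)  # ids for the statements we create
    offsets: Dict[int, int] = {}  # value id -> byte offset in the buffer
    for t, task in enumerate(tasks):
        for s in task:
            for v in s.operands:
                if owner[v] != t and v not in offsets:
                    offsets[v] = 4 * len(offsets)  # assume 4-byte (f32/i32) slots

    result: List[List[Stmt]] = []
    for t, task in enumerate(tasks):
        new_task: List[Stmt] = []
        for s in task:
            remap: Dict[int, int] = {}
            for v in s.operands:
                if owner[v] != t and v not in remap:
                    # Reload the value from the buffer right before this use.
                    load = Stmt(next(fresh), f"global load gtmp+{offsets[v]}")
                    new_task.append(load)
                    remap[v] = load.id
            new_task.append(Stmt(s.id, s.op, [remap.get(v, v) for v in s.operands]))
            if s.id in offsets:
                # Spill the value to the buffer right after it is defined.
                new_task.append(Stmt(next(fresh), f"global store gtmp+{offsets[s.id]}", [s.id]))
        result.append(new_task)
    return result


# Simplified final IR of kern1, one list per offloaded task.
kern1_tasks = [
    [Stmt(9, "get child"), Stmt(10, "global load", [9])],
    [Stmt(18, "alloca"), Stmt(19, "local store", [18, 10])],
]

for task in promote_cross_task_values(kern1_tasks):
    print([(s.id, s.op, s.operands) for s in task])
# [(9, 'get child', []), (10, 'global load', [9]), (20, 'global store gtmp+0', [10])]
# [(18, 'alloca', []), (21, 'global load gtmp+0', []), (19, 'local store', [18, 21])]
```

After the rewrite, `$19` only reads statements from its own task, which is the shape kern2's IR already has.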

New MRE (minimal reproducible example):

import taichi as ti

ti.init(print_ir=True, print_preprocessed=True)

m = ti.var(ti.f32, 3)
x = ti.var(ti.f32, ())

@ti.kernel
def kern():
  a = x[None]
  for i in m:
    b = a
    for j in range(1):
      b = b

kern()

Could we add some unit tests for the "Store forwarded" pass? It is what caused the error.
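A toy model of the forwarding step shows what changes between the "Typechecked" and "Store forwarded" dumps: the `local load` of the outer alloca inside the struct-for is replaced by the outer `global load`, so the loop body now references a value defined outside the loop. This is a sketch under assumptions, not Taichi's pass: the `Stmt` class, the op strings, and the "top-level alloca stored exactly once, before every load" rule are simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Stmt:
    id: int
    op: str
    operands: List[int] = field(default_factory=list)
    body: Optional[List["Stmt"]] = None


def preorder(block: List[Stmt]) -> Iterator[Stmt]:
    """Yield statements in program order, descending into nested bodies."""
    for s in block:
        yield s
        if s.body is not None:
            yield from preorder(s.body)


def forward_single_stores(block: List[Stmt]) -> List[Stmt]:
    """Toy store forwarding for top-level allocas written exactly once.

    If the only store to an alloca is a top-level statement and every load of
    that alloca comes after it in program order, each load can be replaced by
    the stored value, and the alloca, store and loads become dead.
    """
    order = {s.id: i for i, s in enumerate(preorder(block))}
    top_level = {s.id for s in block}
    stores: Dict[int, List[Stmt]] = {}
    loads: Dict[int, List[Stmt]] = {}
    for s in preorder(block):
        if s.op == "local store":      # operands: [alloca, value]
            stores.setdefault(s.operands[0], []).append(s)
        elif s.op == "local load":     # operands: [alloca]
            loads.setdefault(s.operands[0], []).append(s)

    replace: Dict[int, int] = {}  # load id -> forwarded value id
    dead: Set[int] = set()
    for s in block:
        if s.op != "alloca" or len(stores.get(s.id, [])) != 1:
            continue
        store = stores[s.id][0]
        uses = loads.get(s.id, [])
        if store.id not in top_level or any(order[ld.id] < order[store.id] for ld in uses):
            continue
        for ld in uses:
            replace[ld.id] = store.operands[1]
        dead.update({s.id, store.id}, (ld.id for ld in uses))

    def resolve(v: int) -> int:
        while v in replace:  # follow chains such as load -> load -> value
            v = replace[v]
        return v

    def rewrite(blk: List[Stmt]) -> List[Stmt]:
        out = []
        for s in blk:
            if s.id in dead:
                continue
            body = rewrite(s.body) if s.body is not None else None
            out.append(Stmt(s.id, s.op, [resolve(v) for v in s.operands], body))
        return out

    return rewrite(block)


# Simplified "Typechecked" IR of kern1 (global ptr, const and cast statements omitted).
kern1 = [
    Stmt(0, "alloca"),
    Stmt(2, "global load"),
    Stmt(3, "local store", [0, 2]),            # a = x[None]
    Stmt(4, "struct for", body=[
        Stmt(6, "alloca"),
        Stmt(7, "local load", [0]),
        Stmt(8, "local store", [6, 7]),        # func's copy of a
        Stmt(13, "range for", body=[
            Stmt(15, "local load", [6]),
            Stmt(16, "local store", [6, 15]),  # a = a
        ]),
    ]),
]

after = forward_single_stores(kern1)
print([s.id for s in after])  # [2, 4]
print(after[1].body[1])       # Stmt(id=8, op='local store', operands=[6, 2], body=None)
```

The forwarded IR is still valid on its own; the inner alloca `$6` is stored twice, so it is left alone, just as `$18` survives in the "Store forwarded" dump. The problem seems to appear only once offloading places `$2` and its use in different tasks, which is why tests covering forwarding followed by offloading would likely catch it.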

Thanks for proposing this! Looks like a bug in offload. Will investigate later.

(base) ➜  ~ python  difftaichi/examples/billiards.py
[Taichi] mode=release
[Taichi] version 0.6.11, supported archs: [cpu, metal], commit 762aca58, python 3.7.3
difftaichi/examples/billiards.py:42: PendingDeprecationWarning: @ti.layout will be deprecated in the future, use ti.root directly to specify data layout anytime before the data structure materializes.
  @ti.layout
[E 06/18/20 17:28:09.242] [verify.cpp:basic_verify@39] stmt 5249 cannot have operand 4663.



                            * Taichi Core - Stack Traceback *
==========================================================================================
|                       Module |  Offset | Function                                      |
|----------------------------------------------------------------------------------------|
*               taichi_core.so |     110 | taichi::Logger::error(std::__1::basic_string< |
                                         | char, std::__1::char_traits<char>, std::__1:: |
                                         | allocator<char> > const&, bool)               |
*               taichi_core.so |    1136 | taichi::lang::IRVerifier::basic_verify(taichi |
                                         | ::lang::Stmt*)                                |
*               taichi_core.so |      25 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :LocalLoadStmt*)                              |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |      42 | taichi::lang::BasicStmtVisitor::visit(taichi: |
                                         | :lang::IfStmt*)                               |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |     111 | taichi::lang::irpass::analysis::verify(taichi |
                                         | ::lang::IRNode*)                              |
*               taichi_core.so |    1007 | taichi::lang::irpass::compile_to_offloads(tai |
                                         | chi::lang::IRNode*, taichi::lang::CompileConf |
                                         | ig const&, bool, bool, bool, bool, bool)      |
*               taichi_core.so |     280 | taichi::lang::Kernel::lower(bool)             |
*               taichi_core.so |     211 | taichi::lang::Program::compile(taichi::lang:: |
                                         | Kernel&)                                      |
*               taichi_core.so |      62 | taichi::lang::Kernel::compile()               |
*               taichi_core.so |     132 | taichi::lang::Kernel::operator()()            |
*               taichi_core.so |     103 | void pybind11::cpp_function::initialize<taich |
                                         | i::export_lang(pybind11::module&)::$_9, void, |
                                         |  taichi::lang::Kernel*, pybind11::name, pybin |
                                         | d11::is_method, pybind11::sibling>(taichi::ex |
                                         | port_lang(pybind11::module&)::$_9&&, void (*) |
                                         | (taichi::lang::Kernel*), pybind11::name const |
                                         | &, pybind11::is_method const&, pybind11::sibl |
                                         | ing const&)::'lambda'(pybind11::detail::funct |
                                         | ion_call&)::__invoke(pybind11::detail::functi |
                                         | on_call&)                                     |
*               taichi_core.so |    4075 | pybind11::cpp_function::dispatcher(_object*,  |
                                         | _object*, _object*)                           |
*                       python |     437 | (null)                                        |
*                       python |     111 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |     370 | (null)                                        |
*                       python |     179 | (null)                                        |
*                       python |     453 | (null)                                        |
*                       python |   46151 | (null)                                        |
*                       python |     414 | (null)                                        |
*                       python |     231 | (null)                                        |
*                       python |   46712 | (null)                                        |
*                       python |     414 | (null)                                        |
*                       python |     231 | (null)                                        |
*                       python |     189 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |   46712 | (null)                                        |
*                       python |     117 | (null)                                        |
*                       python |     183 | (null)                                        |
*                       python |   45942 | (null)                                        |
*                       python |     117 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |     245 | (null)                                        |
*                       python |   44976 | (null)                                        |
*                       python |     117 | (null)                                        |
*                       python |     183 | (null)                                        |
*                       python |   46151 | (null)                                        |
*                       python |     414 | (null)                                        |
*                       python |     256 | (null)                                        |
*                       python |     391 | (null)                                        |
*                       python |    9663 | (null)                                        |
*                       python |     125 | (null)                                        |
*                libdyld.dylib |       1 | (null)                                        |
*                          ??? |       2 | (null)                                        |
==========================================================================================


Internal Error occurred, check this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
  File "difftaichi/examples/billiards.py", line 216, in <module>
    optimize()
  File "difftaichi/examples/billiards.py", line 173, in optimize
    forward(visualize=True, output=output)
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/tape.py", line 18, in __exit__
    self.grad()
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/tape.py", line 27, in grad
    func.grad(*args)
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/kernel.py", line 459, in __call__
    return self.compiled_functions[key](*args)
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/kernel.py", line 423, in func__
    t_kernel()
RuntimeError: [verify.cpp:basic_verify@39] stmt 5249 cannot have operand 4663.