Tvm: [TEXPR][PASS] Loop distribution pass generates incorrect code

Created on 26 Mar 2019 · 6Comments · Source: apache/tvm

With the following input DSL:

import tvm
m = 48
A = tvm.placeholder((m,), name='A', dtype="float16")
C = tvm.compute((m,), lambda i: A[i], name='C')
D = tvm.compute((m,), lambda i: C[i], name='D')

s = tvm.create_schedule(D.op)
# We split the two axis with factors where neither counts the other
co, ci = s[C].split(C.op.axis[0], 10)
do, di = s[D].split(D.op.axis[0], 32)
s[C].compute_at(s[D], do)

bounds = tvm.schedule.InferBound(s)
stmt = tvm.schedule.ScheduleOps(s, bounds)
stmt = tvm.ir_pass.CanonicalSimplify(stmt)
print(stmt)

stmt = tvm.ir_pass.LoopPartition(stmt, True)
stmt = tvm.ir_pass.CanonicalSimplify(stmt)
print(stmt)

The following is the output of first print statement. Code is correct.

// attr [compute(D, 0x1e4d7e0)] realize_scope = ""
realize D([0, 48]) {
  produce D {
    for (i.outer, 0, 2) {
      // attr [compute(C, 0x195b3a0)] realize_scope = ""
      realize C([(i.outer*32), 32]) {
        produce C {
          for (i.outer, 0, 4) {
            for (i.inner, 0, 10) {
              if (likely((((i.outer*10) + i.inner) < 32))) {
                if (likely(((((i.outer*32) + (i.outer*10)) + i.inner) < 48))) {
                  C((((i.outer*32) + (i.outer*10)) + i.inner)) =A((((i.outer*32) + (i.outer*10)) + i.inner))
                }
              }
            }
          }
        }
        for (i.inner, 0, 32) {
          if (likely((((i.outer*32) + i.inner) < 48))) {
            if (likely((((i.outer*32) + i.inner) < 48))) {
              D(((i.outer*32) + i.inner)) =C(((i.outer*32) + i.inner))
            }
          }
        }
      }
    }
  }
}

The following is the output of the second print statement. Code is incorrect in that in the second produce C only 10 (instead of 16) elements of A are copied to C.

// attr [compute(D, 0x1e4d7e0)] realize_scope = ""
realize D([0, 48]) {
  produce D {
    for (i.outer, 0, 1) {
      // attr [compute(C, 0x195b3a0)] realize_scope = ""
      realize C([(i.outer*32), 32]) {
        produce C {
          for (i.outer, 0, 3) {
            for (i.inner, 0, 10) {
              C((((i.outer*32) + (i.outer*10)) + i.inner)) =A((((i.outer*32) + (i.outer*10)) + i.inner))
            }
          }
          for (i.inner, 0, 2) {
            C(((30 + (i.outer*32)) + i.inner)) =A(((30 + (i.outer*32)) + i.inner))
          }
        }
        for (i.inner, 0, 32) {
          D(((i.outer*32) + i.inner)) =C(((i.outer*32) + i.inner))
        }
      }
    }
    // attr [compute(C, 0x195b3a0)] realize_scope = ""
    realize C([32, 32]) {
      produce C {
        for (i.outer, 0, 1) {
          for (i.inner, 0, 10) {
            C(((32 + (i.outer*10)) + i.inner)) =A(((32 + (i.outer*10)) + i.inner))
          }
        }
      }
      for (i.inner, 0, 16) {
        D((32 + i.inner)) =C((32 + i.inner))
      }
    }
  }
}

help wanted bug

Source

derisavi

Most helpful comment

I'm working on a fix now. In addition to this problem, I'm trying to fix general problems that I have found in the code as well. I will submit a PR as soon as I have it ready (likely in the next day or two).

To answer your question, I don't know whether this has been caused by recent changes in loop_partition.cc or not.

derisavi on 26 Mar 2019

👍2

All 6 comments

Thanks for reporting this, we should fix this asap. Let us see if it is caused by some of the recent changes https://github.com/dmlc/tvm/commits/master/src/pass/loop_partition.cc or was just something we overlooked.

tqchen on 26 Mar 2019

To answer your question, I don't know whether this has been caused by recent changes in loop_partition.cc or not.

derisavi on 26 Mar 2019

👍2

Please also check if it has something to do with the fact that high-level produce consume pattern is involved, i.e. if we can reproduce the same error using tvm.lower

tqchen on 26 Mar 2019

The bug is also reproducible using tvm.lower. Setting the input DSL to:

import tvm
m = 48
A = tvm.placeholder((m,), name='A', dtype="float16")
C = tvm.compute((m,), lambda i: A[i], name='C')
D = tvm.compute((m,), lambda i: C[i], name='D')

s = tvm.create_schedule(D.op)
co, ci = s[C].split(C.op.axis[0], factor=10)
do, di = s[D].split(D.op.axis[0], 32)
s[C].compute_at(s[D], do)

with tvm.build_config(partition_const_loop=True):
    print(tvm.lower(s, [A, C, D], name="fadd1", simple_mode=False).body);

generates the following output, which has the same problem explained above:

...
// attr [0] compute_scope = "fadd1_compute_"
produce D {
  for (i.outer, 0, 1) {
    produce C {
      for (i.outer, 0, 3) {
        for (i.inner, 0, 10) {
          C[(((i.outer*32) + (i.outer*10)) + i.inner)] = A[(((i.outer*32) + (i.outer*10)) + i.inner)]
        }
      }
      for (i.inner, 0, 2) {
        C[(((i.outer*32) + i.inner) + 30)] = A[(((i.outer*32) + i.inner) + 30)]
      }
    }
    for (i.inner, 0, 32) {
      D[((i.outer*32) + i.inner)] = C[((i.outer*32) + i.inner)]
    }
  }
  produce C {
    for (i.outer, 0, 1) {
      for (i.inner, 0, 10) {
        C[(((i.outer*10) + i.inner) + 32)] = A[(((i.outer*10) + i.inner) + 32)]
      }
    }
  }
  for (i.inner, 0, 16) {
    D[(i.inner + 32)] = C[(i.inner + 32)]
  }
}

derisavi on 1 Apr 2019

This bug will not be fixed only by the change in Halide IR. It also requires quite a bit of code in TVM (I am almost ready to send a PR for that part). So should we reopen this issue?

derisavi on 3 Apr 2019

it was automatically closed by github association rule :)

tqchen on 3 Apr 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Cross-Compilation error: error adding symbols: File in wrong format

ysh329 · 6Comments

[WINDOWS][AutoTVM] OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted and OSError: [WinError 10049] The requested address is not valid in its context

Coderx7 · 5Comments

[RFC][Relay] Pass API Discussion

MarisaKirisame · 5Comments

[RPC] Make RPC fork safe on macOS?

eqy · 3Comments

[RFC][Relay] Dynamic Dimensions

jroesch · 6Comments