Tvm: [TEXPR][PASS] Loop distribution pass generates incorrect code

Created on 26 Mar 2019  路  6Comments  路  Source: apache/tvm

With the following input DSL:

import tvm
m = 48
A = tvm.placeholder((m,), name='A', dtype="float16")
C = tvm.compute((m,), lambda i: A[i], name='C')
D = tvm.compute((m,), lambda i: C[i], name='D')

s = tvm.create_schedule(D.op)
# We split the two axis with factors where neither counts the other
co, ci = s[C].split(C.op.axis[0], 10)
do, di = s[D].split(D.op.axis[0], 32)
s[C].compute_at(s[D], do)

bounds = tvm.schedule.InferBound(s)
stmt = tvm.schedule.ScheduleOps(s, bounds)
stmt = tvm.ir_pass.CanonicalSimplify(stmt)
print(stmt)

stmt = tvm.ir_pass.LoopPartition(stmt, True)
stmt = tvm.ir_pass.CanonicalSimplify(stmt)
print(stmt)

The following is the output of first print statement. Code is correct.

// attr [compute(D, 0x1e4d7e0)] realize_scope = ""
realize D([0, 48]) {
  produce D {
    for (i.outer, 0, 2) {
      // attr [compute(C, 0x195b3a0)] realize_scope = ""
      realize C([(i.outer*32), 32]) {
        produce C {
          for (i.outer, 0, 4) {
            for (i.inner, 0, 10) {
              if (likely((((i.outer*10) + i.inner) < 32))) {
                if (likely(((((i.outer*32) + (i.outer*10)) + i.inner) < 48))) {
                  C((((i.outer*32) + (i.outer*10)) + i.inner)) =A((((i.outer*32) + (i.outer*10)) + i.inner))
                }
              }
            }
          }
        }
        for (i.inner, 0, 32) {
          if (likely((((i.outer*32) + i.inner) < 48))) {
            if (likely((((i.outer*32) + i.inner) < 48))) {
              D(((i.outer*32) + i.inner)) =C(((i.outer*32) + i.inner))
            }
          }
        }
      }
    }
  }
}

The following is the output of the second print statement. Code is incorrect in that in the second produce C only 10 (instead of 16) elements of A are copied to C.

// attr [compute(D, 0x1e4d7e0)] realize_scope = ""
realize D([0, 48]) {
  produce D {
    for (i.outer, 0, 1) {
      // attr [compute(C, 0x195b3a0)] realize_scope = ""
      realize C([(i.outer*32), 32]) {
        produce C {
          for (i.outer, 0, 3) {
            for (i.inner, 0, 10) {
              C((((i.outer*32) + (i.outer*10)) + i.inner)) =A((((i.outer*32) + (i.outer*10)) + i.inner))
            }
          }
          for (i.inner, 0, 2) {
            C(((30 + (i.outer*32)) + i.inner)) =A(((30 + (i.outer*32)) + i.inner))
          }
        }
        for (i.inner, 0, 32) {
          D(((i.outer*32) + i.inner)) =C(((i.outer*32) + i.inner))
        }
      }
    }
    // attr [compute(C, 0x195b3a0)] realize_scope = ""
    realize C([32, 32]) {
      produce C {
        for (i.outer, 0, 1) {
          for (i.inner, 0, 10) {
            C(((32 + (i.outer*10)) + i.inner)) =A(((32 + (i.outer*10)) + i.inner))
          }
        }
      }
      for (i.inner, 0, 16) {
        D((32 + i.inner)) =C((32 + i.inner))
      }
    }
  }
}
help wanted bug

Most helpful comment

I'm working on a fix now. In addition to this problem, I'm trying to fix general problems that I have found in the code as well. I will submit a PR as soon as I have it ready (likely in the next day or two).

To answer your question, I don't know whether this has been caused by recent changes in loop_partition.cc or not.

All 6 comments

Thanks for reporting this, we should fix this asap. Let us see if it is caused by some of the recent changes https://github.com/dmlc/tvm/commits/master/src/pass/loop_partition.cc or was just something we overlooked.

I'm working on a fix now. In addition to this problem, I'm trying to fix general problems that I have found in the code as well. I will submit a PR as soon as I have it ready (likely in the next day or two).

To answer your question, I don't know whether this has been caused by recent changes in loop_partition.cc or not.

Please also check if it has something to do with the fact that high-level produce consume pattern is involved, i.e. if we can reproduce the same error using tvm.lower

The bug is also reproducible using tvm.lower. Setting the input DSL to:

import tvm
m = 48
A = tvm.placeholder((m,), name='A', dtype="float16")
C = tvm.compute((m,), lambda i: A[i], name='C')
D = tvm.compute((m,), lambda i: C[i], name='D')

s = tvm.create_schedule(D.op)
co, ci = s[C].split(C.op.axis[0], factor=10)
do, di = s[D].split(D.op.axis[0], 32)
s[C].compute_at(s[D], do)

with tvm.build_config(partition_const_loop=True):
    print(tvm.lower(s, [A, C, D], name="fadd1", simple_mode=False).body);

generates the following output, which has the same problem explained above:

...
// attr [0] compute_scope = "fadd1_compute_"
produce D {
  for (i.outer, 0, 1) {
    produce C {
      for (i.outer, 0, 3) {
        for (i.inner, 0, 10) {
          C[(((i.outer*32) + (i.outer*10)) + i.inner)] = A[(((i.outer*32) + (i.outer*10)) + i.inner)]
        }
      }
      for (i.inner, 0, 2) {
        C[(((i.outer*32) + i.inner) + 30)] = A[(((i.outer*32) + i.inner) + 30)]
      }
    }
    for (i.inner, 0, 32) {
      D[((i.outer*32) + i.inner)] = C[((i.outer*32) + i.inner)]
    }
  }
  produce C {
    for (i.outer, 0, 1) {
      for (i.inner, 0, 10) {
        C[(((i.outer*10) + i.inner) + 32)] = A[(((i.outer*10) + i.inner) + 32)]
      }
    }
  }
  for (i.inner, 0, 16) {
    D[(i.inner + 32)] = C[(i.inner + 32)]
  }
}

This bug will not be fixed only by the change in Halide IR. It also requires quite a bit of code in TVM (I am almost ready to send a PR for that part). So should we reopen this issue?

it was automatically closed by github association rule :)

Was this page helpful?
0 / 5 - 0 ratings