Taichi: [Opt] Add a strength reduction pass

Created on 10 May 2020  路  11Comments  路  Source: taichi-dev/taichi

Update after #1065:
TODO:

  • [ ] Add operators shl and shr for optimization

- [ ] a % pot (power of two constant) -> a & (pot - 1)

Concisely describe the proposed feature
I would like to add a pass like https://en.wikipedia.org/wiki/Strength_reduction.
A list of peephole optimizations in this pass (done in #1065):

  • a * 2 -> a + a
  • a / const -> a * (1 / const) (floating point only)
  • a ** 2 -> a * a (this should fix https://github.com/taichi-dev/taichi/pull/937#issuecomment-626289471 in a more systematic way)
  • (a ** 1 -> a should be added in the algebraic simplification pass)

(Feel free to add more optimizations to this list)

Describe the solution you'd like (if any)
I would like to place this pass in full_simplify like alg_simp.

Additional comments
Shall we replace <i32x1> a * c where c is a constant power of 2 with a << log2(c) (and also replace <i32x1> a / c with a >> log2(c))? Do we have << and >> operators?

Shall we just write the optimization in the alg_simp pass and add an AlgSimp::alg_is_two function instead of introducing a new pass?

Shall we expose functions AlgSimp::alg_is_one to somewhere so that more passes can use them, and we can split the useless assertion elimination to a new pass?

enhancement welcome contribution

Most helpful comment

Great! Could you also optimize a % pot (power of two constant) -> a & (pot - 1)?

Currently, % is translated to:

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/python/taichi/lang/expr.py#L91-L93

And it's preventing such an optimization.

Right now if I change this https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/examples/sdf_renderer.py#L53 to i & 1 == 1, then the example runs faster by ~4% on Metal... (i.e. 24sps -> 25sps)

All 11 comments

Thanks for proposing this!

Shall we replace <i32x1> a * c where c is a constant power of 2 with a << log2(c) (and also replace <i32x1> a / c with a >> log2(c))? Do we have << and >> operators?

Yes, that would be helpful. But we need to add shl and shr (and their signed versions) as 4 UnaryOpType first. We will also have to implement the code generators for a few backends. This should be a separate issue.

Shall we just write the optimization in the alg_simp pass and add an AlgSimp::alg_is_two function instead of introducing a new pass?

Sounds good. I don't think there's a need to separate two passes.

Shall we expose functions AlgSimp::alg_is_one to somewhere so that more passes can use them, and we can split the useless assertion elimination to a new pass?

All SGTM.

<< and >>

Cool! power of 2 is really common. But we don't have that yet, but we can have that in a sep PR for that purpose. eg. bit_shl, bit_shr.

pow(x, 2)

Sounds better than the current sol, thx! I guess we will move the python-side pow opt all into this c++ pass.

Great! Could you also optimize a % pot (power of two constant) -> a & (pot - 1)?

Currently, % is translated to:

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/python/taichi/lang/expr.py#L91-L93

And it's preventing such an optimization.

Right now if I change this https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/examples/sdf_renderer.py#L53 to i & 1 == 1, then the example runs faster by ~4% on Metal... (i.e. 24sps -> 25sps)

And it's preventing such an optimization.

Let's make it expr_python_mod at some point then.

Great! Could you also optimize a % pot (power of two constant) -> a & (pot - 1)?

Currently, % is translated to:

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/python/taichi/lang/expr.py#L91-L93

And it's preventing such an optimization.

Right now if I change this

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/examples/sdf_renderer.py#L53

to i & 1 == 1, then the example runs faster by ~4% on Metal... (i.e. 24sps -> 25sps)

I wonder why we didn't just use BinaryOpType::mod...

Test case:

ti.init(print_ir=True)

@ti.kernel
def func(i: ti.i32):
    if i % 2 == 1:
        print(i)

func(1)

Result:

kernel {
  $0 = offloaded  {
    <i32 x1> $1 = arg[0]
    <i32 x1> $2 = const [2]
    <i32 x1> $3 = floordiv $1 $2
    <i32 x1> $4 = mul $2 $3
    <i32 x1> $5 = sub $1 $4
    <i32 x1> $6 = const [1]
    <i32 x1> $7 = cmp_eq $5 $6
    <i32 x1> $8 = bit_and $6 $7
    $9 : if $8 {
      <i32 x1> $10 = arg[0]
      print i, $10
    }
  }
}
[debug] i = 1

I wonder why we didn't just use BinaryOpType::mod...

BinaryOpType::mod is a C/C++ mod (ti.raw_mod), while the % is python mod (ti.mod).
One day we'll refactor the name to BinaryOpType::raw_mod then.

    <i32 x1> $3 = floordiv $1 $2
    <i32 x1> $4 = mul $2 $3
    <i32 x1> $5 = sub $1 $4

Maybe we can just replace $5 with mod $1 $2 here?

    <i32 x1> $3 = floordiv $1 $2
    <i32 x1> $4 = mul $2 $3
    <i32 x1> $5 = sub $1 $4

Maybe we can just replace $5 with mod $1 $2 here?

Yeah, but before that maybe we need an IR pattern matcher so that we can quickly locate patterns like these in Taichi IR.

Yeah, but before that maybe we need an IR pattern matcher so that we can quickly locate patterns like these in Taichi IR.

I wonder how an "IR pattern matcher" works? binary_op_simplify does hand-written pattern checking which looks very simple to me, and how will it shorten the code or increase the speed if we use a pattern matcher?

Halide's source files that starts with Simplify_ are what we can learn from. For example:
https://github.com/halide/Halide/blob/666d4cfa690ae877be7d207743e3a501ea9f4625/src/Simplify_Add.cpp#L51

Was this page helpful?
0 / 5 - 0 ratings