Taichi: [Opt] Add a strength reduction pass

Created on 10 May 2020 · 11Comments · Source: taichi-dev/taichi

Update after #1065:
TODO:

[ ] Add operators shl and shr for optimization

- [ ] `a % pot (power of two constant) -> a & (pot - 1)`

Concisely describe the proposed feature
I would like to add a pass like https://en.wikipedia.org/wiki/Strength_reduction.
A list of peephole optimizations in this pass (done in #1065):

a * 2 -> a + a
a / const -> a * (1 / const) (floating point only)
a ** 2 -> a * a (this should fix https://github.com/taichi-dev/taichi/pull/937#issuecomment-626289471 in a more systematic way)
(a ** 1 -> a should be added in the algebraic simplification pass)

(Feel free to add more optimizations to this list)

Describe the solution you'd like (if any)
I would like to place this pass in full_simplify like alg_simp.

Additional comments
Shall we replace <i32x1> a * c where c is a constant power of 2 with a << log2(c) (and also replace <i32x1> a / c with a >> log2(c))? Do we have << and >> operators?

Shall we just write the optimization in the alg_simp pass and add an AlgSimp::alg_is_two function instead of introducing a new pass?

Shall we expose functions AlgSimp::alg_is_one to somewhere so that more passes can use them, and we can split the useless assertion elimination to a new pass?

enhancement welcome contribution

Source

xumingkuan

👍3

Most helpful comment

Great! Could you also optimize a % pot (power of two constant) -> a & (pot - 1)?

Currently, % is translated to:

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/python/taichi/lang/expr.py#L91-L93

And it's preventing such an optimization.

Right now if I change this https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/examples/sdf_renderer.py#L53 to i & 1 == 1, then the example runs faster by ~4% on Metal... (i.e. 24sps -> 25sps)

k-ye on 11 May 2020

👍4

All 11 comments

Thanks for proposing this!

Shall we replace <i32x1> a * c where c is a constant power of 2 with a << log2(c) (and also replace <i32x1> a / c with a >> log2(c))? Do we have << and >> operators?

Yes, that would be helpful. But we need to add shl and shr (and their signed versions) as 4 UnaryOpType first. We will also have to implement the code generators for a few backends. This should be a separate issue.

Shall we just write the optimization in the alg_simp pass and add an AlgSimp::alg_is_two function instead of introducing a new pass?

Sounds good. I don't think there's a need to separate two passes.

Shall we expose functions AlgSimp::alg_is_one to somewhere so that more passes can use them, and we can split the useless assertion elimination to a new pass?

All SGTM.

yuanming-hu on 11 May 2020

<< and >>

Cool! power of 2 is really common. But we don't have that yet, but we can have that in a sep PR for that purpose. eg. bit_shl, bit_shr.

pow(x, 2)

Sounds better than the current sol, thx! I guess we will move the python-side pow opt all into this c++ pass.

archibate on 11 May 2020

Great! Could you also optimize a % pot (power of two constant) -> a & (pot - 1)?

Currently, % is translated to:

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/python/taichi/lang/expr.py#L91-L93

And it's preventing such an optimization.

k-ye on 11 May 2020

👍4

And it's preventing such an optimization.

Let's make it expr_python_mod at some point then.

archibate on 11 May 2020

Great! Could you also optimize a % pot (power of two constant) -> a & (pot - 1)?

Currently, % is translated to:

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/python/taichi/lang/expr.py#L91-L93

And it's preventing such an optimization.

Right now if I change this

https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/examples/sdf_renderer.py#L53

to i & 1 == 1, then the example runs faster by ~4% on Metal... (i.e. 24sps -> 25sps)

I wonder why we didn't just use BinaryOpType::mod...

xumingkuan on 11 May 2020

Test case:

ti.init(print_ir=True)

@ti.kernel
def func(i: ti.i32):
    if i % 2 == 1:
        print(i)

func(1)

Result:

kernel {
  $0 = offloaded  {
    <i32 x1> $1 = arg[0]
    <i32 x1> $2 = const [2]
    <i32 x1> $3 = floordiv $1 $2
    <i32 x1> $4 = mul $2 $3
    <i32 x1> $5 = sub $1 $4
    <i32 x1> $6 = const [1]
    <i32 x1> $7 = cmp_eq $5 $6
    <i32 x1> $8 = bit_and $6 $7
    $9 : if $8 {
      <i32 x1> $10 = arg[0]
      print i, $10
    }
  }
}
[debug] i = 1

xumingkuan on 11 May 2020

I wonder why we didn't just use BinaryOpType::mod...

BinaryOpType::mod is a C/C++ mod (ti.raw_mod), while the % is python mod (ti.mod).
One day we'll refactor the name to BinaryOpType::raw_mod then.

archibate on 12 May 2020

👍1

    <i32 x1> $3 = floordiv $1 $2
    <i32 x1> $4 = mul $2 $3
    <i32 x1> $5 = sub $1 $4

Maybe we can just replace $5 with mod $1 $2 here?

xumingkuan on 5 Jul 2020

👍1

    <i32 x1> $3 = floordiv $1 $2
    <i32 x1> $4 = mul $2 $3
    <i32 x1> $5 = sub $1 $4
Maybe we can just replace $5 with mod $1 $2 here?

Yeah, but before that maybe we need an IR pattern matcher so that we can quickly locate patterns like these in Taichi IR.

yuanming-hu on 5 Jul 2020

Yeah, but before that maybe we need an IR pattern matcher so that we can quickly locate patterns like these in Taichi IR.

I wonder how an "IR pattern matcher" works? binary_op_simplify does hand-written pattern checking which looks very simple to me, and how will it shorten the code or increase the speed if we use a pattern matcher?

xumingkuan on 5 Jul 2020

Halide's source files that starts with Simplify_ are what we can learn from. For example:
https://github.com/halide/Halide/blob/666d4cfa690ae877be7d207743e3a501ea9f4625/src/Simplify_Add.cpp#L51

yuanming-hu on 5 Jul 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

[Bug]"ti.sin()" doesn't work with a local variable.

Xayahp · 3Comments

Upgrade the Constant Folding pass

xumingkuan · 3Comments

[Lang] tensor as local temporary variable / do we have tensor slice support?

archibate · 4Comments

NAN check and throw exception on debug mode

g1n0st · 3Comments

module 'taichi' has no attribute 'gpu'

GeoffreyPlitt · 4Comments

Taichi: [Opt] Add a strength reduction pass

- [ ] a % pot (power of two constant) -> a & (pot - 1)

Most helpful comment

All 11 comments

Related issues

- [ ] `a % pot (power of two constant) -> a & (pot - 1)`