Taichi: [Question] What does the Taichi IR look like?

Created on 10 Jul 2020 · 5 Comments · Source: taichi-dev/taichi

Hi, dear Taichi community,

  • Sorry, this issue is not about a feature or a bug, but I don't know where else to ask this question. If the maintainers say I should not have opened this issue, I will delete it later.

  • I watched the first online conference on bilibili yesterday. I learned that Taichi converts the Python AST to Taichi's IR. I am interested in that step, and also in how Taichi's IR is converted to LLVM IR. Does Taichi currently support dumping its IR? If yes, how do I do it? If not, what does Taichi's IR look like?

  • From the code I have read, I know Taichi uses Python astor to parse Python code and modifies the provided template to get the final executable code. But I cannot find the transformation from the Python AST to Taichi IR.

    Any reply is welcome, thanks.

question

All 5 comments

Sorry, this issue is not about a feature or a bug, but I don't know where else to ask this question. If the maintainers say I should not have opened this issue, I will delete it later.

Thanks for asking this! All kinds of issues are welcome :)

I watched the first online conference on bilibili yesterday. I learned that Taichi converts the Python AST to Taichi's IR. I am interested in that step, and also in how Taichi's IR is converted to LLVM IR. Does Taichi currently support dumping its IR? If yes, how do I do it? If not, what does Taichi's IR look like?

From the code I have read, I know Taichi uses Python astor to parse Python code and modifies the provided template to get the final executable code. But I cannot find the transformation from the Python AST to Taichi IR.

  1. To print transformed Python AST: use ti.init(print_preprocessed=True).
  2. To print Taichi IR: use ti.init(print_ir=True).
  3. To print LLVM IR: use ti.cfg.print_kernel_llvm_ir = True.

e.g.:

import taichi as ti
ti.init(print_ir=True)

@ti.kernel
def p():
    print(42)

p()

From the code I have read, I know Taichi uses Python astor to parse Python code and modifies the provided template to get the final executable code. But I cannot find the transformation from the Python AST to Taichi IR.

The transformation from Python AST to Taichi IR takes two steps.

First, we transform Python AST:

import taichi as ti
ti.init(print_preprocessed=True)

@ti.kernel
def func() -> ti.f32:
    x = 233
    return x

print(func())

into Taichi API invocations:

def func():
  import taichi as ti
  ti.decl_scalar_ret(ti.f32)
  x = ti.expr_init(233)
  ti.core.create_kernel_return(ti.cast(ti.Expr(x), ti.f32).ptr)

Then, these C++ APIs in ti.core will create the corresponding statements in Taichi IR, e.g.:
https://github.com/taichi-dev/taichi/blob/2544d1f3a6740de6f9ff4e56be0f7d73142173eb/taichi/python/export_lang.cpp#L315-L325
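To make the first step more concrete, here is a minimal sketch of an AST rewrite that wraps the right-hand side of an assignment in a call, so that x = 233 becomes x = ti.expr_init(233). It uses only Python's standard ast module. The class name WrapAssignedValues and the helper preprocess are hypothetical names chosen for illustration; this is not Taichi's actual transformer, which handles many more node types.

```python
import ast


class WrapAssignedValues(ast.NodeTransformer):
    """Toy rewrite: turn `x = <expr>` into `x = ti.expr_init(<expr>)`.

    Hypothetical illustration only; not Taichi's real preprocessing code.
    """

    def visit_Assign(self, node):
        # Rewrite nested nodes first, then wrap this assignment's value.
        self.generic_visit(node)
        node.value = ast.Call(
            func=ast.Attribute(
                value=ast.Name(id="ti", ctx=ast.Load()),
                attr="expr_init",
                ctx=ast.Load(),
            ),
            args=[node.value],
            keywords=[],
        )
        return node


def preprocess(source: str) -> str:
    """Parse source, apply the toy rewrite, and return the new source text."""
    tree = ast.parse(source)
    tree = WrapAssignedValues().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or newer


rewritten = preprocess("x = 233")
print(rewritten)
```

The rewritten source is ordinary Python: nothing is compiled until it is executed and the wrapped calls build IR statements on the C++ side, which is the second step described above.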

import taichi as ti
ti.init(print_ir=True)

@ti.kernel
def p():
    print(42)

p()

Thanks for your reply.

[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-8dibpesi
[Taichi] <dev mode>, llvm 8.0.1, commit fe17ca1f, python 3.7.3
[Taichi] Starting on arch=x64
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Initial IR:
kernel {
  $0 = eval 42
  print %0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Lowered:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Typechecked:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Loop Vectorized:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Loop Split:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Simplified I:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Dense struct-for demoted:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Optimized by CFG I:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Access flagged I:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Simplified II:
kernel {
  <i32 x1> $0 = const [42]
  print $0, "\n"
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Offloaded:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Optimized by CFG II:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Access flagged II:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Make thread local:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Access lowered:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] DIE:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Access flagged III:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Atomics demoted:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Optimized by CFG III:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}
[I 07/10/20 19:47:44.751] [compile_to_offloads.cpp:operator()@23] Simplified III:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [42]
    print $1, "\n"
  }
}

So, the code wrapped in kernel { } is the Taichi IR?

Besides, I ran examples/waterwave.py with print_ir=True set in ti.init.
I found that "Initial IR" occurs 8 times in total, while @ti.kernel and @ti.func occur only 7 times in total in waterwave.py.
So why 8 and not 7?

And

  • What do Lowered/DIE/Simplified I (II/III)/Atomics demoted/Access flagged III/Access lowered/Make thread local stand for?

  • Are they optimization passes in LLVM or in Taichi?

  • If so, what does each of these optimizations do?

  • And why do some @ti.kernel or @ti.func functions have just one of them, while others have several? What determines that?

So, the code wrapped in kernel { } is the Taichi IR?

What do Lowered/DIE/Simplified I (II/III)/Atomics demoted/Access flagged III/Access lowered/Make thread local stand for?

Are they optimization passes in LLVM or in Taichi?

Yes and yes :)

If so, what does each of these optimizations do?

I'm not super familiar with the IR optimizations. Here is what I can tell:

Atomics demoted

Sometimes an operation on a variable does not need to be atomic. This pass detects such cases and demotes x[i] += 1 into x[i] = x[i] + 1.

DIE

This pass (dead instruction elimination) removes unused variables, e.g.:

x = 233
x = 4

will be optimized into:

x = 4
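As a rough illustration of the idea, the sketch below removes overwritten or unread stores from a tiny straight-line program. The statement format (target, expression text, set of variables read) and the function name eliminate_dead_stores are assumptions made up for this example; Taichi's real pass works on its own IR statements, and this toy assumes expressions have no side effects.

```python
def eliminate_dead_stores(stmts, live_out):
    """Drop assignments whose value is never read.

    stmts: list of (target, expr_text, reads) tuples, in program order.
    live_out: names whose final values are needed after the program.
    Assumes expressions are side-effect free (toy model, not Taichi's IR).
    """
    live = set(live_out)
    kept = []
    # Walk backwards: a store matters only if its target is live afterwards.
    for target, expr, reads in reversed(stmts):
        if target in live:
            kept.append((target, expr, reads))
            live.discard(target)  # this store defines the value
            live.update(reads)    # its operands become live before it
        # Otherwise the store is overwritten or never read, so skip it.
    kept.reverse()
    return kept


program = [
    ("x", "233", set()),
    ("x", "4", set()),
]
optimized = eliminate_dead_stores(program, live_out={"x"})
for target, expr, _ in optimized:
    print(f"{target} = {expr}")
```

Here the first store to x is dropped because x is overwritten before anything reads it.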

Simplified III

This pass folds operations on constants, e.g.:

x = 233
y = 4
z = x + y

will be optimized into:

z = 237
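A toy version of this folding can be written with Python's standard ast module. The sketch below evaluates straight-line assignments whose operands are already known constants and keeps only the variables that are still needed. The function name fold_constants and the live_out parameter are hypothetical; the real pass operates on Taichi IR, not on Python source.

```python
import ast
import operator

# Binary operators this toy folder knows how to evaluate.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def fold_constants(source, live_out):
    """Fold constant expressions in straight-line `name = expr` code.

    Returns one `name = value` line for each name in live_out.
    Toy model: only constants, known names, and +, -, * are supported.
    """
    env = {}  # variable name -> known constant value

    def evaluate(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            fn = _BINOPS[type(node.op)]
            return fn(evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression in this toy folder")

    for stmt in ast.parse(source).body:
        # Assumes every statement is a simple single-target assignment.
        env[stmt.targets[0].id] = evaluate(stmt.value)

    return [f"{name} = {env[name]}" for name in live_out]


folded = fold_constants("x = 233\ny = 4\nz = x + y", live_out=["z"])
print(folded)
```

Because x and y are only used to compute z, they disappear once z has been folded into a single constant.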

Let me show you an example:

import taichi as ti
ti.init(print_ir=True)
#ti.core.toggle_advanced_optimization(False)

@ti.kernel
def calc_pi() -> ti.f32:
    term = 1.0
    sum = 0.0
    divisor = 1
    for i in ti.static(range(10)):
        sum += term / divisor
        term *= -1 / 3
        divisor += 2
    return sum * ti.sqrt(12.0)

print(calc_pi())

In the last optimization pass, Simplified III, all statements are eliminated except for a single const [3.1415]. That is, the whole computation is done at compile time!
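To check what value the folded constant should hold, the same arithmetic can be run in plain Python without Taichi. This is only a reference calculation: the function name calc_pi_reference is made up here, and it uses Python's 64-bit floats, whereas the kernel returns ti.f32, so the constant printed in the IR has lower precision.

```python
import math


def calc_pi_reference(n_terms=10):
    """Plain-Python version of the series used in calc_pi above."""
    term = 1.0
    total = 0.0
    divisor = 1
    for _ in range(n_terms):
        total += term / divisor
        term *= -1 / 3
        divisor += 2
    return total * math.sqrt(12.0)


value = calc_pi_reference()
# Print the result and its distance from math.pi.
print(value, abs(value - math.pi))
```

With ten terms the result differs from math.pi by less than 1e-5, which is consistent with the const [3.1415] seen in the final IR.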

Thanks for your clear explanation.

Also, I want to apologize: I found out that what I asked about is in fact covered in the documentation :-)
