Taichi: [Lang] [IR] Coordinate offsets and negative coordinates

Created on 10 May 2020 · 18Comments · Source: taichi-dev/taichi

Concisely describe the proposed feature

Currently, all Taichi tensor bounds start from 0. For example,

a = ti.var(dt=ti.i32)
ti.dense(ti.ij, 256).place(a)

gives you a tensor of shape [0, 256) x [0, 256). In some cases we want a (-64, 128) offset to the coordinates so that we can have shape [-64, 192) x [128, 384).

Use cases

One typical use case is grids in physical simulation. Currently it's easy to allocate sparse voxel grids of bounds [0, 4096) x [0, 4096) x [0, 4096). 4096 is a big number and will almost never be used up. This means the domain can automatically extend along X+, Y+, and Z+.

With (-2048, -2048, -2048) coordinate offset, we will end up with domain [-2048, 2048) x [-2048, 2048) x [-2048, 2048). This means we can extend along X-, Y- and Z- as well.

Describe the solution you'd like (if any)

ti.root.dense(ti.ijk, 256).place(a, b, c, offset=(-128, -128, -128))

Then modify a few IR passes. I'll take care of this since I've been thinking about this feature for a couple of weeks...

Steps

[x] Add index_offsets to place SNodes
[x] Implement struct for with offsets
[x] Support handy tensor definitions with offsets (e.g. ti.var(dt=ti.i32, shape=(19, 23), offset=(-3, 45))
[x] Fix bound checks with coordinate offsets
[ ] Add corresponding optimizations
[x] Add doc

GAMES201 feature request welcome contribution

Source

yuanming-hu

👍3

Most helpful comment

Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?

Sorry about the delayed reply. Let's say you have a tensor

ti.var(dt=ti.f32, shape=(64), offset=(-32))

and kernel

@ti.kernel
def fill():
    for i in a:
        a[i] = 0

During lower_ast, you actually generate indices in range [0, 64) and add a -32 offset to it so that you get [-32, 32). However, later when you access the tensor a, you add a +32 offset to it. This gives you instructions like i - 32 + 32. I think currently Taichi can't optimize that into i, but it would be very helpful to optimize this case.

@xumingkuan and @archibate may have some insight on the best way to do optimization.

yuanming-hu on 10 Jun 2020

👍2

All 18 comments

Well, I don't know if I'm getting this correctly...But the following code

x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])

will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?

Luohaothu on 18 May 2020

Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])
will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?

Hi Luohao, please see https://github.com/taichi-dev/taichi/issues/886 https://github.com/taichi-dev/taichi/pull/887 short answer: it's broken for SNode accessors for now.

yuanming-hu on 18 May 2020

Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])
will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?
Hi Luohao, please see #886 #887 short answer: it's broken for SNode accessors for now.

Thanks for your reply. Actually I found this in the example laplace.py for there are some reads of x out of the boundary. Maybe you can also fix that later.

Luohaothu on 18 May 2020

Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])
will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?
Hi Luohao, please see #886 #887 short answer: it's broken for SNode accessors for now.
Thanks for your reply. Actually I found this in the example laplace.py for there are some reads of x out of the boundary. Maybe you can also fix that later.

The accessor issue is already fixed. If you run laplace.py in debug mode it will notify

[E 05/24/20 08:56:39.074] [program.cpp:check_runtime_error@322] Assertion failure: Accessing Tensor of Size [16, 16] with indices (-1, 0)

Support handy tensor definitions with offsets

@Rullec

This feature allows users to conveniently define dense scalar/matrix tensors with a coordinate offset. For example, ti.var(dt=ti.i32, shape=(19, 23), offset=(-3, 45)) declares a scalar tensor with coordinate range [-3, 16) x [23, 68)

You will need to modify
https://github.com/taichi-dev/taichi/blob/6c02c401a60625f730aedd3c675974fc8efae1cd/python/taichi/lang/impl.py#L279-L283 and
https://github.com/taichi-dev/taichi/blob/6c02c401a60625f730aedd3c675974fc8efae1cd/python/taichi/lang/matrix.py#L106-L126

(The modification should be only a few lines of code each.)

Of course, add your tests here in this file: https://github.com/taichi-dev/taichi/blob/master/tests/python/test_offset.py

yuanming-hu on 24 May 2020

👍1

Thanks a lot Yuanming!
I will check these files that you showed above and try to solve them.
Btw, Would it be a proper place for us to post some (superficial) thinkings/questions here (in this issue)? :D

Rullec on 24 May 2020

Thanks a lot Yuanming!
I will check these files that you showed above and try to solve them.
Btw, Would it be a proper place for us to post some (superficial) thinkings/questions here (in this issue)? :D

Sure, let me know if you run into problems that you can't solve on your own :-)

yuanming-hu on 24 May 2020

I will, thanks! ;D

Rullec on 24 May 2020

@yuanming-hu
Hi Yuanming,

It seems I have finished "Support handy tensor definitions with offsets" this single step in my personal fork of Taichi.

As you said above, some minor modifications has been implemented in this two .pys (impl.py and matrix.py).

The dev guidelines show me that it's time for submitting a PR. Do I try to do it? I'm not quite sure. :D

Bests,
Xudong

Rullec on 29 May 2020

👍1

Hi @Rullec, thanks for the update! Yes, please submit a PR. It would be good to add some docs (please create a new rst file in named Coordinate offsets and include related features) and tests for what you have implemented.

yuanming-hu on 29 May 2020

Hi @yuanming-hu , I have submit my (first) PR. But it seems something goes wrong in it. Would you be happy to given me some advice, in both the PR content and the workflow TAT. Thanks a lot in advance!

Bests,
Xudong

Rullec on 29 May 2020

Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?

TH3CHARLie on 5 Jun 2020

Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?

Sorry about the delayed reply. Let's say you have a tensor

ti.var(dt=ti.f32, shape=(64), offset=(-32))

and kernel

@ti.kernel
def fill():
    for i in a:
        a[i] = 0

@xumingkuan and @archibate may have some insight on the best way to do optimization.

yuanming-hu on 10 Jun 2020

👍2

kernel {
  $0 = offloaded range_for(0, 64) block_dim=adaptive {
    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)
    <i32 x1> $3 = const [-32]
    <i32 x1> $4 = add $2 $3
    <f32 x1> $5 = const [0.0]
    <gen*x1> $6 = get root
    <i32 x1> $7 = sub $4 $3
    <i32 x1> $8 = const [0]
    <gen*x1> $9 = [S0root][root]::lookup($6, $8) activate = false
    <gen*x1> $10 = get child [S0root->S1dense] $9
    <i32 x1> $11 = bit_extract($7 + 0, 0~6)
    <gen*x1> $12 = [S1dense][dense]::lookup($10, $11) activate = false
    <f32*x1> $13 = get child [S1dense->S2place_f32] $12
    <f32 x1> $14 : global store [$13 <- $5]
  }
}

I see. In this case, maybe the best way is to implement a pass to replace $7 = ($2 + $3) - $3 with $7 = $2 + ($3 - $3).

To do that, we can

For each commutative binary op, if lhs is a conststmt, swap lhs and rhs.
If a binaryopstmt has rhs being a conststmt and lhs being a binaryopstmt with rhs being a conststmt and fast_math=True and the binary op is associative, apply the associative law.
Place the pass in full_simplify and let constant_fold evaluate $3 - $3 and let alg_simp simplify $2 + 0.

xumingkuan on 10 Jun 2020

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.

    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)

xumingkuan on 10 Jun 2020

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.
    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)

I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...

yuanming-hu on 10 Jun 2020

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.
    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)
I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...

I found that the bit_extract is introduced in demote_dense_struct_fors. Maybe we can remove it if physical_indices.size() == 1?
https://github.com/taichi-dev/taichi/blob/e5cbf47356ce40b6fcdaa2aa9dc788555f460d95/taichi/transforms/demote_dense_struct_fors.cpp#L49

xumingkuan on 10 Jun 2020

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.
    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)
I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...
I found that the bit_extract is introduced in demote_dense_struct_fors. Maybe we can remove it if physical_indices.size() == 1?
https://github.com/taichi-dev/taichi/blob/e5cbf47356ce40b6fcdaa2aa9dc788555f460d95/taichi/transforms/demote_dense_struct_fors.cpp#L49