Taichi: [Lang] [IR] Coordinate offsets and negative coordinates

Created on 10 May 2020  路  18Comments  路  Source: taichi-dev/taichi

Concisely describe the proposed feature

Currently, all Taichi tensor bounds start from 0. For example,

a = ti.var(dt=ti.i32)
ti.dense(ti.ij, 256).place(a)

gives you a tensor of shape [0, 256) x [0, 256). In some cases we want a (-64, 128) offset to the coordinates so that we can have shape [-64, 192) x [128, 384).

Use cases

One typical use case is grids in physical simulation. Currently it's easy to allocate sparse voxel grids of bounds [0, 4096) x [0, 4096) x [0, 4096). 4096 is a big number and will almost never be used up. This means the domain can automatically extend along X+, Y+, and Z+.

With (-2048, -2048, -2048) coordinate offset, we will end up with domain [-2048, 2048) x [-2048, 2048) x [-2048, 2048). This means we can extend along X-, Y- and Z- as well.

Describe the solution you'd like (if any)

ti.root.dense(ti.ijk, 256).place(a, b, c, offset=(-128, -128, -128))

Then modify a few IR passes. I'll take care of this since I've been thinking about this feature for a couple of weeks...

Steps

  • [x] Add index_offsets to place SNodes
  • [x] Implement struct for with offsets
  • [x] Support handy tensor definitions with offsets (e.g. ti.var(dt=ti.i32, shape=(19, 23), offset=(-3, 45))
  • [x] Fix bound checks with coordinate offsets
  • [ ] Add corresponding optimizations
  • [x] Add doc
GAMES201 feature request welcome contribution

Most helpful comment

Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?

Sorry about the delayed reply. Let's say you have a tensor

ti.var(dt=ti.f32, shape=(64), offset=(-32))

and kernel

@ti.kernel
def fill():
    for i in a:
        a[i] = 0

During lower_ast, you actually generate indices in range [0, 64) and add a -32 offset to it so that you get [-32, 32). However, later when you access the tensor a, you add a +32 offset to it. This gives you instructions like i - 32 + 32. I think currently Taichi can't optimize that into i, but it would be very helpful to optimize this case.

@xumingkuan and @archibate may have some insight on the best way to do optimization.

All 18 comments

Well, I don't know if I'm getting this correctly...But the following code

x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])

will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?

Well, I don't know if I'm getting this correctly...But the following code

x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])

will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?

Hi Luohao, please see https://github.com/taichi-dev/taichi/issues/886 https://github.com/taichi-dev/taichi/pull/887 short answer: it's broken for SNode accessors for now.

Well, I don't know if I'm getting this correctly...But the following code

x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])

will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?

Hi Luohao, please see #886 #887 short answer: it's broken for SNode accessors for now.

Thanks for your reply. Actually I found this in the example laplace.py for there are some reads of x out of the boundary. Maybe you can also fix that later.

Well, I don't know if I'm getting this correctly...But the following code

x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])

will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?

Hi Luohao, please see #886 #887 short answer: it's broken for SNode accessors for now.

Thanks for your reply. Actually I found this in the example laplace.py for there are some reads of x out of the boundary. Maybe you can also fix that later.

The accessor issue is already fixed. If you run laplace.py in debug mode it will notify

[E 05/24/20 08:56:39.074] [program.cpp:check_runtime_error@322] Assertion failure: Accessing Tensor of Size [16, 16] with indices (-1, 0)

Support handy tensor definitions with offsets

@Rullec

This feature allows users to conveniently define dense scalar/matrix tensors with a coordinate offset. For example, ti.var(dt=ti.i32, shape=(19, 23), offset=(-3, 45)) declares a scalar tensor with coordinate range [-3, 16) x [23, 68)

You will need to modify
https://github.com/taichi-dev/taichi/blob/6c02c401a60625f730aedd3c675974fc8efae1cd/python/taichi/lang/impl.py#L279-L283 and
https://github.com/taichi-dev/taichi/blob/6c02c401a60625f730aedd3c675974fc8efae1cd/python/taichi/lang/matrix.py#L106-L126

(The modification should be only a few lines of code each.)

Of course, add your tests here in this file: https://github.com/taichi-dev/taichi/blob/master/tests/python/test_offset.py

Thanks a lot Yuanming!
I will check these files that you showed above and try to solve them.
Btw, Would it be a proper place for us to post some (superficial) thinkings/questions here (in this issue)? :D

Thanks a lot Yuanming!
I will check these files that you showed above and try to solve them.
Btw, Would it be a proper place for us to post some (superficial) thinkings/questions here (in this issue)? :D

Sure, let me know if you run into problems that you can't solve on your own :-)

I will, thanks! ;D

@yuanming-hu
Hi Yuanming,

It seems I have finished "Support handy tensor definitions with offsets" this single step in my personal fork of Taichi.

As you said above, some minor modifications has been implemented in this two .pys (impl.py and matrix.py).

The dev guidelines show me that it's time for submitting a PR. Do I try to do it? I'm not quite sure. :D

Bests,
Xudong

Hi @Rullec, thanks for the update! Yes, please submit a PR. It would be good to add some docs (please create a new rst file in named Coordinate offsets and include related features) and tests for what you have implemented.

Hi @yuanming-hu , I have submit my (first) PR. But it seems something goes wrong in it. Would you be happy to given me some advice, in both the PR content and the workflow TAT. Thanks a lot in advance!

Bests,
Xudong

Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?

Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?

Sorry about the delayed reply. Let's say you have a tensor

ti.var(dt=ti.f32, shape=(64), offset=(-32))

and kernel

@ti.kernel
def fill():
    for i in a:
        a[i] = 0

During lower_ast, you actually generate indices in range [0, 64) and add a -32 offset to it so that you get [-32, 32). However, later when you access the tensor a, you add a +32 offset to it. This gives you instructions like i - 32 + 32. I think currently Taichi can't optimize that into i, but it would be very helpful to optimize this case.

@xumingkuan and @archibate may have some insight on the best way to do optimization.

kernel {
  $0 = offloaded range_for(0, 64) block_dim=adaptive {
    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)
    <i32 x1> $3 = const [-32]
    <i32 x1> $4 = add $2 $3
    <f32 x1> $5 = const [0.0]
    <gen*x1> $6 = get root
    <i32 x1> $7 = sub $4 $3
    <i32 x1> $8 = const [0]
    <gen*x1> $9 = [S0root][root]::lookup($6, $8) activate = false
    <gen*x1> $10 = get child [S0root->S1dense] $9
    <i32 x1> $11 = bit_extract($7 + 0, 0~6)
    <gen*x1> $12 = [S1dense][dense]::lookup($10, $11) activate = false
    <f32*x1> $13 = get child [S1dense->S2place_f32] $12
    <f32 x1> $14 : global store [$13 <- $5]
  }
}

I see. In this case, maybe the best way is to implement a pass to replace $7 = ($2 + $3) - $3 with $7 = $2 + ($3 - $3).

To do that, we can

  1. For each commutative binary op, if lhs is a conststmt, swap lhs and rhs.
  2. If a binaryopstmt has rhs being a conststmt and lhs being a binaryopstmt with rhs being a conststmt and fast_math=True and the binary op is associative, apply the associative law.
  3. Place the pass in full_simplify and let constant_fold evaluate $3 - $3 and let alg_simp simplify $2 + 0.

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.

    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.

    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)

I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.

    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)

I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...

I found that the bit_extract is introduced in demote_dense_struct_fors. Maybe we can remove it if physical_indices.size() == 1?
https://github.com/taichi-dev/taichi/blob/e5cbf47356ce40b6fcdaa2aa9dc788555f460d95/taichi/transforms/demote_dense_struct_fors.cpp#L49

BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.

    <i32 x1> $1 = loop $0 index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~6)

I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...

I found that the bit_extract is introduced in demote_dense_struct_fors. Maybe we can remove it if physical_indices.size() == 1?
https://github.com/taichi-dev/taichi/blob/e5cbf47356ce40b6fcdaa2aa9dc788555f460d95/taichi/transforms/demote_dense_struct_fors.cpp#L49

Oh, sure. We can also do a bound inference: if the range for loop index lies in [0, 2 << bits), then the bit_extract can be removed.

Hi, I found that from_numpy and to_numpy is not working well with tensors with offsets.

Was this page helpful?
0 / 5 - 0 ratings