Concisely describe the proposed feature
Currently, all Taichi tensor bounds start from 0. For example,
a = ti.var(dt=ti.i32)
ti.dense(ti.ij, 256).place(a)
gives you a tensor of shape [0, 256) x [0, 256). In some cases we want a (-64, 128) offset to the coordinates so that we can have shape [-64, 192) x [128, 384).
Use cases
One typical use case is grids in physical simulation. Currently it's easy to allocate sparse voxel grids of bounds [0, 4096) x [0, 4096) x [0, 4096). 4096 is a big number and will almost never be used up. This means the domain can automatically extend along X+, Y+, and Z+.
With (-2048, -2048, -2048) coordinate offset, we will end up with domain [-2048, 2048) x [-2048, 2048) x [-2048, 2048). This means we can extend along X-, Y- and Z- as well.
Describe the solution you'd like (if any)
ti.root.dense(ti.ijk, 256).place(a, b, c, offset=(-128, -128, -128))
Then modify a few IR passes. I'll take care of this since I've been thinking about this feature for a couple of weeks...
Steps
index_offsets to place SNodesstruct for with offsetsti.var(dt=ti.i32, shape=(19, 23), offset=(-3, 45))Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32)
ti.root.dense(ti.i, 16).place(x)
print(x[-1])
will give the result 0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?
Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32) ti.root.dense(ti.i, 16).place(x) print(x[-1])will give the result
0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?
Hi Luohao, please see https://github.com/taichi-dev/taichi/issues/886 https://github.com/taichi-dev/taichi/pull/887 short answer: it's broken for SNode accessors for now.
Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32) ti.root.dense(ti.i, 16).place(x) print(x[-1])will give the result
0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?Hi Luohao, please see #886 #887 short answer: it's broken for SNode accessors for now.
Thanks for your reply. Actually I found this in the example laplace.py for there are some reads of x out of the boundary. Maybe you can also fix that later.
Well, I don't know if I'm getting this correctly...But the following code
x = ti.var(ti.f32) ti.root.dense(ti.i, 16).place(x) print(x[-1])will give the result
0. And I think this is the feature of treating the inactive values (or out-of-bounds values?) as 0. If we support negative coordinates, will this feature be broken? Or I just get it wrong?Hi Luohao, please see #886 #887 short answer: it's broken for SNode accessors for now.
Thanks for your reply. Actually I found this in the example
laplace.pyfor there are some reads ofxout of the boundary. Maybe you can also fix that later.
The accessor issue is already fixed. If you run laplace.py in debug mode it will notify
[E 05/24/20 08:56:39.074] [program.cpp:check_runtime_error@322] Assertion failure: Accessing Tensor of Size [16, 16] with indices (-1, 0)
@Rullec
This feature allows users to conveniently define dense scalar/matrix tensors with a coordinate offset. For example, ti.var(dt=ti.i32, shape=(19, 23), offset=(-3, 45)) declares a scalar tensor with coordinate range [-3, 16) x [23, 68)
You will need to modify
https://github.com/taichi-dev/taichi/blob/6c02c401a60625f730aedd3c675974fc8efae1cd/python/taichi/lang/impl.py#L279-L283 and
https://github.com/taichi-dev/taichi/blob/6c02c401a60625f730aedd3c675974fc8efae1cd/python/taichi/lang/matrix.py#L106-L126
(The modification should be only a few lines of code each.)
Of course, add your tests here in this file: https://github.com/taichi-dev/taichi/blob/master/tests/python/test_offset.py
Thanks a lot Yuanming!
I will check these files that you showed above and try to solve them.
Btw, Would it be a proper place for us to post some (superficial) thinkings/questions here (in this issue)? :D
Thanks a lot Yuanming!
I will check these files that you showed above and try to solve them.
Btw, Would it be a proper place for us to post some (superficial) thinkings/questions here (in this issue)? :D
Sure, let me know if you run into problems that you can't solve on your own :-)
I will, thanks! ;D
@yuanming-hu
Hi Yuanming,
It seems I have finished "Support handy tensor definitions with offsets" this single step in my personal fork of Taichi.
As you said above, some minor modifications has been implemented in this two .pys (impl.py and matrix.py).
The dev guidelines show me that it's time for submitting a PR. Do I try to do it? I'm not quite sure. :D
Bests,
Xudong
Hi @Rullec, thanks for the update! Yes, please submit a PR. It would be good to add some docs (please create a new rst file in named Coordinate offsets and include related features) and tests for what you have implemented.
Hi @yuanming-hu , I have submit my (first) PR. But it seems something goes wrong in it. Would you be happy to given me some advice, in both the PR content and the workflow TAT. Thanks a lot in advance!
Bests,
Xudong
Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?
Hi @yuanming-hu, I have fixed bound checks with coordinate offsets in #1143 and I want to know more about the last missing piece "Add corresponding optimizations". What kind of optimizations are we expecting here?
Sorry about the delayed reply. Let's say you have a tensor
ti.var(dt=ti.f32, shape=(64), offset=(-32))
and kernel
@ti.kernel
def fill():
for i in a:
a[i] = 0
During lower_ast, you actually generate indices in range [0, 64) and add a -32 offset to it so that you get [-32, 32). However, later when you access the tensor a, you add a +32 offset to it. This gives you instructions like i - 32 + 32. I think currently Taichi can't optimize that into i, but it would be very helpful to optimize this case.
@xumingkuan and @archibate may have some insight on the best way to do optimization.
kernel {
$0 = offloaded range_for(0, 64) block_dim=adaptive {
<i32 x1> $1 = loop $0 index 0
<i32 x1> $2 = bit_extract($1 + 0, 0~6)
<i32 x1> $3 = const [-32]
<i32 x1> $4 = add $2 $3
<f32 x1> $5 = const [0.0]
<gen*x1> $6 = get root
<i32 x1> $7 = sub $4 $3
<i32 x1> $8 = const [0]
<gen*x1> $9 = [S0root][root]::lookup($6, $8) activate = false
<gen*x1> $10 = get child [S0root->S1dense] $9
<i32 x1> $11 = bit_extract($7 + 0, 0~6)
<gen*x1> $12 = [S1dense][dense]::lookup($10, $11) activate = false
<f32*x1> $13 = get child [S1dense->S2place_f32] $12
<f32 x1> $14 : global store [$13 <- $5]
}
}
I see. In this case, maybe the best way is to implement a pass to replace $7 = ($2 + $3) - $3 with $7 = $2 + ($3 - $3).
To do that, we can
$3 - $3 and let alg_simp simplify $2 + 0.BTW, why is there always a bit_extract after loop indices? $1 and $2 have the same value.
<i32 x1> $1 = loop $0 index 0
<i32 x1> $2 = bit_extract($1 + 0, 0~6)
BTW, why is there always a bit_extract after loop indices?
$1and$2have the same value.<i32 x1> $1 = loop $0 index 0 <i32 x1> $2 = bit_extract($1 + 0, 0~6)
I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...
BTW, why is there always a bit_extract after loop indices?
$1and$2have the same value.<i32 x1> $1 = loop $0 index 0 <i32 x1> $2 = bit_extract($1 + 0, 0~6)I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...
I found that the bit_extract is introduced in demote_dense_struct_fors. Maybe we can remove it if physical_indices.size() == 1?
https://github.com/taichi-dev/taichi/blob/e5cbf47356ce40b6fcdaa2aa9dc788555f460d95/taichi/transforms/demote_dense_struct_fors.cpp#L49
BTW, why is there always a bit_extract after loop indices?
$1and$2have the same value.<i32 x1> $1 = loop $0 index 0 <i32 x1> $2 = bit_extract($1 + 0, 0~6)I think that's because we first use a StructFor, which gives you the bit_extract, and then we demote it to a RangeFor, and the bit_extract is preserved...
I found that the bit_extract is introduced in
demote_dense_struct_fors. Maybe we can remove it ifphysical_indices.size() == 1?
https://github.com/taichi-dev/taichi/blob/e5cbf47356ce40b6fcdaa2aa9dc788555f460d95/taichi/transforms/demote_dense_struct_fors.cpp#L49
Oh, sure. We can also do a bound inference: if the range for loop index lies in [0, 2 << bits), then the bit_extract can be removed.
Hi, I found that from_numpy and to_numpy is not working well with tensors with offsets.
Most helpful comment
Sorry about the delayed reply. Let's say you have a tensor
and kernel
During
lower_ast, you actually generate indices in range[0, 64)and add a-32offset to it so that you get[-32, 32). However, later when you access the tensora, you add a+32offset to it. This gives you instructions likei - 32 + 32. I think currently Taichi can't optimize that intoi, but it would be very helpful to optimize this case.@xumingkuan and @archibate may have some insight on the best way to do optimization.