Taichi: Sparse computation with kernels

Created on 30 Apr 2020 · 11Comments · Source: taichi-dev/taichi

Dear Community,

If I were to simulate diffusion over a 2D grid with a square hole in the middle, how could I make use of sparse structures to avoid populating my kernel with a bunch of if statements ?

import taichi as ti
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
#running this in jupyter 
import cv2
ti.reset() #running this inside a jupyter notebook helps relaunch. 
#Otherwise compiler (?) doesn't let jupyter rerun.
ti.cfg.arch = ti.cuda 

real = ti.f32
scalar = lambda: ti.var(dt=real)
vec = lambda: ti.Vector(2, dt=real)

steps=15000
n_grid=256
p=scalar()
ti.root.dense(ti.l, steps).dense(ti.ij, n_grid).place(p) #

thick=10
init_cond=np.ones((n_grid,n_grid))
init_cond[thick:-thick,thick:-thick]=0
plt.imshow(init_cond)
#the initial condition has a thick border of concentration 1. The rest is 0.


@ti.func 
def laplacian(t, i, j):
    return  (-4 * p[t, i, j] + p[t, i, j - 1] + p[t, i, j + 1] + p[t, i + 1, j] + p[t, i - 1, j])


###Request####
#Here below, I avoid computing the diffusion on a square 25x25 area.
#How can I engineer the field p() to be sparse, so the kernel does not care
#about those values
@ti.kernel
def diffuse(t: ti.i32):
    for i in range(n_grid):
        for j in range(n_grid):
            if not ((i>100)&(i<125)&(j>100)&(j<125)):
                lp = laplacian(t-1,i,j)
                p[t,i,j] = p[t-1,i,j]+0.24*lp
####Request#####
# I would not need to replenish the initial conditions if the borders
# of my field were also inactive in the sparse structure of p()
@ti.kernel
def replenish(t: ti.i32):
    for i in range(n_grid):
        for j in range(n_grid):
            if ((i<thick) or (i>n_grid-thick)) or  ((j<thick) or (j>n_grid-thick)):
                p[t,i,j]=1




@ti.kernel                
def init_condition(arr: ti.ext_arr()):
    for i in range(n_grid):
        for j in range(n_grid):
            p[0,i,j]=arr[i,j]


@ti.kernel
def put_into_img(t: ti.i32, arr: ti.ext_arr()):
    for i in range(n_grid):
        for j in range(n_grid):
            arr[i,j]=p[t,i,j]


#main             
def forward():
    init_condition(init_cond)
    for t in range(1,steps):
        diffuse(t)
        replenish(t)
        if (t+1)%16==0:
            put_into_img(t,display_img)
            cv2.imshow('',display_img)
            cv2.waitKey(1)

forward()

For completeness' sake, I pasted here a working example of a diffusion that avoids a central square hole. This runs in my jupyter notebook.

The concentration field that is updated at every time step, p(), is dense. So I need to tell the kernel to explicitly avoid that region, and also avoid (or rebuild) the initial condition (the borders).

To be honest, I tried playing around from the examples in the test functions for sparse structures, but to no avail. I tried for example this approach:


ti.reset()
n=256
x=ti.var(ti.f32)
ptr1 = ti.root.pointer(ti.ij, n)
ptr1.place(x)


@ti.kernel
def deact():
    for j in range(n):
        for i in range(n):
            if (i>100)&(i<125)&(j>100)&(j<125):
                ti.deactivate(x.parent(),[i,j])

@ti.kernel
def fill():
    for i,j in x:
            x[j,i]=i+j

It does not work. Also, I do not understand how to play with pointers. ti.root.pointer(ti.ij,16).dense(ti.ij,16).shape is (256,256), and not (16,16,16,16). Etc.

Long story short, I have no idea how to play with sparse data structures. I was not successful at deactivating elements, or playing with the bitmasked structures.

Eventually, I would like to iteratively solve an extremely sparse linear system, where handling the datastructure correctly would come in real handy.

Let me know if I should rather post this on the forum.

doc question

Source

mathusalem2020

Most helpful comment

@ti.kernel
def act():
    for i, j in ti.ndrange(n, n):
        if (i>100)&(i<125)&(j>100)&(j<125):
            x[0, i, j] = 0  # activate `x[0, i, j]`, but not activate `x[1, i, j]`.

A simpler way to write this:

@ti.kernel
def act():
    for I in ti.grouped(ti.ndrange(1, (101, 125), (101, 125))):
        x[I] = 0

xumingkuan on 1 May 2020

👍2

All 11 comments

ti.root.pointer(ti.ij,16).dense(ti.ij,16).shape is (256,256)

Try ti.root.pointer(ti.indices(0, 1),16).dense(ti.indices(2, 3),16).shape? If two node have same indices, they will multiply together. NOTE: ti.ij is an alias of ti.indices(0, 1).

Long story short, I have no idea how to play with sparse data structures. I was not successful at deactivating elements, or playing with the bitmasked structures.

Thank for asking this! Sorry about the lacks a good doc on sparse data structures. To be honest as a org member I don't know how to operate them well either...
(@yuanming-hu Again, could you complete the docs on sparse? Many thanks.)

Request

Here below, I avoid computing the diffusion on a square 25x25 area.

How can I engineer the field p() to be sparse, so the kernel does not care

about those values

Let me try to tell something I know about solving this:

bitmasked is simpler than pointer, it's suggested to start from bitmasked.
an element is activated when it's assigned. e.g.:

x[i, j] = 123

will mark x[i, j] as active.

to deactivate an element, use:

ti.deactivate(x.parent(), [i, j])

use struct-for syntax to iterate only the active elements (important!):

for i, j in x:  # will only loop on those who are activated!!
  ...

if you use range-for syntax, all elements whatever active or inactive will be iterated:

for i in range(n):  # will iterate all!!
  ...

archibate on 30 Apr 2020

👍2

import taichi as ti

ti.init()

n = 256
x = ti.var(ti.f32)
blk1 = ti.root.bitmasked(ti.ij, (n, n))
blk1.place(x)

@ti.kernel
def act():
    for i in range(n):
        for j in range(n):
            if (i>100)&(i<125)&(j>100)&(j<125):
                # NOTE: all elements in bitmasked is initially deactivated
                # So we have to **activate those elements we want**
                # Instead of **deactivate those elements we don't want** (your case)
                x[i, j] = 0  # assign any value to activate!

@ti.kernel
def paint(t: ti.f32):
    for i, j in x:  # use the struct-for syntax
        # now, the loop body is only executed for active elements
        # don't use: for i in range(n), it will execute on all element no matter it's active or not (your case)
        x[i, j] = ti.sin(t)  # make the activated part to blink


act()

gui = ti.GUI('Test', n)
for frame in range(10000):
   paint(frame * 0.01)
   gui.set_image(x)
   gui.show()

Hope this could be helpful.

archibate on 30 Apr 2020

Is there a way to deactivate all elements at once, without range iterating over the entire structure?
Does the following line do exactly that?

https://github.com/taichi-dev/taichi/blob/6751bd59a401f218feb1630fc4869ac4c51df7dd/examples/taichi_sparse.py#L87

digitalillusions on 30 Apr 2020

👍1

Is there a way to deactivate all elements at once, without range iterating over the entire structure?
Does the following line do exactly that?

https://github.com/taichi-dev/taichi/blob/6751bd59a401f218feb1630fc4869ac4c51df7dd/examples/taichi_sparse.py#L87

Yes, it does :)

archibate on 30 Apr 2020

Thanks a lot @archibate
I understand it better now, thanks for the details.
Going a bit further, I'm more interested in using diff-taichi, and the ti.Tape, etc. So I want to track the simulation in time. Therefore, my data structure is dense in time but would be bitmasked in space.

ti.root.dense(ti.l, steps).bitmasked(ti.ij, n_grid).place(x)

How do you then run a struct for only on the bitmasked part of the structure ?
I had a suspiscion it has to do with something like

@ti.kernel
def paint(t: ti.f32):
    for i, j in x:  # x.leaf(), x.children(), x.node()  ? How to access only the spatial part ?
        x[t, i, j] = ti.sin(t)

but again limited success :)

mathusalem2020 on 1 May 2020

👍1

Good question. You can do this by only activating x[0, i, j], so that x[1, i, j] is not activated. e.g.:

import taichi as ti

ti.init()

n = 256
steps = 3
x = ti.var(ti.f32)
img = ti.var(ti.f32, shape=(n, n))
blk1 = ti.root.dense(ti.l, steps).bitmasked(ti.ij, (n, n))
blk1.place(x)

@ti.kernel
def act():
    for i, j in ti.ndrange(n, n):
            if (i>100)&(i<125)&(j>100)&(j<125):
                x[0, i, j] = 0  # activate `x[0, i, j]`, but not activate `x[1, i, j]`.

@ti.kernel
def paint(n: ti.f32):
    for t, i, j in x:  # although t is included in indices, but it will be always 0 since only x[0, i, j] is activated in act()
        x[t, i, j] = ti.sin(n)

@ti.kernel
def to2dimg():
    for i, j in img:
        img[i, j] = x[0, i, j]  # convert to make gui.set_image() recognize


ti.root.deactivate_all()
act()

gui = ti.GUI('Test', n)
for frame in range(10000):
   paint(frame * 0.01)
   to2dimg()
   gui.set_image(img)
   gui.show()

archibate on 1 May 2020

👍2

I think what @archibate said is really close, but slightly off from what @mathusalem2020 wants, since s/he does want t to change with each invocation?

How about define a mask taichi tensor, which is bitmasked. E.g.

x = ti.var(ti.f32)
x_mask = ti.var(ti.i32)

ti.root.bitmasked(ti.ij, (n, n)).place(x_mask)
ti.root.dense(ti.l, steps).dense(ti.ij, (n, n)).place(x)

@ti.kernel
def act():
  for i, j in ti.ndrange(n, n):
    if ...:
      x_mask[i, j] = 1  # the value doesn't matter, just to activate

@ti.kernel
def  diffuse(t: ti.i32):
  for i, j in x_mask:  # will only iterate over those activated in |x_mask|
    x[t, i, j] = ...

act()
for i in range(steps):
  diffuse(i)
  ...

One nice thing is that, the struct-for loop will only return you the index, but not the actual element of the tensor being iterated. So you can leverage the sparsity from one tensor for iterating over another, I think.

k-ye on 1 May 2020

👍2

@ti.kernel
def act():
    for i, j in ti.ndrange(n, n):
        if (i>100)&(i<125)&(j>100)&(j<125):
            x[0, i, j] = 0  # activate `x[0, i, j]`, but not activate `x[1, i, j]`.

A simpler way to write this:

@ti.kernel
def act():
    for I in ti.grouped(ti.ndrange(1, (101, 125), (101, 125))):
        x[I] = 0

xumingkuan on 1 May 2020

👍2

One nice thing is that, the struct-for loop will only return you the index, but not the actual element of the tensor being iterated. So you can leverage the sparsity from one tensor for iterating over another, I think.

Cool! Your answer sounds much close. But there is two things to consider for your solution:

What if we want to maintain a mask for different t?
If we use pointer instead of bitmasked, then this won't save any memory since x is still dense.

archibate on 1 May 2020

What if we want to maintain a mask for different t?

Yeah, then I guess a static mask variable won’t work. In that case, I’m not sure if sparse data structure is advantageous in any way, since you will have to iterate through the entire mesh grid to figure out which places to activate for every diffuse() iteration?

If we use pointer instead of bitmasked, then this won't save any memory since x is still dense.

I suppose you meant the other way around? That’s true, but I don’;t see memory being a major concern in this issue.

k-ye on 2 May 2020

What if we want to maintain a mask for different t?

Yeah, then I guess a static mask variable won’t work. In that case, I’m not sure if sparse data structure is advantageous in any way, since you will have to iterate through the entire mesh grid to figure out which places to activate?

No, I mean do we have something like for i, j in x[0, :, :]?
Since for each t the bitmasked is independent, this should be possible.

If we use pointer instead of bitmasked, then this won't save any memory since x is still dense.

I suppose you meant the other way around? That’s true, but I don’;t see memory being a major concern in this issue.

Yes, what if it's a major concern?

archibate on 2 May 2020

Was this page helpful?

0 / 5 - 0 ratings