Chapel: Expose the "ugly but inefficient way" of counting non-zeroes in a sparse row

Created on 19 Sep 2017  Â·  13Comments  Â·  Source: chapel-lang/chapel

Bruce said this would be a good idea, too.

Based on this SO Reponse It's useful to know how many non-zeroes exists along a row or column in a sparse array. For instance, if you're doing something ridiculous like building a graph from a sparse matrix like an a*le. Two proposed solutions are

var t = 0; for c in A.domain.dimIter(2,v) do t = t+1;

and

for r in 1..n { 
  var row = SD.dimIter(2,r); 
  writeln("row ", r, " contains ", row.size, " elements"); 
}

But apparently there is a better way. And +1 to dimIter, very nice!

Libraries / Modules Feature Request user issue

All 13 comments

In spite of being confused with the boss, here's a drop-in solution that could be used until this is officially supported:

for r in 1..n do
  writeln("row ", r, " contains ", SD.nonZeroesInDim(2,r), " elements");

proc _domain.nonZeroesInDim(param d, ind) {
  return _value.nonZeroesInDim(d, ind);
}

proc CSDom.nonZeroesInDim(param d, ind) {
  if (d != 2 && this.compressRows) {
    compilerError("dimIter(1, ..) not supported on CS(compressRows=true) domains");
  } else if (d != 1 && !this.compressRows) {
    compilerError("dimIter(2, ..) not supported on CS(compressRows=false) domains");
  }
  return stopIdx[ind]-startIdx[ind]+1;
}

Barriers to officially supporting it are similar to those with dimIter() — how to generalize it for n-dimensional sparse matrices, and would we want to support any queries, or just the fast ones?

Everyone at Chapel is "Bruce" to me now. You have far too many "B-names" and I need to keep it all clear.

... WOW, I didn't know it was going to be THAT UGLY...

... WOW, I didn't know it was going to be THAT UGLY...

To be clear, the user part was just the first two lines. The rest of it was how we'd add support for the user part to the internal modules. The short-and-sweet-ugly user version would be:

  writeln("row ", r, " contains ",
          SD._value.stopIdx[r]-SD._value.startIdx[r]+1, " elements");

After two separate offline discussions with @buddha314 and @ben-albrecht, I just wanted to note here that I had implemented methods called dsiPartialThese for the module support for partial reductions. Practically, they are just more "wholesome" implementations of dimIter. Few bullets I brought up with discussion with Ben in terms of the advantages of 'dsiPartialThese':

  • I have implemented it for DefaulRectangular, DefaultSparse, CS, SparseBlock, Block, Cyclic and BlockCyclic.

  • They work on any dimension. Existing dimIters work only on the dimension that would be the most efficient to iterate over, halt otherwise.

  • I have implmeneted all 4 (standalone, l/f, serial) versions of them.

For reference dsiPartialThese methods are introduced among the self-contained partial reduction support tests: #4398. And these methods can be found in: https://github.com/chapel-lang/chapel/blob/master/test/users/engin/partial_reduction_support/modules/dsiMethods.chpl

Hi Engin --

Thanks for the pointer! For the sake of this discussion, can you characterize how you handled higher-dimensional arrays in the interface? E.g., dimIter() only was designed to work for 2D arrays and only for the inner dimension of the array. If dsiPartialThese() can be used to iterate over a 4D array (say), can you show what that loop nest would look like?

@bradcray

First of all, I am not sure if it's relevant to your question but dsiPartialThese currently only allows iterating in a single dimension. So, to be clear, if you have a higher-dimensional arrays you cannot pick multiple dimensions to iterate over.

To the main question -- here is how you can use it

(uncompiled code)

for i in sparseDom.partialThese(2,(4,3,7)) do
  writeln("Yielded index: ", i); // printout indices (4, X, 3, 7) for all X that exists in sparseDom

And the implementation of dsiPartialThese in default sparse is as follows:

  const otherIdxTup = chpl__tuplify(otherIdx);

  if onlyDim != this.rank {
    for i in nnzDom.low..#nnz do
      if indices[i].withoutIdx(onlyDim) == otherIdxTup then 
        yield indices[i][onlyDim];
  }
  else { //here we are sure that we are looking for the last index
    for i in __private_findRowRange(otherIdxTup) do
      yield indices[i][onlyDim];
  }

The true branch of if handles cases where the iteration dimension is not the inner-most dimension. And the implementation reads through all indices and yields them if they match what the user asked in the iterator signature. Parallel versions chunk up the entire index array onto tasks, so there may be a load imbalance.

Arguably, if there is support in Sort to sort tuples based on a value in an index, it could be used to make this more efficient in terms of performance. Obviously, at a cost of memory space to store temporarily sorted indices.

First of all, I am not sure if it's relevant to your question but dsiPartialThese currently only allows iterating in a single dimension. So, to be clear, if you have a higher-dimensional arrays you cannot pick multiple dimensions to iterate over.

Yeah, this is essentially what I was trying to understand and hoping for. Part of the goal/role of dimIter() (that was not achieved) was to support breaking a monolithic forall loop over a sparse domain:

forall (i,j) in MySparseDom do ...

into a per-dimension loop nest, e.g.:

forall i in MySparseDom.shape(1) do
  for j in MySparseDom.dimIter(2, i) do ...

And while dimIter() achieves this for 2D CS* domains being iterated in the friendly order, it wasn't generalized to support higher-dimensional arrays (or, somewhat less importantly, the unfriendly orders). E.g., imagine wanting something like this:

forall i in MySparse3DDom.shape(1) do
  for j in MySparse3DDom.dimIter(2, i) do
    for k in MySparse3DDom.dimIter(3, i, j) do ...

(where these signatures may not make sense or be sufficient in reality, but hopefully the point is clear: I'd like to iterate over each dimension independently).

If I'm understanding dsiPartialThese(), it could be used to iterate over a 3D sparse domain as follows:

forall i in MySparse3DDom.shape(1) do
  for j in MySparse3DDom.shape(2) do
    for k in MySparse3DDom.dsiPartialThese(3, (i,j)) do ...

but this approach would have the disadvantage of requiring O(n**2) time where the domain may be sufficiently sparse that the i and j dimensions should really skip over any completely empty planes/pencils, rather than considering all of them. At least, that's my intuition.

I think you meant dim not shape in your snippets.

I am still a bit confused (and trying to remember similar discussions we had in the past) your concern is the overhead of having to iterate the whole set of indices and find matching ones, is that correct? If so:

I got it wrong in my initial reply. Currently there is support for Comparators in the Sort module. So, that means for a COO domain I can imagine sorting the whole indices array in a very specific way to provide O(nnz_in_the_plane) iteration complexity (on top of O(num_indices(log(num_indices))) sorting cost). Is something like that what you have in mind? Or am I missing something? (Somehow I cannot intuitively say that something like that would be better asymptotically)

In thinking about asymptotic efficiency, I'm happy to focus only on cases that don't require sorting (use the natural iteration order).

Here's a motivating case that I always have in mind: The NAS MG benchmark has a very sparse input set — e.g., 20 nonzeroes in a 512x512x512 array (n=512). A full-domain/array iteration like:

forall (i,j,k) in Sparse3DDomain do ...

or

forall (i,j,k) in Sparse3DArray do ...

ought to have O(20) complexity and I think we'd achieve that today. But we ought to have a way to write that loop nest one dimension at a time as well and have that also take O(20) time rather than O(n**2) or O(n) time.

This issue has come up again for me. I'm looking for the number of neighbors in a graph that is represented as a sparse matrix. You know, the whole GraphBLAS thing. The matrix is square, with each row/column being a vertex. So if my matrix is

A = 
. 1 . . 
1 . . .
. . 1 1
1 . . 1

Then nonzeroes(A[3]).size() would be the number neighbors of vertex 3. It would also be nice to be able to do something like nonzeroes(A[1..3]]), nonzeroes(A[1,3]), etc. I don't care if it's called nonzeroes, this is just an example.

I think a good language-level way to express this would be:

(A.domain[3, ..]).size  // take a rank-change slice of A's domain and query its size (number of indices)

This could then be sugared in the linear algebra library by a nonzeroes() routine or method or whatever. Unfortunately, this doesn't work as expected today (it returns 4 rather than 2). I'm looking to see if I can understand what's going wrong and what the level of effort for a fix would be while waiting to see if I will be called to a jury.

Barriers to officially supporting it are similar to those with dimIter() — how to generalize it for n-dimensional sparse matrices, and would we want to support any queries, or just the fast ones?

@bradcray - In the interest of providing this frequently requested feature to cover one of the most common use-cases, do you think it would be reasonable to adopt something like dsiPartialThese specifically for the CS layout, since it is limited to the 2D case anyway?

do you think it would be reasonable to adopt something like dsiPartialThese specifically for the CS layout, since it is limited to the 2D case anyway?

I think that's an intriguing idea — particularly now that we have (I think... haven't tried it) forwarding on the these virtual field and wouldn't need to expose the interface all the way up to the general sparse matrix format.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Rapiz1 picture Rapiz1  Â·  3Comments

vasslitvinov picture vasslitvinov  Â·  3Comments

Aniket21mathur picture Aniket21mathur  Â·  3Comments

buddha314 picture buddha314  Â·  3Comments

ben-albrecht picture ben-albrecht  Â·  3Comments