Since there is some demand for this (#919, #773), this issue is for the specification of sparse matrix support in MXNet, so that interested devs can go ahead and do the job.
I think the first stage of wrapping can be done on the language-native side.
Gradient won't be propagated back to the sparse matrix side.
addto on gradient is much cheaper than writeto.
I think the basic answer is no, mainly because it is hard to optimize for a general sparsity pattern (sparse-sparse addition). We want to be able to handle sparse input really well.
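To make the sparse-sparse addition concern concrete, here is a small scipy sketch (illustrative only, not MXNet code): repeatedly accumulating sparse gradients with addto semantics changes the sparsity structure on almost every step, so the accumulator cannot keep a fixed memory layout.

import numpy as np
import scipy.sparse as sp

acc = sp.csr_matrix((1000, 1000))            # empty accumulator
for seed in range(10):
    g = sp.random(1000, 1000, density=0.001, format="csr", random_state=seed)
    acc = acc + g      # allocates a new matrix; nnz grows toward the union
    print(acc.nnz)     # the structure keeps changing, defeating in-place addto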
This is what we would like to discuss.
We have observed the same thing in EmbeddingOp for NLP, where users complain about its memory utilization due to choice 1. So I would expect different choices to work for different cases, and maybe we even need both, with a clear naming convention for the second one.
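For concreteness, a hedged numpy sketch of the two gradient layouts for an embedding lookup, assuming choice 1 means a dense gradient buffer and choice 2 a row-sparse one (the names here are illustrative, not MXNet's actual API):

import numpy as np

vocab, dim = 1_000_000, 64
batch_ids = np.array([3, 17, 3, 42])          # token ids in one batch
out_grad = np.ones((len(batch_ids), dim))     # incoming gradient

# Choice 1: dense gradient -- simple, but O(vocab * dim) memory per step.
dense_grad = np.zeros((vocab, dim))
np.add.at(dense_grad, batch_ids, out_grad)

# Choice 2: row-sparse gradient -- only O(unique ids * dim) memory.
rows, inv = np.unique(batch_ids, return_inverse=True)
row_grads = np.zeros((len(rows), dim))
np.add.at(row_grads, inv, out_grad)           # gradient is the pair (rows, row_grads)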
@antinucleon will be leading the discussion on this issue.
TensorFlow has sparse tensors and IndexedSlices tensors, which are sparse in the first dimension but dense in the other dimensions.
I think it would be easier to support the latter first, and it satisfies most use cases.
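A minimal sketch of that second layout, modeled loosely on TensorFlow's IndexedSlices (the class name here is hypothetical): sparse along the first dimension, dense in the rest.

import numpy as np
from dataclasses import dataclass

@dataclass
class IndexedSlicesLike:          # hypothetical name, not a real MXNet type
    indices: np.ndarray           # shape (k,): which rows are present
    values: np.ndarray            # shape (k, d): the dense rows themselves
    dense_shape: tuple            # logical shape, e.g. (vocab, d)

    def to_dense(self) -> np.ndarray:
        out = np.zeros(self.dense_shape, dtype=self.values.dtype)
        out[self.indices] = self.values
        return out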
SGD with momentum can also benefit from sparse updates by recording the iteration count of each row's last update.
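A hedged sketch of that lazy-update idea, assuming plain momentum (v <- mu*v + g; w <- w - lr*v): rows whose gradient stays zero for k steps only see geometric decay, so the decay and the weight catch-up it implies can be deferred in closed form until the row is touched again. This is not MXNet's optimizer, just an illustration.

import numpy as np

class LazySparseMomentum:
    def __init__(self, weight, lr=0.1, mu=0.9):
        self.w, self.lr, self.mu = weight, lr, mu
        self.v = np.zeros_like(weight)
        self.last = np.zeros(weight.shape[0], dtype=np.int64)
        self.t = 0

    def update(self, rows, grads):        # rows: unique (k,); grads: (k, d)
        self.t += 1
        k = self.t - self.last[rows]      # steps since each row's last update
        v = self.v[rows]
        # Catch up the k-1 skipped zero-gradient steps in closed form:
        #   w -= lr * (mu + mu^2 + ... + mu^(k-1)) * v
        geo = self.mu * (1.0 - self.mu ** (k - 1)) / (1.0 - self.mu)
        self.w[rows] -= self.lr * geo[:, None] * v
        # Current step: v has decayed by mu^(k-1), plus one more mu here.
        v = (self.mu ** k)[:, None] * v + grads
        self.w[rows] -= self.lr * v
        self.v[rows] = v
        self.last[rows] = self.t

Rows never touched before start with v = 0, so both catch-up terms vanish and the first update degenerates to a plain SGD-with-momentum step.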
One other feature we are missing is stride support and general slicing.
I vote for the first choice. However, I am thinking about wrapping indptr, indices, and values into a CSR matrix directly, and making conversion to scipy.csr transparent.
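A minimal sketch of that wrapping, assuming we hold the raw CSR triple and convert to/from scipy losslessly (the wrapper class name is hypothetical):

import numpy as np
import scipy.sparse as sp

class SparseCSR:                  # hypothetical wrapper name
    def __init__(self, values, indices, indptr, shape):
        self.values, self.indices, self.indptr = values, indices, indptr
        self.shape = shape

    def to_scipy(self):
        # scipy's csr_matrix accepts exactly this (data, indices, indptr) triple
        return sp.csr_matrix((self.values, self.indices, self.indptr),
                             shape=self.shape)

    @classmethod
    def from_scipy(cls, m):
        m = m.tocsr()
        return cls(m.data, m.indices, m.indptr, m.shape)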
@antinucleon #1 doesn't work for a large vocabulary unless we support row-sparse (or IndexedSlices) tensors.
We have around 300M features, but each sample has only around 300 non-zeros.
If the batch size is small, a dense weight and dense gradient will be wasteful; for example, at 300M rows of 100-dimensional float32, the dense weight (and its gradient) would take about 120 GB each.
What's the progress of this issue?
no progress at all?
@ProgramCaiCai I've tried to support sparsity in MShadow. We may need two key functions, csrmm and csr2csc (refer to http://docs.nvidia.com/cuda/cusparse/#cusparse-lt-t-gt-csrmm and http://docs.nvidia.com/cuda/cusparse/#cusparse-lt-t-gt-csr2csc). However, I ran into problems when trying to call the cuSPARSE APIs directly, since the sparse matrix there is stored row-major while the dense matrix is column-major. For this reason I haven't continued the implementation. The following is the declaration of the csrmm function.
/*!
 * \brief CPU/GPU: dst = alpha * op(A) * op(B) + beta * dst;
 *        op(A) = trans_A ? A^T : A; op(B) = trans_B ? B^T : B;
 *        A is stored in Compressed Sparse Row (CSR) format.
 *        Refer to https://en.wikipedia.org/wiki/Sparse_matrix
 * \param A_val values of the nnz elements in A
 * \param A_row_ptr row pointers of A (length = rows(A) + 1)
 * \param A_col_ind column indices of the nnz elements in A
 * \param B dense matrix
 * \param alpha scale applied to op(A) * op(B)
 * \param beta scale applied to the existing contents of dst
 * \param dst destination; during computation, dst will be transposed
 */
template<bool trans_A, bool trans_B, typename DType>
inline void csrmm(const Tensor<gpu, 1, DType> &A_val,
                  const Tensor<gpu, 1, int> &A_row_ptr,
                  const Tensor<gpu, 1, int> &A_col_ind,
                  const Tensor<gpu, 2, DType> &B,
                  DType alpha,
                  DType beta,
                  Tensor<gpu, 2, DType> dst);
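Not mshadow code, but a scipy sketch of the semantics csrmm is meant to have, plus the transpose identity that sidesteps the row-major/column-major mismatch: rather than dst = alpha * A @ B + beta * dst, one can hand cuSPARSE the transposed problem dst^T = alpha * B^T @ A^T + beta * dst^T.

import numpy as np
import scipy.sparse as sp

A = sp.random(4, 5, density=0.4, format="csr", random_state=0)
B = np.arange(15, dtype=np.float64).reshape(5, 3)
alpha, beta = 2.0, 0.5
dst = np.ones((4, 3))

ref = alpha * (A @ B) + beta * dst                           # csrmm semantics
via_t = (alpha * np.asarray(B.T @ A.T) + beta * dst.T).T     # transposed problem
assert np.allclose(ref, via_t)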
Suggestion on the interface: SparseInputEmbedding(data_indptr=indptr, data_indices=indices, data_value=values, weight=weight, bias=bias)
data_value should be optional. If not specified, assume all values are 1 -- this is a very common case.
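A hedged sketch (scipy/numpy, not the proposed op itself) of what SparseInputEmbedding would compute: each CSR row of the input selects and weights rows of weight, i.e. output = X_csr @ weight + bias.

import numpy as np
import scipy.sparse as sp

def sparse_input_embedding(indptr, indices, values, weight, bias):
    n_samples = len(indptr) - 1
    if values is None:                       # the common all-ones case
        values = np.ones(len(indices), dtype=weight.dtype)
    X = sp.csr_matrix((values, indices, indptr),
                      shape=(n_samples, weight.shape[0]))
    return X @ weight + bias                 # (n_samples, output_dim)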
We need to think about reading sparse inputs from disk too, because each record requires a variable amount of storage on disk, depending on its number of non-zero entries. I see a couple of choices here:
1) We adopt a convention like "index=-1 means missing data" and require that all records get -1 padded out to the length of the longest record. This is simple to implement, and general, but space-inefficient.
2) Write custom InputIter classes that understand how to read variable-length records.
I/O option #1 (padding with -1's) gets much more attractive if the files are gzip-encoded, since the long runs of -1 padding compress extremely well. That's nice in that it's also super general. (Thanks for the suggestion, NB.)
When the input vector is sparse, its length varies with each input. However, the in_args (NDArrayHandle*) buffers in MXExecutorBindEX are pre-allocated with a fixed length. Obviously, the two contradict each other.
Hi @tqchen @antinucleon, I understand that the initialization of the static graph allocates memory for input and output variables statically. When the input is sparse, the memory used varies, so I wonder whether static memory allocation can support sparse input. I want to support sparse input; can you help me?
@moveforever I agree these two ideas conflict: 1) sparse inputs have variable length, because the actual vector length is the number of non-zero entries in that record; 2) pre-allocated fixed-size memory buffers. There is also 3) minibatching, which acts a lot like (2).
I think a good way to solve this is by adopting the convention that a value of -1 in the index input indicates that the entry should be treated as blank (missing). This means that sparse input data will have to be padded out to a fixed size equal to the maximum number of non-zero entries.
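A small sketch of that convention: variable-length index records are padded with -1 to a fixed width, and downstream ops mask the -1 entries out.

import numpy as np

records = [[3, 17, 42], [7], [1, 2]]          # variable-length sparse indices
width = max(len(r) for r in records)          # or a fixed max nnz per record
padded = np.full((len(records), width), -1, dtype=np.int64)
for i, r in enumerate(records):
    padded[i, :len(r)] = r
mask = padded >= 0                            # True where a real entry exists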
+1 Sparse Matrix support is also needed for Keras. #4173