Start a thread to gather inputs/comments/suggestions. @pluskid @tqchen @piiswrong @mli @jermainewang @szha
There are two potential ways to represent sparse ndarrays: either create a new class, or add a type to the existing ndarray class. We need to figure out which one is more desirable.
Two Python classes:
- NDArray: normal NDArray with data stored in dense format.
- SparseNDArray: NDArray with data stored in sparse format. For example:
sparse_mat = mx.sparse_nd(sparse_type = 'coo', indices = indices, values = values, shape = shape)
out = mx.sparse_nd.sparse_tensor_dense_matmul(sparse_mat, dense_mat)
dense_mat = mx.sparse_nd.to_dense(sparse_mat)
NDArray with sparse_type
Under the hood, the NDArray class will handle sparse-sparse, sparse-dense, and dense-dense operators. If an operator is not implemented for the sparse data type, the tensor will be converted to dense format first. Warnings will be thrown to notify users that the sparse_type of the ndarray has changed.
Operators to support include:
- add, sub, mul, sparse_dense_matrix_mul, etc.
- asnumpy, etc.
- getitem, setitem
- empty, zeros, etc.
- equal, etc.
- sparse_type, etc.
We'll further discuss which operators to prioritize in another issue, since there are many to implement.
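For comparison with the mx.sparse_nd example above, here is a hypothetical sketch of how this single-class option might look from Python; the constructor and attribute names (mx.nd.sparse_array, sparse_type) are illustrative assumptions rather than a settled API.

```python
import numpy as np
import mxnet as mx

dense_mat = mx.nd.array(np.random.rand(3, 4))        # sparse_type would be 'dense'

# Hypothetical constructor mirroring mx.sparse_nd(...) above.
indices = np.array([[1, 0], [3, 2]])
values = np.array([1.0, 3.0])
sparse_mat = mx.nd.sparse_array(indices=indices, values=values,
                                shape=(4, 3), sparse_type='coo')

out = mx.nd.dot(sparse_mat, dense_mat)   # sparse * dense: a sparse kernel is used if registered
tmp = sparse_mat + sparse_mat            # no sparse kernel: converted to dense, with a warning
print(tmp.sparse_type)                   # 'dense' after the fallback conversion
```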
Sparse support is added to the original NDArray to avoid v-table and API changes. A sparse tensor is represented by two dense tensors, index and data. There are two possible sparse formats:
Let K denote the number of tensor dimensions.
Let M be the number of rows containing non-zero elements.
Let N be the number of non-zero values in all dimensions of the tensor.
Row-sparse tensor (similar to IndexedSlices in TensorFlow), where each row of data, data[i], corresponds to X[index[i]]. Rows with all zeros are omitted. Each stored row is a dense tensor.
-- shape: (K)
-- index shape: (M)
-- data shape: (M, shape[1], ..., shape[K-1])
COO sparse tensor where X[tuple(index[i])] = data[i]
-- shape: (K)
-- index shape: (N, K)
-- data shape: (N)
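As a concrete NumPy-only illustration of the two layouts for the 4x3 matrix X = [[0,0,0],[1,2,0],[0,0,0],[0,0,3]] (variable names here are for exposition only):

```python
import numpy as np

shape = (4, 3)      # the full tensor X has K = 2 dimensions

# Row-sparse: keep only the M = 2 rows that contain non-zeros, each stored densely.
rs_index = np.array([1, 3])                      # index shape: (M)
rs_data = np.array([[1., 2., 0.],                # data[i] == X[rs_index[i]]
                    [0., 0., 3.]])               # data shape: (M, shape[1])

# COO: keep the N = 3 individual non-zeros and their K-dimensional coordinates.
coo_index = np.array([[1, 0], [1, 1], [3, 2]])   # index shape: (N, K)
coo_data = np.array([1., 2., 3.])                # data shape: (N)
```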
Instead of creating another NDArray class, we could introduce ChunkType to help store the indices for sparse tensors. This avoids vtable lookups and API changes.
class NDArray {
 public:
  struct Chunk {
    // auxiliary storage handle (e.g. indices for sparse data)
    Storage::Handle aux_shandle;
    // storage handle from the storage engine
    Storage::Handle shandle;
    ...
  };
  enum ChunkType {
    kDense,
    kRowSparse,
    kCOOSparse,
    // kMKL, etc.
  };
  ChunkType chunk_type();
  TBlob data() {
    // Error when called on a sparse ndarray
    CHECK(chunk_type_ == kDense); ...
  }
  TBlob sparse_data() {
    // Blocks because length_ can change
    this->WaitToRead(); ...
  }
  TBlob sparse_index() {
    this->WaitToRead(); ...
  }
  void SparseResize() {
    // Change the size of the actual data. Reallocate when upsizing.
    // The caller must hold the write lock.
    data_->Reallocate(); ...
  }
  NDArray to_dense();
 private:
  ChunkType chunk_type_;
  shared_ptr<Chunk> data_;
};
The reason for introducing ChunkType and keeping the underlying Chunk opaque is to abstract away auxiliary memory allocated for an NDArray (e.g. indices for sparse data, mkl_mem_, etc.).
NDArray op registration uses the following interface:
using FComputeNDArray = std::function<void(
const nnvm::NodeAttrs& attrs,
const OpContext& ctx,
const std::vector<NDArray>& inputs,
const std::vector<OpReqType>& req,
const std::vector<NDArray>& outputs)>;
For imperative execution, in MXImperativeInvoke, if at least one of the input NDArrays is sparse, try to use FComputeNDArray. If no such operator is registered but an input tensor is sparse, call to_dense on all inputs and fall back to the dense path.
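A minimal, plain-Python sketch of this dispatch rule (no MXNet code; the kernel registry and the to_dense hook are hypothetical stand-ins): prefer a registered FComputeNDArray kernel when any input is sparse, otherwise densify and fall back.

```python
import warnings

def dispatch(op_name, inputs, sparse_kernels, to_dense):
    """inputs: list of (array, is_sparse) pairs; sparse_kernels maps op names to
    FComputeNDArray-style kernels. Both arguments are hypothetical stand-ins."""
    if any(is_sparse for _, is_sparse in inputs):
        if op_name in sparse_kernels:
            return sparse_kernels[op_name], inputs   # sparse-aware path
        warnings.warn("%s has no sparse kernel; converting inputs to dense" % op_name)
        inputs = [(to_dense(arr) if is_sparse else arr, False)
                  for arr, is_sparse in inputs]
    return None, inputs   # None -> fall through to the regular dense FCompute
```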
Weights are initialized in dense format; gradients are initialized with the specified dense/sparse types.
The memory for sparse tensors cannot be planned ahead of time and is allocated during op execution. For a graph with sparse tensor operations, we need to infer whether any node in the graph takes sparse input, and try to share memory at execution time. There are two ways to infer the chunk type:
- Add an InferChunkType pass, which infers chunk types based on the input chunk types. Ctx information will help infer MKL chunks. (A rough sketch of such a pass is given below.)
- Compose InferChunkType, InferShape, and InferType into a single InferNDProperty pass, which performs all three. Type, shape, and chunk type are wrapped into a struct, and an additional NDArray constructor taking this struct will be added.
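A rough Python sketch of how an InferChunkType-style forward pass could work; the per-op rule table and the graph representation here are illustrative assumptions only.

```python
DENSE, ROW_SPARSE, COO = 'dense', 'row_sparse', 'coo'

# Hypothetical per-op rules: list of input chunk types -> output chunk type.
infer_rules = {
    'sparse_dense_matmul': lambda in_types: DENSE,
    'elemwise_add': lambda in_types: (ROW_SPARSE if all(t == ROW_SPARSE for t in in_types)
                                      else DENSE),
}

def infer_chunk_types(topo_order, graph, input_types):
    """graph: node -> (op name, list of input nodes); input_types: chunk types of graph inputs."""
    types = dict(input_types)
    for node in topo_order:
        if node in types:                              # a graph input, already typed
            continue
        op, deps = graph[node]
        in_types = [types[d] for d in deps]
        rule = infer_rules.get(op, lambda _: DENSE)    # unknown ops default to dense
        types[node] = rule(in_types)
    return types

# e.g. a row-sparse weight w and a dense input x feeding a sparse_dense_matmul node y:
graph = {'y': ('sparse_dense_matmul', ['w', 'x'])}
print(infer_chunk_types(['w', 'x', 'y'], graph, {'w': ROW_SPARSE, 'x': DENSE}))
```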
Extend FComputeExecutor to execute FComputeNDArray in addition to FCompute. If the input chunk type is sparse while FCompute expects dense input, the sparse data will be converted to dense format when OpExecutor::Run is invoked.
Since sparse tensors request memory at execution time, we should use a memory pool so that operators can allocate/reallocate memory for their output buffers.
KVStore would take a sparse ndarray and perform the update; we can simply implement the SGD op with the FComputeNDArray interface. For communication, the keys should be the row ids of the parameters.
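A rough usage sketch of this KVStore flow; the sparse constructors, tostype, and row_sparse_pull calls shown here are assumptions about what the eventual Python interface could look like, not part of this proposal.

```python
import mxnet as mx

kv = mx.kv.create('local')
shape = (10000, 128)                                   # e.g. an embedding weight matrix
kv.init('weight', mx.nd.zeros(shape).tostype('row_sparse'))
kv.set_optimizer(mx.optimizer.SGD(learning_rate=0.1))  # SGD update applied on push

# The gradient only touches a few rows, so it is pushed as a row-sparse ndarray.
row_ids = mx.nd.array([0, 17, 42], dtype='int64')
grad = mx.nd.sparse.row_sparse_array((mx.nd.ones((3, 128)), row_ids), shape=shape)
kv.push('weight', grad)

# Pull back only the rows this worker needs, keyed by row id.
out = mx.nd.sparse.zeros('row_sparse', shape)
kv.row_sparse_pull('weight', out=out, row_ids=row_ids)
```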
Suggestion for a name change:
FComputeNDArray -> FCompute
The most common format is CSR (or CSC) for matrices (and the sliced indices for tensors), so this should be the top priority. The second priority is COO, which is usually converted to CSR before processing for efficiency.
For many algorithms, the SpMV (sparse matrix * dense vector) operation is very common. The second most common is SpGEMM (sparse matrix * sparse matrix).
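For reference only (SciPy, not MXNet API), a small example of the two kernels mentioned above, SpMV on a CSR matrix and SpGEMM:

```python
import numpy as np
from scipy import sparse

# A 3x4 CSR matrix with 4 non-zeros (data / column indices / row pointers).
A = sparse.csr_matrix((np.array([1., 2., 3., 4.]),
                       np.array([0, 3, 1, 2]),
                       np.array([0, 2, 3, 4])), shape=(3, 4))
x = np.ones(4)
y = A.dot(x)      # SpMV: sparse matrix * dense vector -> dense vector
B = A.dot(A.T)    # SpGEMM: sparse matrix * sparse matrix -> sparse matrix
```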
To add to this, the inputs (features) can also be sparse vectors (such as bag-of-words), which can then be transformed into dense form via a random mapping. So that would be (dense matrix * sparse vector) computations.
Is there any reference where the random mapping actually works on sparse features followed by deep learning? We run sparse logistic... we tried factorization/SVD etc. to generate a dense mapping for the sparse features, but it's not effective... things like locality-sensitive hashing may work, but then we can't go back to the feature dimension...
Sparse features have been merged thanks to the hard work led by @eric-haibin-lin and team members. Thanks!