Gensim: `scipy.sparse.sparsetools` deprecated

Created on 23 Sep 2015  ·  7 Comments  ·  Source: RaRe-Technologies/gensim

Importing gensim prints this deprecation warning:

gensim/home/ubuntu/.vew/ds26/lib/python2.6/site-packages/numpy/lib/utils.py:95: DeprecationWarning: `scipy.sparse.sparsetools` is deprecated!
scipy.sparse.sparsetools is a private module for scipy.sparse, and should not be used.
  warnings.warn(depdoc, DeprecationWarning)

We need to find out why it's deprecated and, most importantly, what the replacement is.

We rely on sparsetools rather heavily in gensim, so this is critical.

Resources: scipy release notes, scipy mailing list...

bug difficulty medium

All 7 comments

Pauli Virtanen suggested the following on the scipy mailing list:

If I understand correctly, these are use cases that can be expressed
in terms of usual sparse matrix operations,

y = corpus * o
y += corpus * chunk

but you are using the internal sparsetools routines instead,
because of performance reasons? Is the performance difference
big in this case? Is the issue that you want in-place sparse AXPY,
or is it due to dealing with small matrices and avoiding
overheads?

There's currently no sparse axpy available in Scipy.
There probably should be though.

Gensim is not a pure-Python module, so one relatively straightforward
possibility is to just bundle a copy of the current sparsetools
module (or just the one routine you need) with it. There's no SWIG
nowadays involved, and it's independent of the rest of Scipy,
so it's probably doable.

https://mail.scipy.org/pipermail/scipy-user/2015-October/036712.html
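To make "in-place sparse AXPY" concrete, here is a minimal pure-Python sketch of `y += A * x` computed directly over CSR arrays. The function name `csr_matvecs_inplace` and the list-of-rows layout are assumptions made for illustration; this is not the sparsetools routine and not a scipy API. It only shows that the update needs no temporary the size of `y`.

```python
def csr_matvecs_inplace(n_rows, n_vecs, indptr, indices, data, x, y):
    """Accumulate y += A @ x in place, with A given as CSR arrays.

    x is a list of rows (n_cols x n_vecs); y is a list of rows (n_rows x n_vecs).
    Hypothetical helper for illustration only.
    """
    for i in range(n_rows):
        y_row = y[i]
        # Walk only the stored (non-zero) entries of row i.
        for k in range(indptr[i], indptr[i + 1]):
            a = data[k]
            x_row = x[indices[k]]
            for j in range(n_vecs):
                y_row[j] += a * x_row[j]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]]
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]

x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y = [[100.0, 100.0], [0.0, 0.0]]

csr_matvecs_inplace(2, 2, indptr, indices, data, x, y)
print(y)  # [[107.0, 170.0], [6.0, 60.0]]
```

A compiled version of the same loop is what would make this practical for real corpora; the pure-Python form is far too slow for anything but illustration.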

Did you check that corpus * o doesn't materialize the sparse matrix corpus into a dense array?

If it's memory efficient (no extra memory overhead), then we can change it. I don't care (much) about performance here; I'm pretty sure this optimization was about memory, not time. We cannot afford to convert corpus into a dense array, and I'm afraid sparsetools was used there for a reason.

Re. bundling a copy: gensim is pure-python, the extensions are only optional. So bundling a hard dependency that needs compilation is not an option for us.

@tmylk Check what sparse * dense does in terms of memory in recent scipy versions, then we'll know more.
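One rough way to run that check is sketched below. The matrix sizes are made-up stand-ins, and `tracemalloc` only sees allocations reported by Python and NumPy, so treat the peak figure as an indication rather than a proof. The `@` operator is used because it means matrix multiplication for both scipy sparse matrices and the newer sparse arrays, whereas `*` is matrix multiplication only for the `spmatrix` classes.

```python
import tracemalloc

import numpy as np
import scipy.sparse as sp

# Hypothetical stand-ins for the matrices discussed in this thread.
corpus = sp.random(2000, 2000, density=0.001, format="csc", random_state=0)
o = np.random.default_rng(0).standard_normal((2000, 10))

nnz_before = corpus.nnz
# Size corpus would have if it were converted to a dense array.
dense_corpus_bytes = corpus.shape[0] * corpus.shape[1] * corpus.dtype.itemsize

tracemalloc.start()
y = corpus @ o
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

result_is_dense = isinstance(y, np.ndarray)
corpus_still_sparse = sp.issparse(corpus) and corpus.nnz == nnz_before

print("result is a dense ndarray:", result_is_dense, y.shape)
print("corpus is still sparse:", corpus_still_sparse)
print("peak traced bytes during the product:", peak_bytes)
print("bytes a dense copy of corpus would need:", dense_corpus_bytes)
```

If the traced peak stays far below the dense size, the product is at least not densifying `corpus` through NumPy allocations; the output `y` itself is always dense.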

And it looks like y += corpus * chunk would double the memory in any case (a temporary matrix for corpus * chunk before adding it to the existing y) :(

There may be no other way though, if scipy removes sparsetools. A sparse axpy routine in scipy would be great though, as Pauli suggests!
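Until something like that exists, one possible workaround for the temporary in `y += corpus * chunk` is to accumulate one block of rows at a time, so each temporary is only `block_rows` rows tall. This is a sketch using public scipy operations only; `accumulate_in_blocks` and `block_rows` are hypothetical names, and the `tocsr()` call copies the sparse structure if `corpus` is not already CSR.

```python
import numpy as np
import scipy.sparse as sp


def accumulate_in_blocks(y, corpus, chunk, block_rows=256):
    """Compute y += corpus @ chunk in place, one block of rows at a time.

    Each temporary product has at most block_rows rows instead of the
    full shape of y. Hypothetical helper for illustration only.
    """
    corpus = corpus.tocsr()  # row slicing is cheap for CSR
    n_rows = corpus.shape[0]
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        # y[start:stop] is a view, so the addition updates y in place.
        y[start:stop] += corpus[start:stop] @ chunk
    return y


# Small made-up example to compare against the one-shot update.
rng = np.random.default_rng(0)
corpus = sp.random(1000, 300, density=0.01, format="csr", random_state=0)
chunk = rng.standard_normal((300, 5))
y = rng.standard_normal((1000, 5))

expected = y + corpus @ chunk
accumulate_in_blocks(y, corpus, chunk, block_rows=128)
print(np.allclose(y, expected))
```

This trades some speed for a bounded temporary; it is not a true in-place AXPY, since each block still allocates its own small product.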

Feature request submitted to scipy for a sparse in-place AXPY:

https://github.com/scipy/scipy/issues/5348

There is some progress on the scipy side: https://github.com/scipy/scipy/pull/5775

This issue is currently being worked on in this repo: https://github.com/souravsingh/sparseutils
Finishing it should also fix #557.

Can't reproduce this now; please continue the discussion in #557.
