Gensim: Lda Model does not work with numpy 1.13

Created on 19 Jan 2018  ·  14 Comments  ·  Source: RaRe-Technologies/gensim

It gives the following warning:
RuntimeWarning: invalid value encountered in subtract result = psi(alpha) - psi(np.sum(alpha))

It also does not produce proper word probabilities for topics; the result of show_topics looks like this: nan*"w1" + nan*"w2"

When I upgrade numpy to 1.14, LdaModel works properly.

bug difficulty medium

All 14 comments

This looks like a numpy problem (we use numpy for all math).
@Ronggui, can you add a full example (code + data) that reproduces this behavior?

In [2]: dictionary = corpora.Dictionary.load("dictionary")

In [3]: mm_corpus = corpora.MmCorpus("2018.mm")
   ...: print(mm_corpus)
   ...:
MmCorpus(6928 documents, 2398 features, 183810 non-zero entries)

In [4]: k=5

In [5]: model = models.LdaModel(corpus=mm_corpus, num_topics=k, id2word=dictionary)
/Users/rghuang/anaconda3/lib/python3.6/site-packages/gensim/matutils.py:613: RuntimeWarning: invalid value encountered in subtract
  result = psi(alpha) - psi(np.sum(alpha))

In [7]: model.show_topics()
Out[7]:
[(0,
  'nan*"W1" + nan*"w2"....

In [8]: import numpy

In [9]: numpy.__version__
Out[9]: '1.13.3'

This numpy was installed from conda:
conda install -c anaconda numpy

Strangely, when I run pip install numpy==1.13.3 instead, things work out perfectly.
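Because the conda and pip installs of the same numpy version behave differently here, it can help to confirm which build Python is actually importing. The following is a minimal sketch: `describe_numpy_install` is a hypothetical helper written for illustration, while `numpy.__version__`, `numpy.__file__` and `numpy.show_config()` are standard numpy attributes.

```python
import numpy


def describe_numpy_install():
    """Return the version and on-disk location of the imported numpy (hypothetical helper)."""
    return {"version": numpy.__version__, "path": numpy.__file__}


info = describe_numpy_install()
print("numpy version:", info["version"])
print("loaded from:  ", info["path"])

# Prints the build configuration, including the BLAS/LAPACK libraries
# this numpy build was linked against.
numpy.show_config()
```

A path under an anaconda directory suggests the conda build is in use; a path under a virtualenv or user site-packages suggests the pip wheel.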

Please attach the files dictionary and 2018.mm.

This may be related to the same problem as #1767.
@Ronggui Thanks for files.

I'm not positive this is the same issue. It looks like this may be a numpy/anaconda bug (still not positive even after digging through the following): https://github.com/numpy/numpy/issues/9656

@Ronggui, are you running osx? If I use numpy 1.13 via anaconda on osx, I get the same error as you. But it definitely looks like a numpy error. For example, try running this on numpy 1.13:

>>> import numpy as np
>>> expElogthetad = np.array([ 0.0090071 ,  0.00909133,  0.00900161,  0.0090378 ,  0.0090441 ], dtype=np.float32)
>>> expElogbetad = np.array([ 0.00246125,  0.00356956,  0.002345  ,  0.00296673,  0.0030526 ], dtype=np.float32)

>>> np.dot(expElogthetad, expElogbetad)
3.6893488e+19

>>> np.dot(expElogthetad.astype(np.float64), expElogbetad)
0.00013015027514139328

>>> (expElogthetad * expElogbetad).sum()
0.00013015028

The first result (3.6893488e+19, from the float32 np.dot) is definitely wrong; the other two computations agree on roughly 0.00013015. It looks like the bug was fixed between numpy 1.13 and numpy 1.14 (at least on the anaconda builds), but I don't see a related commit on the numpy repo, which makes me think this was an anaconda issue.
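The comparison above can be turned into a repeatable check. This is a minimal sketch, assuming only that numpy is importable; the function name `float32_dot_is_sane`, the vector sizes and the tolerance are hypothetical choices for illustration, not part of numpy or gensim. It compares the float32 result of `np.dot` against a float64 reference on random vectors.

```python
import numpy as np


def float32_dot_is_sane(n_trials=100, max_size=8, rtol=1e-4, seed=0):
    """Return True if float32 np.dot agrees with a float64 reference (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    for _ in range(n_trials):
        size = rng.randint(2, max_size + 1)  # vector length between 2 and max_size
        a = rng.rand(size).astype(np.float32)
        b = rng.rand(size).astype(np.float32)

        got = np.dot(a, b)  # the float32 code path under suspicion
        expected = np.dot(a.astype(np.float64), b.astype(np.float64))

        # A broken build may return NaN, inf, zero, or a wildly wrong magnitude.
        if not np.isfinite(got) or not np.isclose(got, expected, rtol=rtol):
            return False
    return True


print("numpy", np.__version__, "float32 dot sane:", float32_dot_is_sane())
```

If this prints False, the installed numpy build is the likely culprit, and LdaModel output from that environment should not be trusted.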

Thanks for investigating! Can you clarify / confirm with NumPy and Anaconda devs?

Looks like there are several issues about this on the numpy repo. I think it's an anaconda issue though and it seems like it only impacts OSX. I commented on this issue on the anaconda repo: https://github.com/ContinuumIO/anaconda-issues/issues/8070.

I can confirm that, at least for me on OSX, the issue is with numpy/anaconda, not gensim.

@Ronggui, can you confirm that you see the same issue with numpy? You can execute the following code:

import numpy

v1 = numpy.asarray([0., 2.], dtype='f')
v2 = numpy.asarray([0., 1.], dtype='f')
print(numpy.dot(v1, v2))

If you get back 0, then the problem is with your numpy/anaconda install.

@arlenk I can confirm that conda's numpy (Mac OS X) is the source of the problem.

package                    |            build
---------------------------|-----------------
numpy-1.13.3               |   py36h8a80b8c_2         3.7 MB  anaconda

In [1]: import numpy

In [2]: v1 = numpy.asarray([0., 2.], dtype='f')
...: v2 = numpy.asarray([0., 1.], dtype='f')
...: print(numpy.dot(v1, v2))
...:
0.0

Rongguis-MacBook-Air:~ rghuang$ pip install numpy
Collecting numpy
Using cached numpy-1.14.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.14.0

In [1]: import numpy
...:
...: v1 = numpy.asarray([0., 2.], dtype='f')
...: v2 = numpy.asarray([0., 1.], dtype='f')
...: print(numpy.dot(v1, v2))
...:
2.0

Nice work, thanks @arlenk and @Ronggui for the investigation!
I am closing this issue because this isn't a gensim bug.

Probably a related report: https://groups.google.com/forum/#!topic/gensim/6Uuj-yr5-Ls

CC: @piskvorky @Ronggui @arlenk

Thanks, everyone. But I have not found a solution for this issue. Could @menshikh-iv please not close it so quickly? I saw that you recently closed another post quickly, and it was not the same issue as the one you referenced. If there is any pressure to close threads (maybe a KPI?), please let us know. Thanks.

Are you sure you replied to the correct issue? This thread was about an issue in numpy 1.13.3. As the issue was in numpy (and was fixed in later numpy releases), there's really nothing gensim (or any other library) can do. Unfortunately, your best bet is moving to a more recent version of numpy.
