Gensim: Lda Model does not work with numpy 1.13

Created on 19 Jan 2018 · 14 comments · Source: RaRe-Technologies/gensim

It gives the following warning:
RuntimeWarning: invalid value encountered in subtract result = psi(alpha) - psi(np.sum(alpha))

It also does not produce proper word probabilities for topics; the output of show_topics looks like this: nan*w1 + nan*w2

When I upgrade numpy to 1.14, LdaModel works properly.
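The warning line (`result = psi(alpha) - psi(np.sum(alpha))`, from gensim's matutils.py) computes E[log theta] for Dirichlet-distributed proportions theta. A minimal sketch, assuming scipy's `psi` as in that line, shows how a single non-finite parameter is enough to produce the `nan` weights: `dirichlet_expectation_1d` is a hypothetical name, and the idea that a bad dot product upstream is what pushes a value to infinity is an assumption, not something traced in this thread.

```python
import numpy as np
from scipy.special import psi  # digamma, as used in the warning line


def dirichlet_expectation_1d(alpha):
    """E[log theta] for theta ~ Dirichlet(alpha): psi(alpha) - psi(sum(alpha))."""
    return psi(alpha) - psi(np.sum(alpha))


healthy = np.array([0.2, 0.5, 1.3])
print(dirichlet_expectation_1d(healthy))  # all finite

# Assumption: a corrupted upstream computation has pushed one parameter to inf.
# The sum then becomes inf as well, and inf - inf in the subtraction is nan.
corrupted = np.array([0.2, np.inf, 1.3])
with np.errstate(invalid="ignore"):  # silence the same RuntimeWarning for the demo
    print(dirichlet_expectation_1d(corrupted))  # contains nan
```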

Labels: bug, difficulty medium

Author: Ronggui

Most helpful comment

Looks like there are several issues about this on the numpy repo. I think it's an Anaconda issue, though, and it seems to affect only OSX. I commented on this issue on the Anaconda repo: https://github.com/ContinuumIO/anaconda-issues/issues/8070.

arlenk on 25 Jan 2018

All 14 comments

This looks like a numpy problem (we use numpy for all the math).
@Ronggui can you add a full example (code + data) that reproduces this behavior?

menshikh-iv on 19 Jan 2018

In [2]: dictionary = corpora.Dictionary.load("dictionary")

In [3]: mm_corpus = corpora.MmCorpus("2018.mm")
   ...: print(mm_corpus)
   ...:
MmCorpus(6928 documents, 2398 features, 183810 non-zero entries)

In [4]: k=5

In [5]: model = models.LdaModel(corpus=mm_corpus, num_topics=k, id2word=dictionary)
/Users/rghuang/anaconda3/lib/python3.6/site-packages/gensim/matutils.py:613: RuntimeWarning: invalid value encountered in subtract
  result = psi(alpha) - psi(np.sum(alpha))

In [7]: model.show_topics()
Out[7]:
[(0,
  'nan*"W1" + nan*"w2"....

In [8]: import numpy

In [9]: numpy.__version__
Out[9]: '1.13.3'

This numpy was installed from conda:
conda install -c anaconda numpy

Ronggui on 19 Jan 2018
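The session above spots the problem only by reading the `show_topics()` output. A minimal sketch of a more direct check, assuming a dense (num_topics x num_terms) topic-word array such as the one `LdaModel.get_topics()` returns in gensim releases that provide that method; the helper `nan_topic_ids` and the toy matrix are hypothetical.

```python
import numpy as np


def nan_topic_ids(topic_word):
    """Return the ids of topics (rows) whose word probabilities contain nan."""
    topic_word = np.asarray(topic_word, dtype=float)
    return [int(i) for i in np.flatnonzero(np.isnan(topic_word).any(axis=1))]


# Toy matrix standing in for a trained model's topics: topic 1 is broken.
toy = np.array([[0.7, 0.3], [np.nan, np.nan], [0.5, 0.5]])
print(nan_topic_ids(toy))  # [1]
```

An empty list means every topic has finite weights; a non-empty one flags the same failure as the `nan*"W1"` output above without eyeballing strings.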

Strangely, when I install numpy==1.13.3 with pip, things work perfectly.

Ronggui on 19 Jan 2018
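Since the same numpy version behaves differently when installed with pip and with conda, it helps to record exactly which build is being imported. A minimal sketch using only numpy's own attributes; `describe_numpy` is a hypothetical helper, and the idea that the two builds differ in the libraries they link against is an assumption to check with `show_config()`, not something this thread establishes.

```python
import numpy


def describe_numpy():
    """Collect basic facts about the numpy module that is actually imported."""
    return {
        "version": numpy.__version__,
        # Shows which environment / site-packages the module was loaded from.
        "location": numpy.__file__,
    }


info = describe_numpy()
print(info)
# Prints the build configuration, including the BLAS/LAPACK libraries this
# numpy was built against; compare the output of the conda and pip installs.
numpy.show_config()
```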

Please attach the files dictionary and 2018.mm.

menshikh-iv on 19 Jan 2018

sample_files.zip

Ronggui on 19 Jan 2018

This may be related to the same problem as #1767.
@Ronggui Thanks for the files.

menshikh-iv on 19 Jan 2018

I'm not positive this is the same issue. It looks like this may be a numpy/anaconda bug (still not positive even after digging through the following): https://github.com/numpy/numpy/issues/9656

@Ronggui, are you running OSX? If I use numpy 1.13 via Anaconda on OSX, I get the same error as you. But it definitely looks like a numpy error. For example, try running this on numpy 1.13:

>>> import numpy as np
>>> expElogthetad = np.array([ 0.0090071 ,  0.00909133,  0.00900161,  0.0090378 ,  0.0090441 ], dtype=np.float32)
>>> expElogbetad = np.array([ 0.00246125,  0.00356956,  0.002345  ,  0.00296673,  0.0030526 ], dtype=np.float32)

>>> np.dot(expElogthetad, expElogbetad)
3.6893488e+19

>>> np.dot(expElogthetad.astype(np.float64), expElogbetad)
0.00013015027514139328

>>> (expElogthetad * expElogbetad).sum()
0.00013015028

The first result (3.6893488e+19) is definitely wrong. It looks like the bug was fixed between numpy 1.13 and numpy 1.14 (at least in the Anaconda builds), but I don't see a related commit on the numpy repo, which makes me think this was an Anaconda issue.

arlenk on 24 Jan 2018
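The session above compares a float32 `np.dot` against a float64 dot and an elementwise sum. A minimal sketch that packages that comparison as a reusable check, assuming a relative tolerance of 1e-5 is loose enough for float32 rounding; `float32_dot_is_sane` is a hypothetical name. The reference multiplies and sums elementwise in float64, so it does not go through the same dot-product routine as the suspect result.

```python
import numpy as np


def float32_dot_is_sane(a, b, rtol=1e-5):
    """Compare numpy's float32 dot product with a float64 elementwise reference."""
    a32 = np.asarray(a, dtype=np.float32)
    b32 = np.asarray(b, dtype=np.float32)
    suspect = float(np.dot(a32, b32))  # the call that misbehaved in this thread
    reference = float((a32.astype(np.float64) * b32.astype(np.float64)).sum())
    return bool(np.isclose(suspect, reference, rtol=rtol, atol=0.0))


# Vectors copied from the session above.
theta = [0.0090071, 0.00909133, 0.00900161, 0.0090378, 0.0090441]
beta = [0.00246125, 0.00356956, 0.002345, 0.00296673, 0.0030526]
print(float32_dot_is_sane(theta, beta))  # True on a healthy build
```

On the broken build described above, the suspect value (3.6893488e+19) is nowhere near the reference (about 0.00013), so the check would return False.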

Thanks for investigating! Can you clarify / confirm with NumPy and Anaconda devs?

piskvorky on 24 Jan 2018

I can confirm, that at least for me on OSX, the issue is with numpy/anaconda, not gensim.

@Ronggui, can you confirm that you see the same issue with numpy? You can execute the following code:

import numpy

v1 = numpy.asarray([0., 2.], dtype='f')
v2 = numpy.asarray([0., 1.], dtype='f')
print(numpy.dot(v1, v2))

The correct result is 2.0 (0*0 + 2*1). If you get back 0, then the problem is with your numpy/anaconda install.

arlenk on 25 Jan 2018
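The same check can be turned into a guard that runs before training, so a broken build fails loudly instead of silently producing `nan` topics. A minimal sketch; `assert_sane_float32_dot` is a hypothetical helper, not part of gensim or numpy.

```python
import numpy


def assert_sane_float32_dot():
    """Hypothetical pre-training guard built on the numpy.dot check above."""
    v1 = numpy.asarray([0., 2.], dtype='f')
    v2 = numpy.asarray([0., 1.], dtype='f')
    result = float(numpy.dot(v1, v2))
    # 0*0 + 2*1 is exactly representable in float32, so anything else is a bad build.
    if result != 2.0:
        raise RuntimeError(
            "numpy.dot returned %r instead of 2.0 for float32 inputs "
            "(numpy %s); check your numpy install" % (result, numpy.__version__)
        )
    return result


assert_sane_float32_dot()  # e.g. call this before building an LdaModel
```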

@arlenk I can confirm that conda's numpy (Mac OS X) is the source of the problem.

package                    |            build
---------------------------|-----------------
numpy-1.13.3               |   py36h8a80b8c_2         3.7 MB  anaconda

In [1]: import numpy

In [2]: v1 = numpy.asarray([0., 2.], dtype='f')
...: v2 = numpy.asarray([0., 1.], dtype='f')
...: print(numpy.dot(v1, v2))
...:
0.0

Rongguis-MacBook-Air:~ rghuang$ pip install numpy
Collecting numpy
Using cached numpy-1.14.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.14.0

In [1]: import numpy
...:
...: v1 = numpy.asarray([0., 2.], dtype='f')
...: v2 = numpy.asarray([0., 1.], dtype='f')
...: print(numpy.dot(v1, v2))
...:
2.0

Ronggui on 29 Jan 2018

Nice work, thanks @arlenk and @Ronggui for the investigation!
I'm closing this issue because this isn't a gensim bug.

menshikh-iv on 29 Jan 2018

Probably related report: https://groups.google.com/forum/#!topic/gensim/6Uuj-yr5-Ls

CC: @piskvorky @Ronggui @arlenk

menshikh-iv on 2 Feb 2018

Thanks, everyone. But I have not found a solution for this issue. @menshikh-iv, could you please not close issues so quickly? I recently saw you quickly close another post, and it was not the same issue as the one you referenced. If there is some pressure on you to close threads (maybe a KPI?), please let us know. Thanks.

Are you sure you replied to the correct issue? This thread was about an issue in numpy 1.13.3. Since the issue was in numpy (and was fixed in later numpy releases), there's really nothing gensim (or any other library) can do. Unfortunately, your best bet is moving to a more recent version of numpy.

arlenk on 17 Dec 2018
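Upgrading is the recommended fix; a project can also warn at startup when an older numpy is imported. A minimal sketch, assuming a plain (major, minor) comparison is good enough; `numpy_older_than` is a hypothetical helper. The version is only a rough proxy, since the thread traced the bug to specific Anaconda builds of numpy 1.13 on OSX rather than to every 1.13 install.

```python
import warnings

import numpy


def numpy_older_than(major, minor, version=None):
    """Return True if numpy's (major, minor) version is below (major, minor).

    Only the first two dot-separated components are compared, so suffixes
    such as 'rc1' in the patch position are ignored.
    """
    parts = (version or numpy.__version__).split(".")
    return (int(parts[0]), int(parts[1])) < (major, minor)


if numpy_older_than(1, 14):
    warnings.warn(
        "numpy < 1.14 detected; some Anaconda builds of numpy 1.13 on OSX "
        "returned wrong float32 dot products. Consider upgrading numpy."
    )
```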

https://stackoverflow.com/questions/26812617/index-error-when-running-lda-in-gensim

Check this out. Maybe it's the same problem?