Steps to reproduce:
First, create a fresh virtual environment, then run:
pip install gensim==3.8.0
pip install nltk==3.4 sklearn matplotlib==3.0.3 networkx==2.3 pandas==0.24.2 statsmodels==0.9.0
wget https://raw.githubusercontent.com/mpenkov/gensim/numfocus/docs/src/gallery/020_howtos/run_howto_compare_lda.py
python run_howto_compare_lda.py
The segfault occurs on the 19th training pass. I'm unable to track it down further right now, but leaving this here for the record.
I could not reproduce this (tested on Python 2.7, 3.6 and 3.7, with scipy==1.3.0 and both numpy==1.16.4 and numpy==1.17.0).
This is the trace I'm seeing on Ubuntu 18.04:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff1683c02 in ?? () from /home/misha/envs/gensim-3.8.0/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so
(gdb) bt
#0 0x00007ffff1683c02 in ?? () from /home/misha/envs/gensim-3.8.0/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so
#1 0x00000000005ba4e1 in ?? ()
#2 0x000000000059419b in ?? ()
#3 0x000000000054ab38 in _PyEval_EvalCodeWithName ()
#4 0x00000000005d6be2 in _PyFunction_FastCallKeywords ()
#5 0x0000000000549dda in ?? ()
#6 0x000000000054dffd in _PyEval_EvalFrameDefault ()
#7 0x000000000054a8c1 in _PyEval_EvalCodeWithName ()
#8 0x00000000005d6be2 in _PyFunction_FastCallKeywords ()
#9 0x0000000000549dda in ?? ()
#10 0x000000000054dffd in _PyEval_EvalFrameDefault ()
#11 0x000000000054a8c1 in _PyEval_EvalCodeWithName ()
#12 0x00000000005d7ea9 in _PyFunction_FastCallDict ()
#13 0x0000000000589493 in ?? ()
#14 0x00000000005d73e9 in _PyObject_FastCallKeywords ()
#15 0x0000000000549fd1 in ?? ()
#16 0x000000000054dffd in _PyEval_EvalFrameDefault ()
#17 0x000000000054a8c1 in _PyEval_EvalCodeWithName ()
#18 0x000000000054cb83 in PyEval_EvalCode ()
#19 0x000000000062d012 in ?? ()
#20 0x000000000062d0ca in PyRun_FileExFlags ()
#21 0x000000000062ddd8 in PyRun_SimpleFileExFlags ()
#22 0x0000000000650ad5 in ?? ()
#23 0x0000000000650bfe in _Py_UnixMain ()
#24 0x00007ffff7a05b97 in __libc_start_main (main=0x4ba4a0 <main>, argc=2, argv=0x7fffffffd428, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd418)
at ../csu/libc-start.c:310
#25 0x00000000005dee4a in _start ()
I'm working on getting a more helpful version of it.
(gensim-3.8.0) misha@cabron:~/git/gensim/docs/src$ pip freeze
boto==2.49.0
boto3==1.9.200
botocore==1.12.200
certifi==2019.6.16
chardet==3.0.4
cycler==0.10.0
decorator==4.4.0
docutils==0.14
gensim==3.8.0
idna==2.8
jmespath==0.9.4
joblib==0.13.2
kiwisolver==1.1.0
matplotlib==3.0.3
networkx==2.3
nltk==3.4
numpy==1.17.0
pandas==0.24.2
patsy==0.5.1
pyparsing==2.4.2
python-dateutil==2.8.0
pytz==2019.2
requests==2.22.0
s3transfer==0.2.1
scikit-learn==0.21.3
scipy==1.3.0
singledispatch==3.4.0.3
six==1.12.0
sklearn==0.0
smart-open==1.8.4
statsmodels==0.9.0
urllib3==1.25.3
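Since the crash does not reproduce everywhere, it may help to compare this list against the `pip freeze` output from a machine where the script runs cleanly. Below is a minimal sketch of such a comparison; `parse_freeze` and `diff_freeze` are hypothetical helper names made up for illustration, and the sketch assumes plain `name==version` lines only (editable installs and URL requirements are skipped).

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {package_name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(a_text, b_text):
    """Return {name: (version_in_a, version_in_b)} for every differing pin.

    A package missing from one side is reported with None for that side.
    """
    a, b = parse_freeze(a_text), parse_freeze(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `diff_freeze(crashing_env, working_env)` with the two outputs as strings returns only the packages whose versions differ, which narrows down what to bisect.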
Another trace, this time with faulthandler enabled:
Fatal Python error: Segmentation fault
Thread 0x00007fb7421c1700 (most recent call first):
File "/usr/lib/python3.7/threading.py", line 296 in wait
File "/usr/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
File "/usr/lib/python3.7/threading.py", line 870 in run
File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
Current thread 0x00007fb74a651740 (most recent call first):
File "/home/misha/git/gensim/gensim/models/ldamodel.py", line 519 in __init__
File "/home/misha/git/gensim/gensim/models/ldamulticore.py", line 184 in __init__
File "gallery/020_howtos/howto_compare_lda.py", line 67 in <module>
Segmentation fault (core dumped)
real 1m41.153s
user 5m21.362s
sys 2m58.306s
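For anyone trying to reproduce this, a trace like the one above can be obtained with the standard-library `faulthandler` module, either by running `python -X faulthandler run_howto_compare_lda.py` or by enabling it at the top of the script. The following is a minimal sketch of the second option; the helper name `enable_crash_traces` and the default log file name are assumptions for illustration and are not part of gensim or of the how-to script.

```python
import faulthandler


def enable_crash_traces(log_path="faulthandler.log"):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    faulthandler writes through the file descriptor, so the returned file
    object must stay open for as long as the handler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Writing to a file instead of stderr keeps the trace even when the training progress output is long. Call `enable_crash_traces()` once, before the model is constructed, and keep a reference to the returned file object.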
TODO for me: make a Dockerfile to help reproduce this.
We could no longer reproduce this on @mpenkov's computer today: run_howto_compare_lda.py ran successfully.