I get a speed of 100k words/sec when running Word2Vec with one worker. Using five workers results in the same speed, with five CPUs each utilized at up to 20%.
Is that expected?
No. Sounds like some problem with Cython.
Can you post the value of word2vec.FAST_VERSION and Cython version? And maybe the log too, as a sanity check.
How long are your sentences? Anything special about the data?
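For anyone following along, the diagnostics requested above could be gathered with something like the sketch below. `word2vec.FAST_VERSION` is the attribute named in this thread; the helper name `report_versions` and the logging format string are only illustrative. As far as I know, a `FAST_VERSION` of -1 means the compiled Cython routines were not loaded and the slow pure-Python path is in use.

```python
import logging

# Turn on INFO logging so training progress (words/sec) shows up in the log.
logging.basicConfig(format="%(asctime)s : %(levelname)s : %(message)s",
                    level=logging.INFO)


def report_versions():
    """Collect FAST_VERSION and the Cython version; None means 'not available'."""
    info = {"FAST_VERSION": None, "Cython": None}
    try:
        from gensim.models import word2vec
        info["FAST_VERSION"] = getattr(word2vec, "FAST_VERSION", None)
    except Exception:
        # gensim is missing or failed to import; leave the entry as None.
        pass
    try:
        import Cython
        info["Cython"] = Cython.__version__
    except Exception:
        # Cython is not installed; leave the entry as None.
        pass
    return info


print(report_versions())
```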
All the answers are included in this ipython notebook
http://nbviewer.ipython.org/gist/aboSamoor/fe70098abbb425622ce4
I can't replicate this.
Can you manually modify this line https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec_inner.pyx#L205 to be fast_sentence = fast_sentence2?
Let's see if it's connected to BLAS somehow.
I am using OpenBLAS, which is why it does not show up in scipy. When I was using ATLAS, the speed was 33k words/sec.
Ok, I switched my fast_sentence to version 2, which uses Cython only, without any BLAS. The speed is 4x lower. However, the behaviour is the same: more workers do not buy you anything.
http://nbviewer.ipython.org/gist/aboSamoor/68ee65496ce8ad7fa552
Ok, thanks. That means I'm out of ideas. Something wrong with releasing GIL in Cython, I suppose. The next step will be creating some simple, minimal Cython program to release the GIL and test that (no gensim).
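Short of writing that Cython program, a rough stand-in check is possible in plain Python. This is not the Cython test proposed above, only a sketch under the assumption that `hashlib` is a fair proxy: it releases the GIL while hashing large buffers, so threads doing this work should be able to run in parallel. The names `burn` and `timed_run` are made up for this example. Each thread does the same amount of work, so if the wall time with 4 threads is close to the time with 1 thread, threads are really running concurrently; if it is about 4x longer, something is serializing them.

```python
import hashlib
import threading
import time


def burn(data, rounds):
    # hashlib releases the GIL while hashing buffers larger than about 2 KiB,
    # so this loop can run in parallel with other threads.
    for _ in range(rounds):
        hashlib.sha256(data).digest()


def timed_run(num_threads, rounds=50, size=1 << 20):
    """Wall-clock seconds for num_threads threads, each hashing rounds * size bytes."""
    data = b"x" * size
    threads = [threading.Thread(target=burn, args=(data, rounds))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


one = timed_run(1)
four = timed_run(4)
print("1 thread: %.3fs, 4 threads: %.3fs, ratio: %.2f" % (one, four, four / one))
```

A ratio near 1 on a machine with at least four free cores suggests the threads are spread over several cores; a ratio near 4 suggests they are being serialized.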
But why are you upside down abo, are you Australian?
I tried on a machine with OpenBLAS (FAST_VERSION=1) and the same cython as you (0.19.2), but still couldn't replicate the problem. Speed went from 194k/s (1 worker) to 446k/s (4 workers).
Ok, I was able to fix the problem by adding the following line before the multi-worker call:
os.system("taskset -p 0xff %d" % os.getpid())
Before, all 4 workers ran on the same CPU core, each getting 25% utilization.
After adding the above line, I can see 4 CPU cores running at 100%. The speed went up from 110k words/sec to 150k words/sec (not as good a speedup as you get, but maybe that is a different problem).
I would appreciate it if you let me know more about your OpenBLAS setup.
The solution is explained in more detail here:
http://stackoverflow.com/questions/15639779/what-determines-whether-different-python-processes-are-assigned-to-the-same-or-d/15641148#15641148
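The `taskset` call above resets the CPU affinity mask of the current process (`0xff` allows CPUs 0 to 7). On Linux with Python 3.3+, roughly the same thing can be done without shelling out, using `os.sched_getaffinity` and `os.sched_setaffinity`. The sketch below is an assumption-laden equivalent, not what was run in this thread: the helper name `widen_affinity` is made up, and it allows every CPU reported by `os.cpu_count()` rather than a fixed `0xff` mask.

```python
import os


def widen_affinity(pid=0):
    """Allow the process to run on all CPUs; pid 0 means the calling process.

    Returns (before, after) sets of allowed CPU ids, or (None, None) on
    platforms without the sched_* affinity calls (anything but Linux).
    """
    if not hasattr(os, "sched_getaffinity"):
        return None, None
    before = os.sched_getaffinity(pid)
    try:
        # Equivalent in spirit to `taskset -p <all-ones mask> <pid>`.
        os.sched_setaffinity(pid, range(os.cpu_count() or 1))
    except OSError:
        # Not permitted or restricted by the environment; keep the old mask.
        pass
    after = os.sched_getaffinity(pid)
    return before, after


before, after = widen_affinity()
print("allowed CPUs before:", before)
print("allowed CPUs after: ", after)
```

If `before` contains a single CPU id, that would match the symptom described here: all workers sharing one core. The call should be made after importing the numeric libraries and before starting the workers, in the same place as the `taskset` line.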
This was OpenBLAS straight from the Debian (Ubuntu) package, no special tuning. NumPy and SciPy are also from the repo:
$ dpkg -l | grep -E 'openblas|numpy|scipy'
ii libopenblas-base 0.2.8-2 amd64 Optimized BLAS (linear algebra) library based on GotoBLAS2
ii libopenblas-dev 0.2.8-2 amd64 Optimized BLAS (linear algebra) library based on GotoBLAS2
ii python-numpy 1:1.7.1-1ubuntu1 amd64 Numerical Python adds a fast array facility to the Python language
ii python-scipy 0.12.0-2ubuntu1 amd64 scientific tools for Python
$ uname -a
Linux hetrad 3.11.0-13-generic #20-Ubuntu SMP Wed Oct 23 07:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
What's the status here, @aboSamoor ? Did the taskset call resolve your issues?
Yes, it is resolved.