Add an option (likely subclassing Word2vec) to train word2vec model using GPU.
Careful with the issue numbers, @ziky90.
I'd like to help contribute a PR for this. For implementation with the GPU, what sort of dependency constraints or preferences do you have?
So far, libraries we can potentially use to implement the GPU versions are:
I already have an implementation for Theano, but I was wondering whether you have specific preferences in terms of adding additional dependencies.
@yutarochan Thanks for your interest. There has been an implementation in Keras - would really appreciate it if you could evaluate it and tell us what you think.
https://github.com/niitsuma/word2vec-keras-in-gensim
I've successfully tested word2veckeras using keras 0.3.1 with theano backend.
I'll try to make it compatible with the current version of keras.
I also want to rewrite a part of Word2vec training using theano functions.
@SimonPavlik could you please post a link to the results of your experiments here?
word2veckeras on GPU is slower than gensim on CPU. Results in
BTW, Deeplearning4j is also working to resolve batching issues in order to make word2vec run faster on GPU than on CPU.
I have 4 Titan Xs sitting on a bus within the same Supermicro enclosure, overclocked to 1342 MHz.
If the software is stable and all it needs is either TF or Theano, I could attempt to benchmark it.
Note: Previously I observed Keras programs run 5 times faster when backed by Theano than when backed by TF.
@phalexo Thanks a lot for volunteering the hardware! Adding a Titan benchmark to this list would be great https://github.com/RaRe-Technologies/gensim/pull/1033#issuecomment-273567836
Please ask @markroxor for the exact code he ran.
@SimonPavlik: not sure I got this right from your article, but my understanding is that the GPU code was run with only one data-loading thread. If that is true, then I can imagine the speed bottleneck is at the data-loading level, not at the GPU. Is there any comparison where the model has "enough" data-loading threads?
That's right @octavian-ganea, only one worker was used for the preprocessing. Even with the preprocessed data ready in memory, a single-threaded generator couldn't keep the GPU busy.
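The single-threaded generator bottleneck can in principle be worked around with a prefetch queue: several loader threads fill a bounded queue while the training loop consumes from it. A generic stdlib sketch (not word2veckeras code; the `prefetch` helper and its parameters are hypothetical):

```python
import queue
import threading

_SENTINEL = object()

def prefetch(generator, workers=4, maxsize=64):
    """Wrap a batch generator so `workers` threads fill a bounded queue,
    letting the consumer (e.g. a GPU training loop) stay busy.
    Hypothetical helper, not part of gensim or word2veckeras."""
    q = queue.Queue(maxsize=maxsize)
    lock = threading.Lock()  # serialize access to the shared generator
    def worker():
        while True:
            with lock:
                batch = next(generator, _SENTINEL)
            if batch is _SENTINEL:
                q.put(_SENTINEL)  # signal this worker is done
                return
            q.put(batch)
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    done = 0
    while done < workers:
        item = q.get()
        if item is _SENTINEL:
            done += 1
        else:
            yield item
```

Note this only helps when preprocessing per batch is the bottleneck; with Python's GIL, CPU-bound preprocessing may still need processes rather than threads.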
I don't know of any GPU implementation that runs faster than the current CPU word2vec. If we have any benchmark results or good reference implementations - please post them here.
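For anyone posting numbers, a minimal timing harness like the one below keeps results comparable by reporting best-of-N wall-clock time, which avoids counting one-off warm-up costs (GPU init, compilation) in the comparison. This is a generic sketch, not the script used in the gensim benchmarks linked above.

```python
import time

def benchmark(fn, repeats=3):
    """Run `fn` several times and return the best wall-clock time in seconds.
    Generic harness sketch -- not the code @markroxor ran."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```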
Hi, I'm using doc2vec. What is the current state: is there GPU acceleration for doc2vec, or maybe a GPU mode in gensim? Thanks a lot.
No, and there is no plan for adding that either. Let me close this issue.