Add an option (likely subclassing Word2vec) to train word2vec model using GPU.
Careful with the issue numbers, @ziky90.
I'd like to help contribute a PR for this. For implementation with the GPU, what sort of dependency constraints or preferences do you have?
So far, libraries we can potentially use to implement the GPU versions are:
I already have an implementation for Theano, but I was wondering whether you have specific preferences in terms of adding additional dependencies.
@yutarochan Thanks for your interest. There has been an implementation in Keras - would really appreciate it if you could evaluate it and tell us what you think.
https://github.com/niitsuma/word2vec-keras-in-gensim
I've successfully tested word2veckeras using keras 0.3.1 with theano backend.
I'll try to make it compatible with the current version of keras.
I also want to rewrite a part of Word2vec training using theano functions.
@SimonPavlik could you please post a link to the results of your experiments here?
word2veckeras on GPU is slower than gensim on CPU. Results in
BTW, Deeplearning4j is also working to resolve batching issues in order to make word2vec run faster on GPU than on CPU.
I have 4 Titan Xs sitting on a bus within the same Supermicro enclosure, overclocked to 1342 MHz.
If the software is stable and all it needs is either TF or Theano, I could attempt to benchmark it.
Note: Previously I observed Keras programs run 5 times faster when backed by Theano than when backed by TF.
@phalexo Thanks a lot for volunteering the hardware! Adding a Titan benchmark to this list would be great https://github.com/RaRe-Technologies/gensim/pull/1033#issuecomment-273567836
Please ask @markroxor for the exact code he ran.
@SimonPavlik: not sure I got this right from your article, but my understanding is that the GPU code was run with only one data-loading thread. If that is true, then I can imagine the speed bottleneck is at the data-loading level, not at the GPU. Is there any comparison where the model has "enough" data-loading threads?
That's right @octavian-ganea, only one worker was used for the preprocessing. Even with the preprocessed data ready in memory, a single-threaded generator couldn't keep the GPU busy.
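The single-threaded generator bottleneck can in principle be worked around with a prefetch queue: several loader threads fill a bounded queue while the training loop consumes from it. A generic stdlib sketch (not word2veckeras code; the `prefetch` helper and its parameters are hypothetical):

```python
import queue
import threading

_SENTINEL = object()

def prefetch(generator, workers=4, maxsize=64):
    """Wrap a batch generator so `workers` threads fill a bounded queue,
    letting the consumer (e.g. a GPU training loop) stay busy.
    Hypothetical helper, not part of gensim or word2veckeras."""
    q = queue.Queue(maxsize=maxsize)
    lock = threading.Lock()  # serialize access to the shared generator
    def worker():
        while True:
            with lock:
                batch = next(generator, _SENTINEL)
            if batch is _SENTINEL:
                q.put(_SENTINEL)  # signal this worker is done
                return
            q.put(batch)
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    done = 0
    while done < workers:
        item = q.get()
        if item is _SENTINEL:
            done += 1
        else:
            yield item
```

Note this only helps when preprocessing per batch is the bottleneck; with Python's GIL, CPU-bound preprocessing may still need processes rather than threads.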
I don't know of any GPU implementation that runs faster than the current CPU word2vec. If we have any benchmark results or good reference implementations - please post them here.
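For anyone posting numbers, a minimal timing harness like the one below keeps results comparable by reporting best-of-N wall-clock time, which avoids counting one-off warm-up costs (GPU init, compilation) in the comparison. This is a generic sketch, not the script used in the gensim benchmarks linked above.

```python
import time

def benchmark(fn, repeats=3):
    """Run `fn` several times and return the best wall-clock time in seconds.
    Generic harness sketch -- not the code @markroxor ran."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```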
Hi, I'm using doc2vec. What is the current state: is there GPU acceleration for doc2vec, or maybe a GPU mode in gensim? Thanks a lot.
No, and there is no plan for adding that either. Let me close this issue.