Gensim: Doc2Vec - checking similarity between words and docs

Created on 29 Oct 2015 · 2 Comments · Source: RaRe-Technologies/gensim

I went through the blog post on Word2Vec and Doc2Vec examples using gensim and tried the function given there in the section 'Summarizing sentences & documents':

import numpy as np

def get_vector(word):
    # Look up the unit-normalized vector for a vocabulary entry.
    return model.syn0norm[model.vocab[word].index]

def calculate_similarity(sentence, word):
    vec_a = get_vector(sentence)
    vec_b = get_vector(word)
    sim = np.dot(vec_a, vec_b)
    return sim

calculate_similarity('SENT_47973', 'casual')

I used the IMDB dataset and the models trained by running the Doc2Vec IPython notebook example.

I changed syn0norm to syn0 in the return statement of get_vector(), but the function still fails when passed a doc_id obtained as follows:

doc2vec_model = Doc2Vec.load('imdb-d2v.doc2vec')
doc_id = np.random.randint(doc2vec_model.docvecs.count)
print calculate_similarity(doc_id, 'movies')

All 2 comments

Since gensim 0.12 document vectors are in a separate structure, the docvecs property of the main model. So you won't get a document's vector from the main model's syn0/syn0norm.

You can still compare word and document vectors with a few extra steps – each of the main model's and the docvecs model's similarity methods can take an external vector (instead of a lookup key). There's an example on the mailing list:

https://groups.google.com/d/msg/gensim/Fujja7aOH6E/C3WArofWbNIJ
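
As a rough illustration of the point above (not the code from the linked mailing-list post), the sketch below fetches the document vector from docvecs and the word vector from the main model, then compares them with cosine similarity. It assumes the gensim 0.12-era accessors `model.docvecs[doc_id]` and `model[word]`. The helper names `cosine_similarity` and `doc_word_similarity` are hypothetical. Because the raw vectors are not unit length, the vectors are normalized explicitly instead of relying on a plain dot product.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two equal-length sequences of floats.

    Works on plain lists as well as NumPy arrays.
    """
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)


def doc_word_similarity(model, doc_id, word):
    """Compare one document vector with one word vector.

    Assumption: `model` exposes document vectors as model.docvecs[doc_id]
    and raw word vectors as model[word] (gensim 0.12-era accessors).
    """
    doc_vec = model.docvecs[doc_id]  # document vectors live in docvecs
    word_vec = model[word]           # word vectors live in the main model
    return cosine_similarity(doc_vec, word_vec)
```

With the model from the question, the call would look like `doc_word_similarity(doc2vec_model, doc_id, 'movies')`. On gensim 4.x the same lookups go through `model.dv` and `model.wv`, so the two accessor lines would need adjusting.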

Thank you Gordon @gojomo for your lightning quick response, and for clarifying the difference between the storage of the word vectors and the document vectors.

I will check out your example as linked and take it from there.
