I read the blog post here on Word2Vec and Doc2Vec examples using gensim and tried the functions given there in the section 'Summarizing sentences & documents':
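import numpy as np

def get_vector(word):
    # Look up the unit-normalized vector for a vocabulary key
    return model.syn0norm[model.vocab[word].index]

def calculate_similarity(sentence, word):
    vec_a = get_vector(sentence)
    vec_b = get_vector(word)
    # The dot product of two unit vectors is their cosine similarity
    sim = np.dot(vec_a, vec_b)
    return sim

calculate_similarity('SENT_47973', 'casual')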
I used the IMDB dataset and the models trained by running the Doc2Vec IPython notebook example.
I changed syn0norm to syn0 in the return statement of get_vector(), but the function does not work when passed a doc_id obtained like this:
from gensim.models import Doc2Vec

doc2vec_model = Doc2Vec.load('imdb-d2v.doc2vec')
doc_id = np.random.randint(doc2vec_model.docvecs.count)
print(calculate_similarity(doc_id, 'movies'))
Since gensim 0.12, document vectors are stored in a separate structure: the docvecs property of the main model. So you won't get a document's vector from the main model's syn0/syn0norm.
You can still compare word and document vectors with a few extra steps – each of the main model's and the docvecs model's similarity methods can take an external vector (instead of a lookup key). There's an example on the mailing list:
https://groups.google.com/d/msg/gensim/Fujja7aOH6E/C3WArofWbNIJ
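As a rough illustration of the "external vector" idea, here is a minimal sketch that compares two raw vectors directly, without going through the vocabulary lookup. It is not the mailing-list example. The helper name cosine_similarity and the toy vectors are made up for illustration, and the lookups mentioned in the comments (doc2vec_model.docvecs[doc_id] and doc2vec_model['movies']) are assumptions about the gensim 0.12-era API that should be checked against the installed version.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two equal-length vectors (lists or NumPy arrays)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Dividing by both norms is what syn0norm does ahead of time; a plain
    # dot product of raw (syn0-style) vectors would skip this scaling.
    return dot / (norm_a * norm_b)

# Toy stand-ins for the two vectors. With a trained model they would
# presumably come from (assumption, not verified here):
#   doc_vec = doc2vec_model.docvecs[doc_id]
#   word_vec = doc2vec_model['movies']
doc_vec = [0.5, 1.0, -0.5]
word_vec = [1.0, 2.0, -1.0]

print(cosine_similarity(doc_vec, word_vec))
```

Because the function normalizes explicitly, it does not matter whether the inputs are already unit-length, which avoids the syn0 versus syn0norm question from the original post.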
Thank you Gordon @gojomo for your lightning quick response, and for clarifying the difference between the storage of the word vectors and the document vectors.
I will check out your example as linked and take it from there.