I trained a model on my corpus and saved it to disk with:

```python
model.save(filename)
```
Then I load the model and call `infer_vector` to compute the vector of a new sentence:

```python
model = Doc2Vec.load(filename)
words = ['This', 'is', 'an', 'example']
model.infer_vector(words)
```
However, I get this exception:

```
AttributeError: 'Doc2Vec' object has no attribute 'syn0'
```
How can I fix this? Is it the same cause as https://github.com/RaRe-Technologies/gensim/issues/483?
Thanks
@xchangcheng can you explain your fix / reason for closing? Other people may google up this issue in the future.
@piskvorky Sorry for closing it before.
As in https://github.com/RaRe-Technologies/gensim/issues/483, I found that when I tried to load my model, syn0 and syn1 had not been loaded successfully. I think the model I had trained earlier may have been corrupted.
So I retrained it and the problem was gone. The following is what a successful load looks like :)
```
2016-07-12 17:22:28,782 - gensim.utils - INFO - loading Doc2Vec object from ./imdb.d2v
2016-07-12 17:22:29,587 - gensim.utils - INFO - loading docvecs recursively from ./imdb.d2v.docvecs.* with mmap=None
2016-07-12 17:22:29,587 - gensim.utils - INFO - loading syn1neg from ./imdb.d2v.syn1neg.npy with mmap=None
2016-07-12 17:22:29,596 - gensim.utils - INFO - loading syn0 from ./imdb.d2v.syn0.npy with mmap=None
2016-07-12 17:22:29,604 - gensim.utils - INFO - loading syn1 from ./imdb.d2v.syn1.npy with mmap=None
2016-07-12 17:22:29,612 - gensim.utils - INFO - setting ignored attribute syn0norm to None
2016-07-12 17:22:29,612 - gensim.utils - INFO - setting ignored attribute cum_table to None
```
I'm having a similar problem. I'm not sure exactly what steps are required to reproduce it because it doesn't seem to happen every time.
I have a script which trains a model on about 700,000 paragraphs, with a vocabulary of about 100,000 words, and then immediately saves the trained model using `model.save()`. When I run just one epoch, everything works fine: the syn0 and syn1 matrices are saved, and I can load the model and compute similarities. But every time I have trained the model with a larger number of epochs (I'm trying 20; this takes a while, so I have only done it a handful of times), the syn0 and syn1 matrices are not saved. Furthermore, after the attempted save, the model object no longer has syn0 or syn1 attributes, so if I try to train it again I get `RuntimeError: you must first finalize vocabulary before training the model`.
I don't know if the number of epochs is making a difference or if it is just a coincidence...
This is the most relevant part of my code (`dm`, `paragraphs`, `getTaggedParagraphs`, and `modelDir` are defined elsewhere):

```python
import os
import random

from gensim.models import Doc2Vec

epochs = 20
max_alpha = 0.025
min_alpha = 0.0001
modelSettings = {'size': 300, 'min_count': 5, 'window': 8, 'workers': 3,
                 'dm_concat': 0, 'alpha': max_alpha, 'min_alpha': max_alpha}
modelName = 'model_dm'
modelSettings['dm'] = dm
print 'Initializing', modelName
model = Doc2Vec(getTaggedParagraphs(paragraphs), **modelSettings)
for i in range(epochs):
    alpha = (max_alpha - min_alpha) * (epochs - i - 1) / (epochs - 1) + min_alpha
    print 'Training %s epoch %2d, alpha: %.4f' % (modelName, i, alpha)
    model.alpha = alpha
    model.min_alpha = alpha
    random.shuffle(paragraphs)
    model.train(getTaggedParagraphs(paragraphs))
print 'Saving', modelName
if not os.path.exists(modelDir):
    os.makedirs(modelDir)
model.save(os.path.join(modelDir, modelName))
```
(My paragraphs object contains both a string and tag for each paragraph, so the shuffle isn't mixing those up)
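For reference, the linear learning-rate decay computed inside that loop can be checked in isolation. This is just a sketch of the same arithmetic, reusing the names from the snippet above:

```python
# Reproduce the manual alpha schedule: decay linearly from max_alpha at
# epoch 0 down to min_alpha at the final epoch.
epochs = 20
max_alpha = 0.025
min_alpha = 0.0001

schedule = [
    (max_alpha - min_alpha) * (epochs - i - 1) / (epochs - 1) + min_alpha
    for i in range(epochs)
]

# First epoch gets the full starting rate, last epoch gets the floor.
print(round(schedule[0], 6))   # 0.025
print(round(schedule[-1], 6))  # 0.0001
```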
The number of epochs shouldn't affect saving at all: the structures have the same size/shape no matter how much training has occurred.
If a `save()` is both failing and leaving the model damaged, perhaps something odd caused a mid-save failure. But that should be obvious from a thrown error, logging output, or both.
I suggest making sure you're using the latest gensim, enabling logging at the INFO level, and extending your code example to confirm the expected presence of `syn0` etc. before the save and its absence after.
Unrelated notes about your code: by supplying a corpus to the Doc2Vec constructor, training occurs automatically. And, because of the default `iter` value inherited from Word2Vec, each training makes 5 iterations over the supplied corpus. So in fact your code is doing (1+20) trainings of 5 iterations each, i.e. 105 passes over your corpus.
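Spelling out that arithmetic: one automatic training triggered by passing a corpus to the constructor, plus the 20 explicit `train()` calls in the loop, each making the default `iter` (5) passes. A quick sketch of the count, assuming the pre-1.0 gensim defaults described above:

```python
default_iter = 5        # default `iter` inherited from Word2Vec at the time
manual_epochs = 20      # explicit train() calls in the user's loop
constructor_trains = 1  # supplying a corpus to Doc2Vec() triggers one training

total_passes = (constructor_trains + manual_epochs) * default_iter
print(total_passes)  # 105
```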
Closing as abandoned