Hello,
I am new to gensim and am trying to load an English word2vec model with my Python script model.py and test it:
import gensim.models.word2vec
model = gensim.models.Word2Vec.load("en.model")
model.similarity('woman', 'man')
Loading fails with the UnpicklingError in the traceback below. After some searching, I found that it is an error related to pickling. One suggestion is to use:
pickle.load(file_obj, encoding='latin1')
But how do I apply that suggestion? Or is there another way to solve the problem?
"C:\Program Files (x86)\Anaconda3\python.exe" "C:/Users/M/PycharmProjects/Twitter Sentiment Analysis/Word2Vec/Model.py"
Traceback (most recent call last):
File "C:/Users/M/PycharmProjects/Twitter Sentiment Analysis/Word2Vec/Model.py", line 5, in <module>
model = gensim.models.Word2Vec.load("german.model")
File "C:\Program Files (x86)\Anaconda3\lib\site-packages\gensim\models\word2vec.py", line 1684, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Program Files (x86)\Anaconda3\lib\site-packages\gensim\utils.py", line 248, in load
obj = unpickle(fname)
File "C:\Program Files (x86)\Anaconda3\lib\site-packages\gensim\utils.py", line 911, in unpickle
return _pickle.loads(f.read())
_pickle.UnpicklingError: invalid load key, '6'.
Process finished with exit code 1
How was the model in file 'en.model' initially created and saved? (Was it using the same versions of Python, gensim, OS, etc?)
I got the model from:
http://devmount.github.io/GermanWordEmbeddings/
I do not know which specific versions the author used, only that it was Python and gensim.
I am now using Python 3.5 on Windows 8 to load the model.
Looking at their code (https://github.com/devmount/GermanWordEmbeddings/blob/c2b603a07d968146995ee9dde54a25fd0aa8586a/training.py#L56), I see they've saved the model via save_word2vec_format() - which means you'd need to use Word2Vec.load_word2vec_format() to have a chance of loading.
I can also tell from the included notebook that Python 2.7.6 was used. (See the bottom of: https://raw.githubusercontent.com/devmount/GermanWordEmbeddings/master/code/training.ipynb). So if you still have problems after using load_word2vec_format(), you may want to try using Python 2.7.6.
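As a rough illustration of the point above: a file written by save_word2vec_format() starts with a plain-text header line of the form "<vocab_size> <vector_size>", which is not a pickle, and the '6' in the error message would be consistent with such a header. The sketch below checks for that header before choosing a loader. The helper names (looks_like_word2vec_header, load_word2vec_vectors) are made up for this example. It also assumes a newer gensim release, where the loader is exposed as KeyedVectors.load_word2vec_format() rather than Word2Vec.load_word2vec_format() as in this thread. Whether the downloaded file is binary or text is not stated here, so the binary flag is left to the caller.

```python
import re


def looks_like_word2vec_header(path):
    """Return True if the first line looks like '<vocab_size> <vector_size>'."""
    with open(path, "rb") as f:
        # The header is short, so cap the read at 100 bytes.
        first_line = f.readline(100)
    return re.fullmatch(rb"\d+ \d+\r?\n?", first_line) is not None


def load_word2vec_vectors(path, binary):
    """Load a word2vec-format file (hypothetical helper, assumes newer gensim)."""
    # Imported lazily so the header check above works without gensim installed.
    from gensim.models import KeyedVectors

    if not looks_like_word2vec_header(path):
        raise ValueError("no word2vec header found; the file may be a pickled model")
    return KeyedVectors.load_word2vec_format(path, binary=binary)
```

If the header check passes, something like load_word2vec_vectors("german.model", binary=True) would be the call to try, with binary=False as the alternative if the file turns out to be plain text.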
@Max-programmer Did using Python 2 help? If yes, then I would like to close this issue.
@tmylk: I got sick, will try it out tomorrow or Saturday and will post the results here. Thanks in advance!
Thanks, you can close the issue now.