Gensim: word2vec pickle error help

Created on 19 Sep 2016 · 6 Comments · Source: RaRe-Technologies/gensim

Hello,
I am new to gensim and am trying to load an English word2vec model from my Python script model.py and test it:

import gensim.models.word2vec
model = gensim.models.Word2Vec.load("en.model")
model.similarity('woman', 'man')

I googled the error shown below and found that it is raised while unpickling the file. One suggestion is to use:
pickle.load(file_obj, encoding='latin1')

But how do I apply that suggestion? Or is there another way to solve the problem?
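
For reference, a minimal sketch of how that suggestion is usually applied: open the file in binary mode and pass `encoding` to `pickle.load`. The helper name `load_py2_pickle` is made up for this illustration, and the approach only helps when the file really is a pickle written by Python 2.

```python
import pickle


def load_py2_pickle(path):
    """Unpickle a file that was written by Python 2 (hypothetical helper)."""
    # Pickles are binary data, so the file must be opened with "rb".
    with open(path, "rb") as file_obj:
        # latin1 maps every byte value to a character, so Python 2 byte
        # strings decode without raising UnicodeDecodeError.
        return pickle.load(file_obj, encoding="latin1")
```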

"C:\Program Files (x86)\Anaconda3\python.exe" "C:/Users/M/PycharmProjects/Twitter Sentiment Analysis/Word2Vec/Model.py"
Traceback (most recent call last):
  File "C:/Users/M/PycharmProjects/Twitter Sentiment Analysis/Word2Vec/Model.py", line 5, in <module>
    model = gensim.models.Word2Vec.load("german.model")
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\gensim\models\word2vec.py", line 1684, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\gensim\utils.py", line 248, in load
    obj = unpickle(fname)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\gensim\utils.py", line 911, in unpickle
    return _pickle.loads(f.read())
_pickle.UnpicklingError: invalid load key, '6'.

Process finished with exit code 1
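
One possible reading of `invalid load key, '6'`: the unpickler met the ASCII digit `6` where it expected a pickle opcode, which is what would happen if the file begins with a plain-text header rather than pickle data. Files in the word2vec format start with a line of the form `<vocab_size> <vector_size>`. The sketch below is a heuristic check for such a header; `looks_like_word2vec_format` is a name invented here, not a gensim function.

```python
import re


def looks_like_word2vec_format(path):
    """Heuristic: True if the first line looks like '<vocab_size> <vector_size>'."""
    with open(path, "rb") as f:
        # Read at most 64 bytes so a large binary file is not scanned for a newline.
        first_line = f.readline(64)
    # Two decimal integers separated by one space, then a line ending.
    return re.fullmatch(rb"\d+ \d+\r?\n", first_line) is not None
```

If this returns True, the file was most likely not written by `save()`, so `load()` cannot read it no matter which pickle encoding is used.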

All 6 comments

How was the model in file 'en.model' initially created and saved? (Was it using the same versions of Python, gensim, OS, etc?)

I got the model from:
http://devmount.github.io/GermanWordEmbeddings/
I do not know which specific versions the author used, only that it was Python and gensim.
I am now using Python 3.5 on Windows 8 to load the model.

Looking at their code (https://github.com/devmount/GermanWordEmbeddings/blob/c2b603a07d968146995ee9dde54a25fd0aa8586a/training.py#L56), I see they've saved the model via save_word2vec_format() - which means you'd need to use Word2Vec.load_word2vec_format() to have a chance of loading.

I can also tell from the included notebook that Python 2.7.6 was used. (See the bottom of: https://raw.githubusercontent.com/devmount/GermanWordEmbeddings/master/code/training.ipynb). So if you still have problems after using load_word2vec_format(), you may want to try using Python 2.7.6.
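
A minimal sketch of that suggestion. At the time of this thread the loader was `Word2Vec.load_word2vec_format()`; from gensim 1.0 onward it is `KeyedVectors.load_word2vec_format()`, so the hypothetical helper below tries the newer location first. `binary=True` is an assumption about how the file was saved; pass `binary=False` for the text variant.

```python
def load_word2vec_file(path, binary=True):
    """Load vectors written by save_word2vec_format() (hypothetical helper)."""
    try:
        # gensim 1.0 and later expose the loader on KeyedVectors.
        from gensim.models import KeyedVectors
        loader = KeyedVectors.load_word2vec_format
    except ImportError:
        # Older releases, like the one in this thread, keep it on Word2Vec.
        from gensim.models import Word2Vec
        loader = Word2Vec.load_word2vec_format
    # binary=True is an assumption; it must match how the file was saved.
    return loader(path, binary=binary)
```

The returned object supports the same `similarity()` query used in the original script, provided both words are in the vocabulary.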

@Max-programmer Did using Python 2 help? If yes, then I would like to close this issue.

@tmylk: I got sick. I will try it out tomorrow or Saturday and post the results here. Thanks in advance!

Thanks, you can close the issue now.
