Hi,
I have heard a lot about gensim, but now that I am trying to use it for probably the simplest task, i.e. loading pre-trained embeddings, I have been stuck for hours.
Consider:
from gensim.models import KeyedVectors
# Load vectors directly from the file
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
# Access vectors for specific words with a keyed lookup:
vector = model['simple']
$ python word2vec.py
Traceback (most recent call last):
  File "word2vec.py", line 3, in <module>
    model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
  File "/home/andy/anaconda3/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 1436, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "/home/andy/anaconda3/lib/python3.6/site-packages/gensim/models/utils_any2vec.py", line 178, in _load_word2vec_format
    result.vectors = zeros((vocab_size, vector_size), dtype=datatype)
MemoryError
Can gensim load the pre-trained word2vec embeddings released by Google? If so, how?
Cheers!
Hello @andymancodes,
Could gensim be used to load word2vec pre-trained embeddings released by google?
Of course, but the Google pre-trained vectors are really huge; you need enough RAM to use them. To reduce memory usage, you can load only part of the vectors by specifying the limit parameter, i.e.
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True, limit=10 ** 5)
In the future, please use the mailing list for questions (GitHub issues are only for feature requests/bug reports).
Thanks @menshikh-iv! That solved the issue. I'll use the mailing list in the future, thanks for the link!
Nice!
Also worked for me
Hi,
I am trying to use a gensim model, but it raises the error below. I have been trying for 2 days and have checked my RAM: I have 16 GB, and only 29% is in use while this code runs, so I can't understand how to fix it. Please help.
Code snippet:
import gensim.downloader as api
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
Error:
Traceback (most recent call last):
  File "C:/Users/amitabhseth/IdeaProjects/class1/Test1.py", line 34, in
    fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
  File "C:\Python3.7\lib\site-packages\gensim\downloader.py", line 502, in load
    return module.load_data()
  File "C:\Users\amitabhseth/gensim-data\fasttext-wiki-news-subwords-300\__init__.py", line 8, in load_data
    model = KeyedVectors.load_word2vec_format(path, binary=False)
  File "C:\Python3.7\lib\site-packages\gensim\models\keyedvectors.py", line 1498, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "C:\Python3.7\lib\site-packages\gensim\models\utils_any2vec.py", line 349, in _load_word2vec_format
    result.vectors = zeros((vocab_size, vector_size), dtype=datatype)
MemoryError
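This is the same allocation failure as above, and the same limit workaround applies. api.load() itself does not take a limit argument, but it does accept return_path=True, which downloads the file (or reuses the cached copy) and returns its path instead of loading it; you can then load it yourself with a limit. A sketch, assuming gensim's downloader API:

```python
import gensim.downloader as api
from gensim.models import KeyedVectors

# Download (or reuse the cached copy of) the raw vectors file; with
# return_path=True, api.load returns the file path instead of a model.
path = api.load('fasttext-wiki-news-subwords-300', return_path=True)

# Load only the first 100,000 vectors to bound memory usage.
model = KeyedVectors.load_word2vec_format(path, binary=False, limit=10 ** 5)
```

Note this downloads ~1 GB the first time; the memory saving only happens at the load_word2vec_format step.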