Gensim: fastText models from 2.3.0 can't be loaded in 3.0.0

Created on 22 Oct 2017 · 6 Comments · Source: RaRe-Technologies/gensim

Description

I have a compatibility issue with fastText models in gensim 3.0.0. In version 2.3.0, I used the fastText C++ wrapper to train a model, based on the code available at that time from
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/FastText_Tutorial.ipynb

This code works in 2.3.0:

from gensim.models.wrappers.fasttext import FastText as FT_wrapper
model = FT_wrapper.load(model_path)
if key in model:
    character_embedding = model[key]

In 3.0.0 it fails with:

  File "scripts/foo.py", line 43, in reduce_fasttext_embedding
    character_embedding = model[key]
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 1345, in __getitem__
    return self.wv.__getitem__(words)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/keyedvectors.py", line 602, in __getitem__
    return self.word_vec(words)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/wrappers/fasttext.py", line 94, in word_vec
    word_vec = np.zeros(self.syn0_ngrams.shape[1])
AttributeError: 'FastTextKeyedVectors' object has no attribute 'syn0_ngrams'

Expected Results

I expected a model saved with 2.3.0 to be loadable in 3.0.0. I was able to get my code working again by downgrading to 2.3.0. I have already run evaluations with the trained models and would be happy to keep using them. Otherwise, I'm stuck on gensim 2.3.0.

@menshikh-iv
I guess this has something to do with this commit https://github.com/RaRe-Technologies/gensim/commit/6e511565c1721636cfd14f88df3a08e124e14364#diff-cd6e655ec64f5b3927aa96ce5d006207, which split 'syn0_all' into 'syn0_vocab' and 'syn0_ngrams'. I'm guessing that models trained with 2.3.0 aren't compatible with version 3. Could the load method check whether the model was trained with 2.3.0, load it the 2.3.0 way, and then make the same split internally?
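To make the idea concrete, here is a rough sketch of what such a check-and-split step could look like. It does not use gensim at all: `LegacyVectors` and `upgrade_vectors` are hypothetical names invented for illustration, and the row layout of 'syn0_all' (word vectors first, n-gram vectors after) is an assumption that would need to be verified against the 2.3.0 wrapper code before relying on it.

```python
import numpy as np


class LegacyVectors:
    """Hypothetical stand-in for a vectors object unpickled from a 2.3.0 model."""

    def __init__(self, syn0_all, vocab_size):
        self.syn0_all = syn0_all      # combined matrix, as named in the issue
        self.vocab_size = vocab_size  # assumed number of whole-word rows


def upgrade_vectors(wv):
    """Give a legacy object the attribute names that 3.0.0 code looks up.

    ASSUMPTION (not confirmed in this thread): the first `vocab_size` rows
    of `syn0_all` are whole-word vectors and the remaining rows are n-gram
    vectors.
    """
    if hasattr(wv, "syn0_ngrams"):
        return wv  # already in the new layout, nothing to do
    if not hasattr(wv, "syn0_all"):
        raise AttributeError("neither 'syn0_all' nor 'syn0_ngrams' found")
    wv.syn0_vocab = wv.syn0_all[:wv.vocab_size]
    wv.syn0_ngrams = wv.syn0_all[wv.vocab_size:]
    del wv.syn0_all  # drop the old name so only one layout remains
    return wv


# Toy data: 5 rows of dimension 4, the first 2 treated as word vectors.
legacy = LegacyVectors(np.arange(20, dtype=np.float32).reshape(5, 4), vocab_size=2)
upgraded = upgrade_vectors(legacy)
```

The `hasattr` check is what would let a load method tell an old model from a new one without a version field, since the traceback above shows the failure is simply a missing attribute.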

bug difficulty medium

All 6 comments

Or another idea to solve this: could you create a utility script that transforms a 2.3.0 model into a 3.0.0 model?

@Liebeck Thanks for the report

I think it's possible to check this in the load method, wdyt @chinmayapancholi13?

Can you fix this bug and create a PR, @Liebeck @chinmayapancholi13?

I'm not sure I understand enough of gensim's architecture to contribute a quick fix. I might be able to take a further look at it in January :neutral_face:

@Liebeck Thanks for reporting this issue! Seems to be a problem in the load function.

@menshikh-iv Hey Ivan! I am a little busy this week, so I can take a look at this and try to get it resolved next week. I hope that is fine. I'll post an update on my progress here. :)

It will be great @chinmayapancholi13, I'm glad to see you here again :)

Fixed in #1723
