A bug in the native FastText implementation causes syn0 to be equal to syn0_vocab at the end of training. As a result, vectors are learned incorrectly during online training.
from gensim.models.word2vec import LineSentence
from gensim.models.fasttext import FastText as FT_gensim
import os
import gensim
data_dir = '{}'.format(os.sep).join([gensim.__path__[0], 'test', 'test_data']) + os.sep
data_file = '{}lee_background.cor'.format(data_dir)
sentences = LineSentence(data_file)
model = FT_gensim(sg=1, hs=0, window=2, negative=5, iter=1)
model.build_vocab(sentences)
model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)
print (model.wv.syn0 == model.wv.syn0_vocab).all()
Expected result: False
Actual result: True
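For context on why the comparison should come out False: as far as I understand the FastText model, a word's final vector is built by averaging its own vocabulary vector with the vectors of its character n-grams, so it should normally differ from the raw vocabulary vector. Below is a minimal pure-Python sketch of that composition. The helper name `compose_word_vector` and the toy numbers are made up for illustration and are not taken from gensim's code.

```python
def compose_word_vector(vocab_vec, ngram_vecs):
    """Average a word's own vector with its character n-gram vectors.

    Hypothetical helper: it mirrors the idea behind syn0 vs. syn0_vocab,
    not gensim's actual implementation.
    """
    total = list(vocab_vec)
    for vec in ngram_vecs:
        total = [t + x for t, x in zip(total, vec)]
    # Divide by the number of n-grams plus one (the word itself).
    return [t / (len(ngram_vecs) + 1) for t in total]


vocab_vec = [1.0, 2.0]                  # toy stand-in for a row of syn0_vocab
ngram_vecs = [[3.0, 0.0], [2.0, 4.0]]   # toy stand-ins for n-gram rows
word_vec = compose_word_vector(vocab_vec, ngram_vecs)

# With non-trivial n-gram vectors the composed vector differs from the
# vocabulary vector, which is why the check in the report should be False.
print(word_vec == vocab_vec)
```

If the composed rows end up identical to the vocabulary rows for every word, the n-gram contribution is presumably being lost or overwritten somewhere after training.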
Linux-4.10.0-40-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.3')
('SciPy', '1.0.0')
('gensim', '3.1.0')
('FAST_VERSION', 1)
@manneshiva good description, but if you have any information or ideas about where the concrete problem is or how to solve it, please add them to your report.
(off topic: regarding these weird '{}'.format constructions, have a look at os.path.join)
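To make the os.path.join suggestion concrete, here is a small sketch of how the data path from the report could be built. The function name `build_data_file` and the placeholder directory 'pkg' are assumptions for illustration; in the report the base directory would be gensim.__path__[0].

```python
import os


def build_data_file(package_dir, filename='lee_background.cor'):
    """Join the path components with the platform's separator."""
    # package_dir stands in for gensim.__path__[0] from the report.
    return os.path.join(package_dir, 'test', 'test_data', filename)


# The construction from the report, with 'pkg' as a placeholder base directory.
old_dir = '{}'.format(os.sep).join(['pkg', 'test', 'test_data']) + os.sep
old_file = '{}lee_background.cor'.format(old_dir)

# Both approaches should produce the same path string.
print(build_data_file('pkg') == old_file)
```

os.path.join inserts the separator between components itself, so there is no need to format os.sep into the string or append a trailing separator by hand.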