fastText: Line breaks and training

Created on 9 May 2018 · 2 Comments · Source: facebookresearch/fastText

I think it would be good to have an option that makes newlines act as boundaries, so that training does not cross from one line to the next.

I created word vectors from a dataset with one entry per line, so any correlations across lines would be noise. The loss was low, but it could be lower if the model weren't trying to predict across lines.

(I assume that newlines don't affect training... although I'd be happy to be told otherwise.)


tom-adsfund


Most helpful comment

Hi @tom-adsfund,

fastText does use newlines to separate examples (in both supervised and unsupervised modes). Thus, when learning word vectors with skipgram or cbow, the words on the previous and next lines do not influence the learning of the current line. See line 347 of dictionary.cc for the corresponding code (newlines are replaced by EOS in the readWord method).

Best,
Edouard
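The behaviour described in the answer above can be illustrated with a small Python sketch. It is a simplified imitation, not fastText's actual code: `read_words` emits an EOS marker at every newline (the string `</s>` is assumed here, which we believe matches fastText's `Dictionary::EOS`), `read_lines` cuts the token stream into one example per line, and `skipgram_pairs` builds (center, context) pairs only within a single line. The helper names are hypothetical, the window is fixed rather than randomly sampled, and the EOS token itself is dropped from each example for simplicity.

```python
from typing import Iterator, List, Tuple

# Assumed end-of-sentence marker, intended to mirror fastText's Dictionary::EOS.
EOS = "</s>"

def read_words(text: str) -> Iterator[str]:
    """Yield whitespace-separated tokens, emitting EOS at every newline.

    Hypothetical, simplified imitation of Dictionary::readWord.
    """
    word: List[str] = []
    for ch in text:
        if ch in " \t\v\f\r\n\0":
            if word:
                yield "".join(word)
                word = []
            if ch == "\n":
                yield EOS  # a newline always becomes an EOS token
        else:
            word.append(ch)
    if word:
        yield "".join(word)

def read_lines(text: str) -> List[List[str]]:
    """Group tokens into examples, ending each example at EOS.

    Simplification: the EOS token is not kept, and empty lines are skipped.
    """
    lines: List[List[str]] = []
    current: List[str] = []
    for tok in read_words(text):
        if tok == EOS:
            if current:
                lines.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # last line without a trailing newline
        lines.append(current)
    return lines

def skipgram_pairs(line: List[str], window: int) -> List[Tuple[str, str]]:
    """Build (center, context) pairs using only words from the same line."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(line):
        lo, hi = max(0, i - window), min(len(line), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, line[j]))
    return pairs
```

For example, `read_lines("the cat sat\nred green blue\n")` gives two separate examples, so a pair such as `("sat", "red")` is never generated: the context window stops at the line boundary, which is the across-line isolation the issue asks about.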