Can anyone explain the advantage of using pretrained vectors as input to fastText?
Does the length of the feature vector still remain equal to the vocabulary size, or does it become equal to the vector dimension?
Hey omerarshad,
The reason you would use a pretrained model is that it can shorten your own training time or give you a better-quality word vector model.
For example, Google has a vector model trained on Google News, a corpus containing several billion words. But if you would like to use that model on a different type of text, it might be useful to retrain it into another vector model.
Another use case is classification: you train a vector model once and then use it to train several classification models.
But there are probably other reasons that I do not know of.
Usually the word embeddings are trained on a much larger corpus than just the sentences you're using to classify. I've heard people call this the "background vector".
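On the feature-length question, here is a rough sketch of the idea rather than fastText's actual code. It assumes the classifier input is built by averaging the embeddings of the tokens in a sentence, so its length follows the embedding dimension and not the vocabulary size. The tiny `pretrained` table, the `DIM` value, and the `sentence_vector` helper are all made up for illustration. Real fastText also mixes in subword and word n-gram vectors, which this ignores.

```python
# Toy illustration with hypothetical 3-dimensional "pretrained" vectors.
# Real pretrained models typically use a few hundred dimensions.
DIM = 3
pretrained = {
    "good":  [0.9, 0.1, 0.0],
    "movie": [0.2, 0.8, 0.1],
    "bad":   [-0.9, 0.1, 0.0],
    "plot":  [0.1, 0.7, 0.3],
    "the":   [0.0, 0.0, 0.1],
}


def sentence_vector(tokens, table, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        # No known token: fall back to a zero vector of the same length.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


features = sentence_vector("the good movie".split(), pretrained, DIM)

# The vocabulary has 5 entries, but the feature vector has DIM entries.
print("vocabulary size:", len(pretrained))
print("feature length:", len(features))
```

Under these assumptions the feature length stays at `DIM` no matter how many words are added to the vocabulary. Only the lookup table grows.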
@borissmidt Any idea how fastText skipgram/cbow vectors can be trained incrementally, i.e. starting from an existing pretrained fastText model?
Would love to know how to retrain the vectors based on a new dataset, too.