It seems that
https://github.com/zalandoresearch/flair/blob/master/flair/embeddings.py#L139
does not make use of the OOV (out-of-vocabulary) functionality present in fastText. Supporting it would be a nice addition!
Absolutely, this would be a cool addition!
I also just recently noticed that the fastText embeddings downloaded by Flair don't have the n-grams stored. Is anyone taking this? I could eventually look into it, but I guess it depends on having fastText models that contain subword (i.e., n-gram) information available.
Yes, we haven't gotten around to looking into this, so currently we only use FastText without the subword features. So any help here would be appreciated!
As a note of caution, we've observed ourselves and heard from other groups that the subword features often negatively impact downstream task performance. One guess is that this is because of how unknown words are handled: without subwords, they are simply marked as UNKs, so the model can learn to deal with unknown words. With subwords, an embedding is always produced from the n-grams, and it may not always be a good embedding, which can lead to errors.
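To make the difference concrete, here is a minimal toy sketch of the two behaviours described above. It is not Flair's or fastText's actual code: the names (`char_ngrams`, `embed_with_subwords`, `embed_without_subwords`), the tiny dimensions, and the random bucket vectors are all assumptions for illustration. It also simplifies fastText, which combines the subword vectors with the word vector for in-vocabulary words, and whose hashing differs in detail.

```python
import random

DIM = 4          # toy embedding size (assumption)
BUCKETS = 1000   # toy number of hash buckets (assumption)


def char_ngrams(word, min_n=3, max_n=5):
    """Return the character n-grams of the word wrapped in '<' and '>'."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def fnv1a(text):
    """32-bit FNV-1a hash, used here to map an n-gram to a bucket."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


# Stand-ins for trained parameters: random bucket vectors and a one-word vocabulary.
rng = random.Random(0)
bucket_vectors = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]
word_vectors = {"hello": [0.1, 0.2, 0.3, 0.4]}
UNK = [0.0] * DIM  # one shared vector for every unknown word


def embed_without_subwords(word):
    """Unknown words all collapse to the same UNK vector."""
    return word_vectors.get(word, UNK)


def embed_with_subwords(word):
    """Unknown words get the average of their hashed n-gram vectors."""
    if word in word_vectors:
        return word_vectors[word]
    grams = char_ngrams(word)
    if not grams:  # e.g. the empty string has no n-grams of length >= 3
        return UNK
    vecs = [bucket_vectors[fnv1a(g) % BUCKETS] for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


print(embed_without_subwords("helo"))  # the shared UNK vector
print(embed_with_subwords("helo"))     # a vector built from n-grams
```

With the first function a downstream model sees one consistent UNK signal for every unseen word. With the second, every unseen word (including typos and noise) gets some vector, whether or not its n-grams carry useful information, which is the effect speculated about above.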
Hey guys. As a part of experimentation, I needed this feature and so I wrote one class to do the same. Anyone who needs this functionality, feel free to use it - https://github.com/pranaychandekar/fasttext-embeddings-with-flair
Hey that's great - can't wait to try it out! Would you like to add this in a PR?
Sure @alanakbik ! Would love to do that.
Closed by #879 :)