Flair: Add class for FastText out-of-vocabulary word embeddings

Created on 11 Jul 2018 · 7 comments · Source: flairNLP/flair

It seems like https://github.com/zalandoresearch/flair/blob/master/flair/embeddings.py#L139 does not make use of the out-of-vocabulary (OOV) functionality present in FastText. It would be a nice addition!
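For context, here is a minimal sketch of how fastText builds a vector for an OOV word. It wraps the word in `<` and `>`, takes every character n-gram of length 3 to 6 (the `minn`/`maxn` defaults for unsupervised models), hashes each n-gram into one of 2,000,000 buckets with an FNV-1a hash, and averages the matching rows. This sketch assumes a toy 4-dimensional space and fills the bucket rows with seeded random numbers in place of trained weights. It also slices Python characters instead of reproducing fastText's exact UTF-8 byte handling, so treat it as illustrative, not as fastText's implementation.

```python
import random

MINN, MAXN = 3, 6      # fastText's default n-gram length range (unsupervised)
BUCKETS = 2_000_000    # fastText's default number of hash buckets
DIM = 4                # toy dimension, chosen only for illustration


def char_ngrams(word, minn=MINN, maxn=MAXN):
    """All character n-grams of '<word>' with lengths minn..maxn."""
    w = f"<{word}>"
    grams = []
    for i in range(len(w)):
        for n in range(minn, maxn + 1):
            if i + n > len(w):
                break
            grams.append(w[i:i + n])
    return grams


def fnv1a(s):
    """32-bit FNV-1a over UTF-8 bytes (fastText hashes n-grams with an FNV-1a variant)."""
    h = 2166136261
    for b in s.encode("utf-8"):
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def bucket_vector(bucket):
    """Assumption: seeded random values stand in for a trained n-gram row."""
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def oov_vector(word):
    """Average the rows of the word's hashed character n-grams."""
    grams = char_ngrams(word)
    vec = [0.0] * DIM
    if not grams:          # e.g. the empty string has no n-grams of length >= 3
        return vec
    for g in grams:
        row = bucket_vector(fnv1a(g) % BUCKETS)
        for k in range(DIM):
            vec[k] += row[k]
    return [x / len(grams) for x in vec]


print(char_ngrams("cat", 3, 3))   # ['<ca', 'cat', 'at>']
print(len(oov_vector("flairish")))  # 4
```

Because every word, seen or unseen, decomposes into n-grams, a model with the bucket rows can always return some vector; a word-only lookup table cannot.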

Labels: good first issue, help wanted


joelkuiper


Most helpful comment

Hey guys. As part of my experiments I needed this feature, so I wrote a class that does it. Anyone who needs this functionality, feel free to use it: https://github.com/pranaychandekar/fasttext-embeddings-with-flair

pranaychandekar on 12 Jul 2019


All 7 comments

Absolutely, this would be a cool addition!

alanakbik on 11 Jul 2018

I also recently noticed that the fastText embeddings downloaded by Flair don't store the subword n-grams. Is anyone taking this on? I could look into it eventually, but I guess it depends on having fastText models with subword (i.e., n-gram) information available.
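To make the distinction concrete: fastText publishes pretrained vectors both as a plain-text `.vec` file, which holds one whole-word vector per line, and as a binary `.bin` model, which also carries the hashed n-gram rows. The sketch below parses the `.vec` layout (a header line `count dim`, then each word followed by `dim` floats) to show why an OOV word cannot be recovered from it. That Flair's download was derived from word-only vectors like these is an assumption based on the comment above, not something verified here, and the sample data is made up.

```python
def parse_vec(text):
    """Parse fastText's plain-text .vec format into a {word: vector} dict."""
    lines = text.strip().splitlines()
    count, dim = map(int, lines[0].split())
    table = {}
    for line in lines[1:count + 1]:
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        table[word] = values
    return table


# Made-up sample in .vec layout: only whole words, no n-gram rows.
SAMPLE_VEC = """3 2
the 0.1 0.2
cat 0.3 -0.1
sat -0.2 0.4
"""

vectors = parse_vec(SAMPLE_VEC)
print(vectors.get("cat"))   # [0.3, -0.1]
print(vectors.get("cats"))  # None: nothing to compose a vector from
```

Supporting OOV words therefore requires loading the full model (the `.bin`) rather than the word table alone.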

davidsbatista on 7 Feb 2019


Yes, we haven't gotten around to looking into this, so currently we only use FastText without the subword features. So any help here would be appreciated!

As a note of caution, we have observed ourselves, and heard from other groups, that the subword features often hurt downstream task performance. One guess is that this comes down to how unknown words are handled: without subwords, they are simply marked as UNK, so the model can learn to deal with unknown words. With subwords, an embedding is always produced from the character n-grams, and that embedding may be a poor one that introduces errors.
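The guess above can be illustrated with a small sketch contrasting the two lookup strategies. The vocabulary, the UNK row, and the n-gram rows are all made-up toy values (n-gram rows are seeded random numbers standing in for trained weights), and only trigrams are used for brevity. It also simplifies fastText by returning in-vocabulary vectors directly instead of averaging them with their n-grams. It shows the mechanism only, not the performance effect.

```python
import random

DIM = 3
VOCAB = {"the": [0.1, 0.2, 0.3], "cat": [0.4, -0.1, 0.0]}  # toy vocabulary
UNK = [0.0, 0.0, 0.0]  # in a real model this row would be trainable


def ngrams(word, n=3):
    """Character trigrams of '<word>' (simplified: a single n)."""
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]


def ngram_row(gram):
    """Assumption: seeded random values stand in for a trained n-gram row."""
    rng = random.Random(gram)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def embed_with_unk(token):
    """Without subwords: every OOV token collapses onto the same UNK vector."""
    return VOCAB.get(token, UNK)


def embed_with_subwords(token):
    """With subwords: an OOV token always gets a vector averaged from its n-grams."""
    if token in VOCAB:
        return VOCAB[token]
    rows = [ngram_row(g) for g in ngrams(token)]
    if not rows:
        return UNK
    return [sum(col) / len(rows) for col in zip(*rows)]


for tok in ["cat", "kat", "xqzv"]:
    print(tok, embed_with_unk(tok) is UNK, embed_with_subwords(tok) is UNK)
# cat False False
# kat True False
# xqzv True False
```

With UNK, the typo "kat" and the noise string "xqzv" share one vector that the downstream model can learn to treat as "unknown". With subwords, each receives a distinct, confident-looking vector, which is helpful when the n-grams are informative but misleading when they are not.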

alanakbik on 8 Feb 2019

pranaychandekar on 12 Jul 2019 (comment shown above under "Most helpful comment")


Hey, that's great, can't wait to try it out! Would you like to add this in a PR?

alanakbik on 12 Jul 2019

Sure, @alanakbik! Would love to do that.

pranaychandekar on 13 Jul 2019


Closed by #879 :)

alanakbik on 16 Jul 2019
