Flair: Difference between ELMo and Flair embeddings

Created on 19 Apr 2019 · 5 comments · Source: flairNLP/flair

Hello. Let me first say that I'm not a computer scientist or an experienced machine learning engineer, so I hope my question is okay :) I'm struggling to see the difference between ELMo and Flair embeddings. Don't they both use a character bi-LSTM, and isn't the embedding in both cases a concatenation of hidden-state outputs?

question

All 5 comments

Hello @xraycat123 - yes, both ELMo and Flair are similar in that they extract embeddings from language models. The main difference is that ELMo uses a word-level language model, whereas Flair is purely character-based and is trained without an explicit notion of what a word is. This means that Flair generates a word embedding from the first and last character states of each word.
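
To make that extraction concrete, here is a minimal sketch in plain PyTorch (not flair's actual internals; it runs only a single forward direction and uses toy dimensions): a character LSTM reads the whole sentence, and each word's embedding is taken from the hidden state after its last character.

```python
import torch
import torch.nn as nn

sentence = "flair is neat"
chars = sorted(set(sentence))                 # tiny character vocabulary
char2idx = {c: i for i, c in enumerate(chars)}

char_emb = nn.Embedding(len(chars), 8)        # toy embedding size
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

ids = torch.tensor([[char2idx[c] for c in sentence]])
out, _ = lstm(char_emb(ids))                  # (1, n_chars, 16)

# for each word, take the hidden state at its last character
offsets, pos = [], 0
for word in sentence.split():
    start = sentence.index(word, pos)
    offsets.append(start + len(word) - 1)
    pos = start + len(word)

word_embeddings = out[0, offsets]             # (n_words, 16)
print(word_embeddings.shape)                  # torch.Size([3, 16])
```

A real Flair embedding additionally runs a backward character LM and concatenates the state at each word's first character, so every token sees context from both sides.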

Computationally, going character-level has the nice property that the vocabulary is very small (a few hundred distinct characters versus potentially millions of distinct words), which makes these embeddings easy to train. Character-level models have also been shown to deal well with rare and out-of-vocabulary words and with morphologically rich languages. For a better overview, I would suggest going through the Flair paper or this overview article.
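
Using these pre-trained character LMs through the flair API looks roughly as follows, assuming the FlairEmbeddings and StackedEmbeddings classes and the English 'news-forward' / 'news-backward' model names from the flair documentation:

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# stack the forward and backward character LMs so each token gets
# the concatenation of both directions
stacked = StackedEmbeddings([
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

sentence = Sentence('The grass is green .')
stacked.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.size())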

I see, that makes sense; thank you for the articles. I am using Flair for clinical entity recognition, where I often have to deal with rare and OOV words, and so far Flair is brilliant compared to the other frameworks I have tried, in terms of both accuracy and usability. 👍
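
As a point of reference for that use case, tagging with a pre-trained Flair model is only a few lines; note that the 'ner' model name below is the general-purpose English model, and a clinical tagger would have to be trained separately on in-domain data:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load the pre-trained general English NER model
tagger = SequenceTagger.load('ner')

sentence = Sentence('Penicillin was administered at Johns Hopkins .')
tagger.predict(sentence)
print(sentence.to_tagged_string())
```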

Great, glad to hear :) Let us know if you have more questions / results to share.

Hello, would it be possible to train Flair embeddings on a word-piece basis, instead of characters or words?

Hello @alejandrojcastaneira - generally it should be possible, but it's not yet supported. One would need to add word-piece tokenization to the data loader and the language model. From our side this is not a high priority, so it will probably not happen anytime soon, but the next time we do a bigger refactoring of the LM training we may add it as well. Of course, community contributions are always welcome!
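
For readers unfamiliar with the term: word-piece tokenization splits a word into subword units from a fixed vocabulary by greedy longest match. A minimal sketch with a hypothetical vocabulary (not flair code) shows what the data loader would have to emit instead of characters or whole words:

```python
# greedy longest-match word-piece tokenization; '##' marks a
# continuation piece, as in BERT-style vocabularies
def wordpiece_tokenize(word, vocab, unk='[UNK]'):
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = '##' + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:          # no matching piece: bail out
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

# hypothetical vocabulary, purely for illustration
vocab = {'emb', '##ed', '##ding', '##s', 'un', '##break', '##able'}
print(wordpiece_tokenize('embeddings', vocab))   # ['emb', '##ed', '##ding', '##s']
print(wordpiece_tokenize('unbreakable', vocab))  # ['un', '##break', '##able']
```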
