Flair: Is Glove embedding going to be updated

Created on 27 Mar 2019 · 11 comments · Source: flairNLP/flair

From the following code, I'm not sure whether the GloVe embedding is going to be updated during training or simply stay as it is.

from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

# create a StackedEmbedding object that combines glove and forward/backward flair embeddings
stacked_embeddings = StackedEmbeddings([
                                        WordEmbeddings('glove'), 
                                        FlairEmbeddings('news-forward'), 
                                        FlairEmbeddings('news-backward'),
                                       ])
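(A quick way to check this directly, assuming the stacked_embeddings object built above: flair embedding classes are torch.nn.Module subclasses, so you can list whatever is registered as a trainable parameter. As explained in the answers below, the GloVe vectors are kept in a plain gensim lookup table rather than a registered parameter.)

# List every registered parameter and whether it requires gradients.
# If the GloVe vectors live only in a gensim lookup table (as described
# in the answers below), nothing GloVe-related will appear here.
for name, param in stacked_embeddings.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)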
question

All 11 comments

No, it is fixed.

Only CharacterEmbeddings (here) are updated, as proposed by Lample et al. :)
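For context, the Lample et al. style character features look roughly like this: a trainable character embedding table plus a small character-level BiLSTM, all of whose weights are updated during training. This is a sketch with illustrative hyper-parameters, not flair's exact implementation.

import torch
import torch.nn as nn

class CharBiLSTMEmbedding(nn.Module):
    """Sketch of Lample-style character features: a trainable char embedding
    table plus a char-level BiLSTM; all weights here are updated during
    training, unlike frozen pre-trained word vectors."""

    def __init__(self, char_vocab_size, char_dim=25, hidden_dim=25):
        super().__init__()
        self.char_embed = nn.Embedding(char_vocab_size, char_dim)
        self.char_lstm = nn.LSTM(char_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, char_ids):                     # char_ids: (num_words, max_word_len)
        embedded = self.char_embed(char_ids)         # (num_words, max_word_len, char_dim)
        _, (h_n, _) = self.char_lstm(embedded)       # h_n: (2, num_words, hidden_dim)
        # concatenate final forward and backward states -> one vector per word
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (num_words, 2 * hidden_dim)

char_ids = torch.randint(0, 60, (5, 12))             # 5 words, up to 12 characters each
print(CharBiLSTMEmbedding(char_vocab_size=60)(char_ids).shape)   # torch.Size([5, 50])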

So are CharacterEmbeddings also used to reproduce the numbers for the CoNLL-2003 NER task?

Hello @allanj the current best known configuration for CoNLL NER is listed here and uses only pooled flair embeddings and glove embeddings, i.e. no CharacterEmbeddings. In our COLING paper, we evaluated different settings and found that the CharacterEmbeddings are not really necessary when already using FlairEmbeddings.
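For reference, a training sketch along the lines of that configuration (corpus is assumed to be a pre-loaded CoNLL-03 corpus object, and the hyper-parameters here are placeholders; the linked configuration page has the exact values):

from flair.embeddings import WordEmbeddings, PooledFlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# GloVe + pooled contextual string embeddings, as in the configuration above
embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    PooledFlairEmbeddings('news-forward'),
    PooledFlairEmbeddings('news-backward'),
])

# 'corpus' is assumed to be a pre-loaded CoNLL-03 corpus object
tag_dictionary = corpus.make_tag_dictionary(tag_type='ner')

tagger = SequenceTagger(hidden_size=256,             # placeholder hyper-parameter
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type='ner',
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/conll03-ner', max_epochs=150)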

W.r.t. updating embeddings: the base GloVe and Flair embeddings never get updated, but by default we put a fully connected layer on top of the embedding layer before passing the embeddings into the RNN. This 'reprojection' layer may function similarly to updating embeddings, since it takes the original embeddings in and outputs a modified version.

What does it mean to have a fully connected layer on top of the embedding?

fully connected layer on top of the embedding layer before passing the embeddings into the RNN

I thought that after we have the contextual embedding of each word, we feed it into the BiLSTM and then a CRF layer?

@allanj I don't speak for the official team, but I have read the code. In my opinion, both the CharacterEmbeddings and the other embeddings are fixed; before we feed the embeddings into the network, they pass through a Linear layer, so we can treat the output of that Linear layer as the representation of the word.

I see. I think I understand what you mean. But it is the same architecture as traditional BiLSTM-CRF, am I right?

@allanj Yeah, the default architecture is embedding layer + BiLSTM (1 layer) + CRF, with the Viterbi algorithm implemented in the decoding step.
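For readers new to that last step, here is a minimal, self-contained Viterbi decoding sketch for a linear-chain CRF. It is illustrative only, not flair's actual implementation; a real CRF also handles start/stop transitions and batching.

import torch

def viterbi_decode(emissions, transitions):
    """emissions:   (seq_len, num_tags) per-token scores from the BiLSTM
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    Returns the highest-scoring tag sequence."""
    seq_len, num_tags = emissions.shape
    score = emissions[0]                    # best score ending in each tag at step 0
    backpointers = []
    for t in range(1, seq_len):
        # previous score + transition + current emission, broadcast over tag pairs
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    # follow the backpointers from the best final tag
    best_tag = int(score.argmax())
    best_path = [best_tag]
    for best_prev in reversed(backpointers):
        best_tag = int(best_prev[best_tag])
        best_path.append(best_tag)
    return list(reversed(best_path))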

After reading the code, I think I understand what @alanakbik is saying:
the architecture should be:
embedding layer (fixed, with pre-trained contextualized embeddings) + fully connected layer for each word + BiLSTM + CRF

It seems this kind of setting is not mentioned in either the paper or the supplementary material.
I'm trying to reproduce the reported 93 (F1 on CoNLL) using the Flair embeddings offline (e.g., using Flair embeddings with our own BiLSTM-CRF code). I would appreciate it if you could point out any specific configurations I should be aware of.
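For anyone attempting the same, one way to pull the fixed embeddings out of flair and feed them to external BiLSTM-CRF code looks roughly like this (a sketch; the sentence text and the particular stacking are just examples):

from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
import torch

stacked_embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

sentence = Sentence('George Washington went to Washington .')
stacked_embeddings.embed(sentence)

# one fixed vector per token, to be fed into your own (trainable)
# linear reprojection + BiLSTM-CRF
token_vectors = torch.stack([token.embedding for token in sentence])
print(token_vectors.shape)   # (num_tokens, embedding_dim)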

@allanj @Huijun-Cui sorry for the delayed response (still travelling) but you are correct: By default we put a fully connected layer on each embedding. The motivation here is that most implementations use standard word embeddings to initialize the embedding layer, i.e. the linear map that takes a one-hot encoded word and produces an embedding. So in most implementations this embedding layer is fine-tuned on the downstream task. In our implementation, we instead do a simple lookup in Gensim for the word embedding. This means that there is no linear map and so no fine-tuning is possible here. To address this, we add a fully connected layer on top that is trainable to achieve a similar effect. Hope this clarifies!
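To make that contrast concrete, a minimal sketch (not flair's actual code; pretrained_vectors below is a random stand-in for a real GloVe matrix):

import torch
import torch.nn as nn

# stand-in for a real pre-trained matrix (e.g. GloVe), shape (vocab_size, dim)
pretrained_vectors = torch.randn(10000, 100)

# (a) common approach: an embedding layer initialised from pre-trained vectors
#     and fine-tuned together with the rest of the model
finetuned = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)

# (b) approach described above: the vectors stay a fixed lookup (freeze=True here
#     stands in for flair's gensim lookup), and a trainable linear 'reprojection'
#     on top takes over the role of fine-tuning
fixed_lookup = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
reprojection = nn.Linear(100, 100)

word_ids = torch.tensor([1, 42, 7])
vectors = reprojection(fixed_lookup(word_ids))   # only the reprojection receives gradients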

Only CharacterEmbeddings (here) are updated, as proposed by Lample et al. :)

But Lample et al. update/fine-tune word embeddings as well. From the paper:

Embeddings are pretrained using skip-n-gram (Ling et al., 2015a), a variation of word2vec (Mikolov et al., 2013a) that accounts for word order. These embeddings are fine-tuned during training

@stefan-it
