Flair: Problem with max_sequence_length in BertEmbeddings

Created on 9 Apr 2020 · 7 comments · Source: flairNLP/flair

Currently, BertEmbeddings does not account for the maximum sequence length supported by the underlying (transformers) BertModel. Since BERT splits words into subtokens, it is difficult to check the sequence length and trim sentences externally before feeding them to BertEmbeddings in flair.

I see a problem in https://github.com/flairNLP/flair/blob/master/flair/embeddings.py#L2678-L2687:

```python
# first, find longest sentence in batch
longest_sentence_in_batch: int = len(
    max(
        [
            self.tokenizer.tokenize(sentence.to_tokenized_string())
            for sentence in sentences
        ],
        key=len,
    )
)
```

This is passed to

```python
# prepare id maps for BERT model
features = self._convert_sentences_to_features(
    sentences, longest_sentence_in_batch
)
```

which sets max_sequence_length in:

https://github.com/flairNLP/flair/blob/master/flair/embeddings.py#L2620-L2622

```python
def _convert_sentences_to_features(
    self, sentences, max_sequence_length: int
):
```

But neither function accounts for or checks the maximum sequence length supported by the BERT model, even though it is accessible in both through self.model.config.max_position_embeddings.
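For illustration, the missing check could be as simple as capping the padded batch length at the model limit. A minimal sketch (the helper name is hypothetical, and the "- 2" reserves positions for the [CLS] and [SEP] special tokens BERT adds to every sequence):

```python
from transformers import BertModel

def clamp_to_model_limit(subtoken_count: int, model: BertModel) -> int:
    """Hypothetical helper: cap a padded batch length at the model's
    positional-embedding limit, leaving room for [CLS] and [SEP]."""
    return min(subtoken_count, model.config.max_position_embeddings - 2)

# e.g., inside BertEmbeddings:
# longest_sentence_in_batch = clamp_to_model_limit(longest_sentence_in_batch, self.model)
```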

bug


All 7 comments

Hi @ayushjaiswal, we are in the process of refactoring the transformer-based embeddings classes; see #1494. Instead of separate classes for each transformer embedding, there will be a single unified class that takes the transformer model as a string in its constructor. Initialization will look like this:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# example sentence
sentence = Sentence('The grass is green')

# a BERT model
embeddings = TransformerWordEmbeddings(model="bert-base-uncased", layers="-1", pooling_operation='first')
embeddings.embed(sentence)

# a RoBERTa model
embeddings = TransformerWordEmbeddings(model="distilroberta-base", layers="-1", pooling_operation='first')
embeddings.embed(sentence)
```

There is now also a corresponding TransformerDocumentEmbeddings class in case you want document embeddings out of the transformer.
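For document-level embeddings the usage would be analogous; a sketch, assuming the same string-based constructor as above:

```python
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

# one embedding for the whole sentence rather than one per token
embeddings = TransformerDocumentEmbeddings('bert-base-uncased')
sentence = Sentence('The grass is green')
embeddings.embed(sentence)
print(sentence.embedding.shape)
```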

We're also looking at different ways of handling overlong sequences as part of the refactoring and will add support for this soon.

@alanakbik Thanks for the quick response! Great to hear about the refactoring and the handling of overlong sequences. self.model.config.max_position_embeddings definitely needs to be accounted for, so that the BertModel never receives sequences longer than that limit during the forward pass. Currently, when the length exceeds the limit, a RuntimeError occurs, caused by a CUDA assertion failure that corrupts the CUDA context and requires re-initializing the CUDA session. Even if the input sequence is trimmed, I suspect this will create problems with assigning embeddings to Sentence tokens. It seems somewhat tricky 😅
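One stop-gap until this is handled inside flair might be to pre-trim each Sentence by subtoken count before embedding. A rough sketch (the helper is hypothetical; 510 assumes a 512-position model minus [CLS] and [SEP]):

```python
from flair.data import Sentence
from transformers import BertTokenizer

def truncate_by_subtokens(sentence: Sentence, tokenizer: BertTokenizer,
                          max_subtokens: int = 510) -> Sentence:
    """Rebuild a Sentence, keeping only as many leading tokens as fit
    within max_subtokens BERT subtokens, so that token-to-embedding
    alignment stays intact for everything that is kept."""
    kept, used = [], 0
    for token in sentence:
        n = len(tokenizer.tokenize(token.text))
        if used + n > max_subtokens:
            break
        kept.append(token.text)
        used += n
    return Sentence(" ".join(kept))

# tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# short_sentence = truncate_by_subtokens(long_sentence, tokenizer)
```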

@alanakbik
Maybe a sliding window approach, as implemented here, might be a good way to tackle the length limitation of BERT.
I've largely resorted to using the linked package instead of flair, solely for this feature, as the results seem better than simply truncating the sentences.
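For illustration, the core of such a sliding-window split might look like this (a hypothetical sketch, not code from the linked package; window and stride sizes are assumptions, and embeddings from overlapping positions would be pooled, e.g. averaged, when stitching the windows back together):

```python
from typing import Iterator, List

def sliding_windows(subtokens: List[str], window: int = 510,
                    stride: int = 255) -> Iterator[List[str]]:
    """Yield overlapping chunks of subtokens, each small enough to fit a
    512-position model once [CLS] and [SEP] are added."""
    if len(subtokens) <= window:
        yield subtokens
        return
    for start in range(0, len(subtokens) - window + stride, stride):
        yield subtokens[start:start + window]
```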

Would love to see this feature in flair!

Thanks for the pointer - yes this looks promising so we might integrate it!

Looking forward to this 😄

@alanakbik is there any update on this? 🙂

Unfortunately, we haven't gotten around to this yet. But you could try the recently added "longformer" models, which can handle longer sequences:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

sentence = Sentence('The grass is green')
embeddings = TransformerWordEmbeddings('allenai/longformer-base-4096')
embeddings.embed(sentence)
```
