Flair: Problem with max_sequence_length in BertEmbeddings

Created on 9 Apr 2020 · 7 comments · Source: flairNLP/flair

Currently, BertEmbeddings does not account for the maximum sequence length supported by the underlying (transformers) BertModel. Since BERT splits words into subtokens, it is difficult to check the sequence length and trim sentences externally before feeding them to BertEmbeddings in flair.

I see a problem in https://github.com/flairNLP/flair/blob/master/flair/embeddings.py#L2678-L2687:

```python
# first, find longest sentence in batch
longest_sentence_in_batch: int = len(
    max(
        [
            self.tokenizer.tokenize(sentence.to_tokenized_string())
            for sentence in sentences
        ],
        key=len,
    )
)
```

This is passed to

```python
# prepare id maps for BERT model
features = self._convert_sentences_to_features(
    sentences, longest_sentence_in_batch
)
```

which sets max_sequence_length in:

https://github.com/flairNLP/flair/blob/master/flair/embeddings.py#L2620-L2622

```python
def _convert_sentences_to_features(
    self, sentences, max_sequence_length: int
):
```

But neither function accounts for or checks the maximum sequence length supported by the BERT model, even though it is accessible in both through self.model.config.max_position_embeddings.
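For illustration, the missing check could be as simple as capping the padded batch length at the model limit. A minimal sketch (the helper name is hypothetical, and the "- 2" reserves positions for the [CLS] and [SEP] special tokens BERT adds to every sequence):

```python
from transformers import BertModel

def clamp_to_model_limit(subtoken_count: int, model: BertModel) -> int:
    """Hypothetical helper: cap a padded batch length at the model's
    positional-embedding limit, leaving room for [CLS] and [SEP]."""
    return min(subtoken_count, model.config.max_position_embeddings - 2)

# e.g., inside BertEmbeddings:
# longest_sentence_in_batch = clamp_to_model_limit(longest_sentence_in_batch, self.model)
```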

bug


All 7 comments

Hi @ayushjaiswal, we are in the process of refactoring the transformer-based embeddings classes; see #1494. Instead of separate classes for each transformer embedding, there will be a single unified class that takes the transformer model as a string in its constructor. Initialization will look like this:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# example sentence
sentence = Sentence('The grass is green')

# a BERT model
embeddings = TransformerWordEmbeddings(model="bert-base-uncased", layers="-1", pooling_operation='first')
embeddings.embed(sentence)

# a RoBERTa model
embeddings = TransformerWordEmbeddings(model="distilroberta-base", layers="-1", pooling_operation='first')
embeddings.embed(sentence)
```

There is now also a corresponding TransformerDocumentEmbeddings class in case you want document embeddings out of the transformer.
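For document-level embeddings the usage would be analogous; a sketch, assuming the same string-based constructor as above:

```python
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

# one embedding for the whole sentence rather than one per token
embeddings = TransformerDocumentEmbeddings('bert-base-uncased')
sentence = Sentence('The grass is green')
embeddings.embed(sentence)
print(sentence.embedding.shape)
```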

We're also looking at different ways of handling overlong sequences as part of the refactoring and will add support for this soon.

@alanakbik Thanks for the quick response! Great to hear about the refactoring and the handling of overlong sequences. self.model.config.max_position_embeddings definitely needs to be accounted for, so that the BertModel never receives sequences longer than that limit during the forward pass. Currently, when the length exceeds the limit, a RuntimeError occurs, caused by a CUDA assertion failure that corrupts the CUDA context and requires re-initializing the CUDA session. Even if the input sequence is trimmed, I suspect this will create problems with assigning embeddings to Sentence tokens. It seems somewhat tricky 😅
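One stop-gap until this is handled inside flair might be to pre-trim each Sentence by subtoken count before embedding. A rough sketch (the helper is hypothetical; 510 assumes a 512-position model minus [CLS] and [SEP]):

```python
from flair.data import Sentence
from transformers import BertTokenizer

def truncate_by_subtokens(sentence: Sentence, tokenizer: BertTokenizer,
                          max_subtokens: int = 510) -> Sentence:
    """Rebuild a Sentence, keeping only as many leading tokens as fit
    within max_subtokens BERT subtokens, so that token-to-embedding
    alignment stays intact for everything that is kept."""
    kept, used = [], 0
    for token in sentence:
        n = len(tokenizer.tokenize(token.text))
        if used + n > max_subtokens:
            break
        kept.append(token.text)
        used += n
    return Sentence(" ".join(kept))

# tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# short_sentence = truncate_by_subtokens(long_sentence, tokenizer)
```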

@alanakbik
Maybe a sliding window approach, as implemented here, might be a good way to tackle the length limitation of BERT.
I've largely resorted to using the linked package instead of flair, solely for this feature, as the results seem better than simply truncating the sentences.
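For illustration, the core of such a sliding-window split might look like this (a hypothetical sketch, not code from the linked package; window and stride sizes are assumptions, and embeddings from overlapping positions would be pooled, e.g. averaged, when stitching the windows back together):

```python
from typing import Iterator, List

def sliding_windows(subtokens: List[str], window: int = 510,
                    stride: int = 255) -> Iterator[List[str]]:
    """Yield overlapping chunks of subtokens, each small enough to fit a
    512-position model once [CLS] and [SEP] are added."""
    if len(subtokens) <= window:
        yield subtokens
        return
    for start in range(0, len(subtokens) - window + stride, stride):
        yield subtokens[start:start + window]
```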

Would love to see this feature in flair!

Thanks for the pointer - yes this looks promising so we might integrate it!

Looking forward to this 😄

@alanakbik is there any update on this? 🙂

Unfortunately, we haven't gotten around to this yet. But you could try the recently added "longformer" models, which can handle longer sequences:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

sentence = Sentence('The grass is green')
embeddings = TransformerWordEmbeddings('allenai/longformer-base-4096')
embeddings.embed(sentence)
```
