Flair: document_embeddings gives an unexpected error using RoBERTa model

Created on 20 Jan 2020 · 4 Comments · Source: flairNLP/flair

document_embeddings raises "IndexError: index 0 is out of bounds for dimension 0 with size 0" for some sentences when using the RoBERTa model.

The error occurs with both DocumentRNNEmbeddings and DocumentPoolEmbeddings.

To Reproduce

from flair.data import Sentence
from flair.embeddings import DocumentRNNEmbeddings, RoBERTaEmbeddings

embedding = RoBERTaEmbeddings(pooling_operation="mean")
document_embeddings_roberta = DocumentRNNEmbeddings([embedding])  # , fine_tune_mode='nonlinear')

# No trailing period: this input triggers the IndexError
s = 'negative reconnaissance it'
sentence = Sentence(s)
document_embeddings_roberta.embed(sentence)
vector = sentence.get_embedding()
vector

Expected behavior
I expect to get the document embedding vector as a Tensor.

However, if I add a "." at the end of the sentence, it works as expected.

code:

from flair.data import Sentence
from flair.embeddings import DocumentRNNEmbeddings, RoBERTaEmbeddings

embedding = RoBERTaEmbeddings(pooling_operation="mean")
document_embeddings_roberta = DocumentRNNEmbeddings([embedding])  # , fine_tune_mode='nonlinear')

# Same input with a trailing period: embedding succeeds
s = 'negative reconnaissance it.'
sentence = Sentence(s)
document_embeddings_roberta.embed(sentence)
vector = sentence.get_embedding()
vector
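
The trailing-period observation above could be wrapped in a small stopgap helper. This is only a sketch, not part of Flair: embed_with_period_fallback and its embed_fn parameter are hypothetical names, and embed_fn stands for any callable that takes the raw text and returns the embedding (for example, one that builds a Sentence, calls document_embeddings_roberta.embed() and returns sentence.get_embedding()).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def embed_with_period_fallback(text: str, embed_fn: Callable[[str], T]) -> T:
    """Call embed_fn(text); on IndexError, retry once with a trailing '.'.

    Hypothetical helper for illustration only. embed_fn is supplied by the
    caller and is assumed to raise IndexError for the problematic inputs.
    """
    try:
        return embed_fn(text)
    except IndexError:
        stripped = text.rstrip()
        if stripped.endswith("."):
            # The text already ends with a period, so the workaround
            # does not apply; re-raise the original error.
            raise
        # Retry once with a period appended, as described in this report.
        return embed_fn(stripped + ".")
```

Note that appending a period changes the input text, so the resulting vector is the embedding of the modified sentence rather than of the original one.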

Environment
Ubuntu

Additional context
Interestingly, the BERT model does not show this behavior.

bug

trokhymovych

All 4 comments

Hi @trokhymovych,

could you try the latest master version of Flair? I made some tokenization fixes for the GPT2-based models :)

stefan-it on 20 Jan 2020

Thank you, @stefan-it.
It works with the latest master.

trokhymovych on 20 Jan 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 19 May 2020

We've just released Flair 0.5, where you can get document embeddings directly out of the transformer:

from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

# init embedding
embedding = TransformerDocumentEmbeddings('roberta-base')

# create a sentence
sentence = Sentence('The grass is green .')

# embed the sentence
embedding.embed(sentence)