Flair: document_embeddings gives an unexpected error using RoBERTa model

Created on 20 Jan 2020 · 4 Comments · Source: flairNLP/flair

With the RoBERTa model, document embeddings raise "IndexError: index 0 is out of bounds for dimension 0 with size 0" for some sentences.

The error occurs with both DocumentRNNEmbeddings and DocumentPoolEmbeddings.

To Reproduce

from flair.data import Sentence
from flair.embeddings import DocumentRNNEmbeddings, RoBERTaEmbeddings

embedding = RoBERTaEmbeddings(pooling_operation="mean")
document_embeddings_roberta = DocumentRNNEmbeddings([embedding]) #, fine_tune_mode='nonlinear')
s = 'negative reconnaissance it'  # no trailing period: this input triggers the IndexError
sentence = Sentence(s)
document_embeddings_roberta.embed(sentence)
vector = sentence.get_embedding()
vector
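
Because the error shows up only for some sentences, it can help to run a batch of candidate strings and record which ones fail. The following is a minimal sketch, not part of Flair: `find_failing_texts` and its `embed_fn` argument are hypothetical names, and the `fake_embed` stand-in only imitates the reported symptom so the helper can be run without downloading a model.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_texts(
    embed_fn: Callable[[str], object],
    texts: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (text, error message) pairs for every text that raises IndexError."""
    failures = []
    for text in texts:
        try:
            embed_fn(text)
        except IndexError as exc:
            failures.append((text, str(exc)))
    return failures


def fake_embed(text: str) -> None:
    # Hypothetical stand-in that only imitates the reported symptom;
    # it is NOT how Flair behaves internally.
    if not text.endswith("."):
        raise IndexError("index 0 is out of bounds for dimension 0 with size 0")


failures = find_failing_texts(
    fake_embed,
    ["negative reconnaissance it", "negative reconnaissance it."],
)
print(failures)
```

With Flair installed, `embed_fn` could instead be a small function that builds a `Sentence` from the text and passes it to `document_embeddings_roberta.embed`.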

Expected behavior
I expect to get the embedding vector as a Tensor.

However, if I add a "." at the end of the sentence, it works as expected.

Code:

from flair.data import Sentence
from flair.embeddings import DocumentRNNEmbeddings, RoBERTaEmbeddings

embedding = RoBERTaEmbeddings(pooling_operation="mean")
document_embeddings_roberta = DocumentRNNEmbeddings([embedding]) #, fine_tune_mode='nonlinear')
s = 'negative reconnaissance it.'  # trailing period added: embedding succeeds
sentence = Sentence(s)
document_embeddings_roberta.embed(sentence)
vector = sentence.get_embedding()
vector
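
The observation above suggests a stopgap for affected versions: make sure the text ends in terminal punctuation before building the `Sentence`. This is only a sketch of a workaround, not a fix. `ensure_terminal_punctuation` is a hypothetical helper name, and since it changes the input text, the resulting embedding may differ from what the unmodified sentence would produce.

```python
def ensure_terminal_punctuation(text: str, mark: str = ".") -> str:
    """Append `mark` when the stripped text does not already end in '.', '!' or '?'."""
    stripped = text.rstrip()
    if stripped and stripped[-1] not in ".!?":
        return stripped + mark
    return stripped


# Both inputs come out ending in a period.
print(ensure_terminal_punctuation("negative reconnaissance it"))
print(ensure_terminal_punctuation("negative reconnaissance it."))
```

The returned string can then be passed to `Sentence(...)` in place of the raw text.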

Environment (please complete the following information):
Ubuntu

Additional context
Interestingly, the BERT model does not show this behavior.

bug

All 4 comments

Hi @trokhymovych ,

could you try using the latest master version of Flair? I made some tokenization fixes for the GPT2-based models :)

Thank you, @stefan-it
It is working with the latest master.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

We've just released Flair 0.5, where you can get document embeddings directly out of the transformer:

from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

# init embedding
embedding = TransformerDocumentEmbeddings('roberta-base')

# create a sentence
sentence = Sentence('The grass is green .')

# embed the sentence
embedding.embed(sentence)
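
Since `TransformerDocumentEmbeddings` is described above as arriving with Flair 0.5, a script may want to check the installed version before relying on it. A minimal sketch using only the standard library (Python 3.8+) follows. The helper names are hypothetical, and the comparison looks only at the leading numeric part of the version string, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep only the leading numeric components, e.g. '0.4.5' -> (0, 4, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(package: str = "flair") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_transformer_document_embeddings(version: str) -> bool:
    # Assumption taken from the comment above: the class is available from Flair 0.5 onwards.
    return parse_version(version) >= (0, 5)


version = installed_version()
if version is None:
    print("flair is not installed")
else:
    print(version, has_transformer_document_embeddings(version))
```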