Flair: 'Length of all samples has to be greater than 0' error on pytorch

Created on 2 Aug 2018 · 16 Comments · Source: flairNLP/flair

Hi,

When running an experiment with NER (run_ner.py), I got the following error:

File "./run_ner.py", line 79, in <module>
    main()
  File "./run_ner.py", line 76, in main
    train(train_file_path, dev_file_path, test_file_path)
  File "./run_ner.py", line 63, in train
    train_with_dev=True, anneal_mode=True)
  File "/home/user/projects/ner/flair/flair/trainer.py", line 80, in train
    loss = self.model.neg_log_likelihood(batch, self.model.tag_type)
  File "/home/user/projects/ner/flair/flair/tagging_model.py", line 285, in neg_log_likelihood
    feats, tags = self.forward(sentences)
  File "/home/user/projects/ner/flair/flair/tagging_model.py", line 213, in forward
    packed = torch.nn.utils.rnn.pack_padded_sequence(tagger_states, lengths)
  File "/home/user/miniconda3/lib/python3.6/site-packages/torch/onnx/__init__.py", line 57, in wrapper
    return fn(*args, **kwargs)
  File "/home/user/miniconda3/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 124, in pack_padded_sequence
    data, batch_sizes = PackPadded.apply(input, lengths, batch_first)
  File "/home/user/miniconda3/lib/python3.6/site-packages/torch/nn/_functions/packing.py", line 12, in forward
    raise ValueError("Length of all samples has to be greater than 0, "
ValueError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0

Do you have any suggestion why this error happens?

Thanks.

bug

Most helpful comment

Hello Duc,

thanks for reporting this! It seems this error occurs when you pass an empty sentence (i.e. a sentence with no words) to the learning step. The current data fetcher methods assume a sentence boundary at every empty line, but some data files contain several consecutive empty lines, in which case empty sentences get added. This is a bug - we will fix it for release 0.2 (out shortly).

If you can wait a few days, the bug will be fixed in the next version. If you would like to get started immediately, you can remove all empty sentences from your corpus, like this:

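# imports (module paths as in flair releases of that period; adjust to your version)
from flair.data import TaggedCorpus
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

# 1. get the corpus
corpus: TaggedCorpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03)
print(corpus)

# 2. remove empty sentences from each split
corpus.train = [sentence for sentence in corpus.train if len(sentence) > 0]
corpus.test = [sentence for sentence in corpus.test if len(sentence) > 0]
corpus.dev = [sentence for sentence in corpus.dev if len(sentence) > 0]
print(corpus)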

Does this fix the error?
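
To illustrate how consecutive empty lines can produce empty sentences, here is a minimal sketch in plain Python. It is not flair's actual reader: the function names `read_sentences_naive` and `read_sentences_skipping_blanks` and the sample lines are made up for illustration, and the only assumption is a CoNLL-style column file in which an empty line ends a sentence.

```python
from typing import List

Sentence = List[str]  # one sentence = the list of its token lines


def read_sentences_naive(lines: List[str]) -> List[Sentence]:
    """Close a sentence at every empty line, even if nothing was collected."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        if line.strip() == "":
            sentences.append(current)  # a second empty line appends []
            current = []
        else:
            current.append(line)
    if current:
        sentences.append(current)
    return sentences


def read_sentences_skipping_blanks(lines: List[str]) -> List[Sentence]:
    """Close a sentence at an empty line only if it contains tokens."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        if line.strip() == "":
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(line)
    if current:
        sentences.append(current)
    return sentences


# Two consecutive empty lines between the first and second sentence.
lines = ["EU B-ORG", "rejects O", "", "", "Peter B-PER", ""]
naive = read_sentences_naive(lines)
fixed = read_sentences_skipping_blanks(lines)
print([len(s) for s in naive])
print([len(s) for s in fixed])
```

The naive reader yields a zero-length sentence for the doubled empty line, which is the kind of input that pack_padded_sequence rejects; the second reader never emits one.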

All 16 comments

The code in your comment indeed fixed the error.

Thank you for the reply.

Great!

Release 0.2 fixes this bug. Run git pull or pip install flair --upgrade to get the newest version.

I'm running flair 0.4.2 and I'm seeing this error. When I try to remove the empty sentences as shown above, I get an error:
AttributeError: can't set attribute
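
This is the error Python raises when code assigns to a property that has no setter, so a likely explanation is that `corpus.train` is exposed as a read-only property in this version. The sketch below reproduces the behaviour with a stand-in class; `ReadOnlyCorpus` is hypothetical and is not flair's class.

```python
class ReadOnlyCorpus:
    """Stand-in for a corpus whose splits are read-only properties."""

    def __init__(self, train):
        self._train = train

    @property
    def train(self):
        # No setter is defined, so assigning to .train raises AttributeError.
        return self._train


corpus = ReadOnlyCorpus([["EU", "rejects"], []])
try:
    corpus.train = [s for s in corpus.train if len(s) > 0]
    assignment_failed = False
except AttributeError as err:
    assignment_failed = True
    print("AttributeError:", err)
```

The exact wording of the message depends on the Python version, but the exception type is the same.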

Hello @tjchambers32, we just pushed a PR to master that adds a new function to remove empty sentences. Call it with:

# call .filter_empty_sentences() to remove empty sentences
corpus.filter_empty_sentences()
print(corpus)

Could you check and see if it works for you?

Hello @alanakbik. I have the same issue and just installed the latest version from master. The function ran but didn't remove any sentences from the corpus I am working with. Training still fails with the same error, so the cause is probably somewhere else.

The trace is different from the one in the original comment, so I am posting it here:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-4b6ec475c77f> in <module>
     16           learning_rate=0.05,
     17           mini_batch_size=8,
---> 18           max_epochs=300)

~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/trainers/trainer.py in train(self, base_path, evaluation_metric, learning_rate, mini_batch_size, eval_mini_batch_size, max_epochs, anneal_factor, patience, train_with_dev, monitor_train, embeddings_in_memory, checkpoint, save_final_model, anneal_with_restarts, shuffle, param_selection_mode, num_workers, **kwargs)
    195                 for batch_no, batch in enumerate(batch_loader):
    196 
--> 197                     loss = self.model.forward_loss(batch)
    198 
    199                     optimizer.zero_grad()

~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/models/sequence_tagger_model.py in forward_loss(self, sentences, sort)
    315         self, sentences: Union[List[Sentence], Sentence], sort=True
    316     ) -> torch.tensor:
--> 317         features = self.forward(sentences)
    318         return self._calculate_loss(features, sentences)
    319 

~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/models/sequence_tagger_model.py in forward(self, sentences)
    370         self.zero_grad()
    371 
--> 372         self.embeddings.embed(sentences)
    373 
    374         sentences.sort(key=lambda x: len(x), reverse=True)

~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/embeddings.py in embed(self, sentences, static_embeddings)
    143 
    144         for embedding in self.embeddings:
--> 145             embedding.embed(sentences)
    146 
    147     @property

~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/embeddings.py in embed(self, sentences)
     74 
     75         if not everything_embedded or not self.static_embeddings:
---> 76             self._add_embeddings_internal(sentences)
     77 
     78         return sentences

~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/embeddings.py in _add_embeddings_internal(self, sentences)
    904 
    905             packed = torch.nn.utils.rnn.pack_padded_sequence(
--> 906                 character_embeddings, chars2_length
    907             )
    908 

~/miniconda3/envs/flair/lib/python3.7/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
    266 
    267     data, batch_sizes = \
--> 268         torch._C._VariableFunctions._pack_padded_sequence(input, lengths, batch_first)
    269     return PackedSequence(data, batch_sizes, sorted_indices)
    270 

RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0

@isenilov could you print out the corpus object (to get sentence counts) before and after calling the function?

Yes, it gives the same output both times: Corpus: 333 train + 82 dev + 82 test sentences

Do you know if there are still empty sentences in this corpus, i.e. if you do:

for sentence in corpus.train:
    print(len(sentence))

does it print any sentences of length 0?

No, there are no zero-length sentences. Moreover, I tested adding \n\tO\n to the end of the train file but when I open it using ColumnCorpus it shows 1 as the length of the sentence: Sentence: "" - 1 Tokens.

Ok, so maybe it is a different error. Can you isolate the sentence that is causing the problem?

Could you try constructing a minimal example so I can reproduce it, with a small corpus consisting of only a handful of sentences?
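
One way to isolate the offending sentence is to run the failing step on each sentence separately and record which ones raise. The sketch below is generic Python under stated assumptions: `find_failing_items` and `fake_step` are hypothetical helpers, and `fake_step` only imitates a forward pass by rejecting tokens with empty text. With a real model you would pass your own callable instead, for example one that wraps the forward_loss call shown in the trace.

```python
from typing import Callable, List, Sequence, Tuple


def find_failing_items(
    items: Sequence, step: Callable[[list], None]
) -> List[Tuple[int, str]]:
    """Run `step` on one-item batches; return (index, message) for failures."""
    failures = []
    for index, item in enumerate(items):
        try:
            step([item])
        except (RuntimeError, ValueError) as err:
            failures.append((index, str(err)))
    return failures


def fake_step(batch: list) -> None:
    """Stand-in for a forward pass: rejects any token with empty text."""
    for sentence in batch:
        for token in sentence:
            if len(token) == 0:
                raise RuntimeError(
                    "Length of all samples has to be greater than 0"
                )


# Toy data: the second sentence contains an empty token.
sentences = [["IT", "000000000"], [":", "", "x"], ["ok"]]
failures = find_failing_items(sentences, fake_step)
print(failures)
```

The failing indices can then be copied into a small corpus file that serves as the minimal example.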

I think I found the cause of the error.
My data file contains this snippet that violates CONLL format:

:   O
IT  B-VatRegNo
 000000000  I-VatRegNo
:   O

The important point here is that there is a space before the number. ColumnCorpus parses this line incorrectly: the empty string becomes the token and the number becomes the NER tag name. It might be a good idea to add a check for such a case.
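
The effect can be reproduced in plain Python, assuming the reader splits each line on runs of whitespace without stripping it first (an assumption about the parser, not something confirmed here). The helper names `split_columns` and `find_suspicious_lines` are hypothetical; the second one is a small pre-check that flags lines whose first column is empty.

```python
import re
from typing import List, Tuple


def split_columns(line: str) -> List[str]:
    # Assumption: columns are separated by runs of whitespace and the
    # line is not stripped beforehand, so a leading space yields "".
    return re.split(r"\s+", line)


def find_suspicious_lines(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) where the token column is empty."""
    suspicious = []
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            continue  # empty lines are sentence boundaries, not errors
        if split_columns(line)[0] == "":
            suspicious.append((number, line))
    return suspicious


lines = [":   O", "IT  B-VatRegNo", " 000000000  I-VatRegNo", ":   O"]
print(split_columns(lines[2]))
print(find_suspicious_lines(lines))
```

Running a check like this over the data files before loading them points directly at the line with the leading space.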

RuntimeError: Length of all samples has to be greater than 0, but found an element in ‘lengths’ that is <= 0

The code in your comment indeed fixed the error.

Thank you for the reply.

Where do I write this code?
