Hi,
When running an experiment with NER (run_ner.py), I got the following error:
File "./run_ner.py", line 79, in <module>
main()
File "./run_ner.py", line 76, in main
train(train_file_path, dev_file_path, test_file_path)
File "./run_ner.py", line 63, in train
train_with_dev=True, anneal_mode=True)
File "/home/user/projects/ner/flair/flair/trainer.py", line 80, in train
loss = self.model.neg_log_likelihood(batch, self.model.tag_type)
File "/home/user/projects/ner/flair/flair/tagging_model.py", line 285, in neg_log_likelihood
feats, tags = self.forward(sentences)
File "/home/user/projects/ner/flair/flair/tagging_model.py", line 213, in forward
packed = torch.nn.utils.rnn.pack_padded_sequence(tagger_states, lengths)
File "/home/user/miniconda3/lib/python3.6/site-packages/torch/onnx/__init__.py", line 57, in wrapper
return fn(*args, **kwargs)
File "/home/user/miniconda3/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 124, in pack_padded_sequence
data, batch_sizes = PackPadded.apply(input, lengths, batch_first)
File "/home/user/miniconda3/lib/python3.6/site-packages/torch/nn/_functions/packing.py", line 12, in forward
raise ValueError("Length of all samples has to be greater than 0, "
ValueError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0
Do you have any suggestions as to why this error happens?
Thanks.
Hello Duc,
thanks for reporting this! It seems this error occurs when you pass an empty sentence (i.e. a sentence with no words) to the learning step. The current data fetcher methods assume a sentence boundary at every empty line, but some data files contain multiple consecutive empty lines, in which case empty sentences get added. This is a bug - we will fix it for release 0.2 (out shortly).
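To make the failure mode concrete, here is a minimal sketch (not flair's actual fetcher code) of how splitting a column-format file on blank lines turns consecutive blank lines into empty sentences:

```python
# Minimal sketch of a naive column-file reader: every blank line closes the
# current sentence, so two blank lines in a row produce an empty sentence.
def read_column_file(lines):
    sentences, current = [], []
    for line in lines:
        if line.strip() == "":
            sentences.append(current)   # appended even when current == []
            current = []
        else:
            current.append(line.split())
    if current:
        sentences.append(current)
    return sentences

data = ["John B-PER", "", "", "likes O", ""]
print(read_column_file(data))
# → [[['John', 'B-PER']], [], [['likes', 'O']]]
# the second blank line produced the empty [] sentence
```

That empty sentence later yields a length of 0 in pack_padded_sequence, which is exactly the ValueError above.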
If you can wait a few days, the bug will be fixed in the next version. If you would like to get started immediately, you can remove all empty sentences from your corpus, like this:
from flair.data import TaggedCorpus
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

# 1. get the corpus
corpus: TaggedCorpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03)
print(corpus)

# 2. remove empty sentences
corpus.train = [sentence for sentence in corpus.train if len(sentence) > 0]
corpus.test = [sentence for sentence in corpus.test if len(sentence) > 0]
corpus.dev = [sentence for sentence in corpus.dev if len(sentence) > 0]
print(corpus)
Does this fix the error?
The code in your comment indeed fixed the error.
Thank you for the reply.
Great!
Release 0.2 fixes this bug. git pull or pip install flair --upgrade to get the newest version.
I'm running flair 0.4.2 and I'm seeing this error. When I try to remove the empty sentences as shown above, I get an error:
AttributeError: can't set attribute
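For context, this AttributeError is the standard Python error for assigning to a property that has no setter; in more recent flair versions the corpus splits appear to be exposed as read-only properties, so the list-comprehension workaround from above can no longer reassign them. A minimal sketch (the Corpus class here is a stand-in, not flair's actual class):

```python
class Corpus:
    """Stand-in for a corpus whose splits are read-only properties."""
    def __init__(self, train):
        self._train = train

    @property
    def train(self):            # getter only, no setter defined
        return self._train

corpus = Corpus([["John"], []])
try:
    # the old workaround: reassigning a read-only property fails
    corpus.train = [s for s in corpus.train if len(s) > 0]
except AttributeError as e:
    print(type(e).__name__)     # AttributeError (message varies by Python version)
```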
Hello @tjchambers32, we just pushed a PR to master that adds a new function to remove empty sentences. Call it like this:
# call .filter_empty_sentences() to remove empty sentences
corpus.filter_empty_sentences()
print(corpus)
Could you check and see if it works for you?
Hello @alanakbik. I have the same issue and just installed the latest version from master. The function did run but didn't remove any sentences from the corpus I am working with. Running training fails with the same error, so the reason is probably somewhere else.
The trace is different from the original comment so I post it here:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-4b6ec475c77f> in <module>
16 learning_rate=0.05,
17 mini_batch_size=8,
---> 18 max_epochs=300)
~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/trainers/trainer.py in train(self, base_path, evaluation_metric, learning_rate, mini_batch_size, eval_mini_batch_size, max_epochs, anneal_factor, patience, train_with_dev, monitor_train, embeddings_in_memory, checkpoint, save_final_model, anneal_with_restarts, shuffle, param_selection_mode, num_workers, **kwargs)
195 for batch_no, batch in enumerate(batch_loader):
196
--> 197 loss = self.model.forward_loss(batch)
198
199 optimizer.zero_grad()
~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/models/sequence_tagger_model.py in forward_loss(self, sentences, sort)
315 self, sentences: Union[List[Sentence], Sentence], sort=True
316 ) -> torch.tensor:
--> 317 features = self.forward(sentences)
318 return self._calculate_loss(features, sentences)
319
~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/models/sequence_tagger_model.py in forward(self, sentences)
370 self.zero_grad()
371
--> 372 self.embeddings.embed(sentences)
373
374 sentences.sort(key=lambda x: len(x), reverse=True)
~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/embeddings.py in embed(self, sentences, static_embeddings)
143
144 for embedding in self.embeddings:
--> 145 embedding.embed(sentences)
146
147 @property
~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/embeddings.py in embed(self, sentences)
74
75 if not everything_embedded or not self.static_embeddings:
---> 76 self._add_embeddings_internal(sentences)
77
78 return sentences
~/miniconda3/envs/flair/lib/python3.7/site-packages/flair/embeddings.py in _add_embeddings_internal(self, sentences)
904
905 packed = torch.nn.utils.rnn.pack_padded_sequence(
--> 906 character_embeddings, chars2_length
907 )
908
~/miniconda3/envs/flair/lib/python3.7/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
266
267 data, batch_sizes = \
--> 268 torch._C._VariableFunctions._pack_padded_sequence(input, lengths, batch_first)
269 return PackedSequence(data, batch_sizes, sorted_indices)
270
RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0
@isenilov could you print out the corpus object (to get sentence counts) before and after calling the function?
Yes, it prints the same output before and after: Corpus: 333 train + 82 dev + 82 test sentences
Do you know if there are still empty sentences in this corpus? I.e., if you do:
for sentence in corpus.train:
    print(len(sentence))
does it print any sentences of length 0?
No, there are no zero-length sentences. Moreover, I tested adding \n\tO\n to the end of the train file, but when I open it using ColumnCorpus it shows 1 as the length of that sentence: Sentence: "" - 1 Tokens.
Ok, so maybe it is a different error. Can you isolate the sentence that is causing the problem?
Could you try constructing a minimal example so I can reproduce? With a small corpus consisting of only a handful of sentences?
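One way to narrow this down (a hedged sketch; find_empty_token_sentences is a hypothetical helper, not a flair API) is to scan for sentences that contain a zero-length token text, since such tokens would contribute a 0 to the lengths passed to pack_padded_sequence by the character embeddings:

```python
def find_empty_token_sentences(sentences):
    """Return indices of sentences containing a zero-length token text.

    `sentences` is expected as a list of token-text lists, e.g. built via
    [[tok.text for tok in s] for s in corpus.train] in flair terms.
    """
    return [i for i, sentence in enumerate(sentences)
            if any(len(tok) == 0 for tok in sentence)]

print(find_empty_token_sentences([["John", "likes"], ["", "000000000"]]))  # [1]
```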
I think I found the cause of the error.
My data file contains this snippet that violates CONLL format:
: O
IT B-VatRegNo
 000000000 I-VatRegNo
: O
The important point here is that there is a space before the number, which ColumnCorpus parses incorrectly: the empty string becomes the token and the number becomes the NER tag. It might be a good idea to add a check for such a case.
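If cleaning the data file is an option, a small preprocessing pass can drop the leading whitespace before the file ever reaches ColumnCorpus. A sketch under that assumption (the function name and file paths are illustrative):

```python
def clean_column_file(in_path, out_path):
    """Strip leading whitespace from token lines so that a leading space is
    not parsed as an empty token; blank lines are kept as sentence boundaries."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.strip() == "":
                fout.write("\n")          # preserve the sentence boundary
            else:
                fout.write(line.lstrip())  # drop leading spaces/tabs only
```

For example, clean_column_file("train.txt", "train.clean.txt") and then construct the ColumnCorpus on the cleaned file.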
Where do I write this code?