Getting an "index not initialized" error while embedding sentences
```
<ipython-input-88-53d61387c238> in get_similarities(qa)
3
4 def get_similarities(qa):
----> 5 sents = list(map(Sentence,qa))
6 embeddings.embed(sents)
7 return list(map(get_cosim,sents))
~/.local/lib/python3.6/site-packages/flair/data.py in __init__(self, text, use_tokenizer, labels)
337 word += char
338 # increment for last token in sentence if not followed by whtespace
--> 339 index += 1
340 if len(word) > 0:
341 token = Token(word, start_position=index-len(word))
UnboundLocalError: local variable 'index' referenced before assignment
```
Hi @SatyaRamGV - the error seems to occur during creation of the Sentence object. Could you paste a full minimal code snippet, including the example sentence you want to embed?
@SatyaRamGV were you able to resolve this issue? If not, could you paste a minimum example including an example sentence that throws this error?
Closing due to inactivity.
I ran into this very issue just now. The problem is that the index variable isn't defined before the increment in line 339. I'd submit a PR, but the question is: is the index initially the same as for use_tokenizer? If so, https://github.com/zalandoresearch/flair/blob/51ea483c4bed0e0c8d71516af61eb3aa5e47505e/flair/data.py#L302 could just be moved to line L290 or so.
PS: I'm on 0.4.0, but that shouldn't matter, since master obviously has the same issue.
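To see why only some inputs fail, here is a simplified, hypothetical reconstruction of the pattern around line 339, written without Flair. The function name split_tokens_buggy and the (word, start_position) tuples are illustrative assumptions, not Flair's API. Because index is only bound by the for loop, an empty string never binds it, and the increment after the loop raises UnboundLocalError.

```python
from typing import List, Tuple


def split_tokens_buggy(text: str) -> List[Tuple[str, int]]:
    """Hypothetical stand-in for the non-tokenizer branch (not Flair's source)."""
    tokens: List[Tuple[str, int]] = []
    word = ""
    for index, char in enumerate(text):
        if char == " ":
            if word:
                tokens.append((word, index - len(word)))
            word = ""
        else:
            word += char
    # 'index' is only bound by the loop above; for text == "" the loop body
    # never runs, so this increment raises UnboundLocalError.
    index += 1
    if word:
        tokens.append((word, index - len(word)))
    return tokens


print(split_tokens_buggy("Hello world"))  # [('Hello', 0), ('world', 6)]
try:
    split_tokens_buggy("")
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError
```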
@ChristianSch could you post a full minimum code example to reproduce the error?
On it. Like OP, I iterate over a corpus and create sentences from it. It seems that Sentence('') is the problem.
Ah great - thanks for spotting this! sentence = Sentence('') reproduces the error.
The problem is that index never gets initialized, so if there is no text at all in the sentence, index is never assigned, which causes the error in the index += 1 line. Probably the easiest fix is to always initialize index to 0.
However, this would mean that Flair would allow completely empty sentences to get initialized. Is this desired behavior?
Perhaps we should also log a warning whenever empty sentences get initialized. @tabergma what do you think?
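A minimal sketch of the proposed fix, again using a hypothetical stand-in (split_tokens_fixed) rather than Flair's actual code: binding index = 0 before the loop means an empty string simply yields no tokens instead of crashing, while non-empty input is tokenized exactly as before.

```python
from typing import List, Tuple


def split_tokens_fixed(text: str) -> List[Tuple[str, int]]:
    """Hypothetical stand-in: split on spaces, returning (word, start_position)."""
    tokens: List[Tuple[str, int]] = []
    word = ""
    index = 0  # bound up front, so it exists even when the loop never runs
    for index, char in enumerate(text):
        if char == " ":
            if word:
                tokens.append((word, index - len(word)))
            word = ""
        else:
            word += char
    # Same end-of-text step as before; safe now because index is always bound.
    index += 1
    if word:
        tokens.append((word, index - len(word)))
    return tokens


print(split_tokens_fixed(""))             # []
print(split_tokens_fixed("Hello world"))  # [('Hello', 0), ('world', 6)]
```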
Also ran into this. It could return None or raise an exception, but it shouldn't throw a cryptic error like it does now.
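If raising an exception were the preferred behavior, a sketch could check for the "no tokens" case explicitly and raise a ValueError with a readable message. The function make_tokens_strict and its space-splitting are hypothetical illustrations, not Flair behavior.

```python
from typing import List


def make_tokens_strict(text: str) -> List[str]:
    """Hypothetical: split on spaces, but reject input that yields no tokens."""
    tokens = [word for word in text.split(" ") if word]
    if not tokens:
        # A clear error message instead of an UnboundLocalError from deep inside.
        raise ValueError(f"Cannot create a sentence from {text!r}: it contains no tokens")
    return tokens


try:
    make_tokens_strict("")
except ValueError as err:
    print(err)  # Cannot create a sentence from '': it contains no tokens
```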
Thanks for the solution. Adding if sentence == '': continue in the loop solves the problem.
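The workaround can be sketched as a helper that skips empty strings before building sentences. build_sentences is a hypothetical name; in real use make_sentence would be flair.data.Sentence, but str.split stands in here so the sketch runs without Flair installed. Checking text.strip() instead of text == '' also skips whitespace-only strings, which (assuming whitespace tokenization) would yield no tokens anyway.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def build_sentences(texts: Iterable[str], make_sentence: Callable[[str], T]) -> List[T]:
    """Build one sentence per non-empty text, skipping empty/whitespace-only ones."""
    sentences: List[T] = []
    for text in texts:
        if not text.strip():
            continue  # the empty string is what triggers the error in 0.4.0
        sentences.append(make_sentence(text))
    return sentences


print(build_sentences(["Hello world", "", "   ", "Bye"], str.split))
# [['Hello', 'world'], ['Bye']]
```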
Hm yes, so currently we have inconsistent behavior depending on whether or not we use the tokenizer:
```python
from flair.data import Sentence
# this creates an empty sentence with 0 tokens
sentence = Sentence('', use_tokenizer=True)
print(sentence)
# this throws an error
sentence = Sentence('')
print(sentence)
```
We should probably either raise an error or return a Sentence object without tokens. Returning None in such cases will make the code break elsewhere and make it difficult for people to find why the code is breaking. How about returning an empty Sentence and also logging a warning at the same time?
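The "empty Sentence plus a warning" option could look roughly like this. SimpleSentence and the logger name sentence_sketch are hypothetical stand-ins, not Flair's classes; the point is that construction always succeeds and an empty input is reported through logging.warning rather than an obscure UnboundLocalError.

```python
import logging
from typing import List

# Hypothetical logger name for this sketch; not Flair's logger.
log = logging.getLogger("sentence_sketch")


class SimpleSentence:
    """Hypothetical stand-in for flair.data.Sentence, for illustration only."""

    def __init__(self, text: str) -> None:
        # Split on single spaces and drop empty pieces, like the non-tokenizer branch.
        self.tokens: List[str] = [word for word in text.split(" ") if word]
        if not self.tokens:
            # Warn instead of failing: the caller still gets a usable (empty) object.
            log.warning("Creating an empty sentence from %r (no tokens)", text)

    def __len__(self) -> int:
        return len(self.tokens)


empty = SimpleSentence("")
print(len(empty))  # 0 (a warning is logged)
```

Returning an object with zero tokens keeps downstream code that iterates over tokens working, which is the concern raised above about returning None.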
@alanakbik This currently leads to verbose logging output when e.g. importing the GermEval dataset with corpus = flair.datasets.GERMEVAL(). I could see tons of those warnings then 😟
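One way to avoid flooding the log when a dataset contains many empty lines is to count empty sentences while loading and emit a single summary warning. This is a hypothetical sketch (tokenize_all and the logger name loading_sketch are illustrative) and not necessarily how PR #761 resolves it.

```python
import logging
from typing import Iterable, List

# Hypothetical logger name for this sketch.
log = logging.getLogger("loading_sketch")


def tokenize_all(texts: Iterable[str]) -> List[List[str]]:
    """Tokenize every text, reporting empty ones in one summary warning."""
    results: List[List[str]] = []
    empty_count = 0
    for text in texts:
        tokens = [word for word in text.split(" ") if word]
        if not tokens:
            empty_count += 1  # count instead of warning once per sentence
        results.append(tokens)
    if empty_count:
        log.warning("%d empty sentence(s) encountered while loading", empty_count)
    return results


print(tokenize_all(["Hallo Welt", "", "Tschüss"]))
# [['Hallo', 'Welt'], [], ['Tschüss']]
```

Alternatively, assuming Flair emits such warnings through a logger named "flair", a caller could raise that logger's level with logging.getLogger("flair").setLevel(logging.ERROR) to hide them.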
Yeah noticed this today - PR #761 contains the fix for this. I'll merge and the error should go away!