Flair: UnboundLocalError: local variable 'index' referenced before assignment

Created on 24 Dec 2018 · 12 Comments · Source: flairNLP/flair

I get an "index not initialized" error while embedding sentences.

Environment (please complete the following information):

  • OS [ Linux]:
  • Version [flair-0.4]:
<ipython-input-88-53d61387c238> in get_similarities(qa)
      3 
      4 def get_similarities(qa):
----> 5     sents = list(map(Sentence,qa))
      6     embeddings.embed(sents)
      7     return list(map(get_cosim,sents))

~/.local/lib/python3.6/site-packages/flair/data.py in __init__(self, text, use_tokenizer, labels)
    337                         word += char
    338                 # increment for last token in sentence if not followed by whtespace
--> 339                 index += 1
    340                 if len(word) > 0:
    341                     token = Token(word, start_position=index-len(word))

UnboundLocalError: local variable 'index' referenced before assignment
bug

All 12 comments

Hi @SatyaRamGV - the error seems to occur during creation of the Sentence object. Could you paste a full minimum code snippet including example sentence you want to embed?

@SatyaRamGV were you able to resolve this issue? If not, could you paste a minimum example including an example sentence that throws this error?

Closing due to inactivity.

I ran into this very issue right now. The problem is that the index variable isn't defined prior to the increment in line 339. I'd submit a PR, but the question is: is the index initially the same as for use_tokenizer? If so, https://github.com/zalandoresearch/flair/blob/51ea483c4bed0e0c8d71516af61eb3aa5e47505e/flair/data.py#L302 could just be moved to line 290 or so.

PS: I'm on 0.4.0, but shouldn't matter, as the master has the same issue obviously.

@ChristianSch could you post a full minimum code example to reproduce the error?

On it. Like the OP, I iterate over a corpus and try to generate sentences. It seems that Sentence('') is the problem.

Ah great - thanks for spotting this! sentence = Sentence('') reproduces the error.

The problem is that the index never gets initialized: if there is no text at all in the sentence, the index is still unassigned when the index += 1 line runs, which causes the error. Probably the easiest fix is to always initialize the index to 0.

However, this would mean that Flair would allow completely empty sentences to get initialized. Is this desired behavior?

Perhaps we should also log a warning whenever empty sentences get initialized. @tabergma what do you think?
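The failure mode described above can be reproduced without Flair. The following is a minimal sketch, assuming a simplified whitespace splitter modeled on the lines visible in the traceback; split_buggy and split_fixed are hypothetical names for illustration, not Flair functions, and the fix shown is the "initialize the index to 0" idea from this comment, not necessarily what was merged.

```python
def split_buggy(text):
    """Whitespace splitter where 'index' is only bound by the for loop."""
    tokens = []
    word = ""
    for index, char in enumerate(text):
        if char == " ":
            if word:
                tokens.append((word, index - len(word)))
            word = ""
        else:
            word += char
    # If text is empty, the loop body never ran and 'index' is unbound here.
    index += 1
    if word:
        tokens.append((word, index - len(word)))
    return tokens


def split_fixed(text):
    """Same splitter, but 'index' is bound before the loop (assumed fix)."""
    tokens = []
    word = ""
    index = 0  # guarantees 'index' exists even when text is empty
    for index, char in enumerate(text):
        if char == " ":
            if word:
                tokens.append((word, index - len(word)))
            word = ""
        else:
            word += char
    index += 1
    if word:
        tokens.append((word, index - len(word)))
    return tokens


try:
    split_buggy("")
except UnboundLocalError as err:
    print("buggy version failed:", err)

print(split_fixed(""))             # empty input gives an empty token list
print(split_fixed("hello world"))  # (word, start_position) pairs
```

With the index bound up front, empty input simply produces no tokens, which is the behavior the question about "completely empty sentences" is asking about.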

Also ran into this. It could return None or throw an exception, but it shouldn't throw a cryptic error like it does now.

Thanks for the solution. Adding if sentence == '': continue to the loop solves the problem.
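The workaround of skipping empty strings before constructing sentences could look roughly like this. It is a sketch only: non_empty is a hypothetical helper, the qa list is made-up sample data, and also dropping whitespace-only strings is an extra assumption beyond what the comment describes.

```python
def non_empty(texts):
    """Yield only strings with at least one non-whitespace character."""
    for text in texts:
        # Skip '' (the case that triggers the error) and, as an added
        # assumption, strings that contain nothing but whitespace.
        if text and text.strip():
            yield text


qa = ["What is Flair?", "", "   ", "An NLP library."]
cleaned = list(non_empty(qa))
print(cleaned)

# With Flair installed, the filtered strings can then be wrapped:
#   from flair.data import Sentence
#   sents = [Sentence(text) for text in cleaned]
```

Filtering changes the number of sentences, so any code that relies on positions in the original list would need to keep track of which entries were skipped.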

Hm yes, so currently we have inconsistent behavior depending on whether or not we use the tokenizer:

from flair.data import Sentence

# this creates an empty sentence with 0 tokens
sentence = Sentence('', use_tokenizer=True)
print(sentence)

# this throws an error
sentence = Sentence('')
print(sentence)

We should probably either raise an error or return a Sentence object without tokens. Returning None in such cases will make the code break elsewhere and make it difficult for people to find why the code is breaking. How about returning an empty Sentence and also logging a warning at the same time?
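The "empty Sentence plus warning" option can be illustrated with a small stand-in class. This is a sketch under stated assumptions: SimpleSentence and the logger name are hypothetical and do not reflect Flair's Sentence implementation or its logging setup.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("sentence_sketch")  # hypothetical logger name


class SimpleSentence:
    """Hypothetical stand-in for a sentence container; not flair's Sentence."""

    def __init__(self, text):
        # str.split() with no arguments returns [] for '' and for
        # whitespace-only input, so no token is ever created for them.
        self.tokens = text.split()
        if not self.tokens:
            # Warn instead of raising, so callers still get a valid object.
            log.warning("Empty sentence created from text %r", text)

    def __len__(self):
        return len(self.tokens)


empty = SimpleSentence("")
print(len(empty))

regular = SimpleSentence("hello world")
print(regular.tokens)
```

The trade-off is that callers never see an exception for empty input, but a corpus containing many empty lines will emit one warning per line.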

@alanakbik This currently leads to verbose logging output when e.g. importing the GermEval dataset with corpus = flair.datasets.GERMEVAL(). I could see tons of those warnings then 😟

Yeah noticed this today - PR #761 contains the fix for this. I'll merge and the error should go away!
