It seems that when iterating through a Doc containing a single-letter token (e.g. "I" or "a") and calling the similarity() method in a pairwise fashion, you get a TypeError:
>>> tokens = nlp("a phrase")
>>> for token1 in tokens:
...     for token2 in tokens:
...         print(token1.similarity(token2))
TypeError: 'spacy.tokens.token.Token' object does not support indexing
This seems to have something to do with the loop, since the following works:
>>> tokens[0].similarity(tokens[1])
0.26559153
And it works if you get rid of the one-letter token:
>>> tokens = nlp("an phrase")
This is the first time I've tried opening an issue on an open source repo, so apologies if I did anything wrong, and feedback much appreciated.
Environment
spaCy 2.0.9
Python 3.6.3 (using Pyenv virtualenv)
Ubuntu 17.10
@honnibal @ines I am facing the same issue with spaCy 2.0.9 and Python 3.6.3
What is the reason for this? And is there any workaround?
This should have been fixed in v2.0.12 – see here:
Fix issue #2219: Fix token similarity of single-letter tokens.
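For anyone who cannot upgrade yet, one possible workaround is to compute the cosine similarity from the token vectors directly instead of calling similarity(). The sketch below is not spaCy's own implementation: cosine_similarity is a hypothetical helper name, and it assumes the loaded model ships word vectors so that token.vector is meaningful.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two equal-length numeric sequences.

    Hypothetical helper, not part of spaCy. Returns 0.0 when either
    vector has zero norm (e.g. a token without a word vector).
    """
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumed usage with spaCy, where `nlp` is an already loaded model
# that includes word vectors:
#
# tokens = nlp("a phrase")
# for token1 in tokens:
#     for token2 in tokens:
#         print(cosine_similarity(token1.vector, token2.vector))
```

Because this only reads token.vector and never calls similarity() on a token, the single-letter case should not hit the TypeError shown above.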
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.