spaCy: TypeError when calling similarity() in a loop over a Doc containing a single-letter token

Created on 9 Mar 2018 · 3 Comments · Source: explosion/spaCy

It seems that when iterating through a Doc containing a single-letter token (e.g. "I" or "a") and calling the similarity() method in a pairwise fashion, you get a TypeError:

>>> tokens = nlp("a phrase")
>>> for token1 in tokens:
...     for token2 in tokens:
...         print(token1.similarity(token2))

TypeError: 'spacy.tokens.token.Token' object does not support indexing

This seems to have something to do with the loop, since the following works:

>>> tokens[0].similarity(tokens[1])
0.26559153

And it works if you get rid of the one-letter token:

>>> tokens = nlp("an phrase")

This is the first time I've tried opening an issue on an open source repo, so apologies if I did anything wrong, and feedback much appreciated.

Environment
spaCy 2.0.9
Python 3.6.3 (using Pyenv virtualenv)
Ubuntu 17.10

bug feat / vectors

All 3 comments

@honnibal @ines I am facing the same issue with spaCy 2.0.9 and Python 3.6.3
What is the reason for this? And is there any workaround?

This should have been fixed in v2.0.12 – see here:

Fix issue #2219: Fix token similarity of single-letter tokens.
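
For anyone who cannot upgrade past 2.0.9 yet, one possible workaround is to bypass similarity() and compute the cosine similarity from the token vectors directly. This is only a sketch and not something suggested by the maintainers in this thread: the helper name cosine_similarity is made up for illustration, token.vector is the only spaCy attribute it relies on, and it assumes the loaded model actually provides word vectors.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two equal-length numeric sequences.

    Hypothetical helper, not part of spaCy. Returns 0.0 when either
    vector has zero length (e.g. a token without a word vector).
    """
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumed usage with a loaded pipeline `nlp` (not run here):
#
#     tokens = nlp("a phrase")
#     for token1 in tokens:
#         for token2 in tokens:
#             print(cosine_similarity(token1.vector, token2.vector))
```

Since the helper never calls Token.similarity(), it should avoid the code path that raises the TypeError, though the values may differ slightly from spaCy's own output due to floating-point precision.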

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
