spaCy: TypeError when calling similarity() in a loop that includes a single-letter token

Created on 9 Mar 2018 · 3 Comments · Source: explosion/spaCy

It seems that when iterating through a Doc containing a single-letter token (e.g. "I" or "a") and calling the similarity() method in a pairwise fashion, you get a TypeError:

>>> tokens = nlp("a phrase")
>>> for token1 in tokens:
...     for token2 in tokens:
...         print(token1.similarity(token2))
...

TypeError: 'spacy.tokens.token.Token' object does not support indexing

This seems to have something to do with the loop, since the following works:

>>> tokens[0].similarity(tokens[1])
0.26559153

And it works if you get rid of the one-letter token:

>>> tokens = nlp("an phrase")

This is the first time I've tried opening an issue on an open source repo, so apologies if I did anything wrong, and feedback much appreciated.

Environment
spaCy 2.0.9
Python 3.6.3 (using Pyenv virtualenv)
Ubuntu 17.10

bug feat / vectors

norrishd

👍4

All 3 comments

@honnibal @ines I am facing the same issue with spaCy 2.0.9 and Python 3.6.3
What is the reason for this? And is there any workaround?

dasguptar on 19 Mar 2018

👍2

This should have been fixed in v2.0.12 – see here:

Fix issue #2219: Fix token similarity of single-letter tokens.

ines on 12 Sep 2018
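
For anyone who cannot upgrade to a release containing the fix, one possible workaround (a sketch that is not taken from this thread) is to compute the cosine similarity from the token vectors directly, which sidesteps Token.similarity() altogether. The helper name cosine_similarity below is hypothetical, and the zero-vector fallback of 0.0 is an assumption rather than spaCy's documented behaviour; results may also differ slightly from similarity() because of floating-point precision.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two equal-length numeric vectors.

    Hypothetical helper: accepts any sequence of numbers, including the
    NumPy arrays exposed by spaCy's Token.vector attribute.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(float(a) ** 2 for a in vec_a))
    norm_b = math.sqrt(sum(float(b) ** 2 for b in vec_b))
    # Assumption: treat a token without a vector (all zeros) as similarity 0.0
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage with the `tokens` Doc from the report (requires spaCy and a model
# with word vectors, so it is left commented out here):
# for token1 in tokens:
#     for token2 in tokens:
#         print(cosine_similarity(token1.vector, token2.vector))
```

Because the helper only reads token1.vector and token2.vector, it never goes through the code path inside similarity() that raises the TypeError.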

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 12 Oct 2018
