spaCy: doc similarity is different between GPU version and CPU version

Created on 9 Jan 2020  ·  7 Comments  ·  Source: explosion/spaCy

How to reproduce the behaviour


After enabling the GPU with spacy.require_gpu(), doc.similarity returns a <class 'cupy.core.core.ndarray'> of size 1 instead of a scalar. On CPU the same call returns a scalar.

Reproduce:

import spacy
nlp1 = spacy.load("en_vectors_web_lg")
doc1 = nlp1("Hey there how are you?")
doc2 = nlp1("I'm good and you?")
doc1.similarity(doc2)

0.9182046417319748

Then, when requiring GPU:

spacy.require_gpu()

True

nlp2 = spacy.load("en_vectors_web_lg")
doc1 = nlp2("Hey there how are you?")
doc2 = nlp2("I'm good and you?")
doc1.similarity(doc2)

array(0.9182048, dtype=float32)

Your Environment

  • spaCy version: 2.2.3
  • Platform: Linux-4.15.0-1064-azure-x86_64-with-debian-stretch-sid
  • Python version: 3.7.5
bug feat / vectors gpu 🔜 v3.0

All 7 comments

A quick workaround is to call .item() on the result in both cases.
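The workaround above could be wrapped in a small helper. This is only a sketch: similarity_as_float is a hypothetical name, not part of spaCy, and the two sample values are stand-ins for the CPU and GPU results from the report (assuming the CPU result is a NumPy scalar or a plain float, and using a 0-d NumPy array to mimic the CuPy array so the snippet runs without a GPU).

```python
import numpy as np


def similarity_as_float(value):
    """Coerce a similarity result to a built-in float.

    Hypothetical helper, not part of spaCy. It accepts a Python float,
    a NumPy scalar, or a single-element array, since the array-like
    types expose .item() and a plain float can be passed through.
    """
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


# Stand-ins for the two return types shown in the report (assumptions).
cpu_like = np.float64(0.9182046417319748)         # scalar, as on CPU
gpu_like = np.array(0.9182048, dtype=np.float32)  # 0-d array, mimicking the GPU result

cpu_value = similarity_as_float(cpu_like)
gpu_value = similarity_as_float(gpu_like)
print(type(cpu_value).__name__, type(gpu_value).__name__)  # float float
```

With a real pipeline, the call would look like similarity_as_float(doc1.similarity(doc2)), which should give the same type whether or not spacy.require_gpu() was called.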

The difference in value is so small that I would argue it is negligible rounding noise. I do agree that it would be better if the same type were returned in both cases.

Thanks for the report! This boils down to a difference in the output type of the dot method between numpy and cupy. numpy automatically converts a result with a single element to a scalar, whereas cupy keeps it as an array. Calling item() makes sense; I'll update the codebase accordingly so there are no surprises.
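The type difference described above can be illustrated without a GPU. The sketch below uses only NumPy; the 0-d array built with np.asarray is an assumption standing in for what cupy's dot returns, and the vectors a and b are made-up values, not spaCy vectors.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)

# NumPy: the dot product of two 1-D arrays is a NumPy scalar, not an ndarray.
numpy_result = np.dot(a, b)
print(isinstance(numpy_result, np.ndarray))  # False

# Assumption: a 0-d NumPy array stands in for the single-element array
# that the GPU code path returns; cupy itself is not imported here.
cupy_like_result = np.asarray(numpy_result)
print(isinstance(cupy_like_result, np.ndarray), cupy_like_result.shape)  # True ()

# .item() yields the same built-in float from either form.
print(numpy_result.item() == cupy_like_result.item())  # True
```

Both forms hold the same number; only the container type differs, which is why calling item() on the result removes the inconsistency.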

Hi. I think that's a good decision, indeed! Make sure to document this well, though, because it changes the existing behaviour (a different type is returned). I don't know how many people rely on this in their code base, but it might cause unexpected bugs when the value from item() is suddenly returned instead of the array. Therefore, documenting it seems very important.

Yea, I was just thinking about that, we'll probably introduce it as part of v.3 ;-)

Fixed by PR https://github.com/explosion/spaCy/pull/4969 - will be in spaCy v.3
