spaCy: doc.similarity returns a different type on GPU than on CPU

Created on 9 Jan 2020 · 7 Comments · Source: explosion/spaCy

How to reproduce the behaviour

After calling spacy.require_gpu(), doc.similarity returns a <class 'cupy.core.core.ndarray'> of size 1 instead of a scalar. On CPU the same call returns a scalar.

Reproduce:

import spacy
nlp1 = spacy.load("en_vectors_web_lg")
doc1 = nlp1("Hey there how are you?")
doc2 = nlp1("I'm good and you?")
doc1.similarity(doc2)

0.9182046417319748

Then, when requiring GPU:

spacy.require_gpu()

True

nlp2 = spacy.load("en_vectors_web_lg")
doc1 = nlp2("Hey there how are you?")
doc2 = nlp2("I'm good and you?")
doc1.similarity(doc2)

array(0.9182048, dtype=float32)

Your Environment

spaCy version: 2.2.3
Platform: Linux-4.15.0-1064-azure-x86_64-with-debian-stretch-sid
Python version: 3.7.5

Labels: bug, feat / vectors, gpu, 🔜 v3.0

omri374

All 7 comments

A quick workaround is to call .item() on the result in either case.

omri374 on 9 Jan 2020
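
A minimal sketch of the .item() workaround, wrapped in a helper so the same call works whether the result is a plain number or a size-1 array. The name as_python_float is hypothetical and not part of spaCy; the only assumption is that array-like results expose an .item() method, as numpy and cupy arrays do.

```python
def as_python_float(value):
    """Convert a similarity result to a built-in Python float.

    Hypothetical helper, not part of spaCy. Assumes array-like results
    (numpy/cupy scalars or size-1 arrays) expose an .item() method.
    """
    item = getattr(value, "item", None)
    if callable(item):
        # Size-1 arrays and numpy scalars: unwrap to a Python scalar.
        value = item()
    # Plain Python numbers pass straight through.
    return float(value)
```

With this in place, a call such as as_python_float(doc1.similarity(doc2)) should give a built-in float on both CPU and GPU.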

The numeric difference is so small that I would argue it is negligible rounding noise. I do agree that it would be better if the same type were returned in both cases.

BramVanroy on 15 Jan 2020
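
As a quick check of the rounding-noise point, the two values from the report can be compared with a relative tolerance. The tolerance of 1e-6 is an assumption, chosen because float32 carries roughly 7 significant decimal digits.

```python
import math

cpu_score = 0.9182046417319748  # CPU result from the report
gpu_score = 0.9182048           # GPU result from the report (float32)

# Assumed tolerance: float32 holds about 7 significant decimal digits,
# so 1e-6 is a fairly loose relative bound for this comparison.
same_within_float32 = math.isclose(cpu_score, gpu_score, rel_tol=1e-6)
print(same_within_float32)  # True
```

The values agree to within float32 precision, so only the return type differs in practice.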

Thanks for the report! This boils down to a difference in the output type of the dot method between numpy and cupy: numpy returns a scalar for the dot product of two vectors, whereas cupy returns an array with a single element. Calling item() makes sense; I'll update the codebase accordingly so there are no surprises.

svlandeg on 4 Feb 2020
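
The type difference described above can be illustrated without a GPU. This is a sketch using numpy only: the zero-dimensional numpy array stands in for the cupy result and is an assumption based on the GPU output shown in the report (array(0.9182048, dtype=float32)). The vectors a and b are made-up example data.

```python
import numpy as np

# Made-up example vectors, float32 like spaCy's word vectors.
a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)

# numpy: the dot product of two 1-D vectors comes back as a numpy scalar.
cpu_like = np.dot(a, b)

# Stand-in for the cupy result: a zero-dimensional array.
gpu_like = np.array(0.9182048, dtype=np.float32)

print(isinstance(cpu_like, np.ndarray))  # False
print(isinstance(gpu_like, np.ndarray))  # True
print(gpu_like.shape)                    # ()
print(type(gpu_like.item()))             # <class 'float'>
```

Calling .item() on either object yields a built-in Python float, which is why it works as a common denominator.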

Hi. I think that's a good decision indeed! Make sure to document it well, though, because it changes the existing behaviour (a different type is returned on GPU). I don't know how many people rely on this in their code bases, but it might cause unexpected bugs when a scalar is suddenly returned instead of the array. Documenting the change therefore seems very important.

BramVanroy on 4 Feb 2020

Yea, I was just thinking about that, we'll probably introduce it as part of v.3 ;-)

svlandeg on 4 Feb 2020

Fixed by PR https://github.com/explosion/spaCy/pull/4969 - will be in spaCy v.3

svlandeg on 12 Feb 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.