Enabling the GPU makes doc.similarity return a <class 'cupy.core.core.ndarray'> of size 1 instead of a scalar. On CPU the same call returns a scalar.
Reproduce:
>>> import spacy
>>> nlp1 = spacy.load("en_vectors_web_lg")
>>> doc1 = nlp1("Hey there how are you?")
>>> doc2 = nlp1("I'm good and you?")
>>> doc1.similarity(doc2)
0.9182046417319748
Then, when requiring GPU:
>>> spacy.require_gpu()
True
>>> nlp2 = spacy.load("en_vectors_web_lg")
>>> doc1 = nlp2("Hey there how are you?")
>>> doc2 = nlp2("I'm good and you?")
>>> doc1.similarity(doc2)
array(0.9182048, dtype=float32)
A quick workaround is to call .item() in any case.
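For anyone who wants that workaround in one place, here is a minimal sketch of a wrapper that returns a plain Python float whether or not the value has an .item() method. The names as_python_float and FakeZeroDimArray are made up for illustration and are not part of spaCy, numpy or cupy; the fake class only stands in for a 0-d GPU array so the snippet runs without a GPU.

```python
def as_python_float(value):
    """Return a plain Python float for a float, a scalar, or a 1-element array."""
    # NumPy scalars and NumPy/CuPy arrays expose .item(); plain Python floats do not,
    # so only call it when it is available.
    item = getattr(value, "item", None)
    if callable(item):
        value = item()
    return float(value)


class FakeZeroDimArray:
    """Hypothetical stand-in for a 0-d GPU array (assumption, not a real cupy type)."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # Mimics array.item(): hand back the single element as a Python number.
        return self._value


cpu_like = 0.9182046417319748               # what the CPU call printed above
gpu_like = FakeZeroDimArray(0.9182048)      # stands in for array(0.9182048, dtype=float32)

print(type(as_python_float(cpu_like)), type(as_python_float(gpu_like)))
```

In real code you would pass the result of doc1.similarity(doc2) to the helper instead of the stand-in values.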
The difference between the two values is so small that I would argue it is negligible rounding noise. I do agree that it would be better if the same type were returned in both cases.
Thanks for the report! This boils down to the difference in output type of the dot method between numpy and cupy. numpy automatically converts an array with a single element to a scalar, which cupy doesn't. Calling item() makes sense, I'll update the codebase accordingly so there are no surprises.
Hi. I think that's a good decision, indeed! Make sure to document this well, though, because in itself this breaks the way it was implemented before (different type returned). I don't know how many people use this in their code base, but it might cause unexpected bugs when suddenly the item() instead of the array is returned. Therefore, documenting seems very important.
Yeah, I was just thinking about that. We'll probably introduce it as part of v.3 ;-)
Fixed by PR https://github.com/explosion/spaCy/pull/4969 - will be in spaCy v.3