spaCy: English models returning incorrect results

Created on 9 Feb 2020 · 6 Comments · Source: explosion/spaCy

Trying to run a simple example from the website --

import spacy
spacy.require_gpu()
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text, token.pos_, token.dep_)

Outputs --

Apple VERB dep
is VERB dep
looking VERB dep
at VERB dep
buying VERB dep
U.K. VERB dep
startup VERB dep
for VERB dep
$ VERB dep
1 VERB dep
billion VERB ROOT

I've tried the md and lg models as well -- here are the results for each --

en_core_web_md --

Apple PROPN dep
is PROPN dep
looking PROPN dep
at PROPN dep
buying PROPN dep
U.K. PROPN dep
startup PROPN dep
for PROPN dep
$ PROPN dep
1 PROPN dep
billion PROPN ROOT

en_core_web_lg --

Apple PROPN dep
is AUX dep
looking ADJ dep
at PROPN dep
buying PROPN dep
U.K. PROPN dep
startup PROPN dep
for PROPN dep
$ PROPN dep
1 PROPN dep
billion PROPN ROOT

Not sure if I missed something. Using CUDA 10.2, and cupy 7.1.1. The result is the same whether I use the GPU or not though.
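
The symptom in the outputs above is that nearly every token gets the same POS tag. A minimal sketch for checking that automatically, using only the standard library: the helper names and the 0.8 threshold are assumptions chosen for illustration, not part of spaCy.

```python
from collections import Counter


def dominant_tag_share(rows):
    """Return (tag, share) for the most common POS tag in rows of (text, pos, dep)."""
    counts = Counter(pos for _text, pos, _dep in rows)
    tag, n = counts.most_common(1)[0]
    return tag, n / len(rows)


def looks_degenerate(rows, threshold=0.8):
    """Heuristic (threshold is an arbitrary assumption): one tag covers most tokens."""
    _tag, share = dominant_tag_share(rows)
    return share >= threshold


# Rows copied from the en_core_web_sm output reported above.
sm_rows = [
    ("Apple", "VERB", "dep"), ("is", "VERB", "dep"), ("looking", "VERB", "dep"),
    ("at", "VERB", "dep"), ("buying", "VERB", "dep"), ("U.K.", "VERB", "dep"),
    ("startup", "VERB", "dep"), ("for", "VERB", "dep"), ("$", "VERB", "dep"),
    ("1", "VERB", "dep"), ("billion", "VERB", "ROOT"),
]

print(dominant_tag_share(sm_rows))  # ('VERB', 1.0)
print(looks_degenerate(sm_rows))    # True
```

With a live pipeline, the rows could be built as `[(t.text, t.pos_, t.dep_) for t in doc]` and checked once with `spacy.require_gpu()` and once without it, in separate Python sessions.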

Info about spaCy

  • spaCy version: 2.2.3
  • Platform: Windows-10-10.0.18362-SP0
  • Python version: 3.7.1
Labels: bug, gpu, windows

All 6 comments

That definitely doesn't look right!

This reminds me of an issue in thinc we found when using Windows + GPU, and which was fixed recently: https://github.com/explosion/thinc/pull/149

Are you 100% certain that these regressions also happen without the GPU? I haven't been able to reproduce this.

If it's only on GPU, we can consider this issue fixed.

Yep. I think that was it. I tried again with the GPU disabled, and the output was as expected. Guess thinc was to blame.

Will try to get that fix.

Thanks!

Ok, happy to hear it was the old bug and not a new one! :-)

With respect to getting the fix: thinc has been entirely revamped since that fix, which means that you can't just upgrade thinc - spaCy will break. We're currently working on a new version of spaCy to work with the new thinc, but that's not ready yet.

In short, you're probably best off just adjusting the _murmur3.cu file on your system for now, see https://github.com/explosion/thinc/pull/149/files
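
To find the copy of _murmur3.cu that your environment actually loads, something like the following may help. It is a rough sketch that assumes thinc is installed in the active Python environment; `find_in_package` is a hypothetical helper written for this example, not a thinc or spaCy function.

```python
import importlib.util
from pathlib import Path


def find_in_package(package, filename):
    """Return every path named `filename` under the installed `package` directory."""
    spec = importlib.util.find_spec(package)
    # No such package, or a plain module with no directory to search.
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for root in spec.submodule_search_locations:
        found.extend(sorted(Path(root).rglob(filename)))
    return found


# Prints nothing if thinc is not installed or the file is not present.
for path in find_in_package("thinc", "_murmur3.cu"):
    print(path)
```

Back up the file before editing it, and apply only the changes shown in the linked pull request.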

Worked! Thank you :)

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
