Spacy: Using GPU on Windows leads to unexpected results

Created on 3 Dec 2019 · 23Comments · Source: explosion/spaCy

As reported here and here, using a GPU on Windows returns unexpected parsing results. This topic was made per request of @adrianeboyd. An example of sentences with their tokenisation, unexpected POS-tags, and unexpected DEP labels:

s = "The decrease in 2008 primarily relates to the decrease in cash and cash equivalents 1.\n"
['The', 'decrease', 'in', '2008', 'primarily', 'relates', 'to', 'the', 'decrease', 'in', 'cash', 'and', 'cash', 'equivalents', '1', '.', '\n']
['VERB', 'PRON', 'PROPN', 'NOUN', 'VERB', 'ADV', 'VERB', 'NUM', 'PRON', 'NOUN', 'VERB', 'PROPN', 'PROPN', 'VERB', 'VERB', 'NOUN', 'SPACE']
['dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'ROOT', '']

s = "The Company's current liabilities of &euro;32.6 million primarily relate to deferred income from collaborative arrangements and trade payables.\n"
['The Company', "'s", 'current', 'liabilities', 'of', '&', 'euro;32.6', 'million', 'primarily', 'relate', 'to', 'deferred', 'income', 'from', 'collaborative', 'arrangements', 'and', 'trade', 'payables', '.', '\n']
['NOUN', 'VERB', 'AUX', 'NOUN', 'NOUN', 'PROPN', 'PROPN', 'PROPN', 'VERB', 'VERB', 'ADV', 'VERB', 'VERB', 'NOUN', 'NOUN', 'PROPN', 'NOUN', 'PROPN', 'VERB', 'NUM', 'NOUN', 'SPACE']
['dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'punct', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'ROOT', '']

s = 'The increase in deferred income is related to new deals with partners.\n'
['The', 'increase', 'in', 'deferred', 'income', 'is', 'related', 'to', 'new', 'deals', 'with', 'partners', '.', '\n']
['NOUN', 'PROPN', 'PROPN', 'VERB', 'NOUN', 'NOUN', 'NOUN', 'VERB', 'ADV', 'VERB', 'NOUN', 'VERB', 'NOUN', 'SPACE']
['dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'dep', 'punct', 'dep', 'dep', 'ROOT', '']

Example repo with data here.

Is it perhaps possible to include Windows in the integrated testing? It might be interesting to have an idea about how many people who use this package are actually on Windows. (I use it for testing and prototyping, mainly.)

spaCy version info

spaCy version: 2.2.3
Platform: Windows-10-10.0.18362-SP0
Python version: 3.7.5

bug compat gpu perf / accuracy windows

Source

BramVanroy

All 23 comments

Spacy has many windows users and windows is included in the CI setup (using Azure Pipelines), but there's not an option to test with a GPU, for any OS.

adrianeboyd on 4 Dec 2019

Spacy has many windows users and windows is included in the CI setup (using Azure Pipelines), but there's not an option to test with a GPU, for any OS.

Oh, that's interesting - and a pity.

When I find the time, I can try using earlier cupy versions/thinc_gpu_opts/CUDA versions. Is there anything specific I should look out for?

If you need to run tests on Windows+GPU you can always ping me on Twitter or LinkedIn.

BramVanroy on 4 Dec 2019

I honestly have no idea about the cupy details here, but to see if the spacy models are working as expected, I'd suggest running tests from https://github.com/explosion/spacy-models. An example:

pytest tests --model en_core_web_sm --lang en --has-parser --has-tagger --has-ner

To get it to test on GPU, add spacy.prefer_gpu() to tests/conftest.py. There are tests for short simple examples and with small test corpora (for English and German) that should fail when the performance looks like what you have above.

adrianeboyd on 4 Dec 2019

Can you guide me in how to install spacy from the clone with GPU support? E.g. a variant of pip install -e . but I'm not sure how to add the CUDA option.

BramVanroy on 4 Dec 2019

I don't think you need to install anything from source, but the pip command would be pip install -e .[cuda100]. I think that will only get you the latest version of cupy, though.

To install an older version, I think you just install cupy by hand and then it will be available for spacy (it shouldn't be needed to compile anything, just when you run spacy with the GPU):

pip install cupy-cuda100==VERSION

I was mainly wondering if v6 has any major differences compared to v7, but from the requirements it looks like you could try back to 5.0.0b4. I've been using the most recent v7 in linux (now v7.0.0rc1) without noticing any differences with spacy, but maybe there's something related to windows that has changed? I figured it was worth testing a few versions back so we can maybe give helpful pointers to people in your position, but it's very possible this won't make a difference.

adrianeboyd on 4 Dec 2019

For starters, I just installed spaCy (no GPU) and bumped in some language-specific failures. Are these expected?

=================================================================================== FAILURES ===================================================================================
___________________________________________________________________ test_fi_lex_attrs_like_number[yksi-True] ___________________________________________________________________

fi_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271C82628B8>, text = 'yksi', match = True

    @pytest.mark.parametrize(
        "text,match",
        [
            ("10", True),
            ("1", True),
            ("10000", True),
            ("10,00", True),
            ("-999,0", True),
            ("yksi", True),
            ("kolmetoista", True),
            ("viisikymmentä", True),
            ("tuhat", True),
            ("1/2", True),
            ("hevonen", False),
            (",", False),
        ],
    )
    def test_fi_lex_attrs_like_number(fi_tokenizer, text, match):
        tokens = fi_tokenizer(text)
        assert len(tokens) == 1
>       assert tokens[0].like_num == match
E       assert False == True
E        +  where False = yksi.like_num

tests\lang\fi\test_text.py:27: AssertionError
_______________________________________________________________ test_fi_lex_attrs_like_number[kolmetoista-True] ________________________________________________________________

fi_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271C82628B8>, text = 'kolmetoista', match = True

    @pytest.mark.parametrize(
        "text,match",
        [
            ("10", True),
            ("1", True),
            ("10000", True),
            ("10,00", True),
            ("-999,0", True),
            ("yksi", True),
            ("kolmetoista", True),
            ("viisikymmentä", True),
            ("tuhat", True),
            ("1/2", True),
            ("hevonen", False),
            (",", False),
        ],
    )
    def test_fi_lex_attrs_like_number(fi_tokenizer, text, match):
        tokens = fi_tokenizer(text)
        assert len(tokens) == 1
>       assert tokens[0].like_num == match
E       assert False == True
E        +  where False = kolmetoista.like_num

tests\lang\fi\test_text.py:27: AssertionError
_____________________________________________________________ test_fi_lex_attrs_like_number[viisikymment\xe4-True] _____________________________________________________________

fi_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271C82628B8>, text = 'viisikymmentä', match = True

    @pytest.mark.parametrize(
        "text,match",
        [
            ("10", True),
            ("1", True),
            ("10000", True),
            ("10,00", True),
            ("-999,0", True),
            ("yksi", True),
            ("kolmetoista", True),
            ("viisikymmentä", True),
            ("tuhat", True),
            ("1/2", True),
            ("hevonen", False),
            (",", False),
        ],
    )
    def test_fi_lex_attrs_like_number(fi_tokenizer, text, match):
        tokens = fi_tokenizer(text)
        assert len(tokens) == 1
>       assert tokens[0].like_num == match
E       assert False == True
E        +  where False = viisikymmentä.like_num

tests\lang\fi\test_text.py:27: AssertionError
__________________________________________________________________ test_fi_lex_attrs_like_number[tuhat-True] ___________________________________________________________________

fi_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271C82628B8>, text = 'tuhat', match = True

    @pytest.mark.parametrize(
        "text,match",
        [
            ("10", True),
            ("1", True),
            ("10000", True),
            ("10,00", True),
            ("-999,0", True),
            ("yksi", True),
            ("kolmetoista", True),
            ("viisikymmentä", True),
            ("tuhat", True),
            ("1/2", True),
            ("hevonen", False),
            (",", False),
        ],
    )
    def test_fi_lex_attrs_like_number(fi_tokenizer, text, match):
        tokens = fi_tokenizer(text)
        assert len(tokens) == 1
>       assert tokens[0].like_num == match
E       assert False == True
E        +  where False = tuhat.like_num

tests\lang\fi\test_text.py:27: AssertionError
__________________________________________ test_fi_tokenizer_hyphenated_words[1700-luvulle sijoittuva taide-elokuva-expected_tokens0] __________________________________________

fi_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271C82628B8>, text = '1700-luvulle sijoittuva taide-elokuva'
expected_tokens = ['1700-luvulle', 'sijoittuva', 'taide-elokuva']

    @pytest.mark.parametrize("text,expected_tokens", HYPHENATED_TESTS)
    def test_fi_tokenizer_hyphenated_words(fi_tokenizer, text, expected_tokens):
        tokens = fi_tokenizer(text)
        token_list = [token.text for token in tokens if not token.is_space]
>       assert expected_tokens == token_list
E       AssertionError: assert ['1700-luvull...aide-elokuva'] == ['1700-luvull...-', 'elokuva']
E         At index 2 diff: 'taide-elokuva' != 'taide'
E         Right contains 2 more items, first extra item: '-'
E         Use -v to get the full diff

tests\lang\fi\test_tokenizer.py:34: AssertionError
____________________________________________________________________ test_lb_tokenizer_handles_exc_in_text _____________________________________________________________________

lb_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271CA3D3480>

    def test_lb_tokenizer_handles_exc_in_text(lb_tokenizer):
        text = "Mee 't ass net evident, d'Liewen."
        tokens = lb_tokenizer(text)
>       assert len(tokens) == 9
E       AssertionError: assert 10 == 9
E        +  where 10 = len(Mee 't ass net evident, d'Liewen.)

tests\lang\lb\test_exceptions.py:19: AssertionError
______________________________________________________________________ test_lb_norm_exceptions[dass-datt] ______________________________________________________________________

lb_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271CA3D3480>, text = 'dass', norm = 'datt'

    @pytest.mark.parametrize("text,norm", [("dass", "datt"), ("viläicht", "vläicht")])
    def test_lb_norm_exceptions(lb_tokenizer, text, norm):
        tokens = lb_tokenizer(text)
>       assert tokens[0].norm_ == norm
E       AssertionError: assert 'dass' == 'datt'
E         - dass
E         + datt

tests\lang\lb\test_exceptions.py:26: AssertionError
_______________________________________________________________ test_lb_norm_exceptions[vil\xe4icht-vl\xe4icht] ________________________________________________________________

lb_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271CA3D3480>, text = 'viläicht', norm = 'vläicht'

    @pytest.mark.parametrize("text,norm", [("dass", "datt"), ("viläicht", "vläicht")])
    def test_lb_norm_exceptions(lb_tokenizer, text, norm):
        tokens = lb_tokenizer(text)
>       assert tokens[0].norm_ == norm
E       AssertionError: assert 'viläicht' == 'vläicht'
E         - viläicht
E         ?  -
E         + vläicht

tests\lang\lb\test_exceptions.py:26: AssertionError
_______________________________ test_lb_tokenizer_handles_examples[Am Grand-Duch\xe9 ass d'Liewen sch\xe9in, mee 't g\xebtt ze vill Autoen.-14] ________________________________

lb_tokenizer = <spacy.tokenizer.Tokenizer object at 0x00000271CA3D3480>, text = "Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.", length = 14

    @pytest.mark.parametrize(
        "text,length",
        [
            ("»Wat ass mat mir geschitt?«, huet hie geduecht.", 13),
            ("“Dëst fréi Opstoen”, denkt hien, “mécht ee ganz duercherneen. ", 15),
            ("Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.", 14)
        ],
    )
    def test_lb_tokenizer_handles_examples(lb_tokenizer, text, length):
        tokens = lb_tokenizer(text)
>       assert len(tokens) == length
E       AssertionError: assert 17 == 14
E        +  where 17 = len(Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.)

tests\lang\lb\test_text.py:24: AssertionError
_________________________________________________________________________ test_parser_set_sent_starts __________________________________________________________________________

en_vocab = <spacy.vocab.Vocab object at 0x00000271C72FCB48>

    def test_parser_set_sent_starts(en_vocab):
        words = ['Ein', 'Satz', '.', 'Außerdem', 'ist', 'Zimmer', 'davon', 'überzeugt', ',', 'dass', 'auch', 'epige-', '\n', 'netische', 'Mechanismen', 'eine', 'Rolle', 'spielen', ',', 'also', 'Vorgänge', ',', 'die', '\n', 'sich', 'darauf', 'auswirken', ',', 'welche', 'Gene', 'abgelesen', 'werden', 'und', '\n', 'welche', 'nicht', '.', '\n']
        heads = [1, 0, -1, 27, 0, -1, 1, -3, -1, 8, 4, 3, -1, 1, 3, 1, 1, -11, -1, 1, -9, -1, 4, -1, 2, 1, -6, -1, 1, 2, 1, -6, -1, -1, -17, -31, -32, -1]
        deps = ['nk', 'ROOT', 'punct', 'mo', 'ROOT', 'sb', 'op', 'pd', 'punct', 'cp', 'mo', 'nk', '', 'nk', 'sb', 'nk', 'oa', 're', 'punct', 'mo', 'app', 'punct', 'sb', '', 'oa', 'op', 'rc', 'punct', 'nk', 'sb', 'oc', 're', 'cd', '', 'oa', 'ng', 'punct', '']
        doc = get_doc(
            en_vocab, words=words, deps=deps, heads=heads
        )
        for i in range(len(words)):
            if i == 0 or i == 3:
>               assert doc[i].is_sent_start == True
E               assert None == True
E                +  where None = Außerdem.is_sent_start

tests\parser\test_parse.py:162: AssertionError
________________________________________________________________________________ test_issue4707 ________________________________________________________________________________

    def test_issue4707():
        """Tests that disabled component names are also excluded from nlp.from_disk
        by default when loading a model.
        """
        nlp = English()
        nlp.add_pipe(nlp.create_pipe("sentencizer"))
        nlp.add_pipe(nlp.create_pipe("entity_ruler"))
        assert nlp.pipe_names == ["sentencizer", "entity_ruler"]
        exclude = ["tokenizer", "sentencizer"]
        with make_tempdir() as tmpdir:
            nlp.to_disk(tmpdir, exclude=exclude)
>           new_nlp = load_model_from_path(tmpdir, disable=exclude)

tests\regression\test_issue4707.py:21:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\spacy\util.py:211: in load_model_from_path
    return nlp.from_disk(model_path)
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\spacy\language.py:941: in from_disk
    util.from_disk(path, deserializers, exclude)
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\spacy\util.py:654: in from_disk
    reader(path / key)
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\spacy\language.py:928: in <lambda>
    p, exclude=["vocab"]
tokenizer.pyx:524: in spacy.tokenizer.Tokenizer.from_disk
    ???
C:\Users\bramv\AppData\Local\Programs\Python\Python36\Lib\pathlib.py:1183: in open
    opener=self._opener)
C:\Users\bramv\AppData\Local\Programs\Python\Python36\Lib\pathlib.py:1037: in _opener
    return self._accessor.open(self, flags, mode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

pathobj = WindowsPath('C:/Users/bramv/AppData/Local/Temp/tmpflz_ssgk/tokenizer'), args = (32896, 438)

    @functools.wraps(strfunc)
    def wrapped(pathobj, *args):
>       return strfunc(str(pathobj), *args)
E       FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\bramv\\AppData\\Local\\Temp\\tmpflz_ssgk\\tokenizer'

C:\Users\bramv\AppData\Local\Programs\Python\Python36\Lib\pathlib.py:387: FileNotFoundError
=============================================================================== warnings summary ===============================================================================
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327
  c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.slow - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================================ 11 failed, 1799 passed, 691 skipped, 58 xfailed, 1 warning in 80.24s (0:01:20) ================================================

Running with GPU takes a _very_ long time it seems, so I'll update you when I have some results.

BramVanroy on 4 Dec 2019

Hmm, I'm not sure what's going on? I suspect you have one version installed through pip and you're running tests from the source for a slightly different version. Sometimes when you switch from installing from source to installing binary packages pip doesn't clean things up very well, so maybe try a new virtual environment with nothing installed from source? New environment and to get started just:

pip install -U spacy[cuda100]

I didn't mean the spacy tests, since they don't do much with statistical models (and I do see that there are some things that will fail or hang with a GPU enabled, but those are separate problems, since they fail on linux, too).

I meant the tests for spacy-models, which is a different repository with tests just for the statistical models. On my older server the model test I mentioned above for en_core_web_sm takes 30 seconds, with or without GPU:

pytest spacy-models/tests --model en_core_web_sm --lang en --has-parser --has-tagger --has-ner

Then try installing earlier versions of cupy-cuda100 (or whatever CUDA version you have):

pip install cupy-cuda100==6.5.0
[run model tests]
pip install cupy-cuda100==5.4.0
[run model tests]

And see if anything changes?

If you don't have time, don't stress about it, we can just plan to warn people more explicitly that it might not work in windows.

adrianeboyd on 4 Dec 2019

Okay, got it. Sorry for my confusion. I just tested spacy[cuda92] + cupy-cuda92==7.0.0rc1. Will add new replies with other environments.

============================================================================================== FAILURES ===============================================================================================
______________________________________________________________________________________ test_parser_sanity_checks ______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>
example_text = 'Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.'

    @pytest.mark.requires("parser")
    def test_parser_sanity_checks(NLP, example_text):
        doc = NLP(example_text)
        # check that sentences are split
>       assert len(list(doc.sents)) > 1
E       assert 1 > 1
E        +  where 1 = len([Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.])
E        +    where [Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.] = list(<generator object at 0x00000206D4890AF8>)
E        +      where <generator object at 0x00000206D4890AF8> = Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturers...here are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?..sents

spacy-models\tests\test_common.py:131: AssertionError
_________________________________________________________________________________________ test_en_ner_example _________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>

    def test_en_ner_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup for $1 billion")
        ents = [
            ("Apple", 0, 5, "ORG"),
            ("U.K.", 27, 31, "GPE"),
            ("$1 billion", 44, 54, "MONEY"),
        ]
>       assert len(doc.ents) == 3
E       assert 0 == 3
E        +  where 0 = len(())
E        +    where () = Apple is looking at buying U.K. startup for $1 billion.ents

spacy-models\tests\lang\en\test_ner.py:14: AssertionError
_______________________________________________________________________________________ test_en_parser_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>

    def test_en_parser_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        deps = ["nsubj", "aux", "ROOT", "prep", "pcomp", "compound", "dobj"]
        for token, expected_dep in zip(doc, deps):
>           assert token.dep_ == expected_dep
E           AssertionError: assert 'dep' == 'nsubj'
E             - dep
E             + nsubj

spacy-models\tests\lang\en\test_parser.py:17: AssertionError
_____________________________________________________________________ test_en_parser_corpus[masc-penn-treebank-sample.json-82-78] _____________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, test_file = 'masc-penn-treebank-sample.json', uas_threshold = 82, las_threshold = 78

    @pytest.mark.parametrize(
        "test_file,uas_threshold,las_threshold",
        [("masc-penn-treebank-sample.json", 82, 78)],
    )
    def test_en_parser_corpus(NLP, test_file, uas_threshold, las_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)
>       assert scorer.uas > uas_threshold
E       assert 23.752030836864073 > 82
E        +  where 23.752030836864073 = <spacy.scorer.Scorer object at 0x00000206E158CBE0>.uas

spacy-models\tests\lang\en\test_parser.py:31: AssertionError
______________________________________________________________________________________ test_en_parser_issue1207 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>

    def test_en_parser_issue1207(NLP):
        doc = NLP("Employees are recruiting talented staffers from overseas.")
>       assert [i.text for i in doc.noun_chunks] == ["Employees", "talented staffers"]
E       AssertionError: assert ['Employees a...om overseas.'] == ['Employees',...ted staffers']
E         At index 0 diff: 'Employees are recruiting talented staffers from overseas.' != 'Employees'
E         Right contains one more item: 'talented staffers'
E         Use -v to get the full diff

spacy-models\tests\lang\en\test_parser.py:90: AssertionError
_______________________________________________________________________________________ test_en_parser_issue693 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>

    def test_en_parser_issue693(NLP):
        """Test that doc.noun_chunks parses the complete sentence."""
        text1 = "the TopTown International Airport Board and the Goodwill Space Exploration Partnership."
        text2 = "the Goodwill Space Exploration Partnership and the TopTown International Airport Board."
        doc1 = NLP(text1)
        doc2 = NLP(text2)
        chunks1 = [chunk for chunk in doc1.noun_chunks]
        chunks2 = [chunk for chunk in doc2.noun_chunks]
>       assert len(chunks1) == 2
E       assert 1 == 2
E        +  where 1 = len([the TopTown International Airport Board and the Goodwill Space Exploration Partnership.])

spacy-models\tests\lang\en\test_parser.py:103: AssertionError
______________________________________________________________________________________ test_en_tagger_tag_names _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>

    def test_en_tagger_tag_names(NLP):
        doc = NLP("I ate pizzas with anchovies.", disable=["parser"])
        assert type(doc[2].pos) == int
        assert isinstance(doc[2].pos_, unicode_)
        assert isinstance(doc[2].dep_, unicode_)
>       assert doc[2].tag_ == "NNS"
E       AssertionError: assert 'VBZ' == 'NNS'
E         - VBZ
E         + NNS

spacy-models\tests\lang\en\test_tagger.py:25: AssertionError
_______________________________________________________________________________________ test_en_tagger_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>

    def test_en_tagger_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        pos = ["PROPN", "AUX", "VERB", "ADP", "VERB", "PROPN", "NOUN"]
        tags = ["NNP", "VBZ", "VBG", "IN", "VBG", "NNP", "NN"]
        for token, expected_pos in zip(doc, pos):
>           assert token.pos_ == expected_pos
E           AssertionError: assert 'VERB' == 'PROPN'
E             - VERB
E             + PROPN

spacy-models\tests\lang\en\test_tagger.py:33: AssertionError
____________________________________________________________________________ test_en_tagger_corpus[en_pud-ud-test.json-94] ____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, test_file = 'en_pud-ud-test.json', accuracy_threshold = 94

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 5.973743860974688 > 94
E        +  where 5.973743860974688 = <spacy.scorer.Scorer object at 0x00000206D4A0BBE0>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
______________________________________________________________________ test_en_tagger_corpus[masc-penn-treebank-sample.json-89] _______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, test_file = 'masc-penn-treebank-sample.json', accuracy_threshold = 89

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 4.449704142011835 > 89
E        +  where 4.449704142011835 = <spacy.scorer.Scorer object at 0x00000206E1589A90>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
___________________________________________________________________ test_en_tagger_lemma_issue401_issue719[Jane's got a new car-1] ____________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, text = "Jane's got a new car", i = 1

    @pytest.mark.parametrize(
        "text,i", [("Jane's got a new car", 1), ("Jane thinks that's a nice car", 3)]
    )
    def test_en_tagger_lemma_issue401_issue719(NLP, text, i):
        """Text that 's in contractions is not lemmatized as ' or empty string."""
        doc = NLP(text)
>       assert doc[i].lemma_ != "'"
E       assert "'" != "'"
E        +  where "'" = 's.lemma_

spacy-models\tests\lang\en\test_tagger.py:148: AssertionError
_______________________________________________________________ test_en_tagger_lemma_issue401_issue719[Jane thinks that's a nice car-3] _______________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, text = "Jane thinks that's a nice car", i = 3

    @pytest.mark.parametrize(
        "text,i", [("Jane's got a new car", 1), ("Jane thinks that's a nice car", 3)]
    )
    def test_en_tagger_lemma_issue401_issue719(NLP, text, i):
        """Text that 's in contractions is not lemmatized as ' or empty string."""
        doc = NLP(text)
>       assert doc[i].lemma_ != "'"
E       assert "'" != "'"
E        +  where "'" = 's.lemma_

spacy-models\tests\lang\en\test_tagger.py:148: AssertionError
______________________________________________________________________ test_en_tagger_lemma_issue717[You're happy-You are happy] ______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, text1 = "You're happy", text2 = 'You are happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'are'
E         - be
E         + are

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
_________________________________________________________________________ test_en_tagger_lemma_issue717[I'm happy-I am happy] _________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, text1 = "I'm happy", text2 = 'I am happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'am'
E         - be
E         + am

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[He is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, text = 'He is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'He' == '-PRON-'
E         - He
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[he is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000206C33EBC50>, text = 'he is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
========================================================================================== warnings summary ===========================================================================================
C:\Users\bramv\AppData\Local\Programs\Python\Python36\lib\site-packages\_pytest\mark\structures.py:327
  C:\Users\bramv\AppData\Local\Programs\Python\Python36\lib\site-packages\_pytest\mark\structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.requires - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================= 16 failed, 40 passed, 58 skipped, 1 xfailed, 2 xpassed, 1 warning in 46.63s =============================================================

BramVanroy on 4 Dec 2019

spacy[cuda92] + cupy-cuda92==6.5.0 is slightly better with one less failure.

============================================================================================== FAILURES ===============================================================================================
______________________________________________________________________________________ test_parser_sanity_checks ______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>
example_text = 'Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.'

    @pytest.mark.requires("parser")
    def test_parser_sanity_checks(NLP, example_text):
        doc = NLP(example_text)
        # check that sentences are split
>       assert len(list(doc.sents)) > 1
E       assert 1 > 1
E        +  where 1 = len([Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.])
E        +    where [Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.] = list(<generator object at 0x0000017D4CDBDB88>)
E        +      where <generator object at 0x0000017D4CDBDB88> = Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturers...here are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?..sents

spacy-models\tests\test_common.py:131: AssertionError
_________________________________________________________________________________________ test_en_ner_example _________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_ner_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup for $1 billion")
        ents = [
            ("Apple", 0, 5, "ORG"),
            ("U.K.", 27, 31, "GPE"),
            ("$1 billion", 44, 54, "MONEY"),
        ]
>       assert len(doc.ents) == 3
E       assert 0 == 3
E        +  where 0 = len(())
E        +    where () = Apple is looking at buying U.K. startup for $1 billion.ents

spacy-models\tests\lang\en\test_ner.py:14: AssertionError
_______________________________________________________________________________________ test_en_parser_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_parser_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        deps = ["nsubj", "aux", "ROOT", "prep", "pcomp", "compound", "dobj"]
        for token, expected_dep in zip(doc, deps):
>           assert token.dep_ == expected_dep
E           AssertionError: assert 'dep' == 'nsubj'
E             - dep
E             + nsubj

spacy-models\tests\lang\en\test_parser.py:17: AssertionError
_____________________________________________________________________ test_en_parser_corpus[masc-penn-treebank-sample.json-82-78] _____________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, test_file = 'masc-penn-treebank-sample.json', uas_threshold = 82, las_threshold = 78

    @pytest.mark.parametrize(
        "test_file,uas_threshold,las_threshold",
        [("masc-penn-treebank-sample.json", 82, 78)],
    )
    def test_en_parser_corpus(NLP, test_file, uas_threshold, las_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)
>       assert scorer.uas > uas_threshold
E       assert 23.56099587714532 > 82
E        +  where 23.56099587714532 = <spacy.scorer.Scorer object at 0x0000017D1DE0D198>.uas

spacy-models\tests\lang\en\test_parser.py:31: AssertionError
______________________________________________________________________________________ test_en_parser_issue1207 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_parser_issue1207(NLP):
        doc = NLP("Employees are recruiting talented staffers from overseas.")
>       assert [i.text for i in doc.noun_chunks] == ["Employees", "talented staffers"]
E       AssertionError: assert [] == ['Employees',...ted staffers']
E         Right contains 2 more items, first extra item: 'Employees'
E         Use -v to get the full diff

spacy-models\tests\lang\en\test_parser.py:90: AssertionError
_______________________________________________________________________________________ test_en_parser_issue693 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_parser_issue693(NLP):
        """Test that doc.noun_chunks parses the complete sentence."""
        text1 = "the TopTown International Airport Board and the Goodwill Space Exploration Partnership."
        text2 = "the Goodwill Space Exploration Partnership and the TopTown International Airport Board."
        doc1 = NLP(text1)
        doc2 = NLP(text2)
        chunks1 = [chunk for chunk in doc1.noun_chunks]
        chunks2 = [chunk for chunk in doc2.noun_chunks]
>       assert len(chunks1) == 2
E       assert 1 == 2
E        +  where 1 = len([the TopTown International Airport Board and the Goodwill Space Exploration Partnership.])

spacy-models\tests\lang\en\test_parser.py:103: AssertionError
______________________________________________________________________________________ test_en_tagger_tag_names _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_tagger_tag_names(NLP):
        doc = NLP("I ate pizzas with anchovies.", disable=["parser"])
        assert type(doc[2].pos) == int
        assert isinstance(doc[2].pos_, unicode_)
        assert isinstance(doc[2].dep_, unicode_)
>       assert doc[2].tag_ == "NNS"
E       AssertionError: assert 'VBZ' == 'NNS'
E         - VBZ
E         + NNS

spacy-models\tests\lang\en\test_tagger.py:25: AssertionError
_______________________________________________________________________________________ test_en_tagger_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_tagger_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        pos = ["PROPN", "AUX", "VERB", "ADP", "VERB", "PROPN", "NOUN"]
        tags = ["NNP", "VBZ", "VBG", "IN", "VBG", "NNP", "NN"]
        for token, expected_pos in zip(doc, pos):
>           assert token.pos_ == expected_pos
E           AssertionError: assert 'NUM' == 'ADP'
E             - NUM
E             + ADP

spacy-models\tests\lang\en\test_tagger.py:33: AssertionError
____________________________________________________________________________ test_en_tagger_corpus[en_pud-ud-test.json-94] ____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, test_file = 'en_pud-ud-test.json', accuracy_threshold = 94

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 5.879297317718171 > 94
E        +  where 5.879297317718171 = <spacy.scorer.Scorer object at 0x0000017D375F2F28>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
______________________________________________________________________ test_en_tagger_corpus[masc-penn-treebank-sample.json-89] _______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, test_file = 'masc-penn-treebank-sample.json', accuracy_threshold = 89

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 4.633136094674556 > 89
E        +  where 4.633136094674556 = <spacy.scorer.Scorer object at 0x0000017D375DED30>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
___________________________________________________________________________________ test_en_tagger_lemma_issue1305 ____________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>

    def test_en_tagger_lemma_issue1305(NLP):
        """Test lemmatization of English VBZ."""
        assert NLP.vocab.morphology.lemmatizer("works", "verb") == ["work"]
        doc = NLP("This app works well")
>       assert doc[2].lemma_ == "work"
E       AssertionError: assert 'works' == 'work'
E         - works
E         ?     -
E         + work

spacy-models\tests\lang\en\test_tagger.py:139: AssertionError
______________________________________________________________________ test_en_tagger_lemma_issue717[You're happy-You are happy] ______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, text1 = "You're happy", text2 = 'You are happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'are'
E         - be
E         + are

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
_________________________________________________________________________ test_en_tagger_lemma_issue717[I'm happy-I am happy] _________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, text1 = "I'm happy", text2 = 'I am happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'am'
E         - be
E         + am

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[He is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, text = 'He is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[he is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000017D375AB438>, text = 'he is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
========================================================================================== warnings summary ===========================================================================================
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327
  c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.requires - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================= 15 failed, 41 passed, 58 skipped, 1 xfailed, 2 xpassed, 1 warning in 46.18s =============================================================

BramVanroy on 4 Dec 2019

spacy[cuda92] + cupy-cuda92==6.0.0 is a bit worse again with 17 failures.

============================================================================================== FAILURES ===============================================================================================
______________________________________________________________________________________ test_parser_sanity_checks ______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>
example_text = 'Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.'

    @pytest.mark.requires("parser")
    def test_parser_sanity_checks(NLP, example_text):
        doc = NLP(example_text)
        # check that sentences are split
>       assert len(list(doc.sents)) > 1
E       assert 1 > 1
E        +  where 1 = len([Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.])
E        +    where [Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturer...ere are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?.] = list(<generator object at 0x0000021DC25BD828>)
E        +      where <generator object at 0x0000021DC25BD828> = Apple is looking at buying U.K. startup for $1 billion. Autonomous cars shift insurance liability toward manufacturers...here are you?. Who is the president of France?. What is the capital of the United States?. When was Barack Obama born?..sents

spacy-models\tests\test_common.py:131: AssertionError
_________________________________________________________________________________________ test_en_ner_example _________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_ner_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup for $1 billion")
        ents = [
            ("Apple", 0, 5, "ORG"),
            ("U.K.", 27, 31, "GPE"),
            ("$1 billion", 44, 54, "MONEY"),
        ]
>       assert len(doc.ents) == 3
E       assert 0 == 3
E        +  where 0 = len(())
E        +    where () = Apple is looking at buying U.K. startup for $1 billion.ents

spacy-models\tests\lang\en\test_ner.py:14: AssertionError
_______________________________________________________________________________________ test_en_parser_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_parser_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        deps = ["nsubj", "aux", "ROOT", "prep", "pcomp", "compound", "dobj"]
        for token, expected_dep in zip(doc, deps):
>           assert token.dep_ == expected_dep
E           AssertionError: assert 'dep' == 'nsubj'
E             - dep
E             + nsubj

spacy-models\tests\lang\en\test_parser.py:17: AssertionError
_____________________________________________________________________ test_en_parser_corpus[masc-penn-treebank-sample.json-82-78] _____________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, test_file = 'masc-penn-treebank-sample.json', uas_threshold = 82, las_threshold = 78

    @pytest.mark.parametrize(
        "test_file,uas_threshold,las_threshold",
        [("masc-penn-treebank-sample.json", 82, 78)],
    )
    def test_en_parser_corpus(NLP, test_file, uas_threshold, las_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)
>       assert scorer.uas > uas_threshold
E       assert 23.796919348151928 > 82
E        +  where 23.796919348151928 = <spacy.scorer.Scorer object at 0x0000021DCB5119E8>.uas

spacy-models\tests\lang\en\test_parser.py:31: AssertionError
______________________________________________________________________________________ test_en_parser_issue1207 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_parser_issue1207(NLP):
        doc = NLP("Employees are recruiting talented staffers from overseas.")
>       assert [i.text for i in doc.noun_chunks] == ["Employees", "talented staffers"]
E       AssertionError: assert [] == ['Employees',...ted staffers']
E         Right contains 2 more items, first extra item: 'Employees'
E         Use -v to get the full diff

spacy-models\tests\lang\en\test_parser.py:90: AssertionError
_______________________________________________________________________________________ test_en_parser_issue693 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_parser_issue693(NLP):
        """Test that doc.noun_chunks parses the complete sentence."""
        text1 = "the TopTown International Airport Board and the Goodwill Space Exploration Partnership."
        text2 = "the Goodwill Space Exploration Partnership and the TopTown International Airport Board."
        doc1 = NLP(text1)
        doc2 = NLP(text2)
        chunks1 = [chunk for chunk in doc1.noun_chunks]
        chunks2 = [chunk for chunk in doc2.noun_chunks]
>       assert len(chunks1) == 2
E       assert 1 == 2
E        +  where 1 = len([the TopTown International Airport Board and the Goodwill Space Exploration Partnership.])

spacy-models\tests\lang\en\test_parser.py:103: AssertionError
______________________________________________________________________________________ test_en_tagger_tag_names _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_tagger_tag_names(NLP):
        doc = NLP("I ate pizzas with anchovies.", disable=["parser"])
        assert type(doc[2].pos) == int
        assert isinstance(doc[2].pos_, unicode_)
        assert isinstance(doc[2].dep_, unicode_)
>       assert doc[2].tag_ == "NNS"
E       AssertionError: assert 'VBD' == 'NNS'
E         - VBD
E         + NNS

spacy-models\tests\lang\en\test_tagger.py:25: AssertionError
_______________________________________________________________________________________ test_en_tagger_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_tagger_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        pos = ["PROPN", "AUX", "VERB", "ADP", "VERB", "PROPN", "NOUN"]
        tags = ["NNP", "VBZ", "VBG", "IN", "VBG", "NNP", "NN"]
        for token, expected_pos in zip(doc, pos):
>           assert token.pos_ == expected_pos
E           AssertionError: assert 'VERB' == 'PROPN'
E             - VERB
E             + PROPN

spacy-models\tests\lang\en\test_tagger.py:33: AssertionError
____________________________________________________________________________ test_en_tagger_corpus[en_pud-ud-test.json-94] ____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, test_file = 'en_pud-ud-test.json', accuracy_threshold = 94

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 5.510955799017756 > 94
E        +  where 5.510955799017756 = <spacy.scorer.Scorer object at 0x0000021DB2C50828>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
______________________________________________________________________ test_en_tagger_corpus[masc-penn-treebank-sample.json-89] _______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, test_file = 'masc-penn-treebank-sample.json', accuracy_threshold = 89

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 4.609467455621301 > 89
E        +  where 4.609467455621301 = <spacy.scorer.Scorer object at 0x0000021DC42F97F0>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
___________________________________________________________________________________ test_en_tagger_lemma_issue1305 ____________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>

    def test_en_tagger_lemma_issue1305(NLP):
        """Test lemmatization of English VBZ."""
        assert NLP.vocab.morphology.lemmatizer("works", "verb") == ["work"]
        doc = NLP("This app works well")
>       assert doc[2].lemma_ == "work"
E       AssertionError: assert 'works' == 'work'
E         - works
E         ?     -
E         + work

spacy-models\tests\lang\en\test_tagger.py:139: AssertionError
___________________________________________________________________ test_en_tagger_lemma_issue401_issue719[Jane's got a new car-1] ____________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, text = "Jane's got a new car", i = 1

    @pytest.mark.parametrize(
        "text,i", [("Jane's got a new car", 1), ("Jane thinks that's a nice car", 3)]
    )
    def test_en_tagger_lemma_issue401_issue719(NLP, text, i):
        """Text that 's in contractions is not lemmatized as ' or empty string."""
        doc = NLP(text)
>       assert doc[i].lemma_ != "'"
E       assert "'" != "'"
E        +  where "'" = 's.lemma_

spacy-models\tests\lang\en\test_tagger.py:148: AssertionError
_______________________________________________________________ test_en_tagger_lemma_issue401_issue719[Jane thinks that's a nice car-3] _______________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, text = "Jane thinks that's a nice car", i = 3

    @pytest.mark.parametrize(
        "text,i", [("Jane's got a new car", 1), ("Jane thinks that's a nice car", 3)]
    )
    def test_en_tagger_lemma_issue401_issue719(NLP, text, i):
        """Text that 's in contractions is not lemmatized as ' or empty string."""
        doc = NLP(text)
>       assert doc[i].lemma_ != "'"
E       assert "'" != "'"
E        +  where "'" = 's.lemma_

spacy-models\tests\lang\en\test_tagger.py:148: AssertionError
______________________________________________________________________ test_en_tagger_lemma_issue717[You're happy-You are happy] ______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, text1 = "You're happy", text2 = 'You are happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'are'
E         - be
E         + are

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
_________________________________________________________________________ test_en_tagger_lemma_issue717[I'm happy-I am happy] _________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, text1 = "I'm happy", text2 = 'I am happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'am'
E         - be
E         + am

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[He is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, text = 'He is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[he is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x0000021DAC97ABE0>, text = 'he is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
========================================================================================== warnings summary ===========================================================================================
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327
  c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.requires - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================= 17 failed, 39 passed, 58 skipped, 1 xfailed, 2 xpassed, 1 warning in 39.41s =============================================================

BramVanroy on 4 Dec 2019

spacy[cuda92] + cupy-cuda92==5.0.0 is a bit better (12 failures).

============================================================================================== FAILURES ===============================================================================================
_________________________________________________________________________________________ test_en_ner_example _________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>

    def test_en_ner_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup for $1 billion")
        ents = [
            ("Apple", 0, 5, "ORG"),
            ("U.K.", 27, 31, "GPE"),
            ("$1 billion", 44, 54, "MONEY"),
        ]
>       assert len(doc.ents) == 3
E       assert 0 == 3
E        +  where 0 = len(())
E        +    where () = Apple is looking at buying U.K. startup for $1 billion.ents

spacy-models\tests\lang\en\test_ner.py:14: AssertionError
_______________________________________________________________________________________ test_en_parser_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>

    def test_en_parser_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        deps = ["nsubj", "aux", "ROOT", "prep", "pcomp", "compound", "dobj"]
        for token, expected_dep in zip(doc, deps):
>           assert token.dep_ == expected_dep
E           AssertionError: assert 'ROOT' == 'nsubj'
E             - ROOT
E             + nsubj

spacy-models\tests\lang\en\test_parser.py:17: AssertionError
_____________________________________________________________________ test_en_parser_corpus[masc-penn-treebank-sample.json-82-78] _____________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>, test_file = 'masc-penn-treebank-sample.json', uas_threshold = 82, las_threshold = 78

    @pytest.mark.parametrize(
        "test_file,uas_threshold,las_threshold",
        [("masc-penn-treebank-sample.json", 82, 78)],
    )
    def test_en_parser_corpus(NLP, test_file, uas_threshold, las_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)
>       assert scorer.uas > uas_threshold
E       assert 23.855844362940523 > 82
E        +  where 23.855844362940523 = <spacy.scorer.Scorer object at 0x00000179977181D0>.uas

spacy-models\tests\lang\en\test_parser.py:31: AssertionError
______________________________________________________________________________________ test_en_parser_issue1207 _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>

    def test_en_parser_issue1207(NLP):
        doc = NLP("Employees are recruiting talented staffers from overseas.")
>       assert [i.text for i in doc.noun_chunks] == ["Employees", "talented staffers"]
E       AssertionError: assert ['Employees',...om overseas.'] == ['Employees',...ted staffers']
E         At index 1 diff: 'are recruiting talented staffers from overseas.' != 'talented staffers'
E         Use -v to get the full diff

spacy-models\tests\lang\en\test_parser.py:90: AssertionError
______________________________________________________________________________________ test_en_tagger_tag_names _______________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>

    def test_en_tagger_tag_names(NLP):
        doc = NLP("I ate pizzas with anchovies.", disable=["parser"])
        assert type(doc[2].pos) == int
        assert isinstance(doc[2].pos_, unicode_)
        assert isinstance(doc[2].dep_, unicode_)
>       assert doc[2].tag_ == "NNS"
E       AssertionError: assert 'NNP' == 'NNS'
E         - NNP
E         + NNS

spacy-models\tests\lang\en\test_tagger.py:25: AssertionError
_______________________________________________________________________________________ test_en_tagger_example ________________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>

    def test_en_tagger_example(NLP):
        doc = NLP("Apple is looking at buying U.K. startup")
        pos = ["PROPN", "AUX", "VERB", "ADP", "VERB", "PROPN", "NOUN"]
        tags = ["NNP", "VBZ", "VBG", "IN", "VBG", "NNP", "NN"]
        for token, expected_pos in zip(doc, pos):
>           assert token.pos_ == expected_pos
E           AssertionError: assert 'VERB' == 'AUX'
E             - VERB
E             + AUX

spacy-models\tests\lang\en\test_tagger.py:33: AssertionError
____________________________________________________________________________ test_en_tagger_corpus[en_pud-ud-test.json-94] ____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>, test_file = 'en_pud-ud-test.json', accuracy_threshold = 94

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 5.435398564412543 > 94
E        +  where 5.435398564412543 = <spacy.scorer.Scorer object at 0x00000179F23346D8>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
______________________________________________________________________ test_en_tagger_corpus[masc-penn-treebank-sample.json-89] _______________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>, test_file = 'masc-penn-treebank-sample.json', accuracy_threshold = 89

    @pytest.mark.parametrize(
        "test_file,accuracy_threshold",
        [("en_pud-ud-test.json", 94), ("masc-penn-treebank-sample.json", 89)],
    )
    def test_en_tagger_corpus(NLP, test_file, accuracy_threshold):
        data_path = TEST_FILES_DIR / test_file
        if not data_path.exists():
            raise FileNotFoundError("Test corpus not found", data_path)
        corpus = GoldCorpus(data_path, data_path)
        dev_docs = list(corpus.dev_docs(NLP, gold_preproc=False))
        scorer = NLP.evaluate(dev_docs)

>       assert scorer.tags_acc > accuracy_threshold
E       assert 4.650887573964497 > 89
E        +  where 4.650887573964497 = <spacy.scorer.Scorer object at 0x00000179F253D940>.tags_acc

spacy-models\tests\lang\en\test_tagger.py:50: AssertionError
___________________________________________________________________________________ test_en_tagger_lemma_issue1305 ____________________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>

    def test_en_tagger_lemma_issue1305(NLP):
        """Test lemmatization of English VBZ."""
        assert NLP.vocab.morphology.lemmatizer("works", "verb") == ["work"]
        doc = NLP("This app works well")
>       assert doc[2].lemma_ == "work"
E       AssertionError: assert 'works' == 'work'
E         - works
E         ?     -
E         + work

spacy-models\tests\lang\en\test_tagger.py:139: AssertionError
_________________________________________________________________________ test_en_tagger_lemma_issue717[I'm happy-I am happy] _________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>, text1 = "I'm happy", text2 = 'I am happy'

    @pytest.mark.parametrize(
        "text1,text2",
        [
            ("You're happy", "You are happy"),
            ("I'm happy", "I am happy"),
            ("he's happy", "he's happy"),
        ],
    )
    def test_en_tagger_lemma_issue717(NLP, text1, text2):
        """Test that contractions are assigned the correct lemma."""
        doc1 = NLP(text1)
        doc2 = NLP(text2)
>       assert doc1[1].lemma_ == doc2[1].lemma_
E       AssertionError: assert 'be' == 'am'
E         - be
E         + am

spacy-models\tests\lang\en\test_tagger.py:164: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[He is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>, text = 'He is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
____________________________________________________________________________ test_en_tagger_lemma_issue686[he is the man] _____________________________________________________________________________

NLP = <spacy.lang.en.English object at 0x00000179F233A048>, text = 'he is the man'

    @pytest.mark.parametrize("text", ["He is the man", "he is the man"])
    def test_en_tagger_lemma_issue686(NLP, text):
        """Test that pronoun lemmas are assigned correctly."""
        tokens = NLP(text)
>       assert tokens[0].lemma_ == "-PRON-"
E       AssertionError: assert 'he' == '-PRON-'
E         - he
E         + -PRON-

spacy-models\tests\lang\en\test_tagger.py:185: AssertionError
========================================================================================== warnings summary ===========================================================================================
c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327
  c:\users\bramv\.virtualenvs\spactest-qxouqrzm\lib\site-packages\_pytest\mark\structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.requires - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================================== 12 failed, 44 passed, 58 skipped, 1 xfailed, 2 xpassed, 1 warning in 64.22s (0:01:04) ========================================================

BramVanroy on 4 Dec 2019

@adrianeboyd I included the whole output of failures in the posts above for completeness' sake. It seems that all versions of cupy that I tested (7.0.0rc1, 6.5.0, 6.0.0, 5.0.0) show this incorrect behaviour. I'm curious to know why this happens. That being said, cupy does not seem like a reliable cross-platform accelerator. In an ideal world, wouldn't working with tensors and field-tested frameworks such as Pytorch be a better fit? I understand that they might seem overkill, and I'm not suggesting that everything needs to be changed to another framework. Just, hypothetically, can pytorch (or tensorflow or what have you) replace cupy?

I imagine that this is hard to debug (since you don't have the hardware available) and possibly even harder to fix, so for now I think that a highlighted remark in the documentation is the best start. Perhaps even go so far as to make spacy[cuda] (and variants) unix only?

BramVanroy on 4 Dec 2019

Thanks for trying things out! We still haven't heard about this problem from anyone else, but I bet Windows + GPU is rare for spacy users, so who knows. It certainly can't hurt to add a warning.

I'm not sure whether spacy[cuda] can be restricted (I don't think so?) but maybe we can add some warnings there, too.

adrianeboyd on 6 Dec 2019

It would be nice to get confirmation that others can reproduce this issue... I only have one CUDA Windows machine available unfortunately. I did reinstall CUDA from scratch and used new venv's for all tests, but still reproducibility confirmation would be nice.

As stated before, I'm sure that using GPU is rare especially if you're not training and if you're not training. SpaCy is already so fast on CPU anyway. Still, it would be odd that after all this time no one has encountered this issue.

BramVanroy on 7 Dec 2019

https://github.com/explosion/spaCy/issues/4816 could be related - investigating further.

svlandeg on 17 Dec 2019

@BramVanroy : are you certain you verified the same behaviour on 2.1.8? Because https://github.com/explosion/spaCy/issues/4724#issue-529569581 seems to mention it didn't install properly because of issues with thinc_gpu_ops (which wouldn't surprise me because I ran into the same issues)

svlandeg on 17 Dec 2019

@svlandeg You are right (so I deleted my previous message). I tried it again now just to be sure. Installing works, though (using Python 3.6.8 now but same behaviour as before), loading the model works, but parsing doesn't (as mentioned in my topic that you linked):

Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> spacy.require_gpu()
True
>>> nlp = spacy.load('en')
>>> doc = nlp('hello world')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\spacy\language.py", line 402, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "pipes.pyx", line 392, in spacy.pipeline.pipes.Tagger.__call__
  File "pipes.pyx", line 411, in spacy.pipeline.pipes.Tagger.predict
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\model.py", line 169, in __call__
    return self.predict(x)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 40, in predict
    X = layer(X)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\model.py", line 169, in __call__
    return self.predict(x)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 310, in predict
    X = layer(layer.ops.flatten(seqs_in, pad=pad))
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\model.py", line 169, in __call__
    return self.predict(x)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 40, in predict
    X = layer(X)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\model.py", line 169, in __call__
    return self.predict(x)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\model.py", line 133, in predict
    y, _ = self.begin_update(X, drop=None)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 379, in uniqued_fwd
    Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "C:\Users\bramv\.virtualenvs\spactest-QxouQRzM\lib\site-packages\thinc\neural\_classes\hash_embed.py", line 59, in begin_update
    keys = self.ops.hash(ids, self.seed) % self.nV
  File "ops.pyx", line 967, in thinc.neural.ops.CupyOps.hash
AttributeError: module 'thinc_gpu_ops' has no attribute 'hash'

BramVanroy on 17 Dec 2019

Yea, thinc_gpu_ops has some problems installing on Windows - I run into the same. The good news is that https://github.com/explosion/thinc/pull/117 has resolved that - the bad news is that this may be causing the recent regressions on GPU. That's why I wanted to verify that you hadn't seen this behaviour on 2.1.8 (and you haven't - which is good in a sense). I'll continue further down the rabbit hole :-)

svlandeg on 17 Dec 2019

👍1

Good luck! By the way, does that mean that you can reproduce the strange behaviour that I'm seeing mentioned in the OP?

BramVanroy on 17 Dec 2019

Unfortunately, yes, though slightly different:

['The', 'decrease', 'in', '2008', 'primarily', 'relates', 'to', 'the', 'decrease', 'in', 'cash', 'and', 'cash', 'equivalents', '1', '.', '\n']
['PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'SPACE']
['ROOT', 'pobj', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', 'neg', '']

This is with en_core_web_sm. ~~With en_core_web_md for instance the parse looks perfectly ok.~~

svlandeg on 17 Dec 2019

👀1

@BramVanroy : Finally found and fixed the issue :-) cf. PR https://github.com/explosion/thinc/pull/149

Thanks for all your testing !

svlandeg on 30 Dec 2019

👍1

Looking great! Nice to see that Windows isn't completely left behind in the development world. Even though I run my heavy-duty stuff on our servers, I do a lot of testing at home so I love every bit of cross-platform compatibility that I see.

Thanks a lot for your work!

Closing this after the fix by @svlandeg over at https://github.com/explosion/thinc/pull/149

BramVanroy on 30 Dec 2019

🎉1

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.