Hello,
I don't know what happened, but I randomly get this error:
Traceback (most recent call last):
  File "/home/damiano/lavoro/python/parser/parser/nlp/ner/trainer/learning.py", line 65, in <module>
    losses=losses,
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/spacy/language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 418, in spacy.syntax.nn_parser.Parser.update
  File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
  File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/thinc/api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError
My code is very simple:
import random

import spacy
from spacy.util import minibatch, compounding

# N_ITER and TRAIN_DATA are defined elsewhere in the script
nlp = spacy.load('/home/damiano/model')

if "ner" not in nlp.pipe_names:
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner, last=True)
else:
    ner = nlp.get_pipe("ner")

# add labels
ner.add_label('NAME')
ner.add_label('SURNAME')
ner.add_label('ADDRESS')
ner.add_label('BIRTHDATE')
ner.add_label('CITIZENSHIP')
ner.add_label('CITY')
ner.add_label('DATE')
ner.add_label('EMAIL')
ner.add_label('PROFESSION')
ner.add_label('REGION')
ner.add_label('TELEPHONE')
ner.add_label('URL')
ner.add_label('ZIPCODE')

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    # reset and initialize the weights randomly – but only if we're
    # training a new model
    nlp.begin_training()
    for itn in range(N_ITER):
        random.shuffle(TRAIN_DATA)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,        # batch of texts
                annotations,  # batch of annotations
                drop=0.25,    # dropout - make it harder to memorise data
                losses=losses,
            )
It is basically the same code I found in the docs.
spaCy version: 2.1.3
Location: /home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/spacy
Platform: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.7
Models: it
Hello, I can confirm this happens when the document has 0 characters.
Maybe we should skip such documents instead of raising the AssertionError?
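In the meantime, here is a minimal sketch of that skip, assuming TRAIN_DATA is a list of (text, annotations) tuples as in the snippet above:

# Hypothetical workaround: drop zero-length (or whitespace-only) texts
# from the training data before calling nlp.update.
TRAIN_DATA = [
    (text, annotations)
    for text, annotations in TRAIN_DATA
    if text.strip()
]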
I've just run into this issue also. The strange thing is that it doesn't happen for every zero-length document. My pipeline gets through about 2k zero-length sentences found within the longer iterable of texts before throwing this error. It happens at the same location in the list of sentences, but if I shuffle the list of texts, I get the assertion error at different places.
I've experienced this error, too. I tend to agree with @damianoporta, since I solved it by replacing null/NaN strings with 'nan' strings.
Anyhow, it is interesting to note that although I had null strings all across the Series I streamed, the exception was thrown after several null documents had already been processed, while an arbitrary non-null document was being processed. Not sure what's going on under the hood!
I've been facing the same issue when trying to train a Portuguese model for NER. The interesting thing is that the error seems to appear in different iterations of the training script, as you can see below.
Loaded model 'pt_core_news_sm'
0
Losses {'ner': 6539.10400141068}
1
Losses {'ner': 6198.809629522756}
2
Losses {'ner': 6379.065996479177}
3
Traceback (most recent call last):
  File "trainingLoopV2.py", line 3084, in <module>
    plac.call(main)
  File "C:\Anaconda\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Anaconda\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "trainingLoopV2.py", line 3056, in main
    losses=losses,
  File "C:\Anaconda\lib\site-packages\spacy\language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 418, in spacy.syntax.nn_parser.Parser.update
  File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
  File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
  File "C:\Anaconda\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Anaconda\lib\site-packages\thinc\api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError
This error did NOT appear when I was training to add a new entity type; it only appears when I'm trying to update a model to get better NER accuracy.
I've got the same issue while processing the raw 20Newsgroups dataset.
I've got the same issue, with some zero-length sentences. I solved it by setting batch_size=1 in nlp.pipe. Any other value throws the error.
@Cortysus, I had a similar case; batch_size=1 solved the problem for me.
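For reference, a minimal sketch of that workaround; texts here is a hypothetical iterable of input strings:

# With batch_size=1 each document is unflattened on its own, so a
# zero-length text no longer breaks the rest of its batch.
# texts: your iterable of input strings (hypothetical)
for doc in nlp.pipe(texts, batch_size=1):
    print([(ent.text, ent.label_) for ent in doc.ents])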
This should be fixed in Thinc v7.0.5: https://github.com/explosion/thinc/releases/tag/v7.0.5 Thanks for your patience 🙂
The easiest way to reproduce the bug:

import spacy

NLP = spacy.load("en_core_web_sm")
for doc in NLP.pipe(['some text', '']):
    pass
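Until you can upgrade Thinc, filtering empty strings out of the stream also avoids the crash; a sketch reusing the NLP object from the snippet above:

texts = ['some text', '']
# Skip zero-length strings before they reach the pipeline.
for doc in NLP.pipe(t for t in texts if t):
    print(doc.text)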